Mellanox WinOF VPI User Manual Rev 4.60 www.mellanox.
NOTE: THIS HARDWARE, SOFTWARE OR TEST SUITE PRODUCT (“PRODUCT(S)”) AND ITS RELATED DOCUMENTATION ARE PROVIDED BY MELLANOX TECHNOLOGIES “AS-IS” WITH ALL FAULTS OF ANY KIND AND SOLELY FOR THE PURPOSE OF AIDING THE CUSTOMER IN TESTING APPLICATIONS THAT USE THE PRODUCTS IN DESIGNATED SOLUTIONS. THE CUSTOMER'S MANUFACTURING TEST ENVIRONMENT HAS NOT MET THE STANDARDS SET BY MELLANOX TECHNOLOGIES TO FULLY QUALIFY THE PRODUCT(S) AND/OR THE SYSTEM USING IT.
Table of Contents
Document Revision History 8
About this Manual 11
Scope 11
Intended Audience
3.9.6 DSCP Sanity Testing 37
Chapter 4 Deploying Windows Server 2012 and Above with SMB Direct 38
4.1 Overview 38
4.2 Hardware and Software Prerequisites 38
4.3 SMB Configuration Verification
8.3.14 vstat 87
8.3.15 osmtest 87
8.3.16 ibaddr 90
8.3.17 ibcacheedit
List of Tables
Table 1 Document Revision History
Table 44 nd_write_bw Flags and Options 116
Table 45 nd_write_lat Options 117
Table 46 nd_read_bw Options 118
Table 47 nd_read_lat Options
Document Revision History
Table 1 - Document Revision History

Document Revision: Rev 4.60
Date: December 30, 2013
Changes: Updated the following sections:
• Section 3.7.2.2, “Configuring Windows Host”, on page 30 - Updated the example in Step 5
• Section 6.1.4.1, “Performance Tuning Tool Application”, on page 47 - Updated the Options table
• Section 6.

Document Revision: Rev 4.40
Date: June 10, 2013
Changes: Updated the following sections:
• Section 2.2, “Downloading Mellanox Firmware Tools”, on page 14
• Section 8, “InfiniBand Fabric”, on page 61
• Section 10, “Troubleshooting”, on page 124
• Section 11, “Documentation”, on page 128
• Section , “Options”, on page 48
Added the following sections:
• “perf_tuning” Appendix , “Synopsis,” on page 48
• Section , “”, on page 15
• Section 3.7.

Document Revision: Rev 2.1.3
Date: January 28, 2011
Changes: Complete restructure

Document Revision: Rev 2.1.2
Date: October 10, 2010
Changes:
• Removed section Debug Options.
• Updated Section 3, “Uninstalling Mellanox VPI Driver,” on page 11
• Added Section 6, “InfiniBand Fabric,” on page 38 and its subsections
• Added Section 6.3, “InfiniBand Fabric Performance Utilities,” on page 71 and its subsections

Document Revision: Rev 2.1.1.
About this Manual
Scope
This document describes the WinOF Rev 4.60 features, performance, InfiniBand diagnostic tools, content and configuration. Additionally, this document provides information on various performance tools supplied with this version.
Intended Audience
This manual is intended for system administrators responsible for the installation, configuration, management and maintenance of the software and hardware of VPI (InfiniBand, Ethernet) adapter cards.
Rev 4.60 Common Abbreviations and Acronyms Table 3 - Abbreviations and Acronyms Abbreviation / Acronym Whole Word / Description B (Capital) ‘B’ is used to indicate size in bytes or multiples of bytes (e.g., 1KB = 1024 bytes, and 1MB = 1048576 bytes) b (Small) ‘b’ is used to indicate size in bits or multiples of bits (e.g.
Rev 4.60 1 Introduction This User Manual addresses the Mellanox WinOF driver Rev 4.60 package. Mellanox WinOF is composed of several software modules that contain an InfiniBand and Ethernet driver. The Mellanox WinOF driver supports 10 or 40 Gb/s Ethernet, and 40 or 56 Gb/s InfiniBand network ports. The port type is determined upon boot based on card capabilities and user settings.
2 Firmware Upgrade
The adapter card may not have been shipped with the latest firmware version. The section below describes how to update firmware.
2.1 Downloading Firmware
To identify your adapter card, perform the following steps:
Step 1. Extract the HCA PSID. Run “vstat”.
Step 2. Download the latest firmware using the PSID from the step above. Go to: http://www.mellanox.com > Support > Support Downloader
Step 3. Unzip the binary image (.zip file).
2.2
Step 2. Install and Run WinMFT.
To install the WinMFT package, double click the MSI package or run it from the command prompt. Installing the WinMFT package from the command line requires administrator privileges.
Example:
PS $ msiexec.exe /i WinMFT_x64_3_0_0_17.msi
To check the device status:
Step 1. Start/Stop mst.
PS $ mst start
OR
PS $ mst stop
Step 2. Check the device’s status.
Rev 4.60 3 Driver Features The Mellanox VPI WinOF driver release introduces the following capabilities: • One or two ports • Up to 16 Rx queues per port • Rx steering mode (RSS) • Hardware Tx/Rx checksum calculation • Large Send off-load (i.e.
3.3 Receive Side Scaling (RSS)
The Mellanox WinOF Rev 4.60 IPoIB and Ethernet drivers use the new RSS capabilities of NDIS 6.30. The main changes are:
• Support for an unlimited number of processors (previously 64)
• Individual network adapter RSS configuration usage
RSS capabilities can be set per individual adapter as well as globally. To do so, set the registry keys listed below:
Table 4 - Registry Keys Setting
Sub-key    Description
HKLM\SYSTEM\CurrentControlSet\Control\Class\{4d36e972-e325-11ce-bfc108002
Step 4. Select the Driver key, and obtain the nn number.
3.4 Port Configuration
After WinOF OFED VPI installation, it is possible to modify the network protocol that runs on each port of VPI adapter cards. Each port can be set to run as InfiniBand, Ethernet or Auto Sensing.
3.4.1 Auto Sensing
Auto Sensing enables the NIC to automatically sense the link type (InfiniBand or Ethernet) based on the cable connected to the port and load the appropriate driver stack (InfiniBand or Ethernet).
Rev 4.60 3.4.2 Port Protocol Configuration Step 1. Display the Device Manager and expand “System devices”. Step 2. Right-click on the Mellanox ConnectX Ethernet network adapter and left-click Properties. Select the Port Protocol tab from the Properties window. The “Port Protocol” tab is displayed only if the NIC is a VPI (IB and ETH). The figure below is an example of the displayed Port Protocol window for a dual port VPI adapter card.
Rev 4.60 Step 3. In this step, you can perform the following functions: • If you choose the HW Defaults option, the port protocols will be determined according to the NIC’s hardware default values. • Choose the desired port protocol for the available port(s). If you choose IB or ETH, both ends of the connection must be of the same type (IB or ETH). • Enable Auto Sensing by checking the AUTO checkbox. If the NIC does not support Auto Sensing, the AUTO option will be grayed out.
5. Adaptive Load Balancing
The same functionality as Load Balancing (Send & Receive). When one of the adapters is loaded with traffic, load balancing channels the traffic to the other team adapter.
6. Dynamic Link Aggregation (802.3ad)
Provides dynamic link aggregation, allowing creation of one or more channel groups using same-speed or mixed-speed server adapters.
7. Static Link Aggregation (802.
Rev 4.60 Step 2. Right-click a Mellanox ConnectX 10Gb Ethernet adapter (under “Network adapters” list) and left click Properties. Select the LBFO tab from the Properties window. It is not recommended to open the Properties window of more than one adapter simultaneously. The LBFO dialog enables creating, modifying or removing a bundle. Only Mellanox Technologies adapters can be part of the LBFO. To create a new bundle, perform the following Step 1. Click Create. Step 2. Enter a (unique) bundle name.
Rev 4.60 Step 7. Check the checkbox.
Rev 4.60 To modify an existing bundle, perform the following: a. Select the desired bundle and click Modify b. Modify the bundle name, its type, and/or the participating adapters in the bundle c. Click the Commit button To remove an existing bundle, select the desired bundle and click Remove. You will be prompted to approve this action. Notes on this step: a. Each adapter that participates in a bundle has two properties: • Status: Connected/Disconnected/Disabled • Role: Active or Backup b.
Rev 4.60 Step 1. Display the Device Manager.
Rev 4.60 Step 2. Right-click a Mellanox network adapter (under “Network adapters” list) and left-click Properties. Select the VLAN tab from the Properties sheet. If a physical adapter has been added to a bundle (team), the VLAN tab will not be displayed. Step 3. Click New to open a VLAN dialog window. Enter the desired VLAN Name and VLAN ID, and select the VLAN Priority.
Rev 4.60 After installing the first virtual adapter (VLAN) on a specific port, the port becomes disabled. This means that it is not possible to bind to this port until all the virtual adapters associated with it are removed. When using a VLAN, the network address is configured using the VLAN ID. Therefore, the VLAN ID on both ends of the connection must be the same. Step 4. Verify the new VLAN(s) by opening the Device Manager window or the Network Connections window.
Step 4. Select the VLAN to be removed.
Step 5. Click Remove and confirm the operation.
3.5.5 Configuring a Port to Work with VLAN in Windows 2012 and Above
In this procedure you DO NOT create a VLAN; rather, you use an existing VLAN ID.
To configure a port to work with VLAN using the Device Manager:
Step 1. Open the Device Manager.
Step 2. Go to the Network adapters.
Step 3. Right-click the Mellanox ConnectX®-3 Ethernet Adapter card and select Properties.
Step 4. Go to the Advanced tab.
Step 5.
3.6
Rev 4.60 Step 5. Choose the ‘Tx Throughput Port Arbiter’ option. Step 6. Set one of the following values: • Best Effort (Default) - Default behavior. No precedence is given to this port over the other. • Guaranteed - Give higher precedence to this port. • Not Present - No configuration exists, defaults are used. 3.7 RDMA over Converged Ethernet (RoCE) 3.7.
• All applications that run over InfiniBand verbs should work on RoCE links if they use GRH headers.
• Set HCA to use Ethernet protocol: Display the Device Manager and expand “System Devices”. Please refer to Section 3.4.2, “Port Protocol Configuration”, on page 19.
3.7.2.2 Configuring Windows Host
Since PFC is responsible for flow control at the granularity of traffic priority, it is necessary to assign different priorities to different types of network traffic.
Step 6. Create a VLAN, in case VLANs are used in the network.
(config)# vlan 55
Step 7. Create a VLAN interface (should be configured when working with L3 only).
(config)# interface vlan 55
Step 8. Assign an IP address to the VLAN interface.
(config)# interface vlan 55 ip address 100.5.5.1 255.255.255.252
Step 9. Allow ports that belong to the LAG (port-channel 55) to access VLAN 55.
(config)# interface port-channel 55 switchport trunk allowed-vlan 55
Step 10.
3.7.4 Configuring Router (PFC only)
The router uses the L3 DSCP value to mark the L2 PCP of the egress traffic. The required mapping copies the three most significant bits of the DSCP into the PCP. This is the default behavior, and no additional configuration is required.
3.7.4.1 Copying Priority Code Point (PCP) between Subnets
The PCP captured from the Ethernet header of the incoming packet can be used to set the PCP bits on the outgoing Ethernet header.
3.7.
Rev 4.60 a Hyper-V host can be tunneled using a single PA on that Hyper-V host. CAs must be unique across all VMs on the same virtual network, but they do not need to be unique across virtual networks with different Virtual Subnet ID. The VM generates a packet with the addresses of the sender and the recipient within the CA space. Then Hyper-V host encapsulates the packet with the addresses of the sender and the recipient in PA space. PA addresses are determined by using virtualization table.
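The CA-to-PA lookup described above can be illustrated with a minimal Python sketch. It is not the Hyper-V implementation: the `Packet` and `EncapsulatedPacket` types, the `encapsulate` function, the table layout and every address and Virtual Subnet ID below are hypothetical values chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    """Packet as generated by the VM: addresses are in CA space."""
    src: str
    dst: str
    payload: bytes


@dataclass(frozen=True)
class EncapsulatedPacket:
    """Packet after host encapsulation: outer addresses are in PA space."""
    outer_src_pa: str
    outer_dst_pa: str
    vsid: int
    inner: Packet


# Hypothetical virtualization table: (Virtual Subnet ID, CA) -> PA.
# The same CA may appear under different Virtual Subnet IDs.
VIRT_TABLE = {
    (5001, "10.0.0.5"): "192.168.1.10",
    (5001, "10.0.0.7"): "192.168.2.20",
    (6001, "10.0.0.5"): "192.168.2.20",
}


def encapsulate(packet, vsid, table=VIRT_TABLE):
    """Wrap a CA-space packet with the PA addresses found in the table."""
    src_pa = table[(vsid, packet.src)]
    dst_pa = table[(vsid, packet.dst)]
    return EncapsulatedPacket(src_pa, dst_pa, vsid, packet)


vm_packet = Packet("10.0.0.5", "10.0.0.7", b"hello")
tunneled = encapsulate(vm_packet, 5001)
print(tunneled.outer_src_pa, "->", tunneled.outer_dst_pa)
```

Keying the table on the pair (Virtual Subnet ID, CA) is what lets the same CA be reused in virtual networks with different Virtual Subnet IDs.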
Rev 4.60 3.8.2 Configuring the NVGRE using PowerShell Hyper-V Network Virtualization policies can be centrally configured using PowerShell 3.0 and PowerShell Remoting. Step 1. Create a vSwitch. New-VMSwitch -NetAdapterName -AllowManagementOS $true Step 2. Shut down the VMs. Stop-VM -Name -Force -Confirm Step 3. Configure the Virtual Subnet ID on the Hyper-V Network Switch Ports for each Virtual Machine on each Hyper-V Host (Host 1 and Host 2).
Rev 4.60 3.9 Differentiated Services Code Point (DSCP) DSCP is a mechanism used for classifying network traffic on IP networks. It uses the 6-bit Differentiated Services Field (DS or DSCP field) in the IP header for packet classification purposes. Using Layer 3 classification enables you to maintain the same classification semantics beyond local network, across routers. Every transmitted packet holds the information allowing network devices to map the packet to the appropriate 802.1Qbb CoS.
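As a small illustration of the bit layout, the Python sketch below extracts the 6-bit DSCP from the DS byte of an IP header and derives a priority from its three most significant bits, which is the DSCP-to-PCP mapping described in Section 3.7.4. The function names are hypothetical and the sketch is not driver code.

```python
def dscp_from_ds_byte(ds_byte: int) -> int:
    """Return the 6-bit DSCP held in the upper bits of the DS (former ToS) byte."""
    if not 0 <= ds_byte <= 0xFF:
        raise ValueError("DS byte must fit in 8 bits")
    return ds_byte >> 2  # the two low-order bits are not part of the DSCP


def pcp_from_dscp(dscp: int) -> int:
    """Map a DSCP to a 3-bit priority by keeping its three most significant bits."""
    if not 0 <= dscp <= 0x3F:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp >> 3


dscp = dscp_from_ds_byte(0x68)   # 0x68 = 0b01101000
print(dscp, pcp_from_dscp(dscp))
```

With a DS byte of 0x68 the DSCP is 26 (binary 011010) and its top three bits, 011, give priority 3.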
3.9.4 Configuring DSCP for RDMA Traffic
• Create a QoS policy to tag the ND traffic for port 10000 with CoS value 3.
$ New-NetQosPolicy "ND10000" -NetDirectPortMatchCondition 10000 -PriorityValue8021Action 3
Related Commands:
• Get-NetAdapterQos - Gets the QoS properties of the network adapter
• Get-NetQosPolicy - Retrieves network QoS policies
• Get-NetQosFlowControl - Gets QoS status per priority
3.9.
Rev 4.60 Table 6 - DSCP Default Registry Keys Settings Registry Key 3.9.
Rev 4.60 4 Deploying Windows Server 2012 and Above with SMB Direct 4.1 Overview The Server Message Block (SMB) protocol is a network file sharing protocol implemented in Microsoft Windows. The set of message packets that defines a particular version of the protocol is called a dialect. The Microsoft SMB protocol is a client-server implementation and consists of a set of data packets, each containing a request sent by the client or a response sent by the server.
Rev 4.60 4.3.2 Verifying SMB Connection To verify the SMB connection on the SMB client: Step 1. Copy the large file to create a new session with the SMB Server. Step 2. Open a PowerShell window while the copy is ongoing. Step 3. Verify the SMB Direct is working properly and that the correct SMB dialect is used. Get-SmbConnection Get-SmbMultichannelConnection netstat.
5 Driver Configuration
Once you have installed the Mellanox WinOF VPI package, you can modify your driver to make it suitable for your system’s needs.
Changes made to the Windows registry take effect immediately, and no backup is automatically made. Do not edit the Windows registry unless you are confident regarding the changes.
5.1 Configuring the InfiniBand Driver
5.1.
Rev 4.60 Step 1. Display the Device Manager. Step 2. Select the Information tab from the Properties sheet. To save this information for debug purposes, click Save to File and provide the output file name.
Rev 4.60 5.2 Configuring the Ethernet Driver The following steps describe how to configure advanced features. Step 1. Display the Device Manager. Step 2. Right-click a Mellanox network adapter (under “Network adapters” list) and left-click Properties. Select the Advanced tab from the Properties sheet.
Rev 4.60 Step 3. Modify configuration parameters to suit your system. Please note the following: a. For help on a specific parameter/option, check the help button at the bottom of the dialog. b. If you select one of the entries Off-load Options, Performance Options, or Flow Control Options, you’ll need to click the Properties button to modify parameters via a pop-up dialog. 5.
Step 5. [Optional] If VLANs are used, mark the egress traffic with the relevant VlanID. The NIC is referred to as "Ethernet 4" in the examples below.
PS $ Set-NetAdapterAdvancedProperty -Name "Ethernet 4" -RegistryKeyword "VlanID" -RegistryValue "55"
Step 6. [Optional] Configure the IP address for the NIC. If DHCP is used, the IP address will be assigned automatically.
6 Performance Tuning
This section describes how to modify Windows registry parameters in order to improve performance. Please note that modifying the registry incorrectly might lead to serious problems, including loss of data or a system hang, and may require you to reinstall Windows. As such, it is recommended to back up the registry on your system before implementing the recommendations included in this section.
• Single port traffic - Improves performance for running single port traffic each time.
• Dual port traffic - Improves performance for running traffic on both ports simultaneously.
• Forwarding traffic - Improves performance for running scenarios that involve both ports (for example: via IXIA)
• Multicast traffic - Improves performance when the main traffic runs on multicast.
Step 6. Click on “Run Tuning” button.
7. Click on “Run Tuning” button.
Clicking the "Run Tuning" button activates the general tuning as explained above and changes several driver registry entries for the current adapter and for its sibling device, if the sibling is an Ethernet device as well. It also generates a log including the applied changes. Users can view this log to restore the previous values. The log path is: %HOMEDRIVE%\Windows\System32\LogFiles\PerformanceTunning.
Synopsis
perf_tuning.exe -s -c1 [-c2 ]
perf_tuning.exe -d -c1 -c2
perf_tuning.exe -f -c1 -c2
perf_tuning.exe -m -c1 -b -n
perf_tuning -st -c1 [-c2 ]
Options
Flag    Description
-s    Single port traffic scenario.
-f    Forwarding traffic scenario. This option must be followed by two connection names. The tuning in this case is codependent.
This option automatically sets:
• SendCompletionMethod = 1
• RecvCompletionMethod = 0
• *ReceiveBuffers = 4096
• UseRSSForRawIP = 0
• UseRSSForUDP = 0
Additionally, this option chooses the best processors to assign to:
• DefaultRecvRingProcessor
• TxInterruptProcessor
• TxForwardingProcessor
• In operating systems that support NDIS6.
-st    Single stream traffic scenario. This option must be followed by one or two connection names for an Ethernet adapter. The tuning is performed on the first connection, and the default settings are restored on the second connection.
This option automatically sets:
• SendCompletionMethod = 0
• RecvCompletionMethod = 2
• *ReceiveBuffers = 1024
• In operating systems that support NDIS6.
Step 2. Open "Network Adapters".
Step 3. Right click the relevant Ethernet adapter and select Properties.
Step 4. Select the "Advanced" tab.
Step 5. Modify performance parameters (properties) as desired.
6.2.1.1 Performance Known Issues
• On Intel I/OAT supported systems, it is highly recommended to install and enable the latest I/OAT driver (download from www.intel.com).
• With I/OAT enabled, sending 256-byte messages or larger will activate I/OAT.
6.2.2
• Send Buffers
The number of send buffers (default 2048).
• Performance Options
Configures parameters that can improve adapter performance.
• Interrupt Moderation
Moderates or delays the generation of interrupts, thereby optimizing network throughput and CPU utilization (default Enabled).
• When interrupt moderation is enabled, the system accumulates interrupts and sends a single interrupt rather than a series of interrupts.
Rev 4.60 Maximum elapsed time (in usec) between the receiving of a packet and the generation of an interrupt, even if the moderation count has not been reached (default 10). • Rx Interrupt Moderation Type Sets the rate at which the controller moderates or delays the generation of interrupts making it possible to optimize network throughput and CPU utilization. The default setting (Adaptive) adjusts the interrupt rates dynamically depending on the traffic type and network usage.
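The interplay between a moderation count and a maximum elapsed time can be illustrated with a simplified Python simulation. This is only a model under stated assumptions, not the adapter's actual logic: the function name is hypothetical, the count threshold of 4 is an arbitrary example value, and the timer is assumed to start at the first packet that has not yet been signalled. Only the 10 usec default comes from the text above.

```python
def moderated_interrupts(arrivals_usec, count_threshold, max_delay_usec=10):
    """Return the times (usec) at which interrupts fire for sorted packet arrivals.

    An interrupt fires when `count_threshold` packets are pending, or when
    `max_delay_usec` has elapsed since the first pending packet arrived.
    """
    interrupts = []
    pending = 0
    first_pending = None
    for t in arrivals_usec:
        # The timer for earlier pending packets may expire before this arrival.
        if pending and t - first_pending >= max_delay_usec:
            interrupts.append(first_pending + max_delay_usec)
            pending = 0
        if pending == 0:
            first_pending = t  # this packet starts a new batch
        pending += 1
        if pending >= count_threshold:
            interrupts.append(t)  # count reached: signal immediately
            pending = 0
    if pending:
        # Remaining packets are signalled when their timer expires.
        interrupts.append(first_pending + max_delay_usec)
    return interrupts


# Five packets produce two interrupts instead of five.
print(moderated_interrupts([0, 1, 2, 3, 50], count_threshold=4))
```

In this model a burst is signalled once by the count threshold, while an isolated packet is signalled by the timer, which bounds the added latency.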
• Offload Options
Allows you to specify which TCP/IP offload settings are handled by the adapter rather than the operating system. Enabling offloading services increases transmission performance because the offload tasks are performed by the adapter hardware rather than the operating system, thus freeing CPU resources to work on other tasks.
• IPv4 Checksums Offload
Enables the adapter to compute IPv4 checksum upon transmit and/or receive instead of the CPU (default Enabled).
Rev 4.60 6.4.1 Supported Standard Performance Counters 6.4.1.1 Proprietary Mellanox Adapter Traffic Counters Proprietary Mellanox adapter traffic counter set consists of global traffic statistics which gather information from ConnectX®-3 and ConnectX®-3 Pro network adapters, and includes traffic statistics, and various types of error and indications from both the Physical Function and Virtual Function.
Rev 4.60 Table 7 - Mellanox Adapter Traffic Counters Mellanox Adapter Traffic Counters Description ERRORS, DROP, AND MISC. INDICATIONS Packets Outbound Errors Shows the number of outbound packets that could not be transmitted because of errors. Packets Outbound Discarded Shows the number of outbound packets to be discarded even though no errors had been detected to prevent transmission. One possible reason for discarding packets could be to free up buffer space.
Rev 4.60 Table 8 - Mellanox Adapter Diagnostics Counters Mellanox Adapter Diagnostics Counters Description Responder protection errors Number of local protection errors when the local machine receives inbound traffic. Requester CQE errors Number of local CQE with errors when the local machine generates outbound traffic. Responder CQE errors Number of local CQE with errors when the local machine receives inbound traffic.
Table 8 - Mellanox Adapter Diagnostics Counters
Mellanox Adapter Diagnostics Counters    Description
Bad multicast received    Number of bad multicast packets received.
Discarded UD packets    Number of UD packets silently discarded on the receive queue due to a lack of receive descriptors.
Discarded UC packets    Number of UC packets silently discarded on the receive queue due to a lack of receive descriptors.
CQ overflows    Number of CQ overflows.
Rev 4.60 Table 9 - Mellanox QoS Counters Mellanox QoS Counters Description Bytes Sent The number of bytes sent that are covered by this priority. The counted bytes include framing characters (modulo 2^64). Bytes Sent/Sec The number of bytes sent per second that are covered by this priority. The counted bytes include framing characters. Packets Sent The number of packets sent that are covered by this priority (modulo 2^64).
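Because the cumulative counters are reported modulo 2^64, a monitoring script that derives a rate from two samples has to allow for wraparound. The Python sketch below shows one way to do so; the function names and sample values are hypothetical, and it assumes the counter wraps at most once between samples.

```python
COUNTER_MODULUS = 2 ** 64  # counters are reported modulo 2^64


def counter_delta(previous: int, current: int, modulus: int = COUNTER_MODULUS) -> int:
    """Difference between two samples, correct across a single wraparound."""
    return (current - previous) % modulus


def rate_per_second(previous: int, current: int, interval_sec: float) -> float:
    """Average rate between two samples taken `interval_sec` seconds apart."""
    if interval_sec <= 0:
        raise ValueError("interval must be positive")
    return counter_delta(previous, current) / interval_sec


# The counter wrapped between the two samples: 10 units before the wrap, 5 after.
print(counter_delta(2 ** 64 - 10, 5))
print(rate_per_second(100, 1100, 2.0))
```

The modulo operation yields the forward distance from the earlier sample to the later one, so a wrapped counter does not produce a negative delta.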
7 OpenSM - Subnet Manager
OpenSM v3.3.11 is an InfiniBand Subnet Manager. In order to operate one or more host machines in an InfiniBand cluster, at least one Subnet Manager is required in the fabric.
Please use the embedded OpenSM in the WinOF package for testing purposes and small clusters. Otherwise, we recommend using OpenSM from FabricIT EFM™ or UFM® or MLNX-OS®.
OpenSM can run as a Windows service and can be started manually from the following directory: \tools.
8 InfiniBand Fabric
8.1 Network Direct Interface
The Network Direct Interface (NDI) architecture provides application developers with a networking interface that enables zero-copy data transfers between applications, kernel-bypass I/O generation and completion processing, and one-sided data transfer operations. NDI is supported by Microsoft and is the recommended method for writing InfiniBand applications.
Rev 4.60 8.3.1.1 Common Configuration, Interface and Addressing Topology File (Optional) An InfiniBand fabric is composed of switches and channel adapter (HCA/TCA) devices. To identify devices in a fabric (or even in one switch system), each device is given a GUID (a MAC equivalent). Since a GUID is a non-user-friendly string of characters, it is better to alias it to a meaningful, user-given name.
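The idea of aliasing a GUID to a user-given name can be illustrated with a plain lookup table. The Python sketch below does not reproduce the topology file format, which is not shown here; the `GUID_ALIASES` table and the `display_name` helper are hypothetical, and the GUIDs and names are sample values only.

```python
# Hypothetical alias table: node or port GUID -> user-given name.
GUID_ALIASES = {
    0x0002c90300001039: "sw137 HCA-1",
    0x0002c9020025874a: "sw157 HCA-1",
}


def display_name(guid_text: str, aliases=GUID_ALIASES) -> str:
    """Return the alias for a GUID, or the GUID as 16 hex digits if none is defined."""
    guid = int(guid_text, 16)  # accepts input with or without leading zeros
    return aliases.get(guid, f"0x{guid:016x}")


print(display_name("0x2c90300001039"))
print(display_name("0x08f1040023"))
```

Converting the text to an integer before the lookup means that differently padded spellings of the same GUID resolve to the same name.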
Rev 4.60 • Using port names defined in the topology file: (Tool option ‘-n’) This option refers to the source and destination ports by the names defined in the topology file. (Therefore, this option is relevant only if a topology file is specified to the tool.) In this mode, the tool uses the names to extract the port LIDs from the matched topology, then the tool operates as in the ‘-l’ option. 8.3.
Rev 4.60 8.3.2.2 ibdiagnet Output Files Table 11 - ibdiagnet Output Files Output File Description ibdiagnet.log A dump of all the application reports generate according to the provided flags ibdiagnet.lst List of all the nodes, ports and links in the fabric ibdiagnet.fdbs A dump of the unicast forwarding tables of the fabric switches ibdiagnet.mcfdbs A dump of the multicast forwarding tables of the fabric switches ibdiagnet.
8.3.2.3 ibdiagnet Error Codes
1 - Failed to fully discover the fabric
2 - Failed to parse command line options
3 - Failed to interact with IB fabric
4 - Failed to use local device or local port
5 - Failed to use Topology File
6 - Failed to load required Package
8.3.3 ibportstate
Enables querying the logical (link) and physical port states of an InfiniBand port. It also allows adjusting the link speed that is enabled on any InfiniBand port.
Rev 4.60 Table 12 - ibportstate Flags and Options (Continued) Flag Description -G/--Guid Use GUID address argument. In most cases, it is the Port GUID. Example: ‘0x08f1040023’ -s/--sm_port Use as the target lid for SM/SA queries -C/--Ca Use the specified channel adapter or router -P/--Port Use the specified port -u/--usage Usage message -t/--timeout Override the default timeout for the solicited MADs [msec] Destination’s directed path, LID, or GUID.
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................4X
LinkSpeedSupported:..............2.5 Gbps or 5.0 Gbps
LinkSpeedEnabled:................2.5 Gbps or 5.0 Gbps
LinkSpeedActive:.................5.0 Gbps
2. Query the status of two channel adapters using directed paths.
> ibportstate -C mlx4_0 -D 0 1
PortInfo:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkState:.......................Initialize
PhysLinkState:...................LinkUp
LinkWidthSupported:.........
LinkSpeedEnabled:................2.5 Gbps
After PortInfo set:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkSpeedEnabled:................5.0 Gbps (IBA extension)
# Show the new configuration
> ibportstate -C mlx4_0 -D 0 1
PortInfo:
# Port info: DR path slid 65535; dlid 65535; 0 port 1
LinkState:.......................Initialize
PhysLinkState:...................LinkUp
LinkWidthSupported:..............1X or 4X
LinkWidthEnabled:................1X or 4X
LinkWidthActive:.................
Rev 4.60 Table 13 - ibroute Flags and Options Flag Description -n/--no_dests Do not try to resolve destinations -D/--Direct Use directed path address arguments. The path is a comma separated list of out ports. Examples: ‘0’ – self port ‘0,1,2,1,4’ – out via port 1, then 2, ... -G/--Guid Use GUID address argument. In most cases, it is the Port GUID. Example: ‘0x08f1040023’ -M/--Multicast Show multicast forwarding tables. The parameters and specify the MLID range.
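The directed path notation accepted by the -D flag can be read as a starting point followed by a list of out ports. The Python sketch below splits such a string into hops. It assumes, as in both examples given for the flag, that a path always begins with 0 for the local port; the function name is hypothetical and is not part of any diagnostic tool.

```python
from typing import List


def parse_directed_path(path: str) -> List[int]:
    """Split a directed path such as '0,1,2,1,4' into its list of out ports.

    The leading 0 denotes the local (self) port and is not a hop, so the
    path '0' yields an empty list.
    """
    ports = [int(field) for field in path.split(",")]
    if ports[0] != 0:
        raise ValueError("a directed path is expected to start with 0")
    if any(port < 0 for port in ports):
        raise ValueError("port numbers cannot be negative")
    return ports[1:]


print(parse_directed_path("0"))
print(parse_directed_path("0,1,2,1,4"))
```

For the path 0,1,2,1,4 the result lists four hops: out via port 1, then port 2, then port 1, then port 4.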
Rev 4.60 Unicast lids [0x3-0x7] of switch Lid 2 guid 0x0002c902fffff00a (MT47396 InfiniscaleIII Mellanox Technologies): Lid Out Destination Port Info 0x0003 021 : (Switch portguid 0x000b8cffff004016: 'MT47396 Infiniscale-III Mellanox Technologies') 0x0006 007 : (Channel Adapter portguid 0x0002c90300001039: 'sw137 HCA-1') 0x0007 021 : (Channel Adapter portguid 0x0002c9020025874a: 'sw157 HCA-1') 3 valid lids dumped 3. Dump all Lids with valid out ports of the switch with portguid 0x000b8cffff004016.
Rev 4.60 .pcap format. This file can be loaded by the Wireshark tool (www.wireshark.org) for graphical traffic analysis. This provides the ability to analyze network behavior and performance, and to debug applications that send or receive RDMA network traffic. Run "ibdump -h" to display a help message which details the tools options. 8.3.5.1 ibdump Synopsis - ibdump 8.3.5.2 ibdump Options The table below lists the various ibdump flags of the command. Table 14 - ibdump Flags and Options Flag 8.3.
Rev 4.60 8.3.6.2 smpquery Synopsys smpquery [-h] [-d] [-e] [-c] [-v] [-D] [-G] [-s ] [-L] [-u] [-V] [-C ] [-P ] [-t ] [--node-name-map ] [op params] 8.3.6.3 smpquery Options The table below lists the various flags of the command. Table 15 - smpquery Flags and Options Flag Description -h/--help Print the help menu -d/--debug Raise the IB debug level.
Rev 4.60 Table 15 - smpquery Flags and Options Flag Description Destination’s directed path, LID, or GUID --node-name-map Node name map file -x/--extended Use extended speeds Examples 1. Query PortInfo by LID, with port modifier. > smpquery portinfo 1 1 # Port info: Lid 1 port 1 Mkey:............................0x0000000000000000 GidPrefix:.......................0xfe80000000000000 Lid:.............................0x0001 SMLid:...........................
MtuCap:..........................2048
VLStallCount:....................0
HoqLife:.........................31
OperVLs:.........................VL0-3
PartEnforceInb:..................0
PartEnforceOutb:.................0
FilterRawInb:....................0
FilterRawOutb:...................0
MkeyViolations:..................0
PkeyViolations:..................0
QkeyViolations:..................0
GuidCap:.........................128
ClientReregister:................0
SubnetTimeout:...................
PartCap:.........................128
DevId:...........................0x634a
Revision:........................0x000000a0
LocalPort:.......................1
VendorId:........................0x0002c9
8.3.7 perfquery
Queries InfiniBand ports’ performance and error counters. Optionally, it displays aggregated counters for all ports of a node. It can also reset counters after reading them or simply reset them.
8.3.7.1 perfquery Applicable Hardware
All InfiniBand devices.
8.3.7.
Rev 4.
# Port counters: Lid 6 port 1
PortSelect:......................1
CounterSelect:...................0x1000
SymbolErrors:....................0
LinkRecovers:....................0
LinkDowned:......................0
RcvErrors:.......................0
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................0
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............
RcvErrors:.......................0
RcvRemotePhysErrors:.............0
RcvSwRelayErrors:................0
XmtDiscards:.....................3
XmtConstraintErrors:.............0
RcvConstraintErrors:.............0
LinkIntegrityErrors:.............0
ExcBufOverrunErrors:.............0
VL15Dropped:.....................0
XmtData:.........................0
RcvData:.........................0
XmtPkts:.........................0
RcvPkts:.........................0
8.3.
Table 17 - ibping Flags and Options
Flag  Description
--Guid, -G  Uses GUID address argument. In most cases, it is the Port GUID.
8.3.9
Table 18 - ibnetdiscover Flags and Options
Flag  Description
--node-name-map  Specifies a node name map. The node name map file maps GUIDs to more user-friendly names. See “Topology File Format” on page 81.
--cache  Caches the ibnetdiscover network data in the specified filename. This cache may be used by other tools for later analysis.
--load-cache  Loads and uses the cached ibnetdiscover data stored in the specified filename.
8.3.9.3 Topology File Format
The topology file format is largely intuitive. Most identifiers are given textual names like vendor ID (vendid), device ID (devid), and GUIDs of various types (sysimgguid, caguid, switchguid, etc.). PortGUIDs are shown in parentheses (). For switches, this is shown on the switchguid line. For CA and router ports, it is shown on the connectivity lines. The IB node is identified first, followed by the number of ports and the node GUID.
Table 19 - ibtracert Flags and Options
Flag  Description
--Lid, -L  Uses LID address argument
--errors, -e  Shows send and receive errors
--usage, -u  Usage message
--Guid, -G  Uses GUID address argument. In most cases, it is the Port GUID.
8.3.11.1 sminfo Synopsis
sminfo [-d(ebug)] [-e(rr_show)] [-s state] [-p prio] [-a activity] [-D(irect)] [-L(id)] [-u(sage)] [-G(uid)] [-C ca_name] [-P ca_port] [-t(imeout) timeout_ms] [-V(ersion)] [-h(elp)] sm_lid | sm_dr_path [modifier]
8.3.11.2 sminfo Options
The table below lists the various flags of the command. Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
Examples
sminfo   # local port's sminfo
sminfo 32   # show sminfo of lid 32
sminfo -G 0x8f1040023   # same but using guid address
8.3.12 ibclearerrors
ibclearerrors is a script which clears the PMA error counters in PortCounters by either walking the InfiniBand subnet topology or using an already saved topology file.
8.3.12.1 ibclearerrors Synopsis
ibclearerrors [-h] [-N | -nocolor] [ | -C ca_name -P ca_port -t(imeout) timeout_ms]
8.3.12.
• It verifies the existing inventory, with all the object fields, and matches it to a pre-saved one.
• A Multicast Compliancy test.
• An Event Forwarding test.
• A Service Record registration test.
• An RMPP stress test.
• A Small SA Queries stress test.
It is recommended that after installing opensm, the user should run "osmtest -f c" to generate the inventory file, and immediately afterwards run "osmtest -f a" to test OpenSM.
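The recommended two-step sequence can be sketched in Python as follows. This is only an illustration: the helper names (`osmtest_commands`, `run_osmtest_sequence`, `subprocess_runner`) are hypothetical, and the sketch assumes that a zero exit code means a step succeeded, which the text above does not state.

```python
import subprocess
from typing import Callable, List, Sequence

# Flow codes quoted in the text: "c" creates the inventory file,
# "a" then tests OpenSM against it.
OSMTEST_FLOWS = ("c", "a")


def osmtest_commands(binary: str = "osmtest") -> List[List[str]]:
    """Return the two command lines in the recommended order."""
    return [[binary, "-f", flow] for flow in OSMTEST_FLOWS]


def run_osmtest_sequence(run: Callable[[Sequence[str]], int],
                         binary: str = "osmtest") -> bool:
    """Run each step with `run` and stop at the first non-zero exit code.

    Assumption: a zero exit code means the step succeeded.
    """
    for command in osmtest_commands(binary):
        if run(command) != 0:
            return False
    return True


def subprocess_runner(command: Sequence[str]) -> int:
    """Hypothetical runner that actually executes the command."""
    return subprocess.run(list(command)).returncode
```

Calling `run_osmtest_sequence(subprocess_runner)` would execute both steps for real; passing a stub runner instead makes it possible to check the ordering without OpenSM installed.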
Table 24 - osmtest Flags and Options
Flag  Description
-m, --max_lid  This option specifies the maximal LID number to be searched for during inventory file build (default: 100).
-g, --guid  This option specifies the local port GUID value with which OpenSM should bind. OpenSM may be bound to 1 port at a time. If the GUID given is 0, OpenSM displays a list of possible port GUIDs and waits for user input.
Table 24 - osmtest Flags and Options
Flag  Description
-vf  This option sets the log verbosity level. A flags field must follow the -vf option.
Table 25 - ibaddr Flags and Options
Flags  Description
Debugging Flags  Description
NOTE: Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
-d  Raises the IB debugging level. Can be used several times (-ddd or -d -d -d).
-e  Shows send and receive errors (timeouts and others)
-h  Shows the usage message
-v  Increases the application verbosity level.
Examples
ibaddr   # local port's address
ibaddr 32   # show lid range and gid of lid 32
ibaddr -G 0x8f1040023   # same but using guid address
ibaddr -l 32   # show lid range only
ibaddr -L 32   # show decimal lid range only
ibaddr -g 32   # show gid address only
8.3.17 ibcacheedit
ibcacheedit allows users to edit an ibnetdiscover cache created through the --cache option in ibnetdiscover(8).
8.3.17.
8.3.18 iblinkinfo
iblinkinfo reports link info for each port in an IB fabric, node by node. Optionally, iblinkinfo can do partial scans and limit its output to parts of a fabric.
8.3.18.1 iblinkinfo Synopsis
iblinkinfo [-hcdl -C -P -p -S -G -D --load-cache ]
8.3.18.
Table 27 - iblinkinfo Flags and Options
Flags  Description
--diffcheck  Specifies what diff checks should be done in the --diff option above. Separate multiple diff check keys with commas. The available diff checks are: port = port connections, state = port state, lid = lids, nodedesc = node descriptions. If port is specified alongside lid or nodedesc, remote port lids and node descriptions will also be compared.
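As an illustration of the --diffcheck value format described in the table, the following Python sketch validates a comma-separated key list against the four documented keys and reports when remote port details would also be compared. The function names are hypothetical and are not part of iblinkinfo; the sketch only mirrors the rules stated in the description.

```python
from typing import List

# The four diff check keys listed in the --diffcheck description.
VALID_DIFF_CHECKS = ("port", "state", "lid", "nodedesc")


def parse_diffcheck(value: str) -> List[str]:
    """Split a comma-separated --diffcheck value and reject unknown keys."""
    keys = [key.strip() for key in value.split(",") if key.strip()]
    unknown = [key for key in keys if key not in VALID_DIFF_CHECKS]
    if unknown:
        raise ValueError("unknown diff check key(s): " + ", ".join(unknown))
    return keys


def compares_remote_details(keys: List[str]) -> bool:
    """True when port is combined with lid or nodedesc, per the description."""
    return "port" in keys and ("lid" in keys or "nodedesc" in keys)
```

For example, `parse_diffcheck("port,lid")` yields the keys `port` and `lid`, and `compares_remote_details` returns True for that combination.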
Table 28 - ibqueryerrors Flags and Options
Flags  Description
-r  Reports the port information. This includes LID, port, external port (if applicable), link speed setting, remote GUID, remote port, remote external port (if applicable), and remote node description information.
--data  Includes the optional transmit and receive data counters.
--threshold-file  Specifies an alternate threshold file.
Table 28 - ibqueryerrors Flags and Options
Flags  Description
-t  Overrides the default timeout for the solicited MADs.
8.3.19.3 ibqueryerrors Exit Status
Returns -1 if the fabric scan fails, 0 if the scan succeeds without errors beyond thresholds, and 1 if errors beyond thresholds are found on ports.
8.3.19.4 ibqueryerrors Files
/opt/ufm/files/conf/infiniband-diags/error_thresholds
Defines threshold values for errors. The file format is simple "name=val".
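A minimal Python sketch of how a script might read a "name=val" threshold file and interpret the three exit statuses listed above. The helper names are hypothetical. The sketch assumes integer values and treats blank lines and lines starting with "#" as ignorable, neither of which is confirmed by this manual; the counter names in the usage note are illustrative only.

```python
from typing import Dict

# Exit statuses as listed in the Exit Status section above.
EXIT_STATUS_MEANING = {
    -1: "failed to scan the fabric",
    0: "scan succeeded, no errors beyond thresholds",
    1: "errors beyond thresholds found on ports",
}


def parse_thresholds(text: str) -> Dict[str, int]:
    """Parse "name=val" lines into a dictionary of integer thresholds.

    Assumptions: values are integers; blank lines and lines starting
    with "#" are skipped.
    """
    thresholds: Dict[str, int] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if not separator:
            raise ValueError(f"expected name=val, got: {raw_line!r}")
        thresholds[name.strip()] = int(value.strip())
    return thresholds


def describe_exit_status(status: int) -> str:
    """Map an ibqueryerrors exit status to the meaning given in the text."""
    return EXIT_STATUS_MEANING.get(status, "unknown exit status")
```

For instance, `parse_thresholds("SymbolErrors=10")` produces a one-entry dictionary, and `describe_exit_status(1)` returns the text for errors found beyond thresholds.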
Table 29 - ibsysstat Flags and Options
Flags  Description
NOTE: Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
-d  Raises the IB debugging level. Can be used several times (-ddd or -d -d -d).
-e  Shows send and receive errors (timeouts and others)
-h  Shows the usage message
-v  Increases the application verbosity level.
8.3.21 saquery
saquery issues the selected SA query. Node records are queried by default.
8.3.21.
8.3.21.2 saquery Options
Table 30 - saquery Flags and Options
Flags  Description
-p  Gets PathRecord info.
-N  Gets NodeRecord info.
--list | -D  Gets NodeDescriptions of CAs only.
-S  Gets ServiceRecord info.
-I  Gets InformInfoRecord (subscription) info.
Table 30 - saquery Flags and Options
Flags  Description
-t, --timeout  Specifies SA query response timeout in milliseconds. Default is 100 milliseconds. You may want to use this option if IB_TIMEOUT is indicated.
--node-name-map  Specifies a node name map. The node name map file maps GUIDs to more user-friendly names. See ibnetdiscover(8) for node name map file format. Only used with the -O and -U options.
8.3.22.2 smpdump Options
Table 31 - smpdump Flags and Options
Flags  Description
attr  IBA attribute ID for SM attribute
mod  IBA modifier for SM attribute
Debugging Flags  Description
NOTE: Most OpenIB diagnostics take the following common flags. The exact list of supported flags per utility can be found in the usage message and can be shown using the util_name -h syntax.
-d  Raises the IB debugging level. Can be used several times (-ddd or -d -d -d).
Examples
Direct Routed Examples:
smpdump -D 0,1,2,3,5 16   # NODE DESC
smpdump -D 0,1,2 0x15 2   # PORT INFO, port 2
LID Routed Examples:
smpdump 3 0x15 2   # PORT INFO, lid 3 port 2
smpdump 0xa0 0x11   # NODE INFO, lid 0xa0
8.4 InfiniBand Fabric Performance Utilities
The performance utilities described in this chapter are intended to be used as a performance micro-benchmark.
8.4.1 ib_read_bw
ib_read_bw calculates the BW of RDMA read between a pair of machines.
Table 32 - ib_read_bw Flags and Options
Flag  Description
-b, --bidirectional  Measures bidirectional bandwidth (default unidirectional)
-V, --version  Displays version number
-g, --grh  Use GRH with packets (mandatory for RoCE)
8.4.2 ib_read_lat
ib_read_lat calculates the latency of RDMA read operation of message_size between a pair of machines. One acts as a server and the other as a client.
Table 33 - ib_read_lat Flags and Options
Flag  Description
-g, --grh  Use GRH with packets (mandatory for RoCE)
8.4.3 ib_send_bw
ib_send_bw calculates the BW of SEND between a pair of machines. One acts as a server and the other as a client. The server receives packets from the client and they both calculate the throughput of the operation.
which you send a packet only if you receive one. Each side samples the CPU each time it receives a packet in order to calculate the latency.
8.4.4.1 ib_send_lat Synopsis
ib_send_lat [-i(b_port) ib_port] [-c(onnection_type) RC\UC\UD] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-n iteration_num] [-p(ort) PDT_port] [-a(ll)] [-V(ersion)] [-C report cycles] [-H report histogram] [-U report unsorted]
8.4.4.
8.4.5.1 ib_write_bw Synopsis
ib_write_bw [-q num of qps] [-c(onnection_type) RC\UC] [-i(b_port) ib_port] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-n iteration_num] [-p(ort) PDT_port] [-b(idirectional)] [-a(ll)] [-V(ersion)]
8.4.5.2 ib_write_bw Options
The table below lists the various flags of the command.
Table 36 - ib_write_bw Flags and Options
Flag
8.4.
8.4.6.1 ib_write_lat Synopsis
ib_write_lat [-i(b_port) ib_port] [-c(onnection_type) RC\UC] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-n iteration_num] [-p(ort) PDT_port] [-a(ll)] [-V(ersion)] [-C report cycles] [-H report histogram] [-U report unsorted]
8.4.6.2 ib_write_lat Options
The table below lists the various flags of the command.
Table 37 - ib_write_lat Flags and Options
Flag
8.4.
8.4.7.1 ibv_read_bw Synopsis
ibv_read_bw [-i(b_port) ib_port] [-d ib device] [-o(uts) outstanding reads] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-n iteration_num] [-p(ort) PDT_port] [-u qp timeout] [-S(l) sl type] [-x gid index] [-e(vents) use events] [-F CPU freq fail] [-b(idirectional)] [-a(ll)] [-V(ersion)]
8.4.7.2 ibv_read_bw Options
The table below lists the various flags of the command.
Table 38 - ibv_read_bw Flags and Options
Flag  Description
-Q, --cq-mod  Generate Cqe only after <--cq-mod> completion
-N, --no peak-bw  Cancel peak-bw calculation (default with peak)
8.4.8 ibv_read_lat
This is a more advanced version of ib_read_lat, with more flags and features than the older version as well as improved algorithms. ibv_read_lat calculates the latency of RDMA read operation of message_size between a pair of machines. One acts as a server and the other as a client.
Table 39 - ibv_read_lat Flags and Options
Flag
8.4.
They perform a ping pong benchmark in which you send a packet only after you receive one. Each side samples the CPU clock each time it receives a send packet, in order to calculate the latency.
8.4.10.1 ibv_send_lat Synopsis
ibv_send_lat [-i(b_port) ib_port] [-c(onnection_type) RC\UC\UD] [-d ib_device name] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-I(nline_size) inline size] [-u qp timeout] [-S(L) sl type] [-x gid index] [-e(vents) use events] [-n iteration_num] [-g n
Table 41 - ibv_send_lat Flags and Options
Flag  Description
-g, --post=  The number of posts for each qp in the chain (default tx_depth)
-I, --inline_size=  The maximum size of message to be sent in “inline mode” (default 0)
-e, --events  Inactive during CQ events (default poll)
-g, --mcg=  Sends messages to multicast group with qps attached to it.
-M, --MGID=  In case of multicast, uses as the group MGID.
Table 42 - ibv_write_bw Flags and Options
Flag  Description
-c, --connection=  Connection type RC/UC (default RC)
-s, --size=  The size of message to exchange (default 65536)
-a, --all  Runs sizes from 2 till 2^23
-t, --tx-depth=  The size of tx queue (default 100)
-n, --iters=  The number of exchanges (at least 2, default 1000)
-u, --qp-timeout=  QP timeout.
8.4.12.1 ibv_write_lat Synopsis
ibv_write_lat [-i(b_port) ib_port] [-c(onnection_type) RC\UC\UD] [-m(tu) mtu_size] [-s(ize) message_size] [-t(x-depth) tx_size] [-I(nline_size) inline size] [-u qp timeout] [-S(L) sl type] [-d ib_device name] [-x gid index] [-n iteration_num] [-p(ort) PDT_port] [-a(ll)] [-V(ersion)] [-C report cycles] [-H report histogram] [-U report unsorted]
8.4.12.2 ibv_write_lat Options
The table below lists the various flags of the command.
8.4.13 nd_write_bw
This test measures the performance of RDMA-Write requests on Microsoft Windows operating systems. nd_write_bw is oriented toward maximum RDMA-Write throughput and runs over Microsoft's NetworkDirect standard. The test is highly customizable: the user may choose a custom message size, a custom number of iterations, or, alternatively, a custom test duration.
latency, and runs over Microsoft's NetworkDirect standard. The test is highly customizable: the user may choose a custom message size, a custom number of iterations, or, alternatively, a custom test duration. nd_write_lat runs with all message sizes from 1B to 4MB (powers of 2), message inlining, and CQ moderation.
8.4.14.1 nd_write_lat Synopsis
Server side: start /b /affinity 0X1 nd_write_lat -s1048576 -D10 -S 11.137.53.
8.4.15.1 nd_read_bw Synopsis
Server side: start /b /affinity 0X1 nd_read_bw -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_read_bw -s1048576 -D10 -C 11.137.53.1
8.4.15.2 nd_read_bw Options
The table below lists the various flags of the command.
Table 46 - nd_read_bw Options
Flags  Description
-h  Shows the Help screen.
-v  Shows the version number.
-p  Connects to the port .
8.4.16.1 nd_read_lat Synopsis
Server side: start /b /affinity 0X1 nd_read_lat -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_read_lat -s1048576 -D10 -C 11.137.53.1
8.4.16.2 nd_read_lat Options
The table below lists the various flags of the command.
Table 47 - nd_read_lat Options
Flags  Description
-h  Shows the Help screen.
-v  Shows the version number.
-p  Connects to the port .
8.4.17.1 nd_send_bw Synopsis
Server side: start /b /affinity 0X1 nd_send_bw -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_send_bw -s1048576 -D10 -C 11.137.53.1
8.4.17.2 nd_send_bw Options
The table below lists the various flags of the command.
Table 48 - nd_send_bw Flags and Options
Flag  Description
-h  Shows the Help screen.
-v  Shows the version number.
-p  Connects to the port .
8.4.18.1 nd_send_lat Synopsis
Server side: start /b /affinity 0X1 nd_send_lat -s1048576 -D10 -S 11.137.53.1
Client side: start /b /wait /affinity 0X1 nd_send_lat -s1048576 -D10 -C 11.137.53.1
8.4.18.2 nd_send_lat Options
The table below lists the various flags of the command.
Table 49 - nd_send_lat Options
Flag  Description
-h  Shows the Help screen.
-v  Shows the version number.
-p  Connects to the port .
8.4.19.1 NTttcp Synopsis
Server: ntttcp_x64.exe -r -t 15 -m 16,*,
Client: ntttcp_x64.exe -s -t 15 -m 16,*,
8.4.19.2 NTttcp Options
The table below lists the various flags of the command.
9 Software Development Kit
The Software Development Kit (SDK) is a set of development tools that allows the creation of InfiniBand applications for the MLNX_VPI software package. The SDK package contains header files, libraries, and code examples. To compile the examples provided with the SDK, you must install Windows Driver Kit (WDK) version 8.1 or higher. To open the SDK package, run the sdk.exe file to get the complete list of files. The SDK package can be found under \IB\SD
10 Troubleshooting
10.1 InfiniBand Troubleshooting
Issue 1. The InfiniBand interfaces are not up after the first reboot following installation.
Suggestion: To troubleshoot this issue, follow the steps below:
1. Check that the InfiniBand driver is running on all nodes by using “vstat”. The vstat utility, located at \tools, displays the status and capabilities of the network adapter card(s).
2.
Suggestion: The error message indicates that the wrong firmware image has been programmed on the adapter card. See Section 2, “Firmware Upgrade,” on page 14.
Issue 5. The Ethernet driver fails to start. A yellow sign appears near the "Mellanox ConnectX 10Gb Ethernet Adapter" in the Device Manager display.
Suggestion: This can happen due to a hardware error. Try to disable and re-enable "Mellanox ConnectX Adapter" from the Device Manager display.
Issue 6.
• Mellanox ConnectX EN 10Gbit Ethernet Adapter device detected that the link connected to port is up, and has initiated normal operation.
• Mellanox ConnectX EN 10Gbit Ethernet Adapter device detected that the link connected to port is down. This can occur if the physical link is disconnected or damaged, or if the other end-port is down.
• Mismatch in the configurations between the two ports may affect the performance.
10.3
3. Results appear in MB/s (megabytes, where 1 MB = 2^20 bytes), and reflect the actual data that was transferred, excluding headers.
4. If these results are not as expected, the problem is most probably with one or more of the following:
• Old firmware version.
• Misconfigured flow control: Global pause or PFC is misconfigured on the hosts, routers, and switches. See Section 3.7, “RDMA over Converged Ethernet (RoCE),” on page 29.
• CPU/power options are not set to "Maximum Performance".
Issue 3.
11 Documentation
• Under \Documentation:
• License file
• User Manual (this document)
• MLNX_VPI_WinOF Installation Guide
• MLNX_VPI_WinOF Release Notes
• MLNX_VPI_WinOF Registry Keys
Appendix A: Windows MPI (MS-MPI)
A.1 Overview
Message Passing Interface (MPI) is meant to provide virtual topology, synchronization, and communication functionality between a set of processes. With MPI you can run one process on several hosts.
• Windows MPI runs over the following protocols:
• Sockets (Ethernet)
• Network Direct (ND)
A.1.1 A.2 Prerequisites
• Install HPC (Build: 4.0.3906.0).
• Validate traffic (ping) between all the MPI hosts.
Step 3. [Recommended] Direct ALL TCP/UDP traffic to a lossy priority by using the “IPProtocolMatchCondition”. TCP is used for the MPI control channel (smpd), while UDP is used for other services such as remote desktop. Arista switches forward the PCP bits (e.g. the 802.1p priority within the VLAN tag) from ingress to egress, so that any two end nodes in the fabric maintain the priority along the route.
• New-NetQosPolicy "UDP" -IPProtocolMatchCondition UDP -PriorityValue8021Action 1
• Enable-NetQosFlowControl 3
• Disable-NetQosFlowControl 0,1,2,4,5,6,7
• Enable-netadapterqos -Name
A.5.2 Running MPI Command Examples
• Running MPI pallas test over ND.
mpiexec.exe -p 19020 -hosts 4 11.11.146.101 11.21.147.101 11.21.147.51 11.11.145.101 -env MPICH_NETMASK 11.0.0.0/255.0.0.0 -env MPICH_ND_ZCOPY_THRESHOLD -1 -env MPICH_DISABLE_ND 0 -env MPICH_DISABLE_SOCK 1 -affinity c:\\test1.
Appendix B: NVGRE Configuration Scripts Examples
The setup is as follows for both examples below:
Hypervisor mtlae14 = "Port1", 192.168.20.114/24
VM on mtlae14 = mtlae14-005, 172.16.14.5/16,
VM on mtlae14 = mtlae14-006, 172.16.14.6/16,
Hypervisor mtlae15 = "Port1", 192.168.20.115/24
VM on mtlae15 = mtlae15-005, 172.16.15.5/16,
VM on mtlae15 = mtlae15-006, 172.16.15.
B.1
# Step 3. Configure the Provider Address and Route records on Hyper-V Host 1 (Host 1 Only) mtlae14
$NIC = Get-NetAdapter "Port1"
New-NetVirtualizationProviderAddress -InterfaceIndex $NIC.InterfaceIndex -ProviderAddress 192.168.20.114 -PrefixLength 24
New-NetVirtualizationProviderRoute -InterfaceIndex $NIC.InterfaceIndex -DestinationPrefix "0.0.0.0/0" -NextHop 192.168.20.1
# Step 5.
# ------- The commands from Step 2 - 4 are not persistent. It is suggested to create a script that runs after each OS reboot
# Step 2. Configure a Subnet Locator and Route records on each Hyper-V Host (Host 1 and Host 2) mtlae14 & mtlae15
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.5 -ProviderAddress 192.168.20.114 -VirtualSubnetID 5001 -MACAddress "00155D720100" -Rule "TranslationMethodEncap"
New-NetVirtualizationLookupRecord -CustomerAddress 172.16.14.6 -ProviderAddress 192.168.20.