ScaleIO/VxFlex OS IP Fabric Best Practice and Deployment Guide with Dell EMC Networking OS10 Enterprise Edition
*Dell EMC VxFlex OS – formerly ScaleIO
Dell EMC Networking Leaf-Spine Architecture with the S4248FB-ON
Dell EMC Networking Infrastructure Solutions
May 2018
A Dell EMC Best Practice and Deployment Guide
Revisions

Date      Version  Description                                                      Authors
May 2018  1.0      Initial release with OS10 content. (Based on ScaleIO IP Fabric   Gerald Myres, Rutvij Shah,
                   Best Practice and Deployment Guide version 2.0)                  Curtis Bunch, Colin King

The information in this publication is provided “as is.” Dell Inc. makes no representations or warranties of any kind with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a particular purpose.
1 Introduction to network virtualization and hyper-converged infrastructure and networking With the advent of network virtualization and hyper-converged infrastructure (HCI), the nature of network traffic is undergoing a significant transformation. A dedicated Fibre Channel (FC) network with many of the advanced storage features located on a dedicated storage array is no longer the norm.
This document describes a VxFlex OS deployment on a Leaf-Spine topology in which network management technologies, advanced routing, and link aggregation are implemented. The paper focuses on a Leaf-Spine network supporting larger cluster sizes and workloads, requiring high-speed network ports on the leaf switches and high-bandwidth interconnection ports for large workloads.
operations in the background so that, when necessary, optimization has minimal or no impact on applications and users.

Compelling Economics
VxFlex OS can reduce the cost and complexity of a typical SAN by allowing users to exploit unused local storage capacity on the servers. This eliminates the need for an FC fabric between servers and storage, as well as additional hardware such as host bus adapters.
Meta Data Manager (MDM)
The MDM configures and monitors VxFlex OS. It contains all the metadata required for VxFlex OS operation. Configure the MDM in Single Mode on a single server, or in redundant Cluster Mode with three members on three servers or five members on five servers.

Note: Use Cluster Mode for all production environments. Dell EMC does not recommend Single Mode because it exposes the system to a single point of failure.
2 Building a Leaf-Spine topology
This paper describes a general-purpose virtualization infrastructure suitable for a modern data center. The solution is based on a Leaf-Spine topology utilizing Dell EMC Networking S4248FB-ON switches as the leaf switches and Dell EMC Networking Z9100-ON switches as the spine switches. This example topology uses two spine switches to maximize throughput between the leaf and spine layers.
The following physical concepts apply to all routed Leaf-Spine topologies:

• Each leaf switch connects to every spine switch in the topology.
• Spine switches only connect to leaf switches.
• Leaf switches connect to spine switches and other devices such as servers, storage arrays, and edge routers.
• Servers, storage arrays, edge routers, and other non-leaf-switch devices never connect to spine switches.
• It is a best practice to use VLT for connecting leaf switch pairs.
2.3 Routing protocol selection for Leaf-Spine
The following routing protocols are typically used when designing a Leaf-Spine network:

• Border Gateway Protocol (BGP) – External (EBGP) or Internal (IBGP)
• Open Shortest Path First (OSPF)

This guide includes examples for both OSPF and EBGP configurations. Table 1 lists items to consider when choosing between the OSPF and BGP protocols.

OSPF and BGP table
OSPF: OSPF is the protocol of choice for networks under the same administration, or internal networks.
2.3.2 OSPF
OSPF provides routing inside a company’s autonomous network, or a network that a single organization controls. While generally more memory- and CPU-intensive than BGP, it offers faster convergence without any tuning. An example for configuring OSPF routing on the Leaf-Spine network is also provided in this guide.

Note: For detailed information on Leaf-Spine design, see Dell EMC Networking L3 Design for Leaf-Spine with OS10EE.
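As a preview, the following is a minimal sketch of interface-based OSPF on an OS10EE leaf switch. The process ID, router ID, and area are illustrative assumptions; the interface and addressing follow the point-to-point scheme shown later in Table 3. Verify command syntax against your OS10 release.

! OSPF process and router ID (illustrative values)
router ospf 1
 router-id 10.0.2.1
! Fabric-facing point-to-point interface (addressing per Table 3)
interface ethernet 1/1/45
 no switchport
 ip address 192.168.1.1/31
 ip ospf 1 area 0.0.0.0
 ip ospf network point-to-point
 no shutdown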
3 Configuration and Deployment
This section describes how to configure the physical and virtual networking environment for this hyper-converged VxFlex OS deployment. The following items are discussed:

• Key networking protocols
• Network IP addressing
• Physical switch configuration
• Virtual datacenter, clusters, and hosts
• Virtual networking configuration

3.1 VxFlex OS solution example
There are several options available when deploying VxFlex OS.
The commands within each configuration file can be modified to apply to the reader’s network. Network interfaces, VLANs, and IP schemes can easily be changed and adapted using a text editor. Once modified, copy and paste the commands directly into the CLI of the appropriate switch. The following subsections provide information to assist with the deployment examples detailed in this guide.
Loopback IP addressing
Each leaf connects to each spine, but note that the spines do not connect to one another. In a Leaf-Spine topology, there is no requirement for the spines to have any interconnectivity; given any single-link failure scenario, all leaf switches retain connectivity to one another. Table 2 shows the loopback addressing and BGP ASN numbering associations. Building BGP neighbor relationships requires this information along with the point-to-point information in Table 3 in the next section.
Interface and IP configuration

Link  Source switch  Source interface  Source IP  Network         Destination switch  Destination interface  Destination IP
A     Leaf 1A        eth1/1/45         .1         192.168.1.0/31  Spine 1             eth1/1/1               .0
B     Leaf 1A        eth1/1/46         .1         192.168.2.0/31  Spine 2             eth1/1/1               .0
C     Leaf 1B        eth1/1/45         .3         192.168.1.2/31  Spine 1             eth1/1/2               .2
D     Leaf 1B        eth1/1/46         .3         192.168.2.2/31  Spine 2             eth1/1/2               .2
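To illustrate how these values combine with the loopback and ASN assignments from Table 2, below is a minimal EBGP sketch for Leaf 1A. The ASNs and loopback address are assumed placeholders (verify them against Table 2); the neighbor addresses come from Table 3. Exact command placement may vary by OS10 release.

! Loopback used as the BGP router ID (address illustrative)
interface loopback 0
 ip address 10.0.2.1/32
 no shutdown
! EBGP peering to both spines (leaf ASN 64701, spine ASNs 64601/64602 assumed)
router bgp 64701
 router-id 10.0.2.1
 ! Allow both spine-facing paths to be installed for ECMP
 maximum-paths ebgp 2
 address-family ipv4 unicast
  redistribute connected
 neighbor 192.168.1.0
  remote-as 64601
  no shutdown
 neighbor 192.168.2.0
  remote-as 64602
  no shutdown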
3.2.6 ECMP
ECMP is the core protocol facilitating the deployment of a layer 3 leaf-spine topology. ECMP gives each spine and leaf switch the ability to load balance flows across a set of equal-cost next hops. For example, when using two spine switches, each leaf has a connection to each spine. For every flow egressing a leaf switch, there exist two equal next hops, one to each spine.

ECMP

3.2.7 VRRP
A VRRP instance is created for each VLAN/network in Table 4.
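A minimal OS10EE sketch of one such instance follows, assuming the management VLAN (1731) used in the examples later in this guide; the physical IP, virtual address, and priority are illustrative values.

! VRRP gateway for VLAN 1731 on one leaf; the VLT peer runs the same
! vrrp-group with a different physical IP and priority
interface vlan 1731
 ip address 172.17.31.2/24
 no shutdown
 vrrp-group 31
  virtual-address 172.17.31.254
  priority 150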
3.3 VMware virtual networking configuration
This section provides details on the configuration of virtual networking settings within VMware vCenter. All steps in the following sections are completed with the vCenter Web Client.

Note: ESXi has been installed on all hosts, vCenter has been deployed, and all hosts have been added to vCenter. See Appendix B for information on preparing servers for VxFlex OS deployment.
3.3.2 Create clusters and add hosts
When a host is added to a cluster, the host's resources become part of the cluster's resources, and the cluster manages the resources of all hosts within it. This section shows how to create a cluster; all ESXi hosts are then added to the cluster. The cluster name in this example is for identification purposes only. To add a cluster to the datacenter, complete the following steps:

1. On the web client Home screen, select Hosts and Clusters.
2. Right-click the datacenter and select New Cluster.
3. Provide a name for the cluster.
4. Click OK.
Note: The steps to migrate an ESXi management VMkernel adapter from a standard switch to a VDS configured for a LAG are not covered in this document. The following VMware documents provide details to assist with the migration:
https://kb.vmware.com/s/article/1010614
https://docs.vmware.com/en/VMware-vSphere/6.5/com.vmware.vsphere.networking.doc/GUID-34A96848-5930-4417-9BEB-CEF487C6F8B6.
ScaleIO-data01 port group

ScaleIO management port group
vMotion port group

3.3.4 Create the VDS
To create the VDS, complete the following steps:

1. On the web client Home screen, select Networking.
2. Right-click Datacenter and select Distributed switch > New Distributed Switch.
3. Provide a name for the VDS. Click Next.
4. On the Select version page, select Distributed switch: 6.5.0 > Next.
5. On the Edit settings page:
   a. Set the Number of uplinks to 4.
   b. Leave Network I/O Control set to Enabled.
   c. Uncheck the Create a default port group box.
6. Click Next followed by Finish.
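Once hosts have been attached to the VDS (see section 3.3.7), the switch can optionally be confirmed from any member host's ESXi shell. A quick sketch, assuming SSH or the ESXi shell is enabled on the host:

# List VDS instances known to this host, including uplinks and MTU
esxcli network vswitch dvs vmware list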
VDS created for compute and ScaleIO

3.3.5 Add distributed port groups
In this section, separate distributed port groups for management, ScaleIO-management, vMotion, and ScaleIO-data01 are added to the VDS. To create the port group for management traffic on the VDS, complete the following steps:

1. On the web client Home screen, select Networking.
2. Right-click the VDS. Select Distributed Port Group > New Distributed Port Group.
3. Provide a name for the port group and click Next.
4. On the Configure settings page, set the VLAN type to VLAN and enter the VLAN ID for the port group. Leave the remaining settings at their defaults and click Next.
5. Click Finish.
Distributed port group settings page – vMotion port group

Repeat steps 1-5 above to create the distributed port groups for ScaleIO-management, vMotion, and ScaleIO-data01. In this example, the following VLAN IDs are used:

• Management VLAN ID: 1731
• ScaleIO-management VLAN ID: 1733
• vMotion VLAN ID: 1732
• ScaleIO-data01 VLAN ID: 1734

Be sure to give each port group a unique, descriptive name. When complete, the Navigator pane appears similar to Figure 18.
Distributed switches with each port group created

3.3.6 Create LACP LAGs
Since Link Aggregation Control Protocol (LACP) LAGs are used in the physical network between ESXi hosts and physical switches, LACP LAGs are also configured on each VDS. To enable LACP on the VDS, complete the following steps:

1. On the web client Home screen, select Networking.
2. In the Navigator pane, select the VDS.
3. In the center pane, select Configure > Settings > LACP.
4. Click the New (+) icon.
LAG configuration

This creates lag1 on the VDS. Repeat steps 1-7 to create another LACP LAG, named lag2. Click the refresh icon at the top of the screen if the LAG does not appear in the table as shown in Figure 20.
LAGs created on the VDS

3.3.7 Associate hosts and assign uplinks
Hosts and their vmnics must be associated with each vSphere Distributed Switch. All traffic uses the LAG uplinks to take advantage of the improved bandwidth usage and failover behavior that the VLT feature provides. This section details the configuration of the uplinks.

Note: Before starting this section, be sure you know the vmnic-to-physical adapter mapping for each host.
i. On the first host in the cluster, select the appropriate vmnic (vmnic0 in this example) and click Assign uplink.
ii. Select lag1-0 > OK.
iii. On the same host, select the next appropriate vmnic (vmnic5 in this example) and click Assign uplink.
iv. Select lag1-1 > OK.
v. On the same host, select the next appropriate vmnic (vmnic1 in this example) and click Assign uplink.
vi. Select lag2-0 > OK.
vii. On the same host, select the next appropriate vmnic (vmnic4 in this example) and click Assign uplink.
viii. Select lag2-1 > OK.
ix.
Uplinks configured on ScaleIO VDS

This configuration brings up the LAGs on the upstream switches. Confirm the configuration by running the show vlt port-detail command on the upstream switches as shown in the example below. The Status column now indicates all LAGs are up.
Dell#show vlt port-detail
vlt-port-channel ID : 2
VLT Unit ID   Port-Channel     Status   Configured ports   Active ports
------------------------------------------------------------------------
* 1           port-channel2    up       1                  1
  2           port-channel2    up       1                  1

vlt-port-channel ID : 3
VLT Unit ID   Port-Channel     Status   Configured ports   Active ports
------------------------------------------------------------------------
* 1           port-channel3    up       1                  1
  2           port-channel3    up       1                  1

vlt-port-channel ID : 4
VLT Unit ID   Port-Channel     Status   Configured ports   Active ports
------------------------------------------------------------------------
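For reference, a minimal sketch of the switch-side configuration behind one of these entries is shown below. The port-channel ID, member port, and VLAN range are illustrative, and a matching configuration is applied on the VLT peer switch.

! Host-facing LACP port channel spanning the VLT pair
interface port-channel 2
 switchport mode trunk
 switchport trunk allowed vlan 1731-1734
 vlt-port-channel 2
 no shutdown
! Physical member port connected to the host vmnic
interface ethernet 1/1/2
 channel-group 2 mode active
 no shutdown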
3.3.8 Configure teaming and failover on LAGs

5. On the Teaming and failover page, click lag1 and move it up to the Active uplinks section by clicking the up arrow. Move Uplinks 1-4 down to the Unused uplinks section by clicking the down arrow. Leave other settings at their defaults. The Teaming and failover page should look similar to Figure 22 when complete.

Teaming and failover settings for LAGs

6. Click Next followed by Finish to apply settings.
3.3.9 Add VMkernel adapters for MDM, SDS-SDC, and vMotion

VLAN and network examples

VMkernel                VLAN ID   IP Address       Subnet Mask
Management > vmk0       1731      172.17.31.yyy    255.255.255.0
vMotion > vmk1          1732      172.17.32.yyy    255.255.255.0
ScaleIO-data01 > vmk2   1734      172.17.34.yyy    255.255.255.0

Note: VMkernel adapter vmk0 is installed by default for host management at the time of ESXi installation.

To add the vMotion VMkernel adapters to all hosts connected to the VDS, complete the following steps:

1. On the web client Home screen, select Networking.
2. Right-click the vMotion distributed port group and select Add VMkernel Adapters; follow the wizard to attach all hosts, enable the vMotion service, and assign the addresses shown above.
Host VMkernel adapters page

To verify the configuration, ensure the vMotion adapter (vmk1 in this example) is shown as Enabled in the vMotion Traffic column. Verify the VMkernel adapter IP addresses are correct. Verify the information is correct on other hosts, as needed.

Note: The example used in this guide only uses four hosts on a single leaf pair. For large-scale designs with multiple routed leaf pairs, additional configuration steps are required.
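The same checks can be made from each host's ESXi shell. A quick sketch, assuming shell or SSH access is enabled:

# Show each VMkernel adapter with its enabled services and MTU
esxcli network ip interface list
# Show the IPv4 address and netmask assigned to each VMkernel adapter
esxcli network ip interface ipv4 get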
3.3.10 Verify VDS configuration

VDS VMkernel ports, VLANs, and IP addresses for management port group

Repeat steps 1-3 above for all port groups within the VDS.

3.3.11 Enable LLDP
Enabling Link Layer Discovery Protocol (LLDP) on vSphere Distributed Switches is optional but can be helpful for link identification and troubleshooting.

Note: LLDP functionality may vary with adapter type. LLDP must also be configured on the physical switches per the switch configuration instructions provided earlier in this guide.
To view LLDP information sent from the ESXi host adapters, run the following command from the CLI of a directly connected switch:

Dell#show lldp neighbors
Loc PortID       Rem Host Name          Rem Port Id          Rem Chassis Id
--------------------------------------------------------------------------
ethernet1/1/1    atx01w02esx01.del...   00:50:56:50:f6:cf    vmnic1
ethernet1/1/2    atx01w02esx01.del...   00:50:56:50:da:f0    vmnic0
ethernet1/1/3    atx01w02esx02.del...   00:50:56:5e:2d:0a    vmnic1
ethernet1/1/4    atx01w02esx02.del...
4 Scaling and tuning guidance

4.1 Decisions on scaling
Dell EMC Networking provides a resilient, high-performance architecture that improves availability and meets Service Level Agreements (SLAs) more effectively. This example uses the Dell EMC Networking S4248FB-ON switch because of its ability to provide low latency with deep buffers for optimum performance.
4.3 Scaling beyond 16 racks
The proof-of-concept scaling that Figure 25 shows allows four 16-rack pods, connected using an additional spine layer, to scale in excess of 1,000 nodes while maintaining the same oversubscription ratio. This scenario reduces the number of racks available per pod to accommodate the uplinks required to connect to the super-spine layer. It is important to understand the port density of the switches used, and how their feature sets impact the number of available ports.
a. The total reservation for system traffic must not exceed 75% of the bandwidth supported by the physical adapter with the lowest capacity of all adapters connected to the distributed switch.
6. In the Limit text box, enter the maximum bandwidth that system traffic of the selected type can use.
7. Click OK to apply the allocation settings.

4.5 Tuning Jumbo frames
In the initial system solution that was put together for this set of VxFlex OS validation scenarios, jumbo frames were not enabled. When enabling jumbo frames, a matching MTU must be set on every hop in the data path; a brief sketch follows, and the checklist after it enumerates the interfaces involved.
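A hedged sketch of enabling jumbo frames end to end is shown below; the interface number and addresses are illustrative. On the switch side, raise the MTU on each interface listed in the checklist that follows:

! OS10 switch side: raise the interface MTU (repeat per the checklist)
interface ethernet 1/1/1
 mtu 9216

On the vSphere side, set the VDS MTU to 9000 (VDS > Configure > Properties > Advanced), then validate the path from an ESXi shell with a don't-fragment ping sized just under the MTU (9000 minus 28 bytes of IP/ICMP headers):

# -I selects the VMkernel adapter, -d sets don't-fragment, -s is payload size
vmkping -I vmk2 -d -s 8972 172.17.34.12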
5. Leaf switch interface leading to spine switch
6. Spine switch interface from originating leaf switch
7. Spine switch interface to designated leaf switch
8. Leaf switch interface from originating spine switch
9. VLAN interface
10. Port channel interface and members
11. Rack designated VDS
12. VMkernel interface
13. SDS VM interface

4.6 Quality of Service (QoS)
The hyper-converged solution includes application and storage traffic distributed across the entire leaf-spine network architecture.
i. Leave the Source and Destination Address settings as default.
j. Click OK.

Traffic rule for ScaleIO-data01 distributed port group

5. Click OK to save the MDM port group settings.

This simplified example for QoS demonstrates the use of only a single class of traffic: VxFlex OS storage traffic. For more complex traffic priority requirements, administrators can assign multiple unique DSCP values to the appropriate port groups.
• Uses the DSCP values as configured on the ScaleIO-data01 port group
• Maps the DSCP value to a specified queue
• Prioritizes egress traffic on uplinks through strict queuing

Note: The example below uses a DSCP value of 46 and a queue value of 3. The choice of values is arbitrary and is used to show that any DSCP value can be manually mapped to any queue.

Leaf switch configuration procedure:
1. Access the command line and enter configuration mode.
2. Create a class map to match traffic for the DSCP value.
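The remaining steps build the policy and attach it to the interfaces. The sketch below shows one way this can look on OS10EE; all names are illustrative, and the exact command set should be verified against your OS10 release.

! Step 2: classify storage traffic arriving with DSCP 46
class-map type qos ClassDSCP46
 match ip dscp 46
! Steer matched traffic to qos-group/queue 3
policy-map type qos PolicyStorageIn
 class ClassDSCP46
  set qos-group 3
! Schedule queue 3 with strict priority at egress
class-map type queuing ClassQ3
 match queue 3
policy-map type queuing PolicyStorageOut
 class ClassQ3
  priority 1
! Apply ingress classification on host-facing ports and strict
! queuing on the uplinks toward the spines
interface ethernet 1/1/1
 service-policy input type qos PolicyStorageIn
interface ethernet 1/1/45
 service-policy output type queuing PolicyStorageOut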
4.6.3 QoS validation
Monitor QoS marking and performance on the switches through show commands. This section details the show commands that can be used to evaluate whether the QoS configuration is functioning. The statistics below are from application and storage traffic generated by a test traffic generator. The following output is from OS10 with the QoS policy above applied; it shows no dropped packets exiting queue 3.
A Additional resources
This section tells you where to find documentation and other support resources for the components used in the examples this document describes.

A.1 Virtualization components
The table below lists the software components used by this document:

Software Components

Software                                  Version             Link to Documentation
VMware vSphere ESXi                       6.5.0 U1 5969303    http://www.vmware.com/products/vsphere/
VMware vCenter Server Appliance (vCSA)    6.5.0 U1 5973321    http://www.vmware.
A.3 Server and switch component details
The table below lists the BIOS, firmware, and driver components used in the examples shown in this document:

BIOS, Firmware, and Switch OS Components

Component                                      Version     Notes
PowerEdge R730xd Server BIOS                   2.4.3       BIOS facilitates the hardware initialization process and transitions control to the operating system.
Integrated Remote Access Controller (iDRAC)    2.41.40.
A.4 PowerEdge R730xd server

R730xd front view with bezel

R730xd front view without bezel

In addition to the R730xd back-panel features, the R730xd includes two optional 2.5” hot-plug drives in the back of the system.

R730xd back view

A.5 S4248FB-ON switch
The S4248FB-ON is the latest-generation S-Series multi-purpose 10/40/100GbE switch with deep buffers for optimum performance and connectivity, featuring 40 x 10GbE SFP+ ports, 2 x 40GbE QSFP+ ports, and 6 x 100GbE QSFP28 ports.
A.6 Z9100-ON switch
The Z9100-ON is a 1RU layer 2/3 switch with 32 ports supporting 10/25/40/50/100GbE. Two Z9100-ON switches are used as spine switches in the leaf-spine topology covered in this guide.

Z9100-ON

A.7 S3048-ON switch
The S3048-ON is a 1RU layer 2/3 switch with 48 1GbE Base-T ports. One S3048-ON switch is used for OOB management traffic in this guide.

S3048-ON
B Prepare your environment
This section covers basic PowerEdge server preparation and ESXi hypervisor installation. Installation of guest operating systems (Microsoft Windows Server, Red Hat Linux, etc.) is outside the scope of this document.

Note: Exact iDRAC console steps in this section may vary slightly depending on the hardware, software, and browser versions used. See your PowerEdge server documentation for steps to connect to the iDRAC virtual console.
B.3 Configure the PERC H730 Controller
As a best practice, Dell EMC recommends using the PERC H730 controller in RAID mode and creating a RAID0 container for each disk attached to the controller. This allows all hard disks to be mapped directly to the SVM using Raw Device Mapping (RDM). Storage controllers used in an EMC VxFlex OS deployment should be set to RAID mode. For the deployment used in this guide, this applies to all PERC H730 controllers in each of the R730xd servers.
4. From the System Setup Main Menu, select Device Settings.
5. From the list of devices, select the PERC controller. This opens the Modular RAID Controller Configuration Utility Main Menu.
6. Select Virtual Disk Management.
7. The total number of virtual disks should equal 24 (Virtual Disk 0 through Virtual Disk 23).

B.4 Install ESXi
Dell EMC recommends using the latest Dell EMC customized ESXi .iso image available on www.dell.com/support.
7. Optionally, under Troubleshooting Options, enable the ESXi shell and SSH to enable remote access to the CLI.
8. Log out of the ESXi console.
C VxFlex OS with routed leaf - multiple rack considerations
This section provides information relating to scaling the VxFlex OS example described throughout this document. When using a layer 3 design that includes the leaf and spine, additional steps are needed to ensure VxFlex OS and VMware network settings are complete.

C.1 VMkernel configuration - add static routes for default gateways
The default gateway for each network configured on the VMkernels cannot be modified during VMkernel creation.
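Static routes can instead be added from each host's ESXi shell (or via esxcli remotely). The sketch below assumes a hypothetical remote-rack ScaleIO-data01 network of 172.17.44.0/24 reachable through a local data-network gateway of 172.17.34.253; substitute the networks and gateways for your racks.

# Add a static route for a remote rack's data network
esxcli network ip route ipv4 add --network 172.17.44.0/24 --gateway 172.17.34.253
# Confirm the route table
esxcli network ip route ipv4 list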
Note: Navigate to Dell EMC ScaleIO/VxFlex OS Software Only: Documentation Library to download the full documentation package, and to the Dell EMC ScaleIO/VxFlex OS Documentation Hub for additional VxFlex OS documentation. You may need to enter EMC Community Network (ECN) login credentials or create an ECN account to access the documentation package.
EMC ScaleIO plugin icon

C.2.2 Installing the SDC on ESXi hosts
The SDC component must be applied to every ESXi host within the VxFlex OS system. Complete the Pre-Deployment procedure, summarized below, within the EMC VxFlex OS plugin application:

• Select Basic tasks > Pre-Deployment Actions.
• Select all the ESXi hosts, check Install SDC, and provide root passwords.
• Complete the install and restart the ESXi hosts.

C.2.3 VxFlex OS deployment
• Create a new ScaleIO system; agree to license terms.
• Enter a System Name and password.
• Select the vCenter server and all hosts in all clusters.
• Select a 3-node cluster.
• Configure Performance, Sizing, and Syslog; leave the settings at their defaults.
• Enter a Protection Domain name.
• Enter Storage Pool name(s); enable zero padding.
• Create new Fault Sets is left at its default; none are created in this example.
• Add SDSs.
  - Confirm an ESXi host is selected to be the Gateway VM.
• Configure SVM.
  - Configure ScaleIO Virtual Machine (SVM) IP addresses. Enter IP information for all SVMs. See Table 11 for the IP information used in this deployment example. This example uses 172.17.34.4 for the Cluster Virtual IP address on the ScaleIO-data01 network.

ScaleIO Wizard – Configure SVM

ESXi Name (function)                        Mgmt IP & Subnet Mask          Default Gateway   Data IP & Subnet Mask
ScaleIO-172.17.33.11-GW (ScaleIO Gateway)   172.17.33.11 / 255.255.255.0   172.17.33.253     172.17.34.11 / 255.255.255.0
Modification steps
This section provides step-by-step instructions for modifying the SVMs to continue with the deployment wizard.

Procedure:
1. Let the deployment process complete with failed tasks. The EMC VxFlex OS plugin screen shows errors.
2. Navigate to the console of any SVM. Use the Open Console option or an SSH client to open a session to the Mgmt IP listed in Table 11.
3.
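While the remaining wizard-modification steps are release-specific, the general shape of such an SVM change is adding static routes for remote-rack data networks. A hypothetical sketch from the SVM's SLES-based shell, with all addresses and the device name assumed:

# Temporary route: reach a remote rack's data network via the local gateway
ip route add 172.17.44.0/24 via 172.17.34.253 dev eth1
# Persist on SLES by appending an equivalent line to /etc/sysconfig/network/routes:
#   172.17.44.0/24 172.17.34.253 - eth1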