Dell EMC Integrated System for Microsoft Azure Stack HCI: Stretched Cluster Deployment
Reference Architecture Guide

Abstract
This reference architecture guide provides an overview of the Microsoft Azure Stack HCI operating system and guidance on how to deploy stretched clusters in your environment.
Notes, cautions, and warnings
NOTE: A NOTE indicates important information that helps you make better use of your product.
CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.
WARNING: A WARNING indicates a potential for property damage, personal injury, or death.

© 2021 Dell Inc. or its subsidiaries. All rights reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries.
Contents
Chapter 1: Introduction
Document overview
Audience and scope
1 Introduction
This chapter presents the following topics:
• Document overview
• Audience and scope

Document overview
This reference architecture guide provides an overview of the Microsoft Azure Stack HCI operating system and guidance on how to deploy stretched clusters in your environment. The guide provides network topology references and best practices to consider during a stretched cluster deployment.
2 Solution overview
This chapter presents the following topics:
• Introduction
• Solution integration and network architecture

Introduction
Dell EMC Solutions for Azure Stack HCI offers stretched cluster solutions with AX nodes from Dell Technologies. Built using industry-leading PowerEdge servers, AX nodes offer fully validated HCI nodes for a variety of use cases.
The following figure shows an Active-Active setup:
Figure 1. An Active-Active setup

Sites can be logical or physical. For logical sites, a stretched cluster can exist on a single rack, across multiple racks, or in different rooms in the same data center. For physical sites, the stretched cluster can be in different data centers on the same campus or in different cities or regions. Stretched clusters using two physical sites provide disaster recovery and business continuity should a site suffer an outage.
routes are needed on the L2/L3 to ensure that the Replica networks reach the intended destination. Subsequent sections of this guide provide more information about the expectations of customer networking teams. A stretched cluster environment has two storage pools, one per site. In both topologies described in the preceding section, storage traffic requires Remote Direct Memory Access (RDMA) to transfer data between nodes within the same site.
3 Solution deployment
This chapter presents the following topics:
• Introduction
• Deployment prerequisites for stretched clusters
• Customer network team requirements
• Design principles and best practices
• Validated network topology

Introduction
Stretched clusters with Dell EMC Solutions for Azure Stack HCI can be configured using PowerShell. This guide describes the prerequisites for this deployment.
Table 2. Deployment prerequisites for stretched clusters (continued)
Component    Requirements
● If the two sites have host networks in different subnets, no additional configuration is needed for creating clusters. Otherwise, manual configuration of the cluster fault domain is required.
● RDMA adapters for Storage/SMB traffic.
● RDMA is not supported for Replica traffic across WAN.
● At least a 1 Gb network between sites for Replication and inter-site Live Migration is required.
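The RDMA adapter prerequisite can be checked on each node before the cluster is created. The following is a minimal sketch rather than a validated procedure; it assumes it is run locally on a node and that the storage adapters are the ones expected to report RDMA as enabled.

```powershell
# Sketch: list RDMA-capable adapters on this node and show whether RDMA is enabled.
Get-NetAdapterRdma

# Assumption: every adapter intended for Storage/SMB traffic should have Enabled = True.
# Adapters used only for Replica traffic across the WAN do not need RDMA.
$disabled = Get-NetAdapterRdma | Where-Object { -not $_.Enabled }
if ($disabled) {
    Write-Warning ("RDMA is disabled on: " + ($disabled.Name -join ', '))
}
```

A warning here is only a prompt to review the adapter list; some adapters may have RDMA disabled intentionally.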
optimum performance of workloads. Low bandwidth and high latency between sites can result in very poor performance on the primary site in the case of both synchronous and asynchronous replication. Synchronous replication involves data blocks being written to log files on both sites before being committed. In asynchronous replication, the remote node accepts the block of replicated data and acknowledges back to the source copy.
rate of change of data is faster than the bandwidth of the replica link between the sites for long periods of time. This is critical and must be taken into consideration when designing the solution.
NOTE: Both replication scenarios affect application performance because each data block has to be written multiple times, assuming that all volumes are configured for replication.
NOTE: Stretched cluster with Storage Replica is not a substitute for a backup solution.
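The relationship between data change rate and replica link bandwidth can be estimated with simple arithmetic. The sketch below is illustrative only: the function name Test-ReplicaLinkHeadroom, its parameters, and the 80 percent usable-bandwidth figure are assumptions made for this example, not values from this guide.

```powershell
# Hypothetical helper: compares a sustained change rate with the usable replica link rate.
function Test-ReplicaLinkHeadroom {
    param(
        [Parameter(Mandatory)] [double] $ChangeRateMBps,  # sustained data change rate (MB/s)
        [Parameter(Mandatory)] [double] $LinkGbps,        # nominal inter-site link speed (Gb/s)
        [double] $UsableFraction = 0.8                    # assumed share of the link available to replication
    )

    # Convert Gb/s to MB/s (decimal units) and apply the assumed usable fraction.
    $usableMBps = $LinkGbps * 1000 / 8 * $UsableFraction

    [pscustomobject]@{
        UsableMBps       = $usableMBps
        ChangeRateMBps   = $ChangeRateMBps
        # Data that accumulates per hour when the change rate exceeds the usable link rate.
        BacklogMBPerHour = [math]::Max(0, ($ChangeRateMBps - $usableMBps) * 3600)
        LinkKeepsUp      = $ChangeRateMBps -le $usableMBps
    }
}

# Example with assumed numbers: 150 MB/s of changes over a 1 Gb link.
Test-ReplicaLinkHeadroom -ChangeRateMBps 150 -LinkGbps 1
```

With these assumed inputs the usable rate is about 100 MB/s, so the function reports that the link does not keep up. Real sizing should be based on measured change rates over a representative period.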
Figure 3. Network topology for a stretched cluster (basic)

High throughput configuration
In this topology, each host uses two 25 GbE NICs and one 1/10 GbE/25 GbE NIC to configure a high-throughput stretched cluster. One NIC is dedicated to intra-site RDMA traffic, similar to a standalone Storage Spaces Direct environment. The second NIC is used for replica traffic. SMB Multichannel distributes traffic evenly across both replica adapters, which increases network performance and availability.
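Whether SMB Multichannel is actually spreading traffic across the replica adapters can be inspected from a node. This is a minimal sketch under the assumption that it is run on a cluster node while SMB traffic is flowing; the output depends entirely on the local adapters.

```powershell
# Sketch: show the interfaces the SMB client considers usable,
# including whether each is RSS- or RDMA-capable.
Get-SmbClientNetworkInterface

# Show the SMB Multichannel connections currently in use.
# This list is populated only while SMB traffic is active.
Get-SmbMultichannelConnection
```

If only one replica adapter appears in the connection list during replication, review the adapter and subnet configuration.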
The following figure shows the network topology of an advanced stretched cluster: Figure 4.
4 Creating a stretched cluster
This chapter presents the following topics:
• Introduction
• Test-Cluster
• Cluster creation
• Volumes
• Storage efficiency
• Test-SRTopology

Introduction
This section outlines the steps that are needed for configuring a stretched cluster. Complete the network configuration on all nodes for the network topology applicable to you. A sample IP address schema is provided for both supported network topologies in the previous section of this guide.
If Sites and Services with IP subnets are configured in Active Directory, Failover Cluster Manager correctly shows the node-to-site mapping under Cluster Name >> Nodes. The following is a sample image of IP subnets defined in Active Directory:
Figure 5. IP subnets in an Active Directory

If both sites are in the same IP network, use the New-ClusterFaultDomain cmdlet to define the two site names. Site names defined using New-ClusterFaultDomain override the names given in Active Directory.
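Defining the sites manually might look like the following sketch. The site names (Site1, Site2) and node names (Node1 through Node4) are hypothetical placeholders, and the sketch assumes a four-node cluster with two nodes per site.

```powershell
# Hypothetical site names; replace with names that suit your environment.
New-ClusterFaultDomain -Type Site -Name 'Site1' -Description 'Primary site'
New-ClusterFaultDomain -Type Site -Name 'Site2' -Description 'Secondary site'

# Assign each node (hypothetical names) to its site.
Set-ClusterFaultDomain -Name 'Node1' -Parent 'Site1'
Set-ClusterFaultDomain -Name 'Node2' -Parent 'Site1'
Set-ClusterFaultDomain -Name 'Node3' -Parent 'Site2'
Set-ClusterFaultDomain -Name 'Node4' -Parent 'Site2'

# Review the resulting fault domain hierarchy.
Get-ClusterFaultDomain
```

After the nodes are assigned, the node-to-site mapping should also be visible in Failover Cluster Manager.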
Figure 7. Cluster networks

Volumes
Replication-enabled volumes can be created using a combination of PowerShell and Failover Cluster Manager or by using Windows Admin Center.
NOTE: Install the Storage Replica Module for Windows PowerShell (RSAT-Storage-Replica) on the management node with Desktop Experience on which Windows Admin Center and Failover Cluster Manager are installed to access the cluster.
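On a Windows Server management node, installing the module mentioned in the note can be sketched as follows. This assumes an elevated PowerShell session on the management node itself.

```powershell
# Install the Storage Replica Module for Windows PowerShell on the management node.
Install-WindowsFeature -Name RSAT-Storage-Replica -Verbose

# Confirm that the Storage Replica cmdlets are now available.
Get-Command -Module StorageReplica
```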
Storage efficiency
Due to high I/Os on the underlying disks, stretched clusters require an underlying infrastructure capable of delivering high I/Os with low latency. Dell Technologies recommends all-flash configurations for stretched cluster deployments. All-flash configurations do not have a cache tier. The following table shows the difference in storage efficiency for a two-way and three-way mirror created on a single site and stretched cluster environment:
Table 7.
5 Virtual Machines
This chapter presents the following topics:
• Introduction
• VM and storage affinity rules
• Preferred sites

Introduction
Virtual Machines in a stretched cluster environment can be managed using:
● PowerShell
● Failover Cluster Manager
● Windows Admin Center
For more information, see Manage VMs on Azure Stack HCI using Windows Admin Center. In a stretched cluster environment, volumes hosting the virtual machines may or may not be replicated, depending on business requirements.
6 Failure/Recovery from failure of Site/Node
This chapter presents the following topics:
• Planned failover
• Operation steps

Planned failover
Windows Admin Center has a Switch Direction feature that allows you to migrate workloads from one site to the other. This must be initiated on each volume. VMs hosted on the volumes follow the volumes to the migrated site after 10 minutes.
Site failure
A site failure in a stretched cluster topology requires rebuilding all of the nodes of the affected site. If the failure happens at the primary site, the following scenarios occur:
● All volumes hosted on the affected site and associated VMs become inaccessible.
● After a brief period, the volumes move to the secondary site.
● The VMs restart on the secondary site.
A Appendices
These appendices present the following topics:
• Appendix A: Sample PowerShell cmdlets for end-to-end deployment
• Appendix B: Supported hardware

Appendix A: Sample PowerShell cmdlets for end-to-end deployment

Install required Windows features
Install-WindowsFeature -Name FS-FileServer, Storage-Replica, Hyper-V, Failover-Clustering, Data-Center-Bridging -IncludeAllSubFeature -IncludeManagementTools -Verbose

Create VM switches and configure host networking
#Create VMSwitch for Management
#Configure Replica Network as applicable
#Replica 1
New-NetIPAddress -InterfaceAlias "SLOT 2 Port 1" -IPAddress 192.168.111.11 -PrefixLength 24 -AddressFamily IPv4 -Verbose
#Replica 2
New-NetIPAddress -InterfaceAlias "SLOT 2 Port 1" -IPAddress 192.168.112.
Preferred Sites can also be configured at cluster role and group level.
(Get-ClusterGroup -Name SQLServer1).PreferredSite = 'Bangalore'

If there is an Active-Active stretched cluster where Preferred Sites are not configured, it is highly recommended that you configure Preferred Sites for each volume. This will ensure that the volumes stay at the same site if there is a single node failure on either site.
(Get-ClusterSharedVolume "Cluster Virtual Disk (ax740xds2N2)" ClusterGroup).
To enable replication on volumes, go to Storage >> Disks and right-click on the primary volume on which you want to enable replication. Then follow these steps:
● Select Replication and click Enable
● Select the log volume for the primary site
● Select the Replica volume and associated log volume for the secondary site
● Overwrite the destination volume unless you have a seeded disk
● Select the mode of replication
● Complete the wizard
This enables replication on the volume after the initial block copy.
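The choices made in the wizard correspond roughly to parameters of the New-SRPartnership cmdlet. The following is a sketch only, not the validated procedure of this guide: every node name, replication group name, and volume path is a hypothetical placeholder, and the log size is an arbitrary example value.

```powershell
# All names and paths below are hypothetical; substitute values from your cluster.
$partnership = @{
    SourceComputerName       = 'Site1Node1'
    SourceRGName             = 'ReplicationGroup-Site1'
    SourceVolumeName         = 'C:\ClusterStorage\Volume1'   # primary data volume
    SourceLogVolumeName      = 'L:'                          # log volume at the primary site
    DestinationComputerName  = 'Site2Node1'
    DestinationRGName        = 'ReplicationGroup-Site2'
    DestinationVolumeName    = 'D:'                          # replica volume at the secondary site
    DestinationLogVolumeName = 'M:'                          # log volume at the secondary site
    ReplicationMode          = 'Synchronous'                 # or 'Asynchronous'
    LogSizeInBytes           = 32GB                          # example value only
}
New-SRPartnership @partnership

# Check the replication groups and their status after the initial block copy starts.
Get-SRGroup
```

Splatting keeps the source and destination settings side by side, which makes it easier to compare them with the selections in the wizard.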
NOTE: Run Test-SRTopology only for a single volume.
NOTE: If you choose asynchronous replication, ensure that you choose a log volume size of at least 30 GB.
NOTE: Use either PowerShell or Windows Admin Center to create volumes; do not mix the tools. There is a minor difference in volume sizes created using PowerShell and Windows Admin Center that results in a failure if you try to enable replication.
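A Test-SRTopology run against a single volume, as the first note advises, might look like the following sketch. The node names, drive letters, duration, and result path are hypothetical and should be replaced with values from your environment.

```powershell
# Hypothetical result folder; Test-SRTopology writes its report here.
New-Item -ItemType Directory -Path 'C:\Temp' -Force | Out-Null

# All computer names and volume letters below are placeholders.
$topology = @{
    SourceComputerName       = 'Site1Node1'
    SourceVolumeName         = 'D:'    # a single data volume, per the note above
    SourceLogVolumeName      = 'L:'
    DestinationComputerName  = 'Site2Node1'
    DestinationVolumeName    = 'D:'
    DestinationLogVolumeName = 'L:'
    DurationInMinutes        = 30
    ResultPath               = 'C:\Temp'
}
Test-SRTopology @topology
```

The report written to the result path summarizes whether the tested volumes and network meet Storage Replica requirements.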