Microsoft HCI Solutions from Dell Technologies: Managing and Monitoring the Solution Infrastructure Life Cycle Operations Guide Dell Technologies Solutions Part Number: H17518.
Notes, cautions, and warnings
NOTE: A NOTE indicates important information that helps you make better use of your product.
CAUTION: A CAUTION indicates either potential damage to hardware or loss of data and tells you how to avoid the problem.
WARNING: A WARNING indicates a potential for property damage, personal injury, or death.
© 2018-2021 Dell Inc. or its subsidiaries. All rights reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries.
Contents
Chapter 1: Introduction
Document scope
Audience and assumptions
Known issues
1 Introduction
Topics:
• Document scope
• Audience and assumptions
• Known issues
• Microsoft HCI Solutions from Dell Technologies overview
• Deployment guidance
Document scope
This operations guide focuses on operational aspects of a hyperconverged infrastructure solution on Azure Stack HCI with Hyper-V and Storage Spaces Direct.
built by using these AX nodes uses a flexible solution architecture rather than a fixed component design. The following figure illustrates one of the flexible solution architectures. It consists of a compute cluster alongside the redundant top-of-rack (ToR) switches, a separate out-of-band network, and an existing management infrastructure in the data center. NOTE: Microsoft HCI Solutions from Dell Technologies are available in both hybrid and all-flash configurations.
Deployment guidance For deployment guidance and instructions for configuring a cluster using Dell EMC Solutions for Azure Stack HCI, see Microsoft HCI Solutions from Dell Technologies. This operations guidance is applicable only to cluster infrastructure that is built using the instructions provided in the deployment documentation for AX nodes.
2 Day 0 Operations
After deploying the Azure Stack HCI cluster, complete day 0 operations.
Topics:
• Azure onboarding for Azure Stack HCI OS
• Licensing for Azure Stack HCI for Windows Server 2016 and 2019
• Creating virtual disks
• Managing and Monitoring Azure Stack HCI Cluster using Windows Admin Center
Azure onboarding for Azure Stack HCI OS
Clusters deployed using Azure Stack HCI OS must be onboarded to Microsoft Azure for full functionality and support.
● On Windows Server 2016, Windows Server 2019, and the Azure Stack HCI operating system clusters with three or more nodes: three-way mirror
● On Windows Server 2019 and Azure Stack HCI operating system clusters with four or more nodes: three-way mirror or mirror-accelerated parity
Managing and Monitoring Azure Stack HCI Cluster using Windows Admin Center
Windows Admin Center is a browser-based management tool developed by Microsoft to monitor and manage Windows servers, failover clusters, and hyperconverged
Adding the HCI cluster connection
About this task
For monitoring and management purposes, add the hyperconverged cluster that is based on Dell EMC Solutions for Azure Stack HCI as a connection in Windows Admin Center.
Steps
1. Go to Windows Admin Center > Cluster Manager, as shown in the following figure.
Figure 3. HCI cluster navigation
2. Click Add. The Add Cluster window is displayed.
3. Enter the cluster FQDN and select Also add servers in the cluster, as shown in the following figure.
Figure 4.
Accessing the HCI cluster To view the dashboard for the HCI cluster that you have added to Windows Admin Center, in the Cluster Manager window, click the cluster name. This dashboard provides the real-time performance view from the HCI cluster. This view includes total IOPS, average latency values, throughput achieved, average CPU usage, memory usage, and storage usage from all cluster nodes. It also provides a summarized view of the Azure Stack HCI cluster with drives, volumes, and VM health.
Figure 6. Servers: Inventory tab NOTE: The metrics in the figure are for a four-node Azure Stack HCI cluster with all-flash drive configuration. Viewing drive details About this task View the total number of drives in the cluster, the health status of the drives, and the used, available, and reserve storage of the cluster as follows. Steps 1. In the left pane, select Drives. 2. Click the Summary tab, as shown in the following figure. Figure 7.
To view the drive inventory from the cluster nodes, from the left pane, select Drives, and then click the Inventory tab. Figure 8. Drives: Inventory tab The HCI cluster is built using four AX-740xd nodes, each with two 1.92 TB NVMe drives. By clicking the serial number of the drive, you can view the drive information, which includes health status, slot location, size, type, firmware version, IOPS, used or available capacity, and storage pool of the drive.
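The same drive details can also be read from PowerShell on a cluster node. The following is a minimal sketch, assuming the Storage module that ships with Windows Server is available; Get-DriveInventory is a hypothetical helper name that does not come from this guide.

```powershell
# Sketch: list drive details similar to those on the Drives > Inventory tab.
# Get-DriveInventory is a hypothetical helper name; run it on a cluster node.
function Get-DriveInventory {
    Get-PhysicalDisk |
        Sort-Object -Property FriendlyName, SerialNumber |
        Select-Object -Property FriendlyName, SerialNumber, MediaType, BusType,
            HealthStatus, OperationalStatus, FirmwareVersion, Usage,
            # PowerShell's TB suffix is binary (2^40 bytes), so a drive sold as
            # 1.92 TB reports roughly 1.75 here.
            @{ Name = 'SizeTiB'; Expression = { [math]::Round($_.Size / 1TB, 2) } }
}

# Example usage: show only drives that are not reported as healthy.
# Get-DriveInventory | Where-Object HealthStatus -ne 'Healthy' | Format-Table -AutoSize
```

The helper only reads data, so it is safe to run at any time. It does not report IOPS or slot location, which Windows Admin Center gathers separately.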
Figure 9. Volumes: Summary tab
The Inventory tab provides the volume inventory from the HCI cluster nodes. You can manage and monitor the volumes.
Figure 10. Volumes: Inventory tab
Creating volumes in Storage Spaces Direct
About this task
Create volumes in Storage Spaces Direct in Windows Admin Center as follows.
Steps
1. Go to Volumes > Inventory.
2. Click Create. The Create volume window is displayed.
3. Enter the volume name, resiliency, and size of the volume, and then click Create.
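For administrators who prefer scripting, a volume with the same three inputs (name, resiliency, and size) can be sketched in PowerShell. This is an illustration rather than the procedure from this guide: New-S2DVolume is a hypothetical wrapper name, and the 'S2D*' pool name pattern and the CSVFS_ReFS file system are assumptions based on common Storage Spaces Direct defaults.

```powershell
# Sketch: create a Storage Spaces Direct volume from a name, resiliency, and size.
# New-S2DVolume is a hypothetical wrapper; the pool pattern 'S2D*' is an assumption.
function New-S2DVolume {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [uint64] $SizeBytes,
        [ValidateSet('Mirror', 'Parity')] [string] $Resiliency = 'Mirror'
    )
    $volumeParams = @{
        StoragePoolFriendlyName = 'S2D*'
        FriendlyName            = $Name
        FileSystem              = 'CSVFS_ReFS'
        Size                    = $SizeBytes
        ResiliencySettingName   = $Resiliency
    }
    New-Volume @volumeParams
}

# Example usage: a 1 TB mirrored volume.
# New-S2DVolume -Name 'Volume01' -SizeBytes 1TB
```

On clusters with three or more nodes, the Mirror setting is expected to resolve to a three-way mirror by default. Mirror-accelerated parity needs storage tier parameters that this sketch does not cover.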
Managing volumes
About this task
Open, expand, delete, or take a volume offline as follows.
Steps
1. Go to Volumes > Inventory.
2. Click the volume name.
3. Click Open to open the volume folder.
4. Click Offline or Delete to take the volume offline or to delete it.
5. Click Expand to expand the volume. The Expand volume window is displayed.
6. Enter the additional size of the volume.
7. Select the volume size from the drop-down list and click Expand.
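Expanding a volume can also be sketched in PowerShell as two steps: grow the virtual disk, then grow the partition on it. The sketch below assumes a volume without storage tiers and assumes the data partition is partition number 2, which is typical but should be verified with Get-Partition first. Expand-S2DVolume is a hypothetical helper name, and it takes the new total size rather than the additional size that Windows Admin Center asks for.

```powershell
# Sketch: expand a non-tiered Storage Spaces Direct volume to a new total size.
# Expand-S2DVolume is a hypothetical helper; partition number 2 is an assumption.
function Expand-S2DVolume {
    param(
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [uint64] $NewSizeBytes
    )
    # Step 1: grow the virtual disk that backs the volume.
    Resize-VirtualDisk -FriendlyName $Name -Size $NewSizeBytes

    # Step 2: grow the data partition to the largest size the disk now supports.
    $partition = Get-VirtualDisk -FriendlyName $Name | Get-Disk | Get-Partition |
        Where-Object PartitionNumber -eq 2
    $maxSize = ($partition | Get-PartitionSupportedSize).SizeMax
    $partition | Resize-Partition -Size $maxSize
}

# Example usage: grow Volume01 to 2 TB in total.
# Expand-S2DVolume -Name 'Volume01' -NewSizeBytes 2TB
```

Volumes that use storage tiers are resized per tier instead, so this sketch does not apply to mirror-accelerated parity volumes.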
Figure 11.
Figure 12. VMs: Summary tab
You can perform the following tasks from the Windows Admin Center console:
● View a list of VMs that are hosted on the HCI cluster.
● View individual VM state, host server information, virtual machine uptime, CPU, memory utilization, and so on.
● Create a new VM.
● Modify VM settings.
● Set up VM protection.
● Delete, start, turn off, shut down, save, delete saved state, pause, resume, reset, add new checkpoint, move, rename, and connect VMs.
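The first two tasks in the list above, listing VMs and viewing their state, can also be approximated from PowerShell. The following is a minimal sketch, assuming the FailoverClusters and Hyper-V modules are installed on the system where it runs; Get-ClusterVMSummary is a hypothetical helper name.

```powershell
# Sketch: list the VMs on every cluster node with state, host, uptime, CPU, and memory.
# Get-ClusterVMSummary is a hypothetical helper; run it on a cluster node.
function Get-ClusterVMSummary {
    Get-ClusterNode | ForEach-Object {
        # Query Hyper-V on each node for the VMs it currently hosts.
        Get-VM -ComputerName $_.Name
    } | Select-Object -Property Name, State, ComputerName, Uptime, CPUUsage, MemoryAssigned
}

# Example usage: show running VMs sorted by host.
# Get-ClusterVMSummary | Where-Object State -eq 'Running' | Sort-Object ComputerName
```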
Figure 13. Virtual switches Dell EMC OpenManage Integration with Windows Admin Center Dell EMC OpenManage Integration with Windows Admin Center enables IT administrators to manage the hyperconverged infrastructure (HCI) that is created by using Microsoft HCI Solutions from Dell Technologies. OpenManage Integration with Windows Admin Center simplifies the tasks of IT administrators by remotely managing the AX nodes and clusters throughout their life cycle.
Installing the Azure Stack HCI license (Ready Nodes only) AX nodes have a preinstalled Azure Stack HCI license. Storage Spaces Direct Ready Nodes require the installation of an After Point of Sale (APOS) license. Steps 1. Log in to iDRAC. 2. Select Configuration > Licenses. 3. Select Import, browse to and select the license, and then click Upload. Managing Azure Stack HCI clusters Steps 1. In the upper left of Windows Admin Center, select Cluster Manager from the menu. 2.
● CPUs
● Fans
● Storage controllers
● iDRAC
● Storage enclosures
● Physical disks
● Voltages
● Temperatures
Selecting the Critical or Warning section in the overall health status doughnut chart displays the nodes and components that are in the critical or warning state respectively. Select sections in the doughnut chart to filter the health status of the components. For example, selecting the red section displays only the components with critical health status.
Figure 15. iDRAC dashboard Settings Use the Settings tab in the Dell EMC OpenManage Integration with Windows Admin Center UI to view the latest update compliance report, update the cluster, and configure proxy settings. Update tools To view the latest update compliance report and update the cluster using an offline catalog, OpenManage Integration with Windows Admin Center requires that you configure the settings for the update compliance tools.
To use an offline catalog, the update tools must be configured under the Settings tab, and the catalog file must be exported using the Dell Repository Manager and placed in a shared folder. See Obtaining the firmware catalog for AX nodes or Ready Nodes using Dell EMC Repository Manager. 2. Click Next: Compliance Details to generate the update compliance report. By default, all the upgrades are selected, but you can make alternate selections as needed. Figure 16. Compliance Details 3.
Figure 17. Update Summary
4. To schedule the update for a later time, click Schedule later, select the date and time, and then click Next: Cluster Aware Update to download the required updates. To use the Schedule later feature, download the required updates and keep them ready to install at the specified time.
5. Click Next: Cluster Aware Update to begin the update process, and click Yes at the prompt to enable Credential Security Support Provider (CredSSP) to update the selected components.
Figure 18. Cluster Aware Update When the update job is completed, the compliance job is triggered automatically. Full Stack Cluster-Aware Updating prerequisites for AX-7525 and AX-6515 nodes (offline update) About this task If an Internet connection is not available, run Full Stack Cluster-Aware Updating (CAU) in offline mode as follows: Steps 1. Download the asHCISolutionSupportMatrix.json and asHCISolutionSupportMatrix.json.sign files from http://downloads.dell.com/omimswac/supportmatrix/. 2.
NOTE: Full Stack Cluster-Aware Updating (CAU) is only available on Azure Stack HCI clusters built using the Azure Stack HCI operating system. For more information about CAU, see the Cluster-Aware Updating Overview. NOTE: Full Stack CAU is a licensed feature. Ensure that the Azure Stack HCI license is installed before proceeding. To perform both operating system updates and hardware upgrades on Azure Stack HCI cluster nodes, carry out the following steps: Steps 1.
● Windows 10 gateway system—\Users\\AppData\Local\Temp\generated\logs ● After the cluster update is over, DSU logs for individual nodes can be found in the \Temp\OMIMSWAC folder on the respective nodes. To run the compliance report again, click Re-run Compliance and repeat steps 4 to 7. 10. After the updates are downloaded, follow the instructions in the Windows Admin Center to install both operating system and hardware updates.
Table 1. Known issues (continued)
Issue Resolution/workaround
adapters, and USB NIC IPv4 addresses cannot be used to communicate externally, which, therefore, breaks cluster communication on those NICs.
if ($rndisAdapter) {
    Write-Log -Message 'Remote NDIS found on the system. Cluster communication will be disabled on this adapter.'
    # Get the network adapter and associated cluster network
    $adapterId = [Regex]::Matches($rndisAdapter.InstanceID, '(?<={)(.*?)(?=})').
OpenManage Integration for Microsoft System Center provides operating system deployment, Azure Stack HCI cluster creation, hardware and firmware updating, and maintenance of servers and modular systems. Integrate OpenManage Integration for Microsoft System Center with Microsoft System Center Virtual Machine Manager (SCVMM) to manage your PowerEdge servers in virtual and cloud environments. Checking compliance and updating firmware NOTE: This method is applicable only for Storage Spaces Direct Ready Nodes.
4. In the left pane, select Maintenance Center, and then, at the top of the window, select Maintenance Settings. 5. Update the systems by using the online catalog or offline catalog. Using the online catalog: a. Select DELL ONLINE HTTPS CATALOG (default), and then click edit. b. On the Firmware Update Source page, keep the default values, create a proxy credentials profile, and select the proxy credentials to connect to the Internet. c.
Updating the firmware with the cluster-aware feature With OpenManage Integration for Microsoft System Center, you can update firmware or schedule firmware updates using the cluster-aware feature. Steps 1. Launch SCVMM. 2. In the left pane, click Fabric, and then, under Servers, select All Hosts. 3. On the top banner, click DELL EMC OMIMSSC. 4. In the left pane, select Maintenance Center. 5.
Placing an AX node in maintenance mode About this task After ensuring that the prerequisites are met and before performing the platform updates, place the AX node in maintenance mode (pause and drain). You can move roles or VMs and gracefully flush and commit data in the AX node. Steps 1. Run the following command to put the node in maintenance mode (pause and drain).
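As an illustration of the pause-and-drain step, the sketch below wraps Suspend-ClusterNode, the cmdlet that this guide uses elsewhere to suspend a cluster node. It is not necessarily the full command sequence from the original procedure. Suspend-NodeForMaintenance is a hypothetical helper name, and the FailoverClusters module is assumed to be available.

```powershell
# Sketch: pause a cluster node, drain its roles and VMs, and report the node state.
# Suspend-NodeForMaintenance is a hypothetical helper name.
function Suspend-NodeForMaintenance {
    param([Parameter(Mandatory)] [string] $NodeName)

    # Pause the node and move its roles and VMs to other nodes.
    # -Wait blocks until the drain operation has finished.
    Suspend-ClusterNode -Name $NodeName -Drain -Wait | Out-Null

    # Return the node state so that the caller can confirm it reads 'Paused'.
    (Get-ClusterNode -Name $NodeName).State
}

# Example usage (replace Hostname with the node name):
# Suspend-NodeForMaintenance -NodeName 'Hostname'
```

After the platform updates are complete, Resume-ClusterNode returns the node to active cluster membership.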
The Update Catalog for Microsoft HCI Solutions is populated in the Base Catalog section. 15. In the Manual Repository Type, click All systems in base catalog and then click Add. The repository is displayed on the repository dashboard available in the home page. 16. Select the repository and click Export. The Export Deployment Tools window is displayed. 17. Select the location to export files and click Export. The files are exported to the specified location.
Figure 20. Select updates 6. Select the updates and click Install Next Reboot to install and reboot the system. Updating the out-of-box drivers For certain system components, you might need to update the drivers to the latest Dell supported versions, which are listed in the Supported Firmware and Software Matrix.
Restarting a cluster node or taking a cluster node offline
About this task
Use the following procedure to restart a cluster node or to take a cluster node offline for maintenance:
Steps
1. Verify the health status of your cluster and volumes:
● Get-StorageSubSystem -FriendlyName *Cluster* | Get-StorageHealthReport
● Get-PhysicalDisk
● Get-VirtualDisk
2. Suspend the cluster node:
● Suspend-ClusterNode -Name "Hostname" -Drain
3.
Figure 21. Expanding the Azure Stack HCI cluster Azure Stack HCI node expansion In an HCI cluster, adding server nodes increases the storage capacity, improves the overall storage performance of the cluster, and provides more compute resources to add VMs. Before adding new server nodes to an HCI cluster, complete the following requirements: ● Verify that the processor model, HBA, and NICs are of the same configuration as the current nodes on the cluster and PCIe slots.
1. Pass cluster validation and SES device compliance tests.
2. Verify that the nodes are compliant with the firmware baseline.
3. Update the hardware timeout configuration for the Spaces port.
4. After the node configuration, update Microsoft Windows to bring the node to the same level as the cluster.
Adding server nodes manually
NOTE: The procedure is applicable only if the cluster and Storage Spaces Direct configuration is done manually.
To manually add server nodes to the cluster, see https://technet.
Performing AX node recovery If a cluster node fails, perform node operating system recovery in a systematic manner to ensure that the node is brought up with the configuration that is consistent with other cluster nodes. The following sections provide details about operating system recovery and post-recovery configuration that is required to bring the node into an existing Azure Stack HCI cluster. NOTE: To perform node recovery, ensure that the operating system is reinstalled.
Figure 23. Create a virtual disk 4. Provide a virtual disk name and select BOSS M.2 devices in the physical disks. Figure 24. Provide virtual disk name Figure 25.
5. Click Add Pending Operations. 6. Go to Configuration > Storage Configuration > Virtual Disk Configuration. Figure 26. Initialize configuration 7. Select the virtual disk, and then select Initialize: Fast in Virtual Disk Actions. 8. Reboot the server. NOTE: The virtual disk creation process might take several minutes to complete. 9. After the initialization is completed successfully, the virtual disk health status is displayed. Figure 27.
Factory operating system recovery For the factory-installed OEM license of the operating system, Dell Technologies recommends that you use the operating system recovery media that shipped with the PowerEdge server. Using this media for operating system recovery ensures that the operating system stays activated after the recovery. Using any other operating system media triggers the need for activation after operating system deployment.
FullPowerCycle FullPowerCycle is a calling interface function that provides a way to reset the server auxiliary power. An increasing amount of server hardware runs on server auxiliary power. Troubleshooting some server issues requires you to physically unplug the server power cable to reset the hardware running on auxiliary power. The FullPowerCycle feature enables the administrator to connect or disconnect auxiliary power remotely without visiting the data center.