Dell™ Failover Clusters With Microsoft® Windows Server® 2008 and Windows Server 2008 R2 Software Installation and Troubleshooting Guide
www.dell.com | support.dell.com
Notes, Cautions, and Warnings

NOTE: A NOTE indicates important information that helps you make better use of your computer.

CAUTION: A CAUTION indicates potential damage to hardware or loss of data if instructions are not followed.

WARNING: A WARNING indicates a potential for property damage, personal injury, or death.

Information in this document is subject to change without notice.
© 2008-2009 Dell Inc. All rights reserved.
Contents

1 Introduction
   Features of Failover Clusters Running Windows Server 2008
   Supported Cluster Configurations
      Operating System
      System Requirements
      Cluster Storage

2 Preparing Your Systems for Clustering
   Configuring Windows Networking
      Assigning Static IP Addresses to Cluster Resources and Components
      Verifying Communications Between Nodes
   Installing the Storage Connection Ports and Drivers
   Installing and Configuring the Shared Storage System
      Configuring Hard Drive Letters When Using Multiple Shared Storage Systems

3 Installing Your Cluster Management Software
   Microsoft Failover Cluster Management Console
      Running Failover Cluster Management on a Remote Console
      Launching Failover Cluster Management Console on a Remote Console

4 Understanding Your Failover Cluster

5 Maintaining Your Cluster
   Adding Storage to a Failover Cluster Node
   Configuring Network Settings of a Failover Cluster Node
   Maintaining a Clustered Service or Application
   Running chkdsk on a Clustered Disk in Maintenance Mode
Introduction

A Dell™ Failover Cluster is a group of systems working together to run a common set of applications and present a single logical system to client applications. The systems (or nodes) in the cluster are physically connected by either a local area network (LAN) or a wide area network (WAN) and are configured with the cluster software. If a system or the network connections in the cluster fail, the services on the active node fail over to the passive node in the cluster.
• Redundant paths to the shared storage
• Failure recovery for applications and services
• Flexible maintenance capabilities, allowing you to repair, maintain, or upgrade a node or storage system without taking the entire cluster offline

The services and capabilities included with Failover Clusters running Windows Server 2008 are:
• The Failover Cluster Management Interface — a task-oriented tool.
• Improvements in Scoping and Managing Shares — The process of creating a highly available share with a Failover Cluster running Windows Server 2008 is simple when you use the Add a Shared Folder wizard. You can also use the Browse button to quickly and reliably identify the folder that you want to use for the highly available share.
Supported Cluster Configurations For the list of Dell-validated hardware, firmware, and software components for a Failover Cluster running Windows Server 2008, see the Dell Cluster Configuration Support Matrices located on the Dell High Availability Clustering website at www.dell.com/ha.
Table 1-1. Cluster Node Requirements (continued)

Component: NICs
Minimum Requirement: At least two NICs: one NIC for the public network and another NIC for the private network.
NOTE: It is recommended that the NICs on each public network are identical, and that the NICs on each private network are identical.

Component: Internal disk controller
Minimum Requirement: One controller connected to at least two internal hard drives for each node. Use any supported RAID controller or disk controller.
Cluster Storage

While configuring your Dell Failover Cluster with Windows Server 2008, attach all cluster nodes to common shared storage. The type of storage array and the topology in which the array is deployed can influence the design of your cluster. For example, a direct-attached SAS storage array may support two cluster nodes, whereas a SAN-attached Fibre Channel or iSCSI array can support up to sixteen cluster nodes.
• The Dell Cluster Configuration Support Matrices list the Dell-validated hardware, firmware, and software components for a Failover Cluster environment.
• The Rack Installation Guide included with your rack solution describes how to install your system into a rack.
• The Getting Started Guide provides an overview of initially setting up your system.
• The HBA documentation provides installation instructions for the HBAs.
Preparing Your Systems for Clustering

WARNING: Only trained service technicians are authorized to remove the system cover and access any of the components inside the system. See the safety information shipped with your system for complete information about safety precautions, working inside the system, and protecting against electrostatic discharge.
4 Establish the physical network topology and the TCP/IP settings for network adapters on each cluster node to provide access to the cluster public and private networks.
5 Configure each cluster node as a member of the same Microsoft® Active Directory® domain.
6 Establish the physical storage topology and any required storage network settings to provide connectivity between the storage array and the systems that are configured as cluster nodes.
Installation Overview

This section provides an overview of the procedures for configuring a cluster running the Windows Server 2008 operating system.

NOTE: The storage management software may use different terms than those in this guide to refer to similar entities. For example, the terms "LUN" and "virtual disk" are often used interchangeably to designate an individual RAID volume that is provided to the cluster nodes by the storage array.
6 Install or update the storage connection drivers. For more information on connecting your cluster nodes to a shared storage array, see "Preparing Your Systems for Clustering" in the Dell Failover Cluster Hardware Installation and Troubleshooting Guide that corresponds to your storage array on the Dell Support website at support.dell.com/manuals.
Selecting a Domain Model

On a cluster running the Microsoft Windows operating system, all nodes must belong to a common domain or directory model. The following configurations are supported:
• It is recommended that all nodes of high-availability applications are member systems in a Microsoft Active Directory® domain.
• All nodes are domain controllers in an Active Directory domain.
• At least one node is a domain controller in an Active Directory domain and the remaining nodes are member systems.
Installing and Configuring the Windows Operating System

CAUTION: Windows standby mode and hibernation mode are not supported in cluster configurations. Do not enable either mode.

1 Ensure that the cluster configuration meets the requirements listed in "Cluster Configuration Overview" on page 15.
2 Cable the hardware.
NOTE: Do not connect the nodes to the shared storage systems at this time.
9 From node 1, go to the Windows Disk Management application, write the disk signature, partition the disk, format the disk, and assign drive letters and volume labels to the hard drives in the storage system. For more information, see "Preparing Your Systems for Clustering" in the Dell Failover Cluster Hardware Installation and Troubleshooting Guide for the specific storage array on the Dell Support website at support.dell.com/manuals.
Assigning Static IP Addresses to Cluster Resources and Components

NOTE: WSFC supports configuring cluster IP address resources to obtain an IP address from a DHCP server as well as through static entries. It is recommended that you use static IP addresses.

A static IP address is an Internet address that a network administrator assigns exclusively to a system or a resource. The address assignment remains in effect until it is changed by the network administrator.
Table 2-1. Applications and Hardware Requiring IP Address Assignments (continued)

Application/Hardware: Cluster node network adapters
Description: For cluster operation, two network adapters are required: one for the public network (LAN/WAN) and another for the private network (sharing heartbeat information between the nodes).
Table 2-2. Examples of IP Address Assignments (continued)

Usage: Private network static IP address, cluster interconnect (for node-to-node communications)
Cluster Node 1: 10.0.0.1
Cluster Node 2: 10.0.0.2

Usage: Private network subnet mask
Cluster Node 1: 255.255.255.0
Cluster Node 2: 255.255.255.0

NOTE: Do not configure Default Gateway, NetBIOS, WINS, and DNS on the private network.
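The following is a minimal command-line sketch of applying the private-network addresses shown in Table 2-2 and then verifying node-to-node communication. The connection name "Private" is an assumed example; substitute the actual name of the private network connection on your nodes.

    rem On cluster node 1 (use 10.0.0.2 on cluster node 2):
    netsh interface ipv4 set address name="Private" source=static address=10.0.0.1 mask=255.255.255.0

    rem After both nodes are configured, verify the cluster interconnect from node 1:
    ping 10.0.0.2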
• Anycast addresses — Anycast addresses are used for one-to-one-of-many communication. In anycast addressing, an IP packet is sent to the nearest member of a group.

Unicast addresses have the following types:
1 Global unicast addresses — This address type can be identified by the format prefix (FP) of 001. Global unicast addresses are equivalent to public IPv4 addresses and can be used for public interfaces. They are globally routable and reachable on the IPv6 portion of the Internet.
Configuring the Network Interface Binding Order for Clusters Running Windows Server 2008

After configuring the IP addresses for networks on your Failover Cluster, configure the network interface binding order:
1 Click Start→Control Panel, and double-click Network and Sharing Center.
2 In the Tasks pane, click Manage Network Connections. The Network Connections window appears.
3 Click the Advanced menu, and then click Advanced Settings. The Advanced Settings window appears.
Ensure that each local server responds to the ping command. If the IP assignments are not set up correctly, the nodes may not be able to communicate with the domain. For more information on this issue, see "Troubleshooting" on page 5. Installing the Storage Connection Ports and Drivers Before you connect each cluster node to the shared storage: • Ensure that an appropriate storage connection exists on the nodes.
Configuring Hard Drive Letters When Using Multiple Shared Storage Systems Before creating the cluster, ensure that both nodes have the same view of the shared storage systems. Because each node has access to hard drives that are in a common storage array, each node must have identical drive letters assigned to each hard drive. Using volume mount points in Windows Server 2008, your cluster can access more than 22 volumes. NOTE: Drive letters A through D are reserved for the local system.
3 Perform the following steps on all the nodes:
a Open Disk Management in Server Manager.
b Assign the drive letters to the drives.
c Reassign the drive letter, if necessary. To reassign the drive letter:
• With the mouse pointer on the same icon, right-click and select Change Drive Letter and Path from the submenu.
• Click Change, select the letter you want to assign to the drive (for example, Z), and then click OK.
• Click Yes to confirm the changes.
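As an alternative to the Disk Management steps above, drive letters can also be assigned from the command line with diskpart. This is a minimal sketch; the volume number and the letter Z are example assumptions, so list the volumes first and pick the values that match your shared disks.

    C:\> diskpart
    DISKPART> list volume
    DISKPART> select volume 3
    DISKPART> assign letter=Z
    DISKPART> exit

Assign the same drive letter on every node so that all nodes have an identical view of the shared storage.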
To install the Failover Clustering feature:
1 If Server Manager is not running, click Start→Administrative Tools→Server Manager.
2 If prompted for permission to continue, click Continue.
3 Under Features Summary, click Add Features.
4 In the Add Features Wizard, select Failover Clustering and Multipath I/O, and then click Install.
5 Click Close to close the wizard.
6 Repeat step 1 to step 5 for each system that you want to configure as a cluster node.
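The same features can also be installed from the command line with ServerManagerCmd.exe, which ships with Windows Server 2008. The feature identifiers below are the standard names; as a precaution, confirm them on your system with the -query option before installing.

    ServerManagerCmd -query
    ServerManagerCmd -install Failover-Clustering
    ServerManagerCmd -install Multipath-IO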
4 On the Testing Options window, select the specific tests that you want to run, or select Run all tests (recommended).
5 On the last screen of the Validate a Configuration wizard, click Next to confirm. The wizard runs a series of validation tests and reports any errors or warnings present in the configuration in a Summary window.
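On Windows Server 2008 R2, the same validation can be run from PowerShell with the FailoverClusters module. This is a sketch under the assumption that the module is installed along with the Failover Clustering feature; NODE1 and NODE2 are placeholder node names.

    Import-Module FailoverClusters

    # Run the full validation test suite against the prospective cluster nodes.
    Test-Cluster -Node NODE1, NODE2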
Configuring the Quorum Disk in Failover Clustering The Quorum configuration determines the maximum number of failures a Failover Cluster can sustain without stopping the Cluster Service. In your cluster configured with the Windows Server 2008 operating system, you do not have to configure a shared storage resource for the Quorum disk.
3 Node Majority: The cluster nodes alone determine the maximum number of failures that the cluster can sustain (similar to the Majority Node Set feature in the Windows Server 2003 operating system). A Node Majority cluster can sustain the failure of up to one node fewer than half of the total number of nodes (rounded up) in the cluster.
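On Windows Server 2008 R2, the quorum model can also be viewed and changed with the FailoverClusters PowerShell module rather than the Failover Cluster Management console. A minimal sketch; the disk resource name "Cluster Disk 1" is an example assumption.

    Import-Module FailoverClusters

    # Show the current quorum configuration.
    Get-ClusterQuorum

    # Switch to Node and Disk Majority using an example witness disk resource name.
    Set-ClusterQuorum -NodeAndDiskMajority "Cluster Disk 1"

    # Or switch to Node Majority (no witness) for clusters with an odd number of nodes.
    Set-ClusterQuorum -NodeMajority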
1 In the Failover Cluster Management console, right-click Failover Cluster Management, click Manage a Cluster, and select or specify the cluster you want to configure.
2 Click Services and Applications and click Configure a Service or Application under Actions.
3 Follow the instructions in the wizard to specify the service or application that you want to configure for high availability. When prompted, enter the following information:
• A name for the clustered service or application.
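On Windows Server 2008 R2, common services can also be configured for high availability from PowerShell. The sketch below creates a highly available file server as one example; the name, static address, and disk resource are assumptions to replace with your own values.

    Import-Module FailoverClusters

    # Example: create a clustered file server named FS1 on an example disk and static address.
    Add-ClusterFileServerRole -Name FS1 -Storage "Cluster Disk 2" -StaticAddress 192.168.1.110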
Modifying Properties of a Clustered Service or Application

Failover Clustering allows you to modify the failover behavior of a clustered service or application. To modify the clustered service properties:
1 Right-click the clustered service or application and click Properties.
2 Select from the two tabs, General and Failover. The following options are available under these tabs:
• Preferred owners — This option is under the General tab.
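The same settings can be scripted on Windows Server 2008 R2 with the FailoverClusters module. A minimal sketch, assuming a clustered group named "FS1" and placeholder node names NODE1 and NODE2:

    Import-Module FailoverClusters

    # Set the preferred owners (equivalent to the Preferred owners list on the General tab).
    Set-ClusterOwnerNode -Group "FS1" -Owners NODE1, NODE2

    # Adjust the failover behavior (equivalent to the Failover tab):
    # maximum number of failures allowed in the specified period before the group is left failed.
    (Get-ClusterGroup "FS1").FailoverThreshold = 2
    (Get-ClusterGroup "FS1").FailoverPeriod = 6        # period in hours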
Installing Your Cluster Management Software This section provides information on configuring and administering your cluster using Microsoft® Failover Cluster Management console. Microsoft Failover Cluster Management Console Failover Cluster Management console is Microsoft's tool for configuring and administering a cluster. The following sections describe the procedures to run Failover Cluster Management console locally on a cluster node and to install the tool on a remote console.
Launching Failover Cluster Management Console on a Remote Console

Perform the following steps on the remote console:
1 Ensure that the Failover Clustering Tools are installed from RSAT on the system.
2 Click Start and select Administrative Tools.
3 Select Failover Cluster Management.
4 Click the Action tab in the console and select the Manage a Cluster option.
5 Provide the name of the cluster you want to manage and click OK.
Understanding Your Failover Cluster Cluster Objects Cluster objects are the physical and logical units managed by a cluster. Each object is associated with the following: • Properties that define the object and its behavior within the cluster • A set of cluster control codes used to manipulate the object's properties • A set of object management functions to manage the object through Microsoft® Windows Server® 2008 Failover Cluster (WSFC).
Network Interfaces You can use the Failover Cluster Management console to view the state of all cluster network interfaces. Cluster Nodes A cluster node is a system in a cluster running the Microsoft Windows® operating system and WSFC.
Forming a New Cluster Failover Clustering maintains a current copy of the cluster database on all active nodes. If a node cannot join a cluster, the node attempts to gain control of the witness disk resource in Node and Disk Majority model and forms a cluster. The node uses the recovery logs in the quorum resource to update its cluster database. Joining an Existing Cluster A node can join a cluster if it can communicate with another active node in the cluster.
AND or OR. If you use AND, all of the resources that your resource depends on must come online before your resource can come online. If you use OR, any one of the resources that your resource depends on must be online before your resource can come online.
• Policies — Allows you to define your desired response to a failure of your resource. You can also specify the Pending time-out value here, which is the length of time your resource can take to change states between online and offline before the Cluster Service puts it in a Failed state.
A dependent resource requires another resource to operate. Table 4-3 describes resource dependencies.

Table 4-3. Resource Dependencies

Term: Dependent resource
Definition: A resource that depends on other resources.

Term: Dependency
Definition: A resource on which another resource depends.

Term: Dependency tree
Definition: A series of dependency relationships or hierarchy. The following rules apply to a dependency tree:
• A dependent resource and its dependencies must be in the same group.
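On Windows Server 2008 R2, resource dependencies, including OR dependencies, can also be set from PowerShell. A minimal sketch with assumed resource names:

    Import-Module FailoverClusters

    # Make an example network name resource depend on either of two example IP address resources.
    Set-ClusterResourceDependency -Resource "FS1 Network Name" -Dependency "[IP Address 192.168.1.110] or [IP Address 10.0.0.110]"

    # Review the resulting dependency expression.
    Get-ClusterResourceDependency -Resource "FS1 Network Name"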
Resource Failure

Failover Clustering periodically checks whether a resource is functioning properly using either a basic health check or a thorough health check. To adjust the health check intervals:
1 In the Failover Clustering console, right-click the resource that you want to modify, and click Properties.
2 Under the Advanced Policies tab, define the Basic resource health check interval and the Thorough resource health check interval.
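These intervals and the restart policy can also be inspected and adjusted from PowerShell on Windows Server 2008 R2. The sketch below is only an assumption-laden example: "FS1" is a placeholder resource name, and the property names shown are the standard cluster resource common properties; confirm that they exist on your build (for example, with Get-ClusterResource "FS1" | Format-List *) before relying on them.

    Import-Module FailoverClusters

    $res = Get-ClusterResource "FS1"          # "FS1" is an example resource name

    # Health check intervals, in milliseconds (basic and thorough checks).
    $res.LooksAlivePollInterval = 5000
    $res.IsAlivePollInterval = 60000

    # Restart policy: allow up to 3 restart attempts within the restart period (in milliseconds).
    $res.RestartThreshold = 3
    $res.RestartPeriod = 900000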
Replacing a Failed Disk If a disk in a Failover Cluster has failed, you can assign a different disk. To replace the failed disk: 1 Right-click on the resource and click Properties. 2 In the General tab, click Repair, and select a new disk that you want to use. The new disk that you assign must be one that can be clustered. NOTE: The Repair option does not recover data. You can restore data to the disk before using the Repair option.
An active/active configuration contains virtual servers running separate applications or services on each node. When an application is running on node 1, the remaining node(s) do not have to wait for node 1 to fail. Those node(s) can run their own cluster-aware applications (or another instance of the same application) while providing failover for the resources on node 1.
Table 4-4. Windows Server 2008 Failover Policies

Failover Policy: N+I
Description: One or more nodes provides backup for multiple systems.
Advantage: Highest resource availability.
Disadvantage(s):
• May not handle more than one backup node failure.
• May not fully utilize all of the nodes.

Failover Policy: Failover pair
Description: Applications can failover between the two nodes.
Advantage: Easy to plan the capacity of each node.
Disadvantage(s): Applications on the pair cannot tolerate two node failures.
Figure 4-1. Example of an N+I Failover Configuration for an Eight-Node Cluster (cluster nodes 1 through 6 are active; cluster nodes 7 and 8 are backup nodes)
Configuring Group Affinity On N + I (active/passive) Failover Clusters running Windows Server 2008, some resource groups may conflict with other groups if they are running on the same node. For example, running more than one Microsoft Exchange virtual server on the same node may generate application conflicts. Use Windows Server 2008 to assign a public property (or attribute) to a dependency between groups to ensure that they failover to similar or separate nodes. This property is called group affinity.
If you have applications that run well on a two-node cluster, and you want to migrate these applications to Windows Server 2008, failover pair is a good policy. This solution is easy to plan and administer, and applications that do not run well on the same server can easily be moved into separate failover pairs. However, in a failover pair, applications on the pair cannot tolerate two node failures. Figure 4-2 shows an example of a failover pair configuration.
Table 4-7 shows a four-node multiway failover configuration for the cluster shown in Figure 4-3. For each resource group, the failover order in the Preferred Owners list in Failover Cluster Management console outlines the order that you want that resource group to failover. In this example, node 1 owns applications A, B, and C. If node 1 fails, applications A, B, and C failover to cluster nodes 2, 3, and 4. Configure the applications similarly on nodes 2, 3, and 4.
Failover Ring

Failover ring is an active/active policy where all running applications migrate from the failed node to the next preassigned node in the Preferred Owners list. If the failing node is the last node in the list, the failed node's applications fail over to the first node. While this type of failover provides high availability, ensure that the next node for failover has sufficient resources to handle the additional workload. Figure 4-4 shows an example of a failover ring configuration.
Failback Failback returns the resources back to their original node. When the system administrator repairs and restarts the failed node, WSFC takes the running application and its resources offline, moves them from the Failover Cluster node to the original node, and then restarts the application. You can configure failback to occur immediately, at any given time, or not at all. To minimize the delay until the resources come back online, configure the failback time during off-peak hours.
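Failback can also be configured outside the console by setting the group's failback properties. A minimal PowerShell sketch for Windows Server 2008 R2, using an example group name and an off-peak failback window; the property names shown are the standard cluster group common properties, so verify them on your build before use.

    Import-Module FailoverClusters

    $group = Get-ClusterGroup "FS1"            # "FS1" is an example group name

    # 0 = prevent failback, 1 = allow failback.
    $group.AutoFailbackType = 1

    # Restrict failback to an off-peak window, in hours of the day (1 a.m. to 5 a.m. here).
    $group.FailbackWindowStart = 1
    $group.FailbackWindowEnd = 5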
Maintaining Your Cluster

This section provides instructions to perform multiple maintenance tasks, such as adding, configuring, and removing cluster components in your Dell™ Failover Cluster.

Adding Storage to a Failover Cluster Node

Failover Clustering groups all available disks on the shared storage into a group named Available Storage. You can also add storage to an existing Failover Cluster.
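On Windows Server 2008 R2, disks that are visible to all nodes can be added to the cluster from PowerShell as well as from the console. A minimal sketch, assuming the FailoverClusters module is available:

    Import-Module FailoverClusters

    # List disks that are eligible for clustering, then add them to the Available Storage group.
    Get-ClusterAvailableDisk
    Get-ClusterAvailableDisk | Add-ClusterDisk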
3 Configure the networks:
• For your Private network, select Allow the cluster to use this network only.
• For your Public network, select both Allow the cluster to use this network and Allow clients to connect through this network.
• For any other network, such as an iSCSI network that might be configured, select Do not allow the cluster to use this network.
To stop or restart the cluster service on a node: 1 Right-click the node that you want to stop or restart in the Failover Cluster Management console. 2 Click More Actions and select from either of the following options that are displayed: • Stop Cluster Service • Start Cluster Service Running chkdsk on a Clustered Disk in Maintenance Mode Failover Clustering allows you to put a disk in Maintenance mode without taking the disk offline.
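A command-line equivalent on Windows Server 2008 R2 is sketched below: the disk resource is placed in maintenance mode, checked, and then returned to normal operation. The resource name and drive letter are example assumptions.

    Import-Module FailoverClusters

    # Put the example clustered disk resource into maintenance mode.
    Suspend-ClusterResource "Cluster Disk 2"

    # Run chkdsk against the volume on that disk (Z: is an example drive letter).
    chkdsk /f Z:

    # Take the disk out of maintenance mode when the check completes.
    Resume-ClusterResource "Cluster Disk 2"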
3 In the Cluster Events Filter dialog window, select the criteria for the events that you want to display and click OK. 4 To view an event, click on the event, and see the details in the Event Details screen. If you want the cluster logs to be displayed in textual format, then run the following command in the command prompt of each node: cluster log /g. You must be logged in as an Administrator to run the command.
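On Windows Server 2008 R2, the cluster log can also be generated for all nodes at once with the Get-ClusterLog cmdlet; the destination folder below is an example.

    Import-Module FailoverClusters

    # Generate the cluster log from every node and copy the text files to an example folder.
    Get-ClusterLog -Destination C:\ClusterLogs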
Upgrading to a Cluster Configuration

This section provides instructions for upgrading to a cluster configuration in your Dell™ Failover Cluster.

Before You Begin

Before you upgrade your non-clustered system to a cluster solution:
• Back up your data.
• Verify that your hardware and storage systems meet the minimum system requirements for a cluster as described in "System Requirements" on page 10.
Completing the Upgrade After installing the required hardware and network adapter upgrades, set up and cable the system hardware. NOTE: You may need to reconfigure your switch or storage groups so that both nodes in the cluster can access their logical unit numbers (LUNs). The final phase for upgrading to a cluster solution is to install and configure Microsoft® Windows Server® 2008 with WSFC.
Troubleshooting

This appendix provides troubleshooting information for your cluster configuration. Table A-1 describes general cluster problems you may encounter and the probable causes and solutions for each problem.

Table A-1. General Cluster Troubleshooting
Problem: The nodes cannot access the storage system, or the cluster software is not functioning with the storage system.
Probable Cause: You are using a Dell/EMC storage array and access control is not enabled correctly.
Corrective Action: Verify the following:
• EMC® Access Logix™ software is enabled on the storage system.
• All logical unit numbers (LUNs) and hosts are assigned to the proper storage groups.
Problem: One of the nodes takes a long time to join the cluster, or one of the nodes fails to join the cluster.
Probable Cause:
• The node-to-node network has failed due to a cabling or hardware failure.
• Long delays in node-to-node communications may be normal.
Corrective Action: Check the network cabling. Ensure that the node-to-node interconnection and the public network are connected to the correct NICs.
Problem: Attempts to connect to a cluster using Cluster Administrator fail.
Probable Cause:
• The Cluster Service has not been started.
• A cluster has not been formed on the system.
Corrective Action: Verify that the Cluster Service is running and that a cluster has been formed.
Problem: Unable to add a node to the cluster.
Probable Cause: The new node cannot access the shared disks. The shared disks are enumerated by the operating system differently on the cluster nodes.
Corrective Action: Ensure that the new cluster node can enumerate the cluster disks using Windows Disk Administration.
Problem: Cluster Services may not operate correctly on a cluster running Windows Server 2008 when the Internet Firewall is enabled.
Probable Cause: The Windows Internet Connection Firewall is enabled, which may conflict with Cluster Services.
Corrective Action: Perform the following steps:
1 On the Windows desktop, right-click My Computer and click Manage.
2 In the Computer Management window, double-click Services.
3 In the Services window, double-click Cluster Services.
Problem: You are using a PowerVault MD3000 or PowerVault MD3000i storage array.
Probable Cause: The failback mode for the cluster node(s) is not set properly.
Corrective Action: Set the correct failback mode on each cluster node:
• For PowerVault MD3000 storage, merge the Cluster.
Problem: You are using a PowerVault MD3000 or PowerVault MD3000i storage array and one of the following occurs:
Probable Cause: The snapshot virtual disk has been erroneously mapped to the node that does not own the source disk.
Corrective Action: Unmap the snapshot virtual disk from the node not owning the source disk, then assign it to the node that owns the source disk.
Index

A
active/active
   about, 45

C
chkdsk /f
   running, 57
cluster
   cluster objects, 39
   forming a new cluster, 41
   joining an existing cluster, 41
Cluster Administrator
   about, 37
cluster configurations
   active/active, 45
   active/passive, 45
   supported configurations, 59
cluster nodes
   about, 40
   states and definitions, 40

D
domain model
   selecting, 19
drivers
   installing and configuring Emulex, 27

E
Emulex HBAs
   installing and configuring, 27
   installing and configuring drivers, 27

F
failover
   modifying failover policies, 46
failover policies, 46
   failover pair, 49
   failover ring, 52
   for Windows Server 2003, Enterprise Edition, 46
   multiway failover, 50
   N+I failover, 47

G
group affinity
   about, 49
   configuring, 49

H
HBA drivers
   installing and configuring, 27
host bus adapter
   configuring the Fibre Channel HBA, 27

M
MSCS
   installing and configuring, 29
multiway failover, 50

N
N+I failover
   configuring group affinity, 47
network adapters
   using dual-port for the private network, 26
network failure
   preventing, 39
network interfaces

P
period values
   adjusting, 44
private network
   configuring IP addresses, 23
   creating separate subnets, 25
   using dual-port network adapters, 26
public network
   creating separate subnets, 25

Q
quorum resource
   running chkdsk, 57

R
resource
   creating, 43
resource dependencies, 42

T
threshold
   adjusting, 44
troubleshooting
   connecting to a cluster, 64
   shared storage subsystem, 61-62

U
upgrading
   operating system, 60
upgrading to a cluster solution
   before you begin, 59
   completing the upgrade, 60

W
warranty