Clustering Linux Servers with the Concurrent Deployment of HP Serviceguard for Linux and Red Hat Global File System for RHEL5
October 2008
Contents
Executive Summary
Introduction
Audience
Cluster membership
Executive Summary

Organizations today are deploying critical applications on Linux clusters that require high availability, ease of cluster management, and access to large pools of stored information. HP Serviceguard for Linux provides high availability for applications running on a cluster of servers. Red Hat's Global File System (GFS) enables a cluster of Linux servers to simultaneously read and write to a single shared file system on a Storage Area Network (SAN).
The concurrent deployment of HP Serviceguard for Linux and Red Hat GFS clusters on the same group of servers must ensure that their co-existence is stable. Concurrent deployment is based on the criterion that both clusters maintain the same cluster membership at all times, during failures as well as during normal cluster management operations. This ensures that, in case of a failure, both clusters select and remove the same members from the cluster.
Fencing protects data integrity by preventing a failed node from writing to shared storage. Red Hat Cluster supports various fencing mechanisms, but the only one supported in conjunction with Serviceguard is Integrated Lights-Out (iLO) fencing. With this mechanism, a message is sent to the iLO of a server to restart that server. Using iLO is less costly and easier to manage than most other methods. In the event of an "unequal sized partition" (i.e.
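As an illustration, the iLO fence agent that Red Hat Cluster invokes can also be run by hand to verify that a node's iLO is reachable; the address and credentials below are placeholders, not values from this paper.

    # Manually exercise iLO fencing for one node (address and credentials are placeholders).
    fence_ilo -a 192.0.2.10 -l Administrator -p password -o status   # query the node's power state via iLO
    fence_ilo -a 192.0.2.10 -l Administrator -p password -o reboot   # restart the node, as the cluster would during fencing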
The qdisk feature allows users to configure arbitrary heuristics so that each cluster member can determine its fitness for participating in the cluster. The fitness information is communicated to the other cluster members through the qdisk residing on shared storage. A qdisk is a small (about 10 MB) disk partition shared across the cluster, on which each node periodically updates its assigned portion with its health information.
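A hedged sketch of creating and declaring a quorum disk follows; the device name, label, and heuristic are illustrative, not values from this paper.

    # Initialize a small shared partition as the quorum disk.
    mkqdisk -c /dev/sdc1 -l sglx_gfs_qdisk
    # A matching <quorumd> section is then added to /etc/cluster/cluster.conf,
    # for example a one-vote qdisk with a single ping heuristic:
    #   <quorumd interval="1" tko="10" votes="1" label="sglx_gfs_qdisk">
    #     <heuristic program="ping -c1 192.0.2.1" score="1" interval="2"/>
    #   </quorumd>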
successfully. During this period, any GFS file-system operation attempted by an application (for example, a file open) from any cluster node will hang. However, applications that do not perform GFS file-system operations are not impacted, and reads and writes to previously opened files do not hang. Upon confirmation that the failed node has been fenced, DLM and GFS perform recovery: DLM releases the locks held by the failed node, and GFS recovers the failed node's journal.
In the event of a failure in a concurrent deployment, the following sequence of events occurs:
1. Serviceguard detects the failure first, proceeds to resolve quorum, and removes the failed nodes from the cluster (by resetting them) before the Red Hat cluster detects the failure.
2. While the failed node is booting up, the Red Hat cluster detects the failure, gains quorum, and requests HP iLO to fence the failed nodes (which resets the node a second time).
In the concurrent deployment of Serviceguard and Red Hat GFS, a failure of exactly half the members can be sustained, provided the qdisk is configured. In that event, the qdisk vote breaks the tie, allowing the surviving members to gain quorum and proceed to form the cluster. Without a qdisk, the Red Hat cluster will not gain quorum and will therefore disallow all further GFS operations, requiring operator intervention.
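As a worked example under the common one-vote-per-node convention (an assumption, not a value taken from this paper): in a four-node cluster with a one-vote qdisk, expected votes are 5 and quorum is 3, so two surviving nodes plus the qdisk retain quorum after half the nodes fail. The vote arithmetic can be checked with cman_tool; the output below is abbreviated and illustrative.

    # Check the quorum arithmetic on a surviving node (output illustrative).
    cman_tool status | egrep 'Expected votes|Total votes|Quorum'
    #   Expected votes: 5    <- 4 nodes x 1 vote + 1 qdisk vote
    #   Total votes:    3    <- 2 surviving nodes + qdisk after half the nodes fail
    #   Quorum:         3    <- still satisfied, so the cluster remains quorate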
Cluster Configuration System

The Cluster Configuration System (CCS) manages the cluster configuration and provides configuration information to the other cluster components in a Red Hat cluster. The CCS daemon runs on each cluster node and makes sure that the cluster configuration file on each node is up to date. When cluster.conf is modified by the operator, the local CCS daemon broadcasts the new cluster.conf file, and the CCS daemons on the other cluster nodes replace their copies with the new one.
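An illustrative sketch of this propagation is shown below; the version number is a placeholder and must match the config_version attribute set in the edited file.

    # After editing /etc/cluster/cluster.conf on one node and incrementing its
    # config_version attribute, push the new copy to all members via CCS:
    ccs_tool update /etc/cluster/cluster.conf
    # Make cman use the new configuration version (43 is a placeholder):
    cman_tool version -r 43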
file system is mounted before the Serviceguard package starts up. To prevent the Serviceguard and Red Hat clusters from having different memberships at the time of a node startup, it is recommended that cluster services startup be enabled at machine startup time for both the Serviceguard and Red Hat clusters (see the sketch below).

Cluster startup sequence

In a Red Hat cluster, as soon as the newly formed nodes gain quorum, all possible nodes (those listed in cluster.conf) that are not currently cluster members are fenced.
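Following the boot-time startup recommendation above, the Red Hat cluster services can be enabled with chkconfig; the sketch below assumes the standard RHEL5 service names, and qdiskd is enabled only when a quorum disk is configured. Serviceguard's own automatic cluster startup is enabled as described in its documentation.

    # Enable the RHEL5 cluster services at machine startup (sketch).
    chkconfig cman on      # cluster membership, CCS, and fencing
    chkconfig qdiskd on    # quorum disk daemon (only if a qdisk is configured)
    chkconfig clvmd on     # clustered LVM2
    chkconfig gfs on       # mounts GFS file systems listed in /etc/fstab at boot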
finally changing the super block of each GFS file system to use the DLM locking protocol using the gfs_tool command (a sketch is shown below).
2. Upgrade your operating system to Red Hat Enterprise Linux 5. The Serviceguard cluster and packages are then started on the RHEL5 systems with the DLM lock manager.
For information on upgrading to Red Hat Enterprise Linux 5 and converting GFS file systems to use the DLM lock manager, see the "Configuring and Managing a Red Hat Cluster for RHEL5" document, available at http://www.redhat.com/docs/manuals/csgfs/.
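A minimal sketch of the superblock conversion mentioned in step 1, assuming the file system is unmounted on all nodes and using a placeholder device path:

    # Switch an unmounted GFS file system from the GULM to the DLM locking protocol.
    gfs_tool sb /dev/mapper/vgX-lvY proto lock_dlm
    # Display the superblock to verify the new locking protocol:
    gfs_tool sb /dev/mapper/vgX-lvY all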
Legacy package configuration

After creating a legacy package configuration, the guidelines described below need to be followed. LV designators are used to specify the LVM2 volumes in the case of Red Hat GFS for RHEL5 (similar to what is used for Red Hat GFS 6.1). LVM2 device names are of the format /dev/mapper/vgX-lvY, where vgX is the volume group and lvY is the logical volume. Set the variable FS_TYPE to "GFS" to indicate a Red Hat GFS file system.
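A minimal sketch of the corresponding control-script entries, assuming the legacy package control script's indexed-variable conventions; the device name and mount point are placeholders:

    # Excerpt from a Serviceguard legacy package control script (sketch).
    LV[0]="/dev/mapper/vgX-lvY"   # LVM2 logical volume holding the GFS file system
    FS[0]="/mnt/gfs_data"         # hypothetical mount point used by the package
    FS_TYPE[0]="GFS"              # marks the file system as Red Hat GFS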
Figure 2 - Four-node example of a recommended configuration for HP Serviceguard for Linux and Red Hat GFS coexistence

Support information
• Red Hat GFS2, an enhanced version of Red Hat GFS, is currently released as a technology preview in RHEL5 Update 1, i.e., it is not fully supported by Red Hat. Hence, Serviceguard currently supports Red Hat GFS and not Red Hat GFS2.
• Xen/VMware guests with GFS as Serviceguard nodes are not supported.
• HP Integrity servers are not yet supported.
Conclusions

HP Serviceguard for Linux and Red Hat GFS for RHEL5 clusters can co-exist on the same set of servers, adding value to each other. Serviceguard can provide HA encapsulation to multiple instances of the same application that simultaneously manage a single instance of data from multiple nodes in the cluster. Stable co-existence of the two clusters can be achieved with a proper choice of redundant hardware and software components and the configurations mentioned below.
Red Hat Global File System – A cluster file system that allows a cluster of nodes to simultaneously access a block device that is shared among the nodes
GULM – Grand Unified Lock Manager, one of the locking methods in Red Hat GFS 6.1
HA – High Availability
iLO – HP Integrated Lights-Out
Legacy package – Package configuration pre-11.
For More Information
• HP Serviceguard for Linux product documentation at http://docs.hp.com
• Red Hat Cluster Suite for RHEL5 configuration guide at http://www.redhat.com/docs/manuals/csgfs
• Red Hat Global File System for RHEL5 configuration guide at http://www.redhat.com/docs/manuals/csgfs
• HP Serviceguard for Linux certification matrix showing servers, storage, and software versions supported: http://www.hp.com/info/sglx
• HP Serviceguard for Linux with Red Hat GFS version 6.0 and version 6.