CMU: Cluster Management Utility
CMU Installation Guide with Serviceguard
Version 4.0, January 2009
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Preface

1-1 About this document
This guide describes how to install HP's Cluster Management Utility (CMU) V4.0 under the control of Serviceguard on HP systems.

1-2 Intended audience
This guide is intended primarily for system managers and operators who want to configure or manage a large collection of systems in an HPC cluster architecture (called a cluster in this document). Users should be familiar with:
• the installation and administration of RedHat Linux or SuSE Linux.
Table 1 Terminology

Term                    Definition
Management Node         CMU is installed on and runs from the Administration Disk, which is typically mounted on the Management Node. Typically, the management node is also the image server.
Site alias IP address   In a CMU Serviceguard configuration, a unique IP address on the site network used to connect to the CMU management cluster. The GUI connects to this address. This unique address is configured on only one of the two management servers at a time.
2 CMU installation with Serviceguard

In a "classic" CMU cluster you have a single management server. If that server fails, the CMU cluster continues to work from the customer application's point of view, but you lose management functions such as backup, cloning, booting a compute node, ssh through the CMU GUI, and so on. If the CMU cluster uses a private network for management, you also lose the connection to the site network.
Figure 2

Figure 3 shows the case of a "classic" CMU cluster with one CMU management server and with compute nodes connected directly to the site network. A unique IP address, IP0, is used for compute node management and site network access.

Figure 3

Figure 4 shows the corresponding configuration, where two CMU management servers run the CMU software in active/standby mode under the control of Serviceguard. The address IP0 is attached to the server actually running the CMU software at a given time.

Figure 4
2-1 Hardware requirements

The hardware requirements for CMU under Serviceguard are:
• two administration servers;
• one shared storage device accessible from both servers;
• a minimum of two Ethernet networks interconnecting the two management servers. This is normally the case in a CMU cluster, as you normally have a private Ethernet network for compute node management.
2-3 Preparing the Serviceguard environment

CAUTION: Unless otherwise specified, the steps described in the following paragraphs must be performed on both systems. Pay attention to the caution messages for operations that must be performed on one server only.

2-3-1 Installing the qlogic driver

The example in this document was taken from an installation on a RedHat 5 update 1 server. Some differences may apply for other distributions.
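As a quick sanity check before moving on, you can verify that the QLogic driver is loaded and that the shared LUN is visible. This is a minimal sketch, assuming the standard qla2xxx module shipped with RedHat 5; a different HBA model may use another module:

# modprobe qla2xxx          # load the QLogic FC HBA driver (assumed module name)
# lsmod | grep qla          # confirm the module is loaded
# ls /sys/class/fc_host     # one fc_host entry per Fibre Channel port
# grep sd /proc/partitions  # the shared LUN should now be visible (sda below)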
2-3-2 Configuring the network

On both management servers, configure the site network Ethernet interface and the management network Ethernet interface. Then edit /etc/hosts, using the following example as a template:

# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1       localhost.localdomain localhost
::1             localhost6.localdomain6 localhost6
16.16.8.215     cmumgt1.gre.hp.com cmumgt1
10.0.0.1        cmumgt1-eth1.gre.hp.com cmumgt1-eth1 cmumgt1
16.16.8.216     cmumgt2.gre.hp.com cmumgt2
10.0.0.2        cmumgt2-eth1.gre.hp.com cmumgt2-eth1 cmumgt2
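A quick way to confirm that both networks are up is to ping the peer server over each of them, using the names from the /etc/hosts template above:

# ping -c 1 cmumgt2         # site network
# ping -c 1 cmumgt2-eth1    # private management network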
        port            = 113
        disable         = no
}

Restart xinetd:

# /etc/init.d/xinetd restart
# ps ax | grep ident
  714 pts/1 S+  0:00 grep ident
31241 ?     Ssl 0:00 identd

Mount the Serviceguard iso file and cd to the mount directory:

# mount -o loop REPOSITORY/SGLX_11_18_x86.iso /mnt
# cd /mnt

Then change directory to the appropriate subdirectory, according to your distribution, and look for the rpm to install:

tog-pegasus
# rpm -ivh Serviceguard/x86_64/tog-pegasus-2.5.3-1.sles10.x86_64.rpm
Preparing...
Preparing...        ########################################### [100%]
   1:cmsnmpd         ########################################### [100%]
Verifying snmpd.conf contains "master agentx"
Verifying snmp.conf contains a reference to the new mibs
Verifying snmp.conf knows where to find the new mibs
Configuring snmpd for autostart.
Starting snmpd:                                            [ OK ]
Configuring cmsnmpd for autostart.
Starting cmsnmpd:                                          [ OK ]

sgcmom
# rpm -ivh Serviceguard/x86_64/sgcmom-B.05.00.00-0.rhel5.x86_64.rpm
Preparing...
CAUTION: For SLES installations only: you have to modify the name of the Serviceguard startup file. Proceed as follows:

# cd /etc/init.d
# rm rc3.d/*cmcluster
# rm rc4.d/*cmcluster
# rm rc5.d/*cmcluster
# mv cmcluster.init cmcluster
# cd /etc/init.d/rc3.d
# ln -s /etc/rc.d/cmcluster K01cmcluster
# ln -s /etc/rc.d/cmcluster S99cmcluster
# cd ../rc4.d
# ln -s /etc/rc.d/cmcluster S99cmcluster
# cd ../rc5.d
# ln -s /etc/rc.d/cmcluster S99cmcluster
# ln -s /etc/rc.
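To double-check the result, you can list the links afterwards (a simple verification sketch):

# ls -l /etc/init.d/rc3.d/*cmcluster /etc/init.d/rc4.d/*cmcluster /etc/init.d/rc5.d/*cmcluster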
# fdisk /dev/sda

Command (m for help): p

Disk /dev/sda: 53.6 GB, 53687091200 bytes
64 heads, 32 sectors/track, 51200 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-51200, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-51200, default 51200): 1

Command (m for help): p

Disk /dev/sda: 53.6 GB, 53687091200 bytes
64 heads, 32 sectors/track, 51200 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1           1        1008   83  Linux

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 2
First cylinder (2-51200, default 2):
Using default value 2
Last cylinder or +size or +sizeM or +sizeK (2-51200, default 51200):
Using default value 51200

Command (m for help): p

Disk /dev/sda: 53.6 GB, 53687091200 bytes
64 heads, 32 sectors/track, 51200 cylinders
Units = cylinders of 2048 * 512 = 1048576 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1           1        1008   83  Linux
/dev/sda2               2       51200    52427776   83  Linux

Command (m for help): w
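The LVM setup itself is not detailed here; the following is a minimal sketch consistent with the volume names used in the rest of this guide (vgcmu, lvol0) and with /dev/sda2 as the large shared partition created above. The logical volume size is an assumption; adjust it to your needs. Run these commands on one node only:

# pvcreate /dev/sda2                    # initialize the shared partition for LVM
# vgcreate vgcmu /dev/sda2              # volume group name used later in this guide
# lvcreate -l 100%FREE -n lvol0 vgcmu   # size is an assumption (here: all free space)
# vgchange -a n vgcmu                   # deactivate: Serviceguard manages activation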
Create the /etc/lvm/lvm_[hostname].conf files

On both nodes, execute the following command:

WARNING! The command below creates the file /etc/lvm/lvm_[hostname].conf. The presence of this file ensures exclusive access to the shared volume by only one of the two CMU management servers at a time.

# echo activation { volume_list=[\"@$(uname -n)\"] } > /etc/lvm/lvm_$(uname -n).conf

Verify that the files have been created correctly:

# ls /etc/lvm/lvm_*
/etc/lvm/lvm_cmumgt1.conf
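You can check on each node whether the shared volume is currently active there; a short sketch, using the volume names from above:

# lvscan | grep vgcmu       # reports ACTIVE or inactive for /dev/vgcmu/lvol0
# vgchange -a y vgcmu       # activates the volume group on this node if needed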
In the example above, the shared volume is active on cmumgt2, so run the format on that server. If the volume is inactive on both servers, activate it on one of the two.

Note: when the volume is inactive on a node, the device special file does not exist. It is created when the volume is activated.

Format the volume:

# mke2fs -j /dev/vgcmu/lvol0
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
2621440 inodes, 5242880 blocks
262144 blocks (5.00%) reserved for the super user
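As a quick smoke test, you can mount the new filesystem once and unmount it again so that Serviceguard can manage it later; the mount point below is arbitrary:

# mount /dev/vgcmu/lvol0 /mnt
# df -h /mnt                # should show the shared filesystem
# umount /mnt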
# unconfigured node. Once the node is configured,
# Serviceguard will not consult this file.
###########################################################

Copy this file to the other node:

# scp $SGCONF/cmclnodelist 16.16.8.216:$SGCONF/cmclnodelist

CAUTION: This step has to be performed on one server only.

On one node only, create the cluster configuration file by entering the command below:

# cmquerycl -v -C $SGCONF/cmu-mgt-clust.config -L /dev/sda1 -n cmumgt1 -n cmumgt2
Looking for other clusters ...
Once this is done, copy the file to the second server in $SGCONF:

# scp /usr/local/cmcluster/conf/cmu-mgt-clust.config root@cmumgt2:$SGCONF

NOTE: You do not strictly need to have the text configuration file on both servers; cmapplyconf (see the next paragraphs) ensures that both servers have the binary copy of the configuration. This duplication may nevertheless be useful if you want to be able to perform cluster configuration operations on either server.
2-5 Installing CMU

2-5-1 Prepare the CMU environment

On both management servers, perform the steps in this paragraph to prepare the CMU environment.

Activation of xinetd services

Linux uses the xinetd daemon. All configuration files for the services are usually in /etc/xinetd.d/. You need to edit the /etc/xinetd.d/tftp file. (An example of this file is available on the CMU CDROM in the directory ConfigFiles/xinetd-service.)
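The CDROM example is not reproduced here; a typical tftp entry looks like the sketch below, where the server path and tftp root directory are assumptions for illustration. The important setting is disable = no:

service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /tftpboot
        disable         = no
}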
Verify the dhcpd listen interface

Ensure that dhcpd is correctly configured to listen on the Ethernet interface connected to the compute network:

# grep DHCPD_INTERFACE= /etc/sysconfig/dhcpd
# Examples: DHCPD_INTERFACE="eth0"
#           DHCPD_INTERFACE="eth0 eth1 eth2 tr0 wlan0"
#           DHCPD_INTERFACE="internal0 internal1"
#           DHCPD_INTERFACE="id-00:50:fc:e4:f2:65 id-00:a0:24:cb:cc:5c wlan0"
DHCPD_INTERFACE="eth2"

In this example, dhcpd listens on the interface eth2.
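If you change DHCPD_INTERFACE, restart dhcpd and verify that it is listening; a quick check sketch:

# /etc/init.d/dhcpd restart
# netstat -anup | grep dhcpd    # dhcpd should be bound to UDP port 67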
# ls -1 /opt/cmu/tools/cmu_sg*
cmu_sg_clean
cmu_sg_pkg.config
cmu_sg_pkg.sh
cmu_sg_pkg.env

Perform the following commands:

# cp /opt/cmu/tools/cmu_sg_clean /etc/init.d
# scp /opt/cmu/tools/cmu_sg_clean cmumgt2:/etc/init.d
# mkdir $SGCONF/cmu_sg_pkg
# cd $SGCONF/cmu_sg_pkg
# cp /opt/cmu/tools/cmu_sg_pkg.config .
# cp /opt/cmu/tools/cmu_sg_pkg.sh .
# cp /opt/cmu/tools/cmu_sg_pkg.env .
SUBNET: this variable defines the subnet monitored by the CMU package to decide whether a failover should occur. If your CMU cluster has two networks (the site network and the compute network), you may choose either of the two networks to monitor.

CAUTION: Once you have completed the customization, copy the file to the other management server at the same location, to ensure that the changes are applied to both servers.
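For example, assuming the SUBNET variable lives in cmu_sg_pkg.env (copy whichever file you actually edited):

# scp $SGCONF/cmu_sg_pkg/cmu_sg_pkg.env cmumgt2:$SGCONF/cmu_sg_pkg/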
On one of the two management servers, run the command below to apply the package configuration:

# cmapplyconf -C $SGCONF/cmu-mgt-clust.config -P $SGCONF/cmu_sg_pkg/cmu_sg_pkg.config
Note : a NODE_TIMEOUT value of 2000000 was found in line 124. This value is recommended if the top priority is to reform the cluster as fast as possible in case of failure. If the top priority is to minimize reformations, consider using a higher setting.
PACKAGE        STATUS    STATE     AUTO_RUN    NODE
cmu_sg_pkg     up        running   enabled     cmumgt2

  NODE         STATUS    STATE
  cmumgt1      up        running

The package AUTO_RUN option controls whether or not there is an automatic failover in case of a server failure. It must be enabled in order to have an automatic failover.
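If AUTO_RUN is disabled, or if you want to rehearse a failover by hand, the standard Serviceguard package commands can be used; a sketch (note that a manual cmrunpkg disables package switching, hence the final cmmodpkg):

# cmmodpkg -e cmu_sg_pkg            # enable AUTO_RUN (package switching)
# cmhaltpkg cmu_sg_pkg              # halt the package on its current node
# cmrunpkg -n cmumgt1 cmu_sg_pkg    # start it on the other node
# cmmodpkg -e cmu_sg_pkg            # re-enable switching after the manual start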
2-7-1 Serviceguard GUI interface

Open a browser on one of the two servers, on port 2381, for example: https://16.16.8.216:2381. From the window that opens, click "Serviceguard" to view and manage the cluster and the cmu_sg_pkg package. Refer to the Serviceguard documentation for details.