Optimizing Failover Time in a Serviceguard Environment, June 2007
The HP Serviceguard failover process
  The process when failover is caused by a node failure
    Node Failure Detection
    Cluster Reformation Time
    Election of Cluster Membership
    Lock Acquisition
    Quiescence
    Cluster Component Recovery
    Standard Serviceguard implementation: Resource Recovery
    Standard Serviceguard implementation: Applications Recovery
    Serviceguard Extension for RAC: Group Membership Reconfiguration
    Serviceguard Extension for RAC: RAC Reconfiguration
  The process when failover is caused by a package failure
    Standard Serviceguard implementation: Resource Failure Detection
    Standard Serviceguard implementation: Package Determination
    Standard Serviceguard implementation: Resource Recovery
    Standard Serviceguard implementation: Application Startup
    Serviceguard Extension for RAC: Group Membership Reconfiguration
    Serviceguard Extension for RAC: RAC Reconfiguration and Database Recovery
How you can optimize failover time
  Some help in estimating time for failover
  Node timeout value
    Testing
  Lock acquisition (cluster lock, also called tie-breaker or arbitrator)
  Heartbeat subnet
  Network failure detection
  Number of nodes and number of packages
  EMS resources
  Package control scripts
  System Restart Options
  Applications
Serviceguard Extension for Faster Failover
  Requirements for SGeFF
  Environments suitable to SGeFF
  Cluster parameter considerations
Conclusion
For more information