High Availability Cluster Multiprocessing Best Practices
White Paper, March 2005
Feedback on this white paper may be sent to hafeedbk@us.ibm.com.
Cluster Components
Here are the recommended practices for important cluster components.
Nodes
HACMP supports clusters of up to 32 nodes, with any combination
of active and standby nodes. While it is possible to have all nodes
in the cluster running applications (a configuration referred to as
"mutual takeover"), the most reliable and available clusters have at
least one standby node - one node that is normally not running any
applications, but is available to take them over in the event of a
failure on an active node.
Additionally, it is important to pay attention to environmental
considerations. Nodes should not share a common power supply,
which may happen if they are placed in a single rack. Similarly,
building a cluster of nodes that are actually logical partitions
(LPARs) on a single physical server is useful as a test cluster, but
should not be considered for availability of production applications.
Nodes should be chosen that have sufficient I/O slots to install
redundant network and disk adapters. That is, twice as many slots
as would be required for single node operation. This naturally
suggests that processors with small numbers of slots should be
avoided. Use of nodes without redundant adapters should not be
considered best practice; blades, which typically lack slots for
redundant adapters, are an outstanding example. And, just as every
cluster resource should have a backup, the
root volume group in each node should be mirrored, or be on a
RAID device.
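
As a sketch of what rootvg mirroring involves, the following AIX 5L
commands mirror the root volume group to a second disk (hdisk1 is an
assumed spare; verify device names with lspv before use):

    # Add the spare disk to rootvg and mirror every logical volume
    extendvg rootvg hdisk1
    mirrorvg rootvg hdisk1

    # Rebuild the boot image and allow booting from either disk
    bosboot -ad /dev/hdisk1
    bootlist -m normal hdisk0 hdisk1

    # Verify: each logical volume should now show two physical copies
    lsvg -l rootvg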
Nodes should also be chosen so that when the production
applications are run at peak load, there are still sufficient CPU cycles
and I/O bandwidth to allow HACMP to operate. The production
application should be carefully benchmarked (preferred) or
modeled (if benchmarking is not feasible) and nodes chosen so that
they will not exceed 85% busy, even under the heaviest expected
load.
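
A simple way to check this during a benchmark run is to sample CPU
utilization with the standard AIX tools, for example:

    # Sample CPU usage every 60 seconds for an hour under peak load;
    # user plus system time should stay below roughly 85%
    sar -u 60 60

    # vmstat gives a similar view; watch the us and sy columns
    vmstat 60 60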
Networks
Networks and adapters are the routes by which HACMP transfers
heart beats between nodes. This is an area where more is better;
the most reliable clusters have three or more networks to ensure
that no simple combinations of hardware or software failures
prevent heart beats from flowing.
HACMP strongly recommends that there be at least one non-IP
network connecting a node to at least one other node. For clusters
with more than two nodes, the most reliable configuration provides
two non-IP networks on each node. The distance limitations on
non-IP links, particularly RS-232, have often made this
requirement difficult to meet. For such clusters, HACMP V5.1 disk
heart beating should be strongly considered. Disk heart beating
allows the easy creation of multiple non-IP networks without
requiring additional hardware. A cluster without at least one non-IP
heart beat path from each node should not be considered best
practice; it is barely viable.
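
Once a disk heart beat network has been defined, the underlying path
can be exercised outside of HACMP with the dhb_read utility; a sketch,
assuming hdisk3 is the shared heart beat disk:

    # On the first node: listen for heart beats on the shared disk
    /usr/sbin/rsct/bin/dhb_read -p hdisk3 -r

    # On the second node: transmit heart beats to the same disk
    /usr/sbin/rsct/bin/dhb_read -p hdisk3 -t

    # "Link operating normally" on both nodes indicates a working path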
(The purpose of the non-IP heart beat link is often misunderstood.
The requirement comes from the following: HACMP heart beats on
IP networks are sent as UDP datagrams. This means that if a node
or network is congested, the heart beats can be discarded. If there
were only IP networks, and if this congestion went on long enough,
the node would be seen as having failed, and HACMP would initiate
takeover. Since the node is still alive, HACMP takeover can cause
both nodes to have the same IP address, and can cause the nodes
to both try to own and access the shared disks. This situation is
sometimes referred to as split brain. Data corruption is all but
inevitable in this circumstance.)
An installation will often find that it must access a particular node in
an HACMP cluster, for purposes such as running reports or
diagnostics. To support this, the best practice is to define a node
psssHACMPPractices030405.doc
Page 4
alias for each cluster node. This has the advantage that HACMP
will keep that IP address available despite individual adapter failures
(provided there are spare adapters on that network). Experience
shows that there is some temptation to use a boot or standby
address for this purpose; that temptation must be resisted. Such
use of a boot or standby adapter, say as the target of a telnet
operation, will interfere with the HACMP use of that adapter, and
conceivably cause takeover to fail.
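
A quick way to confirm that users are connecting to the node alias,
and not to a boot or standby address, is to list the configured
interfaces and addresses; a sketch (nodeA-alias is a hypothetical
alias name):

    # Show every interface and the addresses configured on it;
    # client access should go only to the node alias
    netstat -in

    # The alias should stay reachable even across an adapter swap
    ping -c 3 nodeA-alias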
Adapters
Each network defined to HACMP should have at least two adapters
per node. While it is possible to build a cluster with fewer, the
reaction to adapter failures is more severe: the resource group must
be moved to another node. AIX 5L V5.2 provides a Network
Interface Backup (NIB) facility that can be used to provide
particularly fast responses to adapter failures. This must be set up
with some care in an HACMP cluster; the appropriate
documentation should be consulted. When done properly, this
provides the highest level of availability against adapter failure.
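
NIB is configured as an EtherChannel with a backup adapter (the
smitty etherchannel path), and the result can be inspected afterwards.
A sketch, assuming the channel was created as ent2 over ent0 with ent1
as the backup:

    # Inspect the EtherChannel pseudo-adapter created for NIB
    lsattr -El ent2
    # Key attributes: adapter_names (the primary), backup_adapter,
    # and netaddr (the address pinged to detect a path failure)

    # entstat -d reports which adapter is currently carrying traffic
    entstat -d ent2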
Many IBM pSeries processors contain built-in Ethernet
adapters. If the nodes are physically close together, it is possible to
use the built-in Ethernet adapters on two nodes and a "cross-over"
Ethernet cable (sometimes referred to as a "data transfer" cable) to
build an inexpensive Ethernet network between two nodes for heart
beating. Note that this is not a substitute for a non-IP network.
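
Bringing up such a point-to-point network is straightforward; a
sketch, assuming en1 is the built-in adapter on each node and that a
private subnet is free for the purpose:

    # On node A (en1 and the 192.168.10.0/24 subnet are assumptions)
    ifconfig en1 192.168.10.1 netmask 255.255.255.0 up

    # On node B
    ifconfig en1 192.168.10.2 netmask 255.255.255.0 up

    # Verify the link from node A
    ping -c 3 192.168.10.2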
Applications
The most important part of making an application run well in an
HACMP cluster is understanding the application's dependencies.
That includes both the resources that HACMP directly manipulates -
such as IP addresses, volume groups and file systems - and those
that it does not - such as configuration information. The latter is
often a source of problems in clusters: if the configuration
information is not kept on a shared volume group, it is easy to forget
to update it on the other cluster nodes when it changes.
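
HACMP starts and stops an application through administrator-supplied
scripts, and the start script is a natural place to verify such
dependencies before the application is launched. A minimal sketch
(the paths, file names and application command are illustrative
assumptions):

    #!/bin/ksh
    # Sample HACMP application server start script (illustrative only)

    APPDIR=/shared/app            # assumed mount point on a shared VG
    CONFIG=$APPDIR/conf/app.cfg   # assumed configuration file

    # Check a dependency that HACMP does not manage for us
    if [ ! -f "$CONFIG" ]; then
        echo "`date`: $CONFIG missing - is the shared VG varied on?" >&2
        exit 1
    fi

    # Start the application; a zero exit tells HACMP the start succeeded
    $APPDIR/bin/appserver -config "$CONFIG" &
    exit 0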
Testing
Simplistic as it may seem, the most important thing about testing is
to actually do it.
A cluster should be thoroughly tested prior to initial production (and
once clverify runs without errors or warnings). This means that
every cluster node and every interface that HACMP uses should be
brought down and up again, to validate that HACMP responds as
expected. Best practice would be to perform the same level of
testing after each change to the cluster. HACMP V5.2 provides a
cluster test tool that can be run on a cluster before it is put into
production. This will verify that the applications are brought back on
line after node, network and adapter failures. The test tool should
be run as part of any comprehensive cluster test effort.
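
For the manual tests, each failure should be injected while cluster
state is being watched; a sketch of a single adapter test (en1 is an
assumed non-service interface, and this should only ever be done on a
cluster that is not in production):

    # In one session, watch cluster state in ASCII mode
    /usr/es/sbin/cluster/clstat -a

    # In another session, fail one adapter and confirm that HACMP
    # swaps the service address to the spare adapter
    ifconfig en1 down
    # ... observe the recovery, then restore the interface
    ifconfig en1 up

    # The swap should appear as a swap_adapter event in the log
    grep swap_adapter /usr/es/adm/cluster.log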
Maintenance
Prior to any change to a cluster node, take an HACMP snapshot. If
the change involves installing an HACMP, AIX 5L or other software
fix, also take a mksysb backup. On successful completion of the
change, use SMIT to display the cluster configuration, print out and
save the smit.log file. The Online Planning Worksheets facility can
also be used to generate a report of the cluster configuration.
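
A sketch of that pre- and post-change routine follows (the snapshot
itself is taken through SMIT; the tape device and file names are
assumptions):

    # Before the change: take a cluster snapshot through SMIT
    # (smitty hacmp, then the Snapshot Configuration menus)

    # For software changes, also take a mksysb of rootvg
    mksysb -i /dev/rmt0

    # After a successful change: save evidence of the working
    # configuration along with root's smit.log
    cp /smit.log /var/adm/smit.log.`date +%Y%m%d`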
Enterprises that have a number of identical or nearly identical
clusters should, as best practice, maintain a test cluster identical to
the production ones. All changes to applications, cluster
configuration, or software should be first thoroughly tested on the
test cluster prior to being put on the production clusters. The
HACMP V5.2 cluster test tool can be used to at least partially
automate this effort.
Software maintenance or upgrades (AIX 5L, HACMP or application)
should be first applied to a standby node (after taking the
above-mentioned backups). Once that node has been rebooted (which
should be done whether or not the maintenance specific instructions
call for it), the resource group for the application should be moved to
that node. In the event of immediate difficulties, the application can
then be moved back to the original production node. If the
application runs without obvious problems, it should be allowed to
continue on the standby node for some period of time before the
production node is upgraded. Note that while HACMP will work with
mixed levels of AIX 5L or HACMP in the cluster, the goal should be
to have all nodes at exactly the same levels of AIX 5L, HACMP and
application software. Additionally, HACMP prevents changes to the
cluster configuration when mixed levels of HACMP are present.
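
Whether the nodes are actually at the same levels is easy to verify
from the command line; a sketch:

    # Run on every node and compare: AIX 5L maintenance level
    oslevel -r

    # HACMP fileset levels
    lslpp -L 'cluster.*'

    # Confirm no filesets are in a broken or inconsistent state
    lppchk -v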
Monitoring
HACMP provides a rich set of facilities for monitoring a cluster.
The actual facilities used may well be set by enterprise policy (e.g.,
Tivoli is used to monitor all enterprise systems). In the event that
there is no such policy, clstat should be used. That is, it should be
running all the time to allow easy determination of cluster state.
Furthermore, HACMP can invoke notification methods (such as a
program to send a message or e-mail) on cluster events, or even
send a page. Best practice is to have notification of some form in
place for all cluster events associated with hardware or software
failures.
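
As a sketch of such a notification method, the following script mails
the cluster manager state to an operations address when HACMP invokes
it for an event (the recipient address is an assumption, and the
script would have to be registered with HACMP as a custom notify
method):

    #!/bin/ksh
    # Sample HACMP event notification script (illustrative only)

    RECIPIENT=ops@example.com     # assumed operations mailing address

    {
        echo "HACMP event notification from `hostname` at `date`"
        # Current cluster manager state
        lssrc -ls clstrmgrES
    } | mail -s "HACMP event on `hostname`" $RECIPIENT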
IBM's HACMP product was first shipped in 1991 and is now in its 14th
release, with over 60,000 HACMP clusters in production worldwide. It
is generally recognized as a robust, mature high-availability product.
For more information about HACMP, contact your IBM Representative,
or visit:
http://www.ibm.com/servers/eserver/pseries/ha/
References
IBM Web Pages:
HACMP for AIX 5L Version 5.2
http://www.ibm.com/servers/aix/products/ibmsw/high_avail_network/hacmp.html
IBM Learning Services Classes:
pSeries Logical Partitioning (LPAR) for AIX 5L, course code Q1370
HACMP System Administration I: Planning and Implementation, course code Q1554
HACMP System Administration II: Maintenance and Migration, course code Q1557
HACMP System Administration III: Problem Determination and Recovery, course code Q1559
HACMP System Administration II: Master Class, course code Q1556