Configuring Db2 Pacemaker HADR cluster with QDevice in AWS
Contents
Introduction
The Prerequisites
Install Db2
Create Db2 Instance
Install Pacemaker
Install AWS CLI
Clone the Servers
Set up Passwordless SSH
Set up Hosts Files
Create the Db2 Database
Configure the Db2 Database for HADR
Configure the Pacemaker Cluster
Create the VIP resource using the AWS Overlay IP address
Configure the ACR (Automatic Client Reroute)
Install and configure a QDevice Quorum
Corosync Heartbeat Hardening
Introduction
This article will demonstrate how a Db2 Pacemaker/HADR cluster with a QDevice can be
deployed on RHEL9 virtual servers in the AWS VPC (Virtual Private Cloud) environment.
The latest (currently available) version of Db2 will be used for this exercise: v11.5.9, which includes the required
Pacemaker filesets.
(Note: Pacemaker comes prepackaged with Db2 since v11.5.6, and for earlier versions it must be downloaded and installed
separately).
The cluster will use a separate server as the Quorum Device, which is probably the best option for both reliability
and simplicity (read more about the QDevice options in the Db2 Knowledge Centre).
Moreover, the HADR nodes will be spread across different AZs (Availability Zones) within the same AWS Region,
providing availability even in the case of a whole Data Centre (AZ) failure. This will require the configuration of the AWS
Overlay IP, which will be used as the HADR cluster’s VIP address (i.e. the single “end point”, or IP address, which all remote
clients use to connect to the database, regardless of which HADR node – and in which AZ – is the current primary).
Lastly, the ACR (Automatic Client Reroute) will be configured and used (together with the AWS Overlay IP) to
provide seamless and automatic client re-connection to the database in case of a failover.
When everything is done, the complete Pacemaker/HADR cluster will look like this:
[Diagram: the complete Pacemaker/HADR cluster, with the Corosync heartbeat between the two nodes and the external and internal remote clients connecting to it]
Note: throughout this document, I have used the following terms interchangeably, as they all mean the
same thing: server == AWS instance == virtual machine == environment == platform == node.
Also, I will assume the user's AWS account has already been set up and is working properly, giving us
unrestricted access to the AWS VPC (Virtual Private Cloud) where all instances (as well as all other required
AWS resources) can and will be created. None of this will be discussed here, as it is out of the scope of this article.
The Prerequisites
One way to check for any missing prerequisites before starting the Db2 (and Pacemaker) installation, rather
than investigating afterwards why it failed, is to first run the db2prereqcheck utility, which is available as
part of the Db2 installation package.
But, in my experience, this tool tends to miss some prerequisites (and reports some which don’t seem necessary),
so I usually end up investigating why my Db2/Pacemaker installation failed anyway.
Therefore, I will simply list all the filesets that I had to install to get Db2 and Pacemaker installed and running OK.
Following is a list of commands I used to install the missing filesets on my Linux (RHEL9) servers:
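The original list of install commands is not reproduced here in full; as a hedged sketch (the exact packages depend on your base RHEL9 image and on what db2prereqcheck or the installer complains about), it looked along these lines:
# Illustrative only: typical missing filesets on a minimal RHEL9 image
dnf install -y libstdc++ pam
# Required by the bundled Pacemaker installer (see the "Missing Prerequisites" section below)
dnf install -y python3-dnf-plugin-versionlock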
Additionally, I always try to disable SE (Security Enhanced) Linux, unless specifically told otherwise, to avoid the
extra administration overhead (outside of Db2!) and the additional prerequisites; it is generally a pain in the back
which I like to get out of the way before it hits me later, when I least expect it:
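The commands themselves were not captured above; this is the usual way to do it (assuming you are allowed to switch SELinux off on these servers):
# Switch SELinux to permissive mode for the current session
setenforce 0
# Make the change permanent (takes effect after the next reboot)
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config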
Note: The db2prereqcheck utility might show two warning messages about missing 32-bit libraries, but in my
experience these can be safely ignored (unless you need the 32-bit libraries for some exotic purpose). The actual
messages are shown in the next section.
Install Db2
When all prerequisites are (hopefully) in place, the next step is to run the system installation of Db2.
This will also install the Pacemaker filesets, which come prepackaged in the Db2 installation image (as of Db2
v11.5.6, as mentioned above).
We run the Db2 installation as follows:
cd [path_to_wherever_the_Db2_installation_image_was_unpacked]
./db2_install -b /opt/ibm/Db2_v11.5.9 -y -f NOTSAMP
The switch “-f NOTSAMP” instructs the installer not to install the TSAMP files, as they won't be needed any more,
with Pacemaker taking over the role of the “HA Controller” from TSAMP.
Once again, I get those (unnecessary?) warnings about the missing 32-bit libraries, which I ignore and continue with
the installation:
DBT3514W The db2prereqcheck utility failed to find the following 32-bit
library file: “/lib/libpam.so*”.
DBT3514W The db2prereqcheck utility failed to find the following 32-bit
library file: “libstdc++.so.6”.
If all prerequisites are present (and all else goes well), the installer reports the installation was successful:
...
The execution completed successfully.
For more information see the Db2 installation log at “/tmp/db2_install.log.40826”.
Missing Prerequisites
In a case where there are some prerequisites missing, the installer message will look like this:
...
The execution completed with warnings.
For more information see the Db2 installation log at “/tmp/db2_install.log.80279”.
The Db2 installation log file shows additional information on the missing stuff, for example:
cat /tmp/db2_install.log.80279
...
Installing: PCMK
WARNING: DBI20105E An error occurred while installing the following file set:
“PCMK”. Because these files were not successfully installed, functionality
that depends on these files might not work as expected.
To further check what is wrong with the “PCMK” (yes, this is the Pacemaker!) installation, we must look at the
Pacemaker installation log file:
cat /tmp/db2prereqPCMK.log.44256
...
The db2prereqPCMK utility found that python3-dnf-plugin-versionlock package
is not installed on the system.
...
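The fix (not shown in the captured text, but implied by the log message above) is simply to install the missing package:
dnf install -y python3-dnf-plugin-versionlock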
We then repeat the whole Db2 installation (using the same command as before), which will only install the
previously failed “PCMK” file set.
Create Db2 Instance
With Db2 (and the bundled Pacemaker filesets) successfully installed, the next step is to create the Db2 instance,
as the root user:
cd /opt/ibm/Db2_v11.5.9/instance
./db2icrt -a server -p 50000 -u db2fenc1 db2inst1
...
The execution completed successfully.
For more information see the Db2 installation log at “/tmp/db2icrt.log.1805”.
DBI1070I Program db2icrt completed successfully.
To verify the Db2 instance has been created OK, we can run the db2level command:
su - db2inst1
db2level
DB21085I This instance or install (instance name, where applicable:
“db2inst1”) uses “64” bits and Db2 code release “SQL11059” with level
identifier “060A010F”.
Informational tokens are “DB2 v11.5.9.0”, “s2310270807”, “DYN2310270807AMD64”,
and Fix Pack “0”.
Product is installed at “/opt/ibm/Db2_v11.5.9”.
Install Pacemaker
The next step of the “system” installation is to install Pacemaker – this is done from the filesets unpacked
during the Db2 installation.
A quick scan of the system shows where the Pacemaker installation files are located (this will be under the
directory where the Db2 installation image was unpacked to – in this case /opt/ibm/server_dec):
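The scan command itself was not captured above; something along these lines will locate the installer script used in the next step:
find /opt/ibm -name "db2installPCMK" 2>/dev/null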
Next, we run the Pacemaker installation from any of the above directories:
/opt/ibm/server_dec/db2/linuxamd64/pcmk/db2installPCMK -i
Installing “Pacemaker”
Success
DBI1070I Program db2installPCMK completed successfully.
To confirm the installation, we can check if the Pacemaker Daemon executable is present on the server:
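The exact check used in the original is not shown; two simple ways to do it (the daemon path may vary between builds):
# Confirm the Pacemaker packages are installed
rpm -qa | grep -i pacemaker
# And that the Pacemaker daemon binary is present
ls -l /usr/sbin/pacemakerd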
Install AWS CLI
The AWS CLI is needed on both HADR nodes, as the cluster will use it to update the AWS route tables whenever
the Overlay IP (VIP) has to move. To confirm it is installed (and which version), run:
aws --version
aws-cli/2.7.4 Python/3.9.11 Linux/4.18.0-513.9.1.el8_9.x86_64 exe/x86_64.rhel.8
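If the AWS CLI is not already present, it can be installed with the standard AWS-provided bundle (a generic sketch, not taken from the original article):
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o awscliv2.zip
unzip awscliv2.zip
./aws/install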
Set up Passwordless SSH
From my experience, here are a few notes worth remembering when setting up the passwordless SSH between
the nodes (a sketch of the commands follows the list):
– leave the SSH passphrase empty.
– it is probably a good idea to remove the banners (/etc/motd, /etc/ssh/my_banner, …) straight away, so that they
don't interfere with the Pacemaker configuration later.
– copy the root's public SSH key to the local authorized_keys file as well; Pacemaker seems to need this when
configuring the cluster.
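As a minimal sketch of that setup (run as root on each node; the host names hadr_prim and hadr_stby are the ones used later in this article):
# Generate a key pair with an empty passphrase
ssh-keygen -t rsa -b 4096 -N "" -f ~/.ssh/id_rsa
# Copy the public key to the other node, and to this node itself
ssh-copy-id root@hadr_prim
ssh-copy-id root@hadr_stby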
Set up Hosts Files
Note: Editing the local hosts file is probably not necessary if the name resolution can be handled by the DNS, but
I've included it here for completeness (not all of us know how to set up DNS, or have access to it!), and because it is
mentioned in the documentation.
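A hedged example of the /etc/hosts entries on all three nodes (the IP addresses are purely illustrative; the first two host names are the ones used throughout this article, and the QDevice name is an assumption):
10.0.1.10   hadr_prim
10.0.2.10   hadr_stby
10.0.3.10   qdevice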
Create the Db2 Database
For this exercise, we simply create the Db2 SAMPLE database, as the db2inst1 user:
cd ~/sqllib/bin
./db2sampl
In real life, we will more likely want to use an existing database (or restore a database from a backup image), and
there is nothing wrong with that; the rest of the process remains the same.
Configure the Db2 Database for HADR
HADR requires archive logging, and the standby copy of the database is initialised from a backup of the primary.
So, on the Primary:
mkdir ~/logarchive
db2 “update db cfg for SAMPLE using LOGARCHMETH1 ‘disk:/home/db2inst1/logarchive’”
db2 backup db SAMPLE
– on the Standby:
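The standby-side commands did not survive into this text; a minimal sketch (assuming the backup image has been copied to the standby, and using illustrative HADR ports 51012/51013, the host names hadr_prim/hadr_stby, sync mode NEARSYNC and a 120-second peer window, which Pacemaker-managed HADR typically expects):
db2 restore db SAMPLE
db2 "update db cfg for SAMPLE using HADR_LOCAL_HOST hadr_stby HADR_LOCAL_SVC 51013 HADR_REMOTE_HOST hadr_prim HADR_REMOTE_SVC 51012 HADR_REMOTE_INST db2inst1 HADR_SYNCMODE NEARSYNC HADR_PEER_WINDOW 120"
The mirror-image HADR settings (hosts and ports swapped) go on the Primary.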
Now we are ready to start the HADR on the Primary and the Standby:
First the Standby:
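(The actual commands were not captured in this text; these are the standard ones.)
db2 start hadr on db SAMPLE as standby
Then the Primary:
db2 start hadr on db SAMPLE as primary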
When both commands complete (without errors), we can check the HADR status by running the following
command (on either server):
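The command itself is missing from the captured text; the usual one is db2pd:
db2pd -db SAMPLE -hadr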
So, now our database is running within a HADR cluster, which means that if the active server (Primary) fails, the
Standby server will still contain a complete and up-to-date copy of the database. But the failover must be initiated
by hand; HADR does not do this for us. Therefore, up until now we have covered only the disaster recovery (DR)
aspect of HADR.
To get the high availability (HA) feature as well (that is, fully automated failovers), we must configure the Pacemaker
cluster. We will do this next.
Configure the Pacemaker Cluster
As the root user, we use the db2cm utility to create the Pacemaker cluster domain across the two HADR nodes:
cd /home/db2inst1/sqllib/bin/
./db2cm -create -cluster -domain PCM_Domain -host hadr_prim -publicEthernet eth0 -host hadr_stby -publicEthernet eth0
Created db2_hadr_prim_eth0 resource.
Created db2_hadr_stby_eth0 resource.
Cluster created successfully.
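The remaining db2cm resource-creation commands were not captured above; following the standard db2cm workflow (instance, database and host names as used in this article), they would look roughly like this:
# Create the instance resources on each node
./db2cm -create -instance db2inst1 -host hadr_prim
./db2cm -create -instance db2inst1 -host hadr_stby
# Put the HADR database itself under Pacemaker control
./db2cm -create -db SAMPLE -instance db2inst1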
With all this, the Db2 database (i.e. the whole HADR cluster) has been put under the Pacemaker control.
To verify this, run the following command (on any of the HADR nodes):
crm status
The output should show all of the above defined resources as up and running (in green). If it doesn't, something
went wrong somewhere…
Now, if anything happens to the Primary node and it fails, the failover will be automatic and the Standby node will
assume the role of the primary. However, the clients will still need to be reconfigured (manually?!) to access
the Standby node via a different IP address.
To make all this even better, we will create the VIP resource next and then ACR as well to fully automate the failovers.
Create the VIP resource using the AWS Overlay IP address
1. Decide on an (unused) IP address and a network interface that will be used for this purpose:
<OVERLAY_IP>
<ETHx>
Note: all hosts must use a network interface with the same name (for example: eth1).
2. Disable the “source/destination check” for the EC2 instances hosting the IBM Db2 primary and standby databases:
– go to the AWS Console, select the EC2 instance in the EC2 Management Console, then select “Actions” and
“Networking” and set “Change source / destination checking” to “stop”.
3. Create a policy (referencing the <ROUTE_TABLE_ID> of each routing table involved) and attach it to your IAM role
by using the AWS IAM Management Console. A sketch of such a policy is shown below.
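The policy document itself is not reproduced above; the following is an illustrative sketch of the permissions the Overlay IP handling needs (the region, account ID and route table IDs are placeholders you must substitute):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["ec2:CreateRoute", "ec2:ReplaceRoute"],
      "Resource": "arn:aws:ec2:<REGION>:<ACCOUNT_ID>:route-table/<ROUTE_TABLE_ID>"
    },
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeRouteTables",
      "Resource": "*"
    }
  ]
}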
Now, on the command line, we create a profile by using the aws configure command:
– for this, we need to interactively type in the AWS Access Key ID and the Secret Access Key, as well as the AWS region
where our instances are located, so have this information ready! (find more details about the AWS keys here):
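The command is interactive, along these lines (the region shown is just an example):
aws configure
AWS Access Key ID [None]: <your access key ID>
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: eu-west-2
Default output format [None]: json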
If we got everything right, the profile is created, and we can check it with:
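The check command was not captured in this text; aws configure list is the usual way:
aws configure list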
Next, we run a set of commands to update the routing tables (one for each AZ!) with the Overlay IP pointing
to the node with the Db2 primary instance (the route-table-ids can be found in the AWS Console, as well as the
AWS ID of the Primary instance):
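The actual commands did not survive into this text; a hedged sketch using the AWS CLI (one call per route table, with placeholder IDs) would be:
aws ec2 create-route --route-table-id <ROUTE_TABLE_ID_AZ1> --destination-cidr-block <OVERLAY_IP>/32 --instance-id <PRIMARY_INSTANCE_ID>
aws ec2 create-route --route-table-id <ROUTE_TABLE_ID_AZ2> --destination-cidr-block <OVERLAY_IP>/32 --instance-id <PRIMARY_INSTANCE_ID>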
Finally, we can now create the Overlay IP resource itself (notice the same set of routing table IDs here, as in the
above “create-route” commands):
At this point, if everything went smoothly, we should have our AWS Overlay VIP address available, and from now on
it will serve as a single “end point” which all remote clients (both outside and within AWS) will use to connect to the
SAMPLE database.
We can check the Pacemaker cluster status, together with the just-created AWS Overlay VIP and all other defined
resources, with the following command:
crm status
Cluster Summary:
* Stack: corosync (Pacemaker is running)
* Current DC: ip-hadr-stby (version 2.1.6-4.db2pcmk.el9-6fdc9deea29) - partition with quorum
* Last updated: Fri Jan 19 17:09:07 2024 on ip-hadr-prim
* Last change: Fri Jan 19 17:08:37 2024 by root via cibadmin on ip-hadr-prim
* 2 nodes configured
* 9 resource instances configured
Node List:
* Online: [ ip-hadr-prim ip-hadr-stby ]
Note: For more information on the AWS Overlay IP prerequisites and setup, check out this document!
Configure the ACR (Automatic Client Reroute)
The ACR is configured by pointing the database's alternate server address at the Overlay VIP, so that clients are
automatically rerouted after a takeover.
On Standby:
db2 update alternate server for database SAMPLE using hostname <OVERLAY_IP> port 50000
To check the configuration has been updated successfully, run the following command (output slightly edited to
show only the relevant info):
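The check command itself was not captured; the alternate server settings can be seen in the database directory listing, in the “Alternate server hostname” and “Alternate server port number” fields:
db2 list db directory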
The change is automatic and there is no need to restart the database or the instance. Only the already connected
clients will need to be reconnected to become aware of the change.
Now, it would be a good idea to test the client connections’ resiliency during a failover, to confirm they can survive
the event without any external help, and don’t need any explicit reconnections to the database.
I will not repeat this testing here, as I have already done it, but will instead point you to my blog post where I’ve
described the testing in detail.
Install and configure a QDevice Quorum
The QDevice setup needs the corosync-qdevice software on both cluster nodes (Primary and Standby). If it is not
already installed, we can install it now, taking care to specify the correct path to the installation packages (see the
example below). But the files were already installed on my Db2 servers, so there was nothing to be done here.
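A hedged example, assuming the corosync-qdevice packages sit in the same directory as the corosync-qnetd packages shown in the next step:
dnf install /opt/ibm/server_dec/db2/linuxamd64/install/pcmk/Linux/rhel/rhel9/x86_64/corosync-qdevice*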
On the third host (Q-DEVICE), we must install the Corosync QNet software. Again, we need to specify the correct
path to the installation packages, such as:
dnf install /opt/ibm/server_dec/db2/linuxamd64/install/pcmk/Linux/rhel/rhel9/x86_64/corosync-qnetd*
Finally, as the root user, we run the db2cm command to set up the QDevice from one of the cluster nodes
(Primary or Standby):
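The exact command line did not survive into this text; following the standard db2cm usage, it takes the QDevice host name as its argument, roughly:
cd /home/db2inst1/sqllib/bin/
./db2cm -create -qdevice <Q-DEVICE-hostname>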
At this point, the setup should be complete, and the Pacemaker cluster should have switched from the default
“two-node quorum” to the more reliable QDevice quorum!
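The verification command that produced the output below is not shown in the captured text; on a cluster node, the QDevice status is normally checked with the corosync-qdevice-tool:
corosync-qdevice-tool -s -v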
Qdevice information
-------------------
Model: Net
Node ID: 2
Configured node list:
0 Node ID = 1
1 Node ID = 2
Membership node list: 1, 2
Qdevice-net information
----------------------
Cluster name: PCM_Domain
QNetd host: <Q-DEVICE-IP>:5403
Algorithm: LMS
Tie-breaker: Node with lowest node ID
State: Connected
On the QDevice host itself, we can check the QNetd status (and see both cluster nodes acknowledging their votes) with:
corosync-qnetd-tool -l
Cluster “PCM_Domain”:
Algorithm: LMS
Tie-breaker: Node with lowest node ID
Node ID 1:
Client address: ::ffff:<hadr-primary-IP>:39642
Configured node list: 1, 2
Membership node list: 1, 2
Vote: ACK (ACK)
Node ID 2:
Client address: ::ffff:<hadr-standby-IP>:60674
Configured node list: 1, 2
Membership node list: 1, 2
Vote: ACK (ACK)