
Building HA cluster with Pacemaker, Corosync and DRBD

August 16, 2013 Posted by admin

If you want to set up a highly available Linux cluster, but for some reason do not want to use an "enterprise" solution like Red Hat Cluster, you might consider using Pacemaker, Corosync and DRBD [1], [2], [3].

Pacemaker is a cluster resource manager. It achieves maximum availability for your cluster services by detecting and recovering from node- and resource-level failures, making use of the messaging and membership capabilities provided by your preferred cluster infrastructure – either Corosync or Heartbeat.

For the purpose of this blog, we'll use Corosync and set up a two-node, highly available Apache web server in an active/passive configuration, using DRBD and ext4 to store the data.

To install the software we’ll be using a Fedora repository:

[root@node1 ~]# sed -i.bak "s/enabled=0/enabled=1/g" /etc/yum.repos.d/fedora.repo
[root@node1 ~]# sed -i.bak "s/enabled=0/enabled=1/g" /etc/yum.repos.d/fedora-updates.repo
[root@node1 ~]# yum install -y pacemaker corosync

To configure Corosync, we need to choose an unused multicast address and port:

[root@node1 ~]# export ais_port=4000
[root@node1 ~]# export ais_mcast=226.94.1.1
[root@node1 ~]# export ais_addr=`ip addr | grep "inet " | tail -n 1 | awk '{print $4}' | sed s/255/0/`
[root@node1 ~]# cp /etc/corosync/corosync.conf.example /etc/corosync/corosync.conf
[root@node1 ~]# sed -i.bak "s/.*mcastaddr:.*/mcastaddr:\ $ais_mcast/g" /etc/corosync/corosync.conf
[root@node1 ~]# sed -i.bak "s/.*mcastport:.*/mcastport:\ $ais_port/g" /etc/corosync/corosync.conf
[root@node1 ~]# sed -i.bak "s/.*bindnetaddr:.*/bindnetaddr:\ $ais_addr/g" /etc/corosync/corosync.conf
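After these substitutions, the totem section of /etc/corosync/corosync.conf should end up looking roughly like the sketch below (the bindnetaddr value is just an example derived from a 192.168.122.0/24 network; yours will reflect your own interface):

totem {
        version: 2
        secauth: off
        threads: 0
        interface {
                ringnumber: 0
                bindnetaddr: 192.168.122.0
                mcastaddr: 226.94.1.1
                mcastport: 4000
        }
}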

We also need to tell Corosync to load the Pacemaker plugin:

[root@node1 ~]# cat <<-END >>/etc/corosync/service.d/pcmk
service {
# Load the Pacemaker Cluster Resource Manager
name: pacemaker
ver: 1
}
END

At this point we need to propagate the configuration changes we made to the second node:

[root@node1 ~]# for f in /etc/corosync/corosync.conf /etc/corosync/service.d/pcmk /etc/hosts; do scp $f node2:$f ; done
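Note that copying /etc/hosts assumes both node names already resolve to the addresses the cluster should use, along these lines (the addresses here are purely illustrative):

192.168.122.102   node1
192.168.122.103   node2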

Now we can start Corosync on the first node and check /var/log/messages:

[root@node1 ~]# /etc/init.d/corosync start

If all looks good we can start Corosync on the second node as well, and check if the cluster was formed by tailing /var/log/messages.
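A quick way to sanity-check the logs is to grep for the engine start-up and membership messages; the exact wording varies between Corosync versions, so treat these patterns only as examples:

[root@node1 ~]# grep -e "Corosync Cluster Engine" -e "configuration file" /var/log/messages
[root@node1 ~]# grep TOTEM /var/log/messages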

The next step is to start Pacemaker on both nodes:

[root@node1 ~]# /etc/init.d/pacemaker start

To display the cluster status run:

[root@node1 ~]# crm_mon

Now that we have a working cluster, make sure you get familiar with the main cluster administration tool:

[root@node1 ~]# crm --help

Let’s examine the current cluster configuration:

[root@node1 ~]# crm configure show

One thing to note is that Pacemaker ships with STONITH enabled. STONITH is a common node fencing mechanism that is used to ensure data integrity by powering off (or Shooting The Other Node In The Head) a problematic node.

For the purpose of this example let’s simplify things and disable STONITH, at least for now:

[root@node1 ~]# crm configure property stonith-enabled=false
[root@node1 ~]# crm_verify -L

We should also tell the cluster to ignore the loss of quorum, since a two-node cluster cannot have quorum once a node fails:

[root@node1 ~]# crm configure property no-quorum-policy=ignore

Now it’s time to add the first shared resource – an IP address, because regardless of where the cluster service(s) are running, we need a consistent address to contact them on:

[root@node1 ~]# crm configure primitive ClusterIP ocf:heartbeat:IPaddr2 params ip=192.168.122.101 cidr_netmask=32 op monitor interval=30s

The other important piece of information here is ocf:heartbeat:IPaddr2. This tells Pacemaker three things about the resource you want to add. The first field, ocf, is the standard to which the resource script conforms and where to find it. The second field is specific to OCF resources and tells the cluster which namespace to find the resource script in, in this case heartbeat. The last field is the name of the resource script.

To obtain a list of the available resource classes and the agents within them, run:

[root@node1 ~]# crm ra classes
[root@node1 ~]# crm ra list ocf pacemaker
[root@node1 ~]# crm ra list ocf heartbeat
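To see which parameters a particular agent accepts (for example the IPaddr2 agent used above), you can also ask for its metadata:

[root@node1 ~]# crm ra info ocf:heartbeat:IPaddr2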

Let’s test this by performing a fail-over. The IP should move from the node it is currently hosted on to the second (passive) node.
First, let’s check which node the IP resource is currently running on:

[root@node1 ~]# crm resource status ClusterIP

On that node stop Pacemaker and Corosync, in that order:

[root@node1 ~]# /etc/init.d/pacemaker stop
[root@node1 ~]# /etc/init.d/corosync stop

Or put the node on stand-by:

[root@node1 ~]# crm node standby

Check the status of the cluster and observe where the IP resource has moved:

[root@node1 ~]# crm_mon

You can also check with:

[root@node1 ~]# ip addr show

Now let’s simulate node recovery by bringing the services back up, in the following order:

[root@node1 ~]# /etc/init.d/corosync start
[root@node1 ~]# /etc/init.d/pacemaker start

Or put the node back online:

[root@node1 ~]# crm node online

It’s time to add more services to the cluster. Let’s install Apache:

[root@node1 ~]# yum install -y httpd

Create an index page on both nodes, displaying the name of the node:

[root@node1 ~]# cat <<-END >/var/www/html/index.html
HA Apache - node1
END
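And similarly on the second node:

[root@node2 ~]# cat <<-END >/var/www/html/index.html
HA Apache - node2
END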

In order to monitor the health of your Apache instance, and recover it if it fails, the resource agent used by Pacemaker assumes the server-status URL is available.
Look for the following in /etc/httpd/conf/httpd.conf and make sure it is not disabled or commented out:


<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Deny from all
    Allow from 127.0.0.1
</Location>

At this point Apache is ready to go; all that's left is to add it to the cluster. Let's call the resource WebSite. We'll use the OCF script called apache in the heartbeat namespace; the only required parameter is the path to the main Apache configuration file, and we'll tell the cluster to check once a minute that Apache is still running:

[root@node1 ~]# crm configure primitive WebSite ocf:heartbeat:apache params configfile=/etc/httpd/conf/httpd.conf op monitor interval=1min
[root@node1 ~]# crm configure show
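Once the cluster has started Apache somewhere, you can verify the status handler locally on that node; a simple check (assuming curl is installed) would be:

[root@node1 ~]# curl http://127.0.0.1/server-status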

Pacemaker will generally try to spread the configured resources across the cluster nodes. In the case of Apache we need to tell the cluster that the two resources are related and need to run on the same host (or not at all). Here we instruct the cluster that WebSite can only run on the host where ClusterIP is active:

[root@node1 ~]# crm configure colocation website-with-ip INFINITY: WebSite ClusterIP
[root@node1 ~]# crm configure show

When Apache starts, it binds to the available IP addresses. It doesn't know about any addresses we add afterwards, so not only do the two resources need to run on the same node, we must also make sure ClusterIP is already active before we start WebSite. We do this by adding an ordering constraint. We need to give it a name (choose something descriptive like apache-after-ip), indicate that it's mandatory (so that any recovery for ClusterIP will also trigger recovery of WebSite) and list the two resources in the order we need them to start:

[root@node1 ~]# crm configure order apache-after-ip mandatory: ClusterIP WebSite
[root@node1 ~]# crm configure show

We can also specify a preferred node on which the Apache server should run, for example if it has better hardware:

[root@node1 ~]# crm configure location prefer-node1 WebSite 50: node1
[root@node1 ~]# crm configure show

To manually move a resource from one node to the other we need to run:

[root@node1 ~]# crm resource move WebSite node1
[root@node1 ~]# crm_mon

And to move it back:

[root@node1 ~]# crm resource unmove WebSite

Configuring DRBD as a cluster resource.

Think of DRBD as network-based RAID-1. Instead of manually syncing data between nodes, we can use block-level replication to do it for us.
For more information on how to setup DRBD refer to [3] and [4].
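As a rough sketch, the wwwdata resource referenced below could be defined on both nodes in a file such as /etc/drbd.d/wwwdata.res along these lines; the backing device and node addresses are placeholders and must match your environment:

resource wwwdata {
        meta-disk internal;
        device    /dev/drbd1;
        on node1 {
                disk    /dev/sdb1;
                address 192.168.122.102:7789;
        }
        on node2 {
                disk    /dev/sdb1;
                address 192.168.122.103:7789;
        }
}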

Run the cluster configuration utility:

[root@node1 ~]# crm

Next we must create a working copy of the current configuration. This is where all our changes will go; the cluster will not see any of them until we commit.

crm(live)# cib new drbd

Now let's create the DRBD resource and its master/slave clone, display the revised configuration, and commit the changes:

crm(drbd)# configure primitive WebData ocf:linbit:drbd params drbd_resource=wwwdata op monitor interval=60s
crm(drbd)# configure ms WebDataClone WebData meta master-max=1 master-node-max=1 clone-max=2 clone-node-max=1 notify=true
crm(drbd)# configure show
crm(drbd)# cib commit drbd
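After the commit it's worth confirming with a one-shot status check that the clone has started and one instance has been promoted to Master:

[root@node1 ~]# crm_mon -1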

Now that DRBD is functioning we can configure a Filesystem resource to use it. In addition to the filesystem's definition, we also need to tell the cluster where it can be located (only on the DRBD Primary) and when it is allowed to start (after the Primary has been promoted):

[root@node1 ~]# crm
crm(live)# cib new fs
crm(fs)# configure primitive WebFS ocf:heartbeat:Filesystem params device="/dev/drbd/by-res/wwwdata" directory="/var/www/html" fstype="ext4"
crm(fs)# configure colocation fs_on_drbd inf: WebFS WebDataClone:Master
crm(fs)# configure order WebFS-after-WebData inf: WebDataClone:promote WebFS:start

We also need to tell the cluster that Apache must run on the same machine as the filesystem and that the filesystem must be active before Apache can start; then we commit the changes:

crm(fs)# configure colocation WebSite-with-WebFS inf: WebSite WebFS
crm(fs)# configure order WebSite-after-WebFS inf: WebFS WebSite
crm(fs)# cib commit fs

Now we have a fully functional two-node HA solution for Apache!
You can easily set up HA MySQL or NFS using the same method.
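As a rough sketch, a MySQL resource could be added like this (the resource and constraint names are illustrative; in a real setup you would also place the MySQL data directory on a DRBD-backed filesystem and add the corresponding constraints, just as we did for Apache):

[root@node1 ~]# crm configure primitive DBase ocf:heartbeat:mysql params config=/etc/my.cnf op monitor interval=1min
[root@node1 ~]# crm configure colocation db-with-ip INFINITY: DBase ClusterIP
[root@node1 ~]# crm configure order mysql-after-ip mandatory: ClusterIP DBase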

For more detailed information please read the main tutorial at [5].

Resources:

[1] http://www.clusterlabs.org/
[2] http://www.corosync.org/
[3] http://www.drbd.org/
[4] http://kaivanov.blogspot.com/2012/01/deploying-drbd-on-linux.html
[5] http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/index.html