HA Active / Passive NFS Cluster
Get a pair of CentOS 5.5 machines serving up highly available NFS
I use NFS volumes for all of my production VMware servers, so losing the NFS server to a hardware failure would be a very bad thing.
We accomplish this using drbd and heartbeat.
This guide assumes you have two identically configured systems, each with an empty partition that is unformatted and ready to go. It also assumes a dedicated crossover connection between network cards on both nodes.
This setup was completed on a CentOS 5.5 system, using the centosplus yum repository for drbd and the Fedora EPEL repository for heartbeat.
Installing
I use the EPEL, centosplus, and CentOS extras yum repositories; after setting them up, do:
yum install drbd83 kmod-drbd83 heartbeat
If you want to use the XFS file system (wonderful for big files on big filesystems... VMware anyone?):
yum install xfsprogs kmod-xfs
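If the extra repositories are not enabled permanently, you can also enable them per command. The repository IDs below (centosplus, extras, epel) are assumptions based on the stock CentOS and EPEL repo files; check yum repolist and adjust to match your setup:
yum --enablerepo=centosplus,extras install drbd83 kmod-drbd83
yum --enablerepo=epel install heartbeat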
DRBD Configuration
You need a blank, unformatted block device. It can be a partition (/dev/sdb1 for example) or a whole block device (/dev/sdb); be careful not to touch any file system that is in use. (It is possible to turn an existing filesystem with existing data into a drbd device, but that's another blog post.)
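To double-check that you are pointing at the right, empty device before handing it to drbd, something like this helps (the device names here are just examples):
cat /proc/partitions
blkid /dev/sdb1
blkid should print nothing for a partition that has never been formatted.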
We need to set up the distributed block devices to mirror the main data file systems.
Edit /etc/drbd.conf:
global {
    usage-count yes;
}
common {
    protocol C;
}
resource drbd0 {
    device    /dev/drbd0;
    disk      <your_blank_drbd_partition eg: /dev/sdb1>;
    meta-disk internal;
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error    "echo o > /proc/sysrq-trigger ; halt -f";
        split-brain       "/usr/lib/drbd/notify-split-brain.sh <your_name@email_server>";
    }
    startup {
        degr-wfc-timeout 120;
    }
    disk {
        on-io-error detach;
        no-disk-flushes;
        no-md-flushes;
    }
    net {
        cram-hmac-alg "sha1";
        shared-secret "HaDxWpLXRIB6dxa54CnV";
        after-sb-0pri disconnect;
        after-sb-1pri disconnect;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }
    syncer {
        rate 100M;
        al-extents 257;
        csums-alg sha1;
    }
    on drbd-lvm-test1 {
        address <ip_address_of_node1>:7789;
    }
    on drbd-lvm-test2 {
        address <ip_address_of_node2>:7789;
    }
}
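Use the same drbd.conf on both nodes. A quick way to catch typos before going further is to have drbdadm parse the config back out:
drbdadm dump drbd0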
then issue these commands on BOTH nodes:
drbdadm create-md drbd0
service drbd start
You can see the device was successfully created by issuing:
cat /proc/drbd
Issue the following command on the PRIMARY node (only one node):
drbdadm -- --overwrite-data-of-peer primary drbd0
Wait for the sync to complete; periodically run cat /proc/drbd until it looks like this:
version: 8.3.8 (api:88/proto:86-94)
GIT-hash: d78846e52224fd00562f7c225bcc25b2d422321d build by mockbuild@builder10.centos.org, 2010-06-04 08:04:09
0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r----
ns:223608780 nr:0 dw:44 dr:223610936 al:1 bm:13649 lo:0 pe:0 ua:0 ap:0 ep:1 wo:b oos:0
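While the initial sync runs, you can also keep an eye on it without retyping the command:
watch -n 5 cat /proc/drbd
service drbd status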
Next, create an LVM stack on top of the drbd device. First, edit lvm.conf to force LVM to ignore the underlying block device(s) that are being used by drbd; this prevents LVM from activating on the wrong device when heartbeat starts it up.
On BOTH Nodes:
vim /etc/lvm/lvm.conf
comment out this line:
#filter = [ "a/.*/" ]
Add this line so LVM ignores the block device that drbd sits on top of (adjust /dev/sda3 to match your actual underlying device):
filter = [ "a|drbd.*|", "r|/dev/sda3|" ]
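To confirm the new filter is taking effect, lvmdiskscan should no longer list the rejected device, only /dev/drbd0 and whatever else you allowed:
lvmdiskscan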
On the drbd PRIMARY node Only:
pvcreate /dev/drbd0
vgcreate <volume_group_name> /dev/drbd0
lvcreate -l 100%FREE -n <logical_volume_name> <volume_group_name>
Create an XFS file system on the logical volume created above; I suggest the tuning parameters below:
mkfs.xfs -f -d su=256k,sw=<number_of_data_disks_in_the_raid> -l size=64m /dev/<volume_group_name>/<logical_volume_name>
- the sw parameter is the number of data disks in the array. For example, if there are 24 drives, 2 are used as hot spares, and 2 are used for RAID6 parity, then sw=20
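Putting it together for that 24-drive example, with placeholder volume group and logical volume names (vg_data and lv_data are just illustrative; substitute your own):
mkfs.xfs -f -d su=256k,sw=20 -l size=64m /dev/vg_data/lv_data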
make sure it all works:
mkdir /data && mount /dev/<volume_group_name>/<logical_volume_name> /data
Setting up heartbeat is next, so we need to make sure the filesystem is unmounted:
umount /data
The NFS metadata has to go onto the shared block device; otherwise all your NFS clients will suffer from "Stale NFS file handle" errors and will need to be rebooted when your cluster fails over, which is not good. So this procedure must be done on both nodes, one after the other:
On Node1:
Change where the rpc_pipefs file system gets mounted:
mkdir /var/lib/rpc_pipefs
vim /etc/modprobe.d/modprobe.conf.dist
locate the module commands for sunrpc, and change the mount path statement from /var/lib/nfs/rpc_pipefs to /var/lib/rpc_pipefs
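If you would rather not edit the file by hand, a grep/sed pair does the same thing. Back the file up first; this assumes the old path only appears in the sunrpc entries:
cp /etc/modprobe.d/modprobe.conf.dist /etc/modprobe.d/modprobe.conf.dist.bak
grep rpc_pipefs /etc/modprobe.d/modprobe.conf.dist
sed -i 's|/var/lib/nfs/rpc_pipefs|/var/lib/rpc_pipefs|g' /etc/modprobe.d/modprobe.conf.dist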
vim /etc/sysconfig/nfs
Add this line to the bottom:
RPCIDMAPDARGS="-p /var/lib/rpc_pipefs"
reboot the node.
Make it the primary drbd node:
drbdadm primary drbd0
scan for the volume group on the drbd0 block device:
vgscan
Make the drbd volume group active:
vgchange -a y
mount the xfs file system:
mount /dev/<volume_group_name>/<logical_volume_name> /data
Move /var/lib/nfs to the shared filesystem:
mv /var/lib/nfs /data/
ln -s /data/nfs /var/lib/nfs
Put the nfs exports config file in the shared file system as well:
mv /etc/exports /data/nfs/
ln -s /data/nfs/exports /etc/exports
create a dir under /data for export:
mkdir /data/supercriticalstuff
export it:
echo "/data/supercriticalstuff *(ro,async,no_root_squash)" >> /data/nfs/exports
Edit /etc/init.d/nfs and change killproc nfs -2 to killproc nfs -9, to make sure nfs really dies when stopped.
Back the modified script up so you can restore it after rpm updates:
cp /etc/init.d/nfs ~/nfs_modded_init_script
Start NFS and make sure it all works:
service nfs start
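From another machine you can confirm the export is visible and mountable. <node1_ip> is a placeholder for whatever address Node 1 currently answers on (the cluster IP comes later):
showmount -e <node1_ip>
mount -t nfs <node1_ip>:/data/supercriticalstuff /mnt && ls /mnt && umount /mnt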
Now on to Node 2. First, shut down NFS on Node 1 (service nfs stop), then unmount the file system:
umount /data
deactivate the volume group:
vgchange -a n
give up the drbd resource:
drbdadm secondary drbd0
On Node 2:
Change where the rpc_pipefs file system gets mounted:
mkdir /var/lib/rpc_pipefs
vim /etc/modprobe.d/modprobe.conf.dist
locate the module commands for sunrpc, and change the mount path statement from /var/lib/nfs/rpc_pipefs to /var/lib/rpc_pipefs
vim /etc/sysconfig/nfs
Add this line to the bottom:
RPCIDMAPDARGS="-p /var/lib/rpc_pipefs"
reboot the node.
Make it the primary drbd node:
drbdadm primary drbd0
scan for the volume group on the drbd0 block device:
vgscan
Make the drbd volume group active:
vgchange -a y
mount the xfs file system:
mount /dev/<volume_group_name>/<logical_volume_name> /data
Get rid of /var/lib/nfs and /etc/exports:
rm -rf /var/lib/nfs
rm -f /etc/exports
Make the appropriate symlinks:
ln -s /data/nfs /var/lib/nfs
ln -s /data/nfs/exports /etc/exports
Edit /etc/init.d/nfs and change killproc nfs -2 to killproc nfs -9, to make sure nfs really dies when stopped.
Back the modified script up so you can restore it after rpm updates:
cp /etc/init.d/nfs ~/nfs_modded_init_script
Start NFS and make sure it all works:
service nfs start
Shut down NFS (service nfs stop), then unmount the file system:
umount /data
deactivate the volume group:
vgchange -a n <volume_group_name>
give up the drbd resource:
drbdadm secondary drbd0
On Both Nodes:
Edit /etc/hosts on both machines and make sure both cluster nodes are listed with their crossover IP addresses.
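For example (these crossover addresses are made up; use your own, and keep the short hostnames the same ones the nodes use for each other):
192.168.100.1   node1
192.168.100.2   node2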
rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
yum install heartbeat heartbeat-stonith heartbeat-pils
NOTE: make sure yum only tries to install the x86_64 version of the rpm; you may have to specify the exact version like so:
yum install heartbeat-2.1.4-11.el5.x86_64 (the version number might not be as shown here, check the output of yum for the right rpm)
edit /etc/ha.d/ha.cf :
logfacility local0
keepalive 5
deadtime 20
warntime 10
udpport 695
bcast bond0 # ethernet interface 1
bcast bond1 # ethernet interface 2
bcast bond2 # ethernet interface, or serial interface
auto_failback off
node node1 node2
respawn hacluster /usr/lib64/heartbeat/ipfail
edit /etc/ha.d/haresources:
node1 \
IPaddr2::<virtual_ha_ip_address>/24/bond0 \
IPaddr2::<virtual_ha_ip_address>/24/bond1 \
drbddisk::drbd0 \
LVM::<volume_group_name> \
Filesystem::/dev/<volume_group_name>/<logical_volume_name>::/<mountpoint>::xfs::rw,nobarrier,noatime,nodiratime,logbufs=8 \
nfslock \
nfs
Note: use an IP address that is NOT currently assigned to any network adapter. This IP will move from host to host as the cluster fails over.
edit /etc/ha.d/authkeys:
auth 2
2 sha1 <random_gibberish_20_characters_long>
chmod 600 /etc/ha.d/authkeys
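The key can be any random string; one quick way to generate one (and remember that authkeys must be identical on both nodes):
openssl rand -base64 20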
Make sure the HA-managed services are disabled at boot and that heartbeat itself is enabled:
chkconfig nfs off
chkconfig nfslock off
chkconfig heartbeat on
On the primary node:
service heartbeat start && tail -f /var/log/messages
make sure the file system is mounted:
mount
make sure the HA IP address is up:
ip addr
On the secondary node:
service heartbeat start
/var/log/messages should show: Status update: Node node1 now has status active
/var/log/messages on the primary node should show the secondary node joining the cluster
Run service heartbeat stop on the primary node and make sure the services fail over properly; also try:
halt -p
Do it a few times: fail the services back and forth while the NFS export is mounted from another system to make sure everything fails over as it should.
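One simple way to watch a failover from the client side, using the placeholders from earlier (the listing should only pause for a few seconds while the cluster moves):
mount -t nfs <virtual_ha_ip_address>:/data/supercriticalstuff /mnt
while true; do ls /mnt > /dev/null && date; sleep 1; done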
enjoy an active/passive NFS server.
Two comments
Hiya,
Thanks for these instructions. A few problems to correct:
1) You don’t indicate it, but in the drbd.conf file you need to change drbd-lvm-test1 and drbd-lvm-test2 to be whatever hostname you have for your machines
2) you can also check the status of drbd replication by going: service drbd status
3) In LVM setup, shouldn't this line: filter = [ "a|drbd.*|", "r|/dev/sda3|" ]
be filter = [ "a|drbd.*|", "r|/dev/sdb1|" ] if we are using /dev/sdb1 as the mount for drbd0?
4) Under XFS Setup, you don’t provide an example of what the mkfs.xfs line should look like. I used a sw=2 in my config, but I am not sure if that is right
5) Under XFS Setup, you have this line:
mkdir /data && mount /dev/<volume_group_name> /<logical_volume_name>
I think it should be:
mkdir /data && mount /dev/<volume_group_name>/<logical_volume_name> /data
Thanks!
Jim
Hiya,
One other thing:
The exports line you have is:
echo "/data/supercriticalstuff *(ro,async,no_root_squash)" >> /data/nfs/exports
Shouldn’t it be rw? Like:
echo "/data/supercriticalstuff *(rw,async,no_root_squash)" >> /data/nfs/exports
Jim