Friday, July 20, 2007

Debian DRBD NFS Failover Cluster

NFS Cluster Lab Notes

Motivation: Customers demand access 24/7; downtime for maintenance tasks or failure handling, scheduled or not, is unacceptable.

What: DRBD provides "cheap" storage mirroring using commodity hardware and free software, strengthening your enterprise setup.
My primary Linux choice for a mission-critical network OS is Debian.
Debian may not be the right choice for you or your environment.

These installation notes are specific to my lab. I will install two hosts, debian40drbd-1 and debian40drbd-2. Each host has two NICs: one with a private IP (crossover patch cable) and one with a public IP (switch connection).
Replication takes place over the private network. The cluster shares a common virtual IP address on the public network, where the NFS service is reachable.
/data/export will be the NFS share directory on each member
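
For reference, the addressing used throughout these notes (taken from the configs further down):

Host               Public (eth1)    Private (eth0)
debian40drbd-1     192.168.200.20   192.168.254.10
debian40drbd-2     192.168.200.40   192.168.254.20
Cluster (virtual)  192.168.200.60   -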

Hardware:
2 Dell GX150 SFF 933MHz, two NICs each, single ATA disk
1 crossover patch cable (Private network)
1 switch (Public network)
1 Laptop (Debian NFS client)

Setup

Install from netinst
[http://cdimage.debian.org/debian-cd/4.0_r0/i386/iso-cd/debian-40r0-i386-netinst.iso]

Partition disk
Type Mount Size
P / 1Gb
L /home 1Gb
L /usr 1Gb
L /var 1Gb
L /tmp 1Gb
L /usr/local 1Gb
L (empty) 1Gb Used for DRBD metadata (only needs 128MB, but I keep it simple)
L /srv 1Gb
L /boot 1Gb
L (empty) 10Gb Used for DRBD Storage
L swap 512Mb

Name the first server debian40drbd-1
Name the second server debian40drbd-2
Use DHCP during install
Select minimal install options
Add a user and set the root password
Restart
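
After the first boot, it is worth checking which device names the two empty partitions got; the drbd.conf further down refers to them as /dev/hda13 (data) and /dev/hda10 (metadata) on my disks:

$fdisk -l /dev/hda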

Configure networking

Host debian40drbd-1

$vi /etc/network/interfaces

# Public network interface
auto eth1
iface eth1 inet static
address 192.168.200.20
netmask 255.255.255.0
gateway 192.168.200.254

# Replication network interface
auto eth0
iface eth0 inet static
address 192.168.254.10
netmask 255.255.255.0

:wq

Host debian40drbd-2

$vi /etc/network/interfaces

# Public network interface
auto eth1
iface eth1 inet static
address 192.168.200.40
netmask 255.255.255.0
gateway 192.168.200.254

# Replication network interface
auto eth0
iface eth0 inet static
address 192.168.254.20
netmask 255.255.255.0

:wq

After configuring the network, reboot and check connectivity on all interfaces

Update the package sources.
If needed, add the distribution repository to /etc/apt/sources.list:
deb http://ftp.debian.org/debian etch main contrib

Run apt-get update && apt-get upgrade
Install SSH and time-keeping utils
$apt-get install ssh ntp ntpdate
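
Once ntp has had a moment to sync, a quick check that both hosts agree on time (useful later when comparing logs from the two nodes):

$ntpq -p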
Modify sshd to listen on IPv4
$vi /etc/ssh/sshd_config
Uncomment the #ListenAddress 0.0.0.0 line

:wq

invoke-rc.d ssh restart
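
sshd should now be listening on the IPv4 address; netstat (part of the default net-tools install) can confirm it:

$netstat -tlnp | grep sshd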

Install NFS Server

$aptitude install nfs-kernel-server
..
Get:1 http://ftp.debian.org/ etch/main nfs-kernel-server 1:1.0.10-6 [136kB]
Fetched 136kB in 1s (111kB/s)
Selecting previously deselected package nfs-kernel-server.
(Reading database ... 18487 files and directories currently installed.)
Unpacking nfs-kernel-server (from .../nfs-kernel-server_1%3a1.0.10-6_i386.deb) ...
Setting up nfs-kernel-server (1.0.10-6) ...
Creating config file /etc/exports with new version
Creating config file /etc/default/nfs-kernel-server with new version
Starting NFS common utilities: statd idmapd.
Exporting directories for NFS kernel daemon....
Starting NFS kernel daemon: nfsd mountd.

Remove NFS autostart (heartbeat will start and stop it later)

$update-rc.d -f nfs-kernel-server remove
$update-rc.d -f nfs-common remove

Optionally, reboot and verify that NFS does not start automatically
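
A quicker check than a reboot is to make sure the rc symlinks are gone; this should print nothing:

$ls /etc/rc2.d/ | grep nfs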

Configure NFS Export

$vi /etc/exports

/data/export/ 192.168.200.0/255.255.255.0(rw,no_root_squash,no_all_squash,sync)

:wq

Install DRBD!

$apt-get install linux-headers-`uname -r` drbd0.7-module-source drbd0.7-utils

$cd /usr/src && tar -xvzf drbd0.7.tar.gz
$cd /usr/src/modules/drbd/drbd && make
$cd /usr/src/modules/drbd/drbd && make install
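
If the build went through, the module should now sit in the running kernel's module tree:

$find /lib/modules/`uname -r` -name drbd.ko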
$mv /etc/drbd.conf /etc/drbd.conf.orig

$vi /etc/drbd.conf

resource r0 {
  protocol C;
  incon-degr-cmd "halt -f";

  startup {
    degr-wfc-timeout 120;    # 2 minutes.
  }

  disk {
    on-io-error detach;
  }

  net {
  }

  syncer {
    rate 10M;
    group 1;
    al-extents 257;
  }

  on debian40drbd-1 {                # hostname of server 1 (uname -n)
    device    /dev/drbd0;
    disk      /dev/hda13;            # ** EDIT ** data partition on server 1
    address   192.168.254.10:7788;   # ** EDIT ** IP address on server 1
    meta-disk /dev/hda10[0];         # ** EDIT ** X MB partition for DRBD metadata on server 1
  }

  on debian40drbd-2 {                # hostname of server 2 (uname -n)
    device    /dev/drbd0;
    disk      /dev/hda13;            # ** EDIT ** data partition on server 2
    address   192.168.254.20:7788;   # ** EDIT ** IP address on server 2
    meta-disk /dev/hda10[0];         # ** EDIT ** X MB partition for DRBD metadata on server 2
  }
}

:wq
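
Before loading the module, it doesn't hurt to let drbdadm parse the file on both nodes; if the syntax is OK it simply prints the configuration back:

$drbdadm dump all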

Fire up DRBD!

$modprobe drbd
$drbdadm up all

$cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root@debian40drbd-2, 2007-07-17 17:02:24
 0: cs:Connected st:Secondary/Secondary ld:Inconsistent
    ns:0 nr:0 dw:0 dr:0 al:0 bm:1194 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured

Make debian40drbd-1 primary
$drbdadm -- --do-what-I-say primary all
$drbdadm -- connect all

Check status

debian40drbd-1:/etc# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root@debian40drbd-1, 2007-07-17 17:05:40
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:608348 nr:0 dw:0 dr:687168 al:0 bm:1235 lo:0 pe:35 ua:39 ap:0
    [=>..................] sync'ed:  6.3% (8868/9462)M
    finish: 0:13:57 speed: 10,844 (10,308) K/sec
 1: cs:Unconfigured

Once the initial block-level replication is done:

debian40drbd-2:/# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root@debian40drbd-2, 2007-07-17 17:02:24
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:0 nr:9767544 dw:9767544 dr:0 al:0 bm:1791 lo:0 pe:0 ua:0 ap:0
 1: cs:Unconfigured

Create the directory that will hold the NFS data (the mount point is needed on both servers)
$mkdir /data

On server debian40drbd-1
$mkfs.ext3 /dev/drbd0
..
mke2fs 1.40-WIP (14-Nov-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
1221600 inodes, 2441872 blocks
122093 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2503999488
75 block groups
32768 blocks per group, 32768 fragments per group
16288 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 35 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
..
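
The mkfs output mentions the periodic check schedule; disabling it with tune2fs is optional, but it avoids a surprise fsck delaying a takeover:

$tune2fs -c 0 -i 0 /dev/drbd0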

On server debian40drbd-1
$/etc/init.d/nfs-kernel-server stop
$mount -t ext3 /dev/drbd0 /data
$mv /var/lib/nfs/ /data/
$ln -s /data/nfs/ /var/lib/nfs
$mkdir /data/export
$umount /data

On server debian40drbd-2
$rm -fr /var/lib/nfs/
$ln -s /data/nfs/ /var/lib/nfs

Some error messages will appear; ignore them for now. On debian40drbd-2 the symlink points at /data/nfs, which only exists while /data is mounted there during a failover; keeping /var/lib/nfs on the DRBD device lets the NFS client state follow the active node.

Install Linux HA

$aptitude install heartbeat
...
Setting up heartbeat (1.2.5-3) ...
Heartbeat not configured: /etc/ha.d/ha.cf not found.
Heartbeat failure [rc=1]. Failed.
...

So now we need to set up the config files
$less /etc/ha.d/README.config
...
You need three configuration files to make heartbeat happy, and they all go in this directory.
They are:
ha.cf Main configuration file
haresources Resource configuration file
authkeys Authentication information

:q

On debian40drbd-1 and -2
$vi /etc/ha.d/ha.cf
logfacility local0
keepalive 2
#deadtime 30 # USE THIS in production!
deadtime 10 # short deadtime for lab testing
bcast eth1
node debian40drbd-1 debian40drbd-2

:wq

On debian40drbd-1 and -2
$vi /etc/ha.d/haresources

debian40drbd-1 IPaddr::192.168.200.60/24/eth1 drbddisk::r0 \
    Filesystem::/dev/drbd0::/data::ext3 nfs-kernel-server

:wq

Heartbeat starts these resources left to right (cluster IP first, NFS server last) and stops them in the reverse order on failover.

On debian40drbd-1 and -2
$vi /etc/ha.d/authkeys
auth 3
3 md5 %¤&%*`2¤%&%35ER;er.,,wrw!"##&%¤#%¤%

:wq

Make authkeys accessible by root only
$chmod 600 /etc/ha.d/authkeys

Now we can start our daemons..

$invoke-rc.d drbd start
$invoke-rc.d heartbeat start
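
On whichever node heartbeat makes active, the cluster address 192.168.200.60 should now show up on the public NIC (heartbeat 1.x adds it as an interface alias, typically eth1:0):

$ifconfig -a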

If there are no errors, reboot each server

Check /proc/drbd after each server is up again
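
The export can also be checked from the client side before mounting (showmount comes with the nfs-common package):

$showmount -e 192.168.200.60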

Now I test mounting the NFS share from another Debian server:
$mount 192.168.200.60:/data/export /data
$df -h
...
192.168.200.60:/data/export 9.2G 150M 8.6G 2% /data
...

Looks good!
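
If the client should mount the share at boot, an /etc/fstab line along these lines will do; a hard mount makes the client block and retry during a failover instead of returning I/O errors:

192.168.200.60:/data/export /data nfs rw,hard,intr 0 0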

Now I want to stream some data onto debian40drbd-1 and -2
debian01:~# cat /dev/urandom > /data/urandomseed.testfile

And yes! The HDD lights on both servers blink in concert, replicating data blocks in real time..

So, now a user accidentally powers off debian40drbd-1.. what happens to my stream?

Let's see..

On debian40drbd-2, I install tcpdump and look for new traffic arriving after debian40drbd-1 goes down..

$tcpdump -i eth1 -w hafailover.pcap not tcp port 22
$tcpdump -nn -r hafailover.pcap | less
..
14:45:20.474863 IP 192.168.200.20.32778 > 192.168.200.255.694: UDP, length 146
14:45:21.692595 IP 192.168.200.40.32781 > 192.168.200.255.694: UDP, length 145
14:45:22.474836 IP 192.168.200.20.32778 > 192.168.200.255.694: UDP, length 146
14:45:23.692768 IP 192.168.200.40.32781 > 192.168.200.255.694: UDP, length 145
14:45:25.696681 IP 192.168.200.40.32781 > 192.168.200.255.694: UDP, length 145
14:45:27.696859 IP 192.168.200.40.32781 > 192.168.200.255.694: UDP, length 145
14:45:29.697120 IP 192.168.200.40.32781 > 192.168.200.255.694: UDP, length 145
14:45:31.697129 IP 192.168.200.40.32781 > 192.168.200.255.694: UDP, length 145
14:45:33.640233 IP 192.168.200.40.32781 > 192.168.200.255.694: UDP, length 171
14:45:33.697822 IP 192.168.200.40.32781 > 192.168.200.255.694: UDP, length 145
14:45:34.302344 arp who-has 192.168.200.60 tell 192.168.200.60
14:45:34.624485 IP 192.168.200.10.3892429578 > 192.168.200.60.2049: 1472 write [nfs]
14:45:34.624602 IP 192.168.200.10 > 192.168.200.60: udp
14:45:34.624723 IP 192.168.200.10 > 192.168.200.60: udp
14:45:34.624845 IP 192.168.200.10 > 192.168.200.60: udp
...
Host debian40drbd-1 goes down here
...
14:45:34.627062 IP 192.168.200.10 > 192.168.200.60: udp
14:45:34.627088 IP 192.168.200.10 > 192.168.200.60: udp
14:45:34.629014 arp who-has 192.168.200.10 tell 192.168.200.60
14:45:34.629163 arp reply 192.168.200.10 is-at 00:40:63:e5:17:92
14:45:34.629185 IP 192.168.200.60 > 192.168.200.10: ICMP 192.168.200.60 udp port 2049 unreachable, length 556
14:45:34.813261 arp reply 192.168.200.60 is-at 00:b0:d0:d5:12:7f
14:45:35.325226 arp who-has 192.168.200.60 tell 192.168.200.60
14:45:35.504480 IP 192.168.200.10.3909206794 > 192.168.200.60.2049: 1472 write [nfs]
14:45:35.504584 IP 192.168.200.10 > 192.168.200.60: udp
14:45:35.504708 IP 192.168.200.10 > 192.168.200.60: udp
14:45:35.504833 IP 192.168.200.10 > 192.168.200.60: udp
...
Now debian40drbd-2 receives the data

I power up debian40drbd-1, log in and check the DRBD status

debian40drbd-1:~# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root@debian40drbd-1, 2007-07-17 17:05:40
 0: cs:SyncSource st:Primary/Secondary ld:Consistent
    ns:274972 nr:0 dw:4 dr:275901 al:0 bm:274 lo:199 pe:61 ua:227 ap:0
    [=====>..............] sync'ed: 26.5% (776472/1051200)K
    finish: 0:01:13 speed: 10,508 (7,424) K/sec

And after a while the systems are synchronised again!

debian40drbd-2:~# cat /proc/drbd
version: 0.7.21 (api:79/proto:74)
SVN Revision: 2326 build by root@debian40drbd-2, 2007-07-17 17:02:24
 0: cs:Connected st:Secondary/Primary ld:Consistent
    ns:920048 nr:104 dw:104 dr:920048 al:0 bm:220 lo:0 pe:0 ua:0 ap:0

So now I know my NFS storage cluster can tolerate one host failure

TODO
Set up DRBD between debian40drbd-1 and -2 as above, but export the /dev/drbd0 device to the clients with iSCSI Enterprise Target instead of NFS. Use heartbeat to control the iSCSI target and DRBD. Clients access the exported device with an iSCSI initiator and format it with GFS or OCFS2 (which allow concurrent access from multiple clients).