DRBD and corosync/pacemaker without tears.

dicko

#1
I wondered which forum to start this in. I decided on here, as there is no feeling of lack of security greater than realizing that a catastrophic failure of your asterisk server will take maybe half an hour to restore and piss off a lot of people (and that only if you are a wise virgin and have a handy mondorestore image, and a trusted wet-ware unit on site).

Installing and using heartbeat/drbd has been covered well here, but heartbeat is dead, long live pacemaker/corosync.

This is really just an extension of the previous discussions we have had here on HA solutions, and probably not for the newbie yet. Hopefully by the end of this discussion we will have developed a script to largely automate the process.

My friend tylerd (who prompted me to start this thread) has had problems with drbd, not so much myself but the next few bits will help with the setup.

A couple of tips: don't use volume labels on your partitions, and LVM adds a level of complexity that is unnecessary with SAN-type deployments. Consider using RAID via mdadm from the beginning if you have the inclination, so add another HDD (never use motherboard-based RAID, those really aren't RAID controllers). For ease of use, use two identical platforms for your eventual system. You can eventually reuse the production machine, and we will do all the prep stuff on one side, so maybe get another hardware platform that matches what you already have, if you consider it adequate.

Step 1. Make a mondoarchive of your working machine.

Step 2. Restore it to the new hardware but change the partitioning to something like

P1 100M /boot bootable
P2 10000M / (should be more than enough for the OS)
P3 1.5 x memory as swap
P4 rest of it extended, where we will put the DRBD disks
P5-Pn The DRBD partitions, maybe r0 = 50000M for starters
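
The 1.5 x memory figure for P3 can be worked out on the box itself. A minimal sketch, assuming a Linux /proc/meminfo; the function name swap_mb is mine and not part of any tool:

```shell
#!/bin/sh
# swap_mb KB: print 1.5 x the given memory size (in kB) as whole megabytes.
swap_mb() {
    echo $(( $1 * 3 / 2 / 1024 ))
}

# Read installed memory (kB) from /proc/meminfo when it is available.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    [ -n "$mem_kb" ] && echo "suggested swap: $(swap_mb "$mem_kb")M"
fi
```

Integer arithmetic rounds down, which is close enough for sizing a partition.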

Again, if you choose RAID 1 or whatever, build your arrays before restoring the data. Another tip is to start mondorestore in expert mode and

export TERM=vt100

first, or you will go cross-eyed. While you're in the chroot environment to do the initrd thingy, set up the IPs appropriately, as you will be rebooting into the same network, and comment out all the HWADDR=X lines in the /etc/sysconfig/network-scripts/ifcfg-eth? files for later ease (don't forget these are hardlinked files, so choose your editor carefully so you don't break the links).
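
If you would rather not trust an editor with the hard links, the HWADDR lines can be commented out from the shell. This is only a sketch: comment_hwaddr is a hypothetical helper name, and it assumes the lines start with HWADDR= as in the stock files.

```shell
#!/bin/sh
# comment_hwaddr FILE: comment out HWADDR= lines without replacing the file.
# The edited copy is written back through the existing inode (cat > FILE),
# so any hard links to FILE keep pointing at the same content.
comment_hwaddr() {
    tmp=$(mktemp) || return 1
    sed 's/^HWADDR=/#HWADDR=/' "$1" > "$tmp" && cat "$tmp" > "$1"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage inside the chroot (path from the post):
# for f in /etc/sysconfig/network-scripts/ifcfg-eth?; do comment_hwaddr "$f"; done
```

The detour through a temp file is deliberate: sed -i writes a new file and renames it over the old one, which is exactly what breaks the links.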

Reboot the cloned machine.

Now the easy bit

download the latest from

http://oss.linbit.com/drbd-mc/

(there is even a boobtube video to help us boobs)

if you are NIC impoverished I suggest you

yum -y install vconfig

and set up your VLANs to suit (512 for phones (cisco standard), perhaps 256 for management, 128 for corosync/drbd). I need the management VLAN because in my deployments there is only one WAN address presented to the innertubes. Many will be using redphone for redundancy; if so, don't forget to account for that layer two traffic.
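
For the NIC-impoverished case, bringing up the corosync/drbd VLAN by hand might look roughly like this. It is a sketch only: eth0, the netmask and the helper names run and add_vlan are my assumptions, the 192.168.1.1 address is borrowed from the drbd.conf in this post, and nothing is executed unless you set DRY_RUN=0.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to apply.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# add_vlan IFACE VLAN_ID ADDRESS NETMASK
# Creates IFACE.VLAN_ID (for example eth0.128) and brings it up.
add_vlan() {
    run vconfig set_name_type DEV_PLUS_VID_NO_PAD
    run vconfig add "$1" "$2"
    run ifconfig "$1.$2" "$3" netmask "$4" up
}

# VLAN 128 for corosync/drbd; 256 and 512 follow the same pattern
# with addresses of your own choosing.
add_vlan eth0 128 192.168.1.1 255.255.255.0
```

Interfaces created this way do not survive a reboot, so treat it as a way to test the layout before making it permanent.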

The DRBD-MC will prompt you for getting drbd and the repos necessary for corosync/pacemaker, making it almost easy (apart from reading the FM's :) ) and present you with a basically functional /etc/drbd.conf file.

mine for reference :-
-------------------------------------
cat /etc/drbd.conf
#
# please have a a look at the example configuration file in
# /usr/share/doc/drbd83/drbd.conf
#

common {
    syncer { rate 4000; }
    # net {
    #     sndbuf-size 256k;
    #     max-buffers 256;
    # }
    disk {
        no-disk-barrier;
        no-disk-flushes;
        no-md-flushes;
    }
}

resource "r0" {
    protocol A;
    startup { wfc-timeout 3; degr-wfc-timeout 5; }
    syncer { rate 4000; }

    handlers {
        split-brain "/usr/lib/drbd/notify-split-brain.sh definitelynotme@compuserve.com";
    }

    net {
        after-sb-0pri discard-younger-primary;
        after-sb-1pri discard-secondary;
        after-sb-2pri call-pri-lost-after-sb;
    }

    on alex.local {
        device /dev/drbd0;
        disk /dev/md3;
        address 192.168.1.1:7789;
        meta-disk internal;
    }

    on zoe.local {
        device /dev/drbd0;
        disk /dev/md3;
        address 192.168.1.2:7789;
        meta-disk internal;
    }
}


------------------------------------------------------

Here I am using RAID arrays (/dev/md3) as the backing disks, but you get the idea I hope. Don't be too aggressive with the rate or you might see load averages going through the roof on a resync.

Now you will need to force promote the drbd disk to primary and mkfs it.
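
For reference, the force-promote and mkfs step might look like this with the DRBD 8.3 tools the config above points at. Treat it as a sketch under assumptions: ext3 is my choice of filesystem, run and init_drbd are hypothetical helper names, DRBD-MC may already have done the first two commands for you, and nothing is executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed. Set DRY_RUN=0 to apply.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# init_drbd RESOURCE DEVICE (DRBD 8.3 command syntax)
init_drbd() {
    run drbdadm create-md "$1"   # write the internal metadata
    run drbdadm up "$1"          # attach the backing disk and try to connect
    # Force this node to primary; this overwrites whatever the peer holds.
    run drbdadm -- --overwrite-data-of-peer primary "$1"
    run mkfs.ext3 "$2"           # assumption: ext3 on the replicated device
}

init_drbd r0 /dev/drbd0
```

The mkfs goes on /dev/drbd0, never on the backing /dev/md3, otherwise DRBD never sees the writes.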

Do all the tar/rsync/ ln -s stuff on your identified directory structures as previously discussed, into the replicated partition.
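
The copy-and-symlink shuffle for one directory can be sketched as below. Assumptions: the replicated filesystem is already mounted (the /replica mount point in the usage line is hypothetical), the services using the directory are stopped, and I use cp -a here rather than tar or rsync. The helper name relocate is mine.

```shell
#!/bin/sh
# relocate DIR MOUNTPOINT: copy DIR onto the replicated filesystem mounted at
# MOUNTPOINT, keep the original as DIR.orig, and leave a symlink in its place.
relocate() {
    src=${1%/}
    dest="$2$src"
    [ -e "$dest" ] && return 1              # refuse to overwrite an earlier copy
    mkdir -p "$(dirname "$dest")" || return 1
    cp -a "$src" "$dest" || return 1        # preserve owners, modes and times
    mv "$src" "$src.orig" || return 1       # keep the original until you are happy
    ln -s "$dest" "$src"
}

# Usage (stop asterisk first):
# relocate /var/lib/asterisk /replica
```

Keeping DIR.orig around gives you a way back if the replicated copy turns out to be incomplete.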

A simple script I use to keep the cloned machine up to date follows (8.8.8.4 being your production server (google were nice enough to let me use one of their name servers for the weekend), 2222 the port you run ssh on, and the assumption that you are using key authentication).


service httpd stop
service hylafax stop
service asterisk stop
service mysqld stop

rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/www/ /var/www/
rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/lib/mysql/ /var/lib/mysql/
rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/spool/hylafax/ /var/spool/hylafax/
#rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/etc/sv/ /etc/sv/
rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/etc/iaxmodem/ /etc/iaxmodem/

service mysqld start
service hylafax start
service asterisk start
service httpd start



rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/tftpboot/ /tftpboot/
rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/lib/asterisk/backups/ /var/lib/asterisk/backups/
rsync -avH --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/spool/asterisk/ /var/spool/asterisk/

of course add postfix and whatever you need for your deployment.


-----------------------------------------------------------------

The prettiest bit of all is to use the java console to start and stop all the replicated disks, and to add a group to do the IP/gateway/services starting and stopping.
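
For anyone who prefers the crm shell to the java console, the same disk-plus-group arrangement could be written roughly as follows. This is a sketch, not what the console generates: the resource names, the 192.168.1.10 service address, the /replica mount point, the ext3 filesystem type and the /24 netmask are all placeholders of mine, and only asterisk is shown as a service. The script just prints the fragment.

```shell
#!/bin/sh
# pbx_cib VIP MOUNTPOINT: print a crm shell configuration for resource r0
# on /dev/drbd0. All resource names below are hypothetical.
pbx_cib() {
    cat <<EOF
primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource="r0" op monitor interval="15s"
ms ms_drbd_r0 p_drbd_r0 meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
primitive p_fs ocf:heartbeat:Filesystem params device="/dev/drbd0" directory="$2" fstype="ext3"
primitive p_ip ocf:heartbeat:IPaddr2 params ip="$1" cidr_netmask="24"
primitive p_asterisk lsb:asterisk
group g_pbx p_fs p_ip p_asterisk
colocation c_pbx_on_drbd inf: g_pbx ms_drbd_r0:Master
order o_drbd_before_pbx inf: ms_drbd_r0:promote g_pbx:start
EOF
}

# Print the fragment for review; nothing is loaded into the cluster here.
pbx_cib 192.168.1.10 /replica
```

Once it looks right, save the output to a file and feed it in with crm configure load update FILE. The colocation and order lines are what keep the group on whichever node holds the DRBD master, and only after promotion.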

When you're happy it works, do another mondoarchive of alex and clone it to zoe, then change the hostname in /etc/sysconfig/network and the IP address in the ifcfg-eth? files.

O.K. it's easier to do the cloning into two machines at the same time but I'm a minimalist

awaiting input . . . . .

and the day you can run crm_mon on the standby machine and watch what happens when you pull the plug on the primary machine.


another tip unless you are fluent in vi, add

export EDITOR=mcedit

to your .bashrc file for when you want to hand edit the xml config file from the crm shell

dicko
 

ramoncio

#2
Great info, thank you master!!!

I don't have much time lately, but I'll try to pull this up in my to-do list.
 

Lee Sharp

#3
So would you recommend this for a beginner? ;)

In all seriousness, it is a tad complex for most people. If I did a howto for a semi-regular backup to a cold standby, do you think it would be handy?
 

dicko

#4
Lee Sharp said:
So would you recommend this for a beginner? ;)

In all seriousness, it is a tad complex for most people. If I did a howto for a semi-regular backup to a cold standby, do you think it would be handy?
Lee: I believe you might be negligent in not actually reading my post before replying, I believe I stated quite specifically

.
.
and probably not for the newbie yet.
.
.
.

your offer to write the script is unnecessary, it's already built into freepbx 2.8

http://www.freepbx.org/news/2010-05-30/ ... nd-restore

true Elastix doesn't officially support 2.8 but . . . it should

Whatever your bias, the original post was intended to re-incite interest in a true high availability solution, which many want, some need, and a few have failed to realize as yet. Agreed, many will be truly happy with a magic-jack :) If this thread is not for the current reader, then please move right on, there's nothing to see here.

However, anybody, please feel free to constructively add to the discussion.

dicko
 

Lee Sharp

#5
Actually, my post was in jest. :) I was also looking for something in between HA and an Elastix backup. (That doesn't get all the other config changes like the firewall, for example.) And what do you know, you got a link. :) Thanks!
 
