DRBD and corosync/pacemaker without tears.

Discussion in 'General' started by dicko, Dec 5, 2010.

  1. dicko

    I wondered which forum to start this in. I decided on here, as nothing makes you feel less secure than realizing that a catastrophic failure of your Asterisk server will take maybe half an hour to restore and piss off a lot of people (and that only if you are a wise virgin with a handy mondorestore image and a trusted wet-ware unit on site).

    Installing and using heartbeat/drbd has been covered well here, but heartbeat is dead, long live pacemaker/corosync.

    This is really just an extension of the previous discussions we have had here on HA solutions, and probably not for the newbie yet. Hopefully by the end of this discussion we will have developed a script that largely automates the process.

    My friend tylerd (who prompted me to start this thread) has had problems with drbd. I haven't so much myself, but the next few bits should help with the setup.

    A couple of tips: don't use volume labels on your partitions, and LVM adds a level of complexity that is unnecessary with SAN-type deployments. Consider using RAID/mdadm from the beginning if you have the inclination, so add another HDD (never use motherboard-based RAIDs; they really aren't RAID controllers). For ease of use, use two identical platforms for your eventual system. You can eventually reuse the production machine, and we will do all the prep work on one side, so maybe get another hardware platform that matches what you have already, if you consider it adequate.

    Step 1. Make a mondoarchive of your working machine.

    Step 2. Restore it to the new hardware but change the partitioning to something like

    P1 100M /boot bootable
    P2 10000M / (should be more than enough for the OS)
    P3 1.5 x memory as swap
    P4 rest of it extended, where we will put the DRBD disks
    P5-Pn The DRBD partitions, maybe r0 = 50000M for starters
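
    As a small worked example of the "1.5 x memory" rule for P3, here is a sketch that turns a memory size in kB (the unit /proc/meminfo reports) into a swap size in MB. The function name swap_mb_for is made up for illustration.

```shell
# swap_mb_for MEM_KB: print 1.5 x MEM_KB, converted to whole MB.
# (Hypothetical helper; integer arithmetic only.)
swap_mb_for() {
    echo $(( $1 * 3 / 2 / 1024 ))
}

# On a live box you could feed it the real figure:
#   swap_mb_for "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
```

    For a machine with 4 GB of RAM (4194304 kB) this prints 6144, i.e. a 6 GB swap partition.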

    Again, if you choose RAID 1 or whatever, build your arrays before restoring the data. Another tip is to start mondorestore in expert mode and run

    export TERM=vt100

    first, or you will go cross-eyed. While you're in the chroot environment to do the initrd thingy, set up the IPs appropriately, as you will be rebooting into the same network, and comment out all the HWADDR=X lines in the /etc/sysconfig/network-scripts/ifcfg-eth? files for later ease (don't forget these are hardlinked files, so choose your editor carefully so you don't break the links).
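
    If you would rather not trust an editor with the hardlinks, one way is to do the edit from the shell. This is only a sketch: the function name comment_out_hwaddr is made up, and it assumes the HWADDR lines start at the beginning of the line (optionally after whitespace). It avoids sed -i on purpose, because sed -i writes a new file and renames it over the old one, which is exactly what breaks a hard link.

```shell
# Comment out HWADDR= lines in FILE without replacing its inode,
# so every hard link to FILE sees the edited content.
comment_out_hwaddr() {
    file=$1
    tmp=$(mktemp) || return 1
    # Prefix each uncommented HWADDR= line with '#', writing to a scratch file.
    if ! sed 's/^\([[:space:]]*\)HWADDR=/\1#HWADDR=/' "$file" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Write back THROUGH the existing file instead of renaming over it.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example, run as root inside the chroot:
#   for f in /etc/sysconfig/network-scripts/ifcfg-eth?; do
#       comment_out_hwaddr "$f"
#   done
```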

    Reboot the cloned machine.

    Now the easy bit

    download the latest from

    http://oss.linbit.com/drbd-mc/

    (there is even a boobtube video to help us boobs)

    if you are NIC impoverished I suggest you

    yum -y install vconfig

    and set up your VLANs to suit (512 for phones (the Cisco standard), perhaps 256 for management and 128 for corosync/drbd). I need the management VLAN because in my deployments there is only one WAN address presented to the innertubes. Many will be using redphone for redundancy; if so, don't forget to account for that layer two traffic.
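
    For a quick test, vconfig add eth0 128 creates eth0.128 on the spot. To make it survive a reboot on a Red Hat style system you want an ifcfg file with VLAN=yes. The sketch below just prints such a file; the function name vlan_ifcfg, the parent interface eth0 and the addresses are assumptions for illustration, so substitute your own.

```shell
# Print the body of an ifcfg file for a tagged VLAN sub-interface.
# usage: vlan_ifcfg PARENT VLAN_ID IPADDR NETMASK   (hypothetical helper)
vlan_ifcfg() {
    printf 'DEVICE=%s.%s\n' "$1" "$2"
    printf 'VLAN=yes\n'
    printf 'BOOTPROTO=none\n'
    printf 'ONBOOT=yes\n'
    printf 'IPADDR=%s\n' "$3"
    printf 'NETMASK=%s\n' "$4"
}

# Example for the corosync/drbd VLAN (addresses are placeholders):
#   vlan_ifcfg eth0 128 192.168.1.1 255.255.255.0 \
#       > /etc/sysconfig/network-scripts/ifcfg-eth0.128
```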

    The DRBD-MC will prompt you to fetch drbd and the repos necessary for corosync/pacemaker, making it almost easy (apart from reading the FMs :) ), and will present you with a basically functional /etc/drbd.conf file.

    mine for reference :-
    -------------------------------------
    cat /etc/drbd.conf
    #
    # please have a a look at the example configuration file in
    # /usr/share/doc/drbd83/drbd.conf
    #

    common {
        syncer { rate 4000; }
        # net {
        #     sndbuf-size 256k;
        #     max-buffers 256;
        # }
        disk {
            no-disk-barrier;
            no-disk-flushes;
            no-md-flushes;
        }
    }

    resource "r0" {
        protocol A;
        startup { wfc-timeout 3; degr-wfc-timeout 5; }
        syncer { rate 4000; }

        handlers {
            split-brain "/usr/lib/drbd/notify-split-brain.sh definitelynotme@compuserve.com";
        }

        net {
            after-sb-0pri discard-younger-primary;
            after-sb-1pri discard-secondary;
            after-sb-2pri call-pri-lost-after-sb;
        }

        on alex.local {
            device /dev/drbd0;
            disk /dev/md3;
            address 192.168.1.1:7789;
            meta-disk internal;
        }

        on zoe.local {
            device /dev/drbd0;
            disk /dev/md3;
            address 192.168.1.2:7789;
            meta-disk internal;
        }
    }


    ------------------------------------------------------

    Here I am using RAID devices (/dev/md3), but you get the idea, I hope. Don't be too aggressive with the sync rate or you might see load averages going through the roof on a resync.

    Now you will need to force promote the drbd disk to primary and mkfs it.
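
    A minimal sketch of that step with the DRBD 8.3 command-line tools, assuming resource r0 and /dev/drbd0 from the config above and ext3 as the filesystem (pick your own; none is named here). The run wrapper, the DRY_RUN variable and the two function names are illustrative helpers, not part of DRBD. By default the commands are only printed, so nothing is touched until you set DRY_RUN=0.

```shell
DRY_RUN=${DRY_RUN:-1}

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on BOTH nodes: write the metadata and bring the resource up.
drbd_prepare() {
    run drbdadm create-md "$1"
    run drbdadm up "$1"
}

# Run on ONE node only: declare its data good, promote it, make a filesystem.
# This overwrites whatever the peer holds for this resource.
drbd_force_primary() {
    run drbdadm -- --overwrite-data-of-peer primary "$1"
    run mkfs.ext3 "$2"
}

# Example:
#   drbd_prepare r0                      # on alex and on zoe
#   drbd_force_primary r0 /dev/drbd0     # on alex only
```

    You can follow the initial sync with cat /proc/drbd on either node.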

    Do all the tar/rsync/ln -s work to move your identified directory structures into the replicated partition, as previously discussed.

    A simple script I use to keep the cloned machine up to date follows. Here 8.8.8.4 stands in for your production server (Google were nice enough to let me use one of their name servers for the weekend), 2222 is the port you run ssh on, and it assumes you are using key authentication.


    service httpd stop
    service hylafax stop
    service asterisk stop
    service mysqld stop

    rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/www/ /var/www/
    rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/lib/mysql/ /var/lib/mysql/
    rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/spool/hylafax/ /var/spool/hylafax/
    #rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/etc/sv/ /etc/sv/
    rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/etc/iaxmodem/ /etc/iaxmodem/

    service mysqld start
    service hylafax start
    service asterisk start
    service httpd start



    rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/tftpboot/ /tftpboot/
    rsync -av --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/lib/asterisk/backups/ /var/lib/asterisk/backups/
    rsync -avH --progress --delete -e 'ssh -p 2222' 8.8.8.4:/var/spool/asterisk/ /var/spool/asterisk/

    of course add postfix and whatever you need for your deployment.
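
    If the list of directories grows, the repeated rsync lines can be folded into a helper so the host and port live in one place. This is just a sketch: sync_dir, SRC_HOST, SSH_PORT and DRY_RUN are made-up names, and with DRY_RUN=1 (the default here) it only prints the command it would run. The printed line loses the quoting around the ssh option, so treat it as a display, not something to paste.

```shell
SRC_HOST=${SRC_HOST:-8.8.8.4}   # production server, as in the script above
SSH_PORT=${SSH_PORT:-2222}      # port sshd listens on
DRY_RUN=${DRY_RUN:-1}           # 1 = print only, 0 = really sync

# usage: sync_dir DIR [extra rsync flags...]
# Mirrors DIR from SRC_HOST onto the same path locally.
sync_dir() {
    dir=$1
    shift
    set -- rsync -av --progress --delete "$@" \
        -e "ssh -p $SSH_PORT" "$SRC_HOST:$dir" "$dir"
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Example:
#   sync_dir /var/www/
#   sync_dir /var/spool/asterisk/ -H    # -H preserves hard links
```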


    -----------------------------------------------------------------

    The prettiest bit of all is to use the Java console to start and stop all the replicated disks, and to add a group that handles starting and stopping the IP/gateway/services.
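
    For those who prefer to see what the console ends up building, here is a rough sketch of an equivalent crm shell configuration, wrapped in a shell function that only prints it. Every resource name (p_drbd_r0, g_pbx and so on), the mount point /mnt/r0, the ext3 filesystem type and the floating address 192.168.1.10 are assumptions for illustration; only r0 and /dev/drbd0 come from the drbd.conf above.

```shell
# Print a crm shell configuration: a DRBD master/slave resource plus a
# group (filesystem, floating IP, asterisk) that follows the DRBD master.
# All names and addresses are placeholders.
pbx_crm_config() {
    printf '%s\n' \
        'primitive p_drbd_r0 ocf:linbit:drbd \' \
        '    params drbd_resource="r0" \' \
        '    op monitor interval="15s"' \
        'ms ms_drbd_r0 p_drbd_r0 \' \
        '    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"' \
        'primitive p_fs_r0 ocf:heartbeat:Filesystem \' \
        '    params device="/dev/drbd0" directory="/mnt/r0" fstype="ext3"' \
        'primitive p_ip ocf:heartbeat:IPaddr2 \' \
        '    params ip="192.168.1.10" cidr_netmask="24"' \
        'primitive p_asterisk lsb:asterisk' \
        'group g_pbx p_fs_r0 p_ip p_asterisk' \
        'colocation c_pbx_on_drbd inf: g_pbx ms_drbd_r0:Master' \
        'order o_drbd_before_pbx inf: ms_drbd_r0:promote g_pbx:start'
}

# Example: review the output, then load it on one node:
#   pbx_crm_config > /tmp/pbx.crm
#   crm configure load update /tmp/pbx.crm
```

    The group starts its members in the order listed and stops them in reverse, which is why the filesystem comes before the IP and asterisk.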

    When you're happy it works, do another mondoarchive of alex and clone it to zoe, then change the hostname in /etc/sysconfig/network and the IP address in the ifcfg-eth? files.

    O.K. it's easier to do the cloning into two machines at the same time but I'm a minimalist

    awaiting input . . . . .

    and the day you can run crm_mon on the standby machine and watch what happens when you pull the plug on the primary machine.


    Another tip: unless you are fluent in vi, add

    export EDITOR=mcedit

    to your .bashrc file for when you want to hand edit the XML config file from the crm shell (the variable name must be upper case).

    dicko
     
  2. ramoncio

    Great info, thank you master!!!

    I don't have much time lately, but I'll try to pull this up in my to-do list.
     
  3. Lee Sharp

    So would you recommend this for a beginner? ;)

    In all seriousness, it is a tad complex for most people. If I did a howto for a semi-regular backup to a cold standby, do you think it would be handy?
     
  4. dicko

    Lee: I believe you might be negligent in not actually reading my post before replying. I believe I stated quite specifically:

    .
    .
    and probably not for the newbie yet.
    .
    .
    .

    Your offer to write the script is unnecessary; it's already built into FreePBX 2.8:

    http://www.freepbx.org/news/2010-05-30/ ... nd-restore

    true Elastix doesn't officially support 2.8 but . . . it should

    Whatever your bias, the original post was intended to re-incite interest in a true high availability solution, which many want, some need, and a few have failed to realize as yet. Agreed, many will be truly happy with a magic-jack :) If this thread is not for the current reader, then please move right on, there's nothing to see here.

    However, anybody, please feel free to constructively add to the discussion.

    dicko
     
  5. Lee Sharp

    Actually, my post was in jest. :) I was also looking for something in between HA and an Elastix backup. (That doesn't get all the other config changes like the firewall, for example.) And what do you know, you got a link. :) Thanks!
     
