Elastix, Alta Disponibilidad en detalle

Taken from:

Elastix Application Note #20140405:

 

These application notes are intended to be a guide to implementing features or extending the features of the Elastix IP PBX system.

Whilst many (but not all) guides available are basically a random collection of notes, usually made while someone is implementing a feature for themselves, these guides are meant to be a more definitive reference, tested in a lab with specific equipment and particular versions of Elastix.

Author: Bob Fryer
Date Document Written: 5th April 2014
Date of Last Revision: 5th April 2014
Revision: 1.0
Replaces Document: N/A
Tested on Elastix Version: 2.4 / 2.5
Backward Compatible: Yes, from 2.2 onwards, but not tested
Elastix Level: Intermediate to Experienced
Linux Level: Intermediate to Experienced
Network Level: Intermediate
Latest Document Source available from: www.elastixconnection.com
Sponsored By: BluePackets, Canberra, Australia
Licence: GNU FDL

Contents

Foreword


Finding information on the Internet can be haphazard due to the lack of document version control and the lack of attention to software versions, and in some cases the information is simply wrong. Then there are cross-pollination issues, where a guide was written for another distribution and may or may not be applicable to your Elastix system.

You will note that the front page of every application note written in this way carries an easy-to-read summary: the Elastix version it was tested on, when the document was written, whether it is backward compatible, and the level of expertise needed to accomplish the implementation.

These application notes are written up and tested in a lab that has been specially setup to write these notes. This includes:

  • 5 x Elastix IP PBX Hardware with a mixture of SIP only, Digium, Sangoma, OpenVox Cards
  • 1 x WAN Simulator (including latency, jitter, random disconnects, random packet drop)
  • 8 x Consumer / Business routers, including Drayteks, Cisco 1842, Cisco 877, Linksys WRT54GL
  • 2 x IBM XSeries servers running VMware with 8 images of various versions of Elastix IP PBX
  • 1 x Standard Microsoft SBS Network providing DHCP and DNS and Mail system
  • 2 x Linux Servers

The Elastix IP PBX systems, both hardware and virtual, are re-imaged between tests to limit contamination from earlier testing. Combined with a range of phones from Aastra, Linksys, Cisco and Yealink, this provides a reasonable cross-section of typical systems currently in the field.

These application notes are not done in isolation either. Behind them are 6-7 years of commercial implementation of IP PBX systems utilising these methods and concepts. The lab is just used to reconfirm the implementation in a non-production environment.

How you use these application notes is entirely up to you. However, it is highly recommended that, in the first instance, you follow the notes and configurations in their entirety (except for IP addresses, of course). If you follow them exactly, it will be easier for others to assist you when you do have an issue.

Introduction

I have tried to make this Application note as comprehensive as possible and it has been tested on two test systems as well as two production systems utilising this Application note as a guide, checking each step. I have tried to explain what we are doing at each step so that you have an idea of the stage you are up to and what you are expecting to see.

There are many variables in this installation, so it may be different on your system, but if you are running Elastix 2.4/2.5 with SATA hard drives, you should find that this guide matches quite well.

If you do run into error messages which I have not addressed, do a Google search; in most cases it will turn up an answer, as Linux HA has been around for many years and is well documented.

High availability has been around for Linux based systems for many years. DRBD is pretty well the industry standard for HA solutions in Linux systems.

As most of the texts say, imagine it as a Mirrored Raid for your Linux System (in other words your Elastix System).

There are quite a few guides out there, but the majority start with the default DRBD diagram and then tell you, line by line, what commands to type. With a High Availability setup in particular, it is crucial to understand the concepts first, otherwise it is not worth starting. Understanding them will:

  • Allow you to identify changes that you need to make in the installation to suit your needs
  • Allow you to test at each stage
  • Allow you to understand what has broken if something does break
  • Allow you to understand what changes may be needed for other services you need to consider on your system

The fact that you are considering implementing High Availability means that you have an Elastix system that needs to be online with an absolute minimum of downtime, and that a half-baked solution will not do. This document alone should not be relied upon; it is up to you to read more on how Heartbeat and DRBD work. I will not be held responsible for any issues that you have, any downtime you might experience, or any other monetary loss. In fact, my firm recommendation is to engage a company that can provide backup support on High Availability solutions.

Now that’s out of the way, I am going to start with a simple diagram (no, not the standard one on the Internet).

elxappnote-20140405-chapter-2-1

 

One of the first things you will notice if you have looked at your default partitions on the Elastix System is that this looks different. You are absolutely right, this is not the default partition layout that 90% of Elastix builders may use (usually by selecting defaults in the Elastix install).

One of the other things you will notice is that there are no Logical Volumes (which the default Elastix install uses); these are all physical partitions.

So what does this mean, at least for this application note? It means a fresh install of Elastix with manual partitioning if you want to implement DRBD and Heartbeat. That is not to say that you cannot convert an existing system to High Availability and use Logical Volumes, but that is well past the scope of this application note. It would be almost impossible to provide a guide on converting an existing system; not that it can’t be written, but everyone’s system is different: different size hard drives, different partition sizes, services and data already populated, and most likely a system already in production. Furthermore, you would be adding a further layer of complexity, and unless you are very competent in both areas, it is not something you want to tackle.

So the main steps we are going to take are:

  • Install Elastix onto two Servers with identical size hard drives (important) utilising manual partitioning
  • Besides the standard partitions used for Elastix, we are going to create a fourth partition for the replicated data on both servers
  • We are going to copy the data we want to replicate from the root partition to the DRBD partition and provide symlinks from the root to the DRBD partition
  • We are going to install the HeartBeat software onto both Servers

A couple of things to consider whilst reading this document:

I refer regularly to the two servers as Node 1 and Node 2. Also, netintegrity.local is a local domain that I use; replace these references with your own internal domain.

Consider your Voice Provider Connections

This topic should really be at the top of this guide, as it could be a make-or-break point in your High Availability implementation, but here will do.

Predominantly, this guide assumes that you are using a SIP connection for your incoming and outgoing calls.

If you are using telephony cards, then you will need to research what can be done with your card(s) to connect a single physical connection (e.g. an E1/T1) from your carrier to two servers.

Otherwise you have to switch it over manually, which is neither practical nor fits the description of a High Availability solution with no user/client intervention. Fortunately, it is my understanding that cards from Sangoma can firstly disable the E1/T1 interface so it can be controlled by the Heartbeat service, and secondly can start up with the interface disabled (which is important). I don’t believe this capability is available with many other cards on the market, but I will leave that for you to review, as it is well past the scope of this document.

So what options are available if you need a physical interface such as an E1/T1, BRI or PSTN? You need to look at utilising gateways from vendors like Patton, Redfone, Epygi, Digium etc. These gateways connect to a physical connection from your carrier and provide a SIP connection to your Elastix system. So if you haven’t yet purchased your telephony cards and you want High Availability, a gateway is worth considering. They cost a little more than your average telephony card, but that additional cost is eliminated by the time you would otherwise spend setting up the additional scripts needed. Furthermore, many of these gateways have great diagnostics functionality, further improving the case for a gateway over a telephony card.


What Do DRBD (Distributed Replicated Block Device) and Heartbeat (HA) Do?

These are the two critical components that we will install on our Elastix Servers.

DRBD is responsible for the “mirroring” of the DRBD partition. If you think of it as similar to mirroring hard drives, you have a general idea of how it works. It watches for changes in this partition and mirrors them to the secondary server. Note that this is two-way replication, the direction depending on which server is the primary and which the secondary. In the event of a server failure (say the primary), the secondary will take up the role of primary.

Heartbeat is the application responsible for monitoring for failures, and it does a lot when a failure occurs. This includes the following:

  • Manages the Virtual IP address and points it to the primary server.
  • Stops the services that are running on the primary server (if possible)
  • Starts the services that are needed on the secondary server (e.g. Apache, Asterisk, MySQL, amongst others)
  • Instructs DRBD to recognise the secondary server as primary

Actually, some of what I have said is not 100% technically true, but for the purposes of understanding the concepts and keeping it in plain English, it gives you the idea. Before you start, or even after you have a test system running, I recommend re-reading the many DRBD guides on the Internet, as they provide more technical content; after you have done your first implementation, you will appreciate the technical detail more thoroughly.

Two Network Cards or One?

You may see many guides utilising one network card per server. That’s up to you, but most servers, particularly those used in commercial Elastix systems, come with two network cards or ports. As we are working with voice, it is better to separate the replication traffic from the voice traffic. Many generic High Availability guides show a single card, but voice can be impacted, and wherever we can reduce latency we should; so my recommendation is a second card in each server. That said, I have done it with single cards as well, and on a small system there was no impact with the bandwidth set at an appropriate level (in the configuration file).

Another thing to consider is how much you write to the hard drive. If you record every one of your calls, the amount of disk writing will be quite substantial. The replication of this data will generate a large amount of traffic, which is a case for implementing a dedicated link for DRBD traffic. Ideally this link would be a crossover Ethernet cable run directly from Ethernet port to Ethernet port. This also removes the risk of failure due to a switch failure, or a power failure on the switch.
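To put a rough number on that write rate, here is a back-of-envelope sketch. The figures are my own assumptions, not from the original note: roughly 8000 bytes per second per recorded G.711 mono call.

```shell
# Rough replication load generated by call recording.
# Assumption: ~8000 bytes/sec per recorded G.711 mono call.
calls=20
bytes_per_call=8000
echo "$((calls * bytes_per_call)) bytes/sec of recording writes"
```

Even 20 simultaneous recordings stay well below the syncer rate set later in drbd.conf, but bursts from voicemail, backups and MySQL writes still argue for the dedicated link.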

elxappnote-20140405-chapter-5-1

 

For the dual network card setup, just assign a different subnet for the replication traffic, and make sure that you make the corresponding changes in the hosts file and in /etc/drbd.conf.

Note: on the dual card setup there is no required relationship between the last octet of the addresses on the two subnets of a server, e.g. 10.0.0.231 and 172.22.22.231. Matching them just makes configuration easier.

Elastix Installation & Partitioning

The following partitioning is done on both Elastix servers. I have not included all the screenshots for the complete installation of Elastix; it is assumed that you are already familiar with the standard installation process, or if needed, the Elastix 2.x installation application note is available from www.elastixconnection.com.

Commence the installation as per normal until you get to the following screen.

elxappnote-20140405-chapter-6-1

 

You will note that I have told you to take note of the disk device. This is because you will use it in the configuration files. On the whole, most installs will be on the SDA device; however, if you have unusual or old hardware, a RAID controller or software RAID, yours may differ. So where you see SDA in this documentation, change it to suit your system.

You are now going to create a Custom Partitioning Layout (important)

elxappnote-20140405-chapter-6-2

elxappnote-20140405-chapter-6-3

elxappnote-20140405-chapter-6-4

elxappnote-20140405-chapter-6-5

elxappnote-20140405-chapter-6-6

elxappnote-20140405-chapter-6-7

elxappnote-20140405-chapter-6-8


Click OK and finish the rest of the process as per a normal install, except give your Elastix servers easy-to-recognise names. The unpartitioned space is meant to be there for the moment.

In this application note I have used host name elxnode1.netintegrity.local on the primary box and elxnode2.netintegrity.local on the secondary box.

At this point I recommend performing the rest of these configuration steps via SSH to both servers. You will probably be doing a fair bit of cut and paste, which will be a lot simpler. Having two SSH windows open allows you to perform the commands on each server.

On both the Node 1 and Node 2 servers, create the partition that will contain the replicated data using the following command {note: this is where you may need to change the device to match your system}:

[code language=”text”]
fdisk /dev/sda
[/code]

Now follow the inputs

  • Add a new partition (n)
  • Primary (p)
  • Partition number (4)
  • Press Enter until returned to the fdisk command prompt {accepting the defaults}
  • Press “t” to change the partition system ID
  • Press “4” to choose partition number 4
  • Choose HEX 83 (Linux) for the type
  • Press “w” to save the changes

Restart both servers once you have completed this on each of them. This is important.

After reboot check your partition table on both Servers using the following command

[code language=”text”]
fdisk /dev/sda
[/code]

and use p at the command prompt for Print Partition Table

elxappnote-20140405-chapter-6-9

 

It should look similar (although your hard disk size, and ultimately your final partition size, will be different), but both of your servers should have identical partition sizes.

Make sure you quit out of fdisk before the next commands

Now on each server we are going to prepare the file system on the DRBD partition

[code language=”text”]
mke2fs -j /dev/sda4
[/code]

which should look like the following

elxappnote-20140405-chapter-6-10

 

Now issue the next commands on both servers, which will make sure that the start of the partition (SDA4) is clean

[code language=”text”]
dd if=/dev/zero bs=1M count=1 of=/dev/sda4
sync
[/code]

This writes a megabyte of zeros to the start of the partition and flushes it to disk, so that no stale filesystem signatures are left behind to confuse DRBD.

Installing Heartbeat and DRBD

We are going to install Heartbeat with the following command

[code language=”text”]
yum install heartbeat drbd83 kmod-drbd83
[/code]

elxappnote-20140405-chapter-7-1

When the yum install has completed on an Elastix 2.5 system, you should see a screen similar to the above. Make sure you perform this on both systems.

Now update the /etc/hosts file and add your two servers

elxappnote-20140405-chapter-7-2

 

Getting these names correct is important; the last two lines are the ones you are adding. If you are using a separate network for the DRBD traffic, make sure that the correct IP addresses (of the second network) are in the hosts file.
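As an illustration, using this guide’s sample addresses (substitute your own), the two lines added to /etc/hosts would look like this; with a dedicated replication link these must be the second-NIC addresses:

```text
172.22.22.231   elxnode1.netintegrity.local elxnode1
172.22.22.232   elxnode2.netintegrity.local elxnode2
```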

On your Node 1 server, use your favourite editor to edit /etc/drbd.conf and add the following lines:

[code language=”text”]
global { usage-count no; }
resource repdata {
  protocol C;
  startup { wfc-timeout 10; degr-wfc-timeout 30; }
  disk { on-io-error detach; }
  net {
    after-sb-0pri discard-younger-primary;
    after-sb-1pri discard-secondary;
    after-sb-2pri call-pri-lost-after-sb;
    cram-hmac-alg "sha1";
    shared-secret "Wind2Hear2See!"; # choose your own secret!
  }
  syncer { rate 10M; }
  # set the above to 10M for a single network card, or 100M if using
  # a dedicated (second) network card for DRBD
  # make sure the name below matches your host name
  on elxnode1.netintegrity.local {
    device /dev/drbd0;
    disk /dev/sda4;
    # if using a second network card, make sure that this is the IP
    # of the second card on the Node 1 server
    address 172.22.22.231:7788;
    meta-disk internal;
  }
  # make sure the name below matches your host name
  on elxnode2.netintegrity.local {
    device /dev/drbd0;
    disk /dev/sda4;
    address 172.22.22.232:7788;
    # if using a second network card, make sure that this is the IP
    # of the second card on the Node 2 server
    meta-disk internal;
  }
}
[/code]

Copy this file from the primary server to the secondary server:

[code language=”text”]
scp /etc/drbd.conf root@elxnode2.netintegrity.local:/etc/
[/code]

Initialise the DRBD meta data area on both servers

[code language=”text”]
drbdadm create-md repdata
[/code]

You should see something similar to the following

elxappnote-20140405-chapter-7-3

 

If you don’t get something similar to the above, e.g. you see unusual parse errors, it’s possible that some of the comment lines in drbd.conf have wrapped onto two lines; just remove the comment lines.

Start the service on both servers

[code language=”text”]
service drbd start
[/code]

Which should show something similar to the following screen

elxappnote-20140405-chapter-7-4

 

Now check that the service is running and that both servers are currently secondary. Type on both servers:

[code language=”text”]
cat /proc/drbd
[/code]

You should see similar to below on both servers

elxappnote-20140405-chapter-7-5

 

You can see that both Servers are currently in secondary mode, so we need to place Node 1 Server into primary mode

On the Node 2 Server issue the following command

[code language=”text”]
drbdadm invalidate repdata
[/code]

The reason we do this is that neither server can work out which has the up-to-date data, and DRBD quite often generates an error to that effect. By issuing this command we have made it very clear that the Node 2 server does not have the up-to-date data (yes, we know they are the same, but we need to take control).

Now, on the Node 1 server, we can put it into the primary role with the following command (perform this on the Node 1 server only):

[code language=”text”]
drbdadm primary repdata
[/code]

This puts it into primary mode (otherwise you cannot set up the filesystem).

If you perform the cat /proc/drbd command again on the Node 1 server, you will see the following:

elxappnote-20140405-chapter-7-6

 

As you can see, the Node 1 server is now in the primary role. Furthermore, we can see that it is currently replicating/syncing the partition.

If you perform the command on the Node 2 Server, you will see the following

elxappnote-20140405-chapter-7-7

 

That initial sync will take a while (up to a couple of hours) depending on the speed of your system/hardware. My preference is to let it sync and perform the following steps after it has completed, as we want a working system so that we can test that replication is working correctly.
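If you would rather script the wait than re-run the command by hand, a small helper can poll the status until both sides report UpToDate. This function is my own sketch, not part of the guide; it takes the status file path as an argument (on a live node you would pass /proc/drbd):

```shell
# Poll a DRBD status file until both disk states read UpToDate.
wait_for_sync() {
    until grep -q 'ds:UpToDate/UpToDate' "$1"; do
        sleep 10
    done
}

# on a live node:
#   wait_for_sync /proc/drbd && echo "initial sync complete"
```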

Finally when it’s all synced, you should be able to perform the

[code language=”text”]
cat /proc/drbd
[/code]

on both Servers and you should see something similar to the following

elxappnote-20140405-chapter-7-8

elxappnote-20140405-chapter-7-9

 

 

Very simply, we can see ds:UpToDate/UpToDate on both servers; this tells us that they are in sync.

Now set up the file system on the Node 1 server:

[code language=”text”]
mkfs.ext3 /dev/drbd0
mkdir /repdata
mount /dev/drbd0 /repdata
[/code]

Now we will make some dummy files and place them in /repdata:

[code language=”text”]
for i in {1..5};do dd if=/dev/zero of=/repdata/file$i bs=1M count=100;done
[/code]

A word of warning: if you are curious like most, you have probably checked the directory to see if the files were created. That’s fine, but make sure you have exited the directory before you perform the next command, otherwise it will not unmount, as the /repdata directory is locked.

Now that we have done that, we are going to turn the primary system into the secondary and the secondary into the primary, using the following commands.

On the Node 1 server issue the following commands which will turn the primary into the secondary

[code language=”text”]
umount /repdata
drbdadm secondary repdata
[/code]

on the Node 2 server issue the following commands which will turn the secondary into the primary

[code language=”text”]
mkdir /repdata
drbdadm primary repdata
mount /dev/drbd0 /repdata
[/code]
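The two halves of this manual swap can be captured as a pair of functions. This is my own sketch, not from the guide; the DRBDADM, MOUNT and UMOUNT variables exist only so the sequence can be dry-run (e.g. with DRBDADM=echo) before it is trusted on live nodes:

```shell
# Manual DRBD role swap for the "repdata" resource used in this guide.
# Commands are indirected through variables so the sketch can be
# dry-run with echo before being used for real.
DRBDADM=${DRBDADM:-drbdadm}
MOUNT=${MOUNT:-mount}
UMOUNT=${UMOUNT:-umount}

demote() {                       # run on the node giving up primary
    $UMOUNT /repdata
    $DRBDADM secondary repdata
}

promote() {                      # run on the node taking over
    $DRBDADM primary repdata
    $MOUNT /dev/drbd0 /repdata
}
```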

Now, to check that the files did indeed replicate, we are going to do a directory listing.

On the Node 2 Server issue the following commands

[code language=”text”]
cd /repdata
ls
[/code]

You should see a similar screen

elxappnote-20140405-chapter-7-10

 

Now before we switch back, issue the following command

[code language=”text”]
cd /
[/code]

(This is what that earlier note was about: we cannot be in the /repdata directory, as it will cause a lock error when we try to switch back.)

Now we will switch back, but before we do, we will delete a file and add a new one on the Node 2 server, so that when the Node 1 server becomes the primary again we can confirm DRBD replicated the changes.

So on the Node 2 server issue the following commands (which delete the file called file2 and add a new file called file6):

[code language=”text”]
rm /repdata/file2
dd if=/dev/zero of=/repdata/file6 bs=100M count=2
[/code]

Then revert the roles. On the Node 2 server issue the following commands:

[code language=”text”]
umount /repdata/
drbdadm secondary repdata
[/code]

and on Node 1 Server issue the following commands

[code language=”text”]
drbdadm primary repdata
mount /dev/drbd0 /repdata
[/code]

Now on the Node 1 Server perform the following command

[code language=”text”]
cd /repdata
ls
[/code]

After the last command you should see the following on the primary node, with file2 missing and a new file6:

elxappnote-20140405-chapter-7-11

 

In summary, we have set up a DRBD partition, placed data in the partition, and manually switched the servers between the primary and secondary roles in both directions.

At this point we have set up the basics of DRBD. This is exactly where we need to be before moving on to the next stage; if it is not working for you, resolve it before moving on.

Migrating Elastix and Other directories to the replication area

Before you commence the migration of the Elastix directories: if you are going to do any yum upgrades to your Elastix system or add new features, it is recommended that you complete these before you start the migration.

At this point in time, if you want to run an Elastix system for a few weeks to make sure everything is working, and all the upgrades are completed that are needed, you can leave your system as is.

Once you commence the migration to the DRBD partition, you cannot easily perform upgrades or install add-ons that use directories which are not replicated, e.g. software that uses the /opt directory. Such software would probably exist only on the Node 1 server, and when your system cuts over you may find that functionality does not work.

If you feel that you are happy with the system, then we now need to move the files and directories we want to be replicated to the DRBD Partition. Otherwise Elastix will continue to run on the root partition and none of its files will be replicated.

We do this by moving the files and directories that we want replicated into the /repdata area. We then add symlinks in the original locations so that we don’t have to change anything in Elastix; to Elastix, all the files look like they are still where they originally were.

On the Node 1 server, issue the following commands, which will move the data to the /repdata area (the area that will sync between the servers), remove the old files and set up the symbolic links:

[code language=”text”]
cd /repdata
tar -zcvf etc-asterisk.tgz /etc/asterisk/
tar -zxvf etc-asterisk.tgz
tar -zcvf var-lib-asterisk.tgz /var/lib/asterisk/
tar -zxvf var-lib-asterisk.tgz
tar -zcvf usr-lib-asterisk.tgz /usr/lib/asterisk/
tar -zxvf usr-lib-asterisk.tgz
tar -zcvf var-www.tgz /var/www/
tar -zxvf var-www.tgz
tar -zcvf var-spool-asterisk.tgz /var/spool/asterisk/
tar -zxvf var-spool-asterisk.tgz
tar -zcvf var-lib-mysql.tgz /var/lib/mysql/
tar -zxvf var-lib-mysql.tgz
tar -zcvf var-log-asterisk.tgz /var/log/asterisk/
tar -zxvf var-log-asterisk.tgz
tar -zcvf tftpboot.tgz /tftpboot/
tar -zxvf tftpboot.tgz
rm -rf /etc/asterisk/
rm -rf /var/lib/asterisk/
rm -rf /usr/lib/asterisk/
rm -rf /var/spool/asterisk/
rm -rf /var/lib/mysql/
rm -rf /var/log/asterisk/
rm -rf /tftpboot/
rm -rf /var/www
ln -s /repdata/etc/asterisk/ /etc/asterisk
ln -s /repdata/var/lib/asterisk/ /var/lib/asterisk
ln -s /repdata/usr/lib/asterisk/ /usr/lib/asterisk
ln -s /repdata/var/spool/asterisk/ /var/spool/asterisk
ln -s /repdata/var/lib/mysql/ /var/lib/mysql
ln -s /repdata/var/log/asterisk/ /var/log/asterisk
ln -s /repdata/var/www /var/www
ln -s /repdata/tftpboot /tftpboot
[/code]
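The block above repeats one pattern per directory, so it can also be written as a small loop. This is my own restructuring, not the guide’s commands: it uses cp -a in place of the tar create/extract pair (the effect, a copy under /repdata preserving ownership and permissions, is the same), and the live invocation is left commented out so the function can be read and tested safely first. Stop the affected services before running anything like this.

```shell
# Move one directory into $REPDATA and symlink it back into place.
REPDATA=${REPDATA:-/repdata}

migrate_dir() {
    d=$1                                   # e.g. /etc/asterisk
    mkdir -p "$REPDATA$(dirname "$d")"
    cp -a "$d" "$REPDATA$(dirname "$d")/"  # copy, preserving attributes
    rm -rf "$d"
    ln -s "$REPDATA$d" "$d"
}

# On the live Node 1 server you would then run:
#   for d in /etc/asterisk /var/lib/asterisk /usr/lib/asterisk \
#            /var/spool/asterisk /var/lib/mysql /var/log/asterisk \
#            /var/www /tftpboot; do migrate_dir "$d"; done
```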

On Node 1 Server issue the following commands to bring these services to a stop.

[code language=”text”]
service mysqld stop
service asterisk stop
service httpd stop
service elastix-portknock stop
service elastix-updaterd stop
[/code]

You should see OK after each service has stopped. If you are not using a service, e.g. port-knock, then it’s fine if it fails to stop.

Note: if you have found other services that you need Heartbeat to stop and start, add them above.

Still on the Node 1 server, we are now going to manually make it the secondary:

[code language=”text”]
umount /repdata
drbdadm secondary repdata
[/code]

On the Node 2 Server, we are going to bring it up as the primary

[code language=”text”]
drbdadm primary repdata
mount /dev/drbd0 /repdata
[/code]

If you now check the /repdata directory on the Node 2 server, you will see that the files have replicated. We now need to remove the files from the root partition and provide symlinks. Issue the following commands:

[code language=”text”]
rm -rf /etc/asterisk/
rm -rf /var/lib/asterisk/
rm -rf /usr/lib/asterisk/
rm -rf /var/spool/asterisk/
rm -rf /var/lib/mysql/
rm -rf /var/log/asterisk/
rm -rf /var/www/
rm -rf /tftpboot/
ln -s /repdata/etc/asterisk/ /etc/asterisk
ln -s /repdata/var/lib/asterisk/ /var/lib/asterisk
ln -s /repdata/usr/lib/asterisk/ /usr/lib/asterisk
ln -s /repdata/var/spool/asterisk/ /var/spool/asterisk
ln -s /repdata/var/lib/mysql/ /var/lib/mysql
ln -s /repdata/var/log/asterisk/ /var/log/asterisk
ln -s /repdata/var/www /var/www
ln -s /repdata/tftpboot /tftpboot
# now stop the services
service mysqld stop
service asterisk stop
service httpd stop
service elastix-portknock stop
service elastix-updaterd stop
[/code]

Now that’s done, let’s put it all back to normal by making the Node 2 server the secondary again and the Node 1 server the primary. On the Node 2 server issue the following commands:

[code language=”text”]
umount /repdata
drbdadm secondary repdata
[/code]

On the Node 1 server issue the following commands:

[code language=”text”]
drbdadm primary repdata
mount /dev/drbd0 /repdata
[/code]

In summary we have now completed the migration work and confirmed that it is replicating.

Heartbeat

Now we can move on to setting up Heartbeat. This is what monitors the nodes and runs the various scripts to bring services back online.

So, very simply, we are going to disable services on both systems so they do not start at boot; Heartbeat will manage the startup of these services depending on which system is the primary.

On both node servers run the following commands:

[code language=”text”]
chkconfig asterisk off
chkconfig mysqld off
chkconfig httpd off
chkconfig elastix-portknock off
chkconfig elastix-updaterd off
service mysqld stop
service asterisk stop
service httpd stop
service elastix-portknock stop
service elastix-updaterd stop
[/code]

You may get a few failures, as we have already stopped some services depending on which server you are on. As long as they are all stopped, we are fine.

On the Node 1 Server edit the /etc/ha.d/ha.cf file and place the following lines in the file

[code language=”text”]
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
# keepalive: seconds between heartbeats
keepalive 2
# deadtime: seconds of silence before a node is declared dead
deadtime 30
# warntime: seconds before a "late heartbeat" warning is logged
warntime 10
# initdead: extra allowance for the first check after a boot
initdead 120
udpport 694
bcast eth0
# auto_failback: return resources to Node 1 when it recovers
auto_failback on
node elxnode1.netintegrity.local
node elxnode2.netintegrity.local
[/code]

Edit the /etc/ha.d/authkeys file and place the following in the file (replace Quick2TheDraw! with a password of your own choosing):

[code language=”text”]
auth 1
1 sha1 Quick2TheDraw!
[/code]

Change the permissions on the authkeys file

[code language=”text”]
chmod 600 /etc/ha.d/authkeys
[/code]

Edit the /etc/ha.d/haresources file and place the following in the file

[code language=”text”]
elxnode1.netintegrity.local drbddisk::repdata Filesystem::/dev/drbd0::/repdata::ext3 IPaddr::172.22.22.230/24/eth0:0 mysqld asterisk httpd elastix-portknock elastix-updaterd
[/code]

Important: the above goes into the haresources file as a single line (it’s worth widening your SSH terminal so you can confirm it is on one line). The IP address in the line is going to be your Virtual IP address. The names at the end of the line are the services that HA will control, i.e. it will turn them on or off.

Now to start the heartbeat service on the Node 1 Server

[code language=”text”]
service heartbeat start
[/code]

You might get what appear to be critical errors coming up. Ignore them until after the reboot coming up shortly.

Now copy the HA config files across to the Node 2 server with the following command:

[code language=”text”]
scp /etc/ha.d/ha.cf /etc/ha.d/authkeys /etc/ha.d/haresources root@elxnode2.netintegrity.local:/etc/ha.d/
[/code]
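It is worth confirming that the copies on Node 2 match the originals; a sketch comparing checksums across the two nodes (hostname as used in this guide):

[code language="text"]
# The pair of checksums printed for each file should be identical
for f in ha.cf authkeys haresources; do
  md5sum /etc/ha.d/$f
  ssh root@elxnode2.netintegrity.local md5sum /etc/ha.d/$f
done
[/code]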

On the Node 2 Server, start the heartbeat service as well:

[code language="text"]
service heartbeat start
[/code]

Now Reboot both Nodes!!

They should come up with Node 1 as the primary and Node 2 as the secondary. You may want to perform a cat /proc/drbd on both servers, and you may find that they are still syncing with all that data that we put onto the DRBD partition. This might take a little while, but once synced, the small changes you make to the configuration should replicate across in seconds.
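To keep an eye on the initial synchronisation without retyping the command, you can refresh the DRBD status every couple of seconds:

[code language="text"]
watch -n2 cat /proc/drbd
[/code]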

That’s it, you now have your high availability system in operation. One of the main things to remember is that you no longer access the individual Server Node IP addresses to make changes; you perform everything through your Virtual IP address. The only time you should need to access a Node Server IP address is to update your DRBD / HA configuration files or to perform a test or maintenance, e.g. shutting down Server Node 1 to confirm that Node 2 takes over the high availability function.

Additional Notes

You have just completed a High Availability implementation of Elastix. Before you put it into operation you need to test it thoroughly.
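A basic failover drill is worth running before go-live; a sketch using the helper scripts shipped with Heartbeat (paths can vary between packages, so check your installation):

[code language="text"]
# On Node 1: hand the resources to Node 2, then check calls still work
/usr/share/heartbeat/hb_standby
# ...test phones and the web interface via the Virtual IP, then take back:
/usr/share/heartbeat/hb_takeover
# A harder test is simply stopping heartbeat (or powering off) on Node 1:
# service heartbeat stop
[/code]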

It is also very worthwhile checking other documents on the Internet. Whilst care is taken with the preparation of this document, it is still possible errors or even omissions may have occurred.

Setting your phones to connect to the Virtual IP Address

Remember, everything needs to address or operate via the Virtual IP Address, otherwise your High Availability setup will fail. If the phones are pointing to the IP address of Server Node 1, when Server Node 1 fails, so do your phones.

This also means changing Option 66 in your DHCP server, if you implemented it. This should now point to the Virtual IP Address.
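For example, if your DHCP server is ISC dhcpd, the relevant line in dhcpd.conf would look something like this (Virtual IP as used in this guide; other DHCP servers will have their own syntax):

[code language="text"]
# Option 66 (TFTP server) pointing at the Virtual IP, not a node IP
option tftp-server-name "172.22.22.230";
[/code]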

Checking Heartbeat controlled services

As you have seen under the /etc/ha.d/haresources file, there are a number of services which Heartbeat has been assigned to control. These are currently:

mysqld asterisk httpd elastix-portknock elastix-updaterd

These are the basic services that we need Heartbeat to control. However, you may find that your implementation has other services that need to be stopped, depending on what add-ons you have installed. For instance, one that is not included above is Openfire. If you have implemented it and it makes connections to other gateways, you probably need to include Openfire as one of the services controlled by Heartbeat. There may be others. Not all services need to be controlled by Heartbeat; mainly those that could interfere with a failover.

You can check which services you currently have configured by running the following command

[code language="text"]
chkconfig --list
[/code]

which lists every service and whether it is set to start in each runlevel (it does not show whether a service is currently running; use service <name> status for that).
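To spot anything still set to start at boot, you can filter for runlevel 3; none of the Heartbeat-managed services should appear in the output:

[code language="text"]
chkconfig --list | grep '3:on'
[/code]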

Thinking about the Endpoint Configurator or Provisioning files

This is not the only item to think about, but it gives you an idea of where your thinking needs to go.

You will note that I included the TFTPBOOT directory and files as part of the replication. This is a good idea; however, one of the issues is that the configuration files written by the Endpoint Configurator contain the IP address of the Server that they were created on. In other words, the Server Node 1 address is written into each file, not the Virtual IP Address. This is a failing of the Endpoint Configurator scripts, but to be fair, these scripts were never designed with high availability in mind.

So it may be necessary to implement a workaround and edit the main template script that creates these files and hard set an IP address. Not ideal, but necessary if you want your High Availability solution to work successfully.

I have successfully made this change and it works well, but as I mentioned, this is just one area that needed to be considered. There may be others.

Troubleshooting

cat /proc/drbd
Run on either node – provides a detailed status of your DRBD replication and current role

drbd-overview
Run on either node – provides a very quick to read status of your DRBD replication and current role

/etc/init.d/drbd status
Run on either node – provides a very quick to read status of your DRBD replication connection state, current role and includes the mount directory

drbdadm cstate repdata
Run on either node – reports back the resource (repdata) connection state

Trade Marks

Elastix – Elastix is a trademark of PaloSanto Solutions
DRBD – DRBD is a registered trademark of LINBIT Information Technologies GmbH in Austria, the United States and other countries
Asterisk – Asterisk is a trademark of Digium, Inc.
Linux – Linux is a registered trademark of Linus Torvalds in several countries
Apache – Apache is a trademark of the Apache Software Foundation
MySQL – MySQL is a trademark of MySQL AB in the United States and other countries.

All other trademarks mentioned in this document are the property of their respective owners. The Trademark declarations on this page have been sourced where the company or product mentioned maintains a Trademark usage page.

Credits

BluePackets – Sponsor Recognition

I would like to recognise the support of BluePackets (Canberra, Australia) who have provided resources for the completion of this document which may have included:

  • Time provided to document this process
  • Access to equipment for the purpose of testing
  • Support in completing this document

BluePackets are a strong supporter of the Open Source Initiative as well as the official distributor of Elastix in the Oceania region.

References

The initial document that was used to prepare this guide.
http://wiki.centos.org/HowTos/Ha-Drbd

Disclaimer

Your use of these application notes is subject to the following conditions:

  • Your application of the information provided is entirely at your own risk
  • Whilst tested in a test environment, your environment may be different and the application of these notes may be totally incorrect.
  • It is up to you to test in a test environment as to the suitability of these notes.
  • You will not hold myself, or any company that I am associated with, responsible for any damages arising from the use of these notes.

Document History

Version Date Change
0.9 3/3/2013 Initial Document Commencement based on original test.
1.0 5/4/2014 Almost a year to get a document together – but tested multiple times