Making Nova Database Highly Available with Pacemaker

Abstract

Nova maintains a list of resources and their state in a database. This database is queried and modified frequently by various services in the Nova installation. Therefore, this database must be highly available - if the database server dies the database must quickly be restarted either locally or on another server.

This document will illustrate the steps necessary to make this happen using the open source High Availability (HA) software called Pacemaker.

Overview

At Internap, we are using the open source High Availability software called Pacemaker (http://www.clusterlabs.org) for HA. Pacemaker is configured on two servers with the servers sharing a disk subsystem.

We use hardware RAID for the shared disk and then put the LUNs into an LVM volume so that we can increase the storage online in the future. A file system is then created on top of the volume and MySQL is configured on the file system. MySQL is then bound to a virtual IP address (VIP). As long as the other OpenStack components only use the VIP to talk to the database, failover of the database from one server to another server is virtually transparent.

Some implementations may use something called Distributed Replicated Block Device or DRBD instead of shared disk. It was felt that DRBD adds an extra level of complexity that was unnecessary for the first release. Therefore, these steps do not cover DRBD.

This wiki takes the reader from two servers sharing storage all the way through Pacemaker configuration, MySQL installation and database creation, at which point the user will be ready to create the database tables.

After each configuration step, the user will be asked to verify that the step worked as expected with a simple test case.

Finally, this wiki will give some practical steps on configuring Pacemaker, troubleshooting tips and test cases.

Terms

Shoot The Other Node In The Head or STONITH - one of the techniques to prevent data corruption in clustering. It is possible that two nodes in a high availability cluster will each think the other one is dead when in fact they are both alive. If both nodes attempt recovery, they will corrupt the shared data (file system). The example in this wiki implements STONITH with something called IPMI which allows one server to reset another server. For more information, refer to the Pacemaker documentation.

Virtual IP address or VIP - an IP address that is associated with an application and moves with it when the application is moved from one server to another. It allows a client to always call the same IP address without having to figure out whether the application is presently running on server1 or server2.

Before You Start

  • After each configuration step, make sure you test! The best way to ensure high availability is to test different scenarios as you build the solution.
  • Helpful links for additional information are found at the end of the wiki
  • These steps were tested with Ubuntu 10.04.3 LTS using OpenStack Nova Cactus

Setting Up Storage

Before setting up HA on the servers, we must test the shared storage and then setup LVM.

Shared Storage Test Case

  • Test that you can access the storage from both servers and that you know which LUNs are shared
  • Verify that the shared storage works with these tests:
    • reboot both nodes at once and make sure that you see no errors in syslog
    • use dd to do reads from one server against one LUN while rebooting the second server - you should not see any SCSI errors in syslog (see the sketch after this list)
    • if you have multiple paths, test that the paths work by pulling one of the active paths while doing I/O and make sure the I/O continues
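
A minimal sketch of the dd read test, assuming a hypothetical multipath device name of /dev/mapper/mpath1 (substitute one of your shared LUNs):

# On server1, stream reads from the shared LUN while server2 is rebooted
dd if=/dev/mapper/mpath1 of=/dev/null bs=1M count=4096
# Afterwards, confirm syslog shows no SCSI errors
grep -i scsi /var/log/syslog | tail -20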

LVM Setup

We used hardware-based RAID and then configured LVM on top of the RAID LUNs. This was done so that as our needs increase we can expand the storage later by growing the LVM volume.

The steps to configure LVM are:

On one node, partition the RAID devices. The device names on the server are /dev/mapper/mpath*. Repeat these steps for all LUNs:

# parted /dev/mapper/<device name> mklabel gpt
# parted /dev/mapper/<device name> mkpart primary 1 <size of LUN>


On both nodes, disable the LVM disk cache by changing /etc/lvm/lvm.conf as shown below and remove the existing cache file. Also, filter out disks that LVM should not scan:


# rm -rf /etc/lvm/cache/.cache
# diff /etc/lvm/lvm.conf /etc/lvm/lvm.conf-orig
53,54c53
<     #filter = [ "a/.*/" ]
<     filter = [ "r|/dev/sd*|" ]
---
>     filter = [ "a/.*/" ]
79,80c78
<     #write_cache_state = 1
<     write_cache_state = 0
---
>     write_cache_state = 1
360,361d355
<     volume_list = [ "db-vg" ]


NOTE: volume_list must contain the name of any LVM volume groups placed under HA control as well as any “root” volumes if you are using LVM for the root disk. If the root volumes are not listed then the system will fail to boot! Therefore, be sure to add them as needed. Since we are using the LVM volume group named db-vg we have added it above.

Recreate the initramfs with the new lvm.conf so that the node does not automatically activate the volume group:

# /usr/sbin/update-initramfs -u

On one node, create a physical volume on partition 1 of each disk:

# pvcreate /dev/mapper/<LUN name>-part1

On one node, set up the db volume group with the LUN(s).

# vgcreate db-vg /dev/mapper/mpath1-part1 

On one node, create the logical volume. Here we are creating a 16TB volume:

# lvcreate -L16TB -ndb-vol db-vg

On one node, create a file system on the volume. The file system is created as ext4:

# mke2fs -t ext4 -j /dev/db-vg/db-vol

On each node, make sure that the volume does not automatically start:

# vgchange -a n 

On each node, make the directory where the file system will be mounted when Pacemaker starts the database service:

# mkdir /dbmnt

On one node, mount the volume:

# mount  /dev/db-vg/db-vol /dbmnt


LVM Test Case

  • Reboot both systems and make sure that volume group db-vg is not displayed with the command vgdisplay -a
  • On one system, activate the volume group and mount the volume with these commands (a combined sketch follows this list)
    • vgchange -a y db-vg
    • mount /dev/db-vg/db-vol /dbmnt
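
A combined sketch of this manual test, using the db-vg and db-vol names from this document; undo the manual activation afterwards so that Pacemaker is free to manage the volume group:

# Activate the volume group and mount the volume by hand
vgchange -a y db-vg
mount /dev/db-vg/db-vol /dbmnt
df -h /dbmnt
# Undo the manual test before handing control to Pacemaker
umount /dbmnt
vgchange -a n db-vg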

Installing MySQL

On both nodes, make sure the mysql user and group exist and have the same user ID and group ID:

# groupadd -g<group_id> mysql
# useradd -u<user_id> -d/var/lib/mysql -s/bin/false -g<group_id> mysql
# mkdir /var/run/mysqld
# chmod 755 /var/run/mysqld
# chown mysql /var/run/mysqld
# chgrp mysql /var/run/mysqld
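
A quick way to confirm the IDs match is to run the following on both nodes and compare the output (the uid and gid reported must be identical on both servers):

# Print the numeric user ID, group ID and group membership of the mysql account
id mysql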

Install mysql-server and mysql-client on both nodes (we use root/nova):

# apt-get install mysql-server
# apt-get install mysql-client


 NOTE: AppArmor (http://en.wikipedia.org/wiki/AppArmor) is a security module in Linux. For example, it allows the administrator to restrict an application to accessing only certain capabilities, such as particular files and directories. If the change below is not made, MySQL will not be able to open the directory containing the database.

On both nodes, setup /etc/apparmor.d/usr.sbin.mysqld so that mysql can read/write the file system on the shared disk. Here is the diff:

# diff *mys* $HOME/*mysq*
33,34d31
<   /dbmnt/mysql/ rw,
<   /dbmnt/mysql/** rwkl,

and also add these lines:

 /var/run/mysql/mysqld.pid w,
 /var/run/mysql/mysqld.sock w,
 /dbmnt/ rw,
 /dbmnt/** rwkl,


 NOTE: The above lines must be changed if you install the mysql database in a directory other than /dbmnt.
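
After editing the profile on each node, reload it so the new rules take effect. A minimal sketch, assuming the profile path edited above:

# Reload the modified AppArmor profile for mysqld
apparmor_parser -r /etc/apparmor.d/usr.sbin.mysqld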

On both nodes, modify /etc/mysql/my.cnf as follows:


root@server1:/etc/mysql# diff my.cnf my.cnf-orig
46,47c46
< #datadir                = /var/lib/mysql
< datadir                = /dbmnt/mysql
---
> datadir                = /var/lib/mysql


also change bind-address to:


bind-address = 0.0.0.0


On both nodes, prevent MySQL from starting automatically by commenting out the start lines in /etc/init/mysql.conf as follows:


#start on (net-device-up
#          and local-filesystems
#         and runlevel [2345])


MySQL will only be started/stopped/failed over by Pacemaker. This change prevents MySQL from being started at boot.

On the node which has /dbmnt mounted on the LVM volume, do these commands from the command line to create a MySQL database in /dbmnt/mysql:

# mkdir /dbmnt/mysql
# cd /dbmnt
# chgrp mysql mysql
# chown mysql .
# chgrp mysql .
# ls -l /dbmnt
# mysql_install_db --datadir=/dbmnt/mysql --user=mysql
# /usr/bin/mysqladmin -u root password 'nova'
# mysql -u root -p
mysql> show databases;


MySQL Test Case

  • Connect to the database instance by doing mysql -u root -p (an additional check is sketched below)
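
A small additional check, assuming the datadir change made in my.cnf above, to confirm the running instance is using the shared volume:

# Should report /dbmnt/mysql/ as the data directory
mysql -u root -p -e "SHOW VARIABLES LIKE 'datadir';"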

Installing and Configuring Pacemaker

Assumptions

  • Someone has already configured the shared disk storage and has already setup LVM using the naming outlined in this document.
  • HA will start/stop the LVM volume groups
  • LVM volume group will be named db-vg.
  • The LVM volume will be called db-vol.
  • The mount point for the MySQL database is /dbmnt
  • An ext4 file system has already been created on the LVM volume for the MySQL database.
  • Someone has already configured IPMI access between the nodes for STONITH
  • Heartbeating between the HA nodes will use two dedicated heartbeat links. The heartbeat links will NOT go through a switch but will instead use direct-connect network cables. (If two network cables are used then they should be bonded together.)

Setting Up Direct Attached Network Cables with Corosync

Ethernet cables should be directly connected between the hosts and should be configured.

A bond interface should be created for the Ethernet devices. Check /etc/network/interfaces for the heartbeat bond to be sure. An example /etc/network/interfaces file entry for the bonded heartbeat link is:


# heartbeat bond
auto bond2
iface bond2 inet static
  address 172.16.0.1
  netmask 255.255.255.0
  bond-slaves eth0 eth1
  bond_mode 4
  bond_miimon 100
  bond_lacp_rate 1


where 172.16.0.1 would be replaced by the IP address on each server for the private heartbeat link, and eth0/eth1 would be replaced by the actual Ethernet interfaces used.

Exact details of creating a bond device from the two Ethernet devices are outside the scope of this document.
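
One prerequisite that is easy to miss (an assumption about the Ubuntu 10.04 packaging; verify the package name on your release): the bonding helper must be installed for the bond-slaves stanza above to work:

# Install the Ethernet bonding helper used by /etc/network/interfaces
apt-get install ifenslave-2.6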

Install Pacemaker

# apt-get install pacemaker
# apt-get install ipmitool


Configure Pacemaker

On both nodes, configure and start OpenAIS (the cluster membership protocol)

For reference, you can look at the document Clusters from Scratch (http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf), specifically the sections titled Configuring OpenAIS, Verify OpenAIS Installation and Verify Pacemaker Installation.

However, everything you need is covered below.

Change /etc/default/corosync as follows:

# pwd
/etc/default
# diff corosync corosync-orig
2c2
< START=yes
---
> START=no


Configure bindnetaddr in the /etc/corosync/corosync.conf file.

For reference, I included my working file using direct connect cables (no network switch):

# Please read the openais.conf.5 manual page

totem {
       version: 2

       # How long before declaring a token lost (ms)
       token: 3000

       # How many token retransmits before forming a new configuration
       token_retransmits_before_loss_const: 10

       # How long to wait for join messages in the membership protocol (ms)
       join: 60

       # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
       consensus: 5000

       # Turn off the virtual synchrony filter
       vsftype: none

       # Number of messages that may be sent by one processor on receipt of the token
       max_messages: 20

       # Limit generated nodeids to 31-bits (positive signed integers)
       clear_node_high_bit: yes

       # Disable encryption
       secauth: off

       # How many threads to use for encryption/decryption
       threads: 0

       # Optionally assign a fixed node id (integer)
       # nodeid: 1234

       # This specifies the mode of redundant ring, which may be none, active, or passive.
       rrp_mode: none

       interface {
               # The following values need to be set based on your environment
               ringnumber: 0
               #bindnetaddr: 127.0.0.1
               bindnetaddr: 172.16.0.0
               mcastaddr: 226.94.1.1
               mcastport: 5405
       }
}

amf {
       mode: disabled
}

service {
       # Load the Pacemaker Cluster Resource Manager
       ver:       0
       name:      pacemaker
}

aisexec {
       user:   root
       group:  root
}

logging {
       fileline: off
       to_stderr: yes
       to_logfile: no
       to_syslog: yes
       syslog_facility: daemon
       debug: off
       timestamp: on
       logger_subsys {
               subsys: AMF
               debug: off
               tags: enter|leave|trace1|trace2|trace3|trace4|trace6
       }
}


NOTE: 172.16.0.0 should be changed to the public IP address of the systems with the last octet of the public IP address replaced with a 0.

Start Corosync:


# /etc/init.d/corosync start


Verify that the pacemaker nodes can see each other with the command:


# crm_mon -n -1


which should show both nodes. This means that Pacemaker can see both nodes and we are ready to configure Pacemaker to manage the OpenStack services.
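
If both nodes do not show up, it can help to confirm membership at the corosync layer first. An optional check, assuming the standard tooling shipped with the corosync package:

# Show the totem ring status; the bound address should be on the heartbeat network
corosync-cfgtool -s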

Test that the heartbeats are actually going through the direct connect cables by doing this command on the first node:

# ifconfig <bond interface> down

You should see a message in the “crm_mon -n -1” output that lists one of the nodes as “UNCLEAN (offline)”.

After you see this, reboot both servers.

Based on the above corosync.conf file, you should also see messages in /var/log/syslog like:

<DATE> server1 corosync[26356]:   [TOTEM ] Initializing transport (UDP/IP).
<DATE> server1 corosync[26356]:   [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
<DATE> server1 corosync[26356]:   [TOTEM ] The network interface [<IP_ADDRESS>] is now up.


Setting Up STONITH

Setting up IPMI access is outside of the scope of this document.

However, a simple overview is that IP addresses must be statically assigned to the IPMI port of each server and it must be possible to reset each server over IPMI.

Pacemaker Test Case

  • From one server, check the status of Pacemaker with the command crm_mon -n -1
  • From each server, do the command ipmitool -I lan -H <IPMI address of peer server> -U ADMIN -P <IPMI password> power status
  • From each server, take turns resetting the peer server with the command ipmitool -I lan -H <IPMI address of peer server> -U ADMIN -P <IPMI password> power reset

Setup Modprobe Alias for Device Driver

This is needed because the Filesystem agent of Pacemaker (used to mount, unmount and monitor a file system) looks for this device. On BOTH nodes do:


# echo "alias scsi_hostadapter ahci" >> /etc/modprobe.d/modprobe.conf

After this is done, verify that there are no duplicates in the file with this command:

# cat /etc/modprobe.d/modprobe.conf


Configure Resources in Pacemaker

Pacemaker must be configured to start and stop the resources in the correct order. For example, the LVM volume group db-vg must be started before the file system can be mounted and the MySQL database started.

To do this, take the example configuration in the Appendix and save it to a file named /tmp/pace-config.

Then, modify the values SERVER1, SERVER2, MYSQL_INTERFACE, MYSQL_VIP, ADD_PASSWORD_HERE, SERVER1_IPMI_IP, and SERVER2_IPMI_IP to the values for your local setup.
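
A hypothetical substitution pass over the template (the values shown are placeholders for illustration only; the *_IPMI_IP placeholders are replaced before the bare SERVER1/SERVER2 names so that the longer strings still match):

# Replace the placeholders in /tmp/pace-config with site-specific values
sed -i -e 's/SERVER1_IPMI_IP/10.1.0.11/g' -e 's/SERVER2_IPMI_IP/10.1.0.12/g' \
       -e 's/SERVER1/server1/g' -e 's/SERVER2/server2/g' \
       -e 's/MYSQL_INTERFACE/bond1:200/g' -e 's/MYSQL_VIP/192.168.1.50/g' \
       -e 's/ADD_PASSWORD_HERE/ipmi-password/g' /tmp/pace-config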

Load the new configuration into Pacemaker with the command:

# crm configure load update /tmp/pace-config


Tell Pacemaker that STONITH should be used with the command:

# crm configure property stonith-enabled=true


Second Set of Pacemaker Test Cases

  • Reboot both servers at the same time and make sure that the database is started on one of the servers and that the command mysql -uroot -p<database password> is able to connect to the database (a check through the VIP is sketched after this list)
  • Reboot the server which is presently running the MySQL database and make sure that the database fails over to the second server. Use the command crm_mon -n -1 to verify that there are no errors
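
A minimal sketch of checking the database through the VIP, which is how the other OpenStack services will reach it (MYSQL_VIP is the address substituted into the Pacemaker configuration):

# Run from either server (or any client that can reach the VIP)
mysql -h <MYSQL_VIP> -u root -p -e "SELECT 1;"
# Watch where the resources are running during the failover test
crm_mon -n -1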

Next Steps

The next steps are to keep testing, create the database tables for Nova, make sure the database is secure and then modify the other OpenStack services to point to the VIP of MySQL.
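
As an illustration only (the exact flag file layout depends on your Nova Cactus installation, and the credentials shown are placeholders), pointing Nova at the MySQL VIP would look something like this line in nova.conf:

# nova.conf: send all database traffic to the highly available VIP
--sql_connection=mysql://root:nova@<MYSQL_VIP>/nova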

These steps are outside of the scope of this document but are necessary.

Pacemaker Troubleshooting Tools/Tips

Refer to the Pacemaker documentation for additional information.

A simple command to see what systems and resources Pacemaker sees is crm_mon -n -1

To see the Pacemaker configuration do crm configure show

Pacemaker generally logs messages in /var/log/syslog. If a resource such as the file system does not mount, read the log file to see if you can figure out why it is not mounting.

Pacemaker uses shell scripts called resource agents (RAs) to start, stop and monitor resources like a file system. These scripts are found in /usr/lib/ocf/resource.d. If, for example, the file system will not start (mount), try the exact set of commands from the agent to reproduce the problem (see the sketch below).
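
A sketch of driving the Filesystem agent by hand with the parameters used in this document (OCF agents read their parameters from OCF_RESKEY_* environment variables; exact requirements can vary by agent version):

# Ask the Filesystem resource agent to mount the database file system manually
OCF_ROOT=/usr/lib/ocf \
OCF_RESKEY_device=/dev/db-vg/db-vol \
OCF_RESKEY_directory=/dbmnt \
OCF_RESKEY_fstype=ext4 \
/usr/lib/ocf/resource.d/heartbeat/Filesystem start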

References

  • Setting Up MySQL (requires Linbit login) - (http://www.linbit.com/en/education/tech-guides/mysql-high-availability-on-the-pacemaker-cluster-stack/)
  • Pacemaker website - (http://www.clusterlabs.org)
  • Installing Pacemaker - (http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf)

Appendix - Sample Pacemaker Configuration File

Below is an example Pacemaker configuration file for MySQL. This file must be modified with:

  • SERVER1 - name of first server
  • SERVER2 - name of second server
  • MYSQL_INTERFACE - replace with the Ethernet interface used for MYSQL_VIP (e.g. bond1:200)
  • MYSQL_VIP - replace with IP address of VIP for MYSQL
  • ADD_PASSWORD_HERE - replace with password of IPMI interface
  • SERVER1_IPMI_IP - replace with IP address of IPMI interface of server1
  • SERVER2_IPMI_IP - replace with IP address of IPMI interface of server2


node SERVER1
node SERVER2
primitive db-fs-p ocf:heartbeat:Filesystem \
        params device="/dev/db-vg/db-vol" directory="/dbmnt" fstype="ext4" \
        op start interval="0" timeout="120" \
        op monitor interval="60" timeout="60" OCF_CHECK_LEVEL="20" \
        op stop interval="0" timeout="240"
primitive db-lvm-p ocf:heartbeat:LVM \
        params volgrpname="db-vg" exclusive="true" \
        op start interval="0" timeout="30" \
        op stop interval="0" timeout="30"
primitive db-mysql-p ocf:heartbeat:mysql \
        params binary="/usr/sbin/mysqld" config="/etc/mysql/my.cnf" datadir="/dbmnt/mysql" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" additional_parameters="--bind-address=MYSQL_VIP " \
        op start interval="0" timeout="120s" \
        op stop interval="0" timeout="120s" \
        op monitor interval="20s" timeout="30s"
primitive ipmilan-server1-p stonith:external/ipmi \
        params hostname="server1" ipaddr="SERVER1_IPMI_IP" userid="root" passwd="ADD_PASSWORD_HERE" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="60" start-delay="0"
primitive ipmilan-server2-p stonith:external/ipmi \
        params hostname="server2" ipaddr="SERVER2_IPMI_IP" userid="root" passwd="ADD_PASSWORD_HERE" \
        op start interval="0" timeout="60" \
        op stop interval="0" timeout="60" \
        op monitor interval="60" timeout="60" start-delay="0"
primitive nova-ip-p ocf:heartbeat:IPaddr2 \
        params ip="MYSQL_VIP" nic="MYSQL_INTERFACE" \
        op monitor interval="5s"
group lvm-fs-ip-mysql-g db-lvm-p db-fs-p nova-ip-p db-mysql-p \
        meta target-role="Started"
location loc-ipmilan-server1 ipmilan-server1-p -inf: server1
location loc-ipmilan-server2 ipmilan-server2-p -inf: server2