OpenStack – Take 2 – High Availability for all services

A major factor in our decision to rebuild our OpenStack from scratch was availability closer to what one would see in production. This is one of the things Juju got right: it installed most services behind HAProxy so that clients could connect to any of the servers running the requested service. What it lacked was a redundant load-balancing front end and highly available external access.

Since we’re running 3 controller nodes and converging basically all services onto them, we’ll put the load balancers there as well. We need redundant, load-balanced endpoints on both the internal and external networks. These serve the customer-facing APIs and also the internal/management traffic from the compute nodes.

Virtual IPs for Inside and Outside Use

HAProxy doesn’t provide redundancy for itself, so we need a VIP that clients can point to. Keepalived handles that nicely with a VRRP-like virtual IP that uses gratuitous ARP for rapid failover. (While keepalived calls it VRRP, it is not compatible with other VRRP devices and cannot be joined to a VRRP group with, say, a Cisco or Juniper router.) We need both an internal and an external VIP, and both must be able to fail over. Technically they don’t need to fail over together, since HAProxy treats them independently. That means we don’t need interface or VRRP group tracking, which greatly simplifies the configuration.
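To make the failover mechanism more concrete, here is a minimal Python sketch of a gratuitous ARP on the wire. It is an ARP request in which the sender and target IP are both the VIP, broadcast so that every host and switch on the segment re-learns which MAC now owns that address. The MAC and IP below are hypothetical. Keepalived’s exact frames (request vs. reply, how many it sends) may differ from this generic form. The sketch only builds the bytes and does not send anything.

```python
import socket
import struct

ETH_P_ARP = 0x0806   # EtherType for ARP
ETH_P_IP = 0x0800    # protocol type for IPv4 inside ARP
ARP_REQUEST = 1

def gratuitous_arp_frame(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request for `ip`.

    Sender IP and target IP are the same address (the VIP), so normally no
    host replies; receivers just update their ARP caches to map `ip` to `mac`.
    """
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"bad MAC address: {mac}")
    ip_bytes = socket.inet_aton(ip)

    # Ethernet header: broadcast destination, our MAC as source, ARP EtherType
    ethernet = b"\xff" * 6 + mac_bytes + struct.pack("!H", ETH_P_ARP)

    # ARP header: hardware type 1 (Ethernet), IPv4, 6-byte MACs, 4-byte IPs, request
    arp = struct.pack("!HHBBH", 1, ETH_P_IP, 6, 4, ARP_REQUEST)
    arp += mac_bytes + ip_bytes      # sender hardware / protocol address
    arp += b"\x00" * 6 + ip_bytes    # target hardware (unknown) / protocol address = same VIP
    return ethernet + arp

# Hypothetical MAC and IP (192.0.2.0/24 is a documentation range)
frame = gratuitous_arp_frame("52:54:00:12:34:56", "192.0.2.10")
print(len(frame), frame[12:14].hex())   # prints: 42 0806
```

Actually transmitting such a frame needs a raw AF_PACKET socket and root privileges. That is the part keepalived handles for us whenever a node takes over a VIP.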

Our management DHCP range starts at, with the IPs below that reserved for things like this. Since this VIP is the main controller IP, we’ll give it a nice round number. On the “external” network we have available, and we want to keep most of it for floating IPs. We’ll use for the controllers’ real and virtual IPs on the outside network.

First, we need to let processes bind to the virtual IP addresses even on nodes that don’t currently hold them, so HAProxy can start on the backup nodes:

echo "net.ipv4.ip_nonlocal_bind=1" >> /etc/sysctl.conf
sysctl -p
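Here is why the sysctl matters. On a backup node the VIP isn’t assigned to any interface, so without it HAProxy’s bind() to that address fails with EADDRNOTAVAIL (“Cannot assign requested address”) and the proxy won’t start. Below is a small Python sketch, not part of the original setup, that confirms the kernel picked up the setting and tries a bind the way HAProxy would. The helper names are hypothetical.

```python
import errno
import socket
from pathlib import Path

# Linux exposes this sysctl under /proc/sys
SYSCTL_PATH = Path("/proc/sys/net/ipv4/ip_nonlocal_bind")

def nonlocal_bind_enabled(path: Path = SYSCTL_PATH) -> bool:
    """True if the kernel lets processes bind to addresses not on this host."""
    return path.read_text().strip() == "1"

def can_bind(ip: str, port: int = 0) -> bool:
    """Try to bind a TCP socket to ip:port, as HAProxy does at startup.

    Returns False when the address is not assignable (EADDRNOTAVAIL), which is
    what happens on a backup node without ip_nonlocal_bind=1. Port 0 lets the
    kernel pick a free port, so only the address itself is being tested.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
    return True
```

On a backup node, calling can_bind with the VIP should return True only once the sysctl is active.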

Then we’ll install the keepalived and haproxy packages:

apt-get install keepalived haproxy

We’ll need to create a keepalived config, /etc/keepalived/keepalived.conf, on each controller:


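global_defs {
    router_id controller-0
}

# Healthy only while an haproxy process exists; adds 2 to priority when it passes
vrrp_script haproxy {
    script "killall -0 haproxy"
    interval 2
    weight 2
}

# Internal (management) VIP; the address is a placeholder
vrrp_instance 1 {
    virtual_router_id 1
    advert_int 1
    priority 100
    state MASTER
    interface eth0
    virtual_ipaddress {
        <internal VIP>/32 dev eth0
    }
    track_script {
        haproxy
    }
}

# External (API) VIP; the address is a placeholder
vrrp_instance 2 {
    virtual_router_id 2
    advert_int 1
    priority 100
    state MASTER
    interface eth1
    virtual_ipaddress {
        <external VIP>/32 dev eth1
    }
    track_script {
        haproxy
    }
}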

All that needs to change from one host to another is the router_id and possibly the interface names. Start keepalived with “service keepalived restart”, and you should see the virtual IPs appear on the first node that is restarted:

root@controller-0:~# ip -4 addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
inet scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc pfifo_fast state UP group default qlen 1000
inet brd scope global eth0
valid_lft forever preferred_lft forever
inet scope global eth0
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
inet brd scope global eth1
valid_lft forever preferred_lft forever
inet scope global eth1
valid_lft forever preferred_lft forever
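The VIP follows HAProxy only because of the track script in the keepalived config. `killall -0 haproxy` sends signal 0, which delivers nothing but succeeds only if a process named haproxy exists. While the check passes, keepalived adds the script’s weight (2) to the node’s priority. The sketch below is a rough Python equivalent of that check, scanning /proc for a matching process name. The function names are hypothetical, and this is an illustration, not what the original setup uses.

```python
from pathlib import Path

def process_running(name: str, proc_root: Path = Path("/proc")) -> bool:
    """Return True if any process's command name equals `name`.

    /proc/<pid>/comm holds the (up to 15-character) command name, roughly
    what killall compares against for short names such as "haproxy".
    """
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():   # skip /proc/self, /proc/sys, ...
            continue
        try:
            if (entry / "comm").read_text().strip() == name:
                return True
        except OSError:                # process exited while we were scanning
            continue
    return False

def main() -> int:
    """Exit status for keepalived: 0 means healthy, non-zero means failed."""
    return 0 if process_running("haproxy") else 1
```

You could save this as a script that ends with `raise SystemExit(main())` and reference it from `vrrp_script` in place of the killall command. Keepalived only looks at the exit status.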

Rebooting the first node should cause the /32 VIPs to move almost immediately to one of the other servers. (We recommend giving each node a different priority, so there is no ambiguity about which node is the master, and setting the backups’ initial state to BACKUP.)
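The following sketch shows why distinct priorities remove that ambiguity and how they interact with the track script’s `weight 2`. It is a simplified Python model of the VRRP priority comparison. The effective priority is the configured priority plus the weight while HAProxy runs. The highest effective priority should end up as master, and equal priorities fall back to the higher primary IP. The model ignores timing and preemption details, so in a real cluster with equal priorities the outcome can also depend on which node came up first. The node IPs and the 101/100/99 and 110/100 priorities are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

TRACK_WEIGHT = 2  # 'weight 2' in vrrp_script: added while the check succeeds

@dataclass
class Node:
    name: str
    ip: str           # primary address on the VRRP interface (hypothetical)
    priority: int     # 'priority' from vrrp_instance
    haproxy_up: bool  # result of the track script

def effective_priority(node: Node) -> int:
    return node.priority + (TRACK_WEIGHT if node.haproxy_up else 0)

def elect_master(nodes: list[Node]) -> str:
    """Highest effective priority wins; ties go to the higher primary IP."""
    winner = max(nodes, key=lambda n: (effective_priority(n), ipaddress.ip_address(n.ip)))
    return winner.name

cluster = [
    Node("controller-0", "192.0.2.11", 101, True),
    Node("controller-1", "192.0.2.12", 100, True),
    Node("controller-2", "192.0.2.13", 99, True),
]
print(elect_master(cluster))   # controller-0 (103 vs 102 vs 101)
cluster[0].haproxy_up = False
print(elect_master(cluster))   # controller-1 (101 vs 102 vs 101)

# Priorities 110/100 with weight 2: a dead HAProxy on the master does not trigger failover
wide = [Node("controller-0", "192.0.2.11", 110, False), Node("controller-1", "192.0.2.12", 100, True)]
print(elect_master(wide))      # controller-0 (110 vs 102)
```

The last case shows the catch. The track script’s weight must be larger than the gap between adjacent priorities. Otherwise a master whose HAProxy has died still outranks its backups and keeps the VIP.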

HAProxy and Statistics

Now that we have redundant virtual IPs, we need to get the load balancer working. We installed HAProxy in the previous step, and it nearly works out of the box. However, we’ll add an externally facing statistics page to the default config (/etc/haproxy/haproxy.cfg) so we can see what it’s doing.


global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    stats socket /var/lib/haproxy/status
    maxconn 4000

defaults
    log global
    mode http
    option httplog
    option dontlognull
    # Current spelling of the deprecated contimeout/clitimeout/srvtimeout (milliseconds)
    timeout connect 5000
    timeout client 50000
    timeout server 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

# Statistics page; address and port are placeholders for this node's outside IP
listen stats
    bind <outside IP>:<port>
    mode http
    stats enable
    stats uri /stats
    stats realm HAProxy\ Statistics
    stats auth admin:openstack

For each node, we’ll want to change the listen IP to that particular server’s “outside” IP. No services are defined yet; we’ll get to those in the next step.

The last step here is to edit /etc/default/haproxy so it says “ENABLED=1”. Once that’s done, activate the proxy with “service haproxy restart”. You should then be able to reach the statistics page on each of the addresses defined above.
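Once HAProxy is running, the same statistics are available in machine-readable form, which is handy for scripting checks. Appending `;csv` to the stats URI returns CSV over HTTP. The `stats socket /var/lib/haproxy/status` line also accepts the runtime command `show stat`, which replies with the same CSV and then closes the connection. The sketch below uses only the Python standard library, and the function names are hypothetical. The HTTP URL in the usage comments keeps placeholders for the node’s outside IP and port. admin:openstack are the credentials from the `stats auth` line. The socket variant has to run on the node itself, as a user allowed to open the socket (typically root).

```python
import base64
import csv
import io
import socket
import urllib.request

def fetch_stats_http(url: str, user: str, password: str, timeout: float = 5.0) -> str:
    """GET the stats page as CSV (stats URI + ';csv') using HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode()

def fetch_stats_socket(path: str = "/var/lib/haproxy/status", timeout: float = 5.0) -> str:
    """Send 'show stat' on the local stats socket; HAProxy replies, then closes."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(path)
        sock.sendall(b"show stat\n")
        chunks = []
        while chunk := sock.recv(4096):   # read until HAProxy closes the connection
            chunks.append(chunk)
    return b"".join(chunks).decode()

def parse_stats_csv(text: str) -> list[tuple[str, str, str]]:
    """Return (proxy, server, status) rows from HAProxy's CSV statistics."""
    # The header row starts with "# "; strip it so DictReader sees plain column names
    if text.startswith("# "):
        text = text[2:]
    reader = csv.DictReader(io.StringIO(text))
    return [(row["pxname"], row["svname"], row["status"]) for row in reader]

# Usage on a controller (placeholders, not real addresses):
# text = fetch_stats_http("http://<outside IP>:<port>/stats;csv", "admin", "openstack")
# text = fetch_stats_socket()
# for proxy, server, status in parse_stats_csv(text):
#     print(f"{proxy:20} {server:20} {status}")
```

The parser looks up columns by header name rather than position, so it doesn’t depend on the exact column order.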

To be continued – in the next step we’ll set up a clustered database and message queue for use with HAProxy.