HAGuideImprovements/TOC

(Moved original to the bottom of the page for reference.)

Proposed Revision
This spec refers to the https://blueprints.launchpad.net/openstack-manuals/+spec/improve-ha-guide blueprint.

Strategy and assumptions:

 * Audience is people who have some experience installing OpenStack, not first-time users
 * Focus on installation of OpenStack core services
 * Structure the guide sequentially -- the steps to take in a reasonable order
 * Avoid redundancy with the Install Guide; for steps that are identical for HA and non-HA installations, link to appropriate sections in the Install Guide
 * One guide for all Linux distros/platforms
 * Emphasize a reasonable, standard deployment based on open source components. We can provide some notes about alternatives as appropriate (for example, using a commercial load-balancer might be a better alternative than relying on HAProxy) and perhaps a link to the OpenStack Marketplace.
 * The Active/Active versus Active/Passive configurations will be discussed for each component rather than dividing the guide into sections for A/A and A/P as in the current guide.

Redundancy and failover
Failover of a single node in the OpenStack environment consists of the failover procedures of all node roles and components installed on it. What follows is basic information about the failover of each component or role.

Controller node

Normally, the following components are included in the standard layout of the OpenStack Controller node:

Network components

Hardware

Bonding interfaces.

Routing

The configuration uses static routing, without Virtual Router Redundancy Protocol (VRRP) or similar techniques for gateway failover.

Endpoints (VIP addresses)

Need description of VIP failover inside Linux namespaces and expected SLA.

LB (HAProxy)

HAProxy runs on each Controller node and does not synchronize state between instances. Each HAProxy instance configures its frontend to accept connections only on the VIP address and load-balances them across the list of all instances of the corresponding service (for example, an OpenStack API service). The HAProxy instances therefore act independently, fail over transparently together with the network endpoint (VIP address) failover, and share the same SLA.
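
As a sketch, one such frontend/backend pair in haproxy.cfg might look as follows; the VIP (10.0.0.10) and backend addresses are illustrative assumptions, not values from this spec:

```
# haproxy.cfg fragment; 10.0.0.10 (VIP) and the controller IPs are assumptions.
listen keystone_public
    bind 10.0.0.10:5000
    balance roundrobin
    option httpchk             # mark a backend DOWN when its health check fails
    server controller1 192.168.0.11:5000 check inter 2000 rise 2 fall 5
    server controller2 192.168.0.12:5000 check inter 2000 rise 2 fall 5
    server controller3 192.168.0.13:5000 check inter 2000 rise 2 fall 5
```

Because each HAProxy instance carries the same static configuration, whichever node holds the VIP serves identical traffic.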

DB (MySQL)

MySQL with Galera runs behind HAProxy. There is always one active backend and one or more backup backends configured. There is zero slave lag due to Galera's synchronous replication. As a result, the failover procedure completes once HAProxy detects that its active backend has gone down and switches to a backup marked as 'UP'. If no backends are up (for example, if the Galera cluster is not ready to accept connections), failover finishes only when the Galera cluster finishes reassembling. The SLA is normally no more than 5 minutes.
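
A minimal haproxy.cfg sketch of the active/backup arrangement; addresses are assumptions, and the mysql-check option requires a matching check user (here 'haproxy_check', a hypothetical name) to exist in MySQL:

```
# haproxy.cfg fragment: all but one backend carry 'backup', so only a
# single Galera node receives traffic at a time (addresses are assumptions).
listen galera
    bind 10.0.0.10:3306
    balance leastconn
    option mysql-check user haproxy_check
    server controller1 192.168.0.11:3306 check inter 2000 rise 2 fall 5
    server controller2 192.168.0.12:3306 check inter 2000 rise 2 fall 5 backup
    server controller3 192.168.0.13:3306 check inter 2000 rise 2 fall 5 backup
```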

AMQP (RabbitMQ)

RabbitMQ nodes fail over on both the application and the infrastructure layers. The application layer is controlled by the oslo.messaging configuration options for multiple AMQP hosts. If an AMQP node fails, the application reconnects to the next one configured, within the specified reconnect interval; that interval constitutes its SLA. On the infrastructure layer, the SLA is the time it takes the RabbitMQ cluster to reassemble. Several cases are possible. When the Mnesia keeper node fails (the master of the corresponding Pacemaker resource for RabbitMQ), there is a full AMQP cluster downtime interval; normally, its SLA is no more than several minutes. When any other node fails (a slave of the corresponding Pacemaker resource for RabbitMQ), there is no AMQP cluster downtime at all.
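
The application-layer options can be sketched as an oslo.messaging (kombu driver) configuration fragment; hostnames and interval values are assumptions for illustration:

```ini
# Juno-era oslo.messaging options, e.g. in nova.conf (hostnames/values assumed)
[DEFAULT]
rabbit_hosts = controller1:5672,controller2:5672,controller3:5672
rabbit_retry_interval = 1   # seconds before the first reconnect attempt
rabbit_retry_backoff = 2    # back-off added between subsequent attempts
rabbit_ha_queues = true     # declare queues as mirrored across the cluster
```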

Memcached backend

Need description of how the memcache_pool backend fails over. The SLA is several minutes.

OpenStack stateless components

Needs a reference to the OpenStack HA/admin/ops guides describing the failover procedure in OpenStack for API and other stateless services.

OpenStack stateful components

Needs a reference to the OpenStack HA/admin/ops guides describing the failover procedure in OpenStack, including Neutron and its agents.

Storage components

CEPH-MON

Needs description.

CEPH RADOS-GW

Needs description.

SWIFT-all

Needs description.

Storage node (CEPH-OSD)

Needs description.

Storage node (Cinder-LVM)

Cinder nodes running the cinder-volume service cannot fail over. When a node fails, all LVM volumes on it become unavailable as well. To make them available, shared storage is required.

Compute node

Compute nodes cannot fail over. When a node fails, all instances running on it become unavailable as well. To make them available, an HA-for-instances feature is required.

Mongo DB node

Needs description.

Zabbix node

Needs description.

Stateless/stateful, active/passive, active/active
See:


 * Stateless vs. Stateful services in OpenStack documentation
 * HA Controller role description commit

Quorums
Services should use an odd number of nodes, three or greater. See also HA Controller role description commit.
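
For Corosync 2.x deployments, the quorum expectation can be expressed in corosync.conf; this fragment assumes a three-node cluster:

```
# corosync.conf fragment (assumes a three-node cluster)
quorum {
    provider: corosync_votequorum
    expected_votes: 3
}
```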

Single-controller HA mode
An HA environment is not actually highly available until at least three Controller nodes are configured, but even a single-controller environment runs Pacemaker, Corosync, and the other utilities used to manage an HA environment. To make it highly available, add two or more Controller nodes. Each Controller cluster should include an odd number of nodes -- 1 node, 3 nodes, 5 nodes, et cetera.

Storage Backends
(Priority: 1) This section contains more concepts than actual procedures; our expectation is that the specific technologies discussed have their own configuration documentation that can be referenced.

This section describes the data plane (infrastructure) elements that factor into the overall HA capabilities of the storage; in other words, how one ensures that data is not lost when systems fail. Topics to be discussed include RAID, Erasure Coding, etc., describing the protections they do and do not offer.

We will also include a blurb about the options that are available. Finally, we could state that Cinder supports multiple storage providers (Ceph, EMC, NetApp, SolidFire, etc.) and that additional details are available from your storage provider's documentation.

Swift combines control and data plane so we would cover some aspects of both.

Hardware setup
Generally, HA can be configured and run on any hardware that is supported by the Linux kernel. See http://www.linux-drivers.org/.

As a reference you can also use the hardware supported by Ubuntu. See http://www.ubuntu.com/certification/.

Basic Environment
(Priority: 1)
 * Install O/S on each node (link to Install Guide, e.g., http://docs.openstack.org/juno/install-guide/install/apt/content/ch_basic_environment.html )
 * Install Memcached (verify that Oslo supports hash synchronization; if so, this should not take more than load balancing; see http://docs.openstack.org/high-availability-guide/content/_memcached.html )
 * Run NTP servers on every controller and configure other nodes to use all of them for synchronization. Link to http://docs.openstack.org/juno/install-guide/install/apt/content/ch_basic_environment.html#basics-ntp
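
The NTP step above can be sketched as ntp.conf fragments; the controller hostnames and upstream pool servers are illustrative assumptions:

```
# /etc/ntp.conf sketch; hostnames are assumptions.
# On each controller: sync with upstream servers.
server 0.pool.ntp.org iburst
server 1.pool.ntp.org iburst

# On every other node: list all controllers as time sources.
server controller1 iburst
server controller2 iburst
server controller3 iburst
```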

Basic HA facilities
(Priority: 1)
 * Install pacemaker, crmsh, corosync, cluster-glue, fence-agents (Fedora only), resource-agents. (Modify: http://docs.openstack.org/high-availability-guide/content/_install_packages.html)
 * What is needed for LSB/upstart/systemd alternative to OCF scripts (RA) for Pacemaker? See https://bugs.launchpad.net/openstack-manuals/+bug/1349398
 * Set up and start Corosync and Pacemaker. Stick with 'crm' tool for Ubuntu/Debian and 'pcs' for RHEL/Fedora (Modify http://docs.openstack.org/high-availability-guide/content/_set_up_corosync.html; Modify: http://docs.openstack.org/high-availability-guide/content/_start_pacemaker.html)
 * Set basic cluster properties (Modify: http://docs.openstack.org/high-availability-guide/content/_set_basic_cluster_properties.html)
 * Configure fencing for Pacemaker cluster (Links to http://clusterlabs.org/doc/)
 * Configure the VIP (Keep: http://docs.openstack.org/high-availability-guide/content/s-api-vip.html )
 * API services -- Do those belong here or in specific sections? (Modify Glance API: http://docs.openstack.org/high-availability-guide/content/s-glance-api.html and Modify Cinder API: http://docs.openstack.org/high-availability-guide/content/s-cinder-api.html )
 * Schedulers
 * Memcached service on Controllers (Keep: http://docs.openstack.org/high-availability-guide/content/_memcached.html, which links to http://code.google.com/p/memcached/wiki/NewStart for specifics)
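
As a sketch of the VIP step with 'crm' on Ubuntu/Debian (the address and netmask are assumptions; 'pcs' on RHEL/Fedora has equivalent commands):

```shell
# Assumed VIP 10.0.0.10/24; run once on any cluster node.
crm configure primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip="10.0.0.10" cidr_netmask="24" \
    op monitor interval="30s"
```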

Install and Configure MySQL
(Priority: 2)
 * Two nodes plus GARBD.
 * MySQL variant with Galera: Cover major options (Galera Cluster for MySQL, Percona XtraDB Cluster, and MariaDB Galera Cluster) and link off to resources to understand installation and initial config options (e.g., SST).
 * Pacemaker multistate clone resource for Galera cluster
 * Pacemaker resource agent for Galera cluster management
 * Deprecate MySQL DRBD configuration because of split-brain issues
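
A my.cnf sketch for one Galera node; the provider path and hostnames are assumptions, and option names vary slightly between Galera distributions:

```ini
# my.cnf fragment for a Galera node (paths/hostnames assumed)
[mysqld]
binlog_format = ROW
default_storage_engine = InnoDB
innodb_autoinc_lock_mode = 2
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_name = "openstack"
wsrep_cluster_address = "gcomm://controller1,controller2,controller3"
wsrep_sst_method = rsync
```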

RabbitMQ Message broker
(Priority: 2)

The RabbitMQ team is creating their own Guide and we will link to that. This section will include some basic concepts and configuration information but defer to the RabbitMQ documentation for details and advanced tasks.


 * Install and configure message broker on Controller; see http://docs.openstack.org/juno/install-guide/install/apt/content/ch_basic_environment.html#basics-prerequisites
 * Oslo messaging for active/active
 * I think services need some special configuration with more than two nodes?
 * No need for active/passive AMQP; Two-node active/active cluster with mirrored queues instead
 * Pacemaker multistate clone resource for RabbitMQ cluster
 * Pacemaker resource agent for RabbitMQ cluster management
 * Deprecate DRBD for RabbitMQ
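
The clustering and mirrored-queue steps can be sketched as follows, assuming RabbitMQ is already running on controller1 (hostnames are assumptions):

```shell
# On the joining node (e.g. controller2):
rabbitmqctl stop_app
rabbitmqctl join_cluster rabbit@controller1
rabbitmqctl start_app

# Mirror all non-amq queues across the cluster:
rabbitmqctl set_policy ha-all '^(?!amq\.).*' '{"ha-mode": "all"}'
```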

Keystone Identity services
(Priority: 3, Depends-on: infrastructure)
 * Install Guide for concepts: http://docs.openstack.org/juno/install-guide/install/apt/content/keystone-concepts.html
 * Install Guide to configure prerequisites, install and configure the components, and finalize the installation: http://docs.openstack.org/juno/install-guide/install/apt/content/keystone-install.html
 * Configure Keystone for HA MySQL and HA RabbitMQ
 * Add Keystone resource to Pacemaker
 * Change bind parameters in keystone.conf
 * Configure OpenStack services to use HA Keystone
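
The bind-parameter and consumer-configuration steps can be sketched as config fragments; the addresses and the VIP are assumptions, and the option names are Juno-era:

```ini
# keystone.conf on each Controller node: bind to the node's own address,
# leaving the VIP (10.0.0.10, assumed) to HAProxy.
[DEFAULT]
public_bind_host = 192.168.0.11
admin_bind_host = 192.168.0.11

# In each consuming service's config, point authentication at the VIP:
[keystone_authtoken]
auth_uri = http://10.0.0.10:5000
identity_uri = http://10.0.0.10:35357
```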

Glance image service
(Priority: 5, Depends-on: swift, keystone, infrastructure)
 * Install Guide for basics (http://docs.openstack.org/juno/install-guide/install/apt/content/ch_keystone.html )
 * Configure Glance for HA MySQL and HA RabbitMQ
 * Add OpenStack Image API resource to Pacemaker, Configure OpenStack Image Service API, Configure OpenStack services to use HA Image API (Modify: http://docs.openstack.org/high-availability-guide/content/s-keystone.html )
 * Configure OpenStack Image Service API (http://docs.openstack.org/high-availability-guide/content/_configure_openstack_image_service_api.html)
 * Configure OpenStack services to use HA Image API (http://docs.openstack.org/high-availability-guide/content/_configure_openstack_services_to_use_high_available_openstack_image_api.html)
 * Should Glance use a redundant storage backend such as Swift or Ceph?

Cinder Block Storage Service
cinder-api

One instance of cinder-api runs per Controller node. API processes are stateless and run in active/active mode behind a load balancer (e.g., HAProxy). The load balancer periodically checks whether a particular API backend server is currently available and forwards HTTP requests to the available backends in a round-robin fashion. APIs are started on all Controller nodes of a cluster; API requests are fulfilled as long as at least one cinder-api instance remains available.

cinder-scheduler

One instance of cinder-scheduler runs per Controller node. cinder-scheduler instances work in active/active mode; RPC requests are distributed between the running scheduler instances.

cinder-volume (LVM Backend)

cinder-volume services run on Storage nodes and cannot work in HA mode with the LVM backend.

cinder-volume (Ceph Backend)

cinder-volume services with Ceph backend are run on Controller nodes in HA mode by default.

Swift Object Storage
(Priority: 4, Depends-on: keystone, infrastructure)
 * Install Guide for basic installation
 * The installation guide covers basic storage node redundancy, but only deploys one proxy server. Do we want to discuss the process of adding proxy servers and load balancing them? Also, what about adding storage nodes and perhaps discussing regions/zones?

Nova compute service
OpenStack Compute consists of a number of services. HA implementation is service-specific.

Nova API

API processes are stateless and run in active/active mode behind a load balancer (e.g., HAProxy). The load balancer periodically checks whether a particular API backend server is currently available and forwards HTTP requests to the available backends in a round-robin fashion. APIs are started on all Controller nodes of a cluster; API requests are fulfilled as long as at least one of the nova-api-* instances remains available.

nova-scheduler

One instance of nova-scheduler runs per Controller node. nova-scheduler instances work in active/active mode; RPC requests are distributed between the running scheduler instances.

nova-conductor

If used, nova-conductor instances run in active/active mode on each Controller node of the cluster. RPC requests are distributed between the running conductor instances.

nova-compute

Exactly one nova-compute instance runs per Compute node. A Compute node represents a fault domain: if one of the Compute nodes goes down, it only affects the VMs running on that particular node. In this case, it is possible to evacuate the affected VMs to other Compute nodes of the cluster. Note that the evacuation of VMs from a failed Compute node is not performed automatically.
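
As a sketch, the manual evacuation uses the nova CLI; the placeholders below must be replaced with a real instance and target host:

```shell
# Rebuild a failed node's instance on another host. Add --on-shared-storage
# only when the instance disk lives on shared storage, so it is preserved.
nova evacuate <instance-uuid> <target-host> --on-shared-storage
```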

nova-network

When used as part of HA, nova-network runs on each Compute node of the cluster (the so-called multi-host mode). Each of these nodes must have access to the public network. nova-network provides DHCP/gateway services for the VMs running on that particular Compute node; thus, a Compute node represents a fault domain for nova-network, and failure of one Compute node does not affect any other Compute node.
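
A minimal nova.conf sketch for multi-host mode on each Compute node; both flags shown are standard nova-network options:

```ini
# nova.conf fragment on each Compute node (multi-host mode sketch)
[DEFAULT]
multi_host = True
send_arp_for_ha = True   # send gratuitous ARPs so clients learn the new gateway
```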

nova-novncproxy, nova-consoleauth, nova-objectstore

An instance is run on each Controller node of the cluster in the active/active mode. Note that the endpoints need to be accessible by the users of the cloud.

Availability zones

See Availability Zones and Host Aggregates in OpenStack documentation.

Heat Orchestration
The heat service consists of:


 * Several API services: native, AWS CloudFormation compatible, and AWS CloudWatch compatible.
 * heat-engine that does the orchestration itself.

By default, each service is deployed on every controller, providing horizontal scalability for both the APIs (API redundancy) and the heat engine.

The API services are placed behind HAProxy like the other OpenStack APIs. To add more redundancy and/or to scale out the deployment, adding more Controller nodes is enough.

Notes on Corosync

Currently, the heat engine is placed under Corosync/Pacemaker control. This is cruft from before LP Bug 1387345 was fixed, when Pacemaker control was necessary to manually enforce that only one instance of the heat engine was running.

Notes on failover

The heat engine does not support automatic failover. If the heat-engine processing a stack dies and leaves the stack in the IN_PROGRESS state, no other heat-engine will automatically pick up the stack for further processing. If a different instance of heat-engine is then (re)started, all IN_PROGRESS stacks with no active engines assigned to them are automatically moved to the FAILED state. A user can then manually attempt a "recovery" update call with the same template, delete the stack, or repeat the stack action (e.g., for non-modifying stack actions like "check").
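
The manual recovery options map to heat CLI calls along these lines ('my-stack' and the template filename are hypothetical):

```shell
# After heat-engine restart moves an orphaned stack to FAILED, a user may:
heat stack-update my-stack -f template.yaml   # retry with the same template
heat stack-delete my-stack                    # or delete the stack
heat action-check my-stack                    # or repeat a non-modifying action
```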

Ceilometer Telemetry and MongoDB
You can use Ceilometer and MongoDB for the Telemetry service. MongoDB is the default storage backend for Ceilometer.

Ceilometer consists of the following services:


 * Ceilometer API - the user-facing service; it provides the API to query and view the data recorded by the collector service.
 * Ceilometer collector - a daemon that gathers and records events and metering data created by notifications and sent by polling agents.
 * Ceilometer notification agent - a daemon that listens to notifications on the message queue and converts them into Events and Samples.
 * Ceilometer polling agents (central and compute) - daemons that poll OpenStack services and build Meters. The compute agent polls data only from the OpenStack Compute service(s); polling via service APIs for non-compute resources is handled by a central agent, usually running on a Cloud Controller node.
 * Ceilometer alarm services (evaluator and notifier) - daemons that evaluate predefined alarming rules and send notifications.

ceilometer-api, ceilometer-agent-notification, and ceilometer-collector run on all Controller nodes; ceilometer-agent-compute runs on every Compute node and polls its compute resources.

ceilometer-api services are placed behind HAProxy, following the practice used for all OpenStack APIs. A round-robin strategy is used to choose the ceilometer-api service that processes each request.

Notes on Corosync

The ceilometer-agent-central and ceilometer-alarm-evaluator services are monitored by Pacemaker. There should be only one ceilometer-agent-central in the cluster, to avoid sample duplication from other OpenStack services running in HA mode.

There should also be only one running instance of the ceilometer-alarm-evaluator service in the cluster, because the alarm evaluator sends signals to the ceilometer-alarm-notifier.

Note that with several instances running, Ceilometer may send duplicate signals and cause unexpected actions.

Notes on failover

Ceilometer failures do not affect the performance of other services, except for services that use Ceilometer alarm components (e.g., Heat).

Failures of other Ceilometer services affect only meter collection. A broken ceilometer-collector does not poll data from the AMQP queue and does not save data to the storage backend (MongoDB in HA mode), so the notification.info queue in the AMQP backend may fill up with published but uncollected messages. Notification agent and polling agent failures prevent collection of the notification and polling meters, respectively. Ceilometer API failures break alarm processing and the API itself.

Database Service (Trove)
(Priority 9: Depends-on: neutron, nova, glance, keystone, infrastructure)
 * Install Guide for basics
 * Need details about how to apply HA

Sahara
(Priority 9: Depends-on: neutron, nova, glance, keystone, infrastructure)
 * Install Guide for basics (http://docs.openstack.org/juno/install-guide/install/apt/content/ch_sahara.html )
 * Should link to Sahara docs for discussion of OpenStack HA versus Hadoop HA and how they work together, although the installation instructions at http://docs.openstack.org/developer/sahara/userdoc/installation.guide.html do not currently mention HA

Other

 * Configure Pacemaker service group to ensure that the VIP is linked to the API services resource
 * Systemd alternative to OCF scripts for Pacemaker RA
 * MariaDB/Percona with Galera alternative to MySQL
 * Install and configure HAProxy for API services and MySQL with Galera cluster load balancing
 * Mention value of redundant hardware load balancers for stateless services such as REST APIs
 * Describe scaling single node to 3 nodes HA
 * Ceph?
 * Murano?

Original for reference
NOTE: This is the original for us to depart from.

I. Introduction to OpenStack High Availability
 * Stateless vs. Stateful services
 * Active/Passive
 * Active/Active

II. HA Using Active/Passive

1. The Pacemaker Cluster Stack
 * Installing Packages
 * Setting up Corosync
 * Starting Corosync
 * Starting Pacemaker
 * Setting basic cluster properties

2. Cloud Controller Cluster Stack
 * Highly available MySQL
 * Highly available RabbitMQ

3. API Node Cluster Stack
 * Configure the VIP
 * Highly available OpenStack Identity
 * Highly available OpenStack Image API
 * Highly available Cinder API
 * Highly available OpenStack Networking Server
 * Highly available Ceilometer Central Agent
 * Configure Pacemaker Group

4. Network Controller Cluster Stack
 * Highly available Neutron L3 Agent
 * Highly available Neutron DHCP Agent
 * Highly available Neutron Metadata Agent
 * Manage network resources

III. HA Using Active/Active

5. Database
 * MySQL with Galera
 * Galera Monitoring Scripts
 * Other ways to provide a Highly Available database

6. RabbitMQ
 * Install RabbitMQ
 * Configure RabbitMQ
 * Configure OpenStack Services to use RabbitMQ

7. HAProxy Nodes

8. OpenStack Controller Nodes
 * Running OpenStack API & schedulers
 * Memcached

9. OpenStack Network Nodes
 * Running Neutron DHCP Agent
 * Running Neutron L3 Agent
 * Running Neutron Metadata Agent