HAGuideImprovements/TOC

Proposed Revision

This spec refers to the https://blueprints.launchpad.net/openstack-manuals/+spec/improve-ha-guide blueprint.

Strategy and assumptions:

  1. Audience is people who have some experience installing OpenStack, not first time users
  2. Focus on installation of OpenStack core services
  3. Structure the guide sequentially -- the steps to take in a reasonable order
  4. Avoid redundancy with the Install Guide; for steps that are identical for HA and non-HA installations, link to appropriate sections in the Install Guide
  5. One guide for all Linux distros/platforms
  6. Emphasize a reasonable, standard deployment based on open source components. We can provide some notes about alternatives as appropriate (for example, using a commercial load-balancer might be a better alternative than relying on HAProxy) and perhaps a link to the OpenStack Marketplace.
  7. The Active/Active versus Active/Passive configurations will be discussed for each component rather than dividing the guide into sections for A/A and A/P as in the current guide.

Structure/Outline

HA Intro and Concepts

Redundancy and failover

The failover procedure for a single node of the OpenStack environment consists of the failover procedures for all of that node's roles and installed components. What follows is basic information about failover for each component or role.

Controller node

Normally, the following components are included in the standard layout of the OpenStack Controller node:

Network components

Hardware

Network interfaces are bonded for redundancy.

Routing

The configuration uses static routing, without Virtual Router Redundancy Protocol (VRRP) or similar techniques for gateway failover.

Endpoints (VIP addresses)

Need description of VIP failover inside Linux namespaces and expected SLA.

LB (HAProxy)

HAProxy runs on each Controller node and does not synchronize state between instances. Each HAProxy instance configures its frontend to accept connections only on the VIP address and forwards them to the list of backends for the corresponding load-balanced service (for example, an OpenStack API service). The HAProxy instances therefore act independently, fail over transparently together with the network endpoints (VIP addresses), and share the same SLA.
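
For illustration, here is a minimal Python sketch of the dispatch behavior described above: requests arriving on the VIP are forwarded round-robin to whichever backends currently pass a health check. It is purely conceptual; the addresses are hypothetical and this is not how HAProxy itself is implemented or configured.

  # Conceptual sketch of round-robin dispatch over health-checked backends.
  # The backend addresses and the DOWN set are hypothetical.
  from itertools import cycle

  BACKENDS = ["192.0.2.11:8774", "192.0.2.12:8774", "192.0.2.13:8774"]
  DOWN = {"192.0.2.12:8774"}          # pretend this backend failed its check

  def is_healthy(backend):
      """Stand-in for HAProxy's periodic backend health check."""
      return backend not in DOWN

  def dispatch(requests):
      """Forward each request to the next healthy backend, round-robin."""
      ring = cycle(BACKENDS)
      for request in requests:
          for _ in range(len(BACKENDS)):   # try each backend at most once
              backend = next(ring)
              if is_healthy(backend):
                  yield request, backend
                  break
          else:
              raise RuntimeError("no healthy backends behind the VIP")

  for req, backend in dispatch(["GET /v2/servers"] * 4):
      print(req, "->", backend)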

DB (MySQL)

MySQL with Galera runs behind HAProxy. There is always an active backend and a backup backend configured, and there is zero slave lag thanks to Galera's synchronous replication. As a result, the failover procedure completes as soon as HAProxy detects that its active backend has gone down and switches to the backup one, which should be marked as 'UP'. If no backends are up (for example, if the Galera cluster is not ready to accept connections), the failover procedure finishes only when the Galera cluster has finished reassembling. The SLA is normally no more than 5 minutes.
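
From the application side, the consequence is that clients connect to the database through the VIP and simply retry while HAProxy fails over. Below is a minimal sketch, assuming PyMySQL and a hypothetical VIP address and credentials; none of these are prescribed by this guide.

  # Applications connect to Galera through the HAProxy VIP, never to an
  # individual node, and retry while HAProxy switches to the backup backend.
  # PyMySQL, the VIP, and the credentials are illustrative assumptions.
  import time
  import pymysql

  VIP = "192.0.2.10"   # hypothetical VIP in front of the Galera cluster

  def query_with_retry(sql, retries=30, delay=10):
      for _ in range(retries):
          try:
              conn = pymysql.connect(host=VIP, user="nova", password="secret",
                                     database="nova", connect_timeout=5)
              try:
                  with conn.cursor() as cur:
                      cur.execute(sql)
                      return cur.fetchall()
              finally:
                  conn.close()
          except pymysql.err.OperationalError:
              time.sleep(delay)   # wait for HAProxy/Galera to fail over
      raise RuntimeError("database did not come back within the expected SLA")

  print(query_with_retry("SELECT 1"))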

AMQP (RabbitMQ)

RabbitMQ nodes fail over on both the application and the infrastructure layers. The application layer is controlled by the oslo.messaging configuration options for multiple AMQP hosts: if an AMQP node fails, the application reconnects to the next configured one within the specified reconnect interval, and that reconnect interval constitutes its SLA. On the infrastructure layer, the SLA is the time it takes the RabbitMQ cluster to reassemble. Several cases are possible. If the Mnesia keeper node fails (the master of the corresponding Pacemaker resource for RabbitMQ), there is a full AMQP cluster downtime interval; normally, its SLA is no more than several minutes. If another node fails (a slave of the corresponding Pacemaker resource for RabbitMQ), there is no AMQP cluster downtime at all.
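
The application-layer behavior can be pictured with the conceptual Python sketch below. It is not the oslo.messaging implementation; the connect() helper, the host list, and the interval are purely illustrative stand-ins for the real driver and its configuration.

  # Conceptual sketch of application-layer AMQP failover: the application is
  # given several AMQP hosts and, when the current one fails, reconnects to
  # the next one after the configured reconnect interval.
  import itertools
  import time

  AMQP_HOSTS = ["controller1:5672", "controller2:5672", "controller3:5672"]
  RECONNECT_INTERVAL = 2   # seconds; this interval effectively defines the SLA

  def connect(host):
      """Stand-in for a real AMQP connection attempt."""
      if host == "controller1:5672":    # pretend this node has just failed
          raise ConnectionError("AMQP node down")
      return "connection to %s" % host

  def get_connection():
      for host in itertools.cycle(AMQP_HOSTS):
          try:
              return connect(host)
          except ConnectionError:
              time.sleep(RECONNECT_INTERVAL)   # wait, then try the next host

  print(get_connection())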

Memcached backend

Need a description of how the memcache_pool backend fails over. The SLA is several minutes.

OpenStack stateless components

Needs a reference to the OpenStack HA/admin/ops guides describing the failover procedure in OpenStack for API and other stateless services.

OpenStack stateful components

Needs a reference to the OpenStack HA/admin/ops guides describing the failover procedure in OpenStack, including Neutron and its agents.

Storage components

CEPH-MON

Needs description.

CEPH RADOS-GW

Needs description.

SWIFT-all

Needs description.

Storage node (CEPH-OSD)

Needs description.

Storage node (Cinder-LVM)

Cinder nodes running the cinder-volume service cannot fail over. When a node fails, all LVM volumes on it become unavailable as well. To make them available, shared storage is required.

Compute node

Compute nodes cannot fail over. When a node fails, all instances running on it become unavailable as well. To make them available, the HA-for-instances feature is required.

Mongo DB node

Needs description.

Zabbix node

Needs description.

Stateless/stateful, active/passive, active/active

See:

  • Stateless vs. Stateful services in the OpenStack documentation (http://docs.openstack.org/high-availability-guide/content/stateless-vs-stateful.html)
  • HA Controller role description commit (https://review.openstack.org/#/c/153283/)

Quorums

Services should use an odd number of nodes equal to or greater than 3. See also the HA Controller role description commit (https://review.openstack.org/#/c/153283/).
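
The quorum arithmetic behind this recommendation can be shown with a short Python sketch (the cluster sizes are just examples):

  # A cluster of n nodes needs floor(n/2) + 1 votes for quorum, so it can
  # tolerate n - quorum failures. An even-sized cluster tolerates no more
  # failures than the next smaller odd size, hence the odd-numbers-from-3 rule.
  def quorum(n):
      return n // 2 + 1

  for n in (1, 2, 3, 4, 5):
      print("nodes=%d  quorum=%d  tolerated failures=%d"
            % (n, quorum(n), n - quorum(n)))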

Single-controller HA mode

An HA environment is not actually highly available until at least three Controller nodes are configured, but even a single-controller environment runs Pacemaker, Corosync, and the other utilities used to manage an HA environment. To make it highly available, add two or more Controller nodes. Each Controller cluster should include an odd number of nodes -- 1 node, 3 nodes, 5 nodes, et cetera.

Storage Backends

(Priority: 1) This section contains more concepts than actual procedures; our expectation is that the specific technologies discussed have their own configuration documentation that can be referenced.

This section describes the data plane (infrastructure) elements that factor into the overall HA capabilities of the storage; in other words, how one ensures that one's data is not lost when systems fail. Topics to be discussed include RAID, erasure coding, and so on, describing the protections they do and do not offer.

We will also include a blurb about the options that are available. Finally, we could state that Cinder supports multiple storage providers (Ceph, EMC, NetApp, SolidFire, etc.) and that additional details are available from your storage provider's documentation.

Swift combines the control and data planes, so we would cover some aspects of both.

Hardware setup

Generally, HA can be configured and run on any hardware that is supported by the Linux kernel. See http://www.linux-drivers.org/.

As a reference, you can also use the hardware certified for Ubuntu. See http://www.ubuntu.com/certification/.

Basic Environment

(Priority: 1)

  1. Install O/S on each node (link to Install Guide, e.g. http://docs.openstack.org/juno/install-guide/install/apt/content/ch_basic_environment.html)
  2. Install Memcached. (Verify that Oslo supports hash synchronization; if so, this should not take more than load balancing. See http://docs.openstack.org/high-availability-guide/content/_memcached.html and the sketch after this list.)
  3. Run NTP servers on every controller and configure other nodes to use all of them for synchronization. Link to http://docs.openstack.org/juno/install-guide/install/apt/content/ch_basic_environment.html#basics-ntp
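
As a sketch for the Memcached item above: with a memcached instance on every controller, clients are simply handed the whole server list and keys are hashed across it, so no extra clustering of memcached itself is needed. python-memcached and the addresses below are assumptions used only for illustration.

  # Run memcached on every controller and give clients the full server list;
  # keys are distributed across the servers by hashing. Addresses are
  # hypothetical and python-memcached is only one possible client library.
  import memcache

  mc = memcache.Client(["192.0.2.11:11211",
                        "192.0.2.12:11211",
                        "192.0.2.13:11211"])

  mc.set("my_key", "my_value")   # stored on whichever server the key hashes to
  print(mc.get("my_key"))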

Basic HA facilities

(Priority: 1)

  1. Install pacemaker, crmsh, corosync, cluster-glue, fence-agents (Fedora only), resource-agents. (Modify: http://docs.openstack.org/high-availability-guide/content/_install_packages.html)
  2. What is needed for LSB/upstart/systemd alternative to OCF scripts (RA) for Pacemaker? See https://bugs.launchpad.net/openstack-manuals/+bug/1349398
  3. Set up and start Corosync and Pacemaker. Stick with 'crm' tool for Ubuntu/Debian and 'pcs' for RHEL/Fedora (Modify http://docs.openstack.org/high-availability-guide/content/_set_up_corosync.html; Modify: http://docs.openstack.org/high-availability-guide/content/_start_pacemaker.html)
  4. Set basic cluster properties (Modify: http://docs.openstack.org/high-availability-guide/content/_set_basic_cluster_properties.html)
  5. Configure fencing for Pacemaker cluster (Links to http://clusterlabs.org/doc/)
  6. Configure the VIP (Keep: http://docs.openstack.org/high-availability-guide/content/s-api-vip.html )
  7. API services -- Do those belong here or in specific sections? (Modify Glance API: http://docs.openstack.org/high-availability-guide/content/s-glance-api.html and Modify Cinder API: http://docs.openstack.org/high-availability-guide/content/s-cinder-api.html )
  8. Schedulers
  9. Memcached service on Controllers (Keep: http://docs.openstack.org/high-availability-guide/content/_memcached.html , which links to http://code.google.com/p/memcached/wiki/NewStart for specifics)

Install and Configure MySQL

(Priority: 2)

  1. Two nodes plus GARBD.
  2. MySQL variant with Galera: Cover major options (Galera Cluster for MySQL, Percona XtraDB Cluster, and MariaDB Galera Cluster) and link off to resources to understand installation and initial config options (e.g., SST).
  3. Pacemaker multistate clone resource for Galera cluster
  4. Pacemaker resource agent for Galera cluster management
  5. Deprecate MySQL DRBD configuration because of split-brain issues

RabbitMQ Message broker

(Priority: 2)

The RabbitMQ team is creating their own Guide and we will link to that. This section will include some basic concepts and configuration information but defer to the RabbitMQ documentation for details and advanced tasks.

  1. Install and configure message broker on Controller; see http://docs.openstack.org/juno/install-guide/install/apt/content/ch_basic_environment.html#basics-prerequisites
  2. Oslo messaging for active/active
     1. I think services need some special configuration with more than two nodes?
        1. No need for active/passive AMQP; two-node active/active cluster with mirrored queues instead
  2. Pacemaker multistate clone resource for RabbitMQ cluster
  3. Pacemaker resource agent for RabbitMQ cluster management
  4. Deprecate DRBD for RabbitMQ

Keystone Identity services

(Priority: 3, Depends-on: infrastructure)

  1. Install Guide for concepts: http://docs.openstack.org/juno/install-guide/install/apt/content/keystone-concepts.html
  2. Install Guide to configure prerequisites, install and configure the components, and finalize the installation: http://docs.openstack.org/juno/install-guide/install/apt/content/keystone-install.html
  3. Configure Keystone for HA MySQL and HA RabbitMQ
  4. Add Keystone resource to Pacemaker
  5. Change bind parameters in keystone.conf
  6. Configure OpenStack services to use HA Keystone

Glance image service

(Priority: 5, Depends-on: swift, keystone, infrastructure)

  1. Install Guide for basics (http://docs.openstack.org/juno/install-guide/install/apt/content/ch_keystone.html )
  2. Configure Glance for HA MySQL and HA RabbitMQ
  3. Add OpenStack Image API resource to Pacemaker, Configure OpenStack Image Service API, Configure OpenStack services to use HA Image API (Modify: http://docs.openstack.org/high-availability-guide/content/s-keystone.html )
  4. Configure OpenStack Image Service API (http://docs.openstack.org/high-availability-guide/content/_configure_openstack_image_service_api.html)
  5. Configure OpenStack services to use HA Image API (http://docs.openstack.org/high-availability-guide/content/_configure_openstack_services_to_use_high_available_openstack_image_api.html)
  6. Should Glance use a redundant storage backend such as Swift or Ceph?

Cinder Block Storage Service

cinder-api

One instance of cinder-api is run per Controller node. API processes are stateless and run in active/active mode behind a load balancer (e.g., HAProxy). The load balancer periodically checks whether a particular API backend server is currently available and forwards HTTP requests to the available backends in a round-robin fashion. APIs are started on all Controller nodes of a cluster; API requests are fulfilled as long as at least one cinder-api instance remains available.
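
As a small illustration of the health checking involved, the sketch below probes the API port on the VIP and on each controller with the requests library; the addresses and port 8776 are hypothetical, and in a real deployment this check is performed by the load balancer itself.

  # Probe each cinder-api backend and the VIP the way a health check would;
  # any backend that answers keeps the API reachable through the VIP.
  # Addresses and port are illustrative only.
  import requests

  ENDPOINTS = {
      "vip":         "http://192.0.2.10:8776/",
      "controller1": "http://192.0.2.11:8776/",
      "controller2": "http://192.0.2.12:8776/",
      "controller3": "http://192.0.2.13:8776/",
  }

  for name, url in ENDPOINTS.items():
      try:
          status = requests.get(url, timeout=2).status_code
          print("%s: HTTP %d" % (name, status))
      except requests.exceptions.RequestException as exc:
          print("%s: DOWN (%s)" % (name, exc))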

cinder-scheduler

One instance of cinder-scheduler is run per Controller node. cinder-scheduler instances work in active/active mode; RPC requests are distributed between the running scheduler instances.

cinder-volume (LVM Backend)

cinder-volume services run on Storage nodes and cannot work in HA mode with the LVM backend.

cinder-volume (Ceph Backend)

cinder-volume services with the Ceph backend run on Controller nodes in HA mode by default.

Swift Object Storage

(Priority: 4, Depends-on: keystone, infrastructure)

  1. Install Guide for basic installation
  2. The installation guide covers basic storage node redundancy, but only deploys one proxy server. Do we want to discuss the process of adding proxy servers and load balancing them? Also, what about adding storage nodes and perhaps discussing regions/zones?

Nova compute service

OpenStack Compute consists of a number of services. HA implementation is service specific.

Nova API

API processes are stateless and run in active/active mode behind a load balancer (e.g., HAProxy). The load balancer periodically checks whether a particular API backend server is currently available and forwards HTTP requests to the available backends in a round-robin fashion. APIs are started on all Controller nodes of a cluster; API requests are fulfilled as long as at least one of the nova-api-* services remains available.

nova-scheduler

One instance of nova-scheduler is run per Controller node. nova-scheduler instances work in active/active mode; RPC requests are distributed between the running scheduler instances.

nova-conductor

If used, nova-conductor instances run in active/active mode on each Controller node of the cluster. RPC requests are distributed between the running instances.

nova-compute

Exactly one nova-compute instance is run per Compute node. A Compute node represents a fault domain: if a Compute node goes down, only the VMs running on that particular node are affected. In this case the affected VMs can be evacuated to other Compute nodes of the cluster. Note that the evacuation of VMs from a failed Compute node is not performed automatically.
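
A conceptual sketch of the manual evacuation step an operator performs after a Compute node failure is shown below; list_vms_on() and evacuate() are hypothetical placeholders, not real nova calls, and the host names are illustrative.

  # Manual evacuation after a Compute node failure: spread the affected VMs
  # over the remaining Compute nodes. Both helpers are hypothetical stand-ins
  # for the real nova operations; nothing performs this automatically here.
  FAILED_HOST = "compute-03"
  SPARE_HOSTS = ["compute-01", "compute-02"]

  def list_vms_on(host):
      """Hypothetical: the VMs that were running on the failed host."""
      return ["vm-a", "vm-b", "vm-c"]

  def evacuate(vm, target):
      """Hypothetical: rebuild the VM on the target host."""
      print("evacuating %s from %s to %s" % (vm, FAILED_HOST, target))

  for index, vm in enumerate(list_vms_on(FAILED_HOST)):
      evacuate(vm, SPARE_HOSTS[index % len(SPARE_HOSTS)])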

nova-network

When used as part of an HA deployment, nova-network runs on each Compute node of the cluster (the so-called multi-host mode). Each of these nodes must have access to the public network. nova-network provides DHCP and gateway services for the VMs running on that particular Compute node; a Compute node therefore represents a fault domain for nova-network, and the failure of one Compute node does not affect any other Compute node.

nova-novncproxy, nova-consoleauth, nova-objectstore

An instance of each of these services runs on every Controller node of the cluster in active/active mode. Note that the endpoints need to be accessible to the users of the cloud.

Availability zones

See Availability Zones and Host Aggregates in the OpenStack documentation (http://docs.openstack.org/openstack-ops/content/scaling.html#availability_zones).

Heat Orchestration

The heat service consists of:

  • Several API services: native, AWS CloudFormation compatible, and AWS CloudWatch compatible.
  • heat-engine that does the orchestration itself.

By default, each service is deployed on every controller, providing horizontal scalability for both the APIs (API redundancy) and the heat engine.

The API services are placed behind HAProxy like the other OpenStack APIs. To add more redundancy and/or scale out the deployment, it is enough to add more Controllers.

Notes on Corosync

Currently the heat engine is placed under Corosync/Pacemaker control. This is cruft from before LP Bug 1387345 (https://bugs.launchpad.net/mos/+bug/1387345) was fixed, when Pacemaker control was necessary to manually ensure that only one instance of the heat engine was running.

Notes on failover

The heat engine does not support automatic failover. If the heat-engine processing a stack dies and leaves that stack in the IN_PROGRESS state, no other heat-engine will automatically pick up the stack for further processing. If a different heat-engine instance is then (re)started, all IN_PROGRESS stacks with no active engine assigned to them are automatically put into the FAILED state. A user can then manually attempt a "recovery" update call with the same template, delete the stack, or repeat the stack action (e.g. for non-modifying stack actions such as "check").
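
A conceptual Python sketch of that behavior is shown below; the stack data and the on_engine_restart() helper are purely illustrative and are not Heat code.

  # Stacks owned by a dead heat-engine stay IN_PROGRESS until another engine
  # starts and marks them FAILED; the user then recovers them manually.
  # The data and helper below are illustrative only.
  stacks = [
      {"name": "web-tier", "status": "CREATE_IN_PROGRESS", "engine": None},
      {"name": "db-tier",  "status": "CREATE_COMPLETE",    "engine": "engine-2"},
  ]

  def on_engine_restart(stacks):
      for stack in stacks:
          if stack["engine"] is None and stack["status"].endswith("IN_PROGRESS"):
              stack["status"] = stack["status"].split("_")[0] + "_FAILED"
      return stacks

  for stack in on_engine_restart(stacks):
      print(stack["name"], stack["status"])
  # Next steps are manual: update with the same template, delete the stack,
  # or repeat a non-modifying action such as "check".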

Ceilometer Telemetry and MongoDB

You can use Ceilometer and MongoDB for the Telemetry service. MongoDB is the default storage backend for Ceilometer.

Ceilometer consists of the following services:

  • Ceilometer API - a user-facing service that provides the API to query and view the data recorded by the collector service.
  • Ceilometer collector - a daemon that gathers and records the events and metering data created by notifications and sent by the polling agents.
  • Ceilometer notification agent - a daemon that listens to notifications on the message queue and converts them into Events and Samples.
  • Ceilometer polling agents (central and compute) - daemons that poll OpenStack services and build Meters. The compute agent polls data only from the OpenStack Compute service(s); polling via service APIs for non-compute resources is handled by a central agent, usually running on a Cloud Controller node.
  • Ceilometer alarm services (evaluator and notifier) - daemons that evaluate and notify using predefined alarming rules.

Ceilometer-api, ceilometer-agent-notification and ceilometer-collector run on all Controller nodes. ceilometer-agent-compute runs on every Compute node and polls its compute resources.

Ceilometer-api services are placed behind HAProxy, following the practice used for all OpenStack APIs. A round-robin strategy is used to choose the ceilometer-api service for request processing.

Notes on Corosync

The ceilometer-agent-central and ceilometer-alarm-evaluator services are monitored by Pacemaker. There should be only one ceilometer-agent-central instance in the cluster, to avoid duplicated samples for the OpenStack services that run in HA mode.

Also, there should be only one running instance of the ceilometer-alarm-evaluator service in the cluster, because the alarm evaluator sends a signal to the ceilometer-alarm-notifier.

Note that with several instances running, Ceilometer may send duplicate activating signals and cause unexpected actions.
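
A toy Python illustration of why a single central agent is wanted: two agents polling the same non-compute resources would record everything twice. This is purely conceptual and is not Ceilometer code.

  # Two central agents polling the same resources would each emit a sample,
  # duplicating what a single agent records once. Purely conceptual.
  RESOURCES = ["glance-images", "swift-containers"]

  def poll(agent_name):
      return [(agent_name, resource) for resource in RESOURCES]

  one_agent = poll("central-1")
  two_agents = poll("central-1") + poll("central-2")

  print("samples with one agent :", len(one_agent))    # 2
  print("samples with two agents:", len(two_agents))   # 4 -- duplicates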

Notes on failover

Ceilometer failures do not affect the performance of other services except for the services which use Ceilometer alarm components (e.g. Heat).

All other failures affect only Ceilometer's own meter collection. A broken ceilometer-collector does not poll data from the AMQP queue and does not save data to the storage backend (MongoDB in HA mode), so the notification.info queue in the AMQP backend may fill up with published but uncollected messages. Notification agent and polling agent failures prevent the collection of notification and polling meters, respectively. Ceilometer API failures break the alarm processes and the API itself.

Database Service (Trove)

(Priority 9: Depends-on: neutron, nova, glance, keystone, infrastructure)

  1. Install Guide for basics
  2. Need details about how to apply HA

Sahara

(Priority 9: Depends-on: neutron, nova, glance, keystone, infrastructure)

  1. Install Guide for basics (http://docs.openstack.org/juno/install-guide/install/apt/content/ch_sahara.html )
  2. Should link to Sahara docs for discussion of OpenStack HA versus Hadoop HA and how they work together, although the installation instructions at http://docs.openstack.org/developer/sahara/userdoc/installation.guide.html do not currently mention HA

Other

  1. Configure Pacemaker service group to ensure that the VIP is linked to the API services resource
  2. Systemd alternative to OCF scripts for Pacemaker RA
  3. MariaDB/Percona with Galera alternative to MySQL
  4. Install and configure HAProxy for API services and MySQL with Galera cluster load balancing
  5. Mention value of redundant hardware load balancers for stateless services such as REST APIs
  6. Describe scaling single node to 3 nodes HA
  7. Ceph?
  8. Murano?

Original for reference

NOTE: This is the original for us to depart from.

I. Introduction to OpenStack High Availability

  1. Stateless vs. Stateful services
  2. Active/Passive
  3. Active/Active

II. HA Using Active/Passive

1. The Pacemaker Cluster Stack

  1. Installing Packages
  2. Setting up Corosync
  3. Starting Corosync
  4. Starting Pacemaker
  5. Setting basic cluster properties

2. Cloud Controller Cluster Stack

  1. Highly available MySQL
  2. Highly available RabbitMQ

3. API Node Cluster Stack

  1. Configure the VIP
  2. Highly available OpenStack Identity
  3. Highly available OpenStack Image API
  4. Highly available Cinder API
  5. Highly available OpenStack Networking Server
  6. Highly available Ceilometer Central Agent
  7. Configure Pacemaker Group

4. Network Controller Cluster Stack

  1. Highly available Neutron L3 Agent
  2. Highly available Neutron DHCP Agent
  3. Highly available Neutron Metadata Agent
  4. Manage network resources

III. HA Using Active/Active

5. Database

  1. MySQL with Galera
  2. Galera Monitoring Scripts
  3. Other ways to provide a Highly Available database

6. RabbitMQ

  1. Install RabbitMQ
  2. Configure RabbitMQ
  3. Configure OpenStack Services to use RabbitMQ

7. HAproxy Nodes

8. OpenStack Controller Nodes

  1. Running OpenStack API & schedulers
  2. Memcached

9. OpenStack Network Nodes

  1. Running Neutron DHCP Agent
  2. Running Neutron L3 Agent
  3. Running Neutron Metadata Agent