HAGuideImprovements/TOC
Revision as of 20:29, 19 February 2015 by StackScribe
(Moved original to the bottom of the page for reference.)
Proposed Revision
Strategy and assumptions:
- Audience is people who have some experience installing OpenStack, not first-time users
- Focus on installation of OpenStack core services
- Structure the guide sequentially: present the steps in a reasonable order
- Avoid redundancy with the Install Guide; for steps that are identical for HA and non-HA installations, link to appropriate sections in the Install Guide
- One guide for all Linux distros/platforms
- Emphasize a reasonable, standard deployment based on open source components. We can add notes about alternatives where appropriate (for example, a commercial load balancer might be a better choice than relying on HAProxy) and perhaps a link to the OpenStack Marketplace.
Structure/Outline
HA Intro and Concepts
- Redundancy and failover
- Stateless/stateful, active/passive, active/active
- Quorums: many services should use an odd number of nodes, with a minimum of three
- Single-controller HA mode and scaling up to 3 or more
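The quorum guidance above comes down to majority arithmetic. The following is a minimal sketch, assuming one vote per node and a strict-majority rule; the function names are illustrative and not taken from any cluster tool.

```python
def quorum_size(nodes: int) -> int:
    """Return the smallest strict majority for a cluster of `nodes` voters."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one voting node")
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Return how many nodes can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    # Compare cluster sizes side by side (assumption: one vote per node).
    for nodes in range(1, 8):
        print(f"{nodes} nodes: quorum={quorum_size(nodes)}, "
              f"tolerated failures={tolerated_failures(nodes)}")
```

Under these assumptions a four-node cluster tolerates one failure, the same as a three-node cluster, which is why even sizes add cost without adding fault tolerance.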
Hardware setup
- Minimal Architecture Example -- Network Layout, styled as in http://docs.openstack.org/juno/install-guide/install/apt/content/ch_basic_environment.html#basics-prerequisites for easy comparison
Prerequisites
- Link to Install Guide: install the operating system on each node
- Install pacemaker, crmsh, corosync, cluster-glue, fence-agents (Fedora only), resource-agents.
- Set up and start Corosync and Pacemaker. Use the 'crm' tool on Ubuntu/Debian and 'pcs' on RHEL/Fedora
- Set basic cluster properties
- Configure fencing for Pacemaker cluster (Links to http://clusterlabs.org/doc/)
- Configure the VIP
- API services
- Schedulers
- Memcached service
Configure networking on each node
- Link to Networking Guide
- Describe Neutron agents in an active/active configuration; deprecate the case of a single instance of each agent
- For Kilo and beyond, focus on L3HA and DVR
Install and Configure MySQL
- Two nodes plus GARBD (the Galera Arbitrator daemon)
- MySQL with Galera
- Pacemaker multistate clone resource for Galera cluster
- Pacemaker resource agent for Galera cluster management
- Deprecate MySQL DRBD configuration because of split-brain issues
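The "two nodes plus GARBD" layout can be checked with a small model of what happens when one database node is lost abruptly. This is a sketch only, assuming one vote per member and that the arbitrator votes but stores no data; the function names are hypothetical, and real Galera quorum calculation has more cases (weights, graceful departures).

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True when one side of a partition sees a strict majority of votes."""
    if not 0 <= reachable_votes <= total_votes:
        raise ValueError("reachable_votes must be between 0 and total_votes")
    return 2 * reachable_votes > total_votes


def survives_one_node_loss(data_nodes: int, arbitrators: int) -> bool:
    """Check whether the remaining members keep quorum after one data node
    disappears without leaving the cluster cleanly."""
    total = data_nodes + arbitrators
    return has_quorum(total - 1, total)


if __name__ == "__main__":
    print("2 data nodes, no arbitrator:", survives_one_node_loss(2, 0))
    print("2 data nodes, 1 arbitrator: ", survives_one_node_loss(2, 1))
```

In this model, two data nodes alone lose quorum when either one fails, while adding a single arbitrator lets the surviving data node keep a majority.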
RabbitMQ Message broker
- Oslo messaging for active/active
- No need for active/passive AMQP; use a two-node active/active cluster with mirrored queues instead
- Pacemaker multistate clone resource for RabbitMQ cluster
- Pacemaker resource agent for RabbitMQ cluster management
- Deprecate DRBD for RabbitMQ
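As an illustration of the Oslo messaging item above, the sketch below renders the kind of configuration fragment a service would need to reach every broker in an active/active cluster. It assumes the `rabbit_hosts` and `rabbit_ha_queues` options and an `[oslo_messaging_rabbit]` section (older releases read these options from `[DEFAULT]`); the host names `ctl1` and `ctl2` and the helper name are hypothetical, and the option names should be checked against the target release.

```python
import configparser
import io


def render_rabbit_ha_options(hosts, port=5672, section="oslo_messaging_rabbit"):
    """Render an INI fragment that lists every broker and enables mirrored queues.

    Assumption: the section and option names match the deployed release.
    """
    if len(hosts) < 2:
        raise ValueError("an active/active cluster needs at least two brokers")
    cfg = configparser.ConfigParser()
    cfg[section] = {
        # Comma-separated host:port pairs, one per broker.
        "rabbit_hosts": ",".join(f"{host}:{port}" for host in hosts),
        # Ask the client to declare mirrored (HA) queues.
        "rabbit_ha_queues": "true",
    }
    buffer = io.StringIO()
    cfg.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical controller host names.
    print(render_rabbit_ha_options(["ctl1", "ctl2"]))
```

Generating the fragment from one host list keeps the per-service configuration files consistent, which matters because each OpenStack service that uses the message broker needs the same broker list.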
NTP
- Link to Install Guide
Keystone Identity services
- Install Guide: basic installation, register each service to Keystone
- Configure Keystone for HA MySQL and HA RabbitMQ
- Add Keystone resource to Pacemaker
- Change bind parameters in keystone.conf
- Configure OpenStack services to use HA Keystone
Glance image service
Cinder Block Storage Service
- Install Guide for basic installation
- Need to use Ceph as the storage backend to have data redundancy?
Swift Object Storage
- Install Guide for basic installation
Nova compute service
- Install Guide for basic setup
Heat Orchestration
- Install Guide for basic installation
- How to configure the deployment so that VMs on a failed compute node are quickly migrated to other compute nodes
Ceilometer Telemetry and MongoDB
- Install Guide for basic installation
- Need one MongoDB node for each Controller node
Database Service (Trove)
- Install Guide for basics
- Need details about how to apply HA
Sahara
- Install Guide for basics
- Link to Sahara docs for discussion of OpenStack HA versus Hadoop HA and how they work together
Other
- Configure Pacemaker service group to ensure that the VIP is linked to the API services resource
- Systemd alternative to OCF scripts for Pacemaker RA
- MariaDB with Galera alternative to MySQL
- Install and configure HAProxy for API services and MySQL with Galera cluster load balancing
- Mention value of redundant hardware load balancers for stateless services such as REST APIs
- Describe scaling from a single node to a three-node HA deployment
- Ceph?
- Murano?
Original for reference
NOTE: This is the original for us to depart from.
I. Introduction to OpenStack High Availability
- Stateless vs. Stateful services
- Active/Passive
- Active/Active
II. HA Using Active/Passive
1. The Pacemaker Cluster Stack
- Installing Packages
- Setting up Corosync
- Starting Corosync
- Starting Pacemaker
- Setting basic cluster properties
2. Cloud Controller Cluster Stack
- Highly available MySQL
- Highly available RabbitMQ
3. API Node Cluster Stack
- Configure the VIP
- Highly available OpenStack Identity
- Highly available OpenStack Image API
- Highly available Cinder API
- Highly available OpenStack Networking Server
- Highly available Ceilometer Central Agent
- Configure Pacemaker Group
4. Network Controller Cluster Stack
- Highly available Neutron L3 Agent
- Highly available Neutron DHCP Agent
- Highly available Neutron Metadata Agent
- Manage network resources
III. HA Using Active/Active
5. Database
- MySQL with Galera
- Galera Monitoring Scripts
- Other ways to provide a Highly Available database
6. RabbitMQ
- Install RabbitMQ
- Configure RabbitMQ
- Configure OpenStack Services to use RabbitMQ
7. HAProxy Nodes
8. OpenStack Controller Nodes
- Running OpenStack API & schedulers
- Memcached
9. OpenStack Network Nodes
- Running Neutron DHCP Agent
- Running Neutron L3 Agent
- Running Neutron Metadata Agent