Host-aggregates


 * Launchpad Entry: NovaSpec:host-aggregates
 * Created: 10 October 2011
 * Contributors: Armando Migliaccio, John Garbutt

Summary
This blueprint introduces the concept of host aggregates into Nova. Host aggregates differ from both zones and availability zones: zones partition a Nova deployment into logical groups for load balancing and instance distribution, whereas availability zones provide some form of physical isolation and redundancy from one another (e.g. by using separate power supplies and network gear). Availability zones do not necessarily imply geographic distribution, whereas zones usually do. Host aggregates can be regarded as a mechanism to further partition an availability zone into multiple groups of hosts that share common resources such as storage and network.

Release Note
Support for host aggregates (i.e. clusters, groups or pools of hypervisor hosts) in Nova. The Aggregates concept is another level of scaling for Nova deployments, after zones and availability zones.

Rationale
Host aggregates enable a finer level of granularity at which to structure an entire OpenStack deployment. Aggregates also allow higher availability of a single guest instance within an availability zone, enable advanced VM placement strategies and, more importantly, enable zero-downtime upgrades of hosts. Aggregates are exposed via the OSAPI Admin API; see the OSAPI documentation for more details.

User stories
Host aggregate creation:


 * Install and start nova-compute on host_1
 * Install and start nova-compute on host_2
 * Install and start nova-compute on host_N
 * Create an aggregate called Aggregate_X
 * Add host_1 to aggregate_X, host_1 joins the aggregate
 * Add host_2 to aggregate_X, host_2 joins the aggregate
 * Add host_N to aggregate_X, host_N joins the aggregate

PLEASE NOTE: an aggregate must be set up before the hosts joining it can become fully operational (i.e. allowed to serve requests like VM creation). To this end, nova-compute may need a new flag that tells the service to register with the Nova infrastructure while remaining in disabled mode. The service status can then be switched to 'enabled' as soon as the host joins an aggregate.
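
The creation flow and the note above can be sketched as follows. This is only an illustration: `ComputeService`, `Aggregate` and `add_host` are hypothetical names, not Nova classes, and the "register in disabled mode" behaviour is modelled as a plain boolean rather than an actual nova-compute flag.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ComputeService:
    """Hypothetical stand-in for a registered nova-compute service."""
    host: str
    enabled: bool = False              # assumed: the service registers in disabled mode
    aggregate: Optional[str] = None    # name of the aggregate the host has joined, if any


@dataclass
class Aggregate:
    """Hypothetical aggregate holding the names of its member hosts."""
    name: str
    hosts: List[str] = field(default_factory=list)

    def add_host(self, service: ComputeService) -> None:
        # A host may belong to at most one aggregate.
        if service.aggregate is not None:
            raise ValueError(f"{service.host} already belongs to {service.aggregate}")
        self.hosts.append(service.host)
        service.aggregate = self.name
        # The host only becomes operational once it has joined the aggregate.
        service.enabled = True


aggregate_x = Aggregate("Aggregate_X")
services = [ComputeService(f"host_{i}") for i in (1, 2, 3)]
for service in services:
    aggregate_x.add_host(service)
```

Keeping the enable step inside `add_host` makes the ordering constraint explicit: there is no path by which a host serves requests before it has joined an aggregate.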

Host maintenance:


 * Put a host in maintenance mode
 * VMs running on the host migrate to other members of the aggregate
 * Administrator intervenes on the host (e.g. adds extra memory)
 * Put host back into operational mode
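
As a rough illustration of the migration step in the maintenance story above, the sketch below spreads the instances of a host over the remaining members of its aggregate in round-robin order. The function name and the round-robin policy are assumptions made for the example; the blueprint does not prescribe how target hosts are chosen.

```python
from itertools import cycle


def plan_evacuation(aggregate_hosts, host, instances):
    """Map each instance on `host` to another member of the same aggregate.

    Hypothetical helper: targets are chosen round-robin, ignoring capacity.
    """
    targets = [h for h in aggregate_hosts if h != host]
    if not targets:
        raise RuntimeError(f"no other host in the aggregate can take over for {host}")
    # zip() stops when the instances run out, so cycle() never loops forever.
    return dict(zip(instances, cycle(targets)))


plan = plan_evacuation(
    ["host_1", "host_2", "host_3"], "host_1", ["vm_a", "vm_b", "vm_c"]
)
# plan == {"vm_a": "host_2", "vm_b": "host_3", "vm_c": "host_2"}
```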

Assumptions
For further details, see implementation section below.


 * In the first release of this feature, things like shared storage and networking are set up and configured manually.
 * Nova has zones and availability zones.
 * Aggregates introduce a sub-level of grouping. The relationship between availability zones and aggregates is 1-M (i.e. 1 availability zone contains 0 or more aggregates, and an aggregate can belong to one and only one availability zone).
 * A host belongs to zero or one aggregate.
 * Aggregates must be visible to admin only.
 * An aggregate is an abstract concept, i.e. it does not necessarily translate to a resource pool in hypervisor-land. The implication is that if, for instance, XenServer/VMware resource pools are mapped to aggregates in host-aggregates-v1, this might no longer be the case in host-aggregates-v2.
 * There might be changes required on the specific virt layer support.
 * For instance, in the xenapi case it may be necessary to ensure that instances can be explicitly streamed to the local SR or to another storage. At the moment instances get streamed to the default SR, which is typically the local SR. This is fine for non-HA instances; HA-enabled instances, however, need some form of shared storage. In a nutshell, it is expected that some coordination with whoever is in charge of https://blueprints.launchpad.net/nova/+spec/guest-ha will take place.
 * With the introduction of hypervisor pools (e.g. XenServer pools or ESX resource pools), live migration becomes a bonus; this means that there will be work to be coordinated with whoever is in charge of https://blueprints.launchpad.net/nova/+spec/xenapi-live-migration.
 * Effort is to be coordinated with the wider community team to get this stuff working primarily for both XenServer and KVM (ESXi may be in doubt). Primary focus for Essex is to close the gap between the two hypervisors, rather than widening it.

Design
The OSAPI Admin API will be extended to support the following operations:


 * Hosts
 * start host maintenance (or evacuate-host): prevent a host from serving API requests and migrate its instances to other hosts of the aggregate
 * stop host maintenance (or rebalance-host): put the host back into operational mode, optionally migrating instances back onto that host
 * Aggregates
 * list aggregates: returns a list of all the host-aggregates (optionally filtered by availability zone)
 * create aggregate: creates an aggregate; takes a friendly name, etc. and returns an id
 * show aggregate: shows the details of an aggregate (id, name, availability_zone, hosts and metadata)
 * update aggregate: updates the name and availability zone of an aggregate
 * set metadata: sets the metadata on an aggregate to the values supplied
 * delete aggregate: deletes an aggregate; fails if the aggregate is not empty
 * add host: adds a host to the aggregate
 * remove host: removes a host from the aggregate, it fails if the host is not disabled or
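
The aggregate operations listed above could be modelled, very roughly, by the in-memory sketch below. It is not the Nova Aggregate API: the class, method names and signatures are hypothetical, and for remove host only the "host must be disabled" condition stated above is modelled.

```python
import itertools


class AggregateError(Exception):
    """Raised when an operation violates one of the constraints sketched here."""


class AggregateAPI:
    """Hypothetical in-memory model of the aggregate operations."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._aggregates = {}

    def create(self, name, availability_zone):
        agg_id = next(self._ids)
        self._aggregates[agg_id] = {
            "id": agg_id,
            "name": name,
            "availability_zone": availability_zone,
            "hosts": [],
            "metadata": {},
        }
        return agg_id

    def list_aggregates(self, availability_zone=None):
        # Optionally filter by availability zone.
        return [
            agg for agg in self._aggregates.values()
            if availability_zone is None
            or agg["availability_zone"] == availability_zone
        ]

    def show(self, agg_id):
        return self._aggregates[agg_id]

    def set_metadata(self, agg_id, metadata):
        self._aggregates[agg_id]["metadata"] = dict(metadata)

    def add_host(self, agg_id, host):
        # A host belongs to zero or one aggregate.
        if any(host in agg["hosts"] for agg in self._aggregates.values()):
            raise AggregateError(f"{host} already belongs to an aggregate")
        self._aggregates[agg_id]["hosts"].append(host)

    def remove_host(self, agg_id, host, host_disabled):
        if not host_disabled:
            raise AggregateError(f"{host} must be disabled before removal")
        self._aggregates[agg_id]["hosts"].remove(host)

    def delete(self, agg_id):
        if self._aggregates[agg_id]["hosts"]:
            raise AggregateError("aggregate is not empty")
        del self._aggregates[agg_id]
```

Checking membership across all aggregates in `add_host` is what enforces the "zero or one aggregate per host" assumption in this sketch.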

Implementation
Please read the notes below for further details on the first cut of the implementation.


 * XenServer
 * some properties associated with the aggregate will affect the configuration of a pool (e.g. shared storage, master election etc)
 * first host to be added to the aggregate becomes master, other hosts become slaves
 * need to deal with master failures
 * host maintenance: this means that guest instances on the host need to move to other hosts in the aggregate (and back once the maintenance window has been closed). This move has ramifications on instance networking, the controllability of the instances, etc. There is extensive work to be done in this area to ensure that instances can still be reached and controlled by the orchestration layer.
 * Scheduling layer needs to be aggregate-aware: this means two things:
 * metrics (memory, storage, cpu, etc.) should still be accounted for on a host-basis, even if the host is member of a pool;
 * VM placement algorithms should not be affected by the existence of the pool;
 * A headroom may be optionally provided in order to deal with maintenance windows (this may imply a considerable packing problem);
 * Optimizations: availability zone can be obtained from host details, so it may be omitted on aggregate creation. This means that the last host/service to be removed from an aggregate clears the availability zone.
 * may need to consider what it means for other services (like volume) to join a host aggregate
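
To make the headroom point above more concrete, the sketch below checks whether the instances of a host entering maintenance would fit in the free memory of the other hosts of its aggregate. It keeps the accounting per host, as noted above, and uses a first-fit decreasing heuristic chosen purely for illustration; the blueprint does not name a packing algorithm, and the function name and memory-only metric are assumptions.

```python
def can_evacuate(free_memory_mb, instance_memory_mb):
    """Try to place every instance on some host using first-fit decreasing.

    free_memory_mb:     {host: free memory in MB} for the remaining hosts
    instance_memory_mb: {instance: required memory in MB}

    Returns {instance: host} if the heuristic finds a placement, else None.
    Being a heuristic, None does not prove that no placement exists.
    """
    free = dict(free_memory_mb)  # copy so the caller's metrics are untouched
    placement = {}
    # Largest instances first: they are the hardest to place.
    ordered = sorted(instance_memory_mb.items(), key=lambda kv: kv[1], reverse=True)
    for name, needed in ordered:
        target = next((h for h, avail in free.items() if avail >= needed), None)
        if target is None:
            return None
        free[target] -= needed
        placement[name] = target
    return placement


placement = can_evacuate(
    {"host_2": 4096, "host_3": 2048},
    {"vm_a": 2048, "vm_b": 2048, "vm_c": 1024},
)
# placement == {"vm_a": "host_2", "vm_b": "host_2", "vm_c": "host_3"}
```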

Targets for blueprint implementation:


 * E3/Early E4
 * OSAPI extensions
 * client mappings (python-novaclient)
 * Aggregate model and API
 * Virt extensions for XenServer
 * pool creation and setup
 * host maintenance
 * E4/Essex RC/F1
 * smoketests & documentation

UI Changes
Extensions may be provided to Horizon to support Admin tasks, like aggregate creation and management. However, the nova CLI client is expected to provide the following operations:

 * aggregate-list: Print a list of all aggregates.
 * aggregate-create: Create a new aggregate with the specified details.
 * aggregate-delete: Delete the aggregate by its id.
 * aggregate-details: Show details of the specified aggregate.
 * aggregate-add-host: Add the host to the specified aggregate.
 * aggregate-remove-host: Remove the specified host from the specified aggregate.
 * aggregate-set-metadata [ ...]: Update the metadata associated with the aggregate.
 * aggregate-update []: Update the aggregate's name and optionally availability zone.

Code Changes
Changes are well confined, and can be seen at the following reviews:


 * Model - https://review.openstack.org/#change,3035
 * OSAPI exts - https://review.openstack.org/#change,3109
 * OSAPI exts - https://review.openstack.org/#change,3184
 * Aggregate API - https://review.openstack.org/#change,3149
 * Client mappings - https://review.openstack.org/#change,3144
 * Storage selection - https://review.openstack.org/#change,3380
 * Virt/xenapi implementation - https://review.openstack.org/#change,3761 and https://review.openstack.org/#change,4244
 * Host maintenance - https://review.openstack.org/#change,4023, https://review.openstack.org/#change,4322

For full details see blueprint page on Launchpad.

Migration
Because new abstractions are introduced into the Nova conceptual model, a DB schema migration may be required. The migration file can be seen at https://review.openstack.org/#change,3035.
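
For orientation only, the kind of schema such a migration might introduce can be sketched with SQLite. The table and column names below are assumptions derived from the aggregate details listed in the Design section (id, name, availability_zone, hosts and metadata); they are not taken from the migration file in the review above.

```python
import sqlite3

# Hypothetical schema: one row per aggregate, plus host and metadata tables.
SCHEMA = """
CREATE TABLE aggregates (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    availability_zone TEXT NOT NULL
);
CREATE TABLE aggregate_hosts (
    host TEXT PRIMARY KEY,  -- a host belongs to at most one aggregate
    aggregate_id INTEGER NOT NULL REFERENCES aggregates(id)
);
CREATE TABLE aggregate_metadata (
    aggregate_id INTEGER NOT NULL REFERENCES aggregates(id),
    meta_key TEXT NOT NULL,
    meta_value TEXT,
    PRIMARY KEY (aggregate_id, meta_key)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO aggregates (id, name, availability_zone) VALUES (?, ?, ?)",
    (1, "Aggregate_X", "az1"),
)
conn.execute(
    "INSERT INTO aggregate_hosts (host, aggregate_id) VALUES (?, ?)",
    ("host_1", 1),
)
```

Making `host` the primary key of the host table lets the database itself reject a second membership for the same host.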

Test/Demo Plan
Extensive unit test coverage has been added. Further tests will be added to exercise.sh available via devstack. More details to follow.

Unresolved issues
None.

BoF agenda and discussion
None.