Host-aggregates

Revision as of 09:54, 11 January 2012

Summary

This blueprint introduces the concept of aggregates into Nova. Host aggregates are different from zones and availability zones: while the former allow the partitioning of Nova deployments into logical groups for load balancing and instance distribution, the latter provide some form of physical isolation and redundancy from other availability zones (e.g. by using separate power supplies and network gear). Availability zones do not necessarily imply geographic distribution, whereas zones usually do. Host aggregates can be regarded as a mechanism to further partition an availability zone, i.e. to divide it into multiple groups of hosts that share common resources like storage and network.

Release Note

Support for host aggregates (i.e. clusters, groups or pools of hypervisor hosts) in Nova. The Aggregates concept is another level of scaling for Nova deployments, after zones and availability zones.

Rationale

Host aggregates enable a finer level of granularity in which to structure an entire OpenStack deployment. Aggregates also allow higher availability of a single guest instance within an availability zone, enable advanced VM placement strategies and, more importantly, enable zero-downtime host upgrades. Aggregates are exposed via the admin OSAPI; check the OSAPI documentation for more details.

User stories

Host aggregate creation:

  • Install and start nova-compute on host_1
  • Install and start nova-compute on host_2
  • Install and start nova-compute on host_N
  • Create an aggregate called Aggregate_X
  • Add host_1 to aggregate_X, host_1 joins the aggregate
  • Add host_2 to aggregate_X, host_2 joins the aggregate
  • Add host_N to aggregate_X, host_N joins the aggregate

PLEASE NOTE: the setup of an aggregate must happen before all the hosts joining it can be fully operational (i.e. allowed to serve requests like VM creation). To this end, nova-compute may need a new flag that tells the service to register with the Nova infrastructure but remain in disabled mode. The service status can then be switched to 'enabled' as soon as the host joins an aggregate.
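
The register-disabled-then-enable sequence above can be sketched as follows. All class and method names here are illustrative stand-ins, not the actual Nova interfaces:

```python
import uuid

# Minimal in-memory model of the workflow above; nothing here is
# the real Nova API, it just mirrors the steps in the user story.
class ComputeService:
    def __init__(self, host):
        self.host = host
        self.enabled = False  # registered but disabled until it joins an aggregate

class Aggregate:
    def __init__(self, name):
        self.id = str(uuid.uuid4())
        self.name = name
        self.hosts = []

    def add_host(self, service):
        self.hosts.append(service.host)
        service.enabled = True  # switch to 'enabled' once the host joins

# host_1..host_N register first, in disabled mode
services = [ComputeService("host_%d" % i) for i in (1, 2, 3)]
agg = Aggregate("Aggregate_X")
for svc in services:
    agg.add_host(svc)

assert all(svc.enabled for svc in services)
```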

Host maintenance:

  • Put a host in maintenance mode
  • VMs running on the host migrate to other members of the aggregate
  • Administrator intervenes on the host (e.g. adds extra memory)
  • Put host back into operational mode
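
The maintenance cycle above can be sketched as an evacuate-then-restore loop. This is an illustrative model only (the migration strategy, round-robin here, is an assumption):

```python
# Sketch of the maintenance cycle: evacuate the host's VMs to other
# aggregate members, then bring the host back into operation.
class Host:
    def __init__(self, name):
        self.name = name
        self.maintenance = False
        self.vms = []

def enter_maintenance(host, aggregate_hosts):
    host.maintenance = True
    targets = [h for h in aggregate_hosts if h is not host and not h.maintenance]
    # Live-migrate each VM to a peer in the same aggregate (round-robin).
    for i, vm in enumerate(list(host.vms)):
        targets[i % len(targets)].vms.append(vm)
        host.vms.remove(vm)

def exit_maintenance(host):
    host.maintenance = False  # host serves requests again

a, b, c = Host("host_1"), Host("host_2"), Host("host_3")
a.vms = ["vm-1", "vm-2"]
enter_maintenance(a, [a, b, c])  # vm-1 and vm-2 move to host_2/host_3
exit_maintenance(a)
```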

Assumptions

For further details, see implementation section below.

  • In the first release of this feature, things like shared storage and networking are manually set up and configured.
  • There might be changes required on the specific virt layer support.
    • For instance, in the xenapi case it may be necessary to ensure that instances can be explicitly streamed to the local SR or to another storage. At the moment instances get streamed to the default SR, which is typically the local SR. This is fine for non-HA instances; however, HA-enabled instances need some form of shared storage. In a nutshell, some coordination is expected with whoever is in charge of https://blueprints.launchpad.net/nova/+spec/guest-ha.
    • With the introduction of hypervisor pools (e.g. XenServer pools or ESX resource pools), live migration comes as a bonus; this means that work will have to be coordinated with whoever is in charge of https://blueprints.launchpad.net/nova/+spec/xenapi-live-migration.
    • Effort is to be coordinated with the wider community to get this working primarily for both XenServer and KVM (ESXi may be in doubt). The primary focus for Essex is to close the gap between the two hypervisors, rather than widening it.

Design

  • Nova has zones and availability zones; the relationship is N-M.
  • Aggregates introduce a sub-level of grouping. The relationship between availability zones and aggregates is 1-M (i.e. 1 availability zone contains 0 or more aggregates, and an aggregate can belong to one and only one availability zone).
  • Aggregates must be visible to admin only.
  • An aggregate is an abstract concept, i.e. it does not necessarily translate to a resource pool in hypervisor-land. The implication of this statement is that if it is decided to have, for instance, XenServer/VMware resource pools map to an aggregate in host-aggregates-v1, this might no longer be the case in host-aggregates-v2.

API Design

The workflows will make use of the following existing Host operations:

  • Update: disable hosts (to hide from scheduler) and enable hosts

There will be a new API relating to "os-host-aggregates".

It will have the following collection actions on os-host-aggregates:

  • list: returns a list of all the host-aggregates (optionally filtered by availability zone)
  • create: creates an aggregate, takes a friendly name, returns the GUID

A host-aggregate will contain:

  • GUID: id (auto-generated on create)
  • Name: a friendly name
  • AvailabilityZone: where all the hosts are (set by the first host that is added)
  • Hosts: a list of the host/(? service) ids
  • Metadata: arbitrary key/value pairs
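
One possible shape for the resource described above, as a Python sketch. The field names come from the list; the types and defaults are assumptions:

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

# Illustrative representation of an os-host-aggregates resource;
# not the actual Nova data model.
@dataclass
class HostAggregate:
    name: str                                     # friendly name
    id: str = field(default_factory=lambda: str(uuid.uuid4()))  # GUID, auto-generated on create
    availability_zone: Optional[str] = None       # set by the first host added
    hosts: list = field(default_factory=list)     # host/(? service) ids
    metadata: dict = field(default_factory=dict)  # arbitrary key/value pairs

# "create" takes a friendly name and yields the GUID
agg = HostAggregate(name="Aggregate_X")
assert agg.availability_zone is None and agg.hosts == []
```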

It will have the following operations on aggregates:

  • update: changes an arbitrary key/value pair (similar to current hosts update)
  • delete: if the aggregate is empty, you can delete it
  • addHost: add a host, takes the host id (only a compute service, only if already disabled, with no VMs running, and in the correct availability zone)
  • removeHost: remove a host (only if already disabled, and no VMs on that server)
  • NOTE: the "no VM" restrictions above, can only be implemented once (live) migration works on that particular virt layer
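
The addHost constraints listed above can be sketched as validation logic. This is an assumption-laden illustration (dict-based hosts, hypothetical field names), not the actual implementation:

```python
# Illustrative enforcement of the addHost rules: compute-only,
# already disabled, no running VMs, matching availability zone.
class AggregateError(Exception):
    pass

def add_host(aggregate, host):
    if host["service"] != "compute":
        raise AggregateError("only compute services can join an aggregate")
    if host["enabled"]:
        raise AggregateError("disable the host before adding it")
    if host["vms"]:
        raise AggregateError("host must have no VMs running")
    az = aggregate.get("availability_zone")
    if az is not None and host["zone"] != az:
        raise AggregateError("host is in a different availability zone")
    # The first host added sets the aggregate's availability zone.
    aggregate.setdefault("availability_zone", host["zone"])
    aggregate["hosts"].append(host["id"])

agg = {"hosts": []}
add_host(agg, {"id": "host_1", "service": "compute",
               "enabled": False, "vms": [], "zone": "az-1"})
assert agg["availability_zone"] == "az-1"
```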

Details:

  • the first host to be added into an aggregate sets the availability zone
  • may need to consider what it means for other services (like volume) to join a host aggregate
  • the last host/service to be removed from an aggregate clears the availability zone
  • in XenServer/XCP the first host added will be treated as the pool master; additional nodes will be added to the existing pool; metadata on the aggregate will be updated should a master election occur due to the failure of the first master

Implementation

Implementation – Stage 1a:

The operations below are model operations, so there is no pollution due to hypervisor details. This means that most of the work to be done in Nova is related to adding admin extensions to the OSAPI, extending the data model, and providing sqlalchemy migration snippets:

  • Basic API operations on aggregates
  • Basic operations on single aggregates

A new flag, "DISABLE_COMPUTE_ON_FIRST_REGISTRATION", will ease the case in which new hosts are added to an existing cloud: you may want to add them to a host aggregate and configure their shared storage later, before enabling users' VMs to be placed on those hosts.
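
A sketch of how such a flag could gate a newly registered compute service. The flag name comes from the text above; everything else (the registry shape, the function name) is illustrative:

```python
# Hypothetical handling of the registration flag: new hosts start
# disabled so the scheduler ignores them until shared storage is
# configured and they join an aggregate.
DISABLE_COMPUTE_ON_FIRST_REGISTRATION = True

def register_service(registry, host):
    registry[host] = {"enabled": not DISABLE_COMPUTE_ON_FIRST_REGISTRATION}

registry = {}
register_service(registry, "host_1")
assert registry["host_1"]["enabled"] is False
```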

Implementation – Stage 1b:

  • On XenServer, add the host into the pool
    • On the first enable call the host in question becomes the master
    • subsequent calls add hosts as slaves
    • update loop may need to update which node is the master (in case of a new master election due to a failure)
    • remove host: removes the host from the pool
  • Look at how to "evacuate" a host before removing from the pool
    • Put/Resume host in/from maintenance: this triggers a lot of intervention on the platform. This means that guest instances on the host being put in maintenance need to move to other hosts in the aggregate (and back once the maintenance window has been closed). This move has ramifications on instance networking, the controllability of the instances, etc. There is extensive work to be done in this area to ensure that instances can still be reached and controlled by the orchestration layer.
  • Write CI tests for the lot.
  • Write user documentation to get this up and running.

In Stage 2 of the implementation the following aspects need to be looked at:

  • In the XenServer case, certain properties may need to translate to changes in the XenServer pool configuration (e.g. shared storage, etc.).
  • Scheduling layer needs to be aggregate-aware (e.g. to provide headroom in face of maintenance windows).

UI Changes

Extensions may be provided to Horizon to support Admin tasks, like aggregates creation and management.

Code Changes

To be detailed when the implementation starts.

Migration

Because new abstractions are introduced into the Nova conceptual model, DB schema changes may be required. These changes will be documented here when the implementation commences.

Test/Demo Plan

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.