Host-aggregates

  • Launchpad Entry: NovaSpec:host-aggregates
  • Created: 10 October 2011
  • Contributors: Armando Migliaccio (https://launchpad.net/~armando-migliaccio), John Garbutt (https://launchpad.net/~johngarbutt)

Revision as of 18:00, 10 January 2012

Summary

This blueprint introduces the concept of an aggregate into Nova. Host aggregates are different from zones and availability zones: the former partition Nova deployments into logical groups for load balancing and instance distribution, while the latter provide some form of physical isolation and redundancy from other availability zones (e.g. by using a separate power supply and network gear). Availability zones do not necessarily imply geographic distribution, whereas zones usually do. Host aggregates can be regarded as a mechanism for further partitioning an availability zone, i.e. into multiple groups of hosts that share common resources like storage and network.

Release Note

Support for host aggregates (i.e. clusters, groups or pools of hypervisor hosts) in Nova. The aggregate concept adds another level of scaling for Nova deployments, after zones and availability zones.

Rationale

Host aggregates enable a finer level of granularity in which to structure an entire OpenStack deployment. Aggregates also allow higher availability of a single guest instance within an availability zone, enable advanced VM placement strategies and, more importantly, enable zero-downtime host upgrades. Aggregates are exposed via the Admin OSAPI; check the OSAPI documentation for more details.

User stories

Creating a XenServer pool for live migration:

  • Set DISABLE_COMPUTE_ON_FIRST_REGISTRATION=true
  • Start up the host1 and host2 compute hosts using XenServer
  • Create an aggregate called Pool1
  • Add host1 to the aggregate
    • host1's pool is renamed to "Pool1"
  • Add host2 to the aggregate
    • host2 is added to host1's pool
  • Configure the hosts' shared storage (once that feature is implemented; out of scope here)
  • Enable the hosts
  • Start VMs on those hosts
  • Take down a host for maintenance by live-migrating all its VMs to the other host

Restricting Live Migration:

  • The aggregate can also be used with KVM as a grouping
  • All the hosts in the group have the same shared storage
  • Change live migration so it can only happen between hosts in the same aggregate
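
A scheduler-side guard for this restriction could look roughly like the following; the aggregate structure and function names here are hypothetical illustrations, not existing Nova code:

```python
# Sketch: allow live migration only between hosts of the same aggregate,
# since hosts in one aggregate are assumed to share storage.
# Data structures and names are illustrative, not Nova's real model.

def find_aggregate(host, aggregates):
    """Return the aggregate containing `host`, or None."""
    for agg in aggregates:
        if host in agg["hosts"]:
            return agg
    return None

def can_live_migrate(src_host, dst_host, aggregates):
    """Permit live migration only when both hosts sit in one aggregate."""
    src_agg = find_aggregate(src_host, aggregates)
    return src_agg is not None and dst_host in src_agg["hosts"]

aggregates = [
    {"id": "agg-1", "name": "Pool1", "hosts": ["host1", "host2"]},
    {"id": "agg-2", "name": "Pool2", "hosts": ["host3"]},
]
```

With the sample data above, migration from host1 to host2 would be allowed, while host1 to host3 would be rejected.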

Assumptions

For further details, see implementation section below.

  • In the first release of this feature, things like shared storage and networking are manually set up and configured.
  • There might be changes required in the specific virt-layer support.
    • For instance, in the xenapi case it may be necessary to ensure that instances can be explicitly streamed to the local SR or to another storage backend. At the moment instances get streamed to the default SR, which is typically the local SR. This is fine for non-HA instances; however, HA-enabled instances need some form of shared storage. In a nutshell, some coordination is expected with whoever is in charge of https://blueprints.launchpad.net/nova/+spec/guest-ha.
    • With the introduction of hypervisor pools (e.g. XenServer pools or ESX resource pools), live migration becomes a bonus; this means there will be work to coordinate with whoever is in charge of https://blueprints.launchpad.net/nova/+spec/xenapi-live-migration.
    • Effort is to be coordinated with the wider community to get this working primarily for both XenServer and KVM (ESXi may be in doubt). The primary focus for Essex is to close the gap between the two hypervisors rather than widen it.

Design

  • Nova has zones and availability zones; the relationship is N-M.
  • Aggregates introduce a sub-level of grouping. The relationship between availability zones and aggregates is 1-M (i.e. one availability zone contains 0 or more aggregates, and an aggregate can belong to one and only one availability zone).
  • Aggregates must be visible to admins only.
  • An aggregate is an abstract concept, i.e. it does not necessarily translate to a resource pool in hypervisor-land. The implication is that even if it is decided to map, for instance, XenServer/VMWare resource pools to an aggregate in host-aggregates-v1, this might no longer be the case in host-aggregates-v2.

API Design

The workflows will make use of the following existing Host operations:

  • Update: disable hosts (to hide from scheduler) and enable hosts

There will be a new API relating to "os-host-aggregates".

It will have the following collection actions on os-host-aggregates:

  • list: returns a list of all the host-aggregates (optionally filtered by availability zone)
  • create: creates an aggregate, takes a friendly name, returns the GUID

A host-aggregate will contain:

  • GUID: id (auto-generated on create)
  • Name: a friendly name
  • AvailabilityZone: where all the hosts are (set by the first host that is added)
  • Hosts: a list of the host/(? service) ids
  • Metadata: arbitrary key/value pairs
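
For illustration, a single host-aggregate carrying the fields above might look like this (all values are made up; the exact wire format is not defined here):

```python
# Hypothetical representation of one host-aggregate record, matching the
# fields listed above. Every value is invented for illustration only.
example_aggregate = {
    "id": "2f1d8a6e-9c3b-4f7a-b1d0-5e6c7a8b9c0d",  # GUID, auto-generated on create
    "name": "Pool1",                                # friendly name
    "availability_zone": "az-1",                    # set by the first host added
    "hosts": ["host1", "host2"],                    # compute host/service ids
    "metadata": {                                   # arbitrary key/value pairs,
        "master": "host1",                          # e.g. current pool master
        "shared_storage": "nfs",                    # e.g. storage configuration
    },
}
```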

It will have the following operations on aggregates:

  • update: changes an arbitrary key/value pair (similar to the current hosts update)
  • delete: deletes the aggregate, but only if it is empty
  • addHost: adds a host, takes the host id (only a compute service, only if already disabled, with no VMs running, and in the correct availability zone)
  • removeHost: removes a host (only if already disabled and there are no VMs on that server)
  • NOTE: the "no VMs" restrictions above can only be enforced once (live) migration works on that particular virt layer

Details:

  • the first host to be added to an aggregate sets the availability zone
  • we may need to consider what it means for other services (like volume) to join a host aggregate
  • the last host/service to be removed from an aggregate clears the availability zone
  • in XenServer/XCP the first host added will be treated as the pool master; additional nodes will be added to the existing pool, and metadata on the aggregate will be updated should a master election occur due to the failure of the first master
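
The add/remove semantics above (the guards on addHost/removeHost plus the availability-zone lifecycle) can be sketched in a few lines; the class and method names are illustrative, not the actual Nova model:

```python
# Sketch of the semantics described above: the first host added sets the
# availability zone, the last host removed clears it, and the guards from
# the API design are enforced. All names here are illustrative.

class AggregateError(Exception):
    pass

class HostAggregate:
    def __init__(self, name):
        self.name = name
        self.availability_zone = None
        self.hosts = []
        self.metadata = {}

    def add_host(self, host_id, zone, disabled, running_vms):
        # Guards: host must be disabled, have no VMs, and match the zone.
        if not disabled or running_vms:
            raise AggregateError("host must be disabled and have no VMs")
        if self.availability_zone is None:
            self.availability_zone = zone      # first host sets the zone
        elif zone != self.availability_zone:
            raise AggregateError("host is in the wrong availability zone")
        if not self.hosts:
            self.metadata["master"] = host_id  # XenServer: first host is master
        self.hosts.append(host_id)

    def remove_host(self, host_id, disabled, running_vms):
        if not disabled or running_vms:
            raise AggregateError("host must be disabled and have no VMs")
        self.hosts.remove(host_id)
        if not self.hosts:
            self.availability_zone = None      # last host clears the zone
```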

Implementation

Implementation – Stage 1a:

These are model operations, so there is no pollution from hypervisor details. This means that most of the work to be done in Nova relates to adding admin extensions to OSAPI, extending the data model, and providing sqlalchemy migration snippets:

  • Basic API operations on aggregates
  • Basic operations on single aggregates

A new flag, DISABLE_COMPUTE_ON_FIRST_REGISTRATION, will be added to ease the case where you add new hosts to an existing cloud that you may want to later add to a host aggregate and configure their shared storage, before allowing users' VMs to be placed on those hosts.
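
A minimal sketch of how the proposed flag might behave at service registration time follows; the function and record layout are assumptions, not the actual Nova service code:

```python
# Sketch: with the proposed DISABLE_COMPUTE_ON_FIRST_REGISTRATION flag set,
# a compute service registering for the first time starts disabled, so the
# scheduler ignores it until the admin has added it to an aggregate and
# configured shared storage. Names and structures are illustrative.

DISABLE_COMPUTE_ON_FIRST_REGISTRATION = True

def register_compute_service(host_id, known_hosts):
    """Return the initial service record for `host_id`.

    `known_hosts` is the set of hosts that have registered before.
    """
    first_time = host_id not in known_hosts
    disabled = DISABLE_COMPUTE_ON_FIRST_REGISTRATION and first_time
    known_hosts.add(host_id)
    return {"host": host_id, "topic": "compute", "disabled": disabled}
```

A re-registration of an already-known host would leave its enabled/disabled state alone, which is why only the first registration is affected in this sketch.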

Implementation – Stage 1b:

  • On XenServer, add the host into the pool
    • On the first enable call, the host in question becomes the master
    • Subsequent calls add hosts as slaves
    • The update loop may need to refresh which node is the master (in case of a new master election due to a failure)
    • Remove host: this removes the host from the pool
  • Look at how to "evacuate" a host before removing it from the pool
    • Putting a host into (or resuming it from) maintenance triggers a lot of intervention on the platform: guest instances on the host being put into maintenance need to move to other hosts in the aggregate (and back once the maintenance window has closed). This move has ramifications for instance networking, the controllability of the instances, etc. There is extensive work to be done in this area to ensure that instances can still be reached and controlled by the orchestration layer.
  • Write CI tests for the lot.
  • Write user documentation to get this up and running.
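
The "evacuate" step could be prototyped as follows; the `migrate` callable stands in for whatever virt-layer live migration ends up looking like, and every name here is illustrative:

```python
# Sketch of host "evacuation" before removal from a pool: live-migrate each
# instance to another host in the same aggregate. `migrate` is a stand-in
# for the virt-layer live migration call; names are illustrative.

def evacuate_host(host, aggregate_hosts, instances_by_host, migrate):
    """Move every instance off `host` to other hosts in the aggregate."""
    targets = [h for h in aggregate_hosts if h != host]
    if not targets:
        raise RuntimeError("no other host in the aggregate to migrate to")
    moved = []
    for i, instance in enumerate(list(instances_by_host.get(host, []))):
        dst = targets[i % len(targets)]  # naive round-robin placement
        migrate(instance, host, dst)
        instances_by_host.setdefault(dst, []).append(instance)
        moved.append((instance, dst))
    instances_by_host[host] = []
    return moved
```

A real implementation would of course need to handle migration failures, capacity checks on the target hosts, and moving the instances back after the maintenance window, none of which is modelled here.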

In Stage 2 of the implementation the following aspects need to be looked at:

  • In the XenServer case, certain properties may need to translate into changes to the XenServer pool configuration (e.g. shared storage etc.).
  • Scheduling layer needs to be aggregate-aware (e.g. to provide headroom in face of maintenance windows).

UI Changes

Extensions may be provided to Horizon to support admin tasks, like aggregate creation and management.

Code Changes

To be detailed when the implementation starts.

Migration

Because of the introduction of new abstractions into the Nova conceptual model, DB schema changes may be required. These changes will be documented here when the implementation commences.

Test/Demo Plan

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, and not problems with the specification itself; since any specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.