
Blueprint-spare-hosts

Specification for the spare-hosts blueprint

Use cases

As an OpenStack administrator, when I set up a new cell, I need to mark 10 or so hosts as spares before enabling the cell or OpenStack install.

This is primarily aimed at cells and OpenStack installs where all hosts are the same size and the scheduling model is to spread VMs out as much as possible.

A spare host will be guaranteed not to have any VMs on it.

* When another host goes down, I can chassis swap it with a spare host.
* If a surprise VM-building spree happens and we run out of space in the huddle, we can enable some spare hosts while new ones are being ordered and burnt in.
* If a RAID array shows signs of imminent failure and there's not much room in the huddle, we can enable a spare and evacuate to it.

Possible later additions

* A warning threshold: if the number of free machines is less than double the number of spares, a warning is emitted via the notifications system each time a host is activated.
* Queueing of migrations: if a migration is waiting on a spare host to be enabled, it can be put in a new state, 'waiting for host availability', rather than failing.
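The proposed warning threshold is a single comparison. A minimal sketch, assuming the caller already knows how many hosts are free and how many are reserved as spares; the function name, the `notify` callback and the 'compute.spares.low' event type are hypothetical, not existing Nova names:

```python
from typing import Callable, Dict

def check_spare_threshold(free_hosts: int, spare_hosts: int,
                          notify: Callable[[str, Dict[str, int]], None]) -> bool:
    """Warn when free hosts drop below twice the number of spares.

    Meant to be called each time a host is activated. The event type
    'compute.spares.low' is a hypothetical name used for illustration.
    """
    if free_hosts < 2 * spare_hosts:
        notify("compute.spares.low",
               {"free_hosts": free_hosts, "spare_hosts": spare_hosts})
        return True
    return False
```

With 10 spares reserved, a warning is emitted on every activation once fewer than 20 free hosts remain.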

Plan of action

Scheduler

1. When the scheduler chooses a destination host, the list of hosts to be filtered will already have spares removed. We chose this approach instead of writing a new filter because the filtering code is already very slow.
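The scheduler change amounts to dropping spares from the candidate list before any filter runs. A minimal in-memory sketch: `HostState` here is a simplified stand-in rather than Nova's class, and filters are modelled as plain predicates. In the plan above, the exclusion would actually happen in the database query, so the filter chain never receives spares at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class HostState:
    # Simplified stand-in for the scheduler's per-host view (assumed fields).
    host_name: str
    free_ram_mb: int
    is_spare: bool = False

HostFilter = Callable[[HostState], bool]

def filtered_hosts(all_hosts: Sequence[HostState],
                   filters: Sequence[HostFilter]) -> List[HostState]:
    # Step 1: drop spares with a cheap flag check, so the slow filter
    # chain never evaluates them.
    candidates = [h for h in all_hosts if not h.is_spare]
    # Step 2: run the normal filters only on the remaining hosts.
    return [h for h in candidates if all(f(h) for f in filters)]
```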

DB

1. We'll add an 'is_spare' column to the ComputeNode model, with an index to make filtering fast.
2. In the DB API we'll add an optional parameter to the compute_node_get_all method to either filter out spare hosts or return only spare hosts.
3. We'll add a DB API method to list the migrations heading towards or in progress on the host. This is needed so we can guarantee that when a host is set to spare, it has zero VMs on it.
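The three DB changes can be sketched against an in-memory SQLite database. This is a simplified schema, not Nova's: the `host_name` column, the `spare` parameter name and `migration_get_active_by_host` are hypothetical, the migrations columns are loosely modelled on Nova's `source_compute`/`dest_compute`/`status`, and the set of "done" statuses is an assumption.

```python
import sqlite3
from typing import List, Optional

def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE compute_nodes (
            id INTEGER PRIMARY KEY,
            host_name TEXT NOT NULL,
            is_spare INTEGER NOT NULL DEFAULT 0  -- new column (0/1)
        );
        -- Index so 'is_spare = ?' lookups stay fast on large installs.
        CREATE INDEX compute_nodes_is_spare_idx ON compute_nodes (is_spare);
        CREATE TABLE migrations (
            id INTEGER PRIMARY KEY,
            source_compute TEXT,
            dest_compute TEXT,
            status TEXT NOT NULL
        );
    """)

def compute_node_get_all(conn: sqlite3.Connection,
                         spare: Optional[bool] = None) -> List[str]:
    """spare=None returns every host, True only spares, False excludes spares."""
    if spare is None:
        rows = conn.execute("SELECT host_name FROM compute_nodes ORDER BY host_name")
    else:
        rows = conn.execute(
            "SELECT host_name FROM compute_nodes WHERE is_spare = ? ORDER BY host_name",
            (int(spare),))
    return [row[0] for row in rows]

# Assumed terminal states; any other status counts as still in progress.
DONE_STATUSES = ("finished", "confirmed", "reverted", "error")

def migration_get_active_by_host(conn: sqlite3.Connection, host_name: str) -> List[int]:
    """IDs of unfinished migrations heading to or leaving host_name (hypothetical)."""
    placeholders = ", ".join("?" for _ in DONE_STATUSES)
    rows = conn.execute(
        "SELECT id FROM migrations "
        "WHERE (source_compute = ? OR dest_compute = ?) "
        f"AND status NOT IN ({placeholders}) ORDER BY id",
        (host_name, host_name, *DONE_STATUSES))
    return [row[0] for row in rows]
```

A host is only safe to mark as spare when `migration_get_active_by_host` returns an empty list and it has no running instances.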

OpenStack API extension

We'll add an API extension with the following methods:

* list_spares(cell=None) - returns a list of the host_names of spare machines. The optional 'cell' argument restricts the output to that single cell.
* reserve_spare(host_name=None, count=1, cell=None) - sets the is_spare flag and disables the host. The host must have zero running VMs and no migrations heading towards it.
   * If host_name is None, it marks as spare the eligible host with the most available RAM.
   * If host_name is None, count makes it mark as spare the 'count' largest eligible hosts it finds (measured by available RAM).
   * If cell is passed and cells are enabled, the operation is restricted to that cell.
   * Returns a list of the host_names that were set to spare.
* release_spare(host_name=None, count=1, cell=None) - removes the is_spare flag and enables the host.
   * host_name - restrict to one specific host
   * count - if host_name is None, release the 'count' smallest spares (by RAM)
   * cell - restrict to one cell
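The selection rules for these three calls can be sketched in memory. This is a minimal illustration, not the extension itself: the `Host` record, the `SpareManager` class and the choice to raise `ValueError` for an ineligible named host are all assumptions, and a real implementation would go through the DB API and Nova's existing way of disabling hosts.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class Host:
    # Simplified, hypothetical host record; not Nova's ComputeNode.
    host_name: str
    cell: str
    free_ram_mb: int
    running_vms: int = 0
    pending_migrations: int = 0
    is_spare: bool = False
    disabled: bool = False

class SpareManager:
    """In-memory sketch of the proposed list/reserve/release semantics."""

    def __init__(self, hosts: Iterable[Host]) -> None:
        self.hosts = {h.host_name: h for h in hosts}

    def _in_cell(self, cell: Optional[str]) -> List[Host]:
        return [h for h in self.hosts.values() if cell is None or h.cell == cell]

    @staticmethod
    def _eligible(h: Host) -> bool:
        # A host may only become a spare when it is completely empty.
        return not h.is_spare and h.running_vms == 0 and h.pending_migrations == 0

    def list_spares(self, cell: Optional[str] = None) -> List[str]:
        return sorted(h.host_name for h in self._in_cell(cell) if h.is_spare)

    def reserve_spare(self, host_name: Optional[str] = None, count: int = 1,
                      cell: Optional[str] = None) -> List[str]:
        if host_name is not None:
            h = self.hosts.get(host_name)
            # Assumption: naming an ineligible host is an error, not a no-op.
            if h is None or not self._eligible(h) or (cell is not None and h.cell != cell):
                raise ValueError(f"{host_name} cannot be made a spare")
            chosen = [h]
        else:
            # Biggest eligible hosts first, measured by available RAM.
            chosen = sorted((h for h in self._in_cell(cell) if self._eligible(h)),
                            key=lambda h: h.free_ram_mb, reverse=True)[:count]
        for h in chosen:
            h.is_spare, h.disabled = True, True
        return [h.host_name for h in chosen]

    def release_spare(self, host_name: Optional[str] = None, count: int = 1,
                      cell: Optional[str] = None) -> List[str]:
        spares = [h for h in self._in_cell(cell) if h.is_spare]
        if host_name is not None:
            chosen = [h for h in spares if h.host_name == host_name]
        else:
            # Smallest spares first, which keeps the largest hosts in reserve.
            chosen = sorted(spares, key=lambda h: h.free_ram_mb)[:count]
        for h in chosen:
            h.is_spare, h.disabled = False, False
        return [h.host_name for h in chosen]
```

One point the sketch glosses over: in a real deployment the eligibility check and the flag update in reserve_spare would need to be atomic (for example a single conditional UPDATE), otherwise the scheduler could place a VM on the host between the check and the write.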