
Blueprint-nova-compute-cells

Old Design Page

This page was used to help design a feature that has been implemented. As a result, this page is unlikely to be updated and could contain outdated information. It was last updated on 2014-12-10

  • Launchpad Entry: NovaSpec:nova-compute-cells
  • Created: Chris Behrens
  • Contributors: Chris Behrens, Brian Elliott, Dragon, Alex Meade, Brian Lamar, Matt Sherborne, Sam Morrison

Summary

This blueprint introduces the new nova-cells service.

The aims of the service are:

  • to allow additional scaling and (geographic) distribution without complicated database or message queue clustering
  • to separate cell scheduling from host scheduling

Release Note

Rationale

Terminology and History

Readers familiar with Amazon EC2, for example, will understand that:

  • A Geographical Region has multiple Availability zones
  • Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. (AWS term)
  • e.g. EU-West-1 contains EU-West-1a, EU-West-1b, and EU-West-1c.
  • Clients connect to the Region (i.e. the EC2 endpoint)
  • When asking for a VM, if an Availability Zone is not specified, the scheduler will choose which Availability Zone in the Region to use
  • Alternately, if the user specifies an Availability Zone, the VM will start in that Availability Zone

Support for "Zones" has been present since early versions of OpenStack. The Bexar release implemented `Availability Zones` for an instance, based on Amazon terminology.

Zone

Later, the concept of a Nova `Zone` came up:

  • A stand-alone Nova deployment was called a Zone.
  • A Zone allowed you to partition your deployment into logical groups for load balancing and instance distribution. At a minimum, a Zone required an API node, a Scheduler node, a database, and RabbitMQ. Zones shared nothing: no database, queue, user, or project definition was shared between Zones. (OpenStack term)

Inter-Zone communication was considered untrusted and was done using only the public OpenStack API. In Diablo, with the addition of Keystone, Zones were broken beyond usability, and in Essex they were removed entirely.

Introducing Cells

At the Folsom Design Summit, following some discussion on the mailing list, Chris Behrens proposed 'Folsom Compute Cells' as a new design to replace the old 'Zone' concept. In contrast to Zones, Cell-to-Cell communication is trusted and goes over the AMQP bus.

Design

The service implementation is based on:

  • A separate database and message broker per cell
  • Inter-cell communication via a pluggable driver (RPC is the only driver currently available)
  • A tree structure (sketched below), with
    • the nova-api service in the 'top' API cell only, not in children
    • support for multiple parent cells
  • A cell scheduling database built from information pushed up by child cells,
    • based on periodic broadcasts of capabilities and capacities
    • and on database updates (instance update/destroy/fault_create)
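
As an illustrative sketch (the cell names here are just examples), a deployment with one API cell and two child cells forms a tree like this:

            +----------+
            | api cell |  nova-api, nova-cells
            +----------+
              /      \
     +---------+    +---------+
     |  cell1  |    |  cell2  |  nova-cells, nova-scheduler,
     +---------+    +---------+  nova-network, nova-compute

Each cell runs its own database and AMQP broker; the only inter-cell traffic is the nova-cells communication between parent and child.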

Services per cell

An API cell contains:

  • AMQP Broker
  • Database
  • nova-cells
  • nova-api

A child cell contains:

  • AMQP Broker
  • Database
  • nova-cells
  • nova-scheduler
  • nova-network
  • nova-compute

Global services:

  • Glance
  • Keystone

Cell routing

TBD

Configuration

New configuration options for Cells are added within their own config group, called 'cells'. Create a [cells] section in nova.conf to set them.

Options:

  • `enable` # Enables the cells code
  • `name` # A short name for the current cell; think of it as an unqualified hostname, e.g. 'api'
  • `capabilities` # Arbitrary key/value pairs to advertise to neighboring cells (unused in the first implementation; see the example below)
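
As a hedged illustration (the key/value pairs below are made up, and the exact syntax should be checked against the cells code), `capabilities` takes a comma-separated list of key=value entries, with multiple values for a key separated by semicolons:

[cells]
capabilities=hypervisor=xenserver;kvm,os=linux;windows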

Additionally, you'll need to configure other options in the [DEFAULT] section, such as `compute_api_class` and `quota_driver`.

Example API cell config:

[DEFAULT]
# Swap out the compute_api class so actions are proxied to nova-cells service.
compute_api_class=nova.compute.cells_api.ComputeCellsAPI

[cells]
name=api
enable=true


Example Child cell config:

[DEFAULT]
# Disable quota checking in child cells.  Let API cell do it exclusively.
quota_driver=nova.quota.NoopQuotaDriver

[cells]
enable=true
# Use a name unique to each child cell.
name=cell1


Before bringing services online, you'll want to tell each cell about the others. The API cell needs to know about its immediate children, and the child cells need to know about their immediate parents. The information needed is the RabbitMQ broker credentials for each cell, which can be added via nova-manage in each cell.

nova-manage cell create usage:


> bin/nova-manage cell create -h
Usage: nova-manage cell create <args> [options]

Options:
  -h, --help            show this help message and exit
  --name=<name>         Name for the new cell
  --cell_type=<parent|child>
                        Whether the cell is a parent or child
  --username=<username>
                        Username for the message broker in this cell
  --password=<password>
                        Password for the message broker in this cell
  --hostname=<hostname>
                        Address of the message broker in this cell
  --port=<number>       Port number of the message broker in this cell
  --virtual_host=<virtual_host>
                        The virtual host of the message broker in this cell
  --woffset=<float>     
  --wscale=<float>
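
(The help text for --woffset and --wscale is empty; they appear to set the weight offset and weight scale recorded for the cell, values the cell scheduler can use when weighing cells.)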


Let's assume we have an API cell named 'api' and two child cells, 'cell1' and 'cell2'. Within the API cell, we have the following rabbit server info:

rabbit_host=10.0.0.10 rabbit_port=5672 rabbit_username=api_user rabbit_password=api_passwd rabbit_virtual_host=api_vhost

And in the child cell named 'cell1' we have the following rabbit server info:

rabbit_host=10.0.1.10 rabbit_port=5673 rabbit_username=cell1_user rabbit_password=cell1_passwd rabbit_virtual_host=cell1_vhost

And in the child cell named 'cell2' we have the following rabbit server info:

rabbit_host=10.0.2.10 rabbit_port=5673 rabbit_username=cell2_user rabbit_password=cell2_passwd rabbit_virtual_host=cell2_vhost

We would run these in the API cell to tell it about its children:


> nova-manage cell create --name=cell1 --cell_type=child --username=cell1_user --password=cell1_passwd --hostname=10.0.1.10 --port=5673 --virtual_host=cell1_vhost --woffset=1.0 --wscale=1.0
> nova-manage cell create --name=cell2 --cell_type=child --username=cell2_user --password=cell2_passwd --hostname=10.0.2.10 --port=5673 --virtual_host=cell2_vhost --woffset=1.0 --wscale=1.0


In both child cells, we would run this to tell them about their parent:


> nova-manage cell create --name=api --cell_type=parent --username=api_user --password=api_passwd --hostname=10.0.0.10 --port=5672 --virtual_host=api_vhost --woffset=1.0 --wscale=1.0


References