Blueprint-nova-compute-cells

Revision as of 21:57, 2 November 2012 by ChrisBehrens (talk)
  • Launchpad Entry: NovaSpec:nova-compute-cells
  • Created: Chris Behrens
  • Contributors: Chris Behrens, Brian Elliott, Dragon, Alex Meade, Brian Lamar, Matt Sherborne, Sam Morrison

Summary

This blueprint introduces the new nova-cells service.

The aims of the service are:

  • to allow additional scaling and (geographic) distribution without complicated database or message queue clustering
  • to separate cell scheduling from host scheduling

Release Note

Rationale

Terminology and History

Readers familiar with Amazon EC2 will understand that:

  • A Geographical Region has multiple Availability zones
  • Availability Zones are distinct locations that are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region. (AWS term)
  • e.g. EU-West-1 has EU-West-1a, EU-West-1b and EU-West-1c.
  • The client connects to the Region (i.e. the EC2 endpoint)
  • When asking for a VM, if an Availability Zone is not specified, the scheduler will choose which Availability Zone in the Region to use
  • Alternately, if the user specifies an Availability Zone, the VM will start in that Availability Zone

Support for "Zones" has been present since early versions of OpenStack. The Bexar release implemented `Availability Zones` for an instance, based on Amazon terminology.

Later, the concept of a Nova `Zone` came up:

  • A stand-alone Nova deployment was called a Zone.
  • A Zone allowed you to partition your deployments into logical groups for load balancing and instance distribution. At the very least, a Zone required an API node, a Scheduler node, a database and RabbitMQ. Zones shared nothing: no database, queue, user or project definition was shared between Zones. (OpenStack term)

Inter-Zone communication was considered untrusted and communications between Zones would be done using only the public OpenStack API. In Diablo, with the addition of Keystone, Zones were broken beyond usability, and in Essex they were removed entirely.

Introducing Cells

At the Folsom Design Summit, following some discussion on the mailing list, Chris Behrens proposed 'Folsom Compute Cells' as a new design to replace the old 'Zone' concept. In contrast to Zones, cell-to-cell communication is trusted and goes via the AMQP bus.

Design

The service implementation is based on:

  • A separate database and message broker per cell
  • Inter-cell communication via a pluggable driver (RPC is currently the only available driver)
  • A tree structure, with
    • a nova-api server in the 'top' cell only, not in child cells
    • support for multiple parent cells
  • A cell scheduling database built from information pushed up by child cells,
    • based on periodic broadcasts of capabilities and capacities
    • and on database updates (instance update/destroy/fault_create)
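The scheduling idea above can be sketched in a few lines of Python. This is a hypothetical illustration only, not the actual nova-cells code: the names `CellState` and `pick_cell`, and the use of free RAM as the sole capacity metric, are assumptions made for the example.

```python
import time


class CellState:
    """Hypothetical record of one child cell, refreshed each time the
    child broadcasts its capabilities and capacities up the tree."""

    def __init__(self, name):
        self.name = name
        self.capabilities = {}   # e.g. {'hypervisor': {'kvm'}}
        self.capacities = {}     # e.g. {'ram_free_mb': 262144}
        self.last_seen = 0.0

    def update(self, capabilities, capacities):
        # Called when a periodic broadcast arrives from the child cell.
        self.capabilities = capabilities
        self.capacities = capacities
        self.last_seen = time.time()


def pick_cell(cells, required_ram_mb, max_age=60.0):
    """Choose the child cell with the most free RAM among those that
    have reported recently and have enough capacity; None if no cell
    qualifies."""
    now = time.time()
    candidates = [
        c for c in cells
        if now - c.last_seen < max_age
        and c.capacities.get('ram_free_mb', 0) >= required_ram_mb
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.capacities['ram_free_mb'])
```

The point of the sketch is that cell scheduling needs only the coarse, periodically pushed data, while host scheduling inside the chosen cell still runs the full nova-scheduler against its own database.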

Services per cell

An API cell contains:

  • AMQP Broker
  • Database
  • nova-cells
  • nova-api

A child cell contains:

  • AMQP Broker
  • Database
  • nova-cells
  • nova-scheduler
  • nova-network
  • nova-compute

Global services:

  • Glance, Keystone
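Since each cell has its own broker and database, a request from the API cell reaches a child cell by being re-cast hop by hop down the tree by each cell's nova-cells service. The sketch below illustrates that routing shape with direct method calls standing in for the AMQP casts; the `Cell` class and `route` method are hypothetical names for this example, not the real nova-cells interface.

```python
class Cell:
    """Hypothetical node in the cell tree. Each cell has its own AMQP
    broker, so routing a message means re-casting it onto the next
    child's broker until the target cell is reached."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)
        self.handled = []  # messages this cell processed locally

    def route(self, message, target_path):
        # target_path is a list of cell names from here down the tree,
        # e.g. ['top', 'child2'].
        head, *rest = target_path
        assert head == self.name, 'message delivered to wrong cell'
        if not rest:
            # Reached the target cell: hand the message to the local
            # services (scheduler, compute, ...).
            self.handled.append(message)
            return
        for child in self.children:
            if child.name == rest[0]:
                # In nova-cells this would be an RPC cast onto the
                # child's broker; here we simply call into the child.
                child.route(message, rest)
                return
        raise LookupError('no such child cell: %s' % rest[0])
```

For example, a build request entering at the API cell as `top.route(msg, ['top', 'child2'])` is handled entirely inside child2's broker and database, which is what lets each cell scale independently.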

References