- We need to use integer IDs to maintain backwards compatibility with OpenStack API v1.X
How do we organize Glance within Zones so that we can:
- Discover available images for a customer
- Efficiently and reliably backup instances
- Support globally available images
Currently, region-level Nova zones contain a Glance API, Glance Registry, and Swift instance. Base install and snapshot data reside within Swift while the metadata exists in the Glance Registry.
Storing the backup data in a zone-local Swift makes a lot of sense in terms of isolation and performance. Assuming that this Swift instance is publicly accessible, this will mean that image data is available globally.
The problem lies with the metadata. While a region-level Glance Registry has the same benefits in terms of performance and isolation as the region-level Swift, it introduces several large problems:
- Image discovery is difficult because there is no clear way for discovering which zones are present globally and what images they possess
- Base install data and metadata need to be replicated into the zone, which creates a burden on Ops to keep the images in sync
- The OpenStack API requires images to have a single globally unique integer ID. Since each zone would possess its own Glance Registry database, it's not clear who would be the arbiter of this ID.
To solve the problems mentioned above, we propose replacing the region-level Glance Registries with a single global Glance Registry.
In this plan, each region would maintain its own Swift and Glance API servers, which would keep the heavy traffic (the image data) within the zone. The metadata, however, would be pushed up to this central location.
This has a number of benefits:
- Image discovery is trivial: we just query the global Registry for the available images for a given customer
- Base install management is easy: the metadata is stored in the registry just like any other image and we serve the data out of a single base install Swift instance.
- We get the globally unique ID for free by using the database's AUTOINCREMENT ID column
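To illustrate that last point, here is a minimal sketch (with a hypothetical, stripped-down schema) showing how an autoincrementing primary key hands out unique integer IDs on insert, using SQLite for brevity:

```python
import sqlite3

# Hypothetical, minimal version of the global registry's images table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)

# Each insert is assigned the next integer ID by the database itself;
# no application-level coordination is required.
cur = conn.execute("INSERT INTO images (name) VALUES (?)", ("base-ubuntu",))
first_id = cur.lastrowid
cur = conn.execute("INSERT INTO images (name) VALUES (?)", ("customer-backup",))
second_id = cur.lastrowid

print(first_id, second_id)  # → 1 2
```

Because there is exactly one master database assigning IDs, uniqueness is guaranteed without any distributed coordination protocol.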
Of course, there is one potentially large drawback to this approach: we're introducing a single point of failure. This has important performance and availability implications which are addressed below.
The first question to ask is: can a single global Glance Registry provide the performance characteristics we need?
The Glance Registry is really two components: a small web server exposing a REST interface, and a traditional RDBMS storing the metadata. Given that the web servers can scale horizontally, the bottleneck is really the database.
For a first approximation, assume we're trying to scale to 1 million instances. For backups, in the worst case, each instance has daily backups turned on, meaning we'll have to accommodate 10**6/86400 or about 12 write transactions per second. This volume is low enough to be a non-issue.
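The back-of-the-envelope rate works out as follows (this assumes, as noted later, a uniform distribution of backup jobs over the day):

```python
instances = 10 ** 6        # worst case: every instance backs up daily
seconds_per_day = 86400

writes_per_second = instances / seconds_per_day
print(round(writes_per_second, 1))  # → 11.6
```

Even if backups cluster into an off-peak window rather than spreading uniformly, a few hundred writes per second is still well within reach of a single well-provisioned master.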
Read performance can be scaled as well. First, we can use read-only slaves to increase throughput (as well as geographically dispersing them to reduce latency). Second, we can avoid round-trips to the global registry entirely by caching the responses within each region. This will mean that many requests (in particular base installs) can be satisfied without having to go outside of the zone, giving us back the isolation and performance we want.
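A region-level read-through cache along these lines might look like the following sketch (class and function names here are hypothetical, not part of Glance):

```python
# Hypothetical read-through cache for image metadata within a region.
# Cached entries are served locally; only misses travel to the global
# registry, which also lets cached lookups succeed while it is down.

class RegionMetadataCache:
    def __init__(self, fetch_from_global):
        self._fetch = fetch_from_global  # callable: image_id -> metadata dict
        self._cache = {}

    def get(self, image_id):
        if image_id in self._cache:
            return self._cache[image_id]      # no round-trip outside the zone
        metadata = self._fetch(image_id)      # one trip to the global registry
        self._cache[image_id] = metadata
        return metadata

# Demonstration with a stub for the global registry:
calls = []
def fetch(image_id):
    calls.append(image_id)
    return {"id": image_id, "name": "base-ubuntu"}

cache = RegionMetadataCache(fetch)
cache.get(42)
cache.get(42)
print(len(calls))  # → 1: the second lookup never left the region
```

A production version would need cache invalidation (or TTLs) so that metadata updates in the global registry eventually propagate, but the round-trip-avoidance idea is the same.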
Another critical question is: what happens when (not if) the Glance Registry goes off-line? Will this cause instance-builds and backups to fail across the entire OpenStack deployment?
Since the region-level zone is caching image-metadata, it is very likely that the base install metadata will be present in the cache. This means that base installs, even with the global registry down, will still build.
Customer backups, however, will likely not be available for building during the period when the Glance Registry is down since they are unlikely to be cached.
The Glance Registry being down poses two problems for backup jobs:
- There is no place to write the image-metadata
- There is no way to obtain the globally unique image identifier
The first problem can be mitigated by queuing writes until the Glance Registry becomes available again.
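A minimal sketch of such write queuing might look like this (the class and callable names are hypothetical):

```python
from collections import deque

class QueuedRegistryWriter:
    """Hypothetical sketch: buffer metadata writes while the global
    registry is unreachable, then flush them once it recovers."""

    def __init__(self, send):
        self._send = send          # callable that raises ConnectionError on failure
        self._pending = deque()

    def write(self, metadata):
        try:
            self._send(metadata)
        except ConnectionError:
            self._pending.append(metadata)   # registry down: queue locally

    def flush(self):
        # Replay queued writes in order; stop again if the registry fails.
        while self._pending:
            self._send(self._pending[0])
            self._pending.popleft()

# Demonstration: registry starts "down", then comes back "up".
stored = []
down = True
def send(metadata):
    if down:
        raise ConnectionError("global registry unreachable")
    stored.append(metadata)

writer = QueuedRegistryWriter(send)
writer.write({"image": "backup-1"})   # queued, not lost
down = False
writer.flush()
print(len(stored))  # → 1
```

In practice the queue would need to be durable (e.g. written to local disk or a message queue) so that queued metadata survives a restart of the zone-level service as well.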
The second problem can be addressed by generating non-overlapping blocks of IDs and handing them off to each Zone. The zone-level Glance APIs can then use the ID blocks to return image IDs until the block is exhausted, at which point they request another. By having these IDs present within the zone, backups will be able to continue in the face of Glance-Registry downtime.
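The block-allocation scheme can be sketched as follows (again with hypothetical names; the allocator stands in for the global registry, and each `ZoneIdSource` stands in for a zone-level Glance API):

```python
class IdBlockAllocator:
    """Hypothetical sketch of the global registry's side: hand out
    non-overlapping blocks of integer IDs on request."""

    def __init__(self, block_size=1000):
        self._next = 1
        self._block_size = block_size

    def allocate_block(self):
        start = self._next
        self._next += self._block_size
        return start, start + self._block_size - 1   # inclusive range

class ZoneIdSource:
    """Zone-level side: serve IDs from the current block; request a
    new block only when the current one is exhausted."""

    def __init__(self, allocator):
        self._allocator = allocator
        self._current = iter(())

    def next_id(self):
        try:
            return next(self._current)               # no round-trip needed
        except StopIteration:
            start, end = self._allocator.allocate_block()  # one trip per block
            self._current = iter(range(start, end + 1))
            return next(self._current)

allocator = IdBlockAllocator(block_size=3)
zone_a = ZoneIdSource(allocator)
zone_b = ZoneIdSource(allocator)
print(zone_a.next_id(), zone_b.next_id(), zone_a.next_id())  # → 1 4 2
```

Because each zone owns its block outright, IDs remain globally unique even when the registry is unreachable; the only requirement is that a zone request its next block before (not after) the current one runs dry.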
- Make the Glance Registry global
- Scale master database vertically to improve write throughput (if needed), use read-only slaves to increase read throughput and reduce latency (by dispersing them geographically)
- Use image-metadata caching to avoid round-trips to the Glance Registry as well as provide image availability in the face of downtime
- Queue writes within the zone so that backup jobs can finish even if the Glance Registry is down
- Allocate blocks of image-IDs to each zone to allow the OpenStack API to return from a snapshot request without having to make a round-trip to the Glance Registry, as well as allowing the backup job to complete in the event of Glance Registry downtime. Or use UUIDs.
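The UUID alternative mentioned in the last point needs no coordination at all: each zone can mint identifiers locally with a negligible collision probability, at the cost of breaking the integer-ID compatibility required by OpenStack API v1.X. For example, with Python's standard library:

```python
import uuid

# A random (version 4) UUID, minted entirely within the zone:
# no ID blocks, no round-trip to a global registry.
image_id = uuid.uuid4()
print(isinstance(image_id, uuid.UUID))  # → True
```

This is why UUIDs appear only as a fallback here: they solve the uniqueness problem cleanly but do not satisfy the v1.X integer-ID constraint stated at the top of this document.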
- Since customer-backups and snapshots are much less likely to be cached at the region level, this approach doesn't allow customers to build from backups and snapshots when the Glance Registry is down. Is this acceptable for a first cut? What are the potential solutions here?
- At what point can we transition to something more scalable than integer IDs for images?
- In order to do so, we would need either a central zone registry or use a peer-to-peer zone discovery protocol.
- Assuming a uniform distribution of backup jobs over the 24 hour period.