Services Heartbeat with ZooKeeper
Proper operation of Nova depends on up-to-date information about the availability of the various nodes and the services running on them. For example, the scheduler needs to know which compute nodes are up in order to calculate an adequate placement for new instances and to maximize the chance that instance creation succeeds.
Currently, each service updates its DB record every 10 seconds (refreshing the timestamp of its last 'heartbeat'), and each 'consumer' retrieves those records from the DB and verifies that the last 'heartbeat' falls within the allowed time window. This mechanism is inefficient and does not scale: every service writes to the DB at a fixed interval whether or not anything has changed, and every consumer has to poll the DB to learn the current state.
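The timestamp check described above can be sketched as follows. This is only an illustration of the idea, not Nova's actual code: the function names are hypothetical, and the 60-second window is an assumed value (the text only states the 10-second update interval).

```python
from datetime import datetime, timedelta

# Assumed allowed window; the source only specifies the 10-second update interval.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Return True if the last heartbeat falls within the allowed window."""
    return (now - last_heartbeat) <= down_time


def live_services(records, now):
    """Filter (name, last_heartbeat) records down to the services that are up."""
    return [name for name, last_heartbeat in records if is_up(last_heartbeat, now)]


now = datetime(2013, 1, 1, 12, 0, 0)
records = [
    ("compute-1", now - timedelta(seconds=5)),    # reported recently
    ("compute-2", now - timedelta(seconds=300)),  # stale, treated as down
]
print(live_services(records, now))
```

Note that every consumer has to fetch and evaluate all records on each check, which is the polling cost the proposal aims to remove.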
By introducing ZooKeeper (ZK) into the Nova environment, we can implement an efficient Membership service that monitors the availability ('heartbeat') of registered services and makes it instantly available to those who need this information. Each service will register a ZK 'ephemeral' znode on startup. The ZK framework will automatically maintain a 'heartbeat' session with the server, and once the service 'dies', the corresponding ephemeral znode will be removed automatically. 'Heartbeat' information will be accessible via a new API that performs the corresponding ZK query.
Services that need instant access to the membership information for a large number of nodes/services (such as the scheduler) will maintain a cache of the membership information and will register with ZK to receive asynchronous membership updates, using the ZK 'watchers' mechanism. These notifications will be triggered by the birth/death of the ephemeral znodes that represent the corresponding services.
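The registration and notification flow described above can be modeled with a small in-memory sketch. It is a toy stand-in, not a ZooKeeper client: the class and method names are hypothetical, session expiry is triggered by hand, and callbacks stay registered permanently, whereas real ZK watches are one-time triggers that a client has to re-register.

```python
class ToyEphemeralRegistry:
    """In-memory stand-in for a ZooKeeper path holding ephemeral znodes."""

    def __init__(self):
        self._nodes = {}      # znode name -> owning session id
        self._watchers = []   # callbacks invoked with the current member list

    def watch_children(self, callback):
        """Register a callback and immediately deliver the current members."""
        self._watchers.append(callback)
        callback(self.children())

    def children(self):
        return sorted(self._nodes)

    def create_ephemeral(self, name, session_id):
        """Called by a service on startup."""
        self._nodes[name] = session_id
        self._notify()

    def expire_session(self, session_id):
        """Simulate the server dropping a session whose heartbeats stopped."""
        self._nodes = {n: s for n, s in self._nodes.items() if s != session_id}
        self._notify()

    def _notify(self):
        members = self.children()
        for callback in self._watchers:
            callback(members)


class SchedulerCache:
    """Local membership cache kept current by watch notifications."""

    def __init__(self, registry):
        self.members = set()
        registry.watch_children(self._on_change)

    def _on_change(self, members):
        self.members = set(members)

    def is_up(self, name):
        return name in self.members


registry = ToyEphemeralRegistry()
cache = SchedulerCache(registry)
registry.create_ephemeral("compute-1", session_id=1)
registry.create_ephemeral("compute-2", session_id=2)
registry.expire_session(2)  # compute-2 'dies'
print(sorted(cache.members))
```

The point of the cache is that `is_up` becomes a local set lookup; the consumer never polls, it only reacts to changes.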
The ZK interaction will be abstracted behind a new Membership API, while a backwards-compatible ('legacy') implementation will provide the membership service capabilities the same way they work today -- via the DB. The main methods of the proposed Group Membership API are:
- Join group
- Query members
- Membership changes notification [optional]
- Get metadata [optional]
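One possible shape for such an API is sketched below. All class, method, and parameter names are hypothetical, chosen only to mirror the four methods listed above; the two optional methods default to raising NotImplementedError so that a backend (for example, a DB-based 'legacy' one) can leave them out.

```python
from abc import ABC, abstractmethod


class MembershipDriver(ABC):
    """Hypothetical driver interface; names are illustrative only."""

    @abstractmethod
    def join(self, member_id, group_id, metadata=None):
        """Join group."""

    @abstractmethod
    def get_members(self, group_id):
        """Query members."""

    def watch(self, group_id, callback):
        """Membership changes notification (optional)."""
        raise NotImplementedError("membership notifications not supported")

    def get_metadata(self, member_id, group_id):
        """Get metadata (optional)."""
        raise NotImplementedError("metadata not supported")


class InMemoryDriver(MembershipDriver):
    """Trivial backend used only to exercise the interface."""

    def __init__(self):
        self._groups = {}  # group id -> {member id -> metadata dict}

    def join(self, member_id, group_id, metadata=None):
        self._groups.setdefault(group_id, {})[member_id] = metadata or {}

    def get_members(self, group_id):
        return sorted(self._groups.get(group_id, {}))

    def get_metadata(self, member_id, group_id):
        return self._groups[group_id][member_id]


driver = InMemoryDriver()
driver.join("host-1", "compute", {"zone": "az1"})
driver.join("host-2", "compute")
print(driver.get_members("compute"))
```

Callers written against `MembershipDriver` would not need to know whether the backend is ZK or the DB, which is the purpose of the abstraction.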
Services metadata in the DB
While it might make sense in the future to move the entire content of the Service table from the DB to ZooKeeper, the current proposal is to keep the Service table in the DB untouched. One of the reasons is to avoid issues with the multiple existing DB methods that perform joins with the Service table. Moreover, one possible approach is to flush the membership updates (received via ZK) back to the DB (e.g., by defining a new column 'is_up'), so that DB queries can efficiently filter out services that are unavailable.
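The 'is_up' column idea can be illustrated with an in-memory SQLite table. The schema, the host names, and the `flush_membership` helper are assumptions made for this sketch and do not reflect Nova's real Service table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE services ("
    "host TEXT PRIMARY KEY, topic TEXT, is_up INTEGER NOT NULL DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO services (host, topic) VALUES (?, ?)",
    [("host-1", "compute"), ("host-2", "compute"), ("host-3", "compute")],
)


def flush_membership(conn, topic, live_hosts):
    """Overwrite is_up for every service of `topic` from the live member set."""
    live = set(live_hosts)
    hosts = [row[0] for row in conn.execute(
        "SELECT host FROM services WHERE topic = ?", (topic,))]
    conn.executemany(
        "UPDATE services SET is_up = ? WHERE host = ?",
        [(1 if host in live else 0, host) for host in hosts],
    )
    conn.commit()


def available_hosts(conn, topic):
    """Let the DB filter out unavailable services instead of the caller."""
    rows = conn.execute(
        "SELECT host FROM services WHERE topic = ? AND is_up = 1 ORDER BY host",
        (topic,),
    )
    return [row[0] for row in rows]


# A membership update (as it might arrive via ZK) reports two live hosts.
flush_membership(conn, "compute", ["host-1", "host-3"])
print(available_hosts(conn, "compute"))
```

With the flag stored in the table, existing queries that join with it can filter on `is_up` directly rather than comparing timestamps.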
Other potential usage scenarios for ZK
Later on, ZooKeeper can be used for additional purposes, such as:
- keeping configuration metadata (e.g., details from nova.conf)
- membership service for other projects (not just Nova)
- leader election to maintain high availability of 'singleton' services (or to maintain a given number of 'active' nodes among a larger number of 'stand-by' nodes)
- leveraging failure events (provided by the membership service) to trigger corrective actions (e.g., cleanup, HA/remote restart of instances, etc.)
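As an illustration of the leader election scenario in the list above, the sketch below models the common ZooKeeper recipe in which each candidate holds a sequentially numbered ephemeral node and the lowest number is the leader. It is an in-memory toy with hypothetical names, not a ZK client, and it is not part of the current proposal.

```python
import itertools


class ToyElection:
    """Lowest sequence number wins, mimicking ephemeral sequential znodes."""

    def __init__(self):
        self._seq = itertools.count()
        self._candidates = {}  # sequence number -> service name

    def volunteer(self, name):
        """Register a candidate and return its sequence number."""
        seq = next(self._seq)
        self._candidates[seq] = name
        return seq

    def withdraw(self, seq):
        """Simulate the candidate's ephemeral node disappearing."""
        self._candidates.pop(seq, None)

    def leader(self):
        if not self._candidates:
            return None
        return self._candidates[min(self._candidates)]


election = ToyElection()
a = election.volunteer("scheduler-a")
b = election.volunteer("scheduler-b")
first = election.leader()
election.withdraw(a)  # the active 'singleton' dies
second = election.leader()
print(first, second)
```

When the active node's entry disappears, the next stand-by in sequence order takes over without any explicit handoff.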
A separate project doing this already
At the Icehouse design summit, a library was proposed that would provide an API for groups/membership with different backend implementations (such as ZooKeeper). It was decided that this library does not belong in oslo and would live in a separate project.