NovaZooKeeperLeaderElection

= Leader Election in Nova using ZooKeeper =

Overview
A common scenario in distributed systems is to 'elect' a singleton node to perform a certain role or function, among a larger group of nodes. When the leader fails, the failure is detected, triggering election of a new leader. In a more general case, instead of choosing a single leader, it might be desired to have certain number of leader (or 'active') nodes at any point of time, while a larger group of nodes (potentially all of them) are capable of performing the job. Such a capability is crucial for a large-scale system that requires built-in high availability -- such as OpenStack fabric.

A leader election service for Nova can be implemented using Zookeeper. A general implementation scheme is described at http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection The general flow of the leader election process within a group is as follows:
 * 1) Nodes in the group nominate themselves to become a leader
 * 2) The service chooses and notifies the leader node
 * 3) The service monitors the liveness of the leader (e.g., using ZK-based heartbeat & membership service, described in a separate blueprint)
 * 4) When the service stops seeing the leader, election of a new leader is initiated
 * 5) At any point in time, nodes can query the service for the current leader, or subscribe for notifications to receive leadership changes

API
The main methods of the proposed leader election API are:
 * getLeader(groupID) -- The method can be called by any client, independent of participation in the leader election. We assume that several independent leader election processes can be maintained in parallel, identified by a groupID.
 * nominate(groupID) -- A client nominates itself to be a leader. the method is blocked, and returns only if the client was chosen.
 * optional: nominateAsync(groupID, leader_callback).  A client nominates itself to be a leader. The method is non-blocking. When the client is chosen, the leader_callback is called.
 * optional: unnominateAsync(groupID), An optional method, which is complementary to the non-blocking nominateAsync method.
 * optional: subscribeLeaderChanges(groupID, change_callback) -- Optionally, a node can subscribe to receive leadership updates asynchronously, via change_callback.
 * optional: unsubscribeLeaderChanges
 * optional: subscribeErrors(groupID, error_callback) -- see Notes below