Compute Driver Events

This page came out of discussions in the mailing list thread RFC: Synchronizing hypervisor <-> nova state with event notifications (http://lists.openstack.org/pipermail/openstack-dev/2013-January/004501.html).

The Nova compute manager has a periodic task that runs every 10 minutes to reconcile Nova's view of instance states with the actual states reported by the hypervisor. This is particularly important to allow Nova to detect when an instance has shut down of its own accord, either because the guest OS admin ran 'shutdown -h' or because the hypervisor killed it for some reason. Originally the periodic task ran more frequently, but this caused excessive load on the hypervisors and also consumed too much CPU time in Nova itself, blocking other greenthreads.

Most hypervisors have a way to inform management applications, via a callback mechanism, that some kind of lifecycle event has occurred. For the libvirt driver this is referred to as the "domain events" capability. Making use of these hypervisor event capabilities, where available, would allow the periodic task to be avoided. This would remove the delay between an instance changing state and Nova updating its own internal state, as well as reduce overhead in Nova itself.

Libvirt Design

Making use of events in libvirt is fairly straightforward, as shown in the sketch after this list:

  • Invoking libvirt.virEventRegisterDefaultImpl() will register libvirt's default event loop implementation, which is the same event loop used internally by libvirtd. While a custom pure-Python event loop implementation could be written, there are many troublesome edge cases to take care of, so from a reliability point of view it is preferable to re-use the existing default implementation provided by libvirt.
  • Invoking libvirt.virEventRunDefaultImpl() will perform one iteration of the libvirt default event loop. This should be called in a "while True" loop, to ensure prompt processing of events.
  • Invoking conn.domainEventRegisterAny() will register event callbacks against a libvirt connection instance. The registered callbacks will be triggered from the execution context of virEventRunDefaultImpl().
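
Putting those three steps together, a minimal standalone sketch might look like the following (the read-only qemu:///system connection URI and the choice to watch all domains are illustrative, not mandated by the design):

  import libvirt

  def lifecycle_callback(conn, dom, event, detail, opaque):
      # Invoked by libvirt from within virEventRunDefaultImpl()
      print("Instance %s: lifecycle event=%d detail=%d" %
            (dom.UUIDString(), event, detail))

  # Must be invoked before the connection is opened
  libvirt.virEventRegisterDefaultImpl()

  conn = libvirt.openReadOnly("qemu:///system")

  # Passing None as the domain registers for events from all domains
  conn.domainEventRegisterAny(None,
                              libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                              lifecycle_callback, None)

  while True:
      # One iteration of the default event loop per call; registered
      # callbacks fire from this execution context
      libvirt.virEventRunDefaultImpl()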

To avoid blocking all greenthreads, the virEventRunDefaultImpl() method needs to run in a native thread. This in turn means that all callbacks registered with domainEventRegisterAny() will also execute in a native thread. The compute manager, however, wants to receive events in the context of a greenthread. To bridge this, the callbacks running in the native thread will need to simply put incoming events on a queue, to be dispatched from a greenthread afterwards, as sketched below.
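
A sketch of this threading arrangement, using the plain threading and queue modules from the standard library purely for illustration (in Nova itself the queue handoff would need to be eventlet-aware):

  import queue
  import threading

  import libvirt

  event_queue = queue.Queue()

  def lifecycle_callback(conn, dom, event, detail, opaque):
      # Runs in the native thread driving the libvirt event loop:
      # do no real work here, just record the event for later dispatch
      event_queue.put((dom.UUIDString(), event, detail))

  def run_native_event_loop():
      # Runs forever in a native thread, so blocking inside libvirt
      # never stalls the greenthread hub
      while True:
          libvirt.virEventRunDefaultImpl()

  libvirt.virEventRegisterDefaultImpl()
  conn = libvirt.openReadOnly("qemu:///system")
  conn.domainEventRegisterAny(None,
                              libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
                              lifecycle_callback, None)

  loop_thread = threading.Thread(target=run_native_event_loop)
  loop_thread.daemon = True
  loop_thread.start()

  # Dispatcher side - in Nova this would run in a greenthread; a plain
  # blocking Queue.get() here is purely illustrative
  while True:
      uuid, event, detail = event_queue.get()
      print("Dispatching event for instance %s: %d/%d" %
            (uuid, event, detail))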

Compute Driver Design

To facilitate the handling of events in the compute driver classes, the base compute driver will provide a number of helper APIs / classes (a sketch follows the list):

  • def queue_event(event) - this is to be invoked from the hypervisor callback to queue an event for later dispatch to the compute manager. It is safe to call this from a native thread, since updates to the event queue are synchronized via a python lock.
  • def emit_event(event) - this will dispatch a single event to the compute manager callback. This is only to be invoked from a greenthread.
  • def emit_queued_events() - this will dispatch all events previously queued via the queue_event() method. This is only to be invoked from a greenthread.
  • def register_event_listener(callback) - register a callback function to receive events. The callback will be invoked with a single parameter - the event object.
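
A minimal sketch of how these helpers might look in the base driver class (the internal list/lock representation is an assumption for illustration, not the final implementation):

  import threading

  class ComputeDriver(object):
      def __init__(self):
          self._queued_events = []
          self._events_lock = threading.Lock()
          self._event_callback = None

      def register_event_listener(self, callback):
          # The compute manager registers a single callback which
          # receives one parameter - the event object
          self._event_callback = callback

      def queue_event(self, event):
          # Safe from a native thread: updates to the queue are
          # synchronized via a plain python lock
          with self._events_lock:
              self._queued_events.append(event)

      def emit_event(self, event):
          # Greenthread-only: deliver one event to the listener
          if self._event_callback is not None:
              self._event_callback(event)

      def emit_queued_events(self):
          # Greenthread-only: drain everything queued earlier via
          # queue_event() and dispatch it in order
          with self._events_lock:
              pending, self._queued_events = self._queued_events, []
          for event in pending:
              self.emit_event(event)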

The actual data associated with events will be provided via a number of classes (sketched below):

  • Event - the (abstract) base class for all events. Simply maintains a timestamp indicating when the event was raised
  • InstanceEvent - the (abstract) base class for all events associated with an individual instance. Maintains an instance UUID.
  • LifecycleEvent - the class used for reporting changes in an instance state (started/stopped/paused/resumed)
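
A sketch of this class hierarchy (the transition constant names are illustrative assumptions):

  import time

  # Illustrative constants for LifecycleEvent transitions
  EVENT_LIFECYCLE_STARTED = 0
  EVENT_LIFECYCLE_STOPPED = 1
  EVENT_LIFECYCLE_PAUSED = 2
  EVENT_LIFECYCLE_RESUMED = 3

  class Event(object):
      # Abstract base: records when the event was raised
      def __init__(self, timestamp=None):
          if timestamp is None:
              timestamp = time.time()
          self.timestamp = timestamp

  class InstanceEvent(Event):
      # Abstract base for events tied to a single instance
      def __init__(self, uuid, timestamp=None):
          super(InstanceEvent, self).__init__(timestamp)
          self.uuid = uuid

  class LifecycleEvent(InstanceEvent):
      # Reports a change in an instance's state
      def __init__(self, uuid, transition, timestamp=None):
          super(LifecycleEvent, self).__init__(uuid, timestamp)
          self.transition = transition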