Jump to: navigation, search

DuplicateWorkCeilometer

Revision as of 16:08, 15 August 2013 by SandyWalsh (talk | contribs)

Preventing Duplicate Work in Ceilometer

aka "Retry Semantics"

Or ... "What happens when Step 4 of a 10 Step Pipeline fails?"

Here's a common use case in Ceilometer (and, most parts of OpenStack for that matter):

  1. We have some work to do that will take several steps. We could fail anywhere along the way.
  2. If we fail, we may get called again by the service that initiated the action.
  3. We don't want to redo the steps we've already done. Instead, we'd like to retry just the part that failed. Redo'ing previous work could have disastrous ramifications:
  • writing duplicate records to a database,
  • sending a notice to a downstream user/service thousands of times,
  • performing an expensive calculation over and over DDOS'ing ourselves, etc.

Currently in CM, we have the the following major components:

  • The collector
  • The dispatcher
  • The pipeline
  • Alarm states
  • and, coming soon, the trigger pipeline


The problem with the above use case manifests itself in the dispatcher, the pipeline and will in the trigger pipeline. The collector is pretty dumb now thanks to the dispatcher.

The way I was hoping to prevent this problem is:

  1. the collector would re-publish to a new CM exchange for each downstream component that needs to do some unit of work. This could be a pipeline plugin, a dispatcher plugin or a trigger pipeline plugin.
  2. Each plugin would do its work and use the normal queue ack(), reject(), requeue() semantics of the queuing engine to deal with failures.
  3. This means that each plugin is responsible for detecting if it has already run or not. For some entities, such as events, this is easy since there is a unique message_id per event. But let's say that event generates 10 samples, we would need to repeatably produce a unique id for each sample based on that event id. For things like publishing, that gets very difficult since there is no related database record.

The downside with always going back to the queuing system is the large amount of chatter it will produce. Consider, 1 event that produces 10 meters, each with their own 3-step pipeline doing complex calculations and outbound publishing. That gets expensive quickly.

What we need is some way to say "I've done this step already".

Some possibilities:

  1. Each pipeline would manage the retry semantics for the pipeline. If any step fails, the pipeline manager would retry starting at the failed step. This is tricky since we would need to have a context object to persist and pass back to the failed plugin during the retry.
  2. Ditch the pipeline model. The collector would call a dispatcher when a particular event comes in. The dispatcher would do some work and either:
    1. ack and republish a new event to a new exchange, or
    2. requeue the event, or
    3. reject the event
  3. For non-pipeline stuff (the alarm state), we sort of need a persistent state-machine anyway. Rather than reinvent this for every component a reusable state machine library would be preferred (perhaps like the one the orchestration/state-management team is working on). Something that has schema-defined transitions with proper timeouts & persistence handled along each transition. A collection of enums and a snarl of if-statements is not a good state machine. I think this state machine library could also be used for the pipeline management as well.

A lightweight state machine engine that supports atomic updates is what we need. This could be backed by a database with good transaction support or something like zookeeper. Something like memcache wouldn't have locking we need. Either way the backend for this piece should be pluggable. We should chat with Josh Harlow to find out where they are with their effort and help out there if possible.

Additional problems:

  • the problem of a single plugin generating N items before failing on the N+1 th item. My gut says we simply need to persist for each step of the generator. :/