Revision as of 20:23, 16 August 2013

Preventing Duplicate Work in Ceilometer

aka "Retry Semantics"

Or ... "What happens when Step 4 of a 10 Step Pipeline fails?"

Here's a common use case in Ceilometer (and in most parts of OpenStack, for that matter):

  1. We have some work to do that will take several steps. We could fail anywhere along the way.
  2. If we fail, we may get called again by the service that initiated the action.
  3. We don't want to redo the steps we've already done. Instead, we'd like to retry just the part that failed. Redoing previous work could have disastrous ramifications:
  • writing duplicate records to a database,
  • sending a notice to a downstream user/service thousands of times,
  • performing an expensive calculation over and over, DDoS'ing ourselves, etc.

Currently in CM, we have the following major components:

  • The collector
  • The dispatcher
  • The pipeline
  • Alarm states
  • and, coming soon, the trigger pipeline


The problem with the above use case manifests itself in the dispatcher and the pipeline, and it will in the trigger pipeline. The collector is pretty dumb now, thanks to the dispatcher.

The way I was hoping to prevent this problem is:

  1. the collector would re-publish to a new CM exchange for each downstream component that needs to do some unit of work. This could be a pipeline plugin, a dispatcher plugin or a trigger pipeline plugin.
dhellmann - Does ceilometer need to republish, or should we set up a different queue for each pipeline? That way the message broker will handle the messages and redelivery if the pipeline raises an exception, but the collector won't be a bottleneck to creating those new messages.
  2. Each plugin would do its work and use the normal queue ack(), reject(), requeue() semantics of the queuing engine to deal with failures.
  3. This means that each plugin is responsible for detecting whether it has already run. For some entities, such as events, this is easy since there is a unique message_id per event. But let's say that event generates 10 samples; we would need to repeatably produce a unique id for each sample based on that event id (see the sketch after this list). For things like publishing, that gets very difficult since there is no related database record.
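
Here is a minimal sketch, assuming the plugin has the event's message_id in hand (the function name is made up, not a Ceilometer API), of producing repeatable per-sample ids so that a redelivered event regenerates the same ids and a simple "seen already?" lookup makes the step idempotent:

  import uuid

  def sample_ids_for_event(message_id, sample_count):
      # uuid5 is deterministic: the same (message_id, index) input always
      # yields the same id, so a retry of the same event produces identical
      # sample ids and the plugin can skip ones it has already stored.
      return [uuid.uuid5(uuid.NAMESPACE_OID, "%s/sample/%d" % (message_id, i))
              for i in range(sample_count)]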

The downside of always going back to the queuing system is the large amount of chatter it will produce. Consider one event that produces 10 meters, each with its own 3-step pipeline doing complex calculations and outbound publishing: that is roughly 30 queue round-trips for a single event. That gets expensive quickly.

What we need is some way to say "I've done this step already".

Some possibilities:

  1. Each pipeline would manage the retry semantics for the pipeline. If any step fails, the pipeline manager would retry starting at the failed step. This is tricky since we would need a context object to persist and pass back to the failed plugin during the retry (a rough sketch appears after this list).
  2. Ditch the pipeline model. The collector would call a dispatcher when a particular event comes in. The dispatcher would do some work and either:
    1. ack and republish a new event to a new exchange, or
    2. requeue the event, or
    3. reject the event
  3. For non-pipeline stuff (the alarm state), we sort of need a persistent state machine anyway. Rather than reinvent this for every component, a reusable state machine library would be preferred (perhaps like the one the orchestration/state-management team is working on). Something that has schema-defined transitions, with proper timeouts & persistence handled along each transition. A collection of enums and a snarl of if-statements is not a good state machine. I think this state machine library could be used for the pipeline management as well.
dhellmann - The steps of the pipeline are meant to be relatively inexpensive transformations, not full processing. So each pipeline is massaging the data before taking the final action, which is the thing that might be reasonably expected to fail (publishing, writing to a db, whatever). Do we have (or envision) any transformations that might "fail" if they see the same data twice? Is it good enough to document that they are expected to be implemented in a way so they can avoid such issues?
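
To make possibility 1 above concrete, here is a rough sketch (the class and store names are hypothetical, not the real pipeline manager) of persisting a per-message cursor so a retry resumes at the failed step instead of rerunning the whole pipeline:

  class RetryablePipeline(object):
      def __init__(self, steps, state_store):
          self.steps = steps              # ordered list of callables
          self.state_store = state_store  # dict-like; real backend would be a db or zookeeper

      def process(self, message_id, data):
          # Resume from the last checkpoint for this message, if any.
          start, data = self.state_store.get(message_id, (0, data))
          for index in range(start, len(self.steps)):
              data = self.steps[index](data)
              # Checkpoint after each successful step: if step N+1 raises,
              # the cursor stays at N+1 and only that step is retried.
              self.state_store[message_id] = (index + 1, data)
          self.state_store.pop(message_id, None)
          return data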

A lightweight state machine engine that supports atomic updates is what we need. This could be backed by a database with good transaction support, or by something like zookeeper. Something like memcache wouldn't have the locking we need. Either way, the backend for this piece should be pluggable. We should chat with Josh Harlow to find out where they are with their effort and help out there if possible.
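
As an illustration of the kind of primitive such an engine would need (the names are invented, and the in-memory backend only stands in for a real transactional database or zookeeper implementation), a single atomic compare-and-set is enough to detect that another worker already performed a transition:

  import threading

  class InMemoryStateBackend(object):
      """Toy backend; a real one would use DB transactions or zookeeper."""
      def __init__(self):
          self._lock = threading.Lock()
          self._states = {}

      def compare_and_set(self, key, expected, new):
          # Atomically move `key` from `expected` to `new`. Returning False
          # means someone else already did the transition -- duplicate work.
          with self._lock:
              if self._states.get(key) != expected:
                  return False
              self._states[key] = new
              return True

  ALLOWED = {(None, "ok"), ("ok", "alarm"), ("alarm", "ok")}

  def transition(backend, alarm_id, current, target):
      if (current, target) not in ALLOWED:
          raise ValueError("illegal transition %s -> %s" % (current, target))
      return backend.compare_and_set(alarm_id, current, target)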

Additional problems:

  • the problem of a single plugin generating N items before failing on the (N+1)th item. My gut says we simply need to persist progress for each step of the generator (see the sketch below). :/
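
For example, assuming the plugin's generator yields the same items in the same order on a retry (the helper names below are made up), persisting a count of items already handled would look roughly like:

  def consume_resumably(message_id, items, state_store, sink):
      # state_store remembers how many items were already handled for this
      # message; a retry skips those and picks up at the failure point.
      done = state_store.get(message_id, 0)
      for index, item in enumerate(items):
          if index < done:
              continue                         # handled on a previous attempt
          sink(item)                           # may raise; earlier progress is kept
          state_store[message_id] = index + 1  # checkpoint after each success
      state_store.pop(message_id, None)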