Difference between revisions of "Zaqar/guarantees"


Revision as of 18:39, 22 August 2013

Marconi Semantic Guarantees

Marconi guarantees FIFO for a given queue, but only when there is a single message producer. Marconi also guarantees once-and-only-once delivery of messages with some caveats, as follows:

Pub-Sub

  1. Publisher posts message to Server
  2. Server calculates a marker for the message
  3. Server writes message (with marker) to DB
  4. Server returns message ref to Publisher
  5. Subscriber lists messages (no marker query param)
  6. Server returns the first N messages in the queue, along with a "next" URL. The URL contains a ?marker=X query param, where X is the marker belonging to the last message returned.
  7. [more messages are posted]
  8. Subscriber lists messages using "next" URL (containing a marker)
  9. Server returns messages with markers > the marker passed in with the URL


A few things to point out here. As long as the subscriber's polling interval is shorter than a given message's TTL, and the storage backend is configured for high durability, the subscriber cannot miss a message and will never receive the same message twice, provided it persists the "next" URL (and its marker) and uses it for all future requests.
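The paging behavior in the steps above can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the `Queue` class and its `post` and `list` methods are hypothetical stand-ins, not Marconi's API, and a plain integer plays the role of the marker carried in the "next" URL.

```python
class Queue:
    """In-memory stand-in for a single queue (hypothetical, not Marconi's API)."""

    def __init__(self):
        self._messages = []    # (marker, body) pairs, kept in marker order
        self._next_marker = 1  # assumption: markers are consecutive integers

    def post(self, body):
        marker = self._next_marker
        self._next_marker += 1
        self._messages.append((marker, body))
        return marker

    def list(self, marker=0, limit=10):
        """Return up to `limit` bodies with markers > `marker`, plus the marker to resume from."""
        page = [(m, b) for m, b in self._messages if m > marker][:limit]
        # If nothing new arrived, hand back the same marker so the client keeps its place.
        next_marker = page[-1][0] if page else marker
        return [b for _, b in page], next_marker


queue = Queue()
for body in ["a", "b", "c"]:
    queue.post(body)

first_page, marker = queue.list(limit=2)                  # no marker: start of queue
queue.post("d")                                           # more messages are posted
second_page, marker = queue.list(marker=marker, limit=2)  # resume after last marker seen
third_page, marker = queue.list(marker=marker, limit=2)   # nothing new yet
```

Because the subscriber always resumes from the last marker it received, each message shows up in exactly one page, even when new messages are posted between requests.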

The trick is in how the marker is generated and persisted with the message. For the guarantee to hold, the marker must be unique within a queue and have atomic ordering (i.e., it cannot be timestamp+rand). In addition, when two or more requests post messages in parallel, a message must not be inserted in a different order than the one in which its marker was generated; race conditions between calculating the next marker and inserting a message with that marker must be detected and mitigated. For one way to implement this, see the MongoDB storage driver.
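One way to picture the ordering requirement is a sketch in which marker allocation and message insertion happen under a single lock, so the two steps cannot interleave across parallel posts. This is not how the MongoDB storage driver does it; the `MarkerQueue` class and the in-process `threading.Lock` are assumptions made purely for illustration.

```python
import threading


class MarkerQueue:
    """Hypothetical queue that allocates a marker and inserts the message atomically."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counter = 0
        self._messages = []  # (marker, body) pairs in insertion order

    def post(self, body):
        # Holding the lock across both steps closes the race between
        # "calculate next marker" and "insert message with that marker".
        with self._lock:
            self._counter += 1
            marker = self._counter
            self._messages.append((marker, body))
        return marker

    def markers(self):
        with self._lock:
            return [m for m, _ in self._messages]


def post_many(queue, producer, count):
    for i in range(count):
        queue.post(f"{producer}-{i}")


queue = MarkerQueue()
threads = [threading.Thread(target=post_many, args=(queue, p, 250)) for p in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

stored = queue.markers()  # insertion order matches marker order, with no gaps or repeats
```

Every producer has to pass through the same lock, so posts to one queue are serialized at the point where the marker is generated.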

The trade-off made to provide such a guarantee is increased latency when many producers post messages to a single queue, since marker generation becomes the bottleneck.

The alternative would be to use a timestamp-based message marker and have the client detect and throw away duplicate messages, but it was decided to avoid that and instead see whether the inherent performance penalty could be minimized.
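For comparison, the client-side work that the timestamp-based alternative would require might look roughly like the following. The `DedupingSubscriber` class and the `(message_id, body)` tuple shape are hypothetical; the sketch only shows the bookkeeping a client would need in order to discard repeats.

```python
class DedupingSubscriber:
    """Hypothetical client that drops messages it has already seen."""

    def __init__(self):
        # Assumption: every message carries a unique ID. This set grows without
        # bound here; a real client would need some way to prune old IDs.
        self._seen = set()

    def accept(self, messages):
        """Return only the bodies of messages not seen in an earlier page."""
        fresh = []
        for message_id, body in messages:
            if message_id in self._seen:
                continue
            self._seen.add(message_id)
            fresh.append(body)
        return fresh


subscriber = DedupingSubscriber()
first = subscriber.accept([("id-1", "a"), ("id-2", "b")])
# An overlapping page, as could happen when markers are not strictly ordered.
second = subscriber.accept([("id-2", "b"), ("id-3", "c")])
```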

Producer-Consumer

This case is a simpler one to deal with.

  1. Producer posts message to Server
  2. Server calculates an ID (and a marker, though the marker is not needed for claiming)
  3. Server writes message to DB
  4. Server returns message ref to Producer
  5. Consumer claims some messages
  6. Server grabs X messages that aren't yet claimed, and associates them with a new claim ID.
  7. Server returns a list of claimed messages
  8. Consumer processes each message, deleting them in turn


So, as long as there are enough consumers to keep up with the producer(s), such that messages don't start expiring, no message will get lost, and consumers will only ever see a given message once. The exception is when a consumer crashes and its claim expires (or is manually deleted). In that case, the same worker (once restarted) or another worker will see the message again when it is reclaimed, which is what you want anyway.
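The claim lifecycle described above can be sketched as follows. The `ClaimQueue` class, its method names, and the explicit `now` argument (used in place of a real clock so the example is deterministic) are all assumptions for illustration, not Marconi's API.

```python
import itertools


class ClaimQueue:
    """Hypothetical in-memory queue with expiring claims."""

    def __init__(self):
        self._messages = {}  # message id -> body
        self._claims = {}    # message id -> (claim id, expiry time)
        self._message_ids = itertools.count(1)
        self._claim_ids = itertools.count(1)

    def post(self, body):
        message_id = next(self._message_ids)
        self._messages[message_id] = body
        return message_id

    def claim(self, limit, ttl, now):
        """Claim up to `limit` messages that have no live claim at time `now`."""
        claim_id = next(self._claim_ids)
        claimed = []
        for message_id in sorted(self._messages):
            if len(claimed) == limit:
                break
            current = self._claims.get(message_id)
            if current is not None and current[1] > now:
                continue  # still held by a live claim
            self._claims[message_id] = (claim_id, now + ttl)
            claimed.append((message_id, self._messages[message_id]))
        return claim_id, claimed

    def delete(self, message_id):
        self._messages.pop(message_id, None)
        self._claims.pop(message_id, None)


queue = ClaimQueue()
first_id = queue.post("job-1")
second_id = queue.post("job-2")

_, batch_a = queue.claim(limit=2, ttl=30, now=0)   # worker A claims both messages
queue.delete(first_id)                             # A finishes job-1, then crashes
_, batch_b = queue.claim(limit=2, ttl=30, now=10)  # A's claim is still live
_, batch_c = queue.claim(limit=2, ttl=30, now=31)  # A's claim has expired
```

While worker A's claim is live, a second claim returns nothing; once the claim expires, the unfinished message becomes claimable again, which is the redelivery case described above.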