Zaqar/guarantees

Marconi Semantic Guarantees

Marconi guarantees FIFO for a given queue, but only when there is a single message producer. Marconi also guarantees once-and-only-once delivery of messages with some caveats, as follows:

Pub-Sub

Publisher posts message to Server
Server calculates a marker for the message
Server writes message (with marker) to DB
Server returns message ref to Publisher
Subscriber lists messages (no marker query param)
Server returns the first N messages in the queue, along with a "next" URL. The URL contains a ?marker=X query param, where X is the worker belonging to the last message returned.
[more messages are posted]
Subscriber lists messages using "next" URL (containing a marker)
Server returns messages with markers > the marker passed in with the URL

A few things to point out here. First of all, as long as the publisher's polling interval is shorter than a given message's TTL, and your storage backend is configured for high durability, the subscriber can not miss a message, and will never receive the same message twice, assuming it persists the "next" URL (and marker), using it for all future requests.

The trick is in how the marker is generated and persisted with the message. For the guarantee to hold, the marker must be unique within a queue, and have atomic ordering (i.e., can't be timestamp+rand, and in the case of 2+ parallel requests to post a message, a message must not be inserted in a different order than the marker for that message was generated; race conditions between the steps of calculating the next marker and inserting a message using that marker must be detected and mitigated). To see (one way) this can be implemented, see the MongoDB storage driver.

The trade-off that is made to provide such a guarantee is increased latency in the case of many producers posting messages to a single queue, since the marker generation becomes the bottleneck.

The alternative would be to use a timestamp-based message marker, and have the client detect and throw away duplicate messages, but it was decided to try avoiding that and seeing if we could minimize the inherent performance penalty.

Producer-Consumer

This case is a simpler one to deal with.

Producer posts message to Server
Server calculates an ID (and marker, but not needed for claiming)
Server writes message to DB
Server returns message ref to Producer
Consumer claims some messages
Server grabs X messages that aren't yet claimed, and associates them
h a new claim ID.
Server returns a list of claimed messages
Consumer processes each message, deleting them in turn

So, as long as there are enough consumers to keep up with the producer(s), such that messages don't start expiring, no message will get lost, and consumers will only ever see a given message once, unless they crash and the claim expires (or is manually deleted), in which case, that same worker when restarted (or another worker) will see the message again when it is reclaimed, which is what you want anyway.