In this use case, you use Queues, ala SQS, to feed worker pools.
1. Producer pushes Message onto Queue
2. Worker claims Message
3. Worker processes Message
4. Message deletes (ACK’s) Message
1a. One or more Producers push several Messages onto Queue. In this case, the worker can claim batches of messages, or a single message at a time, per it’s discretion.
2a. Worker does not claim Message before it expires. Producer should be adjusted to use a higher Message TTL, or Worker should poll more frequently.
2b. There are no messages to claim, because all messages have been consumed by this or other workers. Worker continues to periodically send requests to claim messages until new messages are available. Alternatively, a governor process may choose to autoscale the worker pool based on load (which can be checked by GETting the stats for the queue, and/or monitoring whether or not individual workers are idle).
3a. Worker crashes after claiming Message, but before processing it. In this case, the claim on the message will eventually expire, after which the Message will be made available again, to be claimed by another worker.
3b. Worker crashes after processing the message, but before deleting it. In this case, the next worker to claim the message would check whether the message has already been processed before proceeding. In some cases, a rare double-processing of a message may be acceptable, in which case no check is necessary.
We could go into a long digression about “push” vs. “pull” and how “push” has come to be conflated with “pub-sub”, but rather than opening that can of worms, let’s just define "Pub-Sub” simply as "notifying one or more observers/subscribers of an event”.
1. Publisher pushes Message X onto Queue
2. Subscriber A lists messages in Queue, gets Message X
3. Subscriber B lists messages in Queue, gets Message X
4. Subscribers A and B individually process Message X
2a. Subscriber has already listed messages in a previous round. In this case, the subscriber would submit a “next” marker to tell the server what messages it has already seen, so that the server will only return new messages to the subscriber.
2b. Subscriber does not list messages before Message X expires. In this case, the producer should be adjusted to use a higher TTL when posting messages, and/or the subscriber should poll more often.
2b. All messages have been listed. In this case, the subscriber gets an empty response, and will continue to periodically list messages using the queue’s last known marker, until it gets a non-empty response.
2c. Subscriber crashes before it can get Message X. In this case, a process monitor would simply restart the subscriber, and the subscriber would be able to get Message X as long as it is able to poll within the TTL period set by the publisher.
This is sort of like a loosely-coupled RPC pattern. In this case, each agent gets its own queue. A “queue” resource is extremely lightweight in Cloud Queues, so you can create hundreds of thousands of them as needed.
Note that bidirectional communication requires only a single queue.
1. Controller pushes Message onto Queue
2. Agent lists messages in Queue, gets Message
3. Agent performs requested action
4. Agent pushes Result Message onto Queue
5. Controller lists messages on Queue, gets Result Message
2a. Agent could claim messages, but it is slower than simply listing messages, and claiming isn’t necessary in any case since only one client is ever reading from the queue.
2b. Agent crashes before getting Message. In this case, as soon as the agent restarts, it can still get the message assuming it comes up within the TTL period set by Controller.
4a. Agent crashes before posting Result Message. The Controller will need to have a notion of a timeout period after which it no longer expects a response from its request.
4b. If not result is expected, this and the next steps are skipped.
This pattern is really a hybrid use case. The idea is that you add an additional observer that is constantly listing and recording messages in a queue. This could be a CLI “tail”-like script, or a passive server process logging everything as it comes through.
This helps in diagnosing problems/bugs in your message producers. It can also ensure that messages were processed correctly, for example, if you were using Queues as part of a metering solution and you wanted to audit your records to ensure billable events were all submitted to the billing system correctly.
That being said, having the ability to delay messages from being claim-able for a certain period of time is a feature we still need in order to make auditing work really well. After all, you need to give the auditor a chance to list the message before the worker deletes it. Today, workers *can* pause X seconds after claiming a message before deleting it, but we believe taking care of it on the producer end would be more elegant.