Revision as of 01:03, 17 February 2011

Queue Service

This document is a proposal and plan for a queue service, it is not an official OpenStack project yet. Once there is a simple implementation to evaluate, it will be submitted the project to the OpenStack Project Oversight Committee for approval.

The reason for initiating a new project vs using an existing one is due to simplicity, modularity, and scale. Very few (if any) existing queue systems out there were built with multi-tenant cloud use cases in mind. Very few also have a simple and extensible REST API. There are possible solutions to build an AMQP based service, but AMQP brings complexity and a protocol not optimized for high latency and intermittent connectivity.

Requirements

The primary goals of the queue service are:

Simple - Initially it will expose a simple REST based API to make it easy to access from any language. It should not require much setup, if any, before applications can start pushing messages into it.
Modular API - Initially the focus will be on a simple REST API, but this will not in any way be a first-class API. It should be possible to add other protocols (AMQP, JMS, direct protocol buffers, Gearman, etc) for other use cases. Note that the internal service API will not always provide a 1-1 mapping with the external API, so some features with some protocol modules may be unavailable.
Fast - Since this will act as a building block for other services that may drive heavy throughput, performance will have a focus. This mostly comes down to the implementation language and how clients and workers interact with the server to reduce network chatter.
Multi-tenant - Support multiple accounts for the service, and since this will also be a public service for some deployments, protect against potentially malicious users.
Persistent - Allow messages to optionally be persistent. For protocols that can support it, this can be an optional flag when the message is submitted. The persistent storage should also be modular so various data stores can be tested and to accommodate different deployment options.
Zones and Locality Awareness - As has been discussed on the OpenStack mailing list, locality in cloud services is an important feature. When dealing with where messages should be processed, there needs to be location awareness to process data where it exists to reduce network overhead and processing time.
No Single Point of Failure - This goes without saying, but the service should not rely on a single component.
Horizontally Scalable for Large Deployments - While it will be appropriate for small deployments, the service must support very large deployments such as a public cloud service.

Components

The queue service will consist of the following components:

Server - This is where messages are routed through. Clients will insert messages and workers will receive messages. Multiple servers may be run for high availability, and servers do not need to be aware of one another.
Proxy - This is an optional service that sits between the server and clients/workers. This allows clients and workers to connect to a single address and proxy to multiple servers. This is only required for extremely large deployments (such as a public cloud service) and will not be implemented until at least the second major milestone.
Client - An external application that inserts or updates messages in the queue.
Worker - An external application that reads, updates, or deletes messages in the queue. Note that the same application can be both a client and worker, the differentiation is only made based on typical behaviors, not on what they are capable of doing.
Accounts - This represents a user, application, or some other account context. Every message is associated with an account, and accounts can have ACLs to control the CRUD operations for other accounts and queues. ACL entries may have some simple pattern matching capabilities.
Queue - A queue is simply a namespace, it doesn't have metadata associated with it. Queue names are unique within an account context. Queues cannot be created or destroyed, they automatically appear to clients and workers when a message exists for the queue name, and is automatically removed when all messages for a given queue name are deleted.
Messages - A message consists of a unique ID, metadata, and a body. The unique ID is composed of the account, queue name, and a message ID. A server will only contain a single message for a given unique ID. If two messages are inserted with the same unique ID, the last one overwrites the first.

Message Metadata

ttl=SECONDS - How long the message will exist in the queue.
persist=0|1 - Mark that this message should have a higher guarantee of delivery. This can be quantified in an SLA per deployment depending on the backing store. This allows for both fast/unreliable messages and slow/reliable messages.
hide=SECONDS - How long the message should be hidden from list requests. This allows for delayed insert of messages and "in progress" time when a worker is processing the message but is not ready to delete it.

The persist option is only available on insert, the other options are available on insert and update. All metadata will have tunable default values for deployments. In the future there may also be user-defined metadata added, but this will be considered later.

Operations for HTTP API

The HTTP API will be versioned by using the first part of the path to allow changes to be introduced while maintain backwards compatibility.

GET / - List all supported versions.
GET /version - List all accounts that currently have messages in them.
GET /version/account - List all queues that have message in them.
GET /version/account/queue - List all messages in the queue that are not hidden in FIFO order.
GET /version/account/queue/id - List the message with the given id.
PUT /version/account/queue - Insert a message and automatically generate a unique message id.
PUT /version/account/queue/id - Insert a message with the given id, overwriting a previous message with the id if one existed.
POST /version/account/queue - Update the metadata for all messages in the queue and return modified messages.
POST /version/account/queue/id - Update the metadata for the message with the given id and return the message.
DELETE /version/account - Remove all messages in the account.
DELETE /version/account/queue - Remove all messages in the queue.
DELETE /version/account/queue/id - Remove the message with the given id.

Optional GET and POST parameters

limit=COUNT - Limit the number of returned messages. This allows a worker to grab as few or as many messages it can handle at once. For example, a worker resizing images may only grab one at a time, but a worker processing log messages may grab many for efficient batch processing.
last=ID - Only match IDs after the given ID. This allows for multi-cast messages by always having a 'hide' value of 0 and not deleting messages (let the TTL delete them automatically).
wait=SECONDS - Wait the given time period for a new message if no messages can be returned immediately. This allows workers to long-poll if needed.
detail=none|metadata|all - What details to return in the response body.

Other Features

Support user-defined metadata.
Allow secondary indexes on message metadata.
Support in-process router "workers" for advanced routing configuration. This will require API additions to manage them.

Milestones and Feature Priorities

Cactus Release (April)

Queue "kernel" module with basic functionality.
Basic HTTP API module.
Account API with a simple no-op module.

Diablo Release (July)

Extended HTTP API module (support for advanced metadata if not done already).
Real account modules backed by LDAP and/or SQL database.
Persistent message storage API and simple module.

Future Releases

More persistent message storage modules.
Proxy server module.
More API modules (AMQP, Gearman, ...).

Examples

Basic Asynchronous Queue

Worker: POST /account/queue?wait=60&hide=60&detail=all (long-polling worker, request blocks until a message is ready)
Client: PUT /account/queue (message inserted, returns unique id that was created)
Worker: Return from blocking POST request with new message and process it
Worker: DELETE /account/queue/id

Multi-cast Event Notifications

Worker1: GET /account/queue?wait=60
Worker2: GET /account/queue?wait=60
Client: Put /account/queue/id1?ttl=60
Worker1: Return from blocking GET request with message id1
Worker2: Return from blocking GET request with message id1
Worker1: GET /account/queue?wait=60&last=id1
Worker2: GET /account/queue?wait=60&last=id1

This allows multiple workers to read the same message since the message is not hidden by any worker. The server will automatically remove the message once the message TTL has expired.

@@ Line 11: / Line 11: @@
 * Simple - Initially it will expose a simple REST based API to make it easy to access from any language. It should not require much setup, if any, before applications can start pushing messages into it.
-* Modular API - Initially the focus will be on a simple REST API, but this will not in any way be a first-class API. It should be possible to add other protocols (AMQP, protocol buffers, Gearman, etc) for other use cases. Note that the internal service API will not always provide a 1-1 mapping with the external API, so some features with advanced protocols may be unavailable.
+* Modular API - Initially the focus will be on a simple REST API, but this will not in any way be a first-class API. It should be possible to add other protocols (AMQP, JMS, direct protocol buffers, Gearman, etc) for other use cases. Note that the internal service API will not always provide a 1-1 mapping with the external API, so some features with some protocol modules may be unavailable.
-* Fast - Since this will act as a building block for other services that may drive heavy throughput, performance will have a focus. This mostly comes down to implementation language and how clients and workers interact with the server to reduce network chatter.
+* Fast - Since this will act as a building block for other services that may drive heavy throughput, performance will have a focus. This mostly comes down to the implementation language and how clients and workers interact with the server to reduce network chatter.
 * Multi-tenant - Support multiple accounts for the service, and since this will also be a public service for some deployments, protect against potentially malicious users.
 * Persistent - Allow messages to optionally be persistent. For protocols that can support it, this can be an optional flag when the message is submitted. The persistent storage should also be modular so various data stores can be tested and to accommodate different deployment options.
-* Zones and locality awareness - As has been discussed on the [[OpenStack]] mailing list, locality in cloud services is an important feature. When dealing with where messages should be processed, there needs to be location awareness to process data where it exists to reduce network overhead and processing time.
+* Zones and Locality Awareness - As has been discussed on the [[OpenStack]] mailing list, locality in cloud services is an important feature. When dealing with where messages should be processed, there needs to be location awareness to process data where it exists to reduce network overhead and processing time.
 * No Single Point of Failure - This goes without saying, but the service should not rely on a single component.
 * Horizontally Scalable for Large Deployments - While it will be appropriate for small deployments, the service must support very large deployments such as a public cloud service.
@@ Line 24: / Line 24: @@
 * Server - This is where messages are routed through. Clients will insert messages and workers will receive messages. Multiple servers may be run for high availability, and servers do not need to be aware of one another.
-* Proxy - This is an optional service that sits between the server and clients/workers. This allows clients and workers to connect to a single address and proxy to multiple servers. This is only required for extremely large deployments (such as a public cloud service) and may not be implemented until the second major milestone.
+* Proxy - This is an optional service that sits between the server and clients/workers. This allows clients and workers to connect to a single address and proxy to multiple servers. This is only required for extremely large deployments (such as a public cloud service) and will not be implemented until at least the second major milestone.
 * Client - An external application that inserts or updates messages in the queue.
 * Worker - An external application that reads, updates, or deletes messages in the queue. Note that the same application can be both a client and worker, the differentiation is only made based on typical behaviors, not on what they are capable of doing.
-* Accounts - This represents a user, application, or some other client/worker context. Every message is associated with an account, and accounts can have ACLs to control the CRUD operations available to them. An account ACL may list different permissions for various queue patterns.
+* Accounts - This represents a user, application, or some other account context. Every message is associated with an account, and accounts can have ACLs to control the CRUD operations for other accounts and queues. ACL entries may have some simple pattern matching capabilities.
-* Queue - A queue is simply a namespace, it doesn't have attributes associated with it. Queue names are unique within an account context. Queues cannot be created or destroyed, they automatically appear to clients and workers when a message exists for the queue name, and is automatically removed when all messages for a given queue name are deleted.
+* Queue - A queue is simply a namespace, it doesn't have metadata associated with it. Queue names are unique within an account context. Queues cannot be created or destroyed, they automatically appear to clients and workers when a message exists for the queue name, and is automatically removed when all messages for a given queue name are deleted.
-* Messages - A message consists of a unique ID, attributes, and a body. The unique ID is composed of the account, queue name, and a unique message ID. A server will only contain a single message for a given unique ID. If two messages are inserted with the same unique ID, the last one overwrites the first.
+* Messages - A message consists of a unique ID, metadata, and a body. The unique ID is composed of the account, queue name, and a message ID. A server will only contain a single message for a given unique ID. If two messages are inserted with the same unique ID, the last one overwrites the first.
-===  Message Attributes ===
+===  Message Metadata ===
 * ttl=SECONDS - How long the message will exist in the queue.
 * persist=0|1 - Mark that this message should have a higher guarantee of delivery. This can be quantified in an SLA per deployment depending on the backing store. This allows for both fast/unreliable messages and slow/reliable messages.
-* hide=SECONDS - How long the message should be hidden from GET requests. This allows delayed insert of messages and "in progress" time when a worker GETs the message but has not deleted it.
+* hide=SECONDS - How long the message should be hidden from list requests. This allows for delayed insert of messages and "in progress" time when a worker is processing the message but is not ready to delete it.
-The persist option is only available on insert, but the other options are available on insert and update. All attributes will have tunable default values for deployments.
+The persist option is only available on insert, the other options are available on insert and update. All metadata will have tunable default values for deployments. In the future there may also be user-defined metadata added, but this will be considered later.
 == Operations for HTTP API ==
@@ Line 43: / Line 43: @@
 The HTTP API will be versioned by using the first part of the path to allow changes to be introduced while maintain backwards compatibility.
-* GET /version/account/queue - Return all messages in the queue that are not hidden in FIFO order.
+* GET / - List all supported versions.
-* GET /version/account/queue/id - Return the message with the given id.
+* GET /version - List all accounts that currently have messages in them.
+* GET /version/account - List all queues that have message in them.
+* GET /version/account/queue - List all messages in the queue that are not hidden in FIFO order.
+* GET /version/account/queue/id - List the message with the given id.
 * PUT /version/account/queue - Insert a message and automatically generate a unique message id.
 * PUT /version/account/queue/id - Insert a message with the given id, overwriting a previous message with the id if one existed.
-* POST /version/account/queue - Update the attributes for all messages in the queue and return messages.
+* POST /version/account/queue - Update the metadata for all messages in the queue and return modified messages.
-* POST /version/account/queue/id - Update the attributes for the message with the given id and return the message.
+* POST /version/account/queue/id - Update the metadata for the message with the given id and return the message.
+* DELETE /version/account - Remove all messages in the account.
 * DELETE /version/account/queue - Remove all messages in the queue.
 * DELETE /version/account/queue/id - Remove the message with the given id.
@@ Line 55: / Line 59: @@
 * limit=COUNT - Limit the number of returned messages. This allows a worker to grab as few or as many messages it can handle at once. For example, a worker resizing images may only grab one at a time, but a worker processing log messages may grab many for efficient batch processing.
-* last=ID - Only return IDs after the given ID. This allows for multi-cast messages by always having a 'hide' value of 0 and not deleting messages (let the TTL delete them automatically).
+* last=ID - Only match IDs after the given ID. This allows for multi-cast messages by always having a 'hide' value of 0 and not deleting messages (let the TTL delete them automatically).
 * wait=SECONDS - Wait the given time period for a new message if no messages can be returned immediately. This allows workers to long-poll if needed.
-* detail=0|1 - Whether to return the full message or just the id.
+* detail=none|metadata|all - What details to return in the response body.
 == Other Features ==
+* Support user-defined metadata.
 * Allow secondary indexes on message metadata.
+* Support in-process router "workers" for advanced routing configuration. This will require API additions to manage them.
 == Milestones and Feature Priorities ==
@@ Line 73: / Line 79: @@
 === Diablo Release (July) ===
-* Extended HTTP API module (support for options such as last ID, wait, etc.).
+* Extended HTTP API module (support for advanced metadata if not done already).
 * Real account modules backed by LDAP and/or SQL database.
 * Persistent message storage API and simple module.
@@ Line 81: / Line 87: @@
 * More persistent message storage modules.
 * Proxy server module.
-* More API modules.
+* More API modules (AMQP, Gearman, ...).
 == Examples ==
@@ Line 89: / Line 95: @@
 <pre><nowiki>
-Worker: POST /queue?wait=60&hide=60&detail=1 (long-polling worker, request blocks)
+Worker: POST /account/queue?wait=60&hide=60&detail=all (long-polling worker, request blocks until a message is ready)
-Client: PUT /queue (message inserted, returns unique id that was created)
+Client: PUT /account/queue (message inserted, returns unique id that was created)
 Worker: Return from blocking POST request with new message and process it
-Worker: DELETE /queue/id
+Worker: DELETE /account/queue/id
 </nowiki></pre>
@@ Line 100: / Line 106: @@
 <pre><nowiki>
-Worker1: GET /queue?wait=60
+Worker1: GET /account/queue?wait=60
-Worker2: GET /queue?wait=60
+Worker2: GET /account/queue?wait=60
-Client: Put /queue/id1?ttl=60
+Client: Put /account/queue/id1?ttl=60
 Worker1: Return from blocking GET request with message id1
 Worker2: Return from blocking GET request with message id1
-Worker1: GET /queue?wait=60&last=id1
+Worker1: GET /account/queue?wait=60&last=id1
-Worker2: GET /queue?wait=60&last=id1
+Worker2: GET /account/queue?wait=60&last=id1
 </nowiki></pre>
-This allows multiple workers to read the same message since the message is not hidden by any worker on GET. The server will automatically remove the message once the message TTL has expired.
+This allows multiple workers to read the same message since the message is not hidden by any worker. The server will automatically remove the message once the message TTL has expired.

Difference between revisions of "Obsolete:QueueService"