Marconi: Cloud Message Queuing for OpenStack

This specification formalizes the requirements and design considerations captured during one of the Grizzly Summit working sessions to initiate a message bus project for OpenStack. As the project evolves, so will its requirements; this specification is meant only as a starting point.

Here's a brief summary of how Marconi works:

Clients post messages via HTTP to Marconi. The URL contains a tenant ID.
Marconi persists each message for its time-to-live (TTL), which is either a default value or one specified by the client.
Clients poll Marconi for messages.
Clients may optionally claim a batch of messages, hiding them from other clients. Once the client has processed each message, it can delete it from the server. In this way, Marconi provides a mechanism for ensuring each message is processed once and only once.
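
The flow above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Marconi's implementation: the class name SketchQueue, its method names, and the claim_ttl parameter are all assumptions made for the example, and the now arguments exist only to make the timing explicit.

```python
import itertools
import time


class SketchQueue:
    """In-memory stand-in for a single tenant's queue (illustrative only)."""

    def __init__(self, default_ttl=60):
        self.default_ttl = default_ttl
        self._ids = itertools.count(1)
        self._messages = {}  # message ID -> record, in posting order

    def post(self, body, ttl=None, now=None):
        """Store a message and return its server-generated ID."""
        now = time.time() if now is None else now
        msg_id = next(self._ids)
        self._messages[msg_id] = {
            "body": body,
            "expires": now + (self.default_ttl if ttl is None else ttl),
            "claimed_until": 0.0,
        }
        return msg_id

    def _drop_expired(self, now):
        # Messages whose TTL has elapsed are discarded.
        self._messages = {
            i: m for i, m in self._messages.items() if m["expires"] > now
        }

    def poll(self, now=None):
        """Return (id, body) pairs for live messages that are not claimed."""
        now = time.time() if now is None else now
        self._drop_expired(now)
        return [
            (i, m["body"])
            for i, m in self._messages.items()
            if m["claimed_until"] <= now
        ]

    def claim(self, limit, claim_ttl, now=None):
        """Hide up to `limit` messages from other clients for `claim_ttl` seconds."""
        now = time.time() if now is None else now
        batch = self.poll(now)[:limit]
        for msg_id, _ in batch:
            self._messages[msg_id]["claimed_until"] = now + claim_ttl
        return batch

    def delete(self, msg_id):
        """Remove a message once the client has finished processing it."""
        self._messages.pop(msg_id, None)
```

In this sketch, a claim that expires before the client deletes the message makes the message visible to other clients again. That is one possible way to recover from a failed worker; it is an assumption of the example rather than something this specification states.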

Rationale

The lack of an integrated cloud message bus service is a major inhibitor to OpenStack adoption. While Amazon has SQS and SNS, OpenStack currently provides no alternatives.

OpenStack needs a multi-tenant message bus that is fast, efficient, durable, horizontally scalable, and reliable.

The Marconi project will address these needs, acting as a complement to the existing RPC infrastructure within OpenStack, while providing multi-tenant services that can be exposed to applications running on public and private clouds.

Use Cases

1. Distribute tasks among multiple workers (transactional job queues)

2. Forward events to data collectors (transactional event queues)

3. Publish events to any number of subscribers (pub-sub)

4. Send commands to one or more agents (RPC via point-to-point or pub-sub)

5. Request information from an agent (RPC via point-to-point)

6. Monitor a Marconi deployment (DevOps)

Design Goals

Marconi's design philosophy is derived from Donald A. Norman's work regarding The Design of Everyday Things:

 The value of a well-designed object is when it has such a rich set of affordances that the people who use it can do things with it that the designer never imagined.

Goals related to the above:

Emergent functionality, utility
Modular, pluggable code base
REST architectural style

Principles to live by:

DRY
YAGNI
KISS

Major Features

Non-Functional

Versioned API
Multi-tenant
Implemented in Python, following PEP 8 and pythonic idioms
Modular, driver-based architecture
Async I/O
Client-agnostic
Low response time, turning around requests in 20-50 ms (or better), even under load
High throughput, serving millions of requests/min with a small cluster
Thousands of requests/sec per queue (?)
Hundreds of thousands of queues per tenant
Horizontal scaling of both reads and writes
Support for HA deployments
Guaranteed delivery
Best-effort message ordering
Server generates all IDs
Gzip'd HTTP bodies
Secure (audited code, end-to-end HTTPS support, penetration testing, etc.)
Schema validation
Auth caching

Functional

Eventing and work queuing semantics
JSON
Opaque payload (although must be valid JSON)
Max payload size of 4K
Batch message posting and querying
Keystone auth driver (service catalog may return endpoints for different regions and/or different characteristics)
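
The payload rules above (opaque but valid JSON, at most 4K) could be enforced with a check along these lines. This is a sketch under stated assumptions: the function name validate_payload is hypothetical, and "4K" is read as 4096 bytes of the encoded request body, which the specification does not spell out.

```python
import json

# Assumption: "4K" means 4096 bytes of UTF-8 encoded JSON.
MAX_PAYLOAD_BYTES = 4096


def validate_payload(raw):
    """Decode a raw request body, raising ValueError if it breaks the rules.

    `raw` is the message body as bytes. The decoded value is returned
    unchanged, since the payload is opaque to the server.
    """
    if len(raw) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload exceeds %d bytes" % MAX_PAYLOAD_BYTES)
    try:
        return json.loads(raw.decode("utf-8"))
    except ValueError as exc:
        # Both invalid UTF-8 and invalid JSON surface as ValueError subclasses.
        raise ValueError("payload is not valid JSON: %s" % exc)
```

For example, validate_payload(b'{"event": "backup.done"}') returns the decoded dictionary, while a 5000-byte body or a body that is not JSON raises ValueError.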

Future Features (Brainstorming)

TODO: Create blueprints for these, prioritize

Brainstormed features, listed in no particular order:

LZ4 compression for messages at rest
Multi-Transport (HTTP, ZMQ)
SQLAlchemy driver
REPL for debugging, testing, diagnostics
Client libraries for Python, PHP, Java, and C#
Auto-generated audit river (read-only queue) for actions and state changes
Delayed delivery
Hot-reconfigure
PATCH support for updating queue metadata
Set/get arbitrary queue metadata
Kombu Integration
API tokens tied to a specific app and a specific queue, OAuth?
Message signing
Standalone control panel or at least a simple admin/dashboard app for ops
JSON-P support (may need to use the while(1); trick to prevent XSS attacks)
Multi-get (specify a list of queues to query in a single request)
Tag-based filtering
- Includes a way to return in one call, everything with or without the tag (OR semantics) to afford fanout.
XML support
LZ4 or snappy body compression (at rest, and in WSGI server as well as client libs)
Response caching
Authorization (based on tags and/or queues)
Cross-tenant sharing (need to define business case)
Temporal queries
JavaScript client library (browser and Node.js)
Ruby client library
PHP client library
Cross-regional replication
Horizon plug-in
Ceilometer data provider
PyPy support
HTTP 2.0 support
Long-polling
Web Socket transport driver
Web hooks

Non-Features

Marconi may be used to support other services that provide the following functionality, but will not embed these abilities directly within its code base.

Any kind of push notifications over persistent connections (leads to complicated state management and poor hardware utilization)
Forwarding notifications to email, SMS, Twitter, etc. (à la SNS)
Forwarding notifications to web hooks
Forwarding notifications to APNS, GCM, etc.
Scheduling-as-a-service (à la IronWorker)
Metering and monitoring solutions

Architecture

Marconi will use a micro-kernel architecture. Auth, transport, storage, cache, logging, monitoring, etc. will all be implemented as drivers or exposed with standard protocols, allowing vendors to customize Marconi to suit their needs.

Endpoint controllers define the interface between storage and transport.
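
To make the driver idea concrete, here is a minimal sketch of a kernel that loads a storage driver and hands it to an endpoint controller. Every name in it (StorageDriver, MemoryStorage, QueueController, Kernel, and their methods) is hypothetical and chosen for illustration; the real interfaces are not defined in this specification.

```python
import abc


class StorageDriver(abc.ABC):
    """Contract that every storage driver in this sketch must satisfy."""

    @abc.abstractmethod
    def save(self, tenant_id, queue, body):
        """Persist one message body."""

    @abc.abstractmethod
    def fetch(self, tenant_id, queue):
        """Return the stored message bodies for a queue."""


class MemoryStorage(StorageDriver):
    """Toy driver that keeps messages in a dictionary."""

    def __init__(self):
        self._data = {}

    def save(self, tenant_id, queue, body):
        self._data.setdefault((tenant_id, queue), []).append(body)

    def fetch(self, tenant_id, queue):
        return list(self._data.get((tenant_id, queue), []))


class QueueController:
    """Endpoint controller: the seam between a transport and a storage driver."""

    def __init__(self, storage):
        self._storage = storage

    def post_message(self, tenant_id, queue, body):
        self._storage.save(tenant_id, queue, body)

    def list_messages(self, tenant_id, queue):
        return self._storage.fetch(tenant_id, queue)


class Kernel:
    """Tiny registry that maps (kind, name) to a driver factory."""

    def __init__(self):
        self._factories = {}

    def register(self, kind, name, factory):
        self._factories[(kind, name)] = factory

    def load(self, kind, name):
        return self._factories[(kind, name)]()


kernel = Kernel()
kernel.register("storage", "memory", MemoryStorage)
controller = QueueController(kernel.load("storage", "memory"))
```

A transport driver would call only the controller, so replacing MemoryStorage with another driver would not touch transport code. In a real deployment the hand-written registry could plausibly be replaced by entry-point discovery; that substitution is not shown here.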

Possible frameworks that can help realize a highly modular design:

pkg_resources
stevedore

Reference drivers

Transport: HTTP(S) via WSGI using Falcon
Auth: Keystone middleware
Storage: MongoDB
Logging: Standard library logging
Monitoring: TBD - Statsd, as well as HTTP stats page?

Deployment Options

Self-host via gevent.http or ZMQ
Host with a WSGI server.

Requires writing a small bootstrap script to load the kernel and export the app callable.
Bootstrap script also allows full programmatic customization of logging
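
A bootstrap script of the kind described above might look roughly like the following. It is only a sketch: load_kernel and make_app are hypothetical placeholders for whatever the kernel eventually exposes, and the response body is a stand-in for real request handling.

```python
import logging


def configure_logging():
    # Programmatic logging setup; operators can change this freely.
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
    )


def load_kernel():
    """Hypothetical stand-in for loading the kernel and its drivers."""
    return {"log": logging.getLogger("marconi.sketch")}


def make_app(kernel):
    """Build a WSGI callable bound to the loaded kernel."""

    def app(environ, start_response):
        kernel["log"].info(
            "%s %s", environ.get("REQUEST_METHOD"), environ.get("PATH_INFO")
        )
        body = b'{"status": "ok"}'
        start_response(
            "200 OK",
            [
                ("Content-Type", "application/json"),
                ("Content-Length", str(len(body))),
            ],
        )
        return [body]

    return app


configure_logging()
# WSGI servers import this module and look up the `app` callable.
app = make_app(load_kernel())
```

Assuming the script is saved as bootstrap.py, a WSGI server such as gunicorn could then serve it with gunicorn bootstrap:app.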

API

See the Marconi API spec. [ROUGH DRAFT]

Test Plan

All development will be done TDD-style using nose and testtools. Pair programming may happen by accident (or even on purpose). Eventually we'll add integration, performance, and security tests, and automate everything in a tidy CI pipeline.
