Marconi: A Message Bus for OpenStack

This specification formalizes the requirements and design considerations captured during one of the Grizzly Summit working sessions that proposed a message bus project for OpenStack. As the project evolves, so too will its requirements, so this specification is only meant as a starting point.

Here's a brief summary of how Marconi works (a rough client-side sketch follows the list):

  1. Clients post messages via HTTP to Marconi. The URL contains a tenant ID.
  2. Marconi persists messages according to either a default TTL, or one specified by the client.
  3. Clients poll Marconi for messages. Whereas other popular message bus servers use the notion of topics or channels to namespace messages, Marconi is completely tag-based, allowing for maximum flexibility in distribution patterns.
  4. Clients may optionally apply a transaction UUID to the next batch of messages that do not already have a transaction associated with them. In this case, the server returns a list of affected messages for processing by the client. Once the client has processed each message, it can delete that message from the server. In this way, Marconi provides a mechanism for ensuring each message is processed once and only once.
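
To make the flow above concrete, here is a rough client-side sketch using the Python requests library. The resource paths, parameter names, and response shapes shown are assumptions for illustration only (the authoritative definitions live in the Marconi API spec), but the sequence (post, poll by tag, claim with a transaction, then delete) follows the steps above.

 import requests

 # Hypothetical endpoint; the tenant ID (480924) is part of the URL per step 1.
 BASE = "http://marconi.example.com/v1/480924"

 # Post a message with an explicit TTL in seconds (steps 1 and 2); the server
 # generates the message ID.
 resp = requests.post(BASE + "/messages",
                      json=[{"ttl": 300,
                             "tags": ["billing"],
                             "body": {"event": "instance.resize", "flavor": "m1.large"}}])
 resp.raise_for_status()

 # Poll for messages by tag (step 3); there are no named queues or topics, only tags.
 for msg in requests.get(BASE + "/messages", params={"tags": "billing"}).json():
     print(msg["body"])

 # Claim the next batch of untransacted messages (step 4); the server assigns
 # the transaction ID and returns the affected messages.
 claim = requests.post(BASE + "/transactions",
                       json={"ttl": 60, "tags": ["billing"]}).json()

 # Delete each message once it has been processed, so it is handled only once.
 for msg in claim.get("messages", []):
     requests.delete(BASE + "/messages/" + msg["id"])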

Rationale

The lack of an integrated cloud message bus service is a major inhibitor to OpenStack adoption. While Amazon has SQS and SNS, OpenStack currently provides no alternatives.

OpenStack needs a multi-tenant message bus that is fast, efficient, durable, horizontally-scalable and reliable. Furthermore, the current RPC mechanism that various OpenStack components use to communicate with each other does not include usable APIs for subscribing to notifications, or for sending and receiving generic messages to be consumed by multiple workers. This has complicated OpenStack metering and billing implementations.

The Marconi project will address these needs, acting as a complement to the existing RPC infrastructure within OpenStack, while providing multi-tenant services that can be exposed to applications running on public and private clouds.

Note on polling vs. push over persistent connections: the trade-offs involve massive concurrency, hardware utilization, keep-alive handling, and user perception; polling is the approach taken here (see Non-Features below).

Use Cases

1. Distribute tasks among multiple workers (transactional job queues)

2. Forward events to data collectors (transactional event queues)

3. Publish events to any number of subscribers (pub-sub)

4. Send commands to one or more agents (RPC via point-to-point or pub-sub)

5. Request information from an agent

Design Goals

Marconi's design philosophy is derived from Donald A. Norman's work regarding The Design of Everyday Things:

 The value of a well-designed object is when it has such a rich set of affordances that the people who use it can do things with it that the designer never imagined.

Goals related to the above:

  1. Emergent functionality, utility
  2. Modular, pluggable code base
  3. REST architectural style

Principles to live by:

  1. DRY
  2. YAGNI
  3. KISS

Major Features

Non-Functional

  • Versioned API
  • Multi-tenant
  • Implemented in Python, following PEP 8 and pythonic idioms
  • Modular, kernel-based architecture
  • Async I/O
  • Monitoring driver
  • Logging driver
  • Health endpoint
  • Client-agnostic
  • Low response time, turning around requests in 50ms or less, even under load
  • High throughput, serving millions of reqs/min with a small cluster
  • Horizontal scaling of both reads and writes
  • Support for HA deployments
  • Guaranteed delivery
  • Best-effort message ordering
  • Server generates all IDs (i.e., message and transaction IDs)
  • Gzip'd large messages
  • Secure (audited code, end-to-end HTTPS support, pen testing, etc.)

Functional

  • JSON and XML media types
  • Opaque payload (although must be valid JSON or XML)
  • Max payload size of 64K
  • Batch message posting and querying
  • Tag-based filtering (channels and distribution patterns are emergent)
  • Keystone auth driver (service catalog may return endpoints for different regions and/or different characteristics)
  • CLI client
  • Client libraries for Python, PHP, Java, and C#
  • Specify safety (optional)
  • Message signing (HMAC)
  • Auto-generated audit trail for actions and state changes, filterable
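
The list above includes HMAC message signing, but this page does not fix the exact scheme. The snippet below is only a minimal sketch of signing a message body with a shared secret; the header name, key-distribution story, and choice of SHA-256 are assumptions for illustration.

 import hashlib
 import hmac
 import json

 # Hypothetical per-tenant secret; how keys are provisioned is out of scope here.
 SHARED_SECRET = b"per-tenant secret"

 def sign(message):
     """Return a hex HMAC-SHA256 digest over the canonical JSON form of a message."""
     payload = json.dumps(message, sort_keys=True, separators=(",", ":")).encode("utf-8")
     return hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()

 message = {"ttl": 300, "tags": ["billing"], "body": {"event": "instance.resize"}}
 headers = {"X-Message-Signature": sign(message)}  # hypothetical header name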

Future Features

Listed in no particular order:

  • JSON-P support
  • Caching
  • Temporal queries
  • JavaScript client library (browser and Node.js)
  • Ruby client library
  • PHP client library
  • Cross-regional replication
  • Horizon plug-in
  • Ceilometer data provider
  • PyPy support

Better put into extensions (YAGNI):

  • Priority queues
  • Guaranteed order
  • Long-polling
  • Websockets

Non-Features

Marconi may be used to support other services that provide the following functionality, but will not embed these abilities directly within its code base.

  1. Any kind of push notifications over persistent connections (leads to complicated state management and poor hardware utilization)
  2. Forwarding notifications to email, SMS, Twitter, etc. (à la SNS)
  3. Forwarding notifications to web hooks
  4. Forwarding notifications to APNS, GCM, etc.
  5. Scheduling-as-a-service (à la IronWorker)
  6. Metering and monitoring solutions

Architecture

Marconi will use a micro-kernel architecture. Auth, web server, storage, cache, logging, monitoring, etc. will all be implemented as drivers, allowing vendors to customize Marconi to suit. Note, however, that the web framework will be tightly coupled with the micro-kernel for maximum performance, and will not be customizable without hacking on the kernel itself.

Possible frameworks that can help realize a highly modular design:

  • pkg_resources
  • stevedore
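
As one illustration of what the pluggable approach could look like, the sketch below loads a storage driver through stevedore's DriverManager. The "marconi.storage" entry-point namespace and the "mongodb" driver name are assumptions used only to show the shape of driver loading.

 from stevedore import driver

 # Discover and instantiate a storage driver registered under a hypothetical
 # entry-point namespace; vendors would ship packages exposing such entry points.
 mgr = driver.DriverManager(
     namespace="marconi.storage",  # hypothetical namespace
     name="mongodb",               # selected via configuration
     invoke_on_load=True,          # instantiate the plugin class on load
 )

 storage = mgr.driver  # the loaded driver instance, ready for use by the kernel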

Non-customizable modules

  • WSGI-based micro web framework, tuned for low latency and high throughput
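
Marconi's actual framework is not sketched here, but since it sits on WSGI, the following minimal callable shows the contract it builds on, using the health endpoint from the feature list as a stand-in route; the path and response codes are illustrative assumptions.

 def app(environ, start_response):
     """Bare WSGI callable: the interface any WSGI micro framework builds on."""
     if environ.get("PATH_INFO") == "/v1/health":  # hypothetical route
         start_response("204 No Content", [])
         return [b""]
     start_response("404 Not Found", [("Content-Type", "text/plain")])
     return [b"not found"]

 if __name__ == "__main__":
     # Development-only server; a production deployment would use a real
     # WSGI server (e.g., Chaussette, per the reference drivers below).
     from wsgiref.simple_server import make_server
     make_server("", 8000, app).serve_forever()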

Reference drivers

  • Auth: Keystone
  • Web Server: Chaussette
  • Storage: MongoDB
  • Cache: Redis, MongoDB
  • Logging: Syslog, stdout, file
  • Monitoring: TBD

API

View the Marconi API spec.

Test Plan

All development will be done TDD-style using nose. Pair programming may happen by accident (or even on purpose). Eventually we'll add integration, performance, and security tests, and get everything automated in a nice and tidy CI pipeline.
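
As a flavor of what TDD with nose looks like, here is a throwaway example; the TTL-validation helper under test is made up purely for illustration.

 from nose.tools import assert_equal, assert_raises

 # Hypothetical unit under test: a helper that applies and validates message TTLs.
 def validate_ttl(ttl, default=3600, maximum=1209600):
     if ttl is None:
         return default
     if not 60 <= ttl <= maximum:
         raise ValueError("ttl out of range")
     return ttl

 def test_default_ttl_is_used_when_omitted():
     assert_equal(validate_ttl(None), 3600)

 def test_out_of_range_ttl_is_rejected():
     assert_raises(ValueError, validate_ttl, 30)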