Zaqar/bp/placement-service

Revision as of 13:35, 26 August 2013

Overview

Rationale: Marconi has a storage bottleneck

Proposal goal: Remove that bottleneck

The placement service aims to address this by handling storage transparently and dynamically.

Transparency

  • User transparency: availability and use of the Marconi service must not be interrupted when a migration is taking place.
  • Implementation transparency: the storage driver is handed a location/connection and cares only about the serialization/deserialization of data to that storage location.

Terminology

  • Marconi partition: one Marconi master, a set of Marconi workers, and a storage deployment. This is the minimum abstraction: one adds a Marconi partition, not a storage node or a Marconi worker (see the sketch after this list).
  • Marconi master: receives requests and forwards them round-robin to Marconi workers.
  • Marconi workers: process requests and communicate with storage.
  • Storage deployment: a set of storage nodes - one or many, as long as they're addressable with a single client connection.
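
To make these relationships concrete, here is a minimal sketch of how a catalogue entry might model a partition. This is illustrative Python with invented field names, not part of the proposal:

  from dataclasses import dataclass
  from typing import List

  @dataclass
  class Partition:
      # One Marconi partition: the unit that operators add or remove.
      name: str
      master_uri: str            # the master that round-robins requests
      worker_uris: List[str]     # the workers that talk to storage
      storage_uri: str           # one client connection for the whole deployment
      weight: int = 1            # relative share of new queues (see Phase 1)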

Reference Deployment: Smart Proxy and Partition as a Unit

This approach is emerging as the leading reference implementation for scaling the Marconi service. The primary components are:

  • A load balancer that can redirect tenant requests to a cluster URL
  • Operating Marconi at the partition level

Partitions

  • One master to round-robin tasks to workers
  • N Marconi web servers
  • A storage deployment

Operators can optimize N to match their storage configuration and persistence needs.
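
As a rough illustration of the master's role, round-robin dispatch can be as simple as cycling over the worker list; the URLs and the forward() helper below are hypothetical:

  import itertools

  workers = ["http://worker-1:8888", "http://worker-2:8888", "http://worker-3:8888"]
  _next_worker = itertools.cycle(workers)

  def forward(request):
      # Hand each incoming request to the next worker in turn.
      worker = next(_next_worker)
      print("forwarding to", worker)   # stand-in for the actual HTTP proxying
      return worker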

Load Balancer

The load balancer maintains a mapping from tenants/projects (ID-based) to partition URLs.
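
A sketch of that mapping, assuming a plain dictionary from project ID to partition URL (the IDs and URLs here are made up):

  catalogue = {
      "project-1234": "http://partition-a.example.org",
      "project-5678": "http://partition-b.example.org",
  }

  def route(project_id):
      # The load balancer consults the mapping and redirects the request.
      return catalogue[project_id]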

Migration Strategy

Freezing Export: have a migration service running on each Marconi partition. The service, when given a queue and a destination partition, launches an export worker. The export worker then communicates the desired data to the new partition's migration service, which in turn launches an import worker to bring in the data. In summary:

  • "Freeze" the source queue
  • Export the queue from the source
  • Import the queue to the destination
  • "Thaw" the queue

Freeze: set a particular queue as read-only at the proxy layer.
Thaw: restore a particular queue to normal status at the proxy layer.
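
The four steps might fit together as in the following sketch, which uses in-memory stand-ins for the proxy and the two partitions' migration services; in the real system these would be separate processes speaking HTTP:

  class Partition:
      def __init__(self):
          self.queues = {}                       # queue name -> list of messages

      def export(self, queue):                   # export worker on the source
          return self.queues.pop(queue, [])

      def import_(self, queue, messages):        # import worker on the destination
          self.queues[queue] = messages

  class Proxy:
      def __init__(self):
          self.read_only = set()

      def freeze(self, queue):                   # queue becomes read-only
          self.read_only.add(queue)

      def thaw(self, queue):                     # queue returns to normal status
          self.read_only.discard(queue)

  def migrate(queue, src, dst, proxy):
      proxy.freeze(queue)                        # 1. "Freeze" the source queue
      try:
          dst.import_(queue, src.export(queue))  # 2. export, 3. import
      finally:
          proxy.thaw(queue)                      # 4. "Thaw" the queue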

Advantages

  • Easier to implement
  • No changes to Marconi
  • Scalable
  • Transparent

Disadvantages

  • Data migration is less granular: performed at tenant level vs. queue level

Roadmap

Phase 1: Replication and Sharding

In the first iteration of this project, the goal is to provide an easy way to replicate and shard data across dynamically allocated storage nodes. The required features are listed below, with a small placement sketch after the list:

  • Catalogue + management API
  • Storage allocation + management API: static weights
  • Policy engine: being able to assign dedicated storage to particular tenants
  • Local catalogue cache + push consistency
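
A minimal sketch of static-weight placement with a policy override, assuming integer weights per partition and a table of pinned tenants (all names invented):

  import random

  weights = {"partition-a": 3, "partition-b": 1}   # operator-assigned static weights
  dedicated = {"project-1234": "partition-b"}      # policy engine: pinned tenants

  def place(project_id):
      # Dedicated storage wins; otherwise pick in proportion to the weights.
      if project_id in dedicated:
          return dedicated[project_id]
      names = list(weights)
      return random.choices(names, weights=[weights[n] for n in names], k=1)[0]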

Phase 2: Migration, Dynamism, and Deletion

Phase 2 introduces the ability to migrate data (read: Marconi data, queues + messages) from one storage node to another. It also considers the ability to remove entries from the catalogue and to schedule the corresponding data for removal from storage. Finally, dynamically controlled storage allocation is introduced for more hands-off operation. In sum:

  • Migration + migration API: move data from storage node to node, set storage as read-only
  • Deletion: remove data from the catalogue and from storage nodes
  • Dynamism: monitor storage nodes and adjust weights based on node capacity and load

Phase 3: Data Affinity and Generalization

This phase optimizes and generalizes the placement service. Conceptually, there's no reason the placement service should serve only the needs of Marconi. Requirements:

  • Data affinity: attempt to cache particular storage connections on worker nodes where certain data appears more often
   - Useful for reducing the number of cached connections
  • Generalization: make the placement service usable by other services

Phase N: ???

The future is open, and predicting beyond this point is very difficult.

Ideas Under Consideration

Periodic Refresh

On a separate Marconi "thread", poll the catalogue service periodically (say, every 10 seconds). This actor is responsible for updating the cache. It queries the catalogue service and looks for changes.

To enable this, migrations are only allowed at a granularity of 5 minutes. This helps avoid race conditions on a catalogue resource, since the migration itself triggers a state change from active to migrating for a particular queue.
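
A sketch of the polling actor, with fetch_catalogue() standing in for the real catalogue query:

  import threading
  import time

  REFRESH_INTERVAL = 10      # seconds, per the polling period above

  cache = {}                 # queue -> partition, read by the request path
  cache_lock = threading.Lock()

  def fetch_catalogue():
      # Stand-in for querying the catalogue service for changes.
      return {"queue-1": "partition-a"}

  def refresher():
      while True:
          fresh = fetch_catalogue()
          with cache_lock:           # swap in the fresh mapping atomically
              cache.clear()
              cache.update(fresh)
          time.sleep(REFRESH_INTERVAL)

  threading.Thread(target=refresher, daemon=True).start()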

Push Refresh

This approach does away with polling and puts the responsibility of invalidating caches on the placement service. All Marconi nodes must maintain a listening port connected to the placement service, and whenever a migration occurs, Marconi nodes receive updates on the queues being affected.
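
One way to sketch the listening side, assuming the placement service pushes one JSON object per line for each affected queue (the wire format here is invented):

  import json
  import socketserver

  cache = {}       # queue -> partition, as in the periodic-refresh sketch

  class InvalidationHandler(socketserver.StreamRequestHandler):
      def handle(self):
          for line in self.rfile:                 # one update per line
              update = json.loads(line)
              cache.pop(update["queue"], None)    # drop the stale entry

  # Blocks serving updates; in practice this would run on its own thread.
  socketserver.TCPServer(("", 9999), InvalidationHandler).serve_forever()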

Deletions

Deletions take priority over migrations. If a migration is in progress for Q1, and a request to delete Q1 is made, then all messages for Q1 are deleted both from the initial storage location and the destination storage location. The migration is cancelled.
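
Sketched as code, with migrations mapping a queue to its (source, destination) pair and partitions holding per-location message stores (all names invented):

  def delete_queue(queue, catalogue, migrations, partitions):
      # Deletion wins: cancel any in-flight migration for this queue.
      move = migrations.pop(queue, None)
      locations = {catalogue.get(queue)}
      if move is not None:
          locations.update(move)                  # purge source and destination
      for name in locations:
          if name is not None:
              partitions[name].pop(queue, None)   # delete the queue's messages
      catalogue.pop(queue, None)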

Dynamic Weight Management

The operator of the placement service can manually set the weights of the storage locations to bootstrap the system. However, in the future, it would be preferable to update these weights dynamically based on host parameters such as:

  • Storage location CPU load
  • Storage location remaining capacity

This adds a level of intelligence to the placement layer that makes maintenance a more hands-off experience.
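
One possible weighting function combining the two parameters above; the equal split and the clamping are assumptions for illustration:

  def compute_weight(cpu_load, free_fraction):
      # cpu_load in [0, 1] (1 = saturated); free_fraction in [0, 1] (1 = empty).
      w = 0.5 * (1.0 - cpu_load) + 0.5 * free_fraction
      return max(0.0, min(1.0, w))

  # e.g. a half-loaded node with 80% of its capacity remaining:
  # compute_weight(0.5, 0.8) -> 0.65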

Connection Pooling

To be filled soon: caching strategy for storage connections at the Marconi worker level


Deprecated