
Zaqar/bp/placement-service

  
 
<gallery>
File:Placement-service.jpg|Placement Service Draft v0.1
</gallery>
  
== Overview ==

'''Rationale''': Marconi has a storage bottleneck. As currently designed, at most one storage node can be specified per Marconi instance. Built-in sharding and replication in storage providers like [http://docs.mongodb.org/manual/reference/program/mongos/ MongoDB] can partially offset this, but many other storage engines provide no such mechanism, and most do not support cleanly migrating queues from one storage node to another.

'''Proposal goal''': Remove that bottleneck.

The placement service aims to address this by handling storage transparently and dynamically.
=== Transparency ===

Transparency must exist at two levels:

* User transparency: availability and use of the Marconi service must not be interrupted while a migration is taking place.
* Implementation transparency: the storage driver is handed a location/connection and cares only about the serialization/deserialization of data to that storage location.

=== Terminology ===
  
* '''Marconi partition''': one Marconi master, a set of Marconi workers, and a storage deployment. This is the minimum unit of deployment: one adds a Marconi partition, not an individual storage node or Marconi worker.
* '''Marconi master''': receives requests and forwards them round-robin to the Marconi workers.
* '''Marconi workers''': process requests and communicate with storage.
* '''Storage deployment''': a set of storage nodes - one or many, as long as they are addressable with a single client connection.

== Reference Deployment: Smart Proxy and Partition as a Unit ==
  
This approach is emerging as the leading reference implementation for scaling the Marconi service. The primary components are:

* A load balancer that can redirect tenant requests to a cluster URL
* Operating Marconi at the partition level

=== Partitions ===

* One master to round-robin tasks to workers
* N Marconi web servers
* A storage deployment

Operators can optimize N to match their storage configuration and persistence needs.

=== Smart Proxy ===

The smart proxy maintains a mapping from tenants/projects (ID-based) to partition URLs.

=== Migration Strategy ===

Freezing export: a migration service runs on each Marconi partition. When given a queue and a destination partition, the service launches an export worker. The export worker then communicates the desired data to the new partition's migration service, which in turn launches an import worker to bring in the data. In summary:

* "Freeze" the source queue
* Export the queue from the source
* Import the queue to the destination
* "Thaw" the queue

Freeze: set a particular queue as read-only at the proxy layer.
Thaw: restore a particular queue to normal status at the proxy layer.
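
To make the sequencing concrete, here is a minimal Python sketch of the orchestration, under the assumption of hypothetical proxy/partition helpers (freeze, thaw, the migration service's export call, and the catalogue update are illustrative names, not Marconi's actual API):

<pre><nowiki>
def migrate_queue(proxy, source, destination, project_id, queue_name):
    # 1. "Freeze": mark the source queue read-only at the proxy layer
    proxy.freeze(project_id, queue_name)
    try:
        # 2. + 3. Export from the source partition's migration service, which
        # streams the data to an import worker on the destination partition
        source.migration_service.export(project_id, queue_name,
                                        destination=destination)
        # Repoint the catalogue entry so new requests reach the destination
        # (implied by the proxy's tenant/queue -> partition mapping)
        proxy.catalogue.update(project_id, queue_name, href=destination.href)
    finally:
        # 4. "Thaw": restore normal read/write status at the proxy layer
        proxy.thaw(project_id, queue_name)
</nowiki></pre>
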
=== Advantages ===

* Easier to implement
* No changes to Marconi
* Scalable
* Transparent

=== Disadvantages ===

* Requires the implementation of a smart proxy - this includes routing requests, partition management, catalogue management, regeneration, and synchronization
* Benefits from having access to raw_read and raw_write functions with respect to the storage layer

== Current State ==

=== Concepts ===

==== Partitions ====

Partitions have: 1) a name, 2) a weight, and 3) a list of node URIs. For example:
  
 
<pre><nowiki>
{
  "default": {
    "weight": 100,
    "nodes": [
      "http://localhost:8889",
      "http://localhost:8888",
      "http://localhost:8887",
      "http://localhost:8886"
    ]
  }
}
</nowiki></pre>
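
The weight presumably biases which partition a newly created queue lands on. A minimal sketch of weight-proportional selection over a registry shaped like the example above (the function name select_partition is illustrative, not part of Marconi):

<pre><nowiki>
import random

def select_partition(partitions):
    # partitions mirrors the structure above:
    # {"default": {"weight": 100, "nodes": [...]}, ...}
    names = list(partitions)
    weights = [partitions[name]["weight"] for name in names]
    point = random.uniform(0, sum(weights))
    for name, weight in zip(names, weights):
        point -= weight
        if point <= 0:
            return name
    return names[-1]  # guard against floating-point drift

partition = select_partition({"default": {"weight": 100,
                                          "nodes": ["http://localhost:8889"]}})
</nowiki></pre>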
  
==== Catalogue ====

Catalogue entries have: 1) a key, 2) a node URI, and 3) metadata. For example:
  
<pre><nowiki>
{
  "{project_id}.{queue_name}": {
    "href": "http://localhost:8889",
    "metadata": {
      "awesome": "sauce"
    }
  }
}
</nowiki></pre>
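
Given such an entry, the proxy can route a queue request by concatenating the project ID and queue name into the catalogue key and forwarding to the stored href. A rough sketch using the requests library (the function name, header name, and dict-shaped catalogue are illustrative assumptions, not Marconi's internals):

<pre><nowiki>
import requests

def route(catalogue, project_id, queue_name, method="GET", body=None):
    # Catalogue keys are "{project_id}.{queue_name}", as in the example above
    entry = catalogue.get("%s.%s" % (project_id, queue_name))
    if entry is None:
        return None  # unknown queue; the proxy would fall back to partition selection
    url = "%s/v1/queues/%s" % (entry["href"], queue_name)
    return requests.request(method, url, json=body,
                            headers={"X-Project-ID": project_id})
</nowiki></pre>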
  
=== API ===

<pre><nowiki>
GET /v1/partitions  # list all registered partitions
GET /v1/partitions/{name}  # fetch details for a single partition
PUT /v1/partitions/{name}  # register a new partition
DELETE /v1/partitions/{name}  # unregister a partition

# the catalogue is updated by operations routed through /v1/queues/{name}
GET /v1/catalogue  # list all entries in the catalogue for the given project ID
GET /v1/catalogue/{name}  # fetch info for the given catalogue entry
</nowiki></pre>
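
For illustration only, registering a partition and inspecting the catalogue could look like the following with the requests library; the proxy URL, the PUT body (mirroring the partition structure above), and the project header are assumptions rather than documented behaviour:

<pre><nowiki>
import requests

PROXY = "http://localhost:8000"  # assumed proxy admin endpoint

# Register a partition named "default"; body mirrors the partition example above
requests.put(PROXY + "/v1/partitions/default",
             json={"weight": 100,
                   "nodes": ["http://localhost:8889", "http://localhost:8888"]})

# List all registered partitions
print(requests.get(PROXY + "/v1/partitions").json())

# List catalogue entries for one project (header name is illustrative)
print(requests.get(PROXY + "/v1/catalogue",
                   headers={"X-Project-ID": "12345"}).json())
</nowiki></pre>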
  
=== Implementation ===

==== Needs Review ====
  
* Proxy (partition, catalogue, queues handling): https://review.openstack.org/#/c/43909/
* Proxy (v1, health): https://review.openstack.org/#/c/44356/
* Proxy (forward the rest of the routes): https://review.openstack.org/#/c/44364/
  
==== To Do ====
  
* Hierarchical caching: store data in the authoritative store (a MongoDB replica set) on write operations and cache it locally in a Redis instance, hitting the authoritative store only on failed lookups (see the sketch after this list)
* Benchmarking
* Unit tests
* Functional tests
* Configuration
* Catalogue and partition registry regeneration
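
A minimal sketch of that hierarchical lookup, assuming a pymongo collection as the authoritative store and a local redis-py client; the collection name, key layout, and field names are illustrative:

<pre><nowiki>
import json

import pymongo
import redis

local = redis.StrictRedis(host="localhost", port=6379)
authoritative = pymongo.MongoClient()["marconi_proxy"]["catalogue"]

def lookup(project_id, queue_name):
    # Try the local Redis cache first; fall back to the authoritative store
    key = "%s.%s" % (project_id, queue_name)
    cached = local.get(key)
    if cached is not None:
        return json.loads(cached)
    entry = authoritative.find_one({"_id": key}, {"_id": 0})
    if entry is not None:
        local.set(key, json.dumps(entry))
    return entry

def store(project_id, queue_name, href, metadata=None):
    # Write to the authoritative store, then refresh the local cache
    key = "%s.%s" % (project_id, queue_name)
    entry = {"href": href, "metadata": metadata or {}}
    authoritative.replace_one({"_id": key}, entry, upsert=True)
    local.set(key, json.dumps(entry))
</nowiki></pre>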
  
=== Deployment ===
  
* Bring up the authoritative MongoDB replica set
* Bring up redis-server on each box
* Launch marconi.proxy.app:app using a WSGI/HTTP server (see the sketch below)
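
As a minimal example of that last step, the proxy application can be served with any WSGI server; the snippet below uses the standard library's wsgiref purely for illustration (host and port are arbitrary, and a production deployment would use gunicorn, uwsgi, or similar):

<pre><nowiki>
# Serve the proxy's WSGI callable (marconi.proxy.app:app) for demonstration
from wsgiref.simple_server import make_server

from marconi.proxy.app import app

make_server("0.0.0.0", 8888, app).serve_forever()
</nowiki></pre>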
  
== More Ideas/Deprecated ==

[[Deprecated]]
