
Burrow Proxy Backend

In order to provide scale-out and high availability (HA) for a public service, Burrow needs a proxy module that can accept requests and distribute them to a set of backend servers using consistent hashing.

Choosing the server(s)

Requests in Burrow can be at the global level (list all accounts), the account level (list all queues), the queue level (list/modify all messages), or the message level (create/list/modify a particular message). The consistent hashing logic must accommodate each of these. To do this, a two-step consistent hashing algorithm is being proposed.

gholt from the Swift team has written a great introduction to consistent hashing.

Note that listing all accounts for admin purposes could be an expensive operation (it needs to contact every server, then merge the lists and remove duplicates), so this may be disabled for large deployments.
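
For background, here is a minimal sketch of a consistent hash ring in Python. The names (build_ring, bucket_for_key, _pos) are just for this page and are not actual Burrow code; the ring simply maps keys onto numbered buckets.

  import bisect
  import hashlib


  def _pos(key):
      # Position of a key on the ring: its 128-bit MD5 digest as an integer.
      return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


  def build_ring(num_buckets, replicas=100):
      # Each bucket gets `replicas` virtual points so keys spread evenly.
      ring = [(_pos("bucket-%d-%d" % (bucket, r)), bucket)
              for bucket in range(num_buckets) for r in range(replicas)]
      ring.sort()
      return ring


  def bucket_for_key(ring, key):
      # A key maps to the first bucket point at or after its position,
      # wrapping around to the start of the ring.
      positions = [pos for pos, _ in ring]
      idx = bisect.bisect(positions, _pos(key)) % len(ring)
      return ring[idx][1]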

Hash on account

When a given account, queue, or message is requested, a hash is first performed on the account ID to get the subset of the total servers that handle queues for that account. For example, if there are 100 queue servers, account ID "eday" may only use buckets 4, 20, and 67. This is done to reduce the number of servers (and the network communication) involved in queue operations such as listing queues or messages. Otherwise a "list all queues" request would need to contact every backend server, since the message hash could map to any of them.
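
A sketch of this first hashing step, reusing build_ring and _pos from the sketch above (the subset size of 3 is only an example):

  import bisect


  def account_buckets(ring, account_id, subset_size=3):
      # Walk clockwise from the account's position on the ring and keep the
      # first `subset_size` distinct buckets.  Every queue and message for
      # this account lives only on these buckets, so account-level
      # operations never have to contact the whole cluster.
      distinct = len(set(bucket for _, bucket in ring))
      subset_size = min(subset_size, distinct)
      positions = [pos for pos, _ in ring]
      idx = bisect.bisect(positions, _pos(account_id))
      chosen = []
      while len(chosen) < subset_size:
          bucket = ring[idx % len(ring)][1]
          if bucket not in chosen:
              chosen.append(bucket)
          idx += 1
      return chosen

For example, account_buckets(build_ring(100), "eday") might return a stable subset such as [4, 20, 67].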

Hash on queue/message

Once a subset of backend servers is chosen based on the account hash, a second hash is performed on the "queue/message" to identify the single hash bucket that is responsible for this message. A hash bucket can contain more than one server in order to provide HA in case one goes down. All proxy servers should prefer the servers in a bucket in the same order to make sure the same message ID can be found by contacting the first server. One question is how dynamic this should be, and what should happen when a down server becomes available again. One solution is to have the proxies contact all servers in the bucket list for list and update operations, but always prefer the first for creates (to manage de-duplication). This leaves the possibility of the same message ID existing on two servers, possibly with different payloads. This should be rare (a failover plus a duplicate insert), and since Burrow requires workers to be idempotent, it should be sufficient.
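
A sketch of the second step and the failover-on-create behavior described above. The insert() call stands in for a hypothetical backend client, and _pos is the helper from the first sketch; the modulo over the small, stable subset is one possible choice for the second hash.

  def bucket_for_message(buckets, queue, message=None):
      # Second-level hash: map "queue" or "queue/message" onto one of the
      # account's buckets.
      key = queue if message is None else "%s/%s" % (queue, message)
      return buckets[_pos(key) % len(buckets)]


  def create_message(bucket_servers, account, queue, message, body):
      # Every proxy tries the bucket's servers in the same preferred order,
      # so duplicates are only created when the first server is down.
      for server in bucket_servers:
          try:
              return server.insert(account, queue, message, body)
          except ConnectionError:
              continue
      raise RuntimeError("all servers in bucket are down")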

Expanding the server set

When the server set changes, both the new and the old consistent hashing server sets are used by the proxy in order to drain old messages. Active rebalancing of messages is not done, since the lifetime of a message is expected to be very short. It should be easier for the proxy to serve from the extra servers until the max TTL expires than to rebalance old messages, but this may not be true. Testing will need to be performed at a larger scale once things are further along.
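
One possible shape for this drain-instead-of-rebalance behavior, reusing account_buckets from the sketch above (the class and method names are illustrative only):

  class TransitionalRing(object):
      # Used while the server set is changing: writes go only to the new
      # ring, reads consult both rings so messages created before the change
      # can still be found.  Once the maximum message TTL has passed, the
      # old ring can be dropped.

      def __init__(self, old_ring, new_ring):
          self.old_ring = old_ring
          self.new_ring = new_ring

      def write_buckets(self, account_id):
          return account_buckets(self.new_ring, account_id)

      def read_buckets(self, account_id):
          # Prefer the new ring's buckets, then any old buckets not already
          # listed; dict.fromkeys keeps order while dropping duplicates.
          new = account_buckets(self.new_ring, account_id)
          old = account_buckets(self.old_ring, account_id)
          return list(dict.fromkeys(new + old))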

Examples

These examples start with a hash ring with 100 servers.

Insert message eday/test_queue/msg_1:

  • Hash on eday to return buckets [4,20,67] (from entire set).
  • Hash on test_queue/msg_1 to get bucket 20 (from [4,20,67])
  • Bucket 20 has two servers assigned, 10.0.0.1 and 10.0.0.2, insert into the first.
  • If the first is down, failover and insert into the second.
  • If the second is also down, fail the request.
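
Tying the sketches above together for this insert example. servers_for_bucket is a hypothetical lookup from a bucket number to its assigned servers; the bucket numbers and addresses in the comments are just the example values from this page.

  ring = build_ring(num_buckets=100)
  subset = account_buckets(ring, "eday")                       # e.g. [4, 20, 67]
  bucket = bucket_for_message(subset, "test_queue", "msg_1")   # e.g. 20
  servers = servers_for_bucket(bucket)                         # e.g. 10.0.0.1, 10.0.0.2
  create_message(servers, "eday", "test_queue", "msg_1", "hello")  # failover handled inside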

List all queues for eday:

  • Hash on eday to return buckets [4,20,67] (from entire set).
  • For each server in each hash bucket, list all queues.
  • Remove duplicates from the list, return.
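
A sketch of this fan-out and de-duplication, reusing account_buckets from above. servers_for_bucket and the backend's list_queues() call are hypothetical stand-ins.

  def list_account_queues(ring, account_id, servers_for_bucket):
      # The same queue name can exist on more than one server (failover
      # inserts), so collect into a set before returning.
      queues = set()
      for bucket in account_buckets(ring, account_id):
          for server in servers_for_bucket(bucket):
              queues.update(server.list_queues(account_id))
      return sorted(queues)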

List/update all messages in eday/test_queue:

  • Hash on eday to return buckets [4,20,67] (from entire set).
  • For each server in each hash bucket, forward the update/list request for the queue.
  • If returning message data, remove message ID duplicates and return final list.
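
A sketch of the message-level variant, again with hypothetical servers_for_bucket and list_messages() calls:

  def list_queue_messages(ring, account_id, queue, servers_for_bucket):
      # Keep the first copy seen for each message ID; duplicates with
      # differing payloads can exist after a failover insert, and workers
      # are expected to be idempotent anyway.
      seen = {}
      for bucket in account_buckets(ring, account_id):
          for server in servers_for_bucket(bucket):
              for message in server.list_messages(account_id, queue):
                  seen.setdefault(message["id"], message)
      return list(seen.values())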

Open Questions

Is this too much network chatter? We can use persistent connections and multiplex requests for different accounts over a single connection to keep the number of connections down.

How can we handle pagination with marker parameters like Swift does? For example, listing all messages in a large queue efficiently with marker=X and limit=10. For a single server this works fine; with the proxy we either need special marker tokens (a marker for each server used, packed into one) or some other mechanism.
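
One way the "packed marker" idea could look: each backend server's own marker is folded into a single opaque token that the client passes back. The encoding is illustrative only.

  import base64
  import json


  def pack_marker(per_server_markers):
      # e.g. {"10.0.0.1": "msg_42", "10.0.0.2": "msg_17"} -> opaque token
      data = json.dumps(per_server_markers, sort_keys=True).encode("utf-8")
      return base64.urlsafe_b64encode(data).decode("ascii")


  def unpack_marker(token):
      # Reverse of pack_marker: recover each server's own marker.
      return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))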