MagnetoDB/specs/async-schema-operations

Problem Description
A large number of concurrent create/delete table operations puts a heavy load on Cassandra. The schema agreement process for some calls can take so long that they end in timeout errors, leaving the corresponding tables stuck in the CREATING or DELETING state forever.

Status
Implemented

QueuedStorageManager
Implement a QueuedStorageManager that, rather than executing create/delete table calls directly, enqueues them to the MQ shipped with OpenStack via oslo.messaging.rpc. It should use non-blocking calls. RPC calls should include only the request context (as a dictionary) and the table name. All other information about create/delete table parameters (table schema, etc.) should be retrieved from table_info_repo. If an error occurs while creating or deleting a table on the RPC server side, the status of the corresponding table should be set to CREATE_FAILED or DELETE_FAILED, respectively.
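
The enqueueing behaviour described above could look roughly like the sketch below. It is an illustration only, not the actual MagnetoDB code: the method names create_table/delete_table, the dict-based table_info_repo, and the RecordingClient stand-in are all assumptions. The only thing borrowed from oslo.messaging is the shape of RPCClient.cast(ctxt, method, **kwargs), which sends a message without waiting for a reply.

```python
class QueuedStorageManager:
    """Sketch: enqueue schema operations instead of executing them."""

    def __init__(self, rpc_client, table_info_repo):
        # rpc_client: any object exposing cast(ctxt, method, **kwargs),
        # i.e. the shape of oslo.messaging's RPCClient.cast.
        self._rpc_client = rpc_client
        # Hypothetical dict-like repository keyed by table name.
        self._table_info_repo = table_info_repo

    def create_table(self, context, table_name, table_schema):
        # Persist everything the executor will need before enqueueing.
        self._table_info_repo[table_name] = {
            "schema": table_schema,
            "status": "CREATING",
        }
        # Only the context (as a dict) and the table name travel over RPC.
        # Assumes context is already mapping-like.
        self._rpc_client.cast(dict(context), "create", table_name=table_name)

    def delete_table(self, context, table_name):
        self._table_info_repo[table_name]["status"] = "DELETING"
        self._rpc_client.cast(dict(context), "delete", table_name=table_name)


class RecordingClient:
    """Stand-in for a real RPC client; records casts instead of sending."""

    def __init__(self):
        self.sent = []

    def cast(self, ctxt, method, **kwargs):
        self.sent.append((ctxt, method, kwargs))
```

Because cast does not wait for a result, the API call returns as soon as the message is queued; the table schema itself never leaves table_info_repo.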

Magnetodb Async Task Executor
Introduce a separate executable, "magnetodb-async-task-executor", that runs a blocking RPC server which executes create/delete table requests strictly one at a time. The number of simultaneously running executor processes therefore defines the maximum number of concurrent create/delete table requests.

Only a small number of magnetodb-async-task-executor instances is expected to run (usually just one). This component should be designed 'ready to fail': external tools should be used to monitor its presence and respawn it if it fails.

Additionally, a table status update time attribute should be introduced. Each table status change should update that attribute as well. During a describe table call, this attribute should be checked to determine whether the table has been in the CREATING or DELETING status for too long. If so, its status should be changed to CREATE_FAILED or DELETE_FAILED, respectively.
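
The staleness check performed during describe table might be sketched as follows. The threshold value, the last_update_time field name, and the dict representation of table info are assumptions made for illustration; the spec does not define them.

```python
import time

# Hypothetical threshold; the spec does not state how long is "too long".
STALE_AFTER_SECONDS = 300

# In-progress statuses and the failed status each one decays into.
FAILED_STATUS = {"CREATING": "CREATE_FAILED", "DELETING": "DELETE_FAILED"}


def refresh_status(table_info, now=None):
    """Mark a table as failed if it has been in progress for too long.

    table_info is assumed to be a dict with "status" and
    "last_update_time" (seconds since the epoch) keys.
    """
    now = time.time() if now is None else now
    status = table_info["status"]
    age = now - table_info["last_update_time"]
    if status in FAILED_STATUS and age > STALE_AFTER_SECONDS:
        table_info["status"] = FAILED_STATUS[status]
        # A status change also refreshes the update time.
        table_info["last_update_time"] = now
    return table_info
```

This covers the case where the executor dies mid-operation and never gets the chance to record a failure itself.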

RPC settings

 * control_exchange: magnetodb
 * amqp_durable_queues: True
 * topic: schema

RPC calls

 * create(context, table_name)
 * delete(context, table_name)
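
On the server side, the two calls above could be handled by an endpoint object along these lines. This is a sketch under assumptions: the backend interface (create_table/delete_table), the dict-based table_info_repo, and the ACTIVE status set on success are not taken from this spec, and StubBackend exists only to make the example self-contained.

```python
class SchemaEndpoint:
    """Sketch of an RPC endpoint exposing create and delete."""

    def __init__(self, table_info_repo, backend):
        self._repo = table_info_repo  # hypothetical dict-like repository
        self._backend = backend       # hypothetical storage backend

    def create(self, context, table_name):
        # Schema and other parameters come from the repo, not the message.
        info = self._repo[table_name]
        try:
            self._backend.create_table(context, table_name, info["schema"])
        except Exception:
            info["status"] = "CREATE_FAILED"
        else:
            info["status"] = "ACTIVE"  # assumed success status

    def delete(self, context, table_name):
        info = self._repo[table_name]
        try:
            self._backend.delete_table(context, table_name)
        except Exception:
            info["status"] = "DELETE_FAILED"
        else:
            del self._repo[table_name]


class StubBackend:
    """Stand-in backend that can be told to fail every operation."""

    def __init__(self, fail=False):
        self.fail = fail

    def create_table(self, context, table_name, schema):
        if self.fail:
            raise RuntimeError("schema agreement timed out")

    def delete_table(self, context, table_name):
        if self.fail:
            raise RuntimeError("schema agreement timed out")
```

Catching the exception inside the endpoint keeps a single failed operation from taking down the server loop, so the next queued request is still processed.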

Notifications Impact
The RPC server should send notifications on table creation/deletion start, end, and error.
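
One way to emit the start, end, and error notifications consistently is to wrap each operation in a context manager, as sketched below. The event type names and payload layout are invented for illustration. The notifier is assumed to expose info(ctxt, event_type, payload) and error(ctxt, event_type, payload), which is the shape of oslo.messaging's Notifier methods; ListNotifier is a stand-in that only records events.

```python
from contextlib import contextmanager


@contextmanager
def notify_operation(notifier, context, operation, table_name):
    """Emit <operation>.start, then .end on success or .error on failure."""
    payload = {"table_name": table_name}
    notifier.info(context, operation + ".start", payload)
    try:
        yield
    except Exception as exc:
        notifier.error(context, operation + ".error",
                       dict(payload, error=str(exc)))
        raise  # let the caller decide how to record the failure
    else:
        notifier.info(context, operation + ".end", payload)


class ListNotifier:
    """Stand-in notifier that records events in memory."""

    def __init__(self):
        self.events = []

    def info(self, ctxt, event_type, payload):
        self.events.append(("INFO", event_type, payload))

    def error(self, ctxt, event_type, payload):
        self.events.append(("ERROR", event_type, payload))
```

A caller would wrap the backend call, for example `with notify_operation(notifier, context, "table.create", table_name): ...`, so that every exit path produces exactly one closing notification.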

Other End User Impact
TBD

Performance Impact
Create/Delete table operations are expected to be slower but much more reliable.

Deployment Impact

 * 1) Magnetodb Async Task Executor should be deployed to a separate node or to one of the MagnetoDB API nodes.
 * 2) QueuedStorageManager should be configured as the storage manager in "magnetodb-api.conf" for each MagnetoDB API instance.
 * 3) oslo.messaging should be configured in "magnetodb-api.conf" for each MagnetoDB API instance.

Developer Impact
None

Assignee(s)
Primary assignee:

Other contributors: 

Work Items

 * 1) implement QueuedStorageManager
 * 2) implement Magnetodb Async Task Executor
 * 3) update devstack integration scripts

Dependencies
oslo.messaging

Documentation Impact
Magnetodb Async Task Executor deployment should be covered in the corresponding documentation (TBD).