MagnetoDB/specs/async-schema-operations

Problem description

A large number of concurrent create/delete table operations puts a heavy load on Cassandra. The schema agreement process for some of these calls can take so long that they fail with timeout errors, leaving the corresponding tables stuck in the CREATING or DELETING state forever.

Solution

Storage manager as an RPC client

Rather than executing create/delete table calls directly, the storage manager should enqueue them to the MQ shipped with OpenStack via oslo.messaging.rpc, using non-blocking calls. RPC calls should include only the request context (as a dictionary) and the table name. All other information about create/delete table parameters (table schema, etc.) should be retrieved from table_info_repo. If an error occurs while creating or deleting a table on the RPC server side, the status of the corresponding table should be set to ERROR.
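The client side can be illustrated with a minimal sketch. It is not the MagnetoDB implementation: an in-process queue.Queue stands in for the message queue, and the SchemaRpcClient class and the message layout are hypothetical. With oslo.messaging, the non-blocking behaviour would normally come from RPCClient.cast, which does not wait for a reply.

```python
import queue

# Stand-in for the message queue; a real deployment would use the
# oslo.messaging transport instead of an in-process queue.
schema_queue = queue.Queue()


class SchemaRpcClient:
    """Hypothetical client used by the storage manager (illustrative only)."""

    def __init__(self, mq):
        self._mq = mq

    def _cast(self, method, context, table_name):
        # Fire-and-forget: enqueue the request and return immediately.
        # Only the context dictionary and the table name are sent; the
        # table schema is expected to be read from table_info_repo later.
        self._mq.put_nowait({
            "method": method,
            "context": dict(context),
            "table_name": table_name,
        })

    def create(self, context, table_name):
        self._cast("create", context, table_name)

    def delete(self, context, table_name):
        self._cast("delete", context, table_name)


client = SchemaRpcClient(schema_queue)
client.create({"tenant": "t1"}, "users")
client.delete({"tenant": "t1"}, "old_events")
```

Because the message carries no schema, the queue stays small and the server always works from the current contents of table_info_repo.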

RPC server

A separate process, magnetodb-schema-processor, should run a blocking RPC server that executes create/delete table requests strictly one by one. The number of simultaneously running processes will effectively define the maximum number of concurrent create/delete table requests.
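The one-by-one processing could look roughly like the following sketch. All names are assumptions made for illustration: queue.Queue stands in for the RPC transport, a plain dict stands in for table_info_repo, and FakeBackend replaces the real Cassandra calls. The ACTIVE status and the removal of the entry after a successful delete are also assumed, since this spec only defines the ERROR transition.

```python
import queue


class SchemaProcessor:
    """Hypothetical worker that handles schema requests one at a time."""

    def __init__(self, mq, table_info_repo, backend):
        self._mq = mq                 # stand-in for the RPC transport
        self._repo = table_info_repo  # stand-in: table name -> info dict
        self._backend = backend       # hypothetical object doing the real work

    def create(self, context, table_name):
        info = self._repo[table_name]
        self._backend.create_table(context, table_name, info["schema"])
        info["status"] = "ACTIVE"     # assumed success status

    def delete(self, context, table_name):
        self._backend.delete_table(context, table_name)
        del self._repo[table_name]    # assumed cleanup on success

    def process_one(self):
        # Blocks until a request is available, then runs it to completion
        # before the next request is taken from the queue.
        msg = self._mq.get()
        handler = {"create": self.create, "delete": self.delete}[msg["method"]]
        try:
            handler(msg["context"], msg["table_name"])
        except Exception:
            # Any failure on the server side marks the table as ERROR.
            info = self._repo.get(msg["table_name"])
            if info is not None:
                info["status"] = "ERROR"


class FakeBackend:
    """Illustrative backend that fails for a table named 'broken'."""

    def __init__(self):
        self.calls = []

    def create_table(self, context, table_name, schema):
        self.calls.append(("create", table_name))
        if table_name == "broken":
            raise RuntimeError("schema agreement timed out")

    def delete_table(self, context, table_name):
        self.calls.append(("delete", table_name))


mq = queue.Queue()
repo = {
    "users": {"schema": {"id": "string"}, "status": "CREATING"},
    "broken": {"schema": {"id": "string"}, "status": "CREATING"},
}
backend = FakeBackend()
processor = SchemaProcessor(mq, repo, backend)
for name in ("users", "broken"):
    mq.put({"method": "create", "context": {"tenant": "t1"}, "table_name": name})
processor.process_one()
processor.process_one()
```

Running several such processes against the same queue would raise the concurrency limit to the number of processes, while each process still handles one request at a time.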

Additionally, a table status update time attribute should be introduced. Each table status change should update that attribute as well. During a describe table call, this attribute should be checked to determine whether the table has been in the CREATING or DELETING status for too long. If so, its status should be changed to ERROR.
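A minimal sketch of the staleness check follows. The threshold value, the status_update_time field name, and both function names are assumptions; the spec does not fix any of them.

```python
import time

# Hypothetical threshold; the spec does not say how long is "too long".
STUCK_TIMEOUT_SECONDS = 600

TRANSIENT_STATUSES = ("CREATING", "DELETING")


def set_status(table_info, status, now=None):
    """Change the status and record when the change happened."""
    table_info["status"] = status
    table_info["status_update_time"] = time.time() if now is None else now


def describe_table(table_info, now=None):
    """Return table info, flipping long-stuck transient states to ERROR."""
    now = time.time() if now is None else now
    age = now - table_info["status_update_time"]
    if table_info["status"] in TRANSIENT_STATUSES and age > STUCK_TIMEOUT_SECONDS:
        set_status(table_info, "ERROR", now)
    return table_info


table = {"name": "users"}
set_status(table, "CREATING", now=1000.0)
# Described well after the timeout, so the table is reported as ERROR.
described = describe_table(table, now=1000.0 + STUCK_TIMEOUT_SECONDS + 1)
```

Doing the check lazily in describe table avoids a separate monitoring process, at the cost that a stuck table is only marked as ERROR when someone next describes it.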

RPC settings

control_exchange: magnetodb
amqp_durable_queues: True
topic: schema

RPC calls

create(context, table_name)
delete(context, table_name)