Distributed Task Management With RPC

Why: This document proposes an architecture for distributed flows that run tasks simultaneously on multiple workers, which improves scalability and reliability. The main goal is an architecture that lets the user replace a local engine with a distributed engine without changing any code.

For more details, see https://etherpad.openstack.org/p/TaskFlowWorkerBasedEngine

Definitions

Client: a machine (or program) that runs a distributed flow
Worker: a machine (or program) that executes the tasks of distributed flows by responding to execution requests
Distributed Task: a task execution type that performs a remote procedure call to a worker
Remote Task: a task that runs on the worker side and executes some code to advance the flow

How

A distributed flow consists of a client and workers. The client (the code that holds the engine) runs the flow. When the client wants to start a new task, it makes one or more RPC calls to the workers, passing the client's endpoint and the task's arguments. One of the workers accepts the task and sends a confirmation to the client. The worker then starts executing the task and sends heartbeats to the client, while the client listens to the worker. When the task is done, the worker sends the result. The client considers the worker failed if it has not received a task status message within a timeout period. The flow architecture is shown in the following image.
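The client-side failure rule can be illustrated with a minimal sketch. The class name WorkerMonitor, the message kinds and the injectable clock are assumptions made for this example only; they are not part of the proposal or of taskflow.

```python
import time


class WorkerMonitor(object):
    """Client-side view of one remote task (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._last_seen = clock()
        self.result = None
        self.done = False

    def on_message(self, kind, payload=None):
        # Any status message (confirmation, heartbeat, result) shows
        # that the worker is still alive, so the timer is reset.
        self._last_seen = self._clock()
        if kind == 'result':
            self.result = payload
            self.done = True

    def worker_failed(self):
        # The worker counts as failed when the task is unfinished and
        # no message has arrived for longer than the timeout period.
        silent_for = self._clock() - self._last_seen
        return not self.done and silent_for > self._timeout
```

Passing the clock in as a parameter keeps the timeout logic testable without real waiting; a real client would call worker_failed() periodically from its listening loop.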

Distributed Flow implementation

We don't want any difference between distributed and non-distributed flow descriptions, so that distribution is as transparent as possible to users of taskflow. The only difference should be in the flow engine. The distributed engine should work very similarly to SingleThreadedEngine, but it should use DistributedTaskAction instead of TaskAction. DistributedTaskAction should make an RPC call instead of running the Task.execute() method. On the worker's side, a special interface should be implemented for each task.
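A minimal sketch of the action swap follows. It assumes that the RPC topic is the task's class name and uses an in-process function, fake_rpc_call, as a stand-in for a real transport. The constructor signatures, AddTask and fake_rpc_call are hypothetical and do not reflect taskflow's actual classes.

```python
class TaskAction(object):
    """Runs the task in-process, as a local engine would."""

    def __init__(self, task):
        self._task = task

    def execute(self, **task_args):
        return self._task.execute(**task_args)


class DistributedTaskAction(object):
    """Same interface, but delegates execution to an RPC call."""

    def __init__(self, task, rpc_call):
        self._task = task
        self._rpc_call = rpc_call

    def execute(self, **task_args):
        # Assumption: the topic is derived from the task's class name.
        topic = type(self._task).__name__
        return self._rpc_call(topic, 'execute', task_args)


class AddTask(object):
    """Example task; the flow description is identical in both modes."""

    def execute(self, x, y):
        return x + y


def fake_rpc_call(topic, method, task_args):
    # In-process stand-in for the worker side: look up the task by
    # topic and invoke the requested method with the given arguments.
    worker_tasks = {'AddTask': AddTask}
    task = worker_tasks[topic]()
    return getattr(task, method)(**task_args)


local_result = TaskAction(AddTask()).execute(x=1, y=2)
remote_result = DistributedTaskAction(AddTask(), fake_rpc_call).execute(x=1, y=2)
```

Because both actions expose the same execute() signature, an engine could choose between them without the flow description changing.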

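   import oslo_messaging as messaging  # assumption: "messaging" is oslo.messaging

   # RPCClient, heartbeat and tasks are placeholders that this proposal does not define.
   class MyTaskEndpoint(object):
       # The topic is the name of the task that this endpoint serves.
       target = messaging.Target(topic='MyTask', server='server1')

       def execute(self, client_endpoint, task_args):
           # client_endpoint tells the worker where to send heartbeats and the result.
           RPCClient.register_listener(client_endpoint)
           task = tasks.MyTask()
           result = task.execute(**task_args)
           RPCClient.send_result(result, client_endpoint)

       def revert(self, client_endpoint, task_args):
           heartbeat.register_listener(client_endpoint)
           task = tasks.MyTask()
           task.revert(**task_args)
           RPCClient.notify_reverted(client_endpoint)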

This interface has a unique topic that corresponds to the name of the task. It contains execute and revert methods. These methods accept a client endpoint, which is used to send heartbeats to the client, and the task args required to execute the task.
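One way a worker might send heartbeats while a task runs is a background thread that is stopped once the task finishes. The sketch below is an illustration under assumptions: run_with_heartbeats, the send callback and the message kinds are hypothetical names, and the messages are collected in a list instead of being sent over RPC.

```python
import threading
import time


def run_with_heartbeats(task_fn, send, interval=1.0):
    """Run task_fn while calling send('heartbeat', None) periodically."""
    stop = threading.Event()

    def beat():
        # Event.wait returns False on timeout and True once stop is set,
        # so the loop ends as soon as the task has finished.
        while not stop.wait(interval):
            send('heartbeat', None)

    beater = threading.Thread(target=beat)
    beater.daemon = True
    beater.start()
    try:
        result = task_fn()
    finally:
        stop.set()
        beater.join()
    # The result is sent only after the heartbeat thread has stopped.
    send('result', result)
    return result


def slow_task():
    time.sleep(0.3)
    return 7


# Usage: record the messages locally instead of sending them to a client.
messages = []
result = run_with_heartbeats(
    slow_task, lambda kind, payload: messages.append((kind, payload)),
    interval=0.01)
```

Joining the heartbeat thread before sending the result guarantees that the result is the last message the client sees for the task.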
