Difference between revisions of "Distributed Task Management With RPC"

Revision as of 22:05, 10 March 2014

Why: This document proposes an architecture of distributed flow that can run tasks simultaneously on multiple workers (increasing scalability and reliability). The main goal here is to provide such an architecture that will allow the user to replace a local engine with a distributed engine without changing any code.

See more details @ https://etherpad.openstack.org/p/TaskFlowWorkerBasedEngine

Client: a machine (or program) that runs a distributed flow
Worker: a machine (or program) that executes distributed flows’ tasks by responding to execution requests
Distributed Task: a task execution type that performs a remote procedure call to a worker
Remote Task: a task that runs on the worker side and executes some code to make a flow progress

How

A distributed system consists of a client (potentially many) and workers. A client (the code that has the engine) runs a flow. When the client wants to start a new task it makes RPC call/s to workers and passes client's endpoint and task's arguments. One of the workers accepts the task and sends a confirmation to the client. Then it starts to execute the task and sends heartbeats to the client. The client listens for the workers responses (status updates and so-on) during this period. When the task is done the worker sends a result. The client considers worker as failed if it hadn't been receiving a task status message for a timeout period.

A high-level architecture can be seen in the following image:

Distributed Flow implementation

We don't want to make any difference between distributed and non-distributed engine/flow descriptions (making it as transparent as possible to users). The difference should be only in flow engines types. Distributed engine should work very similar to a single threaded engine. But instead of TaskAction the DistributedTaskAction should be used (an internal implementation detail of an engine). The DistributedTaskAction should make an RPC call instead of running Task.execute() method. On the worker's side a special interface should be implemented for each task.

   class MyTaskEndpoint(object):
       target = messaging.Target(topic='MyTask', server='server1')
       def execute(self, client_endpoint, task_args):
            RPCClient.register_listener(client_endpoint)
            task = tasks.MyTask()
            result = task.execute(**task_args)
            RPCClient.send_result(result, client_endpoint)
       def revert(self, client_endpoint, task_args):
            heartbeat.register_listener(client_endpoint)
            task = tasks.MyTask()
            task.revert(**task_args)
            RPCClient.notify_reverted(client_endpoint)

This interface has its unique topic that corresponds to the name of the task. It contains execute and revert methods. These methods accept a client endpoint to send a heartbeats to the client, and task args required to execute the task.

@@ Line 20: / Line 20: @@
 A distributed system consists of a client (potentially many) and workers. A client (the code that has the engine) runs a flow. When the client wants to start a new task it makes RPC call/s to workers and passes client's endpoint and task's arguments. One of the workers accepts the task and sends a confirmation to the client. Then it starts to execute the task and sends heartbeats to the client. The client listens for the workers responses (status updates and so-on) during this period. When the task is done the worker sends a result. The client considers worker as failed if it hadn't been receiving a task status message for a timeout period.
-A high-level architecture can be seen in the following image:
+'''A high-level architecture can be seen in the following image:'''
 [[File:Distributed flow with oslo.messaging.rpc.png]]

Difference between revisions of "Distributed Task Management With RPC"

Revision as of 22:05, 10 March 2014

Contents

Architecture

Definitions

How

Distributed Flow implementation