Jump to: navigation, search

Difference between revisions of "Distributed Task Management With RPC"

(Created page with " Distributed task management built on RPC can be easily integrated with OpenStack. RPC allows to build reliable and scalable system. This document proposes an architecture of ...")
 
 
(12 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 +
'''Revised on:''' {{REVISIONMONTH1}}/{{REVISIONDAY}}/{{REVISIONYEAR}} by {{REVISIONUSER}}
  
Distributed task management built on RPC can be easily integrated with OpenStack. RPC allows to build reliable and scalable system. This document proposes an architecture of distributed flow that can run tasks simultaneously on multiple workers. The main goal is to provide a such architecture that will allow to replace a simple flow with a distributed flow without changing a code.  
+
'''Why: '''  This document proposes an architecture of distributed flow that can run tasks simultaneously on multiple workers (increasing scalability and reliability). The main goal here is to provide such an architecture that will allow the user to replace a local engine (previously executing with threads for example) with a distributed engine without changing any code. We do not want to make any difference between distributed and non-distributed engine/flow descriptions (making it as transparent as possible to users). The difference should be only in flow engines types. In general a distributed engine should work much the same as a single threaded engine.  
  
'''Architecture'''
+
=== Architecture ===
  
Client is a machine that runs a distributed flow.
+
==== Definitions ====
Worker is a server that executes distributed flows’ tasks.
 
Distributed task is a flow task that performs a remote procedure call.
 
Remote task is a task that runs on the worker side and executes some code to make a flow progress.
 
  
The flow runs a task that makes a remote call using a RPC Client on client side. RPC Client calls a remote task and passes task’s arguments and client’s endpoint. The worker accepts this task and returns confirmation to the Client. Then the worker launches remote task and periodically sends its status to the Client. The Client doesn’t know is a worker still alive, so it listens statuses from the Worker. When the remote task is finished, the Worker sends a result to the Client. The Сlient considers Worker as failed if it hadn’t been receiving a task status message for a timeout period.
+
; Client
 +
: a machine (or program) that runs a distributed flow
 +
; Worker
 +
: a machine (or program) that executes distributed flows’ tasks by responding to execution requests
 +
; Distributed Task
 +
: a task execution type that performs a remote procedure call to a worker
 +
; Remote Task
 +
: a task that runs on the worker side and executes some code to make a flow progress
  
[[File:Distributed flow with oslo.messaging.rpc.png|thumbnail]]
+
=== How ===
 +
 
 +
A distributed system consists of a client (potentially many) and workers. A client (the code that has the engine) runs a flow. When the client wants to start a new task it makes RPC call/s to workers and passes client's endpoint and task's arguments. One of the workers accepts the task and sends a confirmation to the client. Then it starts to execute the task and sends heartbeats to the client. The client listens for the workers responses (status updates and so-on) during this period. When the task is done the worker sends a result. The client considers worker as failed if it hadn't been receiving a task status message for a timeout period.
 +
 
 +
'''A high-level architecture can be seen in the following image:'''
 +
 
 +
[[File:Distributed flow with oslo.messaging.rpc.png]]
 +
<br /><br />
 +
 
 +
==== Details ====
 +
 
 +
'''Please visit:''' https://etherpad.openstack.org/p/TaskFlowWorkerBasedEngine

Latest revision as of 00:48, 15 March 2014

Revised on: 3/15/2014 by Harlowja

Why: This document proposes an architecture of distributed flow that can run tasks simultaneously on multiple workers (increasing scalability and reliability). The main goal here is to provide such an architecture that will allow the user to replace a local engine (previously executing with threads for example) with a distributed engine without changing any code. We do not want to make any difference between distributed and non-distributed engine/flow descriptions (making it as transparent as possible to users). The difference should be only in flow engines types. In general a distributed engine should work much the same as a single threaded engine.

Architecture

Definitions

Client
a machine (or program) that runs a distributed flow
Worker
a machine (or program) that executes distributed flows’ tasks by responding to execution requests
Distributed Task
a task execution type that performs a remote procedure call to a worker
Remote Task
a task that runs on the worker side and executes some code to make a flow progress

How

A distributed system consists of a client (potentially many) and workers. A client (the code that has the engine) runs a flow. When the client wants to start a new task it makes RPC call/s to workers and passes client's endpoint and task's arguments. One of the workers accepts the task and sends a confirmation to the client. Then it starts to execute the task and sends heartbeats to the client. The client listens for the workers responses (status updates and so-on) during this period. When the task is done the worker sends a result. The client considers worker as failed if it hadn't been receiving a task status message for a timeout period.

A high-level architecture can be seen in the following image:

Distributed flow with oslo.messaging.rpc.png

Details

Please visit: https://etherpad.openstack.org/p/TaskFlowWorkerBasedEngine