Sahara/EDP Sequences DefineJobAndExecuteTxt

(Sergey Lukjanov moved this page from Savanna/EDP Sequences DefineJobAndExecuteTxt to Sahara/EDP Sequences DefineJobAndExecuteTxt because the Savanna project was renamed due to trademark issues.)

Latest revision as of 15:41, 7 March 2014

@startuml
actor client
note right of client
The client in this idealized diagram performs all the steps 
necessary to define and execute a job in a new, empty environment.

In an environment where resources have already been defined, a
client might only select and execute a job.
end note
participant "Job Manager Comp" as JM
participant "Job Origin Comp" as JO
participant "Data Discovery Comp" as DD
participant "Savanna DB" as DB
participant "Job Code Storage" as JC
note right of JC
  Job code storage could be a mechanism
completely outside of Savanna, such as
<b>git</b>, <b>svn</b>, or something else.

  If the internal Savanna DB is used to store
raw job code, there is probably another API to
write the raw code to the Savanna DB.
end note
client -->  JO: POST Add job origin
note right
This might be an administrative function.
end note
JO --> DB: Store source type\n and location
DB --> JO: Success
JO --> client: JSON job origin object
note right
This may contain an id
for the job origin object
end note
client --> JC: Write job source code
note right
At some point the raw job code
must be written to the job code storage
end note
client --> DD: POST Register data source
note right
This might be an administrative function.
end note
DD --> DB: Store data source object
DB --> DD: Success
DD --> client: JSON data source object
client --> JM: POST Create job
note right
  This step defines the job in the
Savanna DB. Once the job is defined,
it can be retrieved and run as needed.

  The job object includes an identifier
for a job origin object, which allows the
actual job source code to be stored separately.
end note
JM --> JM: Generate a job id
JM --> DB: Store the job object
DB --> JM: Success
JM --> client: JSON job object
note right
This has the job id filled in
end note
client --> JM: GET List jobs
note right
Maybe the client has defined multiple 
jobs at this point, so it asks for a list.
end note
JM --> DB: Request the job list
DB --> JM: Return the job list
JM --> client: JSON job list
client --> JM: POST Execute job
note right
The job is specified by id
end note
JM --> DB: Request the job object
DB --> JM: Return the job object 
JM --> JO: GET request the job source code
note right
The Job Manager passes the job origin id
specified in the job object and a destination path
(for example, HDFS). The Job Origin component uses
plugins to copy the binary job code from the storage
location to the destination path.
end note
JO --> JO: Copy the job source code from job code storage\nto the specified destination
JO --> JM: JSON success, probably returns destination path
JM --> DD: Request job and cluster configuration\nbased on the specified data source
note right
The job execution request specifies a data source.
The job and/or the cluster may need to be configured
for the job to run and make use of the specified source.

This needs more definition. What happens here?
end note
DD ->]: Cluster configuration?
DD --> JM: Success
note left
Is something returned here?
end note
JM ->]: Submit to cluster 
JM --> client: JSON job execution object
..."<size:12><b><i>Some time goes by, the client checks on jobs and decides to stop one</i></b></size>"...
client --> JM: GET List job instances
JM --> client: JSON list of job instance objects
note right
What is a job instance object? A Job Execution?  Something else?
end note
client --> JM: GET job instance status
JM --> client: JSON job instance status
note right
What is a job instance status?
end note
client --> JM: POST terminate job instance
JM ->]: Do something on the cluster to end the job
JM --> client: JSON job instance status
@enduml
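The sequence above can be traced in code. What follows is a minimal in-memory Python sketch in which each REST call becomes a method call on a component object. All names here are hypothetical and are not Savanna/Sahara APIs: the classes `SavannaDB`, `JobOriginComp`, `DataDiscoveryComp` and `JobManagerComp`, the `"internal"` source type, the destination path `/user/savanna/jobs/<id>`, and the `RUNNING`/`KILLED` statuses. The steps the diagram leaves open (data source configuration, cluster submission and termination) are stubbed, not designed.

```python
import uuid


class SavannaDB:
    """Hypothetical in-memory stand-in for the "Savanna DB" participant."""

    def __init__(self):
        self.origins, self.data_sources, self.jobs = {}, {}, {}


def new_id():
    return str(uuid.uuid4())


class JobOriginComp:
    """Stores job origins and copies job code to a destination path."""

    def __init__(self, db, code_storage, destination):
        self.db = db
        self.code_storage = code_storage  # "Job Code Storage": location -> code
        self.destination = destination    # e.g. HDFS: path -> code
        # One copy plugin per source type; git/svn plugins would be added here.
        self.plugins = {"internal": self._copy_internal}

    def add_origin(self, source_type, location):
        origin = {"id": new_id(), "type": source_type, "location": location}
        self.db.origins[origin["id"]] = origin  # store source type and location
        return origin

    def _copy_internal(self, origin, dest_path):
        self.destination[dest_path] = self.code_storage[origin["location"]]

    def copy_code(self, origin_id, dest_path):
        origin = self.db.origins[origin_id]
        self.plugins[origin["type"]](origin, dest_path)  # pick plugin by type
        return {"success": True, "destination": dest_path}


class DataDiscoveryComp:
    def __init__(self, db):
        self.db = db

    def register(self, name, url):
        source = {"id": new_id(), "name": name, "url": url}
        self.db.data_sources[source["id"]] = source
        return source

    def configure(self, data_source_id):
        # Undefined in the diagram; this sketch only passes the input URL along.
        return {"input_path": self.db.data_sources[data_source_id]["url"]}


class JobManagerComp:
    def __init__(self, db, origins, discovery):
        self.db, self.origins, self.discovery = db, origins, discovery
        self.instances = {}  # job instances (executions), keyed by id

    def create_job(self, name, origin_id):
        job = {"id": new_id(), "name": name, "origin_id": origin_id}
        self.db.jobs[job["id"]] = job
        return job

    def list_jobs(self):
        return list(self.db.jobs.values())

    def execute_job(self, job_id, data_source_id):
        job = self.db.jobs[job_id]
        dest = f"/user/savanna/jobs/{job_id}"  # hypothetical destination path
        copied = self.origins.copy_code(job["origin_id"], dest)
        config = self.discovery.configure(data_source_id)
        # "Submit to cluster" is stubbed: we only record a running instance.
        instance = {"id": new_id(), "job_id": job_id, "status": "RUNNING",
                    "code_path": copied["destination"], "config": config}
        self.instances[instance["id"]] = instance
        return instance

    def list_instances(self):
        return list(self.instances.values())

    def status(self, instance_id):
        return self.instances[instance_id]["status"]

    def terminate(self, instance_id):
        # "Do something on the cluster to end the job" is stubbed as a status change.
        self.instances[instance_id]["status"] = "KILLED"
        return self.status(instance_id)


if __name__ == "__main__":
    db, hdfs = SavannaDB(), {}
    jo = JobOriginComp(db, {"wordcount.jar": b"jar-bytes"}, hdfs)
    dd = DataDiscoveryComp(db)
    jm = JobManagerComp(db, jo, dd)
    origin = jo.add_origin("internal", "wordcount.jar")
    source = dd.register("input", "hdfs://namenode/input")
    job = jm.create_job("wordcount", origin["id"])
    run = jm.execute_job(job["id"], source["id"])
    print(jm.status(run["id"]), "->", jm.terminate(run["id"]))
```

Keeping the job object limited to a job origin id, rather than the code itself, mirrors the note on "POST Create job": the Job Manager never reads the code, it only asks the Job Origin component to copy it to the destination at execution time.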