
Sahara/EDP Sequences DefineJobAndExecuteTxt

@startuml
actor client
note right of client
The client in this idealized diagram performs all the steps 
necessary to define and execute a job in a new, empty environment.

In an environment where resources have already been defined, a
client might only select and execute a job.
end note
participant "Job Manager Comp" as JM
participant "Job Origin Comp" as JO
participant "Data Discovery Comp" as DD
participant "Savanna DB" as DB
participant "Job Code Storage" as JC
note right of JC
Job code storage could be a mechanism
completely outside of Savanna, such as
<b>git</b>, <b>svn</b>, or something else.

If the internal Savanna DB is used to store
raw job code, there is probably a separate
API for writing the raw code to the Savanna DB.
end note
client --> JO: POST Add job origin
note right
This might be an administrative function.
end note
JO --> DB: Store source type\n and location
DB --> JO: Success
JO --> client: JSON job origin object
note right
This may contain an id
for the job origin object.
end note
client --> JC: Write job source code
note right
At some point, the raw job code
must be written to job code storage.
end note
client --> DD: POST Register data source
note right
This might be an administrative function.
end note
DD --> DB: Store data source object
DB --> DD: Success
DD --> client: JSON data source object
client --> JM: POST Create job
note right
This step defines the job in the
Savanna DB. Once the job is defined,
it can be retrieved and run as needed.

The job object includes an identifier
for a job origin object, which allows the
actual job source code to be stored separately.
end note
JM --> JM: Generate a job id
JM --> DB: Store the job object
DB --> JM: Success
JM --> client: JSON job object
note right
This has the job id filled in
end note
client --> JM: GET List jobs
note right
Maybe the client has defined multiple 
jobs at this point, so it asks for a list.
end note
JM --> DB: Request the job list
DB --> JM: Return the job list
JM --> client: JSON job list
client --> JM: POST Execute job
note right
The job is specified by id
end note
JM --> DB: Request the job object
DB --> JM: Return the job object 
JM --> JO: GET Request the job source code
note right
The Job Manager passes the job origin id
specified in the job object and a destination
path (for example, an HDFS path). The Job Origin
component uses plugins to copy the binary job code
from the storage location to the destination path.
end note
JO --> JO: Copy the job source code from job code storage\nto the specified destination
JO --> JM: JSON success (probably includes the destination path)
JM --> DD: Request job and cluster configuration\nbased on the specified data source
note right
The job execution request specifies a data source.
The job and/or the cluster may need to be configured
for the job to run and make use of the specified source.

This needs more definition. What happens here?
end note
DD ->]: Cluster configuration?
DD --> JM: Success
note left
Is something returned here?
end note
JM ->]: Submit to cluster 
JM --> client: JSON job execution object
..."<size:12><b><i>Some time goes by; the client checks on its jobs and decides to stop one</i></b></size>"...
client --> JM: GET List job instances
JM --> client: JSON list of job instance objects
note right
What is a job instance object? Is it the job execution object
returned by Execute job, or something else?
end note
client --> JM: GET job instance status
JM --> client: JSON job instance status
note right
What is a job instance status?
end note
client --> JM: POST terminate job instance
JM ->]: Do something on the cluster to end the job
JM --> client: JSON job instance status
@enduml
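As a rough illustration of the define-and-execute sequence in the diagram, the following sketch models each component as a small Python class sharing an in-memory stand-in for the Savanna DB. Everything here is hypothetical: the class and method names (`JobOriginComp`, `add_job_origin`, `execute_job`, and so on), the dictionary fields, and the example URLs are assumptions for illustration, not the Savanna/Sahara API. The copy step only reports success, and the data-source configuration and cluster submission steps are omitted because the diagram itself leaves them open.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class SavannaDB:
    """Hypothetical stand-in for the Savanna DB: one dict per object type."""
    job_origins: dict = field(default_factory=dict)
    data_sources: dict = field(default_factory=dict)
    jobs: dict = field(default_factory=dict)


def new_id() -> str:
    return str(uuid.uuid4())


class JobOriginComp:
    """Hypothetical Job Origin component."""

    def __init__(self, db: SavannaDB):
        self.db = db

    def add_job_origin(self, source_type: str, location: str) -> dict:
        # Store the source type and location; return the object with its id.
        origin = {"id": new_id(), "type": source_type, "location": location}
        self.db.job_origins[origin["id"]] = origin
        return origin

    def copy_job_code(self, origin_id: str, destination: str) -> dict:
        # A real component would copy the code via a plugin; this only reports success.
        origin = self.db.job_origins[origin_id]
        return {"success": True, "source": origin["location"], "destination": destination}


class DataDiscoveryComp:
    """Hypothetical Data Discovery component."""

    def __init__(self, db: SavannaDB):
        self.db = db

    def register_data_source(self, url: str) -> dict:
        source = {"id": new_id(), "url": url}
        self.db.data_sources[source["id"]] = source
        return source


class JobManagerComp:
    """Hypothetical Job Manager component."""

    def __init__(self, db: SavannaDB, job_origin: JobOriginComp):
        self.db = db
        self.job_origin = job_origin

    def create_job(self, name: str, origin_id: str) -> dict:
        # Generate a job id; the job refers to its code only by job origin id.
        job = {"id": new_id(), "name": name, "origin_id": origin_id}
        self.db.jobs[job["id"]] = job
        return job

    def list_jobs(self) -> list:
        return list(self.db.jobs.values())

    def execute_job(self, job_id: str, data_source_id: str, destination: str) -> dict:
        # Look up the job, then ask the Job Origin component to place its code.
        job = self.db.jobs[job_id]
        copied = self.job_origin.copy_job_code(job["origin_id"], destination)
        # Data-source configuration and cluster submission are not modeled here.
        return {"id": new_id(), "job_id": job_id,
                "data_source_id": data_source_id, "code_path": copied["destination"]}


# Walk through the sequence in diagram order.
db = SavannaDB()
jo, dd = JobOriginComp(db), DataDiscoveryComp(db)
jm = JobManagerComp(db, jo)
origin = jo.add_job_origin("git", "https://git.example.com/jobs.git")
source = dd.register_data_source("hdfs://namenode/input")
job = jm.create_job("wordcount", origin["id"])
execution = jm.execute_job(job["id"], source["id"], "/user/hadoop/jobs/wordcount")
```

Keeping only the job origin id in the job object, rather than the code itself, is what lets the job source code be stored separately (in git, svn, or the Savanna DB) without changing the job definition.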
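The diagram states that a job origin records a source type and location, and that the Job Origin component uses plugins to copy job code from the storage location to a destination path. One way to read that is a registry of copy functions keyed by source type. The sketch below assumes exactly that; the registry (`PLUGINS`), `register_plugin`, `copy_job_code`, and the `"local"` source type are hypothetical. Real plugins for git, svn, or the Savanna DB would replace the local-filesystem copier used here, which exists only so the example runs without external services.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Dict

# Hypothetical plugin registry: source type -> function(location, destination) -> path.
CopyPlugin = Callable[[str, str], str]
PLUGINS: Dict[str, CopyPlugin] = {}


def register_plugin(source_type: str):
    """Decorator that records a copy function for one job origin source type."""
    def wrap(func: CopyPlugin) -> CopyPlugin:
        PLUGINS[source_type] = func
        return func
    return wrap


@register_plugin("local")
def copy_local(location: str, destination: str) -> str:
    # Stand-in for a git, svn, or Savanna DB plugin: copies a file on the local disk.
    Path(destination).parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(location, destination)
    return destination


def copy_job_code(origin: dict, destination: str) -> str:
    """Dispatch on the origin's source type; return the destination path on success."""
    try:
        plugin = PLUGINS[origin["type"]]
    except KeyError:
        raise ValueError(f"no plugin for source type {origin['type']!r}") from None
    return plugin(origin["location"], destination)


# Usage: copy a throwaway job file through the "local" plugin.
workdir = Path(tempfile.mkdtemp())
src = workdir / "wordcount.pig"
src.write_text("-- job code\n")
result = copy_job_code({"type": "local", "location": str(src)},
                       str(workdir / "dest" / "wordcount.pig"))
```

Returning the destination path on success matches the diagram's guess that the Job Origin component's reply probably includes it; an unknown source type fails loudly instead of silently skipping the copy.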