Sahara/EDP Sequences DefineJobAndExecuteTxt

@startuml
actor client
note right client
The client in this idealized diagram performs all the steps necessary to
define and execute a job in a new, empty environment.

In an environment where resources have already been defined, a client
might only select and execute a job.
end note
participant "Job Manager Comp" as JM
participant "Job Origin Comp" as JO
participant "Data Discovery Comp" as DD
participant "Savanna DB" as DB
participant "Job Code Storage" as JC
note right JC
Job code storage could be a mechanism completely outside of Savanna,
such as git, svn, or something else.

If the internal Savanna DB is used to store raw job code, there is
probably another API to write the raw code to the Savanna DB.
end note
client --> JO: POST Add job origin
note right
This might be an administrative function.
end note
JO --> DB: Store source type\n and location
DB --> JO: Success
JO --> client: JSON job origin object
note right
This may contain some id for the job origin object.
end note
client --> JC: Write job source code
note right
At some point the raw job code must be written to the job code storage.
end note
client --> DD: POST Register data source
note right
This might be an administrative function.
end note
DD --> DB: Store data source object
DB --> DD: Success
DD --> client: JSON data source object
client --> JM: POST Create job
note right
This step defines the job in the Savanna DB. Once the job is defined,
it can be retrieved and run as needed.

The job object includes an identifier for a job origin object, which
allows the actual job source code to be stored separately.
end note
JM --> JM: Generate a job id
JM --> DB: Store the job object
DB --> JM: Success
JM --> client: JSON job object
note right
This has the job id filled in.
end note
client --> JM: POST Get list of jobs
note right
Maybe the client has defined multiple jobs at this point, so it asks
for a list.
end note
JM --> DB: Request the job list
DB --> JM: Return the job list
JM --> client: JSON job list
client --> JM: POST Execute job
note right
The job is specified by id.
end note
JM --> DB: Request the job object
DB --> JM: Return the job object
JM --> JO: GET request the job source code
note right
The Job Manager passes the job origin id specified in the job object
and a destination path (for example, HDFS). The Job Origin component
uses plugins to copy the binary job code from the storage location to
the destination path.
end note
JO --> JO: Copy the job source code from job code storage\nto the specified destination
JO --> JM: JSON success, probably returns destination path
JM --> DD: Request job and cluster configuration\nbased on the specified data source
note right
The job execution request specifies a data source. The job and/or the
cluster may need to be configured for the job to run and make use of
the specified source.

This needs more definition. What happens here?
end note
DD ->]: Cluster configuration?
DD --> JM: Success
note left
Is something returned here?
end note
JM ->]: Submit to cluster
JM --> client: JSON job execution object
..."Some time goes by, the client checks on jobs and decides to stop one"...
client --> JM: GET List job instances
JM --> client: JSON list of job instance objects
note right
What is a job instance object? A Job Execution? Something else?
end note
client --> JM: GET job instance status
JM --> client: JSON job instance status
note right
What is a job instance status?
end note
client --> JM: POST terminate job instance
JM ->]: Do something on the cluster to end the job
JM --> client: JSON job instance status
@enduml
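
The sequence above can be walked through with a small in-memory sketch. This is only an illustration of the order of calls, not an actual Savanna API: the class name `InMemoryEDP`, every method name, the object fields, and the status strings `RUNNING` and `KILLED` are assumptions made up for this example. The three components and the DB are collapsed into one object, nothing is submitted to a cluster, and the copy of job code is represented only by recording a destination path.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class InMemoryEDP:
    """Hypothetical stand-in for Job Origin, Data Discovery, and Job Manager."""

    # One dict per kind of stored object, playing the role of the DB.
    db: dict = field(default_factory=lambda: {
        "origins": {}, "sources": {}, "jobs": {}, "instances": {}})
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def _store(self, table, obj):
        # Assign an id and store the object, as the DB steps in the diagram do.
        obj = dict(obj, id=next(self._ids))
        self.db[table][obj["id"]] = obj
        return obj

    def add_job_origin(self, source_type, location):
        return self._store("origins", {"type": source_type, "location": location})

    def register_data_source(self, url):
        return self._store("sources", {"url": url})

    def create_job(self, name, origin_id):
        # The job refers to its origin by id; the code itself lives elsewhere.
        if origin_id not in self.db["origins"]:
            raise KeyError(f"unknown job origin {origin_id}")
        return self._store("jobs", {"name": name, "origin_id": origin_id})

    def list_jobs(self):
        return list(self.db["jobs"].values())

    def execute_job(self, job_id, data_source_id, destination):
        job = self.db["jobs"][job_id]
        self.db["origins"][job["origin_id"]]   # origin must exist
        self.db["sources"][data_source_id]     # data source must exist
        # Stand-in for copying the job code: only the target path is recorded.
        return self._store("instances", {
            "job_id": job_id,
            "data_source_id": data_source_id,
            "code_path": f"{destination}/{job['name']}",
            "status": "RUNNING",
        })

    def terminate(self, instance_id):
        instance = self.db["instances"][instance_id]
        instance["status"] = "KILLED"
        return instance


# Same order of calls as the diagram, with placeholder locations.
edp = InMemoryEDP()
origin = edp.add_job_origin("git", "git://example.invalid/jobs.git")
source = edp.register_data_source("swift://container/input")
job = edp.create_job("wordcount", origin["id"])
run = edp.execute_job(job["id"], source["id"], "hdfs:///tmp/jobs")
print(run["status"])
print(edp.terminate(run["id"])["status"])
```

The sketch treats a "job instance" as the object returned by the execute step. The diagram leaves that definition open, so this is one possible reading rather than a settled design.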