TaskFlow/Persistence/Objects, not Collections

Revision as of 08:41, 31 March 2014 by Ivan Melnikov (talk | contribs) (Implementation Notes)

What

blueprint: https://blueprints.launchpad.net/taskflow/+spec/lb-not-collections

The proposal is to stop treating the logbook object as a collection of smaller objects: LogBook and FlowDetail should become simple data-transfer objects, not containers for FlowDetail and AtomDetail objects respectively. The corresponding API should move to the backends.

Why

TL;DR:

  • general sanity
  • optimization for certain tasks (monitoring, UI)
  • better cache semantics

Longer version follows.

UI and Monitoring

Imagine a UI that displays information on a task. To get, e.g., the latest state and metadata of a task from our persistence backends, the application has to load the whole logbook, all flow details in it, and all task details from all those flows. That can be quite costly compared to a simple query: 'here is a UUID, give me the task details for it'.

Imagine a UI that wants to list all RUNNING flows. To do that via the current API, it has to load all logbooks, then all flow details, then all task details (yes, all task details for all tasks ever executed with this backend), even though it does not need any task details at all.

Cache

Currently, storage holds a copy of the flow detail, which in turn holds copies of all task details; these are updated and then merged into storage. This is no more than a cache and, in my opinion, a rather imperfect one: it relies on (mostly undocumented) persistence usage patterns, and everybody pretends that it is not a cache but something different.

It is suggested that caching be made more explicit. It may be better to implement the cache as middleware that wraps a given backend and provides the same interface for backend and connection. Cache semantics (when cached data is valid, when it should be flushed, when it is stale) should be clear.
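
The middleware idea can be sketched as a wrapper that exposes the same methods as the connection it wraps. This is only an illustration under stated assumptions: `DictConnection`, `CachingConnection` and `invalidate` are hypothetical names, atom details are modelled as flat dicts, and only two of the proposed methods are shown.

```python
class DictConnection:
    """Hypothetical stand-in for a real backend connection."""

    def __init__(self):
        self._atoms = {}
        self.reads = 0  # counts how often the backend is actually queried

    def get_atom_by_uuid(self, uuid):
        self.reads += 1
        return dict(self._atoms[uuid])

    def save_atom(self, atom):
        self._atoms[atom["uuid"]] = dict(atom)
        return dict(atom)


class CachingConnection:
    """Wraps a connection and offers the same two methods.

    Cache rules, stated explicitly (assumptions of this sketch):
      * an entry is valid once it was read or saved through this wrapper;
      * save_atom writes through, which refreshes the entry;
      * invalidate() must be called if another writer may have changed data.
    """

    def __init__(self, wrapped):
        self._wrapped = wrapped
        self._cache = {}

    def get_atom_by_uuid(self, uuid):
        if uuid not in self._cache:
            self._cache[uuid] = self._wrapped.get_atom_by_uuid(uuid)
        return dict(self._cache[uuid])

    def save_atom(self, atom):
        saved = self._wrapped.save_atom(atom)
        self._cache[saved["uuid"]] = saved
        return dict(saved)

    def invalidate(self, uuid=None):
        # Drop one entry, or the whole cache when no uuid is given.
        if uuid is None:
            self._cache.clear()
        else:
            self._cache.pop(uuid, None)


backend = DictConnection()
cached = CachingConnection(backend)
cached.save_atom({"uuid": "a1", "state": "RUNNING"})
first = cached.get_atom_by_uuid("a1")   # served from the cache
reads_before = backend.reads
cached.invalidate("a1")
second = cached.get_atom_by_uuid("a1")  # goes to the backend again
reads_after = backend.reads
```

Because the wrapper has the same interface as the connection, callers do not need to know whether they talk to a cache or directly to the backend; the staleness rules live in one documented place.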

Implementation Notes

It is proposed to change the API of the persistence backend, connection and logbook objects as follows:

  • remove __iter__ and find from LogBook and FlowDetail
  • add methods to Connection instead:
    • instead of FlowDetail methods:
      • get_atom_by_uuid(uuid) -> atom_detail
      • get_atoms_for_flow(flow_uuid) -> [atom_detail]
      • get_atoms_by_name(flow_uuid, name) -> [atom_detail] [1]
      • save_atom(atom_detail, ...) -> atom_detail [2][3]
    • instead of LogBook methods:
      • get_flow_by_uuid(uuid) -> flow_detail
      • get_flows_for_book(logbook_uuid) -> [flow_detail]
      • save_flow(flow_uuid)
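
As a rough illustration of the method list above, here is a minimal in-memory sketch. It is not TaskFlow's actual code: `MemoryConnection` and the dataclass fields are hypothetical, and `save_flow` is assumed to take a flow detail object (the list above writes `save_flow(flow_uuid)`).

```python
import copy
from dataclasses import dataclass, field


@dataclass
class AtomDetail:
    """Plain data-transfer object: no child collection inside."""
    uuid: str
    flow_uuid: str
    name: str
    state: str = "PENDING"
    meta: dict = field(default_factory=dict)


@dataclass
class FlowDetail:
    """Plain data-transfer object; its atoms are fetched via the connection."""
    uuid: str
    logbook_uuid: str
    name: str
    state: str = "PENDING"


class MemoryConnection:
    """Hypothetical in-memory connection exposing the proposed lookups."""

    def __init__(self):
        self._atoms = {}  # atom uuid -> AtomDetail
        self._flows = {}  # flow uuid -> FlowDetail

    def get_atom_by_uuid(self, uuid):
        return copy.deepcopy(self._atoms[uuid])

    def get_atoms_for_flow(self, flow_uuid):
        return [copy.deepcopy(a) for a in self._atoms.values()
                if a.flow_uuid == flow_uuid]

    def get_atoms_by_name(self, flow_uuid, name):
        # Names need not be unique within a flow, so a list is returned.
        return [a for a in self.get_atoms_for_flow(flow_uuid)
                if a.name == name]

    def save_atom(self, atom_detail):
        self._atoms[atom_detail.uuid] = copy.deepcopy(atom_detail)
        return self.get_atom_by_uuid(atom_detail.uuid)

    def get_flow_by_uuid(self, uuid):
        return copy.deepcopy(self._flows[uuid])

    def get_flows_for_book(self, logbook_uuid):
        return [copy.deepcopy(f) for f in self._flows.values()
                if f.logbook_uuid == logbook_uuid]

    def save_flow(self, flow_detail):
        # Assumption: takes the flow detail itself, not only its UUID.
        self._flows[flow_detail.uuid] = copy.deepcopy(flow_detail)
        return self.get_flow_by_uuid(flow_detail.uuid)


conn = MemoryConnection()
conn.save_flow(FlowDetail("f1", "lb1", "deploy", state="RUNNING"))
conn.save_flow(FlowDetail("f2", "lb1", "cleanup"))
conn.save_atom(AtomDetail("a1", "f1", "boot"))
conn.save_atom(AtomDetail("a2", "f1", "boot"))

# Listing RUNNING flows of a logbook touches no atom details at all.
running = [f.name for f in conn.get_flows_for_book("lb1")
           if f.state == "RUNNING"]
boot_atoms = conn.get_atoms_by_name("f1", "boot")
```

With this shape, the two UI scenarios from the "Why" section become a single lookup each: one call by UUID for a task, and one flow query per logbook for the RUNNING list.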

When: 0.3 series

How:

  • add methods to impl_memory.Connection
  • modify storage.Storage, make engine and storage tests run
  • modify other backends one by one
  • modify backends' base class

Notes

  1. The backend does not guarantee that names are unique within a flow; the API must reflect that, hence a list is returned.
  2. Returns the updated atom detail.
  3. May need an O_CREATE-like option (similar to POSIX O_CREAT | O_EXCL) that raises an exception when the atom already exists.
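
Notes 2 and 3 could look roughly as follows. This is a sketch, not a settled design: the `create_only` flag, the `AtomExists` exception and the dict-based atom details are all assumptions made for illustration.

```python
class AtomExists(Exception):
    """Hypothetical error raised when create_only finds an existing atom."""


class AtomStore:
    """Hypothetical minimal store; atom details are flat dicts here."""

    def __init__(self):
        self._atoms = {}

    def save_atom(self, atom, create_only=False):
        uuid = atom["uuid"]
        if create_only and uuid in self._atoms:
            # Mirrors exclusive-create semantics: refuse to overwrite.
            raise AtomExists(uuid)
        # Merge the new fields into whatever is stored already.
        stored = dict(self._atoms.get(uuid, {}))
        stored.update(atom)
        self._atoms[uuid] = stored
        # Note 2: return the updated atom detail.
        return dict(stored)


store = AtomStore()
created = store.save_atom({"uuid": "a1", "name": "boot"}, create_only=True)
updated = store.save_atom({"uuid": "a1", "state": "SUCCESS"})

try:
    store.save_atom({"uuid": "a1", "name": "boot"}, create_only=True)
    duplicate_rejected = False
except AtomExists:
    duplicate_rejected = True
```

Returning the merged record lets the caller see fields it did not send itself, and the flag leaves the default behaviour (create or update) unchanged.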