Lifeless/ArchitectureThoughts

Revision as of 03:28, 16 February 2015

I'm gathering notes and thoughts about our big picture architecture at the moment, seeking to understand how we ended up with things like rpc-over-rabbit (which is a really bad fit for RPC), or the nova-compute -> neutron-api -> neutron-ovs-agent -> nova-api -> nova-compute VIF plugging weirdness. More importantly, I want to help OpenStack deliver features more quickly, be more resilient and robust, and perform better, all at once :). I've done this before with Launchpad, but OpenStack is a rather bigger project :).

It's a group effort - current collaborators [Joe Gordon, Gus Lees, Matthew Treinish, Joshua Harlow]. Ideally we'd have someone familiar with all the early projects and their split-outs to aid with insight.... drop me a mail if you're interested! [or just dive in and annotate these thoughts].

I'm not using etherpad because it's too hard to track deltas there - it's great for realtime collaboration, not so much for evolving over weeks/months.

Goals of an architecture

IMO an architecture isn't a control mechanism - it's a planning tool: it makes the context for decisions explicit, and articulates the broad principles involved so that we can all be discussing our decisions in a shared context. I wrote a presentation when I was back at Canonical that aimed to do that for Launchpad (part of the LP Architecture Guide). It's not perfect and I'd do things a little differently now, but I think it's also a pretty good model: in short, every developer is making architectural decisions, and it's only through shared understandings that we can effectively raise the bar on quality and consistency -> enforcement is way too hard, and a control-based strategy will inevitably fail (usually by not having enough control resources).

A good architecture, then, needs to be rooted in the desires and needs our users have for OpenStack, needs to explain what structures and designs will help us deliver those user concerns effectively, and needs to be updated as the world evolves. It needs to act as a blueprint a level above the design of any specific component or feature. It needs to help us choose between concerns like robustness and performance. Questions like the use of ACID or BASE in our data storage design can only be answered when we have broad goals like 'support 1000 API requests/second in a single cell without specialist servers' - and so that ties back into our understanding of our users' needs.

What do our users want?

  • User survey 2014: "Specifically drawing out the comments around neutron, over 61% of them were general concerns, including performance, stability and ease of use. High Availability ranked second (11%), with SDN use cases and IPv6 requirements following (7%) ... On the more technical side, the modular architecture was seen to provide a big advantage. Flexible, fixable, self-service, extensible, modifiable, adaptable, hardware and vendor agnostic, interoperable were key words commonly sighted.

The API becoming a defacto standard and the good user experience were also positively mentioned."

What does our architecture say today?

We have some top-level tenets - BasicDesignTenets - but there's no supporting material about the tradeoffs involved, and we've not followed the tenets in a lot of recent code. Further, with the split-out of teams the existing guidance doesn't address the impact of integration points at all. We have hacking guidelines (http://docs.openstack.org/developer/hacking/, https://wiki.openstack.org/wiki/CodingStandards) but they don't address many important issues, such as working with external resources or data integrity in a distributed system (e.g. cinder volumes and nova VMs, or ports and VMs).

What should it say?

  • The basic characteristics we want our code to have, along with their relative importance. E.g. robust, fast, scale-out, secure, extensible
  • Concrete patterns (and anti-patterns) we can follow that will help deliver such characteristics
  • Ways in which we can assess a project / component / feature to see if it is aligned with those characteristics
  • WHY each of these things is important / relevant / chosen - so that we can update it in future without repeating ourselves

What should it not say?

  • Use component X - OpenStack values flexibility and a broad ecosystem. Requiring specific components at the very highest level doesn't fit with our community. Testing concerns aside, keeping that flexibility is broadly good - but care needs to be applied to avoid bad abstractions that don't fit our needs.

Inspiration that's worth reading

  • http://12factor.net/ - great resource for cloudy apps, much of which is relevant to OpenStack's API services themselves.
  • https://plus.google.com/+RipRowan/posts/eVeouesvaVX and https://plus.google.com/110981030061712822816/posts/AaygmbzVeRq - Steve Yegge on platforms and design. Highly entertaining.

Process for building an architecture

Read our code and our bugs, think hard, discuss with the greybeards in our projects, write it down.

Data gathering

  • Systematic review of nova bugs: https://docs.google.com/spreadsheets/d/1u_lTrpZAO8NA44tNgaIMGVubiNQUPXIkNIhZDzZmUhY/edit?usp=sharing

Grab bag of ideas to follow up

These ideas are not yet *very* categorised or thought through - caveat emptor.

Structural concepts

These are structural concepts we might bring in (with what that means varying per concept) to make the system more resilient, dynamic, and robust.

  • Use a name service rather than static configuration.
 Permits real-time adjustment to changing deployments (JH: application configuration as well?)
 May bootstrap via static configuration.
  • Single purpose services. E.g. quota, service status, ...
 Permits scaling out individual hotspots
  • No single-instance daemons: integrate with a quorum service.
 Avoids restarting services and the attendant burst of work when a component has failed.
  • Crash-safe processes: a crash at any point must not unrecoverably leak resources *or* cause expensive reconciliations
  Deals with power failures, bugs, and fat-fingered admins gracefully
  • Persist in-progress non-idempotent work using some form of WAL (see the sketch after this list).
 May need node-local or centralised storage, or some combination thereof. Different from dealing with crashes, because non-crash situations can also interrupt things - e.g. a very slow migration combined with a security update being rolled out
  • Use direct RPC calls - process to process
  Allows cleanly scaling within a single node (e.g. servers with 4K CPUs may need multiple nova-compute processes running to deal with load)
  • Use message bus for fire-and-forget operations.
 The designed use case (and they are very good at this)
  • Stateless daemons: if a node dies, another machine can take over immediately without needing an expensive warm-up period or access to information that was only held by the node that died
  The exception being daemons whose job *is* state storage [swift and cinder have daemons with this job]
  • Keep work close to the data where possible
  Avoid unnecessary network traffic
  • Build timeouts and load thresholds into everything
 Deal with the reality that networks and datacentres are fragile.
  • Assume the network is hostile
 Encrypt and sign all network traffic by default, require opt-out.
  • Design and implement with debugging and operations as key use cases
* How does one reproduce a failure?
* Do we dump enough diagnostic details?
* Can the request path through the DC be traced?
  • All interfaces with other processes versioned
* Deploying new processes isn't atomic
* Include configuration data
  • Try to structure things so that mistakes in the use of some data field or code result in an error rather than the wrong action (sketched below). For instance, the Nova VM state of (DELETED, RUNNING) isn't valid, and updates that would create that situation should error.
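
To make that last point concrete (and the 'Nova VM_state+task_state validation layer' idea in the next section), here is a minimal sketch of a guard that errors on invalid state combinations instead of storing them. The state table, exception, and function names are invented for illustration - this is not Nova's actual state machine.

# Illustrative only: reject invalid (vm_state, task_state) combinations
# instead of silently recording them. The table is a toy, not Nova's real one.

VALID_TASK_STATES = {
    'ACTIVE': {None, 'REBOOTING', 'MIGRATING', 'RESIZING'},
    'STOPPED': {None, 'DELETING'},
    'DELETED': {None},  # a deleted instance can't be doing anything else
}


class InvalidStateCombination(Exception):
    pass


def assert_valid_state(vm_state, task_state):
    allowed = VALID_TASK_STATES.get(vm_state)
    if allowed is None or task_state not in allowed:
        raise InvalidStateCombination(
            'vm_state=%r with task_state=%r is not a valid combination'
            % (vm_state, task_state))


assert_valid_state('ACTIVE', 'REBOOTING')  # fine
try:
    assert_valid_state('DELETED', 'RUNNING')  # the bad example above: errors
except InvalidStateCombination as e:
    print(e)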
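
For the WAL bullet in the list above, a minimal sketch of node-local journalling of non-idempotent work: each step is recorded (and flushed) before it runs, so a restarted process can see what was in flight. The WorkJournal class, file layout, and step names are hypothetical, not an existing oslo or nova API.

import json
import os
import tempfile


class WorkJournal(object):
    """Hypothetical node-local write-ahead log for non-idempotent work."""

    def __init__(self, path):
        self.path = path

    def record(self, task_id, step, **details):
        # Append and fsync the entry before the step it describes is executed.
        entry = {'task_id': task_id, 'step': step, 'details': details}
        with open(self.path, 'a') as f:
            f.write(json.dumps(entry) + '\n')
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        """Yield journalled entries so unfinished work can be resumed."""
        if not os.path.exists(self.path):
            return
        with open(self.path) as f:
            for line in f:
                yield json.loads(line)


# Example: journal the steps of a (hypothetical) resize before doing them.
journal = WorkJournal(os.path.join(tempfile.gettempdir(), 'resize.wal'))
journal.record('instance-1234', 'claim_target_resources', flavor='m1.large')
journal.record('instance-1234', 'copy_disk', dest='compute-07')
for entry in journal.replay():
    print('would resume: %(task_id)s %(step)s' % entry)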

Concrete things we might work on

These are specific projects that would shift our design towards some of the structural things above.

  • Nova VM_state+task_state validation layer.
  • Pervasive *service-only for now* availability/liveness service (should be no polling or periodic database writes with information about liveness) [JH]
 single purpose service, scaled out
  • HA coordinator (that can automatically resume unfinished work, using WAL or other...) (is heat + convergence becoming this?). This HA coordinator should be easily shardable to scale out (or it should be easily able to acquire work to do from some pool, which will autorelease back to the pool if it crashes, to be resumed by another HA coordinator) [JH]
 -- Not sure that this is actually needed; it seems to me that it's a shorthand for a number of key structural changes which I've tried to capture above. In particular, a single big coordinator runs the risk of centralising all our logic into one big ball of twine. Some things like live migration may work better with a third service coordinating, but since we have to have a point-to-point link anyway, we don't seem to gain a lot. Any single location could fail and we need to pick the work up again. If either of the computes has failed, we have to get it back up again to resume that local work.
 -- JH: understood, and you are probably right, although 'one big ball of twine' could be anything developed by anyone (it seems we are already pretty good in openstack at developing twine, haha); we control the ability to make big balls of twine so I'd hope we could do it correctly (and not create said big one); most of the projects just manipulate resources - reserving them, preparing them, returning them back to the user - so it sorta feels odd that we have so many projects that do the same thing (with different types of resources); if we imagined the coordination/smarts of that manipulation was in some coordinator then the projects just become the nice driver API (or something similar) that is exposed; perhaps this isn't reality/possible (likely isn't) but it's a nice thought :-P
  • Capability monitor/service (knows capabilities of resources in cloud) - likely tied to scheduler service (but may not be) [JH]
 -- Gantt is heading in this direction I think; a good case of a single purpose service
  • Resource reservation before allocation (not during, or piecemeal); build a manifest in the HA coordinator; reserve all resources in the manifest; then allocate/build the resources in the manifest; then hand control of those allocated/built resources over to the user (in that order, and iff allocation/building succeeds) --- all of this is done in the HA coordinator (which uses the APIs of downstream services as needed; those APIs should be simple and avoid doing complicated operations, as it is the HA coordinator's job to do complex things) -- have each downstream service 'do one thing well' (and leave the complexity of tying things together elsewhere) [JH] (a rough sketch of the reserve-then-allocate flow follows this list)
 -- I think this would be a fine thing to do. I don't think it has any systemic lessons though - can we generalise it?
 -- JH: will think of some way to generalize it (the concept I guess is to reserve as much as you can, across services, before doing much else, instead of having disparate services where you reserve something at one place, do something, then send to the next service, which reserves some more stuff, and so on; making the whole reservation workflow sorta wonky/hard to figure out/understand).
  • Scheduler service (likely connected to capability service in some manner); we should encourage experimentation/research/optimization here [JH]
 -- Gantt.
  • Protobufs for RPC rather than home-brew - still need a layer above to consolidate domain code.
  • Implement a direct RPC facility - perhaps building on the 0mq layer in oslo.messaging, perhaps a new backend with e.g. HTTP+protobufs
  • WAL journalling of local work to survive restarts
 * migrations / resize - any non-idempotent operation
 * https://review.openstack.org/#/c/147879/ (is one such approach)
  • No singletons
  * nova scheduler
  * are there others left? Perhaps some cinder bits? (JH: cinder-volume-manager is still reliant on file-locks and can't be scaled out to 1+ manager at the current time)
  • Secured/verifiable/signed (something...) RPC messages (it's taken too long...) [JH] - a rough signing sketch follows this list
  • Systematic tracing (osprofiler or other...)
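
Picking up the reserve-everything-then-allocate idea above, a rough sketch of the two-phase flow: reserve every item in the manifest first, release everything if any reservation fails, and only then allocate. The service objects, method names, and ReservationError are invented for illustration, not real OpenStack client APIs.

class ReservationError(Exception):
    pass


def provision(manifest, services):
    """manifest: list of (service_name, resource_spec) pairs."""
    # Phase 1: reserve everything; on any failure, release what we reserved.
    reservations = []
    try:
        for service_name, spec in manifest:
            reservations.append(services[service_name].reserve(spec))
    except ReservationError:
        for r in reservations:
            r.release()
        raise
    # Phase 2: allocate/build only once every reservation succeeded; the
    # results are what gets handed over to the user.
    return [r.allocate() for r in reservations]


class FakeService(object):
    """Toy in-memory stand-in used only to exercise the sketch."""

    class _Reservation(object):
        def __init__(self, spec):
            self.spec = spec

        def release(self):
            print('released %r' % (self.spec,))

        def allocate(self):
            return 'allocated %r' % (self.spec,)

    def reserve(self, spec):
        return self._Reservation(spec)


services = {'nova': FakeService(), 'cinder': FakeService(), 'neutron': FakeService()}
manifest = [('nova', {'vcpus': 2}), ('cinder', {'size_gb': 20}), ('neutron', {'ports': 1})]
print(provision(manifest, services))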
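
And for the signed-RPC item just above, a toy sketch of signing and verifying a message envelope with a shared key, using Python's standard hmac module. The envelope format and key handling are invented for illustration; a real design also needs key distribution, rotation, and replay protection.

import hashlib
import hmac
import json

SHARED_KEY = b'not-a-real-key'  # real systems need per-service key distribution


def sign_message(payload):
    # Canonicalise the payload so sender and receiver hash identical bytes.
    body = json.dumps(payload, sort_keys=True).encode('utf-8')
    mac = hmac.new(SHARED_KEY, body, hashlib.sha256).hexdigest()
    return {'body': body.decode('utf-8'), 'mac': mac}


def verify_message(envelope):
    expected = hmac.new(SHARED_KEY, envelope['body'].encode('utf-8'),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope['mac']):
        raise ValueError('message signature mismatch - dropping message')
    return json.loads(envelope['body'])


envelope = sign_message({'method': 'plug_vif', 'args': {'port_id': 'abc123'}})
print(verify_message(envelope))  # verifies and returns the original payload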