Heat/Vision

NOTE: The information below is for discussion. The notes and diagrams have not been produced by the current Heat core team, and in some cases do not exactly represent the current architecture of Heat. This page should therefore be considered a work in progress, which we will gradually refine to reflect the true plan/vision as discussions within the core team and community continue during and after the design summit.

During the Portland 2013 OpenStack Design Summit for Havana there was considerable discussion about what Heat's architecture could look like in the future. This block diagram attempts to illustrate the current thinking for how to accomplish a variety of future design goals. We plan to iterate gradually toward this vision. The new components are pictured in green. Purple means the end user hosts this outside of Heat. Each is detailed below.

Alternative architecture diagram for discussion:

Changes to core components: I removed the Model Interpreter from the Heat REST API box, since the REST API should be as slick and dumb as possible. My view of the flow is like this: A Heat Template (or CFN template for the short term) is passed via the REST API and goes to the queue (AMQP). One of the Heat Engines picks up the request and starts processing. The Model Processor (which corresponds to the Parser in the current Heat architecture) reads the Heat Template's topology information, transforms it into internal objects and derives a processing flow. Initially, CFN support will be tightly coupled with the Model Processor to provide non-disruptive support for both Heat Templates and CFN. The tight integration is symbolized by the "CFN" box attached to the Model Processor. In a later stage, CFN support would move to the alternative APIs (gray box).

I changed the alternative API relay box in the following way: There are now multiple "Model Translators", one for each alternate format. Each format will have its specifics, so it seems likely each will need its own translator. Once the input has been translated into a Heat Template, the API relay will pass the Heat Template on to the native API. The whole component is also grayed out to indicate it is not part of Heat core for now, but will be driven as an add-on component. In a later stage, we can think about moving this layer closer into Heat core, maybe below the API layer.

I further changed the way the monitoring system is integrated: instead of interacting directly with a Heat Engine, it posts updates to the queue (AMQP). The update events are picked up by the right Heat Engine which then processes the update and acts accordingly.
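The flow described above can be sketched roughly as follows. This is a minimal illustration only, not Heat code: an in-process `queue.Queue` stands in for AMQP, and every function name, message field and the `resources` key are assumptions made for the example.

```python
import queue

# Hypothetical stand-in for the AMQP queue; a real deployment would use a broker.
request_queue = queue.Queue()


def rest_api_submit(template):
    """The REST API stays 'dumb': it only enqueues the request."""
    request_queue.put({"type": "create_stack", "template": template})


def monitoring_post_update(stack_name, event):
    """The monitoring system posts to the queue instead of calling an engine."""
    request_queue.put({"type": "update", "stack": stack_name, "event": event})


def model_processor(template):
    """Read the topology and derive a processing flow.

    Assumption: declaration order is used as the flow to keep the sketch short.
    """
    return list(template["resources"])


def heat_engine_step():
    """One engine picks up the next message from the queue and processes it."""
    message = request_queue.get()
    if message["type"] == "create_stack":
        return ("create", model_processor(message["template"]))
    return ("update", message["stack"], message["event"])
```

Because both stack requests and monitoring updates travel through the same queue, any engine can pick up either kind of message, which is the point of the change described above.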

Alternate APIs

Heat will use an Open API and a related DSL as the common expression of an orchestration. A new solution pattern will be used for compatibility with alternate template formats, allowing implementations of various emerging cloud standards. Each alternate format may implement a Model Interpreter which will expose an appropriate service API, and decompose the given template into the common format. Once the resulting template has been generated in the open DSL format, the common API is triggered by the API Relay. This is where CFN templates will be handled in the future, as well as other alternate formats, such as TOSCA, and alternate APIs such as CAMP.
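As a rough sketch of this pattern, assuming one translator function per alternate format and a relay that forwards the result to the native API: the registry, the function names and the shape of the "common format" dictionary are all hypothetical, since the open DSL is not defined on this page.

```python
def translate_cfn(cfn_template):
    """Hypothetical translator from a CFN-like dict to an invented common format."""
    return {
        "resources": {
            name: {
                "type": body["Type"],
                "properties": body.get("Properties", {}),
            }
            for name, body in cfn_template["Resources"].items()
        }
    }


# One entry per alternate format; TOSCA or CAMP support would register here too.
TRANSLATORS = {"cfn": translate_cfn}


def api_relay(fmt, template, native_api):
    """Translate the template, then trigger the common (native) API with the result."""
    translator = TRANSLATORS.get(fmt)
    if translator is None:
        raise ValueError(f"no translator registered for format {fmt!r}")
    return native_api(translator(template))
```

Keeping the translators outside the native API means a new format can be added without touching Heat core, which matches the add-on positioning of this component.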

Model Interpreter

The concept of a Model Interpreter is introduced. This is a Heat system component that is responsible for parsing the DSL, and composing a deployment plan. It builds a graph of the deployment plan, and hands it off to the Heat Engine's Model Processor. Until the Heat Engine is ready for the new DSL, it will continue to support CFN. See CFN[1] in the diagram. Once full support for the new DSL has been added, the CFN templates will be handled in the relay component. See CFN[2] in the diagram.
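To illustrate what "builds a graph of the deployment plan" could mean in practice, here is a small sketch that orders resources by their dependencies using Python's standard `graphlib` module (Python 3.9+). The `resources` and `depends_on` keys are assumptions for the example, not the actual DSL.

```python
from graphlib import TopologicalSorter


def build_deployment_plan(template):
    """Return resource names in an order that respects declared dependencies."""
    # Map each resource to the set of resources it depends on (its predecessors).
    graph = {
        name: set(body.get("depends_on", []))
        for name, body in template["resources"].items()
    }
    # static_order() yields predecessors before dependents and raises
    # graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(graph).static_order())


plan = build_deployment_plan(
    {
        "resources": {
            "attachment": {"depends_on": ["server"]},
            "server": {"depends_on": ["network"]},
            "network": {},
        }
    }
)
```

In this sketch the resulting plan is what would be handed to the Heat Engine's Model Processor; a real implementation would likely keep the graph itself so that independent resources can be created in parallel.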

Task System

We see merit in the idea of a Task System that would allow a variety of functionality to be carried out for Heat. While Heat focuses on orchestration of resources, the Task System would handle task flows such as:

A sequence of tasks that have a start and end.
A persistent job/process (for example an Auto-Scale policy) that remains running until manually terminated.
A job to run for a specified duration (such as run this automated stress test for 2 days, then exit).

OpenStack services such as Nova or OpenStack Networking would have the option to use the Task System directly in cases where reliable messaging is essential, and a single API call to another service is not sufficient to handle the need. An example would be "start this new network, and attach this list of servers to it". The Task System will start as a library within Heat, and likely graduate to Oslo upon suitable maturity. It can then be set up as a standalone service in a fault-tolerant HA configuration.
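The three kinds of task flow listed above could be sketched as three small runners. This is only an illustration of the distinction between them; the function names are hypothetical, and persistence, reliable messaging and HA are deliberately left out.

```python
import threading
import time


def run_sequence(tasks):
    """A sequence of tasks with a definite start and end."""
    return [task() for task in tasks]


def run_until_stopped(job, stop_event, interval=0.0):
    """A persistent job that keeps running until it is explicitly terminated."""
    runs = 0
    while not stop_event.is_set():
        job()
        runs += 1
        # Wait between runs, but wake up early if a stop is requested.
        stop_event.wait(interval)
    return runs


def run_for_duration(job, duration, clock=time.monotonic):
    """A job that repeats for a specified duration, then exits."""
    deadline = clock() + duration
    runs = 0
    while clock() < deadline:
        job()
        runs += 1
    return runs
```

For example, `run_sequence([create_network, attach_servers])` would correspond to the "start this new network, and attach this list of servers to it" case, where `create_network` and `attach_servers` are callables supplied by the caller.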

Auto-Scale

Scaling policies may be implemented in Heat. This is expected to be broken out into a standalone service in the future. Ceilometer provides metrics (events triggered upon evaluating sensor data) from running servers, as well as alerts that are passed to one or more user-defined webhooks. The MAPE (Monitor, Analyze, Plan, Execute) pattern will be implemented, and the "A" and "P" stages will be handled by the user-defined CEP component.

User Defined CEP

The CEP is a user-defined Complex Event Processor that can apply arbitrary logic to determine what actions to take under various conditions. These actions include triggering the Workflow Service (for example, "add a node to this cluster"), triggering orchestrations (for example, "Deploy a new cluster"), or a combination of the two, such as "Destroy a failed cluster (workflow), and start up a new one (orchestration)". The CEP may also send webhooks back into Ceilometer, which relays them to Heat. Heat listens for specific scaling events in order to trigger ScaleUp and ScaleDown actions.

The CEP is not hosted by Heat. It is provided by the user as an HTTP service that can accept a webhook call. This allows users to write it in the language of their choice, and implement whatever business logic suits their needs. Some example CEP solutions could be offered to perform common auto-scale Analysis and Planning actions.

The justification for creating a CEP is that you may not want to trigger a scale-up workflow every time you get an alarm event from Ceilometer. You may want to control the rate at which actions are taken. You may consider using a simple ECA (Event-Condition-Action) pattern, or possibly the more sophisticated MAPE pattern from Autonomic Computing.
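A minimal sketch of such rate control, written as a simple ECA rule with a cooldown period, might look like this. The class name, the `state` field of the alarm payload and the returned action string are assumptions for illustration; they do not reflect Ceilometer's actual webhook format.

```python
import time


class CooldownCEP:
    """Hypothetical CEP rule: act on an alarm only if the cooldown has elapsed."""

    def __init__(self, cooldown_seconds, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.last_action_at = None

    def handle_alarm(self, alarm):
        """Event: an incoming alarm. Returns an action name, or None to do nothing."""
        # Condition 1: ignore anything that is not an active alarm.
        if alarm.get("state") != "alarm":
            return None
        # Condition 2: suppress actions that arrive within the cooldown window.
        now = self.clock()
        if (
            self.last_action_at is not None
            and now - self.last_action_at < self.cooldown_seconds
        ):
            return None
        # Action: record when we acted and request a scale-up.
        self.last_action_at = now
        return "ScaleUp"
```

A user-hosted CEP would call something like `handle_alarm` from its webhook handler and only trigger the scale-up workflow when an action is returned, so a burst of alarms results in a single action per cooldown window.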