Heat/Blueprints/hot-software-config-ibm-response

A few of us at IBM studied Steve Baker's proposal on HOT Software Configuration (Heat/Blueprints/hot-software-config). The following summarizes our thoughts.

HOT Software Config

Overall, the proposed hscp constructs and syntax are great. We would like to propose a few minor extensions that help express dependencies among components and resources more precisely and, in turn, enable cross-VM coordination.

We note that while resource types are defined in code and used in templates, components are both defined and used in templates. To avoid confusion in templates, it is helpful to be clear about the distinction between definitions vs uses of components. Components are roughly analogous to procedures in a programming language. Seen in this light, we think the following extensions to the proposal would be helpful.

E1: Signatures for component definitions:

Require component definitions to explicitly declare their inputs and outputs. Just as declaring the signature of a procedure is helpful in a programming language, so it is helpful to declare the signature of a component. Such an explicit signature helps:

validate component invocations against component signatures (by name matching)
component providers use the names from the signature to read/write these values (from the metadata service)

Example: In the following example, a Bash script is used for configuration, so the type of the component provider is ShellBash. A scheme for exposing the inputs/outputs in the shell script is shown. In this scheme the ShellBash provider exposes the inputs and the names of the outputs as environment variables. Component providers (CM tools) can choose any convention and syntax that suits their tools and culture. For example, a Heat::Chef provider can choose to expose the inputs/outputs as Chef node attributes via the node[][] hash.

install_mysql:
  type: Heat::ShellBash

  # inputs
  properties:
    - enabled_flag
    - ensure_run_flag

  # outputs
  attributes:
    - mysql_port

  # path to script file
  config-file: /opt/scripts/install_mysql.bash

The install_mysql.bash script can then access the inputs enabled_flag and ensure_run_flag, and the name of the output mysql_port, as environment variables, as shown in the following snippet:

#!/bin/bash
# install mysql
# ....
# ShellBash provider makes the inputs and names of the outputs available as environment variables
# start mysql -- access inputs from environment vars
# (start_service and store_output are helpers assumed to be supplied by the provider)
port=$(start_service "$enabled_flag" "$ensure_run_flag")
# assign value ($port) to output variable ($mysql_port) by storing in metadata service
store_output "$mysql_port" "$port"
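
To make the environment-variable scheme concrete, here is a minimal Python sketch of how a ShellBash-style provider might prepare a script's environment from the component signature. The function names build_environment and run_component are hypothetical. The choice to export each output under its own name, so that the name doubles as the metadata key, is an assumption and not part of the proposal.

```python
import os
import subprocess


def build_environment(signature, params, base_env=None):
    """Return the environment a ShellBash-style provider might pass to a script.

    Inputs are exported with their assigned values; each output is exported
    under its own name so the script knows which key to write back.
    """
    env = dict(os.environ if base_env is None else base_env)
    for name in signature["properties"]:
        env[name] = str(params[name])
    for name in signature["attributes"]:
        # assumption: the output's name doubles as its metadata key
        env[name] = name
    return env


def run_component(signature, params):
    """Run the component's script with inputs and output names in its environment."""
    env = build_environment(signature, params)
    return subprocess.run(["bash", signature["config-file"]], env=env, check=True)


# The install_mysql definition from the example above, as a plain dict.
install_mysql = {
    "type": "Heat::ShellBash",
    "properties": ["enabled_flag", "ensure_run_flag"],
    "attributes": ["mysql_port"],
    "config-file": "/opt/scripts/install_mysql.bash",
}
```

A missing input surfaces as a KeyError in build_environment before the script is ever started, which is one practical benefit of having the signature available to the provider.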

E2: Allow usage of component outputs (similar to resources):

There are fundamental differences between components and resources, but there are also similarities in terms of attributes (or properties). The extension is to allow components and their inputs/outputs to be used or referred to in a way similar to how properties/attributes of a resource are used.

There are a few options for how to refer to the outputs of a component invocation. Three of them are described below.

Opt1: use an extension of the syntax for get_attr to refer to the output

 get_attr[resource_name, component_name, output_name]

This option is illustrated in the example below.

[Example-Opt1]

components:
  install_mysql:
   .... defined same as previous example ...

  configure_app:
    type: Heat::Puppet
    # inputs
    properties:
      - db_server
      - db_port
    # outputs: none
    attributes:
      # none
    config-file: config_app.pp

resources:
  db_server:
    type: OS::Nova::Server
    properties:
      components:
        - name: install_mysql
          params:
            enabled_flag: 'true'
            ensure_run_flag: 'true'

  app_server:
    type: OS::Nova::Server
    properties:
      components:
      - name: configure_app
        params:
          db_server:
            get_attr: [db_server, first_address]
          # assign db_port the value produced by (output of) the install_mysql component
          # Notice the proposed syntax for accessing the output of a component:
          # get_attr : [resource_name, component_name, output_name]
          db_port:
            get_attr: [db_server, install_mysql, mysql_port]

Opt2: let the outputs of a component invocation become additional attributes of the relevant resource.

In the previous example (Example-Opt1), this would mean that the invocation of install_mysql at the db_server resource causes that resource to gain an attribute named mysql_port (the output of install_mysql), which would be accessed using the standard (two-argument) get_attr: [resource_name, attr_name] construct. This is shown in the following snippet.

[Example-Opt2]

app_server:
    type: OS::Nova::Server
    properties:
      components:
      - name: configure_app
        params:
          db_server:
            get_attr: [db_server, first_address]
          # assign db_port the value produced by (output of) the install_mysql component
          # Notice that we are accessing the output of the install_mysql component of resource db_server
          # as an attribute of db_server
          db_port:
            get_attr: [db_server, mysql_port]

Opt3: Indirect access to component outputs via binding variables.

As illustrated in the example below (Example-Opt3), we would use a variable to bind the output from a component to the input of another. In the example, mport is this binding variable. The install_mysql component invocation for db_server would bind enabled_flag=true, ensure_run_flag=true, mysql_port=mport. The configure_app component invocation for app_server would bind db_port=mport.

[Example-Opt3]

# declare a binding variable mport for passing the output of the install_mysql component to app_server
# I am not sure what the syntax of this would be ... any ideas/suggestions?
binding_variables:
  mport

resources:  
  db_server:
    type: OS::Nova::Server
    properties:
      components:
        - name: install_mysql
          params:
            enabled_flag: 'true'
            ensure_run_flag: 'true'
            # this is the mapping of output
            mysql_port: mport

  app_server:
    type: OS::Nova::Server
    properties:
      components:
      - name: configure_app
        params:
          db_server:
            get_attr: [db_server, first_address]
          # assign db_port the value produced by (output of) the install_mysql component
          # use the variable mport used for binding this value
          db_port: mport

Pros and cons of the three options:

Opt1: get_attr[resource_name, component_name, output_name]

pros: (1) simple and explicit reference to the output of a component; (2) easy to analyze and validate
cons: introduces an extension to the get_attr syntax

Opt2: get_attr[resource_name, component_output_name]

pros: does not require an extension to the get_attr syntax
cons: (1) introduces implicit attributes to a resource from its component invocations; (2) requires the names of these new attributes (component outputs) to be distinct

Opt3: indirect access to component outputs via binding variables

pros: similar to the waitcondition/handle/signal mechanism (one can map these binding variables to a combination of waitcondition/handle/signal)
cons: (1) requires additional variable declarations for passing values; (2) the connection between a component's output and its use as an input to another component becomes implicit, hidden behind a new variable.

Based on the discussion above, Opt1 seems to have the cleaner syntax and simpler semantics, and is hence preferable.
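
As a rough illustration of why Opt1 is easy to analyze, the following Python sketch resolves both the existing two-element get_attr form and the proposed three-element form. The function name resolve_get_attr, the two lookup tables, and the sample address and port values are all hypothetical; this is not how the Heat engine stores attributes.

```python
def resolve_get_attr(ref, resource_attrs, component_outputs):
    """Resolve a two- or three-element get_attr reference.

    resource_attrs:    {resource_name: {attr_name: value}}
    component_outputs: {(resource_name, component_name): {output_name: value}}
    """
    if len(ref) == 2:
        resource, attr = ref
        return resource_attrs[resource][attr]
    if len(ref) == 3:
        resource, component, output = ref
        return component_outputs[(resource, component)][output]
    raise ValueError(f"get_attr expects 2 or 3 elements, got {len(ref)}")


# Illustrative values only; not taken from the proposal.
resource_attrs = {"db_server": {"first_address": "10.0.0.5"}}
component_outputs = {("db_server", "install_mysql"): {"mysql_port": "3306"}}

db_port = resolve_get_attr(
    ["db_server", "install_mysql", "mysql_port"], resource_attrs, component_outputs
)
```

Because the component name is part of the reference, two components on the same resource can expose outputs with the same name without colliding, which is the constraint listed as a con of Opt2.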

E3: Uniquely identifying component invocations.

Components have a single definition and multiple uses (or invocations) across or within resources. A given component can be invoked multiple times within a single resource. In such scenarios, we need a way to uniquely identify each invocation so that its outputs can be referred to unambiguously.

Example: Consider a scenario where we have a component InstallWASProfile defined to install WebSphere profiles and return a profile id as output. A particular Server (resource) can have multiple such profiles, and hence would use multiple invocations of InstallWASProfile. Here the Server resource needs a way to refer to the output (profile id) of each invocation of the InstallWASProfile.

In addition to the name property of a component invocation, we could add an invocation_id property. This invocation_id property could be optional and would be used only when there is a need to uniquely identify the particular invocation.

Example:

app_server:
    type: OS::Nova::Server
    properties:
      components:
      - name: InstallWASProfile
        invocation_id: install_user_profile
        params:
          user_id
      - name: InstallWASProfile
        invocation_id: install_admin_profile
        params:
          admin_id
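
One possible reading of the optional invocation_id is sketched below in Python: an invocation is identified by its invocation_id when present and by its component name otherwise, and two invocations that end up with the same identifier are rejected. The function name index_invocations and the fallback rule are assumptions made for illustration.

```python
def index_invocations(components):
    """Map each invocation's identifier to its entry, rejecting ambiguous ones.

    The identifier is invocation_id when present, else the component name.
    """
    index = {}
    for invocation in components:
        key = invocation.get("invocation_id", invocation["name"])
        if key in index:
            raise ValueError(f"ambiguous component invocation: {key!r}")
        index[key] = invocation
    return index


# The two InstallWASProfile invocations on app_server, params omitted.
app_server_components = [
    {"name": "InstallWASProfile", "invocation_id": "install_user_profile"},
    {"name": "InstallWASProfile", "invocation_id": "install_admin_profile"},
]
index = index_invocations(app_server_components)
```

With such an index, a reference to an output could name the invocation identifier in place of the component name whenever a component is invoked more than once on a resource.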

E4: Extension to make components a stateful entity similar to resources.

Components would have states such as UNCONFIGURED, CONFIGURING, and CONFIG_COMPLETE. The CM tool would update this state as configuration progresses.
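
A minimal Python sketch of such a component state follows. The three state names come from the text above; the forward-only transition table and the advance function are assumptions, since the proposal does not say which transitions are legal or whether a failure state exists.

```python
from enum import Enum


class ComponentState(Enum):
    UNCONFIGURED = "UNCONFIGURED"
    CONFIGURING = "CONFIGURING"
    CONFIG_COMPLETE = "CONFIG_COMPLETE"


# Assumed forward-only transitions; the proposal does not specify them.
ALLOWED = {
    ComponentState.UNCONFIGURED: {ComponentState.CONFIGURING},
    ComponentState.CONFIGURING: {ComponentState.CONFIG_COMPLETE},
    ComponentState.CONFIG_COMPLETE: set(),
}


def advance(current, new):
    """Return the new state, or raise if the transition is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {new.name}")
    return new
```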

E5: Allow depends_on relations on components.

Allow components to be sources and/or targets of depends_on relations. This would enable an explicit declaration of ordering between components on different VMs, as opposed to the implicit dependency induced by get_attr.

Example: R1 and R2 are compute resources. R1 has components r1_1, r1_2 and r1_3; R2 has components r2_1, r2_2 and r2_3. Consider the dependency "r1_2 depends_on r2_2", meaning that r2_2 has to be complete before r1_2 can execute. This could be because the automation in r1_2 wants to SSH into VM R2 to complete some job, but r2_1 needs to be completed first because it lays down some files that r1_2 needs.
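
The ordering implied by this example can be sketched in Python with the standard library's graphlib.TopologicalSorter (Python 3.9 or later). The function name execution_order is hypothetical, and the sketch assumes that components on one resource run in the order they are listed, which the proposal does not state explicitly.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def execution_order(per_resource, depends_on):
    """Return one valid component execution order.

    per_resource: {resource: [components in listed order]}
    depends_on:   [(component, prerequisite)] explicit cross-component edges
    """
    sorter = TopologicalSorter()
    for components in per_resource.values():
        previous = None
        for component in components:
            if previous is None:
                sorter.add(component)
            else:
                # assumption: components on one resource run in listed order
                sorter.add(component, previous)
            previous = component
    for component, prerequisite in depends_on:
        sorter.add(component, prerequisite)
    # static_order() raises graphlib.CycleError if the relations are cyclic
    return list(sorter.static_order())


order = execution_order(
    {"R1": ["r1_1", "r1_2", "r1_3"], "R2": ["r2_1", "r2_2", "r2_3"]},
    [("r1_2", "r2_2")],
)
```

Under the in-order assumption, waiting for r2_2 also guarantees that r2_1 has completed, so a single depends_on edge is enough to cover both.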

Related Features and Semantics

The following three features are already supported by the proposal. I am mentioning them here because the extensions we are proposing rely on them.

F1: Explicit assignments of component inputs in a component invocation: Component invocations explicitly map or assign values to the component's inputs (identified by the component signature).
F2: Semantics of component invocation: Given an assignment of values to inputs on a component invocation, the component's execution starts only after all its input values are available.
F3: Component providers read and write values from a global data space, which I assume is the Heat metadata service.
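
The F2 semantics can be illustrated with a small Python sketch that decides whether an invocation may start. The names referenced_values and is_ready are hypothetical, the set of available values stands in for whatever the metadata service actually holds, and the three-element get_attr form is the Opt1 syntax proposed above, not existing HOT syntax.

```python
def referenced_values(params):
    """Return every get_attr reference in an invocation's params as a tuple."""
    return [
        tuple(value["get_attr"])
        for value in params.values()
        if isinstance(value, dict) and "get_attr" in value
    ]


def is_ready(params, available):
    """True once every referenced value is present in the `available` set."""
    return all(ref in available for ref in referenced_values(params))


# The params of the configure_app invocation from Example-Opt1.
configure_app_params = {
    "db_server": {"get_attr": ["db_server", "first_address"]},
    "db_port": {"get_attr": ["db_server", "install_mysql", "mysql_port"]},
}
```

Literal inputs such as 'true' carry no reference and are therefore always considered available.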

Benefits of the proposed extensions

B1. Validation of component invocations. Given the component signature, it is easy to validate that an invocation assigns values to all the required inputs. Such validation would catch errors such as missing inputs and variable name typos, and could be extended to type checking of the values.
B2. Component-level dependence graph. Explicit characterization of inputs/outputs of a component via its signature and the assignment of values to inputs on component invocations establish precise dependencies at the component level.
B3. One dependence system. With the component level dependences characterized via the same constructs (attributes, get_attr()) used for resource level dependences, Heat can construct and maintain a single dependence system and use it to orchestrate resource and component creation.
B4. Separation of concerns. Heat defines only the semantics that a component invocation starts only after all its inputs are available. It leaves to the component provider (CM tool) how component invocations are handled (reading/writing of inputs/outputs) and executed. The component provider would use the component signature to derive the names of the inputs and outputs and read or write them through the Heat metadata service.
B5. Flexible implementation choices. The Heat engine is free to use any mechanism to determine when a component invocation's inputs are available, and hence when it can be started, and to get notified when its execution completes. For example, an initial implementation could use wait conditions and signals for this. The important point is that this mechanism is not exposed to the template user: the template user only knows about, and relies on, the guarantee that a component invocation will start only after its inputs are available.
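
The name-matching validation described in B1 could look roughly like the Python sketch below. The function name validate_invocation and the error message format are hypothetical, and the sketch treats every declared input as required, which the proposal does not spell out.

```python
def validate_invocation(signature, invocation):
    """Return a list of error strings; an empty list means the invocation is valid."""
    declared = set(signature["properties"])
    assigned = set(invocation.get("params", {}))
    errors = [f"missing input: {name}" for name in sorted(declared - assigned)]
    errors += [f"unknown input: {name}" for name in sorted(assigned - declared)]
    return errors


install_mysql_signature = {
    "properties": ["enabled_flag", "ensure_run_flag"],
    "attributes": ["mysql_port"],
}

# An invocation with a typo in one input name (enable_flag).
errors = validate_invocation(
    install_mysql_signature,
    {"name": "install_mysql",
     "params": {"enable_flag": "true", "ensure_run_flag": "true"}},
)
```

A single typo shows up twice, once as a missing input and once as an unknown one, which makes the cause easy to spot in the report.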