Heat/PolicyExtension

Revision as of 16:12, 28 April 2014 by Mike Spreitzer (talk | contribs) (Hypothetical API)

Policy Extension to Heat Templates

This page describes a proposed extension to Heat templates allowing expression of various sorts of policies relevant to infrastructure. First is a description of the information model, which focuses on what is being said without getting into details of concrete syntax. Next are some use cases. Next is a proposal for concrete syntax. Lastly there are some examples using the concrete syntax.

A Word About Networking

You will see relatively little concern in this document's policy catalog for network resources. The policy catalog here is a fairly direct dump of work my group has done earlier, using OpenStack but not Heat. In this work, the focus is on declaring what is needed about and between endpoints, with the expectation that the template processor will invent network resources as appropriate to get those jobs done. The ability to apply policies to and between groups enables concise expression of many atomic relationships.

In the roadmap to a policy-enabled future, we may not want to start with a big shift in the way networking is handled; however, other parties are also interested in making this shift. The policy-enabling roadmap starts with compute-only work, and later encompasses networking; it remains to be seen what the networking landscape will look like at that time.

Information Model

Generalities

This extension introduces three concepts in templates: groups, policies, and relationships.

This extension also introduces a meta-model for the datacenters in which templates will be instantiated, to support the semantics of some particular types of policies as well as to support smart placement in general.

Datacenter Meta-Model

A datacenter is modeled as a pair of trees that share leaves. The leaves are the physical resources that can host the non-networking virtual resources (such as VM instances and storage volumes) that are prescribed by templates. One tree describes a hierarchy of physical location; the other tree is an approximation of the network connectivity. Neither tree needs to be of uniform depth. Each vertex in the location tree is labeled with a level, drawn from a vocabulary established by the datacenter or some more central authority. Open question: who defines these vocabularies? The levels stand in a partial order; we say some levels are more fine grained or more coarse grained than other levels. The usage of levels in the location tree must be consistent with this partial order. Thus, the root might be labeled with level=room while a leaf might be labeled with level=PM. The ordering is only partial because some levels might be incommensurate. For example, level=PM might host only VMs while level=StorageBox might host only storage volumes.
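
The location-tree side of this meta-model can be sketched in a few lines. This is an illustrative model with hypothetical names (`LocationNode`, `FINER_THAN`), not a proposed API; the point is only that the levels along any root-to-leaf path must respect the partial order, going from coarser to finer grain.

```python
# Partial order on levels: each coarser level maps to the set of strictly
# finer levels that may appear beneath it. Incommensurate levels (e.g. PM
# vs. StorageBox) simply never appear in each other's sets.
FINER_THAN = {
    "room": {"rack", "chassis", "PM", "StorageBox"},
    "rack": {"chassis", "PM", "StorageBox"},
    "chassis": {"PM"},
}

class LocationNode:
    def __init__(self, name, level, children=()):
        self.name = name
        self.level = level
        self.children = list(children)

def levels_consistent(node):
    """True if every child's level is strictly finer than its parent's."""
    for child in node.children:
        if child.level not in FINER_THAN.get(node.level, set()):
            return False
        if not levels_consistent(child):
            return False
    return True

# A room containing one rack of two PMs plus a storage box; note the tree
# is not of uniform depth.
room = LocationNode("room1", "room", [
    LocationNode("rack1", "rack", [LocationNode("pm1", "PM"),
                                   LocationNode("pm2", "PM")]),
    LocationNode("sb1", "StorageBox"),
])
```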

We use a tree to approximate the network not because we believe datacenter networks really are trees but rather in hope that a tree is a good enough approximation for our management purposes here.

To support placement with an eye toward avoiding overloads, each physical resource that can host virtual resources is characterized by a capacity vector. Each virtual resource that can be hosted on such a physical resource has a demand vector of the same length as the host's capacity vector; this demand vector is essentially the allocation of capacity that is planned when the virtual resource is placed. The non-overload constraint is that the sum, of all the virtual resources placed in a host, of their demand vectors cannot exceed the host's capacity vector; this test must pass at each position in the vectors. The particulars of the demand and capacity vectors used in the core placement algorithm can be synthetic things invented to encode various surface aspects of the placement problem. The exposed and implicit components of these vectors are defined by the datacenter or a more central authority. Open question: who?
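
The non-overload constraint is a simple componentwise test; a minimal sketch (function and variable names are illustrative, and the example components are assumptions, not a defined vector layout):

```python
def fits(capacity, demands):
    """True if the componentwise sum of the demand vectors does not
    exceed the host's capacity vector at any position."""
    totals = [0] * len(capacity)
    for d in demands:
        for i, x in enumerate(d):
            totals[i] += x
    return all(t <= c for t, c in zip(totals, capacity))

# Hypothetical components: (vCPUs, MiB RAM, GiB disk).
host_capacity = [16, 32768, 500]
vm_demands = [[4, 8192, 100], [8, 16384, 200]]

fits(host_capacity, vm_demands)                       # True: 12/24576/300 fit
fits(host_capacity, vm_demands + [[8, 8192, 100]])    # False: 20 vCPUs > 16
```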

To support network overload avoidance, each edge in the network tree has a bandwidth capacity.

To support license types that need it, the model of a physical machine identifies the type/configuration of the machine in enough detail to allow correct calculation of license charges for software on that machine.

Groups, Policies, and Relationships

Groups are used to group the resources of a template together for concise application of policies and relationships. The resources of the template are partitioned among some groups, and those groups are members of other groups, and so on. In a given template, the groups are arranged into a tree and the leaf groups contain the resources. The tree need not be of uniform depth.

There are different types of policies, each type with a specific meaning and possibly taking type-specific parameters.

Each policy is either hard or soft --- that is, a requirement or a preference.

A policy type is either primitive or synthetic. A synthetic policy is a shorthand for a type-specific assembly of other policies.

A primitive policy is something that, in its essence, is about one or two resources. A primitive policy type is either monadic, meaning its essential statement is about a single resource, or dyadic, meaning its essential statement is a relationship between two resources. For a monadic example, consider a type of policy that takes an identifier of a licensed product as a parameter and declares the presence of that product in the resource. For a dyadic example, an anti-collocation policy might take a physical hierarchy level parameter and be applied to two VMs to mean that their locations should differ at the given level. Some dyadic types of policies are symmetric, others are not. For example, an anti-collocation policy is symmetric; an endpoint accessibility policy (i.e., firewall rule) is asymmetric.

While a primitive policy's essence is about a resource or two, the actual applications of policies can be at higher levels of organization. Each higher-level application of a primitive policy is a shorthand for many lower-level applications of the policy. There are two high-level ways to apply a policy. One is to apply a policy to a group; you would not want to do this for an asymmetric primitive policy. The other, which makes sense only for dyadic policies, is to apply a policy to a pair of groups.

Open Question: who groups external resources? A dyadic policy in a template can relate one of the template's resources with a pre-existing resource outside the template. What about groups of external resources? Do we allow a template to define groups of external resources, or allow access to groups defined in existing stacks?

Applying a monadic primitive policy to a group is shorthand for applying the policy to every relevant member of the group. Thus, after recursion, this amounts to applying the policy's essential statement to every relevant resource covered by the group. A given policy is relevant to only certain types of resources. For example, a policy declaring presence of a licensed product might be relevant only to Compute instances.

Applying a symmetric dyadic primitive policy to a group is a shorthand for applying the policy to every distinct unordered pair of relevant members of the group. For example, applying a dyadic policy to a group of the resources {A, B, C} is shorthand for applying that policy to {A, B}, to {B, C}, and to {C, A} (if they are all relevant). For a more complicated example, consider a tree of three groups: GP has members GL and GR, while GL has members {R1, R2} and GR has members {R3, R4}. Applying a dyadic policy to group GP is shorthand for applying that policy to the unordered pair {GL, GR}.

Applying a symmetric dyadic primitive policy to an unordered pair of groups {A, B} is equivalent to applying that policy to the ordered pair (A, B) and also equivalent to applying it to (B, A).

Applying a dyadic primitive policy to an ordered pair (A, B), where A and B are either groups or resources, is shorthand for applying that policy to every ordered pair (X, Y) where (1) X is A itself (if A is a relevant resource) or a relevant member of A (if A is a group) and (2) Y is B itself (if B is a relevant resource) or a relevant member of B (if B is a group). For example, if group GL's relevant members are resources R1 and R2 while group GR's relevant members are resources R3 and R4, applying a dyadic primitive policy to (GL, GR) amounts to applying the policy to (R1, R3), to (R1, R4), to (R2, R3), and to (R2, R4).
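
The two expansions above amount to unordered pairs within a group and a Cartesian product between groups. A small sketch (helper names are illustrative):

```python
from itertools import combinations, product

def expand_symmetric(members):
    """Symmetric dyadic policy applied to a group: every distinct
    unordered pair of (relevant) members."""
    return list(combinations(members, 2))

def expand_ordered(a_members, b_members):
    """Dyadic policy applied to an ordered pair of groups: every
    ordered pair (X, Y) with X from the first and Y from the second."""
    return list(product(a_members, b_members))

expand_symmetric(["A", "B", "C"])
# -> [("A", "B"), ("A", "C"), ("B", "C")]

expand_ordered(["R1", "R2"], ["R3", "R4"])
# -> [("R1", "R3"), ("R1", "R4"), ("R2", "R3"), ("R2", "R4")]
```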

How Policies Combine

Policies combine in the natural way for their semantics. Orthogonal policies combine by conjunction. Policies that allow networking combine by disjunction. Allowing connectivity implicitly denies related connectivity that is not explicitly allowed. That is, when a policy allows some network connectivity from resource A in stack S to resource B in stack T, all the networking from S to T is denied except that which is explicitly allowed by some policy.
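
The default-deny consequence can be illustrated with a toy check (all names and the rule representation are hypothetical): once any policy allows some flow from S to T, only the explicitly allowed flows pass.

```python
def is_allowed(allow_rules, src, dst, protocol, port):
    """allow_rules: set of (src, dst, protocol, port) tuples that some
    policy explicitly allows. Everything else between the two stacks is
    implicitly denied."""
    return (src, dst, protocol, port) in allow_rules

rules = {("A", "B", "tcp", 443)}

is_allowed(rules, "A", "B", "tcp", 443)  # True: explicitly allowed
is_allowed(rules, "A", "B", "tcp", 80)   # False: implicitly denied
```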

Catalog of Policy Types

Licensed Product Presence

A policy of this monadic primitive type takes two string parameters. The first identifies a catalog of licensed products, and the second is a product identifier drawn from that catalog. Note that a given catalog can contain products from many vendors. A vendor of management software typically has a catalog of products from many sources that can all be managed by the management software. For this reason and others, it is not expected that all the products in a catalog use the same type of license; the catalog identifies the applicable license type for each product.

Open Question: who defines the catalog of catalogs?

This type of policy is relevant to placement decision-making because, for some types of license, the license charges for a collection of VMs can depend on the degree to which those VMs share physical resources.

Collocation

A policy of this symmetric primitive type takes one parameter, a level L in the location tree. When applied to resources {A, B} such a policy says that A's location and B's location should be the same at level L (and thus, implicitly, at every coarser grain level); whether they agree at finer grain levels is not specified. For example, when L=chassis this means that A and B should be placed in the same chassis. Note that hard collocation constraints at fine grained levels make live migration problematic; users will probably not want to use them.

Anti-Collocation

A policy of this symmetric primitive type takes one parameter, a level L in the location tree. When applied to resources {A, B} such a policy says that A's location and B's location should differ at level L (and thus, implicitly, every relevant finer grain level); whether they differ at coarser grain levels is not specified. For example, when L=PM this means that A and B should be hosted by different PMs and it does not matter how close together those PMs are.

LLMN Anti-Collocation

This is the only synthetic policy type, and it is monadic. A policy of this type is applied to a group of M interchangeable members to make a concise statement of multiple levels of anti-collocation. There are three parameters: L1 and L2 are physical hierarchy levels, and N is a positive integer. The meaning of such a policy is that the M members should be spread among at least N different location tree vertices at level L1, with no more than ceiling(M/N) covered by any one such vertex, and no two covered by the same level L2 vertex. For example, when applied to a group of 7 VMs, with L1=rack, L2=PM, and N=2, this means that the 7 VMs should be spread out across at least 2 racks, with no more than 4 VMs on any one rack and no two VMs on the same PM. The implementation is allowed to introduce inhomogeneous treatment of the members, for example to partition them into particular subgroups for placement purposes.
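
A checker for the LLMN statement can be sketched as follows (the placement representation and names are assumptions for illustration): at least N distinct level-L1 vertices, at most ceiling(M/N) members under any one of them, and no two members under the same level-L2 vertex.

```python
from collections import Counter
from math import ceil

def satisfies_llmn(placements, n):
    """placements: one (l1_vertex, l2_vertex) pair per group member."""
    m = len(placements)
    l1_counts = Counter(l1 for l1, _ in placements)
    l2_vertices = [l2 for _, l2 in placements]
    return (len(l1_counts) >= n                       # spread over >= N L1 vertices
            and max(l1_counts.values()) <= ceil(m / n)  # <= ceil(M/N) per L1 vertex
            and len(set(l2_vertices)) == len(l2_vertices))  # distinct L2 vertices

# The example from the text: 7 VMs, L1=rack, L2=PM, N=2.
# 4 VMs on rack1 and 3 on rack2, each on its own PM, satisfies the policy.
ok = ([("rack1", f"pm{i}") for i in range(4)]
      + [("rack2", f"pm{i + 4}") for i in range(3)])
satisfies_llmn(ok, 2)  # True
```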

Network Reachability

This symmetric dyadic primitive type of policy takes no parameters. When applied to two resources it means that they can communicate on all protocols and ports.

Network Endpoint Accessibility

This asymmetric dyadic primitive type of policy takes two parameters, a set P of protocols and a set E of (protocol, port) pairs. Applying such a policy to the resource pair (A, B) means that A can initiate communication to B over any protocol in P and also to any B endpoint in E.

Network Bandwidth

This symmetric dyadic primitive type of policy takes one parameter, a number of bytes per second. Applying such a policy to resources A and B declares a network bandwidth need in each direction between A and B.

Network Hop-Count Limit

This symmetric dyadic primitive type of policy takes one parameter, a non-negative integer N. When applied to resource pair {A, B} it means that the length (in edges) of the path between A and B in the network tree should not exceed N.
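
In a tree the path length between two vertices is determined by their lowest common ancestor; a small sketch (the parent-map representation is an assumption):

```python
def path_to_root(parent, node):
    """The node itself followed by its ancestors up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path

def hop_count(parent, a, b):
    """Number of edges on the tree path from a to b:
    depth(a) + depth(b) - 2 * depth(LCA)."""
    seen = {n: i for i, n in enumerate(path_to_root(parent, a))}
    for j, n in enumerate(path_to_root(parent, b)):
        if n in seen:           # first common vertex is the LCA
            return seen[n] + j
    raise ValueError("nodes are not in the same tree")

# root -- switch1 -- {pm1, pm2};  root -- switch2 -- pm3
parent = {"switch1": "root", "switch2": "root",
          "pm1": "switch1", "pm2": "switch1", "pm3": "switch2"}

hop_count(parent, "pm1", "pm2")  # 2: via switch1
hop_count(parent, "pm1", "pm3")  # 4: via switch1, root, switch2
```

A hop-count limit of zero, as in the Hadoop use case below, forces the two resources onto the same leaf of the network tree.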

Use Cases

Multiple HA Clusters of Appservers

Consider a template that specifies a collection of 3 clusters of 4 VMs each (for a total of 12 VMs), where each VM is running an application server and all the VMs in each cluster are interchangeable. We want each cluster to be highly available. We deem a cluster to be available if at least one member is.

Suppose the location tree is rooted at the room level, with division into rack and then into individual PM.

Suppose the template includes VM instance resources named M11, ... M14, M21, ... M24, M31, ... M34.

For each cluster C, create group G${C} containing the cluster's VM instances. Put all of those groups together into one more group that is the root; call it GRoot.

For each cluster C, apply the LLMN anti-collocation constraint to group G${C} with parameters L1=rack, L2=PM, and N=2; M=4 in these cases.

Hadoop with Direct Attached Storage

A network hop-count limit of zero between a VM instance and its associated storage volume means that the volume should be hosted on directly attached storage.

Concrete Syntax Proposal A

We need notation for three new concepts: (1) resource grouping, (2) a relationship between one group or resource and another, and (3) applying policies to relationships or to individual groups or resources. Here is one proposal.

Proposal A Definition

This proposal is mid-way between two extremes. It leaves the existing resources section of a HOT unchanged in overall structure and usage, while introducing a groups section to hold the definitions of the groups. One extreme would be to treat groups as another kind of resource; that would be misleading, because a group is not a subject for infrastructure orchestration and cannot be implemented in the way that is usual for a resource in today's Heat engine. The other extreme would be to remove the resources section in favor of including the resource definitions inline in their respective groups.

Groups can be nested. This nesting relationship induces a tree structure, where the resources are the leaves and the groups are the internal vertices. This is distinct from the other relationships that are explicitly constructed for the purpose of applying policies; these other relationships can involve arbitrary graph structures and are not limited to isolation in one vertex of the grouping tree.

The group definitions are nested in the same way that the groups are; the referenced resources are defined elsewhere (in the resources section) and references to them appear in the group definitions. Each resource is directly contained in exactly one group (which may, of course, be directly contained in exactly one other group, and so on --- up to the root, which is NOT contained in any group).

In the following discussion of the concrete syntax, we use JSON concepts and notation. YAML uses the same JSON data concepts and generalizes the notation. What is important here is the data contents, not the surface syntax used in the template. We do not care about the ordering of fields in a JSON Object (i.e., map). We write ( thing )? to mean that thing may or may not appear, and write ( stuff )* to mean that stuff may be repeated zero or more times.

Thus, we generalize the contents of a HOT to allow the new groups section:

HOT ::= { ... (, "groups" : Group )? ... }

where the syntax of a group definition is:

Group ::= { "id": TID, "members": [ MemberList ] (, "metadata": JsonObject )? (, "policies": Policies )? (, "relationships": Relationships )? }

Note that a group can have general metadata attached. A member list is a list of group definitions and resource references:

MemberList ::= ( Member ( , Member )* )?
Member ::= Group | { "get_resource": TID }

A TID is a "Template ID" --- a name used in the template:

TID ::= String

In contrast, an RID is a "Resource ID" --- something handed out by Nova, Cinder, and so on:

RID ::= String

The relationships of something are expressed in a JSON list:

Relationships ::= [ ( Relationship (, Relationship )* )? ]

A relationship identifies the thing on the other end of the relationship and expresses the set of policies that apply to the relationship:

Relationship ::= { "related" : Reference, "policies": Policies }
Reference ::= { "get_resource": TID } | { "get_group": TID } | RID

The policies of a thing are expressed as a JSON list:

Policies ::= [ Policy (, Policy )* ]

A policy is expressed by identifying its type, optionally some additional type-specific properties, and optionally general metadata:

Policy ::= { "type": String (, "properties": JsonObject )? (, "metadata": JsonObject )? }

We also allow a resource to include policies and/or relationships:

Resource ::= { ... (, "policies": Policies )? (, "relationships": Relationships )? ... }

Volume attachments are a special case: they are logically relationships, but they pre-date this work and are defined as resources. We allow policies that make sense for a relationship to be attached to a volume attachment resource.
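
The Group grammar above can be exercised with a rough structural validator. This is a sketch against the grammar as written (field names taken from the productions above; the code is not the Heat engine's actual validation, and it checks structure only, not policies or relationships):

```python
def valid_group(g):
    """Check a Group per the grammar: required "id" and "members",
    optional "metadata", "policies", "relationships", nothing else."""
    if not isinstance(g, dict) or "id" not in g or "members" not in g:
        return False
    allowed = {"id", "members", "metadata", "policies", "relationships"}
    if set(g) - allowed:
        return False
    return all(valid_member(m) for m in g["members"])

def valid_member(m):
    """A Member is either a resource reference or a nested Group."""
    if isinstance(m, dict) and set(m) == {"get_resource"}:
        return True
    return valid_group(m)

# A fragment shaped like the Hadoop example below.
example = {"id": "hadoop-1.1.2-wcc",
           "members": [{"id": "datanodes",
                        "members": [{"get_resource": "WCA4-hadoop-datanode-1"}]}]}
valid_group(example)  # True
```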

Proposal A Concrete Examples

Hadoop on Five VMs

With policies to: put the VMs on distinct hosts (OS::AntiCoLocation), make each volume the only one on its host disk (OS::VolExclusive), allow use of the BlockDeviceDriver backend for the Cinder volume (OS::VolNotMoved), and make each volume local to the VM using it (OS::NetMaxHops).

Most of the action is in the groups, but there are also some policies attached to the volumes and volume attachments.

UserData elided for brevity; key name elided too.

{

 "heat_template_version" : "2013-05-23",
 "description" : "examples/vt_hadoop_5vm_anticolocateddatanodes",
 "resources" : {
   "WCA4-hadoop-datanode-2" : {
     "type" : "OS::Nova::Server",
     "properties" : {
       "user_data" : "...",
       "key_name" : "...",
       "flavor" : "m1.medium",
       "image" : "ubuntu12.10-with-be",
       "security_groups" : [ "test" ]
     }
   },
   "WCA4-hadoop-datanode-1" : {
     "type" : "OS::Nova::Server",
     "properties" : {
       "user_data" : "...",
       "key_name" : "...",
       "flavor" : "m1.medium",
       "image" : "ubuntu12.10-with-be",
       "security_groups" : [ "test" ]
     }
   },
   "WCA4-hadoop-datanode-4" : {
     "type" : "OS::Nova::Server",
     "properties" : {
       "user_data" : "...",
       "key_name" : "...",
       "flavor" : "m1.medium",
       "image" : "ubuntu12.10-with-be",
       "security_groups" : [ "test" ]
     }
   },
   "WCA4-hadoop-datanode-3" : {
     "type" : "OS::Nova::Server",
     "properties" : {
       "user_data" : "...",
       "key_name" : "...",
       "flavor" : "m1.medium",
       "image" : "ubuntu12.10-with-be",
       "security_groups" : [ "test" ]
     }
   },
   "$vtid_15" : {
     "type" : "AWS::EC2::VolumeAttachment",
     "properties" : {
       "Device" : "/dev/vdb",
       "InstanceID" : {
         "get_resource" : "WCA4-hadoop-datanode-4"
       },
       "VolumeID" : {
         "get_resource" : "Storage volume used by datanode server_0"
       }
     },
     "policies" : [ {
       "type" : "OS::NetMaxHops",
       "properties" : {
         "hops" : 0
       }
     } ]
   },
   "$vtid_9" : {
     "type" : "AWS::EC2::VolumeAttachment",
     "properties" : {
       "Device" : "/dev/vdb",
       "InstanceID" : {
         "get_resource" : "WCA4-hadoop-datanode-1"
       },
       "VolumeID" : {
         "get_resource" : "Storage volume used by datanode server_3"
       }
     },
     "policies" : [ {
       "type" : "OS::NetMaxHops",
       "properties" : {
         "hops" : 0
       }
     } ]
   },
   "$vtid_13" : {
     "type" : "AWS::EC2::VolumeAttachment",
     "properties" : {
       "Device" : "/dev/vdb",
       "InstanceID" : {
         "get_resource" : "WCA4-hadoop-datanode-3"
       },
       "VolumeID" : {
         "get_resource" : "Storage volume used by datanode server_1"
       }
     },
     "policies" : [ {
       "type" : "OS::NetMaxHops",
       "properties" : {
         "hops" : 0
       }
     } ]
   },
   "$vtid_11" : {
     "type" : "AWS::EC2::VolumeAttachment",
     "properties" : {
       "Device" : "/dev/vdb",
       "InstanceID" : {
         "get_resource" : "WCA4-hadoop-datanode-2"
       },
       "VolumeID" : {
         "get_resource" : "Storage volume used by datanode server_2"
       }
     },
     "policies" : [ {
       "type" : "OS::NetMaxHops",
       "properties" : {
         "hops" : 0
       }
     } ]
   },
   "WCA4-hadoop-namenode_1" : {
     "type" : "OS::Nova::Server",
     "properties" : {
       "user_data" : "...",
       "key_name" : "...",
       "flavor" : "m1.medium",
       "image" : "ubuntu12.10-with-be",
       "security_groups" : [ "test" ]
     }
   },
   "Storage volume used by datanode server_3" : {
     "type" : "AWS::EC2::Volume",
     "properties" : {
       "Size" : 300
     },
     "policies" : [ {
       "type" : "OS::VolExclusive"
     }, {
       "type" : "OS::VolNotMoved"
     } ]
   },
   "Storage volume used by datanode server_0" : {
     "type" : "AWS::EC2::Volume",
     "properties" : {
       "Size" : 300
     },
     "policies" : [ {
       "type" : "OS::VolExclusive"
     }, {
       "type" : "OS::VolNotMoved"
     } ]
   },
   "Storage volume used by datanode server_1" : {
     "type" : "AWS::EC2::Volume",
     "properties" : {
       "Size" : 300
     },
     "policies" : [ {
       "type" : "OS::VolExclusive"
     }, {
       "type" : "OS::VolNotMoved"
     } ]
   },
   "Storage volume used by datanode server_2" : {
     "type" : "AWS::EC2::Volume",
     "properties" : {
       "Size" : 300
     },
     "policies" : [ {
       "type" : "OS::VolExclusive"
     }, {
       "type" : "OS::VolNotMoved"
     } ]
   }
 },
 "groups" : {
   "id" : "hadoop-1.1.2-wcc",
   "members" : [ {
     "id" : "WCA4-hadoop-namenode_0",
     "members" : [ {
       "get_resource" : "WCA4-hadoop-namenode_1"
     } ]
   }, {
     "id" : "WCA4-hadoop-datanode-0",
     "members" : [ {
       "get_resource" : "WCA4-hadoop-datanode-1"
     }, {
       "get_resource" : "WCA4-hadoop-datanode-2"
     }, {
       "get_resource" : "WCA4-hadoop-datanode-3"
     }, {
       "get_resource" : "WCA4-hadoop-datanode-4"
     } ],
     "policies" : [ {
       "type" : "OS::AntiCoLocation",
       "properties" : {
         "level" : "compute_node",
         "hardConstraint" : false
       },
       "metadata" : {
         "description" : "AntiColocationConstraint_$vtid_17"
       }
     } ]
   }, {
     "get_resource" : "Storage volume used by datanode server_3"
   }, {
     "get_resource" : "$vtid_9"
   }, {
     "get_resource" : "Storage volume used by datanode server_2"
   }, {
     "get_resource" : "$vtid_11"
   }, {
     "get_resource" : "Storage volume used by datanode server_1"
   }, {
     "get_resource" : "$vtid_13"
   }, {
     "get_resource" : "Storage volume used by datanode server_0"
   }, {
     "get_resource" : "$vtid_15"
   } ]
 }

}


Proposal B

This is an alternative proposal for concrete syntax. It cannot work; read on to understand why.

Proposal B Definition

This proposal uses ordinary resource types and processing to add policy. Since ordinary resource types merely expose underlying APIs, we first have to be clear about those. APIs for policy are undergoing evolution, so we will have to make do with a hypothetical API for now.

Hypothetical API

We suppose the API for policy directly reflects the three key concepts identified above: groups, relationships, and policies.

We suppose the API proceeds through the following four phases.

  1. Group and Policy Definition/Update
  2. Joint Decision-Making
  3. Resource Creation/Update
  4. Final Confirmation

In the Group and Policy Definition/Update phase the client defines the policies, grouping, and relationships. For the non-group virtual resources involved, the client provides a prescription that is generally a subset of what is used in today's create/update operations; in particular, it is the subset that is relevant to scheduling.

In the Joint Decision-Making phase, the client requests that a joint scheduling decision be made for a top-level group (and it is indeed made). This scheduling decision implicitly makes a reservation of the physical resources to which the decision binds the input virtual resources. This reservation is maintained by a lease.

In the Resource Creation/Update phase the client makes the create/update calls for the individual (non-group) virtual resources involved, with reference to the group and member prescription involved so that the implementation can do as planned in the decision. In general, some of those will succeed, some will fail in various ways, and consequently some will never even be tried. Successful create/update operations, and some of the failed ones, use the physical resources that were reserved for the corresponding virtual resource; these no longer need the lease to maintain them. The client refreshes the lease if this phase takes a long time.

In the Final Confirmation phase the client makes a call that signals that the client will attempt no more creations/updates according to the decision that was made. This terminates the implicit reservation and thus enables the implementation to release the physical resource reservations that are now known to be unneeded. After Final Confirmation for a given top-level group, the client can return to the Group and Policy Definition/Update phase and proceed from there again.

We suppose a policy API with the following operations.

  • Define a Policy. Inputs are the type and parameters of the policy, output is identity.
  • Delete a Policy.
  • Apply Policy to Resource. Inputs identify a policy and a non-group resource to which the policy applies.
  • Remove Policy from Resource.
  • Apply Policy Between Two Resources. Inputs identify two non-group resources, in a particular order, and a set of policies to add to that relationship (which is implicitly created if it did not already exist).
  • Remove Policy Between Two Resources. Inputs identify two non-group resources, in a particular order, and a set of policies to remove from that relationship (which is meaningless and implicitly deleted when its set of policies becomes empty).

We suppose a group API with the following operations.

  • Declare a Group. Establishes identity, nothing else.
  • Delete a Group.
  • Add Resource Prescription to Group. Inputs provide the member type and the subset of member properties that are relevant to scheduling.
  • Add Nested Group to Group.
  • Remove Member from Group. The removed member can be either a nested group or a non-group resource.
  • Attach a Policy to a Group
  • Remove a Policy from a Group
  • Add Policy Edge. Inputs identify a "from" group, a "to" group or atomic resource, and a set of policies to add to that relationship (which might already have other policies).
  • Remove Policy Edge. Inputs identify a "from" group, a "to" group or atomic resource, and a set of policies to remove from that relationship (which is meaningless and implicitly deleted when its set of policies becomes empty).
  • Make Scheduling Decision. This operation can only be invoked on a top-level group, and it causes a joint decision about scheduling its members to be made. The details of the decision are stored, not returned.
  • Final Confirmation.

Hypothetical Syntax

Why This Is Bad

Proposal C