
Difference between revisions of "TripleO/TuskarJunoPlanning"

(33 intermediate revisions by 2 users not shown)
Latest revision as of 20:34, 4 April 2014

Overview

This planning document grew out of discussions at the TripleO mid-cycle Icehouse meetup in Sunnyvale (TripleO/TuskarJunoInitialDiscussion, OpenStack Management Features). The ideas raised there were then individually fleshed out and detailed, with much input from conversations with other OpenStack projects.

Our principal concerns during the Juno cycle are:

  • integrating further with other OpenStack services, using their capabilities to enhance our TripleO management experience
  • ensuring that Tuskar does not try to implement functionality that is better located in other projects


This document details our high-level goals for Juno. For each goal we provide:

  • a description of the goal
  • a list of project requirements and/or blueprints needed

Cloud Service Representation

A cloud service represents a cloud function - provisioning service, storage service, message broker, database server, etc. A cloud service is fulfilled by a cloud service element; for example, a user can fulfill the message broker cloud service by either choosing the qpid-server element or the rabbitmq-server element.

With the cloud service concept in play, overcloud roles are updated as follows: instead of being associated with images, they are associated with a list of cloud service elements. This provides greater flexibility for the Tuskar user when designing the scalable components of a cloud. For example, instead of being limited to a Controller role, the user can separate out the network components of that role into a Network role and scale it individually.
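The relationship between cloud services, elements, and roles can be sketched as a small data model. This is only an illustration, not Tuskar code: the `SERVICE_ELEMENTS` catalogue, the `Role` class, and the `example-network-element` name are assumptions made up for the example; only `qpid-server` and `rabbitmq-server` come from the text above.

```python
from dataclasses import dataclass, field

# Hypothetical catalogue: which elements can fulfill which cloud service.
# Only the message broker entries are taken from this document.
SERVICE_ELEMENTS = {
    "message-broker": {"qpid-server", "rabbitmq-server"},
    "network": {"example-network-element"},  # illustrative placeholder
}


@dataclass
class Role:
    """An overcloud role, defined by its list of cloud service elements."""
    name: str
    elements: list = field(default_factory=list)

    def services(self):
        """Return the sorted cloud services fulfilled by this role's elements."""
        chosen = set(self.elements)
        return sorted(
            service
            for service, candidates in SERVICE_ELEMENTS.items()
            if candidates & chosen
        )


# A Controller role that picks rabbitmq-server as its message broker, and a
# separate Network role that can be scaled on its own.
controller = Role("Controller", ["rabbitmq-server"])
network = Role("Network", ["example-network-element"])
```

Because a role is just a list of elements, splitting the network components out of the Controller role amounts to moving elements from one list to another.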

Tuskar Service Model

For Juno, Tuskar will provide role defaults or suggestions corresponding to current default roles in Icehouse, as well as an "All-in-One" role. Each default role will be automatically associated with a list of cloud service elements and a pre-created image. Later iterations of Tuskar will allow for more user customization in this area.

If possible, we would also like to support customized roles that specify their own elements. This would require the development of additional TripleO features as described in a later section (Custom Roles).

Tuskar

  • ensure existence of default images
  • create default roles, each associated with a list of elements, an image, and a Heat template

TripleO-UI

  • update deployment workflow to accommodate cloud services

Overcloud Planning and Deployment

In Icehouse, the planning stage of overcloud deployment is represented by data stored in Tuskar database tables. For Juno, we would like to remove the database from Tuskar. Instead, the planning stage of a deployment will be represented by the full Heat template that would be used to deploy it. Since Heat does not intend to be a template store, this template will be stored in Swift instead. When the overcloud is ready to be deployed, the Tuskar service will pull the template out of Swift.
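A minimal sketch of this save-then-retrieve flow follows. To keep it self-contained it uses an in-memory stand-in instead of a real Swift client; the `InMemoryObjectStore` and `PlanStore` classes, their method names, and the `overcloud-plans` container name are all assumptions for illustration.

```python
import json


class InMemoryObjectStore:
    """Stand-in for an object store such as Swift (hypothetical interface)."""

    def __init__(self):
        self._containers = {}

    def put_object(self, container, name, contents):
        self._containers.setdefault(container, {})[name] = contents

    def get_object(self, container, name):
        return self._containers[container][name]


class PlanStore:
    """Keeps the planned overcloud as a full template in the object store."""

    CONTAINER = "overcloud-plans"  # hypothetical container name

    def __init__(self, store):
        self._store = store

    def save(self, plan_name, template):
        # The template itself is the plan; no database rows are written.
        body = json.dumps(template, sort_keys=True)
        self._store.put_object(self.CONTAINER, plan_name, body)

    def load(self, plan_name):
        # Called when the overcloud is ready to be deployed.
        return json.loads(self._store.get_object(self.CONTAINER, plan_name))


plans = PlanStore(InMemoryObjectStore())
plans.save("overcloud", {"heat_template_version": "2013-05-23", "resources": {}})
template_to_deploy = plans.load("overcloud")
```

Serializing the whole template means the planning state and the deployable artifact are the same object, which is the point of dropping the Tuskar database.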

Tuskar Architecture

Once an overcloud is deployed, all Tuskar interactions with that overcloud should be done through Heat. This includes queries about running Nova instances and relevant Ironic nodes.

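Heat

  • Allow stacks to be updated without forcing the user to re-provide all parameters (https://bugs.launchpad.net/heat/+bug/1224828)
  • Stack update, allow retry from failed states (https://blueprints.launchpad.net/heat/+spec/update-failure-recovery)
  • Stack update, rolling updates (canary deployments etc.) (https://blueprints.launchpad.net/heat/+spec/rolling-updates)
  • Stack update, enable cancelling in-progress update (e.g., to pause or rollback) (https://blueprints.launchpad.net/heat/+spec/cancel-update-stack)
  • Possibly allow inline specification of provider resources, e.g., have a resource which generates a Heat template, then refer to it, like type: {get_attr: [SomeResource, template]} (may not be required by Tuskar)
  • nested resource templates
  • Stack preview (preview what would happen via stack-create) (https://blueprints.launchpad.net/heat/+spec/preview-stack) (may not be required for Juno)
  • Stack check (sync state of stack with the real state of underlying resources, e.g., persist out-of-band failures in stack resource states) (https://blueprints.launchpad.net/heat/+spec/stack-check) (may not be required for Juno)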

TripleO

  • update Heat templates in TripleO for: HOT, provider resources, software config

Tuskar

  • provide default templates for default roles (these will be wrapped in ProviderResources)
  • construct an overcloud template that references the default roles through ProviderResources
  • ensure constructed templates define the appropriate ResourceAttributes to be queried through the UI/CLI
  • given a Heat template constructed in this way, parse it in such a way as to retrieve the role specification and Heat parameters
  • rebuild Tuskar to save and retrieve Heat templates from Swift
  • update CLI as necessary
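The template construction and parsing items above can be sketched as a round trip. This is an assumption-laden illustration rather than the planned implementation: the `Tuskar::` type namespace, the function names, and the template file paths are hypothetical. The `resource_registry` environment section and the `2013-05-23` template version are existing Heat conventions.

```python
ROLE_TYPE_PREFIX = "Tuskar::"  # hypothetical namespace for role resource types


def build_environment(role_templates):
    """Map each role's resource type to its provider template file."""
    return {
        "resource_registry": {
            ROLE_TYPE_PREFIX + role: path
            for role, path in role_templates.items()
        }
    }


def build_overcloud(role_counts):
    """Build an overcloud template that references roles by resource type."""
    resources = {}
    for role, count in sorted(role_counts.items()):
        for index in range(count):
            resources[f"{role}{index}"] = {"type": ROLE_TYPE_PREFIX + role}
    return {"heat_template_version": "2013-05-23", "resources": resources}


def parse_role_counts(template):
    """Recover the role specification (role name to node count)."""
    counts = {}
    for resource in template["resources"].values():
        resource_type = resource["type"]
        if resource_type.startswith(ROLE_TYPE_PREFIX):
            role = resource_type[len(ROLE_TYPE_PREFIX):]
            counts[role] = counts.get(role, 0) + 1
    return counts


environment = build_environment(
    {"Controller": "controller.yaml", "Compute": "compute.yaml"}
)
overcloud = build_overcloud({"Controller": 1, "Compute": 2})
recovered = parse_role_counts(overcloud)
```

Keeping the role name recoverable from the resource type is what lets the template stored in Swift serve as the only record of the plan.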

TripleO-UI

  • update stack queries to take advantage of Heat ResourceAttributes (ensure that information is obtained through Heat); for example, Role
  • update deployment workflow

Custom Roles

Although default, predefined roles are the immediate goal, we would also like to allow users to define their own roles. To do so, they would specify their own list of elements. Tuskar would then use that information to call out to diskimage-builder to create a custom image matching the role, and it would create a custom Heat template for that role that the master overcloud template could then use.

If the above cannot be achieved within Juno, a simpler alternative would be to allow users to specify a custom role, and then associate it with manually created templates and images.

The remainder of the workflow is the same as with default roles.

Heat

  • Diskimage builder resource (a question was raised regarding the ability to spawn a VM to host it in the seed/undercloud), also OS::GlanceImageUploader resource

TripleO

  • given an image element, return the list of Heat parameters that it needs

Tuskar

  • given a role specification, use diskimage-builder to construct a matching image (derived from the role's image elements)
  • given a role specification, return the list of Heat parameters that it needs (derived from the role's image elements)
  • given a role specification and its Heat parameters, construct a Heat template for that role that meets those specifications
  • allow user to specify a custom role and associate it with a manually created template and image
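The parameter-derivation items above could look roughly like the following. It is a sketch under stated assumptions: the `ELEMENT_PARAMETERS` table and every parameter name in it are invented for the example, since this document does not say how elements will declare the parameters they need.

```python
# Hypothetical table of the Heat parameters each image element needs.
ELEMENT_PARAMETERS = {
    "rabbitmq-server": ["RabbitUserName", "RabbitPassword"],
    "qpid-server": ["QpidUserName", "QpidPassword"],
    "example-db-element": ["DatabasePassword"],
}


def role_parameters(elements):
    """Return the de-duplicated parameters needed by a role's elements."""
    seen = []
    for element in elements:
        for parameter in ELEMENT_PARAMETERS.get(element, []):
            if parameter not in seen:
                seen.append(parameter)
    return seen


def role_template(elements):
    """Build the parameters section of a role template (resources omitted)."""
    return {
        "heat_template_version": "2013-05-23",
        "parameters": {
            name: {"type": "string"} for name in role_parameters(elements)
        },
    }


custom_role = ["rabbitmq-server", "example-db-element"]
custom_template = role_template(custom_role)
```

The fallback described above, where the user supplies a manually created template and image, skips both functions entirely.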

TripleO-UI

  • create custom role workflow

High Availability

One of TripleO's top priorities for Juno is to allow the deployment of a High-Availability (HA) overcloud. We would like to extend Tuskar to ensure that it can be used to deploy an HA overcloud as well.

TripleO

  • Deploy HA Overcloud
  • glusterfs
  • pacemaker, corosync
  • neutron (?)
  • heat-engine A/A
  • qpid proton (assuming AMQP 1.0 support has merged into oslo.messaging and oslo.messaging has merged into each core project; if not, we will use rabbitmq)
  • etc etc

Tuskar

  • allow the generation of Heat templates that support HA

TripleO-UI

  • deployment workflow support for HA architecture

Auto-Scaling

Having the option for an auto-scaling cloud deployment would be greatly appealing to many users. Heat is actively working on auto-scaling support, as are other projects.

Ceilometer

  • Inhibit autoscaling during stack abandon/adopt (quiesce and revitalize)

Heat

  • hooks to do cleanup on scale-down (e.g., host evacuation) (https://blueprints.launchpad.net/heat/+spec/update-hooks)
  • choose victim on scale-down, or specify strategy for choosing (e.g., oldest first or newest first) (https://blueprints.launchpad.net/heat/+spec/autoscaling-parameters)
  • support complex conditionals when choosing victim on scale-down (e.g., get notification or poll metric via ceilometer related to occupancy or other application metrics); possibly handled by above, if we can get ceilometer to pass us appropriate data when signalling the scaling policy, TBC (may not be required for Juno)
  • Method to inhibit autoscaling/alarms during abandon/adopt (and suspend/resume?)
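The victim-selection strategies mentioned above (oldest first or newest first) can be illustrated with a short sketch. It is not Heat's API: the `choose_victims` function, the strategy names, and the member dictionary layout are assumptions for the example.

```python
from datetime import datetime


def choose_victims(members, count, strategy="oldest_first"):
    """Pick which group members to remove on scale-down."""
    if strategy not in ("oldest_first", "newest_first"):
        raise ValueError(f"unknown strategy: {strategy}")
    ordered = sorted(
        members,
        key=lambda member: member["created_at"],
        reverse=(strategy == "newest_first"),
    )
    return [member["id"] for member in ordered[:count]]


group = [
    {"id": "compute-1", "created_at": datetime(2014, 4, 1)},
    {"id": "compute-2", "created_at": datetime(2014, 4, 3)},
    {"id": "compute-3", "created_at": datetime(2014, 4, 2)},
]
```

The more complex conditionals in the list above would replace the creation-time sort key with a metric supplied by Ceilometer, such as occupancy.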

Tuskar

  • allow the generation of Heat templates that support auto-scaling

TripleO-UI

  • update deployment workflow to include options that allow an auto-scaled overcloud deployment

Node Management (Ironic)

We would like to switch over to using Ironic in the near future, as a host of features depend on it.

Ironic

  • Ironic graduation
  • CI jobs
  • Nova driver
  • Serial console
  • Migration path
  • User documentation
  • Autodiscovery of nodes
  • Ceilometer
  • Tagging
  • scalability

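Nova

  • create a nova filter for exact matches to Ironic nodes (including all node metadata) (https://review.openstack.org/#/c/83728/)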

TripleO

  • Allow settings for Nova Scheduler (https://review.openstack.org/#/c/84131/)

Tuskar

  • use exact match filter when deploying overcloud
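The idea behind exact matching can be sketched as a plain predicate. This is not the Nova filter under review and does not use the Nova scheduler API; the function name and the dictionary keys (`cpus`, `memory_mb`, `local_gb`, `tag`) are assumptions chosen for illustration.

```python
def exact_match(node_properties, requested):
    """Accept a node only if it matches the request exactly.

    A capacity-style filter would accept any node that is at least as
    large as the request; an exact-match filter rejects nodes that are
    larger or that carry different metadata.
    """
    return node_properties == requested


requested = {"cpus": 8, "memory_mb": 16384, "local_gb": 500, "tag": "compute"}
nodes = {
    "node-a": {"cpus": 8, "memory_mb": 16384, "local_gb": 500, "tag": "compute"},
    "node-b": {"cpus": 16, "memory_mb": 16384, "local_gb": 500, "tag": "compute"},
    "node-c": {"cpus": 8, "memory_mb": 16384, "local_gb": 500, "tag": "storage"},
}
matching = sorted(
    name for name, properties in nodes.items()
    if exact_match(properties, requested)
)
```

Rejecting the larger `node-b` keeps bare-metal machines from being consumed by roles that were planned for smaller hardware.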

TripleO-UI

  • add node tag values to flavor extra_specs

Metric Graphs

Data visualization is a key part of maintaining a cloud. We would like to start integrating Ceilometer usage into Tuskar.

Ceilometer

  • Combine samples for different meters in a transformer to produce a single derived meter
  • Rollup of coarse-grained statistics for UI queries
  • Configurable data retention based on rollups
  • Overarching "health" metric for nodes
  • Ceilometer native auth for alarm notifications (and possibly metrics in future) (https://blueprints.launchpad.net/ceilometer/+spec/trust-alarm-notifier)
  • Eliminate central agent SPoF
  • SNMP batch mode, one bulk command per node per polling cycle
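As an illustration of the rollup item above, the following sketch averages fine-grained samples into fixed-width time buckets so that a UI query reads a few coarse points instead of every sample. It is a generic example, not Ceilometer code; the `rollup` function and the `(timestamp, value)` sample layout are assumptions.

```python
from collections import defaultdict


def rollup(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of its bucket.
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# CPU utilisation samples taken every 30 seconds, rolled up per minute.
cpu_samples = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0), (120, 4.0)]
per_minute = rollup(cpu_samples, 60)
```

Retention based on rollups would then keep `per_minute` for longer than the raw samples it was computed from.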


For additional data:

  • Acquire hardware-oriented metrics via IPMI (e.g., voltage, fan speeds, etc.)
  • Keystone v3 usage would avoid IPMI credentials, allowing pollster-style interaction
  • Look into consistent hashing; see if it can be reused in ceilometer -- though it requires stateful DB
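For readers unfamiliar with the technique, here is a minimal consistent hash ring that spreads polled nodes across several polling agents. It is a generic sketch, not taken from any OpenStack project; the `HashRing` class, the agent names, and the replica count are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Assign nodes to polling agents using consistent hashing."""

    def __init__(self, agents, replicas=64):
        if not agents:
            raise ValueError("at least one agent is required")
        # Each agent is placed on the ring at several points to even out load.
        self._ring = sorted(
            (self._hash(f"{agent}-{index}"), agent)
            for agent in agents
            for index in range(replicas)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def agent_for(self, node):
        """Return the agent owning the first ring point after the node's hash."""
        index = bisect.bisect(self._keys, self._hash(node)) % len(self._ring)
        return self._ring[index][1]


full_ring = HashRing(["agent-a", "agent-b", "agent-c"])
reduced_ring = HashRing(["agent-a", "agent-b"])
```

When `agent-c` disappears, only the nodes it was polling are reassigned; nodes owned by the other agents stay where they were, which is the property that makes the technique attractive for removing a single point of failure.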

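Ironic

  • Add a periodic task to send hardware sensor data to ceilometer (https://blueprints.launchpad.net/ironic/+spec/send-data-to-ceilometer)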

TripleO

  • update undercloud images to allow for the monitoring of hardware (setup Central Agent for polling all hardware)

Tuskar

  • add SNMP agents to overcloud images

TripleO-UI

  • Add Ceilometer-based graphs and metric data

User Interfaces

This is a general category intended to encapsulate work needed in the UI and CLI.

Horizon

  • separate horizon from openstack-dashboard
  • test the replacement of lesscpy with pyScss
  • improve error handling so that messages are less generic
  • additional work on the plugin architecture to support dynamic hardware-specific views

Tuskar-CLI

  • create a tuskar-cli plugin for OpenStackClient

TripleO-UI

  • increase the modularity of views
  • investigate usage of Heat UI components in Horizon (for example, parameter form building)
  • create a mechanism for asynchronous communication