Ceilometer/Graduation
Latest revision as of 16:31, 26 February 2013

Key take-aways

  • Concentrated on metering stability & reach for Grizzly
  • Prioritized core maturity over mission creep
  • Built up a growing, diverse & sustainable developer community
  • Deep & involved design discussion reflects a vibrant project attracting wide interest
  • Track record of following OpenStack community best practices
  • Proven in production deployments, with more scaled-up roll-outs in the pipeline
  • Aiming for a flexible data acquisition layer to enable future monitoring support
  • Demonstrated openness to joining forces with related projects

Why we think we're ready

  • Deployed and in use at many sites
    • DreamHost
    • eNovance
    • CloudWatt
  • Robust multi-purpose architecture, recently extended to support multiple publishing channels, allowing Ceilometer to become a metrics source for tools other than metering
  • Successfully passed the challenge of being adopted by 3 related projects which have agreed to join or use Ceilometer
  • Delivered Folsom within 2 weeks of release, prior to incubation
  • Successfully delivered the G2 milestone aligned with the overall project release cycle
  • Good integration with all core projects now, including Swift
  • Built up a diverse and sustainable core developer community, affiliated with multiple organizations
  • Followed OpenStack community best practices from the outset

Is our architecture stable?

Discussions with Healthnmon and sandywalsh have been deep and involved. We believe their suggestions are still achievable with only slight modifications to the Ceilometer architecture, without changing the fundamentals. The main challenge was to explain the reasoning behind our choices which, while different in their approach to the problem of collecting metrics from other projects, provide a solution that:

  • allows lean collection of metrics from supporting projects that send events on the Oslo bus
  • permits regular fetching of data via a pull mechanism for non-notification-generating projects (e.g. Swift)
  • allows more intrusive data collection for non-supported projects via an optional agent mechanism
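As a rough illustration, the three acquisition paths above can be sketched as follows. This is not actual Ceilometer code; all class and method names here are hypothetical.

```python
# Illustrative sketch (not actual Ceilometer code) of the three data
# acquisition paths: consuming notifications from the message bus,
# polling projects that emit no notifications, and an optional
# per-node agent for everything else.

class NotificationListener:
    """Lean path: turn bus notifications into samples as they arrive."""
    def handle(self, event):
        # e.g. event = {"event_type": "compute.instance.create.end", ...}
        return {"meter": event["event_type"], "volume": 1}

class Poller:
    """Pull path: fetch counters on a timer from projects that send no
    notifications (e.g. Swift account statistics)."""
    def poll(self, api):
        return [{"meter": "storage.objects", "volume": api.object_count()}]

class ComputeAgent:
    """Intrusive path: optional agent inspecting the hypervisor layer
    directly on each node for otherwise unsupported data."""
    def inspect(self, hypervisor):
        return [{"meter": "cpu", "volume": hypervisor.cpu_time()}]
```

Whichever path a sample comes from, it ends up in the same normalized form, which is what lets the publishing layer treat all three sources uniformly.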


The same measurements can then be retrieved at different intervals and republished to multiple destinations through a YAML-based configuration mechanism, thus allowing seamless integration of multiple projects around Ceilometer without forcing the use of a single database and API.
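A minimal sketch of the multi-publishing idea: one measurement stream fanned out to several destinations at configurable intervals. The pipeline structure below only mimics the shape a YAML file might declare; it is not the real Ceilometer configuration schema, and the publisher URLs are invented.

```python
# Hypothetical multi-publishing sketch: each pipeline names an interval
# and a set of publishers; a sample is fanned out to every pipeline
# whose interval has elapsed. Not the real Ceilometer config schema.

PIPELINES = [
    {"name": "metering",   "interval": 600, "publishers": ["rpc://collector"]},
    {"name": "monitoring", "interval": 60,  "publishers": ["udp://statsd:8125"]},
]

def destinations(pipelines, elapsed_seconds):
    """Return the publishers that fire after `elapsed_seconds`."""
    sent = []
    for pipeline in pipelines:
        if elapsed_seconds % pipeline["interval"] == 0:
            sent.extend(pipeline["publishers"])
    return sent
```

At 60 seconds only the monitoring sink fires; at 600 seconds both sinks receive the same samples, without either consumer dictating a shared database or API.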

[Image: Ceilometer-multi-publish2.jpg]

Some aspects of the architecture are still emerging (such as database schemas for metadata/rich data and aggregation), and all groups have committed to working on a mutually acceptable solution.

In the last cycle (Grizzly)

  • While we've made changes in Grizzly, they have been incremental
  • Incremental tweaks for performance
  • Cleaner nova interaction model (still based on polling the hypervisor layer)
  • New API, but as a refinement based on experience with the old one; the old one will still be maintained until the J release
  • Added one new query method to the storage drivers to support the new API
  • Multi-publishing maintained the old publishing system and added the potential for new output types


In the next version (Havana)

  • Nothing that has been proposed will require a complete rewrite, as far as we can tell
  • Rackspace has proposed changing the way data is collected from the compute nodes, which (if they can get it working) may allow us to drop code in our compute agent
  • Multi-publishing eliminates the need to rewrite to avoid the message bus

Scope & Complementarity

  • Enabling reuse of metering for other purposes is good code reuse, not feature creep
  • Metric collection is definitely in scope; alerting evaluation we'll be figuring out at the next summit
  • No known missing integration points
    • authenticates with Keystone
    • plugin for Horizon underway
    • potential reuse of metrics by the nova scheduler in the future
  • Reaching out to bring related projects into the fold (StackTach, Synaps, Healthnmon)
  • Future monitoring support is complementary to the Heat requirement:
    • a natural replacement for the existing simple CloudWatch implementation in Heat
    • Heat will report stack usage metrics to Ceilometer and consume alerting notifications from Ceilometer

Detailed scoping of future monitoring support

We think of it in terms of a multi-phase pipeline pattern, where some (but not all) aspects of what is traditionally lumped into the monitoring/alerting bucket would be addressed by Ceilometer.

[Image: Ceilometer-monitoring-scope.jpeg]

Phases #1-5 are considered in scope for Ceilometer, to avoid duplication and to effectively reuse the infrastructure we already have in place for metering.

Elements #6 and #7 are also logical for Ceilometer to provide in the future, in terms of addressing the Heat requirement.

However, elements #8 (except in the more trivial notification case) and #9 would be considered out of scope.

Testimonials

  • Rackspace
    • Rackspace has committed considerable resources to bringing the lessons learned from developing and deploying StackTach into Ceilometer (StackTach will be retired once comparable functionality is available). Rax has documented some of their discussion points (here: http://wiki.openstack.org/RaxCeilometerRequirements) and we will be working through those in the public channels.
  • Red Hat
    • Red Hat has committed two core developers to the project during the Grizzly release cycle, concentrating on rationalizing the nova interaction model, helping shape the v2 API, packaging for Fedora/EPEL/RHEL, and laying the groundwork for metrics & monitoring support. We expect this commitment to continue into the Havana cycle.
  • DreamHost
    • DreamHost has committed a developer to the project from its inception, and is using Ceilometer to collect metering data from its public cloud for billing and capacity planning.
  • HP
    • HP has committed several developers to leverage Ceilometer metering, and to integrate the health and monitoring support provided by Healthnmon into the Ceilometer project.