
Difference between revisions of "RaxCeilometerRequirements"

Line 10: Line 10:
  
 
* Once the data is collected, it appears to go back into another RabbitMQ queue for the Ceilometer Collector to process. The Collector stores the raw data potentially many times, as different metrics: for example, there are separate objects for CPU, IP, disk, bandwidth, etc. I don't really know what that buys us compared with raw + rolled-up data. Much of the raw data is then discarded, whereas we need to keep it. Getting a single picture of an instance may require many queries (we need Request ID, Instance ID, Host, Tenant ID and time range).
 +
** Requires new blueprint
 +
** May affect: https://blueprints.launchpad.net/ceilometer/+spec/synaps-dimensional-decomposition
 
* There is no double-entry accounting. What is stored is the raw event, not a consolidated record. The question of polling the hypervisor (HV) directly as a second source is still open.
 +
** Requires new blueprint
 
* The Ceilometer Compute Agent is not hypervisor-independent. We need support for [[XenServer]]. Additionally, we feel this data can all be collected via the existing notifications (and if not, Nova should be fixed to provide the required data). This calls into question the need for the Compute Agent in the first place.
 +
** Affected blueprints:
 +
*** https://blueprints.launchpad.net/ceilometer/+spec/remove-nova-imports
 +
*** https://blueprints.launchpad.net/ceilometer/+spec/xenapi-support
 
* We need to do post-processing on the raw data beyond the initial collection. We need the queue after the initial collection to allow for a "settling time" and for multiple workers (otherwise there is a bottleneck or staggered data).
 +
** Affected blueprints:
 +
*** https://blueprints.launchpad.net/ceilometer/+spec/multi-publisher
 +
*** https://blueprints.launchpad.net/ceilometer/+spec/cw-publish
 +
*** https://blueprints.launchpad.net/ceilometer/+spec/synaps-alarm-evaluation
 
* We need support for error notifications, and we need to capture full state from all .start/.end messages. As it stands today, there could be significant miscounts.
 +
** Requires new blueprint
 
* We need millisecond timing resolution regardless of the database used.
 +
** Requires new blueprint
 
* Stop using the Nova RPC mechanism, which ACKs immediately regardless of whether the event was properly handled.
 +
** Affected blueprints:
 +
*** https://blueprints.launchpad.net/ceilometer/+spec/remove-nova-imports
 +
*** https://blueprints.launchpad.net/ceilometer/+spec/move-listener-framework-oslo
 +
* Extend the API to include [[StackTach]]-like operations for KPIs, etc.
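
The requirement above about not acknowledging a message before it has been handled can be illustrated with a minimal Python sketch. It does not use the Nova RPC code or any real broker client: `FakeMessage`, `consume` and `store_event` are hypothetical names introduced only for illustration, and the requeue-on-failure policy is an assumption rather than something this page specifies.

```python
class FakeMessage:
    """Hypothetical stand-in for a broker message with ack/requeue methods."""

    def __init__(self, body):
        self.body = body
        self.acked = False
        self.requeued = False

    def ack(self):
        self.acked = True

    def requeue(self):
        self.requeued = True


def consume(message, handler):
    """Acknowledge only after the handler succeeds; otherwise requeue."""
    try:
        handler(message.body)
    except Exception:
        # Handling failed: hand the message back so the event is not lost.
        message.requeue()
        return False
    message.ack()
    return True


def store_event(body):
    # Hypothetical handler: reject events that carry no instance ID.
    if "instance_id" not in body:
        raise ValueError("missing instance_id")


good = FakeMessage({"instance_id": "abc", "event_type": "compute.instance.create.end"})
bad = FakeMessage({"event_type": "compute.instance.create.end"})
consume(good, store_event)
consume(bad, store_event)
```

In this sketch the well-formed event is acknowledged, while the malformed one is returned to the queue instead of being silently dropped. A real deployment would probably also need a retry limit or a dead-letter queue so that a permanently bad message does not loop forever.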
  
 
Other Minor Nits:
  
 
* Ceilometer uses the openstack.common library extensively ... I'm not sure what this really buys us. There seems to be a lot of boilerplate needed just to work with it. It could be a lot easier.

Revision as of 19:14, 14 January 2013

We plan to bring the following functionality to Ceilometer:

  1. Double-entry accounting verification of OpenStack usage before handoff to billing (finance is our primary customer).
  2. 90+ days storage of high-volume raw notifications (planning for at least 2 billion rows).
  3. Secondary aggregation/rollups of the raw data with support for third-party hooks into the notification pipeline (tricky with schema changes).
  4. Support for downstream consumers via PubSubHubbub/Atom mechanisms (such as AtomHopper).
  5. Monitoring of instance state for detailed debugging and SLA tracking.
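
As a rough illustration of item 1, double-entry verification can be thought of as reconciling usage derived from notifications against a second, independent source before anything is handed to billing. The Python sketch below assumes that the second source is a hypervisor poll (which this page treats as an open question) and that usage has already been reduced to instance-hours per instance ID. The function name `reconcile`, the `tolerance` parameter and the sample data are hypothetical.

```python
def reconcile(notification_usage, hypervisor_usage, tolerance=0):
    """Compare two independent usage sources keyed by instance ID.

    Returns a dict mapping instance ID to (notification_value, hypervisor_value)
    for every instance that is missing on one side or whose values differ by
    more than `tolerance`.
    """
    discrepancies = {}
    for instance_id in sorted(set(notification_usage) | set(hypervisor_usage)):
        from_events = notification_usage.get(instance_id)
        from_hv = hypervisor_usage.get(instance_id)
        if from_events is None or from_hv is None:
            # Seen by only one source: always flag for review.
            discrepancies[instance_id] = (from_events, from_hv)
        elif abs(from_events - from_hv) > tolerance:
            discrepancies[instance_id] = (from_events, from_hv)
    return discrepancies


# Hypothetical instance-hours for one billing period.
from_notifications = {"inst-1": 24, "inst-2": 12, "inst-3": 6}
from_hypervisor = {"inst-1": 24, "inst-2": 10, "inst-4": 3}
mismatches = reconcile(from_notifications, from_hypervisor)
```

Only the instances in `mismatches` would need investigation before handoff; matching entries such as `inst-1` pass through unchanged.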

These requirements imply changes to Ceilometer, specifically:

Other Minor Nits:

  • Ceilometer uses the openstack.common library extensively ... I'm not sure what this really buys us. There seems to be a lot of boilerplate needed just to work with it. It could be a lot easier.