
SystemUsageData

* '''Launchpad Entry''': [[NovaSpec]]: [https://blueprints.launchpad.net/nova/+spec/system-usage-records System Usage Data]
 
* '''Created''': 3 February 2011
 
* '''Contributors''': Paul Voccio, Glen Campbell

Revision as of 15:41, 3 February 2011

Summary

Nova administrators want data on system usage for billing, chargeback, or monitoring purposes.

Release Note

System usage data is collected by Nova and provided via a rich Atom API. Usage data includes utilization of bandwidth, hard disk, and RAM, along with other important events such as the creation or teardown of servers. If configured, usage data is provided via PubSubHubbub (PSH) for efficient distribution of data to subscribers.

Rationale

As systems integrators, we need third-party systems to be able to query usage information from Nova so that we can determine how to charge customers for use of the platform.

Enterprise customers also need usage data, though for different reasons. For example, a cloud administrator may want to identify VMs that have not been used in several months so that they can be reclaimed and their resources reallocated. Bandwidth statistics for individual VMs could be used to optimize load balancing and deployment efficiency. Some IT departments may also implement a quota system for their users.

Note that Nova itself should not be concerned with billing, but it needs to collect the usage data and aggregate it for a requested time period, as well as potentially reset any counters involved.

User stories

As a systems integrator, I need to retrieve usage data so that I can properly bill my customers.

As a systems integrator or enterprise cloud administrator, I want to monitor usage data over time so that I can optimize the utilization of my resources.

As a business or agency that supports multiple projects, I need to account for the resources consumed by each project so that I can meet accounting and budget standards.

As a systems integrator, I need to provide usage data to multiple third-party systems without building custom interfaces to each one so that I can efficiently utilize my manpower.

Assumptions

Usage data is retrieved by Account ID as defined in the openstack-accounting blueprint.

There is a billing system that is not a part of OpenStack. Invoicing, billing, and customer management are handled externally to OpenStack/Nova. Nova only has knowledge of an "account ID" as per the openstack-accounting blueprint.

The "account ID" is not the same as the existing project_id; a single account ID may have multiple projects under it.

If Nova is installed in multiple regions (also called "zones"), usage data will not span zones; even if the same account ID is used in multiple zones, that data must be aggregated outside Nova.

Design

Event Data

We need a usage model that records a usage entry every time an action ("event") happens on a VM. Events include creation, resizes, deletions, adding or removing IPs, etc. In theory, every event should be logged; in practice, it may be sufficient to focus on the events that are of significant interest to the OpenStack community.

Event data would include fields similar, though not strictly limited, to the following:

Data Type Name
t.integer instance_id
t.integer ram_size
t.integer disk_size
t.integer account_id
t.integer used_tx
t.integer used_rx
t.integer has_backups
t.integer extra_ips
t.datetime started_at
t.datetime ended_at
t.datetime created_at
t.datetime updated_at
t.string options
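As a minimal sketch, the fields above could be modelled as a Python dataclass. The field names come from the table; the Python types, the UTC timestamps, the units noted in comments, and the event_type field (added here only to distinguish creates, resizes, deletes, and so on) are assumptions, not part of this proposal.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    """Timezone-aware UTC timestamp (assumed convention)."""
    return datetime.now(timezone.utc)


@dataclass
class UsageEvent:
    """One usage record, written each time an event happens on a VM."""
    instance_id: int
    account_id: int
    event_type: str        # hypothetical: not in the proposed table
    ram_size: int          # units are not specified by the proposal
    disk_size: int         # units are not specified by the proposal
    used_tx: int = 0       # transmitted traffic (unit assumed: bytes)
    used_rx: int = 0       # received traffic (unit assumed: bytes)
    has_backups: int = 0   # integer flag, as in the table
    extra_ips: int = 0
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)
    options: str = ""      # free-form string for extra attributes
```

A creation event would then be recorded as, for example, `UsageEvent(instance_id=42, account_id=7, event_type="create", ram_size=2048, disk_size=40)`.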

This could potentially be made generic by specifying the attributes to be tracked and recorded in a configuration file. Ideally, the usage tracking API should be extensible so that users may track specific usage data of interest to themselves without the need to modify the core Nova system.
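The configurable-attribute idea could look roughly like this, assuming a hypothetical TRACKED_ATTRIBUTES setting read from a configuration file and JSON as the encoding for the free-form options column. Neither the setting name nor the encoding is defined by this spec.

```python
import json
from typing import Any, Dict, Iterable

# Hypothetical configuration value: the attribute names an operator
# wants recorded for every event (not a real Nova flag).
TRACKED_ATTRIBUTES = ["ram_size", "disk_size", "extra_ips"]


def build_usage_record(instance: Dict[str, Any],
                       tracked: Iterable[str] = TRACKED_ATTRIBUTES) -> Dict[str, Any]:
    """Copy only the configured attributes from an instance snapshot.

    Attributes missing from the snapshot are recorded as None instead of
    raising, so adding a name to the configuration never breaks recording.
    """
    return {name: instance.get(name) for name in tracked}


def serialize_options(record: Dict[str, Any]) -> str:
    """Pack a record into the free-form 'options' string column as JSON."""
    return json.dumps(record, sort_keys=True)
```

Because the tracked names live in configuration, adding a new attribute means editing the configuration rather than the core recording code.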

API

API RETURN FORMAT

The usage data API will return Atom 1.0-formatted data.

The proposed method for returning structured data is to embed Open Data Protocol (OData) properties within the Atom feed.
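A rough illustration of a single usage entry, built with Python's xml.etree.ElementTree and the OData v2 namespaces commonly used in Atom payloads (m: for metadata, d: for data properties). The entry layout, the "usage" title, and the property names are illustrative assumptions; the spec only says that OData is embedded in the Atom feed.

```python
import xml.etree.ElementTree as ET
from typing import Dict

ATOM = "http://www.w3.org/2005/Atom"
# OData v2 namespaces for properties embedded in Atom <content>.
ODATA_M = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"
ODATA_D = "http://schemas.microsoft.com/ado/2007/08/dataservices"

ET.register_namespace("", ATOM)
ET.register_namespace("m", ODATA_M)
ET.register_namespace("d", ODATA_D)


def usage_entry(entry_id: str, updated: str, properties: Dict[str, object]) -> ET.Element:
    """Build one Atom <entry> whose <content> carries OData-style properties."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "usage"
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated  # RFC 3339 timestamp
    content = ET.SubElement(entry, f"{{{ATOM}}}content", {"type": "application/xml"})
    props = ET.SubElement(content, f"{{{ODATA_M}}}properties")
    for name, value in properties.items():
        prop = ET.SubElement(props, f"{{{ODATA_D}}}{name}")
        if isinstance(value, int):
            prop.set(f"{{{ODATA_M}}}type", "Edm.Int64")  # typed OData value
        prop.text = str(value)
    return entry
```

Serializing with `ET.tostring(entry, encoding="unicode")` yields an entry that a generic Atom reader can still display, while OData-aware clients read the typed properties.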

API CACHING

Responses to GET requests MUST include a Cache-Control header.

The default cache time (in seconds) MUST be configurable by a system-wide setting. For example, if the system setting for cache time is 60 seconds, then the response to an API GET request should include:

Cache-Control: max-age=60

To disable caching, set the cache time to 0.
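A sketch of deriving the header from the system-wide setting; USAGE_CACHE_SECONDS is a hypothetical setting name, not an existing Nova flag.

```python
from typing import Tuple

# Hypothetical system-wide setting, in seconds.
USAGE_CACHE_SECONDS = 60


def cache_control_header(max_age: int = USAGE_CACHE_SECONDS) -> Tuple[str, str]:
    """Return the Cache-Control header (name, value) for a usage GET response."""
    if max_age < 0:
        raise ValueError("cache time must be zero or positive")
    return ("Cache-Control", f"max-age={max_age}")
```

A value of 0 produces `max-age=0`, which tells caches the response is immediately stale; a deployment that must forbid caches from storing responses at all would need `no-store` instead, which this spec does not discuss.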

API METHODS

/v1.0/usage
  • GET: Returns aggregated usage information for the entire Nova deployment.
  • PUT: 405 Method Not Allowed
  • POST: 405 Method Not Allowed
  • DELETE: 405 Method Not Allowed
/v1.0/accountID/usage
  • GET: Returns aggregated usage data for the specified account. Strictly speaking, GET requests should be idempotent; however, because usage data may arrive with some latency, a second GET request may return (slightly) different data.
  • PUT: 405 Method Not Allowed
  • POST: Returns aggregated usage data for the specified account, as GET does, and also resets ongoing counters to zero.
  • DELETE: Resets aggregated usage data for the specified account.
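The method table can be sketched as a small dispatcher over a hypothetical in-memory store; a real implementation would query the regional usage data store instead. The 204 status for DELETE and the exact reset behaviour are assumptions, since the table only says that DELETE resets the account's aggregated data.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store: account ID -> counter name -> value.
usage: Dict[str, Dict[str, int]] = defaultdict(dict)


def handle(method: str, account_id: Optional[str]) -> Tuple[int, Dict[str, int]]:
    """Dispatch /v1.0/usage (account_id None) and /v1.0/accountID/usage."""
    if account_id is None:
        if method != "GET":
            return 405, {}
        # Aggregate every account's counters for the deployment-wide view.
        totals: Dict[str, int] = defaultdict(int)
        for counters in usage.values():
            for name, value in counters.items():
                totals[name] += value
        return 200, dict(totals)

    counters = usage.get(account_id, {})
    if method == "GET":
        return 200, dict(counters)
    if method == "POST":
        # Return the current aggregate, then reset ongoing counters to zero.
        snapshot = dict(counters)
        usage[account_id] = {name: 0 for name in snapshot}
        return 200, snapshot
    if method == "DELETE":
        usage.pop(account_id, None)  # assumed: drop the aggregate entirely
        return 204, {}
    return 405, {}
```

Returning the snapshot before zeroing in POST means a billing client gets exactly the totals that were reset, so no usage is counted twice or lost between the read and the reset.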

API QUERY STRINGS

All queries must support constraining results to a time range:

/version/accountID/usage?begin=2011-01-01-00:00&end=2011-02-01-00:00

It's probably best not to provide search functionality like this on the host node. We should keep the database on each host node as small as we reasonably can. The API benefits from this capability, but the search should run against a data store at the region level.

In these cases, the begin value is compared as "greater than or equal to" and the end value as "less than." This gives half-open ranges, so consecutive periods (for example, all of January followed by all of February) can be requested without overlap and without the need for more complex operators.
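Parsing the query string and applying the half-open comparison might look like the following. The timestamp layout is taken from the example query string above; representing events as (timestamp, name) tuples is an assumption made for illustration.

```python
from datetime import datetime
from typing import Iterable, List, Tuple
from urllib.parse import parse_qs, urlsplit

# Layout of the begin/end values in the example query string.
TS_FORMAT = "%Y-%m-%d-%H:%M"


def parse_range(url: str) -> Tuple[datetime, datetime]:
    """Extract the begin and end parameters from a usage query URL."""
    query = parse_qs(urlsplit(url).query)
    begin = datetime.strptime(query["begin"][0], TS_FORMAT)
    end = datetime.strptime(query["end"][0], TS_FORMAT)
    return begin, end


def in_range(events: Iterable[Tuple[datetime, str]],
             begin: datetime, end: datetime) -> List[Tuple[datetime, str]]:
    """Keep events with begin <= timestamp < end (a half-open interval)."""
    return [ev for ev in events if begin <= ev[0] < end]
```

An event stamped exactly at 2011-02-01 00:00 falls into the February query, not the January one, which is what lets consecutive monthly queries tile without double counting.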

Usage data could be propagated to the regional level and stored in a log database for later retrieval or processing. The API for retrieving usage would query this secondary database rather than the configuration database.

This paradigm works for event-type data, but it is not optimal for usage that accumulates out-of-band from the API. For example, you may create an instance that stays up through an entire billing cycle, consuming network bandwidth throughout. If bandwidth is billable, the event stream would need regular "bandwidth used" events that are summed when it is time to bill. It might make more sense to have a separate mechanism for cumulative counters (such as bandwidth) that can be retrieved externally at regular intervals, rather than a local scheduler that generates events at those intervals. This way the collection interval could vary per instance, and no complex scheduler logic would need to be maintained on the host node.
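With the cumulative-counter approach, an external poller only needs to turn successive counter readings into usage. A minimal sketch, assuming a per-instance byte counter that only increases except when it is reset (for example, after an instance restart); the function names are hypothetical.

```python
from typing import Iterable, Optional


def counter_delta(previous: Optional[int], current: int) -> int:
    """Usage consumed between two polls of a cumulative counter.

    If the counter went backwards, assume it was reset to zero and
    count only the current reading. The first poll has no baseline,
    so it contributes nothing.
    """
    if previous is None:
        return 0
    if current < previous:
        return current
    return current - previous


def total_usage(samples: Iterable[int]) -> int:
    """Sum the deltas over a sequence of polled counter readings."""
    total, prev = 0, None
    for value in samples:
        total += counter_delta(prev, value)
        prev = value
    return total
```

Because only differences between readings matter, the poller can sample each instance at whatever interval suits it, and a missed poll simply folds into the next delta.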

Notifications

The event-tracking worker should provide the ability to perform PubSubHubbub (PSH) notifications based on configuration settings (i.e., this functionality can be disabled via a configuration flag).

The configuration must also allow zero or more PSH endpoints ("hubs") to be defined.

When an event occurs and notifications are enabled, the worker issues a standard PSH "ping".

Specifically, it issues a POST request to each hub defined in the configuration; the POST includes the content type and parameters defined in the PubSubHubbub specification.
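Under the PubSubHubbub 0.3 specification, the publisher ping is a form-encoded POST carrying hub.mode=publish and hub.url set to the topic (feed) URL. The sketch below honours an enable flag and a list of zero or more hubs; the flag, hub list, and topic URL are hypothetical configuration values, and the requests are only built here, not sent.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request


def build_pings(enabled: bool, hubs: List[str], topic_url: str) -> List[Request]:
    """Build one PubSubHubbub publish ping per configured hub.

    Returns an empty list when notifications are disabled or no hubs are
    configured. Pass each Request to urllib.request.urlopen to send it.
    """
    if not enabled:
        return []
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return [
        Request(hub, data=body, method="POST",
                headers={"Content-Type": "application/x-www-form-urlencoded"})
        for hub in hubs
    ]
```

The ping carries no usage data itself; each hub fetches the updated Atom feed from the topic URL and pushes it to its subscribers.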

Note that PubSubHubbub is optional; the usage feed can always be retrieved with a plain Atom 1.0 request.

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

UI Changes

Should cover changes required to the UI, or specific UI that is required to implement this.

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Include:

  • data migration, if any
  • redirects from old URLs to new ones, if any
  • how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, not problems with the specification itself, since a specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.