SystemUsageData

Launchpad Entry: NovaSpec: System Usage Data
Created: 3 February 2011
Contributors: Paul Voccio, Glen Campbell

Summary

Nova administrators want data on system usage for billing, chargeback, or monitoring purposes.

Release Note

System usage data is collected by Nova and provided via a rich Atom API. Usage data includes utilization of bandwidth, hard disk, and RAM, along with other important events such as the creation or teardown of servers. If configured, usage data is provided via PubSubHubbub (PSH) for efficient distribution of data to subscribers.

Rationale

As a systems integrator, we need third-party systems to be able to query usage information from Nova to determine how to charge customers for use of the platform.

Enterprise customers also need the ability to determine usage data, though for different reasons. For example, a cloud administrator may want to identify VMs that have not been used in several months so that they can be reclaimed and the resources reallocated. Bandwidth statistics at the individual VM level could be used to optimize load balancing and deployment efficiency. Some IT departments may also implement a quota system for their users.

Note that Nova itself should not be concerned with billing, but it needs to collect the usage data and aggregate it for a requested time period, as well as potentially reset any counters involved.

User stories

As a systems integrator, I need to retrieve usage data so that I can properly bill my customers.

As a systems integrator or enterprise cloud administrator, I want to monitor usage data over time so that I can optimize the utilization of my resources.

As a business or agency that supports multiple projects, I need to account for the resources consumed by each project so that I can properly meet accounting and budget standards.

As a systems integrator, I need to provide usage data to multiple third-party systems without building custom interfaces to each one so that I can efficiently utilize my manpower.

Assumptions

Usage data is retrieved by Account ID as defined in the openstack-accounting blueprint.

There is a billing system that is not a part of OpenStack. Invoicing, billing, and customer management are handled externally to OpenStack/Nova. Nova only has knowledge of an "account ID" as per the openstack-accounting blueprint.

The "account ID" is not the same as the (existing) project_id; there may be multiple projects under a single account ID.

If Nova is installed in multiple regions (aka "zones"), usage data will not span zones; even if the same account ID is used in multiple zones, that data must be aggregated external to Nova.

Design

Event Data

We need a usage model that records a usage entry every time an action ("event") happens on a VM. Events include creation, resizes, deletions, adding or removing IPs, etc. In theory, any event should be logged; in practice, it may be suitable to focus only on those events that are of significant interest to the OpenStack community.

Event data would include fields similar to, but not strictly limited to, the following:

Data Type	Name
t.integer	instance_id
t.integer	ram_size
t.integer	disk_size
t.integer	account_id
t.integer	used_tx
t.integer	used_rx
t.integer	has_backups
t.integer	extra_ips
t.datetime	started_at
t.datetime	ended_at
t.datetime	created_at
t.datetime	updated_at
t.string	options
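The field list above could be modeled as a simple record type. The following is a minimal sketch only: the class name UsageEvent, the default values, and the choice of which fields are required are assumptions for illustration, and the spec does not define units for ram_size or disk_size.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import Optional


@dataclass
class UsageEvent:
    """One usage record, written whenever an event happens on a VM.

    Field names follow the table above; defaults are hypothetical.
    """
    instance_id: int
    account_id: int
    ram_size: int      # units are not specified in this spec
    disk_size: int     # units are not specified in this spec
    used_tx: int = 0
    used_rx: int = 0
    has_backups: int = 0
    extra_ips: int = 0
    started_at: Optional[datetime] = None
    ended_at: Optional[datetime] = None
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    options: str = ""


# Example: a record for an instance that has started but not yet ended.
event = UsageEvent(instance_id=42, account_id=7, ram_size=512, disk_size=20,
                   started_at=datetime(2011, 2, 3, 12, 0))
record = asdict(event)  # plain dict, convenient for storage or serialization
```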

This could potentially be made generic by specifying the attributes to be tracked and recorded in a configuration file. Ideally, the usage tracking API should be extensible so that users may track specific usage data of interest to themselves without the need to modify the core Nova system.
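One way to picture configuration-driven tracking is a setting that lists the attributes to record, applied as a filter over each event. This is a sketch under assumptions: the [usage] section, the tracked_attributes key, and both helper functions are hypothetical names, not existing Nova settings.

```python
import configparser

# Hypothetical configuration fragment; section and key names are assumptions.
SAMPLE_CONFIG = """
[usage]
tracked_attributes = instance_id, account_id, ram_size, used_tx, used_rx
"""


def load_tracked_attributes(config_text):
    """Return the list of attribute names the operator wants recorded."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    raw = parser.get("usage", "tracked_attributes")
    return [name.strip() for name in raw.split(",") if name.strip()]


def project_event(event, tracked):
    """Keep only the tracked attributes that are present in the event."""
    return {name: event[name] for name in tracked if name in event}


tracked = load_tracked_attributes(SAMPLE_CONFIG)
event = {"instance_id": 42, "account_id": 7, "ram_size": 512,
         "disk_size": 20, "used_tx": 1024, "used_rx": 2048}
recorded = project_event(event, tracked)  # disk_size is dropped
```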

API

API RETURN FORMAT

The usage data API will return Atom 1.0-formatted data.

The proposed method for returning structured data is by using the Open Data Protocol embedded within the Atom feed.
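As an illustration of structured data inside Atom, the sketch below builds one Atom entry whose content element carries OData-style properties. The namespaces are the standard Atom and OData Atom-format ones; the function name usage_entry, the entry id, the title text, and the sample property values are assumptions, and the actual feed layout is not defined by this spec.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
ODATA_D = "http://schemas.microsoft.com/ado/2007/08/dataservices"
ODATA_M = "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"

# Use conventional prefixes when serializing.
ET.register_namespace("", ATOM)
ET.register_namespace("d", ODATA_D)
ET.register_namespace("m", ODATA_M)


def usage_entry(entry_id, updated, properties):
    """Build an Atom entry with usage fields as OData properties."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = "Usage event"
    ET.SubElement(entry, f"{{{ATOM}}}updated").text = updated
    content = ET.SubElement(entry, f"{{{ATOM}}}content",
                            {"type": "application/xml"})
    props = ET.SubElement(content, f"{{{ODATA_M}}}properties")
    for name, value in properties.items():
        ET.SubElement(props, f"{{{ODATA_D}}}{name}").text = str(value)
    return entry


# Hypothetical id and values, for illustration only.
entry = usage_entry("urn:example:usage:42", "2011-02-03T12:00:00Z",
                    {"instance_id": 42, "ram_size": 512})
xml_text = ET.tostring(entry, encoding="unicode")
```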

API CACHING

GET requests MUST return a Cache-Control: header.

The default cache time (in seconds) MUST be configurable by a system-wide setting. For example, if the system setting for cache time is 60 seconds, then an API GET request should return:

Cache-Control: max-age=60

To disable caching, set the cache time to 0.
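The caching rule can be sketched as a small helper that turns the system-wide cache time into the response header. The function name cache_control_header is an assumption; a setting of 0 simply yields max-age=0, which tells clients the response is stale immediately.

```python
def cache_control_header(cache_seconds):
    """Return the (name, value) header pair for a usage GET response.

    cache_seconds is the system-wide cache time; 0 disables caching.
    """
    if cache_seconds < 0:
        raise ValueError("cache time must be zero or positive")
    return ("Cache-Control", f"max-age={int(cache_seconds)}")


default_header = cache_control_header(60)
disabled_header = cache_control_header(0)
```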

API METHODS

URI	Method	Returns
/v1.0/usage	GET	Returns aggregated usage information for the entire Nova instance.
	PUT	405 Method Not Allowed
	POST	405 Method Not Allowed
	DELETE	405 Method Not Allowed
/v1.0/accountID/usage	GET	Returns aggregated usage data for the specified account. Strictly speaking, GET requests should be idempotent; however, because usage data may have some latency, a second GET request may return (slightly) different data.
	PUT	405 Method Not Allowed
	POST	Returns aggregated usage data for the specified account as per GET; however, the POST request also resets ongoing counters to zero.
	DELETE	Resets aggregated usage data for the specified account.
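The method table can be read as a routing rule: each path and method pair either maps to an action or yields 405. The sketch below encodes that table. The action names, the function names, and the 200 and 404 status codes are assumptions, since the table only specifies the 405 cases.

```python
from http import HTTPStatus

# Actions per (resource kind, method), following the method table above.
# Action names are hypothetical labels, not part of the spec.
ACTIONS = {
    ("global", "GET"): "read",
    ("account", "GET"): "read",
    ("account", "POST"): "read_and_reset",
    ("account", "DELETE"): "reset",
}


def classify(path):
    """Return (kind, account_id) for a usage path, or (None, None)."""
    parts = path.strip("/").split("/")
    if parts == ["v1.0", "usage"]:
        return "global", None
    if len(parts) == 3 and parts[0] == "v1.0" and parts[2] == "usage":
        return "account", parts[1]
    return None, None


def route(method, path):
    """Return (status, action, account_id) for a request."""
    kind, account_id = classify(path)
    if kind is None:
        return HTTPStatus.NOT_FOUND, None, None   # assumption: unknown path
    action = ACTIONS.get((kind, method.upper()))
    if action is None:
        return HTTPStatus.METHOD_NOT_ALLOWED, None, account_id
    return HTTPStatus.OK, action, account_id      # assumption: 200 on success
```

For example, route("POST", "/v1.0/acct-7/usage") selects the read-and-reset action for account acct-7, while route("PUT", "/v1.0/usage") yields 405.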

API QUERY STRINGS

All queries must be able to be constrained by a timestamp:

/version/accountID/usage?begin=2011-01-01-00:00&end=2011-02-01-00:00

It's probably best not to provide search functionality like this on the host node. We should do our best to keep the database on each host node as small as we reasonably can. The API would be useful with this capability, but it should search through a data store at a region level.

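In these queries, the begin value is compared as "greater than or equal to" and the end value as "less than." This gives a half-open range in which the begin time is included and the end time is excluded, so consecutive ranges (for example, successive months) can be queried without overlap and without the need to specify complex operators.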
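The comparison rule can be sketched as follows. The timestamp format string is inferred from the example query above, and the function names parse_range and in_range are assumptions.

```python
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Format inferred from the example query string (an assumption).
TIMESTAMP_FORMAT = "%Y-%m-%d-%H:%M"


def parse_range(url):
    """Extract the begin and end datetimes from a usage query URL."""
    query = parse_qs(urlsplit(url).query)
    begin = datetime.strptime(query["begin"][0], TIMESTAMP_FORMAT)
    end = datetime.strptime(query["end"][0], TIMESTAMP_FORMAT)
    return begin, end


def in_range(timestamp, begin, end):
    """begin is inclusive, end is exclusive."""
    return begin <= timestamp < end


begin, end = parse_range(
    "/v1.0/accountID/usage?begin=2011-01-01-00:00&end=2011-02-01-00:00")
```

With this rule, an event stamped exactly 2011-02-01 00:00 falls into the February range rather than the January one, so it is never counted twice.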

Usage events could be propagated to the regional level and stored in a log database for later retrieval or processing. The API for retrieving usage would then query this secondary database rather than the configuration database.

This paradigm works for event-type data, but it's not optimal for accumulation of usage that happens out-of-band from the API. For example, you may create an instance, and it remains on through an entire billing cycle, consuming network bandwidth during that period. If bandwidth should be billable, there would need to be regular "bandwidth used" events in the event stream that get added up when it's time to bill for them. It might make more sense to have a separate solution for cumulative counters (like bandwidth) that can be externally retrieved at regular intervals rather than having some sort of local scheduler that generates events at such intervals. This way the collection interval could vary on a per-instance basis and no complex scheduler logic would need to be tracked on the host node.
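To illustrate the cumulative-counter alternative, an external collector could poll a counter at whatever interval it likes and derive usage from the differences between readings. This is a sketch only: the function name is hypothetical, and treating a decreasing reading as a counter reset (for example, after the POST or DELETE reset described above) is an assumption.

```python
def usage_between(samples):
    """Total usage implied by a sequence of cumulative counter readings.

    samples is a list of readings in polling order. A reading lower than
    its predecessor is assumed to mean the counter was reset to zero in
    between, so the new reading is counted in full.
    """
    total = 0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            total += current  # assumption: counter restarted from zero
    return total


# Five polls of a hypothetical used_tx counter, with one reset after 400.
total_tx = usage_between([100, 250, 400, 50, 120])
```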

Notifications

The event-tracking worker should provide the ability to perform PubSubHubbub (PSH) notifications based on configuration settings (i.e., this functionality can be disabled via a configuration flag).

The configuration must also allow zero or more PSH endpoints ("hubs") to be defined.

When an event occurs and notifications are enabled, the worker would issue a standard PSH "ping".

Specifically, it would issue a POST request to the hubs defined in the configuration; the POST would include the content-type and parameters as defined in the specification.
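A minimal sketch of the ping, assuming the PubSubHubbub publish convention of a form-encoded POST with hub.mode=publish and hub.url set to the topic (feed) URL. The function name, the hub URLs, and the topic URL are hypothetical; the requests are built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_pings(hubs, topic_url):
    """Build one publish ping per configured hub (zero or more)."""
    body = urlencode({"hub.mode": "publish",
                      "hub.url": topic_url}).encode("ascii")
    return [
        Request(hub, data=body, method="POST",
                headers={"Content-Type": "application/x-www-form-urlencoded"})
        for hub in hubs
    ]


# Hypothetical hub endpoints and usage feed URL, for illustration only.
pings = build_publish_pings(
    ["https://hub-a.example.com/", "https://hub-b.example.com/"],
    "https://nova.example.com/v1.0/acct-7/usage")
```

Each request could then be sent with urllib.request.urlopen. An empty hub list produces no pings, which matches the case where no hubs are configured.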

Note that PubSubHubbub is optional; usage data can always be retrieved via a simple Atom 1.0 query against the API.

Implementation

This section should describe a plan of action (the "how") to implement the changes discussed. Could include subsections like:

UI Changes

Should cover changes required to the UI, or specific UI that is required to implement this

Code Changes

Code changes should include an overview of what needs to change, and in some cases even the specific details.

Migration

Include:

data migration, if any
redirects from old URLs to new ones, if any
how users will be pointed to the new way of doing things, if necessary.

Test/Demo Plan

This need not be added or completed until the specification is nearing beta.

Unresolved issues

This should highlight any issues that should be addressed in further specifications, not problems with the specification itself, since a specification with problems cannot be approved.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.