SystemUsageData

Launchpad Entry: NovaSpec: System Usage Data
Created: 3 February 2011
Contributors: Paul Voccio, Glen Campbell

Summary

Nova administrators want data on system usage for billing, chargeback, or monitoring purposes.

Release Note

System usage data is collected by Nova and provided via a rich Atom API. Usage data includes utilization of bandwidth, hard disk, and RAM, along with other important events such as the creation or teardown of servers. If configured, usage data is provided via PubSubHubbub (PSH) for efficient distribution of data to subscribers.

Rationale

As a systems integrator, we need third-party systems to be able to query usage information from Nova to determine how to charge customers for use of the platform.

Enterprise customers also need the ability to determine usage data, though for different reasons. For example, a cloud administrator may want to identify VMs that have not been used in several months so that they can be reclaimed and the resources reallocated. Bandwidth statistics at the individual VM level could be used to optimize load balancing and deployment efficiency. And some IT departments may implement a quota system for their users.

Note that Nova itself should not be concerned with billing, but it needs to collect the usage data and aggregate it for a requested time period, as well as potentially reset any counters involved.

User stories

As a systems integrator, I need to retrieve usage data so that I can properly bill my customers.
As a systems integrator or enterprise cloud administrator, I want to monitor usage data over time so that I can optimize the utilization of my resources.
As a business or agency that supports multiple projects, I need to account for the resources consumed by each project so that I can properly meet accounting and budget standards.
As a systems integrator, I need to provide usage data to multiple third-party systems without building custom interfaces to each one so that I can efficiently utilize my manpower.

Assumptions

Usage data is retrieved by Tenant ID (as defined in the openstack-accounting blueprint - is this still relevant?).

There is a billing system that is not a part of OpenStack. Invoicing, billing, and customer management are handled externally to OpenStack/Nova. Nova only has knowledge of a "tenant ID" (as per the openstack-accounting blueprint -- relevant?).

The "tenant ID" is not the same as the (existing) project_id; there may be multiple projects under a single account ID.

**NOTE**: If we need to store the account as well as the tenant, we need to revisit Keystone. The current design is that OpenStack services are given only the tenant and user.

If Nova is installed in multiple regions (aka "zones"), usage data will not span zones; even if the same tenant ID is used in multiple zones, that data must be aggregated external to Nova.
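Because aggregation across zones happens outside Nova, a consumer would have to combine per-zone results itself. The following is a minimal sketch of that external step; the function name, the zone names, and the shape of the per-zone data (a mapping of tenant ID to counters) are assumptions made for illustration, not part of this specification.

```python
from collections import Counter


def aggregate_across_zones(per_zone, tenant_id):
    """Sum one tenant's usage counters over every zone.

    per_zone is a hypothetical structure:
    {zone_name: {tenant_id: {counter_name: value}}}
    """
    total = Counter()
    for zone_usage in per_zone.values():
        # A tenant may be absent from a zone; treat that as no usage.
        total.update(zone_usage.get(tenant_id, {}))
    return dict(total)


if __name__ == "__main__":
    zones = {
        "zone-a": {"t1": {"used_tx": 10, "used_rx": 5}},
        "zone-b": {"t1": {"used_tx": 7}, "t2": {"used_tx": 1}},
    }
    print(aggregate_across_zones(zones, "t1"))
```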

The usage data API is queried against a regional endpoint, and not directly against the host. (open for discussion)

Design and Implementation

Event Data

We need a usage model that records a usage entry every time an action ("event") happens on a VM. Events include creation, resizes, deletions, adding or removing IPs, etc. In theory, any event should be logged; in practice, it may be suitable to focus only on those events that are of significant interest to the OpenStack community.

Event data would include fields similar to, but not strictly limited to, the following:

Data Type	Name
t.integer	instance_id
t.integer	ram_size
t.integer	disk_size
t.string	tenant_id
t.integer	used_tx
t.integer	used_rx
t.integer	has_backups
t.integer	extra_ips
t.datetime	started_at
t.datetime	ended_at
t.datetime	created_at
t.datetime	updated_at
t.string	options

This could potentially be made generic by specifying the attributes to be tracked and recorded in a configuration file. Ideally, the usage tracking API should be extensible so that users may track specific usage data of interest to themselves without the need to modify the core Nova system.
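One way to picture the record and the configuration-driven extension is sketched below. The core fields mirror the table above; the `extra` field, the `make_event` helper, and the idea of passing the configured attribute names as a list are hypothetical choices for illustration, and the defaults are assumptions.

```python
from dataclasses import dataclass, field, fields
from datetime import datetime
from typing import Any, Dict, List, Optional


@dataclass
class UsageEvent:
    """One usage record; core fields follow the table in this section."""
    instance_id: int
    tenant_id: str
    ram_size: int
    disk_size: int
    started_at: datetime
    ended_at: Optional[datetime] = None    # assumption: None while still running
    created_at: Optional[datetime] = None
    updated_at: Optional[datetime] = None
    used_tx: int = 0
    used_rx: int = 0
    has_backups: int = 0
    extra_ips: int = 0
    options: str = ""
    # Hypothetical extension point for attributes named in configuration.
    extra: Dict[str, Any] = field(default_factory=dict)


CORE_FIELDS = {f.name for f in fields(UsageEvent)} - {"extra"}


def make_event(raw: Dict[str, Any], tracked_extras: List[str]) -> UsageEvent:
    """Build an event from raw data, keeping only core and configured fields."""
    core = {k: v for k, v in raw.items() if k in CORE_FIELDS}
    extra = {k: raw[k] for k in tracked_extras if k in raw}
    return UsageEvent(**core, extra=extra)
```

With this shape, tracking an additional attribute would only require adding its name to the configured list; unlisted attributes in the raw data are dropped.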

API

RETURN FORMAT

The usage data API will return Atom 1.0-formatted data.

(*NOTE* this proposal is still being evaluated; please do not start implementation before a final determination is made.) The proposed method for returning structured data is by using the Open Data Protocol embedded within the Atom feed.
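As a rough illustration of the Atom 1.0 return format, the sketch below builds a single plain Atom entry for a usage event, without the Open Data Protocol payload that is still under evaluation. The function name and the choice of entry elements are assumptions; a complete feed would also need feed-level elements such as an author.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"


def usage_entry(entry_id, title, updated, summary):
    """Return one Atom <entry> element as a string (a fragment, not a feed)."""
    ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    for tag, text in (
        ("id", entry_id),
        ("title", title),
        ("updated", updated.isoformat()),  # RFC 3339 when tz-aware
        ("summary", summary),
    ):
        ET.SubElement(entry, f"{{{ATOM_NS}}}{tag}").text = text
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    print(usage_entry(
        "urn:example:usage:1",            # hypothetical identifier
        "instance.create",                # hypothetical event name
        datetime(2011, 2, 3, tzinfo=timezone.utc),
        "Instance 1 created for tenant t1",
    ))
```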

CACHING

GET requests MUST return a Cache-Control: header.

The default cache time (in seconds) MUST be configurable by a system-wide setting. For example, if the system setting for cache time is 60 seconds, then an API GET request should return:

Cache-Control: max-age=60

If the user wishes to disable caching, use a cache setting of 0.
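The caching rule can be sketched as a small helper that turns the system-wide setting into the response header. The function name is hypothetical, and emitting `max-age=0` for a setting of 0 is an assumption about how "disable caching" would be expressed on the wire.

```python
def cache_control_header(max_age_seconds):
    """Return the (name, value) header pair for a GET response."""
    if max_age_seconds < 0:
        raise ValueError("cache time must be zero or a positive number of seconds")
    # A setting of 0 yields max-age=0, so clients must revalidate every time.
    return ("Cache-Control", f"max-age={max_age_seconds}")


if __name__ == "__main__":
    print(cache_control_header(60))
    print(cache_control_header(0))
```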

METHODS

URI	Method	Returns
/v1.0/usage	GET	Returns aggregated usage information for the entire Nova instance/zone
	PUT	405 Method Not Allowed
	POST	405 Method Not Allowed
	DELETE	405 Method Not Allowed
/v1.0/tenantId/usage	GET	Returns aggregated usage data for the specified tenant. Strictly speaking, GET requests should be idempotent; however, because usage data may have some latency, a second GET request may return (slightly) different data
	PUT	405 Method Not Allowed
	POST	Returns aggregated usage data for the specified account as per GET; however, the POST request also resets ongoing counters to zero.
	DELETE	Resets aggregated usage data for the specified account.
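The method table above could be enforced with a lookup like the one sketched here. The function names and the 404 response for unknown paths are assumptions; only the 405 responses and the allowed methods come from the table.

```python
# Methods permitted per resource kind, taken from the table above.
ALLOWED_METHODS = {
    "system": {"GET"},
    "tenant": {"GET", "POST", "DELETE"},
}


def classify(path):
    """Map a request path to ("system" | "tenant" | None, tenant_id or None)."""
    parts = [p for p in path.split("/") if p]
    if parts == ["v1.0", "usage"]:
        return "system", None
    if len(parts) == 3 and parts[0] == "v1.0" and parts[2] == "usage":
        return "tenant", parts[1]
    return None, None


def status_for(method, path):
    """Return the HTTP status the API would answer with, before any real work."""
    kind, _tenant = classify(path)
    if kind is None:
        return 404  # assumption: unknown paths are simply not found
    if method.upper() not in ALLOWED_METHODS[kind]:
        return 405  # Method Not Allowed, as listed in the table
    return 200
```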

QUERY STRINGS

All queries must be able to be constrained by a timestamp:

/version/tenantID/usage?begin=timestamp1&end=timestamp2

It's probably best not to provide search functionality like this on the host node. We should do our best to keep the database on each host node as small as we reasonably can. The API would be useful with this capability, but it should search through a data store at a region level.

In these cases, the begin value is compared as "greater than or equal to" and the end value as "less than." This allows for inclusive date/time ranges without, for example, the need to specify complex operators.

This could be propagated to the regional level and then stored in a log DB for later retrieval or processing. The API that retrieves usages would query this secondary DB and not the configuration DB.

timestamp1 and timestamp2 are UNIX timestamp values.
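The comparison rule amounts to a half-open interval: begin <= timestamp < end. A minimal sketch follows, assuming hypothetical helper names and treating a missing bound as unconstrained (the specification does not say what a missing parameter means).

```python
from urllib.parse import parse_qs, urlsplit


def parse_range(url):
    """Extract (begin, end) UNIX timestamps from a usage query URL."""
    query = parse_qs(urlsplit(url).query)
    begin = int(query["begin"][0]) if "begin" in query else None
    end = int(query["end"][0]) if "end" in query else None
    return begin, end


def in_range(timestamp, begin, end):
    """True when begin <= timestamp < end; a bound of None is not applied."""
    if begin is not None and timestamp < begin:
        return False
    if end is not None and timestamp >= end:
        return False
    return True


if __name__ == "__main__":
    begin, end = parse_range("/v1.0/acme/usage?begin=100&end=200")
    print([t for t in (99, 100, 199, 200) if in_range(t, begin, end)])
```

Because the end bound is exclusive, consecutive queries such as 100 to 200 and 200 to 300 never count the same event twice.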

This paradigm works for event-type data, but it's not optimal for accumulation of usage that happens out-of-band from the API. For example, you may create an instance, and it remains on through an entire billing cycle, consuming network bandwidth during that period. If bandwidth should be billable, there would need to be regular "bandwidth used" events in the event stream that get added up when it's time to bill for them. It might make more sense to have a separate solution for cumulative counters (like bandwidth) that can be externally retrieved at regular intervals rather than having some sort of local scheduler that generates events at such intervals. This way the collection interval could vary on a per-instance basis and no complex scheduler logic would need to be tracked on the host node.
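To illustrate the cumulative-counter alternative, the sketch below derives bandwidth used from a series of externally polled counter readings. The sample format, the function name, and the handling of a counter that restarts from zero are all assumptions made for this example.

```python
def bytes_used(samples):
    """Total usage across polled readings of a cumulative counter.

    samples is a list of (unix_timestamp, counter_value) tuples in time
    order. The polling interval may differ between readings.
    """
    total = 0
    for (_, previous), (_, current) in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # Assumption: a drop means the counter was reset (for example by
            # a reboot), so everything counted since the reset is new usage.
            total += current
    return total


if __name__ == "__main__":
    readings = [(0, 100), (60, 250), (120, 40), (180, 90)]
    print(bytes_used(readings))
```

Nothing here depends on a fixed interval, which is the property the paragraph above asks for: the collector decides how often to poll each instance.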

Notifications

The event-tracking worker should provide the ability to perform PubSubHubbub (PSH) notifications based on configuration settings (i.e., this functionality can be disabled via a configuration flag).

See the Notification System blueprint for details.

The configuration must also allow zero or more PSH endpoints ("hubs") to be defined.

When an event occurs and notifications are enabled, the worker would issue a standard PSH "ping".

Specifically, it would issue a POST request to the hubs defined in the configuration; the POST would include the content-type and parameters as defined in the specification.
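A sketch of the publish ping is shown below. It builds, but does not send, one form-encoded POST per configured hub, using the `hub.mode=publish` and `hub.url` parameters from the PubSubHubbub specification. The function names and example URLs are hypothetical, and an empty hub list stands in for notifications being disabled or unconfigured.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_ping(hub_url, topic_url):
    """Build the POST that tells one hub the topic feed has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_pings(hubs, topic_url):
    """One request per configured hub; zero hubs yields zero requests."""
    return [build_publish_ping(hub, topic_url) for hub in hubs]


if __name__ == "__main__":
    pings = build_pings(
        ["http://hub.example.com/"],                  # hypothetical hub
        "http://nova.example.com/v1.0/acme/usage",    # hypothetical topic feed
    )
    for ping in pings:
        print(ping.get_method(), ping.full_url, ping.data)
        # To actually send: urllib.request.urlopen(ping)
```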

Note that PubSubHubbub is optional; usage data can always be retrieved via a simple Atom 1.0 query against the API.

Dependencies

"Customer," in this case, is simply the account code specified in the multi-tenant-accounting blueprint.

Notifications require the implementation of the Notification System blueprint.

"Instance" must be independent of Customer (actions, usages) in the event of a transfer between Customers.

Instance Diagnostics

Already available via the API and XenTools; they just need to be stored in the DB.

Instance Actions

Already available for all XenAPI calls (also available via the API); this will need to be implemented for every action that is performed outside of the XenAPI itself.

Migration

N/A

Test/Demo Plan

This need not be added or completed until the specification is nearing beta.

Unresolved issues

The actual usage data is still under consideration and needs further definition.

BoF agenda and discussion

Use this section to take notes during the BoF; if you keep it in the approved spec, use it for summarising what was discussed and note any options that were rejected.