 
= Efficient Metering in [[OpenStack]] Blueprint =

Project and code : https://launchpad.net/ceilometer

Meetings : http://wiki.openstack.org/Meetings/MeteringAgenda
 
== Use cases ==

* need a tool to collect per customer usage
* need an API to query collected data from existing billing systems
* data needed per customer, with an hour level granularity, includes:
** Compute - Nova:
*** instances (type, availability zone) - hourly usage
*** cpu - hourly usage
*** ram - hourly usage
*** nova volume block device (type, availability zone) - hourly usage
**** reserved
**** used
** network (data in/out, availability zone) - hourly bytes + total bytes
*** differentiate between internal and external end-points
*** External floating IP - hourly bytes + total bytes
* Storage - Swift
** total data stored
** data in/out - hourly bytes + total bytes
** differentiate between internal and external end-points
 
== Proposed design ==

=== Meters ===

The following is a first list of meters that need to be collected in order to allow billing systems to perform their tasks. This list must be expandable over time, and each administrator must be able to enable or disable each meter based on local needs.
  
 
{| border="1" cellpadding="2" cellspacing="0"
|  
|  '''Meter name'''
|  '''Component'''
|  '''Resource ID'''
|  '''Volume unit'''
|  '''Payload'''
|  '''Note'''
|-
|  c1
|  instance
|  nova compute
|  instance id
|  minute
|  type
|  type is the instance flavor id used
|-
|  c2
|  cpu
|  nova compute
|  instance id
|  minute
|  type
|  type is the cpu type used (possibly: [GPU|Arm|x86|x86_64])
|-
|  c3
|  ram
|  nova compute
|  instance id
|  Megabyte
|  
|  
|-
|  c4
|  disk
|  nova compute
|  instance id
|  Megabyte
|  
|  system disks persist when the instance is shut down but not terminated and must be accounted for
|-
|  c5
|  io
|  nova compute
|  instance id
|  Megabyte
|  
|  disk IO in megabytes per second has a high impact on service availability and could be billed separately
|-
|  v1
|  bd_reserved
|  nova volume
|  volume id
|  Megabyte
|  
|  
|-
|  v2
|  bd_used
|  nova volume
|  volume id
|  Megabyte
|  
|  (optional)
|-
|  n1
|  net_in_int
|  nova network
|  IP address
|  Kbytes
|  
|  volume of data received from internal network sources
|-
|  n2
|  net_in_ext
|  nova network
|  IP address
|  Kbytes
|  
|  volume of data received from external network sources
|-
|  n3
|  net_out_int
|  nova network
|  IP address
|  Kbytes
|  
|  volume of data sent to internal network destinations
|-
|  n4
|  net_out_ext
|  nova network
|  IP address
|  Kbytes
|  
|  volume of data sent to external network destinations
|-
|  n5
|  net_float
|  nova network
|  IP address
|  minute
|  type
|  The type distinguishes public IPs depending on their allocation policy, for instance IPv6 or IPv4_FROM_RIPE or IPv4_FROM_OVH etc. The acquisition or maintenance cost of a floating IP may depend on its allocation policy.
|-
|  o1
|  obj_volume
|  swift
|  swift account id
|  Megabytes
|  
|  total object volume stored
|-
|  o2
|  obj_in_int
|  swift
|  swift account id
|  Kbytes
|  
|  volume of data received from internal network sources
|-
|  o3
|  obj_in_ext
|  swift
|  swift account id
|  Kbytes
|  
|  volume of data received from external network sources
|-
|  o4
|  obj_out_int
|  swift
|  swift account id
|  Kbytes
|  
|  volume of data sent to internal network destinations
|-
|  o5
|  obj_out_ext
|  swift
|  swift account id
|  Kbytes
|  
|  volume of data sent to external network destinations
|-
|  o6
|  obj_number
|  swift
|  swift account id
|  
|  container
|  Number of objects stored for a container. The resource_id is the container id.
|-
|  o7
|  obj_containers
|  swift
|  swift account id
|  
|  
|  Number of containers
|-
|  o8
|  obj_requests
|  swift
|  swift account id
|  
|  type
|  Number of HTTP requests, type being the request type (GET/HEAD/PUT/POST…)
|}
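For illustration, one row of the table above could be carried in memory as a simple record; the field names below merely mirror the table columns and are a hypothetical sketch, not a finalized wire format:

```python
from collections import namedtuple

# Hypothetical in-memory shape of one meter definition: the fields
# mirror the columns of the meter table above.
Meter = namedtuple("Meter",
                   "id name component resource_id volume_unit payload note")

c1 = Meter(
    id="c1",
    name="instance",
    component="nova compute",
    resource_id="instance id",
    volume_unit="minute",
    payload="type",  # per the table, the payload carries the flavor id
    note="type is the instance flavor id used",
)
```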
  
Other possible meters:
* service handlers (load balancer, databases, queues...)
* service usage

''Note for network meters (n1-n4)'': the distinction between internal and external traffic requires that internal networks be explicitly listed in the agent configuration.

''Note(dhellmann)'': That isn't going to scale to a real system where tenants may create their own networks. We should just collect the data for each network, and let the billing system decide on the rate at which to charge (possibly $0 for internal networks).
  
 
=== Storage ===

{| border="1" cellpadding="2" cellspacing="0"
|  '''Field name'''
|  '''Type'''
|-
|  source
|  ?
|-
|  user_id
|  String
|-
|  project_id
|  String
|-
|  resource_id
|  String
|-
|  resource_metadata
|  String
|-
|  meter_type
|  String
|-
|  meter_volume
|  Number
|-
|  meter_duration
|  Integer
|-
|  meter_datetime
|  Timestamp
|-
|  payload
|  String
|-
|  message_signature
|  String
|-
|  message_id
|  
|}
 
  
 
* Data is stored on a per account basis in a db on a per availability zone basis
* Per account records hold
** account_id (same as keystone's)
** account_state (enabled, credit disabled, admin disabled)
* db is not directly accessible by any other means than the API
* a process must collect messages from agents and store the data
* a process may validate meters against the nova event database
* a process may verify that messages were not lost
* a process may verify that account states are in sync with keystone

Note: The instance_metadata field content is duplicated for each meter. For instance it will be duplicated for all c? fields. The storage optimization is to be dealt with in future versions of ceilometer.
 
 
Note: The storage may collapse records, or it may be done by the API as an optimisation to reduce the amount of information that is returned. For instance, if all fields of two consecutive c1 records are equal and they are adjacent in time (i.e. meter_datetime[second] - meter_datetime[first] == meter_duration[second]), then the first record can be removed because it is redundant.
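The collapsing rule can be sketched as follows, assuming records are dicts keyed by the storage fields with meter_datetime expressed in seconds; this is an illustration of the rule, not the actual storage code:

```python
def collapse(records):
    """Drop a record when the next one is equal in every other field
    and adjacent in time, i.e. the interval covered by the second
    record starts exactly where the first was emitted."""
    out = []
    for rec in records:  # records sorted by meter_datetime
        if out:
            prev = out[-1]
            same = all(prev[k] == rec[k] for k in rec
                       if k not in ("meter_datetime", "meter_duration"))
            adjacent = (rec["meter_datetime"] - prev["meter_datetime"]
                        == rec["meter_duration"])
            if same and adjacent:
                out[-1] = rec  # the earlier record is redundant
                continue
        out.append(rec)
    return out
```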
 
 
 
=== Alternative gauge design ===

During the Folsom ODS session, an alternate design was discussed where events would record the absolute value of a gauge instead of deltas. That would require extending the event to include the 'object id' (instance, network, volume) associated with the meter.

The delta model can be derived from the absolute model, which makes it resilient in the face of missing delta registrations.
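The derivation of the delta model from the absolute model can be sketched like this: consecutive gauge samples are differenced, and a missing sample merely widens the interval covered by the next delta rather than losing it (a hypothetical helper, not ceilometer code):

```python
def deltas_from_gauge(samples):
    """samples: list of (timestamp, absolute_value) tuples sorted by
    timestamp. Returns (timestamp, duration, delta) tuples, one per
    interval between consecutive samples. If one sample is lost, the
    next delta simply covers a longer duration instead of disappearing."""
    out = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        out.append((t1, t1 - t0, v1 - v0))
    return out
```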
 
 
 
=== Agents ===

* Agent on each nova compute node to accumulate and send meters for c1, c2, c3, c4, c5, n1, n2, n3, n4.  The agent is likely to be pulling this information from libvirt.
** c5 could get disk I/O stats with libvirt's virDomainBlockStats
** n3 / n4 could use iptables accounting rules (for external traffic?)
** n1 / n2 could use libvirt's virDomainInterfaceStats (for all traffic?)
* Agent on each nova volume node to accumulate and send meters for v1, v2
* Agent on each swift proxy to forward existing accounting data o1 and accumulate and send o2-o5

Note: nova network nodes need not accumulate and send meters for n5 because they can be pulled directly from the nova database (see nova-manage floating list for instance).
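As a sketch of how the compute agent could turn cumulative block statistics (such as the rd_bytes/wr_bytes values returned by libvirt's virDomainBlockStats) into the c5 meter expressed in megabytes — the helper below is a hypothetical illustration, not agent code:

```python
def io_megabytes(prev_bytes, curr_bytes):
    """Convert the difference between two cumulative byte counters
    (e.g. rd_bytes + wr_bytes sampled at successive polls) into
    megabytes for the c5 meter."""
    return (curr_bytes - prev_bytes) / (1024.0 * 1024.0)

# Example: two successive cumulative readings for one instance disk.
previous = 512 * 1024 * 1024  # bytes seen at the last poll
current = 768 * 1024 * 1024   # bytes seen now
used_mb = io_megabytes(previous, current)  # megabytes for this interval
```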
 
 
 
=== Architecture ===

* An agent runs on each [[OpenStack]] node (bare metal machine) and harvests the data locally
** If a meter is available from the existing [[OpenStack]] component it should be used
** A standalone ceilometer agent implements the meters that are not yet available from the existing [[OpenStack]] components
* A storage daemon communicates with the agents to collect their data and aggregate them
* The agents collecting data are authenticated to avoid pollution of the metering service
* The data is sent from agents to the storage daemon via a trusted messaging system (RabbitMQ?)
* The data / messages exchanged between agents and the storage daemon use a common message format
* The content of the storage is made available through a REST API providing aggregation
* The message queue is separate from other queues (such as the nova queue)
* The messages in the queue are signed and non-repudiable (http://en.wikipedia.org/wiki/Non-repudiation)
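As an illustration of the signing requirement, here is a minimal sketch using a shared-secret HMAC; note that an HMAC provides integrity and authenticity but not strict non-repudiation (which would need an asymmetric signature scheme), and all names below are assumptions, not the actual message format:

```python
import hashlib
import hmac
import json

SHARED_SECRET = b"metering-secret"  # hypothetical per-agent key


def sign(message):
    """Serialize the message deterministically and attach an HMAC."""
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    message["message_signature"] = hmac.new(
        SHARED_SECRET, body, hashlib.sha256).hexdigest()
    return message


def verify(message):
    """Recompute the HMAC over the message without its signature."""
    claimed = message.pop("message_signature")
    body = json.dumps(message, sort_keys=True).encode("utf-8")
    expected = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
    message["message_signature"] = claimed
    return hmac.compare_digest(claimed, expected)
```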
 
 
 
Note: document some use case scenarios to really nail down the architecture. Who signals the metering service? The API service, or nova, quantum, swift, glance, volume?

Note: ideally, all meters are available from the [[OpenStack]] component responsible for a given resource (for instance the disk I/O for an ephemeral disk is made available in nova). However, it is not realistic to assume this can always be the case. Standalone ceilometer agents running on [[OpenStack]] nodes provide access to the meters when the [[OpenStack]] component doesn't. The meters implemented in ceilometer agents should always be contributed to the [[OpenStack]] component. This kind of incubation for each meter (first implemented in ceilometer agents and then in the [[OpenStack]] component) is both practical for short term purposes and a sound long term practice that avoids forking code.
 
 
 
=== Messaging use cases ===

Instance creation:
* An instance is created, nova issues a message ( http://wiki.openstack.org/SystemUsageData )
* The metering storage agent listens on the nova queue and picks up the creation message
* The metering storage agent stores the creation event locally, with a timestamp
* The metering storage daemon is notified by the agent that the instance was created five minutes ago and aggregates this information in the tenant records
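The steps above can be sketched as a small simulation; the notification shape and all field names are hypothetical (the real message format is the one described at http://wiki.openstack.org/SystemUsageData):

```python
import time

# Hypothetical notification shape for an instance creation event.
notification = {"event_type": "compute.instance.create",
                "tenant_id": "tenant-1",
                "instance_id": "inst-42"}

local_events = []    # the storage agent's local, timestamped store
tenant_records = {}  # the storage daemon's per-tenant aggregate


def agent_pickup(message):
    # The agent timestamps the event when picking it up from the queue.
    local_events.append(dict(message, received_at=time.time()))


def daemon_aggregate():
    # The daemon later folds the stored events into tenant records.
    for event in local_events:
        tenant_records.setdefault(event["tenant_id"], []).append(
            event["event_type"])


agent_pickup(notification)
daemon_aggregate()
```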
 
  
 
=== API ===

* Database can only be queried via a REST API (i.e. the database schema is not a supported API and can change in a non backward compatible way from one version to the next).
* Requests must be authenticated (separate from keystone, or only linked to accounting type account)
* API Server must be able to be redundant
* Requests allow to
** Discover the sorts of things the server can provide:
*** list the components providing metering data
*** list the meters for a given component
*** list the known users
*** list the known projects
*** list the known sources
*** list the types of meters known
* Fetch raw event data, without aggregation:
** per user
** per project
** per source
** per user and project
* Produce aggregate views of the data:
** sum "volume" field for meter type over a time period
*** per user
*** per project
*** per source
** sum "duration" field for meter type over a time period
*** per user
*** per project
*** per source
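The aggregate views described above amount to a filter on meter type and time period followed by a group-by sum; here is a sketch over illustrative event dicts whose keys follow the storage table (the function itself is hypothetical, not part of the API):

```python
from collections import defaultdict


def aggregate(events, meter_type, start, end, group_by="user_id"):
    """Sum volume and duration for one meter type over a time period,
    grouped by user, project or source."""
    sums = defaultdict(lambda: {"volume": 0, "duration": 0})
    for e in events:
        if e["meter_type"] != meter_type:
            continue
        if not (start <= e["meter_datetime"] < end):
            continue
        key = e[group_by]
        sums[key]["volume"] += e["meter_volume"]
        sums[key]["duration"] += e["meter_duration"]
    return dict(sums)
```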
 
 
 
Note: here is a list of additional items. Some of these items may be better handled in the consumer of this API (the system that actually bills the user):
* list discrete events that may not have a duration (instance creation, IP allocation, etc.)
* list raw event data for a resource ("what happened with a specific instance?")
* aggregate event data per meter type for a resource over a period of time ("what costs are related to this instance?")
* sum volume for meter type over a time period for a specific resource ("how much total bandwidth was used by a VIF?")
* sum duration for meter type over a time period for a specific resource ("how long did an instance run?")
* metadata for resources (such as location of instances)
* aggregating averages in addition to sums

Note: the aggregation of values is done by the API and is not stored in the database. It may be cached for performance reasons but the caching strategy is outside the scope of this blueprint.

Note: At the Folsom design session, the SET account_id call designed to change the status of the tenant in keystone was flagged as a wart at this stage, since the billing system will need to talk to the Keystone API anyway to make sense of the account id.
 
 
 
== Free Software Billing Systems ==
 
 
 
A list of the billing system implementations that could use the Metering system when it becomes available.
 
 
 
* Dough https://github.com/lzyeval/dough
 
* trystack.org billing https://github.com/trystack/dash_billing
 
* nova-billing https://github.com/griddynamics/nova-billing
 
 
 
== Related resources ==
 
 
 
* Definition of a Storage Accounting Record http://www.ogf.org/Public_Comment_Docs/Documents/2012-02/EMI-StAR-OGF-info-doc-v2.pdf
 
* [[UsageRecord]] format http://www.ogf.org/documents/GFD.98.pdf
 
* Capturing exchanges https://github.com/rackspace/stacktach
 
* Messages about system usage http://wiki.openstack.org/SystemUsageData
 
* http://etherpad.openstack.org/EfficientMetering
 
* Use https://github.com/stackforge
 
* lzyeval codebase:
 
** billing https://github.com/lzyeval/dough
 
** metering https://github.com/lzyeval/kanyun
 
* trystack.org codebase:
 
** https://github.com/trystack/dash_billing
 
* http://wiki.openstack.org/utilizationdata
 
* Nova billing https://github.com/griddynamics/nova-billing
 
* Swift
 
** Retrieve Account Metadata http://docs.openstack.org/bexar/openstack-object-storage/developer/content/ch03s01.html#d5e388
 
** swift middlewares examples :
 
*** https://github.com/spilgames/swprobe (https://lists.launchpad.net/openstack/msg07794.html)
 
*** https://github.com/pandemicsyn/swift-informant (https://lists.launchpad.net/openstack/msg07795.html)
 
* April 2012 mailing list thread on billing https://lists.launchpad.net/openstack/msg10334.html
 
* Virgo (scriptable agent for meter collection): https://github.com/racker/virgo
 
** Contact Brandon Philips at Rackspace - brandon.philips@rackspace.com
 
* Ovirt DWH http://www.ovirt.org/wiki/Ovirt_DWH and associated database schema http://gerrit.ovirt.org/gitweb?p=ovirt-dwh.git;a=blob;f=data-warehouse/historydbscripts_postgres/create_tables.sql;h=2e05299a2de1b79634e862e5f1811dda3f303a96;hb=0271e5205ad29109c2e2313e7f6fb900e76a757a#l377
 
* Swift http://folsomdesignsummit2012.sched.org/event/d9135eabdd775432c74c3f1d32a325d3 and http://etherpad.openstack.org/FolsomSwiftStatsd
 
* Collecting meters from libvirt https://github.com/ss7pro/rescnt
 
* Doug Hellmann sandbox https://github.com/dhellmann/metering-prototype/
 
* Prototype ceilometer implementation http://github.com/woorea/ceilometer-java and discussion https://lists.launchpad.net/openstack/msg11410.html
 
 
 
== FAQ ==

Q: why reinvent the wheel? XXXX already does it.

A: please mail the list about the tool you think does the work, unless it is listed below.
* http://wiki.openstack.org/SystemUsageData for instance is specific to nova, while the metering effort aims at aggregating data from all [[OpenStack]] components
* collectd, munin etc. all have pieces of the puzzle, but none has all of them; they are not designed with billing in mind and are not a good fit for this blueprint
 
