Ceilometer/blueprints/api-group-by
Revision as of 05:31, 20 August 2013

Summary

Enhance API v2 so that it accepts new arguments for performing GROUP BY operations.

User stories

I had an instance running for 6 hours. It started as an m1.tiny flavor during the first 2 hours and then grew to an m1.large flavor for the next 4 hours. I need to get these two durations so I can bill them at different rates.

Design example

For example, add:

g[]=<field name>

That solves the user story above with:

/v2/meters/instance/statistics?
 q[0].field=resource&
 q[0].op=eq&
 q[0].value=<my-resource-id>&
 q[1].field=timestamp&
 q[1].op=lt&
 q[1].value=<now>&
 q[2].field=timestamp&
 q[2].op=gt&
 q[2].value=<now - 6 hours>&
 g[0]=metadata.flavor&
 period=360

This would return:

{[
  { "m1.tiny": { min: 1, max: 1, avg: 1, sum: 1 } },
  { "m1.tiny": { min: 1, max: 1, avg: 1, sum: 1 } },
  { "m1.large": { min: 1, max: 1, avg: 1, sum: 1 } },
  { "m1.large": { min: 1, max: 1, avg: 1, sum: 1 } },
  { "m1.large": { min: 1, max: 1, avg: 1, sum: 1 } },
  { "m1.large": { min: 1, max: 1, avg: 1, sum: 1 } },
]}
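As a rough client-side illustration of how such a request could be assembled, the query above can be built with standard URL encoding. This is only a sketch: the resource id and timestamps are placeholders, and the parameter names simply follow the example.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

now = datetime(2013, 8, 20, 5, 31)  # stand-in for "now"; any datetime works

# Each q[i] filter is a (field, op, value) triple; g[i] names a group-by field.
params = [
    ("q[0].field", "resource"),
    ("q[0].op", "eq"),
    ("q[0].value", "my-resource-id"),  # placeholder resource id
    ("q[1].field", "timestamp"),
    ("q[1].op", "lt"),
    ("q[1].value", now.isoformat()),
    ("q[2].field", "timestamp"),
    ("q[2].op", "gt"),
    ("q[2].value", (now - timedelta(hours=6)).isoformat()),
    ("g[0]", "metadata.flavor"),
    ("period", "360"),
]
url = "/v2/meters/instance/statistics?" + urlencode(params)
```

Note that `urlencode` percent-encodes the brackets in `q[0]` and `g[0]`, which the server-side parser has to accept.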

Furthermore, dropping the q[0] filter that narrows the search to a single resource allows retrieving this information for all instances over that period of time:

/v2/meters/instance/statistics?
 q[0].field=timestamp&
 q[0].op=lt&
 q[0].value=<now>&
 q[1].field=timestamp&
 q[1].op=gt&
 q[1].value=<now - 6 hours>&
 g[0]=metadata.flavor&
 g[1]=resource&
 period=360

If there were another large instance, that would return:

{[
  { "m1.tiny": { min: 1, max: 1, avg: 1, sum: 1 }, "m1.large": { min: 1, max: 1, avg: 1, sum: 1 } },
  { "m1.tiny": { min: 1, max: 1, avg: 1, sum: 1 }, "m1.large": { min: 1, max: 1, avg: 1, sum: 1 } },
  { "m1.large": { min: 1, max: 1, avg: 1, sum: 2 } },
  { "m1.large": { min: 1, max: 1, avg: 1, sum: 2 } },
  { "m1.large": { min: 1, max: 1, avg: 1, sum: 2 } },
  { "m1.large": { min: 1, max: 1, avg: 1, sum: 2 } },
]}

Angus's comments/ramblings

1) I assume we can't group by more than one field? If so, this should be (not an array):

groupby=metadata.flavor&

You can group by more than one field, see the second examples -- jd

2) period is not yet impl. - I'd better get on that ;)

3) Currently we return:

{[
  { "min": 1,
    "max": 1,
    "avg": 1,
    "sum": 1,
    "count": 1,
    "duration": 1,
  },
]}

To show the groupby we could return the following:

{[
  { "min": 1,
    "max": 1,
    "avg": 1,
    "sum": 1,
    "count": 1,
    "duration": 1,
    "groupby": "m1.tiny",
  },
]}

If there is no groupby, that can just be None.

Fine with me, but you probably want "groupby": [ "m1.tiny" ] since you can group by multiple values. -- jd

We probably want that to be a mapping between the field name and its value. {'metadata.instance_type': 'm1.tiny'} -- dhellmann
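A minimal sketch of the response shape dhellmann suggests, grouping a few hypothetical samples in plain Python. The sample layout and field names here are illustrative only, not the actual storage model:

```python
from collections import defaultdict

# Hypothetical flattened samples; "counter_volume" mirrors the meter volume.
samples = [
    {"counter_volume": 1, "metadata.instance_type": "m1.tiny"},
    {"counter_volume": 1, "metadata.instance_type": "m1.large"},
    {"counter_volume": 1, "metadata.instance_type": "m1.large"},
]

def grouped_statistics(samples, groupby_field):
    """Aggregate min/max/avg/sum/count per distinct group-by value."""
    buckets = defaultdict(list)
    for s in samples:
        buckets[s[groupby_field]].append(s["counter_volume"])
    return [
        {
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
            "sum": sum(vals),
            "count": len(vals),
            # Mapping of field name to value, per dhellmann's suggestion.
            "groupby": {groupby_field: key},
        }
        for key, vals in buckets.items()
    ]

stats = grouped_statistics(samples, "metadata.instance_type")
```

With a mapping rather than a bare value, a client can tell which field produced each group even when several group-by fields are requested.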

4) From an implementation point of view (mongo) we have:

   MAP_STATS = bson.code.Code("""
	    function () {
-	        emit('statistics', { min : this.counter_volume,
+	        emit(groupby_field, { min : this.counter_volume,
	                             max : this.counter_volume,
	                             qty : this.counter_volume,
	                             count : 1,
	                             timestamp_min : this.timestamp,
	                             timestamp_max : this.timestamp } )
	    }
	    """)

If we can pass the groupby field into the above function then this will be super easy. Can we generate this code dynamically?

I don't see why you couldn't :) -- jd

We will need to be careful about injection attacks. -- dhellmann
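One hedged sketch of how that generation could sidestep injection: validate the requested field against a whitelist before interpolating it into the JavaScript source, so user input never reaches the emitted code directly. The field names and template below are assumptions for illustration, not the actual driver code:

```python
# Fields we are willing to interpolate into the map function; anything
# else is rejected before it can reach the JavaScript source.
ALLOWED_GROUPBY = {"user_id", "project_id", "resource_id", "source"}

MAP_STATS_TEMPLATE = """
function () {
    emit(this.%(field)s, { min : this.counter_volume,
                           max : this.counter_volume,
                           qty : this.counter_volume,
                           count : 1,
                           timestamp_min : this.timestamp,
                           timestamp_max : this.timestamp });
}
"""

def make_map_stats(groupby_field):
    """Build the JS map-function source for one whitelisted group-by field."""
    if groupby_field not in ALLOWED_GROUPBY:
        raise ValueError("unsupported group-by field: %s" % groupby_field)
    # With pymongo, this string would then be wrapped in bson.code.Code(...).
    return MAP_STATS_TEMPLATE % {"field": groupby_field}
```

Because the interpolated value can only ever be one of the whitelisted identifiers, no attacker-controlled string ends up in the JavaScript.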

Design notes

Storage driver tests to check group by statistics

Created a new class StatisticsGroupByTest in tests/storage/base.py that contains the storage tests for group by statistics and has its own test data.

The storage tests check group by statistics for:

  1. single field, "user-id"
  2. single field, "resource-id"
  3. single field, "project-id"
  4. single field, "source"
  5. single metadata field (not yet implemented)
  6. multiple fields
  7. multiple metadata fields (not yet implemented)
  8. multiple mixed fields, regular and metadata (not yet implemented)
  9. single field groupby with query filter
  10. single metadata field groupby with query filter (not yet implemented)
  11. multiple field group by with multiple query filters
  12. multiple metadata field group by with multiple query filters (not yet implemented)
  13. single field with period
  14. single metadata field with period (not yet implemented)
  15. single field with query filter and period
  16. single metadata field with query filter and period (not yet implemented)


The test data is constructed such that the measurements are integers (specified by the "volume" attribute of the sample) and the averages in the statistics are also integers. This helps avoid floating point errors when checking the statistics attributes (e.g. min, max, avg) in the tests.

Currently, metadata group by tests are not implemented. Supporting metadata fields is a more complicated case, so we leave that for future work.

The group by period tests and test data are constructed so that there are periods with no samples. For the group by period tests, statistics are calculated for the periods 10:11 - 12:11, 12:11 - 14:11, 14:11 - 16:11, and 16:11 - 18:11. However, there are no samples with timestamps in the period 12:11 - 14:11. It's important to have this case to check that the storage drivers behave properly when there are no samples in a period.
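The bucketing described above can be sketched as follows; the date and timestamps are illustrative stand-ins for the test data, arranged so that the 12:11 - 14:11 bucket comes out empty:

```python
from datetime import datetime, timedelta

def period_starts(start, end, period):
    """Yield the start of each period bucket between start and end."""
    t = start
    while t < end:
        yield t
        t += period

# Hypothetical sample timestamps: none fall in the 12:11 - 14:11 bucket.
samples = [
    datetime(2013, 8, 1, 10, 30),
    datetime(2013, 8, 1, 11, 0),
    datetime(2013, 8, 1, 15, 0),
    datetime(2013, 8, 1, 17, 0),
]

two_hours = timedelta(hours=2)
start = datetime(2013, 8, 1, 10, 11)
end = datetime(2013, 8, 1, 18, 11)

# Map each period start to the samples that fall inside that period;
# a driver must return something sensible for the empty bucket too.
buckets = {p: [t for t in samples if p <= t < p + two_hours]
           for p in period_starts(start, end, two_hours)}
```

A storage driver can either skip empty periods or report them with zeroed statistics, but it must not crash on them, which is exactly what the test data exercises.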

Addressed by: https://review.openstack.org/41597 "Add SQLAlchemy implementation of groupby"

SQLAlchemy group by implementation

Decided to implement group by only for the "user-id", "resource-id", and "project-id" fields. The "source" and metadata fields are not supported. It turned out that supporting "source" in SQLAlchemy is much more complicated than "user-id", "resource-id", and "project-id".
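The effect of grouping on those columns can be sketched with the equivalent raw SQL against a toy table. The schema and column names below are illustrative only; the real driver expresses this through SQLAlchemy query construction rather than hand-written SQL:

```python
import sqlite3

# Toy stand-in for the meter table; not the actual Ceilometer model.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE meter (user_id TEXT, project_id TEXT, "
    "resource_id TEXT, counter_volume INTEGER)")
conn.executemany(
    "INSERT INTO meter VALUES (?, ?, ?, ?)",
    [("u1", "p1", "r1", 1), ("u1", "p1", "r2", 1), ("u2", "p1", "r3", 1)])

# Grouping by a plain column maps directly onto SQL GROUP BY, which is
# why "user-id", "resource-id", and "project-id" are the easy cases.
rows = conn.execute(
    "SELECT user_id, MIN(counter_volume), MAX(counter_volume), "
    "AVG(counter_volume), SUM(counter_volume), COUNT(*) "
    "FROM meter GROUP BY user_id ORDER BY user_id").fetchall()
```

The "source" and metadata cases are harder precisely because they do not map onto a single column of the statistics query in the same direct way.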

Addressed by: https://review.openstack.org/41597 "Add SQLAlchemy implementation of groupby"