Instrumentation/Metrics/Monitoring meeting, 2012-10-29

3:02 The topic for #openstack-meeting is: OpenStack meetings || Development in #openstack-dev || Help in #openstack

3:02 markmcclain enikanorov: I'm doing some client work

3:02 edgarmagana adeu

3:02 mestery later!

3:02 salv-orlando bot not workie - sorry

3:02 danwent bye all

3:02 markvoelker Goodnight folks

3:02 nati_ueno_ Bye!

3:02 SumitNaiksatam_ bye

3:02 sachint__ later

3:02 salv-orlando not our fault though

3:02 salv-orlando bye

3:03 sasharatkovic bye

3:03 amotoki We can find the irc log at http://eavesdrop.openstack.org/irclogs/%23openstack-meeting/.

3:03 harlowja bot no workie

3:03 asalkeld hi

3:03 harlowja howdy

3:03 nijaba o/

3:03 eglynn o/

3:03 jeffreyb sandywalsh: hey

3:04 sandywalsh o/ I made it

3:04 jeffreyb cool

3:04 anniec yay sandy!

3:04 jeffreyb waiting for angus et al

3:04 asalkeld I am here

3:04 dhellmann o/

3:04 jeffreyb did you guys have a chance to look at the diagram? fire away

3:04 asalkeld yip, good start

3:04 dhellmann jeffreyb: diagram? (I guess that's a "no")

3:05 jeffreyb on the spec from last week

3:05 eglynn http://wiki.openstack.org/InstrumentationMetricsMonitoring?action=AttachFile&do=view&target=InstrumentationMonitoringSketch.png

3:05 timjr http://wiki.openstack.org/InstrumentationMetricsMonitoring?action=AttachFile&do=view&target=InstrumentationMonitoringSketch.png

3:05 timjr is that meetbot even here?

3:05 jeffreyb yeah, kinda tough to walk thru on irc, but captures our thinking

3:05 asalkeld registry ~= config?

3:05 harlowja sure, approx == 'similar' to logging config for python

3:06 jeffreyb think of it as a metric root
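
The "registry ~= config" idea above maps closely onto Python's logging design, where `logging.getLogger(name)` hands out named loggers from a shared root. A minimal sketch of what a metric root might look like; all names (`MetricRegistry`, `Counter`) are hypothetical, not from any actual OpenStack code:

```python
class MetricRegistry(object):
    """Hypothetical metric root: hands out named metrics and fans
    samples out to configured handlers, like the logging root."""

    def __init__(self):
        self._metrics = {}
        self._handlers = []

    def counter(self, name):
        # Return the existing counter for this dotted name, or create
        # one, just as logging.getLogger(name) does for loggers.
        return self._metrics.setdefault(name, Counter(name, self))

    def add_handler(self, handler):
        self._handlers.append(handler)

    def publish(self, name, value):
        for handler in self._handlers:
            handler.emit(name, value)


class Counter(object):
    def __init__(self, name, registry):
        self.name = name
        self.value = 0
        self._registry = registry

    def increment(self, delta=1):
        self.value += delta
        self._registry.publish(self.name, self.value)


registry = MetricRegistry()  # the "metric root"
```

Code would then ask the root for a metric by dotted name, and configuration alone would decide which handlers see the samples.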

3:06 eglynn I guess I had a question on whether the 'metric driver' plus handlers kinda subsumes where ceilometer sits today?

3:06 dhellmann timjr: I think the previous meeting folks said the bot was down

3:06 jeffreyb not sure…thinking there are bits in common

3:06 asalkeld I think everything including ceilometer-agent should use the same lib

3:06 jeffreyb so could perhaps drive ceilometer too

3:06 sandywalsh hmm, diagram is a little confusing to me

3:06 jeffreyb in what way? you are welcome to enhance. i put the sources on the wiki.

3:07 asalkeld so registry.new_data(...)

3:07 jeffreyb the core part is having the measurement bits that can then be used to drive metrics flow to file|datagram|ceilometer?

3:07 asalkeld goes to find the handler

3:07 asalkeld yea so one question is the difference in required fields

3:08 asalkeld so metering needs more info and trace potentially much less

3:08 jeffreyb well, we were thinking about having scoping rules for what's active as well as metric levels (like log levels: billing, monitor, profile)

3:08 sandywalsh not really clear on what the metric layer is meant to convey, how that works with the monkeypatching/decorator layer and what the handlers/drivers are (as compared to notifier drivers)

3:09 harlowja /reload oops wrong place

3:09 eglynn or another approach would be for ceilometer to provide the different handlers/emitters/publishers, i.e. a common infrastructure for routing/transforming these data

3:09 jeffreyb the metric core could be used via m.patch or via decorators

3:09 eglynn (regardless of the source)

3:09 sandywalsh basically that whole middle block is confusing

3:09 jeffreyb sandy: what's the proposal from you?

3:10 jeffreyb others: is it also confusing to you?

3:10 jeffreyb i can perhaps try to create an annotated guide to the diagram

3:10 timjr I think the notion is that you can use the metrics layer by monkeypatching if you want, or with a decorator, or with explicit function calls
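
The three entry points timjr lists (monkeypatching, decorator, explicit call) can all feed the same recording function. A toy sketch assuming that shared core; the names (`record`, `instrumented`, `samples`) are illustrative only:

```python
import functools
import math

samples = []


def record(name, value):
    # Explicit-call style: the instrumented code names the measurement.
    samples.append((name, value))


def instrumented(name):
    # Decorator style: wrap a function and count its calls.
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            record(name + ".calls", 1)
            return fn(*args, **kwargs)
        return wrapper
    return deco


@instrumented("nova.api.do_thing")
def do_thing(x):
    return x * 2


# Monkey-patch style: wrap an existing attribute after the fact,
# without touching the original module's source.
_orig_sqrt = math.sqrt


def _patched_sqrt(x):
    record("math.sqrt.calls", 1)
    return _orig_sqrt(x)


math.sqrt = _patched_sqrt
```

All three styles end up in the same `record()` funnel, so the downstream handler/driver layer never needs to know how a sample was produced.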

3:10 dhellmann it's not clear if the boxes inside nova-* are "logical" groupings or just where code lives at runtime

3:10 sandywalsh well, we're talking about two different things: events (for billing and monitoring) and instrumentation (for performance)

3:11 jeffreyb yes, so that is meant to convey a lib utilized in various daemons

3:11 dhellmann sandywalsh: they are two different things, but we're trying to explore whether we can share code for handling the data at different levels

3:11 eglynn sandywalsh: sure but still some commonality I suspect

3:11 jeffreyb i didn't have time to put in the interface boundaries

3:11 dhellmann for example, the "metric driver" box and all of the drivers could be shared with the ceilometer agent, so that both the agent and instrumented services could send data to the same places using the same code

3:11 sandywalsh well, the motivation for instrumentation is very different than the motivation for monitoring/billing

3:11 jeffreyb yeah, i think some are definitely common/shared

3:12 timjr we were thinking that you could view metrics as a superset of billing data... so we introduced the notion of levels -- meter, monitor, profile being like error, info, trace in logging

3:12 dhellmann or even send data to different places with the same code

3:12 jeffreyb there is quite an overlap between monitoring and instrumentation however, you might do different things with the data in the end

3:12 dhellmann right

3:12 harlowja right

3:12 asalkeld agree

3:12 eglynn sandywalsh: true that, but some of the mechanics of getting these data from A to B still common, or?

3:12 sandywalsh instrumentation is a much higher sampling rate than monitoring

3:12 timjr the point in the code where you emit the data should have no clue about where it's going

3:12 nijaba we did an excellent table of requirements per use cases at last summit, fwiw

3:12 sandywalsh and instrumentation can afford to drop some data

3:13 jeffreyb it is a much higher sampling rate

3:13 harlowja sandywalsh: isn't that more about what u do with the data, not how its produced?

3:13 jeffreyb but monitoring can afford to drop data too

3:13 asalkeld sandywalsh, agree but the notifier can do that

3:13 dhellmann sandywalsh: and that would be configured in the publisher

3:13 eglynn sandywalsh: that's a good point, on the droppability

3:13 eglynn sandywalsh: only metering really requires completeness

3:13 sandywalsh well, not really, monitoring needs it just as much

3:14 harlowja so that's making sure the publishing mechanism for 'billing' is using the NonDroppableHandler

3:14 timjr so, you would have a different handler for metric-level instrumentation, which uses, e.g., UDP, and can drop some
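
A lossy UDP handler of the kind timjr describes is a few lines: fire-and-forget datagrams, where a dropped packet just means one lost profiling sample. A sketch only; `UDPMetricHandler` and the `name:value` payload format are hypothetical:

```python
import socket


class UDPMetricHandler(object):
    """Hypothetical best-effort handler: datagrams may be dropped by
    the network, which is acceptable for high-rate instrumentation."""

    def __init__(self, host, port):
        self.addr = (host, port)
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def emit(self, name, value):
        payload = ("%s:%s" % (name, value)).encode("utf-8")
        try:
            self.sock.sendto(payload, self.addr)
        except socket.error:
            # Fire-and-forget: losing a profiling sample is fine.
            pass
```

A reliable handler for metering data would present the same `emit()` interface but sit on a queue with delivery guarantees; the call site stays identical.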

3:14 jeffreyb instrumentation is typically going to be used post-event, so it isn't even clear to me that you want it to be propagated out to dashboards/tools in real-time

3:14 sandywalsh we need to have a valid picture of state ... (for orchestration, etc)

3:14 eglynn for monitoring, once the data gets old, value drops off rapidly

3:14 jeffreyb eglynn: very much so

3:14 eglynn (might as well drop it on the floor if queues backed up etc.)

3:14 jeffreyb you don't want to lose monitoring data for sure, but if you have to you do

3:14 timjr uhh, no, you want it most when your queues are backing up; it's for debugging that kind of thing

3:15 sandywalsh it depends ... monitoring has equal importance to billing, we need all the events to get a complete picture

3:15 eglynn you want the most recent

3:15 jeffreyb yes, but you'd rather deliver high priority c&c messages

3:15 eglynn the hour-old stuff is already old news

3:15 sandywalsh instrumentation is trending data

3:15 eglynn (can maybe sample that)

3:15 timjr sandywalsh: could you elaborate on that? trending?

3:16 eglynn in the twitter sense? (of trending ...)

3:16 sandywalsh instrumentation is things like: number of calls / second, average execution time, etc
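
The measures sandywalsh names (calls per second, average execution time) come from one small primitive that counts invocations and accumulates elapsed time. A minimal sketch; the `Timer` class is hypothetical:

```python
import time


class Timer(object):
    """Hypothetical instrumentation primitive: tracks call count and
    average execution time for a code section."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def time(self, fn, *args, **kwargs):
        start = time.time()
        try:
            return fn(*args, **kwargs)
        finally:
            # Count every call, including ones that raise.
            self.count += 1
            self.total += time.time() - start

    @property
    def mean(self):
        # Average execution time in seconds.
        return self.total / self.count if self.count else 0.0
```

Calls per second then falls out of `count` divided by the reporting interval, which is exactly the trending data that tolerates being sampled or dropped.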

3:16 timjr sure so, if the call latency to rabbitmq shoots up, then we know where to look for trouble

3:16 sandywalsh monitoring / billing requires a sole consumer and guarantee of hand off

3:16 timjr definitely but they can use the same emission hooks in the code

3:17 jeffreyb delivery sla is separate from measuring

3:17 timjr they just need different handlers (in the logging analogy)

3:17 asalkeld yea

3:17 eglynn billing yes, but if you have to choose between more and less recent monitoring data, then the older stuff gets sampled or dropped

3:17 sandywalsh timjr: possibly, my concern is that instrumentation code could be anywhere in the code (depending on the developer)

3:17 eglynn (in extremis ...)

3:17 timjr sure

3:17 dhellmann thinks we're quibbling over implementation details a little early

3:18 jeffreyb sandy: yes, that's what we hope for — instrumentation everywhere

3:18 timjr so, if the level is "meter" (or "billing", if you will), you gotta be careful where it is cuz somebody could end up paying if you move it!

3:18 anniec very similar to logging concept .. where there are different levels

3:18 sandywalsh yes, monitoring / billing require clear anchor points

3:18 timjr it's important for that kind of call to be explicit in the code, IMHO

3:18 anniec you turn on what you need

3:18 sandywalsh hmm not sure about that point anniec

3:18 jeffreyb so what are the q's we are trying to answer? 1) is the source of measurement the same for billing|monitoring|instrumentation 2) the how?

3:19 dhellmann are *any* sources of measurement the same?

3:19 jeffreyb i don't think we are agreed on #1

3:19 sandywalsh jeffreyb: I'm trying to get on the same page for terminology / requirements so we can talk about implementation

3:19 timjr dhellmann: sure. I'm more on the fence over whether logging and metrics should be the same. Signs point to "no"

3:19 dhellmann and if the sources aren't the same, is there any benefit in sharing the delivery code?

3:19 jeffreyb doug: seems like there is definitely overlap with monitoring for some billing stuff

3:19 sandywalsh the diagram hints heavily at implementation

3:19 eglynn 1) the source can be different, but there can be common infrastructure for "publication"

3:20 sandywalsh I think billing and monitoring are the same

3:20 sandywalsh I think instrumentation is a different animal

3:20 jeffreyb sandywalsh: you are free to diagram yourself!

3:20 timjr well, we are threatening to implement it...

3:20 harlowja lol

3:20 eglynn sandywalsh: disagree, timeliness versus completeness

3:20 sandywalsh jeffreyb: I will

3:20 jeffreyb the point here is we should all present some views and see where we can agree or agree to disagree

3:20 jeffreyb sandywalsh: great

3:21 asalkeld well I think if the code is in one spot it is easier for devs to see how to do either

3:21 sandywalsh (reading the scroll back, hard to keep up)

3:21 jeffreyb sandywalsh: i put the vdx and graffle in the wiki if you want to re-use any of those bits

3:21 sandywalsh jeffreyb: that's ok, I'll do a wiki page, but thanks

3:22 nijaba jeffreyb: link?

3:22 jeffreyb nijaba: http://wiki.openstack.org/InstrumentationMetricsMonitoring

3:22 sandywalsh as I've stated before, I've got concerns about putting instrumentation hooks in trunk (permanent) since the needs are so diverse

3:22 jeffreyb sandywalsh: what if we can make it non-intrusive/cheap?

3:23 harlowja just a perspective, facebook, yahoo, google code, instrumentation is in trunk

3:23 jeffreyb sandywalsh: is your concern performance or code cleanliness?

3:23 timjr sandywalsh: there's no harm in extra instrumentation -- you just don't configure a handler if you don't want it

3:23 sandywalsh jeffreyb: I'm all ears, the tach approach I think is best

3:23 timjr it's like debug-level logging

3:23 sandywalsh timjr: not really true, there are costs

3:23 timjr you just turn it off if you're ignoring it

3:23 asalkeld sandywalsh, not many other companies tolerate monkey patching

3:23 timjr function calls are not free, it's true

3:23 sandywalsh driver / library/ loaders / config

3:23 dhellmann timjr: even less expensive, since the decorator can evaluate the "on/off" flag once at startup
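
dhellmann's point about evaluating the on/off flag once can be shown in a few lines: when the flag is off, the decorator hands back the original function at import time, so there is zero per-call cost. A sketch only; `INSTRUMENTATION_ENABLED` and `timed` are hypothetical names:

```python
import functools

INSTRUMENTATION_ENABLED = False  # hypothetical config flag, read at startup


def timed(fn):
    # The flag is checked exactly once, when the decorator runs at
    # import time; with instrumentation off, the original function is
    # returned untouched and the per-call overhead is zero.
    if not INSTRUMENTATION_ENABLED:
        return fn

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # ... record a timing sample here when enabled ...
        return fn(*args, **kwargs)
    return wrapper


@timed
def handler(x):
    return x + 1
```

This is cheaper than debug-level logging, which still pays for a level comparison on every call.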

3:23 jeffreyb sandywalsh: but is tach for sort of one-off profiling or continuous use?

3:24 sandywalsh jeffreyb: both, we use it permanently and for one-offs

3:24 harlowja dhellmann: really, nice

3:24 jeffreyb sandywalsh: so would you plan to monkey patch more?

3:24 sandywalsh jeffreyb: for instrumentation, yes

3:24 sandywalsh the decorators in trunk are already causing problems

3:24 timjr there's a place for that -- you can be much more pervasive if you monkey patch stuff; nobody would want instrumentation calls on every second line of code

3:25 jeffreyb sandywalsh: what about real-time monitoring of things like resources/pools/etc? is that instrumentation?

3:25 dhellmann having monkeypatching as an option will provide good flexibility, but I don't think that should preclude wiring in instrumentation

3:25 sandywalsh the big issue with decorators are how they interact with exceptions

3:25 timjr but I would not want important events of measurements to be implicit

3:25 timjr events /or/ measurements, even

3:25 sandywalsh events I totally agree should be hard coded

3:25 harlowja sandywalsh: agreed, the decorators right now are sorta hacky orchestration, hacky eventlet exception catching...

3:25 nijaba timjr: +1

3:25 dhellmann sandywalsh: what does a decorator do that tach doesn't do?

3:25 jeffreyb dhellmann: i was thinking the base metrics/measurement thingies could be used in both ways: decorators and monkey patching — then transmission conduit is shared

3:25 sandywalsh harlowja: +1

3:26 dhellmann jeffreyb: agreed

3:26 sandywalsh dhellmann: tach doesn't affect trunk and it's always the outer wrapper

3:26 dhellmann sandywalsh: does being the outer wrapper help with exceptions?

3:27 jeffreyb sandywalsh: but won't a bunch of us end up patching outside of trunk for the same sorts of needs?

3:27 sandywalsh so, let's consider a developer that wants to instrument a part of nova (let's say networking)

3:27 sandywalsh they either have to create a disposable branch or submit to trunk

3:27 jeffreyb seems like over time we will keep wanting to go deeper and deeper on what we measure all the time so that we can more easily characterize system behavior

3:27 sandywalsh neither of which is really attractive

3:27 jeffreyb and analyze run time faults

3:28 sandywalsh jeffreyb: yes

3:28 harlowja why is submission to trunk bad

3:28 eglynn would such a patch be carried long-term?

3:28 sandywalsh for instrumentation it would mean putting hooks everywhere

3:28 eglynn (or just as long as it takes to track down the bottleneck)

3:28 sandywalsh for monitoring/billing it's fine

3:28 jeffreyb i would expect the instrumentation would grow over the long term in trunk

3:28 timjr we should have performance-level instrumentation on trunk, if possible, because we should have a CI gate that looks for performance regressions, eventually

3:28 asalkeld sandywalsh, where it adds generic value

3:28 sandywalsh jeffreyb: that's the problem

3:28 eglynn i.e. is much ultra-fine-grained profiling inherently disposable?

3:28 sandywalsh timjr: disagree

3:29 timjr raises an eyebrow

3:29 jeffreyb sandywalsh: i see that as a good thing

3:29 timjr sandywalsh: I don't see your POV, I guess... could you elaborate?

3:29 jeffreyb the challenge for us is to make it so that the impact is negligible at run-time and so folks can switch it off who don't want or need it

3:29 sandywalsh we can still gate CI efforts via monitoring events or MP'ed installs

3:30 jeffreyb the monkey patch approach is fine for some situations but that would get a bit hard across large sets of fns

3:30 timjr sandywalsh: granted, that would work as well

3:30 sandywalsh and each party can instrument what is important to them, not what "is decreed" to be the important spots

3:30 timjr but, as a developer, I want to know which events my code changes might affect, so it is better if they are present in the code I'm changing

3:30 jeffreyb sandywalsh: well, that's where we were wanting to put in some scoping controls and level controls

3:30 asalkeld well how about we support in-code trace and monkey patching

3:31 timjr we definitely are going to want to decree some important spots

3:31 sandywalsh jeffreyb: it all sounds terribly heavy weight to me for something that is so transient

3:31 eglynn so can we distinguish between instrumentation that has a long-term use (coarse-grained timings, fault counts etc.) and tactical stuff that is only of interest to solve a particular problem?

3:31 asalkeld and can let ptl's decide where they want the trace

3:31 jeffreyb we think this level of data is gold

3:31 dhellmann yes, I don't think we're going to get everyone to agree to one or the other for MP, so this feels like a rabbit-warren of a discussion

3:31 anniec asalkeld: +1

3:31 sandywalsh there is some low-hanging instrumentation fruit, like we do today at the rpc layer

3:31 asalkeld the question: should we make such a lib

3:32 dhellmann if we agree that there is *something* instrumenting for monitoring, what can we do with the results and how much can we share them?

3:32 timjr rabbit-warren? are we... breeding?

3:32 asalkeld and what should it look like

3:32 sandywalsh but simply because they exist I still don't believe that code belongs in trunk

3:32 jeffreyb asalkeld: i think the answer is yes

3:32 dhellmann timjr: we're lost in the dark?

3:32 timjr oh and fuzzy

3:32 harlowja and cuddly

3:32 sandywalsh dhellmann: yes, once we grab the data, we can use common code to process it

3:32 timjr scoots a little further from harlowja

3:33 harlowja scoots closer

3:33 sandywalsh I'm arguing for this low-level distinction

3:33 dhellmann sandywalsh: cool. let's talk about that, then

3:33 timjr so, let's take a concrete example

3:33 jeffreyb i am guessing there are particular tastes in processing that may not be common

3:33 timjr if I want to know database latency, for example

3:33 jeffreyb e.g. we might collect loads of data and give it to a researcher on a hadoop grid to dream up interesting things

3:34 timjr is there any reason not to put a line of code that says "this is the extent that defines database latency"?

3:34 sandywalsh timjr: there are many ways to skin that cat without having to affect trunk

3:34 timjr if that information was in a config file or a patch, it would be less robust in the face of later code changes

3:35 sandywalsh so, for example, consider how cells work

3:35 timjr isn't that a ... federation layer?

3:35 sandywalsh cells have a new derivation for compute.api that redirects calls to other (child) cells

3:36 timjr I'm not sure there's good consensus on the value of federation vs. monolithic scaling

3:36 sandywalsh in the normal situation, it all works as a single cell, but by changing the --foo_driver flag, it works with cells. we can do the same for --db_api and other larger subsystems. this doesn't need to be a permanent part of trunk
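
The "swap the driver via a config flag" pattern sandywalsh describes for cells can be sketched in miniature: an instrumented subclass is selected by configuration instead of editing trunk. All names here (`PlainDBAPI`, `load_db_api`) are illustrative, not actual nova code:

```python
class PlainDBAPI(object):
    """Stand-in for a trunk subsystem (e.g. the db api)."""

    def query(self, sql):
        return "rows for %s" % sql


class InstrumentedDBAPI(PlainDBAPI):
    """Wrapper derivation: measurement lives here, not in trunk."""

    def __init__(self):
        self.calls = 0

    def query(self, sql):
        self.calls += 1  # count calls before delegating
        return PlainDBAPI.query(self, sql)


DRIVERS = {"plain": PlainDBAPI, "instrumented": InstrumentedDBAPI}


def load_db_api(flag_value):
    # Equivalent of honouring a --db_api style flag at startup.
    return DRIVERS[flag_value]()
```

Operators who want the instrumentation flip the flag; everyone else runs the plain driver with no extra code in the hot path.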

3:36 jeffreyb just seems like not everything can or should be measured/gated by 1) patching, 2) queue rpc

3:36 sandywalsh but instead can be a "extra" part of ceilometer

3:37 timjr it doesn't need to be, but I'd certainly prefer to see a pretty good complement of metrics coming from trunk

3:37 jeffreyb if instrumentation data is lower priority, then it shouldn't go through the same channels of delivery as billing and control messages

3:37 harlowja a library like others in java that might be a good talking point as well, https://github.com/johnewart/ruby-metrics, the concepts there seem useful to others in other languages, as a library u could use it in your monkey patchers, u could use it in ceilometer, and so on, something like that for openstack/python would seem like the right way to go, as to how much gets into trunk, or how much doesn't, that can be up to the code reviewers and others, dhellmann should ceilometer be the point of that library?

3:37 sandywalsh timjr: you can still do that by having a config file that hits the major / agreed-upon points

3:37 sandywalsh timjr: it doesn't need to clutter up trunk

3:37 jeffreyb harlowja: rules violation - too much text

3:38 timjr sandywalsh: but how would you connect the config file to the code?

3:38 sandywalsh timjr: tach, today, has all the hooks for nova rpc and some other areas (for instrumentation, not monitoring/billing)

3:38 harlowja sandywalsh: how about we separate out the cluttering up trunk with doing it or not, that seems to be the code reviewers accepting it or not, but is the common concept that something in ceilometer or a library should be created to aid in whatever the final result is?

3:38 dhellmann harlowja: ceilometer was accepted as an incubated project for measuring things in an openstack cloud. I think that means yes, the lib should be part of the project. That's not to say where the code actually lives long term.

3:39 jeffreyb sandywalsh: what about monitoring something like measure eventlet resources e.g. ?

3:39 sandywalsh dhellmann: I don't really care where it lives (ceilometer or another project), but it doesn't have to live in nova trunk

3:39 jeffreyb dhellman: do the things i had put on the spec/etherpad fit with the types of things you had in mind related to ceilometer?

3:39 harlowja sure

3:39 dhellmann harlowja: maybe some of it goes into oslo, or we release it as a stand-alone lib ourselves but managed by the ceilometer project

3:39 harlowja ya, that might be useful

3:40 sandywalsh jeffreyb: still works fine

3:40 dhellmann jeffreyb: could you post that link again? I looked at it, but it's been a few days

3:40 jeffreyb sandywalsh: so would that be put into irc? http://wiki.openstack.org/InstrumentationMetricsMonitoring (see measurements section)

3:40 jeffreyb er, sandy, i meant rpc sorry

3:40 dhellmann jeffreyb: oh, I thought there was an etherpad

3:40 sandywalsh I was wondering

3:41 jeffreyb dhellman: i copied most of the bits out of the etherpad to there

3:41 sandywalsh jeffreyb: it doesn't have to, but my inflight project was using that approach

3:41 dhellmann jeffreyb: sneaky

3:41 sandywalsh jeffreyb: I'm always up for suggestions on better ways

3:41 jeffreyb sandywalsh: aren't you worried about the transport implications?

3:41 sandywalsh jeffreyb: it was using the same eventlet backdoor to count greenthreads

3:41 sandywalsh jeffreyb: that's the whole point, to get a realistic measurement

3:41 jeffreyb ah yes, you mentioned that but i didn't have a chance to look at it

3:42 dhellmann jeffreyb: I hadn't thought of a comprehensive list. The design goal I've always taken with ceilometer was make it extensible so we don't have to think of everything to measure ourselves.

3:42 sandywalsh and it's low bandwidth/frequency

3:42 dhellmann this looks like a good list of things to be measuring

3:42 sandywalsh so ... care for another topic?

3:42 harlowja a library in or out of ceilometer, perhaps like https://github.com/johnewart/ruby-metrics (or the design we put up which is similar), then that gets used by ceilometer code in nova/elsewhere (thus unifying that into using the library created), then if that library gets too big, or has different stuff that is conflicting with ceilometer, it gets split off into some 'metrics' library that can either be used in monkey-patching, or can be accepted by code reviewers into nova code (or other code) as the project reviewers say yes/no to annotations/... metrics code additions...

3:42 sandywalsh the workers that consume from the queue?

3:42 jeffreyb dhellman: when there is something to be measured in process, was the idea to send it to ceilometer agent?

3:42 jeffreyb dhellman: is there a core set of measurement objects?

3:43 dhellmann jeffreyb: not necessarily, unless it's something you want to bill for (API calls?)

3:43 sandywalsh harlowja: I think that sounds reasonable

3:43 jeffreyb dhellman: i see. definitely most of those are not of that nature.

3:43 sandywalsh harlowja: (except for the decorators for instrumentation)

3:43 harlowja sure, sure, impl detail

3:44 dhellmann jeffreyb: the Counter class is probably the closest we come, but we don't have a separate class representing the measurement of each meter. We do have a class that *produces* the measurement, but they are all represented by a common object at this point.

3:44 dhellmann jeffreyb: yes, that's clear

3:44 dhellmann jeffreyb: and it makes sense for that to be the case

3:44 sandywalsh so ... the ceilometer worker ... it seems like overkill ... can we go with something lighter weight like the StackTach worker?

3:44 sandywalsh and it's using the nova rpc code in the wrong way (imho) since events aren't rpc methods

3:45 jeffreyb dhellman: so could the counter stuff be folded in with the extra metrics gauges?

3:45 dhellmann sandywalsh: the rpc issues are well documented

3:45 dhellmann sandywalsh: StackTach uses YAGI, right?

3:45 sandywalsh dhellmann: no, it has its own lightweight worker

3:45 sandywalsh I'd like to see a YAGI-like thing being the common layer though

3:45 dhellmann sandywalsh: ah, ok

3:45 dhellmann the issue with YAGI is it doesn't (AFAICT) address duplicate events, which we *definitely* don't want for billing

3:46 harlowja impl detail, the ceilometer worker is a metric/billing 'sink' right, not the only metric/billing 'sink' i would hope

3:46 dhellmann harlowja: yes "a" not "the"

3:46 sandywalsh dhellmann: we're using YAGI for billing today, no problems

3:46 dhellmann sandywalsh: what happens if a worker dies while processing some events? how do you avoid reprocessing them when it restarts?

3:46 timjr I think our perspective is we don't care about the transport as long as it's pluggable

3:47 jeffreyb timjr: yes

3:47 dhellmann timjr: +1

3:47 sandywalsh harlowja: well, that's the tricky part. Honestly I think the rabbit_queue_list flag is bad ... these queues are huge and the events big. we want to publish notifications less not more

3:47 nijaba we need a reliable transport

3:47 harlowja some need

3:47 sandywalsh nijaba: certainly

3:48 jeffreyb uh, reliable transport for some, but not for all uses

3:48 nijaba harlowja: exactly

3:48 sandywalsh for monitoring/billing it has to be reliable

3:48 timjr I would say billing wants a reliable transport, and fine-granularity performance info can get by with UDP

3:48 sandywalsh for instrumentation ... meh

3:48 nijaba yes, the "we" was about the metering bit

3:48 sandywalsh agreed

3:48 nijaba so transport has to be pluggable, period....
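
The consensus here (emitting code never knows the transport, reliable for metering, lossy for profiling) is a small registry of send callables. A sketch under those assumptions; `Publisher` and the transport names are hypothetical:

```python
class Publisher(object):
    """Hypothetical pluggable-transport layer: emitting code picks a
    transport by configured name and never knows where data goes."""

    def __init__(self):
        self._transports = {}

    def register(self, name, send):
        # `send` is any callable taking one sample dict.
        self._transports[name] = send

    def publish(self, transport_name, sample):
        # Reliable (e.g. AMQP) for metering, lossy (e.g. UDP) for
        # profiling -- the call site is identical either way.
        self._transports[transport_name](sample)


publisher = Publisher()
sent = []
publisher.register("reliable", lambda s: sent.append(("amqp", s)))
publisher.register("lossy", lambda s: sent.append(("udp", s)))
```

Swapping a deployment from rabbit to UDP (or a file, or a ceilometer agent) then becomes a config change, not a code change.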

3:49 timjr nod

3:49 sandywalsh so, I think it would be nice to see a lean-mean worker as a common piece of code

3:49 harlowja lean-mean, nice

3:49 nijaba sandywalsh: certainly the point of this

3:49 jeffreyb sandywalsh: are you talking about for dequeuing of messages or sending or both?

3:50 sandywalsh we've gone through a million variations on this over the last few months. Oddly enough there are only a few combinations that work reliably at scale

3:50 anniec hi all, we have 10 min left for the meeting.. do people feel we have good understanding of direction we want to go to? or should we call a G+ Hangout for next meeting to explain more?

3:50 sandywalsh jeffreyb: I'm mostly concerned about consuming the events from rabbit and "doing something with them"

3:50 jeffreyb anniec: seriously?

3:50 sandywalsh haha

3:50 harlowja sandywalsh: what variations have u hit, just out of curiosity, MQ stuff? others?

3:50 timjr um, well, I'm going to try to come up with some concrete examples of the client code and config

3:51 anniec originally, Angus wanted the meeting to see who can do what ..

3:51 timjr so we can hash that out in the next meeting or whatever

3:51 jeffreyb sandywalsh: i feel it is a bad idea to put instrumentation events in rabbit

3:51 sandywalsh carrot, kombu, various amqp libraries under the hood. Frequent memory/locking issues

3:51 harlowja sandywalsh: thats just a different selection of where to write and where the agents read from, no?

3:51 anniec so just want to bring back to the original intent

3:51 harlowja sandywalsh: don't use amqp?

3:51 sandywalsh jeffreyb: instrumentation should *not* go in rabbit

3:51 anniec if there are still open questions that blocks us from moving forward, we should find a way to move forward

3:51 timjr I think we got consensus on pluggable transport

3:51 sandywalsh I'll work on a proposal wiki page if that would help?

3:51 asalkeld sure

3:51 timjr there is some dissent around explicit code vs. monkeypatching

3:52 sandywalsh you all can bend/spindle/mutilate as desired

3:52 timjr I will note that the two are not mutually exclusive

3:52 harlowja let it be though, explicit code vs monkeypatching, whatever the reviewers that accept the code prefer right?

3:52 asalkeld I think we are more in the agreement than not

3:52 sandywalsh timjr: that's my big bugaboo, but again, it depends on the purpose of it

3:52 harlowja ya for agreement, high five

3:52 timjr I don't think we would rule out monkeypatching

3:52 jeffreyb timjr: no, or decorators either

3:52 timjr it's a nice escape hatch for monitoring that people don't want to clean up and contribute back for whatever reason

3:52 sandywalsh harlowja: I wish it was that easy ... often code gets approved without the right consideration

3:52 anniec ok .. so tim is going to come up with example of client code and config for next discussion. who else is doing what?

3:53 harlowja sandywalsh: agreed, lol

3:53 anniec sorry .. i am a manager type .. at the end of meeting, needs to have an Action Item list

3:53 sandywalsh I would suggest you look at the StackTach worker. I just updated the library on Friday

3:53 jeffreyb anniec: banned

3:53 harlowja anniec: nice try, hjaha

3:53 sandywalsh and Stacky is up

3:53 timjr sandywalsh: I'll be sure to give stacktach a good read before the next meeting. I regret not having done so prior to this one

3:53 sandywalsh and I have a video of both that I'm waiting on approval to send the ML an install guide and howto

3:53 harlowja video, lol niceeee

3:54 harlowja hot stacktach action

3:54 timjr lol

3:54 sandywalsh yep ... funky screencast

3:54 asalkeld can we find a place to put library code?

3:54 harlowja ceilometer subdir ?

3:54 eglynn ceilo?

3:54 nijaba sure

3:54 sandywalsh stacktach/stacky is living in github/rackspace currently

3:55 asalkeld mmm

3:55 harlowja mmmm == good

3:55 asalkeld so how does say nova depend on this

3:55 harlowja that is the question

3:55 asalkeld wouldn't we need something like python-ceilometer

3:55 harlowja but maybe too early to decide that (when that happens rip out as library?)

3:55 timjr the logging client lib can probably be generic enough to be a pip for all to use. the calls into it will have to be added to nova's source code

3:56 dhellmann maybe we need a new repo managed by the ceilometer project but allowing us to package the library separately for consumption by other projects

3:56 timjr (along with the other components)

3:56 harlowja new stackforge 'metric' project?

3:56 sandywalsh well, I think that's the next bun-fight ... how to package all these pieces for menu-like selection of components

3:56 jeffreyb let's mock up some code for sharing. maybe location isn't quite so important just now?

3:56 nijaba dhellmann: +1

3:56 asalkeld dhellmann, sounds good

3:56 sandywalsh dhellmann: yep

3:56 harlowja jeffreyb: +1

3:56 jeffreyb dhellman: sounds good

3:57 eglynn sounds like agreement

3:57 asalkeld woot

3:57 dhellmann jeffreyb: indeed, if we create a github repo with the library we can always move it under openstack later

3:57 sandywalsh "I want the ceilometer worker + stacktach + the RAX billing module" how to get that ^

3:57 harlowja magic

3:57 sandywalsh heh

3:57 harlowja dhellmann: so a new stackforge repo?

3:57 timjr I think anvil could easily be configured to pull those components

3:57 jeffreyb who else is using stacktach so far or had a chance to explore it?

3:58 harlowja timjr: ya anvil woot

3:58 sandywalsh let's agree on the cut-points first before we start slicing/dicing the project I think

3:58 sandywalsh jeffreyb: we've been using it internally ... i only just made v2 public

3:58 dhellmann harlowja: we'll be moving ceilometer off of stackforge to "openstack" soon

3:59 harlowja kk, so a folder in there? but thats not a sep repo

3:59 sandywalsh jeffreyb: but I know there's a bunch of folks that were using the old v1

3:59 timjr ok, well, that was fun. my first openstack meeting, actually

3:59 timjr next time, I hope the bot's around

3:59 dhellmann harlowja: we said we would just create a new github repo for now. it's easy to move it later.

3:59 nijaba dhellmann: just learned that this would have to wait for a planned maintenance period: gerrit needs to be restarted

3:59 harlowja wfm restart the gerrits

3:59 harlowja dhellmann: github repo on stackforge or elsewhere?

3:59 sandywalsh someone want to copy-paste this session?

3:59 dhellmann nijaba: do you have a schedule for that?

4:00 nijaba dhellmann: not yet. "soon"

4:00 dhellmann nijaba: ok

4:00 dhellmann harlowja: on something that one of us can control without talking to the infra guys

4:00 harlowja k

4:00 harlowja i can make one in the yahoo org

4:01 sandywalsh someone want to copy-paste this session? So we can put on the wiki?

4:02 timjr nothing around to log it, eh?

4:02 anniec i can copy and paste and email it. i can't find the log today

4:02 dhellmann anniec: maybe put the file online somewhere and email the link?

4:02 jeffreyb attach to the wiki

4:02 dhellmann sandywalsh: the bot is broken

4:02 sandywalsh yep

4:02 anniec wait .. i don't have full log

4:03 sandywalsh sure is

4:03 anniec i only have it up to 3:24