
MagnetoDB/specs/requestmetrics


Revision as of 15:59, 10 December 2014

[Outdated] Status

/!\ The spec is outdated and has been replaced by https://review.openstack.org/#/c/138950/

Request Real Time Metrics

Real-time request metrics, including latency, request counts, and related measurements.

Specification status


Problem Description

To proactively address MagnetoDB operational issues, the admin user needs real-time visibility into request metrics on each API node, including:

  • number of requests
  • number of failures
  • number of errors
  • average latency
  • median latency
  • minimum latency
  • maximum latency
  • requests per second
  • distribution of request latency for each type of REST API call, reported at percentiles such as 50%, 66%, 75%, 80%, 90%, 95%, 98%, 99%, and 100%
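
As an illustration of the figures listed above, the following minimal sketch computes them from a list of latency samples in milliseconds. The function names and the nearest-rank percentile definition are assumptions made for this example; the spec does not state how percentiles should be calculated.

```python
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("no samples")
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, (pct * len(sorted_values) + 99) // 100)
    return sorted_values[rank - 1]


def summarize_latencies(latencies_ms,
                        percentiles=(50, 66, 75, 80, 90, 95, 98, 99, 100)):
    """Return count, min, max, average, median and percentile latencies."""
    ordered = sorted(latencies_ms)
    summary = {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
    }
    for pct in percentiles:
        summary["p%d" % pct] = percentile(ordered, pct)
    return summary
```

Requests per second would additionally need the length of the observation window, which this sketch does not track.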

Proposed Change

Request metrics can be reported to StatsD using a sample rate for API calls. Request metrics are either counters or timing data (in units of milliseconds).

StatsD can be utilized to expand timing data to min, max, avg, count, and 90th percentile per timing metric.
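
For reference, a minimal sketch of how counters and timings could be emitted in the StatsD plain-text line format with a sample rate. The metric names, helper functions, and default host and port are assumptions for illustration only; a real deployment would more likely use an existing StatsD client library.

```python
import random
import socket


def format_counter(name, value=1, sample_rate=1.0):
    """Build a StatsD counter line, e.g. 'name:1|c|@0.1'."""
    line = "%s:%d|c" % (name, value)
    if sample_rate < 1.0:
        line += "|@%g" % sample_rate
    return line


def format_timer(name, millis, sample_rate=1.0):
    """Build a StatsD timing line in milliseconds, e.g. 'name:320|ms'."""
    line = "%s:%d|ms" % (name, millis)
    if sample_rate < 1.0:
        line += "|@%g" % sample_rate
    return line


def send_metric(line, sample_rate=1.0, host="127.0.0.1", port=8125,
                rng=random.random):
    """Send one metric line over UDP, honouring the sample rate.

    Returns True if the datagram was sent and False if it was sampled out.
    host and port are assumed defaults, not values taken from the spec.
    """
    if sample_rate < 1.0 and rng() >= sample_rate:
        return False  # skipped: only a fraction of calls is reported
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()
    return True
```

The sample rate is included in the line so that the StatsD server can scale sampled counters back up to an estimate of the true count.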

Middleware seems a natural place to collect request metrics. However, if the admin user needs visibility into requests to each API endpoint, we will need to capture metrics before and after each call to that endpoint.

We propose to introduce a new middleware to all API node services, including REST API and streaming.
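
A rough sketch of what such a middleware might look like as a plain WSGI wrapper. The class name, the metric names, and the `report` callable are hypothetical. Mapping uncaught exceptions to "errors" and 5xx responses to "failures" is an assumption of this example, since the spec does not define the two terms.

```python
import time


class RequestMetricsMiddleware(object):
    """WSGI middleware that reports a request count and latency per call.

    report is any callable taking (metric_name, metric_type, value),
    for example a thin wrapper around a StatsD client.
    """

    def __init__(self, app, report):
        self.app = app
        self.report = report

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "UNKNOWN").lower()
        prefix = "api.%s" % method  # hypothetical naming scheme
        captured = {}

        def recording_start_response(status, headers, exc_info=None):
            # Remember the status line so it can be classified afterwards.
            captured["status"] = status
            return start_response(status, headers, exc_info)

        started = time.monotonic()
        try:
            result = self.app(environ, recording_start_response)
        except Exception:
            self.report(prefix + ".errors", "c", 1)
            raise
        finally:
            # Covers only the time until the app returns its iterable,
            # not the time spent streaming the response body.
            elapsed_ms = (time.monotonic() - started) * 1000.0
            self.report(prefix + ".requests", "c", 1)
            self.report(prefix + ".latency", "ms", elapsed_ms)
        if captured.get("status", "").startswith("5"):
            self.report(prefix + ".failures", "c", 1)
        return result
```

Per-endpoint metrics would need the prefix to include the resolved route or action, which depends on how the API service dispatches requests.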

Initially we will focus on node-wide request metrics, per-endpoint metrics, and WSGI processing delays on each API node. Later, we will expand coverage to Cassandra request metrics.

Each REST API call made to MagnetoDB will have the following request metrics:

  • request receive time
  • request receive error
  • request receive timeout
  • request receive bytes
  • response send time
  • response send error
  • response send timeout
  • response send bytes

These metrics can be further broken down by API endpoint.

WSGI will have the following metrics:

  • backlog
  • waits
  • request processing time
  • response processing time
  • dispatch time

Cassandra metrics are supported by the Cassandra Python driver and can be enabled there. Note, however, that the driver exposes its metrics data through Scales.


Scales (bundled with Cassandra Python Driver) can be used instead of StatsD.

Security Impact

  • Does this change touch sensitive data such as tokens, keys, or user data?
  • Does this change alter the API in a way that may impact security, such as a new way to access sensitive information or a new way to login?
  • Does this change involve cryptography or hashing?
  • Does this change require the use of sudo or any elevated privileges?
  • Does this change involve using or parsing user-provided data? This could be directly at the API level or indirectly such as changes to a cache layer.
  • Can this change enable a resource exhaustion attack, such as allowing a single API interaction to consume significant server resources? Some examples of this include launching subprocesses for each connection, or entity expansion attacks in XML.

Notifications Impact

Other End User Impact

Performance Impact

Performance impact should be minimal if statsd is used, since metrics are sent to statsd over UDP.

Other Deployer Impact

A dependency on statsd will be introduced.

Developer Impact



Charles Wang

Work Items
  • statsd

Documentation Impact