Difference between revisions of "MagnetoDB/specs/requestmetrics"

Revision as of 19:53, 18 November 2014

Request Real Time Metrics

Real-time request metrics, such as latency and request counts.

Specification status

Draft

Problem Description

To proactively address MagnetoDB operational issues, an admin user needs real-time visibility into request metrics on each API node, including:

  • number of requests
  • number of failures
  • number of errors
  • average latency
  • median latency
  • minimum latency
  • maximum latency
  • requests per second
  • distribution of request latency for each type of REST API call, at percentiles such as 50%, 66%, 75%, 80%, 90%, 95%, 98%, 99%, and 100%
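
To make the list above concrete, here is a minimal sketch of how these figures could be derived from a window of raw latency samples. The function names (percentile, summarize), the nearest-rank percentile method, and the fixed-length window are assumptions made for illustration; the spec does not prescribe how the values are computed.

```python
import math
import statistics


def percentile(sorted_values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms, window_seconds,
              percentiles=(50, 66, 75, 80, 90, 95, 98, 99, 100)):
    """Summarize latency samples (milliseconds) collected over one window."""
    ordered = sorted(latencies_ms)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "avg": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "requests_per_second": len(ordered) / window_seconds,
        "distribution": {p: percentile(ordered, p) for p in percentiles},
    }
```

For example, summarize([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], window_seconds=5) reports 2 requests per second and a 90% latency of 90 ms. Failure and error counts are plain counters and need no such post-processing.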

Proposed Change

Request metrics can be reported to StatsD per API call. Request metrics are either counters or timing data (in units of milliseconds).

StatsD can expand each timing metric into min, max, avg, count, and 90th percentile values.
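
As an illustration of the two metric kinds, the sketch below formats a counter and a timer in the StatsD line protocol ("name:value|c" and "name:value|ms") and sends them as UDP datagrams. The helper names, the UdpStatsdSender class, and the sample metric names are hypothetical; the host and port shown are the usual StatsD defaults and would come from configuration in practice.

```python
import socket


def counter_line(name, value=1):
    """Format a StatsD counter increment, e.g. 'api.requests:1|c'."""
    return "%s:%d|c" % (name, value)


def timing_line(name, millis):
    """Format a StatsD timer sample in whole milliseconds."""
    return "%s:%d|ms" % (name, round(millis))


class UdpStatsdSender(object):
    """Best-effort sender: one metric line per UDP datagram."""

    def __init__(self, host="127.0.0.1", port=8125):
        self._addr = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def send(self, line):
        try:
            self._sock.sendto(line.encode("ascii"), self._addr)
        except OSError:
            # Metrics must never break request handling; drop on failure.
            pass
```

A caller would emit, for instance, sender.send(counter_line("magnetodb.api.put_item.requests")) followed by sender.send(timing_line("magnetodb.api.put_item.time", 12.6)); the StatsD server then aggregates the timer samples per flush interval.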

Middleware seems a natural place to collect the request metrics data.

We propose to introduce a new middleware to all API node services, including the API, streaming, and task executor services.
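
A rough sketch of what such a middleware could look like follows. It is not the proposed implementation: the class name RequestMetricsMiddleware, the emit callable, the metric prefix, and the choice to key metrics by HTTP method are all assumptions for illustration. The emit callable receives StatsD-formatted lines, so it can be a UDP sender in production or a list's append method in tests.

```python
import time


class RequestMetricsMiddleware(object):
    """WSGI middleware that emits a request counter, a timer, and an
    error counter per call. All names here are illustrative."""

    def __init__(self, app, emit, prefix="magnetodb.api"):
        self.app = app
        self.emit = emit      # callable taking one StatsD line
        self.prefix = prefix  # hypothetical metric namespace

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "UNKNOWN").lower()
        name = "%s.%s" % (self.prefix, method)
        seen = {}

        def recording_start_response(status, headers, exc_info=None):
            # Remember the numeric status code for error counting.
            seen["code"] = int(status.split(" ", 1)[0])
            return start_response(status, headers, exc_info)

        started = time.monotonic()
        try:
            body = self.app(environ, recording_start_response)
        except Exception:
            self.emit("%s.error:1|c" % name)
            raise
        finally:
            # Measures time until the app returns its iterable, not the
            # time spent streaming the response body to the client.
            elapsed_ms = (time.monotonic() - started) * 1000.0
            self.emit("%s.requests:1|c" % name)
            self.emit("%s.time:%d|ms" % (name, round(elapsed_ms)))
        if seen.get("code", 200) >= 500:
            self.emit("%s.error:1|c" % name)
        return body
```

Wrapping is a single call, for example application = RequestMetricsMiddleware(application, sender.send), where sender is whatever object delivers lines to StatsD. Keeping the transport behind a plain callable keeps the middleware testable without a running StatsD server.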

Initially we will focus on request measurements at the API endpoint, WSGI processing delay, and Cassandra request metrics.

Each API endpoint will have the following request metrics:

  • request receive time
  • request receive error
  • request receive timeout
  • request receive bytes
  • response send time
  • response send error
  • response send timeout
  • response send bytes


WSGI will have the following metrics:

  • backlog
  • waits
  • request processing time
  • response processing time
  • dispatch time

Cassandra metrics are supported by the Cassandra Python driver and can be enabled, but the driver exposes the metrics data through Scales.

Alternatives

Scales (bundled with Cassandra Python Driver) can be used instead of StatsD.

Security Impact

  • Does this change touch sensitive data such as tokens, keys, or user data?
  • Does this change alter the API in a way that may impact security, such as a new way to access sensitive information or a new way to login?
  • Does this change involve cryptography or hashing?
  • Does this change require the use of sudo or any elevated privileges?
  • Does this change involve using or parsing user-provided data? This could be directly at the API level or indirectly such as changes to a cache layer.
  • Can this change enable a resource exhaustion attack, such as allowing a single API interaction to consume significant server resources? Some examples of this include launching subprocesses for each connection, or entity expansion attacks in XML.

Notifications Impact

Other End User Impact

Performance Impact

The performance impact should be minimal if statsd is used, since metrics are sent to statsd over UDP.

Other Deployer Impact

A dependency on statsd will be introduced.

Developer Impact

Implementation

Assignee(s)

Charles Wang

Work Items

Dependencies

  • statsd

Documentation Impact

References