Difference between revisions of "MagnetoDB/specs/requestmetrics"
Revision as of 19:53, 18 November 2014
Request Real Time Metrics
Real-time request metrics, including latency, request counts, etc.
Specification status
Draft
Problem Description
To address MagnetoDB operational issues proactively, an admin user needs real-time visibility into request metrics data on each API node, including:
- number of requests
- number of failures
- number of errors
- average latency
- median latency
- minimum latency
- maximum latency
- requests per second
- distribution of request latency for each type of REST API call, such as "50%","66%","75%","80%","90%","95%","98%","99%","100%"
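
As an illustration of the latency distribution listed above, the following is a minimal sketch that computes those percentiles from a batch of latency samples using the nearest-rank method. The function names and the choice of nearest-rank are assumptions made for this example; the spec does not prescribe how percentiles are calculated.

```python
def percentile(sorted_latencies, pct):
    """Nearest-rank percentile of an ascending list of latencies (ms).

    pct is an integer percentage in the range 1..100.
    """
    count = len(sorted_latencies)
    if count == 0:
        raise ValueError("no latency samples")
    # Integer ceiling of pct * count / 100, avoiding float rounding.
    rank = max(1, (pct * count + 99) // 100)
    return sorted_latencies[rank - 1]


def latency_distribution(latencies_ms,
                         pcts=(50, 66, 75, 80, 90, 95, 98, 99, 100)):
    """Map each requested percentile to a latency value (hypothetical helper)."""
    ordered = sorted(latencies_ms)
    return dict((p, percentile(ordered, p)) for p in pcts)
```

With this definition the 100% entry is always the maximum observed latency, and the 50% entry is the median under the nearest-rank convention.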
Proposed Change
Request metrics can be reported to StatsD per API call. Request metrics are either counters or timing data (in units of milliseconds).
StatsD can be utilized to expand timing data to min, max, avg, count, and 90th percentile per timing metric.
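
To make the two metric kinds concrete, here is a minimal sketch of how a counter and a timing value could be encoded in the StatsD plain-text format and sent over UDP using only the standard library. The metric names, the helper names, and the host are hypothetical; port 8125 is the usual StatsD default and is assumed here.

```python
import socket


def format_counter(name, value=1):
    """Encode a counter increment, e.g. 'api.requests:1|c'."""
    return "{0}:{1}|c".format(name, value)


def format_timer(name, millis):
    """Encode a timing sample in milliseconds, e.g. 'api.latency:12|ms'."""
    return "{0}:{1}|ms".format(name, millis)


def send_metric(payload, host="127.0.0.1", port=8125):
    """Send one metric as a single UDP datagram (fire and forget)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), (host, port))
    finally:
        sock.close()
```

For example, `send_metric(format_timer("magnetodb.api.put_item.latency", 12))` would report a 12 ms sample under a hypothetical metric name. A real deployment would more likely use an existing StatsD client library than hand-rolled helpers like these.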
Middleware seems a natural place to collect the request metrics data.
We propose to introduce a new middleware to all API node services, including api/streaming/task executor.
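
A rough sketch of what such a middleware might look like is given below. It is only an illustration under stated assumptions: the class name, the metric names, and the `emit` callback (which stands in for whatever StatsD client is chosen) are hypothetical and not part of the proposal.

```python
import time


class RequestMetricsMiddleware(object):
    """WSGI middleware that reports a counter and a timing per request.

    `emit` is a hypothetical callable taking (name, value, kind), where
    kind is "c" for counters and "ms" for timing data in milliseconds.
    """

    def __init__(self, app, emit):
        self.app = app
        self.emit = emit

    def __call__(self, environ, start_response):
        started = time.time()
        try:
            # Times only the call into the wrapped app; streaming of the
            # response body by the server is not measured in this sketch.
            return self.app(environ, start_response)
        except Exception:
            self.emit("request.error", 1, "c")
            raise
        finally:
            elapsed_ms = (time.time() - started) * 1000.0
            self.emit("request.count", 1, "c")
            self.emit("request.time", elapsed_ms, "ms")
```

Wrapping each service's WSGI application in the same class would keep metric collection in one place instead of scattering it through the API handlers.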
Initially we will focus on request measurements at the API endpoint, WSGI processing delay, and Cassandra request metrics.
Each API endpoint will have the following request metrics:
- request receive time
- request receive error
- request receive timeout
- request receive bytes
- response send time
- response send error
- response send timeout
- response send bytes
WSGI will have the following metrics:
- backlog
- waits
- request processing time
- response processing time
- dispatch time
Cassandra metrics are supported by the Cassandra Python driver and can be enabled. The driver exposes its metrics data through Scales.
Alternatives
Scales (bundled with Cassandra Python Driver) can be used instead of StatsD.
Security Impact
- Does this change touch sensitive data such as tokens, keys, or user data?
- Does this change alter the API in a way that may impact security, such as a new way to access sensitive information or a new way to login?
- Does this change involve cryptography or hashing?
- Does this change require the use of sudo or any elevated privileges?
- Does this change involve using or parsing user-provided data? This could be directly at the API level or indirectly such as changes to a cache layer.
- Can this change enable a resource exhaustion attack, such as allowing a single API interaction to consume significant server resources? Some examples of this include launching subprocesses for each connection, or entity expansion attacks in XML.
Notifications Impact
Other End User Impact
Performance Impact
Performance impact should be minimal if statsd is used, since metrics are sent to statsd over UDP.
Other Deployer Impact
A dependency on statsd will be introduced.
Developer Impact
Implementation
Assignee(s)
Charles Wang
Work Items
Dependencies
- statsd