
Difference between revisions of "Large Scale SIG/Monitor"

(Replaced content with "Please update your links! The Scaling Journey documentation has now moved to: === https://docs.openstack.org/large-scale/journey/ === You can propose changes to the cont...")
 
(10 intermediate revisions by 2 users not shown)
The second stage in the [[Large_Scale_SIG#The_Scaling_Journey|Scaling Journey]] is '''Monitor'''.
+
Please update your links! The Scaling Journey documentation has now moved to:
  
Once you have properly [[Large_Scale_SIG/Configure|configured]] your cluster to handle scale, you need to monitor it for signs of load stress. Monitoring in OpenStack can be overwhelming, and it is sometimes hard to determine how to monitor your deployment meaningfully so that you get advance warning when load is too high. This page aims to help answer those questions.
+
=== https://docs.openstack.org/large-scale/journey/ ===
  
Once meaningful monitoring is in place, you are ready to proceed to the third stage of the Scaling Journey: [[Large_Scale_SIG/ScaleUp|Scale Up]].
+
You can propose changes to the content through the [https://opendev.org/openstack/large-scale openstack/large-scale] git repository.
 
 
== FAQ ==
 
 
 
'''Q: How can I detect that RabbitMQ is a bottleneck?'''
 
 
 
A: oslo.metrics, currently under development, will introduce monitoring for RPC calls.
 
 
 
'''Q: How can I detect that the database is a bottleneck?'''
 
 
 
A: oslo.metrics will also integrate with oslo.db, as the next step after oslo.messaging.
 
 
 
'''Q: How can I track latency issues?'''
 
 
 
A:
 
 
 
'''Q: How can I track traffic issues?'''
 
 
 
A:
 
 
 
'''Q: How do I track error rates?'''
 
 
 
A:
 
 
 
'''Q: How do I track saturation issues?'''
 
 
 
A:
 
 
 
== Resources ==
 
* oslo.metrics [https://opendev.org/openstack/oslo.metrics/ code] and [https://docs.openstack.org/oslo.metrics/latest/ documentation]
 
* Learn about golden signals (latency, traffic, errors, saturation) in the [https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals Google SRE book]
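
As a rough illustration of the four golden signals named above (latency, traffic, errors, saturation), the Python sketch below summarises one observation window of requests. It is not part of oslo.metrics: the <code>Sample</code> record, the <code>golden_signals</code> function, and the <code>in_flight</code> / <code>capacity</code> inputs are hypothetical names chosen for this example, and the choice of a 95th-percentile latency is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observed request (hypothetical record shape)."""
    duration_s: float  # how long the request took, in seconds
    ok: bool           # False if the request ended in an error


def golden_signals(samples, window_s, in_flight, capacity):
    """Summarise one observation window as the four golden signals.

    samples   -- list of Sample collected during the window
    window_s  -- length of the window in seconds
    in_flight -- requests being processed right now (assumed input)
    capacity  -- maximum concurrent requests the service can take (assumed input)
    """
    if not samples:
        raise ValueError("need at least one sample")
    if window_s <= 0 or capacity <= 0:
        raise ValueError("window_s and capacity must be positive")

    n = len(samples)
    durations = sorted(s.duration_s for s in samples)
    # Nearest-rank 95th percentile, using integer arithmetic for ceil(0.95 * n).
    rank = (95 * n + 99) // 100
    failures = sum(1 for s in samples if not s.ok)

    return {
        "latency_p95_s": durations[rank - 1],  # latency
        "traffic_rps": n / window_s,           # traffic (requests per second)
        "error_rate": failures / n,            # errors (fraction of failed requests)
        "saturation": in_flight / capacity,    # saturation (fraction of capacity in use)
    }
```

In practice the samples would come from whatever instrumentation the deployment already has; the point of the sketch is only that each signal reduces to a simple ratio or percentile that can be alerted on.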
 
 
 
 
 
== Other SIG work on that stage ==
 
* Measurement of MQ behavior through oslo.metrics
 
** Approved spec for oslo.metrics: https://review.opendev.org/#/c/704733/
 
** Code up at https://opendev.org/openstack/oslo.metrics/
 
** 0.1.0 initial release done
 
** Get to a 1.0 release
 
*** oslo-messaging metrics code https://review.opendev.org/#/c/761848/ (genekuo)
 
*** Enable bandit (issue to fix with the predictable path for the metrics socket?)
 
*** Improve tests to get closer to 100% coverage
 

Latest revision as of 09:44, 1 September 2022

Please update your links! The Scaling Journey documentation has now moved to:

https://docs.openstack.org/large-scale/journey/

You can propose changes to the content through the openstack/large-scale git repository.