NeutronGateFailureTriage

Revision as of 15:42, 29 May 2014

This page provides guidelines for spotting and assessing Neutron gate failures, along with some hints for triaging them.

Spotting gate failures

Several tools can be used for this.

  1. Joe Gordon's github.io pages (http://jogo.github.io/gate/)
  Even though it is not an "official" OpenStack page, it provides a quick snapshot of the current status of the most important jobs.
  The page is built from data available at graphite.openstack.org. To see how this is done, go to https://github.com/jogo/jogo.github.io/tree/master/gate
  (Caveat: the color used for the neutron job is very similar to the one used for the full job with nova-network.)
  2. logstash.openstack.org
  This query returns the failures of a specific job in the gate queue: build_status:FAILURE AND message:Finished AND build_name:"check-tempest-dsvm-neutron" AND build_queue:"gate"
  Dividing that count by the total number of runs of the job, returned by message:Finished AND build_name:"check-tempest-dsvm-neutron" AND build_queue:"gate", gives the job's failure rate over the selected period.
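The failure-rate calculation from the logstash queries can be sketched as follows. This is a minimal sketch, assuming you have already read the two hit counts off logstash by hand for the same time window; the script does not talk to logstash itself. The helper names `build_queries` and `failure_rate` are hypothetical and are not part of any OpenStack tooling.

```python
from __future__ import annotations


def build_queries(job_name: str, queue: str = "gate") -> tuple[str, str]:
    """Return (failures_query, total_query) for a job in a given queue.

    The strings mirror the logstash queries described on this page.
    """
    total = f'message:Finished AND build_name:"{job_name}" AND build_queue:"{queue}"'
    # The failures query is the total query restricted to failed builds.
    failures = f"build_status:FAILURE AND {total}"
    return failures, total


def failure_rate(failures: int, total: int) -> float:
    """Failure rate as a fraction (0.0 to 1.0) over the selected period."""
    if total <= 0:
        raise ValueError("total must be a positive number of finished runs")
    if not 0 <= failures <= total:
        raise ValueError("failures must be between 0 and total")
    return failures / total


if __name__ == "__main__":
    failures_q, total_q = build_queries("check-tempest-dsvm-neutron")
    print(failures_q)
    print(total_q)
    # Hypothetical hit counts read from logstash for the last 24 hours.
    print(f"{failure_rate(12, 80):.1%}")  # 15.0%
```

Keeping the two queries in one helper guarantees that the numerator and denominator filter on exactly the same job and queue, which is easy to get wrong when editing queries by hand.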

Note that failures in the check queue can be misleading, since their cause is usually the patch being checked. However, these failures are a valuable resource for assessing how often a failure occurs and for determining the root cause of failures that also show up in the gate queue.

The steps above give a quick overview of where things stand. When a job's failure rate over 24 hours rises above 10%, it's time to be on alert. 25% is amber alert. 33% is red alert. Anything above 50% means that somebody from the infra team has probably already put a contract out on you.
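The alert thresholds above can be expressed as a small lookup, for example to annotate a failure rate computed from logstash counts. This is a sketch under stated assumptions: the function name `alert_level` and the level names are hypothetical, and since the page does not say whether 25% and 33% are inclusive, the sketch treats them as inclusive while keeping 10% and 50% strict ("above"), following the wording.

```python
def alert_level(rate: float) -> str:
    """Map a job's 24-hour failure rate (0.0 to 1.0) to an alert level.

    Assumption: 25% and 33% are inclusive thresholds; 10% and 50% are
    strict, matching the "above" wording on this page.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    if rate > 0.50:
        return "critical"  # the infra team has most likely noticed already
    if rate >= 0.33:
        return "red"
    if rate >= 0.25:
        return "amber"
    if rate > 0.10:
        return "alert"
    return "ok"


if __name__ == "__main__":
    # Hypothetical counts: 12 failures out of 80 finished runs in 24 hours.
    print(alert_level(12 / 80))  # alert
```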