Difference between revisions of "ElasticRecheck"

Revision as of 17:58, 12 December 2013

Information and FAQs on elastic-recheck: how to use it and how to contribute to it.

When you hit a failure and there is no elastic-recheck (e-r) query comment on your patch, but you do find a bug to recheck against, you should look at writing an e-r query for it so you don't have to dig next time. Lots of people check the http://status.openstack.org/rechecks/ page, but not all of the bugs listed there have e-r queries.

So what's the thought process for writing an e-r query (best practices)?

First, either identify or open the bug to recheck against; that's standard operating procedure.

Second, check the logs for the failed job and look for something that uniquely identifies the failure for that bug.

Avoid general error messages from Tempest in console.html since those aren't always unique.
Look for errors/warnings in the various log files, e.g. logs/screen-n-cpu.txt and pull information from them.
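
As a rough aid for this step, the sketch below filters a log for lines that carry an error-like level. It assumes the service logs mark severity with tokens such as ERROR, WARNING or TRACE; the helper name `candidate_lines` and the sample lines are invented for illustration and are not part of elastic-recheck.

```python
import re

# Assumption: severity appears in each line as one of these tokens.
LEVEL_PATTERN = re.compile(r"\b(ERROR|WARNING|TRACE)\b")


def candidate_lines(log_lines):
    """Return (level, line) pairs for lines that carry an error-like level."""
    hits = []
    for line in log_lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            hits.append((match.group(1), line.strip()))
    return hits


# Made-up sample lines, only to show the shape of the input.
sample = [
    "2013-12-12 17:58:01.123 INFO example.module [-] routine status update",
    "2013-12-12 17:58:02.456 ERROR example.module [-] example failure text",
    "2013-12-12 17:58:03.789 WARNING example.module [-] example warning text",
]

for level, line in candidate_lines(sample):
    print(level, line)
```

The lines this returns are only candidates; you still need to pick the one that is specific to the bug rather than a generic symptom.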

Test your query out in http://logstash.openstack.org:

Typically start with a simple message and filename query over the last 7 days.
Query is structured like this: message:"<your unique fail here>" AND filename:"<the log that the failure message appears in relative to the root of the job logs>"
- For example: message:"because vif doesn't exist" AND filename:"logs/screen-n-net.txt"
If you have hits, make sure there are no false positives by checking 'build_status' on the left side of the logstash page, which shows the success/failure rate for the builds that the query hits. A good e-r query needs a 100% failure rate: it should never match a build that succeeded.
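
The query structure and the build_status check above can be sketched in a few lines of Python. This is only an illustration: `build_query` and `failure_rate` are hypothetical helpers, not part of elastic-recheck, and the sketch assumes that build_status takes the values "FAILURE" and "SUCCESS" and that you copy the per-status hit counts from the logstash page by hand.

```python
def build_query(message, filename):
    """Build a message/filename query string in the form described above."""
    def quote(value):
        # Escape backslashes and double quotes so the phrase stays one quoted term.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'

    return "message:" + quote(message) + " AND filename:" + quote(filename)


def failure_rate(build_status_counts):
    """Fraction of hits from failed builds, or None if there are no hits.

    Assumption: build_status_counts maps status names ("FAILURE", "SUCCESS")
    to hit counts read off the logstash page.
    """
    total = sum(build_status_counts.values())
    if total == 0:
        return None
    return build_status_counts.get("FAILURE", 0) / total


query = build_query("because vif doesn't exist", "logs/screen-n-net.txt")
print(query)

# Hypothetical counts: any SUCCESS hit means the query is not unique enough.
counts = {"FAILURE": 3, "SUCCESS": 1}
print(failure_rate(counts) == 1.0)
```

If the rate is below 1.0, tighten the message or add another condition and test again before proposing the query.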

TODO steps for writing the e-r query and pushing it up

TODO steps for what to do when a bug is resolved and we can archive the query with the 'resolved_at' field.
