Difference between revisions of "ElasticRecheck"

Revision as of 18:12, 12 December 2013

Information dump and FAQs on elastic-recheck: how to use it and how to contribute to it.

When you hit a failure and there is no e-r query comment in your patch, but you do find a bug to recheck against, you should look at writing an e-r query for it so you don't have to dig next time. Lots of people check the http://status.openstack.org/rechecks/ page but not all of those bugs have e-r queries.

So what's the thought process for writing an e-r query (best practices)?

First, either identify or open the bug to recheck against; that's standard operating procedure.
- See here for more info: https://wiki.openstack.org/wiki/GerritJenkinsGit#Test_Failures
Second, check the logs for the failure, looking for something that uniquely identifies the bug.
- Avoid general error messages from Tempest in console.html since those aren't always unique.
- Look for errors/warnings in the various log files, e.g. logs/screen-n-cpu.txt, and pull information from them.
1. Test your query out in http://logstash.openstack.org:
  - Typically start with a simple message and filename query over the last 7 days.
  - The query is structured like this: message:"<your unique fail here>" AND filename:"<the log that the failure message appears in, relative to the root of the job logs>"
    - For example: message:"because vif doesn't exist" AND filename:"logs/screen-n-net.txt"
  - If you have hits, make sure there are no false positives by checking 'build_status' on the left side of the logstash page. That shows the success/failure rate for the builds that the query hits. A good e-r query needs a 100% failure rate, i.e. it must not match any successful build.
TODO steps for writing the e-r query and pushing it up
TODO steps for what to do when a bug is resolved and we can archive the query with the 'resolved_at' field.
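
As a rough illustration of the query structure and the 100% failure rate check described above, here is a minimal Python sketch. It is not part of elastic-recheck: the helper names (quote_term, build_query, failure_rate, is_good_query) are hypothetical, and the "FAILURE" / "SUCCESS" keys are an assumption about how the 'build_status' values appear in logstash. The counts are ones you would read off the logstash page by hand.

```python
def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(message: str, filename: str) -> str:
    """Build a message + filename query in the form shown in the steps above."""
    return f"message:{quote_term(message)} AND filename:{quote_term(filename)}"


def failure_rate(build_status_counts: dict) -> float:
    """Fraction of hits that belong to failed builds.

    build_status_counts maps a build_status value to a hit count,
    e.g. {"FAILURE": 12, "SUCCESS": 0} (key names are an assumption).
    """
    total = sum(build_status_counts.values())
    if total == 0:
        raise ValueError("query has no hits")
    return build_status_counts.get("FAILURE", 0) / total


def is_good_query(build_status_counts: dict) -> bool:
    """A query is only usable if every build it hits is a failure."""
    return failure_rate(build_status_counts) == 1.0


if __name__ == "__main__":
    print(build_query("because vif doesn't exist", "logs/screen-n-net.txt"))
    print(is_good_query({"FAILURE": 12, "SUCCESS": 0}))
    print(is_good_query({"FAILURE": 9, "SUCCESS": 3}))
```

The first call reproduces the example query from the steps above. The second counts dictionary mixes in successful builds, so that query would be too broad and needs a more specific message.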

@@ Line 5: / Line 5: @@
 So what's the thought process for writing an e-r query (best practices)?
-First either identify or open the bug to recheck against, that's standard operating procedure.
+# First either identify or open the bug to recheck against, that's standard operating procedure.
+#* See here for more info: https://wiki.openstack.org/wiki/GerritJenkinsGit#Test_Failures
-Second, check the logs for the failure looking for something that uniquely identifies the failure for the bug.
+# Second, check the logs for the failure looking for something that uniquely identifies the failure for the bug.
-* Avoid general error messages from Tempest in console.html since those aren't always unique.
+#* Avoid general error messages from Tempest in console.html since those aren't always unique.
-* Look for errors/warnings in the various log files, e.g. logs/screen-n-cpu.txt and pull information from them.
+#* Look for errors/warnings in the various log files, e.g. logs/screen-n-cpu.txt and pull information from them.
+## Test your query out in http://logstash.openstack.org:
-Test your query out in http://logstash.openstack.org:
+##* Typically start with a simple message and filename query over the last 7 days.
-* Typically start with a simple message and filename query over the last 7 days.
+##* Query is structured like this: message:"<your unique fail here>" AND filename:"<the log that the failure message appears in relative to the root of the job logs>"
-* Query is structured like this: message:"<your unique fail here>" AND filename:"<the log that the failure message appears in relative to the root of the job logs>"
+##** For example: message:"because vif doesn't exist" AND filename:"logs/screen-n-net.txt"
-** For example: message:"because vif doesn't exist" AND filename:"logs/screen-n-net.txt"
+##* If you have hits, make sure there are no false negatives by checking 'build_status' on the left side of the logstash page - that will show you the success/failure rate for the builds that the query hits.  You need a 100% failure rate for a good e-r query.
-* If you have hits, make sure there are no false negatives by checking 'build_status' on the left side of the logstash page - that will show you the success/failure rate for the builds that the query hits.  You need a 100% failure rate for a good e-r query.
+# TODO steps for writing the e-r query and pushing it up
+# TODO steps for what to do when a bug is resolved and we can archive the query with the 'resolved_at' field.
-TODO steps for writing the e-r query and pushing it up
-TODO steps for what to do when a bug is resolved and we can archive the query with the 'resolved_at' field.