ElasticRecheck

How To Help With Gate Failures
TODO(mriedem): Integrate this wiki content into the elastic-recheck readme: https://github.com/openstack-infra/elastic-recheck/blob/master/README.rst

This page collects information and FAQs on elastic-recheck: how to use it and how to contribute to it.

When you hit a failure and there is no elastic-recheck (e-r) query comment on your patch, but you do find a bug to recheck against, you should look at writing an e-r query for it so you don't have to dig next time. Lots of people check the http://status.openstack.org/rechecks/ page, but not all of those bugs have e-r queries.

So what's the thought process for writing an e-r query (best practices)?


 1) First, either identify or open the bug to recheck against; that's standard operating procedure.
    * See here for more info: https://wiki.openstack.org/wiki/GerritJenkinsGit#Test_Failures
 2) Second, check the logs for the failure, looking for something that uniquely identifies the failure for the bug.
    * Avoid general error messages from Tempest in console.html, since those aren't always unique.
    * Look for errors/warnings in the various log files, e.g. logs/screen-n-cpu.txt, and pull information from them.
 3) Test your query out in http://logstash.openstack.org:
    * Typically start with a simple query on the message and filename tags over the last 7 days.
    * The query is structured like this: message:" " AND tags:"", where message holds the text to match and tags holds the log file name.
      * For example: message:"because vif doesn't exist" AND tags:"screen-n-net.txt"
    * If you have hits, make sure there are no false positives by checking 'build_status' on the left side of the logstash page; it shows the success/failure rate for the builds that the query hits. You need a 100% failure rate for a good e-r query.
 4) Query limitations:
    * We currently only index INFO and above level messages, so we can't write queries against DEBUG level messages.
    * elastic-recheck doesn't currently have multi-line support, i.e. taking two separate error messages and putting them into the same query. See https://review.openstack.org/#/c/60508/ for an example of where this is needed.
 5) Writing the e-r query and pushing it up:
    * This is pretty easy: create a new query yaml file under elastic-recheck/queries and push it up for review. Here is an example: https://review.openstack.org/#/c/61826/
    * Tip: use the Related-Bug: #xxxxxxx line in the commit message so the change is automatically linked back into the bug report for people monitoring gate failure bugs.
 6) What to do when a bug is resolved:
    * When a tracked bug is marked as fixed and it has dropped off the http://status.openstack.org/elastic-recheck/ page (for TBD # of days?), push a change to archive the query for that bug.
    * "Archiving" the query for a fixed bug is pretty easy: add the 'resolved_at' field to the query yaml file. Example: https://review.openstack.org/#/c/61186/
 7) TODO: document when to mark a bug as critical so that it shows up in the weekly release status meetings for the PTLs.
    * Basically, if it's not in elastic-recheck then it's not critical.
    * See the ML thread on this subject: http://lists.openstack.org/pipermail/openstack-dev/2013-November/020048.html
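
The query-writing steps above can be sketched in Python. This is only a rough illustration, not code from elastic-recheck: the helper names (build_query, failure_rate, is_good_query, render_query_file) are hypothetical, the build_status values FAILURE and SUCCESS are assumed, and the yaml layout (a single 'query' key) is an assumption, so compare against an existing file under elastic-recheck/queries before pushing anything up.

```python
from collections import Counter


def build_query(message, log_file):
    """Return a query of the form message:"..." AND tags:"..."."""
    # Escape backslashes and double quotes so the message stays one quoted phrase.
    escaped = message.replace("\\", "\\\\").replace('"', '\\"')
    return 'message:"{}" AND tags:"{}"'.format(escaped, log_file)


def failure_rate(build_statuses):
    """Return the fraction of hits whose build_status is FAILURE (assumed value)."""
    counts = Counter(build_statuses)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return counts["FAILURE"] / total


def is_good_query(build_statuses):
    """A query is only useful if it has hits and every hit is a failed build."""
    return len(build_statuses) > 0 and failure_rate(build_statuses) == 1.0


def render_query_file(query):
    """Render the text of a query yaml file (the layout here is an assumption)."""
    return "query: >\n  {}\n".format(query)


if __name__ == "__main__":
    query = build_query("because vif doesn't exist", "screen-n-net.txt")
    print(query)
    # Hypothetical build_status values copied by hand from the logstash page.
    print(is_good_query(["FAILURE", "FAILURE"]))
    print(is_good_query(["FAILURE", "SUCCESS"]))
    print(render_query_file(query))
```

The build_status list is typed in by hand here; the sketch does not talk to logstash. A single SUCCESS among the hits is enough to reject the query, which matches the 100% failure rate requirement.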