ElasticRecheck

Revision as of 16:00, 16 September 2014 by Mikhail S Medvedev (talk | contribs) (How To Help With Gate Failures)

How To Help With Gate Failures

TODO(mriedem): Integrate this wiki content into the elastic-recheck readme: https://github.com/openstack-infra/elastic-recheck/blob/master/README.rst

This page collects information and FAQs about elastic-recheck (e-r): how to use it and how to contribute to it.

When you hit a failure and there is no e-r query comment in your patch, but you do find a bug to recheck against, you should look at writing an e-r query for it so you don't have to dig next time. Lots of people check the http://status.openstack.org/rechecks/ page but not all of those bugs have e-r queries.

So what's the thought process for writing an e-r query (best practices)?

  1. First, either identify or open the bug to recheck against; that's standard operating procedure.
  2. Second, check the logs for something that uniquely identifies the failure for the bug.
    • Avoid general error messages from Tempest in console.html since those aren't always unique.
    • Look for errors/warnings in the various log files, e.g. logs/screen-n-cpu.txt, and pull information from them.
    1. Test your query out in http://logstash.openstack.org:
      • Typically start with a simple message and filename tag query over the last 7 days.
      • Query is structured like this: message:"<your unique fail here>" AND tag:"<name of the log that the failure message appears in>"
        • For example: message:"because vif doesn't exist" AND tag:"screen-n-net.txt"
      • If you have hits, make sure there are no false positives by checking 'build_status' on the left side of the logstash page, which shows the success/failure rate for the builds that the query hits. A good e-r query needs a 100% failure rate: any hit on a successful build means the message is not unique to the bug.
    2. Query limitations:
      • We currently only index INFO and above level messages, so we can't write queries against DEBUG level messages.
      • elastic-recheck doesn't currently have multi-line support, i.e. taking two separate error messages and putting them into the same query. See https://review.openstack.org/#/c/60508/ for an example of where this is needed.
  3. Writing the e-r query and pushing it up
    • This is pretty easy: create a new query yaml file under elastic-recheck/queries and push it up for review. Here is an example: https://review.openstack.org/#/c/61826/
    • Tip: use the Related-Bug: #xxxxxxx line in the commit message so it's automatically linked back into the bug report for people monitoring gate failure bugs.
  4. What to do when a bug is resolved
  5. TODO: doc when to mark a bug as critical so that it shows up in the weekly release status meetings for the PTLs
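
The query structure and the 100% failure-rate check from the steps above can be sketched in Python. This is only an illustration, not part of elastic-recheck: the helper names (build_query, failure_rate, is_good_query, render_query_file) are hypothetical, the 'FAILURE' value for build_status is an assumption about what logstash reports, and the single `query` key in the yaml file is an assumption that should be checked against the existing files under elastic-recheck/queries.

```python
def build_query(message, log_name):
    """Return a query string of the form message:"..." AND tag:"..."."""
    def quote(value):
        # Escape backslashes and double quotes so the phrase stays quoted.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"%s"' % escaped
    return "message:%s AND tag:%s" % (quote(message), quote(log_name))


def failure_rate(build_status_counts):
    """Fraction of hits that come from failed builds.

    build_status_counts maps a build_status value to its hit count, copied
    by hand from the logstash page. The 'FAILURE' key is an assumption.
    """
    total = sum(build_status_counts.values())
    if total == 0:
        raise ValueError("the query has no hits")
    return build_status_counts.get("FAILURE", 0) / total


def is_good_query(build_status_counts):
    """A query is only usable when every hit is from a failed build."""
    return failure_rate(build_status_counts) == 1.0


def render_query_file(query):
    """Render yaml text with a single folded 'query' key (assumed layout)."""
    return "query: >\n  %s\n" % query


if __name__ == "__main__":
    query = build_query("because vif doesn't exist", "screen-n-net.txt")
    print(query)
    print(is_good_query({"FAILURE": 12}))
    print(is_good_query({"FAILURE": 11, "SUCCESS": 1}))
    print(render_query_file(query))
```

The counts passed to is_good_query are made-up numbers for illustration; in practice they would be read off the 'build_status' panel after running the query over the last 7 days.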