ElasticRecheck
Revision as of 18:12, 12 December 2013
A dump of information and FAQs on elastic-recheck: how to use it and how to contribute to it.
When you hit a failure on your patch and elastic-recheck (e-r) has not commented to identify it, but you do find a bug to recheck against, consider writing an e-r query for that bug so nobody has to dig through the logs next time. Lots of people check the http://status.openstack.org/rechecks/ page, but not all of the bugs listed there have e-r queries.
So what's the thought process for writing an e-r query (best practices)?
- First, identify or open the bug to recheck against; that's standard operating procedure.
  - See here for more info: https://wiki.openstack.org/wiki/GerritJenkinsGit#Test_Failures
- Second, check the logs of the failed job for a message that uniquely identifies the failure for that bug.
  - Avoid generic Tempest error messages in console.html, since those aren't always unique to one bug.
  - Look for errors/warnings in the individual service log files, e.g. logs/screen-n-cpu.txt, and pull the distinctive text from them.
  - Test your query in http://logstash.openstack.org:
    - Typically start with a simple message and filename query over the last 7 days.
    - Structure the query like this: message:"<your unique failure message here>" AND filename:"<the log file the message appears in, relative to the root of the job logs>"
      - For example: message:"because vif doesn't exist" AND filename:"logs/screen-n-net.txt"
    - If you get hits, make sure there are no false positives by checking 'build_status' on the left side of the logstash page; it shows the success/failure breakdown of the builds the query hits. A good e-r query needs a 100% failure rate.
- TODO steps for writing the e-r query and pushing it up
- TODO steps for what to do when a bug is resolved and we can archive the query with the 'resolved_at' field.
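The logstash checks described above can be sketched in Python. This is a minimal, hypothetical helper, not part of elastic-recheck: `build_query`, `failure_rate` and `is_good_er_query` are names made up for illustration. It assumes Lucene query-string phrase syntax (inside a quoted phrase, `\` and `"` must be backslash-escaped) and assumes the build_status values are the SUCCESS/FAILURE labels shown on the logstash page, copied by hand.

```python
"""Hypothetical sketch: build a logstash query and sanity-check its hits.

Not part of elastic-recheck; the helper names are illustrative only.
"""

from collections import Counter
from typing import Iterable


def escape_phrase(text: str) -> str:
    """Escape backslashes and double quotes so text is safe inside "..."."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_query(message: str, filename: str) -> str:
    """Return a message + filename query in Lucene query-string syntax."""
    return (
        f'message:"{escape_phrase(message)}" '
        f'AND filename:"{escape_phrase(filename)}"'
    )


def failure_rate(build_statuses: Iterable[str]) -> float:
    """Fraction of matching hits whose build_status is FAILURE (assumed label)."""
    counts = Counter(build_statuses)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("query has no hits; widen the time range or loosen it")
    return counts["FAILURE"] / total


def is_good_er_query(build_statuses: Iterable[str]) -> bool:
    """A usable e-r query should only ever match failed builds."""
    return failure_rate(build_statuses) == 1.0


if __name__ == "__main__":
    q = build_query("because vif doesn't exist", "logs/screen-n-net.txt")
    print(q)
    # message:"because vif doesn't exist" AND filename:"logs/screen-n-net.txt"
    print(is_good_er_query(["FAILURE", "FAILURE", "FAILURE"]))  # True
    print(is_good_er_query(["FAILURE", "SUCCESS"]))  # False
```

Requiring exactly 1.0 rather than "close to 1.0" mirrors the rule above: a single SUCCESS among the hits means the message also appears in passing runs, so it does not fingerprint the bug.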