LoggingStandards
Basic Principles
The point of this document is to bring together a set of common guidelines that we can all agree upon, to make logging across all the OpenStack services as consistent as possible.
Some common principles:
INFO
Every Inbound WSGI request should be logged Exactly Once
Every inbound WSGI should get logged exactly once. It should be logged with enough information that the operation the user was attempting can be reconstructed.
Current Issues:
- Keystone is logging all requests twice (once from keystone, once from eventlet.wsgi)
- Nova is logging all ec2 requests twice (once from nova, once from eventlet.wsgi)
INFO log level is for Operators
Everything at INFO level or above should be relevant to Operators, which means it should describe in abstract terms what is going on on. Referencing internal function names is not useful:
2014-01-26 15:35:07.158 INFO nova.virt.libvirt.firewall [req-88cce8e9-dd2a-4f87-adc7-4ce2f527ba6e AggregatesAdminTestXML-1384929043 AggregatesAdminTestXML-1314829414] [instance: 6e017b40-d5bc-442b-ac0a-6784efe8a70b] Called setup_basic_filtering in nwfilter
Non-OpenStack libraries should not be logging at INFO
Third party libraries which use the Info level should have their default logging levels set to WARN. It is not useful to an Operator to know that a new item was added to a connection pool (thanks request lib!).
Messages at INFO should be a unit of work
Good:
2014-01-26 15:36:10.597 28297 INFO nova.virt.libvirt.driver [-] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Instance spawned successfully. 2014-01-26 15:36:14.307 28297 INFO nova.virt.libvirt.driver [-] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Instance destroyed successfully.
Bad:
2014-01-26 15:36:11.198 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Creating image 2014-01-26 15:36:11.525 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Using config drive 2014-01-26 15:36:12.326 AUDIT nova.compute.manager [req-714315e2-6318-4005-8f8f-05d7796ff45d FixedIPsTestXml-911165017 FixedIPsTestXml-1315774890] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Terminating instance 2014-01-26 15:36:12.570 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Creating config drive at /opt/stack/data/nova/instances/fd027464-6e15-4f5d-8b1f-c389bdb8772a/disk.config
This is incredible overshare, in normal operating it means all these messages just need to be thrown away.
INFO messages shouldn't need a secret decoder ring
Bad:
2014-01-26 15:36:14.256 28297 INFO nova.compute.manager [-] Lifecycle event 1 on VM b1b8e5c7-12f0-4092-84f6-297fe7642070
DEBUG
Log level definitions:
http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-vs-error This is a nice writeup about when to use each log level. Here is a brief description:
- Debug: Shows everything and is likely not suitable for normal production operation due to the sheer size of logs generated
- Info: Usually indicates successful service start/stop, versions and such non-error related data
- Audit: (An OpenStack invented, level, still not sure what we should be doing with it)
- Warning: Indicates that there might be a systemic issue; potential predictive failure notice
- Error: An error has occurred and an administrator should research the event
- Critical: An error has occurred and the system might be unstable; immediately get administrator assistance