Difference between revisions of "LoggingStandards"
m (→Log Translation) |
m (→Guidelines) |
||
Line 93: | Line 93: | ||
* Do not manipulate the translated string to add other literal text to it, etc. | * Do not manipulate the translated string to add other literal text to it, etc. | ||
** BAD: LOG.info(_LI('some message') + ': variable=%(value)s', ...) | ** BAD: LOG.info(_LI('some message') + ': variable=%(value)s', ...) | ||
− | ** GOOD: LOG.info(_LI('some message: variable=%(value)s', ...) | + | ** GOOD: LOG.info(_LI('some message: variable=%(value)s'), ...) |
Revision as of 22:12, 26 February 2014
Basic Principles
The point of this document is to bring together a set of common guidelines that we can all agree upon, to make logging across all the OpenStack services as consistent as possible.
Some common principles:
INFO
Every Inbound WSGI request should be logged Exactly Once
Every inbound WSGI should get logged exactly once. It should be logged with enough information that the operation the user was attempting can be reconstructed.
Current Issues:
- Keystone is logging all requests twice (once from keystone, once from eventlet.wsgi)
- Nova is logging all ec2 requests twice (once from nova, once from eventlet.wsgi)
INFO log level is for Operators
Everything at INFO level or above should be relevant to Operators, which means it should describe in abstract terms what is going on on. Referencing internal function names is not useful:
2014-01-26 15:35:07.158 INFO nova.virt.libvirt.firewall [req-88cce8e9-dd2a-4f87-adc7-4ce2f527ba6e AggregatesAdminTestXML-1384929043 AggregatesAdminTestXML-1314829414] [instance: 6e017b40-d5bc-442b-ac0a-6784efe8a70b] Called setup_basic_filtering in nwfilter
Non-OpenStack libraries should not be logging at INFO
Third party libraries which use the Info level should have their default logging levels set to WARN. It is not useful to an Operator to know that a new item was added to a connection pool (thanks request lib!).
Currently proposed settings for 3rd party libraries:
- amqp=WARN
- boto=WARN
- qpid=WARN
- sqlalchemy=WARN
- suds=INFO
- iso8601=WARN
- requests.packages.urllib3.connectionpool=WARN
Eventlet.wsgi
- debug = False
otherwise you get the following after every request
2014-01-26 16:27:29.263 21034 INFO eventlet.wsgi.server [-] (21034) accepted ('127.0.0.1', 59064)
Messages at INFO should be a unit of work
Good:
2014-01-26 15:36:10.597 28297 INFO nova.virt.libvirt.driver [-] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Instance spawned successfully. 2014-01-26 15:36:14.307 28297 INFO nova.virt.libvirt.driver [-] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Instance destroyed successfully.
Bad:
2014-01-26 15:36:11.198 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Creating image 2014-01-26 15:36:11.525 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Using config drive 2014-01-26 15:36:12.326 AUDIT nova.compute.manager [req-714315e2-6318-4005-8f8f-05d7796ff45d FixedIPsTestXml-911165017 FixedIPsTestXml-1315774890] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Terminating instance 2014-01-26 15:36:12.570 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Creating config drive at /opt/stack/data/nova/instances/fd027464-6e15-4f5d-8b1f-c389bdb8772a/disk.config
This is incredible overshare, in normal operating it means all these messages just need to be thrown away.
INFO messages shouldn't need a secret decoder ring
Bad:
2014-01-26 15:36:14.256 28297 INFO nova.compute.manager [-] Lifecycle event 1 on VM b1b8e5c7-12f0-4092-84f6-297fe7642070
DEBUG
Log level definitions:
http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-vs-error This is a nice writeup about when to use each log level. Here is a brief description:
- Debug: Shows everything and is likely not suitable for normal production operation due to the sheer size of logs generated
- Info: Usually indicates successful service start/stop, versions and such non-error related data
- Audit: (An OpenStack invented, level, still not sure what we should be doing with it)
- Warning: Indicates that there might be a systemic issue; potential predictive failure notice
- Error: An error has occurred and an administrator should research the event
- Critical: An error has occurred and the system might be unstable; immediately get administrator assistance
Log Translation
Many deployers want to have logs written in their native language (either exclusively, or in addition to the English logs). On the other hand, many deployers do not care about this and translating log messages introduces a lot of work for the I18nTeam team. To balance those requirements, during the icehouse release cycle the gettextutils module in Oslo was updated to provide functions for marking log messages at different verbosity levels so they appear in different message catalogs. That allows the translators to prioritize their work based on the type of messages their deployers need to have translated.
Blueprint: https://blueprints.launchpad.net/oslo/+spec/log-messages-translation-domain
Code: http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/gettextutils.py#n86
Guidelines
- Debug messages are not translated, for now.
- Annotate other log messages with the appropriate marker function.
- _LI() for INFO
- _LW() for WARNING
- _LE() for ERROR
- _LC() for CRITICAL
- Exception messages should continue to use _().
- Messages logged and raised as exceptions should use _().
- Do not manipulate the translated string to add other literal text to it, etc.
- BAD: LOG.info(_LI('some message') + ': variable=%(value)s', ...)
- GOOD: LOG.info(_LI('some message: variable=%(value)s'), ...)