Jump to: navigation, search

LoggingStandards

Revision as of 19:57, 5 October 2015 by Mriedem (talk | contribs) (Basic Principles)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Basic Principles

The point of this document is to bring together a set of common guidelines that we can all agree upon, to make logging across all the OpenStack services as consistent as possible.

Note that this wiki may be out of date since most of this information was codified in a spec:

http://specs.openstack.org/openstack/openstack-specs/specs/log-guidelines.html

Some common principles:

INFO

Every Inbound WSGI request should be logged Exactly Once

Every inbound WSGI should get logged exactly once. It should be logged with enough information that the operation the user was attempting can be reconstructed.

Current Issues:

  • Keystone is logging all requests twice (once from keystone, once from eventlet.wsgi)
  • Nova is logging all ec2 requests twice (once from nova, once from eventlet.wsgi)

INFO log level is for Operators

Everything at INFO level or above should be relevant to Operators, which means it should describe in abstract terms what is going on on. Referencing internal function names is not useful:

2014-01-26 15:35:07.158 INFO nova.virt.libvirt.firewall [req-88cce8e9-dd2a-4f87-adc7-4ce2f527ba6e AggregatesAdminTestXML-1384929043 AggregatesAdminTestXML-1314829414] [instance: 6e017b40-d5bc-442b-ac0a-6784efe8a70b] Called setup_basic_filtering in nwfilter

Non-OpenStack libraries should not be logging at INFO

Third party libraries which use the Info level should have their default logging levels set to WARN. It is not useful to an Operator to know that a new item was added to a connection pool (thanks request lib!).

Currently proposed settings for 3rd party libraries:

  • amqp=WARN
  • boto=WARN
  • qpid=WARN
  • sqlalchemy=WARN
  • suds=INFO
  • iso8601=WARN
  • requests.packages.urllib3.connectionpool=WARN

Eventlet.wsgi

  • debug = False

otherwise you get the following after every request

2014-01-26 16:27:29.263 21034 INFO eventlet.wsgi.server [-] (21034) accepted ('127.0.0.1', 59064)

Messages at INFO should be a unit of work

Good:

 2014-01-26 15:36:10.597 28297 INFO nova.virt.libvirt.driver [-] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Instance spawned successfully.
 2014-01-26 15:36:14.307 28297 INFO nova.virt.libvirt.driver [-] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Instance destroyed successfully.

Bad:

 2014-01-26 15:36:11.198 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Creating image
 2014-01-26 15:36:11.525 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Using config drive
 2014-01-26 15:36:12.326 AUDIT nova.compute.manager [req-714315e2-6318-4005-8f8f-05d7796ff45d FixedIPsTestXml-911165017 FixedIPsTestXml-1315774890] [instance: b1b8e5c7-12f0-4092-84f6-297fe7642070] Terminating instance 2014-01-26
 15:36:12.570 INFO nova.virt.libvirt.driver [req-ded67509-1e5d-4fb2-a0e2-92932bba9271 FixedIPsNegativeTestXml-1426989627 FixedIPsNegativeTestXml-38506689] [instance: fd027464-6e15-4f5d-8b1f-c389bdb8772a] Creating config drive at /opt/stack/data/nova/instances/fd027464-6e15-4f5d-8b1f-c389bdb8772a/disk.config

This is incredible overshare, in normal operating it means all these messages just need to be thrown away.

INFO messages shouldn't need a secret decoder ring

Bad:

 2014-01-26 15:36:14.256 28297 INFO nova.compute.manager [-] Lifecycle event 1 on VM b1b8e5c7-12f0-4092-84f6-297fe7642070

DEBUG

Log level definitions

http://stackoverflow.com/questions/2031163/when-to-use-log-level-warn-vs-error This is a nice writeup about when to use each log level. Here is a brief description:

  • Debug: Shows everything and is likely not suitable for normal production operation due to the sheer size of logs generated
  • Info: Usually indicates successful service start/stop, versions and such non-error related data
  • Audit: (An OpenStack invented, level, still not sure what we should be doing with it)
  • Warning: Indicates that there might be a systemic issue; potential predictive failure notice
  • Error: An error has occurred and an administrator should research the event
  • Critical: An error has occurred and the system might be unstable; immediately get administrator assistance

Log Translation

Many deployers want to have logs written in their native language (either exclusively, or in addition to the English logs). On the other hand, many deployers do not care about this and translating log messages introduces a lot of work for the I18nTeam team. To balance those requirements, during the icehouse release cycle the gettextutils module in Oslo was updated to provide functions for marking log messages at different verbosity levels so they appear in different message catalogs. That allows the translators to prioritize their work based on the type of messages their deployers need to have translated.

Blueprint: https://blueprints.launchpad.net/oslo/+spec/log-messages-translation-domain

Code: http://git.openstack.org/cgit/openstack/oslo-incubator/tree/openstack/common/gettextutils.py#n86

Guidelines

The implementation of gettextutils moved to oslo.i18n during Juno. Please refer to the oslo.i18n documentation for the current guidelines: http://docs.openstack.org/developer/oslo.i18n/index.html