Security/Guidelines/logging guidelines

Overview
OpenStack needs logging and notification security guidelines and best practices to prevent accidental leakage of confidential information to unauthorized users. This page seeks OpenStack community consensus on what those standards should be and how they should be implemented.

Status
This page is currently under review by the OpenStack Security Group (OSSG).

Difficulty Identifying Confidential Data
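There is no standard, structured logging and notification data format across OpenStack projects that would let operators unambiguously identify and filter out confidential data that should not be exposed to certain users. A simple example architecture:

[Diagram: multiple OpenStack services emit logs, which pass through an optional operator-created filtering and aggregation system before the sanitized logs are exposed to users or operators.]
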
Note: For brevity, this document will use "logs" in place of "logs and notifications".

This diagram represents multiple OpenStack services generating logs that may be formatted differently and may hold different types of confidential data (data that an operator would not want a user to access). There may be an optional operator-created filtering and aggregation system and some method of exposing the sanitized logs to users or operators. The delivery method isn't strictly relevant to this discussion, but the ability to unambiguously filter confidential data out of logs is very important.

Some non-exhaustive examples of accidental credential disclosure to unauthorized users within OpenStack:
 * (Ceilometer) Log contains DB password in plain text (CVE-2013-6384) [OSSA 2013-031]
 * (Keystone) Plaintext passwords are logged
 * (Nova) Clear text password has been printed in log by some API call

The problem is exacerbated by the CI/CD nature of OpenStack. Code is merged daily in many projects, and some of it may introduce new log entries that operators running the service must examine and filter. Operators who update frequently therefore spend more time and effort on this process. Currently, many operators are reactive when addressing log leaks: they must actively monitor changes in log data and act quickly to head off potential leaks. This is far from optimal for OpenStack operators, and the burden grows with the number of OpenStack services they run.

Use of Log Level for Security
In some cases, OpenStack security issues around logging stem from relying on log levels to filter out confidential data. For example, setting the log level to DEBUG or INFO has caused plain text credentials to be logged (sometimes to a location readable by all users). This forces operators to choose between risking confidential data leaks and getting better performance and debug data in their logs. Again, this is not an optimal design for operators.
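A minimal sketch of why log level is not a security control. It uses Python's standard `logging` module rather than Oslo Log, and the logger names and password are made up for illustration: the identical code keeps the credential out of the output at INFO, and writes it in plain text as soon as an operator switches to DEBUG.

```python
import io
import logging


def run_with_level(level):
    """Run the same logging calls at the given level and return what was written."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger("demo.level_%d" % level)  # hypothetical logger name
    logger.handlers = [handler]
    logger.propagate = False
    logger.setLevel(level)

    password = "s3cret"  # hypothetical credential
    # Anti-pattern: only the configured level keeps the password out of the log.
    logger.debug("Connecting to DB with password=%s", password)
    logger.info("Connecting to DB")
    return stream.getvalue()


production = run_with_level(logging.INFO)  # credential suppressed
debugging = run_with_level(logging.DEBUG)  # credential leaked
```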

Unambiguously Identifying Confidential Log Data
Proposed rules:
 * Identify confidential data in the OpenStack code so that administrators have a single "tag" with which to sanitize data in back-end log filtering systems
 * Purge, within OpenStack code, all log data that OpenStack identifies as confidential or sensitive
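A minimal sketch of the back-end side of these rules. It assumes, hypothetically, that services emit one JSON object per log line and that the single agreed tag is a key named `private`; neither the record layout nor the tag name is an existing OpenStack convention. The filter drops every tagged field, at any nesting depth, before the line is exposed to users.

```python
import json

# Hypothetical tag name; the rule only calls for *one* agreed-upon tag.
CONFIDENTIAL_TAG = "private"


def purge(obj):
    """Recursively remove every dict entry whose key is the confidential tag."""
    if isinstance(obj, dict):
        return {k: purge(v) for k, v in obj.items() if k != CONFIDENTIAL_TAG}
    if isinstance(obj, list):
        return [purge(item) for item in obj]
    return obj


def sanitize_line(line):
    """Parse one JSON log line, purge tagged data, and re-serialize it."""
    return json.dumps(purge(json.loads(line)), sort_keys=True)


# Hypothetical structured log record from a service.
raw = json.dumps({
    "service": "nova-api",
    "message": "Instance created",
    "extra": {"request_id": "req-123", "private": {"admin_password": "s3cret"}},
})
clean = sanitize_line(raw)
```

Because the filter only needs to know one tag name, it does not have to change every time new code adds a confidential field, which is what the first rule is aiming for.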

OpenStack community feedback to date explicitly rejects the idea of an Oslo Config-like setting that disables these security features (allowing some sensitive data to be logged).

Discussion point: One suggestion is to add an AUDIT log level that would enable logging of sensitive data. Community thoughts on this?

OpenStack Sensitive/Confidential Data
Data in this category should not be exposed to users. Note: this is meant to be a living list that adapts to new OpenStack issues and architecture.

Reminder: Always ensure that users may only access data that is associated with their tenant/project.

Possible Implementation Options
The author of this page admits to limited knowledge in one related area: the Cloud Auditing Data Federation (CADF) appears to work in a similar space. The PyCADF project in OpenStack may be of interest for further investigation.

Trace Class
In Project Solum, a TraceData class was created to work with Oslo Log and perform two main tasks:
 * Enable identification of confidential/sensitive data in code
 * Allow trace data to persist, be built up incrementally, and be reused in each subsequent log call

Code: https://github.com/stackforge/solum/blob/master/solum/common/trace_data.py

Unit Test/Usage example: https://github.com/stackforge/solum/blob/master/solum/tests/common/test_trace_data.py

The class accepts an Oslo Context object and copies its data into itself, which prevents unexpected interactions with that common library. Oslo Log can consume this trace data the same way it consumes a context object.
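The ideas above can be sketched as follows. This is a hypothetical `Trace` class written for illustration only; it is not Solum's actual `TraceData` API, and `FakeContext` merely stands in for an Oslo Context object. Public and confidential fields live in separate dictionaries, accumulate across calls, and are copied out of the context rather than referenced.

```python
import logging


class FakeContext:
    """Stand-in for an Oslo Context object (illustration only)."""

    def __init__(self, request_id, project_id, auth_token):
        self.request_id = request_id
        self.project_id = project_id
        self.auth_token = auth_token


class Trace:
    """Hypothetical trace container; not Solum's actual TraceData API."""

    def __init__(self):
        self._public = {}   # safe to show to users
        self._private = {}  # confidential, kept under a single tag

    def import_context(self, context):
        # Copy values out of the context instead of keeping a reference to it,
        # so the trace never mutates or depends on the shared context object.
        self._public.update(request_id=context.request_id,
                            project_id=context.project_id)
        self._private.update(auth_token=context.auth_token)

    def public(self, **fields):
        self._public.update(fields)

    def private(self, **fields):
        self._private.update(fields)

    def as_extra(self):
        # Return copies so a log call cannot alter the stored trace data.
        return {"trace": dict(self._public), "private": dict(self._private)}


trace = Trace()
trace.import_context(FakeContext("req-1", "proj-a", "tok-xyz"))
trace.public(assembly="app1")     # built up over time...
trace.private(db_password="pw1")  # ...including confidential values
extra = trace.as_extra()

# Each subsequent log call can reuse the accumulated trace data.
logging.getLogger("demo.trace").info("Assembly deployed", extra=extra)
```

Keeping confidential values under one `private` key is what lets a back-end filter remove them without knowing each individual field name.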

One potential concern with this approach is that the Trace class duplicates some of the Oslo Context class data. Other approaches are offered below.

Native Oslo Log Support
This is the preferred implementation path based on OpenStack community feedback to date.

Extend Oslo Log itself with the capability to flag data as sensitive (a first-class feature of Oslo Log). Architecting this type of solution may require more work from the Oslo team.

One idea is a mechanism such as LOG.debug("my log message", confidential=True) for identifying sensitive data. This would require splitting log entries into "public" and "private" log calls (in some cases, back-to-back logging calls).
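Python's standard `Logger.debug` rejects unknown keyword arguments, so a flag like `confidential=True` would need explicit support. The sketch below uses a hypothetical `ConfidentialAdapter` built on the standard `logging.LoggerAdapter`; it is not an Oslo Log API. The adapter turns the flag into a record attribute, and a filter keeps flagged records out of a user-visible handler while an operator-only handler still receives them. Note the back-to-back public and private calls.

```python
import io
import logging


class ConfidentialAdapter(logging.LoggerAdapter):
    """Hypothetical adapter accepting confidential=True on any log call."""

    def process(self, msg, kwargs):
        confidential = kwargs.pop("confidential", False)
        extra = dict(kwargs.get("extra") or {})
        extra["confidential"] = confidential  # becomes record.confidential
        kwargs["extra"] = extra
        return msg, kwargs


class DropConfidential(logging.Filter):
    """Reject records flagged as confidential."""

    def filter(self, record):
        return not getattr(record, "confidential", False)


user_stream, secure_stream = io.StringIO(), io.StringIO()
base = logging.getLogger("demo.confidential")
base.setLevel(logging.DEBUG)
base.propagate = False

user_handler = logging.StreamHandler(user_stream)      # exposed to users
user_handler.addFilter(DropConfidential())
base.addHandler(user_handler)
base.addHandler(logging.StreamHandler(secure_stream))  # operator-only

LOG = ConfidentialAdapter(base, {})
LOG.debug("Creating user alice")                                   # public call
LOG.debug("Initial password for alice: %s", "s3cret", confidential=True)  # private call
```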

Oslo Log Extra Structure
Oslo Log provides an "extra" field that may hold arbitrary data. One option would be to manually create a "private" key holding a JSON dictionary or list of confidential data, which the operator can then use to filter log data.

Example: LOG.debug("Random non-confidential log data", extra={"private": {"value": confidential_data}})

The benefit of this path is that no Oslo code changes are needed. The drawback is that correctly structuring every Oslo Log call is a tedious and error-prone process.
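A minimal sketch of the "extra" approach using Python's standard `logging` module; Oslo Log's own formatters may handle `extra` differently. The hypothetical `JsonFormatter` serializes the `private` payload only on an operator-only handler. The sketch also shows the weakness described above: protection depends on every developer remembering to pass `extra`, and nothing prevents a secret from being placed in the message itself.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical JSON formatter that optionally includes the 'private' extra."""

    def __init__(self, include_private):
        super().__init__()
        self.include_private = include_private

    def format(self, record):
        doc = {"level": record.levelname, "message": record.getMessage()}
        # Keys passed via extra={...} become attributes on the LogRecord.
        if self.include_private and hasattr(record, "private"):
            doc["private"] = record.private
        return json.dumps(doc, sort_keys=True)


def make_handler(stream, include_private):
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter(include_private))
    return handler


public_out, secure_out = io.StringIO(), io.StringIO()
LOG = logging.getLogger("demo.extra")
LOG.setLevel(logging.DEBUG)
LOG.propagate = False
LOG.addHandler(make_handler(public_out, include_private=False))  # user-visible
LOG.addHandler(make_handler(secure_out, include_private=True))   # operator-only

confidential_data = "s3cret"  # hypothetical value
LOG.debug("Random non-confidential log data",
          extra={"private": {"value": confidential_data}})
```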

Log Level Usage Recommendations
Proposed rule: Do not use log level to filter out confidential data (such as passwords).
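One level-independent alternative is to mask secrets in every record before it is written. oslo.utils already provides `strutils.mask_password` for this kind of masking; the stdlib-only filter below is a much smaller hypothetical stand-in with an illustrative, incomplete list of key names. Because it is attached to the handler, it applies at every log level, so DEBUG output can stay enabled.

```python
import io
import logging
import re

# Hypothetical, deliberately short list of sensitive key names.
_SENSITIVE_KEYS = ("password", "admin_pass", "auth_token")
_PATTERN = re.compile(
    r"(?P<key>%s)(?P<sep>\s*[=:]\s*)(?P<value>\S+)" % "|".join(_SENSITIVE_KEYS),
    re.IGNORECASE,
)


class MaskSecretsFilter(logging.Filter):
    """Mask key=value / key: value secrets in every record, whatever its level."""

    def filter(self, record):
        message = record.getMessage()  # apply %-args first, then mask
        record.msg = _PATTERN.sub(
            lambda m: m.group("key") + m.group("sep") + "***", message)
        record.args = ()  # message is already fully formatted
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(MaskSecretsFilter())

LOG = logging.getLogger("demo.mask")
LOG.addHandler(handler)
LOG.propagate = False
LOG.setLevel(logging.DEBUG)  # full debug output stays safe

LOG.debug("Connecting with password=%s", "s3cret")
LOG.info("Token check auth_token: %s", "abc123")
```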

The post "When To Use Log Level Warn vs Error" offers a useful definition of log level usage for the community to consider.

In order to make this more specific for OpenStack and to obtain community input, here is a table with recommendations for log level:

Socializing Recommendations
Potential ways to socialize security recommendations:
 * Style Guidelines: Update the OpenStack Style Guidelines (http://docs.openstack.org/developer/hacking/) with the agreed-upon OpenStack recommendations
 * Code Reviews: In code reviews, link to the OSSG recommendations at https://wiki.openstack.org/wiki/Security/Guidelines, using the page anchors to point to the exact topic.
 * Convince Leadership: Promote the approach and its benefits to the PTLs and the TC. Challenge PTLs to become early adopters and set a good example for others.
 * Engage Oslo Team: Build the recommendations into common OpenStack libraries such as Oslo