KeyManager

= Key Manager =

The Key Manager effort became Barbican. This documentation is here for historical purposes.
malini.k.bhandaru "at" intel.com

https://etherpad.openstack.org/havana-key-manager

History
March 06, 2013: Initial version

April 14, 2013: Added reference to Rackspace session at OpenStack Summit

April 18, 2013: Added pointer to etherpad from Key Manager design session at OpenStack Summit

April 23, 2013: Added section on "Post summit discussion" changes/clarifications

Server-side encryption with key management would make data protection more readily available, enable harnessing of any special hardware encryption support on the servers, make available a larger set of encryption algorithms, and reduce client maintenance effort. Amazon and Google’s object storage systems provide transparent data encryption. Encryption is no longer prohibitive, given newer chips that carry hardware support for AES (AES-NI) and implementations that harness possible parallelism in data and processor architecture. The popular wisdom today is to increase security.

Recently, interest has grown in OpenStack in providing server-side encryption in Cinder (Volume), Swift (Object), and Glance (Snapshot).

Protecting data involves not only encryption support but also key management: creating, storing, protecting, and providing ready access to the encryption keys. The keys would need to be stored on a device separate from the one housing the data they protect. Key management could be a separate OpenStack service or a sub-service of Keystone, OpenStack's identity service.

Ideally, the keys themselves would be random and of the desired length, would carry associated metadata such as ownership, and would themselves be encrypted before being stored.

= Security Model =
 * Protection of data at rest: the encrypted data and the encryption keys are held in separate locations. Stealing the data disk still leaves the data protected.
 * Keys opaque: The keys themselves are encrypted using "master" keys.
 * Master Key Protection: Master keys are protected in hardware using Trusted Platform Module (TPM) technology. Keys are released only to trusted host machines (those whose BIOS and initial boot sequences are measurably good).
 * Secure Master Key Transmission: TPM technology is used to transfer master keys between cooperating services and between sibling services (in the case of horizontal scaling).
 * Support Dual Locking: High value data could be protected with a user/project/domain specific key and a service key. This is akin to bank safe-deposit boxes, which use two keys: a bank key and a customer key.
 * Limited Knowledge: The Key Manager will not maintain a mapping between keys and encrypted entities. Encrypted entities will carry, as metadata, a key-id (a pointer to the key to be used to unlock them). With dual keys, however, the customer key is not referenced; it is implicit as part of the authentication.
 * Limited Access: Authorization and access control mechanisms limit access to keys.
 * Protection from denial of service: multiple replicas of the key manager.
 * Data Isolation: Should some audit or law enforcement authority demand access for a certain customer, data belonging to other customers is not exposed because they use different keys.

= Design Considerations =

High Availability
Think of the key manager as a dictionary of keys indexed by key-id. The keys have to be as accessible as the objects they encrypt. The Key Manager backing store could be something along the lines of Swift, which provides high availability and redundancy by way of Swift proxies and multiple replication sites. Alternatively, the backing store could be mirrored databases. Ideally the mirrors or replication sites should be in different geographical zones.

For security, keys and the data they seek to protect should not be co-resident on the same physical device. Given this constraint, should one take a Swift backing store solution approach, it may be simple to introduce a separate Swift cluster to store the keys. The storage needs just for the keys would be less than the typical storage needs of Swift for object and snapshot/image storage.

Other high availability solutions that are typically used instead of Swift in a production environment also meet the needs of the Key Manager storage.

Opaque Keys
Keys while in storage will be encrypted for security. This calls for master keys to encrypt the key strings.
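
A minimal Python sketch of the opaque-key idea follows. It is an illustration under stated assumptions, not the design's actual mechanism: the cipher is a toy HMAC-SHA256 keystream with an integrity tag, chosen only so the example runs with the standard library, and it stands in for a vetted algorithm such as AES. The function names wrap, unwrap, and rewrap are hypothetical.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
TAG_LEN = 32


def _keystream(master_key, nonce, length):
    # Toy keystream: HMAC-SHA256 over nonce + block counter (NOT a vetted cipher).
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(master_key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def wrap(master_key, key_string):
    """Encrypt a key-string under a master key before it is stored."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(master_key, nonce, len(key_string))
    body = bytes(a ^ b for a, b in zip(key_string, stream))
    tag = hmac.new(master_key, b"tag" + nonce + body, hashlib.sha256).digest()
    return nonce + body + tag


def unwrap(master_key, blob):
    """Recover the key-string; raises ValueError on a wrong master key."""
    nonce, body, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(master_key, b"tag" + nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong master key or corrupted blob")
    stream = _keystream(master_key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def rewrap(old_master, new_master, blob):
    """Re-key a stored key: only the small key blob is re-encrypted, not the data."""
    return wrap(new_master, unwrap(old_master, blob))


master = secrets.token_bytes(32)
key_string = secrets.token_bytes(32)
stored_blob = wrap(master, key_string)  # this opaque blob is what the store holds
```

Only stored_blob would ever leave the service, so whoever holds the backing store cannot read the key-string without the master key.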

Protecting Master Keys
Master keys are long-lived and used to encrypt a large number of keys and require strong protection. These criteria recommend that master keys be readily accessible, stored locally, and as securely as possible. Trusted Compute Platform storage meets these requirements.

Restricting Service Access
The Key Management service will be available only to the OpenStack services, excluding the Compute Hosts, which are the least trusted of the hosts (and the reason no-compute-db feature was developed). Note, it shall not be available to end-users. The user will never be provided access to the keys directly. At the time of account creation, they may request the creation of keys for users/projects/domains.

Restricting Key Access
Keys are owned by the service that creates them, and access to such keys is limited to the service introducing them.

The exception to the above is the wider-scope keys used in dual locking, that is, the User/Project/Domain keys, which belong to the Identity Service, a part of Keystone. Keystone's Trust feature will be used to delegate access to such keys to the encryption service needing them. Delegation comes with an expiry period. Delegation brings with it a need for services to access the Keystone Identity service master key. Transfer of the Keystone Identity master key from one service to another can be securely performed using TPM symmetric key sharing protocols.

Key Attributes / meta-data
Keys could have attributes such as no-cache and number-of-uses. KMIP has a notion of a usage mask: who can use a key, and for what purposes (encryption, decryption, etc.).

Key Caching
Key Manager’s keys need to be accessible at the same level as the objects they encrypt, to ensure ready access. The keys themselves could be cached at the service endpoint using them with an expiration equal to or less than that of the access token lifetime used to obtain them. Caching reduces network traffic and the load on the key manager. With dual keys, where the wider scope key is obtained through access delegation, the lifetime would be that of the delegation period.
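
The caching rule above can be sketched as follows. This is a hedged illustration, not an existing OpenStack component: the class name KeyCache, its methods, and the injectable clock are all assumptions made for the example.

```python
import time


class KeyCache:
    """Hypothetical service-side key cache whose entries expire with the access token."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the expiry logic can be tested
        self._entries = {}       # key_id -> (key_string, expires_at)

    def put(self, key_id, key_string, expires_at):
        # expires_at should be no later than the expiry of the token
        # (or delegation) that was used to obtain the key.
        self._entries[key_id] = (key_string, expires_at)

    def get(self, key_id):
        entry = self._entries.get(key_id)
        if entry is None:
            return None
        key_string, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the caller goes back to the key manager.
            del self._entries[key_id]
            return None
        return key_string
```

A cache miss (None) means the service must fetch the key from the key manager again with a fresh token or delegation.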

Logging
All access to keys needs to be logged: CRUD (create, read, update, delete) actions should be logged as important events. This would meet regulation/audit needs such as HIPAA, Sarbanes-Oxley, etc.

Invalid/unauthorized attempts should also be logged, including IP address, time, etc., since they may indicate hacking attempts and a need for action.
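
As a rough sketch of such audit logging, assuming Python's standard logging module: the logger name keymanager.audit, the function log_key_event, and the field names are hypothetical, and the timestamp is left to the logging formatter.

```python
import logging

# Hypothetical audit logger; a deployment would attach its own handlers and formatter.
audit_log = logging.getLogger("keymanager.audit")
audit_log.setLevel(logging.INFO)


def log_key_event(action, key_id, requester, source_ip, allowed):
    """Record one key access attempt; denied attempts are logged at WARNING."""
    level = logging.INFO if allowed else logging.WARNING
    audit_log.log(
        level,
        "action=%s key_id=%s requester=%s ip=%s allowed=%s",
        action, key_id, requester, source_ip, allowed,
    )
```

Logging denied attempts at a higher level makes it easier to alert on possible hacking attempts without filtering the whole audit trail.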

Life Cycle Management: Background Tasks

 * 1) archiving, re-keying
 * 2) API for life cycle management
 * 3) Plug-in solutions/implementations (open source and proprietary)

Side Benefits

 * 1) Communication between the service and the key manager does not need to be further encrypted using SSL or HTTPS because the keys flying between them are encrypted at all times. The decrypted key string would only ever reside on the service that seeks to save or use it.
 * 2) Keys used by different OpenStack services could reside in a single storage system, but if one service were to be compromised, the keys from other services would still be safe.
 * 3) Further, should there be a desire to change a master key, only keys stored by that service need to be re-encrypted. The actual data that they were used to encrypt does not need to be re-encrypted.

= Key Manager in OpenStack =

Key API
create: Key manager will create a random key, save it, and return a tuple. The communication between requester and key manager should be secure to ensure that the key is not compromised.

get

put

delete 

update 

By supporting key delete, we essentially render any stored data associated with the key inaccessible. It will not be necessary to "wipe" clean or shred, for instance, a block device; it is only necessary to record that the area is free and can be re-used.
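
The five operations could be sketched as an in-memory class like the one below. The argument lists are not given in this document, so every signature here is an assumption, as are the class name InMemoryKeyManager and the owner check (which follows the "Restricting Key Access" rule). Keys are held in the clear for brevity, whereas the design calls for encrypting them under a master key.

```python
import secrets
import uuid


class InMemoryKeyManager:
    """Hypothetical stand-in for the key manager: a dictionary indexed by key-id."""

    def __init__(self):
        self._keys = {}  # key_id -> {"owner": ..., "key_string": ...}

    def _owned(self, owner, key_id):
        # Only the service that introduced a key may touch it.
        record = self._keys.get(key_id)
        if record is None or record["owner"] != owner:
            raise KeyError(key_id)
        return record

    def create(self, owner, bits=256):
        """Generate a random key, store it, and return (key_id, key_string)."""
        key_string = secrets.token_bytes(bits // 8)
        return self.put(owner, key_string), key_string

    def put(self, owner, key_string):
        """Store a key generated by the caller and return its new key_id."""
        key_id = uuid.uuid4().hex
        self._keys[key_id] = {"owner": owner, "key_string": key_string}
        return key_id

    def get(self, owner, key_id):
        return self._owned(owner, key_id)["key_string"]

    def update(self, owner, key_id, key_string):
        self._owned(owner, key_id)["key_string"] = key_string

    def delete(self, owner, key_id):
        # After this, data encrypted under the key is effectively unreadable.
        self._owned(owner, key_id)
        del self._keys[key_id]
```

For example, a service would call create("swift") once per object and keep only the returned key_id in the object's metadata.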

Key Scope:

 * Per entity (entity could be a volume, an object, a VM image/snapshot)
 * Per user
 * Per project (within a domain)
 * Per domain

For strong encryption, typically a key is used in conjunction with an initialization vector (IV). The per-entity key would serve as an IV. It could be used alone or in conjunction with a wider scoped key, such as a domain scope key.

Key Size
Some algorithms require longer keys, so we support a wide range.
 * 128, 192, 256, .. 2048 .. longer or shorter (possibly used with padding).



The master keys would be held in TPM Storage



Encryption
Available encryption algorithm options would be obtained by the OpenStack services directly querying the libraries used to provide such encryption support. The options would also be provided as options during user/project/domain creation, to set defaults. The options may further be offered with each entity creation (could get too chatty for high volume data such as objects). Typical options would be RSA, AES etc.

Swift (object storage) example: assume an object X is stored in encrypted form on the Swift object store. Let enc-object-x be the encrypted representation of object X. Then the Swift file system would contain:

Swift

enc-object-x, meta_data: 

Similarly, an encrypted Cinder volume might be represented as

Cinder

Volume, meta_data: <enc:true, algorithm:aes-xts, key-id:abcdefghijklmnopqrstuvxyz>

Key Flow
The figures below illustrate how the Key Manager fits into the regular flow of putting and getting an object in Swift. For simplicity, caching of keys and secondary key handling (for dual locking) is omitted.

Concerns/Questions

 * 1) Another failure point: With another service, the Key Manager, in the picture, we have another component that could fail. But encryption needs keys, maintained by either the end user or the server. This is a feature cost. Caching keys mitigates some of the problems that arise from network latency and server failure. Using the TPM to protect the encryption master keys makes the cache less of a security hole.
 * 2) KMIP: Do we need to support KMIP in OpenStack? If the keys are not for direct end-user consumption, KMIP is not mandated. However, if we desire to use the key manager to save private and public keys of the OpenStack services, then KMIP would be useful to exchange information across cloud boundaries.
 * 3) Encryption data transfer overhead: Keys typically are not updated, except on master key re-keying. Swift uses rsync for replication, and for objects the size of keys, replication performance is not a concern. However, strong encryption requires initialization vectors (IVs), salts, and cipher chaining. Thus a small change towards the beginning of a document will generate a totally different encrypted object, which is what we desire, but in the context of rsync it implies that the full data payload needs to be transferred. But data protection overrides all else here. Further, use cases may establish that this is an unwarranted concern if there are typically few updates to an object.
 * 4) Unauthorized key deletion: If we use a Swift-based system for the Key Manager backend store, a hijacked server that inserts spurious tombstone records on Swift storage nodes, mimicking a legitimate deletion, would cause key loss when a background reaper task periodically deletes such objects. This would not be a new security hazard, and has to be handled as it is today. Perhaps key deletion could be turned off to prevent such havoc.
 * 5) Fear of Key Loss: Key Manager back end storage should ideally be distributed in geographically disparate locations.
 * 6) Salts/IV: A key per object/entity behaves like a salt/IV, especially when used in conjunction with a user/project/domain (wider scope) key.

= Phased Implementation =

Key Manager implementation could be in phases along the lines below. Double locking could even be part of phase I.

Phase I

 * 1) Stub Key Manager: could pull out the JHU-APL or Mirantis Key Manager implementation, or Rackspace's (*) key manager as a service solution (added April 14, 2013), and float it as either a new OpenStack service or a sub-service of Keystone. Essentially establish all the plumbing flows. Define a KeyManager_client that the other services use, via the KeyManager API. The key manager back end could initially be a file or a MySQL or SQLite database. The default could be a MySQL backend for devstack-like single-machine deployments for development/testing.
 * 2) Master keys stored on Python key ring or using mechanisms similar to private key protection on the various OpenStack service host machines.
 * 3) Encryption algorithm and parameters could initially be defaults in a nova conf file (JHU-APL approach for volume encryption) or defaults per user/project/domain, going up the generalization chain until something specific is found, else using a domain-level default.
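
The "generalization chain" in item 3 could look roughly like the sketch below. The function name, the profile dictionaries, and the "encryption" field are hypothetical, and the final conf-file fallback is an assumption about how the nova conf default would fit in.

```python
def resolve_encryption_defaults(user, project, domain, conf_default):
    """Return the first encryption setting found, from most to least specific."""
    for profile in (user, project, domain):
        settings = profile.get("encryption")
        if settings is not None:
            return settings
    # Nothing set at any scope: fall back to the deployment-wide default.
    return conf_default


# Hypothetical profiles: the user has no preference, the project does.
user = {"name": "alice"}
project = {"name": "research", "encryption": {"algorithm": "aes-xts", "bits": 256}}
domain = {"name": "example", "encryption": {"algorithm": "aes-cbc", "bits": 128}}
conf_default = {"algorithm": "aes-cbc", "bits": 256}

chosen = resolve_encryption_defaults(user, project, domain, conf_default)
```

Here the project-level setting wins because the user profile carries none; the domain setting would only apply if both narrower scopes were empty.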

Phase II
Make Key Manager a separate Swift instance, with multiple zones for storage. This would support true HA and fault tolerance.

Phase III

 * 1) Support multiple encryption algorithms via encryption library querying. Provide user interface support to select preferences and store as part of user/project/domain profile.
 * 2) Reaper routine to change a master key for a service, aka re-keying.
 * 3) Support dual locking, a feature that uses the Keystone Identity V3 API's trust feature.

Phase IV
Introduce true TPM support for master keys. For instance, volume encryption may prefer XTS, an encryption strategy that uses sector address. We have expertise on TPM within Intel, and attestation service support already in OpenStack, which is currently being used to verify goodness of compute nodes. This would extend to the OpenStack service nodes.

Phase V
Chef/Puppet support for transferring symmetric keys to the various service host machines, particularly to scale horizontally in as automated a fashion as possible.

Glossary
Key-string: A string of bits used to encrypt data. Ideally auto-generated using a random number generator that exploits entropy. Intel's hardware random number generator is a high speed source of quality randomness.

Key-id: a unique ID used to index a key-string in the system. The key-id will be attached as metadata to the encrypted object/volume.

Master-key: a key-string used to encrypt the keys (key-strings) before saving in the key manager, saved in trusted storage at the service end-point.

TPM: Trusted Platform Module

Post Summit Discussion/Revisions/Clarifications

 * Where the key is generated: The original design had each agent (Swift, Cinder, etc.) hold its own master key, generate the keys for object/volume encryption, and use the key manager just to store these along with additional attributes. An advantage of this approach is that the key is never transferred in the clear between any of the cloud service endpoints, which buys us time before we routinely encrypt the communication channels between the endpoints. While the design reduces the amount of damage/exposure should an agent get compromised, it adds a layer of deployment complexity. This stems from the per-agent master key having to be shared with/transferred to sibling agents for high availability. Such master keys can of course be transferred securely, using PKI and encrypting the master key with the public key of the receiving sibling agent, or using the TPM transfer protocol, which internally also uses PKI public-key encryption of the payload.

Instead, having all keys created at the key manager and encrypted with a master key, possibly a different master key per key-requesting agent, would isolate all key management activity to the key manager. It would then be a modular plug-in. One of the comments regarding dispersed master keys and key generation concerned supporting high availability.

With this approach, the sequence diagram for the "put object" (encryption) example would change: Swift would invoke "Create_key" instead of creating the key itself and invoking put_key.

Keys could still be cached at the agents using a per agent master key that was protected in its TPM or other ways.


 * Encrypted Communication: Users such as the NSA need/want communication that is encrypted and authenticated at both source and destination. Adam Young suggested exploring NSS http://www.mozilla.org/projects/security/pki/nss/ to meet this requirement and, in so doing, possibly also avoid all things eventlet/blocking/performance draining.


 * API: Red Hat folks suggested examining the DogTag API to determine whether it meets our needs and to act as a point of reference.


 * Life cycle tasks: Adam Young and Guang-Yee suggested exploring and leveraging FreeIPA.


 * Logging for Compliance: Rackspace's demo for logging is rich.

 * Additional security via time-limited delegated authorization: The original design allowed access to an encrypted object string only to the original owner who did the "put". Swift was then delegated access to the encryption key for a limited time. But in the original design Swift could access all the keys it inserted, because they belonged to Swift.

The keys could instead belong to the original user, and the delegation would provide access to a key if and only if its owner-id matched the user-id (or project-id or domain-id, based on key-scope).
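
The owner-matching rule for delegated access could be expressed roughly as below. This is only a sketch: the record layouts, field names, and the function may_release_key are hypothetical, and real delegation would go through Keystone trusts rather than plain dictionaries.

```python
def may_release_key(key_record, delegation, now):
    """Release a key only under an unexpired delegation from the key's owner."""
    if now >= delegation["expires_at"]:
        return False  # the delegation period is over
    # key-scope is "user", "project", or "domain"; compare the matching id.
    id_field = key_record["scope"] + "_id"
    return delegation.get(id_field) == key_record["owner_id"]


# Hypothetical records: a project-scoped key and a delegation granted to Swift.
key_record = {"key_id": "k1", "scope": "project", "owner_id": "proj-42"}
delegation = {
    "user_id": "alice",
    "project_id": "proj-42",
    "domain_id": "dom-1",
    "expires_at": 1000,
}
```

With this check, a service holding a delegation from one user cannot fetch keys that belong to another user, project, or domain, even if it originally inserted them.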


 * Supporting KMIP where necessary: Both the Intel and Rackspace designs support formatters, KMIP being one such formatter. (The KMIP spec mentions that either dynamic formatting or saving data in KMIP format is acceptable.)


 * Separate Service: We were all in agreement about a separate key manager service, to keep its functionality separate, enable isolated hardening, allow easy plug-in should a cloud vendor want a hardware-based solution, avoid exposing it externally (and, if necessary, offer private and public URLs for access and limit access along the lines of security groups), etc.