Message Security
Message Security in OpenStack is currently not implemented. Recently there have been a couple of proposals to implement signatures and eventually encryption for RPC messages.

Implementing this kind of security feature is a delicate task: there is the usual conflicting trade-off between security and performance, as well as some issues peculiar to the distributed nature of an OpenStack environment.

Public Crypto versus Shared Keys
One of the assumptions in the present proposals is that Public Key crypto will be used to provide Integrity for messages; however, a simple Shared Key crypto model is also well suited to handling Messaging Queues.

One reason why Public Key crypto is being proposed is the perceived lower overhead of the public key trust model. However, let's analyze what is required to use either model.

Public Key Infrastructure
The first point is that each service that needs to send messages will need to own a Public/Private Key pair, and this means some form of secure storage for the Private Key.

Next, the Public Key must also be made available. There are two strategies to do so: a PKI model, where a Public Key is signed by a Central Authority, or a Trusted Repository, where all Public Keys are deposited and guaranteed to be good either by cross signatures or by a well-known party that can authoritatively assert whether a key is good or bad. This in turn requires that all clients quite regularly check that their peers' keys are still valid and not revoked. This is true with either a PKI-style system, where CRLs or OCSP responders are queried, or a central trust authority, where Public Keys are checked for validity.

There is also the non-trivial task of deciding where keys are generated, as virtual-machine-based systems tend to have poor entropy, and sourcing enough of it to generate key pairs can be a problem at installation time. Following that, there is the problem of how to communicate the public key to the CA for signing, or to the trusted repository for deposit.

Shared Key Infrastructure
The first point is that each service that needs to send messages will need to own its own secret key, and this means some form of secure storage for the Secret Key.

In a shared key model, ideally each actor would have a different shared key with each and every other peer; however, this very quickly becomes impossible to achieve as the number of peers scales up, both in terms of storage and of the exchanges necessary. A Key Server approach is the only reasonable way.

With a Key Server each service only needs one Secret Key, which it shares with the Key Server. The actual key used to communicate between any two peers is provided on the fly by the key server, which needs to be contacted by at least one of the peers before they can actually send messages between themselves. The Key Server will provide Tickets, containing Signing and Encryption Keys (SEK), that are bound to a specific peer-pair and allow them to communicate safely.

Once keys are obtained, the two peers become independent of the Key Server and can send as many messages as required until the keys become invalid. In such a system the Secret Keys shared between Services and the Key Server have long-term validity, while the signing and encryption keys can have a relatively short validity period, so that a brute force attack on the messages will not yield access to any long-term secret and will be of limited value.

Security considerations for the trusted server
With either model the central server will have to store some keys and guarantee their validity.

In the Public Key model the central server needs to be able to provide proof of validity of a key and to mark as revoked keys that are considered compromised. In order to do that, a signing key needs to be handled by this server, and signatures are used to mark public keys as valid or revoked via CRLs or OCSP. If short-lived Public Keys are used instead, an authentication system needs to be provided so that services can authenticate and store their public keys. In both cases the overall system will have to rely on some Public Key that identifies the trusted authority, be it in the form of the Public Key and Certificates representing a PKI, or in the form of a Public Key and x509 Certificate used to secure the connection with the Trusted Repository. In either case, a compromise of this key will compromise the whole system.

In the shared key model the Key Server will hold a master key used to encrypt the Service keys in its storage. Authentication between services and the Key Server is based on a shared Secret that can be easily rotated, unless it has been compromised. Revocation is not necessary, as all that is needed is to remove the compromised Secret Key from the Key Server. However, generating a new shared Secret Key will require the entire enrollment process to be repeated, and if a Secret Key is compromised in the middle of a session with other servers, all these sessions will need to be terminated, since it won't be obvious which sessions are genuine and which are bogus.

With either system a stateless service will have to contact the trusted authority, be it a PKI, Trusted Repository or Key Server, in order to either get a session key or check the revocation status. With both systems, services that can keep state can cache the session key or the revocation checks until the keys expire or the next validation interval passes, so the required communication overhead between services and a central system is similar.

Shared Keys and Key Server Proposal
One advantage of using a Key Server compared to a pure public key based system is that the Encryption and Signing Key exchange can be regulated by the Key Server, which can apply access control and deny communication between arbitrary peers in the system. This makes it easier to perform centralized access control, prevent unauthorized communication, and avoid the need to perform post-authentication access control and policy lookups on the receiving side.

Given that the overhead of a public key based system and a shared key based system otherwise looks similar, both in terms of the security of the trusted server and of the communication requirements, we put forward the proposal of using a Shared Key system based on a Key Server.

Note that the service long-term key stored in the Key Server may be used for derivation and may be used for authentication; however, authentication to the Key Server can also be deferred to existing components. For example, password-based authentication over an HTTPS connection would be sufficient to authenticate a Service to the Key Server. Other methods that would work as well are Kerberos keytabs and a KDC for authentication, x509 User Certificates, and so on. Basically, the authentication to the key server can be abstracted away if needed.

That said, we will also describe an authentication method based on shared keys that will work for communication with the Key Server over a pure HTTP (unencrypted) transport.

Message Integrity and Confidentiality
Securing the message queue requires two distinct components:
 * Integrity or Signing and authentication of messages
 * Confidentiality or Encryption of the messages

In order to reduce the chance of cryptanalysis of the authentication and encryption keys, we will play it safe and propose to use separate keys for encryption and authentication, even though we will not use mechanisms susceptible to known attacks. For the same reason, in order to reduce replay attacks, we will propose a scheme that uses different keys depending on the direction of the communication, i.e. the SEK pair for Svc.A -> Svc.B will not be the same as the one for Svc.B -> Svc.A.

Standards
The standard for providing message integrity is HMAC. For encryption the most respected algorithm is currently AES, a block cipher with a fixed 128-bit block size.

Because the current feeling is that encryption may not be necessary, we will consider it optional. In order to avoid changing message formats, this means it is more convenient to use an "encryption first, authentication later" approach, whereby the authentication step does not differ based on whether encryption is performed or not; rather, the message being authenticated can be either plain text or encrypted.

The next step is sketching out how to apply encryption and authentication to the message keeping in mind the Horton Principle.

Message Format
The data interchange format en vogue in the project is JSON, so we will create a message format based on JSON syntax. The first thing we want to ensure is that authentication covers the whole message as well as the metadata tied to the message; this is important to avoid substitution attacks where the metadata may be swapped out and replaced without affecting the signature. This means that Message and Metadata will be serialized objects contained in a simpler container.

Pseudo JSON notation:

MetaData = jsonutils.dumps({
    'source': ,
    'destination': ,
    'timestamp': ,   # 1/100th second resolution from UTC
    'nonce': ,       # must not repeat until the timestamp changes
    'esek': ,
    'encryption':
})
Message = jsonutils.dumps(raw_msg)

_METADATA_KEY = 'oslo.secure.metadata'
_SIGNATURE_KEY = 'oslo.secure.hmac'

RPC_Message = {
    _VERSION_KEY: _RPC_ENVELOPE_VERSION,
    _METADATA_KEY: MetaData,
    _MESSAGE_KEY: Message,
    _SIGNATURE_KEY: Signature
}

Message Signature
The Signature is calculated over the concatenation of the version string and the buffers.

Version = null terminated string containing the version number
MetaData = serialized JSON Metadata
Message = serialized JSON Message

Signature = HMAC(SignKey, (Version || MetaData || Message))

We propose to use HMAC-SHA-256 by default as the authentication function as per RFC 6234.

NOTE: Particular care needs to be taken to make sure the RPC_Message received as input cannot be abused, and that the rest of the pipeline uses exclusively what has been authenticated. For this reason the output of the validation function should be a separate structure that provides the unserialized Metadata and Message, and further components should not have access to the original RPC_Message. If the same format needs to be maintained, a new RPC_Message containing only the version and the serialized message will be provided as output, rebuilt from the verified values.

Hashlib has all the code needed to implement this.
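As an illustration only, the signing and verification steps above might look like the following sketch using the standard library. The envelope key names, the '1.0' version string, and the exact serialization choices are assumptions made for the example, not the final wire format:

```python
import hashlib
import hmac
import json

def sign_message(sign_key, version, metadata, message):
    # Serialize the metadata and message, then authenticate
    # Version || MetaData || Message with HMAC-SHA-256.
    # The version string is null terminated, as described above.
    meta_json = json.dumps(metadata, sort_keys=True)
    msg_json = json.dumps(message, sort_keys=True)
    payload = version.encode() + b'\x00' + meta_json.encode() + msg_json.encode()
    signature = hmac.new(sign_key, payload, hashlib.sha256).hexdigest()
    return {'version': version, 'metadata': meta_json,
            'message': msg_json, 'hmac': signature}

def verify_message(sign_key, envelope):
    # Recompute the HMAC and compare in constant time.  On success,
    # return a fresh structure built only from authenticated data, so
    # later code never touches the raw envelope (Horton Principle).
    payload = (envelope['version'].encode() + b'\x00' +
               envelope['metadata'].encode() + envelope['message'].encode())
    expected = hmac.new(sign_key, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, envelope['hmac']):
        raise ValueError('signature mismatch')
    return {'metadata': json.loads(envelope['metadata']),
            'message': json.loads(envelope['message'])}
```

Note that verify_message deliberately returns a new structure rather than the input envelope, matching the requirement that downstream code only ever sees authenticated data.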

Message Encryption
Optionally the message may be encrypted, in this case the MetaData field 'encryption' will be set to True.

Because the use of nonces is particularly difficult to get right, because message queues may involve multiple parties using the same keys when they act as a cluster, and because there is a desire to allow services to be as stateless as possible, we propose to use AES-128-CBC with a Random IV by default in order to encrypt the content. This requires the availability of a pseudo-random generator on the sender side; we do not expect this to be an issue in practice on the machines used in a typical OpenStack deployment.

Encryption:

Plain-Text = P1 || P2 || P3 || ...
C0 = Random IV (128 bit)
for i in range(1, N):
    Ci = ENC(EncKey, Pi ^ Ci-1)
Encrypted-Message = C0 || C1 || C2 || C3 || ...

Decryption:

IV = C0
Cipher-Text = C1 || C2 || C3 || ...
for i in range(1, N):
    Pi = DEC(EncKey, Ci) ^ Ci-1
Plain-Text = P1 || P2 || P3 || ...

Various python crypto modules have all the code needed to implement this.
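The chaining above can be made concrete with a short sketch. The block cipher below is a deliberately insecure XOR placeholder standing in for AES-128, used only so the CBC chaining itself (Ci = ENC(Pi ^ Ci-1)) is runnable without third-party modules; a real implementation would use a vetted AES library and proper padding:

```python
import os

BLOCK = 16  # AES block size in bytes

def toy_enc(key, block):
    # Placeholder for AES-128 block encryption: XOR with the key.
    # Completely insecure -- for illustrating CBC chaining only.
    return bytes(a ^ b for a, b in zip(block, key))

toy_dec = toy_enc  # XOR is its own inverse

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, plaintext):
    # C0 = random IV; Ci = ENC(key, Pi ^ Ci-1)
    assert len(plaintext) % BLOCK == 0, 'padding not shown here'
    prev = os.urandom(BLOCK)  # C0, the random IV
    out = [prev]
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_enc(key, xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b''.join(out)

def cbc_decrypt(key, ciphertext):
    # Pi = DEC(key, Ci) ^ Ci-1, with C0 acting as the IV
    blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]
    plain = []
    for prev, cur in zip(blocks, blocks[1:]):
        plain.append(xor(toy_dec(key, cur), prev))
    return b''.join(plain)
```

Because the IV is random, encrypting the same plaintext twice yields different ciphertexts, which is the property the scheme relies on for stateless senders.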

Tickets
In order to obtain the Signing and Encryption keys necessary to send messages, the client needs to request a Ticket containing them from the Key Distribution Server (KDS). Obtaining a ticket requires the client to authenticate to the KDS.

Client Authentication and Key Derivation
We propose an authentication and key retrieval scheme to request and transfer Tickets.

Authentication scheme
A simple authentication scheme is used to request a Ticket. The request does not need to be encrypted, because none of the data sent is sensitive and all of it can be deduced from the activity that is going to be performed. Last but not least, some of this data needs to be in the clear to identify the requesting service and to look up the correct key to check the authentication.

We want to reduce the ability to burn Key Server resources, so we will embed a timestamp that restricts the validity period of any given message.

In addition to the timestamp, the request needs to contain two names:
 * The name of the service making the request, which will be used to look up the Shared Key and authenticate the request.
 * The name of the target service.

When receiving the request, the first operation must be authentication of the request; no other field should be considered until the request is authenticated with the Shared Key. Once the HMAC function validates the request, the timestamp MUST be checked for validity.

Pseudo JSON notation:

MetaData = jsonutils.dumps({
    'requestor': ,
    'target': ,
    'timestamp': ,  # 1/100th second resolution from UTC
    'nonce':        # must not repeat until the timestamp changes
})

KeyEx_Request = {
    'metadata': MetaData,  # base64 encoded
    'signature': Signature = HMAC(Key, MetaData)  # base64 encoded
}

NOTE: as for message signing, we use both a timestamp and a nonce here. Replay attacks are not a problem for the key server, but filtering on timestamp/nonce upfront can save resources on a key server, so checking for replay attacks is optional but welcome.

NOTE: If external authentication is used the Signature will be omitted.
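A sketch of how a client might build the request above with only the standard library. The timestamp format and the choice to compute the HMAC over the raw serialized metadata (rather than its base64 encoding) are implementation details assumed here, not fixed by the protocol text:

```python
import base64
import hashlib
import hmac
import json
import os
import time

def build_ticket_request(shared_key, requestor, target):
    # Serialize the request metadata: requestor, target, a timestamp
    # with 1/100th second resolution, and a nonce that will not repeat
    # within the same timestamp value.
    metadata = json.dumps({
        'requestor': requestor,
        'target': target,
        'timestamp': '%.2f' % time.time(),
        'nonce': base64.b64encode(os.urandom(8)).decode(),
    })
    # Authenticate the metadata with the long-term key shared between
    # the requestor and the Key Server, then base64 encode both fields.
    signature = hmac.new(shared_key, metadata.encode(), hashlib.sha256).digest()
    return {'metadata': base64.b64encode(metadata.encode()).decode(),
            'signature': base64.b64encode(signature).decode()}
```

On the Key Server side the 'requestor' field would be decoded first to look up the shared key, and only after the HMAC check passes would the timestamp and nonce be examined.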

Key Derivation
In order to avoid easy attacks on Keys and in order to be able to quickly expire keys, a key derivation scheme is used to generate the SEK pair.

In all cases, whether a shared key is used for client authentication or authentication is performed by external means (for example via x509 certificates over HTTPS), the Key Server will maintain (or create on the fly if missing) a long term Service Key (which is also the shared key in our authentication scheme) that is used to perform Key Derivation on the server's behalf. These Keys are stored reversibly encrypted with a Key Server master key.

Key derivation is performed using a standard Hash based Key Derivation Function (HKDF) as described in RFC 5869.

The extract function can be used by the Key Server with a Random Salt, generated anew every time, and the key shared with the requester. Alternatively, a Random Key can be generated and the extract function skipped. This is implementation specific and does not affect the protocol outcome.

The expansion function is given input parameters that generate different keys based on which pair of services is involved in the process; this way the Session Key is bound to the triplet sender/receiver/timestamp.

Key Derivation inputs:

Time.T = The time in the request
TTL = Time To Live, validity in seconds from Time.T
Svc.A = the sender service name
Svc.B = the receiver service name
Key.A = the sender long term key
Rnd.Salt = a random salt used for the extract function
Rnd.Key = the Key used as input for the expand function
Ls = Length of Signing Key (128 bits)
Le = Length of Encryption Key (128 bits)

Extract function (optional): Rnd.Key = HKDF-Extract(Rnd.Salt, Key.A)

Expand Function: SEK = HKDF-Expand(Rnd.Key, Svc.A+','+Svc.B+','+Time.T, Ls+Le)

The output of the expand function is an array of bytes of length 256 bits (Ls+Le), the first half will be used as the Signing Key, the second half as the Encryption Key.
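The derivation above maps directly onto RFC 5869. A compact stdlib sketch follows; the info string uses the comma-separated sender/receiver/timestamp form given above, while the salt handling and function names are assumptions for the example:

```python
import hashlib
import hmac

HASH_LEN = 32  # SHA-256 output size in bytes

def hkdf_extract(salt, ikm):
    # PRK = HMAC-Hash(salt, IKM), per RFC 5869 section 2.2
    return hmac.new(salt, ikm, hashlib.sha256).digest()

def hkdf_expand(prk, info, length):
    # OKM = T(1) || T(2) || ...
    # T(i) = HMAC-Hash(PRK, T(i-1) || info || single-octet i)
    okm, block = b'', b''
    blocks_needed = -(-length // HASH_LEN)  # ceiling division
    for i in range(1, blocks_needed + 1):
        block = hmac.new(prk, block + info + bytes([i]), hashlib.sha256).digest()
        okm += block
    return okm[:length]

def derive_sek(long_term_key, salt, sender, receiver, timestamp):
    # Bind the derived material to the sender/receiver/timestamp
    # triplet, then split it into a 128-bit Signing Key and a
    # 128-bit Encryption Key (Ls + Le = 256 bits).
    rnd_key = hkdf_extract(salt, long_term_key)
    info = ('%s,%s,%s' % (sender, receiver, timestamp)).encode()
    okm = hkdf_expand(rnd_key, info, 32)
    return okm[:16], okm[16:]  # (SignKey, EncKey)
```

Note how swapping the sender and receiver names changes the info string and therefore yields a different SEK pair, which is what makes the keys direction-dependent as required earlier.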

Key Exchange
The keys obtained by the Key Derivation step need to be sent back to the requester.

In addition, in order to avoid lookups to the Key Server from both the sender and the receiver in the normal case, we send the expand function Random Key encrypted with the receiver's key:

KeyData = jsonutils.dumps({
    'key': Rnd.Key,  # base64 encoded to avoid json mangling
    'timestamp': Time.T,
    'ttl': TTL
})

Esek = ENC(Key.B, KeyData)

The source and destination are not included, as they are already sent with every message by the sender. By not including them in the Esek we force the receiver to implicitly check that they are valid, and we avoid the risk that the receiver forgets to check that the ones in the message metadata match the ones in the encrypted Esek.

If the communication is happening over a secure transport like verified HTTPS, then it would be possible to simply return the Ticket in the clear; however, in case the authentication scheme is used over a clear-text protocol like HTTP, the keys need to be protected with encryption. To avoid confusion and possible mistakes we take a conservative approach and always return the ticket encrypted. The reply must also be authenticated in order to avoid substitution attacks.

We'll reuse an encryption and authentication scheme similar to the one described previously for securing the messages exchanged between the two parties.

Reply Format
Pseudo JSON notation:

MetaData = jsonutils.dumps({
    'source': ,
    'destination': ,
    'expiration':
})

Optionally encrypted buffer containing the Encryption and Signature pair as returned by the HKDF:

Ticket = jsonutils.dumps({
    'skey': ,
    'ekey': ,
    'esek': Esek
})

KeyEx_Reply = {
    'metadata': MetaData,  # base64 encoded
    'ticket': Ticket or ENC(Key.A, Ticket),  # base64 encoded
    'signature': Signature  # base64 encoded
}

The Signature is calculated over all the data:

MetaData = serialized JSON Metadata
Ticket = serialized JSON Ticket, encrypted
Signature = HMAC(Key.A, (MetaData || Ticket))

We propose again to use HMAC-SHA-256 as the default authentication function as per RFC 6234.

We'll reuse the exact same scheme used for Message Encryption, with AES-128-CBC and a Random IV.
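On the requester side, the reply must be authenticated before the ticket is decrypted. A sketch of that check, assuming the base64 encodings shown in the reply format above (decryption of the ticket itself is omitted):

```python
import base64
import hashlib
import hmac

def verify_reply(long_term_key, reply):
    # Authenticate first: recompute HMAC-SHA-256 over the serialized
    # metadata concatenated with the (possibly encrypted) ticket, and
    # only hand the ticket on for decryption if the check passes.
    metadata = base64.b64decode(reply['metadata'])
    ticket = base64.b64decode(reply['ticket'])
    expected = hmac.new(long_term_key, metadata + ticket,
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected,
                               base64.b64decode(reply['signature'])):
        raise ValueError('reply signature mismatch')
    return metadata, ticket
```

Verifying before decrypting is what blocks the substitution attacks mentioned above: a forged or swapped ticket is rejected before any key material is derived from it.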

RESTful API
As is customary, a RESTful API will be proposed to access the Key Server; a POST call will be used to obtain a Ticket.

Request: POST /kds/ticket/{Signature}

{   "request": { "metadata": MetaData, "signature": Signature } }

Reply: 200 OK

{   "reply": { "metadata": MetaData, "ticket": Ticket, "signature": Signature } }

Error codes

 * 200 OK - This status code is returned in response to a successful request.
 * 401 Unauthorized - This status code is returned when either authentication has not been performed, or the authentication fails.
 * 403 Forbidden - This status code is returned when the requester field does not match either the sender or the receiver fields.
 * 500 Internal Server Error - This status code is returned when an unexpected error has occurred in the server implementation.
 * 501 Not Implemented - This status code is returned when the implementation is unable to fulfill the request because it is incapable of implementing the entire API as specified.
 * 503 Service Unavailable - This status code is returned when the server is unable to communicate with a backend service (database, memcache, ...)

Key Server lookups
Normally only one lookup per peer-pair is needed by the sender in order to be able to send signed and/or encrypted messages to a receiver. Until the expiration time returned in the message, no other lookups are needed to send messages to the same receiver. However, a receiver may need to perform a lookup when a group name is used as the destination.

Group destination
When the destination is a group of services, all the receivers in the group need to be able to look up a group key in order to be able to validate and decrypt messages. To avoid long-term shared group keys and their management, group keys are only short lived and need to be retrieved by a group member on demand. By using short-lived keys we sidestep the revocation issue, because any group key will be phased out of use in a short time. Disabling a compromised group member will suffice to deprive it of any valid group key as soon as the last released key it had access to expires.

Group Key Lookup
TODO

Fanout messages
Fanout message signing with symmetric keys is problematic; however, only three cases use fanout so far. If signing doesn't work for them, we might need to use a one-time-only key scheme, but the lookups to the key server would be quite numerous (one per compute node):
 * nova network, but I have been assured this case will go away
 * nova compute to all schedulers, see group keys above
 * nova scheduler to all compute, this is problematic, but the message is a broadcast request, not a command, so we could simply not sign in this case

Why Keystone ?
Assigning Keys to services and handling groups of services effectively means assigning an identity to these services. Keystone is the identity provider/gateway within OpenStack, so embedding a Key Distribution Server in Keystone seems the natural approach. This specific implementation uses Tickets to allow secure RPC communication between services, which is another similarity to Keystone tokens for HTTP based communication.

A new KDS service in Keystone
The Key Distribution Server should be made available in keystone under /kds. The one operation currently defined in the API paragraph would be reachable at /kds/ticket.

The server requires storing keys in a database, one per target name (in the form topic.hostname), reversibly encrypted with a master key kept in a file and loaded at keystone startup, based on configuration file options.

It also depends on the oslo-incubator cryptoutils library being built as part of the SecureMessage effort.

Implementation
Implementation would be done in phases and touches several components:
 * oslo-incubator libraries
 * nova and other services (to start performing signing)
 * keystone as the Key Distribution Server

Phase 1
Add basic crypto functions and new message envelope building functions.
Change code to use the new envelope by default, with signing optional.

Phase 2
Add support for fetching keys but fall back to non signed if lookup fails

Phase 3
Add Key Server with basic per-host keys only

Phase 4
Add support for shared per-service-type keys.
Add Access Control checks to limit access to these keys.

Phase 5
Turn signing on as required by default