
Difference between revisions of "MessageSecurity"

(Message Format)
m (Shared Keys and Key Server Proposal)
 
(46 intermediate revisions by 4 users not shown)
Line 8: Line 8:
 
== Public Crypto versus Shared Keys ==
 
== Public Crypto versus Shared Keys ==
  
One of the assumption in the present proposals is that Public Key crypto will be used to provide Integrity for messages, however a simple Shared Key crypto model is also well suited to handle Messaging Queues.
+
One of the assumptions in the present proposals is that Public Key crypto will be used to provide Integrity for messages, however a simple Shared Key crypto model is also well suited to handle Messaging Queues.
  
 
One reason why Public Key crypto is being proposed is the perceived lower overhead of the public key trust model. However let's analyze what is required to use either model.
 
One reason why Public Key crypto is being proposed is the perceived lower overhead of the public key trust model. However let's analyze what is required to use either model.
Line 22: Line 22:
  
 
=== Shared Key Infrastructure ===
 
=== Shared Key Infrastructure ===
 +
The first point is that each service that needs to send messages will need to own its own secret key, and this means some form of secure storage for the Secret Key.
  
 
In a shared key model ideally each actor would have a different shared key with each and every other peer, however this becomes very quickly impossible to achieve as the number of peers scales up, both in terms of storage and exchanges necessary. A Key Server approach is the only reasonable way.
 
In a shared key model ideally each actor would have a different shared key with each and every other peer, however this becomes very quickly impossible to achieve as the number of peers scales up, both in terms of storage and exchanges necessary. A Key Server approach is the only reasonable way.
  
With a Key Server each service only need a key shared with the Key Server itself, the actual key used to communicate between any 2 peers is provided on the fly by the key server, which needs to be contacted by both peers before actually sending messages between them. The Key server will provide Signing and Encryption Keys (SEK) that are bound to a specific peer-pair and allow them to communicate safely.
+
With a Key Server each service only needs one Secret Key which it shares with the Key Server. The actual key used to communicate between any 2 peers is provided on the fly by the key server, which needs to be contacted by at least one of the peers before they can actually send messages between themselves. The Key server will provide Tickets, containing Signing and Encryption Keys (SEK), that are bound to a specific peer-pair and allow them to communicate safely.
  
Once keys are obtained the two peers become independent from the Key Server and can send as many messages as required until the keys are valid. In such a system the keys shared between Services and Key Server have long term validity while the signing and encryption keys can have a relatively short validity period so that brute force attacks on the messages will not lead to gaining access to any long term secret and be of limited value.
+
Once keys are obtained the two peers become independent from the Key Server and can send as many messages as required until the keys expire. In such a system the Secret Keys shared between Services and the Key Server have long term validity, while the signing and encryption keys can have a relatively short validity period, so that a brute force attack on the messages will be of limited value and will not lead to gaining access to any long term secret.
  
 
=== Security considerations for the trusted server ===
 
=== Security considerations for the trusted server ===
Line 35: Line 36:
 
In the Public Key model the central server needs to be able to provide proof of validity of a key and mark as revoked keys that are considered compromised. In order to do that a signing key needs to be handled by this server and signatures are used to mark public keys as valid or revoked via CRLs or OCSP. In the case of short lived Public Keys, instead, an authentication system needs to be provided so that services can authenticate and store their public keys. In both cases the overall system will have to rely on some Public Key that identifies the trusted authority, be it in the form of the Public Key and Certificates representing a PKI or be it in the form of a Public Key and x509 Certificate used to secure the connection with the Trusted Repository. In either case, a compromise of this key will compromise the whole system.

In the Public Key model the central server needs to be able to provide proof of validity of a key and mark as revoked keys that are considered compromised. In order to do that a signing key needs to be handled by this server and signatures are used to mark public keys as valid or revoked via CRLs or OCSP. In the case of short lived Public Keys, instead, an authentication system needs to be provided so that services can authenticate and store their public keys. In both cases the overall system will have to rely on some Public Key that identifies the trusted authority, be it in the form of the Public Key and Certificates representing a PKI or be it in the form of a Public Key and x509 Certificate used to secure the connection with the Trusted Repository. In either case, a compromise of this key will compromise the whole system.
  
In the shared key model the Key Server will hold a master key used to encrypt the Service keys in its storage, authentication between services and the Key Server is based on a shared Secret that can be easily rotated. Revocation is not necessary as all is needed is to remove the compromised secrets from the Key Server and generate new ones. Revocation is replaced by faster expiring signing/encryption keys.
+
In the shared key model the Key Server will hold a master key used to encrypt the Service keys in its storage. Authentication between services and the Key Server is based on a shared Secret that can be easily rotated, unless it has been compromised. Revocation is not necessary, as all that is needed is to remove the compromised Secret Key from the Key Server. However, generating a new shared Secret Key will require the entire enrollment process to be repeated, and if a Secret Key is compromised in the middle of a server session with other servers, all these sessions will need to be terminated since it won't be obvious which sessions are genuine and which are bogus.
  
 
With either system a stateless service will have to contact the trusted authority, be it a PKI, Trusted Repository or Key Server in order to either get a session key or to check the revocation status. With both systems services that can keep state can cache the session key or the revocation checks until expiration of the keys or until the next validation interval expires so the required communication overhead between services and a central system is similar.
 
With either system a stateless service will have to contact the trusted authority, be it a PKI, Trusted Repository or Key Server in order to either get a session key or to check the revocation status. With both systems services that can keep state can cache the session key or the revocation checks until expiration of the keys or until the next validation interval expires so the required communication overhead between services and a central system is similar.
Line 41: Line 42:
 
== Shared Keys and Key Server Proposal ==
 
== Shared Keys and Key Server Proposal ==
  
Public Key crypto is generally slow and complex, so given that the overhead for both systems in terms of security of the trusted server or communication requirements looks similar we put forward the proposal of using a Share Key system based on a Key Server.
+
One advantage of using a Key Server compared to a pure public key based system is that the Encryption and Signing Key exchange can be regulated by the Key Server, which can apply access control and deny communication between arbitrary peers in the system. This makes it easier to perform centralized access control, prevent unauthorized communication and avoid the need to perform post authentication access control and policy lookups on the receiving side.
  
Note that the service long term key stored in Key Server is needed for key derivation and may be used for authentication, however authentication can be deferred to existing components, for example password based authentication over an HTTPS connection would be sufficient to authenticate a Service to the Key Server. Other methods that would work as well would be Kerberos keytabs and a KDC for authentication, x509 User Certificates and so on. Basically, the authentication part can be abstracted away if needed.
+
Given that otherwise the overhead for either a public key based system or a shared key based system in terms of security of the trusted server or communication requirements looks similar we put forward the proposal of using a Shared Key system based on a Key Server.
If authentication is performed through external means the long term key does not need to be shared with the service and can be maintained exclusively in the Key Server.
 
  
That said we will proceed to describe also an authentication method based on shared keys that will work for communication over pure HTTP (non encrypted) transport for completeness. We'll defer to the deployment strategy which method to use in preference.
+
Note that the service long term key stored in the Key Server may be used for key derivation and may also be used for authentication. However, authentication to the Key Server can also be deferred to existing components; for example, password based authentication over an HTTPS connection would be sufficient to authenticate a Service to the Key Server. Other methods that would work as well include Kerberos keytabs and a KDC for authentication, x509 User Certificates and so on. Basically, authentication to the Key Server can be abstracted away if needed.
 +
 
 +
That said we will proceed to describe also an authentication method based on shared keys that will work for communication over pure HTTP (non encrypted) transport with the Key Server.
  
 
== Message Integrity and Confidentiality ==
 
== Message Integrity and Confidentiality ==
Line 75: Line 77:
 
     'source': <sender>,
 
     'source': <sender>,
 
     'destination': <receiver>,
 
     'destination': <receiver>,
     'timestamp': <python time.time()>,
+
     'timestamp': <time.time()>, # 1/100th second resolution from UTC
 +
    'nonce': <64bit unsigned number>, # must not repeat until the timestamp changes
 +
    'esek': <encrypted SEK pair for the receiver (base64 encoded)>,
 
     'encryption': <true | false>
 
     'encryption': <true | false>
 
})
 
})
Line 103: Line 107:
 
</pre>
 
</pre>
  
We propose to use HMAC-SHA-256 as the authentication function as per [http://tools.ietf.org/html/rfc6234 RFC 6234].
+
We propose to use HMAC-SHA-256 by default as the authentication function as per [http://tools.ietf.org/html/rfc6234 RFC 6234].
  
 
NOTE: Particular care needs to be taken to make sure the RPC_Message obtained in input cannot be abused and the rest of the pipeline will use exclusively what has been authenticated. For this reason the output of the validation function should be a separate structure that provides unserialized Metadata and Message, and further components should not have access to the original RPC_Message. If the same format needs to be maintained a new RPC_Message containing only the version and serialized message will be provided in output, rebuilt from the verified values.
 
NOTE: Particular care needs to be taken to make sure the RPC_Message obtained in input cannot be abused and the rest of the pipeline will use exclusively what has been authenticated. For this reason the output of the validation function should be a separate structure that provides unserialized Metadata and Message, and further components should not have access to the original RPC_Message. If the same format needs to be maintained a new RPC_Message containing only the version and serialized message will be provided in output, rebuilt from the verified values.
Line 113: Line 117:
 
Optionally the message may be encrypted, in this case the MetaData field 'encryption' will be set to True.
 
Optionally the message may be encrypted, in this case the MetaData field 'encryption' will be set to True.
  
Because the use of nonces is particularly difficult to get right, and the use of message queues may involve multiple parties using the same keys when they act in a cluster and because there is a desire to allow as much as possible stateless services, we  propose the use of AES-128-CBC with a Random IV in order to encrypt the content. This requires the availability of a pseudo-random generator on the sender side, we do not expect this to be an issue in practice on the machines used in a typical OpenStack deployment.
+
Because the use of nonces is particularly difficult to get right, and the use of message queues may involve multiple parties using the same keys when they act in a cluster and because there is a desire to allow as much as possible stateless services, we  propose to use AES-128-CBC with a Random IV by default in order to encrypt the content. This requires the availability of a pseudo-random generator on the sender side, we do not expect this to be an issue in practice on the machines used in a typical OpenStack deployment.
  
 
Encryption:
 
Encryption:
Line 135: Line 139:
 
Various python crypto modules have all the code needed to implement this.
 
Various python crypto modules have all the code needed to implement this.
  
== Client Authentication and Key Derivation ==
+
== Tickets ==
 +
 
 +
In order to obtain the Signing and Encryption keys necessary to send messages the client needs to request a Ticket containing them from the Key Distribution Server (KDS). Obtaining a ticket requires the client to authenticate to the KDS.
 +
 
 +
=== Client Authentication and Key Derivation ===
  
Although not mandatory to implement or use we propose an authentication and key retrieval scheme to request and transfer SEKs.
+
We propose an authentication and key retrieval scheme to request and transfer Tickets.
  
=== Authentication scheme ===
+
==== Authentication scheme ====
  
A simple authentication scheme is used to request a SEK. The request does not need to be encrypted, because none of the data sent is sensitive and all of it can be deduced by the activity that is going to be performed, last but not least, some of this data needs to be in the clear to identify the requesting service and look up the correct key to use to check the authentication.
+
A simple authentication scheme is used to request a Ticket. The request does not need to be encrypted, because none of the data sent is sensitive and all of it can be deduced from the activity that is going to be performed. Moreover, some of this data needs to be in the clear to identify the requesting service and look up the correct key to use to check the authentication.
  
We want to reduce the ability to play replay attacks against the Key Server so we will embed a timestamp useful to restrict the validity period of any given message.
+
We want to reduce the ability to burn Key Server resources so we will embed a timestamp useful to restrict the validity period of any given message.
  
In addition to timestamp and counter the request needs to contain 3 names.
+
In addition to timestamp the request needs to contain 3 names.
 
* The name of the service making the request, which will be used to lookup the Shared Key and Authenticate the request.
 
* The name of the service making the request, which will be used to lookup the Shared Key and Authenticate the request.
* The name of the sending service
+
* The name of the target service.
* The name of the receiving service.
 
  
 
When receiving the request the first operation must be Authentication of the request, no other field should be considered until the request is authenticated with the Shared Key.
 
When receiving the request the first operation must be Authentication of the request, no other field should be considered until the request is authenticated with the Shared Key.
Once the HMAC function validate the request timestamp and counter MUST be checked for validity.
+
Once the HMAC function validates the request, the timestamp MUST be checked for validity.
 
 
Usually one of the receiving or sending service name will match the name of the service making the request, if this is the case the server will simply proceed without further checks. However if the Service name does not match either sending or receiving name further Access Control needs to be performed. A list of services allowed to impersonate a service role will need to be provided to allow release of a SEK to a service that does not match either receiving or sending names. This may be legitimate for high availability cases when multiple copies of the service may impersonate the same identity. This kind of delegation is out of scope for our first implementation and will not be further discussed.
 
  
 
Pseudo JSON notation:
 
Pseudo JSON notation:
Line 159: Line 164:
 
MetaData = jsonutils.dumps({
 
MetaData = jsonutils.dumps({
 
     'requestor': <requestor>,
 
     'requestor': <requestor>,
     'sender': <sender>,
+
     'target': <target>,
     'receiver': <receiver>,
+
     'timestamp': <time.time()>, # 1/100th second resolution from UTC
     'timestamp': <timestamp>
+
     'nonce': <64bit unsigned number>, # must not repeat until the timestamp changes
 
})
 
})
  
 
KeyEx_Request = {
 
KeyEx_Request = {
     'meta': MetaData,
+
     'metadata': MetaData, # base64 encoded
     'hmac': Signature = HMAC(Key, MetaData)
+
     'signature': Signature = HMAC(Key, MetaData) # base64 encoded
 
}
 
}
 
</pre>
 
</pre>
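The request format above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `build_request` and `verify_request`, the `MAX_SKEW` window, the choice of computing the HMAC over the base64 encoded metadata, and the use of a random value for the nonce are all hypothetical and not specified by this proposal. The server reads only the requestor name to look up the Shared Key, authenticates the request, and only then checks the timestamp.

```python
import base64
import hashlib
import hmac
import json
import os
import time

MAX_SKEW = 300.0  # hypothetical validity window in seconds, not defined by the proposal


def build_request(key, requestor, target, now=None):
    """Build a KeyEx_Request dict signed with the requestor's Shared Key."""
    now = time.time() if now is None else now
    metadata = json.dumps({
        'requestor': requestor,
        'target': target,
        'timestamp': round(now, 2),  # 1/100th second resolution
        # Assumption: a random 64 bit value is used as the nonce.
        'nonce': int.from_bytes(os.urandom(8), 'big'),
    })
    b64_metadata = base64.b64encode(metadata.encode('utf-8'))
    # Assumption: the HMAC covers the base64 encoded metadata bytes.
    signature = hmac.new(key, b64_metadata, hashlib.sha256).digest()
    return {
        'metadata': b64_metadata.decode('ascii'),
        'signature': base64.b64encode(signature).decode('ascii'),
    }


def verify_request(keys, request, now=None):
    """Authenticate first, then check the timestamp. Returns the metadata."""
    now = time.time() if now is None else now
    b64_metadata = request['metadata'].encode('ascii')
    metadata = json.loads(base64.b64decode(b64_metadata))
    # Only the requestor name is used before authentication, to find the key.
    key = keys[metadata['requestor']]
    expected = hmac.new(key, b64_metadata, hashlib.sha256).digest()
    received = base64.b64decode(request['signature'])
    if not hmac.compare_digest(expected, received):
        raise ValueError('request authentication failed')
    if abs(now - metadata['timestamp']) > MAX_SKEW:
        raise ValueError('request timestamp outside validity window')
    return metadata
```

The constant time comparison via `hmac.compare_digest` avoids leaking how many bytes of a forged signature were correct.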
  
NOTE: we do not use random values here as replies will be identical given identical input (or denied if the timestamp is too old), this means that a replay attack will give an attacker no advantage and it will allow stateless services to request the same key multiple times if needed to process multiple messages from the same sender. (A sender will probably never send exactly the same request as the timestamp will likely vary between them).
+
NOTE: as for message signing we use both a timestamp and a nonce here, replay attacks are not a problem for the keyserver, but filtering on timestamp/nonce upfront can save resources on a key server. Checking for replay attacks is therefore optional but welcome.
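The optional upfront filtering mentioned in the note could look like the sketch below. The class name `ReplayFilter`, its `window` parameter and the in-memory dictionary are hypothetical choices made for illustration; the proposal does not prescribe how a Key Server remembers recently seen requests.

```python
class ReplayFilter:
    """Remember (requestor, timestamp, nonce) triples inside a time window."""

    def __init__(self, window=300.0):
        self.window = window  # hypothetical validity window in seconds
        self.seen = {}        # (requestor, timestamp, nonce) -> timestamp

    def check(self, requestor, timestamp, nonce, now):
        """Return True if the request is fresh and has not been seen before."""
        # Forget entries that are too old to be accepted anyway.
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t <= self.window}
        if abs(now - timestamp) > self.window:
            return False
        key = (requestor, timestamp, nonce)
        if key in self.seen:
            return False
        self.seen[key] = timestamp
        return True
```

Because old entries are purged on every call, the memory used is bounded by the number of requests received within one window.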
  
 
NOTE: If external authentication is used the Signature will be omitted.
 
NOTE: If external authentication is used the Signature will be omitted.
  
=== Key Derivation ===
+
==== Key Derivation ====
 
 
In order to avoid easy attacks on Keys and in order to be able to quickly expire keys, a key derivation scheme is used to generate the SEK
 
  
In all cases, whether a shared key is used for client authentication or authentication is performed by external means (for example via x509 certificates over HTTPS), the Key Server will maintain (or create on the fly if missing) a long rterm Service Key (which is also the shared key in our authentication scheme) that is used to perform Key Derivation on the server's behalf. These Keys are stored reversibly encrypted with a Key Server master key.
+
In order to avoid easy attacks on Keys and in order to be able to quickly expire keys, a key derivation scheme is used to generate the SEK pair.
  
In addition to per service keys, the Key Server generates a new random key every X minutes where X is also the TTL of SEKs. The key server stores 2 or more previous random values to allow a service to retrieve older SEK values if needed, this allows the Key Server to operate in a stateless fashion without disrupting SEK distribution at random key change time. Note that the Random Key could be generated by using a pseudo random function primed with a key derived from the master key. This would allow scaling of the Key Server to multiple machines without the need of interaction in order to exchange Random Keys. The only requirement to allow this deployment is that clocks be kept reasonably in sync. This is already a requirement in general as we want to quicky expire keys and messages to reduce replay attacks therefore we do not see it as an obstacle in a typical OpenStack scenario.
+
In all cases, whether a shared key is used for client authentication or authentication is performed by external means (for example via x509 certificates over HTTPS), the Key Server will maintain (or create on the fly if missing) a long term Service Key (which is also the shared key in our authentication scheme) that is used to perform Key Derivation on the server's behalf. These Keys are stored reversibly encrypted with a Key Server master key.
  
 
Key derivation is performed using a standard Hash based Key Derivation Function (HKDF) as described in [http://tools.ietf.org/html/rfc5869 RFC 5869].
 
Key derivation is performed using a standard Hash based Key Derivation Function (HKDF) as described in [http://tools.ietf.org/html/rfc5869 RFC 5869].
  
The extract function will be used with the Key Server Random Key in order to change the output of the key derivation function at regular intervals and therefore causing effective expiration of previously released keys as the Random Key changes in time.
+
The extract function can be used by the Key Server with a Random Salt generated anew every time and the key shared with the requester.
 +
Alternatively a Random Key can be generated and the extract function skipped.
 +
This is implementation specific and does not affect the protocol outcome.
  
The expansion function is also given in input parameters to generate different keys based on which pair of services is involved in the process.
+
The expansion function takes input parameters that generate different keys depending on which pair of services is involved in the process; this way the Session Key is bound to the triplet: sender/receiver/timestamp
  
 
Key Derivation inputs:
 
Key Derivation inputs:
 
<pre>
 
<pre>
 
Time.T = The time in the request
 
Time.T = The time in the request
 +
TTL = Time To Live, validity in seconds from Time.T
 
Svc.A = the sender service name
 
Svc.A = the sender service name
 
Svc.B = the receiver service name
 
Svc.B = the receiver service name
 
Key.A = the sender long term key
 
Key.A = the sender long term key
Rnd.K = The Key Server Random Key valid at Time.T (might be an historic key if the Random Key has just been rotated)
+
Rnd.Salt = a random salt used for the extract function
 +
Rnd.Key = the Key used as input for the expand function
 +
Ls = Length of Signing Key (128bits)
 
Le = Length of Encryption Key (128bits)
 
Le = Length of Encryption Key (128bits)
Ls = Length of Signing Key (128bits)
 
 
</pre>
 
</pre>
  
Key derivation:
+
Extract function (optional):
 +
<pre>
 +
Rnd.Key = HKDF-Extract(Rnd.Salt, Key.A)
 +
</pre>
 +
 
 +
Expand Function:
 
<pre>
 
<pre>
Pseudo-Random Key (PRK) = HKDF-Extract(Rnd.K, Key.A)
+
SEK = HKDF-Expand(Rnd.Key, Svc.A+','+Svc.B+','+Time.T, Ls+Le)
SEK = HKDF-Expand(PRK, Svc.A+Svc.B, Le+Ls)
 
 
</pre>
 
</pre>
  
The output of the expand function is an array of bytes of length 256 bits (Le+Ls), the first half will be used as the Encryption Key, the second half as the Signing Key.
+
The output of the expand function is an array of bytes of length 256 bits (Ls+Le), the first half will be used as the Signing Key, the second half as the Encryption Key.
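The derivation above can be sketched with the Python standard library alone, following the HKDF definition in RFC 5869 with SHA-256. The function names and the way Time.T is rendered as a string inside the info parameter are assumptions made for this illustration; only the comma separated Svc.A, Svc.B, Time.T layout and the Signing Key first, Encryption Key second split come from the text above.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_extract(salt, ikm):
    """HKDF-Extract: PRK = HMAC-Hash(salt, IKM)."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk, info, length):
    """HKDF-Expand: concatenate T(1), T(2), ... and truncate to length."""
    if length > 255 * HASH_LEN:
        raise ValueError('requested output is too long')
    okm = b''
    block = b''
    counter = 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_sek(rnd_key, sender, receiver, timestamp, ls=16, le=16):
    """Return (signing_key, encryption_key) for a sender/receiver pair."""
    # Assumption: Time.T is rendered with str(); both sides must agree on this.
    info = ','.join([sender, receiver, str(timestamp)]).encode('utf-8')
    sek = hkdf_expand(rnd_key, info, ls + le)
    return sek[:ls], sek[ls:]
```

With the optional extract step, `rnd_key` would be `hkdf_extract(rnd_salt, key_a)`; otherwise it is a freshly generated random key.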
  
=== Key Exchange ===
+
==== Key Exchange ====
  
 
The keys obtained by the Key Derivation step need to be sent back to the requester.
 
The keys obtained by the Key Derivation step need to be sent back to the requester.
  
If the communication is happening over a secure transport like verified HTTPS, then it is possible to simply return the keys directly in the clear, however in case the above authentication scheme is used over a clear-text protocol like HTTP the keys need to be protected with encryption. The reply must also be authenticated in order to avoid substitution attacks.
+
In addition, in order to avoid lookups to the Key Server from both the sender and the receiver, in the normal case, we send the expand function Random Key encrypted with the receiver key:
 +
<pre>
 +
KeyData = jsonutils.dumps({
 +
    'key': Rnd.Key, (base64 encoded to avoid json mangling)
 +
    'timestamp': Time.T,
 +
    'ttl': TTL
 +
})
 +
 
 +
Esek = ENC(Key.B, KeyData)
 +
</pre>
 +
 
 +
The source and destination are not included, as they are sent already with every message by the sender. By not including them in the Esek we force the receiver to implicitly check that they are valid and avoid the risk that the receiver forgets to check the ones in the message metadata match the ones in the encrypted Esek.
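A minimal sketch of the receiver side may help to show why leaving the names out of the Esek forces an implicit check. It starts from an already decrypted KeyData buffer (the ENC/decryption step is deliberately left out), and the function names, the string rendering of the timestamp and the key lengths as default arguments are assumptions for illustration only.

```python
import base64
import hashlib
import hmac
import json


def hkdf_expand(prk, info, length):
    """HKDF-Expand from RFC 5869 with SHA-256."""
    okm, block, counter = b'', b'', 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def receiver_keys(key_data_json, msg_metadata, now, ls=16, le=16):
    """Re-derive (signing_key, encryption_key) from decrypted KeyData.

    The source and destination come from the message metadata, so a message
    with altered names yields different keys and fails signature validation.
    """
    key_data = json.loads(key_data_json)
    if now > key_data['timestamp'] + key_data['ttl']:
        raise ValueError('keys have expired')
    rnd_key = base64.b64decode(key_data['key'])
    # Assumption: the timestamp is rendered with str() on both sides.
    info = ','.join([msg_metadata['source'],
                     msg_metadata['destination'],
                     str(key_data['timestamp'])]).encode('utf-8')
    sek = hkdf_expand(rnd_key, info, ls + le)
    return sek[:ls], sek[ls:]
```

The receiver never compares names explicitly: a mismatch simply produces a Signing Key that does not validate the message.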
 +
 
 +
If the communication is happening over a secure transport like verified HTTPS, then it would be possible to simply return the Ticket directly in the clear; however, in case the authentication scheme is used over a clear-text protocol like HTTP the keys need to be protected with encryption. To avoid confusion and possible mistakes we take a conservative approach and always return the ticket encrypted. The reply must also be authenticated in order to avoid substitution attacks.
  
 
We'll reuse an encryption and authentication scheme similar to the one described previously for securing the messages exchanged between the 2 parties.
 
We'll reuse an encryption and authentication scheme similar to the one described previously for securing the messages exchanged between the 2 parties.
Line 222: Line 246:
 
     'source': <sender>,
 
     'source': <sender>,
 
     'destination': <receiver>,
 
     'destination': <receiver>,
     'expiration': <calculated as timestamp sent in the request + TTL>,
+
     'expiration': <calculated as timestamp sent in the request + TTL>
     'encryption': <true | false>
+
})
 +
 
 +
Optionally encrypted buffer containing the Encryption and Signature pair as returned by the HKDF.
 +
Ticket = jsonutils.dumps({
 +
    'skey': <Signing Key from SEK>,
 +
     'ekey': <Encryption Key from SEK>,
 +
    'esek': Esek
 
})
 
})
  
 
KeyEx_Reply = {
 
KeyEx_Reply = {
     'meta': MetaData,
+
     'metadata': MetaData, # base64 encoded
     'sek': SEK,
+
     'ticket': Ticket, or ENC(Key.A, Ticket) # base64 encoded
     'hmac': Signature
+
     'signature': Signature # base64 encoded
 
}
 
}
 
</pre>
 
</pre>
 
==== Reply Signature ====
 
  
 
The Signature is calculated over all the data:
 
The Signature is calculated over all the data:
 
<pre>
 
<pre>
 
MetaData = serialized JSON Metadata
 
MetaData = serialized JSON Metadata
SEKStore = optionally encrypted buffer containing the Encryption and Signature pair as returned by the HKDF.
+
Ticket = serialized JSON Ticket, encrypted
 
+
Signature = HMAC(Key.A, (MetaData || Ticket))
Signature = HMAC(Key, (MetaData || SEKStore))
 
 
</pre>
 
</pre>
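As a rough illustration, the reply signature could be computed and checked as follows. The function names are hypothetical, and the sketch assumes that MetaData and Ticket are the base64 encoded byte strings carried in the reply and that `||` means plain concatenation.

```python
import hashlib
import hmac


def sign_reply(key_a, metadata, ticket):
    """Signature = HMAC-SHA-256(Key.A, MetaData || Ticket)."""
    return hmac.new(key_a, metadata + ticket, hashlib.sha256).digest()


def verify_reply(key_a, metadata, ticket, signature):
    """Return True only if the signature matches, using a constant time compare."""
    expected = sign_reply(key_a, metadata, ticket)
    return hmac.compare_digest(expected, signature)
```

The requester should verify the signature before decrypting or using any part of the Ticket.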
  
We propose again to use HMAC-SHA-256 as the authentication function as per [http://tools.ietf.org/html/rfc6234 RFC 6234].
+
We propose again to use HMAC-SHA-256 as the default authentication function as per [http://tools.ietf.org/html/rfc6234 RFC 6234].
 
 
==== Reply Encryption ====
 
  
On untrusted transport the SEK will be encrypted, in this case the MetaData field 'encryption' will be set to True.
 
 
We'll reuse the same exact scheme used for Message Encryption with AES-128-CBC and a Random IV
 
We'll reuse the same exact scheme used for Message Encryption with AES-128-CBC and a Random IV
  
=== RESTful API ===
+
==== RESTful API ====
  
As custom a RESTful API will be proposed to access the Key Server, a GET call will be used to obtain a SEK.
+
As is customary, a RESTful API will be proposed to access the Key Server; a GET call will be used to obtain a Ticket.
  
 
Request:
 
Request:
 
<pre>
 
<pre>
GET /keyserver/sek
+
POST /kds/ticket/{Signature}
  
 
{
 
{
     'meta': MetaData,
+
     "request": {
    'hmac': Signature
+
        "metadata": MetaData,
 +
        "signature": Signature
 +
    }
 
}
 
}
 
</pre>
 
</pre>
Line 269: Line 295:
  
 
{
 
{
     'meta': MetaData,
+
     "reply": {
    'sek': SEKStore,
+
        "metadata": MetaData,
    'hmac': Signature
+
        "ticket": Ticket,
 +
        "signature": Signature
 +
    }
 
}
 
}
 
</pre>
 
</pre>
Line 287: Line 315:
 
=== Key Server lookups ===
 
=== Key Server lookups ===
  
Worst case 2 lookups per message (1 for sender and 1 for receiver)
+
Normally only one lookup per peer-pair is needed by the sender in order to be able to send signed and/or encrypted messages to a receiver.
TODO: Expand
+
Until the expiration time returned in the message no other lookups are needed to send messages to the same receiver.
 +
However a receiver may need to perform a lookup when a group name is used as destination.
 +
 
 +
==== Group destination ====
 +
 
 +
When the destination is a group of services, all the receivers in the group need to be able to look up a group key in order to be able to validate and decrypt messages.
 +
To avoid long term shared group keys and their management, group keys are only short lived and need to be retrieved by a group member on demand.
 +
By using short lived keys we sidestep the revocation issue, because any group key will be phased out of use in a short time. Disabling a compromised group member will suffice to deprive it of any valid group key as soon as the last released key it had access to expires.
 +
 
 +
==== Group Key Lookup ====
 +
 
 +
TODO
  
 
=== Fanout messages ===
 
=== Fanout messages ===
Line 294: Line 333:
 
however only 3 cases use fanout so far:
 
however only 3 cases use fanout so far:
 
* nova network, but I have been assured this case will go away
 
* nova network, but I have been assured this case will go away
* nova compute to all schedulers, but we can use a 'scheduler' key the all schedulers have access to
+
* nova compute to all schedulers, see group keys above
 
* nova scheduler to all compute, this is problematic, but the message is a broadcast request, not a command, so we could simply not sign in this case
 
* nova scheduler to all compute, this is problematic, but the message is a broadcast request, not a command, so we could simply not sign in this case
 
if not signing doesn't work, we might need to use a onetime-only key scheme, but the lookups to the key server would be quite numerous (one per compute node).
 
if not signing doesn't work, we might need to use a onetime-only key scheme, but the lookups to the key server would be quite numerous (one per compute node).
 +
 +
== A Key Distribution Server in Keystone ==
 +
 +
=== Why Keystone ? ===
 +
Assigning Keys to services and handling groups of services effectively means assigning an identity to these services.
 +
Keystone is the identity provider/gateway within OpenStack so embedding a Key Distribution Server in Keystone seems the natural approach.
 +
This specific implementation uses Tickets to allow secure RPC communication between services which is another similarity to Keystone tokens for HTTP based communication.
 +
 +
=== A new KDS service in Keystone ===
 +
The Key Distribution Server should be made available in keystone under /kds
 +
The one GET operation currently defined in the [[MessageSecurity#RESTful_API|API paragraph]] would be reachable at /kds/ticket
 +
 +
The server implies storing keys in a database per target name (in the form of topic.hostname),
 +
reversibly encrypted with a master key kept in a file and sourced at keystone startup, based on configuration file options.
 +
 +
It also depends on the oslo-incubator cryptoutils library being built as part of the SecureMessage effort.
  
 
== Implementation ==
 
== Implementation ==
  
Implementation would be done in phases  
+
Implementation would be done in phases and touches several components:
 +
* oslo-incubator libraries
 +
* nova and other services (to start performing signing)
 +
* keystone as the Key Distribution Server
  
 
=== Phase 1 ===
 
=== Phase 1 ===

Latest revision as of 12:33, 13 May 2014

Message Security

Message Security in OpenStack is currently not implemented. Recently there have been a couple of proposals to implement signatures and eventually encryption for RPC messages.

Implementing this kind of security feature is a delicate task: there is the usual trade-off between security and performance, as well as some issues peculiar to the distributed nature of OpenStack.

Public Crypto versus Shared Keys

One of the assumptions in the present proposals is that Public Key crypto will be used to provide Integrity for messages, however a simple Shared Key crypto model is also well suited to handle Messaging Queues.

One reason why Public Key crypto is being proposed is the perceived lower overhead of the public key trust model. However let's analyze what is required to use either model.

Public Key Infrastructure

The first point is that each service that needs to send messages will need to own a Public/Private Key pair, and this means some form of secure storage for the Private Key.

Next, the Public Key must also be made available. There are 2 strategies to do so: a PKI model where a Public Key is signed by a Central Authority, or a Trusted Repository where all Public Keys are deposited and guaranteed to be good by either cross signatures or a well-known party that can authoritatively assert whether a key is good or bad. This in turn requires that all clients regularly check that their peers' keys are still valid and not revoked. This is true with either a PKI style system where CRLs or OCSP responders are queried, or a central trust authority where Public Keys are checked for validity.

There is also the non-trivial task of deciding where keys are generated as virtual machine based systems tend to have poor entropy and sourcing enough to generate key pairs can be a problem at installation time. Following that there is the problem of how to communicate the public key to the CA for signing or to the trusted repository for depositing it.

Shared Key Infrastructure

The first point is that each service that needs to send messages will need to own its own secret key, and this means some form of secure storage for the Secret Key.

In a shared key model ideally each actor would have a different shared key with each and every other peer, however this becomes very quickly impossible to achieve as the number of peers scales up, both in terms of storage and exchanges necessary. A Key Server approach is the only reasonable way.

With a Key Server each service only needs one Secret Key which it shares with the Key Server. The actual key used to communicate between any 2 peers is provided on the fly by the key server, which needs to be contacted by at least one of the peers before they can actually send messages between themselves. The Key server will provide Tickets, containing Signing and Encryption Keys (SEK), that are bound to a specific peer-pair and allow them to communicate safely.

Once keys are obtained the two peers become independent from the Key Server and can send as many messages as required until the keys are invalid. In such a system the Secret Keys shared between Services and Key Server have long term validity while the signing and encryption keys can have a relatively short validity period so that brute force attacks on the messages will not lead to gaining access to any long term secret and be of limited value.

Security considerations for the trusted server

With either model the central server will have to store some keys and guarantee their validity.

In the Public Key model the central server needs to be able to provide proof of validity of a key and mark as revoked keys that are considered compromised. In order to do that a signing key needs to be handled by this server, and signatures are used to mark public keys as valid or revoked via CRLs or OCSP. In the case of short lived Public Keys, an authentication system needs to be provided instead so that services can authenticate and store their public keys. In both cases the overall system will have to rely on some Public Key that identifies the trusted authority, be it in the form of the Public Key and Certificate representing a PKI or in the form of a Public Key and x509 Certificate used to secure the connection with the Trusted Repository. In either case, a compromise of this key will compromise the whole system.

In the shared key model the Key Server will hold a master key used to encrypt the Service keys in its storage. Authentication between services and the Key Server is based on a shared Secret that can be easily rotated, unless it has been compromised. Revocation is not necessary as all that is needed is to remove the compromised Secret Key from the Key Server. However generating a new shared Secret Key will require the entire enrollment process to be repeated, and if a Secret Key is compromised in the middle of a server session with other servers, all these sessions will need to be terminated since it won't be obvious which sessions are genuine and which are bogus.

With either system a stateless service will have to contact the trusted authority, be it a PKI, Trusted Repository or Key Server in order to either get a session key or to check the revocation status. With both systems services that can keep state can cache the session key or the revocation checks until expiration of the keys or until the next validation interval expires so the required communication overhead between services and a central system is similar.

Shared Keys and Key Server Proposal

One advantage of using a Key Server compared to a pure public key based system is that the Encryption and Signing Key exchange can be regulated by the Key Server, which can apply access control and deny communication between arbitrary peers in the system. This makes it easier to perform centralized access control, prevent unauthorized communication and avoid the need to perform post authentication access control and policy lookups on the receiving side.

Given that otherwise the overhead for either a public key based system or a shared key based system in terms of security of the trusted server or communication requirements looks similar we put forward the proposal of using a Shared Key system based on a Key Server.

Note that the service long term key stored in Key Server may be used for derivation and may be used for authentication, however authentication to the Key Server can also be deferred to existing components, for example password based authentication over an HTTPS connection would be sufficient to authenticate a Service to the Key Server. Other methods that would work as well would be Kerberos keytabs and a KDC for authentication, x509 User Certificates and so on. Basically, the authentication to the key server part can be abstracted away if needed.

That said we will proceed to describe also an authentication method based on shared keys that will work for communication over pure HTTP (non encrypted) transport with the Key Server.

Message Integrity and Confidentiality

Securing the message queue requires two distinct components:

  • Integrity or Signing and authentication of messages
  • Confidentiality or Encryption of the messages

In order to reduce the chance of cryptanalysis when the same key is used for both authentication and encryption, we will play safe and propose to use separate keys for encryption and authentication even though we will not use mechanisms susceptible to known attacks. For the same reason, in order to reduce replay attacks, we propose a scheme that uses different keys depending on the direction of the communication, i.e. the SEK pair for Svc.A -> Svc.B will not be the same as the one for Svc.B -> Svc.A.

Standards

The standard for providing message integrity is HMAC. For encryption the most respected algorithm is currently AES, a block cipher with a fixed 128-bit block size.

Because the current feeling is that encryption may not be necessary we will consider it optional. In order to avoid changing message formats this means it is more convenient to use an "encryption first, authentication later" approach, whereby the authentication step does not differ based on whether encryption is performed or not, rather the message being authenticated can be either plain text or encrypted.

The next step is sketching out how to apply encryption and authentication to the message keeping in mind the Horton Principle.

Message Format

The data interchange format en vogue in the project is JSON, so we will create a message format based on JSON syntax. The first thing we want to ensure is that authentication covers the whole message as well as the metadata tied to the message. This is important to avoid substitution attacks where the metadata may be swapped out and replaced without affecting the signature. This means that Message and Metadata will be serialized objects contained in a simpler container.

Pseudo JSON notation:

MetaData = jsonutils.dumps({
    'source': <sender>,
    'destination': <receiver>,
    'timestamp': <time.time()>, # 1/100th second resolution from UTC
    'nonce': <64bit unsigned number>, # must not repeat until the timestamp changes
    'esek': <encrypted SEK pair for the receiver (base64 encoded)>,
    'encryption': <true | false>
})
Message = jsonutils.dumps(raw_msg)

_METADATA_KEY = 'oslo.secure.metadata'
_SIGNATURE_KEY = 'oslo.secure.hmac'

RPC_Message = {
    _VERSION_KEY: _RPC_ENVELOPE_VERSION,
    _METADATA_KEY: MetaData,
    _MESSAGE_KEY: Message,
    _SIGNATURE_KEY: Signature
}

Message Signature

The Signature is calculated over the concatenation of the version string and the buffers.

Version = null terminated string containing the version number
MetaData = serialized JSON Metadata
Message = serialized JSON Message

Signature = HMAC(SignKey, (Version || MetaData || Message))

We propose to use HMAC-SHA-256 by default as the authentication function as per RFC 6234.

NOTE: Particular care needs to be taken to make sure the RPC_Message obtained in input cannot be abused and the rest of the pipeline will use exclusively what has been authenticated. For this reason the output of the validation function should be a separate structure that provides unserialized Metadata and Message, and further components should not have access to the original RPC_Message. If the same format needs to be maintained a new RPC_Message containing only the version and serialized message will be provided in output, rebuilt from the verified values.

Python's hmac and hashlib modules have all the code needed to implement this.

Message Encryption

Optionally the message may be encrypted, in this case the MetaData field 'encryption' will be set to True.

Because the use of nonces is particularly difficult to get right, because the use of message queues may involve multiple parties using the same keys when they act in a cluster, and because there is a desire to allow stateless services as much as possible, we propose to use AES-128-CBC with a Random IV by default in order to encrypt the content. This requires the availability of a pseudo-random generator on the sender side; we do not expect this to be an issue in practice on the machines used in a typical OpenStack deployment.

Encryption:

Plain-Text = P1 || P2 || P3 || ... || PN
C0 = Random IV (128bit)
for i in range(1, N + 1):
    Ci = ENC(EncKey, Pi ^ C(i-1))
Encrypted-Message = C0 || C1 || C2 || C3 || ... || CN

Decryption:

IV = C0
Cipher-Text = C1 || C2 || C3 || ... || CN
for i in range(1, N + 1):
    Pi = DEC(EncKey, Ci) ^ C(i-1)
Plain-Text = P1 || P2 || P3 || ... || PN

Various python crypto modules have all the code needed to implement this.
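
To make the chaining in the pseudocode above concrete, here is a small sketch of CBC mode with a random IV where the block cipher is passed in as a function. Everything here is illustrative: the PKCS#7 padding is an assumption (the text does not specify a padding scheme), and toy_block is a deliberately insecure stand-in for AES-128 so that the example runs with the standard library only. A real implementation would take the block cipher, or the whole CBC mode, from a crypto module.

```python
import os

BLOCK = 16  # AES block size in bytes (128 bits)


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _pad(data):
    # PKCS#7 padding (an assumption; the text does not name a scheme).
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def _unpad(data):
    # The last byte says how many padding bytes to strip.
    return data[:-data[-1]]


def cbc_encrypt(enc_block, plaintext):
    data = _pad(plaintext)
    out = [os.urandom(BLOCK)]  # C0 is the random IV
    for i in range(0, len(data), BLOCK):
        # Ci = ENC(EncKey, Pi ^ C(i-1))
        out.append(enc_block(_xor(data[i:i + BLOCK], out[-1])))
    return b''.join(out)


def cbc_decrypt(dec_block, encrypted):
    blocks = [encrypted[i:i + BLOCK]
              for i in range(0, len(encrypted), BLOCK)]
    # Pi = DEC(EncKey, Ci) ^ C(i-1), with C0 being the IV
    plain = b''.join(_xor(dec_block(blocks[i]), blocks[i - 1])
                     for i in range(1, len(blocks)))
    return _unpad(plain)


def toy_block(key):
    # NOT a real cipher: XOR with the key, used only as a placeholder
    # for the AES-128 block function. It is its own inverse.
    return lambda block: _xor(block, key)
```

With the toy cipher, cbc_decrypt(toy_block(k), cbc_encrypt(toy_block(k), data)) returns data, and two encryptions of the same data differ because the IV is random.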

Tickets

In order to obtain the Signing and Encryption keys necessary to send messages, the client needs to request a Ticket containing them from the Key Distribution Server (KDS). Obtaining a ticket requires the client to authenticate to the KDS.

Client Authentication and Key Derivation

We propose an authentication and key retrieval scheme to request and transfer Tickets.

Authentication scheme

A simple authentication scheme is used to request a Ticket. The request does not need to be encrypted, because none of the data sent is sensitive and all of it can be deduced from the activity that is going to be performed. Moreover, some of this data needs to be in the clear to identify the requesting service and look up the correct key to use to check the authentication.

We want to reduce the ability to burn Key Server resources so we will embed a timestamp useful to restrict the validity period of any given message.

In addition to the timestamp, the request needs to contain 2 names:

  • The name of the service making the request, which will be used to lookup the Shared Key and Authenticate the request.
  • The name of the target service.

When receiving the request, the first operation must be Authentication of the request; no other field should be considered until the request is authenticated with the Shared Key. Once the HMAC function validates the request, the timestamp MUST be checked for validity.

Pseudo JSON notation:

MetaData = jsonutils.dumps({
    'requestor': <requestor>,
    'target': <target>,
    'timestamp': <time.time()>, # 1/100th second resolution from UTC
    'nonce': <64bit unsigned number>, # must not repeat until the timestamp changes
})

KeyEx_Request = {
    'metadata': MetaData, # base64 encoded
    'signature': Signature = HMAC(Key, MetaData) # base64 encoded
}

NOTE: As for message signing, we use both a timestamp and a nonce here. Replay attacks are not a problem for the key server, but filtering on timestamp/nonce upfront can save resources on a key server. Checking for replay attacks is therefore optional but welcome.

NOTE: If external authentication is used the Signature will be omitted.
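
The request construction and the server-side checks could look roughly like the sketch below. The names build_request, check_request, lookup_key and MAX_SKEW are hypothetical, and so are the choices to use HMAC-SHA-256 here, to compute it over the serialized JSON before base64 encoding, and to draw the nonce at random. The requestor name has to be read before authentication in order to find the key; every other field is used only after the HMAC has been validated.

```python
import base64
import hashlib
import hmac
import json
import os
import time

MAX_SKEW = 300  # seconds; hypothetical validity window for a request


def build_request(key, requestor, target, now=None):
    metadata = json.dumps({
        'requestor': requestor,
        'target': target,
        'timestamp': time.time() if now is None else now,
        # Random 64-bit unsigned number (assumption: random is enough
        # to avoid repeats within one timestamp).
        'nonce': int.from_bytes(os.urandom(8), 'big'),
    }).encode()
    signature = hmac.new(key, metadata, hashlib.sha256).digest()
    return {
        'metadata': base64.b64encode(metadata).decode(),
        'signature': base64.b64encode(signature).decode(),
    }


def check_request(lookup_key, request, now=None):
    raw = base64.b64decode(request['metadata'])
    # Only the requestor name is read before authentication.
    key = lookup_key(json.loads(raw)['requestor'])
    expected = hmac.new(key, raw, hashlib.sha256).digest()
    received = base64.b64decode(request['signature'])
    if not hmac.compare_digest(expected, received):
        raise ValueError('authentication failed')
    # The request is authentic, so the timestamp can now be checked.
    metadata = json.loads(raw)
    now = time.time() if now is None else now
    if abs(now - metadata['timestamp']) > MAX_SKEW:
        raise ValueError('timestamp outside validity window')
    return metadata
```

Here lookup_key stands for whatever the Key Server uses to fetch the long term key of a service by name, for example a dictionary lookup in a test.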

Key Derivation

In order to avoid easy attacks on Keys and in order to be able to quickly expire keys, a key derivation scheme is used to generate the SEK pair.

In all cases, whether a shared key is used for client authentication or authentication is performed by external means (for example via x509 certificates over HTTPS), the Key Server will maintain (or create on the fly if missing) a long term Service Key (which is also the shared key in our authentication scheme) that is used to perform Key Derivation on the server's behalf. These Keys are stored reversibly encrypted with a Key Server master key.

Key derivation is performed using a standard Hash based Key Derivation Function (HKDF) as described in RFC 5869.

The extract function can be used on the Key Server with a Random Salt generated anew every time and the key shared with the requester. Alternatively a Random Key can be generated and the extract function skipped. This is implementation specific and does not affect the protocol outcome.

The expansion function is given input parameters that generate different keys depending on which pair of services is involved in the process. This way the Session Key is bound to the triplet: sender/receiver/timestamp.

Key Derivation inputs:

Time.T = The time in the request
TTL = Time To Live, validity in seconds from Time.T
Svc.A = the sender service name
Svc.B = the receiver service name
Key.A = the sender long term key
Rnd.Salt = a random salt used for the extract function
Rnd.Key = the Key used as input for the expand function
Ls = Length of Signing Key (128bits)
Le = Length of Encryption Key (128bits)

Extract function (optional):

Rnd.Key = HKDF-Extract(Rnd.Salt, Key.A)

Expand Function:

SEK = HKDF-Expand(Rnd.Key, Svc.A+','+Svc.B+','+Time.T, Ls+Le)

The output of the expand function is an array of bytes of length 256 bits (Ls+Le), the first half will be used as the Signing Key, the second half as the Encryption Key.
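
The derivation above can be written with the standard library by following the HKDF definition in RFC 5869 directly. This is a sketch under stated assumptions: SHA-256 is assumed as the underlying hash, the timestamp is assumed to be turned into a string with str(), and derive_sek is a hypothetical helper name.

```python
import hashlib
import hmac

HASH = hashlib.sha256          # assumed hash for HKDF
HASH_LEN = HASH().digest_size  # 32 bytes for SHA-256


def hkdf_extract(salt, ikm):
    # PRK = HMAC-Hash(salt, IKM)  (RFC 5869, section 2.2)
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk, info, length):
    # T(i) = HMAC-Hash(PRK, T(i-1) || info || i)  (RFC 5869, section 2.3)
    okm, block = b'', b''
    rounds = -(-length // HASH_LEN)  # ceiling division
    for counter in range(1, rounds + 1):
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
    return okm[:length]


def derive_sek(rnd_key, sender, receiver, timestamp, ls=16, le=16):
    # info = Svc.A + ',' + Svc.B + ',' + Time.T
    info = ','.join([sender, receiver, str(timestamp)]).encode()
    sek = hkdf_expand(rnd_key, info, ls + le)
    # First half is the Signing Key, second half the Encryption Key.
    return sek[:ls], sek[ls:]
```

For example, rnd_key = hkdf_extract(rnd_salt, key_a) followed by derive_sek(rnd_key, 'compute.host1', 'scheduler.host2', 1000) yields a 16-byte signing key and a 16-byte encryption key. Swapping sender and receiver changes the info string and therefore both keys, which gives the per-direction keys described earlier.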

Key Exchange

The keys obtained by the Key Derivation step need to be sent back to the requester.

In addition, in order to avoid lookups to the Key Server from both the sender and the receiver, in the normal case, we send the expand function Random Key encrypted with the receiver key:

KeyData = jsonutils.dumps({
    'key': Rnd.Key, # base64 encoded to avoid json mangling
    'timestamp': Time.T,
    'ttl': TTL
})

Esek = ENC(Key.B, KeyData)

The source and destination are not included, as they are sent already with every message by the sender. By not including them in the Esek we force the receiver to implicitly check that they are valid and avoid the risk that the receiver forgets to check the ones in the message metadata match the ones in the encrypted Esek.
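
On the receiving side this means the SEK pair is rebuilt from the decrypted KeyData together with the source and destination taken from the message metadata. The sketch below assumes the Esek has already been decrypted with Key.B (that step is omitted), that HKDF uses SHA-256, and that the timestamp is formatted with str(); sek_from_key_data is a hypothetical name. If a sender lies about the source or destination, the receiver derives different keys and the message signature simply fails to validate.

```python
import base64
import hashlib
import hmac
import json


def hkdf_expand(prk, info, length):
    # HKDF-Expand from RFC 5869 with SHA-256 (assumed hash).
    okm, block = b'', b''
    for counter in range(1, -(-length // 32) + 1):
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
    return okm[:length]


def sek_from_key_data(key_data, source, destination, now):
    # key_data is the JSON string obtained by decrypting the Esek.
    kd = json.loads(key_data)
    if now > kd['timestamp'] + kd['ttl']:
        raise ValueError('ticket expired')
    rnd_key = base64.b64decode(kd['key'])
    # Names come from the message metadata, not from the Esek.
    info = ','.join([source, destination, str(kd['timestamp'])]).encode()
    sek = hkdf_expand(rnd_key, info, 32)
    return sek[:16], sek[16:]  # (signing key, encryption key)
```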

If the communication is happening over a secure transport like verified HTTPS, then it would be possible to simply return the Ticket directly in the clear. However, in case the authentication scheme is used over a clear-text protocol like HTTP, the keys need to be protected with encryption. To avoid confusion and possible mistakes we take a conservative approach and always return the ticket encrypted. The reply must also be authenticated in order to avoid substitution attacks.

We'll reuse an encryption and authentication scheme similar to the one described previously for securing the messages exchanged between the 2 parties.

Reply Format

Pseudo JSON notation:

MetaData = jsonutils.dumps({
    'source': <sender>,
    'destination': <receiver>,
    'expiration': <calculated as timestamp sent in the request + TTL>
})

Optionally encrypted buffer containing the Encryption and Signature pair as returned by the HKDF.
Ticket = jsonutils.dumps({
    'skey': <Signing Key from SEK>,
    'ekey': <Encryption Key from SEK>,
    'esek': Esek
})

KeyEx_Reply = {
    'metadata': MetaData, # base64 encoded
    'ticket': Ticket, or ENC(Key.A, Ticket) # base64 encoded
    'signature': Signature # base64 encoded
}

The Signature is calculated over all the data:

MetaData = serialized JSON Metadata
Ticket = serialized JSON Ticket, encrypted
Signature = HMAC(Key.A, (MetaData || Ticket))

We propose again to use HMAC-SHA-256 as the default authentication function as per RFC 6234.

We'll reuse the same exact scheme used for Message Encryption with AES-128-CBC and a Random IV
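
A requester could validate a reply along these lines before decrypting the ticket. This is only a sketch: check_reply is a hypothetical name, the signature is assumed to cover the raw (base64 decoded) buffers, and decryption of the ticket with Key.A is left out.

```python
import base64
import hashlib
import hmac
import json


def check_reply(key_a, reply, now):
    metadata = base64.b64decode(reply['metadata'])
    ticket = base64.b64decode(reply['ticket'])  # still encrypted
    # Signature = HMAC(Key.A, (MetaData || Ticket))
    expected = hmac.new(key_a, metadata + ticket, hashlib.sha256).digest()
    received = base64.b64decode(reply['signature'])
    if not hmac.compare_digest(expected, received):
        raise ValueError('reply signature mismatch')
    # Only parse the metadata once it has been authenticated.
    md = json.loads(metadata)
    if md['expiration'] <= now:
        raise ValueError('ticket already expired')
    return md, ticket
```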

RESTful API

As is customary, a RESTful API will be proposed to access the Key Server. A GET call will be used to obtain a Ticket.

Request:

POST /kds/ticket/{Signature}

{
    "request": {
        "metadata": MetaData,
        "signature": Signature
    }
}

Reply:

200 OK

{
    "reply": {
        "metadata": MetaData,
        "ticket": Ticket,
        "signature": Signature
    }
}

Error codes

  • 200 OK - This status code is returned in response to a successful GET operation
  • 401 Unauthorized - This status code is returned when either authentication has not been performed, or the authentication fails.
  • 403 Forbidden - This status code is returned when the requester field does not match either the sender or the receiver fields.
  • 500 Internal Server Error - This status code is returned when an unexpected error has occurred in the server implementation.
  • 501 Not Implemented - This status code is returned when the implementation is unable to fulfill the request because it is incapable of implementing the entire API as specified.
  • 503 Service Unavailable - This status code is returned when the server is unable to communicate with a backend service (database, memcache, ...)

Operation Considerations

Key Server lookups

Normally only one lookup per peer-pair is needed by the sender in order to be able to send signed and/or encrypted messages to a receiver. Until the expiration time returned in the reply, no other lookups are needed to send messages to the same receiver. However a receiver may need to perform a lookup when a group name is used as destination.

Group destination

When the destination is a group of services, all the receivers in the group need to be able to look up a group key in order to validate and decrypt messages. To avoid long term shared group keys and their management, group keys are only short lived and need to be retrieved by a group member on demand. By using short lived keys we sidestep the revocation issue, because any group key will be phased out of use in a short time. Disabling a compromised group member will suffice to deprive it of any valid group key as soon as the last released key it had access to expires.

Group Key Lookup

TODO

Fanout messages

Signing fanout messages with symmetric keys is problematic, however only 3 cases use fanout so far:

  • nova network, but I have been assured this case will go away
  • nova compute to all schedulers, see group keys above
  • nova scheduler to all compute, this is problematic, but the message is a broadcast request, not a command, so we could simply not sign in this case

If not signing doesn't work, we might need to use a onetime-only key scheme, but the lookups to the key server would be quite numerous (one per compute node).

A Key Distribution Server in Keystone

Why Keystone ?

Assigning Keys to services and handling groups of services effectively means assigning an identity to these services. Keystone is the identity provider/gateway within OpenStack, so embedding a Key Distribution Server in Keystone seems the natural approach. This specific implementation uses Tickets to allow secure RPC communication between services, which is another similarity to Keystone tokens for HTTP based communication.

A new KDS service in Keystone

The Key Distribution Server should be made available in keystone under /kds. The one GET operation currently defined in the API paragraph would be reachable at /kds/ticket.

The server requires storing keys in a database per target name (in the form of topic.hostname), reversibly encrypted with a master key kept in a file and sourced at keystone startup, based on configuration file options.

It also depends on the oslo-incubator cryptoutils library being built as part of the SecureMessage effort.

Implementation

Implementation would be done in phases and touches several components:

  • oslo-incubator libraries
  • nova and other services (to start performing signing)
  • keystone as the Key Distribution Server

Phase 1

Add basic crypto functions and new message envelope building functions.
Change code to use new envelope by default with signing optional.

Phase 2

Add support for fetching keys but fall back to non signed if lookup fails

Phase 3

Add Key Server with basic per-host keys only

Phase 4

Add support for shared per service-type keys.
Add Access Control checks to limit access to these keys.

Phase 5

Turn signing on as required by default