Difference between revisions of "Swift/server-side-enc"

Revision as of 15:37, 6 November 2013

Server Side Encryption

Abstract

This wiki page summarizes design aspects and insights into Server Side Encryption for Swift. The general scheme is to create a middleware that will encrypt the object data during PUT and decrypt it during GET. The target is to create two domains - the user domain between the client and the middleware where the data is decrypted and the system domain between the middleware and the data at rest (on the device) where the data is encrypted.

A design goal is to extend Swift as necessary without changing existing Swift behaviors.

First level design highlights

The design described here encrypts once at the proxy. To enable encryption, the admin needs to add the encryption middleware to the pipeline. Non-greenfield installations are supported as follows:

Once the middleware is added, it would be able to encrypt new objects while leaving old objects unencrypted.
A PUT object would result in the middleware encrypting the object data if the container is marked for encryption. If an existing container is marked for encryption, objects that are already in the system would remain unencrypted (unless a new version is uploaded with PUT).
Encrypted objects would be decrypted even if the container is not marked for encryption
If the middleware is removed, keys would not be exposed, new objects would be stored unencrypted, and encrypted objects would not be decrypted. Adding the middleware back would restore the previous behavior.
Each object would have its own key for encrypting the data.
The object key of 'AUTH_myaccount/mycontainer/myobject' would be stored as system metadata x-object-sysmeta-key-XXX, where XXX is base64(hash('AUTH_myaccount/mycontainer')), with a value of enc(containerkey, objkey), i.e. the objkey encrypted using the container key. This would allow a COPY operation without decrypting objects, as described below. The hash used would be the Swift system hash (MD5 by default).
Non encrypted objects would have no x-object-sysmeta-key-* metadata.
A new objkey would be randomly chosen during PUT object.
A new containerkey or accountkey would be randomly chosen during the creation of the container or account, respectively, and would never be changed.
The container key is stored as system metadata x-container-sysmeta-key with a value of enc(accountkey, containerkey), i.e. the containerkey encrypted using the account key.
The decrypted container key is cached together with the container metadata in memcache (requires enhancing Swift core)
The account key is stored as system metadata x-account-sysmeta-key with a value of enc(masterkey, accountkey) - i.e. accountkey encrypted using the master key (master key is stored per account by the key manager).
The decrypted account key is cached together with the account metadata in memcache (requires enhancing Swift core)
Manifests are never encrypted
Object user metadata keys and values are encrypted using the objkey, base64 encoded, and sent as user metadata: 'x-object-meta-confidentiality: Top-Secret' could become 'x-object-meta-Y29uZmlkZW50aWFsaXR5: VG9wLVNlY3JldA==' (this example uses a null cipher, so only the base64 encoding is visible).
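
The user metadata example above can be reproduced with a short sketch. The function names encode_user_metadata and null_cipher are hypothetical and not part of Swift; the null cipher returns its input unchanged, so the output shows only the base64 step, and a real deployment would plug in an actual cipher keyed by the objkey.

```python
import base64

META_PREFIX = "x-object-meta-"


def null_cipher(key: bytes, data: bytes) -> bytes:
    """Placeholder for the real cipher: returns the data unchanged."""
    return data


def encode_user_metadata(headers, objkey, encrypt=null_cipher):
    """Encrypt and base64-encode user metadata names and values.

    Headers that are not user metadata pass through untouched.
    """
    out = {}
    for name, value in headers.items():
        if not name.lower().startswith(META_PREFIX):
            out[name] = value
            continue
        short_name = name[len(META_PREFIX):].encode("utf-8")
        enc_name = base64.b64encode(encrypt(objkey, short_name)).decode("ascii")
        enc_value = base64.b64encode(encrypt(objkey, value.encode("utf-8"))).decode("ascii")
        out[META_PREFIX + enc_name] = enc_value
    return out


result = encode_user_metadata(
    {"x-object-meta-confidentiality": "Top-Secret"}, objkey=b"unused-by-null-cipher"
)
print(result)
# {'x-object-meta-Y29uZmlkZW50aWFsaXR5': 'VG9wLVNlY3JldA=='}
```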

Server Side Encryption Details

Etag issues

Current Swift behavior:

During PUT object, at the object server, while the object server writes the chunks to the DiskFile it computes an MD5 checksum of the chunks.
If the proxy sent an etag header, the object server compares the computed etag to the one sent by the proxy. If the etags do not match, the object server returns HTTPUnprocessableEntity; in that case the etag would not be stored as metadata of the object, which presumably would result in the object being discarded at some future time by the auditor.
Otherwise, the etag is stored under the Etag metadata key of the object.
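
The behavior described above can be sketched roughly as follows. This is an illustration only, not the object server's actual code: write_object and EtagMismatch are hypothetical names, the exception stands in for the HTTPUnprocessableEntity response, and a bytearray stands in for the DiskFile.

```python
import hashlib


class EtagMismatch(Exception):
    """Stands in for the HTTPUnprocessableEntity response."""


def write_object(chunks, proxy_etag=None):
    """Consume chunks, computing an MD5 checksum as they are written."""
    md5 = hashlib.md5()
    stored = bytearray()  # stand-in for the DiskFile
    for chunk in chunks:
        stored += chunk
        md5.update(chunk)
    computed = md5.hexdigest()
    if proxy_etag is not None and proxy_etag.lower() != computed:
        # No etag metadata is recorded for the object in this case.
        raise EtagMismatch("expected %s, computed %s" % (proxy_etag, computed))
    return bytes(stored), {"ETag": computed}


data, metadata = write_object([b"ab", b"cd"])
```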

We name here the MD5 of the decrypted (plaintext) object 'de-etag' (for decrypted etag) and the MD5 of the encrypted object as stored 'en-etag'. The en-etag is stored in the etag field of the object, allowing auditors to continue working unchanged. This is achieved by removing the etag provided by the client (if any) before the request reaches the object server, so that the object server calculates the en-etag itself and stores it.

During a PUT, the middleware is tasked with:

Calculating the de-etag while chunks are sent to the object server
If the client provided an etag, comparing it to the calculated de-etag and, if they do not match, responding to the client with HTTPUnprocessableEntity(?). This would leave an encrypted object without x-object-sysmeta-etag in the system, which should be ignored during a GET/HEAD. Eventual consistency should be added to resolve this issue (updating the auditors to discard encrypted objects without x-object-sysmeta-etag).
Otherwise (no client etag, or the etags match), performing a POST to store the de-etag under x-object-sysmeta-etag. If the proxy fails after the PUT and before the POST succeeds, the eventual consistency discussed above would resolve the issue.
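
A minimal sketch of the PUT-side tasks, under stated assumptions: encrypting_put and toy_encrypt are hypothetical names, the XOR transform is a placeholder that is not a real cipher, and the returned dictionary models the object's metadata as it would end up in the store (the etag field written by the object server, plus the sysmeta written by the follow-up POST).

```python
import hashlib


def toy_encrypt(key: bytes, chunk: bytes) -> bytes:
    """Placeholder transform (repeating-key XOR). NOT a real cipher."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(chunk))


def encrypting_put(chunks, objkey, client_etag=None, encrypt=toy_encrypt):
    """Return (status, stored_metadata) for an encrypted PUT."""
    de_md5 = hashlib.md5()  # computed by the middleware over the plaintext
    en_md5 = hashlib.md5()  # what the object server computes over the ciphertext
    for chunk in chunks:
        de_md5.update(chunk)
        en_md5.update(encrypt(objkey, chunk))
    stored = {"ETag": en_md5.hexdigest()}
    de_etag = de_md5.hexdigest()
    if client_etag is not None and client_etag != de_etag:
        # The object is left without x-object-sysmeta-etag, to be cleaned up later.
        return 422, stored
    # Models the follow-up POST that records the de-etag.
    stored["x-object-sysmeta-etag"] = de_etag
    return 201, stored


status, meta = encrypting_put([b"hello ", b"world"], objkey=b"k")
```

The client-visible etag (de-etag) and the auditor-visible etag (en-etag) differ, which is why the client's etag header has to be removed before the request reaches the object server.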

During a HEAD, the middleware would send the x-object-sysmeta-etag as the etag (or would indicate that the object does not exist if the x-object-sysmeta-etag is missing).

Keys

As discussed above, the data and user metadata of each object are encrypted with the object's own encryption key, named here objkey. The objkey is randomly created during an object PUT and is stored in encrypted form as part of the object system metadata under x-object-sysmeta-key-XXX, where XXX is base64(hash('AUTH_myaccount/mycontainer')). This unusual key name structure was chosen in order to support COPY operations in Swift. During a copy operation the object may move between accounts and containers and, as a result, the objkey would need to be re-encrypted. A copied object must be accessible via two separate paths, possibly under two separate master keys, and the same object may be copied repeatedly. To avoid both the eventual consistency issues that could arise from updating the key of an object after it was copied and the need to introduce changes into Swift, it is suggested to use the above key name, thus allowing each path to maintain its own copy of the objkey in the object system metadata. See COPY below for more details.
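
As a rough illustration of the key name structure, the sketch below derives the sysmeta header name from the account and container. The function objkey_header_name is hypothetical. Two choices here are assumptions rather than part of the design: plain MD5 is used in place of the Swift system hash, and the URL-safe base64 alphabet without padding is used so that the result avoids '/', '+' and '=' in a header name.

```python
import base64
import hashlib

KEY_PREFIX = "x-object-sysmeta-key-"


def objkey_header_name(account: str, container: str) -> str:
    """Build x-object-sysmeta-key-XXX for the given account/container path."""
    path = "%s/%s" % (account, container)
    digest = hashlib.md5(path.encode("utf-8")).digest()  # assumption: plain MD5
    token = base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
    return KEY_PREFIX + token


src = objkey_header_name("AUTH_myaccount", "mycontainer")
dst = objkey_header_name("AUTH_otheraccount", "backups")

# After a COPY, both entries would coexist in the object's system metadata,
# each holding the same objkey encrypted under a different container key.
sysmeta = {src: "<objkey under source container key>",
           dst: "<objkey under destination container key>"}
```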

Note that the objkey is never changed and the object is never re-encrypted during COPY operations or during master key changes. The objkey is stored in encrypted form as the value of the object sysmeta key x-object-sysmeta-key-XXX. The objkey is encrypted with the key of the container. The container key would be cached using memcache to enable rapid access to multiple objects of the same container at minimal overhead.

The container key is randomly chosen when the container is created and is stored in the container system metadata field x-container-sysmeta-key after being encrypted by the account key. The account key would be cached using memcache to enable rapid access to multiple containers of the same account at minimal overhead. The container key never changes and is never re-encrypted, even during master key changes.

The account key is randomly chosen when the account is created and is stored in the account system metadata field x-account-sysmeta-key after being encrypted by the master key of the account owner. The master key is retrieved from a key manager to decrypt the account key and is never cached. The account key never changes, but it is re-encrypted during master key changes.

The account master key is stored by Barbican or an alternative key manager.
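
The key hierarchy described in this section can be sketched as a chain of wrapped keys. Everything named here is hypothetical: toy_wrap is an XOR with a SHA-256 digest of the wrapping key, used only to keep the example self-contained and NOT a secure key-wrapping scheme, and the sysmeta values are kept as raw bytes for brevity.

```python
import hashlib
import os


def toy_wrap(wrapping_key: bytes, key: bytes) -> bytes:
    """Toy stand-in for enc(wrapping_key, key); works for keys up to 32 bytes."""
    stream = hashlib.sha256(wrapping_key).digest()
    return bytes(a ^ b for a, b in zip(key, stream))


toy_unwrap = toy_wrap  # XOR is its own inverse

master_key = os.urandom(32)     # held by the key manager, never cached
account_key = os.urandom(32)    # chosen at account creation
container_key = os.urandom(32)  # chosen at container creation
objkey = os.urandom(32)         # chosen at object PUT

account_sysmeta = {"x-account-sysmeta-key": toy_wrap(master_key, account_key)}
container_sysmeta = {"x-container-sysmeta-key": toy_wrap(account_key, container_key)}
object_sysmeta = {"x-object-sysmeta-key-XXX": toy_wrap(container_key, objkey)}


def recover_objkey(master, acct_meta, cont_meta, obj_meta):
    """Walk the chain master -> account -> container -> object."""
    acct = toy_unwrap(master, acct_meta["x-account-sysmeta-key"])
    cont = toy_unwrap(acct, cont_meta["x-container-sysmeta-key"])
    return toy_unwrap(cont, obj_meta["x-object-sysmeta-key-XXX"])


def rotate_master_key(old_master, new_master, acct_meta):
    """A master key change re-wraps only the account key."""
    acct = toy_unwrap(old_master, acct_meta["x-account-sysmeta-key"])
    return {"x-account-sysmeta-key": toy_wrap(new_master, acct)}


new_master = os.urandom(32)
rotated_account_sysmeta = rotate_master_key(master_key, new_master, account_sysmeta)
```

Because only the account entry is rewritten on a master key change, the container and object metadata (and the object data) stay untouched.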

Consistency and Signatures

When decrypting objects, we perform a consistency check to make sure that we are using the right key and that internal checksums match. TODO: add a proper encrypt-and-mac to prevent any modification of the objects.

Blob Encryption

The M2Crypto library is used for the crypto operations.

API Implementation

During a PUT account or container the middleware:

Choose a random key and set the sysmeta to store the key

During a PUT object the middleware:

Look at the sysmeta of the container: if x-container-sysmeta-enc-enabled is missing or False, perform a regular Swift core PUT
Otherwise: choose a random key, encrypt it with the container key and set x-object-sysmeta-key-XXX to the value of the encrypted objkey; remove all user metadata and insert an encrypted version of the user metadata instead; per chunk: (1) calculate the de-etag, (2) encrypt the data with the objkey, (3) send the chunk to Swift core. After the last chunk, if Swift core is successful, compare the de-etag to the client-provided etag and fail the request if they do not match; otherwise update x-object-sysmeta-etag with the de-etag
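
The two decisions involved (whether to encrypt on PUT, and whether a stored object is encrypted) can be sketched as simple header checks. The helper names and the set of accepted "true" strings are assumptions for illustration; the design only states that a missing or False flag means a regular PUT, and that unencrypted objects carry no x-object-sysmeta-key-* metadata.

```python
TRUE_STRINGS = {"true", "1", "yes", "on"}  # assumption: accepted spellings of True


def should_encrypt_on_put(container_sysmeta) -> bool:
    """True only if the container is marked for encryption."""
    for name, value in container_sysmeta.items():
        if name.lower() == "x-container-sysmeta-enc-enabled":
            return str(value).strip().lower() in TRUE_STRINGS
    return False  # flag missing: regular Swift core PUT


def is_encrypted_object(object_headers) -> bool:
    """True if the object carries at least one x-object-sysmeta-key-* entry."""
    return any(name.lower().startswith("x-object-sysmeta-key-")
               for name in object_headers)
```

Checking the object rather than the container on GET is what lets encrypted objects be decrypted even if the container is no longer marked for encryption.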

During a HEAD object the middleware:

Head the object, decrypt the objkey
Remove all user metadata and insert decrypted version of the user metadata instead
Replace the etag with the etag from x-object-sysmeta-etag
?check consistency?
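
The HEAD steps above can be sketched as a header rewrite. The names rewrite_head_headers and null_decrypt are hypothetical; the null cipher matches the user metadata example earlier on this page. Returning None models "object does not exist" when x-object-sysmeta-etag is missing, and dropping the sysmeta headers from the client response is an assumption of this sketch.

```python
import base64

META_PREFIX = "x-object-meta-"
SYSMETA_PREFIX = "x-object-sysmeta-"


def null_decrypt(key: bytes, data: bytes) -> bytes:
    """Placeholder for the real cipher: returns the data unchanged."""
    return data


def rewrite_head_headers(stored, objkey, decrypt=null_decrypt):
    """Return client-facing headers, or None if the object should look absent."""
    de_etag = next((v for k, v in stored.items()
                    if k.lower() == "x-object-sysmeta-etag"), None)
    if de_etag is None:
        return None
    out = {}
    for name, value in stored.items():
        lname = name.lower()
        if lname.startswith(SYSMETA_PREFIX):
            continue  # assumption: sysmeta is not exposed to the client
        if lname == "etag":
            out[name] = de_etag  # replace the en-etag with the de-etag
        elif lname.startswith(META_PREFIX):
            raw_name = base64.b64decode(name[len(META_PREFIX):])
            raw_value = base64.b64decode(value)
            out[META_PREFIX + decrypt(objkey, raw_name).decode("utf-8")] = \
                decrypt(objkey, raw_value).decode("utf-8")
        else:
            out[name] = value
    return out


stored_headers = {
    "ETag": "en-etag-value",
    "x-object-sysmeta-etag": "de-etag-value",
    "x-object-sysmeta-key-abc": "wrapped-objkey",
    "x-object-meta-Y29uZmlkZW50aWFsaXR5": "VG9wLVNlY3JldA==",
}
client_headers = rewrite_head_headers(stored_headers, objkey=b"unused")
```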

During a GET object the middleware:

GET the object, decrypt the objkey
Remove all user metadata and insert decrypted version of the user metadata instead
Replace the etag with the etag from x-object-sysmeta-etag
Per chunk, decrypt the chunk and send to client
?check consistency?

During a POST object the middleware:

Head the object, decrypt the objkey
Remove all user metadata and insert encrypted version of the user metadata instead
?check consistency?

During a COPY the middleware:

Decrypt the objkey based on the source container key
Encrypt the objkey with the destination container key
Store the newly encrypted objkey under the appropriate x-object-sysmeta-key-XXX (using the new path of the destination container)
?check consistency?

Core Swift would perform a regular COPY operation and the results would be sent back to the client
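
The COPY steps can be sketched as a re-wrap of the objkey that leaves the object data untouched. The names copy_rewrap and toy_wrap are hypothetical, the header names in the usage lines are placeholders for the real x-object-sysmeta-key-XXX names, and the XOR-based wrap is a toy stand-in, NOT a secure cipher.

```python
import hashlib
import os


def toy_wrap(wrapping_key: bytes, key: bytes) -> bytes:
    """Toy stand-in for enc(wrapping_key, key); works for keys up to 32 bytes."""
    stream = hashlib.sha256(wrapping_key).digest()
    return bytes(a ^ b for a, b in zip(key, stream))


toy_unwrap = toy_wrap  # XOR is its own inverse


def copy_rewrap(sysmeta, src_name, dst_name, src_container_key, dst_container_key):
    """Add a destination-path entry holding the same objkey; keep the source entry."""
    objkey = toy_unwrap(src_container_key, sysmeta[src_name])
    updated = dict(sysmeta)
    updated[dst_name] = toy_wrap(dst_container_key, objkey)
    return updated


objkey = os.urandom(32)
src_key, dst_key = os.urandom(32), os.urandom(32)
before = {"x-object-sysmeta-key-SRC": toy_wrap(src_key, objkey)}
after = copy_rewrap(before, "x-object-sysmeta-key-SRC", "x-object-sysmeta-key-DST",
                    src_key, dst_key)
```

The objkey itself never changes, so the encrypted object data can be copied by core Swift as-is.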

Unresolved issues

Each PUT REST request is translated by the middleware into two separate internal REST calls: a PUT, followed by a POST that updates the de-etag and the consistency signature. If the cluster is configured with copy on POST enabled, this results in a copy of each encrypted object. If many objects are encrypted, this would affect cluster performance, as it would triple the I/O used during each encrypted PUT operation. Short-term solution: do not use both encryption and copy on POST on the same system if the system is required to encrypt many of its objects. Long-term solution: remove the need to do copy on POST by fixing Swift.
