Difference between revisions of "Bare-metal-trust"

Revision as of 10:27, 3 August 2015

Problem description

While hypervisors and virtual machines are a common paradigm in the Cloud, heavy compute users seek Bare Metal (BM) to eliminate the performance overhead incurred by virtualization. But how can the cloud provider determine that a user-released BM node is free of malware? After all, a BM user has full access to the machine and could have introduced rootkits and other malware.

Proposed solution

Our solution essentially mimics how one may download software, compute its SHA-256 hash, and compare it against the advertised SHA-256 hash to determine its legitimacy. It uses Intel TXT, which is composed of hardware, software, and firmware. The hardware component attached to the platform, the Trusted Platform Module (TPM)[5], provides the hardware root of trust. Firmware on the TPM is used to compute secure hashes and save them to a set of registers called Platform Configuration Registers (PCRs), with different registers holding different measurements. The other components are Intel virtualization technology, signed code modules, and a trusted boot loader called TBOOT[1]. Essentially, the BIOS, option ROMs, and kernel/Ramdisk are all measured into the various PCRs. From a bare metal trust standpoint, we are interested in PCRs 0-7 (BIOS, option ROM). The kernel/Ramdisk measurements depend on the image the tenant seeks to launch on their bare metal instance. PCR values are tested by an Open Attestation service, OAT[2]. Additional details are in the references.
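The way a chain of measurements ends up in a single PCR can be illustrated with a minimal sketch. It assumes TPM 1.2-style PCRs (20-byte SHA-1 values, starting at all zeros) and the standard extend rule, new PCR = hash(old PCR || measurement). The component byte strings are made up for illustration; real measurements are taken by the platform, not by Python code.

```python
import hashlib

PCR_SIZE = 20  # bytes; assumes SHA-1 PCRs as on a TPM 1.2


def extend(pcr_value: bytes, component: bytes) -> bytes:
    """Return the PCR value after measuring `component` into it."""
    measurement = hashlib.sha1(component).digest()
    # A PCR can never be set directly, only extended with a new digest.
    return hashlib.sha1(pcr_value + measurement).digest()


def measure_chain(components) -> bytes:
    """Fold an ordered boot chain into one PCR value, starting from zeros."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


# Hypothetical firmware blobs, for illustration only.
good = measure_chain([b"bios-v1", b"option-rom-v1"])
tampered = measure_chain([b"bios-v1", b"option-rom-v1-with-rootkit"])
reordered = measure_chain([b"option-rom-v1", b"bios-v1"])

print(good.hex())
print(good == tampered, good == reordered)
```

Because each value depends on everything extended before it, changing any component or the order of the chain yields a different final PCR value, which is what makes an exact-match comparison meaningful.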

We integrate Ironic with Intel TXT to detect, on boot, any change in the BIOS, PCIe device firmware, and/or kernel/Ramdisk from expected values. A bare metal node is trusted only when there is an exact match. After a legitimate update, for example a BIOS firmware upgrade, the whitelist must be updated to include the new expected measurements. For increased cloud security, it is good practice to maintain the whitelist, clearing out old values after all machines are upgraded/updated.
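A minimal sketch of the exact-match rule and of whitelist maintenance follows. It is not the OAT implementation: the `Whitelist` class, its method names, and the representation of a measurement as a dict of PCR index to hex digest are all assumptions made for illustration.

```python
BIOS_PCRS = tuple(range(8))  # PCRs 0-7, the ones relevant to bare metal trust


def _normalize(pcrs):
    """Use int indices and lower-case hex so comparisons are exact."""
    return {int(i): str(v).lower() for i, v in pcrs.items()}


def _matches(entry, measured):
    """True only if every one of PCRs 0-7 is present and identical."""
    return all(i in entry and entry[i] == measured.get(i) for i in BIOS_PCRS)


class Whitelist:
    """Hypothetical store of known-good PCR sets."""

    def __init__(self):
        self._entries = []

    def add(self, pcrs):
        entry = _normalize(pcrs)
        if entry not in self._entries:
            self._entries.append(entry)

    def is_trusted(self, measured):
        measured = _normalize(measured)
        return any(_matches(entry, measured) for entry in self._entries)

    def prune(self, fleet_measurements):
        """Drop entries that no machine in the fleet reports any more."""
        fleet = [_normalize(m) for m in fleet_measurements]
        self._entries = [
            e for e in self._entries if any(_matches(e, m) for m in fleet)
        ]


# Illustrative digests only; real values come from the TPM.
old_bios = {i: "aa" * 20 for i in BIOS_PCRS}
new_bios = {**old_bios, 0: "bb" * 20}  # PCR 0 changes after a BIOS upgrade

whitelist = Whitelist()
whitelist.add(old_bios)
print(whitelist.is_trusted(new_bios))  # upgraded node fails until whitelisted
whitelist.add(new_bios)
whitelist.prune([new_bios])            # all machines upgraded: retire old value
print(whitelist.is_trusted(old_bios), whitelist.is_trusted(new_bios))
```

Pruning after a fleet-wide upgrade means a node that is rolled back to the old firmware no longer attests as trusted.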

Result:

An OAT client using the customized image verifies the trust state and passes the value to Ironic. The related Horizon blueprint [12] addresses displaying the trust status of a bare metal node on boot. If a machine fails attestation on boot after new hardware is introduced into the cloud, the administrator can confirm whether the whitelist needs to be updated. A failure could happen for one of many reasons, such as new hardware distinct from existing machines in the data center/enterprise, a BIOS upgrade, or new PCIe device attachments.

Work flow

The steps below run from node enrollment through allocation and release. Note that BIOS re-flash and automatic enabling of TXT will not be available in this cycle.

1. Manual work to prepare trusted boot (Outside of Ironic)

           (enable TXT, VT-x, VT-d, take ownership of the TPM)

2. Start trusted boot

           (Boot a customized image with OAT-Client using trusted boot)
           (Passing the OAT server URL for attestation)

3. Attestation (Outside of Ironic)

           Node sends its PCR values to the OAT-Server for attestation
           PCRs(0-7) BIOS/Option ROM related
           PCRs(17-22) kernel/Ramdisk related
           For bare metal trust we are chiefly concerned with PCRs(0-7)

4. Ironic polls the result from the OAT-Server as part of the cleaning task

           If trusted:
           Node is made available; jump to 5

5. Deploying

           Boot guest image using trusted boot

6. Tenant releases node

7. Jump to 2
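The loop formed by steps 2 through 7 can be sketched as a small state machine. This is an illustration only, not Ironic code: the state names, the dict-based node, and the `attest` callable (standing in for polling the OAT-Server) are hypothetical. The `QUARANTINED` state is also an assumption, used here to model the rule that an untrusted node is not allocated to any tenant.

```python
from enum import Enum


class State(Enum):
    ENROLLED = "enrolled"        # after manual preparation (step 1)
    CLEANING = "cleaning"        # trusted boot and attestation (steps 2-4)
    AVAILABLE = "available"      # attested as trusted, may be allocated
    ACTIVE = "active"            # deployed to a tenant (step 5)
    QUARANTINED = "quarantined"  # hypothetical: failed attestation


def clean(node, attest):
    """Steps 2-4: trusted-boot the node and act on the attestation result."""
    node["state"] = State.CLEANING
    node["state"] = State.AVAILABLE if attest(node) else State.QUARANTINED
    return node["state"]


def deploy(node):
    """Step 5: only a node that passed attestation may be handed out."""
    if node["state"] is not State.AVAILABLE:
        raise RuntimeError("node %s is not available" % node["name"])
    node["state"] = State.ACTIVE


def release(node, attest):
    """Steps 6-7: the tenant releases the node, which re-enters step 2."""
    if node["state"] is not State.ACTIVE:
        raise RuntimeError("node %s is not active" % node["name"])
    return clean(node, attest)


node = {"name": "bm-1", "state": State.ENROLLED}
clean(node, attest=lambda n: True)             # first attestation passes
deploy(node)
print(release(node, attest=lambda n: False))   # tenant left the node modified
```

Routing every release back through `clean` is the point of step 7: a node never reaches the available pool without a fresh attestation.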

How this works

The solution includes two parts, measure and verify:

1.0 MEASURE

1.1 Enable TXT in BIOS (Outside of Ironic, Workflow step 1): This is a prerequisite. At least three reboots are needed to enable the TPM and TXT. For now this is done manually, as it is OEM vendor-specific. Scripts may become available at a later date to handle this for some OEMs/hardware vendors.

1.2 Using Trusted Boot (Workflow step 2): Leverage TBOOT to generate the PCR values during trusted boot. Platform Configuration Registers (PCRs) are protected registers provided by the TPM to store measurements.

2.0 VERIFY (Outside of Ironic, Workflow step 3)

2.1 Create customized images: Create a customized image with the OAT-Client installed. OpenAttestation (OAT) is a remote attestation solution consisting of several OAT-Clients and one OAT-Server. An OAT-Client must be installed on each bare metal node to interact with the TPM. The OAT-Server holds a whitelist against which it verifies the PCR values sent by the OAT-Client.

2.2 Register Nodes: This action happens each time a node becomes active. To pass PCR values to the OAT-Server securely, the OAT-Client has to download the OAT-Server's certificates and register itself in the OAT-Server's DB. The OAT-Client registers the node with the OAT-Server against one of the known good values in the whitelist, chosen according to its hardware info. Then the OAT-Client will compare the node's real PCR values with the whitelist value and return the result.

Limitations and Future Work

1) Hot Add/Remove of PCIe devices is not supported. A machine reboot is necessary to obtain fresh measurements and re-attest.

2) On detecting that a bare metal instance does not meet trust measurements, we shall log it. In the future, we shall introduce a Ceilometer alert event to be consumed by the cloud administrator, who may need to follow up with the prior tenant. Additionally, there is a cleaning step that will address reflashing the BIOS and Option ROMs. Until reflashed, the untrusted bare metal node will not be allocated to any tenant.

3) Blue Pill: When hardware virtualization support is enabled without a hypervisor running, the machine is open to a Blue Pill style attack. Unfortunately, OEM vendors do not provide the capability to disable Intel TXT or its equivalents, and those that tried to support it insisted on establishing physical presence. The reasoning is that the dynamic control feature itself could pose a vulnerability. Our goal is unaffected by the Blue Pill aspect because we are concerned strictly with establishing that a machine is clean, free of rootkits and malware, prior to hand-off to a tenant. Should the tenant seek to install a rootkit, that is in their purview. Once the machine is released back to the cloud, our workflow will detect and erase it.

That said, no claims of trust or re-attestation of a bare metal machine are possible after boot time. The Horizon dashboard will not provide a re-attest action; it will display the node boot time and the trust status at boot time.