Bare-metal-trust

Problem description
While hypervisors and virtual machines are a common paradigm in the cloud, heavy compute users seek bare metal (BM) to eliminate the performance overhead incurred by virtualization. But how can the cloud provider determine that a BM node released by a user is free of malware? After all, a BM user has full access to the machine and could have introduced rootkits or other malware.

Proposed solution
Our solution mimics the way one downloads software, computes its SHA-256 hash, and compares the result against the advertised SHA-256 hash to determine its legitimacy. It uses Intel TXT, which is composed of hardware, software, and firmware. The hardware component attached to the platform, the Trusted Platform Module (TPM)[3], provides the hardware root of trust. Firmware on the TPM computes secure hashes and saves them to a set of registers called Platform Configuration Registers (PCRs), with different registers holding different measurements. The other components are Intel virtualization technology, signed code modules, and a trusted boot loader called TBOOT[1]. The BIOS, option ROMs, and kernel/ramdisk are all measured into the various PCRs. From a bare metal trust standpoint, we are interested in PCRs 0-7 (BIOS, option ROM). The kernel/ramdisk measurements depend on the image the tenant launches on the bare metal instance. PCR values are checked by an Open Attestation service, OAT[2]. Additional details are in the references.
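
How measurements accumulate in a PCR can be sketched in a few lines of shell. The sketch assumes a TPM 1.2 style extend operation, in which the new register value is the SHA-1 hash of the old value concatenated with the new measurement. The function names `hex_to_bin` and `pcr_extend` are hypothetical and are not part of TBOOT or OAT; a real TPM performs this step in hardware.

```shell
#!/bin/bash
# Illustrative only: emulates a TPM 1.2 style PCR extend in software.
# hex_to_bin and pcr_extend are hypothetical helper names.

# Write the bytes encoded by a hex string to stdout.
hex_to_bin() {
    local hex=$1 i
    for ((i = 0; i < ${#hex}; i += 2)); do
        printf "\\x${hex:i:2}"
    done
}

# pcr_extend OLD_HEX MEASUREMENT_HEX
# Prints SHA-1(old value || measurement) as lowercase hex.
pcr_extend() {
    hex_to_bin "$1$2" | sha1sum | cut -d' ' -f1
}
```

Starting from an all-zero register, extending with measurement A and then B yields a different value than extending with B and then A, so a PCR reflects both what was measured and the order in which it was measured.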

We integrate Ironic with Intel TXT so that any change in the BIOS, PCIe device firmware, or kernel/ramdisk from the expected values is detected on boot. The bare metal node is trusted only when there is an exact match. After a legitimate update, for example a BIOS firmware upgrade, the whitelist must be updated to include the new expected measurements. For increased cloud security, it is good practice to maintain the whitelist by clearing out old values once all machines have been upgraded.
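
The exact-match rule can be sketched as follows. The file format (one `index:digest` entry per line) and the function name `pcrs_trusted` are assumptions made for illustration; they do not reflect how OAT stores or compares its whitelist.

```shell
#!/bin/bash
# Hypothetical helper: succeed only when every measured PCR entry appears
# verbatim in the whitelist. Both files hold one "index:hex_digest" entry
# per line (an assumed format, not OAT's).
pcrs_trusted() {
    local measured_file=$1 whitelist_file=$2 entry
    # An empty or missing measurement set proves nothing: treat as untrusted.
    [ -s "$measured_file" ] || return 1
    while IFS= read -r entry || [ -n "$entry" ]; do
        [ -n "$entry" ] || continue
        # -x: whole-line match, -F: fixed string, -q: no output.
        grep -qxF -- "$entry" "$whitelist_file" || return 1
    done < "$measured_file"
    return 0
}
```

Because the whitelist may list several accepted digests for the same PCR index, old and new BIOS measurements can coexist during a rolling upgrade, and the old entries can be removed afterwards.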

Result:

An OAT client running in the customized image verifies the trust state and passes the value to Ironic. A related Horizon blueprint addresses displaying the trust status of a bare metal node on boot. When an administrator introduces new hardware into the cloud and the machine fails attestation on boot, the failure indicates that the whitelist may need to be updated. This can happen for one of many reasons, such as new hardware that differs from the existing machines in the data center or enterprise, a BIOS upgrade, or newly attached PCIe devices.

Typical use cases of bare metal trust
Here is a use case for bare metal trust in which each node is measured when it is released:

1. Manual work to prepare trusted boot (enable TXT, VT-x, and VT-d, and take ownership of the TPM)
2. Enable trusted boot for Ironic
   - Create a customized user image with ``oat-client`` installed
   - Enroll a node and update its capability value with `trusted_boot`=`true`
   - Create a special flavor with 'capabilities:trusted_boot'=true
   - Prepare `tboot` and mboot.c32 and put them into tftp_root
3. Set up an OAT-Server https://github.com/OpenAttestation/OpenAttestation/wiki/
4. Deploy on the node, using trusted boot to measure it
   (Boot a customized user image with OAT-Client using trusted boot)
   (Pass the OAT server URL for attestation)
5. Attestation (poll the result from the OAT-Server)
   If trusted: the node is available; continue
6. Re-deploy
   Boot the tenant user image
7. Tenant releases node
   Jump to 4
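
The node and flavor settings in step 2 can be sketched as a small wrapper. It assumes the legacy `ironic` and `nova` command-line clients are installed and authenticated; the function name `enable_trusted_boot` and its arguments are hypothetical.

```shell
#!/bin/bash
# Sketch of the capability settings in step 2, assuming the legacy ironic
# and nova CLI clients. enable_trusted_boot is a hypothetical wrapper name.
enable_trusted_boot() {
    local node_uuid=$1 flavor=$2
    # Advertise the capability on the enrolled node.
    ironic node-update "$node_uuid" add properties/capabilities='trusted_boot:true' || return 1
    # Require the capability in the flavor so that only such nodes are chosen.
    nova flavor-key "$flavor" set capabilities:trusted_boot=true
}
```

The `capabilities` property is a single comma-separated string, so a node that already carries other capabilities needs them repeated in the same value (for example `'boot_option:netboot,trusted_boot:true'`).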

Trust Script Example
Each node has to register itself with the OAT-Server and wait for attestation. This can be done by booting the instance with user-data. Here is a sample user-data script:

#!/bin/bash -v
#Load the TPM kernel modules
modprobe tpm_tis
modprobe tpm

#OAT-Server's info
OAT_SERVER_IP=10.239.48.127
OAT_SERVER_USER=admin
OAT_SERVER_PASSWD=p2

#Node's IP
DEV=eth1
dhclient $DEV
#Strip the /prefix length from the CIDR address reported by ip
OAT_CLIENT_IP=$(ip addr show dev $DEV | grep ' inet ' | awk '{print $2}' | cut -d/ -f1)
OAT_CLIENT_HOST=$OAT_CLIENT_IP

#Node's info
bios=NewMLE1
bios_ver=v123
oem_manu=OEM1
vmm=NewMLE2
vmm_ver=v123
os=OS1
os_ver=v1

#Init OAT-Client
iptables -A INPUT -p tcp --dport 8181 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 9999 -j ACCEPT
iptables -A INPUT -p tcp --dport 9998 -j ACCEPT
/usr/bin/tagent setup <<EOF
$OAT_SERVER_IP \
$OAT_SERVER_USER
$OAT_SERVER_PASSWD
EOF

#register a new host
oat_cert -h $OAT_SERVER_IP
oat_host -a -h $OAT_SERVER_IP "{\"HostName\":\"$OAT_CLIENT_HOST\",\"IPAddress\":\"$OAT_CLIENT_IP\",\"Port\":\"9999\",\"BIOS_Name\":\"$bios\",\"BIOS_Version\":\"$bios_ver\",\"BIOS_Oem\":\"$oem_manu\",\"VMM_Name\":\"$vmm\",\"VMM_Version\":\"$vmm_ver\",\"VMM_OSName\":\"$os\",\"VMM_OSVersion\":\"$os_ver\",\"Email\":\"\",\"AddOn_Connection_String\":\"\",\"Description\":\"\"}"

Verifying the result manually
Users can install the oat-command tool and use it to query the trust state of the node:

oat_cert -h $OAT_SERVER_IP
oat_pollhosts -h $OAT_SERVER_IP "{\"hosts\":[\"$OAT_CLIENT_IP\"]}"
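
The polling in step 5 of the workflow could be automated along the following lines. This is a sketch under stated assumptions: the function name `wait_until_trusted` is hypothetical, and the check assumes that the `oat_pollhosts` reply is JSON containing a `"trust_lvl"` field whose value is `"trusted"` for an attested host. Verify the actual reply format of the OAT-Server in use before relying on it.

```shell
#!/bin/bash
# Sketch of the attestation poll. wait_until_trusted is a hypothetical name,
# and the "trust_lvl" field is an assumed response format.
# Usage: wait_until_trusted SERVER_IP HOST [ATTEMPTS] [DELAY_SECONDS]
wait_until_trusted() {
    local server=$1 host=$2 attempts=${3:-10} delay=${4:-30} i reply
    for ((i = 0; i < attempts; i++)); do
        reply=$(oat_pollhosts -h "$server" "{\"hosts\":[\"$host\"]}")
        # Match only the exact level "trusted", not "untrusted" or "unknown".
        if printf '%s' "$reply" | grep -q '"trust_lvl"[[:space:]]*:[[:space:]]*"trusted"'; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}
```

A caller would treat a zero exit status as permission to continue with the re-deploy step and anything else as a failed attestation.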

Limitations and Future Work
1) A clean task that rebuilds the node with trusted boot was proposed previously. Investigation showed that this is much more complicated in Ironic's current framework, because the node has to leave the clean stage before it can be rebuilt. For now, a tool that sits on top of Ironic is a better solution: it can log the trust state and trigger Ironic for further steps.

2) Hot add/remove of PCIe devices is not supported. A machine reboot is necessary to obtain fresh measurements and re-attest.

3) Blue Pill: When hardware virtualization support is enabled without a hypervisor running, the machine is open to a Blue Pill style attack. Unfortunately, OEM vendors do not provide the capability to disable Intel TXT or its equivalents, and those that tried to support it insisted on establishing physical presence. The reasoning is that such a dynamic control feature could itself pose a vulnerability. Our goal is unaffected by the Blue Pill aspect because we are concerned strictly with establishing that a machine is clean, free of rootkits and malware, before it is handed off to a tenant. Should the tenant choose to install a rootkit, that is within their purview. Once the machine is released back to the cloud, our workflow will detect and erase it.

That said, no claims of trust or re-attestation of a bare metal machine are possible after boot time. The Horizon dashboard will not provide a re-attest action; it will display the node boot time and the trust status at boot time.