 
__NOTOC__
 
 
 
 
<<[[TableOfContents]]()>>
 
/!\ If you are looking for the old page, [[GeneralBareMetalProvisioningFramework/Historical|it has been moved here]].

== Overview ==

Baremetal is a driver for OpenStack Nova Compute which controls physical hardware instead of virtual machines. This hardware, often called "baremetal nodes", is exposed through OpenStack's API and in many ways behaves just like any other compute instance. Provisioning and management of physical hardware can therefore be accomplished with common cloud tools, which opens the door to orchestrating physical deployments using Heat, salt-cloud, and so on. The target release for this work is Grizzly; USC/ISI, NTT docomo, and HP are working on the driver.

The current implementation provides:
 
* Deploy machine images onto bare metal using PXE and iSCSI
* Control machine power using IPMI
* Support common architectures (x86_64, i686)
 
  
 
Future plans include:

* Improve performance/scalability of the PXE deployment process
* Better support for complex network environments (VLANs, etc.)
* Support snapshot and migration of baremetal instances
* Support non-PXE image deployment
* Support other architectures (arm, tilepro)
* Support fault-tolerance of the baremetal nova-compute node
  
== Key Differences ==

There are several key differences between the baremetal driver and other hypervisor drivers (kvm, xen, etc.):

* There is no hypervisor running underneath the cloud instances, so the tenant has full and direct access to the hardware, and that hardware is dedicated to a single instance.
* Nova has no access to manipulate a baremetal instance except for what is provided at the hardware level and exposed over the network, such as IPMI control. Therefore, some functionality implemented by other hypervisor drivers is not available via the baremetal driver, such as instance snapshots, attaching and detaching network volumes to a running instance, and so on.
* There are additional security concerns created by tenants having direct access to the network (e.g., MAC spoofing, packet sniffing).
** Other hypervisors mitigate this with virtualized networking.
** Quantum + [[OpenFlow]] can be used to much the same effect, if your network hardware supports it.
* Public cloud images may not work on some hardware, particularly if your hardware requires additional drivers to be loaded.
* The PXE sub-driver requires a specialized deployment ramdisk, which is distinct from the cloud image's ramdisk.
  
== Extra services ==

At a minimum, Keystone, Nova, Glance, and Quantum must be up and running. The following additional services are currently required for baremetal deployment, though work is underway to simplify things by removing them:

* ''dnsmasq''
** This must run on the nova-compute host, and quantum-dhcp must not be answering on the same network.
* ''nova-baremetal-deploy-helper''
** This service must run on the nova-compute host to assist with image deployment.

Nova must be configured for the baremetal driver by adding options to the <code><nowiki>[baremetal]</nowiki></code> section of nova.conf, as sketched below. A separate database is also used to store hardware information; it can live on the same database host as Nova's database or on a separate host.
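As an illustration only, a minimal nova.conf sketch might look like the following. The option names inside the <code><nowiki>[baremetal]</nowiki></code> group are assumptions based on the flag names used in the examples later on this page (with the <code>baremetal_</code> prefix dropped); check the sample nova.conf for your release for the authoritative names and defaults.

<pre><nowiki>
# Minimal sketch only -- verify option names against your release's sample nova.conf.
[DEFAULT]
compute_driver = nova.virt.baremetal.driver.BareMetalDriver
scheduler_host_manager = nova.scheduler.baremetal_host_manager.BaremetalHostManager

[baremetal]
# separate database used for hardware information
sql_connection = mysql://$ID:$Password@$IP/nova_bm
# PXE sub-driver with IPMI power management
driver = nova.virt.baremetal.pxe.PXE
power_manager = nova.virt.baremetal.ipmi.Ipmi
tftp_root = /tftpboot
instance_type_extra_specs = cpu_arch:x86_64
# Glance IDs of the deployment kernel/ramdisk (see "Ramdisk for Deployment" below)
deploy_kernel = xxxxxxxxxx
deploy_ramdisk = yyyyyyyy
</nowiki></pre>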
You must also inform the baremetal driver of your hardware's physical characteristics:

* the number of CPUs, and the amount of RAM and disk
* the MAC address of every network interface
* optionally, power management login information

This can be done via a Nova API extension or by writing directly to the baremetal database; a sketch follows, and the "Register Baremetal Node and NIC" section later on this page covers it in detail.
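For example, registering one node and one of its NICs from the command line might look like this (placeholder values; the full parameter list and more examples are in the "Register Baremetal Node and NIC" section below):

<pre><nowiki>
# Placeholder values -- see "Register Baremetal Node and NIC" below for all parameters.
$ nova-bm-manage node create --host=bm1 --cpus=4 --memory_mb=6144 --local_gb=64 --pm_address=172.27.2.116 --pm_user=test --pm_password=password --prov_mac_address=98:4b:e1:11:22:33 --terminal_port=0
$ nova-bm-manage interface create --node_id=1 --mac_address=98:4b:e1:11:22:34 --datapath_id=0 --port_no=0
</nowiki></pre>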
  
== Use-cases ==

Here are a few ideas about potential use-cases for the baremetal driver. This isn't an exhaustive list -- there are doubtless many more interesting things it can do!

* High-performance computing clusters.
* Computing tasks that require access to hardware devices which can't be virtualized.
* Database hosting (some databases run poorly in a hypervisor).
* Or, rapidly deploying a cloud infrastructure, as described next.
  
We (the tripleo team) have a vision that OpenStack can be used to deploy OpenStack at massive scale. We think the story of getting "from here to there" goes like this:

* First, do simple hardware provisioning with a base image that contains configuration-management software (chef/puppet/salt/etc.). The CMS checks in with a central server to determine what packages to install, then installs and configures your applications. All of this happens automatically after the first boot of any baremetal node.
* Then, accelerate provisioning by pre-installing your application software into the cloud image, but still let a CMS do all configuration.
* Pre-install KVM and nova-compute into an image, and scale out your compute cluster by using the baremetal driver to deploy nova-compute images. Do the same for Swift, proxy nodes, software load balancers, and so on.
* Use Heat to orchestrate the deployment of an entire cloud.
* Finally, run a mixture of baremetal nova-compute and KVM nova-compute in the same cloud (shared keystone and glance, but different tenants). Continuously deploy the cloud from the cloud using a common API.

== Community ==

* '''Main Contributors'''
** [ [https://launchpad.net/~USC-ISI USC/ISI] ] (general bare-metal provisioning framework and non-PXE support)
*** Mikyung Kang <mkkang@isi.edu>, David Kang <dkang@isi.edu>
*** https://github.com/usc-isi/hpc-trunk-essex (stable/essex)
*** https://github.com/usc-isi/nova (folsom)
*** [[HeterogeneousTileraSupport]]
** [NTT DOCOMO] (PXE support and additional bare-metal features)
*** Ken Igarashi <igarashik@nttdocomo.co.jp>
*** https://github.com/NTTdocomo-openstack/nova (master branch)
** [[VirtualTech|Japan Inc.]]
*** Arata Notsu <notsu@virtualtech.jp>
** [ [https://launchpad.net/~tripleo HP tripleo team] ]
*** Devananda van der Veen <devananda@hp.com>, Robert Collins <robertc@hp.com>
*** https://github.com/tripleo/nova/tree/baremetal-dev
*** https://github.com/tripleo/devstack/tree/baremetal-dev
*** irc.freenode.net #tripleo
* '''Blueprints on Launchpad'''
** https://blueprints.launchpad.net/nova/+spec/general-bare-metal-provisioning-framework
** https://blueprints.launchpad.net/nova/+spec/improve-baremetal-pxe-deploy
** https://blueprints.launchpad.net/quantum/+spec/pxeboot-ports
* '''Mailing List'''
** Discussion happens on the openstack-dev mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
** Following convention, we use both [nova] and [baremetal] tags in the Subject line.
* '''Bug listing'''
** Use the "baremetal" tag when filing bugs in Launchpad: https://bugs.launchpad.net/nova/+bugs?field.tag=baremetal
* '''Etherpads from past summit discussions'''
** http://etherpad.openstack.org/GrizzlyBareMetalCloud
** http://etherpad.openstack.org/FolsomBareMetalCloud
* '''Devstack'''
** Support for running baremetal has been added to devstack, and it continues to be updated as new baremetal capabilities land in nova.
** Devstack can be used to simulate a baremetal environment on a single host; detailed instructions are available at https://github.com/tripleo/incubator/blob/master/notes.md
** Devstack can also be used with the baremetal driver to control physical hardware.

== Provisioning Walkthrough ==

/!\ NOTE /!\ This section is out of date and needs to be updated based on code which has landed in Grizzly trunk. -Devananda, 2013-01-18

[[Image:GeneralBareMetalProvisioningFramework$bm1.png]]

1) A user requests a baremetal instance.

* Non-PXE (Tilera):

<pre><nowiki>
    euca-run-instances -t tp64.8x8 -k my.key ami-CCC
</nowiki></pre>

* PXE:

<pre><nowiki>
    euca-run-instances -t baremetal.small --kernel aki-AAA --ramdisk ari-BBB ami-CCC
</nowiki></pre>

2) nova-scheduler selects a baremetal nova-compute host with the following configuration.

[[Image:GeneralBareMetalProvisioningFramework$bm2.png]]

* Here we assume that:

<pre><nowiki>
    $IP
      MySQL for the baremetal DB runs on the machine whose IP address is $IP (127.0.0.1). Change it if a different IP address is used.

    $ID
      $ID should be replaced by the MySQL user id.

    $Password
      $Password should be replaced by the MySQL password.
</nowiki></pre>

* Non-PXE (Tilera) [nova.conf]:

<pre><nowiki>
    baremetal_sql_connection=mysql://$ID:$Password@$IP/nova_bm
    compute_driver=nova.virt.baremetal.driver.BareMetalDriver
    baremetal_driver=nova.virt.baremetal.tilera.TILERA
    power_manager=nova.virt.baremetal.tilera_pdu.Pdu
    instance_type_extra_specs=cpu_arch:tilepro64
    baremetal_tftp_root = /tftpboot
    scheduler_host_manager=nova.scheduler.baremetal_host_manager.BaremetalHostManager
</nowiki></pre>
 
 
 
 
 
* PXE [nova.conf]:

<pre><nowiki>
    baremetal_sql_connection=mysql://$ID:$Password@$IP/nova_bm
    compute_driver=nova.virt.baremetal.driver.BareMetalDriver
    baremetal_driver=nova.virt.baremetal.pxe.PXE
    power_manager=nova.virt.baremetal.ipmi.Ipmi
    instance_type_extra_specs=cpu_arch:x86_64
    baremetal_tftp_root = /tftpboot
    scheduler_host_manager=nova.scheduler.baremetal_host_manager.BaremetalHostManager
    baremetal_deploy_kernel = xxxxxxxxxx
    baremetal_deploy_ramdisk = yyyyyyyy
</nowiki></pre>
 
 
 
 
 
* [[Image:GeneralBareMetalProvisioningFramework$bm3.png]]

3) The bare-metal nova-compute selects a bare-metal node from its pool based on hardware resources and the instance type (# of CPUs, memory, HDDs).

4) Deployment images and configuration are prepared.

* Non-PXE (Tilera):
** The key-injected file system is prepared and an NFS directory is configured for the bare-metal nodes. The kernel is already stored on the CF (Compact Flash) card of each Tilera board, and no ramdisk is used for the Tilera bare-metal nodes. For NFS mounting, /tftpboot/fs_x (x=node_id) should be set up before launching instances.
* PXE:
** The deployment kernel and ramdisk, as well as the user-specified kernel and ramdisk, are placed on the TFTP server, and PXE is configured for the baremetal host.

5) The baremetal nova-compute powers on the baremetal node through:

* Non-PXE (Tilera): PDU (Power Distribution Unit)
* PXE: IPMI

6) The image is deployed to the bare-metal node.

* Non-PXE (Tilera): nova-compute mounts the AMI into the NFS directory based on the id of the selected Tilera bare-metal node.
* PXE: The host boots the deployment kernel and ramdisk, and the baremetal nova-compute writes the AMI to the host's local disk via iSCSI.

7) The bare-metal node is booted.

* Non-PXE (Tilera):
** The bare-metal node is configured for network, ssh, and iptables rules.
** Done.
** [[Image:GeneralBareMetalProvisioningFramework$bm4.png]]
* PXE:
** The host is rebooted.
** Next, the host boots from the user-specified kernel, ramdisk, and its local disk.
** Done.
** [[Image:GeneralBareMetalProvisioningFramework$bm5.png]]
 
 
 
== Packages A: Non-PXE (Tilera) ==
 
 
 
* This procedure is for RHEL. Reading 'tilera-bm-instance-creation.txt' first may make this document easier to understand.
* TFTP, NFS, expect, and telnet installation:

<pre><nowiki>
    $ yum install nfs-utils.x86_64 expect.x86_64 tftp-server.x86_64 telnet
</nowiki></pre>
 
 
 
 
 
* TFTP configuration:

<pre><nowiki>
    $ cat /etc/xinetd.d/tftp
    # default: off
    # description: The tftp server serves files using the trivial file transfer \
    #       protocol.  The tftp protocol is often used to boot diskless \
    #       workstations, download configuration files to network-aware printers, \
    #       and to start the installation process for some operating systems.
    service tftp
    {
          socket_type             = dgram
          protocol                = udp
          wait                    = yes
          user                    = root
          server                  = /usr/sbin/in.tftpd
          server_args             = -s /tftpboot
          disable                 = no
          per_source              = 11
          cps                     = 100 2
          flags                   = IPv4
    }
    $ /etc/init.d/xinetd restart
</nowiki></pre>
 
 
 
 
 
* NFS configuration:

<pre><nowiki>
    $ mkdir /tftpboot
    $ mkdir /tftpboot/fs_x (x: the id of the tilera board)
    $ cat /etc/exports
    /tftpboot/fs_0 tilera0-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_1 tilera1-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_2 tilera2-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_3 tilera3-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_4 tilera4-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_5 tilera5-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_6 tilera6-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_7 tilera7-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_8 tilera8-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_9 tilera9-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    $ sudo /etc/init.d/nfs restart
    $ sudo /usr/sbin/exportfs
</nowiki></pre>
 
 
 
 
 
* TileraMDE install: TileraMDE-3.0.1.125620:

<pre><nowiki>
    $ cd /usr/local/
    $ tar -xvf tileramde-3.0.1.125620_tilepro.tar
    $ tar -xjvf tileramde-3.0.1.125620_tilepro_apps.tar.bz2
    $ tar -xjvf tileramde-3.0.1.125620_tilepro_src.tar.bz2
    $ mkdir /usr/local/TileraMDE-3.0.1.125620/tilepro/tile
    $ cd /usr/local/TileraMDE-3.0.1.125620/tilepro/tile/
    $ tar -xjvf tileramde-3.0.1.125620_tilepro_tile.tar.bz2
    $ ln -s /usr/local/TileraMDE-3.0.1.125620/tilepro/ /usr/local/TileraMDE
</nowiki></pre>
 
 
 
 
 
* Installation of the 32-bit libraries needed to run TileraMDE:

<pre><nowiki>
    $ yum install glibc.i686 glibc-devel.i686
</nowiki></pre>
 
 
 
 
 
== Packages B: PXE ==
 
 
 
* This procedure is for Ubuntu 12.04 x86_64. Reading 'baremetal-instance-creation.txt' first may make this document easier to understand.
* dnsmasq (PXE server for baremetal hosts)
* syslinux (bootloader for PXE)
* ipmitool (operates IPMI)
* qemu-kvm (only for qemu-img)
* open-iscsi (connects to the iSCSI target on baremetal hosts)
* busybox (used in the deployment ramdisk)
* tgt (used in the deployment ramdisk)
* Example:

<pre><nowiki>
    $ sudo apt-get install dnsmasq syslinux ipmitool qemu-kvm open-iscsi
    $ sudo apt-get install busybox tgt
</nowiki></pre>
 
 
 
 
 
* Ramdisk for Deployment
* To create a deployment ramdisk, use 'baremetal-mkinitrd.sh' from [https://github.com/NTTdocomo-openstack/baremetal-initrd-builder baremetal-initrd-builder]:

<pre><nowiki>
    $ cd baremetal-initrd-builder
    $ ./baremetal-mkinitrd.sh <ramdisk output path> <kernel version>
</nowiki></pre>

Example:

<pre><nowiki>
    $ ./baremetal-mkinitrd.sh /tmp/deploy-ramdisk.img 3.2.0-26-generic
    working in /tmp/baremetal-mkinitrd.9AciX98N
    368017 blocks
</nowiki></pre>

* Register the kernel and the ramdisk with Glance:
 
 
 
 
 
<pre><nowiki>
    $ glance add name="baremetal deployment ramdisk" is_public=true container_format=ari disk_format=ari < /tmp/deploy-ramdisk.img
    Uploading image 'baremetal deployment ramdisk'
    ===========================================[100%] 114.951697M/s, ETA  0h  0m  0s
    Added new image with ID: e99775cb-f78d-401e-9d14-acd86e2f36e3

    $ glance add name="baremetal deployment kernel" is_public=true container_format=aki disk_format=aki < /boot/vmlinuz-3.2.0-26-generic
    Uploading image 'baremetal deployment kernel'
    ===========================================[100%] 46.9M/s, ETA  0h  0m  0s
    Added new image with ID: d76012fc-4055-485c-a978-f748679b89a9
</nowiki></pre>
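The image IDs returned by Glance are what the deployment-image options in the nova.conf examples above should point to. Using the sample output above (your IDs will differ):

<pre><nowiki>
    baremetal_deploy_kernel = d76012fc-4055-485c-a978-f748679b89a9
    baremetal_deploy_ramdisk = e99775cb-f78d-401e-9d14-acd86e2f36e3
</nowiki></pre>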
 
 
 
 
 
* ShellInABox
* The baremetal nova-compute uses [http://code.google.com/p/shellinabox/ ShellInABox] so that users can access the baremetal host's console through a web browser.
* Build from source and install:

<pre><nowiki>
    $ sudo apt-get install gcc make
    $ tar xzf shellinabox-2.14.tar.gz
    $ cd shellinabox-2.14
    $ ./configure
    $ sudo make install
</nowiki></pre>
 
 
 
 
 
* PXE Boot Server
* Prepare the TFTP root directory:

<pre><nowiki>
    $ sudo mkdir /tftpboot
    $ sudo cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
    $ sudo mkdir /tftpboot/pxelinux.cfg
</nowiki></pre>

* Start dnsmasq. Example: start dnsmasq on eth1 with PXE and TFTP enabled:

<pre><nowiki>
    $ sudo dnsmasq --conf-file= --port=0 --enable-tftp --tftp-root=/tftpboot --dhcp-boot=pxelinux.0 --bind-interfaces --pid-file=/dnsmasq.pid --interface=eth1 --dhcp-range=192.168.175.100,192.168.175.254

    (You may need to stop and disable the system dnsmasq service first)
    $ sudo /etc/init.d/dnsmasq stop
    $ sudo update-rc.d dnsmasq disable
</nowiki></pre>
 
 
 
 
 
* How to create an image:
* Example: create a partition image from the Ubuntu cloud images' Precise tarball:

<pre><nowiki>
$ wget http://cloud-images.ubuntu.com/precise/current/precise-server-cloudimg-amd64-root.tar.gz
$ dd if=/dev/zero of=precise.img bs=1M count=0 seek=1024
$ mkfs -F -t ext4 precise.img
$ sudo mount -o loop precise.img /mnt/
$ sudo tar -C /mnt -xzf ~/precise-server-cloudimg-amd64-root.tar.gz
$ sudo mv /mnt/etc/resolv.conf /mnt/etc/resolv.conf_orig
$ sudo cp /etc/resolv.conf /mnt/etc/resolv.conf
$ sudo chroot /mnt apt-get install linux-image-3.2.0-26-generic vlan open-iscsi
$ sudo mv /mnt/etc/resolv.conf_orig /mnt/etc/resolv.conf
$ sudo umount /mnt
</nowiki></pre>
 
 
 
 
 
== Nova Directories ==
 
 
 
<pre><nowiki>
    $ sudo mkdir /var/lib/nova/baremetal
    $ sudo mkdir /var/lib/nova/baremetal/console
    $ sudo mkdir /var/lib/nova/baremetal/dnsmasq
</nowiki></pre>
 
 
 
 
 
== Baremetal Database ==
 
 
 
* Create the baremetal database. Grant all privileges to the user specified by the 'baremetal_sql_connection' flag. Example:

<pre><nowiki>
$ mysql -p
mysql> create database nova_bm;
mysql> grant all privileges on nova_bm.* to '$ID'@'%' identified by '$Password';
mysql> exit
</nowiki></pre>
 
 
 
 
 
* Create tables:
 
 
 
<pre><nowiki>
$ bm_db_sync
</nowiki></pre>
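Optionally, you can sanity-check that the schema was created by listing the tables in the new database (a quick check only, using the database name and credentials from above):

<pre><nowiki>
$ mysql -u $ID -p nova_bm -e 'show tables;'
</nowiki></pre>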
 
 
 
 
 
== Create Baremetal Instance Type ==
 
 
 
* First, create an instance type in the normal way.

<pre><nowiki>
$ nova-manage instance_type create --name=tp64.8x8 --cpu=64 --memory=16218 --root_gb=917 --ephemeral_gb=0 --flavor=6 --swap=1024 --rxtx_factor=1
$ nova-manage instance_type create --name=bm.small --cpu=2 --memory=4096 --root_gb=10 --ephemeral_gb=20 --flavor=7 --swap=1024 --rxtx_factor=1
(about --flavor, see the 'How to choose the value for flavor' section below)
</nowiki></pre>
 
 
 
 
 
* Next, set the baremetal extra_specs on the instance type:

<pre><nowiki>
$ nova-manage instance_type set_key --name=tp64.8x8 --key cpu_arch --value 'tilepro64'
$ nova-manage instance_type set_key --name=bm.small --key cpu_arch --value 'x86_64'
</nowiki></pre>
 
 
 
 
 
== How to choose the value for flavor ==
 
 
 
* Run 'nova-manage instance_type list' and find the maximum FlavorID in the output. Use that FlavorID+1 for the new instance_type.

<pre><nowiki>
$ nova-manage instance_type list
m1.medium: Memory: 4096MB, VCPUS: 2, Root: 40GB, Ephemeral: 0Gb, FlavorID: 3, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.small: Memory: 2048MB, VCPUS: 1, Root: 20GB, Ephemeral: 0Gb, FlavorID: 2, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.large: Memory: 8192MB, VCPUS: 4, Root: 80GB, Ephemeral: 0Gb, FlavorID: 4, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.tiny: Memory: 512MB, VCPUS: 1, Root: 0GB, Ephemeral: 0Gb, FlavorID: 1, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.xlarge: Memory: 16384MB, VCPUS: 8, Root: 160GB, Ephemeral: 0Gb, FlavorID: 5, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
</nowiki></pre>

* In the example above, the maximum FlavorID is 5, so use 6 and 7.
 
 
 
== Start Processes ==
 
 
 
<pre><nowiki>
(Currently, you might have trouble if you run these processes as a user other than the superuser...)
$ sudo bm_deploy_server &
$ sudo nova-scheduler &
$ sudo nova-compute &
</nowiki></pre>
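Once the processes are up, you can confirm that the compute service has registered itself, for example (assuming nova-manage is available on this host):

<pre><nowiki>
$ nova-manage service list
</nowiki></pre>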
 
 
 
 
 
== Register Baremetal Node and NIC ==
 
 
 
* First, register a baremetal node; then register the node's NICs.
* Non-PXE (Tilera): register the node, then register its NICs.
* PXE: when registering the node, one of the NICs must be specified as the PXE NIC. Ensure that NIC is PXE-enabled and selected as the primary boot device in the BIOS. Then register all the NICs except the PXE NIC specified in the first step.
* To register a baremetal node, use 'nova-bm-manage node create'. It takes the parameters listed below.

<pre><nowiki>
--host: baremetal nova-compute's hostname
--cpus: number of CPU cores
--memory_mb: memory size in megabytes
--local_gb: local disk size in gigabytes
--pm_address: tilera node's IP address / IPMI address
--pm_user: IPMI username
--pm_password: IPMI password
--prov_mac_address: tilera node's MAC address / PXE NIC's MAC address
--terminal_port: TCP port for ShellInABox. Each node must use a unique TCP port. If you do not need console access, use 0.
</nowiki></pre>
 
 
 
 
<pre><nowiki>
# Tilera example
$ nova-bm-manage node create --host=bm1 --cpus=64 --memory_mb=16218 --local_gb=917 --pm_address=10.0.2.1 --pm_user=test --pm_password=password --prov_mac_address=98:4b:e1:67:9a:4c --terminal_port=0
# PXE/IPMI example
$ nova-bm-manage node create --host=bm1 --cpus=4 --memory_mb=6144 --local_gb=64 --pm_address=172.27.2.116 --pm_user=test --pm_password=password --prov_mac_address=98:4b:e1:11:22:33 --terminal_port=8000
</nowiki></pre>
 
 
 
 
 
* To verify the node registration, run 'nova-bm-manage node list':

<pre><nowiki>
$ nova-bm-manage node list
ID  SERVICE_HOST  INSTANCE_ID  CPUS  Memory  Disk  PM_Address    PM_User  TERMINAL_PORT  PROV_MAC           PROV_VLAN
1   bm1           None         64    16218   917   10.0.2.1      test     0              98:4b:e1:67:9a:4c  None
2   bm1           None         4     6144    64    172.27.2.116  test     8000           98:4b:e1:11:22:33  None
</nowiki></pre>
 
 
 
 
 
* To register a NIC, use 'nova-bm-manage interface create'. It takes the parameters listed below.

<pre><nowiki>
--node_id: ID of the baremetal node that owns this NIC (the first column of 'bm_node_list')
--mac_address: this NIC's MAC address in the form xx:xx:xx:xx:xx:xx
--datapath_id: datapath ID of the OpenFlow switch this NIC is connected to
--port_no: OpenFlow port number this NIC is connected to
(--datapath_id and --port_no are used for network isolation. It is OK to put 0 if you do not have an OpenFlow switch.)
</nowiki></pre>
 
 
 
 
 
 
<pre><nowiki>
# example: node 1, without OpenFlow
$ nova-bm-manage interface create --node_id=1 --mac_address=98:4b:e1:67:9a:4e --datapath_id=0 --port_no=0
# example: node 2, with OpenFlow
$ nova-bm-manage interface create --node_id=2 --mac_address=98:4b:e1:11:22:34 --datapath_id=0x123abc --port_no=24
</nowiki></pre>
 
 
 
 
 
* To verify the NIC registration, run 'bm_interface_list':

<pre><nowiki>
$ bm_interface_list
ID  BM_NODE_ID  MAC_ADDRESS        DATAPATH_ID  PORT_NO
1   1           98:4b:e1:67:9a:4e  0x0          0
2   2           98:4b:e1:11:22:34  0x123abc     24
</nowiki></pre>
 
 
 
 
 
== Run Instance ==
 
 
 
* Run an instance using the baremetal instance type. Make sure to use a kernel and image that support the baremetal hardware (i.e., they contain the drivers for that hardware).

<pre><nowiki>
euca-run-instances -t tp64.8x8 -k my.key ami-CCC
euca-run-instances -t bm.small --kernel aki-AAA --ramdisk ari-BBB ami-CCC
</nowiki></pre>
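After launching, the instance should eventually reach the running (ACTIVE) state; its progress can be watched with the usual tools, for example:

<pre><nowiki>
$ euca-describe-instances
$ nova list
</nowiki></pre>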
 
