== Overview ==
The Nova "baremetal" driver was deprecated in the Juno release, and has been deleted from Nova.
The baremetal driver is a hypervisor driver for Openstack Nova Compute. Within the Openstack framework, it has the same role as the drivers for other hypervisors (libvirt, xen, etc.), yet it is unique in that the hardware is not virtualized - there is no hypervisor between the tenants and the physical hardware. It exposes hardware via Openstack's API, using pluggable sub-drivers to deliver machine imaging (PXE) and power control (IPMI). With this, provisioning and management of physical hardware is accomplished using common cloud APIs and tools, such as Heat or salt-cloud. However, due to this unique situation, using the baremetal driver requires some additional preparation of its environment.
Please see [[Ironic]] for all current work on the Bare Metal Provisioning program within OpenStack.
This driver was added in the Grizzly release, and was considered somewhat experimental.
'''NOTE:''' On May 7th, the TC approved a scope change for Nova to separate the baremetal driver into its own project. [http://eavesdrop.openstack.org/meetings/tc/2013/tc.2013-05-07-20.01.html The log of that discussion can be found here.]
=== Terminology ===
The baremetal driver also introduces some terminology of its own.
* ''Baremetal host'' and ''compute host'' are often used interchangeably to refer to the machine which runs the nova-compute and nova-baremetal-deploy-helper services (and possibly other services as well). This functions like a hypervisor, providing power management and imaging services.
* ''Node'' and ''baremetal node'' refer to the physical machines which are controlled by the ''compute host''. When a user requests that Nova start a ''baremetal instance'', it is created on a ''baremetal node''.
* A ''baremetal instance'' is a Nova instance created directly on a physical machine without any virtualization layer running underneath it. Nova retains both power control (via IPMI) and, in some situations, may retain network control (via Quantum and OpenFlow).
* ''Deploy image'' is a pair of specialized kernel and ramdisk images which are used by the ''compute host'' to write the user-specified image onto the ''baremetal node''.
* Hardware is ''enrolled'' in the baremetal driver by adding its MAC addresses, physical characteristics (# CPUs, RAM, and disk space), and the IPMI credentials into the baremetal database. Without this information, the ''compute host'' has no knowledge of the ''baremetal node''.
=== Features ===
The current implementation of the Baremetal driver provides the following functionality.
* A Nova API to enroll & manage hardware in the baremetal database
* Power control of enrolled hardware via IPMI
* PXE boot of the baremetal nodes.
* Support for common CPU architectures (i386, x86_64)
* FlatNetwork environments are supported and well tested
** OpenFlow-enabled environments should be supported, but are less well tested at this time
* Cloud-init is used for passing user data into the baremetal instances after provisioning. Limited support for file-injection also exists, but is being deprecated.
Current limitations include:
* A separate dnsmasq process must run on the baremetal compute host to control the PXE boot process. This conflicts with quantum-dhcp, which must therefore be disabled.
* Cloud-init requires an instance's IP to be assigned by quantum; without quantum-dhcp, this requires file injection to set the IP statically.
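Conceptually, the static-IP fallback amounts to rendering a network configuration file (see the net_config_template option below) into the instance's image at deploy time. The sketch below shows what such rendering boils down to for a Debian-style /etc/network/interfaces stanza; the function name and all values are illustrative placeholders, not part of the driver.

```shell
#!/bin/sh
# Sketch only: emit a Debian-style static-network stanza for a baremetal
# instance. The real driver renders net_config_template instead; every
# value here is an example placeholder.
render_static_net() {
  iface=$1 address=$2 netmask=$3 gateway=$4
  cat <<EOF
auto $iface
iface $iface inet static
    address $address
    netmask $netmask
    gateway $gateway
EOF
}

render_static_net eth0 192.0.2.10 255.255.255.0 192.0.2.1
```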
Future plans include:
* Improve performance/scalability of PXE deployment process
* Better support for complex non-SDN environments (e.g., static VLANs)
* Better integration with quantum-dhcp
* Support snapshot and migrate of baremetal instances
* Support non-PXE image deployment
* Support other architectures (arm, tilepro)
* Support fault-tolerance of baremetal nova-compute node
=== Key Differences ===
There are several key differences between the baremetal driver and other hypervisor drivers (kvm, xen, etc).
* There is no hypervisor running underneath the baremetal instances, so the tenant has full and direct access to the hardware, and that hardware is dedicated to a single instance.
* Nova does not have any access to manipulate a baremetal instance except for what is provided at the hardware level and exposed over the network, such as IPMI control. Therefore, some functionality implemented by other hypervisor drivers is not available via the baremetal driver, such as instance snapshots, attaching and detaching network volumes on a running instance, and so on.
* It is also important to note that there are additional security concerns created by tenants having direct access to the network (e.g., MAC spoofing, packet sniffing, etc.).
** Other hypervisors mitigate this with virtualized networking.
** Quantum + [[OpenFlow]] can be used to much the same effect, if your network hardware supports it.
* Public cloud images may not work on some hardware, particularly if your hardware requires additional drivers to be loaded.
* The PXE driver requires a specialized ramdisk (and a corresponding kernel) for deployment, which is distinct from the cloud image's ramdisk. This can be built via the [http://github.com/stackforge/diskimage-builder diskimage-builder] project. The Glance UUIDs for these two images should be added to the extra_specs for any flavor (instance_type) that will be deployed onto a bare metal compute host. Alternatively, these UUIDs can also be added to the bare metal compute host's nova.conf file.
== Use-cases ==
Here are a few ideas we have about potential use-cases for the baremetal driver. This isn't an exhaustive list -- there are doubtless many more interesting things which it can do!
* High-performance computing clusters.
* Computing tasks that require access to hardware devices which can't be virtualized.
* Database hosting (some databases run poorly in a hypervisor).
* Rapidly deploying a cloud infrastructure, as described below.
We (the tripleo team) have a vision that Openstack can be used to deploy Openstack at a massive scale. We think the story of getting "from here to there" goes like this:
* First, do simple hardware provisioning with a base image that contains configuration-management software (chef/puppet/salt/etc). The CMS checks in with a central server to determine what packages to install, then installs and configures your applications. All this happens automatically after first-boot of any baremetal node.
* Then, accelerate provisioning by pre-installing your application software into the cloud image, but let a CMS still do all configuration.
* Pre-install KVM and nova-compute into an image, and scale out your compute cluster by using baremetal driver to deploy nova-compute images. Do the same thing for Swift, proxy nodes, software load balancers, and so on.
* Use Heat to orchestrate the deployment of an entire cloud.
* Finally, run a mixture of baremetal nova-compute and KVM nova-compute in the same cloud (shared keystone and glance, but different tenants). Continuously deploy the cloud from the cloud using a common API.
== The Baremetal Deployment Process ==
'''This section is a stub and needs to be expanded.'''
== Differences in Starting a Baremetal Cloud ==
This section aims to cover the technical aspects of creating a bare metal cloud without duplicating the information required in general to create an openstack cloud. It assumes you already have all the other services -- MySQL, Rabbit, Keystone, Glance, etc -- up and running, and then covers:
* Nova configuration changes
* Additional package requirements
* Extra services that need to be started
* Images, Instance types, and metadata that need to be created and defined
* Enrolling your hardware
=== Configuration Changes ===
The following nova configuration options should be set on the compute host, in addition to any others that your environment requires.
  scheduler_host_manager = nova.scheduler.baremetal_host_manager.BaremetalHostManager
  firewall_driver = nova.virt.firewall.NoopFirewallDriver
  compute_driver = nova.virt.baremetal.driver.BareMetalDriver
  ram_allocation_ratio = 1.0
  reserved_host_memory_mb = 0
  net_config_template = /opt/stack/nova/nova/virt/baremetal/net-static.ubuntu.template
  tftp_root = /tftpboot
  power_manager = nova.virt.baremetal.ipmi.IPMI
  driver = nova.virt.baremetal.pxe.PXE
  instance_type_extra_specs = cpu_arch:{i386|x86_64}
  sql_connection = mysql://{user}:{pass}@{host}/nova_bm
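A quick way to confirm the compute host's nova.conf actually carries these options is a few greps over the file. This is merely a convenience sketch (the function name and conf path are assumptions, not part of the driver):

```shell
#!/bin/sh
# Sketch: check that a nova.conf contains the baremetal-related options
# listed above. Pass the path to your nova.conf (assumed location varies).
check_bm_conf() {
  conf=$1
  missing=0
  for key in compute_driver scheduler_host_manager firewall_driver \
             power_manager driver sql_connection instance_type_extra_specs; do
    # match "key = value" lines, tolerating leading whitespace
    grep -q "^[[:space:]]*$key[[:space:]]*=" "$conf" 2>/dev/null || {
      echo "missing: $key"
      missing=1
    }
  done
  [ "$missing" -eq 0 ] && echo "all baremetal options present"
  return $missing
}

# usage: check_bm_conf /etc/nova/nova.conf
```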
=== Additional Packages ===
If using the default baremetal driver (PXE) and default power driver (IPMI), then the baremetal compute host(s) must have the following packages installed to enable image deployment and power management.
  dnsmasq ipmitool open-iscsi syslinux
Additionally, to support PXE image deployments, the following steps should be taken:
  sudo mkdir -p /tftpboot/pxelinux.cfg
  sudo cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
  sudo chown -R $NOVA_USER /tftpboot
  sudo mkdir -p $NOVA_DIR/baremetal/dnsmasq
  sudo mkdir -p $NOVA_DIR/baremetal/console
  sudo chown -R $NOVA_USER $NOVA_DIR/baremetal
=== Services ===
At a minimum, Keystone, Nova, Glance, and Quantum must be up and running. The following additional services are currently required for baremetal deployment, and should be started on the nova compute host.
* ''nova-baremetal-deploy-helper''. This service assists with image deployment. It reads all necessary options from nova.conf.
* ''dnsmasq''. Currently, this must run on the nova compute host. The baremetal PXE driver interacts directly with the dnsmasq configuration file and modifies the TFTP boot files that dnsmasq serves.
Start dnsmasq as follows:
  # Stop any existing dnsmasq service
  sudo service dnsmasq stop && sudo pkill dnsmasq
  # Start dnsmasq for baremetal deployments. Change IFACE and RANGE as needed.
  # Note that RANGE must not overlap with the instance IPs assigned by Nova or Quantum.
  sudo dnsmasq --conf-file= --port=0 --enable-tftp --tftp-root=/tftpboot \
    --dhcp-boot=pxelinux.0 --bind-interfaces --pid-file=/var/run/dnsmasq.pid \
    --interface=$IFACE --dhcp-range=$RANGE
'''NOTE''': This dnsmasq process must be the only process on the network answering DHCP requests from the MAC addresses of the enrolled bare metal nodes. If another DHCP server answers the PXE boot, deployment is likely to fail. This means that '''you must disable quantum-dhcp.''' Work on this limitation is planned for the Havana cycle.
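For reference, PXELINUX locates a node's boot configuration under /tftpboot/pxelinux.cfg/ by a filename derived from the NIC's MAC address: the ARP hardware type ("01" for Ethernet) followed by the MAC, lowercased and dash-separated. A small sketch of that standard naming convention (the function name is ours, not the driver's):

```shell
#!/bin/sh
# Sketch: the standard PXELINUX per-MAC config filename,
# e.g. AA:BB:CC:DD:EE:FF -> pxelinux.cfg/01-aa-bb-cc-dd-ee-ff
pxe_cfg_name() {
  echo "01-$1" | tr 'A-Z' 'a-z' | tr ':' '-'
}

pxe_cfg_name "AA:BB:CC:DD:EE:FF"
```

This is why every MAC that might PXE-boot must be enrolled (see Hardware Enrollment below): without a matching config file, the node falls through to the default boot entry.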
A separate database schema must be created for the baremetal driver to store information about the enrolled hardware. Create it first:
  mysql> CREATE DATABASE nova_bm;
  mysql> GRANT ALL ON nova_bm.* TO 'nova_user'@'some_host' IDENTIFIED BY '$password';
Then initialize the database with:
  nova-baremetal-manage db sync
=== Image Requirements ===
The [https://github.com/stackforge/diskimage-builder diskimage-builder] project is provided as a toolchain for customizing and building both run-time images and the deployment images used by the PXE driver. Customization may be necessary if, for example, your hardware requires drivers not enabled or included in the default images.
Diskimage-builder requires the following packages be installed:
  python-lxml python-libvirt libvirt-bin qemu-system
Additionally, if you will be building a deploy image, you will need shellinabox and the following packages:
  qemu-kvm busybox tgt gcc make
To install shellinabox, run:
  wget http://shellinabox.googlecode.com/files/shellinabox-2.14.tar.gz
  tar -xzf shellinabox-2.14.tar.gz
  cd shellinabox-2.14
  ./configure && make
  sudo make install
To build images, clone the project and run the following:
  git clone https://github.com/stackforge/diskimage-builder.git
  cd diskimage-builder
  # build the image your users will run
  bin/disk-image-create -u base -o my-image
  # and extract the kernel & ramdisk
  bin/disk-image-get-kernel -d ./ -o my -i $(pwd)/my-image.qcow2
  # build the deploy image
  # Note that this will build a kernel & ramdisk based on the host it is run on.
  KERNEL=$(uname -r)
  sudo cp /boot/vmlinuz-$KERNEL ./
  sudo chmod a+r vmlinuz-$KERNEL
  bin/ramdisk-image-create deploy -k $KERNEL -o my-deploy-ramdisk
Load all of these images into Glance, and note the glance image UUIDs for each one as it is generated. These are needed for associating the images to each other, and to the special baremetal flavor.
  glance image-create --name my-vmlinuz --public --disk-format aki  < my-vmlinuz
  glance image-create --name my-initrd --public --disk-format ari  < my-initrd
  glance image-create --name my-image --public --disk-format qcow2 --container-format bare \
      --property kernel_id=$MY_VMLINUZ_UUID --property ramdisk_id=$MY_INITRD_UUID < my-image
  glance image-create --name deploy-vmlinuz --public --disk-format aki < vmlinuz-$KERNEL
  glance image-create --name deploy-initrd --public --disk-format ari < my-deploy-ramdisk
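Capturing those UUIDs for the flavor step can be scripted. The helper below is an assumption about the glance client's table-style output (a row of the form <code>| id | &lt;uuid&gt; |</code>); adjust the parsing to whatever your client actually prints:

```shell
#!/bin/sh
# Sketch: extract the image UUID from `glance image-create` table output.
# Assumes a "| id | <uuid> |" row; this is an assumption about the client's
# output format, not a documented interface.
glance_id() {
  awk -F'|' '$2 ~ /^[[:space:]]*id[[:space:]]*$/ { gsub(/ /, "", $3); print $3 }'
}

# assumed usage:
#   MY_VMLINUZ_UUID=$(glance image-create --name my-vmlinuz --public \
#       --disk-format aki < my-vmlinuz | glance_id)
```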
You will also need to create a special baremetal flavor in Nova, and associate both the deploy kernel and ramdisk with that flavor via the "baremetal" namespace.
  # pick a unique number
  # change these to match your hardware
  nova flavor-create my-baremetal-flavor $FLAVOR_ID $RAM $DISK $CPU
  # associate the deploy images with this flavor
  # cpu_arch must match nova.conf, and of course, also must match your hardware
  nova flavor-key my-baremetal-flavor set \
    cpu_arch={i386|x86_64} \
    "baremetal:deploy_kernel_id"=$DEPLOY_VMLINUZ_UUID \
    "baremetal:deploy_ramdisk_id"=$DEPLOY_INITRD_UUID
=== Hardware Enrollment ===
The last step is to enroll your physical hardware with the baremetal cloud. To do this, we need to give the baremetal driver some general information (# CPUs, RAM, and disk size) and also specify every MAC address which might send a PXE/DHCP request. If you are using the IPMI power driver, you must also input the IP, user, and password for each node's IPMI interface. This can all be done via a Nova API admin extension. You must also inform the baremetal driver which Nova compute host should control the bare metal node.
  # create a "node" for each machine, supplying the controlling compute host,
  # CPU count, RAM (MB), disk (GB), and the MAC of the PXE-booting NIC
  # extract the "id" from the result and use that in the next step
  nova baremetal-node-create --pm_address=... --pm_user=... --pm_password=... \
    $COMPUTE_HOSTNAME $CPU $RAM $DISK $FIRST_MAC
  # for each NIC on the node, including $FIRST_MAC, also create an interface
  nova baremetal-interface-create $ID $MAC
Once the hardware is enrolled in the baremetal driver, the Nova compute process will broadcast the availability of a new compute resource to the Nova scheduler during the next periodic update, which by default occurs once a minute. After that, you will be able to provision the hardware with a command such as the following:
  nova boot --flavor my-baremetal-flavor --image my-image my-baremetal-node
== Bugs ==
Bugs should be tagged with the keyword "baremetal" within the Nova project in Launchpad. To see the list of known baremetal bugs, go to https://bugs.launchpad.net/nova/+bugs?field.tag=baremetal
When reporting bugs, please include any relevant information about your hardware and network environment (sanitize IPs and MAC addresses as necessary), and any relevant snippets from the nova-compute, nova-scheduler, and nova-baremetal-deploy-helper log files. Please also include the database records for the nova instance, nova compute record, baremetal node, and the tftp configuration file. Below is a simple script to extract that information from the "nova" and "nova_bm" schemas, as well as from the filesystem on the nova-compute host.
  #!/bin/bash
  # Save this as get_baremetal_crash_info.sh
  id=$1
  node=$(mysql nova -NBre "select node from instances where uuid='$id'")
  conf=$(mysql nova_bm -NBre "select pxe_config_path from bm_nodes where instance_uuid='$id'")
  echo "=========== COMPUTE NODE ==========="
  mysql nova -e "select hypervisor_hostname, created_at, updated_at, deleted_at, vcpus, memory_mb, local_gb, vcpus_used, memory_mb_used, local_gb_used, hypervisor_type, cpu_info, free_ram_mb, free_disk_gb, running_vms from compute_nodes where hypervisor_hostname='$node'\G"
  echo "=========== COMPUTE INSTANCE ==========="
  mysql nova -e "select node, created_at, updated_at, deleted_at, image_ref, kernel_id, ramdisk_id, scheduled_at, launched_at, launched_on, vm_state, power_state, task_state, memory_mb, vcpus, root_gb, ephemeral_gb from instances where uuid='$id'\G"
  echo "=========== BAREMETAL NODE ==========="
  mysql nova_bm -e "select uuid, created_at, updated_at, deleted_at, cpus, memory_mb, local_gb, root_mb, swap_mb, service_host, instance_uuid, instance_name, task_state, pxe_config_path from bm_nodes where instance_uuid='$id'\G"
  echo "=========== TFTP CONFIG ==========="
  cat $conf
Then run it with your instance's UUID:
  chmod +x get_baremetal_crash_info.sh
  ./get_baremetal_crash_info.sh <your-instance-uuid-here>
== Community ==
'''NOTE:''' Information regarding the work done in Folsom by USC/ISI and NTT-Docomo [[GeneralBareMetalProvisioningFramework/Historical|has been moved here]].
* '''Main Contributors'''
** [ [https://launchpad.net/~USC-ISI USC/ISI] ]
*** Mikyung Kang <mkkang@isi.edu>, David Kang <dkang@isi.edu>
*** Ken Igarashi <igarashik@nttdocomo.co.jp>
** VirtualTech Japan Inc.
*** Arata Notsu <notsu@virtualtech.jp>
** [ [https://launchpad.net/~tripleo HP Cloud] ]
*** Devananda van der Veen <devananda@hp.com>, Robert Collins <robertc@hp.com>
* '''Blueprints on Launchpad'''
** https://blueprints.launchpad.net/nova/+spec/improve-baremetal-pxe-deploy
** https://blueprints.launchpad.net/nova/+spec/baremetal-operations
** https://blueprints.launchpad.net/nova/+spec/baremetal-compute-takeover
** https://blueprints.launchpad.net/nova/+spec/baremetal-force-node
** https://blueprints.launchpad.net/nova/+spec/selectable-scheduler-filters
** https://blueprints.launchpad.net/quantum/+spec/pxeboot-ports
** https://blueprints.launchpad.net/cinder/+spec/bare-metal-volumes
* '''Discussions'''
** We use the [http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev Openstack Development] email list. Please use both [nova] and [baremetal] tags in the Subject line.
** The #tripleo channel on irc.freenode.net often has discussions about baremetal (and installing openstack on openstack), but there is no official baremetal room. Discussions often happen in #openstack-dev and #openstack-nova.
* '''Etherpads'''
** https://etherpad.openstack.org/HavanaTripleO
** https://etherpad.openstack.org/HavanaBaremetalNextSteps
** https://etherpad.openstack.org/GrizzlyBareMetalCloud
** https://etherpad.openstack.org/FolsomBareMetalCloud
* '''Team Branches''' and other links
** [ [https://launchpad.net/~USC-ISI USC/ISI] ]
*** https://github.com/usc-isi/hpc-trunk-essex (stable/essex)
*** https://github.com/usc-isi/nova (folsom)
*** [[HeterogeneousTileraSupport]]
*** https://github.com/NTTdocomo-openstack/nova (master branch)
** [ [https://launchpad.net/~tripleo HP Cloud] ]
*** https://github.com/tripleo/nova/tree/baremetal-dev
*** https://github.com/tripleo/devstack/tree/baremetal-dev
*** Walkthrough for a development environment: https://github.com/tripleo/incubator/blob/master/notes.md

Latest revision as of 23:43, 8 October 2014
