__NOTOC__
 
* '''Launchpad Entry''': [[NovaSpec]]:general-bare-metal-provisioning-framework
 
* '''Created''': [https://launchpad.net/~mkkang Mikyung Kang]
 
* '''Maintained''': [https://launchpad.net/~mkkang Mikyung Kang], [https://launchpad.net/~dkang David Kang], [https://launchpad.net/~50barca Ken Igarashi], Arata Notsu
 
* '''Contributors''':
 
** [https://launchpad.net/~USC-ISI USC/ISI] Mikyung Kang <mkkang@isi.edu>, David Kang <dkang@isi.edu>
 
** [NTT DOCOMO] Ken Igarashi <igarashik@nttdocomo.co.jp>
 
** [VirtualTech Japan Inc.] Arata Notsu <notsu@virtualtech.jp>
 
  
== Summary ==
This blueprint proposes a general bare-metal provisioning framework for [[OpenStack]].

The target release for this is Folsom. USC/ISI and NTT docomo are working on integrating a bare-metal provisioning implementation that supports the following:

* Support for PXE and non-PXE bare-metal machines (Review#1)

* Support for several architecture types, such as x86_64, tilepro64, and ARM (Review#1)

* Support for fault tolerance of the bare-metal nova-compute node (Review#2)
 
 
 
The USC/ISI team has branches here (general bare-metal provisioning framework and non-PXE support):
 
* https://github.com/usc-isi/hpc-trunk-essex (stable/essex)
 
* https://github.com/usc-isi/nova (folsom)
 
* [[HeterogeneousTileraSupport]]
 
 
 
An etherpad for discussion of this blueprint is available at http://etherpad.openstack.org/FolsomBareMetalCloud
 
 
 
Building on that, the NTT docomo and USC/ISI teams have collaborated on a new general bare-metal provisioning framework.
 
* PXE support
 
* Additional bare-metal features
 
* Working branch: https://github.com/NTTdocomo-openstack/nova (review to be requested on Aug. 2: Review#1)
 
 
 
== Code changes ==
 
 
 
 
 
<pre><nowiki>
    nova/nova/virt/baremetal/*
    nova/nova/tests/baremetal/*
    nova/bin/bm*
    nova/nova/scheduler/baremetal_host_manager.py
    nova/nova/tests/scheduler/test_baremetal_host_manager.py
</nowiki></pre>
 
 
 
 
 
== Overview ==
 
 
 
1) A user requests a baremetal instance.
 
 
 
* Non-PXE (Tilera):
 
 
<pre><nowiki>
    euca-run-instances -t tp64.8x8 -k my.key ami-CCC
</nowiki></pre>
 
 
 
* PXE
 
 
<pre><nowiki>
    euca-run-instances -t baremetal.small --kernel aki-AAA --ramdisk ari-BBB ami-CCC
</nowiki></pre>
 
 
 
 
 
2) nova-scheduler selects a baremetal nova-compute with the following configuration.
 
 
 
* Here we assume that:
 
 
<pre><nowiki>
    $IP
      The IP address of the machine running MySQL for the baremetal DB (127.0.0.1 here). It must be changed if a different IP address is used.

    $ID
      Replace with the MySQL user id.

    $Password
      Replace with the MySQL password.
</nowiki></pre>
 
 
 
 
 
* Non-PXE (Tilera) [nova.conf]:
 
 
<pre><nowiki>
    baremetal_sql_connection=mysql://$ID:$Password@$IP/nova_bm
    compute_driver=nova.virt.baremetal.driver.BareMetalDriver
    baremetal_driver=tilera
    power_manager=tilera_pdu
    instance_type_extra_specs=cpu_arch:tilepro64
    baremetal_tftp_root=/tftpboot
    scheduler_host_manager=nova.scheduler.baremetal_host_manager.BaremetalHostManager
</nowiki></pre>
 
 
 
 
 
* PXE [nova.conf]:
 
 
<pre><nowiki>
    baremetal_sql_connection=mysql://$ID:$Password@$IP/nova_bm
    compute_driver=nova.virt.baremetal.driver.BareMetalDriver
    baremetal_driver=pxe
    power_manager=ipmi
    instance_type_extra_specs=cpu_arch:x86_64
    baremetal_tftp_root=/tftpboot
    scheduler_host_manager=nova.scheduler.baremetal_host_manager.BaremetalHostManager
    baremetal_deploy_kernel = xxxxxxxxxx
    baremetal_deploy_ramdisk = yyyyyyyy
</nowiki></pre>
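As an illustration, substituting hypothetical values for the three placeholders above (user nova, password secret, database host 127.0.0.1 — examples only, use your own) gives the following connection flag:

```shell
# Hypothetical values only -- substitute your own MySQL user, password,
# and the IP address of the host running the nova_bm database.
ID=nova
Password=secret
IP=127.0.0.1
echo "baremetal_sql_connection=mysql://$ID:$Password@$IP/nova_bm"
# prints: baremetal_sql_connection=mysql://nova:secret@127.0.0.1/nova_bm
```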
 
 
 
 
 
3) The bare-metal nova-compute selects a bare-metal node from its pool based on hardware resources and the instance type (# of cpus, memory, HDDs).
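The selection can be pictured as a simple resource-fit check. The sketch below is illustrative only (not the actual scheduler code), comparing a hypothetical node against a requested instance type:

```shell
# Illustrative sketch: a node can host the instance when each of its
# resources meets or exceeds what the instance type asks for.
node_cpus=64; node_mem=16218; node_disk=917   # hypothetical bare-metal node
want_cpus=64; want_mem=16218; want_disk=917   # requested instance type
if [ "$node_cpus" -ge "$want_cpus" ] && \
   [ "$node_mem" -ge "$want_mem" ] && \
   [ "$node_disk" -ge "$want_disk" ]; then
  echo "node fits"
else
  echo "node does not fit"
fi
# prints: node fits
```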
 
 
 
4) Deployment images and configuration are prepared.
 
 
 
* Non-PXE (Tilera):
 
** The key-injected file system is prepared, and the NFS directory is configured for the bare-metal nodes. The kernel is already stored on the CF (Compact Flash) card of each Tilera board, and no ramdisk is used for the Tilera bare-metal nodes. For NFS mounting, /tftpboot/fs_x (x = node_id) must be set up before launching instances.
 
* PXE:
 
** The deployment kernel and ramdisk, as well as the user-specified kernel and ramdisk, are placed on the TFTP server. PXE is configured for the baremetal host.
 
 
 
5) The baremetal nova-compute powers on the baremetal node through:
 
 
 
* Non-PXE (Tilera): PDU(Power Distribution Unit)
 
* PXE: IPMI
 
 
 
6) The image is deployed to bare-metal node.
 
 
 
* Non-PXE (Tilera): The images are deployed to the bare-metal nodes. nova-compute mounts the AMI into the NFS directory based on the id of the selected Tilera bare-metal node.

* PXE: The host boots the deployment kernel and ramdisk, and the baremetal nova-compute writes the AMI to the host's local disk via iSCSI.
 
 
 
7) Bare-metal node is booted.
 
 
 
* Non-PXE (Tilera):
 
** The bare-metal node is configured for network, ssh, and iptables rules.
 
** Done.
 
* PXE:
 
** The host is rebooted.
 
** Next, the host is booted up with the user-specified kernel, ramdisk, and its local disk.
 
** Done.
 
 
 
== Packages A: Non-PXE (Tilera) ==
 
 
 
* This procedure is for RHEL. Reading 'tilera-bm-instance-creation.txt' may make this document easier to understand.
 
* TFTP, NFS, EXPECT, and Telnet installation:
 
 
<pre><nowiki>
    $ yum install nfs-utils.x86_64 expect.x86_64 tftp-server.x86_64 telnet
</nowiki></pre>
 
 
 
 
 
* TFTP configuration:
 
 
<pre><nowiki>
    $ cat /etc/xinetd.d/tftp
    # default: off
    # description: The tftp server serves files using the trivial file transfer \
    #       protocol.  The tftp protocol is often used to boot diskless \
    #       workstations, download configuration files to network-aware printers, \
    #       and to start the installation process for some operating systems.
    service tftp
    {
            socket_type             = dgram
            protocol                = udp
            wait                    = yes
            user                    = root
            server                  = /usr/sbin/in.tftpd
            server_args             = -s /tftpboot
            disable                 = no
            per_source              = 11
            cps                     = 100 2
            flags                   = IPv4
    }
    $ /etc/init.d/xinetd restart
</nowiki></pre>
 
 
 
 
 
* NFS configuration:
 
 
<pre><nowiki>
    $ mkdir /tftpboot
    $ mkdir /tftpboot/fs_x (x: the id of the tilera board)
    $ cat /etc/exports
    /tftpboot/fs_0 tilera0-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_1 tilera1-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_2 tilera2-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_3 tilera3-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_4 tilera4-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_5 tilera5-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_6 tilera6-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_7 tilera7-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_8 tilera8-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    /tftpboot/fs_9 tilera9-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
    $ sudo /etc/init.d/nfs restart
    $ sudo /usr/sbin/exportfs
</nowiki></pre>
 
 
 
 
 
* TileraMDE install: TileraMDE-3.0.1.125620:
 
 
<pre><nowiki>
    $ cd /usr/local/
    $ tar -xvf tileramde-3.0.1.125620_tilepro.tar
    $ tar -xjvf tileramde-3.0.1.125620_tilepro_apps.tar.bz2
    $ tar -xjvf tileramde-3.0.1.125620_tilepro_src.tar.bz2
    $ mkdir /usr/local/TileraMDE-3.0.1.125620/tilepro/tile
    $ cd /usr/local/TileraMDE-3.0.1.125620/tilepro/tile/
    $ tar -xjvf tileramde-3.0.1.125620_tilepro_tile.tar.bz2
    $ ln -s /usr/local/TileraMDE-3.0.1.125620/tilepro/ /usr/local/TileraMDE
</nowiki></pre>
 
 
 
 
 
* Installation for 32-bit libraries to execute TileraMDE:
 
 
<pre><nowiki>
    $ yum install glibc.i686 glibc-devel.i686
</nowiki></pre>
 
 
 
 
 
== Packages B: PXE ==
 
 
 
* This procedure is for Ubuntu 12.04 x86_64. Reading 'baremetal-instance-creation.txt' may make this document easier to understand.
 
* dnsmasq (PXE server for baremetal hosts)
 
* syslinux (bootloader for PXE)
 
* ipmitool (operate IPMI)
 
* qemu-kvm (only for qemu-img)
 
* open-iscsi (connect to iSCSI targets on baremetal hosts)
 
* busybox (used in deployment ramdisk)
 
* tgt (used in deployment ramdisk)
 
* Example:
 
 
<pre><nowiki>
    $ sudo apt-get install dnsmasq syslinux ipmitool qemu-kvm open-iscsi
    $ sudo apt-get install busybox tgt
</nowiki></pre>
 
 
 
 
 
* Ramdisk for Deployment
 
* To create a deployment ramdisk, use 'baremetal-mkinitrd.sh' in [https://github.com/NTTdocomo-openstack/baremetal-initrd-builder baremetal-initrd-builder]:
 
 
 
<pre><nowiki>
    $ cd baremetal-initrd-builder
    $ ./baremetal-mkinitrd.sh <ramdisk output path> <kernel version>
</nowiki></pre>
 
 
 
 
 
<pre><nowiki>
    $ ./baremetal-mkinitrd.sh /tmp/deploy-ramdisk.img 3.2.0-26-generic
    working in /tmp/baremetal-mkinitrd.9AciX98N
    368017 blocks
</nowiki></pre>

Register the kernel and the ramdisk with Glance:
 
 
 
 
 
<pre><nowiki>
    $ glance add name="baremetal deployment ramdisk" is_public=true container_format=ari disk_format=ari < /tmp/deploy-ramdisk.img
    Uploading image 'baremetal deployment ramdisk'
    ===========================================[100%] 114.951697M/s, ETA  0h  0m  0s
    Added new image with ID: e99775cb-f78d-401e-9d14-acd86e2f36e3

    $ glance add name="baremetal deployment kernel" is_public=true container_format=aki disk_format=aki < /boot/vmlinuz-3.2.0-26-generic
    Uploading image 'baremetal deployment kernel'
    ===========================================[100%] 46.9M/s, ETA  0h  0m  0s
    Added new image with ID: d76012fc-4055-485c-a978-f748679b89a9
</nowiki></pre>
 
 
 
 
 
* ShellInABox
 
* Baremetal nova-compute uses [http://code.google.com/p/shellinabox/ ShellInABox] so that users can access the baremetal host's console through a web browser.
 
* Build from source and install:
 
 
 
<pre><nowiki>
    $ sudo apt-get install gcc make
    $ tar xzf shellinabox-2.14.tar.gz
    $ cd shellinabox-2.14
    $ ./configure
    $ sudo make install
</nowiki></pre>
 
 
 
 
 
* PXE Boot Server
 
* Prepare TFTP root directory:
 
 
 
<pre><nowiki>
    $ sudo mkdir /tftpboot
    $ sudo cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
    $ sudo mkdir /tftpboot/pxelinux.cfg
</nowiki></pre>
 
 
 
* Start dnsmasq. Example: start dnsmasq on eth1 with PXE and TFTP enabled:
 
 
 
<pre><nowiki>
    $ sudo dnsmasq --conf-file= --port=0 --enable-tftp --tftp-root=/tftpboot --dhcp-boot=pxelinux.0 --bind-interfaces --pid-file=/dnsmasq.pid --interface=eth1 --dhcp-range=192.168.175.100,192.168.175.254

    (You may need to stop and disable the system's dnsmasq service first)
    $ sudo /etc/init.d/dnsmasq stop
    $ sudo update-rc.d dnsmasq disable
</nowiki></pre>
 
 
 
 
 
* How to create an image:
 
* Example: create a partition image from ubuntu cloud images' Precise tarball:
 
 
 
<pre><nowiki>
$ wget http://cloud-images.ubuntu.com/precise/current/precise-server-cloudimg-amd64-root.tar.gz
$ dd if=/dev/zero of=u.img bs=1M count=0 seek=1024
$ mkfs -F -t ext4 u.img
$ sudo mount -o loop u.img /mnt/
$ sudo tar -C /mnt -xzf ~/precise-server-cloudimg-amd64-root.tar.gz
$ sudo rm /mnt/etc/resolv.conf
        # (set a temporary DNS server to use apt-get in the chroot; 8.8.8.8 is Google Public DNS)
        # (a plain 'sudo echo ... > file' would fail: the redirection runs unprivileged)
$ echo nameserver 8.8.8.8 | sudo tee /mnt/etc/resolv.conf
$ sudo chroot /mnt apt-get install linux-image-3.2.0-26-generic vlan open-iscsi
$ sudo ln -sf ../run/resolvconf/resolv.conf /mnt/etc/resolv.conf
$ sudo umount /mnt
</nowiki></pre>
 
 
 
 
 
== Nova Directories ==
 
 
 
<pre><nowiki>
    $ sudo mkdir /var/lib/nova/baremetal
    $ sudo mkdir /var/lib/nova/baremetal/console
    $ sudo mkdir /var/lib/nova/baremetal/dnsmasq
</nowiki></pre>
 
 
 
 
 
== Nova Database ==
 
 
 
* Create the baremetal database. Grant all privileges to the user specified by the 'baremetal_sql_connection' flag. Example:
 
 
 
<pre><nowiki>
$ mysql -p
mysql> create database nova_bm;
mysql> grant all privileges on nova_bm.* to '$ID'@'%' identified by '$Password';
mysql> exit
</nowiki></pre>
 
 
 
 
 
* Create tables:
 
 
 
<pre><nowiki>
$ bm_db_sync
</nowiki></pre>
 
 
 
 
 
== Create Baremetal Instance Type ==
 
 
 
* First, create an instance type in the normal way.
 
 
<pre><nowiki>
$ nova-manage instance_type create --name=tp64.8x8 --cpu=64 --memory=16218 --root_gb=917 --ephemeral_gb=0 --flavor=6 --swap=1024 --rxtx_factor=1
$ nova-manage instance_type create --name=bm.small --cpu=2 --memory=4096 --root_gb=10 --ephemeral_gb=20 --flavor=7 --swap=1024 --rxtx_factor=1
</nowiki></pre>

(About --flavor, see the 'How to choose the value for flavor' section below.)
 
 
 
 
 
* Next, set baremetal extra_spec to the instance type:
 
 
<pre><nowiki>
$ nova-manage instance_type set_key --name=tp64.8x8 --key cpu_arch --value 's== tilepro64'
$ nova-manage instance_type set_key --name=bm.small --key cpu_arch --value 's== x86_64'
</nowiki></pre>
 
 
 
 
 
== How to choose the value for flavor ==
 
 
 
* Run 'nova-manage instance_type list' and find the maximum FlavorID in the output. Use that FlavorID + 1 for the new instance type.
 
 
<pre><nowiki>
$ nova-manage instance_type list
m1.medium: Memory: 4096MB, VCPUS: 2, Root: 40GB, Ephemeral: 0Gb, FlavorID: 3, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.small: Memory: 2048MB, VCPUS: 1, Root: 20GB, Ephemeral: 0Gb, FlavorID: 2, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.large: Memory: 8192MB, VCPUS: 4, Root: 80GB, Ephemeral: 0Gb, FlavorID: 4, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.tiny: Memory: 512MB, VCPUS: 1, Root: 0GB, Ephemeral: 0Gb, FlavorID: 1, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.xlarge: Memory: 16384MB, VCPUS: 8, Root: 160GB, Ephemeral: 0Gb, FlavorID: 5, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
</nowiki></pre>
 
 
 
* In the example above, the maximum Flavor ID is 5, so use 6 and 7.
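The lookup can also be scripted. A small sketch, assuming the list output format shown above (the sed pattern is illustrative, not part of nova-manage):

```shell
# Compute the next free FlavorID from 'nova-manage instance_type list'
# output; here a two-line sample of that output is pasted in as a string.
list="m1.tiny: Memory: 512MB, VCPUS: 1, Root: 0GB, Ephemeral: 0Gb, FlavorID: 1, Swap: 0MB
m1.xlarge: Memory: 16384MB, VCPUS: 8, Root: 160GB, Ephemeral: 0Gb, FlavorID: 5, Swap: 0MB"
# Extract each FlavorID, take the numeric maximum, and add 1.
max=$(echo "$list" | sed -n 's/.*FlavorID: \([0-9]*\),.*/\1/p' | sort -n | tail -1)
echo $((max + 1))
# prints: 6
```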
 
 
 
== Start Processes ==
 
 
 
<pre><nowiki>
(Currently, you might have trouble if you run these processes as a user other than the superuser...)
$ sudo bm_deploy_server &
$ sudo nova-scheduler &
$ sudo nova-compute &
</nowiki></pre>
 
 
 
 
 
== Register Baremetal Host and NIC ==
 
 
 
* First, register a baremetal node. Next, register the baremetal node's NICs.
 
* To register a baremetal node, use 'bm_node_create', which takes the parameters listed below.
 
 
<pre><nowiki>
--service_host: baremetal nova-compute's hostname
--cpus: number of CPU cores
--memory_mb: memory size in megabytes
--local_gb: local disk size in gigabytes
--pm_address: tilera node's IP address / IPMI address
--pm_user: IPMI username
--pm_password: IPMI password
--prov_mac: tilera node's MAC address / PXE NIC's MAC address
--terminal_port: TCP port for ShellInABox. Each node must use a unique TCP port. If you do not need console access, use 0.
</nowiki></pre>
 
 
 
 
<pre><nowiki>
$ bm_node_create --service_host=bm1 --cpus=64 --memory_mb=16218 --local_gb=917 --pm_address=10.0.2.1 --pm_user=test --pm_password=password --prov_mac=98:4b:e1:67:9a:4c --terminal_port=0
$ bm_node_create --service_host=bm1 --cpus=4 --memory_mb=6144 --local_gb=64 --pm_address=172.27.2.116 --pm_user=test --pm_password=password --prov_mac=98:4b:e1:67:9a:4c --terminal_port=8000
</nowiki></pre>
 
 
 
 
 
* To verify the node registration, run 'bm_node_list':
 
 
<pre><nowiki>
$ bm_node_list
ID  SERVICE_HOST  INSTANCE_ID  CPUS  Memory  Disk  PM_Address    PM_User  TERMINAL_PORT  PROV_MAC           PROV_VLAN
1   bm1           None         64    16218   917   10.0.2.1      test     0              98:4b:e1:67:9a:4c  None
2   bm1           None         4     6144    64    172.27.2.116  test     8000           98:4b:e1:67:9a:4c  None
</nowiki></pre>
 
 
 
 
 
* To register a NIC, use 'bm_interface_create', which takes the parameters listed below.
 
 
<pre><nowiki>
--bm_node_id: ID of the baremetal node that owns this NIC (the first column of 'bm_node_list')
--mac_address: this NIC's MAC address in the form xx:xx:xx:xx:xx:xx
--datapath_id: datapath ID of the OpenFlow switch this NIC is connected to
--port_no: OpenFlow port number this NIC is connected to
(--datapath_id and --port_no are used for network isolation. It is OK to put 0 if you do not have an OpenFlow switch.)
</nowiki></pre>
 
 
 
 
 
 
<pre><nowiki>
$ bm_interface_create --bm_node_id=1 --mac_address=98:4b:e1:67:9a:4e --datapath_id=0 --port_no=0
$ bm_interface_create --bm_node_id=2 --mac_address=98:4b:e1:67:9a:4e --datapath_id=0x123abc --port_no=24
</nowiki></pre>
 
 
 
 
 
* To verify the NIC registration, run 'bm_interface_list':
 
 
<pre><nowiki>
$ bm_interface_list
ID  BM_NODE_ID  MAC_ADDRESS        DATAPATH_ID  PORT_NO
1   1           98:4b:e1:67:9a:4e  0x0          0
2   2           98:4b:e1:67:9a:4e  0x123abc     24
</nowiki></pre>
 
 
 
 
 
== Run Instance ==
 
 
 
* Run an instance using the baremetal instance type. Make sure to use a kernel and image that support the baremetal hardware (i.e., that contain drivers for it).
 
 
<pre><nowiki>
euca-run-instances -t tp64.8x8 -k my.key ami-CCC
euca-run-instances -t bm.small --kernel aki-AAA --ramdisk ari-BBB ami-CCC
</nowiki></pre>
 

Latest revision as of 23:43, 8 October 2014:

The Nova "baremetal" driver was deprecated in the Juno release and has been deleted from Nova. Please see [[Ironic]] for all current work on the Bare Metal Provisioning program within OpenStack.