Baremetal
Revision as of 15:16, 1 August 2012
- Launchpad Entry: NovaSpec:general-bare-metal-provisioning-framework
- Created: Mikyung Kang
- Maintained: Mikyung Kang, David Kang, Ken Igarashi
- Contributors: USC Information Sciences Institute & NTT Docomo
Summary
This blueprint proposes to support a general bare-metal provisioning framework in OpenStack.
The target release for this is Folsom. USC/ISI and NTT Docomo are working on integrating a bare-metal provisioning implementation that supports the following:
- Support PXE and non-PXE bare-metal machines (Review#1)
- Support several architecture types such as x86_64, tilepro64, and arm (Review#1)
- Support fault-tolerance of bare-metal nova-compute node (Review#2)
The USC/ISI team has a branch here (general bare-metal provisioning framework and non-PXE support):
- https://github.com/usc-isi/hpc-trunk-essex (stable/essex)
- https://github.com/usc-isi/nova (folsom)
- HeterogeneousTileraSupport
An etherpad for discussion of this blueprint is available at http://etherpad.openstack.org/FolsomBareMetalCloud
Based on that discussion, the NTT Docomo and USC/ISI teams collaborated on a new general bare-metal provisioning framework.
- PXE support
- Additional bare-metal features
- Working branch: https://github.com/NTTdocomo-openstack/nova (review to be requested on Aug. 2: Review#1)
Code changes
nova/nova/virt/baremetal/*
nova/nova/tests/baremetal/*
nova/bin/bm*
nova/nova/scheduler/baremetal_host_manager.py
nova/nova/tests/scheduler/test_baremetal_host_manager.py
Overview
1) A user requests a baremetal instance.
- Non-PXE (Tilera):
euca-run-instances -t tp64.8x8 -k my.key ami-CCC
- PXE:
euca-run-instances -t baremetal.small --kernel aki-AAA --ramdisk ari-BBB ami-CCC
2) nova-scheduler selects a baremetal nova-compute with the following configuration.
- Here we assume that:
$IP: the IP address of the machine where MySQL for the baremetal DB runs (127.0.0.1 here; change it if MySQL runs on a different machine).
$ID: the MySQL user name.
$Password: the MySQL password.
- Non-PXE (Tilera) [nova.conf]:
baremetal_sql_connection=mysql://$ID:$Password@$IP/nova_bm
compute_driver=nova.virt.baremetal.driver.BareMetalDriver
baremetal_driver=tilera
power_manager=tilera_pdu
instance_type_extra_specs=cpu_arch:tilepro64
baremetal_tftp_root = /tftpboot
scheduler_host_manager=nova.scheduler.baremetal_host_manager.BaremetalHostManager
- PXE [nova.conf]:
baremetal_sql_connection=mysql://$ID:$Password@$IP/nova_bm
compute_driver=nova.virt.baremetal.driver.BareMetalDriver
baremetal_driver=pxe
power_manager=ipmi
instance_type_extra_specs=cpu_arch:x86_64
baremetal_tftp_root = /tftpboot
scheduler_host_manager=nova.scheduler.baremetal_host_manager.BaremetalHostManager
baremetal_deploy_kernel = xxxxxxxxxx
baremetal_deploy_ramdisk = yyyyyyyy
3) The bare-metal nova-compute selects a bare-metal node from its pool based on hardware resources and the instance type (# of cpus, memory, HDDs).
4) Deployment images and configuration are prepared.
- Non-PXE (Tilera):
- The key-injected file system is prepared, and the NFS directory is configured for the bare-metal nodes. The kernel is already stored on the CF (Compact Flash) card of each Tilera board, and no ramdisk is used for the Tilera bare-metal nodes. For NFS mounting, /tftpboot/fs_x (x = node_id) must be set up before launching instances.
- PXE:
- The deployment kernel and ramdisk, along with the user-specified kernel and ramdisk, are copied to the TFTP server, and PXE is configured for the bare-metal host.
5) The baremetal nova-compute powers on the baremetal node through:
- Non-PXE (Tilera): PDU (Power Distribution Unit)
- PXE: IPMI
6) The image is deployed to the bare-metal node.
- Non-PXE (Tilera): nova-compute mounts the AMI into the NFS directory corresponding to the ID of the selected Tilera bare-metal node.
- PXE: The host boots the deployment kernel and ramdisk, and the baremetal nova-compute writes the AMI to the host's local disk via iSCSI.
7) Bare-metal node is booted.
- Non-PXE (Tilera):
- The bare-metal node is configured for network, ssh, and iptables rules.
- Done.
- PXE:
- The host is rebooted.
- Next, the host boots with the user-specified kernel and ramdisk from its local disk.
- Done.
Packages A: Non-PXE (Tilera)
- This procedure is for RHEL. Reading 'tilera-bm-instance-creation.txt' may make this document easier to understand.
- TFTP, NFS, EXPECT, and Telnet installation:
$ yum install nfs-utils.x86_64 expect.x86_64 tftp-server.x86_64 telnet
- TFTP configuration:
$ cat /etc/xinetd.d/tftp
# default: off
# description: The tftp server serves files using the trivial file transfer \
#       protocol.  The tftp protocol is often used to boot diskless \
#       workstations, download configuration files to network-aware printers, \
#       and to start the installation process for some operating systems.
service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /tftpboot
        disable         = no
        per_source      = 11
        cps             = 100 2
        flags           = IPv4
}
$ /etc/init.d/xinetd restart
- NFS configuration:
$ mkdir /tftpboot
$ mkdir /tftpboot/fs_x   (x: the id of the tilera board)
$ cat /etc/exports
/tftpboot/fs_0 tilera0-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_1 tilera1-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_2 tilera2-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_3 tilera3-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_4 tilera4-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_5 tilera5-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_6 tilera6-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_7 tilera7-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_8 tilera8-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
/tftpboot/fs_9 tilera9-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)
$ sudo /etc/init.d/nfs restart
$ sudo /usr/sbin/exportfs
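The per-board export lines above follow a fixed pattern, so they can be generated instead of typed by hand. A minimal sketch, assuming the same tilera<N>-eth0 hostnames and ten boards (the gen_tilera_exports helper name is illustrative, not part of the baremetal tools):

```shell
# Print /etc/exports entries for Tilera boards 0..9.
# Assumes hostnames of the form tileraN-eth0, as in the listing above.
gen_tilera_exports() {
    n=0
    while [ "$n" -le 9 ]; do
        echo "/tftpboot/fs_${n} tilera${n}-eth0(sync,rw,no_root_squash,no_all_squash,no_subtree_check)"
        n=$((n + 1))
    done
}

gen_tilera_exports
```

Redirect the output into /etc/exports (as root) and restart NFS as shown above.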
- TileraMDE install: TileraMDE-3.0.1.125620:
$ cd /usr/local/
$ tar -xvf tileramde-3.0.1.125620_tilepro.tar
$ tar -xjvf tileramde-3.0.1.125620_tilepro_apps.tar.bz2
$ tar -xjvf tileramde-3.0.1.125620_tilepro_src.tar.bz2
$ mkdir /usr/local/TileraMDE-3.0.1.125620/tilepro/tile
$ cd /usr/local/TileraMDE-3.0.1.125620/tilepro/tile/
$ tar -xjvf tileramde-3.0.1.125620_tilepro_tile.tar.bz2
$ ln -s /usr/local/TileraMDE-3.0.1.125620/tilepro/ /usr/local/TileraMDE
- Installation for 32-bit libraries to execute TileraMDE:
$ yum install glibc.i686 glibc-devel.i686
Packages B: PXE
- This procedure is for Ubuntu 12.04 x86_64. Reading 'baremetal-instance-creation.txt' may make this document easier to understand.
- dnsmasq (PXE server for baremetal hosts)
- syslinux (bootloader for PXE)
- ipmitool (operate IPMI)
- qemu-kvm (only for qemu-img)
- open-iscsi (connect to the iSCSI target on baremetal hosts)
- busybox (used in deployment ramdisk)
- tgt (used in deployment ramdisk)
- Example:
$ sudo apt-get install dnsmasq syslinux ipmitool qemu-kvm open-iscsi
$ sudo apt-get install busybox tgt
- Ramdisk for Deployment
- To create a deployment ramdisk, use 'baremetal-mkinitrd.sh' in [baremetal-initrd-builder](https://github.com/NTTdocomo-openstack/baremetal-initrd-builder):
$ cd baremetal-initrd-builder
$ ./baremetal-mkinitrd.sh <ramdisk output path> <kernel version>
$ ./baremetal-mkinitrd.sh /tmp/deploy-ramdisk.img 3.2.0-26-generic
working in /tmp/baremetal-mkinitrd.9AciX98N
368017 blocks

Register the kernel and the ramdisk to Glance:
$ glance add name="baremetal deployment ramdisk" is_public=true container_format=ari disk_format=ari < /tmp/deploy-ramdisk.img
Uploading image 'baremetal deployment ramdisk'
===========================================[100%] 114.951697M/s, ETA 0h 0m 0s
Added new image with ID: e99775cb-f78d-401e-9d14-acd86e2f36e3

$ glance add name="baremetal deployment kernel" is_public=true container_format=aki disk_format=aki < /boot/vmlinuz-3.2.0-26-generic
Uploading image 'baremetal deployment kernel'
===========================================[100%] 46.9M/s, ETA 0h 0m 0s
Added new image with ID: d76012fc-4055-485c-a978-f748679b89a9
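The image IDs printed by glance are needed later for the baremetal_deploy_kernel and baremetal_deploy_ramdisk flags in nova.conf. A small helper can pull the ID out of the command output; this sketch assumes the "Added new image with ID: ..." line format shown above (extract_image_id is an illustrative name, not an existing tool):

```shell
# Extract the image ID from 'glance add' output, which ends with a line
# of the form "Added new image with ID: <uuid>" (format as shown above).
extract_image_id() {
    sed -n 's/^Added new image with ID: \(.*\)$/\1/p'
}

# Hypothetical usage (not run here):
#   DEPLOY_RAMDISK_ID=$(glance add name="baremetal deployment ramdisk" \
#       is_public=true container_format=ari disk_format=ari \
#       < /tmp/deploy-ramdisk.img | extract_image_id)
```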
- ShellInABox
- Baremetal nova-compute uses [ShellInABox](http://code.google.com/p/shellinabox/) so that users can access the baremetal host's console through web browsers.
- Build from source and install:
$ sudo apt-get install gcc make
$ tar xzf shellinabox-2.14.tar.gz
$ cd shellinabox-2.14
$ ./configure
$ sudo make install
- PXE Boot Server
- Prepare TFTP root directory:
$ sudo mkdir /tftpboot
$ sudo cp /usr/lib/syslinux/pxelinux.0 /tftpboot/
$ sudo mkdir /tftpboot/pxelinux.cfg
- Start dnsmasq. Example: start dnsmasq on eth1 with PXE and TFTP enabled:
$ sudo dnsmasq --conf-file= --port=0 --enable-tftp --tftp-root=/tftpboot --dhcp-boot=pxelinux.0 --bind-interfaces --pid-file=/dnsmasq.pid --interface=eth1 --dhcp-range=192.168.175.100,192.168.175.254

(You may need to stop and disable the system dnsmasq service first)
$ sudo /etc/init.d/dnsmasq stop
$ sudo update-rc.d dnsmasq disable
- How to create an image:
- Example: create a partition image from ubuntu cloud images' Precise tarball:
$ wget http://cloud-images.ubuntu.com/precise/current/precise-server-cloudimg-amd64-root.tar.gz
$ dd if=/dev/zero of=u.img bs=1M count=0 seek=1024
$ mkfs -F -t ext4 u.img
$ sudo mount -o loop u.img /mnt/
$ sudo tar -C /mnt -xzf ~/precise-server-cloudimg-amd64-root.tar.gz
$ sudo rm /mnt/etc/resolv.conf
# (set a temporary DNS server to use apt-get in the chroot; 8.8.8.8 is Google Public DNS)
$ echo nameserver 8.8.8.8 | sudo tee /mnt/etc/resolv.conf
$ sudo chroot /mnt apt-get install linux-image-3.2.0-26-generic vlan open-iscsi
$ sudo ln -sf ../run/resolvconf/resolv.conf /mnt/etc/resolv.conf
$ sudo umount /mnt
Nova Directories
$ sudo mkdir /var/lib/nova/baremetal
$ sudo mkdir /var/lib/nova/baremetal/console
$ sudo mkdir /var/lib/nova/baremetal/dnsmasq
Nova Database
- Create the baremetal database. Grant all privileges to the user specified by the 'baremetal_sql_connection' flag. Example:
$ mysql -p
mysql> create database nova_bm;
mysql> grant all privileges on nova_bm.* to '$ID'@'%' identified by '$Password';
mysql> exit
- Create tables:
$ bm_db_sync
Create Baremetal Instance Type
- First, create an instance type in the normal way.
$ nova-manage instance_type create --name=tp64.8x8 --cpu=64 --memory=16218 --root_gb=917 --ephemeral_gb=0 --flavor=6 --swap=1024 --rxtx_factor=1
$ nova-manage instance_type create --name=bm.small --cpu=2 --memory=4096 --root_gb=10 --ephemeral_gb=20 --flavor=7 --swap=1024 --rxtx_factor=1

(About --flavor, see the 'How to choose the value for flavor' section below.)
- Next, set baremetal extra_spec to the instance type:
$ nova-manage instance_type set_key --name=tp64.8x8 --key cpu_arch --value 's== tilepro64'
$ nova-manage instance_type set_key --name=bm.small --key cpu_arch --value 's== x86_64'
How to choose the value for flavor
- Run 'nova-manage instance_type list' and find the maximum FlavorID in the output. Use that maximum plus 1 for the new instance_type.
$ nova-manage instance_type list
m1.medium: Memory: 4096MB, VCPUS: 2, Root: 40GB, Ephemeral: 0Gb, FlavorID: 3, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.small: Memory: 2048MB, VCPUS: 1, Root: 20GB, Ephemeral: 0Gb, FlavorID: 2, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.large: Memory: 8192MB, VCPUS: 4, Root: 80GB, Ephemeral: 0Gb, FlavorID: 4, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.tiny: Memory: 512MB, VCPUS: 1, Root: 0GB, Ephemeral: 0Gb, FlavorID: 1, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
m1.xlarge: Memory: 16384MB, VCPUS: 8, Root: 160GB, Ephemeral: 0Gb, FlavorID: 5, Swap: 0MB, RXTX Factor: 1.0, ExtraSpecs {}
- In the example above, the maximum Flavor ID is 5, so use 6 and 7.
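The maximum FlavorID can also be computed from the listing rather than read off by eye. A sketch, assuming the "FlavorID: N," field format shown in the output above (next_flavor_id is an illustrative helper, not an existing command):

```shell
# Compute max FlavorID + 1 from 'nova-manage instance_type list' output.
# Assumes each line carries a "FlavorID: N," field, as shown above.
next_flavor_id() {
    sed -n 's/.*FlavorID: \([0-9]*\),.*/\1/p' | sort -n | tail -1 | awk '{print $1 + 1}'
}

# Hypothetical usage:
#   nova-manage instance_type list | next_flavor_id
```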
Start Processes
(Currently, you might have trouble if you run these processes as a user other than the superuser...)
$ sudo bm_deploy_server &
$ sudo nova-scheduler &
$ sudo nova-compute &
Register Baremetal Host and NIC
- First, register a baremetal node. Next, register the baremetal node's NICs.
- To register a baremetal node, use 'bm_node_create', which takes the parameters listed below.
--service_host: baremetal nova-compute's hostname
--cpus: number of CPU cores
--memory_mb: memory size in megabytes
--local_gb: local disk size in gigabytes
--pm_address: Tilera node's IP address / IPMI address
--pm_user: IPMI username
--pm_password: IPMI password
--prov_mac: Tilera node's MAC address / PXE NIC's MAC address
--terminal_port: TCP port for ShellInABox. Each node must use a unique TCP port. If you do not need console access, use 0.
$ bm_node_create --service_host=bm1 --cpus=64 --memory_mb=16218 --local_gb=917 --pm_address=10.0.2.1 --pm_user=test --pm_password=password --prov_mac=98:4b:e1:67:9a:4c --terminal_port=0
$ bm_node_create --service_host=bm1 --cpus=4 --memory_mb=6144 --local_gb=64 --pm_address=172.27.2.116 --pm_user=test --pm_password=password --prov_mac=98:4b:e1:67:9a:4c --terminal_port=8000
- To verify the node registration, run 'bm_node_list':
$ bm_node_list
ID  SERVICE_HOST  INSTANCE_ID  CPUS  Memory  Disk  PM_Address    PM_User  TERMINAL_PORT  PROV_MAC           PROV_VLAN
1   bm1           None         64    16218   917   10.0.2.1      test     0              98:4b:e1:67:9a:4c  None
2   bm1           None         4     6144    64    172.27.2.116  test     8000           98:4b:e1:67:9a:4c  None
- To register a NIC, use 'bm_interface_create', which takes the parameters listed below.
--bm_node_id: ID of the baremetal node that owns this NIC (the first column of 'bm_node_list')
--mac_address: this NIC's MAC address in the form xx:xx:xx:xx:xx:xx
--datapath_id: datapath ID of the OpenFlow switch this NIC is connected to
--port_no: OpenFlow port number this NIC is connected to
(--datapath_id and --port_no are used for network isolation. It is OK to put 0 if you do not have an OpenFlow switch.)
$ bm_interface_create --bm_node_id=1 --mac_address=98:4b:e1:67:9a:4e --datapath_id=0 --port_no=0
$ bm_interface_create --bm_node_id=2 --mac_address=98:4b:e1:67:9a:4e --datapath_id=0x123abc --port_no=24
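Since bm_interface_create expects the MAC address in xx:xx:xx:xx:xx:xx form, a quick format check before registration can catch typos. A minimal sketch (validate_mac is an illustrative helper, not part of the baremetal tools):

```shell
# Return 0 if $1 looks like a MAC address of the form xx:xx:xx:xx:xx:xx.
validate_mac() {
    echo "$1" | grep -Eq '^([0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2}$'
}

# Hypothetical usage:
#   validate_mac 98:4b:e1:67:9a:4e && bm_interface_create --bm_node_id=1 \
#       --mac_address=98:4b:e1:67:9a:4e --datapath_id=0 --port_no=0
```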
- To verify the NIC registration, run 'bm_interface_list':
$ bm_interface_list
ID  BM_NODE_ID  MAC_ADDRESS        DATAPATH_ID  PORT_NO
1   1           98:4b:e1:67:9a:4e  0x0          0
2   2           98:4b:e1:67:9a:4e  0x123abc     24
Run Instance
- Run an instance using the baremetal instance type. Make sure to use a kernel and image that support the bare-metal hardware (i.e., contain the necessary drivers).
euca-run-instances -t tp64.8x8 -k my.key ami-CCC
euca-run-instances -t bm.small --kernel aki-AAA --ramdisk ari-BBB ami-CCC