Difference between revisions of "StarlingX/Packet SIG"
Greg-waines (talk | contribs) m (→StarlingX Distributed Cloud on Packet.Com) |
Greg-waines (talk | contribs) (→Problem Tracking) |
||
(79 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
== Packet SIG == | == Packet SIG == | ||
− | [http://packet.com Packet.com] is a baremetal public cloud, and they have donated some resources to the StarlingX project. The resources are available under the STX-PROJECT-01 project on [http://packet.com Packet.com]. | + | [http://packet.com Packet.com] is a baremetal public cloud, and they have donated some resources to the StarlingX project. The resources are available under the STX-PROJECT-01 project on [http://packet.com Packet.com]. <br /> |
+ | <br /> | ||
+ | |||
=== StarlingX Distributed Cloud on Packet.Com === | === StarlingX Distributed Cloud on Packet.Com === | ||
− | As a demonstration of the [https://wiki.openstack.org/wiki/Edge_Computing_Group OpenStack Edge Computing Group]'s [https://wiki.openstack.org/wiki/Edge_Computing_Group/Edge_Reference_Architectures#Distributed_Control_Plane_Scenario Distributed Control Plane MVP Architecture], StarlingX Distributed Cloud has been deployed on [http://packet.com Packet.com]. | + | As a demonstration of the [https://wiki.openstack.org/wiki/Edge_Computing_Group OpenStack Edge Computing Group]'s [https://wiki.openstack.org/wiki/Edge_Computing_Group/Edge_Reference_Architectures#Distributed_Control_Plane_Scenario Distributed Control Plane MVP Architecture], StarlingX Distributed Cloud has been deployed on [http://packet.com Packet.com]. <br /> |
+ | <br /> | ||
+ | STX R1 : http://mirror.starlingx.cengn.ca/mirror/starlingx/release/2018.10/centos/2018.10.0/outputs/iso/ <br /> | ||
+ | <br /> | ||
+ | Horizon for Central Cloud: http://147.75.105.194 <br /> | ||
+ | SSH to Central Cloud: ssh wrsroot@147.75.105.194 <br /> | ||
<br /> | <br /> | ||
− | |||
− | |||
<br /> | <br /> | ||
==== Packet.com Servers deployed ==== | ==== Packet.com Servers deployed ==== | ||
+ | <br /> | ||
+ | [[File: packet-servers.png]] <br /> | ||
+ | <br /> | ||
+ | [[File: cost.png]] <br /> | ||
+ | <br /> | ||
+ | |||
+ | ==== Networking ==== | ||
+ | [[File: Networking.png]] <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | ==== Setting up the iPXE Boot Server ................................................................ ==== | ||
+ | |||
+ | The initial server of a StarlingX cloud, i.e. controller-0, must be installed from a PXE boot server. On packet.com, iPXE booting is used instead, but the boot server is set up the same way.<br /> | ||
+ | <br /> | ||
+ | Get a small packet.com server (e.g. c1.small.x86) for the iPXE boot server. It should use packet.com L3 networking and run Ubuntu 16.04.<br /> | ||
+ | <br /> | ||
+ | Set up the iPXE boot server as follows: | ||
+ | <pre> | ||
+ | apt-get update | ||
+ | apt-get install apache2 -y | ||
+ | |||
+ | wget http://mirror.starlingx.cengn.ca/mirror/starlingx/release/2018.10/centos/2018.10.0/outputs/iso/bootimage.iso | ||
+ | |||
+ | mkdir -p /media/iso | ||
+ | mount -o loop ./bootimage.iso /media/iso | ||
+ | mount -o remount,exec,dev /media/iso | ||
+ | |||
+ | mkdir -p /export/pxeboot | ||
+ | cd /var/www/html | ||
+ | ln -s /export/pxeboot BIOS-Client | ||
+ | |||
+ | # for some reason have to remove pxeboot directory before running pxeboot_setup.sh | ||
+ | cd /export | ||
+ | rmdir pxeboot | ||
+ | cd | ||
+ | /media/iso/pxeboot_setup.sh -u http://<IPADDRESS-OF-IPXE-BOOT-SERVER>/BIOS-Client -t /export/pxeboot | ||
+ | </pre> | ||
+ | <br /> | ||
+ | |||
+ | ==== Get L2 VLANs and L3 Public IPs .......................................................................... ==== | ||
+ | |||
+ | Get the following L2 VLANs for the project: | ||
+ | * For Central Cloud | ||
+ | ** StarlingX MGMT Network (CC) (EWR1) | ||
+ | ** StarlingX OAM Network (CC) (EWR1) | ||
+ | ** StarlingX DATA Network -- NOT APPLICABLE | ||
+ | * For a subcloud local to the Central Cloud | ||
+ | ** StarlingX SC1 MGMT Network (EWR1) | ||
+ | ** StarlingX SC1 OAM Network (EWR1) | ||
+ | ** StarlingX SC1 DATA Network (EWR1) | ||
+ | * For a subcloud remote from the Central Cloud | ||
+ | ** StarlingX SC3 MGMT Network (DFW2) | ||
+ | ** StarlingX SC3 OAM Network (DFW2) | ||
+ | ** StarlingX SC3 DATA Network (DFW2) | ||
+ | <br /> | ||
+ | Get the following L3 Public IPs for the project: | ||
+ | * For Central Cloud | ||
+ | ** CC MGMT IP Subnet - Public IPv4/28 subnet <-- NOTE must be at least /28 | ||
+ | ** CC OAM IP Subnet - Public IPv4/29 subnet | ||
+ | ** CC DATA -- NOT APPLICABLE | ||
+ | * For a subcloud local to the Central Cloud | ||
+ | ** SC1 MGMT IP Subnet ... LOCAL / PRIVATE | ||
+ | ** SC1 OAM IP Subnet ... LOCAL / PRIVATE | ||
+ | ** SC1 DATA IP Subnet - Public IPv4/29 subnet | ||
+ | * For a subcloud remote from the Central Cloud | ||
+ | ** SC3 MGMT IP Subnet - Public IPv4/28 subnet <-- NOTE must be at least /28 | ||
+ | ** SC3 OAM IP Subnet - Public IPv4/29 subnet | ||
+ | ** SC3 DATA IP Subnet - Public IPv4/29 subnet | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | ==== Setting up the XYZ-ROUTERs................................................................ ==== | ||
+ | |||
+ | All of the packet.com servers running StarlingX software use L2-only packet.com networking. A hybrid L3/L2 networking server is therefore required at each packet.com site to route from the StarlingX L2 networks into the packet.com L3 networks and ultimately the public internet.<br /> | ||
+ | <br /> | ||
+ | At each site, get another packet.com server that supports hybrid L3/L2 networking (e.g. c2.medium.x86) and run Ubuntu 16.04 on it. | ||
+ | * Add all the VLANs onto its L2 port | ||
+ | * Add all Public IPv4 /29 subnets to this server (packet.com will then forward all packets for these subnets to this server) | ||
+ | <br /> | ||
+ | Set up the router-nat server as follows: | ||
+ | <pre> | ||
+ | apt-get update | ||
+ | apt-get install vlan | ||
+ | sudo su -c 'echo "8021q" >> /etc/modules' | ||
+ | /sbin/reboot | ||
+ | </pre> | ||
+ | vi /etc/network/interfaces # as required, example below is for the CC-Router | ||
+ | <pre> | ||
+ | auto lo | ||
+ | iface lo inet loopback | ||
+ | |||
+ | auto bond0 | ||
+ | iface bond0 inet static | ||
+ | address 147.75.39.198 | ||
+ | netmask 255.255.255.252 | ||
+ | gateway 147.75.39.197 | ||
+ | bond-downdelay 200 | ||
+ | bond-miimon 100 | ||
+ | bond-mode 4 | ||
+ | bond-updelay 200 | ||
+ | bond-xmit_hash_policy layer3+4 | ||
+ | bond-lacp-rate 1 | ||
+ | bond-slaves enp1s0f0 | ||
+ | dns-nameservers 147.75.207.207 147.75.207.208 | ||
+ | iface bond0 inet6 static | ||
+ | address 2604:1380:1:b800::1 | ||
+ | netmask 127 | ||
+ | gateway 2604:1380:1:b800:: | ||
+ | |||
+ | auto bond0:0 | ||
+ | iface bond0:0 inet static | ||
+ | address 10.99.254.1 | ||
+ | netmask 255.255.255.254 | ||
+ | post-up route add -net 10.0.0.0/8 gw 10.99.254.0 | ||
+ | post-down route del -net 10.0.0.0/8 gw 10.99.254.0 | ||
+ | |||
+ | auto enp1s0f0 | ||
+ | iface enp1s0f0 inet manual | ||
+ | bond-master bond0 | ||
+ | |||
+ | auto enp1s0f1 | ||
+ | iface enp1s0f1 inet static | ||
+ | address 147.75.105.193 | ||
+ | netmask 255.255.255.248 | ||
+ | network 147.75.105.192 | ||
+ | broadcast 147.75.105.199 | ||
+ | |||
+ | auto enp1s0f1.1092 | ||
+ | iface enp1s0f1.1092 inet static | ||
+ | address 11.11.11.1 | ||
+ | netmask 255.255.255.0 | ||
+ | network 11.11.11.0 | ||
+ | broadcast 11.11.11.255 | ||
+ | vlan-raw-device enp1s0f1 | ||
+ | |||
+ | auto enp1s0f1.1172 | ||
+ | iface enp1s0f1.1172 inet static | ||
+ | address 139.178.67.17 | ||
+ | netmask 255.255.255.240 | ||
+ | network 139.178.67.16 | ||
+ | broadcast 139.178.67.31 | ||
+ | mtu 1400 | ||
+ | vlan-raw-device enp1s0f1 | ||
+ | |||
+ | auto enp1s0f1.1117 | ||
+ | iface enp1s0f1.1117 inet static | ||
+ | address 192.168.214.1 | ||
+ | netmask 255.255.255.0 | ||
+ | network 192.168.214.0 | ||
+ | broadcast 192.168.214.255 | ||
+ | mtu 1400 | ||
+ | vlan-raw-device enp1s0f1 | ||
+ | |||
+ | auto enp1s0f1.1145 | ||
+ | iface enp1s0f1.1145 inet static | ||
+ | address 139.178.66.41 | ||
+ | netmask 255.255.255.248 | ||
+ | network 139.178.66.40 | ||
+ | broadcast 139.178.66.47 | ||
+ | vlan-raw-device enp1s0f1 | ||
+ | </pre> | ||
+ | <br /> | ||
+ | Reboot the server once more, then enable IP forwarding: | ||
+ | <pre> | ||
+ | /sbin/reboot | ||
+ | ... | ||
+ | sysctl -w net.ipv4.ip_forward=1 | ||
+ | </pre> | ||
+ | <br /> | ||
+ | <br /> | ||
− | + | ==== Installing the Central Cloud ........................................................................ ==== | |
<br /> | <br /> | ||
− | ==== Networking ==== | + | '''SW Install of initial server (cc-controller-0):''' <br /> |
+ | To iPXE-install the StarlingX load onto a packet.com server, the server must have L3 connectivity.<br /> | ||
+ | <br /> | ||
+ | Set up a new packet.com server for controller-0 (e.g. m1.xlarge.x86), initially with L3 connectivity and an Ubuntu 16.04 load.<br /> | ||
+ | <br /> | ||
+ | After the server boots, 'REINSTALL' the packet.com server using 'Custom IPXE' and specify the following ipxe.conf file: | ||
+ | <pre> | ||
+ | #!ipxe | ||
+ | set base-url http://147.75.38.129/BIOS-Client | ||
+ | kernel ${base-url}/vmlinuz console=ttyS1,115200n8 root=live:${base-url}/LiveOS/squashfs.img ip=dhcp ks=${base-url}/pxeboot_controller.cfg boot_device=sda rootfs_device=sda inst.gpt inst.text inst.repo=${base-url} security_profile=standard user_namespace.enable=1 network ksdevice=bootif BOOTIF=01-${netX/mac} | ||
+ | initrd ${base-url}/initrd.img | ||
+ | imgstat | ||
+ | boot | ||
+ | </pre> | ||
+ | <br /> | ||
+ | NOTE: I set up a GitHub repo with ipxe.conf files for a StarlingX Standard configuration and a StarlingX All-In-One configuration: https://github.com/gwaines/ipxe-configs <br /> | ||
+ | <br /> | ||
+ | Specify https://raw.githubusercontent.com/gwaines/ipxe-configs/master/ipxe-starlingx-standard-sda.conf as the URL for the ipxe.conf file for the Central Cloud.<br /> | ||
+ | <br /> | ||
+ | The server will reboot a few times, run the StarlingX installer to install the StarlingX load on /dev/sda and then reboot a final time running the StarlingX load.<br /> | ||
+ | <br /> | ||
+ | |||
+ | '''Change to L2-only Networking:''' <br /> | ||
+ | Switch the controller-0 packet.com server to L2-only networking now, and set: | ||
+ | * port 1 = CC MGMT VLAN | ||
+ | * port 2 = CC OAM VLAN | ||
+ | <br /> | ||
+ | |||
+ | '''Bootstrap StarlingX Software:''' <br /> | ||
+ | Log in to the console with wrsroot/wrsroot; the system will force a password change.<br /> | ||
+ | <br /> | ||
+ | Clean up the remnants of L3 networking before running 'config_controller': | ||
+ | <pre> | ||
+ | /sbin/ifdown <port1-dev> | ||
+ | vi /etc/resolv.conf ## and delete all dns server entries | ||
+ | </pre> | ||
+ | <br /> | ||
+ | Bootstrap the StarlingX software by running 'config_controller': | ||
+ | <pre> | ||
+ | sudo config_controller | ||
+ | ... | ||
+ | # answer all the questions appropriately | ||
+ | # e.g. specify that this is a Distributed Cloud install of the Central Cloud (SystemController) | ||
+ | # e.g. specify the correct port/dev names for MGMT and OAM | ||
+ | # e.g. specify a unique (private) IP Subnet MGMT; for central cloud can use default 192.168.204.0/24 | ||
+ | # e.g. use the CC OAM Public IPv4 subnet for the OAM Addresses | ||
+ | ... | ||
+ | </pre> | ||
+ | ... wait for config_controller to complete.<br /> | ||
+ | <br /> | ||
+ | Fix the BIOS boot settings on the server: | ||
+ | * reboot the server, | ||
+ | * from the console, press F2 (or the equivalent key) to enter Setup and change the boot order to disk first, then network.<br /> | ||
+ | <br /> | ||
+ | |||
+ | '''Configure and Unlock Controller-0''' <br /> | ||
+ | Refer to StarlingX documentation and | ||
+ | * Configure Cinder Storage on LVM of controller-0 disk or partition, | ||
+ | * Add LVM Storage Backend for Cinder, | ||
+ | <br /> | ||
+ | WORKAROUND for packet.com switches dropping multicast heartbeat packets: <br /> | ||
+ | Change maintenance's heartbeat failure behaviour to simply raise an alarm ... rather than declare the node as failed and reset the node. | ||
+ | <pre> | ||
+ | system service-parameter-list | ||
+ | system service-parameter-modify platform maintenance heartbeat_failure_action=alarm | ||
+ | system service-parameter-apply platform | ||
+ | </pre> | ||
+ | <br /> | ||
+ | ... and unlock controller-0. <br /> | ||
+ | <br /> | ||
+ | |||
+ | |||
+ | '''SW Install of second controller (cc-controller-1):''' <br /> | ||
+ | All subsequent nodes in the StarlingX cluster are PXE booted from controller-0.<br /> | ||
+ | <br /> | ||
+ | Set up a new packet.com server for controller-1 (e.g. m1.xlarge.x86), initially with L3 connectivity and an Ubuntu 16.04 load.<br /> | ||
+ | <br /> | ||
+ | Switch the controller-1 packet.com server to L2-only networking now, and set: | ||
+ | * port 1 = CC MGMT VLAN | ||
+ | * port 2 = CC OAM VLAN | ||
+ | <br /> | ||
+ | Then reboot from the console and use the appropriate function key (F#) to force a PXE boot.<br /> | ||
+ | The console should show a message that the server is waiting to be configured at the controller.<br /> | ||
+ | <br /> | ||
+ | On controller-0 console: | ||
+ | <pre> | ||
+ | system host-update 2 personality=controller | ||
+ | </pre> | ||
+ | <br /> | ||
+ | controller-1 should now start to install software and then reboot into a locked/online state ... ready for provisioning.<br /> | ||
+ | <br /> | ||
+ | |||
+ | '''Configure and Unlock Controller-1:''' <br /> | ||
+ | Refer to StarlingX documentation and | ||
+ | * Configure OAM Interface, | ||
+ | * Configure Cinder Storage on LVM of controller-1 disk or partition, | ||
+ | ... and unlock controller-1. <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | ==== Installing the Sub Clouds ............................................................................ ==== | ||
+ | <br /> | ||
+ | |||
+ | '''Configuring the Sub Clouds at the Central Cloud:''' <br /> | ||
+ | |||
+ | Configure the subcloud at the Central Cloud (SystemController): | ||
+ | <pre> | ||
+ | dcmanager subcloud add --name=subcloud-1 \ | ||
+ | --description="subcloud-1 description" \ | ||
+ | --location="subcloud-1 location" \ | ||
+ | --management-subnet=192.168.214.0/24 \ | ||
+ | --management-start-ip=192.168.214.2 \ | ||
+ | --management-end-ip=192.168.214.50 \ | ||
+ | --management-gateway-ip=192.168.214.1 \ | ||
+ | --systemcontroller-gateway-ip=192.168.204.1 | ||
+ | |||
+ | dcmanager subcloud generate-config subcloud-1 \ | ||
+ | --management-interface-port=eno1 \ | ||
+ | --management-interface-mtu=1400 \ | ||
+ | --oam-subnet=11.11.11.0/24 \ | ||
+ | --oam-gateway-ip=11.11.11.1 \ | ||
+ | --oam-floating-ip=11.11.11.12 \ | ||
+ | --oam-unit-0-ip=11.11.11.13 \ | ||
+ | --oam-unit-1-ip=11.11.11.14 \ | ||
+ | --oam-interface-port=eno2 \ | ||
+ | --oam-interface-mtu=1500 \ | ||
+ | --system-mode=simplex &> subcloud-1.ini | ||
+ | </pre> | ||
+ | <br /> | ||
+ | |||
+ | '''SW Install of initial server (sc1-controller-0):''' <br /> | ||
+ | To iPXE-install the StarlingX load onto a packet.com server, the server must have L3 connectivity.<br /> | ||
+ | <br /> | ||
+ | Set up a new packet.com server for the All-In-One controller of the subcloud (e.g. m1.xlarge.x86), initially with L3 connectivity and an Ubuntu 16.04 load.<br /> | ||
+ | <br /> | ||
+ | After the server boots, 'REINSTALL' the packet.com server using 'Custom IPXE' and specify the following ipxe.conf file: | ||
+ | <pre> | ||
+ | #!ipxe | ||
+ | set base-url http://147.75.38.129/BIOS-Client | ||
+ | kernel ${base-url}/vmlinuz console=ttyS1,115200n8 root=live:${base-url}/LiveOS/squashfs.img ip=dhcp ks=${base-url}/pxeboot_smallsystem.cfg boot_device=sda rootfs_device=sda inst.gpt inst.text inst.repo=${base-url} security_profile=standard user_namespace.enable=1 network ksdevice=bootif BOOTIF=01-${netX/mac} | ||
+ | initrd ${base-url}/initrd.img | ||
+ | imgstat | ||
+ | boot | ||
+ | </pre> | ||
+ | <br /> | ||
+ | NOTE: I set up a GitHub repo with ipxe.conf files for a StarlingX Standard configuration and a StarlingX All-In-One configuration: https://github.com/gwaines/ipxe-configs <br /> | ||
+ | <br /> | ||
+ | Specify https://raw.githubusercontent.com/gwaines/ipxe-configs/master/ipxe-starlingx-aio-sda.conf as the URL for the ipxe.conf file for an AIO Subcloud.<br /> | ||
+ | <br /> | ||
+ | The server will reboot a few times, run the StarlingX installer to install the StarlingX load on /dev/sda and then reboot a final time running the StarlingX load.<br /> | ||
+ | <br /> | ||
+ | |||
+ | '''Change to L2-only Networking:''' <br /> | ||
+ | Switch the All-In-One controller packet.com server to L2-only networking now, and set: | ||
+ | * port 1 = SC1 MGMT VLAN | ||
+ | * port 2 = SC1 OAM VLAN | ||
+ | * port 3 = SC1 DATA VLAN | ||
+ | <br /> | ||
+ | |||
+ | |||
+ | '''Bootstrap StarlingX Software:''' <br /> | ||
+ | Log in to the console with wrsroot/wrsroot; the system will force a password change.<br /> | ||
+ | <br /> | ||
+ | Clean up the remnants of L3 networking before running 'config_subcloud': | ||
+ | <pre> | ||
+ | /sbin/ifdown <port1-dev> | ||
+ | vi /etc/resolv.conf ## and delete all dns server entries | ||
+ | </pre> | ||
+ | <br /> | ||
+ | Bootstrap the StarlingX software as a subcloud by running 'config_subcloud' and passing in the subcloud-1.ini file generated on the SystemController. (You will first have to transfer the contents of that file to the subcloud All-In-One server.) | ||
+ | <pre> | ||
+ | sudo config_subcloud subcloud-1.ini | ||
+ | </pre> | ||
+ | ... wait for config_subcloud to complete.<br /> | ||
+ | <br /> | ||
+ | Fix the BIOS boot settings on the server: | ||
+ | * reboot the server, | ||
+ | * from the console, press F2 (or the equivalent key) to enter Setup and change the boot order to disk first, then network.<br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | '''Configure and Unlock Controller-0''' <br /> | ||
+ | Refer to StarlingX documentation and | ||
+ | * Configure Cinder Storage on LVM of controller-0 disk or partition, | ||
+ | * Add LVM Storage Backend for Cinder, | ||
+ | * Configure Flat Provider Network, | ||
+ | * Configure Data Interface on port 3 and indicate that it is attached to Flat Provider Network, | ||
+ | * Create an external network/subnet on that Flat Provider Network with the SC1 DATA Subnet. | ||
+ | ... and unlock controller-0. <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | '''Change the subcloud to managed:''' <br /> | ||
+ | |||
+ | At the Central Cloud (SystemController): | ||
+ | <pre> | ||
+ | dcmanager subcloud manage subcloud-1 | ||
+ | </pre> | ||
+ | This will enable synchronization and monitoring of the subcloud by the central cloud. | ||
+ | <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | ==== Problem Tracking ==== | ||
+ | <br /> | ||
+ | '''Problems Found with Packet.com:''' <br /> | ||
+ | * Out-of-band console connection to DFW2 n2.xlarge.x86 server does not work (works on other server types) | ||
+ | * Multicast packets appear to be dropped by Packet.com switches | ||
+ | ** This results in StarlingX Maintenance raising a false alarm that it has lost connectivity with other nodes in the StarlingX cloud. E.g. in the Central Cloud, cc-controller-0 reports a heartbeat audit failure with cc-controller-1. | ||
+ | ** WORKAROUND: Change maintenance's heartbeat failure behaviour to simply raise an alarm ... rather than declare the node as failed and reset the node. | ||
+ | <pre> | ||
+ | system service-parameter-list | ||
+ | system service-parameter-modify platform maintenance heartbeat_failure_action=alarm | ||
+ | system service-parameter-apply platform | ||
+ | </pre> | ||
+ | * On some interfaces, packets between 1400 and 1500 bytes were dropped by packet.com switch, even though MTU of Interface was 1500 bytes. | ||
+ | ** WORKAROUND: set all MTUs to 1400 bytes | ||
+ | <br /> | ||
+ | '''Problems Found with StarlingX R1.0:''' <br /> | ||
+ | * config_subcloud would fail if prior to running the command on controller-0 of the subcloud, the mgmt interface was already configured and there were any unreachable DNS servers configured in /etc/resolv.conf | ||
+ | ** WORKAROUND: ifdown <mgmt-if-dev> and vi /etc/resolv.conf and remove all DNS Server entries | ||
+ | * HORIZON subcloud 'managed' would fail with some error about JSON | ||
+ | ** WORKAROUND: Use CLI to set subcloud to managed | ||
+ | * ADMIN adding new user on SystemController from Horizon causes ADMIN to be force logged out. | ||
+ | * Keypair synchronization does not seem to be working | ||
+ | * Cannot create a Volume from Image on subcloud | ||
+ | * glance image-download on a subcloud will not work if sub cloud's oam interface does not have connectivity to central cloud ... seems wrong | ||
+ | * A password complexity error in Horizon does not give a useful error response; it basically just says "Can not create user." | ||
+ | * creating a tenant-specific non-public image at system controller ... can't see it on sub clouds | ||
+ | * would be nice to support a /31 oam network so we could use the default L3 interface setup by packet for the oam network | ||
− | |||
<br /> | <br /> |
Latest revision as of 18:10, 8 August 2019
Contents
- 1 Packet SIG
- 1.1 StarlingX Distributed Cloud on Packet.Com
- 1.1.1 Packet.com Servers deployed
- 1.1.2 Networking
- 1.1.3 Setting up the iPXE Boot Server ................................................................
- 1.1.4 Get L2 VLANs and L3 Public IPs ..........................................................................
- 1.1.5 Setting up the XYZ-ROUTERs................................................................
- 1.1.6 Installing the Central Cloud ........................................................................
- 1.1.7 Installing the Sub Clouds ............................................................................
- 1.1.8 Problem Tracking
Packet SIG
Packet.com is a baremetal public cloud, and they have donated some resources to the StarlingX project. The resources are available under the STX-PROJECT-01 project on Packet.com.
StarlingX Distributed Cloud on Packet.Com
As a demonstration of the OpenStack Edge Computing Group's Distributed Control Plane MVP Architecture, StarlingX Distributed Cloud has been deployed on Packet.com.
STX R1 : http://mirror.starlingx.cengn.ca/mirror/starlingx/release/2018.10/centos/2018.10.0/outputs/iso/
Horizon for Central Cloud: http://147.75.105.194
SSH to Central Cloud: ssh wrsroot@147.75.105.194
Packet.com Servers deployed
Networking
Setting up the iPXE Boot Server ................................................................
The initial server of a StarlingX cloud, i.e. controller-0, must be installed from a PXE boot server. On packet.com, iPXE booting is used instead, but the boot server is set up the same way.
Get a small packet.com server (e.g. c1.small.x86) for the iPXE boot server. It should use packet.com L3 networking and run Ubuntu 16.04.
Set up the iPXE boot server as follows:
<pre>
apt-get update
apt-get install apache2 -y

wget http://mirror.starlingx.cengn.ca/mirror/starlingx/release/2018.10/centos/2018.10.0/outputs/iso/bootimage.iso

mkdir -p /media/iso
mount -o loop ./bootimage.iso /media/iso
mount -o remount,exec,dev /media/iso

mkdir -p /export/pxeboot
cd /var/www/html
ln -s /export/pxeboot BIOS-Client

# for some reason the pxeboot directory has to be removed before running pxeboot_setup.sh
cd /export
rmdir pxeboot
cd
/media/iso/pxeboot_setup.sh -u http://<IPADDRESS-OF-IPXE-BOOT-SERVER>/BIOS-Client -t /export/pxeboot
</pre>
Get L2 VLANs and L3 Public IPs ..........................................................................
Get the following L2 VLANs for the project:
- For Central Cloud
- StarlingX MGMT Network (CC) (EWR1)
- StarlingX OAM Network (CC) (EWR1)
- StarlingX DATA Network -- NOT APPLICABLE
- For a subcloud local to the Central Cloud
- StarlingX SC1 MGMT Network (EWR1)
- StarlingX SC1 OAM Network (EWR1)
- StarlingX SC1 DATA Network (EWR1)
- For a subcloud remote from the Central Cloud
- StarlingX SC3 MGMT Network (DFW2)
- StarlingX SC3 OAM Network (DFW2)
- StarlingX SC3 DATA Network (DFW2)
Get the following L3 Public IPs for the project:
- For Central Cloud
- CC MGMT IP Subnet - Public IPv4/28 subnet <-- NOTE must be at least /28
- CC OAM IP Subnet - Public IPv4/29 subnet
- CC DATA -- NOT APPLICABLE
- For a subcloud local to the Central Cloud
- SC1 MGMT IP Subnet ... LOCAL / PRIVATE
- SC1 OAM IP Subnet ... LOCAL / PRIVATE
- SC1 DATA IP Subnet - Public IPv4/29 subnet
- For a subcloud remote from the Central Cloud
- SC3 MGMT IP Subnet - Public IPv4/28 subnet <-- NOTE must be at least /28
- SC3 OAM IP Subnet - Public IPv4/29 subnet
- SC3 DATA IP Subnet - Public IPv4/29 subnet
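To make the subnet sizes above concrete, the following sketch computes how many usable host addresses a given prefix length provides, under the conventional assumption that the network and broadcast addresses are reserved. The helper name `usable_hosts` is illustrative only and is not part of any StarlingX or packet.com tooling.

```shell
# usable_hosts PREFIX
# Prints the number of usable IPv4 host addresses in a subnet with the given
# prefix length, assuming the network and broadcast addresses are reserved.
# Only meaningful for prefix lengths up to /30.
usable_hosts() {
    echo $(( (1 << (32 - $1)) - 2 ))
}

usable_hosts 28   # size of the MGMT subnets listed above
usable_hosts 29   # size of the OAM and DATA subnets listed above
```

A /28 leaves 14 usable addresses and a /29 leaves 6, which is the difference behind the "at least /28" note on the MGMT subnets.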
Setting up the XYZ-ROUTERs................................................................
All of the packet.com servers running StarlingX software use L2-only packet.com networking. A hybrid L3/L2 networking server is therefore required at each packet.com site to route from the StarlingX L2 networks into the packet.com L3 networks and ultimately the public internet.
At each site, get another packet.com server that supports hybrid L3/L2 networking (e.g. c2.medium.x86) and run Ubuntu 16.04 on it.
- Add all the VLANs onto its L2 port
- Add all Public IPv4 /29 subnets to this server (This will result in packet.com forwarding all packets to these subnets to this server.)
Set up the router-nat server as follows:
<pre>
apt-get update
apt-get install vlan
sudo su -c 'echo "8021q" >> /etc/modules'
/sbin/reboot
</pre>
vi /etc/network/interfaces # as required, example below is for the CC-Router
<pre>
auto lo
iface lo inet loopback

auto bond0
iface bond0 inet static
    address 147.75.39.198
    netmask 255.255.255.252
    gateway 147.75.39.197
    bond-downdelay 200
    bond-miimon 100
    bond-mode 4
    bond-updelay 200
    bond-xmit_hash_policy layer3+4
    bond-lacp-rate 1
    bond-slaves enp1s0f0
    dns-nameservers 147.75.207.207 147.75.207.208
iface bond0 inet6 static
    address 2604:1380:1:b800::1
    netmask 127
    gateway 2604:1380:1:b800::

auto bond0:0
iface bond0:0 inet static
    address 10.99.254.1
    netmask 255.255.255.254
    post-up route add -net 10.0.0.0/8 gw 10.99.254.0
    post-down route del -net 10.0.0.0/8 gw 10.99.254.0

auto enp1s0f0
iface enp1s0f0 inet manual
    bond-master bond0

auto enp1s0f1
iface enp1s0f1 inet static
    address 147.75.105.193
    netmask 255.255.255.248
    network 147.75.105.192
    broadcast 147.75.105.199

auto enp1s0f1.1092
iface enp1s0f1.1092 inet static
    address 11.11.11.1
    netmask 255.255.255.0
    network 11.11.11.0
    broadcast 11.11.11.255
    vlan-raw-device enp1s0f1

auto enp1s0f1.1172
iface enp1s0f1.1172 inet static
    address 139.178.67.17
    netmask 255.255.255.240
    network 139.178.67.16
    broadcast 139.178.67.31
    mtu 1400
    vlan-raw-device enp1s0f1

auto enp1s0f1.1117
iface enp1s0f1.1117 inet static
    address 192.168.214.1
    netmask 255.255.255.0
    network 192.168.214.0
    broadcast 192.168.214.255
    mtu 1400
    vlan-raw-device enp1s0f1

auto enp1s0f1.1145
iface enp1s0f1.1145 inet static
    address 139.178.66.41
    netmask 255.255.255.248
    network 139.178.66.40
    broadcast 139.178.66.47
    vlan-raw-device enp1s0f1
</pre>
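The `network` and `broadcast` values in each stanza of the interfaces file follow directly from its `address` and `netmask`, so they can be cross-checked before rebooting. The following is a minimal sketch in POSIX shell; the helper names (`ip_to_int`, `int_to_ip`, `network_of`, `broadcast_of`) are hypothetical, and it assumes dotted-quad input without leading zeros and a shell with 64-bit arithmetic.

```shell
# Convert a dotted-quad IPv4 address to an integer.
# The subshell body keeps the IFS change local to this function.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Convert an integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# network_of ADDRESS NETMASK: address with all host bits cleared.
network_of() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip $(( ip & mask ))
}

# broadcast_of ADDRESS NETMASK: address with all host bits set.
broadcast_of() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip $(( ip | (~mask & 4294967295) ))
}

# Cross-check the enp1s0f1.1172 stanza from the example above.
network_of 139.178.67.17 255.255.255.240
broadcast_of 139.178.67.17 255.255.255.240
```

The two printed values should match the `network` and `broadcast` lines of the stanza being checked.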
Reboot the server once more, then enable IP forwarding:
<pre>
/sbin/reboot
...
sysctl -w net.ipv4.ip_forward=1
</pre>
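Note that `sysctl -w` only changes the running kernel, so the setting is lost at the next reboot. One possible way to make it persistent on Ubuntu 16.04 is a drop-in file under `/etc/sysctl.d`. This is a sketch only: the helper name `write_ip_forward_conf` and the file name `99-ip-forward.conf` are assumptions, not something the original setup used.

```shell
# write_ip_forward_conf DIR
# Writes a sysctl drop-in file into DIR that enables IPv4 forwarding.
# The file name 99-ip-forward.conf is an arbitrary choice for this sketch.
write_ip_forward_conf() {
    dir=$1
    mkdir -p "$dir"
    printf 'net.ipv4.ip_forward = 1\n' > "$dir/99-ip-forward.conf"
}

# On the router, as root:
#   write_ip_forward_conf /etc/sysctl.d
#   sysctl --system    # reload all sysctl configuration files
```

Taking the target directory as a parameter lets the function be tried against a scratch directory before touching `/etc`.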
Installing the Central Cloud ........................................................................
SW Install of initial server (cc-controller-0):
To iPXE-install the StarlingX load onto a packet.com server, the server must have L3 connectivity.
Set up a new packet.com server for controller-0 (e.g. m1.xlarge.x86), initially with L3 connectivity and an Ubuntu 16.04 load.
After the server boots, 'REINSTALL' the packet.com server using 'Custom IPXE' and specify the following ipxe.conf file:
<pre>
#!ipxe
set base-url http://147.75.38.129/BIOS-Client
kernel ${base-url}/vmlinuz console=ttyS1,115200n8 root=live:${base-url}/LiveOS/squashfs.img ip=dhcp ks=${base-url}/pxeboot_controller.cfg boot_device=sda rootfs_device=sda inst.gpt inst.text inst.repo=${base-url} security_profile=standard user_namespace.enable=1 network ksdevice=bootif BOOTIF=01-${netX/mac}
initrd ${base-url}/initrd.img
imgstat
boot
</pre>
NOTE: I set up a GitHub repo with ipxe.conf files for a StarlingX Standard configuration and a StarlingX All-In-One configuration: https://github.com/gwaines/ipxe-configs
Specify https://raw.githubusercontent.com/gwaines/ipxe-configs/master/ipxe-starlingx-standard-sda.conf as the URL for the ipxe.conf file for the Central Cloud.
The server will reboot a few times, run the StarlingX installer to install the StarlingX load on /dev/sda, and then reboot a final time into the StarlingX load.
Change to L2-only Networking:
Switch the controller-0 packet.com server to L2-only networking now, and set:
- port 1 = CC MGMT VLAN
- port 2 = CC OAM VLAN
Bootstrap StarlingX Software:
Log in to the console with wrsroot/wrsroot; the system will force a password change.
Clean up the remnants of L3 networking before running 'config_controller':
<pre>
/sbin/ifdown <port1-dev>
vi /etc/resolv.conf ## and delete all dns server entries
</pre>
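If editing `/etc/resolv.conf` interactively is inconvenient, the DNS entries can also be removed with a short script. The sketch below is an assumption about how one might do this, not part of the original procedure; the helper name `strip_nameservers` is hypothetical, and it keeps a `.bak` copy of the file next to the original.

```shell
# strip_nameservers FILE
# Removes every "nameserver" line from FILE, keeping a backup in FILE.bak.
# Reads from the backup and rewrites FILE in place, so a symlinked
# resolv.conf keeps pointing at the same target.
strip_nameservers() {
    conf=$1
    cp "$conf" "$conf.bak"
    sed '/^[[:space:]]*nameserver[[:space:]]/d' "$conf.bak" > "$conf"
}

# On controller-0, as root:
#   strip_nameservers /etc/resolv.conf
```

Other lines in the file, such as `search` entries, are left untouched.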
Bootstrap the StarlingX software by running 'config_controller':
<pre>
sudo config_controller
...
# answer all the questions appropriately
# e.g. specify that this is a Distributed Cloud install of the Central Cloud (SystemController)
# e.g. specify the correct port/dev names for MGMT and OAM
# e.g. specify a unique (private) IP Subnet MGMT; for central cloud can use default 192.168.204.0/24
# e.g. use the CC OAM Public IPv4 subnet for the OAM Addresses
...
</pre>
... wait for config_controller to complete.
Fix the BIOS boot settings on the server:
- reboot the server,
- from the console, press F2 (or the equivalent key) to enter Setup and change the boot order to disk first, then network.
Configure and Unlock Controller-0
Refer to StarlingX documentation and
- Configure Cinder Storage on LVM of controller-0 disk or partition,
- Add LVM Storage Backend for Cinder,
WORKAROUND for packet.com switches dropping multicast heartbeat packets:
Change maintenance's heartbeat failure behaviour to simply raise an alarm ... rather than declare the node as failed and reset the node.
<pre>
system service-parameter-list
system service-parameter-modify platform maintenance heartbeat_failure_action=alarm
system service-parameter-apply platform
</pre>
... and unlock controller-0.
SW Install of second controller (cc-controller-1):
All subsequent nodes in the StarlingX cluster are PXE booted from controller-0.
Set up a new packet.com server for controller-1 (e.g. m1.xlarge.x86), initially with L3 connectivity and an Ubuntu 16.04 load.
Switch the controller-1 packet.com server to L2-only networking now, and set:
- port 1 = CC MGMT VLAN
- port 2 = CC OAM VLAN
Then reboot from the console and use the appropriate function key (F#) to force a PXE boot.
The console should show a message that the server is waiting to be configured at the controller.
On controller-0 console:
system host-update 2 personality=controller
controller-1 should now start to install software and then reboot into a locked/online state ... ready for provisioning.
Configure and Unlock Controller-1:
Refer to StarlingX documentation and
- Configure OAM Interface,
- Configure Cinder Storage on LVM of controller-1 disk or partition,
... and unlock controller-1.
Installing the Sub Clouds ............................................................................
Configuring the Sub Clouds at the Central Cloud:
Configure the subcloud at the Central Cloud (SystemController):
<pre>
dcmanager subcloud add --name=subcloud-1 \
    --description="subcloud-1 description" \
    --location="subcloud-1 location" \
    --management-subnet=192.168.214.0/24 \
    --management-start-ip=192.168.214.2 \
    --management-end-ip=192.168.214.50 \
    --management-gateway-ip=192.168.214.1 \
    --systemcontroller-gateway-ip=192.168.204.1

dcmanager subcloud generate-config subcloud-1 \
    --management-interface-port=eno1 \
    --management-interface-mtu=1400 \
    --oam-subnet=11.11.11.0/24 \
    --oam-gateway-ip=11.11.11.1 \
    --oam-floating-ip=11.11.11.12 \
    --oam-unit-0-ip=11.11.11.13 \
    --oam-unit-1-ip=11.11.11.14 \
    --oam-interface-port=eno2 \
    --oam-interface-mtu=1500 \
    --system-mode=simplex &> subcloud-1.ini
</pre>
SW Install of initial server (sc1-controller-0):
In order to IPXE Install the StarlingX Load onto a packet.com server, it must have L3 connectivity.
Set up a new packet.com server for the All-In-One controller of the subcloud (e.g. m1.xlarge.x86), initially with L3 connectivity and an Ubuntu 16.04 load.
After the server boots, 'REINSTALL' the packet.com server using 'Custom IPXE' and specifying the following ipxe.conf file:
#!ipxe
set base-url http://147.75.38.129/BIOS-Client
kernel ${base-url}/vmlinuz console=ttyS1,115200n8 root=live:${base-url}/LiveOS/squashfs.img ip=dhcp ks=${base-url}/pxeboot_smallsystem.cfg boot_device=sda rootfs_device=sda inst.gpt inst.text inst.repo=${base-url} security_profile=standard user_namespace.enable=1 network ksdevice=bootif BOOTIF=01-${netX/mac}
initrd ${base-url}/initrd.img
imgstat
boot
NOTE: I set up a GitHub repo with ipxe.conf files for a StarlingX Standard configuration and a StarlingX All-In-One configuration: https://github.com/gwaines/ipxe-configs
Specify https://raw.githubusercontent.com/gwaines/ipxe-configs/master/ipxe-starlingx-aio-sda.conf as the URL for the ipxe.conf file for an AIO Subcloud.
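The ipxe.conf shown earlier hard-codes the install server URL and the target disk (sda). If a server needs a different disk or mirror, a variant can be generated rather than edited by hand. This is a rough sketch under that assumption: `render_ipxe` is a hypothetical helper, and the URL and disk name in the example are placeholders, not values from this deployment.

```shell
# Hypothetical helper: print an iPXE script like the one in this section for a
# given install server URL ($1) and target disk ($2).
# \$ keeps iPXE's own ${...} variables literal inside the here-document.
render_ipxe() {
    local base_url="$1" disk="$2"
    cat <<EOF
#!ipxe
set base-url ${base_url}
kernel \${base-url}/vmlinuz console=ttyS1,115200n8 root=live:\${base-url}/LiveOS/squashfs.img ip=dhcp ks=\${base-url}/pxeboot_smallsystem.cfg boot_device=${disk} rootfs_device=${disk} inst.gpt inst.text inst.repo=\${base-url} security_profile=standard user_namespace.enable=1 network ksdevice=bootif BOOTIF=01-\${netX/mac}
initrd \${base-url}/initrd.img
imgstat
boot
EOF
}

# Example with placeholder values: write a config that installs to sdb.
# render_ipxe http://192.0.2.10/BIOS-Client sdb > ipxe-starlingx-aio-sdb.conf
```

The resulting file still has to be hosted somewhere the packet.com 'Custom IPXE' reinstall can fetch it from.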
The server will reboot a few times, run the StarlingX installer to install the StarlingX load on /dev/sda and then reboot a final time running the StarlingX load.
Change to L2-only Networking:
Switch the All-In-One controller packet.com server to L2-only networking now, and set:
- port 1 = SC1 MGMT VLAN
- port 2 = SC1 OAM VLAN
- port 3 = SC1 DATA VLAN
Bootstrap StarlingX Software:
Log in to the console with wrsroot/wrsroot ... the system will force a password change.
Clean up remnants of L3 networking before running 'config_subcloud':
/sbin/ifdown <port1-dev>
vi /etc/resolv.conf ## and delete all dns server entries
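If editing by hand is inconvenient on the console, the DNS entries can also be removed non-interactively. A minimal sketch, assuming sed is available on the node: `strip_nameservers` is a hypothetical helper that deletes every `nameserver` line from a resolv.conf-style file and keeps a `.bak` copy.

```shell
# Remove all "nameserver" lines from the given file, keeping <file>.bak.
# The path is a parameter so the function can be tried on a copy first.
strip_nameservers() {
    local file="$1"
    sed -i.bak '/^[[:space:]]*nameserver[[:space:]]/d' "$file"
}

# Example (not run here; needs root for the real file):
# strip_nameservers /etc/resolv.conf
```

Other lines in the file, such as `search` entries, are left untouched.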
Bootstrap the StarlingX software as a subcloud by running 'config_subcloud' and passing in the subcloud-1.ini file generated on the SystemController. (You'll have to transfer the contents of that file to the subcloud all-in-one server.)
sudo config_subcloud subcloud-1.ini
... wait for config_subcloud to complete.
Fix BIOS Boot Settings on server:
- reboot server,
- from the console, press F2 (or the equivalent key for the server) to enter Setup and change the boot order to disk first, then network.
Configure and Unlock Controller-0
Refer to StarlingX documentation and
- Configure Cinder Storage on LVM of controller-0 disk or partition,
- Add LVM Storage Backend for Cinder,
- Configure Flat Provider Network,
- Configure Data Interface on port 3 and indicate that it is attached to Flat Provider Network,
- Create an external network/subnet on that Flat Provider Network with the SC1 DATA Subnet.
... and unlock controller-0.
Change the subcloud to managed:
At the Central Cloud (SystemController):
dcmanager subcloud manage subcloud-1
This will enable synchronization and monitoring of the subcloud by the central cloud.
Problem Tracking
Problems Found with Packet.com:
- Out-of-band console connection to DFW2 n2.xlarge.x86 server does not work (works on other server types)
- Multi-cast Packets appear to be dropped by Packet.com switches
- Results in StarlingX Maintenance reporting a false alarm that it has lost connectivity with other nodes in the StarlingX Cloud. E.g. in the Central Cloud, cc-controller-0 claims that it has a heartbeat audit failure with cc-controller-1.
- WORKAROUND: Change maintenance's heartbeat failure behaviour to simply raise an alarm ... rather than declare the node as failed and reset the node.
system service-parameter-list
system service-parameter-modify platform maintenance heartbeat_failure_action=alarm
system service-parameter-apply platform
- On some interfaces, packets between 1400 and 1500 bytes were dropped by packet.com switch, even though MTU of Interface was 1500 bytes.
- WORKAROUND: set all MTUs to 1400 bytes
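To see which packet sizes actually get through on an interface, unfragmentable pings of a known total size can be sent between two nodes. The sketch below is illustrative only: the helper names are hypothetical, and the probe assumes the Linux iputils `ping` (`-M do` sets Don't Fragment, `-s` sets the ICMP payload size).

```shell
# ICMP echo payload that exactly fills an IPv4 packet of the given total size:
# subtract the 20-byte IPv4 header (no options) and the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Hypothetical probe: send one ping with Don't Fragment set whose total IP
# packet size is $2 bytes, to target $1. Succeeds if a reply comes back.
probe_packet_size() {
    local target="$1" size="$2"
    ping -c 1 -W 2 -M do -s "$(icmp_payload_for_mtu "$size")" "$target" > /dev/null 2>&1
}

# Example (not run here); <peer-ip> is a placeholder for another node's address.
# for size in 1400 1450 1500; do
#     probe_packet_size <peer-ip> "$size" && echo "$size ok" || echo "$size dropped"
# done
```

For a 1400-byte MTU the payload works out to 1372 bytes, and for 1500 bytes it is 1472.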
Problems Found with StarlingX R1.0:
- config_subcloud would fail if, before running the command on controller-0 of the subcloud, the mgmt interface was already configured and /etc/resolv.conf contained any unreachable DNS servers
- WORKAROUND: ifdown <mgmt-if-dev> and vi /etc/resolv.conf and remove all DNS Server entries
- HORIZON subcloud 'managed' would fail with some error about JSON
- WORKAROUND: Use CLI to set subcloud to managed
- ADMIN adding new user on SystemController from Horizon causes ADMIN to be force logged out.
- Keypair synchronization does not seem to be working
- Cannot create a Volume from Image on subcloud
- glance image-download on a subcloud will not work if sub cloud's oam interface does not have connectivity to central cloud ... seems wrong
- password complexity error in Horizon does not give a good error response ... basically just says "Can not create user."
- creating a tenant-specific non-public image at system controller ... can't see it on sub clouds
- would be nice to support a /31 oam network so we could use the default L3 interface setup by packet for the oam network