Oath FFU Juno To Ocata

Our environment
We have 5 types of hosts


 * API
 * runs the following services:
 * Nova
 * Nova Scheduler
 * Placement
 * Nova-Network (yes, I know.)
 * Keystone
 * Glance API & Registry
 * DB
 * MySQL
 * MQ
 * Rabbit
 * HV/Compute
 * RHEL7
 * Networking: Open vSwitch using tagged VLANs
 * Storage: All VMs stored on local disk, configured as RAID 10. All instances live under /openstack
 * During an OS upgrade we will leave /openstack alone, and wipe/rewrite the operating system partitions.
 * UI
 * Horizon

Preparing to upgrade

 * 1) Notify users of upcoming downtime
 * 2) Test the Chef bootstrap and initial converge for API, MQ, DB, and HV hosts
 * 3) Make sure VIPs are set up for new services such as placement
 * 4) Validate control plane pipeline
 * 5) HV preparation
 * 6) * Upgrade all compute nodes to RHEL 7
 * 7) DB preparation
 * 8) * archive deleted rows
 * 9) * validate backups
 * 10) Verify network ACLs are correct
 * 11) * Gotcha: Placement is a new service that did not exist in Juno. Make sure that the hypervisors can reach the API nodes on the Placement port.
 * 12) Update the Horizon Banner with CMR information
 * 13) Add VIP settings in Chef recipe
 * 14) Internal announcements on intranet, email, etc
 * 15) Build new jumphosts that use openstack-client, rather than the novaclient
 * 16) Back up your configurations from your old deployment.
 * 17) * This is optional but useful. We ran into a few cases where we had missed a setting, and having the old working configs as a reference helped.
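
The Placement ACL gotcha above can be checked before the upgrade window with a small TCP probe run from a hypervisor. This is only a sketch: the port number 8778 is an assumption (it is a commonly used Placement port, but use whatever your VIP actually exposes), and the host names in the usage comment are hypothetical.

```python
import socket

# Assumption: 8778 is a commonly used Placement port; substitute your own.
PLACEMENT_PORT = 8778


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_endpoints(hosts, port=PLACEMENT_PORT, timeout=3.0):
    """Return the hosts that could not be reached on the given port."""
    return [h for h in hosts if not can_connect(h, port, timeout)]


# Example usage from a hypervisor (hypothetical API node names):
#   for host in unreachable_endpoints(["api1.example.com", "api2.example.com"]):
#       print(f"cannot reach {host}:{PLACEMENT_PORT}")
```

A successful TCP connect only shows that the ACL and listener are in place; it does not prove that the Placement service itself is healthy.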

Upgrade

 * 1) Take cluster snapshot of VM status
 * 2) Snooze or silence any alerting utilities so that you don't get spammed while your cluster is down
 * 3) Remove the cluster you are about to upgrade from the dropdown in Horizon. You don't want users reaching a cluster through Horizon while it is being upgraded.
 * 4) * Note for Horizon: We left our old Horizon nodes running the Juno code base and brought up new machines for the new Horizon deployment. This made it easy to roll back if something went wrong. Since we had multiple clusters, leaving the old Horizon nodes running also let users keep accessing the remaining Juno clusters while upgrades were in progress.
 * 5) Block access to the API, and wait for things to settle
 * 6) Stop the API and MQ services.
 * 7) Stop nova-compute on all hypervisors
 * 8) Stop MySQL on DB slaves.
 * 9) Make a full backup of the DB
 * 10) Run DB migration scripts
 * 11) * This means running all of the "nova-manage db sync", "nova-manage cell xyc", "glance-manage", etc. commands in the correct order. We wrote a script to run all of this, which we'll link to once we've open sourced it.
 * 12) Re-image your API and MQ nodes if needed. Everyone's operating system requirements are different. We upgraded ours from RHEL6 to RHEL7 during this process. However, this was not required. We could have just as easily left our API and MQ nodes on RHEL6. If you don't need / want to upgrade your operating system as part of this, re-imaging your control plane with the same OS is still a good idea just to make sure all of the old cruft is removed. Your CI/CD pipelines should be able to deploy OpenStack from scratch.
 * 13) * We had 3 API nodes per cluster. During the upgrade, we shut down 2 of the API nodes and left them on Juno, then attempted the upgrade / re-image on the third. That way, if anything went wrong, we could easily bring the 2 Juno API nodes back to restore service while we figured out what had happened.
 * 14) * We did not re-image our DB hosts
 * 15) Deploy the code for the release that you are upgrading to. We went from Juno to Ocata, so we deployed the Ocata code directly rather than deploying each intermediate release one by one.
 * 16) Clean up the DB backups if everything is working
 * 17) Repeat the API node upgrade process for the API nodes that are still on your old revision.
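
The "run the DB migration scripts in the correct order" step can be sketched as a small runner that executes commands one at a time and stops at the first failure. This is not the script referred to above. The function name `run_in_order` and the `EXAMPLE_STEPS` list are hypothetical, and the two commands shown are placeholders only: the real Juno to Ocata sequence contains more steps and depends on your deployment.

```python
import shlex
import subprocess


def run_in_order(commands, dry_run=False):
    """Run commands one at a time and stop at the first non-zero exit.

    Returns the list of commands that were processed. Raises
    subprocess.CalledProcessError on the first failing command, so later
    migrations never run against a half-migrated database.
    """
    processed = []
    for command in commands:
        if dry_run:
            # Only show what would be executed.
            print("would run:", command)
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(shlex.split(command), check=True)
        processed.append(command)
    return processed


# Placeholder steps for illustration only; NOT a complete or verified
# Juno-to-Ocata migration order.
EXAMPLE_STEPS = [
    "glance-manage db_sync",
    "nova-manage db sync",
]

# Example usage: preview first, then run for real.
#   run_in_order(EXAMPLE_STEPS, dry_run=True)
#   run_in_order(EXAMPLE_STEPS)
```

Stopping at the first failure matters here because the full DB backup taken earlier is the rollback point; continuing past a failed migration would make it harder to tell which state the database is in.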