- Launchpad Entry
We need the ability to move XenServer instances from one host in a zone to another host within the same zone. This provides a substantial amount of useful functionality, such as the ability to replace a dying or obsolete host with newer hardware by evacuating instances from it. Migrations are also the base dependency for functional resizes.
As above, the ability to evacuate a host for any physical reason is highly desirable. Furthermore, the ability for a customer to resize an instance (increasing the RAM and disk allotment) without snapshotting and creating a new instance is useful.
- As operations, I want to be able to evacuate a host with failing hardware so that I can replace the box with minimal impact to customers
- As a user, I want to be able to migrate my instance so that I can move to faster and/or newer hardware
The ability to snapshot a running XenServer instance already exists
Design & Implementation
The abstracted websequence diagram is as follows:
A few issues arise in this design. First, because of the current scheduler implementation, it isn't safe to assume that the scheduler will be aware of the ability to migrate. The "simple" scheduler implementation shows that this is possible, but I don't think we should depend on that ability, at least for the time being. The concession, as above, is to cast the message to the destination first, which simply identifies itself and proxies the message right over to the source.
We can provide for the notion of a smarter scheduler by making the initial migration call context-sensitive. If we are the source and a destination argument is also present, we simply begin the rsync. Otherwise, we cast to the source, appending ourselves to the message arguments as the destination. It feels a little inconsistent, but a migration-aware scheduler would be more efficient.
First pass will be to implement it as the sequence diagram above indicates, with a second pass to add the above functionality if deemed necessary.
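The context-sensitive call described above could be sketched roughly as follows. This is only an illustration of the decision logic, not the actual compute manager code: the function name, the action strings, and the argument names are all hypothetical, and the error raised when a source receives no destination is an assumption, since the text does not cover that case.

```python
def route_migration(my_host, source_host, destination_host=None):
    """Decide what a compute node does when it receives a migrate message.

    Returns an (action, message_args) tuple. All names are hypothetical.
    """
    if my_host == source_host:
        if destination_host is None:
            # Assumption: a source cannot pick its own destination,
            # so this is treated as an error in this sketch.
            raise ValueError("source received migrate without a destination")
        # A migration-aware scheduler already chose the destination,
        # so the source can start the transfer right away.
        return ("begin_rsync",
                {"source": source_host, "destination": destination_host})

    # We are the destination picked by the scheduler: identify ourselves
    # and proxy the message over to the source.
    return ("cast_to_source",
            {"source": source_host, "destination": my_host})
```

With the first-pass design only the second branch is exercised, because the scheduler always casts to the destination first.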
- No state / No Migration Instance: The destination receives the cast from the API and creates a migration instance populated with the current instance attributes and the new ones. It sets the status to pre-migrating and then casts a message to the source.
- pre-migrating: The source has been notified of the intent to migrate the VM. It snapshots the VM and changes the state to migrating
- migrating: The source is Rsync'ing the VHD to the destination host machine. Afterwards, the migration status is set to 'migrating_step2'. The source then shuts the instance down, and then starts Rsync'ing the COW to the destination host machine. Afterwards, the source sets the migration status to post-migrating and casts a message to the destination compute
- post-migrating: The destination creates a new instance from the Instance table and transferred VHD/COW, updates the instance record with the new hostname, and then sets the migration_status to verify-migration
- verify-migration: The migration remains in this state until a subsequent API call is made. At that point, the source compute will vm-destroy the old instance and set the migration status to finished
- finished: The terminal state of a migration, whether it has been "confirmed" or "reverted"
- reverted: The destination compute will destroy the new instance, and then cast a message to the source to power back on the old instance. The old instance attributes will be restored, and then the migration status will be set to finished
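The states listed above can be read as a small state machine. The following is a minimal sketch of the allowed transitions as described in this section, not an existing Nova data structure. The `TRANSITIONS` table and the `advance` helper are hypothetical names, and `None` stands in for "no migration instance yet".

```python
# Allowed migration status transitions, taken from the list above.
TRANSITIONS = {
    None: {"pre-migrating"},                 # destination creates the record
    "pre-migrating": {"migrating"},          # source snapshots the VM
    "migrating": {"migrating_step2"},        # VHD transferred
    "migrating_step2": {"post-migrating"},   # instance shut down, COW transferred
    "post-migrating": {"verify-migration"},  # destination builds the new instance
    "verify-migration": {"finished", "reverted"},  # confirm or revert via API
    "reverted": {"finished"},                # old instance powered back on
    "finished": set(),                       # terminal state
}


def advance(current, new):
    """Return the new status, or raise if the transition is not allowed."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError("illegal transition: %r -> %r" % (current, new))
    return new
```

Keeping the table explicit makes it easy to reject out-of-order messages, for example a post-migrating cast arriving before the COW transfer has finished.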
There are a few different ways we could go about transferring the disk to the destination host:
- Create a XAPI plugin that does the rsync'ing for us
- SSH into dom0. I'd really prefer to avoid this route, but it may be the best option depending on how long the XAPI plugin approach would take
- We could suspend the instance instead of shutting it down, and then migrate the RAM VHD in the process so users don't lose their uptime
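Whichever route is chosen, the transfer itself comes down to an rsync invocation. As a rough sketch, a helper that assembles the command might look like the following. The function name, the paths, and the `root` user are hypothetical, and the sketch only builds the argument list rather than running it. The `-a`, `-v`, and `--sparse` flags are standard rsync options; whether sparse handling is appropriate for VHD files is an assumption.

```python
def build_rsync_command(vhd_path, dest_host, dest_dir, user="root"):
    """Build an rsync argument list for copying a VHD to another host.

    All paths and the default user are hypothetical placeholders.
    """
    remote = "%s@%s:%s" % (user, dest_host, dest_dir)
    return [
        "rsync",
        "-av",        # archive mode, verbose
        "--sparse",   # assumption: handle sparse disk images efficiently
        vhd_path,
        remote,
    ]
```

The resulting list could be handed to `subprocess.check_call` from inside a XAPI plugin or over an SSH session, depending on which of the options above is picked.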
The OpenStack API will modify the "action" endpoint to expose "resize" functionality, which is simply a migration with a larger RAM and disk quota.
The resize action already exists within the OpenStack API, but it currently returns HTTP 501 (Not Implemented). Once this work is complete, existing API clients should be able to migrate successfully through the "resize" functionality present in the API. Additionally, migration without resizing the instance will be exposed through the Admin API.
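For illustration, the requests a client might send to the "action" endpoint could be built as below. This is a sketch under assumptions: the `/servers/<id>/action` path and the `resize`, `confirmResize`, and `revertResize` body keys follow the Cloud Servers-style API as I understand it, and the `flavorId` field name in particular should be checked against the API version in use. The helper names are hypothetical.

```python
import json


def _action_request(server_id, body):
    """Return (method, path, json_body) for a server action request."""
    return ("POST", "/servers/%s/action" % server_id, json.dumps(body))


def resize_request(server_id, flavor_id):
    # Starts the migration with the larger flavor (assumed key: flavorId).
    return _action_request(server_id, {"resize": {"flavorId": flavor_id}})


def confirm_resize_request(server_id):
    # Moves a migration from verify-migration to finished.
    return _action_request(server_id, {"confirmResize": None})


def revert_resize_request(server_id):
    # Moves a migration from verify-migration to reverted.
    return _action_request(server_id, {"revertResize": None})
```

The confirm and revert calls correspond to the "subsequent API call" that takes a migration out of the verify-migration state.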
This need not be added or completed until the specification is nearing beta.