XenAPI

Warning

This is the original blueprint for the XenAPI development. It is now 18 months out of date and is kept here only for historical purposes.

If you want documentation about how XenServer / XCP works with OpenStack today, please see http://wiki.openstack.org/XenServer/XenXCPAndXenServer.

---


 * Launchpad Entry: NovaSpec:austin-xenapi
 * Created: 7 September 2010
 * Last updated: 16 October 2010
 * Contributors: Ewan Mellor

Summary
XenServer: Commercial, supported product from Citrix.

Xen Cloud Platform (XCP): Open-source equivalent of XenServer (and the development project for the toolstack). Everything said about XenServer below applies equally to XCP.

XenAPI: The management API exposed by XenServer and XCP.

xapi: The primary daemon on XenServer and Xen Cloud Platform; the one that exposes the XenAPI.

This specification covers Nova support for XenServer and XCP through XenAPI. Note that this does not imply support for other Xen-based platforms such as those shipped with RHEL 5 or SUSE.

Release Note
Nova may now use XenServer or Xen Cloud Platform as a virtualization platform.

Assumptions
The OpenStack project will not ship the XenAPI.py module. Users will need to get that from http://wiki.xensource.com/xenwiki/XCP_SDK or http://community.citrix.com/cdn/xs/sdks.

Design
The pre-existing interface between Nova and the virtualization platform is through libvirt, with various calls to libvirt (and virsh) from Nova's compute and monitoring layers. There is also a libvirt simulation module, for unit testing without making calls to a real platform.

In order to add XenAPI support, we propose a new layer (nova.virt) that abstracts over the calls to libvirt. Through this abstract interface, the main body of Nova can make virtualization calls without reference to any particular platform. This interface will then be implemented by modules for libvirt (nova.virt.libvirt_conn), XenAPI (nova.virt.xenapi), or a simulator (nova.virt.fake).
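The layering described above might look roughly like the following. This is a minimal sketch, not the nova.virt code itself: the class names, the method names (spawn, destroy, list_instances) and the get_connection registry are all hypothetical, chosen only to show how callers can stay platform-neutral while a simulator stands in for a real backend.

```python
from abc import ABC, abstractmethod


class VirtConnection(ABC):
    """Hypothetical platform-neutral interface that the rest of Nova calls."""

    @abstractmethod
    def spawn(self, name):
        """Start an instance with the given name."""

    @abstractmethod
    def destroy(self, name):
        """Stop and remove the named instance."""

    @abstractmethod
    def list_instances(self):
        """Return the names of all known instances."""


class FakeConnection(VirtConnection):
    """In-memory simulator for unit tests; makes no platform calls."""

    def __init__(self):
        self._instances = {}

    def spawn(self, name):
        self._instances[name] = "running"

    def destroy(self, name):
        self._instances.pop(name, None)

    def list_instances(self):
        return sorted(self._instances)


# Hypothetical registry; a libvirt or XenAPI backend would be added here.
_BACKENDS = {"fake": FakeConnection}


def get_connection(kind):
    """Return a backend instance for the configured platform name."""
    try:
        return _BACKENDS[kind]()
    except KeyError:
        raise ValueError("unknown virtualization backend: %r" % kind)
```

Code written against VirtConnection never names a platform, so switching between libvirt, XenAPI and the simulator would be a configuration choice rather than a code change.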

One particular desire is for nova-compute to be able to run in an unprivileged domain. This means that it receives a separate VCPU from domain 0, can be scheduled independently of it, and can be restarted instantly in the event of a failure. It means that Nova can choose its own version of Python, which is important as XenServer domain 0 is still using Python 2.4 as standard. It also means that any vulnerability in nova-compute does not expose domain 0. That won't protect against most mischief-making (once you've compromised nova-compute you can wreak all sorts of havoc) but it will protect against the most dangerous and subtle attacks, such as quietly modifying the block traffic of every VM on the host. To achieve that, virt.xenapi will communicate with XenAPI over its remote interface only. Anything that actually needs to run in domain 0, such as streaming disk images from nova-objectstore or Glance, will be implemented as a xapi plugin.

Future
The Austin release of XenAPI support will not include the following features. These are scheduled for Bexar:


 * Glance integration
 * nwfilter-style multi-tenant networking

Discussion
Soren Hansen asks: "I was a bit surprised to learn that libvirt actually supports XenAPI on its own. As such, I'm unsure why we need a completely separate driver for XenAPI?

"I think we'd benefit a lot by consolidating on libvirt. After all, it's meant to be /the/ open source, virtualisation abstraction toolkit. If you have specific functionality you feel is missing from libvirt in order for it to be useful, can you perhaps enumerate these shortcomings so that we perhaps can throw some development resources at that?"

Ewan Mellor responds:

With libvirt's aim to be a common, generic API for all virtualization platforms, it is guaranteed that there will be some features that it does not support. To quote from http://libvirt.org/goals.html:

"the goal of libvirt: to provide a common generic and stable layer to securely manage domains on a node.

...

This implies ... that some very specific capabilities which are not generic enough may not be provided as libvirt APIs."

This is fine: it is great that someone is trying to provide this common layer for the start/stop/monitor/migrate use cases, and where all you need is what libvirt can provide it's obviously easiest to use it. It's not merely esoteric things that are missing though: there are plenty of things that libvirt does not support that are absolutely core to XenAPI's design, and that are very useful to Nova. libvirt's support for XenAPI is no more than superficial. libvirt's standard model differs from XenAPI fundamentally in a number of areas, making them incompatible to all intents and purposes.

Examples:

Pools: XenAPI has the concept of a pool of hosts, which represents a locking domain for decisions such as access to shared storage. The topology of this arrangement (the hosts that belong to the pool, the storage that is shared by all hosts in a pool, the shared network infrastructure between hosts in a pool) is represented explicitly in the XenAPI. libvirt has no such concept. This will be important for Nova when we look at migration scenarios (say for load balancing, or host maintenance) because the pool is the domain within which a live migration can be performed without moving the VM disk.

Storage 1: libvirt has a number of storage types hardcoded within it (see http://libvirt.org/formatstorage.html#StoragePool). A few of these map to similar storage types on XenServer, but none are close to the advanced storage types that we have, so you end up either lying about the storage type at the libvirt layer, or refusing to represent certain types. This includes even the default local storage on XenServer, so this isn't an esoteric corner case. Also, XenAPI includes an extension mechanism to allow new storage types to be supported by the API. This can't be done with libvirt (the types are hardcoded in the virStoragePoolType enum). This will be important to Nova as we look at implementing equivalents to EBS, or even the existing AoE support.

Storage 2: libvirt's storage model is weak and arguably broken when it comes to off-host management. libvirt often requires the caller to know the local path to a particular storage element, so you can end up talking about '/dev/sdb' across the wire. The XenAPI point of view is that this should never happen; it is not the responsibility of a remote client to know host-local paths. To achieve this, XenServer has a storage management layer which handles attach and detach calls, and maintains the mapping between datacentre-global IDs (hostnames, IQNs, LUNs, WWNs, etc) and the corresponding local devices. Handling storage in this managed manner is impossible using libvirt.

Errors: The XenAPI uses an error reporting scheme that is designed for programmatic analysis and internationalization. It passes back a string error code, plus a list of strings giving parameters for that error. This allows a client to respond intelligently to errors, or to internationalize error messages correctly. For example, the response ["HOST_DISABLED", ""] allows a client to identify the error and the object involved, and to highlight that object on the user interface, or it can simply be turned into "Server is already disabled", or the equivalent in another language. libvirt does have internationalization, but on the library side, using gettext. This is fine for a client application, but not much use for a server such as Nova. This isn't a big deal for Nova right now, because the user-facing APIs aren't great in this regard either. In the long run, I hope to enhance the user-facing APIs so that good internationalizable errors are passed all the way to the client.
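To illustrate why a code-plus-parameters error is convenient for a server, here is a minimal sketch of client-side rendering. The message catalogue, the render_error helper and the "host-1" parameter are hypothetical; only the shape of the error (a string code followed by string parameters) comes from the description above.

```python
# Hypothetical message catalogue keyed by (language, error code).
MESSAGES = {
    ("en", "HOST_DISABLED"): "Server {0} is already disabled",
    ("de", "HOST_DISABLED"): "Server {0} ist bereits deaktiviert",
}


def render_error(error, lang="en"):
    """Render [code, param, ...] as a message in the requested language."""
    code, params = error[0], list(error[1:])
    template = MESSAGES.get((lang, code))
    if template is None:
        # Unknown code: fall back to the raw code and its parameters.
        return " ".join([code] + params)
    return template.format(*params)


print(render_error(["HOST_DISABLED", "host-1"]))
print(render_error(["HOST_DISABLED", "host-1"], lang="de"))
```

Because the code and the parameters stay separate until the last moment, the same error can be translated, logged, or used to highlight the affected object without parsing a pre-formatted string.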

Host plugins: The XenAPI has an extension mechanism that allows one to install a Python script (usually, but it can be any executable) on the host side, and then call that through the XenAPI. This is used in the current virt.xenapi implementation to stream disks from nova-objectstore onto the host. Calling these plugins would be impossible if using libvirt. Having this mechanism available means that your host-side installation can be lighter (a script, rather than an agent) and you don't need a separate RPC mechanism for it, making the overall implementation simpler. This is particularly important for Nova as we integrate Glance, and probably will also be used to implement some of the more interesting ideas around network configuration and isolation.

Tasks / Jobs: XenAPI's task class returns information about a running task (i.e. asynchronous operation). libvirt has the same concept exposed as a job. However, libvirt's jobs are tied to a particular domain, so there can only be one operation happening per domain, and host-level asynchronous operations cannot be represented. XenAPI can perform host-level operations asynchronously too, so Nova can run these without having to tie up a thread. In particular, host plugins can be called asynchronously, which we're using for the disk streaming.
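A rough sketch of how a caller might track such tasks without dedicating a thread to each one: start the operations, keep the task references, and check them all in one pass from whatever periodic loop the service already runs. The poll_tasks helper is hypothetical, and get_status is passed in as a plain callable; with a real session it would presumably be something like session.xenapi.task.get_status, which is an assumption here rather than something this page specifies.

```python
# Terminal values of the XenAPI task status; "pending" and "cancelling"
# mean the task is still in flight.
FINISHED = ("success", "failure", "cancelled")


def poll_tasks(get_status, pending):
    """Check each pending task once; return (finished, still_pending).

    finished maps task reference to its final status. The function never
    sleeps, so it can be called from an existing periodic loop.
    """
    finished, still_pending = {}, []
    for task_ref in pending:
        status = get_status(task_ref)
        if status in FINISHED:
            finished[task_ref] = status
        else:
            still_pending.append(task_ref)
    return finished, still_pending


# Stand-in for a live session: a dict of task reference -> status.
statuses = {"task-1": "success", "task-2": "pending"}
done, waiting = poll_tasks(statuses.get, ["task-1", "task-2"])
```

Since host-level operations such as plugin calls also return a task, the same loop would cover disk streaming as well as per-VM operations.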

VM state: libvirt has a virDomainState enum containing a variety of domain states. As discussed on a thread on the Nova list a little while ago, these states are a mishmash of various conditions: some are power states (whether the VM is running, suspended, paused, or halted), one is a transition between states (whether the VM has been asked to shut down gracefully, but strangely there isn't one for when the VM has been asked to reboot gracefully or suspend), and one is a reason for entering a power state (whether the VM crashed). The remaining one is whether the VM is blocked, which is a per-VCPU property, and in any case has no place in a remote API where the retrieval timescale far exceeds the granularity on which this state would exist. XenAPI separates out the three major concerns into power states, ongoing tasks (covering reboot and suspend as well as shutdown), and last-shutdown-reasons (covering internal and external shutdown or reboot commands as well as crashing). The per-VCPU statistics on running/runnable/blocked (as well as other, more useful and subtle statistics) are collected in aggregate form (mentioned below) which is much more useful than any attempt to use a single virDomainState. This is such a fundamental part of libvirt's design that I can't see it being fixed, even though it's such a fundamental flaw. The quality of any implementation based on libvirt is hobbled by this design flaw.

Monitoring 1: libvirt includes API calls for collecting block and network statistics on a per-device basis. These are single counters, giving the current value, which means that the client has to poll all the time if it wants to collect aggregated statistics. Nova does precisely this in monitor.py, and then aggregates them into RRDs for eventual collection by a client. This is an expensive way to collect statistics. XenAPI includes a request for aggregated RRDs (using the rrdxport format, over HTTP). This means that client requests can be passed down to the host directly, and there is no need to poll the hosts and store the intermediate data elsewhere. Nova will be able to take direct advantage of XenServer's RRDs, and reduce host load by eliminating the polling.
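The cost of the counter-based approach can be seen in a small sketch. Assuming each poll yields a timestamp and a cumulative byte count (an assumption about the counter semantics, with made-up sample data), the client has to keep every previous sample and difference consecutive pairs to obtain a rate. The rates_from_counters helper is hypothetical.

```python
def rates_from_counters(samples):
    """Turn (timestamp_seconds, cumulative_bytes) samples into bytes/second.

    Each rate covers the interval between two consecutive polls, so the
    client learns nothing unless it polls repeatedly and keeps history.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order samples
        rates.append((t1, (c1 - c0) / (t1 - t0)))
    return rates


# Three polls, ten seconds apart (illustrative numbers only).
samples = [(0, 0), (10, 1000), (20, 4000)]
print(rates_from_counters(samples))
```

With host-side RRDs this bookkeeping moves to the host: the client asks for an already aggregated series for the period it cares about instead of sampling continuously.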

Monitoring 2: libvirt includes only basic block and network I/O stats, plus a fairly comprehensive domain-memory stats block. XenServer has a far richer collection of statistics, including host-level as well as guest-level statistics, and subtle statistics about VM behaviour (such as the proportion of time that a VM was wasting cycles stuck in a lock, indicating a concurrency hazard). These statistics are all open to Nova to collect, whereas going through libvirt would limit you to the basic set.