BlockDeviceConfig

Revision as of 14:21, 26 August 2013 by Ndipanov (talk | contribs) (Havana implementation notes)

Block Device Configuration

This proposal is tracked in blueprint improve-block-device-handling

Controls for block device configuration

Several independent attributes should be controllable when configuring block devices for a virtual machine in OpenStack:

Source type

  • Glance image
  • Cinder volume
  • Ephemeral file

Destination type

  • local file
  • Cinder volume

Guest format

  • ext4 (or other FS formats)
  • ISO
  • Swap (different format per guest OS)
  • None

Device type

  • Disk
  • CDROM
  • Floppy
  • Flash (MMC)

Disk bus

  • USB
  • IDE
  • VirtIO
  • SCSI

Shutdown action

  • Delete
  • Preserve

Boot order

  • Index

In addition to the above parameters, it is necessary to support a "disk path" parameter, to allow the user to request a specific block device path in the guest, e.g. /dev/sda. In general this cannot be supported by all hypervisors, so its usage should be discouraged in favour of allowing the guest OS / hypervisor to choose paths. For the sake of compatibility with existing tools / APIs, though, it must be supported.

Command line syntax for block config

The 'nova boot' command currently uses a combination of the '--image' and '--block-device-mapping' command line args to configure storage. The latter is mostly a clone of the same-named arg from the EC2 commands, and its syntax is not well designed from the point of view of extensibility.


Feedback Sam

It may be better to have default shutdown preserve for dest=volume and default remove for dest=local. Take for example:

  • -block type=glance,id=XXXXXXX,bus=ide,bootindex=2,dest=volume (Should preserve default)
  • -block type=cinder,id=XXXXXXX,bus=ide,bootindex=2,dest=volume (Should preserve default)

Both examples should be preserved, because the destination is a volume. We shouldn't care about what the source type is imo. If type is cinder the dest has to be volume also, so as long as dest=volume the storage should be preserved.

Would this be possible?

  • -block type=cinder,id=XXXXXXX,bus=ide,bootindex=2,dest=file


One new parameter to define block device mappings and four shortcuts have been added to the 'nova boot' command:

block-device

Define a block device mapping with all the possible parameters

  • --block-device source=image,dest=volume,id=XXXXXXX,bus=ide,bootindex=2
  • --block-device source=volume,dest=volume,id=XXXXXXX,bus=ide,type=cdrom,bootindex=1
  • --block-device source=blank,dest=local,format=swap,size=50,bus=ide,type=floppy

The only compulsory arguments ought to be the source and the id, e.g. the image or volume ID. Everything else should have sane defaults filled in by the hypervisor driver in Nova.

The parameters for block-device and the allowed values are:

  • source=image|snapshot|volume|blank
  • dest=volume|local
  • id=XXXXXX (a volume|image|snapshot UUID if using source=volume|snapshot|image)
  • format=swap|ext4|...|none (to format the image/volume/ephemeral file; defaults to 'none' if omitted)
  • bus=ide|usb|virtio|scsi (hypervisor driver chooses a suitable default if omitted)
  • device=the desired device name (e.g. /dev/vda, /dev/xda, ...)
  • type=disk|cdrom|floppy|mmc (defaults to 'disk' if omitted)
  • bootindex=N (where N is any number >= 0, controls the order in which disks are looked at for booting)
  • size=NN (where NN is the size in GB of the image to create for source=blank, or the size to resize to for source=image|volume)
  • shutdown=preserve|remove
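
The parameter list above can be sketched as a small parser. This is only an illustration of the proposed syntax, not the nova client's code: the function name parse_block_device and the error messages are hypothetical, and the sketch assumes that only 'format' and 'type' get client-side defaults while the rest are left to the driver.

```python
# Hypothetical parser for the proposed --block-device key=value syntax.
# Enumerated values are taken from the parameter list above.
ALLOWED = {
    "source": {"image", "snapshot", "volume", "blank"},
    "dest": {"volume", "local"},
    "type": {"disk", "cdrom", "floppy", "mmc"},
    "bus": {"ide", "usb", "virtio", "scsi"},
    "shutdown": {"preserve", "remove"},
}
# Parameters whose values are not restricted to a fixed set.
FREE_FORM = {"id", "format", "device", "bootindex", "size"}
# Defaults stated in the list above; everything else is left to the driver.
DEFAULTS = {"format": "none", "type": "disk"}


def parse_block_device(spec):
    """Parse 'k=v,k=v' into a dict, validating keys and enumerated values."""
    params = dict(DEFAULTS)
    for item in spec.split(","):
        key, sep, value = item.partition("=")
        if not sep or not value:
            raise ValueError("expected key=value, got %r" % item)
        if key in ALLOWED:
            if value not in ALLOWED[key]:
                raise ValueError("invalid %s: %r" % (key, value))
        elif key not in FREE_FORM:
            raise ValueError("unknown parameter %r" % key)
        params[key] = value
    if "source" not in params:
        raise ValueError("source is required")
    if params["source"] != "blank" and "id" not in params:
        raise ValueError("id is required unless source=blank")
    for key in ("bootindex", "size"):
        if key in params:
            params[key] = int(params[key])
    if params.get("bootindex", 0) < 0:
        raise ValueError("bootindex must be >= 0")
    return params
```

Rejecting unknown keys means a misspelled parameter name fails loudly instead of being silently dropped.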

boot-volume

Shortcut to boot directly from a volume (source=volume, dest=volume, boot_index=0, shutdown=preserve). Only one is allowed, and it cannot be used together with --image or --snapshot:

  • --boot-volume <volume_id>

snapshot

Shortcut to boot directly from a snapshot (source=snapshot, dest=volume, boot_index=0, shutdown=preserve). Only one is allowed, and it cannot be used together with --image or --boot-volume:

  • --snapshot <snapshot_id>

swap

Shortcut to add a swap disk to an instance on boot by specifying its size (source=blank, dest=local, boot_index=-1, shutdown=remove, format=swap). Only one is allowed.

  • --swap <swap size in MB>

ephemeral

Shortcut to add an ephemeral disk to an instance on boot by specifying its size and format (source=blank, dest=local, boot_index=-1, shutdown=remove). Multiple are allowed:

  • --ephemeral size=<size in GB>,format=<ext3, ext4, ...>
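
As a rough illustration of how the shortcuts map onto full block device definitions, the sketch below expands them using the values given in parentheses above. The function expand_shortcuts and its argument names are hypothetical and are not part of the nova client. Sizes are passed through in the units each shortcut uses (MB for swap, GB for ephemeral).

```python
def expand_shortcuts(image=None, boot_volume=None, snapshot=None,
                     swap=None, ephemerals=()):
    """Return block device dicts equivalent to the boot shortcuts (a sketch)."""
    # --image, --boot-volume and --snapshot are mutually exclusive.
    if sum(1 for src in (image, boot_volume, snapshot) if src) > 1:
        raise ValueError("use only one of --image, --boot-volume, --snapshot")
    devices = []
    if boot_volume:
        devices.append({"source": "volume", "dest": "volume",
                        "id": boot_volume, "boot_index": 0,
                        "shutdown": "preserve"})
    if snapshot:
        devices.append({"source": "snapshot", "dest": "volume",
                        "id": snapshot, "boot_index": 0,
                        "shutdown": "preserve"})
    if swap is not None:
        # Only one swap disk is allowed, so this is a scalar, not a list.
        devices.append({"source": "blank", "dest": "local", "format": "swap",
                        "size": swap, "boot_index": -1, "shutdown": "remove"})
    for eph in ephemerals:
        dev = {"source": "blank", "dest": "local", "size": eph["size"],
               "boot_index": -1, "shutdown": "remove"}
        if "format" in eph:
            dev["format"] = eph["format"]
        devices.append(dev)
    return devices
```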

DB data model and migration

Currently the data that we keep associated with a Block Device is:

  • id
  • instance_uuid
  • device_name - See the following section.
  • delete_on_termination - can be kept too
  • virtual_name - See the following section.
  • snapshot_id - See the following section
  • volume_id - See the following section
  • volume_size
  • no_device - it was used to override the image supplied block devices. This seems like it is in need of re-engineering but is not high priority.
  • connection_info - json string stored as a result of volume_api.attach

The new fields we will add:

  • image_id - See the following section
  • source_type - See the following section
  • dest_type - See the following section
  • guest_format - See the following section
  • device_type - See the following section
  • disk_bus - See the following section
  • boot_index - See the following section
  • user_label - we may want to add this and expose it in the guest filesystem for convenience

Field details

virtual_name

This field used to keep values like 'ephemeral0' and 'swap' (all other values are currently disregarded, and you can only pass this through image mapping at the moment - cli does not support it). When migrating we will use this field to infer source_type/guest_format but will most likely drop the field itself.

device_name

We will likely keep this one, however the migration may need to reassign the value. There is an issue with device name that comes from the fact that not all hypervisors guarantee that they will respect the user defined device name. This in turn means that the device file on the instance and the device_name in the db may or may not match. However, there are places in Nova code where we try to use devices from the db to make decisions (for example when attaching). We should either

  1. Make device name the final decision of the hypervisor and expose a driver method that will assign devices to each block device and update the DB. This will make user assigned values merely suggestions, but will make sure the DB is in sync with reality and make this data safe to use outside of the driver/compute node
  2. Remove this field from the DB and treat it as an internal implementation detail that can only be used in the virt driver. This might break backwards compatibility.

snapshot_id and volume_id

We will keep these and we will be adding an image_id field as part of this change. This will be tied in with the source_type field which we can maybe omit as it ce... It is worth noting here that a block device with a snapshot_id will result in a volume being created and thus will have both snapshot and volume ids.

New fields:

source_type

We seem to agree that this can be one of the following:

  • image
  • volume
  • snapshot
  • blank

The migration will have to add one image block device for each image backed instance. We may want to keep the image_ref in the instance table too. It will also need to change

dest_type

This is one of:

  • local
  • volume

The migration can set this to volume if volume_id or snapshot_id is defined, and local in every other case, or the code could handle empty dest_type by the same rule.
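
The rule above is small enough to write down directly. A minimal sketch, assuming a legacy block device row is available as a dict keyed by the existing column names (infer_dest_type is a hypothetical helper name):

```python
def infer_dest_type(bdm):
    """Apply the migration rule: volume if a volume or snapshot id is set."""
    if bdm.get("volume_id") or bdm.get("snapshot_id"):
        return "volume"
    return "local"
```

The same function could serve either option mentioned above: run once in the migration, or called at read time whenever dest_type is empty.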

guest_format

The semantics of this would be to tell nova how/if to format the image on boot/attach, so we will migrate most to none, apart from swap and ephemeral, which we can default to CONF.default_ephemeral_format to be consistent.
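
One possible reading of this migration rule, combined with the virtual_name values described earlier ('swap' and 'ephemeralN'), is sketched below. It assumes swap devices get guest_format 'swap' and ephemeral devices get the configured default. The helper name and the default_ephemeral_format argument (standing in for CONF.default_ephemeral_format) are hypothetical.

```python
import re


def infer_guest_format(virtual_name, default_ephemeral_format=None):
    """Map a legacy virtual_name to a guest_format; None means 'do not format'."""
    if virtual_name == "swap":
        return "swap"
    if virtual_name and re.match(r"^ephemeral\d+$", virtual_name):
        return default_ephemeral_format
    # All other values are disregarded, as with the current behaviour.
    return None
```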

device_type

At the moment we can make a distinction only between disk and cdrom. If the instance was started from an image that has the 'disk_format' metadata property set to 'iso', the device will be a cdrom. All others will default to disk.

disk_bus

We can use some sane defaults for the info we currently have - or we can leave empty and allow driver to update the DB

boot_index

This parameter makes no sense for swap/ephemeral devices. During migration we will set this to 1 for either the image, if the instance was image backed, or the first volume/snapshot. We might want to make this a nullable field.

Additional config options

max_local_block_devices

Since the new format allows users to specify devices that will be created as images on the hypervisor (if the hypervisor enables it), a new config option 'max_local_block_devices' was added to prevent a DoS. This option allows the operator to set a limit on the number of destination_type='local' block devices a user can specify per VM.
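
A minimal sketch of such a limit check follows. The function name is hypothetical, and treating a negative limit as "unlimited" is an assumption of this sketch rather than something stated here.

```python
def check_local_block_devices(block_devices, max_local_block_devices):
    """Count destination_type='local' devices and enforce the operator limit."""
    local = [bd for bd in block_devices
             if bd.get("destination_type") == "local"]
    # Assumption: a negative limit disables the check.
    if 0 <= max_local_block_devices < len(local):
        raise ValueError("%d local block devices requested, limit is %d"
                         % (len(local), max_local_block_devices))
    return len(local)
```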

API data model and backwards compat issues

An API extension was added to the v2 Nova API to handle the new BDM format when booting, named appropriately os-block-device-mapping-v2. The provided API samples can be found at https://github.com/openstack/nova/tree/master/doc/api_samples/os-block-device-mapping-v2-boot. As can be seen from the attached samples - the new format has

Backward compatibility issues are mostly mitigated by adding a completely new element to the servers object (block_device_mapping_v2 instead of block_device_mapping) in the POST request, and currently (API v2) Nova will handle both syntaxes (they cannot be mixed though). As of V3 - we plan to completely remove the old syntax.
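
To illustrate the shape of such a request, the sketch below builds a POST body carrying the new element and rejects mixing of the two syntaxes. The helper build_boot_request is hypothetical. The per-device keys used in the usage example (in particular 'uuid') and the 'name'/'flavorRef' server keys are assumptions that should be checked against the API samples linked above.

```python
import json


def build_boot_request(name, flavor_ref, bdm_v2=None, bdm_legacy=None):
    """Build a server-create body using either the new or the old BDM element."""
    if bdm_v2 and bdm_legacy:
        # The two syntaxes cannot be mixed in one request.
        raise ValueError("block_device_mapping and block_device_mapping_v2 "
                         "cannot be used together")
    server = {"name": name, "flavorRef": flavor_ref}
    if bdm_v2:
        server["block_device_mapping_v2"] = list(bdm_v2)
    if bdm_legacy:
        server["block_device_mapping"] = list(bdm_legacy)
    return {"server": server}


# Usage: boot from a Glance image copied to a Cinder volume.
body = build_boot_request("test-server", "1", bdm_v2=[{
    "source_type": "image",
    "destination_type": "volume",
    "uuid": "XXXXXXX",  # placeholder id, as in the examples above
    "boot_index": 0,
}])
print(json.dumps(body, indent=2))
```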

Havana implementation notes

As we have decided to somewhat narrow the scope of this blueprint for the Havana release cycle, this section will try to outline some of the limitations when compared to the above plan, and emphasise some features.

Visible Features

Defaulting device names

When using the new block device mapping format, it is no longer required to specify device_name (it is actually discouraged, especially when using the libvirt driver). Nova will choose appropriate default values based on other data supplied with the block device.

Command line shortcuts

As a convenience, several shortcuts were added to the nova cli that make supplying common patterns of block devices easier:

  • Boot from an image, volume or snapshot using a special boot syntax (--image, --boot-volume, --snapshot), with no need to use --block-device.
  • Attach a swap disk on boot (--swap).
  • Attach an ephemeral disk on boot (--ephemeral).

Boot index

Boot index needs to be specified properly per instance, and there needs to be exactly one block device with boot_index set to 0. The nova API will enforce this, and the boot command will fail if boot indexes are not properly set for all block devices required for an instance. If using --image or --boot-volume, this will be set automatically by the nova client; if using the --block-device syntax, it will need to be specified.
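
The "exactly one device with boot_index 0" rule can be sketched as a simple check. This is an illustration only. The function name is hypothetical and boot_index values are assumed to be integers or None.

```python
def validate_boot_indexes(block_devices):
    """Raise ValueError unless exactly one device has boot_index 0."""
    boot_devices = [bd for bd in block_devices if bd.get("boot_index") == 0]
    if len(boot_devices) != 1:
        raise ValueError("expected exactly one block device with "
                         "boot_index=0, found %d" % len(boot_devices))
    return boot_devices[0]
```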

Libvirt specific features

The Libvirt driver in Nova will act on the following fields if set:

  • guest_format - Libvirt will honor this for swap only at the moment - this field is actually the only way to supply the swap
  • device_type - If supplied, libvirt will honour it and may provide different device names based on it. Currently valid values are 'disk', 'cdrom' and 'floppy'.
  • disk_bus - If supplied, libvirt will honour it. If only device_type is supplied, the disk bus will be defaulted based on it. Currently valid values depend on the underlying hypervisor used by libvirt.

source_type and destination_type

Currently we support only a subset of possible combinations.

  • Destinations:
    • local is supported only if the source is 'blank' or if it's a boot image (specified by the --image boot argument) - all other combinations will be ignored by the compute service. There is a blueprint raised for the next cycle to add the possibility to specify more images with a local target for the libvirt driver (see: https://blueprints.launchpad.net/nova/+spec/libvirt-image-to-local-bdm).
    • volume is supported now for all types including image - so it is possible to have a Glance image that will be downloaded to a Cinder volume and attached to an instance.
  • Sources:
    • All are allowed but some have limited options that are considered (blanks can only have local destinations, images currently only volume as outlined above).
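
The supported combinations described above can be summarised as a small predicate. This is one reading of the rules in this section, not code from Nova. The function name and the is_boot_image flag (meaning the device comes from the --image boot argument) are hypothetical.

```python
def is_supported_combination(source_type, destination_type,
                             is_boot_image=False):
    """Return True if the source/destination pair is acted on in Havana."""
    if destination_type == "local":
        # Blank devices, plus the boot image given via --image.
        return source_type == "blank" or (source_type == "image"
                                          and is_boot_image)
    if destination_type == "volume":
        # Everything except blank, which can only have a local destination.
        return source_type in ("image", "snapshot", "volume")
    return False
```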