[ DRAFT ]
- Created: 25th November 2013
- Author: Stephen Gordon
Currently the Block Storage (Cinder) service only allows a volume to be attached to one instance at a time. The Compute service (Nova) also makes assumptions to this effect in a number of places, as do the APIs, CLIs, and UIs exposed to users. This specification outlines the changes required to allow users to share volumes between multiple guests using either read-write or read-only attachments.
There have been several discussions about adding this type of functionality during the Grizzly and Havana cycles. This page links those discussions together and provides a place to record future consensus on any and all outstanding issues with the design behind these blueprint(s).
In addition to the noted blueprints, these references were consulted in framing this page:
- Design summit minutes:
- Cinder meeting logs:
- Mailing list threads:
Traditional cluster solutions rely on the use of clustered filesystems and quorum disks, writeable by one or more systems and often read by a larger number of systems, to maintain high availability. Users would like to be able to run such clustered applications on their OpenStack clouds. This requires the ability to have a volume attached to multiple compute instances, with some instances having read-only access and some having read-write access.
- Read-only volume support in Cinder and Nova (read-only-volumes).
- Users must be able to explicitly define a volume as "shareable" at creation time.
- Users must be able to attach a "shareable" volume to multiple compute instances, specifying a separate mode (read-write or read-only) for each attachment. That is, some attachments to a volume may be read-only, while other attachments to the same volume may be read-write.
- While Cinder will track the mode of each attachment, restriction of write access must be enforced by the hypervisor drivers in Nova.
- The normal attach reservation should still be required (and enforced) for volumes that are not marked as shareable.
- Multi-attach must be implemented as an extension to core functionality in Cinder. A proposal for how extensions should be limited in the future is being framed separately, but in practice this currently means that:
- An extension must not modify the schema of existing tables.
- An extension may introduce a table of its own.
- An extension may inject itself at defined points in the existing core task flow.
- Administrators and users are ultimately responsible for ensuring that data integrity is maintained once a shared volume is attached to multiple instances in read-write mode; however, such attachments must only occur as the result of an explicit request.
- Horizon support is not crucial to "Phase I" implementation of this feature, but must be considered and properly tracked as a potential future addition.
- Instances should expect the same level of read/write consistency as that provided by iSCSI LVM, no more and no less.
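The attach-time rules above (explicit "shareable" marking, a per-attachment mode, and single-attachment semantics for non-shareable volumes) can be sketched as one admission check. This is a minimal illustration only: `Volume`, `Attachment`, `check_attach`, `AttachRejected` and the `"ro"`/`"rw"` mode strings are hypothetical names, not Cinder's API. It also assumes the mode must always be given explicitly (one of the options raised in the discussion further down), and it approximates the reservation rule as "a non-shareable volume may hold only one attachment".

```python
from dataclasses import dataclass, field
from typing import List, Optional

VALID_MODES = ("ro", "rw")  # hypothetical mode strings


class AttachRejected(Exception):
    """Raised when an attach request violates the multi-attach rules."""


@dataclass
class Attachment:
    instance_uuid: str
    mode: str  # "ro" or "rw"


@dataclass
class Volume:
    id: str
    shareable: bool = False  # existing volumes are treated as non-shareable
    attachments: List[Attachment] = field(default_factory=list)


def check_attach(volume: Volume, instance_uuid: str, mode: Optional[str]) -> Attachment:
    """Validate an attach request and return the attachment it would create."""
    if mode not in VALID_MODES:
        # Assumption: there is no default mode; the caller must ask explicitly.
        raise AttachRejected("attach mode must be one of %s" % (VALID_MODES,))
    if any(a.instance_uuid == instance_uuid for a in volume.attachments):
        raise AttachRejected("volume is already attached to this instance")
    if volume.attachments and not volume.shareable:
        # Approximates the existing reservation: one attachment per volume.
        raise AttachRejected("volume %s is not shareable" % volume.id)
    return Attachment(instance_uuid=instance_uuid, mode=mode)
```

A shareable volume can then carry a read-write and a read-only attachment side by side, while a second attach of a non-shareable volume is refused before any hypervisor work starts.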
All existing volumes must automatically be marked, or assumed to be marked, as non-shareable. User impact is therefore expected to be minimal except for users explicitly using this new feature.
This need not be added or completed until the specification is nearing beta.
The initial patchset submitted by Charlie Zhou will likely need further discussion and iteration to ensure it meets the requirements outlined above. In particular, additional changes would be required to support explicitly marking volumes as shareable: the current patch assumes all volumes are shareable and, as a result, also effectively removes the reservation system previously introduced to fix bug 1096983.
- New volume_attachment table:
  Column('id', String(length=36), primary_key=True, nullable=False),
  Column('volume_id', String(length=36), ForeignKey('volumes.id'), nullable=False),
  Column('instance_uuid', String(length=36)),
  Column('attached_host', String(length=255)),
  Column('mountpoint', String(length=255)),
  Column('attach_time', DateTime),
  Column('detach_time', DateTime),
  Column('attach_status', String(length=255)),
  Column('attach_mode', String(length=255)),
  Column('created_at', DateTime),
  Column('updated_at', DateTime),
  Column('deleted_at', DateTime),
  Column('deleted', Boolean)
- Currently 'attached_mode' is saved in the volume's admin_metadata table (added by the read-only attach change). Under multi-attach, the mode belongs to an individual attachment rather than to the volume, so it will move to the volume_attachment table as a column.
- The columns 'volume_id', 'instance_uuid' and 'attached_host' will form a composite unique constraint (index) on the volume_attachment table.
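The proposed table and its composite unique constraint can be exercised with a small sketch. It uses Python's standard sqlite3 module rather than the SQLAlchemy migration Cinder would actually use, approximates the column types, and stubs the `volumes` table down to an `id` column; `create_schema` and `add_attachment` are hypothetical helpers written only for this illustration.

```python
import sqlite3

# Mirrors the proposed volume_attachment columns; `volumes` is a stub.
SCHEMA = """
CREATE TABLE volumes (
    id VARCHAR(36) PRIMARY KEY NOT NULL
);
CREATE TABLE volume_attachment (
    id VARCHAR(36) PRIMARY KEY NOT NULL,
    volume_id VARCHAR(36) NOT NULL REFERENCES volumes(id),
    instance_uuid VARCHAR(36),
    attached_host VARCHAR(255),
    mountpoint VARCHAR(255),
    attach_time DATETIME,
    detach_time DATETIME,
    attach_status VARCHAR(255),
    attach_mode VARCHAR(255),
    created_at DATETIME,
    updated_at DATETIME,
    deleted_at DATETIME,
    deleted BOOLEAN,
    -- Composite unique constraint from the proposal.
    UNIQUE (volume_id, instance_uuid, attached_host)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the stub volumes table and the volume_attachment table."""
    conn.executescript(SCHEMA)


def add_attachment(conn: sqlite3.Connection, att_id: str, volume_id: str,
                   instance_uuid: str, host: str, mode: str) -> None:
    """Insert one attachment row; duplicates raise sqlite3.IntegrityError."""
    conn.execute(
        "INSERT INTO volume_attachment "
        "(id, volume_id, instance_uuid, attached_host, attach_status, "
        "attach_mode, deleted) VALUES (?, ?, ?, ?, 'attached', ?, 0)",
        (att_id, volume_id, instance_uuid, host, mode),
    )
```

Two different instances can attach the same volume, while repeating the same (volume_id, instance_uuid, attached_host) triple is rejected by the database. Note that SQLite, like most databases, treats NULLs as distinct in unique constraints, so rows with a NULL instance_uuid or attached_host do not collide with each other.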
- New exceptions:
- TBD, based on the Cinder implementation. Read-only volume support in the Libvirt/KVM driver has already merged.
- Add a mode argument to the OpenStack API volumes extension (volume-attach, v2 and v3) and to novaclient:
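Because write restriction is left to the hypervisor drivers, on Libvirt/KVM the per-attachment mode ultimately ends up in the guest's disk definition. A minimal sketch using the standard library's xml.etree.ElementTree to build a libvirt `<disk>` element: `<readonly/>` and `<shareable/>` are real libvirt domain XML elements, but the helper `build_disk_xml`, its parameters, the virtio bus, and the choice of a raw block device with `cache="none"` (libvirt advises disabling caching for shareable disks) are assumptions for illustration, not the Nova driver's code.

```python
import xml.etree.ElementTree as ET


def build_disk_xml(source_dev: str, target_dev: str, mode: str, shareable: bool) -> str:
    """Return a libvirt <disk> element (as a string) for a block-device volume."""
    if mode not in ("ro", "rw"):
        raise ValueError("mode must be 'ro' or 'rw'")
    disk = ET.Element("disk", type="block", device="disk")
    # Raw format with host caching disabled, since the disk may be shared.
    ET.SubElement(disk, "driver", name="qemu", type="raw", cache="none")
    ET.SubElement(disk, "source", dev=source_dev)
    ET.SubElement(disk, "target", dev=target_dev, bus="virtio")
    if mode == "ro":
        # The guest sees the device as read-only.
        ET.SubElement(disk, "readonly")
    if shareable:
        # Tells libvirt the device is expected to be used by several domains.
        ET.SubElement(disk, "shareable")
    return ET.tostring(disk, encoding="unicode")
```

For example, `build_disk_xml("/dev/disk/by-path/example", "vdb", "ro", True)` (a placeholder device path) yields a raw virtio disk carrying both `<readonly />` and `<shareable />`.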
- Volumes screen:
- Reflect multiple attachments in the Attached To column/field.
- Reflect the mode of each attachment (ro or rw).
- Reflect whether a volume is "shareable" or not.
- Volume Detail screen:
- As per the requirements for the Volumes screen.
- Create Volume dialog:
- Allow the marking of the volume as "shareable".
- Edit Attachments dialog:
- Allow the addition of further attachments to shareable volumes that have already been attached to an instance.
- Allow setting of the attachment mode.
Comments and Discussion
These issues and questions are outstanding and, until resolved, will block implementation of this proposal:
- A decision is needed on how to resolve the conflict between the overall volume status and the status of each individual attachment ('attach_status').
- Current volume status set:
- Attachment: attaching, in-use, detaching
- Basic: creating, available, deleting, deleted
- Misc: uploading, extending, awaiting-transfer
- Error: error, error_deleting, error_attaching, error_detaching, error_extending, error_restoring
- Current volume attachment status set:
- attached, detached
- Proposal (from zhiyan and thingee):
- Add attaching and detaching to the volume attachment status set.
- Volume status is determined with the priority in-use, then attaching, then detaching.
- Volume status = in-use if any attachment is in the attached status, even if another attachment is attaching or detaching.
- Volume status = attaching if no attachment is in the attached status, but at least one is attaching.
- Volume status = detaching if no attachment is attached or attaching, but at least one is detaching.
- How should error_attaching and error_detaching be determined?
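The precedence rules in the zhiyan/thingee proposal reduce to a first-match scan over the attachment statuses. The function below is a minimal sketch of that rule; the name `volume_status` and the `idle_status` fallback (returned when nothing is attached or in progress) are assumptions, and the error_* states are deliberately left out because the proposal leaves them open.

```python
from typing import Iterable

# Proposal precedence: any attached -> in-use, else attaching, else detaching.
_PRECEDENCE = (
    ("attached", "in-use"),
    ("attaching", "attaching"),
    ("detaching", "detaching"),
)


def volume_status(attach_statuses: Iterable[str], idle_status: str = "available") -> str:
    """Derive the overall volume status from per-attachment attach_status values."""
    statuses = set(attach_statuses)
    for attach_status, volume_state in _PRECEDENCE:
        if attach_status in statuses:
            return volume_state
    # Assumption: with nothing attached or in progress, report the idle status.
    return idle_status
```

For instance, a volume with one attachment attached and another detaching still reports in-use, which matches the third rule of the proposal.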
- If multi-attach is determined to be extension functionality, how should it be implemented as an extension of the core attachment functionality?
- In the discussion on the shared-volume blueprint itself it was suggested that volumes should have to be explicitly marked as shareable to allow multi-attachment, in addition to later discussion about failing the attach if no mode is specified. Is there consensus that a "shareable" marker is required? Currently this proposal assumes the answer is yes.
- Are there additional issues to watch out for when snapshotting a shared volume?
- Read-write sharing won't work for QCOW2 disks on Libvirt/KVM, only for RAW disks. Do other hypervisors have similar restrictions on sharing volumes read-write?
- Are there any consistency requirements/expectations at all for multi-attach? Filesystems that can use multi-attach, like GFS, have certain requirements of the underlying storage. If you aren't a simple iSCSI mount, those requirements become important (DuncanT).
- I think we should state that generally volumes have a single writer. We're allowing you to do multiple, but it is up to you to get them to coordinate as though they were a single writer. (caitlin56).
- 'Coordinate' has no meaning without a /lot/ more detail; what 'coordination' is required depends on your specific application. It is also totally and utterly dependent on the facilities provided by the block device: when flushes and barriers happen, and so on (DuncanT).
- FibreChannel (FC) use case (kmartin): The goal is to support ESX clusters in an FC environment, as a number of Cinder drivers support FC today. An ESX cluster can be made up of one or more ESX hosts; in the majority of customer setups, it is made up of multiple ESX hosts. Multi-attach in this case means attaching the same Cinder volume with read/write access to multiple hosts of the same cluster. It's my understanding that the Nova ESX (VMware) hypervisor driver, along with other OpenStack hypervisor drivers such as Libvirt/KVM, supports FC attaches, hence the need for FC multi-attach. This FC use case is the same as the iSCSI use case; it just uses a different attach method. The 'coordination' mentioned above is handled by the ESX cluster software. (kmartin)
- Where should the shareable flag be stored under the extension model?