Manila/design/manila-liberty-consistency-groups

= Consistency Groups =

This document describes the design choices and implementation steps for the consistency group feature in core Manila.

Consistency groups are a mechanism that guarantees snapshots of multiple filesystem shares can be created at the exact same point in time. For example, a database may keep its tables, logs, and configuration on separate shares. To restore the database to a previous point in time, it only makes sense to restore the logs, tables, and configuration together from the exact same point in time.

= Manila Core Changes =

The core Manila changes are very similar to what has been done in Cinder. At a high level, the DB, share service, drivers, scheduler service, and API service all need changes.

Phase 1:
 * Create/Delete CGs
 * Create a share within a CG
 * Delete a share that is in a CG
 * Snapshot entire CGs
 * Create CG from a CGSnapshot

Potential future features:
 * Add/remove shares in a CG
 * CG migration
 * Replication of a CG

Manila vs Cinder

 * Different CLI syntax
 * RESTful API tweaks
 * The API will be an 'experimental' microversion in the Liberty release
 * Instead of a separate /create_from_src endpoint, a cgsnapshot_id is passed in the body of POST /consistency-groups
 * Deletes use the DELETE HTTP verb; os-force_delete is supported via the admin actions extension (/action)
 * At the driver API, cgsnapshot objects and CG objects (with all the information about what shares are in the CG) need to be passed to the driver instead of just the CG/cgsnapshot name. This is important for backends that aren't managing a CG construct, such as NetApp cDOT, since they need to do things such as looping through the snapshots in a CG to delete them.
 * The default policy in policy.json for consistency group operations is the default policy, whereas in Cinder it is 'nobody'
 * Unlike Cinder, snapshots in a CGsnapshot are not the same as a normal snapshot. A CGsnapshot is treated as a single unit instead of a collection of snapshots.
 * The consistency_group_support scheduler capability has multiple values instead of a boolean. This is to accommodate backends that require all shares in a CG to be on the same pool rather than just the same host. It also allows additional values in the future if a backend supports some other paradigm, such as a CG spanning multiple backends.

User Workflows

 * Snapshot multiple shares
 * Create a Consistency Group
 * Create share(s) with a CG specified
 * Create a CGsnapshot


 * Create a CG
 * POST to /consistency-groups
 * specify share_types, or the default share type will be used


 * Copy a consistency group
 * Create a CGsnapshot
 * Create a CG with the CGSnapshot id


 * Delete a CG
 * Delete all CG snapshots of the CG
 * Delete all shares in the CG
 * Delete the CG (Only empty CGs can be deleted)


 * List shares in a given CG
 * GET /shares?consistencygroup_id=


 * List the shares that were captured in a given CGSnapshot
 * GET /cgsnapshots/<id>/members
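The user workflows above boil down to a handful of request bodies. As an illustrative sketch (the exact field names are assumptions based on this design, not the final API), the payloads could be built like this:

```python
# Hypothetical request-body builders for the user workflows above.
# Field names follow this design document; the final microversioned API
# may differ in detail.

def create_cg_request(name, share_types=None, cgsnapshot_id=None):
    """Build the body for POST /consistency-groups."""
    body = {"consistency_group": {"name": name}}
    if share_types:
        # If omitted, the default share type is used.
        body["consistency_group"]["share_types"] = share_types
    if cgsnapshot_id:
        # Copying a CG: pass the source CGSnapshot id in the body
        # instead of using a separate /create_from_src endpoint
        # (a deliberate difference from Cinder).
        body["consistency_group"]["source_cgsnapshot_id"] = cgsnapshot_id
    return body


def create_cgsnapshot_request(cg_id, name=None):
    """Build the body for POST /cgsnapshots."""
    return {"cgsnapshot": {"consistency_group_id": cg_id, "name": name}}
```

Listing the shares of a CG then becomes a simple query-string filter (GET /shares?consistencygroup_id=...), with no extra body required.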

System Workflows

 * Creating a CG
 * Create the CG in the DB
 * Cast the consistency group creation to the scheduler
 * Choose a host for the CG
 * Cast the creation to the share service
 * Get the CG info from DB
 * Call the driver
 * Update database with info returned from driver (ex: CG status)


 * Creating a CG from a cgsnapshot
 * Create a new CG in the DB
 * Create new shares from CG snapshot members in the DB
 * Schedule the CG create to the scheduler with the original CG host/pool + share_type (depending on capability)
 * Call the driver with all info about source shares, new shares, CG, and cgsnapshot
 * Update the CGsnapshot status and all share statuses to 'available'


 * Creating a CG snapshot
 * Create the cgsnapshot entity in the DB
 * For all shares in the CG, create a cgsnapshot_member in the DB
 * Cast create_cgsnapshot to the driver
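The DB steps of that workflow can be sketched as follows. This is a simplified illustration with in-memory dicts standing in for rows; the real code writes CGSnapshot/CGSnapshotMember rows through the DB API and then casts create_cgsnapshot to the share manager over RPC.

```python
import uuid


def create_cgsnapshot_records(cg, shares):
    """Create a cgsnapshot record plus one member record per share.

    cg is the consistency group dict; shares are the shares currently
    in the CG. Both the snapshot and its members start in 'creating'
    status and are flipped to 'available' once the driver succeeds.
    """
    snap = {'id': str(uuid.uuid4()),
            'consistency_group_id': cg['id'],
            'status': 'creating'}
    members = [{'id': str(uuid.uuid4()),
                'cgsnapshot_id': snap['id'],
                'share_id': s['id'],
                'size': s['size'],
                'status': 'creating'} for s in shares]
    return snap, members
```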


 * Adding a share to a CG
 * On share creation:
 * Cast to the scheduler with the host of the CG
 * if pool-capable: goes to the same pool as the CG
 * if backend-capable: goes to a pool on the same host that matches a CG share type


 * Deleting a share (in a CG)
 * Not allowed if the CG has snapshots of the share (check whether the share_id is in the CGSnapshotMembers table)
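That guard is a simple membership check. A minimal sketch, with the cgsnapshot_members table modeled as a list of dicts (the real check would query the DB):

```python
def can_delete_share(share_id, cgsnapshot_members):
    """Return True if no CGSnapshot member references this share.

    A share that is captured in any cgsnapshot must not be deleted,
    because the cgsnapshot would then reference a missing share.
    """
    return all(m["share_id"] != share_id for m in cgsnapshot_members)
```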

Scheduler
manila/scheduler/filter_scheduler.py should now look for consistency_group_support as a capability.

consistency_group_support default: None

Values:
 * None - No support for CGs
 * host - shares in a CG must be on pool(s) on the same host that also match the CG share type
 * pool - shares in a CG must live in the same pool as the CG
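An illustrative sketch of how the scheduler could act on these values; this is not the actual filter_scheduler code, and it assumes Manila's 'host@backend#pool' naming convention for pools:

```python
def pool_matches_cg(pool, cg_host, capability):
    """Decide whether a pool may hold a share that belongs to a CG.

    pool and cg_host are 'host@backend#pool' strings; capability is the
    backend's reported consistency_group_support value.
    """
    if capability is None:
        # Backend does not support consistency groups at all.
        return False
    if capability == 'pool':
        # The share must land in the exact same pool as the CG.
        return pool == cg_host
    if capability == 'host':
        # Any pool on the same backend host qualifies (share-type
        # matching is handled separately by the scheduler filters).
        return pool.split('#')[0] == cg_host.split('#')[0]
    return False
```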

DB
 * New Tables:

class ConsistencyGroup(BASE, ManilaBase):
    """Represents a consistency group."""
    __tablename__ = 'consistency_groups'
    id = Column(String(36), primary_key=True)
    user_id = Column(String(255), nullable=False)
    project_id = Column(String(255), nullable=False)
    deleted = Column(String(36), default='False')
    host = Column(String(255))
    name = Column(String(255))
    description = Column(String(255))
    status = Column(String(255))
    source_cgsnapshot_id = Column(String(36))
    share_network_id = Column(String(36), ForeignKey('share_networks.id'),
                              nullable=True)
    share_server_id = Column(String(36), ForeignKey('share_servers.id'),
                             nullable=True)


class ConsistencyGroupShareTypeMapping(BASE, ManilaBase):
    """Represents the share types in a consistency group."""
    __tablename__ = 'consistency_group_share_type_mappings'
    id = Column(String(36), primary_key=True)
    deleted = Column(String(36), default='False')
    consistency_group_id = Column(String(36),
                                  ForeignKey('consistency_groups.id'),
                                  nullable=False)
    share_type_id = Column(String(36),
                           ForeignKey('share_types.id'),
                           nullable=False)
    consistency_group = orm.relationship(
        ConsistencyGroup,
        backref="share_types",
        foreign_keys=consistency_group_id,
        primaryjoin=('and_('
                     'ConsistencyGroupShareTypeMapping.consistency_group_id '
                     '== ConsistencyGroup.id,'
                     'ConsistencyGroupShareTypeMapping.deleted == "False")')
    )


class CGSnapshot(BASE, ManilaBase):
    """Represents a cgsnapshot."""
    __tablename__ = 'cgsnapshots'
    id = Column(String(36), primary_key=True)
    consistency_group_id = Column(String(36),
                                  ForeignKey('consistency_groups.id'))
    user_id = Column(String(255), nullable=False)
    project_id = Column(String(255), nullable=False)
    deleted = Column(String(36), default='False')
    name = Column(String(255))
    description = Column(String(255))
    status = Column(String(255))
    consistency_group = orm.relationship(
        ConsistencyGroup,
        backref="cgsnapshots",
        foreign_keys=consistency_group_id,
        primaryjoin=('and_('
                     'CGSnapshot.consistency_group_id == ConsistencyGroup.id,'
                     'CGSnapshot.deleted == "False")')
    )


class CGSnapshotMember(BASE, ManilaBase):
    """Represents a share captured in a cgsnapshot."""
    __tablename__ = 'cgsnapshot_members'
    id = Column(String(36), primary_key=True)
    cgsnapshot_id = Column(String(36), ForeignKey('cgsnapshots.id'))
    share_id = Column(String(36), ForeignKey('shares.id'))
    size = Column(Integer)
    status = Column(String(255))
    share_proto = Column(String(255))
    share_type_id = Column(String(36), ForeignKey('share_types.id'),
                           nullable=True)
    user_id = Column(String(255))
    project_id = Column(String(255))
    deleted = Column(String(36), default='False')
    cgsnapshot = orm.relationship(
        CGSnapshot,
        backref="cgsnapshot_members",
        foreign_keys=cgsnapshot_id,
        primaryjoin='CGSnapshotMember.cgsnapshot_id == CGSnapshot.id')
    share = orm.relationship(
        Share,
        backref="cgsnapshot_members",
        foreign_keys=share_id,
        primaryjoin=('and_('
                     'CGSnapshotMember.share_id == Share.id,'
                     'CGSnapshotMember.deleted == "False")'))

 * New fields in Share:

consistency_group_id = Column(String(36),
                              ForeignKey('consistency_groups.id'),
                              nullable=True)
consistency_group = orm.relationship(
    ConsistencyGroup,
    backref="shares",
    foreign_keys=consistency_group_id,
    primaryjoin='Share.consistency_group_id == ConsistencyGroup.id')
source_cgsnapshot_member_id = Column(String(36), nullable=True)

API
Schemas: https://wiki.openstack.org/wiki/Manila/design/manila-liberty-consistency-groups/api-schema

GET /consistency-groups
GET /consistency-groups/detail
POST /consistency-groups
GET /consistency-groups/<id>
DELETE /consistency-groups/<id>
PUT /consistency-groups/<id>

GET /cgsnapshots
GET /cgsnapshots/detail
POST /cgsnapshots
PUT /cgsnapshots/<id>
GET /cgsnapshots/<id>
GET /cgsnapshots/<id>/members
DELETE /cgsnapshots/<id>

Admin Actions (os-reset_status and os-force_delete):

POST /consistency-groups/<id>/action
Body: {"os-reset_status": {"status": "available"}}

POST /cgsnapshots/<id>/action
Body: {"os-force_delete": null}

Updates to the Share resource:
 * Adds consistency_group_id to POST /shares
 * Adds source_cgsnapshot_member_id
 * Adds a consistency_group_id query filter to /shares

Policies
policy.json - All consistency group policies should default to the default policy:

"consistency_group:create": "",
"consistency_group:delete": "",
"consistency_group:update": "",
"consistency_group:get": "",
"consistency_group:get_all": "",
"consistency_group:create_cgsnapshot": "",
"consistency_group:delete_cgsnapshot": "",
"consistency_group:get_cgsnapshot": "",
"consistency_group:get_all_cgsnapshots": "",

Driver API
update_share_stats should now return 'consistency_group_support' with a value of 'pool', 'host', or None.
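For example, a backend's stats dict could report the new capability as follows (the other keys are abbreviated and the backend name is illustrative):

```python
def example_share_stats():
    """Sketch of a driver's update_share_stats payload reporting the
    new consistency_group_support capability."""
    return {
        'share_backend_name': 'example_backend',  # illustrative name
        'driver_version': '1.0',
        'storage_protocol': 'NFS_CIFS',
        # New in this design: None, 'host', or 'pool'.
        # 'pool' means all shares in a CG must land in the same pool;
        # 'host' means any pool on the same backend host will do.
        'consistency_group_support': 'pool',
    }
```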

def create_consistency_group(self, context, cg_dict, share_server=None):
    """Create a consistency group.

    :param context:
    :param cg_dict: The consistency group details
        EXAMPLE:
        {'status': 'creating',
         'project_id': '13c0be6290934bd98596cfa004650049',
         'user_id': 'a0314a441ca842019b0952224aa39192',
         'description': None,
         'deleted': 'False',
         'created_at': datetime.datetime(2015, 8, 10, 15, 14, 6),
         'updated_at': None,
         'source_cgsnapshot_id': 'f6aa3b59-57eb-421e-965c-4e182538e36a',
         'host': 'openstack2@cmodeSSVMNFS',
         'deleted_at': None,
         'share_types': [],
         'id': 'eda52174-0442-476d-9694-a58327466c14',
         'name': None
        }
    :return: cg_model_update
        cg_model_update - a dict containing any values to be updated
        for the CG in the database. This value may be None.
    """
    raise NotImplementedError


def create_consistency_group_from_cgsnapshot(self, context, cg_dict,
                                             cgsnapshot_dict,
                                             share_server=None):
    """Create a consistency group from a cgsnapshot.

    :param context:
    :param cg_dict: The consistency group details
        EXAMPLE:
        {'status': 'creating',
         'project_id': '13c0be6290934bd98596cfa004650049',
         'user_id': 'a0314a441ca842019b0952224aa39192',
         'description': None,
         'deleted': 'False',
         'created_at': datetime.datetime(2015, 8, 10, 15, 14, 6),
         'updated_at': None,
         'source_cgsnapshot_id': 'f6aa3b59-57eb-421e-965c-4e182538e36a',
         'host': 'openstack2@cmodeSSVMNFS',
         'deleted_at': None,
         'shares': [],  # The new shares being created
         'share_types': [],
         'id': 'eda52174-0442-476d-9694-a58327466c14',
         'name': None
        }
    :param cgsnapshot_dict: The cgsnapshot details
        EXAMPLE:
        {'status': 'available',
         'project_id': '13c0be6290934bd98596cfa004650049',
         'user_id': 'a0314a441ca842019b0952224aa39192',
         'description': None,
         'deleted': '0',
         'created_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'updated_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'consistency_group_id': '4b04fdc3-00b9-4909-ba1a-06e9b3f88b67',
         'cgsnapshot_members': [
             {'status': 'available',
              'share_type_id': '1a9ed31e-ee70-483d-93ba-89690e028d7f',
              'share_id': 'e14b5174-e534-4f35-bc4f-fe81c1575d6f',
              'user_id': 'a0314a441ca842019b0952224aa39192',
              'deleted': 'False',
              'created_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
              'share': ,
              'updated_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
              'share_proto': 'NFS',
              'project_id': '13c0be6290934bd98596cfa004650049',
              'cgsnapshot_id': 'f6aa3b59-57eb-421e-965c-4e182538e36a',
              'deleted_at': None,
              'id': '6813e06b-a8f5-4784-b17d-f3e91afa370e',
              'size': 1
             }
         ],
         'deleted_at': None,
         'id': 'f6aa3b59-57eb-421e-965c-4e182538e36a',
         'name': None
        }
    :return: (cg_model_update, share_update_list)
        cg_model_update - a dict containing any values to be updated
        for the CG in the database. This value may be None.
        share_update_list - a list of dictionaries, one for every share
        created in the CG. Any share dicts should at a minimum contain
        the 'id' key and 'export_locations'. Export locations should be
        in the same format as returned by a share_create. This list may
        be empty or None.
        EXAMPLE:
        [{'id': 'uuid', 'export_locations': ['export_path']}]
    """
    raise NotImplementedError


def delete_consistency_group(self, context, cg_dict, share_server=None):
    """Delete a consistency group.

    :param context:
    :param cg_dict:
    :return: cg_model_update
        cg_model_update - a dict containing any values to be updated
        for the CG in the database. This value may be None.
    """
    raise NotImplementedError


def create_cgsnapshot(self, context, snap_dict, share_server=None):
    """Create a consistency group snapshot.

    :param context:
    :param snap_dict:
    :return: (cgsnapshot_update, member_update_list)
        cgsnapshot_update - a dict containing any values to be updated
        for the CGSnapshot in the database. This value may be None.
        member_update_list - a list of dictionaries, one for every
        member of the cgsnapshot. Each dict should contain values to be
        updated for the CGSnapshotMember in the database. This list may
        be empty or None.
    """
    raise NotImplementedError


def delete_cgsnapshot(self, context, snap_dict, share_server=None):
    """Delete a consistency group snapshot.

    :param context:
    :param snap_dict:
    :return: (cgsnapshot_update, member_update_list)
        cgsnapshot_update - a dict containing any values to be updated
        for the CGSnapshot in the database. This value may be None.
    """
    raise NotImplementedError
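A minimal, hypothetical driver subclass illustrating the return-value contract of these hooks. The backend calls are stubbed out (as a backend faking CG support would do); a real driver would talk to its storage array before reporting status back to the share manager:

```python
class FakeCGDriver(object):
    """Illustrative driver sketch; not an actual Manila driver."""

    def create_consistency_group(self, context, cg_dict,
                                 share_server=None):
        # Nothing to provision on a backend that fakes CG support,
        # but we still report the terminal status for the DB update.
        return {'status': 'available'}

    def create_cgsnapshot(self, context, snap_dict, share_server=None):
        # "Snapshot" every member and report per-member updates, as
        # the (cgsnapshot_update, member_update_list) contract expects.
        member_updates = [
            {'id': m['id'], 'status': 'available'}
            for m in snap_dict.get('cgsnapshot_members', [])
        ]
        return {'status': 'available'}, member_updates
```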

Manila Client
cg-snapshot-create  Creates a cgsnapshot.
cg-snapshot-delete  Removes one or more cgsnapshots.
cg-snapshot-list    Lists all cgsnapshots.
cg-snapshot-show    Shows cgsnapshot details.
cg-create           Creates a consistency group (--cgsnapshot to create from an existing CG snapshot).
cg-delete           Removes one or more consistency groups.
cg-list             Lists all consistency groups.
cg-show             Shows details of a consistency group.
cg-update           Updates a consistency group.
 * Do we want to alias the commands so that there are Cinder-consistent commands as well as the improved names (i.e. cg-create = consisgroup-create)? Perhaps we instead alias the cinderclient commands to be more like these?

create --consistency-group  Creates a share and puts it into a consistency group.

= Outstanding Questions / FAQ =
 * How do we avoid confusion if no backends in a deployment support CGs?
 ** It appears that Cinder has the policies default to 'nobody', but this seems odd; is there a better way?
 *** (ameade) I can't think of a way to have devstack set the policies automatically either.
 *** (ameade) We have chosen to go with an os-consistency-groups extension for now.
 * Do we need a way to fake CGs with the generic driver?
 ** No, but it will not be a core feature if it cannot be supported in the generic driver.
 ** Ben says that we should fake it in the generic driver so it can be tested by the gate.
 * Do we need a safeguard for deleting shares that are in a CG (i.e. protected shares)? Are shares in a CG more valuable than normal shares?
 * Should we overload the Snapshots table to include snapshots in a CGSnapshot to avoid adding a new DB table, or continue with having a new table?
 * Why not have the CG delete all of the shares in it when deleting the CG?
 ** This could cause odd state if the backend fails while deleting shares, and it would not be obvious where it failed. It is also extra complexity in the implementation that could instead be hidden by a client.

= Notes on Cinder Impl =


 * A consistency group may be specified on volume create
 * When a consistency group is created it is scheduled to a pool, and all volumes created in the CG will go to that pool

Cinder CLI:

cgsnapshot-create  Creates a cgsnapshot and snapshots of every volume in the CG that show up in snapshot-list.
cgsnapshot-delete  Removes one or more cgsnapshots.
cgsnapshot-list    Lists all cgsnapshots.
cgsnapshot-show    Shows cgsnapshot details. <-- /cgsnapshots/detail?all_tenants=1&name=blah
consisgroup-create Creates a consistency group.
consisgroup-create-from-src Creates a consistency group from a cgsnapshot, filled with volumes from the snapshots in the CG.
consisgroup-delete Removes one or more consistency groups.
consisgroup-list   Lists all consistency groups.
consisgroup-show   Shows details of a consistency group. <-- /consistencygroups/detail?all_tenants=1&name=blah
consisgroup-update Updates a consistency group.
create --consisgroup-id

Cinder API:

GET /consistencygroups/detail
POST /consistencygroups
    {'consistencygroup': {'name': , 'description': , 'volume_types': , 'availability_zone': }}
POST /consistencygroups/create_from_src
    {'consistencygroup-from-src': {'name': , 'description': , 'cgsnapshot_id': }}
GET /consistencygroups/<id>
POST /consistencygroups/<id>/delete
    {"consistencygroup": {"force": false}}
PUT /consistencygroups/<id>
    {'consistencygroup': {'name': , 'description': , 'add_volumes': , 'remove_volumes': }}
GET /cgsnapshots/detail
GET /cgsnapshots/<id>
POST /cgsnapshots
    {'cgsnapshot': {'consistencygroup_id': }}
DELETE /cgsnapshots/<id>

Adds consistencygroup_id to POST /volumes.

Xing's commit message:

1) Create a CG, specifying all volume types that can be supported by this CG. The scheduler chooses a backend that supports all specified volume types. The CG will be empty when it is first created. The backend needs to report consistencygroup_support = True. A volume type can have the following in extra specs: {'capabilities:consistencygroup_support': '<is> True'}. If consistencygroup_support is not in the volume type extra specs, it will be added to filter_properties by the scheduler to make sure that the scheduler will select a backend which reports the consistency group support capability.

   Create CG CLI: cinder consisgroup-create --volume-type type1,type2 mycg1

   This will add a CG entry in the new consistencygroups table.

2) After the CG is created, create a new volume and add it to the CG. Repeat until all volumes are created for the CG.

   Create volume CLI (with CG): cinder create --volume-type type1 --consisgroup-id <CG uuid> 10

   This will add a consistencygroup_id foreign key in the new volume entry in the db.

3) Create a snapshot of the CG (cgsnapshot).

   Create cgsnapshot CLI: cinder cgsnapshot-create <CG uuid>

   This will add a cgsnapshot entry in the new cgsnapshots table, create a snapshot for each volume in the CG, and add a cgsnapshot_id foreign key in each newly created snapshot entry in the db.

Questions

 * Why does policy.json default to 'nobody' being able to perform CG commands?
 * Why are volume types specified when creating a CG?
 ** This is because all volumes within a CG need to be on the same backend in order for the backend to honor the CG. This is important to note in the docs so that conflicting volume types are not specified.
 * What occurs when a volume on a different backend is added to a CG via the API?
 ** Currently the API ensures that the volume's volume_type matches a supported volume_type of the CG and that the CG and volume are on the same host (but they can be on different pools).
 * Is a volume created with a CG going to end up on the same host?
 ** Yes, if a volume has a CG it will be cast to the same pool as the CG.
 ** What if the pool is full but there are other available pools on the same backend? Should this fail?
 * What happens if a driver does not support create_from_src?
 ** The CG goes to ERROR, and volumes are created from the snapshot also in ERROR status.
 ** A volume in a consistency group cannot be deleted, and a CG with a volume cannot be deleted, so a volume must be removed from the CG before it can be deleted. But what if you don't support modifying a CG?
 * Why do we need a force delete for CGs?
 * What do I do if I do not want to see snapshots in a CG when doing snapshot-list?
 ** I suppose API filtering could allow someone to not see all of the snapshots if they do not want to.

Cinder Bugs to file

 * What happens when you force delete a CG?
 ** The volumes remain but still think they are in a CG, which means you still cannot delete the volume.
 * consisgroup-create-from-src says cgsnapshot is an optional param in cinderclient.
 * Creating a volume in a CG may not end up in the CG. The only check performed is that the volume type of the volume is supported by the CG, but volume types are not 1-to-1 with backends. There could be backends A and B that both match volume type 1. A CG is created with volume type 1 and ends up on backend A. Then a volume is created with the CG specified but ends up on backend B.

Main impl
299b99437c2de84222fd585f06c2d05ca213225b ConsistencyGroup: Return 400 instead of 500 for invalid body
adb4c80be82caacad83f1366a4b34e5653fd5dab Create Consistency Group from CG Snapshot API
1a62a6e60fda73bf31256fbf684dc03bb6cf0038 Modify Consistency Group API
fc7a4384be2ccf8ff24f6c0f72c681ad9133801a Add tests for consistency groups DB migration
9082273305b0b9c117eb677a0e34c2ffde4e66f0 Volume types need to be specified when creating CG
764c5ec749821d36bb0215dc6002d3caea95d3b1 Delete consistency group failed
cf961f83ac323dfad1fa5e227d1e502a17529ecc Consistency Groups

Ex driver impl
8d5b795b37ed021c2639066689645f8aa0b1012f PureISCSIDriver consistency group updates.
39c1a8e08dc854e22ada315a46c20c99df2facf8 Add support to PureISCSIDriver for Consistency Groups
92a817708a3938b1b734d2caaa206b310996d8d0 EMC VMAX driver Kilo update