
Manila Disaster Recovery / Data Replication Design

Introduction

Replicated shares will be implemented in Manila without adding any new services or any new drivers. The code for creating replicated shares and for adding/removing replicas and other replication-related operations will all go into the existing share drivers. The most significant change required to allow this will be that share drivers will be responsible for communicating with ALL storage controllers necessary to achieve any replication tasks, even if that involves sending commands to other storage controllers in other AZs.

While we can't know how every storage controller works or how each vendor will implement replication, this approach should give driver writers all the flexibility they need with minimal added complexity in the Manila core. There are already examples of drivers reading the config sections of other drivers when an operation requires communicating with two backends. The Manila Share Manager itself is expected to communicate all necessary backend details for share replicas that exist across AZs to whichever backend it requests an operation from.

Supported Replication Styles

There are 3 styles of DR that we would like to support in the long run:

  1. writable - Amazon EFS-style synchronously replicated shares where all replicas are writable. Promotion is not supported and not needed.
  2. readable - Mirror-style replication with a primary (writable) copy and one or more secondary (read-only) copies which can become writable after a promotion of the secondary.
  3. dr - Generalized replication with secondary copies that are inaccessible until after a promotion of one of the secondary copies.

Replica States

Each replica has a replica_state which has 4 possible values:

  1. active - All writable replicas are active.
  2. in_sync - Passive replica which is up to date with the active replica, and can be promoted to active.
  3. out_of_sync - Passive replica which has gone out of date, or new replica that is not yet up to date.
  4. error - Either the scheduler failed to schedule this replica or some irrecoverable damage occurred during promotion of a non-active replica. Drivers may also set the replica_state to error (through the periodic replica_state update) if irrecoverable damage to the replica is discovered at any point during its existence.
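
The DB code sketches later in this document reference these values through a constants module. A minimal sketch of the relevant names (the module path manila.common.constants is an assumption) is:

 # Hedged sketch: replica_state and status values assumed to live in
 # manila.common.constants, mirroring the states described above.
 REPLICA_STATE_ACTIVE = 'active'
 REPLICA_STATE_IN_SYNC = 'in_sync'
 REPLICA_STATE_OUT_OF_SYNC = 'out_of_sync'
 STATUS_AVAILABLE = 'available'
 STATUS_CREATING = 'creating'
 STATUS_ERROR = 'error'
 STATUS_REPLICATION_CHANGE = 'replication_change'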

New share state

replication_change - New transient state triggered by a change of the active replica. Access to the share is cut off while in this state.

User Workflows

Creating a share supporting replication

  • Administrators can create a share type with the replication extra-spec, specifying the style of replication the backend supports (see the sketch after this list).
  • Users can use the share type to create a new share that allows/supports replication.
  • A replicated share always starts out with one replica instance, the share itself. This should not be confused with the share already having a replica. The user can verify whether the share has any replicas by requesting the details of the share.
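
As an illustration, an administrator could create such a share type with python-manilaclient. This is a minimal sketch; it assumes the extra-spec is named 'replication' as described above and that 'manila' is an already-authenticated v2 client:

 # Hedged sketch: create a share type that advertises 'readable'
 # style replication through the assumed 'replication' extra-spec.
 share_type = manila.share_types.create(
     name='replicated-nfs',
     spec_driver_handles_share_servers=False)
 share_type.set_keys({'replication': 'readable'})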

Creating a replica

  • POST to /share-replicas

The user must specify the name/ID of the share to be replicated, an availability zone for the replica to exist in, and optionally a share_network_id. Under the existing design, replicas of a share cannot exist in the same availability zone as the share itself.

  • A newly created share_replica starts out in the out_of_sync state and transitions to the in_sync state when the driver reports that state.

Listing and showing replicas

  • GET from /share-replicas

User can list all replicas and verify their status and replica_state.

  • GET from /share-replicas?share_id=<share_id>

User can list replicas for a particular share.

  • GET from /share-replicas/<id>

User can view details of a replica.

Promoting a non-active replica

  • POST to /share-replicas/<replica-id>/action with body {'os-promote_replica': null}

For replication styles that permit promotion, the user can promote a replica with in_sync replica_state to active replica_state by initiating the promote_replica call.

  • Only administrators can attempt promoting replicas with replica_state error or out_of_sync.

Deleting Replicas

  • DELETE to /share-replicas/<id>

The user can delete a replica. The last active replica cannot be deleted using this delete_replica call.

System Workflows

  • Creating a share that supports replication
    • Same process as creating a new share.
    • The replication extra-spec from the share type is copied over to the share's record in the DB.
    • Scheduler uses the replication extra-spec to filter available hosts and schedule the share on an appropriate backend that supports that specific style of replication.
  • Creating a replica:
    • Create a share instance on the database.
    • Update the share instance replica_state to out_of_sync.
    • Cast a call to the scheduler to find an appropriate host to schedule the replica on.
    • In the scheduler, find weighted host for the replica.
      • If a host cannot be chosen, update the replica's status and replica_state in the database to error and raise an exception.
    • Cast the create_share_replica call to the driver based on the weighted host selection.
    • Prior to invoking the driver's call, collate information regarding the existing active replica instance, the existing share access rules and the replica's share_server, and pass these to the driver when invoking create_replica.
    • The driver may return export_locations and replica_state. If they are returned, the database is updated with these values.
    • If the replication style is writable, the driver MUST return the replica_state set to active.
    • All the access rules for the new replica are set to the active state in the database.
    • If the driver throws an exception, the status and replica_state of the replica are updated to error.
  • Listing/showing shares:
    • For shares that support replication, the result of listing/showing the share will have two new fields: replication, denoting the replication style supported, and has_replicas, denoting whether the share has replicas.
    • For replicated shares, the primary instance of the share is chosen preferentially from replicas with status replication_change, and otherwise from those with replica_state set to active.
  • Listing replicas
    • For listing, grab share instances from the database that have a replica_state among {in_sync, active, out_of_sync, error}.
    • If share_id is provided, grab only share instances that belong to that share and have a replica_state among {in_sync, active, out_of_sync, error}.
    • Limits and offsets must be respected on the list calls.
  • Promoting a non-active replica:
    • If replica is already active, do nothing.
    • Update the status of the replica being promoted to replication_change.
    • Grab all available replicas and invoke the appropriate driver's promote_replica call, passing the available replicas and the new replica.
    • If the driver throws an exception at this stage, there is a good chance that the replicas have been altered on the backend. Loop through the replicas, set their replica_states to error and leave their status unchanged. Also reset the status of the replica that failed to promote back to available, as it was before this operation. The backend may update the actual replica_state during the replica monitoring call.
    • The driver may return an updated list of replicas. Update the export_locations and replica_states in the database.
    • The status of the replica that was promoted should return to available from replication_change.
  • Periodic replica update:
    • The share manager implements a looping call with a default interval of 5 minutes to query each driver and backend for the replica_state of all non-active replicas associated with them (see the sketch after this list).
    • The driver is allowed to set the replica_state to in_sync, out_of_sync, and in exceptional cases, error.
    • If the driver sets the replica_state of a replica to error, it is assumed that some irrecoverable damage has occurred to the replica instance. The status of the replica instance must be set to error as well.
  • Deleting a share replica
    • If the replica had no host, it is simply removed from the database.
    • The status of the replica is set to deleting on the database and the appropriate driver method is called to delete the replica.
    • If the driver fails to delete the replica, the status of the replica is updated to error_deleting on the database.
    • If the driver returns without errors, the replica instance is removed from the database.
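
A minimal sketch of the periodic replica update described above, assuming the share manager uses oslo.service's periodic task machinery; the db helper names are hypothetical:

 # Hedged sketch of the periodic replica_state update in the share
 # manager; the db API helper names are hypothetical.
 from oslo_service import periodic_task

 class ShareManager(manager.SchedulerDependentManager):  # existing class

     @periodic_task.periodic_task(spacing=300)  # default: every 5 minutes
     def periodic_share_replica_update(self, context):
         replicas = self.db.share_replicas_get_all_non_active(context)
         for replica in replicas:
             state = self.driver.update_replica_status(context, replica)
             if state is None:
                 continue  # leave the current replica_state unchanged
             updates = {'replica_state': state}
             if state == 'error':
                 # Irrecoverable damage: mirror it in the status as well.
                 updates['status'] = 'error'
             self.db.share_replica_update(context, replica['id'], updates)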

Scheduler Impact

  • The host_manager must track the replication style capability reported by each backend. Backends must report this capability at the host/pool level.
  • The filter_scheduler must then match the replication capability against the capability that the replication share_type demands (see the sketch after this list).
  • The replication style must be one of the styles mentioned above.
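
As an illustration of this match, a minimal filter sketch (the class name is an assumption, patterned after Manila's host filters):

 # Hedged sketch of a scheduler filter matching the share type's
 # 'replication' extra-spec against the backend-reported capability.
 class ReplicationFilter(object):
     def host_passes(self, host_state, filter_properties):
         share_type = filter_properties.get('share_type') or {}
         wanted = share_type.get('extra_specs', {}).get('replication')
         if wanted is None:
             return True  # this share type does not request replication
         # Backends report the replication style at the host/pool level.
         return host_state.capabilities.get('replication') == wanted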

DB Impact

  • Share Export Locations:

Preferred export locations will only be from the instances with replica_state set to active.

 def export_locations(self):
     # TODO(gouthamr): Return AZ-specific export locations for
     # replicated shares.
     # NOTE(gouthamr): For a replicated share, export locations of the
     # 'active' instances are taken, if 'available'.
     all_export_locations = []
     select_instances = list(filter(
         lambda x: x['replica_state'] == constants.REPLICA_STATE_ACTIVE,
         self.instances)) or self.instances
     for instance in select_instances:
         if instance['status'] == constants.STATUS_AVAILABLE:
             for export_location in instance.export_locations:
                 all_export_locations.append(export_location['path'])
     return all_export_locations
  • Share Instance:

Preferred instance will be an instance with status set to replication_change or any instance with status set to available and replica_state set to active.

 def instance(self):
     # NOTE(gouthamr): The order of preference: status
     # 'replication_change', followed by 'available' and 'creating'. If
     # this is a replicated share not undergoing a 'replication_change',
     # only 'active' instances are preferred.
     result = None
     if len(self.instances) > 0:
         order = [constants.STATUS_REPLICATION_CHANGE,
                  constants.STATUS_AVAILABLE, constants.STATUS_CREATING]
         other_statuses = [x['status'] for x in self.instances
                           if x['status'] not in order]
         order.extend(other_statuses)
         sorted_instances = sorted(
             self.instances, key=lambda x: order.index(x['status']))
         select_instances = sorted_instances
         if (select_instances[0]['status'] !=
                 constants.STATUS_REPLICATION_CHANGE):
             select_instances = (
                 list(filter(lambda x: x['replica_state'] ==
                             constants.REPLICA_STATE_ACTIVE,
                             sorted_instances)) or sorted_instances
             )
         result = select_instances[0]
     return result
  • New field on Share:
 replication = Column(String(255), nullable=True)
  • New field on ShareInstance
 replica_state = Column(String(255), nullable=True)

API Design

 GET /share-replicas/
 GET /share-replicas?share_id=<share_id>
 GET /share-replicas/<replica_id>
 POST /share-replicas
 Body:
 {
   'share_replica':
   {
     'availability_zone':<availability_zone_id>,
     'share_id':<share_id>,
     'share_network_id':<share_network_id>
   }
 }
 POST /share-replicas/<replica-id>/action
 Body:
 {'os-promote_replica': null}
 DELETE /share-replicas/<replica-id>
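
For illustration, the promote action could be invoked directly against the API like this (a hedged sketch using python-requests; the endpoint, tenant id and token are placeholders):

 # Hedged sketch: raw HTTP call for the promote action. The body
 # {'os-promote_replica': None} serializes to {"os-promote_replica": null}.
 import requests

 url = ('http://manila-api:8786/v2/<tenant_id>'
        '/share-replicas/<replica-id>/action')
 headers = {'X-Auth-Token': '<token>'}
 resp = requests.post(url, json={'os-promote_replica': None},
                      headers=headers)
 resp.raise_for_status()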

Policies

policy.json - All replication policies should default to the default rule:

   "share_replica:get_all": "rule:default",
   "share_replica:show": "rule:default",
   "share_replica:create" : "rule:default",
   "share_replica:delete": "rule:default",
   "share_replica:promote": "rule:default"


Driver API

 def create_replica(self, context, active_replica, new_replica,
                  access_rules, share_server=None):
     """Replicate the active replica to a new replica on this backend.
      :param context: Current context
     :param active_replica: A current active replica instance dictionary.
         EXAMPLE:
          .. code::
         {
         'id': 'd487b88d-e428-4230-a465-a800c2cce5f8',
         'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
         'deleted': False,
         'host': 'openstack2@cmodeSSVMNFS1',
         'status': 'available',
         'scheduled_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'launched_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'terminated_at': None,
         'replica_state': 'active',
         'availability_zone_id': 'e2c2db5c-cb2f-4697-9966-c06fb200cb80',
         'export_locations': [
             <models.ShareInstanceExportLocations>._as_dict()
         ],
         'share_network_id': '4ccd5318-65f1-11e5-9d70-feff819cdc9f',
         'share_server_id': '4ce78e7b-0ef6-4730-ac2a-fd2defefbd05',
         'share_server': <models.ShareServer>._as_dict() or None,
         }
     :param new_replica: The share replica dictionary.
         EXAMPLE:
          .. code::
         {
         'id': 'e82ff8b6-65f0-11e5-9d70-feff819cdc9f',
         'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
         'deleted': False,
         'host': 'openstack2@cmodeSSVMNFS2',
         'status': 'available',
         'scheduled_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'launched_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'terminated_at': None,
         'replica_state': 'out_of_sync',
         'availability_zone_id': 'f6e146d0-65f0-11e5-9d70-feff819cdc9f',
         'export_locations': [
             models.ShareInstanceExportLocations._as_dict()
         ],
         'share_network_id': '4ccd5318-65f1-11e5-9d70-feff819cdc9f',
         'share_server_id': 'e6155221-ea00-49ef-abf9-9f89b7dd900a',
         'share_server': <models.ShareServer>._as_dict() or None,
         }
     :param access_rules: A list of access rules that other instances of
     the share already obey.
     EXAMPLE:
          .. code::
          [ {
          'id': 'f0875f6f-766b-4865-8b41-cccb4cdf1676',
           'deleted': False,
           'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
           'access_type': 'ip',
           'access_to': '172.16.20.1',
           'access_level': 'rw',
          }]
     :param share_server: <models.ShareServer>._as_dict() or None,
     Share server of the replica being created.
     :return: (export_locations, replica_state)
      export_locations is a list of paths and replica_state is one of
      'active', 'in_sync', 'out_of_sync' or 'error'.
     A backend supporting 'writable' type replication should return
     'active' as the replica_state.
     Export locations should be in the same format as returned by a
     share_create. This list may be empty or None.
         EXAMPLE:
         .. code::
             [{'id': 'uuid', 'export_locations': ['export_path']}]
     """
     raise NotImplementedError()
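
To make the contract concrete, here is an illustrative sketch of create_replica for a 'dr'-style backend; all self._client calls are hypothetical backend operations:

 # Hedged sketch of a 'dr'-style create_replica.
 def create_replica(self, context, active_replica, new_replica,
                    access_rules, share_server=None):
     # Provision a mirror target for the new replica on this backend
     # (hypothetical backend call).
     self._client.create_mirror_target(new_replica['id'])
     # Begin replicating from the backend that hosts the active replica.
     self._client.start_mirror(source=active_replica['host'],
                               target=new_replica['id'])
     # 'dr' replicas are inaccessible until promotion, so there are no
     # export locations yet; report 'out_of_sync' until the first
     # transfer completes (the periodic update will flip it to 'in_sync').
     return None, 'out_of_sync'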


 def delete_replica(self, context, active_replica, replica,
                    share_server=None):
      """Delete a replica.
      :param context: Current context
     :param active_replica: A current active replica instance dictionary.
         EXAMPLE:
          .. code::
         {
         'id': 'd487b88d-e428-4230-a465-a800c2cce5f8',
         'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
         'deleted': False,
         'host': 'openstack2@cmodeSSVMNFS1',
         'status': 'available',
         'scheduled_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'launched_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'terminated_at': None,
         'replica_state': 'active',
         'availability_zone_id': 'e2c2db5c-cb2f-4697-9966-c06fb200cb80',
         'export_locations': [
             models.ShareInstanceExportLocations._as_dict()
         ],
         'share_network_id': '4ccd5318-65f1-11e5-9d70-feff819cdc9f',
         'share_server_id': '4ce78e7b-0ef6-4730-ac2a-fd2defefbd05',
         'share_server': <models.ShareServer>._as_dict() or None,
         }
     :param replica: Dictionary of the share replica being deleted.
         EXAMPLE:
          .. code::
         {
         'id': 'e82ff8b6-65f0-11e5-9d70-feff819cdc9f',
         'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
         'deleted': False,
         'host': 'openstack2@cmodeSSVMNFS2',
         'status': 'available',
         'scheduled_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'launched_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'terminated_at': None,
         'replica_state': 'in_sync',
         'availability_zone_id': 'f6e146d0-65f0-11e5-9d70-feff819cdc9f',
         'export_locations': [
             models.ShareInstanceExportLocations._as_dict()
         ],
         'share_network_id': '4ccd5318-65f1-11e5-9d70-feff819cdc9f',
         'share_server_id': '53099868-65f1-11e5-9d70-feff819cdc9f',
         'share_server': <models.ShareServer>._as_dict() or None,
         }
     :param share_server: <models.ShareServer>._as_dict() or None,
     Share server of the replica to be deleted.
     :return: None.
     """
     raise NotImplementedError()


 def promote_replica(self, context, replica_list, replica, access_rules,
                     share_server=None):
      """Promote a replica to 'active' replica_state.
      :param context: Current context
     :param replica_list: List of all replicas for a particular share.
     This list also contains the replica to be promoted. The 'active'
     replica will have its 'replica_state' attr set to 'active'.
         EXAMPLE:
          .. code::
         [
             {
             'id': 'd487b88d-e428-4230-a465-a800c2cce5f8',
             'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
              'replica_state': 'in_sync',
                 ...
             'share_server_id': '4ce78e7b-0ef6-4730-ac2a-fd2defefbd05',
             'share_server': <models.ShareServer>._as_dict() or None,
             },
             {
             'id': '10e49c3e-aca9-483b-8c2d-1c337b38d6af',
             'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
             'replica_state': 'active',
                 ...
             'share_server_id': 'f63629b3-e126-4448-bec2-03f788f76094',
             'share_server': <models.ShareServer>._as_dict() or None,
             },
             {
             'id': 'e82ff8b6-65f0-11e5-9d70-feff819cdc9f',
             'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
              'replica_state': 'in_sync',
                 ...
             'share_server_id': '07574742-67ea-4dfd-9844-9fbd8ada3d87',
             'share_server': <models.ShareServer>._as_dict() or None,
             },
             ...
         ]
     :param replica: Dictionary of the replica to be promoted.
         EXAMPLE:
          .. code::
         {
         'id': 'e82ff8b6-65f0-11e5-9d70-feff819cdc9f',
         'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
         'deleted': False,
         'host': 'openstack2@cmodeSSVMNFS2',
         'status': 'available',
         'scheduled_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'launched_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
         'terminated_at': None,
         'replica_state': 'in_sync',
         'availability_zone_id': 'f6e146d0-65f0-11e5-9d70-feff819cdc9f',
         'export_locations': [
             models.ShareInstanceExportLocations._as_dict()
         ],
         'share_network_id': '4ccd5318-65f1-11e5-9d70-feff819cdc9f',
         'share_server_id': '07574742-67ea-4dfd-9844-9fbd8ada3d87',
         'share_server': <models.ShareServer>._as_dict() or None,
         }
     :param access_rules: A list of access rules that other instances of
     the share already obey.
     EXAMPLE:
          .. code::
          [ {
          'id': 'f0875f6f-766b-4865-8b41-cccb4cdf1676',
           'deleted': False,
           'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
           'access_type': 'ip',
           'access_to': '172.16.20.1',
           'access_level': 'rw',
          }]
     :param share_server: <models.ShareServer>._as_dict() or None,
     Share server of the replica to be promoted.
     :return: updated_replica_list or None
         The driver can return the updated list as in the request
         parameter. Changes that will be updated to the Database are:
         'export_locations' and 'replica_state'.
     :raises Exception
         This can be any exception derived from BaseException. This is
         re-raised by the manager after some necessary cleanup. If the
         driver raises an exception during promotion, it is assumed
         that all of the replicas of the share are in an inconsistent
         state. Recovery is only possible through the periodic update
         call and/or administrator intervention to correct the 'status'
         of the affected replicas if they become healthy again.
     """
     raise NotImplementedError()
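
An illustrative sketch of promote_replica for the same hypothetical backend, returning the updated list so the manager can persist the new replica_state values:

 # Hedged sketch of promote_replica; self._client calls are hypothetical.
 def promote_replica(self, context, replica_list, replica, access_rules,
                     share_server=None):
     old_active = next(r for r in replica_list
                       if r['replica_state'] == 'active')
     # Reverse the mirror relationship on the backend.
     self._client.failover_mirror(replica['id'])
     # Only 'export_locations' and 'replica_state' from this list are
     # persisted by the manager.
     return [
         {'id': replica['id'], 'replica_state': 'active'},
         {'id': old_active['id'], 'replica_state': 'in_sync'},
     ]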


 def update_replica_status(self, context, replica, share_server=None):
      """Update the status and replica_state of a replica.
      Drivers should fix broken replication relationships, if possible,
      inside this method.
      :param context: Current context
      :param replica: Dictionary of the replica being updated.
          EXAMPLE:
           .. code::
          {
          'id': 'd487b88d-e428-4230-a465-a800c2cce5f8',
          'share_id': 'f0e4bb5e-65f0-11e5-9d70-feff819cdc9f',
          'deleted': False,
          'host': 'openstack2@cmodeSSVMNFS1',
          'status': 'available',
          'scheduled_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
          'launched_at': datetime.datetime(2015, 8, 10, 0, 5, 58),
          'terminated_at': None,
          'replica_state': 'active',
          'availability_zone_id': 'e2c2db5c-cb2f-4697-9966-c06fb200cb80',
          'export_locations': [
              models.ShareInstanceExportLocations._as_dict()
          ],
          'share_network_id': '4ccd5318-65f1-11e5-9d70-feff819cdc9f',
          'share_server_id': '4ce78e7b-0ef6-4730-ac2a-fd2defefbd05',
          }
      :param share_server: <models.ShareServer>._as_dict() or None
      :return: replica_state
          replica_state - a str value denoting the replica_state that the
          replica can have. Valid values are 'in_sync' and 'out_of_sync'
          (or, in exceptional cases, 'error'), or None (to leave the
          current replica_state unchanged).
      """
      raise NotImplementedError()
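
And a sketch of the periodic status check for the same hypothetical backend, attempting repair where possible as the docstring requires:

 # Hedged sketch of update_replica_status; self._client calls are
 # hypothetical backend operations.
 def update_replica_status(self, context, replica, share_server=None):
     mirror = self._client.get_mirror(replica['id'])
     if mirror.is_broken:
         # Try to re-establish the relationship before reporting.
         self._client.resync_mirror(replica['id'])
         return 'out_of_sync'
     if mirror.is_transferring:
         return 'out_of_sync'
     return 'in_sync' if mirror.is_healthy else None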

Manila Client

   share-replica-create   Create a share replica.
   share-replica-delete   Remove one or more share replicas.
   share-replica-list     List share replicas.
   share-replica-promote  Promote a replica to 'active' replica_state.
   share-replica-show     Show details about a replica.
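
Typical usage would look like the following (option names are assumptions):

   manila share-replica-create <share_id> --availability-zone <az>
   manila share-replica-list --share-id <share_id>
   manila share-replica-show <replica_id>
   manila share-replica-promote <replica_id>
   manila share-replica-delete <replica_id>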

FAQs and Unanswered Questions

  1. How do we deal with Network issues in multi-SVM replication?
 If we choose to make replication a single-svm-only feature, the share-network API doesn't need to change. To support replication with share networks, we would also need to modify the share-network create API to allow creation of share networks with a table of AZ-to-subnet mappings. This approach allows us to keep a single share network per share (with an associated security service) while allowing the tenant to specify enough information that each share instance can be attached to the appropriate network in each AZ. Multi-AZ share networks would also be useful for non-replicated use cases.
  2. Do we support Pool-level Replication?
 Some vendors have suggested that certain backends can mirror groups of shares or whole pools more efficiently than individual shares. This design only addresses the mirroring of individual shares. In the future, we may allow replication of groups of shares, but only if those groups are contained within a single tenant and defined by the tenant.
  3. From where to where do we allow replication? Is it intra-cloud or inter-cloud? Do we allow replication to something that's not managed by Manila?
 Intra-cloud. Replicating to something outside of Manila allows a bit more freedom, but with significantly less value, because there's practically nothing we can do to automate the failover/failback portion of a disaster. For use cases involving replication outside of Manila, we would need to involve other tools with more breadth/scope to manage the process.
  4. Who configures the replication? The admin? The end user? The manila scheduler?
 The end user. In the original design we presumed that the actual replication relationships should be hidden from the end user, but this doesn't match well with the concept of AZs that we are adding to Manila. If the users need to have control over which AZ the primary copy of their data lives in, then they also need to control where the other copies live. This means that the administrator's job is to ensure that for any share type that is replicated, it can be replicated from any AZ to any other AZ.
  5. Is there no way to achieve non-disruptive failover?
 (bswartz) I would love to find out that our initial intuition here is wrong, because it would change a lot of aspects of the design. It's worth spending time to brainstorm and research possibilities in this area. So far the most promising ideas involve:
    • Using VirtFS to mediate filesystem access and achieving non-disruptive failover that way
    • Using some kind of agent inside the guests to mediate file access
  6. Can a replica have a different share_type? There is a valid use case where a user would want to create a share replica on a backend with different capabilities than the one the original share resides on. For instance, replicas might need to be on a less expensive backend. In that case, can the replica have a different share_type altogether?
 "Currently", we inherit the share_type of the share and believe that replication has to be on symmetric terms, where both backends have similar capabilities.
  7. Can we allow the driver to restrict replication support between available backends? Backends may support replication to other compatible backends only. Hence, they must report some sort of information to the scheduler so that when creating a replica for an existing share, the scheduler would use that information to schedule the creation of the replica. What information should this be?
 (gouthamr) We're investigating including 'driver_class_name' in a ReplicationFilter, as well as the possibility of backend-reported configuration for 'replication_partners'.
  8. How are access rules persisted across replicas/share instances?
    • Do all replicas have the same access rules applied?
 (gouthamr) This is currently being pursued.
    • Should access rules be applied only to "active" replicas?
  9. How does migration affect replicated shares?

Implementation Progress

Core Work

 for Core API/Scheduler implementation
 for Client implementation

Example Driver Implementation

 for cDOT Driver Implementation