Manila/Replication API Design

Design

Intro

The Manila DR API will be implemented as an extension to the Manila API (not part of core) initially, because we want to prove the concept without investing heavily in support in the reference implementation (generic driver) or in automated testing/CI.

Replication styles

There are 3 styles of DR that we would like to support in the long run:

  1. writeable - Amazon EFS-style synchronously replicated shares where all replicas are writeable. Failover is not supported and not needed.
  2. readable - Mirror-style replication with a primary (writeable) copy and one or more secondary (read-only) copies which can become writeable after a failover.
  3. dr - Generalized replication with secondary copies that are inaccessible until after a failover.

Replica states

Each replica has a state, which takes one of the following values:

  1. active - All writeable replicas are active
  2. in_sync - Passive replica which is up to date with the active replica, and can be promoted to active
  3. out_of_sync - Passive replica which has gone out of date, or new replica that is not yet up to date
  4. error - Scheduler failure during replica creation
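
As a minimal illustration (not part of the spec), the states and the promotion rule used by the Set active replica API below can be sketched in Python:

  # Replica states from the list above, plus the rule that only an
  # in_sync replica may be promoted to active (see "Set active replica").
  REPLICA_STATE_ACTIVE = 'active'
  REPLICA_STATE_IN_SYNC = 'in_sync'
  REPLICA_STATE_OUT_OF_SYNC = 'out_of_sync'
  REPLICA_STATE_ERROR = 'error'

  def can_promote(replica_state):
      """Return True if a replica in this state may become the active one."""
      return replica_state == REPLICA_STATE_IN_SYNC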

New APIs

We will implement a Manila extension that includes several new APIs needed to support replicated shares.

  • List replicas - Returns a table of replicas with details. Details include Share ID, AZ, replica state (active, in_sync, out_of_sync) and export locations.
    • GET /share-replicas/
  • List share replicas - Takes a share ID. Returns a table of replicas for that share-id with details. Must be a replicated share. Details include AZ, replica state (active, in_sync, out_of_sync) and export locations.
    • GET /share-replicas?share-id=<share_id>
  • Show replica details - Takes a replica ID. Details include Share ID, AZ, replica state (active, in_sync, out_of_sync) and export locations.
    • GET /share-replicas/<replica_id>
  • Add share replica - Takes a share ID, an AZ, and an optional Share Network ID. The share must not already have a replica in the specified AZ. Returns replica UUID (actually share instance UUID).
    • POST /share-replicas/<share-id>
    • {'availability_zone':<availability_zone_id>, 'share':<share_id>, 'share_network':<share_network_id>}
  • Remove share replica - Takes a replica UUID. Deletes the replica, regardless of state. Must not be the only active replica.
    • DELETE /share-replicas/<replica-id>
  • Set active replica - Takes a replica UUID. Makes that replica active. The state of the replica must be in_sync.
    • POST /share-replicas/<replica-id>/action
    • {'os-promote_replica': None}
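
A rough client-side sketch of these calls follows. The base URL, token handling, and response shapes are assumptions made for illustration; only the paths and request bodies come from the list above.

  import requests

  BASE = 'http://manila.example.com:8786/v2/<tenant_id>'  # hypothetical endpoint
  HEADERS = {'X-Auth-Token': '<token>', 'Content-Type': 'application/json'}

  # List all replicas, then only the replicas of one share.
  requests.get(BASE + '/share-replicas', headers=HEADERS)
  requests.get(BASE + '/share-replicas?share-id=<share_id>', headers=HEADERS)

  # Show one replica.
  requests.get(BASE + '/share-replicas/<replica_id>', headers=HEADERS)

  # Add a replica of <share_id> in the given AZ.
  requests.post(BASE + '/share-replicas/<share_id>', headers=HEADERS,
                json={'availability_zone': '<availability_zone_id>',
                      'share': '<share_id>',
                      'share_network': '<share_network_id>'})

  # Remove a replica (must not be the only active replica).
  requests.delete(BASE + '/share-replicas/<replica_id>', headers=HEADERS)

  # Promote an in_sync replica to active.
  requests.post(BASE + '/share-replicas/<replica_id>/action', headers=HEADERS,
                json={'os-promote_replica': None})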

New share states

  • replication_change - New transient state triggered by a change of the active replica. Access to the share is cut off while in this state.

Changes to existing APIs

  • Share type APIs will have a new user-visible extra spec - replication=writeable/readable/dr. The absence of this extra spec indicates a non-replicated share, and its presence indicates that the share is replicated in the given style (see the sketch after this list).
  • Share create will create a replicated share if the share type has the replication extra spec. The style of replication is determined by the share type's replication value.
  • Share list/details APIs will return the replication style (writeable, readable, dr) and a flag if more than one replica exists.
  • Create snapshot - creates snapshots of all the replicas
  • Delete share/snapshot - deletes ALL replicas of the share/snapshot
  • Migrate/retype/etc - only the primary replica is considered as the source
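
For the share type bullet above, creating a share type that carries the new extra spec might look roughly like this; the share-type create body shown is an assumption, and only the replication=writeable/readable/dr spec itself is part of this design.

  import requests

  BASE = 'http://manila.example.com:8786/v2/<tenant_id>'  # hypothetical endpoint
  HEADERS = {'X-Auth-Token': '<token>', 'Content-Type': 'application/json'}

  # Share type "foo" whose shares will be replicated in the 'readable' style.
  body = {
      'share_type': {
          'name': 'foo',
          'extra_specs': {
              'replication': 'readable',  # or 'writeable' / 'dr'
          },
      },
  }
  requests.post(BASE + '/types', headers=HEADERS, json=body)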

Network issues with multi-SVM and replication

!!OPTIONAL!!

If we choose to make replication a single-svm-only feature, the share-network API doesn't need to change. In order to support replication with share-networks, we also need to modify the share-network create API to allow creation of share networks with a table of AZ-to-subnet mappings. This approach allows us to keep a single share-network per share (with associated security service) while allowing the tenant to specify enough information that each share instance can be attached to the appropriate network in each AZ. Multi-AZ share networks would also be useful for non-replicated use cases.
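
A purely hypothetical request body for such a multi-AZ share network is sketched below; the field names are invented for illustration and do not correspond to an existing Manila API.

  # Hypothetical multi-AZ share-network create body: one subnet per AZ,
  # so each share instance can attach to the right network in its AZ.
  share_network_create_body = {
      'share_network': {
          'name': 'replicated-net',
          'az_subnet_mappings': [  # invented field name
              {'availability_zone': 'b1',
               'neutron_net_id': '<net-id-b1>',
               'neutron_subnet_id': '<subnet-id-b1>'},
              {'availability_zone': 'b2',
               'neutron_net_id': '<net-id-b2>',
               'neutron_subnet_id': '<subnet-id-b2>'},
          ],
      },
  }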

Examples

Writable replication example

  1. Administrator sets up backends in AZs b1 and b2 that have capability replication=writeable
  2. Administrator creates a new share_type called foo
  3. Administrator sets replication=writeable extra spec on share type foo
  4. User creates new share of type foo in AZ b1
  5. Share is created with replication=writeable, and 1 active replica in AZ b1
  6. User grants access on share to client1 in AZ b1, obtains the export location of the replica, mounts the share on a client, and starts to write data
  7. User adds a new replica of the share in AZ b2
  8. A second replica is created in AZ b2 which initially has state out_of_sync
  9. Shortly afterwards, the replica state changes to active (after the replica finishes syncing with the original copy)
  10. The user grants access on the share to client2 in AZ b2, obtains the export location of the new replica, mounts the share, and sees the same data that client1 wrote
  11. Client2 writes some data to the share, which is immediately visible to client1
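
At the API level, steps 7-9 might look roughly like the sketch below (base URL, token handling, and response field names are assumptions).

  import time
  import requests

  BASE = 'http://manila.example.com:8786/v2/<tenant_id>'  # hypothetical endpoint
  HEADERS = {'X-Auth-Token': '<token>', 'Content-Type': 'application/json'}

  # Step 7: add a replica of the share in AZ b2.
  resp = requests.post(BASE + '/share-replicas/<share_id>', headers=HEADERS,
                       json={'availability_zone': 'b2', 'share': '<share_id>'})
  replica_id = resp.json()['replica']['id']  # assumed response shape

  # Steps 8-9: the replica starts out_of_sync, then becomes active once it
  # has finished syncing (writeable style).
  while True:
      replica = requests.get(BASE + '/share-replicas/' + replica_id,
                             headers=HEADERS).json()['replica']
      if replica['replica_state'] == 'active':
          break
      time.sleep(10)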

Readable replication example

  1. Administrator sets up backends in AZs b1 and b2 that have capability replication=readable
  2. Administrator creates a new share_type called bar
  3. Administrator sets replication=readable extra spec on share type bar
  4. User creates new share of type bar in AZ b1
  5. Share is created with replication=readable, and 1 active replica in AZ b1
  6. User grants access on share to client1 in AZ b1, obtains the export location of the replica, mounts the share on a client, and starts to write data
  7. User adds a new replica of the share in AZ b2
  8. A second replica is created in AZ b2 which initially has state out_of_sync
  9. Shortly afterwards, the replica state changes to in_sync (after the replica finishes syncing with the original copy)
  10. The user grants access on the share to client2 in AZ b2, obtains the export location of the new replica, mounts the share, and sees the same data that client1 wrote
  11. Client2 cannot write data to the share but continues to see updates

Failover/failback example

(Continued from above)

  1. An outage occurs in AZ b1
  2. Administrator sends out a bulletin about the outage to his users "the power transformer in b1 turned to slag, it will be 12 hours before it can be replaced, please bear with us, yada yada"
  3. User notices that his application on client1 is no longer running, and investigates
  4. User finds out that client1 is gone, and reads a bulletin from the admin explaining why
  5. User notes that the b1 replica of his share is still active and the b2 replica is in_sync, while the state of the share is available
  6. User calls set active replica to AZ b2 on his share
  7. The share goes to state replication_change and access to the share is briefly lost on client2
  8. The state of replica b2 changes to active and the state of replica b1 changes to out_of_sync after Manila fails to contact the original primary. The state of the share changes back to available, and access is restored on client2
  9. User starts his application on client2, and the application recovers from an apparent crash, with a consistent copy of the application's data
  10. User application is back up and running, disaster is averted
  11. Eventually maintenance on the b1 AZ is completed, and all of the equipment is reactivated
  12. Administrator sends out a bulletin to his users about the outage ending "we replaced the transformer with a better one, everything is back online, thanks for your understanding, yada yada"
  13. Manila notices the out_of_sync replica is reachable and initiates a resync, bringing the b1 replica in_sync within a short time
  14. After some time, the user reads the bulletin and observes that his share is being replicated again. He decides to intentionally move back to b1.
  15. User gracefully shuts down his application, and flushes I/O
  16. User calls set active replica to AZ b1 on his share
  17. The share goes to state replication_change and no disruption is observed because the application is shut down
  18. The state of replica b1 changes to active and the state of replica b2 changes to in_sync after Manila reverses the replication again. The state of the share changes back to available
  19. User starts his application on client1, and the application starts cleanly, having been shut down gracefully
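
At the API level, the failover in steps 6-8 might look roughly like this (base URL, token handling, and response field names are assumptions).

  import time
  import requests

  BASE = 'http://manila.example.com:8786/v2/<tenant_id>'  # hypothetical endpoint
  HEADERS = {'X-Auth-Token': '<token>', 'Content-Type': 'application/json'}

  # Step 6: promote the in_sync replica in AZ b2.
  requests.post(BASE + '/share-replicas/<b2_replica_id>/action', headers=HEADERS,
                json={'os-promote_replica': None})

  # Steps 7-8: the share sits in the transient replication_change state,
  # then returns to available once the b2 replica is active.
  while True:
      share = requests.get(BASE + '/shares/<share_id>',
                           headers=HEADERS).json()['share']
      if share['status'] != 'replication_change':
          break
      time.sleep(10)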

Notes

Pool-level Replication

Some vendors have suggested that certain backends can mirror groups of shares or whole pools more efficiently than individual shares. This design only addresses the mirroring of individual shares. In the future, we may allow replication of groups of shares, but only if those groups are contained within a single tenant and defined by the tenant.