Manila/Replication Use Cases

Answered Questions

Q: From where to where do we allow replication? Is it intra-cloud or inter-cloud? Do we allow replication to something that's not managed by Manila?

A: Intra-cloud. Replicating to something outside of Manila allows a bit more freedom, but with significantly less value, because there's practically nothing we can do to automate the failover/failback portion of a disaster. For use cases involving replication outside of Manila, we would need to involve other tools with more breadth/scope to manage the process.

Q: Do we support unplanned failovers? Do we support planned failovers? Are failovers disruptive or not?

A: Failovers can be planned or unplanned, but they are always disruptive at the data level. With application integration, they could be made nondisruptive at the application level, but we've chosen not to use any intermediary technology (like VirtFS) in the data path, which means we have no options for nondisruptive failovers.

Q: Who configures the replication? The admin? The end user? The Manila scheduler?

A: The end user. In the original design we presumed that the actual replication relationships should be hidden from the end user, but this doesn't match well with the concept of AZs that we are adding to Manila. If the users need to have control over which AZ the primary copy of their data lives in, then they also need to control where the other copies live. This means that the administrator's job is to ensure that for any share type that is replicated, it can be replicated from any AZ to any other AZ.
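The administrator's obligation described above (every replicated share type must be replicable from any AZ to any other AZ) can be checked mechanically. The following is only a minimal sketch: the function name and the `can_replicate` mapping are hypothetical and do not correspond to anything in Manila itself.

```python
from itertools import permutations


def missing_replication_paths(azs, can_replicate):
    """Return ordered (source, destination) AZ pairs lacking a replication path.

    `azs` is an iterable of AZ names. `can_replicate` is a hypothetical
    mapping {source_az: set_of_destination_azs} for a single share type.
    """
    return [
        (src, dst)
        for src, dst in permutations(sorted(azs), 2)
        if dst not in can_replicate.get(src, set())
    ]


# Example: az2 cannot replicate to az3, so one ordered pair is reported.
azs = {"az1", "az2", "az3"}
paths = {"az1": {"az2", "az3"}, "az2": {"az1"}, "az3": {"az1", "az2"}}
gaps = missing_replication_paths(azs, paths)
```

An empty result would mean the share type satisfies the any-AZ-to-any-AZ requirement under the assumed mapping.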

Q: Who triggers a failover? Is it a manual button the admin presses? Can Manila failover automatically? If so, when? Can the end users control failovers at all?

A: Failovers are manual, triggered by either an administrator or a user. Generally speaking, it's more appropriate for the administrator to initiate a failover because the administrator has more knowledge about the nature of an outage. However, it's also essential for end users to test failovers, so they need the capability to initiate failovers of shares themselves.

Q: During replication (before failover) is the secondary even visible/accessible?

A: Yes, but possibly with significant limitations. Some backends may not support accessing the secondary side of a replicated share. Some backends may allow access, but read-only. We know of at least one backend that can support write access to the secondary (in which case calling it a secondary isn't really accurate because it's more of an active-active relationship). Amazon's EFS has the model of active-active replication so it's something we don't want to disallow.
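The three levels of secondary access described above could be modelled as follows. This is an illustrative sketch only: the enum and its member names are hypothetical labels, not values defined by Manila.

```python
from enum import Enum


class SecondaryAccess(Enum):
    """Hypothetical labels for how a backend exposes the secondary copy."""
    NONE = "none"              # secondary cannot be accessed before failover
    READ_ONLY = "read_only"    # secondary can be accessed read-only
    READ_WRITE = "read_write"  # active-active; "secondary" is a misnomer


def can_mount(access, want_write=False):
    """Tell whether a client may access the secondary in the requested mode."""
    if access is SecondaryAccess.NONE:
        return False
    if want_write:
        return access is SecondaryAccess.READ_WRITE
    return True
```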

Q: What is the granularity of the failover? Whole backend? Single pools?

A: Individual shares. There's no technical reason to prevent failover/failback on a share-by-share basis. To make the administrator's life easier, we also have to support whole-backend failover (which could be essential to minimize downtime in an actual disaster). The ability to do single-share failover is nice because it allows testing of the DR system without triggering an outage that affects users, since failovers are disruptive.
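One way to read the answer above is that whole-backend failover is a loop over single-share failovers. The sketch below illustrates that reading under stated assumptions: the `Replica` record and both function names are hypothetical, and the rule for choosing a target (the first other replica) is a placeholder for whatever policy a real design would use.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    share_id: str
    backend: str
    active: bool


def fail_over_share(replicas, share_id, target_backend):
    """Promote the replica of one share on target_backend; demote the rest."""
    mine = [r for r in replicas if r.share_id == share_id]
    if not any(r.backend == target_backend for r in mine):
        raise ValueError(f"share {share_id} has no replica on {target_backend}")
    for r in mine:
        r.active = (r.backend == target_backend)


def fail_over_backend(replicas, failed_backend):
    """Fail over every share whose active replica lives on failed_backend.

    Returns the ids of shares that had no replica elsewhere to promote.
    """
    stranded = []
    affected = [r.share_id for r in replicas
                if r.active and r.backend == failed_backend]
    for share_id in affected:
        targets = [r.backend for r in replicas
                   if r.share_id == share_id and r.backend != failed_backend]
        if targets:
            fail_over_share(replicas, share_id, targets[0])
        else:
            stranded.append(share_id)
    return stranded
```

Calling `fail_over_share` directly on a single test share exercises the same code path that a whole-backend failover would use, without touching any other share.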

Unanswered Questions

There are some major unanswered questions (or areas of investigation).

1) Is there no way to achieve nondisruptive failover? I would love to find out that our initial intuition here is wrong, because it would change a lot of aspects of the design. It's worth spending time to brainstorm and research possibilities in this area. So far the most promising ideas involve:

- Using VirtFS to mediate filesystem access and achieving nondisruptive failover that way
- Using some kind of agent inside the guests to mediate file access

2) How do we deal with recovery after a disaster and failover? Assuming a successful failover, and a repair of the original primary, failing back will cause another outage. How can we orchestrate that to minimize pain and suffering?

3) Assuming the disruptive aspect of failovers is unavoidable, how can we invest to make them less painful at the application level? Application quiescing and mount automation could make failovers nondisruptive for at least a select few applications.

4) Can the replica have a different share_type?

There is a valid use case where a user would want to create a share replica on a backend with different capabilities than the one the original share resides on. For instance, replicas might need to be on a less expensive backend. In that case, can the replica have a different share_type altogether?

"Currently", we inherit the share_type of the share and believe that replication has to be on symmetric terms, where both backends have similar capabilities.

5) Can we allow the driver to restrict replication support between available backends?

Backends may support replication to other compatible backends only. Hence, they must report some sort of information to the scheduler so that when creating a replica for an existing share, the scheduler would use that information to schedule the creation of the replica. What information should this be?
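As one possible answer to the question above, a backend could report a single opaque label, and the scheduler could treat hosts with the same label as mutually compatible. The sketch below assumes exactly that: `replication_group` is a hypothetical capability name chosen for illustration, and the function is not part of Manila's scheduler.

```python
def compatible_hosts(share_host, candidate_hosts, capabilities):
    """Filter candidate hosts for a new replica of a share on share_host.

    `capabilities` maps host name -> dict of reported capabilities.
    `replication_group` is a hypothetical capability: hosts reporting the
    same non-empty value are assumed able to replicate to one another.
    """
    group = capabilities.get(share_host, {}).get("replication_group")
    if not group:
        # The source host reports no replication support at all.
        return []
    return [
        host for host in candidate_hosts
        if host != share_host
        and capabilities.get(host, {}).get("replication_group") == group
    ]
```

A single label cannot express one-directional or pairwise-only compatibility, so a richer structure may be needed if backends have such restrictions.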

6) How are access rules persisted across replicas/share instances?

- Do all replicas have the same access rules applied? (Currently being pursued)
- Should access rules be applied only to "active" replicas?
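The two options differ only in which replicas receive a rule change. A minimal sketch of that difference, assuming each replica is a dict with hypothetical `id` and `active` keys:

```python
def targets_for_rule(replicas, active_only=False):
    """Select the ids of replicas that an access rule should be applied to.

    With active_only=False every replica receives the rule; with
    active_only=True only replicas flagged as active receive it.
    """
    if active_only:
        return [r["id"] for r in replicas if r["active"]]
    return [r["id"] for r in replicas]
```

Under the second option, a failover would also have to apply the existing rules to the newly promoted replica before clients could reach it.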

7) How does migration affect replicated shares?

8) API Endpoints:

Current Endpoint Design: https://wiki.openstack.org/wiki/Manila/Replication_API_Design#New_APIs

9) What export locations should appear when showing a replicated share?

- All export locations from available "active" replicas (Currently being pursued)
- Export locations of active instances
- Export locations that are within the AZ of the active instance

10) Where do we store the "replication_change" status?

- On the instance "status" for the instance we are promoting (Currently being pursued)
- On all the replica instances of the replicated share
- As a new status on the share itself

11) Should the 'replica_state' field be 'healthy' or 'error'? The current options, 'in-sync' and 'out-of-sync', may be misread as signifying synchronous and asynchronous replication.

Discussions with Ben in IRC
- "the idea is that you redefine in-sync to mean "within the RPO guarantee of the async replication""
- The user should know if it's async or sync replication by the 'description' in the share type.
- "the main thing that in-sync/out-of-sync is supposed to communicate is whether the replica is "good enough" that you could fail over to it"
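The redefinition suggested in the IRC discussion can be sketched as a comparison between a replica's age and the RPO. This is an assumption-laden illustration: the function name and the idea that the driver knows a `last_sync` timestamp are hypothetical, not taken from Manila.

```python
from datetime import datetime, timedelta, timezone


def replica_state(last_sync, rpo, now=None):
    """Return 'in-sync' if the last sync is within the RPO, else 'out-of-sync'.

    `last_sync` is a timezone-aware datetime of the last completed sync;
    `rpo` is a timedelta. A synchronous replica would simply always report
    a last_sync equal to now.
    """
    now = now or datetime.now(timezone.utc)
    return "in-sync" if now - last_sync <= rpo else "out-of-sync"
```

Under this reading, 'in-sync' answers whether the replica is good enough to fail over to, regardless of whether the underlying replication is synchronous or asynchronous.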

12) Should the 'replication' field on the share qualify whether the replication is sync or async? If so, the options would be ["readable_sync", "readable_async", "writable", "dr_sync", "dr_async"].

Discussions with Ben in IRC
- The user should know if it's async or sync replication by the 'description' in the share type.