Manila/Replication Design Notes
Latest revision as of 21:17, 21 October 2015

{{OldDesignPage}}

===Current Design page is at [[Manila/design/manila-mitaka-data-replication]]===

== Intro ==
The design for replication isn't complete. We have a vision for the feature, and we're trying to define the details. After collecting feedback we will reformat this into a design doc.
== Threats to data ==
- Hardware failures
- Network failures
- Power failures
- Natural disasters (fire, flood, hurricanes, meteors)
- Accidental corruption (bugs, human error)
- Malicious users (viruses, hackers, disgruntled employees)
== Solutions for protecting data ==
- Highly available storage systems
- Strategies
- RAID/Erasure coding (protection from media failures)
- Clusters (protection from component failures)
- Multipath network topologies (protection from connection failures)
- Redundant power (protection from power failures)
- Advantages
- Transparent to clients
- Zero RPO/RTO (except for maybe a brief pause)
- Disadvantages
- Typically limited distance (weak against site-wide failures)
- Manila
- HA storage solutions fit into Manila today without changes
- Use "share types" to indicate certain storage backends are highly available
- Backups
- Strategies
- Tape archive (in the old days)
- Virtual tape archive (Amazon Glacier or similar)
- Local snapshots (standard Cinder/Manila features)
- Remote snapshots (copy snapshot to object store, like Cinder backup)
- Advantages
- Can be very cheap
- Stores multiple points in time (protection from corruption/malicious destruction)
- Disadvantages
- RPO typically high
- Local snapshots don't protect against equipment/site failures
- Remote snapshots typically have to be restored before they become accessible -- high RTO
- Manila
- Local snapshots implemented today
- Remote snapshots (aka "backup") planned for future
- Replication
- Strategies
- Synchronous mirroring
- Asynchronous mirroring
- Advantages
- Can handle much longer distances
- Can offer very low RPO/RTO
- Disadvantages
- Not transparent to network clients
- Manila
- This is what we're proposing!!!
== Overview of the proposal ==
- Start from user experience
- If it doesn't address a user's problem, then the rest of the design is pointless
- Also think about administrator's needs and responsibilities
- Consider vendors'/driver authors' needs and practical issues
- Design is intentionally open-ended to make it as easy as possible for vendors to implement
== User Experience ==
- Users will be able to create "replicated" shares and non-replicated shares by specifying a share type
- All existing shares are non-replicated
- Administrator must specifically create share types that include replication extra_spec
- Open question -- should the "replication" extra spec be visible to tenants?
- We could do this similarly to how driver_handles_share_servers is visible to tenants
- Alternatively: rely on administrator to communicate which types are replicated and rely on the "replicated" attribute appearing on the shares after they're created
- Open question -- what should the "replication" extra spec be called?
- Vendors should be free to offer additional capabilities for different types of replication
- There must be a standard capability/extra_spec that controls the Manila replication feature though
- Replicated shares will have a replicated=true flag returned by the API
- Replicated shares will also have a replication_state field
- In Sync - stable state - share data is being replicated to 1 or more secondary controllers
- Out Of Sync - stable state - share data is NOT being replicated
- Resyncing - transitional state - backend is trying to reestablish replication
- Failing Over - transitional state - share is changing to a different primary
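The four states above split into stable and transitional states, which can be sketched as a minimal lookup table (a sketch only; the string values and helper name are illustrative, not Manila code):

```python
# The four proposed replication_state values, classified as stable or
# transitional.  String values and helper names are illustrative.
STABLE = 'stable'
TRANSITIONAL = 'transitional'

REPLICATION_STATES = {
    'in_sync': STABLE,             # data replicated to 1+ secondaries
    'out_of_sync': STABLE,         # data is NOT being replicated
    'resyncing': TRANSITIONAL,     # reestablishing replication
    'failing_over': TRANSITIONAL,  # moving to a different primary
}

def is_stable(state):
    """True for states a share can rest in between API calls."""
    return REPLICATION_STATES[state] == STABLE
```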
- Two new tenant-visible APIs
- Failover
- Can only be called on shares in the In Sync state
- Causes existing export locations to be removed (must unmount first to avoid data loss)
- Causes share to go into Failing Over state
- Causes new export location to appear (presumably on a different storage controller in another location)
- Expected to succeed whether storage controller hosting the share is online or not
- After a successful failover the share may be in 2 states:
- In sync -- if the primary storage controller was online and the backend was able to reverse the replication from the secondary
- Out of sync -- maybe the primary storage controller was offline, or the backend wasn't able to immediately establish replication again from the secondary
- Resync
- Can only be called on shares in the Out Of Sync state
- Causes shares to go into Resyncing state
- Causes backend to attempt to reestablish replication (if possible)
- On success, share goes to In Sync state
- On failure, share goes back into Out Of Sync state
- This would be expected as long as the primary remains down
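The Failover semantics above can be illustrated with a short walkthrough, using a plain dict as a stand-in for the share model (the field names follow the notes; the function and everything else is a sketch, not Manila code):

```python
def failover(share, new_export, replication_reversed):
    """Simulate the proposed Failover API on a share dict (sketch)."""
    # Failover may only be called on shares that are In Sync.
    if share['replication_state'] != 'in_sync':
        raise ValueError('failover requires the In Sync state')
    # Existing export locations are removed (users must unmount first).
    share['export_locations'] = []
    share['replication_state'] = 'failing_over'
    # ...backend makes the secondary accessible...
    share['export_locations'] = [new_export]
    share['primary_location'] = False
    # In Sync if replication was reversed from the secondary,
    # otherwise Out Of Sync (e.g. the old primary is offline).
    share['replication_state'] = (
        'in_sync' if replication_reversed else 'out_of_sync')
    return share

share = {
    'replicated': True,
    'replication_state': 'in_sync',
    'primary_location': True,
    'export_locations': ['10.0.0.5:/share-1'],
}
# Primary site is down, so replication cannot be reversed yet:
failover(share, '10.1.0.5:/share-1', replication_reversed=False)
```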
- Replicated shares will have a primary_location=True/False flag
- Indicates if the share is being served by the original (primary) storage controller
- After failing over, this field would be set to False to indicate that the share is being served by a secondary storage controller
- Secondary locations may not have all of the capabilities of the primary
- For example, the share_type may specify SSD disks extra_spec, but the secondary storage controller may have spinning disks
- It is up to the administrator to configure this as desired
- Manila doesn't schedule the secondary location, so this should be okay
- If the share is not being served by the primary storage controller, a failover should always attempt to move it back to the primary, if possible
- This proposal allows replication to more than 1 place (at the administrator's option, if the backend allows it)
- Users aren't aware of how many replication locations there are or which one their share is at -- they only know whether it's at the primary or not
== Administrator Experience ==
- Administrator's job today
- Install/configure hardware
- Understand physical layout of infrastructure
- Understand network connections and logical topology of infrastructure
- Think about failure domains and contingencies in case of failures
- Today if a storage controller hosting Manila fails, there's not much an admin can do other than try to get it back online
- Configure Manila
- Setup storage controllers
- Install software
- Configure backends in manila.conf (typically hostnames, logins, passwords, etc)
- Administrator's new responsibilities with Manila DR
- Choose primary/secondary sites for replication
- Could be between racks, between aisles, between floors, between buildings, between cities, or between continents
- Decide whether to do symmetric (active/active) or asymmetric (active/passive) replication
- Individual shares always have a primary (accessible) and secondary (inaccessible) location
- Active/active refers to having 2 controllers where some primaries are on each one and they replicate to each other
- Active/passive refers to having all of the primaries on one controller and all of the secondaries on the other
- Find a driver that supports replication
- It is very important for generic driver to support replication
- We want to offer this functionality to everyone
- It's needed for the gate to be able to test this feature
- Looking for volunteers to help with the generic driver enhancement
- Setup hardware with sufficient bandwidth to accommodate mirroring
- Configure Manila
- No new config flags for replication
- Each driver can decide how replication relationships should be expressed
- Assume that replication will most likely be between same-vendor backends
- Could be as simple as 1 new config option with a list of names of other backends that can be replicated to
- An HA deployment of the Manila services themselves is strongly recommended, in case a site failure also affects the controller nodes
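As an illustration only (the section and option names here are hypothetical; each driver would define its own driver-specific options), such a per-backend replication option in manila.conf might look like:

```ini
[backend_east]
share_driver = vendor.driver.VendorShareDriver
# Hypothetical driver-specific option: backends this one can replicate to
replication_targets = backend_west,backend_south
```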
- Respond to outages
- Administrators typically have significantly more information than tenants about the actual infrastructure
- Administrators should communicate with their tenants in the event of an outage
- If the administrator decides that failover is appropriate given the nature of the outage, they can and should initiate it
- Sometimes an outage may be brief enough that waiting for the primary to come back is better than failing over
- This is one reason we don't propose automated failover
- Open question: how can we optimize failing over a large number of shares?
- Users can initiate a failover on their own, but we believe that would only be wise for testing purposes
- Fix outages and recover
- At the end of an outage, administrator should Resync all Out Of Sync shares
- Open question: how can we optimize resyncing a large number of shares?
- Users should be notified that the outage has ended and it is safe to fail back to the primary
- Administrator should not fail back shares unilaterally
- Failing over shares causes a brief loss of connection
- Better to let the user choose the least disruptive time
- Permanent outages
- Sometimes outages are so long that it makes more sense to pick a new replication site instead of reconstructing the primary
- Destruction of the building due to fire/flood/tornado/meteor
- Admin/user does a failover to secondary, share goes to Out Of Sync
- Administrator changes the list of replication relationships in manila.conf and restarts manila-share, invokes update_replication API
- Shares move to resyncing state (update replication is like resync++)
- Eventually share becomes In Sync again
- If the current location is not the new primary location, the user may failover to the new primary
== Driver Maintainers / Vendor Concerns ==
- Replication is not a required feature
- It only has to work if the backend advertises the "replication" capability
- Still only 1 database row/1 UUID per share
- Only 3 new DB fields
- Replicated=true/false
- Replication state=In Sync/Out Of Sync/Resyncing/Failing Over
- Primary_location=true/false
- Drivers should store any needed information about share replication using driver private data feature
- Drivers have 3 new methods
- failover_share
- Called after the manager deletes the existing export_locations
- Manager sets the share state to Failing Over before invoking this method
- Driver should do whatever is necessary to make the secondary accessible
- The primary may still be accessible, or it may not
- Failover is expected to succeed in both cases
- Driver should return new export location in a model update
- Driver MAY update the share's host field, if a different backend should own the share after the failover
- Driver MAY reinitialize replication in the reverse direction immediately if the primary is accessible
- Driver should update replication state to In Sync or Out Of Sync using a model update
- In Sync indicates the failover was successful and replication was reestablished in the reverse direction
- Out Of Sync indicates the failover was successful but replication was NOT reestablished
- On failure, the share goes into ERROR state
- resync_share
- Manager sets the share state to Resyncing before invoking this method
- Driver should attempt to establish replication again
- Driver should update replication state to In Sync or Out Of Sync using a model update
- In Sync indicates the resync was successful
- Out Of Sync indicates the resync failed
- update_share_replication
- Admin only API
- Informs driver that the topology has changed, and obsolete relationships should be cleaned up and new ones created
- Driver should set a new primary_location if the old primary_location isn't part of the replication relationship anymore
- Primary_location should only change when the replication topology changes
- Open question: how does the driver know which location to make the primary?
- Also does everything else that resync does
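A sketch of how these three entry points might look in a driver interface. The method names follow the notes, but the signatures, the mixin name, and the share_server parameter are assumptions, not the final interface:

```python
class ReplicationDriverMixin(object):
    """Illustrative interface for the three proposed driver methods."""

    def failover_share(self, context, share, share_server=None):
        # Called after the manager removes export locations and sets
        # replication_state to Failing Over.  Must succeed whether or
        # not the primary is reachable.  Returns a model update with
        # the new export location, a replication_state of In Sync or
        # Out Of Sync, and optionally a new host value.
        raise NotImplementedError()

    def resync_share(self, context, share, share_server=None):
        # Called with replication_state already set to Resyncing.
        # Returns a model update reporting In Sync on success or
        # Out Of Sync on failure.
        raise NotImplementedError()

    def update_share_replication(self, context, share, share_server=None):
        # Admin only: the topology changed.  Clean up obsolete
        # relationships, create new ones, pick a new primary_location
        # if the old one is gone, then do everything resync does.
        raise NotImplementedError()
```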
- Changes to existing methods
- create_share
- Manager will set replication=true on share if share type has that extra spec
- Driver should setup replication as needed and should set the replication state to In Sync in the model update
- ensure_share
- This method is called for each share on driver startup
- In addition to other cleanup, shares with a replication state of Resyncing or Failing Over should be set to a stable state
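The ensure_share cleanup described above might look like the following. Note that picking Out Of Sync as the fallback stable state is an assumption; the notes only require "a stable state":

```python
def stabilize_replication_state(share):
    """Reset transitional replication states left over from a restart."""
    if share.get('replication_state') in ('resyncing', 'failing_over'):
        # Conservative choice (an assumption): mark Out Of Sync and let
        # the admin or the driver issue a Resync afterwards.
        share['replication_state'] = 'out_of_sync'
    return share
```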
- Drivers have a lot of flexibility
- Alternative topologies
- Replicate to more than 1 other site
- Fan-out replication or replication chains
- Secondary backends
- Two Manila backends can replicate to each other
- One Manila backend can manage two controllers
- A backend could have a list of possible replication destinations and choose one (but no involvement from the Manila scheduler)
- Synchronous/asynchronous replication with different RPO/RTO times
- All options about different types of replication can be (driver-specific) backend capabilities
- Driver can be aggressive or lazy about repairing broken replication relationships
- The point of all this flexibility is to enable a wide variety of technologies to fit the design
== API ==
- Three new REST APIs
- User facing: failover, resync
- Admin facing: update replication
- Validate share state and invoke manager RPC
- Create
- Set the replicated state depending on the share_type
- Additional fields for share views
- Replicated=true/false
- Replication state
- Primary_location=true/false
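For illustration, a share view including the three new fields might look like this (the surrounding fields and all values are hypothetical):

```python
# Hypothetical share view including the proposed replication fields.
share_view = {
    'id': '6d6e0c3a-0000-0000-0000-000000000000',  # made-up UUID
    'name': 'myshare',
    'share_proto': 'NFS',
    # New fields proposed above:
    'replicated': True,
    'replication_state': 'in_sync',
    'primary_location': True,
}
```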
== Scheduler ==
- No changes
- Existing extra_specs/capabilities logic ensures that appropriate backends are chosen for replicated shares
== Manager ==
- Add RPCs for failover/resync/update_replication
- Implement appropriate replication state changes before calling driver methods
- Clear export_locations before failover
- Add new driver entry points
- Validate model updates regarding replication states
- Make sure that changing a share's host field doesn't cause problems
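The manager-side responsibilities listed above could be sketched as follows (all helper names -- db_update, the driver object -- are illustrative stand-ins, not Manila internals):

```python
# Stable states a driver may legally report after a failover.
VALID_END_STATES = ('in_sync', 'out_of_sync')

def manager_failover(share, driver, db_update):
    """Sketch of the manager-side RPC handler for failover."""
    # Set the transitional state and clear export locations before
    # handing control to the driver.
    db_update(share['id'], {'replication_state': 'failing_over',
                            'export_locations': []})
    model_update = driver.failover_share(share)
    # Validate the driver's model update regarding replication state.
    if model_update.get('replication_state') not in VALID_END_STATES:
        db_update(share['id'], {'status': 'error'})
        raise RuntimeError('driver returned an invalid replication_state')
    db_update(share['id'], model_update)
    return model_update
```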