
Manila/Replication Design Notes

Warning: Old Design Page

This page was used to help design a feature for a previous release of OpenStack. It may or may not have been implemented. As a result, this page is unlikely to be updated and could contain outdated information. It was last updated on 2015-10-21.

Current Design page is at Manila/design/manila-mitaka-data-replication

Intro

The design for replication isn't complete. We have a vision for the feature, and we're trying to define the details. After collecting feedback we will reformat this into a design doc.

Threats to data

  • Hardware failures
  • Network failures
  • Power failures
  • Natural disasters (fire, flood, hurricanes, meteors)
  • Accidental corruption (bugs, human error)
  • Malicious users (viruses, hackers, disgruntled employees)


Solution for protecting data

  • Highly available storage systems
    • Strategies
      • RAID/Erasure coding (protection from media failures)
      • Clusters (protection from component failures)
      • Multipath network topologies (protection from connection failures)
      • Redundant power (protection from power failures)
    • Advantages
      • Transparent to clients
      • Zero RPO/RTO (except for maybe a brief pause)
    • Disadvantages
      • Typically limited distance (weak against site-wide failures)
    • Manila
      • HA storage solutions fit into Manila today without changes
      • Use "share types" to indicate certain storage backends are highly available
  • Backups
    • Strategies
      • Tape archive (in the old days)
      • Virtual tape archive (Amazon Glacier or similar)
      • Local snapshots (standard Cinder/Manila features)
      • Remote snapshots (copy snapshot to object store, like Cinder backup)
    • Advantages
      • Can be very cheap
      • Stores multiple points in time (protection from corruption/malicious destruction)
    • Disadvantages
      • RPO typically high
      • Local snapshots don't protect against equipment/site failures
      • Remote snapshots typically have to be restored before they become accessible -- high RTO
    • Manila
      • Local snapshots implemented today
      • Remote snapshots (aka "backup") planned for future
  • Replication
    • Strategies
      • Synchronous mirroring
      • Asynchronous mirroring
    • Advantages
      • Can handle much longer distances
      • Can offer very low RPO/RTO
    • Disadvantages
      • Not transparent to network clients
    • Manila
      • This is what we're proposing!!!


Overview of the proposal

  • Start from user experience
    • If it doesn't address a user's problem, then the rest of the design is pointless
  • Also think about administrator's needs and responsibilities
  • Consider vendor/driver authors' concerns and practical issues
    • Design is intentionally open-ended to make it as easy as possible for vendors to implement


User Experience

  • Users will be able to create "replicated" shares and non-replicated shares by specifying a share type
    • All existing shares are non-replicated
    • Administrator must specifically create share types that include replication extra_spec
    • Open question -- should the "replication" extra spec be visible to tenants?
      • We could do this similarly to how driver_handles_share_servers is visible to tenants
      • Alternatively: rely on administrator to communicate which types are replicated and rely on the "replicated" attribute appearing on the shares after they're created
    • Open question -- what should the "replication" extra spec be called?
      • Vendors should be free to offer additional capabilities for different types of replication
      • There must be a standard capability/extra_spec that controls the Manila replication feature though
  • Replicated shares will have a replicated=true flag returned by the API
  • Replicated shares will also have a replication_state field
    • In Sync - stable state - share data is being replicated to 1 or more secondary controllers
    • Out Of Sync - stable state - share data is NOT being replicated
    • Resyncing - transitional state - backend is trying to reestablish replication
    • Failing Over - transitional state - share is changing to a different primary
  • Two new tenant-visible APIs
    • Failover
      • Can only be called on shares in the In Sync state
      • Causes existing export locations to be removed (must unmount first to avoid data loss)
      • Causes share to go into Failing Over state
      • Causes new export location to appear (presumably on a different storage controller in another location)
      • Expected to succeed whether storage controller hosting the share is online or not
      • After a successful failover the share may be in 2 states:
        • In sync -- if the primary storage controller was online and the backend was able to reverse the replication from the secondary
        • Out of sync -- maybe the primary storage controller was offline, or the backend wasn't able to immediately establish replication again from the secondary
    • Resync
      • Can only be called on shares in the Out Of Sync state
      • Causes shares to go into Resyncing state
      • Causes backend to attempt to reestablish replication (if possible)
      • On success, share goes to In Sync state
      • On failure, share goes back into Out Of Sync state
        • This would be expected as long as the primary remains down
  • Replicated shares will have a primary_location=True/False flag
    • Indicates if the share is being served by the original (primary) storage controller
    • After failing over, this field would be set to False to indicate that the share is being served by a secondary storage controller
      • Secondary locations may not have all of the capabilities of the primary
      • For example, the share_type may specify SSD disks extra_spec, but the secondary storage controller may have spinning disks
        • It is up to the administrator to configure this as desired
        • Manila doesn't schedule the secondary location, so this should be okay
    • If the share is not being served by the primary storage controller, a failover should always attempt to move it back to the primary, if possible
      • This proposal allows replication to more than 1 place (at the administrator's option, if the backend allows it)
      • Users aren't aware of how many replication locations there are or which one their share is at -- they only know if it's at the primary or not
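The states and API transitions described above can be sketched as a small state machine. This is an illustration only, not Manila code; the state names mirror the ones in these notes.

```python
# Hypothetical sketch of the replication_state machine described above.
# Two stable states, two transitional states, and the two tenant-facing
# API calls that start a transition.
IN_SYNC = 'in_sync'
OUT_OF_SYNC = 'out_of_sync'
RESYNCING = 'resyncing'
FAILING_OVER = 'failing_over'

# API call -> (required starting state, transitional state it moves to)
TRANSITIONS = {
    'failover': (IN_SYNC, FAILING_OVER),
    'resync': (OUT_OF_SYNC, RESYNCING),
}


def start_transition(api_call, current_state):
    """Validate an API call against the share's replication_state.

    Returns the transitional state to set, or raises ValueError if the
    call is not allowed from the current state.
    """
    required, transitional = TRANSITIONS[api_call]
    if current_state != required:
        raise ValueError('%s requires state %s, but share is %s'
                         % (api_call, required, current_state))
    return transitional
```

For example, `start_transition('failover', IN_SYNC)` yields the Failing Over state, while calling failover on an Out Of Sync share is rejected, matching the rules above.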


Administrator Experience

  • Administrator's job today
    • Install/configure hardware
    • Understand physical layout of infrastructure
    • Understand network connections and logical topology of infrastructure
    • Think about failure domains and contingencies in case of failures
      • Today if a storage controller hosting Manila fails, there's not much an admin can do other than try to get it back online
    • Configure Manila
      • Setup storage controllers
      • Install software
      • Configure backends in manila.conf (typically hostnames, logins, passwords, etc)
  • Administrator's new responsibilities with Manila DR
    • Choose primary/secondary sites for replication
      • Could be between racks, between aisles, between floors, between buildings, between cities, or between continents
    • Decide whether to do symmetric (active/active) or asymmetric (active/passive) replication
      • Individual shares always have a primary (accessible) and secondary (inaccessible) location
      • Active/active refers to having 2 controllers where some primaries are on each one and they replicate to each other
      • Active/passive refers to having all of the primaries on one controller and all of the secondaries on the other
    • Find a driver that supports replication
      • It is very important for generic driver to support replication
        • We want to offer this functionality to everyone
        • It's needed for the gate to be able to test this feature
        • Looking for volunteers to help with the generic driver enhancement
    • Setup hardware with sufficient bandwidth to accommodate mirroring
    • Configure Manila
      • No new config flags for replication
      • Each driver can decide how replication relationships should be expressed
        • Assume that replication will most likely be between same-vendor backends
        • Could be as simple as 1 new config option with a list of names of other backends that can be replicated to
      • It would be a really good idea to have an HA configuration of Manila in the case that a site failure could affect controller nodes
    • Respond to outages
      • Administrators typically have significantly more information than tenants about the actual infrastructure
      • Administrators should communicate with their tenants in the event of an outage
      • If the administrator decides that failover is appropriate given the nature of the outage, they can/should initiate it
        • Sometimes an outage may be brief enough that waiting for the primary to come back is better than failing over
          • This is one reason we don't propose automated failover
        • Open question: how can we optimize failing over a large number of shares?
        • Users can initiate a failover on their own, but we believe that would only be wise for testing purposes
    • Fix outages and recover
      • At the end of an outage, administrator should Resync all Out Of Sync shares
        • Open question: how can we optimize resyncing a large number of shares?
      • Users should be notified that the outage has ended and it is safe to fail back to the primary
      • Administrator should not fail back shares unilaterally
        • Failing over shares causes a brief loss of connection
        • Better to let the user choose the least disruptive time
    • Permanent outages
      • Sometimes outages are so long that it makes more sense to pick a new replication site instead of reconstructing the primary
        • Destruction of the building due to fire/flood/tornado/meteor
      • Admin/user does a failover to secondary, share goes to Out Of Sync
      • Administrator changes the list of replication relationships in manila.conf and restarts manila-share, invokes update_replication API
      • Shares move to resyncing state (update replication is like resync++)
      • Eventually share becomes In Sync again
      • If the current location is not the new primary location, the user may failover to the new primary
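Since each driver defines its own replication options, the "replication relationships in manila.conf" mentioned above could look something like the fragment below. The option name replication_targets, the backend names, and the driver path are purely hypothetical illustrations.

```ini
# Hypothetical manila.conf fragment: two same-vendor backends configured
# to replicate to each other (active/active). The replication_targets
# option name is an illustration only -- each driver chooses its own.
[backend_site_a]
share_driver = manila.share.drivers.example.ExampleDriver
replication_targets = backend_site_b

[backend_site_b]
share_driver = manila.share.drivers.example.ExampleDriver
replication_targets = backend_site_a
```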


Driver Maintainers / Vendor Concerns

  • Replication is not a required feature
    • It only has to work if the backend advertises the "replication" capability
  • Still only 1 database row/1 UUID per share
  • Only 3 new DB fields
    • Replicated=true/false
    • Replication state=In Sync/Out Of Sync/Resyncing/Failing Over
    • Primary_location=true/false
  • Drivers should store any needed information about share replication using driver private data feature
  • Drivers have 3 new methods
    • failover_share
      • Called after the manager deletes the existing export_locations
      • Manager sets the share state to Failing Over before invoking this method
      • Driver should do whatever is necessary to make the secondary accessible
        • The primary may still be accessible, or it may not
        • Failover is expected to succeed in both cases
      • Driver should return new export location in a model update
      • Driver MAY update the share's host field, if a different backend should own the share after the failover
      • Driver MAY reinitialize replication in the reverse direction immediately if the primary is accessible
      • Driver should update replication state to In Sync or Out Of Sync using a model update
        • In Sync indicates the failover was successful and replication was reestablished in the reverse direction
        • Out Of Sync indicates the failover was successful but replication was NOT reestablished
      • On failure, the share goes into ERROR state
    • resync_share
      • Manager sets the share state to Resyncing before invoking this method
      • Driver should attempt to establish replication again
      • Driver should update replication state to In Sync or Out Of Sync using a model update
        • In Sync indicates the resync was successful
        • Out Of Sync indicates the resync failed
    • update_share_replication
      • Admin only API
      • Informs driver that the topology has changed, and obsolete relationships should be cleaned up and new ones created
      • Driver should set a new primary_location if the old primary_location isn't part of the replication relationship anymore
        • Primary_location should only change when the replication topology changes
        • Open question: how does the driver know which location to make the primary?
      • Also does everything else that resync does
  • Changes to existing methods
    • create_share
      • Manager will set replication=true on share if share type has that extra spec
      • Driver should setup replication as needed and should set the replication state to In Sync in the model update
    • ensure_share
      • This method is called for each share on driver startup
      • In addition to other cleanup, shares with a replication state of Resyncing or Failing Over should be set to a stable state
  • Drivers have a lot of flexibility
    • Alternative topologies
      • Replicate to more than 1 other site
      • Fan-out replication or replication chains
    • Secondary backends
      • Two Manila backends can replicate to each other
      • One Manila backend can manage two controllers
      • A backend could have a list of possible replication destinations and choose one (but no involvement from Manila scheduler)
    • Synchronous/asynchronous RPO/RTO times
      • All options about different types of replication can be (driver-specific) backend capabilities
    • Driver can be aggressive or lazy about repairing broken replication relationships
    • The point of all this flexibility is to enable a wide variety of technologies to fit the design
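The three new entry points described above might be expressed as an optional mixin on a driver. This is a sketch, not the Manila driver interface; the method names come from these notes, but the signatures and docstring details are assumptions.

```python
# Hypothetical sketch of the three new driver entry points. A driver only
# needs these if it advertises the 'replication' capability.

class ReplicationDriverMixin(object):
    """Optional replication support for a share driver."""

    def failover_share(self, context, share, share_server=None):
        """Make the secondary copy of the share accessible.

        Called after the manager has removed the old export locations
        and set replication_state to Failing Over. Must succeed whether
        or not the primary controller is reachable. Returns a model
        update containing the new export locations and the resulting
        replication state: In Sync if replication was reestablished in
        the reverse direction, otherwise Out Of Sync.
        """
        raise NotImplementedError()

    def resync_share(self, context, share, share_server=None):
        """Attempt to reestablish replication.

        Called with replication_state set to Resyncing. Returns a model
        update with In Sync on success or Out Of Sync on failure.
        """
        raise NotImplementedError()

    def update_share_replication(self, context, share, share_server=None):
        """Handle a topology change (admin-only API).

        Clean up obsolete replication relationships, create new ones,
        pick a new primary_location if needed, then do everything
        resync_share does.
        """
        raise NotImplementedError()
```

Drivers that do not advertise the replication capability simply never have these methods invoked.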

API

  • Three new REST APIs
    • User facing: failover, resync
    • Admin facing: update replication
    • Validate share state and invoke manager RPC
  • Create
    • Set the replicated state depending on the share_type
  • Additional fields for share views
    • Replicated=true/false
    • Replication state
    • Primary_location=true/false


Scheduler

  • No changes
  • Existing extra_specs/capabilities logic ensures that appropriate backends are chosen for replicated shares
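The reason no scheduler changes are needed is that the existing extra_specs/capabilities filter already keeps replicated shares on replication-capable backends. A much-simplified sketch of that matching, using the (still open-question) "replication" key as an assumed example:

```python
# Simplified sketch of extra_specs/capabilities matching. The real
# scheduler's capability filter has richer matching rules; this only
# illustrates why replicated shares land on replication-capable backends.

def backend_passes(share_type_extra_specs, backend_capabilities):
    """Return True if every extra_spec is satisfied by the backend."""
    for key, required in share_type_extra_specs.items():
        if backend_capabilities.get(key) != required:
            return False
    return True
```

A share type carrying `{'replication': True}` would only match backends reporting that capability, so non-replicating backends are filtered out with no new scheduler logic.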


Share Manager

  • Add RPCs for failover/resync/update_replication
  • Implement appropriate replication state changes before calling driver methods
  • Clear export_locations before failover
  • Add new driver entry points
  • Validate model updates regarding replication states
  • Make sure that changing a share's host field doesn't cause problems
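The manager-side failover flow listed above can be sketched as follows. This is an illustration under assumed names: the database helpers, the driver object, and the function shape are stand-ins (in the real manager they would hang off `self`), not Manila's actual interfaces.

```python
# Hypothetical sketch of the failover RPC handler in the share manager,
# following the steps in these notes: set the transitional state, clear
# export locations, call the driver, then validate and apply its update.

def failover_share(db, driver, context, share_id):
    share = db.share_get(context, share_id)
    # 1. Mark the share as transitioning before touching the backend.
    db.share_update(context, share_id,
                    {'replication_state': 'failing_over'})
    # 2. Remove old export locations so clients must unmount/remount
    #    and cannot keep writing to the old primary.
    db.share_export_locations_update(context, share_id, [])
    try:
        # 3. Driver makes the secondary accessible; expected to succeed
        #    whether or not the primary controller is online.
        model_update = driver.failover_share(context, share)
    except Exception:
        db.share_update(context, share_id, {'status': 'error'})
        raise
    # 4. Validate the replication state the driver reported, then persist
    #    the update (new export locations, possibly a new host field).
    if model_update['replication_state'] not in ('in_sync', 'out_of_sync'):
        raise ValueError('driver returned invalid replication state')
    db.share_update(context, share_id, model_update)
```

Resync would follow the same pattern with the Resyncing state and without clearing export locations.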