Difference between revisions of "Swift/Fixing-rebalance-and-golang"

 
Latest revision as of 17:49, 15 March 2017

symptom:

  1. rebalance is slow, especially for dense servers
  2. uncertain latency for end-user requests
  3. hard to monitor, and requires a lot of manual intervention to get out of bad situations (e.g. a full cluster)

problem:

  • swift is not in the data path when rsync moves the data
  • too much walking the disk
  • poor job scheduling/finding the work to be done
  • eventlet hub can't touch disk without blocking (a blocking disk call stalls every greenthread on that hub)
    • mitigation: use lots of processes -- "easy" in python, but the work is hard to coordinate
    • solution: use nonblocking io -- a "hard rewrite", but it solves the problem efficiently

things in progress to fix these problems:

  • tsync protocol for data moving
    • puts swift in the data path, which is more efficient than rsync for the actual transport and for writing to disk
    • use an external, supported data transport and wire protocol (http2+grpc) instead of something we invent (repconn or ssync)
    • see also https://etherpad.openstack.org/p/swift-rebalance
  • better scheduling of work in reconstructor and replicator
    • threads not eventlet
    • more concurrency == faster, up to hardware limits
    • identifying the work to be done (rebuilds vs rebalance; includes backpressure from tsync)
  • fix proxy<->storage protocol (can't depend on bespoke features in our current framework)
  • a golang object server, to take network data and write it to disk more efficiently
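
The scheduling idea above (threads instead of eventlet, rebuilds versus rebalance, backpressure) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the scheduler the Swift team built: the `Job` type, the `runJobs` function, and the choice to prefer rebuilds over rebalance work are all hypothetical. Bounded channels stand in for backpressure: a producer blocks once the queue is full instead of piling up unbounded work.

```go
package main

import (
	"fmt"
	"sync"
)

// Job is a hypothetical unit of consistency-engine work.
type Job struct {
	Partition int
	Kind      string // "rebuild" or "rebalance"
}

// runJobs drains both queues with a fixed number of worker goroutines.
// Each worker takes a rebuild job whenever one is ready and only falls
// back to rebalance work otherwise. With several workers this is a
// preference, not a strict global ordering. It returns once both
// channels are closed and empty.
func runJobs(rebuilds, rebalances <-chan Job, workers int, handle func(Job)) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			hi, lo := rebuilds, rebalances // per-worker copies; set to nil once closed
			for hi != nil || lo != nil {
				// First, take a rebuild without blocking if one is ready.
				select {
				case j, ok := <-hi:
					if !ok {
						hi = nil
						continue
					}
					handle(j)
					continue
				default:
				}
				// Otherwise wait for whichever queue has work next.
				select {
				case j, ok := <-hi:
					if !ok {
						hi = nil
						continue
					}
					handle(j)
				case j, ok := <-lo:
					if !ok {
						lo = nil
						continue
					}
					handle(j)
				}
			}
		}()
	}
	wg.Wait()
}

func main() {
	// Small buffers: a scanner feeding these channels would block when
	// they fill up, which is the backpressure.
	rebuilds := make(chan Job, 2)
	rebalances := make(chan Job, 2)
	rebuilds <- Job{Partition: 1, Kind: "rebuild"}
	rebalances <- Job{Partition: 7, Kind: "rebalance"}
	close(rebuilds)
	close(rebalances)

	var mu sync.Mutex
	runJobs(rebuilds, rebalances, 4, func(j Job) {
		mu.Lock()
		defer mu.Unlock()
		fmt.Printf("%s partition %d\n", j.Kind, j.Partition)
	})
}
```

The worker count is the knob: raising it adds concurrency until the disks or the network become the limit.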

how do we get there (subject to change):

   0. hummingbird branch is an interesting R&D reference but not going to be merged (done)
   1. make replication/reconstruction tolerable to the point that we can make it fast by changing a config value (more workers, more connections, etc) (nearly done)
   2. build a better scheduler for consistency engine work
   2. build the tsync protocol
   now: build a feature-complete golang object server (might or might not borrow from hummingbird)
   now: infra/devstack CI work (i.e. swift consumable in the gate)
   now: ask other deployment projects what needs to be done to make them happy with swift as a golang thing (e.g. kolla, ansible, tripleo, etc)