
Difference between revisions of "Swift/Fixing-rebalance-and-golang"

==symptoms:==
 
# rebalance is slow, especially for dense servers
 
# uncertain latency for end-user requests
 
# hard to monitor, and requires a lot of manual intervention to get out of bad situations (e.g. a full cluster)
 
  
==problems:==
 
* swift is not in the transport data path for rsync
 
* too much walking the disk
 
* poor job scheduling/finding the work to be done
 
* eventlet hub can't touch disk (disk io blocks the whole hub, not just one greenthread)
 
** mitigation: use lots of processes -- "easy" in python but hard to coordinate work
 
** solution: use nonblocking io -- "hard rewrite" but efficiently solves the problem
 
 
==things in-progress to fix these problems:==
 
* tsync protocol for data moving
 
** puts swift in the data path (more efficient than rsync for the actual transport and for writing to disk)
 
** use an external and supported data transport and wire protocol instead of something we invent (http2+grpc vs repconn or ssync)
 
** see also https://etherpad.openstack.org/p/swift-rebalance
 
* better scheduling of work in reconstructor and replicator
 
** threads not eventlet
 
** more concurrency == faster (up to HW limits)
 
** identifying the work to be done (rebuilds vs rebalance; includes backpressure from tsync)
 
* fix proxy<->storage protocol (can't depend on bespoke features in our current framework)
 
* a golang object server, to take network data and write it to disk more efficiently
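The scheduling items above (identify the work, rebuilds vs rebalance, backpressure) could look roughly like the following Go sketch. It is an assumption-laden illustration, not the planned design: `Scheduler`, `Job`, `Kind`, `Submit`, `Next`, and `Drain` are hypothetical names, and "rebuilds run before rebalance" is one possible policy chosen here for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// Kind labels a unit of consistency-engine work (hypothetical).
type Kind int

const (
	Rebuild Kind = iota
	Rebalance
)

// Job is one partition's worth of work.
type Job struct {
	Kind      Kind
	Partition int
}

// Scheduler holds two FIFO queues sharing one capacity limit.
type Scheduler struct {
	mu         sync.Mutex
	rebuilds   []Job
	rebalances []Job
	limit      int
}

func NewScheduler(limit int) *Scheduler {
	return &Scheduler{limit: limit}
}

// Submit queues a job. It returns false when the scheduler is full,
// which is the backpressure signal to whoever is finding work.
func (s *Scheduler) Submit(j Job) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.rebuilds)+len(s.rebalances) >= s.limit {
		return false
	}
	if j.Kind == Rebuild {
		s.rebuilds = append(s.rebuilds, j)
	} else {
		s.rebalances = append(s.rebalances, j)
	}
	return true
}

// Next pops the oldest rebuild if any, otherwise the oldest rebalance.
func (s *Scheduler) Next() (Job, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if len(s.rebuilds) > 0 {
		j := s.rebuilds[0]
		s.rebuilds = s.rebuilds[1:]
		return j, true
	}
	if len(s.rebalances) > 0 {
		j := s.rebalances[0]
		s.rebalances = s.rebalances[1:]
		return j, true
	}
	return Job{}, false
}

// Drain runs `workers` goroutines until both queues are empty.
func (s *Scheduler) Drain(workers int, handle func(Job)) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				j, ok := s.Next()
				if !ok {
					return
				}
				handle(j)
			}
		}()
	}
	wg.Wait()
}

func main() {
	s := NewScheduler(3)
	fmt.Println(s.Submit(Job{Rebalance, 1}))
	fmt.Println(s.Submit(Job{Rebuild, 2}))
	fmt.Println(s.Submit(Job{Rebalance, 3}))
	fmt.Println(s.Submit(Job{Rebalance, 4})) // over the limit, refused
	s.Drain(2, func(j Job) { fmt.Println("handled partition", j.Partition) })
}
```

Raising `workers` adds concurrency without touching the queue logic; the shared limit keeps the work finder from running far ahead of the workers.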
 
 
==how do we get there (subject to change):==
 
    0. hummingbird branch is an interesting R&D reference but is not going to be merged (done)
 
    1. make replication/reconstruction tolerable to the point that we can make it fast by changing a config value (more workers, more connections, etc) (nearly done)
 
    2. build a better scheduler for consistency engine work
 
    2. build the tsync protocol
 
    now: build a feature-complete golang object server (might or might not borrow from hummingbird)
 
    now: infra/devstack CI work (ie swift consumable in the gate)
 
    now: ask other deployment projects what needs to be done to make them happy with swift as a golang thing (e.g. kolla, ansible, tripleo)
 

Revision as of 17:48, 15 March 2017