Difference between revisions of "Swift/Fixing-rebalance-and-golang"

Latest revision as of 17:49, 15 March 2017

symptom:

rebalance is slow, especially for dense servers
uncertain latency for end-user requests
hard to monitor, and requires a lot of intervention to get out of bad situations (e.g. a full cluster)

problem:

swift is not in the transport data path for rsync
too much walking the disk
poor job scheduling/finding the work to be done
eventlet hub can't touch disk (a blocking disk read or write stalls every greenthread in the process)
- mitigation: use lots of processes -- "easy" in python but hard to coordinate work
- solution: use nonblocking io -- "hard rewrite" but efficiently solves the problem
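
To make the mitigation/solution trade-off above concrete, here is a minimal sketch in Go (an assumption, chosen only because of the golang object server plan; none of these names come from swift or hummingbird). Each file read is a blocking call, but it blocks only its own goroutine while the Go runtime keeps scheduling the others, so a single process can keep several disk reads in flight without coordinating many processes. `readAll`, `writeSampleFiles`, and `demo` are hypothetical helpers, and `limit` must be at least 1.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// readAll reads every path concurrently, with at most limit reads in flight,
// and returns the total number of bytes read plus the first error seen.
func readAll(paths []string, limit int) (int64, error) {
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		total    int64
		firstErr error
	)
	sem := make(chan struct{}, limit) // counting semaphore bounding disk concurrency
	for _, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // wait here while limit reads are already running
		go func(p string) {
			defer wg.Done()
			defer func() { <-sem }()
			data, err := os.ReadFile(p) // blocking call, but only for this goroutine
			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			total += int64(len(data))
		}(p)
	}
	wg.Wait()
	return total, firstErr
}

// writeSampleFiles creates n zero-filled files of size bytes each in dir
// (stand-ins for object data, purely for the demo).
func writeSampleFiles(dir string, n, size int) ([]string, error) {
	paths := make([]string, 0, n)
	for i := 0; i < n; i++ {
		p := filepath.Join(dir, fmt.Sprintf("obj-%d.data", i))
		if err := os.WriteFile(p, make([]byte, size), 0o600); err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

// demo writes 8 files of 1 KiB to a temp dir and reads them back 4 at a time.
func demo() (int64, error) {
	dir, err := os.MkdirTemp("", "swift-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	paths, err := writeSampleFiles(dir, 8, 1024)
	if err != nil {
		return 0, err
	}
	return readAll(paths, 4)
}

func main() {
	total, err := demo()
	fmt.Println("bytes read:", total, "err:", err)
}
```

The semaphore is the knob here: raising `limit` adds disk concurrency without adding processes.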

things in-progress to fix these problems:

tsync protocol for data moving
- puts swift in the data path (more efficient for actual transport and writing to disk (as opposed to rsync))
- use an external and supported data transport and wire protocol instead of something we invent (http2+grpc vs repconn or ssync)
- see also https://etherpad.openstack.org/p/swift-rebalance
better scheduling of work in reconstructor and replicator
- threads not eventlet
- more concurrency == faster (up to hardware limits)
- identifying the work to be done (rebuilds vs rebalance; includes backpressure from tsync)
fix proxy<->storage protocol (can't depend on bespoke features in our current framework)
golang object server itself to more efficiently take network data and write it to disk
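
The scheduling bullets above (real concurrency, sorting out which work to do, backpressure) can be sketched roughly as follows. This is an illustration under stated assumptions, not the planned scheduler: the `Job` type, the `schedule` and `processedOrder` functions, and the choice to run rebuilds before rebalance moves are all hypothetical. A fixed pool of workers drains a small queue, and the producer blocks whenever the queue is full, which is the backpressure. `workers` must be at least 1.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Kind distinguishes the two sorts of consistency work named in the notes.
type Kind int

const (
	Rebuild   Kind = iota // restore missing data
	Rebalance             // move data after a ring change
)

// Job is a hypothetical unit of work for one partition.
type Job struct {
	Kind      Kind
	Partition int
}

// schedule runs jobs on a fixed pool of workers. Rebuilds are queued ahead of
// rebalance moves (an assumption, not something the notes specify). The
// bounded channel gives backpressure: the producer blocks once queueDepth
// jobs are waiting and no worker is free to take another.
func schedule(jobs []Job, workers, queueDepth int, handle func(Job)) {
	ordered := make([]Job, len(jobs))
	copy(ordered, jobs)
	sort.SliceStable(ordered, func(i, j int) bool {
		return ordered[i].Kind < ordered[j].Kind
	})

	queue := make(chan Job, queueDepth)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range queue {
				handle(job)
			}
		}()
	}
	for _, job := range ordered {
		queue <- job // blocks while the queue is full
	}
	close(queue)
	wg.Wait()
}

// processedOrder runs schedule and records the order in which jobs were handled.
func processedOrder(jobs []Job, workers, queueDepth int) []Job {
	var mu sync.Mutex
	var done []Job
	schedule(jobs, workers, queueDepth, func(j Job) {
		mu.Lock()
		done = append(done, j)
		mu.Unlock()
	})
	return done
}

func main() {
	jobs := []Job{{Rebalance, 1}, {Rebuild, 2}, {Rebalance, 3}, {Rebuild, 4}}
	// With a single worker the handling order matches the queue order.
	fmt.Println(processedOrder(jobs, 1, 2))
}
```

Making the job faster is then a matter of changing `workers` and `queueDepth`, in the spirit of the "changing a config value" goal.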

how do we get there (subject to change):

   0. hummingbird branch is an interesting R&D reference but not going to be merged (done)
   1. make replication/reconstruction tolerable to the point that we can make it fast by changing a config value (more workers, more connections, etc) (nearly done)
   2. build a better scheduler for consistency engine work
   2. build the tsync protocol
   now: build a feature-complete golang object server (might or might not borrow from hummingbird)
   now: infra/devstack CI work (ie swift consumable in the gate)
   now: ask other deployment projects what needs to be done to make them happy with swift as a golang thing (eg kolla, ansible, tripleo, etc)
