== Rally Roadmap ==
  
= Benchmarking Engine =
https://docs.google.com/spreadsheets/d/16DXpfbqvlzMFaqaXAcJsBzzpowb_XpymaK2aFY2gA2g/edit#gid=0
 
 
=== Add support for Users & Tenants out of the box ===
 
 
 
At the moment we support the following 3 parameters:
# timeout - the timeout of 1 scenario loop
# times - how many loops of the scenario to run
# concurrent - how many loops should be run simultaneously
 
 
 
All tests are run as a single user => this is not a realistic situation.
 
 
 
We are going to add two new parameters:
# tenants - how many tenants to create
# users_per_tenant - how many users should be in each tenant
 
 
 
 
 
The Benchmark Engine will create all tenants & users and prepare the OpenStack Python clients before benchmarking starts.
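
A minimal sketch of what a task config with the new parameters might look like (the structure and the values are illustrative, not final):

<pre>
{
    "NovaServers.boot_and_destroy": [
        {
            "timeout": 600,
            "times": 100,
            "concurrent": 10,
            "tenants": 2,
            "users_per_tenant": 5
        }
    ]
}
</pre>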
 
 
 
 
=== Add generic cleanup mechanism ===
 
 
 
In benchmarks we are creating a lot of different resources: Tenants, Users, VMs, Snapshots, Block devices.
 
 
 
If something goes wrong, or a test is not well written, we will end up with a lot of allocated resources that could influence the next benchmarks. So we should clean up our OpenStack.
 
 
 
Such a generic cleanup can be implemented easily, since all resources are created by the users & tenants that were set up before running the benchmark scenario.
 
 
 
We only need to make 2 steps (see the sketch below):
# Purge the resources of each user
# Destroy all users & tenants
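
A minimal sketch of this cleanup, assuming a hypothetical purge_resources() helper and a keystone client (not actual Rally code):

    def generic_cleanup(keystone_client, tenants):
        for tenant in tenants:
            # step 1: purge the resources (VMs, snapshots, block devices) of each user
            for user in tenant.users:
                purge_resources(user)  # hypothetical helper
            # step 2: destroy all users & tenants
            for user in tenant.users:
                keystone_client.users.delete(user.id)
            keystone_client.tenants.delete(tenant.id)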
 
 
 
 
 
=== Run multiple scenarios simultaneously ===
 
 
 
Okay, we now know how to put load on Nova: the Boot & Destroy VM scenario.

But how will a huge load from another scenario, e.g. Create & Destroy Block Device, influence the Nova scenario?

This could also be easily extended. For example, we could use a special name for such composed benchmarks:
 
<pre>
benchmark: {
  "@composed": {
      "NovaServers.boot_and_destroy": [...],
      ....
  }
}
</pre>
 
 
 
 
 
 
 
=== More scenarios ===
 
 
 
More benchmark scenarios - for different parts of the OpenStack API - are to be implemented:
 
 
 
====Nova====
 
 
 
Assuming that we have prepared a server (and possibly also some floating IPs) in init(), which will then be deleted in cleanup(), like:
 
    def init(cls, config):
        context = {'prepared_server': cls.nova.servers.create(...)}
        ...
        return context

    ...

    def cleanup(cls, context):
        context['prepared_server'].delete()
        ...
 
 
 
- the following benchmark scenarios will be implemented in the near future:
 
'''rebooting server'''
 
    def reboot_server(cls, context):
 
        context['prepared_server'].reboot()
 
 
 
'''suspending/resuming server'''
 
    def suspend_and_resume_server(cls, context):
 
        context['prepared_server'].suspend()
 
        context['prepared_server'].resume()
 
 
 
'''shelving/unshelving server'''
 
    def shelve_and_unshelve_server(cls, context):
 
        context['prepared_server'].shelve()
 
        context['prepared_server'].unshelve()
 
 
 
'''associating floating ips'''
 
    def associate_floating_ips(cls, context):
 
        for floating_ip in context['prepared_floating_ips']:
 
            context["prepared_server"].add_floating_ip(floating_ip)
 
        for floating_ip in context['prepared_floating_ips']:
 
            context["prepared_server"].remove_floating_ip(floating_ip)
 
 
 
 
 
====Cinder====
 
 
 
Assuming that we have prepared a server and a volume in init(), which will then be deleted in cleanup(), like:
 
    def init(cls, config):
        context = {'prepared_server': cls.nova.servers.create(...),
                   'prepared_volume': cls.cinder.volumes.create(...)}
        ...
        return context

    ...

    def cleanup(cls, context):
        context['prepared_server'].delete()
        context['prepared_volume'].delete()
        ...
 
 
 
- the following benchmark scenarios will be implemented in the near future:
 
'''create/delete volume'''
 
    def create_and_delete_volume(cls, context):
 
        volume = cls.cinder.volumes.create(...)
 
        volume.delete()
 
 
 
'''attach/detach volume'''
 
    def attach_and_detach_volume(cls, context):
 
        context['prepared_volume'].attach(context['prepared_server'], ...)
 
        context['prepared_volume'].detach()
 
 
 
'''reserve/unreserve volume'''
 
    def reserve_and_unreserve_volume(cls, context):
 
        context['prepared_volume'].reserve()
 
        context['prepared_volume'].unreserve()
 
 
 
'''set/delete metadata'''
 
    def set_and_delete_metadata(cls, context):
        fake_meta = {'benchmark': 'rally'}  # sample metadata, any key/value pairs work
        cls.cinder.volumes.set_metadata(context['prepared_volume'], fake_meta)
        cls.cinder.volumes.delete_metadata(context['prepared_volume'], fake_meta.keys())
 
 
 
'''extend volume'''
 
    def extend_volume(cls, context):
        # note: Cinder can only grow a volume, so we extend it to twice its size
        cls.cinder.volumes.extend(context['prepared_volume'], context['prepared_volume'].size * 2)
 
 
 
= Data processing =
 
 
 
At the moment the only thing we have is tables with min, max and avg values. Good as a first step =)

But we need more!
 
 
 
 
 
=== Data aggregation ===
 
 
 
to be done...
 
 
 
 
 
=== Graphics & Plots ===
 
 
 
* Simple plot: time of loop vs. iteration number
 
 
 
[[File:RallyPlot.png||Rally plot]]
 
 
 
 
 
* Histogram of loop times
 
 
 
[[File:Rally Histgoram.png||Rally histogram]]
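
A minimal sketch of how such charts could be produced from the raw per-loop durations (matplotlib is used here only as an illustration; the final tooling may differ):

    import matplotlib.pyplot as plt

    def draw_charts(durations):
        # simple plot: time of loop vs. iteration number
        plt.figure()
        plt.plot(range(1, len(durations) + 1), durations)
        plt.xlabel('Iteration')
        plt.ylabel('Loop time, s')
        plt.savefig('rally_plot.png')
        # histogram of loop times
        plt.figure()
        plt.hist(durations, bins=20)
        plt.xlabel('Loop time, s')
        plt.ylabel('Number of loops')
        plt.savefig('rally_histogram.png')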
 
 
 
= Profiling =
 
 
 
 
 
=== Improve & Merge Tomograph into upstream ===
 
To collect profiling data we use a small library [https://github.com/timjr/tomograph Tomograph] that was created to be used with OpenStack (needs to be improved as well). Profiling data is collected by inserting profiling/log points in OpenStack source code and adding event listeners/hooks on events from 3rd party libraries that support such events (e.g. sqlalchemy).
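
A rough sketch of such a trace point, following the start/stop calls from the Tomograph README (treat the exact signatures as an assumption, since the library is still being improved):

    import time
    import tomograph  # https://github.com/timjr/tomograph

    def handle_request():
        # open a span: service name, span name, host and port of this service
        tomograph.start('nova-api', 'boot_server', 'localhost', 0)
        time.sleep(0.1)  # the code being profiled
        tomograph.stop('boot_server')  # close the span and send it to the collector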
 
Currently our patches are applied to the OpenStack code during cloud deployment. For easier maintenance and better profiling results, the profiler could be integrated as an oslo component; in that case its patches could be merged upstream. Profiling itself would be managed by configuration options.
 
 
 
 
 
=== Improve Zipkin or use something else ===
 
 
 
Currently we use [http://twitter.github.io/zipkin/ Zipkin] as a collector and visualization service, but in the future we plan to replace it with something more suitable in terms of load (possibly Ceilometer?) and to improve the visualization format (we need better charts).
 
 
 
Some early results you can see here:
 
* [http://37.58.79.43:8080/traces/00080f28f2640353 Terminate 3 VMs]
 
* [http://37.58.79.43:8080/traces/0001f24bb3d05ccd Run 3 VMs by one request]
 
 
 
 
 
=== Make it work out of the box ===
 
 
 
A few things should be done:
# Merge into upstream the Tomograph changes that send logs to Zipkin
# Bind Tomograph to the Benchmark Engine
# Automate the installation of Zipkin from Rally
 
 
 
 
 
= Server providing =
 
 
 
# Improve VirshProvider
## Implement network installation of Linux on a VM (currently only cloning of an existing VM is implemented).
## Add ZFS/LVM2 support for fast cloning
# Implement LxcProvider
## This provider is meant for fast deployment of a large number of instances.
## Support ZFS clones for fast deployment.
# Implement AmazonProvider
## Get your VMs from Amazon
 
 
 
= Deployers =
 
 
 
# '''Implement MultihostEngine''' - this engine will deploy a multihost configuration using the existing engines.
# '''Implement Dev Fuel based engine''' - deploy OpenStack using Fuel on existing servers or VMs.
# '''Implement Full Fuel based engine''' - deploy OpenStack with Fuel on bare metal nodes.
# '''Implement TripleO based engine''' - deploy OpenStack on bare metal nodes using TripleO.
 
