Rally/RoadMap
Revision as of 18:17, 16 October 2013

Benchmarking Engine

Add support for Users & Tenants out of the box

At the moment we support the following 3 parameters:

  1. timeout - the timeout of one scenario loop
  2. times - how many loops of the scenario to run
  3. concurrent - how many loops should run simultaneously

All tests are run as one user => this is not a realistic situation.

We are going to add two new parameters:

  1. tenants - how many tenants to create
  2. users_pro_tenant - how many users should be in each tenant


The Benchmark Engine will create all tenants & users and prepare OpenStack Python clients before starting benchmarking.
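
For illustration, a benchmark configuration using all five parameters might then look roughly like this (a hypothetical sketch; the scenario name and all values are only illustrative):

```
"NovaServers.boot_and_destroy": [{
    "timeout": 600,
    "times": 100,
    "concurrent": 10,
    "tenants": 2,
    "users_pro_tenant": 5
}]
```

With such a config the engine would create 2 tenants with 5 users each (10 users total) and spread the 100 scenario loops among them.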


Add generic cleanup mechanism

In benchmarks we are creating a lot of different resources: Tenants, Users, VMs, Snapshots, Block devices.

If something goes wrong, or a test is not well written, we will end up with a lot of allocated resources that could influence the next benchmarks. So we should clean up our OpenStack.

Such a generic cleanup can be easily implemented, because all resources are created by the tenants and users that were created before running the benchmark scenario.

We need to perform only 2 steps:

  1. Purge resources of each user
  2. Destroy all users & tenants
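
The two steps above could be sketched like this (a toy, self-contained sketch with fake in-memory resources; the real Rally client objects and methods are not modeled here):

```python
class FakeResource:
    """Stand-in for a server/volume tracked by a fake client registry."""
    def __init__(self, registry, name):
        self.registry, self.name = registry, name

    def delete(self):
        self.registry.remove(self)


def cleanup(context):
    # Step 1: purge the resources of each user.
    for user in context['users']:
        for resource in list(user['resources']):
            resource.delete()
    # Step 2: destroy all users & tenants themselves.
    context['users'].clear()
    context['tenants'].clear()


# Demo: one user owning two resources, one tenant.
resources = []
resources.append(FakeResource(resources, 'vm-1'))
resources.append(FakeResource(resources, 'volume-1'))
context = {'users': [{'resources': resources}], 'tenants': ['tenant-1']}
cleanup(context)
```

Because every resource is reachable through the users/tenants the engine itself created, no benchmark-specific cleanup code is needed.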


Run multiple scenarios simultaneously

Okay, we now know how to put load on Nova: the Boot & Destroy VM scenario.

But how will a huge load from another scenario - Create & Destroy Block Device - influence the Nova scenario?

This could also be easily extended. For example, we will use a special name for such benchmarks:

benchmark: {
  "@composed" : {
      "NovaServers.boot_and_destroy": [...],
      ....
  }
}

More scenarios

More benchmark scenarios, covering different parts of the OpenStack API, are to be implemented:

Nova

Assuming that we have prepared a server (and possibly also some floating IPs) in init(), which will then be deleted in cleanup(), like:

   def init(cls, config):
       context = {'prepared_server': cls.nova.servers.create(...)}
       ...
       return context

   def cleanup(cls, context):
       context['prepared_server'].delete()
       ...

- the following benchmark scenarios will be implemented in the near future:

rebooting server

   def reboot_server(cls, context):
       context['prepared_server'].reboot()

suspending/resuming server

   def suspend_and_resume_server(cls, context):
       context['prepared_server'].suspend()
       context['prepared_server'].resume()

shelving/unshelving server

   def shelve_and_unshelve_server(cls, context):
       context['prepared_server'].shelve()
       context['prepared_server'].unshelve()

associating floating ips

   def associate_floating_ips(cls, context):
       for floating_ip in context['prepared_floating_ips']:
           context['prepared_server'].add_floating_ip(floating_ip)
       for floating_ip in context['prepared_floating_ips']:
           context['prepared_server'].remove_floating_ip(floating_ip)


Cinder

Assuming that we have prepared a server and a volume in init(), which will then be deleted in cleanup(), like:

   def init(cls, config):
       context = {'prepared_server': cls.nova.servers.create(...)}
       context['prepared_volume'] = cls.cinder.volumes.create(...)
       ...
       return context

   def cleanup(cls, context):
       context['prepared_server'].delete()
       context['prepared_volume'].delete()
       ...

- the following benchmark scenarios will be implemented in the near future:

create/delete volume

   def create_and_delete_volume(cls, context):
       volume = cls.cinder.volumes.create(...)
       volume.delete()

attach/detach volume

   def attach_and_detach_volume(cls, context):
       context['prepared_volume'].attach(context['prepared_server'], ...)
       context['prepared_volume'].detach()

reserve/unreserve volume

   def reserve_and_unreserve_volume(cls, context):
       context['prepared_volume'].reserve()
       context['prepared_volume'].unreserve()

set/delete metadata

   def set_and_delete_metadata(cls, context):
       cls.cinder.volumes.set_metadata(context['prepared_volume'], fake_meta)
       cls.cinder.volumes.delete_metadata(context['prepared_volume'], fake_meta.keys())

extend volume

   def extend_volume(cls, context):
       cls.cinder.volumes.extend(context['prepared_volume'], context['prepared_volume'].size*2)
       cls.cinder.volumes.extend(context['prepared_volume'], context['prepared_volume'].size/2)

Data processing

At the moment the only thing we have is tables with min, max and avg values. As a first step that is good =) But we need more!
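
What we have today boils down to something like this (a minimal sketch over a list of per-loop durations in seconds; the field names are illustrative):

```python
def aggregate(durations):
    """Reduce a list of per-loop durations (seconds) to min/max/avg."""
    return {
        'min': min(durations),
        'max': max(durations),
        'avg': sum(durations) / len(durations),
    }


stats = aggregate([1.2, 0.8, 1.0])
```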


Data aggregation

to be done...


Graphics & Plots

  • Simple plot: Time of Loop / Iteration

Rally plot


  • Histogram of loop times

Rally histogram
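
The histogram data itself can be computed with simple equal-width binning before it is handed to any plotting code (a sketch; the bin count is an arbitrary choice):

```python
def histogram(loop_times, bins=5):
    """Count loop durations falling into `bins` equal-width buckets."""
    lo, hi = min(loop_times), max(loop_times)
    width = (hi - lo) / bins or 1  # avoid zero width for constant input
    counts = [0] * bins
    for t in loop_times:
        index = min(int((t - lo) / width), bins - 1)  # clamp the max value
        counts[index] += 1
    return counts


# Two buckets: [1.0, 1.5) and [1.5, 2.0]
counts = histogram([1.0, 1.1, 1.2, 1.9, 2.0], bins=2)
```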

Profiling

Improve & Merge Tomograph into upstream

To collect profiling data we use a small library, Tomograph, that was created to be used with OpenStack (and needs to be improved as well). Profiling data is collected by inserting profiling/log points into the OpenStack source code and by adding event listeners/hooks on events from 3rd-party libraries that support them (e.g. sqlalchemy). Currently our patches are applied to the OpenStack code during cloud deployment. For easier maintenance and better profiling results, the profiler could be integrated as an oslo component, in which case its patches could be merged upstream. Profiling itself would be managed by configuration options.
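
Conceptually, a profiling/log point just records a timestamped span around a call. A minimal hypothetical tracer (this is not the actual Tomograph API) could look like:

```python
import functools
import time

# In a real setup these records would be shipped to a collector (e.g. Zipkin).
TRACE_LOG = []


def trace(name):
    """Decorator sketch: record the duration of the wrapped call as a span."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.time()
            try:
                return func(*args, **kwargs)
            finally:
                TRACE_LOG.append({'span': name, 'duration': time.time() - start})
        return wrapper
    return decorator


@trace('nova.boot_server')
def boot_server():
    time.sleep(0.01)  # stand-in for real work


boot_server()
```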


Improve Zipkin or use something else

Currently we use Zipkin as a collector and visualization service, but in the future we plan to replace it with something more suitable in terms of load (possibly Ceilometer?) and to improve the visualization format (we need better charts).

Some early results you can see here:


Make it work out of the box

A few things should be done:

  1. Merge into upstream the Tomograph changes that send logs to Zipkin
  2. Bind Tomograph with the Benchmark Engine
  3. Automate the installation of Zipkin from Rally


Server providing

  1. Improve VirshProvider
    1. Implement Linux netinstall on VMs (currently only cloning of an existing VM is implemented).
    2. Add zfs/lvm2 support for fast cloning.
  2. Implement LxcProvider
    1. This provider is meant to be used for fast deployment of a large number of instances.
    2. Support zfs clones for fast deployment.
  3. Implement AmazonProvider
    1. Get your VMs from Amazon.

Deployers

  1. Implement MultihostEngine - this engine will deploy a multihost configuration using existing engines.
  2. Implement a Dev Fuel based engine - deploy OpenStack using Fuel on existing servers or VMs.
  3. Implement a Full Fuel based engine - deploy OpenStack with Fuel on bare-metal nodes.
  4. Implement a TripleO based engine - deploy OpenStack on bare-metal nodes using TripleO.