Difference between revisions of "Rally"

Revision as of 19:57, 10 March 2014

What is Rally?

If you are here, you are probably familiar with OpenStack and know that it is a huge ecosystem of cooperating services. When something fails, performs slowly or doesn't scale, it is hard to determine what happened, why, and where. Another reason you could be here is that you would like to build an OpenStack CI/CD system that lets you continuously improve the SLA, performance and stability of OpenStack.

The OpenStack QA team mostly works on CI/CD that ensures that new patches don't break a specific single-node installation of OpenStack. Such CI/CD is only an indication, however, and does not cover all cases (e.g. a cloud that works well on a single-node installation will not necessarily keep working well on a 1,000-server installation under high load). Rally aims to fix this and help us answer the question "How does OpenStack work at scale?". To make this possible, we are going to automate and unify all the steps required for benchmarking OpenStack at scale: multi-node OS deployment, verification, benchmarking & profiling.

Deploy engine - not yet another deployer of OpenStack, but a pluggable mechanism that unifies & simplifies work with different deployers such as DevStack, Fuel and Anvil on the hardware/VMs that you have.
Verification - (work in progress) uses tempest to verify the functionality of a deployed OpenStack cloud. In the future Rally will support other OS verifiers.
Benchmark engine - creates parameterized load on the cloud based on a big repository of benchmarks.

For more information about how it works, take a look at Rally Architecture.

Use Cases

Before diving deep into the Rally architecture, let's take a look at 3 major high-level Rally use cases:

Typical cases where Rally aims to help are:

Automate measuring & profiling focused on how new code changes affect the OS performance;
Use the Rally profiler to detect scaling & performance issues;
Investigate how different deployments affect the OS performance:
- Find the set of suitable OpenStack deployment architectures;
- Create deployment specifications for different loads (number of controllers, swift nodes, etc.);
Automate the search for the hardware best suited for a particular OpenStack cloud;
Automate the production cloud specification generation:
- Determine terminal loads for basic cloud operations: VM start & stop, Block Device create/destroy & various OpenStack API methods;
- Check the performance of basic cloud operations under different loads.

Architecture

OpenStack projects are usually delivered as-a-Service, so Rally offers this approach as well as a CLI-driven approach that does not require a daemon:

Rally as-a-Service: run Rally as a set of daemons that present a Web UI (work in progress), so that one RaaS can be used by a whole team.
Rally as-an-App: Rally as just a lightweight CLI app (without any daemons), which makes it simple to develop & much more portable.

How is this possible? Take a look at the diagram below:

So what is behind Rally?

Rally Components

Rally consists of 4 main components:

Server Providers - provide servers (virtual servers) with ssh access, in one L3 network.
Deploy Engines - deploy an OpenStack cloud on the servers provided by Server Providers.
Verification - the component that runs tempest (or another specific set of tests) against a deployed cloud, collects the results & presents them in human-readable form.
Benchmark engine - lets you write parameterized benchmark scenarios & run them against the cloud.
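
To make the relationship between the four components more concrete, here is a minimal Python sketch of how they could be chained. It is only an illustration: the class names (ServerProvider, DeployEngine, ExistingServers, FakeDeployEngine), the run_pipeline function and the endpoint URL format are all hypothetical and are not taken from the Rally code base.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Server:
    """A machine reachable over ssh (hypothetical minimal description)."""
    host: str
    user: str


class ServerProvider(ABC):
    @abstractmethod
    def create_servers(self) -> List[Server]:
        """Return servers that share one L3 network."""


class DeployEngine(ABC):
    @abstractmethod
    def deploy(self, servers: List[Server]) -> str:
        """Deploy a cloud on the servers and return its endpoint URL."""


class ExistingServers(ServerProvider):
    """Provider that simply wraps machines you already have."""

    def __init__(self, hosts: List[str], user: str = "root"):
        self.hosts = hosts
        self.user = user

    def create_servers(self) -> List[Server]:
        return [Server(host=h, user=self.user) for h in self.hosts]


class FakeDeployEngine(DeployEngine):
    """Stand-in engine; a real one would drive a deployer over ssh."""

    def deploy(self, servers: List[Server]) -> str:
        # Assumed URL shape, used only so later steps have something to consume.
        return f"http://{servers[0].host}:5000/v2.0"


def run_pipeline(provider: ServerProvider, engine: DeployEngine,
                 verify: Callable[[str], bool],
                 benchmark: Callable[[str], dict]) -> dict:
    """Chain the components: servers -> deployment -> verification -> benchmark."""
    servers = provider.create_servers()
    endpoint = engine.deploy(servers)
    if not verify(endpoint):
        raise RuntimeError("verification failed, refusing to benchmark")
    return benchmark(endpoint)


if __name__ == "__main__":
    result = run_pipeline(
        ExistingServers(["10.0.0.1", "10.0.0.2"]),
        FakeDeployEngine(),
        verify=lambda endpoint: endpoint.startswith("http://"),
        benchmark=lambda endpoint: {"endpoint": endpoint},
    )
    print(result)
```

Because each stage depends only on the abstract interface of the previous one, swapping in a different provider or deployer would not require touching the verification or benchmark steps.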

But why does Rally need these components?
It becomes clear if we try to imagine: how will I benchmark a cloud at scale, if ...

Rally in action

How amqp_rpc_single_reply_queue affects performance

To show Rally's capabilities and potential, we used the NovaServers.boot_and_destroy scenario to see how the amqp_rpc_single_reply_queue option affects VM boot time. Some time ago it was shown that cloud performance can be boosted by turning this option on, so naturally we decided to check this result. To run the test, we issued requests to boot and delete VMs for different numbers of concurrent users, ranging from 1 to 30, with and without this option set. For each group of users, a total of 200 requests was issued. The average time per request is shown below:

So this option apparently does affect cloud performance, but not in the way previously thought.
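
The measurement procedure described above (a fixed total of 200 requests, issued by a varying number of concurrent users, with the time per request averaged) can be sketched in plain Python as follows. This is not how Rally implements its runners; run_load, timed and fake_boot_and_destroy are hypothetical names, and the placeholder action only sleeps instead of talking to a cloud.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def timed(action):
    """Run one request and return its duration in seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def run_load(action, times, concurrency):
    """Issue `times` requests from `concurrency` parallel workers.

    Returns the average duration of a single request.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(lambda _: timed(action), range(times)))
    return mean(durations)


def fake_boot_and_destroy():
    """Placeholder for booting and deleting a VM; it only sleeps briefly."""
    time.sleep(0.001)


if __name__ == "__main__":
    for users in (1, 10, 20, 30):
        avg = run_load(fake_boot_and_destroy, times=200, concurrency=users)
        print(f"{users:>2} concurrent users: {avg:.4f} s per request")
```

Running the same loop twice, once with the option enabled on the cloud and once without, would give the two series that such a comparison needs.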

How To

Actually there are only 3 steps that should be interesting for you:

Weekly updates

Each week we write up on a special weekly updates page what has been accomplished in Rally during the past week and what our plans are for the next one. Below you can find the most recent report.

Over the past week the direction of our efforts hasn't changed significantly: we are still working hard on further logical organization of the core parts of Rally, which will make the system even more extensible than it is now. Some important changes include:

Further work on integrating the Context classes into Rally. As a reminder, contexts define the different environments in which Rally can launch benchmark scenarios, e.g. an environment with temporarily generated OpenStack users and/or a context that enables generic cleanup for the benchmark scenarios. This week, we added the base Context class with a unified interface and rewrote some existing context classes to follow the base class API (https://review.openstack.org/#/c/78193/);
Various fixes in the Devstack deploy engine, including support for connecting to the VM with a user-password combination instead of a key-pair (https://review.openstack.org/#/c/77540/), a minor bugfix in the cleanup procedure (https://review.openstack.org/#/c/70727/) and support for git branching (https://review.openstack.org/#/c/78225/);
Many small but important improvements that make the code more readable overall, e.g. using the configuration files in appropriate places (https://review.openstack.org/#/c/78325/), moving a couple of helper methods for the benchmark engine to the correct modules (https://review.openstack.org/#/c/78524/), replacing the incorrect mocking syntax with the decorator-based one (https://review.openstack.org/#/c/78589/) and so on.
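
As a rough illustration of the contexts idea from the first item above (an environment that is prepared before a benchmark scenario and cleaned up after it), a base class with a unified interface might look like the sketch below. The method names setup and cleanup, the TemporaryUsers class and the "users" config key are assumptions made for this example; the actual API is defined in the review linked in that item.

```python
from abc import ABC, abstractmethod


class Context(ABC):
    """Hypothetical base class: one environment a scenario runs inside."""

    def __init__(self, config):
        self.config = config

    @abstractmethod
    def setup(self):
        """Prepare the environment before the scenario starts."""

    @abstractmethod
    def cleanup(self):
        """Undo whatever setup() created."""

    def __enter__(self):
        self.setup()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.cleanup()
        return False  # do not swallow errors raised by the scenario


class TemporaryUsers(Context):
    """Creates throwaway users; a real version would call the identity service."""

    def setup(self):
        count = self.config.get("users", 1)
        self.users = [f"tmp_user_{i}" for i in range(count)]

    def cleanup(self):
        self.users = []


if __name__ == "__main__":
    with TemporaryUsers({"users": 3}) as ctx:
        print(ctx.users)  # the scenario would run here, using these users
```

Using the context-manager protocol means cleanup runs even if the scenario raises, which is the property a generic cleanup context needs.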

This week, we are going to continue the work on the context classes for benchmark scenarios, since they are the tool that will make Rally really pluggable. Current tasks include:

Changing the benchmark scenario input config format;
Splitting the already existing validation procedures to different context classes in a logical way;
Implementing the Context class factory (like we did with deploy engines or scenario runners);

and many others.

We are also going to introduce several enhancements both to the task result output (in its HTML form) and to the code (by moving some common code to a special utils module).

We encourage you to take a look at the new patches in Rally pending review and to help us make Rally better.

Source code for Rally is hosted at GitHub: https://github.com/stackforge/rally
You can track the overall progress in Rally via Stackalytics: http://stackalytics.com/?release=icehouse&metric=commits&project_type=all&module=rally
Open reviews for Rally: https://review.openstack.org/#/q/status:open+rally,n,z

Stay tuned.

Regards,
The Rally team

Previous Weekly Updates

Rally in the World

Date	Authors	Title	Location
01/Mar/2014	Bangalore C.B. Ananth (cbpadman at cisco.com) Rahul Upadhyaya (rahuupad at cisco.com)	Benchmark as a Service OpenStack-Rally	OpenStack Meetup Bangalore
28/Feb/2014	Peeyush Gupta	Benchmarking OpenStack With Rally	http://www.thegeekyway.com/
26/Feb/2014	Oleg Gelbukh	Benchmarking OpenStack at megascale: How we tested Mirantis OpenStack at SoftLayer	http://www.mirantis.com/blog/
07/Nov/2013	Boris Pavlovic	Benchmark OpenStack at Scale	OpenStack Summit Hong Kong

Join Rally team

Discussions & RoadMap

https://etherpad.openstack.org/p/Rally_Main

Open and assigned tasks

https://trello.com/b/DoD8aeZy/rally

To get an account, ping Boris on IRC (boris-42) or email boris(at)pavlovic.me

IRC chat

server: freenode.net

channel: #openstack-rally

Weekly Meetings

The Rally project team holds weekly meetings on Tuesdays at 1700 UTC on IRC, in the #openstack-meeting channel.

@@ Line 88: / Line 88: @@
 '''Each week we write up on a special [[Rally/Updates|weekly updates page]] what sort of things have been accomplished in Rally during the past week and what are our plans for the next one. Below you can find the most recent report.'''
-This week, several important contributions have been made to Rally, considering both the overall system stability and the improvements of the user interface. To name a few:
+Over the past week the direction of our efforts hasn't changed significantly: we are still working hard on further logical organization of the core parts of Rally which will enable the system to be even more extendable than it is now. Some important changes include:
-* '''''Vast refactoring of the ScenarioRunner class''''' has enabled to stop sharing OpenStack clients objects between processes in the core of the system, which occasionally caused bugs in Rally (https://review.openstack.org/#/c/74769/);
+* '''''Further work on integrating the Context classes''''' into Rally. Let us remind you that the notion of ''contexts'' is used by us to define different environments in which benchmark scenarios can be launched by Rally, e.g. an environment with temporarily generated OpenStack users and/or a context that enables generic cleanup for the benchmark scenarios. This week, we have added the base ''Context'' class with a unified interface and we have also rewritten some already existing context classes according to the base class API (https://review.openstack.org/#/c/78193/);
-* Another important refactoring step resulted in the '''''replacement of OpenStack endpoint dictionaries with special objects throughout the system''''', which has made the code more reliable and extendable (https://review.openstack.org/#/c/74425/);
+* Various fixes in the '''''Devstack deploy engine''''', including the support for connecting to the VM with a user-password combination instead of a key-pair (https://review.openstack.org/#/c/77540/), minor bugfix in the cleanup procedure (https://review.openstack.org/#/c/70727/) and adding support for ''git branching'' (https://review.openstack.org/#/c/78225/);
-* Perhaps the prettiest patch of the week was '''''the introduction of a benchmark result visualization tool''''', implemented with the ''nvd3'' plugin to ''d3.js'' (so that the actual charts are drawn to a ''html'' file). The graphs look really nice and will be of great use for those who want to share their benchmarking results  (https://review.openstack.org/#/c/72970/);
+* Many small but important improvements that make the code overall more readable, e.g. using the configuration files in appropriate places (https://review.openstack.org/#/c/78325/), moving a couple of helper methods for the ''benchmark engine'' to the correct modules (https://review.openstack.org/#/c/78524/), replacing the incorrect mocking syntax with the decorator-based one (https://review.openstack.org/#/c/78589/) and so on.
-* Several '''''nice improvements in the CLI''''' include the showing of 90- and 95- percentile results in the benchmark summary (https://review.openstack.org/#/c/73522/) and a new '''show''' command which allows the user to get the information on ''images/flavors/networks/etc.'' available in the current deployment in a very quick way (https://review.openstack.org/#/c/75699/).
-The ongoing work includes:
+This week, we are going to continue the work on the '''''context classes''''' for benchmark scenarios since this is going to be a tool which will make Rally really pluggable. Current tasks include:
-* An extention of the '''use''' command which will be applicable soon not only to deployments but also to tasks (https://review.openstack.org/#/c/75936/);
+* Changing the benchmark scenario ''input config format'';
-* Further refactoring of the core benchmark engine, including the work around input configuration parameters validation (for a detailed description of what's going to be done, see [https://docs.google.com/a/mirantis.com/document/d/1LYUAHkZQD8W7dtlj2I3PDA6x67TiD3AMnSWG6ljsups/edit#heading=h.ae5lk415py0q|this special document]);
+* Splitting the already existing validation procedures to different context classes in a logical way;
-* After finishing some major refactoring procedues, we have also resumed the work around passing pre-created user endpoints to the DummyEngine (https://review.openstack.org/#/c/67720/) and generating the ''"stress"'' load on the cloud.
+* Implementing the ''Context class factory'' (like we did with ''deploy engines'' or ''scenario runners'');
+and many others.
+We are also going to introduce several enhancements both to the task result output (in its HTML form) and to the code (by moving some common code to a special ''utils'' module).