Revision as of 09:53, 10 September 2014

Zaqar's Drivers Performance

This wiki page contains current performance numbers per driver.

Benchmark Environment

  • 1x Load Generator
    • Hardware
      • 1x Intel Xeon E5-2680 v2 2.8 GHz
      • 32 GB RAM
      • 10 Gbps NIC
      • 32 GB SATADOM
    • Software
      • Debian Wheezy
      • Python 2.7.3
      • zaqar-bench
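  • 1x Web Head
    • config: http://paste.openstack.org/show/100592/
    • app.py: http://paste.openstack.org/show/100593/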

Scenarios

Event Broadcasting (Read-Heavy)

Suppose you have a fairly low-volume event source but a large number of event observers. In this case, the observers easily outpace the producer, making this a read-heavy workload.

Options

  • 1 producer process with 5 gevent workers
    • 1 message posted per request
  • 2 observer processes with 25 gevent workers each
    • 5 messages listed per request by the observers
  • Load distributed across 4[6] queues
  • 10-second duration

Results

  • Redis
    • Producer: 1.7 ms/req, 585 req/sec
    • Observer: 1.5 ms/req, 1254 req/sec
  • Mongo
    • Producer: 2.2 ms/req, 454 req/sec
    • Observer: 1.5 ms/req, 1224 req/sec

Event Broadcasting (Balanced)

This test uses the same number of producer and observer processes. The observers still list up to 5 messages per request[4], so they still outpace the producers, though by a smaller margin than before.

Options

  • 2 producer processes with 25 gevent workers each
    • 1 message posted per request
  • 2 observer processes with 25 gevent workers each
    • 5 messages listed per request by the observers
  • Load distributed across 4 queues
  • 10-second duration

Results

  • Redis
    • Producer: 1.4 ms/req, 1374 req/sec
    • Observer: 1.6 ms/req, 1178 req/sec
  • Mongo
    • Producer: 2.2 ms/req, 883 req/sec
    • Observer: 2.8 ms/req, 348 req/sec
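
To make "outpace" concrete, the following sketch compares how many messages the observers could list per second with how many the producers post per second, using the request rates reported for the two Event Broadcasting scenarios. It is only a rough upper bound: it assumes every list request returns the full 5 messages, which the benchmark does not guarantee, and the function and constant names are made up for this illustration.

```python
# Request rates copied from the Results lists of the two Event Broadcasting
# scenarios: (producer req/sec, observer req/sec).
RESULTS = {
    ("read-heavy", "Redis"): (585, 1254),
    ("read-heavy", "Mongo"): (454, 1224),
    ("balanced", "Redis"): (1374, 1178),
    ("balanced", "Mongo"): (883, 348),
}

MESSAGES_PER_POST = 1      # each producer request posts one message
MAX_MESSAGES_PER_LIST = 5  # each observer request lists up to five messages


def read_headroom(producer_rps, observer_rps):
    """Upper bound on messages listed per message posted."""
    posted = producer_rps * MESSAGES_PER_POST
    listed = observer_rps * MAX_MESSAGES_PER_LIST
    return listed / float(posted)


for (scenario, driver), (producer_rps, observer_rps) in sorted(RESULTS.items()):
    ratio = read_headroom(producer_rps, observer_rps)
    print("%-10s %-5s listed/posted <= %.1f" % (scenario, driver, ratio))
```

Under these assumptions the ratio stays above 1 in every case, but it is several times smaller in the balanced scenario than in the read-heavy one for both drivers.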

Point-to-Point Messaging

In this scenario I simulated one client sending messages directly to a different client. Only one queue is required in this case[5].

Options

  • 1 producer process with 1 gevent worker
    • 1 message posted per request
  • 1 observer process with 1 gevent worker
    • 1 message listed per request
  • All load sent to a single queue
  • 10-second duration

Results

  • Redis
    • Producer: 2.9 ms/req, 345 req/sec
    • Observer: 2.9 ms/req, 339 req/sec
  • Mongo
    • Producer: 5.5 ms/req, 179 req/sec
    • Observer: 3.5 ms/req, 278 req/sec
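
Because each side of this scenario runs a single gevent worker, the two reported figures can be sanity-checked against each other. The sketch below assumes the worker issues requests back to back with no pause, so that throughput is roughly the inverse of the mean latency; that assumption is not stated on this page, and the latencies are rounded to 0.1 ms, so small gaps are expected.

```python
def serial_throughput(ms_per_request):
    """Requests per second for one worker issuing requests back to back."""
    return 1000.0 / ms_per_request


# (driver, role, reported ms/req, reported req/sec) from the Results above.
REPORTED = [
    ("Redis", "Producer", 2.9, 345),
    ("Redis", "Observer", 2.9, 339),
    ("Mongo", "Producer", 5.5, 179),
    ("Mongo", "Observer", 3.5, 278),
]

for driver, role, ms, reported_rps in REPORTED:
    predicted = serial_throughput(ms)
    # Signed gap: positive means the reported rate is below the prediction.
    gap_pct = (predicted - reported_rps) / predicted * 100
    print("%s %s: predicted %.0f req/sec, reported %d (gap %.1f%%)"
          % (driver, role, predicted, reported_rps, gap_pct))
```

All four reported rates land within a few percent of the prediction, which is consistent with a single worker sending requests serially.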

Task Distribution

This test uses several producers and consumers in order to simulate distributing tasks to a worker pool. In contrast to the observer worker type, consumers claim and delete messages in such a way that each message is processed once and only once.

Options

  • 2 producer processes with 25 gevent workers each
    • 1 message posted per request
  • 2 consumer processes with 25 gevent workers each
    • 5 messages claimed per request, then deleted one by one before claiming the next batch of messages
  • Load distributed across 4 queues
  • 10-second duration
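
The consumer pattern described above (claim a batch, delete each message after handling it, then claim again) can be sketched with a toy in-memory queue. This is not the Zaqar API or the zaqar-bench code: `InMemoryQueue`, `claim`, `delete`, and `consume` are hypothetical names for illustration, and claim expiry, which a real queue service would normally provide so that messages held by a crashed worker become claimable again, is left out.

```python
import itertools
import threading


class InMemoryQueue(object):
    """Toy stand-in for a queue with claim semantics (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._messages = {}    # message id -> body
        self._claimed = set()  # ids currently reserved by a consumer
        self._ids = itertools.count(1)

    def post(self, body):
        with self._lock:
            msg_id = next(self._ids)
            self._messages[msg_id] = body
            return msg_id

    def claim(self, limit=5):
        """Atomically reserve up to `limit` unclaimed messages, oldest first."""
        with self._lock:
            batch = []
            for msg_id in sorted(self._messages):
                if msg_id in self._claimed:
                    continue
                self._claimed.add(msg_id)
                batch.append((msg_id, self._messages[msg_id]))
                if len(batch) == limit:
                    break
            return batch

    def delete(self, msg_id):
        with self._lock:
            self._claimed.discard(msg_id)
            del self._messages[msg_id]


def consume(queue, handle, batch_size=5):
    """Claim a batch, delete messages one by one, then claim the next batch."""
    processed = 0
    while True:
        batch = queue.claim(limit=batch_size)
        if not batch:
            return processed
        for msg_id, body in batch:
            handle(body)
            queue.delete(msg_id)
            processed += 1


queue = InMemoryQueue()
for n in range(12):
    queue.post("task-%d" % n)

seen = []
count = consume(queue, seen.append)
print("processed %d messages" % count)
```

Because a claimed message is hidden from other claimers until it is deleted, two consumers calling `consume` on the same queue would never handle the same message in this sketch.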

Auditing / Diagnostics

This test is the same as performed in Task Distribution, but also adds a few observers to the mix.

When testing the Redis driver, I varied whether keep-alive was enabled in the uWSGI config. The impact on performance was negligible, perhaps because of the speed of the test network and because TLS is not used in these tests.

Benchmark Config

  • 2 producer processes with 25 gevent workers each
    • 1 message posted per request
  • 2 consumer processes with 25 gevent workers each
    • 5 messages claimed per request, then deleted one by one before claiming the next batch of messages
  • 1 observer process with 5 gevent workers
    • 5 messages listed per request
  • Load distributed across 4 queues
  • 10-second duration
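
As a quick tally of the concurrency this configuration implies, the sketch below encodes the worker groups listed above and sums the gevent workers per role. The `WorkerGroup` record and its field names are illustrative and are not taken from zaqar-bench; the per-queue average additionally assumes the load is spread evenly across the 4 queues.

```python
from collections import namedtuple

# Field names are illustrative and are not taken from zaqar-bench.
WorkerGroup = namedtuple(
    "WorkerGroup", ["role", "processes", "workers_per_process"])

AUDITING_SCENARIO = [
    WorkerGroup("producer", 2, 25),
    WorkerGroup("consumer", 2, 25),
    WorkerGroup("observer", 1, 5),
]
QUEUES = 4


def workers_by_role(groups):
    """Map each role to its total number of gevent workers."""
    return {g.role: g.processes * g.workers_per_process for g in groups}


totals = workers_by_role(AUDITING_SCENARIO)
total_workers = sum(totals.values())
print(totals)
print("total workers: %d, average per queue: %.2f"
      % (total_workers, total_workers / float(QUEUES)))
```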

MongoDB

Instance Configuration

  • 3x MongoDB Nodes
    • Hardware
      • 2x Intel Xeon E5-2680 v2 2.8 GHz
      • 128 GB RAM
      • 10 Gbps NIC
      • 2x LSI Nytro WarpDrive BLP4-1600[2]
    • Software
      • Debian Wheezy
      • mongod 2.6.4
        • Default config, except setting replSet and enabling periodic logging of CPU and I/O
        • Journaling enabled
        • Profiling on message DBs enabled for requests over 10ms

Results

Redis

  • 1x Redis Node
    • Hardware
      • 2x Intel Xeon E5-2680 v2 2.8 GHz
      • 128 GB RAM
      • 10 Gbps NIC
      • 2x LSI Nytro WarpDrive BLP4-1600[2]
    • Software
      • Debian Wheezy
      • Redis 2.4.14
        • Default config (snapshotting and AOF enabled)
        • One process