
Zaqar Driver Performance

This wiki page contains current performance numbers per driver.

Benchmark Environment

  • 1x Load Generator
    • Hardware
      • 1x Intel Xeon E5-2680 v2 2.8Ghz
      • 32 GB RAM
      • 10Gbps NIC
      • 32GB SATADOM
    • Software
      • Debian Wheezy
      • Python 2.7.3
      • zaqar-bench
  • 1x Web Head
    • config: http://paste.openstack.org/show/100592/
    • app.py: http://paste.openstack.org/show/100593/


MongoDB

Instance Configuration

  • 3x MongoDB Nodes
    • Hardware
      • 2x Intel Xeon E5-2680 v2 2.8Ghz
      • 128 GB RAM
      • 10Gbps NIC
      • 2x LSI Nytro WarpDrive BLP4-1600[2]
    • Software
      • Debian Wheezy
      • mongod 2.6.4
        • Default config, except setting replSet and enabling periodic logging of CPU and I/O
        • Journaling enabled
        • Profiling on message DBs enabled for requests over 10ms
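
For reference, that mongod setup boils down to a few lines of configuration plus a shell command. The snippet below is a sketch, not the file used in the tests; the replica set name and the database placeholder are assumptions.

    # mongod.conf (2.6-era INI style) -- sketch only; "zaqar" as the
    # replica set name is an assumption
    replSet = zaqar
    journal = true        # journaling enabled (the 64-bit default)
    logpath = /var/log/mongodb/mongod.log
    logappend = true

    # Profiling was enabled per message database from the mongo shell;
    # level 1 profiles any operation slower than 10 ms:
    #
    #   > use <message-db>        // database name elided
    #   > db.setProfilingLevel(1, 10)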

Redis

  • 1x Redis Node
    • Hardware
      • 2x Intel Xeon E5-2680 v2 2.8Ghz
      • 128 GB RAM
      • 10Gbps NIC
      • 2x LSI Nytro WarpDrive BLP4-1600[2]
    • Software
      • Debian Wheezy
      • Redis 2.4.14
        • Default config (snapshotting and AOF enabled)
        • One process
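
The "default config" note maps to the stock redis.conf snapshotting thresholds, with the append-only file switched on. A sketch, not the actual file used:

    # redis.conf -- sketch; these save thresholds are the Redis defaults,
    # and appendonly is the one setting flipped from its default
    save 900 1            # snapshot if >= 1 key changed in 900 s
    save 300 10
    save 60 10000
    appendonly yes        # AOF is off by default; the test had it on
    appendfsync everysec  # the default AOF fsync policy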

Scenarios

Event Broadcasting (Read-Heavy)

OK, so let's say you have a somewhat low-volume source, but tons of event observers. In this case, the observers easily outpace the producer, making this a read-heavy workload.
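
Concretely, each producer and observer request in this scenario amounts to a single call against Zaqar's v1 HTTP API, along the lines of the sketch below; the host, queue name, project ID, and TTL are placeholders, not the benchmark's actual values.

    # Sketch only -- host, queue, project ID, and TTL are placeholders.
    import json
    import uuid

    import requests

    HOST = 'http://zaqar.example.com:8888'   # placeholder endpoint
    HEADERS = {
        'Client-ID': str(uuid.uuid4()),      # the v1 API requires a Client-ID
        'X-Project-Id': 'benchmark',         # placeholder project
        'Content-Type': 'application/json',
    }

    def produce(queue):
        # Producer: post 1 message per request
        body = json.dumps([{'ttl': 300, 'body': {'event': 'created'}}])
        requests.post('%s/v1/queues/%s/messages' % (HOST, queue),
                      data=body, headers=HEADERS)

    def observe(queue):
        # Observer: list up to 5 messages per request; echo=true also
        # returns messages posted under this same Client-ID
        requests.get('%s/v1/queues/%s/messages' % (HOST, queue),
                     params={'limit': 5, 'echo': 'true'},
                     headers=HEADERS)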

Benchmark Config

  • 2 producer processes with 25 gevent workers each
    • 1 message posted per request
  • 2 consumer processes with 25 gevent workers each
    • 5 messages listed per request by the observers
  • Load distributed across 4[6] queues
  • 10-second duration

Results

   * Redis
       * Producer: 1.7 ms/req,  585 req/sec
       * Observer: 1.5 ms/req, 1254 req/sec
   * Mongo
       * Producer: 2.2 ms/req,  454 req/sec
       * Observer: 1.5 ms/req, 1224 req/sec

Event Broadcasting (Balanced)

This test uses the same number of producers and consumers. Note, however, that the observers are still listing (up to) 5 messages at a time[4], so they still outpace the producers, just not as quickly as before.

Benchmark Config

  • 2 producer processes with 25 gevent workers each
    • 1 message posted per request
  • 2 consumer processes with 25 gevent workers each
    • 5 messages listed per request by the observers
  • Load distributed across 4 queues
  • 10-second duration

Results

   * Redis
       * Producer: 1.7 ms/req,  585 req/sec
       * Observer: 1.5 ms/req, 1254 req/sec
   * Mongo
       * Producer: 2.2 ms/req,  454 req/sec
       * Observer: 1.5 ms/req, 1224 req/sec

Point-to-Point Messaging

In this scenario I simulated one client sending messages directly to a different client. Only one queue is required in this case[5].

Benchmark Config

  • 1 producer process with 1 gevent worker
    • 1 message posted per request
  • 1 observer process with 1 gevent worker
    • 1 message listed per request
  • All load sent to a single queue
  • 10-second duration

Results

   * Redis
       * Producer: 2.9 ms/req, 345 req/sec
       * Observer: 2.9 ms/req, 339 req/sec
   * Mongo
       * Producer: 5.5 ms/req, 179 req/sec
       * Observer: 3.5 ms/req, 278 req/sec

Task Distribution

This test uses several producers and consumers in order to simulate distributing tasks to a worker pool. In contrast to the observer worker type, consumers claim and delete messages in such a way that each message is processed once and only once.
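
Roughly speaking, each consumer iteration claims a batch of messages via the v1 claims API and then deletes them one by one before claiming the next batch, as in the sketch below; the host, project ID, and claim TTL/grace values are placeholders.

    # Sketch only -- host, project ID, and TTL/grace are placeholders.
    import json
    import uuid

    import requests

    HOST = 'http://zaqar.example.com:8888'
    HEADERS = {
        'Client-ID': str(uuid.uuid4()),
        'X-Project-Id': 'benchmark',
        'Content-Type': 'application/json',
    }

    def consume(queue):
        # Claim a batch of up to 5 messages
        resp = requests.post('%s/v1/queues/%s/claims?limit=5' % (HOST, queue),
                             data=json.dumps({'ttl': 300, 'grace': 60}),
                             headers=HEADERS)
        if resp.status_code != 201:
            return 0  # 204: queue is empty, nothing was claimed

        msgs = resp.json()
        for msg in msgs:
            # Each claimed message href embeds its claim_id; deleting by
            # that href is what makes processing once-and-only-once.
            requests.delete(HOST + msg['href'], headers=HEADERS)
        return len(msgs)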

Benchmark Config

  • 2 producer processes with 25 gevent workers each
    • 1 message posted per request
  • 2 consumer processes with 25 gevent workers each
    • 5 messages claimed per request, then deleted one by one before claiming the next batch of messages
  • Load distributed across 4 queues
  • 10-second duration

Results

   * Redis
       * Producer: 1.5 ms/req, 1280 req/sec
       * Consumer
           * Claim: 6.9 ms/req
           * Delete: 1.5 ms/req
           * 1257 req/sec (overall)
   * Mongo
       * Producer: 2.5 ms/req, 798 req/sec
       * Consumer
           * Claim: 8.4 ms/req
           * Delete: 2.5 ms/req
           * 813 req/sec (overall)

Auditing / Diagnostics

This test is the same as the one performed in Task Distribution, but it also adds a few observers to the mix.

When testing the Redis driver, I varied whether or not keep-alive was enabled in the uWSGI config. The impact on performance was negligible, perhaps due to the speed of the test network and the fact that TLS is not being used in these tests.
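
For context, the toggle in question is a single uWSGI option; a sketch, with everything else omitted (the full config is in the paste linked under the web head above):

    ; uwsgi.ini -- keep-alive toggle only; all other settings omitted
    [uwsgi]
    http-keepalive = true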

Benchmark Config

  • 2 producer processes with 25 gevent workers each
    • 1 message posted per request
  • 2 consumer processes with 25 gevent workers each
    • 5 messages claimed per request, then deleted one by one before claiming the next batch of messages
  • 1 observer process with 5 gevent workers
    • 5 messages listed per request
  • Load distributed across 4 queues
  • 10-second duration

Results

   * Redis (Keep-Alive)
       * Producer: 1.6 ms/req, 1275 req/sec
       * Consumer
           * Claim: 7.0 ms/req
           * Delete: 1.5 ms/req
           * 1217 req/sec (overall)
       * Observer: 3.5 ms/req, 282 req/sec
   * Redis (No Keep-Alive)
       * Producer: 1.6 ms/req, 1255 req/sec
       * Consumer
           * Claim: 7.0 ms/req
           * Delete: 1.6 ms/req
           * 1202 req/sec (overall)
       * Observer: 3.4 ms/req, 281 req/sec
   * Mongo (Keep-Alive)
       * Producer: 2.2 ms/req, 878 req/sec
       * Consumer
           * Claim: 8.2 ms/req
           * Delete: 2.3 ms/req
           * 876 req/sec (overall)
       * Observer: 7.4 ms/req, 133 req/sec