Difference between revisions of "Large Scale Configuration Rabbit"

Latest revision as of 09:42, 1 September 2022

Please update your links! The Large Scale SIG documentation has now moved to:

https://docs.openstack.org/large-scale/

You can propose changes to the content through the openstack/large-scale git repository (https://opendev.org/openstack/large-scale).

== Large Scale Configuration of Rabbit ==
=== Introduction ===
The following information is mostly taken from a discussion we had on the mailing list.
You can read this discussion here:
http://lists.openstack.org/pipermail/openstack-discuss/2020-August/thread.html#16362
==== Clustering or not clustering? ====
When deploying RabbitMQ, you have two options:
* deploy RabbitMQ as a cluster
* deploy a single RabbitMQ node
Deploying a single node can be seen as dangerous, mostly because if that node goes down, your service goes down with it.
On the other hand, clustering RabbitMQ has some downsides that make it harder to configure and manage.
So, if your cluster ends up less reliable than a single node, the single-node solution is better.
That said, because many OpenStack services use RabbitMQ for internal communication (a.k.a. RPC), a highly available RabbitMQ setup is a must-have.
If you choose clustering, always keep an odd number of nodes in the cluster (3, 5, 7, etc.) to avoid split-brain issues.
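To see why odd sizes are preferred, here is a small arithmetic sketch. It assumes a majority-based partition handling strategy such as RabbitMQ's `pause_minority` mode (an assumption, the text above does not name one): a side of a network partition keeps serving only if it holds a strict majority of the nodes. The function names are hypothetical.

```python
def majority(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size: int) -> int:
    """Nodes that can be lost while a strict majority is still reachable."""
    return cluster_size - majority(cluster_size)


def can_split_evenly(cluster_size: int) -> bool:
    """An even-sized cluster can be cut into two halves, neither holding a majority."""
    return cluster_size % 2 == 0


for size in range(1, 8):
    print(f"{size} nodes: majority={majority(size)}, "
          f"tolerates {tolerated_failures(size)} failure(s), "
          f"even split possible={can_split_evenly(size)}")
```

Running it shows that 4 nodes tolerate one failure, exactly like 3, and 6 nodes tolerate two, exactly like 5, while the even sizes additionally allow a 50/50 split where no side has a majority.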
==== One rabbit to rule them all? ====
You can also deploy RabbitMQ in one of two ways:
* one RabbitMQ (cluster or single node) per OpenStack service
* one big RabbitMQ (cluster or single node) shared by all OpenStack services
There is no recommendation here, except that splitting RabbitMQ per service will reduce the risk, since a RabbitMQ outage then only affects one service.
=== Which version of rabbit should I run? ===
You should always try to run the latest version of RabbitMQ.
We also know that RabbitMQ versions before 3.8 may have some clustering issues, so consider running at least RabbitMQ 3.8.x.
See https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk
=== Rabbit config recommendation ===
==== Policy ====
If you plan to deploy RabbitMQ as a cluster, you will have to configure a policy.
RabbitMQ applies policies to queues and exchanges.
See here: https://www.rabbitmq.com/parameters.html
Remember that RabbitMQ applies only '''one policy to a queue''' or an exchange. So avoid having multiple policies in your deployment or, if you do, avoid overlapping policies, because you won't be able to predict which one is effective on a given queue.
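Overlaps are easy to miss because policy patterns are regular expressions. The sketch below lists, for each queue name, the policies whose pattern matches it and reports names matched by more than one policy. It assumes Python's `re` behaves like RabbitMQ's regex matching for simple patterns like these; the second policy (`example-notifications`) and the queue names are hypothetical.

```python
import re

# Hypothetical set of policies: policy name -> regex applied to queue names.
POLICIES = {
    "ha-all": r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*",
    "example-notifications": r"^notifications\.",
}


def matching_policies(queue_name: str, policies: dict[str, str]) -> list[str]:
    """Return the names of all policies whose pattern matches the queue name."""
    return [name for name, pattern in policies.items()
            if re.search(pattern, queue_name)]


def overlapping(queue_names: list[str], policies: dict[str, str]) -> dict[str, list[str]]:
    """Map each queue name matched by two or more policies to those policies."""
    result = {}
    for queue in queue_names:
        matches = matching_policies(queue, policies)
        if len(matches) > 1:
            result[queue] = matches
    return result


print(overlapping(["conductor", "notifications.info", "reply_abc"], POLICIES))
# {'notifications.info': ['ha-all', 'example-notifications']}
```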
===== pattern =====
Policies are applied based on a regex pattern. The pattern we agreed on in the mailing list discussion is the following:
'^(?!(amq\.)|(.*_fanout_)|(reply_)).*'
which sets HA on all queues except those that:
* start with amq.
* contain _fanout_
* start with reply_
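A quick way to check which queues the pattern selects is to run it against a few names. This sketch assumes Python's `re` (which supports the same negative lookahead) matches the way RabbitMQ does for this pattern; the queue names are hypothetical, one per exclusion rule plus two that should be mirrored.

```python
import re

# The policy pattern from the mailing-list discussion, copied verbatim.
HA_PATTERN = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")


def gets_ha(queue_name: str) -> bool:
    """True if the pattern selects this queue, i.e. the HA policy applies to it."""
    return HA_PATTERN.search(queue_name) is not None


for name in ["amq.gen-JzTY20BRgKO", "compute_fanout_3f2a", "reply_9b1c",
             "conductor", "my_reply_queue"]:
    print(f"{name}: {'HA' if gets_ha(name) else 'not HA'}")
```

Note that `my_reply_queue` is still mirrored: only names that ''start'' with `reply_` are excluded.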
===== parameters =====
A policy applies a set of parameters to queues and exchanges.
Here are the parameters we recommend when running RabbitMQ in cluster mode:
{
    "alternate-exchange": "unroutable",
    "expires": 86400000,
    "ha-mode": "all",
    "ha-promote-on-failure": "always",
    "ha-promote-on-shutdown": "always",
    "ha-sync-mode": "manual",
    "message-ttl": 43200000,
    "queue-master-locator": "client-local"
}
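One way to apply these parameters is `rabbitmqctl set_policy <name> <pattern> <definition>`. The sketch below only builds the argument list (it does not run anything) and derives the two durations from hours so the millisecond values can be checked. The policy name `ha-all` is a hypothetical choice, and the default virtual host and default `--apply-to` value are assumed.

```python
import json
import shlex

MS_PER_HOUR = 60 * 60 * 1000

# Recommended parameters listed above, with durations derived from hours.
DEFINITION = {
    "alternate-exchange": "unroutable",
    "expires": 24 * MS_PER_HOUR,      # 86400000 ms = 24h
    "ha-mode": "all",
    "ha-promote-on-failure": "always",
    "ha-promote-on-shutdown": "always",
    "ha-sync-mode": "manual",
    "message-ttl": 12 * MS_PER_HOUR,  # 43200000 ms = 12h
    "queue-master-locator": "client-local",
}

PATTERN = r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*"


def set_policy_argv(name: str, pattern: str, definition: dict) -> list[str]:
    """Build (but do not execute) a rabbitmqctl set_policy command line."""
    return ["rabbitmqctl", "set_policy", name, pattern, json.dumps(definition)]


argv = set_policy_argv("ha-all", PATTERN, DEFINITION)
print(shlex.join(argv))  # shell-quoted, ready to paste
```

Passing the list directly to `subprocess.run` instead of a shell string avoids having to escape the regex and the JSON by hand.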
====== alternate-exchange ======
See https://rabbitmq.com/ae.html
This is not mandatory, but it is a nice-to-have feature to collect "lost" messages (messages that could not be routed).
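Conceptually, the alternate exchange only comes into play when the original exchange has no binding for a message. The following toy, in-memory model (not RabbitMQ's API; class and queue names are hypothetical apart from `unroutable`) shows where a message ends up, assuming `unroutable` behaves like a fanout exchange with one queue bound to it.

```python
class ToyFanoutExchange:
    """Delivers every message to all bound queues, ignoring the routing key."""

    def __init__(self) -> None:
        self.queues: list[list[str]] = []

    def bind(self, queue: list[str]) -> None:
        self.queues.append(queue)

    def publish(self, routing_key: str, message: str) -> bool:
        for queue in self.queues:
            queue.append(message)
        return bool(self.queues)


class ToyDirectExchange:
    """Routes by exact routing key; unroutable messages go to the alternate exchange."""

    def __init__(self, alternate: "ToyFanoutExchange | None" = None) -> None:
        self.bindings: dict[str, list[list[str]]] = {}
        self.alternate = alternate

    def bind(self, queue: list[str], routing_key: str) -> None:
        self.bindings.setdefault(routing_key, []).append(queue)

    def publish(self, routing_key: str, message: str) -> bool:
        """Return False if the message is dropped."""
        queues = self.bindings.get(routing_key, [])
        for queue in queues:
            queue.append(message)
        if queues:
            return True
        if self.alternate is not None:
            return self.alternate.publish(routing_key, message)
        return False


# "unroutable" mirrors the policy value; the other names are made up.
lost_messages: list[str] = []
unroutable = ToyFanoutExchange()
unroutable.bind(lost_messages)

compute_queue: list[str] = []
rpc = ToyDirectExchange(alternate=unroutable)
rpc.bind(compute_queue, "compute")

rpc.publish("compute", "boot instance")
rpc.publish("computr", "message with a typo in its routing key")
print(compute_queue)  # ['boot instance']
print(lost_messages)  # ['message with a typo in its routing key']
```

Without the alternate exchange, the second message would simply be dropped (`publish` returns `False`).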
====== expires ======
Queue expiration period, in milliseconds.
By default, queues do not expire.
With the recommended value of 86400000 (24h), a queue without any consumer for 24 hours is automatically deleted.
====== ha-mode ======
See https://www.rabbitmq.com/ha.html#mirroring-arguments
Can be one of:
* all: the queue is mirrored across all nodes in the cluster
* exactly: also requires ha-params set to a count; the queue is mirrored on "count" nodes
* nodes: also requires ha-params set to a list of node names; the queue is mirrored on all nodes in that list
We recommend mirroring all queues across all nodes, so that a queue created on one node is also created on the other nodes.
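The three modes can be read as a function from cluster membership to the set of nodes holding a copy of the queue. This is a simplified sketch, not RabbitMQ's placement logic: it assumes the master lives on `master_node`, that the `exactly` count includes the master, and it picks extra mirrors in sorted order, whereas RabbitMQ chooses them itself. Node names are hypothetical.

```python
def replica_nodes(mode: str, cluster: list[str], master_node: str,
                  ha_params=None) -> set[str]:
    """Simplified model of which nodes hold the queue (master plus mirrors)."""
    if mode == "all":
        return set(cluster)
    if mode == "exactly":
        count = int(ha_params)
        # The count includes the master; with fewer nodes than count, all nodes are used.
        others = sorted(n for n in cluster if n != master_node)
        return {master_node, *others[: max(count - 1, 0)]}
    if mode == "nodes":
        # Listed names that are not cluster members are ignored in this sketch.
        listed = {n for n in ha_params if n in cluster}
        return listed or {master_node}
    raise ValueError(f"unknown ha-mode: {mode}")


cluster = ["rabbit@node1", "rabbit@node2", "rabbit@node3"]
print(sorted(replica_nodes("all", cluster, "rabbit@node1")))
print(sorted(replica_nodes("exactly", cluster, "rabbit@node1", 2)))
print(sorted(replica_nodes("nodes", cluster, "rabbit@node1", ["rabbit@node2"])))
```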
====== ha-promote-on-failure ======
* always: (default) forces moving the queue master to another node if the master dies unexpectedly
* when-synced: only allows moving the queue master to a synchronised mirror. If there is no synchronised mirror, the queue stays unavailable until the master comes back; if the master is lost for good, the queue has to be deleted and declared again
We keep the default here, to make sure that on failure a new queue master is elected and the queue keeps working.
====== ha-promote-on-shutdown ======
* always: forces moving the queue master to another node if the master is shut down
* when-synced: (default) only allows moving the queue master to a synchronised mirror
We prefer to have the queue master moved to an unsynchronised mirror in all circumstances (i.e. we choose availability of the queue over avoiding message loss due to unsynchronised mirror promotion).
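The two promotion settings combine into one decision whenever the queue master goes away. This sketch encodes the behaviour described above: `always` promotes any mirror, `when-synced` promotes only a synchronised mirror and otherwise leaves the queue unavailable. The function name and its return strings are hypothetical; the defaults follow the text (failure: `always`, shutdown: `when-synced`).

```python
def on_master_loss(event: str, policy: dict, has_synced_mirror: bool,
                   has_any_mirror: bool) -> str:
    """Decide what happens to a mirrored queue when its master goes away.

    event is "failure" or "shutdown"; policy may hold the ha-promote-on-* keys.
    """
    defaults = {"failure": "always", "shutdown": "when-synced"}
    setting = policy.get(f"ha-promote-on-{event}", defaults[event])
    if not has_any_mirror:
        return "unavailable"
    if setting == "always" or has_synced_mirror:
        return "promote mirror"
    return "unavailable"


recommended = {"ha-promote-on-failure": "always", "ha-promote-on-shutdown": "always"}
# Shutdown with only unsynchronised mirrors: defaults vs. recommended policy.
print(on_master_loss("shutdown", {}, has_synced_mirror=False, has_any_mirror=True))
print(on_master_loss("shutdown", recommended, has_synced_mirror=False, has_any_mirror=True))
```

With the recommended policy, an unsynchronised mirror is promoted in both cases, which is exactly the availability-over-message-loss trade-off described above.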
====== ha-sync-mode ======
See https://www.rabbitmq.com/ha.html#replication-factor
* automatic: a new mirror is always synchronised with the messages already in the queue, but the queue can block (no I/O) while synchronisation is running
* manual: (default) a new mirror only receives new messages; messages already in the queue won't be mirrored to it
Using manual is not a big issue for us, as OpenStack queues are empty most of the time.
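The two modes only differ for messages that are already in the queue when a mirror joins. The toy model below (plain lists, not RabbitMQ internals; the function name is hypothetical) shows that for an empty queue, the common OpenStack case according to the text above, a manually synchronised mirror ends up with the same content as an automatically synchronised one.

```python
def new_mirror_contents(existing: list[str], incoming: list[str],
                        sync_mode: str) -> list[str]:
    """Messages a newly added mirror holds after `incoming` messages are published.

    automatic: existing messages are copied first (the queue can block meanwhile).
    manual: only messages published after the mirror joined are replicated.
    """
    if sync_mode == "automatic":
        return existing + incoming
    if sync_mode == "manual":
        return list(incoming)
    raise ValueError(f"unknown ha-sync-mode: {sync_mode}")


print(new_mirror_contents([], ["m1"], "manual"))           # ['m1']
print(new_mirror_contents(["old"], ["m1"], "manual"))      # ['m1']
print(new_mirror_contents(["old"], ["m1"], "automatic"))   # ['old', 'm1']
```

With `manual`, an operator can still trigger synchronisation of a given queue explicitly with `rabbitmqctl sync_queue <queue>`.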
====== message-ttl ======
Message TTL for queues, in milliseconds.
By default, there is no TTL.
We recommend setting it to 43200000 (12h).
This is a large value (maybe too large?) but a safe one: a message not consumed within 12 hours is dropped.
====== queue-master-locator ======
Determines which node becomes the queue master when the queue is created:
* client-local: (default) pick the node that the client declaring the queue is connected to
* min-masters: pick the node hosting the minimum number of bound masters
* random: pick a random node
We recommend keeping client-local (the default value).
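The three strategies can be sketched as a small selection function. This is a simplified model under stated assumptions: `masters_per_node` (a hypothetical input) maps each node to the number of queue masters it already hosts, ties for `min-masters` are broken by node name, and RabbitMQ's own tie-breaking may differ.

```python
import random


def locate_master(strategy: str, client_node: str, masters_per_node: dict,
                  rng=None) -> str:
    """Pick the node that will host a new queue's master (simplified model)."""
    if strategy == "client-local":
        return client_node
    if strategy == "min-masters":
        # Iterating in sorted order makes min() break ties by node name.
        return min(sorted(masters_per_node), key=lambda n: masters_per_node[n])
    if strategy == "random":
        return (rng or random).choice(sorted(masters_per_node))
    raise ValueError(f"unknown queue-master-locator: {strategy}")


load = {"rabbit@node1": 40, "rabbit@node2": 12, "rabbit@node3": 25}
print(locate_master("client-local", "rabbit@node1", load))  # rabbit@node1
print(locate_master("min-masters", "rabbit@node1", load))   # rabbit@node2
```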
==== On OpenStack services ====
=== Rabbit clustering configuration ===
=== Rabbit without clustering configuration ===
TODO