== Large Scale Configuration of Rabbit ==

Please update your links! The Large Scale SIG documentation has now moved to: https://docs.openstack.org/large-scale/

You can propose changes to the content through the [https://opendev.org/openstack/large-scale openstack/large-scale] git repository.

=== Introduction ===
The following information is mostly taken from a discussion we had on the mailing list:

http://lists.openstack.org/pipermail/openstack-discuss/2020-August/thread.html#16362

==== Clustering or not clustering? ====

When deploying RabbitMQ, you have two options:
* Deploy rabbit as a cluster
* Deploy a single rabbit node

Deploying a single rabbit node can be seen as dangerous, mostly because if that node goes down, your service goes down with it.

On the other hand, clustering rabbit has some downsides that make it harder to configure and manage.

So, if your cluster ends up less reliable than a single node, the single-node solution is better.

That said, many OpenStack services use RabbitMQ for internal communication (a.k.a. RPC), so a highly available rabbit setup is a must-have.

If you choose clustering, always keep an odd number of nodes in the cluster (3, 5, 7, etc.) to avoid split-brain issues.

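Why an odd number? This sketch assumes majority-based partition handling, for example RabbitMQ's `pause_minority` mode. Under that model, one side of a partition keeps serving only if it holds a strict majority. The arithmetic below shows that a fourth node tolerates no more failures than three nodes do, and that a 2/2 split leaves no majority on either side.

```python
def quorum(nodes: int) -> int:
    """Smallest strict majority of a cluster with `nodes` members."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be lost while a strict majority still survives."""
    return nodes - quorum(nodes)


for n in range(1, 8):
    print(f"{n} nodes: majority={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Going from 3 to 4 nodes, or from 5 to 6, adds hardware without adding failure tolerance.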
==== One rabbit to rule them all? ====

You can also consider deploying rabbit in two ways:
* one rabbit (cluster or not) for each OpenStack service
* one big rabbit (cluster or not) for all OpenStack services

There is no firm recommendation here. Splitting rabbit per service does reduce the risk, though, because a failure of one rabbit only affects the services that use it.

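In practice, each OpenStack service points at its rabbit through the oslo.messaging `transport_url` option in its configuration file. The sketch below generates one URL per service for a split layout, listing every cluster member in the URL. The service-to-cluster mapping, hostnames, credentials and the helper `transport_url()` are all hypothetical placeholders. Services without a dedicated cluster fall back to a shared one.

```python
# Hypothetical layout: hostnames and credentials are placeholders.
DEDICATED = {
    "nova": ["rabbit-nova-1", "rabbit-nova-2", "rabbit-nova-3"],
    "neutron": ["rabbit-neutron-1", "rabbit-neutron-2", "rabbit-neutron-3"],
}
SHARED = ["rabbit-1", "rabbit-2", "rabbit-3"]


def transport_url(hosts, user="openstack", password="secret", vhost="", port=5672):
    """Build a rabbit:// URL that lists every member of one rabbit cluster."""
    members = ",".join(f"{user}:{password}@{host}:{port}" for host in hosts)
    return f"rabbit://{members}/{vhost}"


for service in ("nova", "neutron", "cinder"):
    hosts = DEDICATED.get(service, SHARED)  # cinder uses the shared cluster here
    print(f"{service}: transport_url = {transport_url(hosts)}")
```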
=== Which version of rabbit should I run? ===
You should always try to run the latest version of rabbit.

We also know that rabbit versions before 3.8 may have some issues on the clustering side, so you should consider running at least RabbitMQ 3.8.x.

See https://groups.google.com/forum/#!newtopic/rabbitmq-users/rabbitmq-users/zFhmpHF2aWk

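If you check versions across many nodes with a script, compare version numbers as integer tuples rather than strings, because "3.10" sorts before "3.8" as text. This is a minimal sketch. It assumes you already have the version string in hand, however you obtain it.

```python
MINIMUM = (3, 8)  # "at least RabbitMQ 3.8.x"


def parse_version(text: str) -> tuple:
    """Turn '3.8.9' into (3, 8, 9); a pre-release suffix like '-rc.1' is ignored."""
    core = text.strip().split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_recommended(version: str) -> bool:
    """True if the major.minor version is at least MINIMUM."""
    return parse_version(version)[:2] >= MINIMUM


for v in ("3.7.28", "3.8.9", "3.10.0"):
    print(v, is_recommended(v))
```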
=== Rabbit config recommendation ===

==== Policy ====
If you plan to deploy rabbit in a cluster, then you will have to configure a policy.

RabbitMQ applies policies to queues and exchanges.
See here: https://www.rabbitmq.com/parameters.html

Remember that RabbitMQ applies only '''one policy to a queue''' or an exchange. So you should avoid having multiple policies in your deployment. If you do have several, avoid overlapping ones: when several policies match the same queue, the one with the highest priority wins, and with equal priorities you won't be able to predict which one is effective on that queue.

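The sketch below models how overlapping policies resolve: the highest priority wins, and a tie among the top matches is flagged as ambiguous. It treats each pattern as an unanchored regular-expression search on the name, which is our assumption of how RabbitMQ matches names. The policy names and the `notifications` pattern are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Policy:
    name: str
    pattern: str
    priority: int = 0


def effective_policy(queue: str, policies: list):
    """Return (winning policy or None, True if the winner is ambiguous)."""
    matching = [p for p in policies if re.search(p.pattern, queue)]
    if not matching:
        return None, False
    top = max(p.priority for p in matching)
    winners = [p for p in matching if p.priority == top]
    return winners[0], len(winners) > 1


overlapping = [Policy("ha-all", ".*"), Policy("notif", "^notifications")]
print(effective_policy("notifications.info", overlapping))  # ambiguous tie
prioritised = [Policy("ha-all", ".*"), Policy("notif", "^notifications", priority=1)]
print(effective_policy("notifications.info", prioritised))  # "notif" wins
```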
===== pattern =====
Policies are applied based on a regex pattern. The pattern we agreed on (from the mailing list discussion) is the following:

<pre>
'^(?!(amq\.)|(.*_fanout_)|(reply_)).*'
</pre>

which will set HA on all queues, except the ones that:
* start with amq.
* contain _fanout_
* start with reply_

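Before pushing the pattern to a cluster, you can check it against some queue names. The sketch uses Python's `re` with the shell quotes removed. We assume Python and RabbitMQ's Erlang regex engine treat these constructs the same way: an anchor, a negative lookahead and alternation. All sample queue names except the `amq.` prefix are hypothetical.

```python
import re

# Pattern from the mailing list discussion, without the surrounding shell quotes.
HA_PATTERN = re.compile(r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*")


def is_mirrored(queue_name: str) -> bool:
    """True if the HA policy pattern selects this queue name."""
    return HA_PATTERN.search(queue_name) is not None


samples = [
    "conductor",              # hypothetical RPC topic queue
    "compute.host-1",         # hypothetical per-host queue
    "amq.gen-JzTY20BRgKO",    # server-named queue
    "scheduler_fanout_3f2a",  # hypothetical fanout queue
    "reply_5b1c9e",           # hypothetical reply queue
]
for name in samples:
    print(f"{name:25} mirrored={is_mirrored(name)}")
```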
===== parameters =====
A policy applies a set of parameters to queues / exchanges.

Here are the parameters we recommend when running rabbit in cluster mode:
<pre>
{
  "alternate-exchange": "unroutable",
  "expires": 86400000,
  "ha-mode": "all",
  "ha-promote-on-failure": "always",
  "ha-promote-on-shutdown": "always",
  "ha-sync-mode": "manual",
  "message-ttl": 43200000,
  "queue-master-locator": "client-local"
}
</pre>

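One way to apply the pattern and parameters together is `rabbitmqctl set_policy [-p vhost] [--apply-to target] name pattern definition`. This sketch only builds the command line; it does not run it. The policy name `ha-all` and the default vhost `/` are hypothetical choices. `--apply-to all` is used because `alternate-exchange` is an exchange-level key while the other keys apply to queues.

```python
import json
import shlex

POLICY_NAME = "ha-all"  # hypothetical policy name
PATTERN = r"^(?!(amq\.)|(.*_fanout_)|(reply_)).*"
DEFINITION = {
    "alternate-exchange": "unroutable",
    "expires": 86400000,
    "ha-mode": "all",
    "ha-promote-on-failure": "always",
    "ha-promote-on-shutdown": "always",
    "ha-sync-mode": "manual",
    "message-ttl": 43200000,
    "queue-master-locator": "client-local",
}


def set_policy_argv(vhost: str = "/") -> list:
    """Build (but do not execute) a rabbitmqctl set_policy command line."""
    return [
        "rabbitmqctl", "set_policy",
        "-p", vhost,
        "--apply-to", "all",
        POLICY_NAME, PATTERN, json.dumps(DEFINITION),
    ]


print(shlex.join(set_policy_argv()))  # shell-quoted, ready to review
```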
====== alternate-exchange ======
See https://rabbitmq.com/ae.html

This is not mandatory, but it is a nice-to-have feature to collect "lost" messages (messages that could not be routed).

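The toy in-memory model below shows the idea. A message that matches no binding on a direct exchange is handed to the alternate exchange instead of being dropped. This is not RabbitMQ client code. The exchange layout and routing keys are hypothetical. Making the alternate exchange a fanout that catches everything is an assumption, though a common choice. In this model, as we understand it of RabbitMQ too, the "unroutable" exchange only helps if it exists and has a queue bound to it.

```python
class FanoutExchange:
    """Delivers every message to all bound queues, whatever the routing key."""

    def __init__(self):
        self.queues = []

    def bind(self, queue):
        self.queues.append(queue)

    def publish(self, routing_key, body):
        for queue in self.queues:
            queue.append((routing_key, body))
        return bool(self.queues)


class DirectExchange:
    """Routes by exact routing key; unroutable messages go to the alternate exchange."""

    def __init__(self, alternate=None):
        self.alternate = alternate
        self.bindings = {}  # routing key -> list of queues

    def bind(self, queue, routing_key):
        self.bindings.setdefault(routing_key, []).append(queue)

    def publish(self, routing_key, body):
        queues = self.bindings.get(routing_key, [])
        for queue in queues:
            queue.append((routing_key, body))
        if queues:
            return True
        if self.alternate is not None:
            return self.alternate.publish(routing_key, body)
        return False  # no binding and no alternate exchange: message is lost


unroutable_queue = []
unroutable = FanoutExchange()
unroutable.bind(unroutable_queue)

compute_queue = []
exchange = DirectExchange(alternate=unroutable)
exchange.bind(compute_queue, "compute.host-1")

exchange.publish("compute.host-1", "ping")
exchange.publish("compute.host-gone", "ping")  # no binding for this key
print(compute_queue)     # [('compute.host-1', 'ping')]
print(unroutable_queue)  # [('compute.host-gone', 'ping')]
```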
====== expires ======
Queue expiration period, in milliseconds.
By default, queues do not expire.

With the recommended value of 86400000 ms (24h), a queue without any consumer for 24h will be automatically deleted.

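Both `expires` and `message-ttl` take integer milliseconds, which makes the values easy to misread. This small helper checks the arithmetic behind the two recommended values.

```python
MS_PER_HOUR = 60 * 60 * 1000


def hours_to_ms(hours: float) -> int:
    """Convert hours to the millisecond integers used by expires and message-ttl."""
    return int(hours * MS_PER_HOUR)


print(hours_to_ms(24))  # 86400000, the recommended expires
print(hours_to_ms(12))  # 43200000, the recommended message-ttl
```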
====== ha-mode ======
See https://www.rabbitmq.com/ha.html#mirroring-arguments

Can be one of:
* all: queues are mirrored across all nodes
* exactly: also requires ha-params set to a count; the queue is replicated on "count" nodes
* nodes: also requires ha-params set to a list of node names; the queue is replicated on all the nodes in that list

We recommend mirroring all queues across all nodes, so a queue created on one node will also be created on the other nodes.

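This simplified sketch shows which nodes end up holding a copy of a queue under `all` and `exactly`. For `exactly`, the count is taken as the total number of replicas with the master included. If the cluster has fewer nodes than the count, the queue is mirrored everywhere. The choice of extra mirrors is modelled as random. `nodes` is deliberately left out because its placement rules are more involved. Node names are hypothetical.

```python
import random


def replica_nodes(ha_mode, cluster, master, ha_params=None, rng=random):
    """Sketch: nodes holding a copy (master plus mirrors) of one queue."""
    if ha_mode == "all":
        return set(cluster)
    if ha_mode == "exactly":
        count = int(ha_params)  # total replicas, master included
        if count < 1:
            raise ValueError("ha-params count must be >= 1")
        if count >= len(cluster):
            return set(cluster)  # not enough nodes: mirror everywhere
        others = [node for node in cluster if node != master]
        return {master, *rng.sample(others, count - 1)}
    raise ValueError(f"ha-mode not modelled in this sketch: {ha_mode}")


cluster = ["rabbit-1", "rabbit-2", "rabbit-3"]
print(sorted(replica_nodes("all", cluster, master="rabbit-1")))
print(sorted(replica_nodes("exactly", cluster, master="rabbit-1", ha_params=2)))
```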
====== ha-promote-on-failure ======
* always: (default) forces the queue master to move to another node if the master dies unexpectedly
* when-synced: only allows moving the queue master to a synchronized mirror. If there is no synchronized mirror, the queue stays unavailable until the master comes back, and if the master is lost for good, the queue will need to be removed

We keep the default here to make sure that, on failure, a new queue master is elected and the queue keeps working.

====== ha-promote-on-shutdown ======
* always: forces the queue master to move to another node if the master is shut down
* when-synced: (default) only allows moving the queue master to a synchronized mirror

We prefer to have the queue master moved to an unsynchronized mirror in all circumstances (i.e. we choose availability of the queue over avoiding message loss due to unsynchronized mirror promotion).

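The two promotion keys combine into a simple rule, sketched below under a simplified model. A mirror may take over if the setting for the event is `always`, or if the mirror is synchronized. The defaults mirror the text above. The helper and its arguments are hypothetical. The sketch shows why the recommended policy keeps queues available on controlled shutdowns too.

```python
DEFAULTS = {"ha-promote-on-failure": "always", "ha-promote-on-shutdown": "when-synced"}
RECOMMENDED = {"ha-promote-on-failure": "always", "ha-promote-on-shutdown": "always"}


def can_promote(event: str, policy: dict, mirror_synced: bool) -> bool:
    """Simplified model: may a mirror become queue master after `event`?"""
    key = {"failure": "ha-promote-on-failure", "shutdown": "ha-promote-on-shutdown"}[event]
    setting = policy.get(key, DEFAULTS[key])
    return setting == "always" or mirror_synced


# With the defaults, an unsynchronized mirror is not promoted on a clean shutdown.
print(can_promote("shutdown", {}, mirror_synced=False))           # False
print(can_promote("shutdown", RECOMMENDED, mirror_synced=False))  # True
```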
====== ha-sync-mode ======
See https://www.rabbitmq.com/ha.html#replication-factor

* automatic: the existing queue content is always replicated to a new mirror, but the queue is blocked while the synchronization runs
* manual: (default) a new queue mirror only receives new messages (messages already in the queue won't be mirrored)

Using manual is not a big issue for us, as most of the time, OpenStack queues are empty.

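This simplified model shows what a newly added mirror holds under each mode. It also shows why `manual` costs nothing when the queue is empty at the time the mirror joins. The function and its arguments are hypothetical. Real synchronization also has to deal with messages consumed while it runs.

```python
def new_mirror_contents(backlog, published_after_join, sync_mode="manual"):
    """Messages held by a freshly added mirror (simplified model)."""
    if sync_mode == "automatic":
        # Existing backlog is copied over first; the queue is blocked meanwhile.
        return list(backlog) + list(published_after_join)
    # manual: only messages published after the mirror joined
    return list(published_after_join)


print(new_mirror_contents(["a", "b"], ["c"]))               # ['c']
print(new_mirror_contents(["a", "b"], ["c"], "automatic"))  # ['a', 'b', 'c']
print(new_mirror_contents([], ["c"]) == new_mirror_contents([], ["c"], "automatic"))  # True
```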
====== message-ttl ======
Message TTL in queues, in milliseconds.

By default, there is no TTL.

We recommend setting it to 43200000 (12h).

This is huge (maybe too much?) but safe.

So a message not consumed within 12h will be dropped.

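This sketch of the rule drops messages that have waited in the queue longer than the TTL. Timestamps are in milliseconds. The queue representation and the helper names are hypothetical. The exact-boundary behaviour (`>` versus `>=`) is a simplification, not a statement about RabbitMQ internals.

```python
MESSAGE_TTL_MS = 43200000  # 12h, the recommended message-ttl


def is_expired(enqueued_at_ms: int, now_ms: int, ttl_ms: int = MESSAGE_TTL_MS) -> bool:
    """True once a message has waited in the queue longer than the TTL."""
    return now_ms - enqueued_at_ms > ttl_ms


def purge_expired(queue, now_ms):
    """Keep only (enqueued_at_ms, body) pairs that are still within the TTL."""
    return [(t, body) for t, body in queue if not is_expired(t, now_ms)]


queue = [(0, "old"), (40_000_000, "recent")]
print(purge_expired(queue, now_ms=50_000_000))  # [(40000000, 'recent')]
```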
====== queue-master-locator ======
Determines which node is elected master when a queue is created:
* client-local: (default) pick the node the declaring client is connected to
* min-masters: pick the node hosting the minimum number of bound masters
* random: pick a random node

We recommend keeping client-local (the default value).

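The three strategies can be sketched as a single selection function. `masters_per_node` stands for the current number of queue masters on each node; the values and node names are hypothetical. In this sketch a tie under `min-masters` goes to the first node in sorted order. That tie-break is our own simplification.

```python
import random


def locate_master(strategy, client_node, masters_per_node, rng=random):
    """Pick the node that will host a new queue's master (simplified model)."""
    nodes = sorted(masters_per_node)
    if strategy == "client-local":
        return client_node
    if strategy == "min-masters":
        return min(nodes, key=lambda node: masters_per_node[node])
    if strategy == "random":
        return rng.choice(nodes)
    raise ValueError(f"unknown strategy: {strategy}")


counts = {"rabbit-1": 40, "rabbit-2": 12, "rabbit-3": 25}
print(locate_master("client-local", "rabbit-1", counts))  # rabbit-1
print(locate_master("min-masters", "rabbit-1", counts))   # rabbit-2
```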
==== On OpenStack services ====

=== Rabbit clustering configuration ===

=== Rabbit without clustering configuration ===

TODO