OpsGuide/RabbitMQ troubleshooting

This section provides tips on resolving common RabbitMQ issues.

RabbitMQ service hangs
It is quite common for the RabbitMQ service to hang when it is restarted or stopped. Therefore, it is highly recommended that you manually restart RabbitMQ on each controller node.

Note

The RabbitMQ service name may vary depending on your operating system or on the vendor that supplies your RabbitMQ service.

 1. Restart the RabbitMQ service on the first controller node. The service rabbitmq-server restart command may not work in certain situations, so it is best to stop the service and then start it as two separate steps.
 2. If the service refuses to stop, run the pkill command to stop the service, then start the service again.
 3. Verify that the RabbitMQ processes are running.
 4. If there are errors, run the cluster_status command to make sure there are no partitions.

For more information, see the RabbitMQ documentation. Go back to the first step and try restarting the RabbitMQ service again. If you still have errors, remove the contents of the  directory between stopping and starting the RabbitMQ service. If there are no errors, restart the RabbitMQ service on the next controller node.
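The restart procedure above can be sketched as a small shell function. This is only an illustration under stated assumptions: the service is named rabbitmq-server, the broker runs as the rabbitmq system user, and cluster_status prints the Erlang-term format used by RabbitMQ 3.x (an empty partition list appears as `{partitions,[]}`). The function names and the 60-second timeout are hypothetical.

```shell
#!/bin/sh
# Sketch only: run as root, on one controller node at a time.
# SERVICE is an assumption; the service name varies by OS and vendor.
SERVICE="${SERVICE:-rabbitmq-server}"

# Succeeds when cluster_status output (on stdin) shows an empty partition list.
# Assumes the RabbitMQ 3.x Erlang-term output format.
no_partitions() {
    grep -qF '{partitions,[]}'
}

restart_rabbitmq() {
    # Stop and start separately instead of relying on "restart".
    # The 60-second limit is an arbitrary guard against a hanging stop.
    if ! timeout 60 service "$SERVICE" stop; then
        # The service refused to stop: kill the remaining broker processes.
        # Assumes the broker runs as the "rabbitmq" user.
        pkill -KILL -u rabbitmq
    fi
    service "$SERVICE" start || return 1

    # Verify that RabbitMQ processes are running.
    pgrep -u rabbitmq >/dev/null || return 1

    # Make sure the cluster reports no partitions.
    rabbitmqctl cluster_status | no_partitions
}
```

Calling `restart_rabbitmq` returns a non-zero status when the service does not start, no broker process is found, or a partition is reported, which is the point at which the text above says to go back to the first step.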

Since the Liberty release, OpenStack services automatically recover from a RabbitMQ outage. Consider restarting OpenStack services only after you have confirmed that RabbitMQ heartbeat functionality is enabled and that the OpenStack services are still not picking up messages from RabbitMQ queues.
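One way to check the heartbeat setting is to read it from a service configuration file. The sketch below assumes the oslo.messaging option heartbeat_timeout_threshold in the [oslo_messaging_rabbit] section. The function name is hypothetical, and the file path in the usage note is an assumption about a typical layout.

```shell
#!/bin/sh
# Print the value of heartbeat_timeout_threshold from the
# [oslo_messaging_rabbit] section of the config file given as $1.
# Prints nothing when the option is not set explicitly (the default applies).
heartbeat_threshold() {
    awk -F= '
        # Track whether the current line is inside the section of interest.
        /^\[/ { in_section = ($0 ~ /^\[oslo_messaging_rabbit\]/) }
        in_section && $1 ~ /^[ \t]*heartbeat_timeout_threshold[ \t]*$/ {
            gsub(/[ \t]/, "", $2)
            print $2
        }
    ' "$1"
}
```

For example, `heartbeat_threshold /etc/nova/nova.conf` prints the configured threshold in seconds for nova, if one is set. In oslo.messaging a value of 0 disables the heartbeat.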

RabbitMQ alerts
If you receive alerts for RabbitMQ, take the following steps to troubleshoot and resolve the issue:

1. Determine which servers the RabbitMQ alarms are coming from.
2. Attempt to boot a nova instance in the affected environment.
3. If you cannot launch an instance, continue to troubleshoot the issue.
4. Log in to each of the controller nodes for the affected environment, and check the  log files for any reported issues.
5. Look for connection issues identified in the log files.
6. For each controller node in your environment, view the  directory to check whether it contains nova*, cinder*, neutron*, or glance*. Also check for RabbitMQ message queues that are growing without being consumed, which indicates which OpenStack service is affected. Restart the affected OpenStack service.
7. For each compute node in your environment, view the  directory to check whether it contains nova*, cinder*, neutron*, or glance*. Also check for RabbitMQ message queues that are growing without being consumed, which indicates which OpenStack services are affected. Restart the affected OpenStack services.
8. Open OpenStack Dashboard and launch an instance. If the instance launches, the issue is resolved.
9. If you cannot launch an instance, check the  log files for reported connection issues.
10. Restart the RabbitMQ service on all of the controller nodes.
11. Repeat steps 7-8.
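The queue check in the steps above can be approximated with a filter over `rabbitmqctl list_queues` output. This is a rough heuristic rather than a measurement of growth: it flags queues that hold a backlog and currently have no consumers in a single snapshot. The function name and the default threshold of 100 messages are hypothetical.

```shell
#!/bin/sh
# Read tab-separated "name<TAB>messages<TAB>consumers" rows on stdin and
# print the name and depth of every queue that holds more than $1 messages
# (default 100) and has no consumers. Rows that are not in this shape,
# such as banner lines, are ignored.
unconsumed_queues() {
    awk -F'\t' -v min="${1:-100}" '
        $2 ~ /^[0-9]+$/ && $3 ~ /^[0-9]+$/ && $2 + 0 > min + 0 && $3 + 0 == 0 {
            print $1 "\t" $2
        }'
}
```

A possible invocation is `rabbitmqctl -q list_queues name messages consumers | unconsumed_queues 100`. Running it twice a few minutes apart and comparing the depths gives a better indication of whether a queue is actually growing; queue names usually hint at the owning OpenStack service.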

Excessive database management memory consumption
Since the Liberty release, OpenStack with RabbitMQ 3.4.x or 3.6.x has an issue with the management database consuming the memory allocated to RabbitMQ. This is caused by statistics collection and processing. When a single node with RabbitMQ reaches its memory threshold, all exchange and queue processing is halted until the memory alarm recovers.

To address this issue:

1. Check memory consumption.
2. Edit the  configuration file, and set the   parameter to a value between 30000 and 60000 milliseconds. Alternatively, you can turn off statistics collection by setting the  parameter to “none”.
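To see how much memory the management database is using, the memory breakdown printed by `rabbitmqctl status` can be filtered. The sketch below assumes the Erlang-term output of RabbitMQ 3.x, where the breakdown contains an entry of the form `{mgmt_db,<bytes>}`. The function name is hypothetical, and other versions may format the output differently.

```shell
#!/bin/sh
# Read `rabbitmqctl status` output on stdin and print the number of bytes
# attributed to the management database.
# Assumes a line containing "{mgmt_db,<bytes>}" (RabbitMQ 3.x format).
mgmt_db_bytes() {
    sed -n 's/.*{mgmt_db,\([0-9][0-9]*\)}.*/\1/p'
}
```

For example, `rabbitmqctl status | mgmt_db_bytes` prints a single number. Comparing it with the total in the same memory breakdown shows whether statistics collection is what pushes the node toward its memory threshold.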

File descriptor limits when scaling a cloud environment
A cloud environment that is scaled to a certain size will require the file descriptor limits to be adjusted.

Run the rabbitmqctl status command to view the current file descriptor limits.

Adjust the appropriate limits in the  configuration file.
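The file descriptor figures can be pulled out of the status output with a short filter. This sketch assumes the RabbitMQ 3.x Erlang-term format, in which `{total_limit,N}` and `{total_used,M}` appear next to each other on one line of the file_descriptors entry. The function name is hypothetical.

```shell
#!/bin/sh
# Read `rabbitmqctl status` output on stdin and print
# "<used> <limit> <percent used>" for file descriptors.
# Assumes "{total_limit,N},{total_used,M}" on a single line (3.x format).
fd_usage() {
    sed -n 's/.*{total_limit,\([0-9]*\)},{total_used,\([0-9]*\)}.*/\2 \1/p' |
        awk '$2 > 0 { printf "%d %d %d\n", $1, $2, $1 * 100 / $2 }'
}
```

For example, `rabbitmqctl status | fd_usage` prints three numbers: descriptors in use, the limit, and the percentage used (rounded down). A percentage that stays high as the cloud grows suggests that the limit needs to be raised.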