
Large Scale Scaling Stories/2020-01-29-AlbertBraden

Here are the scaling issues I've encountered recently at Synopsys, in reverse chronological order:

Thursday 12/19/2019: "openstack server list --all-projects" does not return all VMs.

In /etc/nova/nova.conf, max_limit was left at its default, which caps the number of items Nova returns in a single API response: # max_limit = 1000

The recordset cleanup script depends on correct output from "openstack server list --all-projects", so once the cluster held more than 1000 VMs the script was working from an incomplete list.

Fix: Increased max_limit to 2000.

The recordset cleanup script now runs "openstack server list --all-projects | wc -l", compares the result to max_limit, and refuses to run if max_limit is too low. If this happens, increase max_limit so that it is greater than the number of VMs in the cluster.
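A minimal sketch of that guard in Python. It assumes the list was produced with "openstack server list --all-projects -f value -c ID", which prints one server ID per line; the default table output also contains border and header lines, so a raw wc -l count overstates the number of VMs. The function name safe_to_run is hypothetical and is not taken from the actual cleanup script. Because Nova stops at max_limit, a count equal to max_limit means the list may be truncated, so the check requires the count to be strictly below it.

```python
def safe_to_run(server_list_output: str, max_limit: int) -> bool:
    """Return True only if the server list cannot have been truncated.

    Nova caps a single response at max_limit items, so a list that is
    as long as max_limit may be incomplete and is treated as unsafe.
    """
    # Assumes one server ID per non-empty line ("-f value -c ID" output).
    vm_count = sum(1 for line in server_list_output.splitlines() if line.strip())
    return vm_count < max_limit


if __name__ == "__main__":
    sample = "id-1\nid-2\nid-3\n"
    print(safe_to_run(sample, max_limit=1000))  # True: 3 VMs, well under the cap
    print(safe_to_run(sample, max_limit=3))     # False: the count reached the cap
```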

As time permits, we need to look into paging the results: https://docs.openstack.org/api-guide/compute/paginated_collections.html
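Paging as described in the API guide linked above is marker-based: request a page with a limit, then pass the ID of the last server seen as the marker for the next request (for the all-projects view, the Nova API also takes all_tenants=1). The sketch below is a hedged illustration, not a client library: fetch_page is a hypothetical stand-in for GET /servers?limit=<n>&marker=<id>, and make_fake_api is an in-memory fake that is assumed to mimic Nova by capping each page at max_limit and adding a rel="next" link whenever a page is full. The loop stops when a page has no next link or comes back empty.

```python
from typing import Callable, List, Optional

# Hypothetical signature: fetch_page(limit, marker) -> {"servers": [...], "servers_links": [...]}
FetchPage = Callable[[int, Optional[str]], dict]


def list_all_servers(fetch_page: FetchPage, limit: int = 1000) -> List[dict]:
    """Collect every server by following marker-based pagination."""
    servers: List[dict] = []
    marker: Optional[str] = None
    while True:
        page = fetch_page(limit, marker)
        batch = page.get("servers", [])
        servers.extend(batch)
        # A rel="next" link signals that more results may follow.
        has_next = any(link.get("rel") == "next" for link in page.get("servers_links", []))
        if not batch or not has_next:
            return servers
        # The next request starts after the last server in this page.
        marker = batch[-1]["id"]


def make_fake_api(total: int, max_limit: int) -> FetchPage:
    """In-memory stand-in for GET /servers that caps each page at max_limit."""
    ids = [f"vm-{i:04d}" for i in range(total)]

    def fetch_page(limit: int, marker: Optional[str]) -> dict:
        start = ids.index(marker) + 1 if marker else 0
        size = min(limit, max_limit)
        chunk = ids[start:start + size]
        page = {"servers": [{"id": vm_id} for vm_id in chunk]}
        # Like Nova (assumed), emit a next link whenever the page is full.
        if len(chunk) == size:
            page["servers_links"] = [{"rel": "next"}]
        return page

    return fetch_page


if __name__ == "__main__":
    servers = list_all_servers(make_fake_api(total=2500, max_limit=1000))
    print(len(servers))  # 2500, fetched as pages of 1000, 1000 and 500
```

With total=2000 the second page is full, so the loop makes a third request, gets an empty page, and stops; that is why the loop checks for an empty batch as well as the next link.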

Friday 12/13/2019: ARP table filled up on pod2 controllers


Fix: Increase the neighbor (ARP) table garbage-collection thresholds via sysctl:

--- a/roles/openstack/controller/neutron/tasks/main.yml 
+++ b/roles/openstack/controller/neutron/tasks/main.yml 
@@ -243,6 +243,9 @@ 
       - { name: 'net.bridge.bridge-nf-call-iptables', value: '1' } 
       - { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' } 
+      - { name: 'net.ipv4.neigh.default.gc_thresh3', value: '4096' } 
+      - { name: 'net.ipv4.neigh.default.gc_thresh2', value: '2048' } 
+      - { name: 'net.ipv4.neigh.default.gc_thresh1', value: '1024' }
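For reference, the arp(7) man page describes gc_thresh1 as the number of entries below which garbage collection does not run, gc_thresh2 as a soft maximum that may be exceeded for 5 seconds, and gc_thresh3 as a hard maximum; the kernel defaults are 128, 512 and 1024. The sketch below compares the current entry count with those thresholds. It is a rough, hypothetical helper (the function names are not from any existing tool), and /proc/net/arp only lists entries in the current network namespace, while Neutron's router and DHCP namespaces keep their own neighbor entries that may count toward the same limits, so treat the count as a lower bound.

```python
from pathlib import Path
from typing import Dict

NEIGH_DIR = Path("/proc/sys/net/ipv4/neigh/default")
NAMES = ("gc_thresh1", "gc_thresh2", "gc_thresh3")


def arp_entry_count(proc_net_arp: str) -> int:
    """Count IPv4 neighbor entries; the first line of /proc/net/arp is a header."""
    lines = [line for line in proc_net_arp.splitlines() if line.strip()]
    return max(len(lines) - 1, 0)


def check_thresholds(thresh: Dict[str, int], entries: int) -> str:
    """Classify ARP table pressure against the gc_thresh1/2/3 sysctls."""
    t1, t2, t3 = (thresh[name] for name in NAMES)
    if not t1 <= t2 <= t3:
        return "misconfigured: expected gc_thresh1 <= gc_thresh2 <= gc_thresh3"
    if entries >= t3:
        return "full: at the hard limit (gc_thresh3)"
    if entries > t2:
        return "pressure: above the soft limit (gc_thresh2)"
    return "ok"


if __name__ == "__main__":
    arp_file = Path("/proc/net/arp")
    paths = [NEIGH_DIR / name for name in NAMES]
    # Only runs on a Linux host where these files are visible.
    if arp_file.exists() and all(p.exists() for p in paths):
        thresh = {p.name: int(p.read_text()) for p in paths}
        entries = arp_entry_count(arp_file.read_text())
        print(entries, thresh, check_thresholds(thresh, entries))
```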

12/10/2019: RPC workers were overloaded


Fix: Increase the number of RPC workers by modifying /etc/neutron/neutron.conf on the controllers:

< #rpc_workers = 1 
> rpc_workers = 8
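To confirm which value a controller's neutron.conf actually sets, the [DEFAULT] section can be read with Python's configparser. This is a rough sketch under stated assumptions: configparser is not oslo.config (which Neutron actually uses), so extra config files or config directories are not considered, the function name is hypothetical, and the fallback of 1 simply mirrors the commented default shown above.

```python
import configparser

# Mirrors the commented default in the neutron.conf excerpt above (assumption).
DEFAULT_RPC_WORKERS = 1


def effective_rpc_workers(conf_text: str) -> int:
    """Return rpc_workers from [DEFAULT], falling back to the commented default."""
    # interpolation=None: config values may contain literal '%' characters.
    # strict=False: tolerate repeated keys; the last value read wins.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # Lines starting with '#', such as "#rpc_workers = 1", are ignored as comments.
    return parser.getint("DEFAULT", "rpc_workers", fallback=DEFAULT_RPC_WORKERS)


if __name__ == "__main__":
    print(effective_rpc_workers("[DEFAULT]\n#rpc_workers = 1\n"))  # 1
    print(effective_rpc_workers("[DEFAULT]\nrpc_workers = 8\n"))   # 8
```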

October 2019: Rootwrap

Neutron was timing out because spawning a new rootwrap process (sudo plus a fresh Python interpreter) for every privileged command took too long.

Fix: Run the rootwrap daemon:

Add this line to /etc/neutron/neutron.conf on the controllers:

root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"

Add this line to /etc/sudoers.d/neutron_sudoers on the controllers:

neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
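Per sudoers(5), when a rule lists a command with arguments, the arguments must match exactly, so the command in root_helper_daemon (after "sudo") must be identical to the one in the sudoers entry, or sudo will refuse to start the daemon. The sketch below checks that the two lines above agree. The function names are hypothetical, and the sudoers parsing is deliberately naive: one command per rule, no aliases, wildcards, or escaped characters.

```python
import shlex
from typing import List


def daemon_command(root_helper_daemon: str) -> List[str]:
    """Split a root_helper_daemon value and drop the leading 'sudo'."""
    value = root_helper_daemon.strip()
    # The neutron.conf line above wraps the value in quotes; strip them first,
    # otherwise shlex would return the whole command as a single token.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    argv = shlex.split(value)
    return argv[1:] if argv and argv[0] == "sudo" else argv


def sudoers_command(sudoers_line: str) -> List[str]:
    """Extract the command after 'NOPASSWD:' from a single-command sudoers rule."""
    _, sep, command = sudoers_line.partition("NOPASSWD:")
    if not sep:
        raise ValueError("no NOPASSWD: tag in sudoers line")
    return command.split()


def rules_match(root_helper_daemon: str, sudoers_line: str) -> bool:
    """True if the daemon command and its arguments match the sudoers rule exactly."""
    return daemon_command(root_helper_daemon) == sudoers_command(sudoers_line)


if __name__ == "__main__":
    conf = '"sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"'
    rule = ("neutron ALL = (root) NOPASSWD: "
            "/usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf")
    print(rules_match(conf, rule))  # True
```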