
Difference between revisions of "Large Scale Scaling Stories/2020-01-29-AlbertBraden"

 
 
Here are the scaling issues I've encountered recently at Synopsys, in reverse chronological order:
   
 
==== Thursday 12/19/2019: openstack server list --all-projects does not return all VMs ====
 
  
In /etc/nova/nova.conf we had the default setting, which caps each listing at 1000 results: # max_limit = 1000
  
The recordset cleanup script depends on correct output from "openstack server list --all-projects".
 
 
Fix: Increased max_limit to 2000
 
 
 
The recordset cleanup script runs "openstack server list --all-projects | wc -l", compares the result to max_limit, and refuses to run if max_limit is too low. If this happens, increase max_limit so that it is greater than the number of VMs in the cluster.
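The guard described above could be sketched roughly as follows. This is not the actual cleanup script; the function name and the idea of passing in an already-parsed VM count are assumptions made for illustration.

```python
def check_listing_is_complete(listed_vm_count: int, max_limit: int) -> None:
    """Refuse to continue when the server listing may have been cut off.

    Hypothetical helper: listed_vm_count is assumed to be the number of VMs
    returned by the listing, and max_limit the value from nova.conf.
    """
    # A listing that reaches max_limit may be missing VMs, so it cannot be
    # trusted as input for a cleanup job.
    if listed_vm_count >= max_limit:
        raise RuntimeError(
            f"listing returned {listed_vm_count} VMs but max_limit is "
            f"{max_limit}; increase max_limit before running cleanup"
        )
```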
 
 
 
As time permits we need to look into paging results: https://docs.openstack.org/api-guide/compute/paginated_collections.html
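A minimal sketch of what paging could look like, assuming the limit/marker scheme described in the linked guide. The fetch_page callable is hypothetical: it stands in for whatever client call returns one page of servers (a list of dicts that each have an "id") for a given limit and marker.

```python
from typing import Callable, Dict, Iterator, List, Optional

Page = List[Dict]


def iter_all_servers(
    fetch_page: Callable[[int, Optional[str]], Page],
    page_size: int = 500,
) -> Iterator[Dict]:
    """Yield every server by following markers until an empty page arrives."""
    marker: Optional[str] = None
    while True:
        page = fetch_page(page_size, marker)
        # Stop only on an empty page: the server may return fewer items than
        # requested (for example when page_size exceeds max_limit).
        if not page:
            return
        yield from page
        # The ID of the last item seen becomes the marker for the next page.
        marker = page[-1]["id"]
```

With this approach the total count no longer depends on max_limit being larger than the number of VMs, since each request only needs to fit one page.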
 
 
 
==== Friday 12/13/2019: ARP table filled up on pod2 controllers ====
 
https://www.cyberciti.biz/faq/centos-redhat-debian-linux-neighbor-table-overflow/
 
 
 
Fix: Increase sysctl values:
 
<source>--- a/roles/openstack/controller/neutron/tasks/main.yml
+++ b/roles/openstack/controller/neutron/tasks/main.yml
@@ -243,6 +243,9 @@
    with_items:
      - { name: 'net.bridge.bridge-nf-call-iptables', value: '1' }
      - { name: 'net.bridge.bridge-nf-call-ip6tables', value: '1' }
+      - { name: 'net.ipv4.neigh.default.gc_thresh3', value: '4096' }
+      - { name: 'net.ipv4.neigh.default.gc_thresh2', value: '2048' }
+      - { name: 'net.ipv4.neigh.default.gc_thresh1', value: '1024' }
</source>
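To see how close a controller is to these limits, something like the following sketch may help. The function names and state labels are made up for illustration; the thresholds are read from the standard /proc/sys location, and the interpretation (gc_thresh1 as the level below which garbage collection does not run, gc_thresh2 as the soft maximum, gc_thresh3 as the hard maximum) follows the kernel's documented meaning of these sysctls.

```python
from pathlib import Path
from typing import Tuple


def read_gc_thresholds(
    base: str = "/proc/sys/net/ipv4/neigh/default",
) -> Tuple[int, int, int]:
    """Read gc_thresh1, gc_thresh2 and gc_thresh3 from the given directory."""
    return tuple(
        int(Path(base, f"gc_thresh{n}").read_text()) for n in (1, 2, 3)
    )


def neighbor_table_state(
    entries: int, thresh1: int, thresh2: int, thresh3: int
) -> str:
    """Classify a neighbor-table entry count against the three thresholds."""
    if entries >= thresh3:
        return "full"  # hard maximum reached; new entries can be refused
    if entries >= thresh2:
        return "soft limit exceeded"  # tolerated only briefly by the kernel
    if entries >= thresh1:
        return "gc active"  # garbage collection may start pruning entries
    return "ok"
```

The entry count itself would have to come from elsewhere, for example by counting the lines printed by "ip -4 neigh show".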
 
 
==== 12/10/2019: RPC workers were overloaded ====
 
http://lists.openstack.org/pipermail/openstack-discuss/2019-December/011465.html
 
 
 
Fix: Increase the number of RPC workers. Modify /etc/neutron/neutron.conf on the controllers:
 
 
 
<source>148c148
< #rpc_workers = 1
---
> rpc_workers = 8</source>
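A quick way to confirm that the setting is actually uncommented is sketched below. It rests on two assumptions: that rpc_workers sits in the [DEFAULT] section, and that Python's configparser is a close enough approximation of how Neutron reads the file (Neutron itself uses oslo.config, which this sketch does not use). The function name is hypothetical.

```python
import configparser
from typing import Optional


def get_rpc_workers(conf_text: str) -> Optional[int]:
    """Return the explicit rpc_workers value, or None if it is unset."""
    # interpolation=None keeps '%' characters in other options from being
    # treated as substitutions; strict=False tolerates duplicate entries.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # Lines starting with '#' are comments, so "#rpc_workers = 1" is ignored.
    value = parser.defaults().get("rpc_workers")
    return int(value) if value is not None else None
```

For example, passing the contents of /etc/neutron/neutron.conf should return 8 after the change above and None before it.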
 
 
==== October 2019: Rootwrap ====
 
Neutron was timing out because rootwrap was taking too long to spawn.
 
 
 
Fix: Run the rootwrap daemon.
 
Add this line to /etc/neutron/neutron.conf on the controllers:
 
 
 
root_helper_daemon = "sudo /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf"
 
 
Add this line to /etc/sudoers.d/neutron_sudoers on the controllers:
 
 
 
neutron ALL = (root) NOPASSWD: /usr/bin/neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
 

Latest revision as of 09:42, 1 September 2022

Please update your links! The Large Scale SIG documentation has now moved to:

https://docs.openstack.org/large-scale/

You can propose changes to the content through the openstack/large-scale git repository.