Large Scale SIG/ScaleUp
The third stage in the Scaling Journey is Scale Up.
As you monitor your cluster at scale, you will see it hit the scaling limits of a single cluster. All hope is not lost, though! There are things you can put in place to push back how much a single cluster can handle, before having to resort to a more complex deployment configuration. This page aims to answer common questions about doing so.
Once you are past that stage, you are ready to proceed to the next stage of the Scaling Journey: Scale Out.
Q: Cleaning up deleted entries in my database is a bit of a hassle. Is there a tool I could use to help me with that?
A: The OSarchiver tool, developed by OVH, can help you there: see https://github.com/ovh/osarchiver/. We are working on having it maintained upstream as part of the OSops tooling.
Q: How many compute nodes can a typical OpenStack cluster contain?
A: Requests may time out when scheduling a large number of instances in a single request (> 100) once the cluster grows beyond 1000 compute nodes.
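If large single requests are what times out, one possible mitigation is to split them on the client side. The sketch below is only an illustration, not a recommendation from this page: the helper name `split_into_batches` and the default cap of 100 (taken from the figure above) are assumptions, and whether batching suits your workload depends on your deployment.

```python
from typing import List


def split_into_batches(total: int, max_per_request: int = 100) -> List[int]:
    """Split a total instance count into per-request counts.

    Hypothetical helper: each returned number is the instance count
    for one separate boot request, never larger than max_per_request.
    """
    if total < 0 or max_per_request <= 0:
        raise ValueError("total must be >= 0 and max_per_request must be > 0")
    full, rest = divmod(total, max_per_request)
    # Full-size batches first, then one smaller batch for the remainder.
    return [max_per_request] * full + ([rest] if rest else [])


print(split_into_batches(250))
```

Each number in the result would be used as the instance count of one request, issued one after another rather than all at once.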
Q: How do you decide when to add a new node to the control plane?
A: If you find that the RabbitMQ queues for a certain service keep piling up, it usually means that it is time to add more control plane workers for that service to consume the queue.
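To make "keeps piling up" concrete, here is a minimal sketch that flags queues whose depth only grows across successive samples. It assumes you already collect message counts per queue at regular intervals (for example from the output of `rabbitmqctl list_queues name messages`); the function name, the queue names, and the `min_depth` threshold are hypothetical and should be tuned to your environment.

```python
from typing import Dict, List


def backlogged_queues(samples: Dict[str, List[int]], min_depth: int = 100) -> List[str]:
    """Return names of queues whose depth never shrank between samples,
    grew overall, and ended at or above min_depth (an assumed threshold)."""
    flagged = []
    for name, depths in samples.items():
        if len(depths) < 2:
            continue  # not enough data points to see a trend
        never_shrinks = all(b >= a for a, b in zip(depths, depths[1:]))
        grew = depths[-1] > depths[0]
        if never_shrinks and grew and depths[-1] >= min_depth:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical queue names and message counts, oldest sample first.
samples = {
    "conductor": [10, 250, 900],
    "scheduler": [5, 3, 0],
    "compute.node1": [0, 0, 0],
}
print(backlogged_queues(samples))  # prints ['conductor']
```

A queue flagged over several consecutive sampling windows is a candidate for more workers on the consuming service; a single spike that later drains is not.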
- A curated collection of scaling stories, as we collect them
- Evaluation of internal messaging
- Old but still relevant/interesting: https://www.youtube.com/watch?v=bpmgxrPOrZw
- Evaluation of databases
- Scaling Neutron: https://www.youtube.com/watch?v=5WL47L1P5kE (https://www.slideshare.net/moreirabelmiro/evolution-of-openstack-networking-at-cern)
- Scaling Nova/Ironic: https://techblog.web.cern.ch/techblog/post/nova-ironic-at-scale/
- Scheduling Performance: https://techblog.web.cern.ch/techblog/post/scheduling-optimizations/
- Global scaling: https://www.openstack.org/summit/barcelona-2016/summit-schedule/events/15977/chasing-1000-nodes-scale