Large Scale SIG
Belmiro Moreira <email@example.com> Thierry Carrez <firstname.lastname@example.org>
The aim of the group is to facilitate running OpenStack at large scale, answer questions that OpenStack operators have as they need to scale up and scale out, and help address some of the limitations operators encounter in large OpenStack clusters.
The Scaling Journey
The work of the group is organized along the various stages in the scaling journey for someone growing an OpenStack deployment. Many have successfully traveled that path before. The job of the SIG is to extract that knowledge and provide answers for those who come next.
For each stage, the SIG collects frequently asked questions and answers, articles, and presentations. When documentation or tools are missing, we help produce them.
Stage 1: Configure
Tune configuration options and optimize the parameters for your OpenStack cluster, so that it can handle additional load.
See Large_Scale_SIG/Configure for more details!
Stage 2: Monitor
Set up meaningful monitoring of your cluster to detect strain and limits.
See Large_Scale_SIG/Monitor for more details!
Stage 3: Scale up
As you reach those limits, what can be done to handle more load within one cluster?
See Large_Scale_SIG/ScaleUp for more details!
Stage 4: Scale out
Past a given scale, you will have to scale out to multiple clusters, regions, cells or zones. What are the available options?
See Large_Scale_SIG/ScaleOut for more details!
Stage 5: Upgrade and maintain
Once you have scaled out, how do you effectively upgrade and maintain your deployment?
See Large_Scale_SIG/UpgradeAndMaintain for more details!
Large Scale OpenStack show on OpenInfra Live
We regularly produce a live show on https://openinfra.live, sharing experience from large scale OpenStack deployments and taking questions from the audience.
- Upgrades in Large Scale OpenStack Infrastructure with operators from Blizzard Entertainment, OVHcloud, Bloomberg, Workday, Vexxhost and CERN
- Presentation video (May 20, 2021)
- Discussion video (Jun 10, 2021)
- How OpenStack Large Clouds Manage their Spare Capacity https://youtu.be/G7oN2XdI__k (July 15, 2021)
- Discussing Software-Defined Supercomputers https://youtu.be/fOJTHanmOFg (August 26, 2021)
- Neutron scaling best practices https://youtu.be/4ZLqILbLIpQ (October 14, 2021)
- Operators’ Tricks and Tools https://youtu.be/F_9KKAQE4fc (December 9, 2021)
- Large Scale Ops Deep Dive: OVHcloud https://youtu.be/XV-L7b8lSXw (February 3, 2022)
Ceph and OpenStack
Ceph is very commonly associated with OpenStack, which raises a number of questions.
See Large_Scale_SIG/CephAndOpenStack for more details!
Join the SIG!
The Large Scale SIG communicates mostly asynchronously: through discussions on the openstack-discuss mailing list using the [largescale-sig] prefix, and through various etherpads. Occasionally we may use the #openstack-operators IRC channel for synchronous discussion.
The Large Scale SIG meets on IRC typically every two weeks:
- Current IRC meeting schedule
- Propose agenda items for our next meeting
- Past meetings summary and logs
Live open discussions
In the past, we held live open discussions in video meetings, each around a specific topic. Those were replaced by the show on openinfra.live. Here are links to the old videos and sessions:
- Regions vs Cells (video, slides), with an introductory presentation from Belmiro Moreira (Feb 24, 2021)
- Scaling RabbitMQ Clusters (video, slides), with a kickstart presentation from Gene Kuo (Mar 24, 2021)
- Project Teams Gathering meetings, Wednesday Oct 28, 7-8 UTC and 16-17 UTC: see https://etherpad.opendev.org/p/wallaby-ptg-largescale-sig
- Open Infrastructure Summit, virtual, Oct 20, 2020 (forum session): see https://etherpad.opendev.org/p/vSummit2020_OpenStackScalingStory
- OpenDev (virtual), June 2020
- Open Infrastructure Summit, Shanghai, Nov 4, 2019 (forum session): see https://etherpad.openstack.org/p/PVG-large-scale-SIG