Difference between revisions of "Large Scale SIG"

 
(17 intermediate revisions by 4 users not shown)
Line 4: Line 4:
 
'''Chairs''':  
 
'''Chairs''':  
 
   Belmiro Moreira <belmiro.moreira@cern.ch>
 
   Belmiro Moreira <belmiro.moreira@cern.ch>
  Pengju Jiao <jiaopengju@cmss.chinamobile.com>
 
 
   Thierry Carrez <thierry@openstack.org>
 
   Thierry Carrez <thierry@openstack.org>
  
The aim of the group is to facilitate running OpenStack at large scale, and help address some of the limitations operators encounter in large OpenStack clusters.
+
The aim of the group is to facilitate running OpenStack at large scale, answer questions that OpenStack operators have as they need to scale up and scale out, and help address some of the limitations operators encounter in large OpenStack clusters.
  
=== Short-term Objectives ===
 
* Scaling within one cluster, and instrumentation of the bottlenecks there (see https://etherpad.openstack.org/p/large-scale-sig-cluster-scaling)
 
* Document large scale configuration and tips & tricks (see https://etherpad.openstack.org/p/large-scale-sig-documentation)
 
  
=== Communications ===
+
== The Scaling Journey ==
 +
The work of the group is organized along the various stages in the scaling journey for someone growing an OpenStack deployment. That path has been successfully traveled by many before. The job of the SIG is to extract that knowledge and provide answers for those who come next.
 +
 
 +
For each step, the SIG collects frequently asked questions and answers, articles, and presentations. When documentation or tools are missing, we help produce them.
 +
 
 +
==== Stage 1: Configure ====
 +
Tune configuration options and optimize the parameters for your OpenStack cluster, so that it can handle additional load.
 +
 
 +
See [[Large_Scale_SIG/Configure]] for more details!
 +
 
 +
==== Stage 2: Monitor ====
 +
Set up meaningful monitoring of your cluster to detect strain and limits.
 +
 
 +
See [[Large_Scale_SIG/Monitor]] for more details!
 +
 
 +
==== Stage 3: Scale up ====
 +
As you reach those limits, what can be done to handle more load within one cluster?
 +
 
 +
See [[Large_Scale_SIG/ScaleUp]] for more details!
 +
 
 +
==== Stage 4: Scale out ====
 +
Past a given scale, you will have to scale out to multiple clusters, regions, cells or zones. What are the available options?
 +
 
 +
See [[Large_Scale_SIG/ScaleOut]] for more details!
 +
 
 +
==== Stage 5: Upgrade and maintain ====
 +
Once you have scaled out, how do you effectively upgrade and maintain your deployment?
 +
 
 +
See [[Large_Scale_SIG/UpgradeAndMaintain]] for more details!
 +
 
 +
 
 +
== ''Large Scale OpenStack'' show on OpenInfra Live ==
 +
 
 +
We regularly produce a live show on https://openinfra.live, sharing experience from large scale OpenStack deployments and taking questions from the audience.
 +
 
 +
Past episodes:
 +
 
 +
* Upgrades in Large Scale OpenStack Infrastructure with operators from Blizzard Entertainment, OVHcloud, Bloomberg, Workday, Vexxhost and CERN
 +
** [https://www.youtube.com/watch?v=yf5iFiCg_Tw Presentation video] (May 20, 2021)
 +
** [https://www.youtube.com/watch?v=C2fSy005lDs Discussion video] (Jun 10, 2021)
 +
* How OpenStack Large Clouds Manage their Spare Capacity https://youtu.be/G7oN2XdI__k (July 15, 2021)
 +
* Discussing Software-Defined Supercomputers https://youtu.be/fOJTHanmOFg (August 26, 2021)
 +
* Neutron scaling best practices https://youtu.be/4ZLqILbLIpQ (October 14, 2021)
 +
* Operators’ Tricks and Tools https://youtu.be/F_9KKAQE4fc (December 9, 2021)
 +
* Large Scale Ops Deep Dive: OVHCloud https://youtu.be/XV-L7b8lSXw (February 3, 2022)
 +
 
 +
 
 +
== Other ==
 +
 
 +
==== Ceph and OpenStack ====
 +
Ceph is very commonly associated with OpenStack, which raises a number of questions.
 +
 
 +
See [[Large_Scale_SIG/CephAndOpenStack]] for more details!
 +
 
 +
 
 +
== Join the SIG! ==
 
The Large Scale SIG mostly uses asynchronous means of communication: discussions on the [http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss openstack-discuss mailing-list] using the [largescale-sig] prefix, and various etherpads. Occasionally we may use the #openstack-operators IRC channel for synchronous discussion.
 
The Large Scale SIG mostly uses asynchronous means of communication: discussions on the [http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-discuss openstack-discuss mailing-list] using the [largescale-sig] prefix, and various etherpads. Occasionally we may use the #openstack-operators IRC channel for synchronous discussion.
  
=== Meeting ===
+
==== IRC Meeting ====
The Large Scale SIG meets on IRC typically every two weeks.  
+
The Large Scale SIG meets on IRC typically every two weeks:
See https://etherpad.openstack.org/p/large-scale-sig-meeting for details on the next meeting
+
* Current [http://eavesdrop.openstack.org/#Large_Scale_SIG_Meeting IRC meeting schedule]
 +
* Propose [https://etherpad.openstack.org/p/large-scale-sig-meeting agenda items] for our next meeting
 +
* Past meetings [http://eavesdrop.openstack.org/meetings/large_scale_sig/ summary and logs]
  
=== Past meeting summary and logs ===
+
==== Live open discussions ====
See http://eavesdrop.openstack.org/meetings/large_scale_sig/
+
In the past we had live open discussions in a video meeting around a specific topic. These were replaced by OpenInfra Live. Here are links to the old videos:
 +
* Regions vs Cells [https://www.youtube.com/watch?v=mZ2EeQ0_SEU video] [https://drive.google.com/file/d/1BzLqjCI9twXk58oz0VVfVV26wWOwkpWF/view?usp=sharing slides] with an introduction presentation from Belmiro Moreira (Feb 24, 2021)
 +
* Scaling RabbitMQ Clusters [https://www.youtube.com/watch?v=f9ingNg6eoA video] [https://speakerdeck.com/line_developers/rabbitmq-cluster-at-large-scale-openstack-infra slides] with a kickstart presentation from Gene Kuo (Mar 24, 2021)
  
=== In-person meetings ===
+
==== Past events ====
* Open Infrastructure Summit, Shanghai, Nov 4, 2019 (forum session)
+
* Project Teams Gathering meetings, Wednesday Oct 28, 7:00-8:00 UTC and 16:00-17:00 UTC: see https://etherpad.opendev.org/p/wallaby-ptg-largescale-sig
** Notes at https://etherpad.openstack.org/p/PVG-large-scale-SIG
+
* Open Infrastructure Summit, virtual, Oct 20, 2020 (forum session): see https://etherpad.opendev.org/p/vSummit2020_OpenStackScalingStory
 +
* OpenDev (virtual), June 2020
 +
* Open Infrastructure Summit, Shanghai, Nov 4, 2019 (forum session): see https://etherpad.openstack.org/p/PVG-large-scale-SIG

Latest revision as of 13:16, 6 April 2022

Status: Active

Chairs:

 Belmiro Moreira <belmiro.moreira@cern.ch>
 Thierry Carrez <thierry@openstack.org>

The aim of the group is to facilitate running OpenStack at large scale, answer questions that OpenStack operators have as they need to scale up and scale out, and help address some of the limitations operators encounter in large OpenStack clusters.


The Scaling Journey

The work of the group is organized along the various stages in the scaling journey for someone growing an OpenStack deployment. That path has been successfully traveled by many before. The job of the SIG is to extract that knowledge and provide answers for those who come next.

For each step, the SIG collects frequently asked questions and answers, articles, and presentations. When documentation or tools are missing, we help produce them.

Stage 1: Configure

Tune configuration options and optimize the parameters for your OpenStack cluster, so that it can handle additional load.

See Large_Scale_SIG/Configure for more details!

Stage 2: Monitor

Set up meaningful monitoring of your cluster to detect strain and limits.

See Large_Scale_SIG/Monitor for more details!

Stage 3: Scale up

As you reach those limits, what can be done to handle more load within one cluster?

See Large_Scale_SIG/ScaleUp for more details!

Stage 4: Scale out

Past a given scale, you will have to scale out to multiple clusters, regions, cells or zones. What are the available options?

See Large_Scale_SIG/ScaleOut for more details!

Stage 5: Upgrade and maintain

Once you have scaled out, how do you effectively upgrade and maintain your deployment?

See Large_Scale_SIG/UpgradeAndMaintain for more details!


Large Scale OpenStack show on OpenInfra Live

We regularly produce a live show on https://openinfra.live, sharing experience from large scale OpenStack deployments and taking questions from the audience.

Past episodes:


Other

Ceph and OpenStack

Ceph is very commonly associated with OpenStack, which raises a number of questions.

See Large_Scale_SIG/CephAndOpenStack for more details!


Join the SIG!

The Large Scale SIG mostly uses asynchronous means of communication: discussions on the openstack-discuss mailing-list using the [largescale-sig] prefix, and various etherpads. Occasionally we may use the #openstack-operators IRC channel for synchronous discussion.

IRC Meeting

The Large Scale SIG meets on IRC typically every two weeks:

Live open discussions

In the past we had live open discussions in a video meeting around a specific topic. These were replaced by OpenInfra Live. Here are links to the old videos:

  • Regions vs Cells (video, slides), with an introduction presentation from Belmiro Moreira (Feb 24, 2021)
  • Scaling RabbitMQ Clusters (video, slides), with a kickstart presentation from Gene Kuo (Mar 24, 2021)

Past events