Difference between revisions of "Meetings/InfraTeamMeeting"

(Agenda for next meeting)
(Agenda for next meeting)
 
(197 intermediate revisions by 6 users not shown)
Line 10: Line 10:
  
 
* Announcements
 
* Announcements
** Holiday at the end of the week for some of us.
 
  
 
* Actions from last meeting
 
* Actions from last meeting
Line 18: Line 17:
 
* Topics
 
* Topics
 
** Upgrading Old Servers (clarkb 20230627)
 
** Upgrading Old Servers (clarkb 20230627)
*** https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
+
*** https://etherpad.opendev.org/p/opendev-server-upgrade-planning Central tracking document which may link to more host specific documents
**** wiki.openstack.org: https://etherpad.opendev.org/p/opendev-mediawiki-upgrade
+
*** Next on the list are graphite and backup servers
**** tonyb looking at cacti after wiki
+
*** backup03.ca-ymq-1.vexxhost.opendev.org has been launched and is being backed up to
*** https://etherpad.opendev.org/p/opendev-focal-server-upgrades
+
**** https://review.opendev.org/c/opendev/system-config/+/995420 Starting backup02 removal here
**** tonyb expects to try a simple focal -> noble upgrade this week
+
*** Remember to use launch-node's --config-drive flag when booting new Noble nodes in Rax Classic
** AFS Mirror cleanups (clarkb 20240220)
+
** Deploying a Prometheus for Server Metrics (clarkb 20260331)
*** Ubuntu Xenial cleanups are starting to show up under topic:drop-ubuntu-xenial
+
*** https://review.opendev.org/c/opendev/system-config/+/980840
*** CentOS 8 Stream EOLd and jobs can no longer successfully run there. Cleanup is happening under topic:drop-centos-8-stream
+
*** This change and its child deploy prometheus with node exporter to collect server metrics
**** Projects like glance are still holding onto these old jobs despite no hope of them ever succeeding. Do we want to say Monday July 8 we're force merging cleanups on our side and projects will have to deal with the fallout since they aren't fixing things now?
+
*** These two changes simplify the setup and testing of prometheus and node exporter
*** Can followup with webserver log processing to determine which other mirrors may be dead.
+
**** https://review.opendev.org/c/zuul/zuul-jobs/+/994564 manage /etc/hosts with public IPs
** Gitea 1.22 Upgrade Planning (clarkb 20240528)
+
**** https://review.opendev.org/c/opendev/system-config/+/994565 Use public IPs in system-config-run jobs
*** There is a Gitea 1.22.0 release now. Once we have the general upgrade working we can test the doctor tool to fixup the DB case sensitivity.
+
** Larger VM sizes for tests (corvus 20260618)
*** https://review.opendev.org/c/opendev/system-config/+/920580
+
*** corvus has been testing python 3.14 with zuul; zuul unit tests now use slightly more than 8GB under 3.14
** OpenMetal Cloud Rebuild (clarkb 20240604)
+
*** We have 16gb nodes, but in two clouds, rax-classic and vexxhost, they have fewer vcpus than their 8gb counterparts, so we need to use 32gb nodes to compensate
*** Waiting on a response from OpenMetal about the best way to proceed. I don't want to try and fix things and step on their toes or vice versa.
+
*** Are we okay with this?  Alternatives?
** Testing Rackspace's New Cloud Offering (clarkb 20240604)
+
** Dealing with alien zuul config errors in the openstack tenant (frickler 20260617)
*** Rax reached out via a ticket to our nodepool account indicating we can test out their new offering. They are specifically looking for feedback
+
*** Currently there are still 185 zuul config errors in the openstack tenant, despite my year-long struggle to get rid of them.
*** Clarkb is working on some dialogue after Rax reached out a bit more directly.
+
*** Most of these are from "alien" repos (74 airship, 29 starlingx) that I have no motivation to fix myself with my OpenStack hats on
** Nodepool in Zuul (corvus 20240702)
+
*** Efforts to motivate these projects to clean up their errors themselves have mostly failed
*** Nodepool in Zuul work is proceeding
+
*** I still believe that cleaning these up and being able to easily identify fresh errors is important for the health of the CI setup as a whole
*** Creating Opendev image build jobs
+
*** One pretty strong action would be to move these repos into their own tenant(s) or a different shared one like opendev
*** Running zuul-launcher in shadow mode
+
*** I acknowledge that without further work this would break their CI setup, but I'm questioning now whether that impact would be worse than the impact the current situation has on my ability to maintain the OpenStack CI
** Collating backlog items from $everyone (tonyb 20240107)
+
*** Other ideas or opinions are welcome
*** Review of approved specs: https://docs.opendev.org/opendev/infra-specs/latest/ Seems like that could do with some love
+
*** clarkb reached out to starlingx and airship about this
*** Would it be helpful to have a '#noteit' IRC action that'd record the timestamp for context and add it to an editable list?
+
**** Airship indicated they would like to avoid the extra work involved in setting up a separate tenant
*** There is a pretty solid risk of this becoming long and write only :/
+
**** clarkb pointed out to them that they would need to fix their zuul config errors and be reachable via email or matrix at a bare minimum if we want to make that work.
 +
**** https://lists.starlingx.io/archives/list/starlingx-discuss@lists.starlingx.io/thread/YQVACUR4OCX74ZULHAJ4AD44MHGY37YI/
 +
** Gitea 1.26.4 Upgrade (clarkb 20260622)
 +
*** https://review.opendev.org/c/opendev/system-config/+/994326 Upgrade Gitea to 1.26.4
 +
*** It's time to upgrade to the next Gitea bugfix release
 +
** Bump Anubis difficulty to 5 (clarkb 20260630)
 +
*** There is some evidence that bots are regularly solving the Anubis challenge
 +
*** The challenges are slowing them down enough that services continue to be mostly responsive
 +
*** Should we increase the difficulty one level to slow them down even further?
 +
*** This will impact regular users too, which is likely the primary consideration.
 +
*** https://review.opendev.org/c/opendev/system-config/+/995096
 +
** Planning Gerrit Project Renames (clarkb 20260622)
 +
*** We have a request to rename x/cursive to openstack/cursive
 +
*** Any concern with project ownership doing that? The current group membership includes people from Johns Hopkins University and OpenStack Barbican
 +
*** Aiming for July 9 at ~2100 UTC
  
 
* Open discussion
 
* Open discussion
Line 51: Line 64:
 
Changes should have their topic set to project-rename.
 
Changes should have their topic set to project-rename.
  
* Rename example/foo -> example/bar: https://review.opendev.org/c/openstack/project-config/+/123456
+
* Rename x/cursive -> openstack/cursive: https://review.opendev.org/c/openstack/project-config/+/990122 (stephenfin, fungi)
  
 
== Previous meetings ==
 
== Previous meetings ==
 
Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/
 
Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/

Latest revision as of 14:53, 30 June 2026

Weekly Project Infrastructure team meeting

The OpenDev Team holds public weekly meetings in #opendev-meeting on OFTC, Tuesdays at 1900 UTC. Everyone interested in infrastructure and process surrounding automated testing and deployment is encouraged to attend.

Please feel free to add agenda items (and your IRC nick in parentheses).

Agenda for next meeting

  • Announcements
  • Actions from last meeting
  • Specs Review
  • Topics
    • Upgrading Old Servers (clarkb 20230627)
    • Deploying a Prometheus for Server Metrics (clarkb 20260331)
    • Larger VM sizes for tests (corvus 20260618)
      • corvus has been testing python 3.14 with zuul; zuul unit tests now use slightly more than 8GB under 3.14
      • We have 16gb nodes, but in two clouds, rax-classic and vexxhost, they have fewer vcpus than their 8gb counterparts, so we need to use 32gb nodes to compensate
      • Are we okay with this? Alternatives?
    • Dealing with alien zuul config errors in the openstack tenant (frickler 20260617)
      • Currently there are still 185 zuul config errors in the openstack tenant, despite my year-long struggle to get rid of them.
      • Most of these are from "alien" repos (74 airship, 29 starlingx) that I have no motivation to fix myself with my OpenStack hats on
      • Efforts to motivate these projects to clean up their errors themselves have mostly failed
      • I still believe that cleaning these up and being able to easily identify fresh errors is important for the health of the CI setup as a whole
      • One pretty strong action would be to move these repos into their own tenant(s) or a different shared one like opendev
      • I acknowledge that without further work this would break their CI setup, but I'm questioning now whether that impact would be worse than the impact the current situation has on my ability to maintain the OpenStack CI
      • Other ideas or opinions are welcome
      • clarkb reached out to starlingx and airship about this
    • Gitea 1.26.4 Upgrade (clarkb 20260622)
    • Bump Anubis difficulty to 5 (clarkb 20260630)
      • There is some evidence that bots are regularly solving the Anubis challenge
      • The challenges are slowing them down enough that services continue to be mostly responsive
      • Should we increase the difficulty one level to slow them down even further?
      • This will impact regular users too, which is likely the primary consideration.
      • https://review.opendev.org/c/opendev/system-config/+/995096
    • Planning Gerrit Project Renames (clarkb 20260622)
      • We have a request to rename x/cursive to openstack/cursive
      • Any concern with project ownership doing that? The current group membership includes people from Johns Hopkins University and OpenStack Barbican
      • Aiming for July 9 at ~2100 UTC
  • Open discussion

Upcoming Project Renames

(any additions should mention original->new full names and link to the corresponding project-config rename change in Gerrit) Changes should have their topic set to project-rename.

Previous meetings

Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/