
Difference between revisions of "Meetings/InfraTeamMeeting"

(Agenda for next meeting)
(Agenda for next meeting)
 
(34 intermediate revisions by 4 users not shown)
Line 10: Line 10:
  
 
* Announcements
 
* Announcements
** Service Coordinator Nominations Open Until February 17
 
*** https://lists.opendev.org/archives/list/service-discuss@lists.opendev.org/thread/KXICOMLOABQLOXITQK2IC2C32UNJG3LO/
 
  
 
* Actions from last meeting
 
* Actions from last meeting
Line 18: Line 16:
  
 
* Topics
 
* Topics
** Setuptools removed pkg_resources in 82.0.0 (clarkb 20260210)
 
*** https://discuss.python.org/t/pkg-resources-removal-how-to-go-from-there/106079 Upstream discussion happening here
 
*** This broke our infra-prod-service-bridge job which was fixed by pinning Setuptools in the ansible venv
 
*** We should consider upgrading Ansible as newer Ansible will use packaging instead of pkg_resources
 
 
** Upgrading Old Servers (clarkb 20230627)
 
** Upgrading Old Servers (clarkb 20230627)
 
*** https://etherpad.opendev.org/p/opendev-server-upgrade-planning Central tracking document which may link to more host specific documents
 
*** https://etherpad.opendev.org/p/opendev-server-upgrade-planning Central tracking document which may link to more host specific documents
 
*** Next on the list are graphite and backup servers
 
*** Next on the list are graphite and backup servers
*** Can probably spin up new backup servers alongside the old ones then migrate the old volumes off the old servers to the new ones and finally delete the old servers. Just need to double check borg version support matrix details and also what adding new backup servers will do to our cron job setups for backups.
+
*** backup03.ca-ymq-1.vexxhost.opendev.org has been launched and is being backed up to
 +
**** https://review.opendev.org/c/opendev/system-config/+/995420 Starting backup02 removal here
 
*** Remember to use launch-node's --config-drive flag when booting new Noble nodes in Rax Classic
 
*** Remember to use launch-node's --config-drive flag when booting new Noble nodes in Rax Classic
** Moving OpenDev Synchronous Communication to Matrix (clarkb 20250520)
+
** Deploying a Prometheus for Server Metrics (clarkb 20260331)
*** We have moved to Matrix
+
*** https://review.opendev.org/c/opendev/system-config/+/980840
*** Minimal traffic has remained on IRC
+
*** This change and its child deploy prometheus with node exporter to collect server metrics
*** Tools like Gerritbot, statusbot, and eavesdropping seem to be working well
+
*** These two changes simplify the setup and testing of prometheus and node exporter
*** Have we noticed any problems?
+
**** https://review.opendev.org/c/zuul/zuul-jobs/+/994564 manage /etc/hosts with public IPs
** Adding Bad Crawler Honeypots to our Sites (clarkb 20251216)
+
**** https://review.opendev.org/c/opendev/system-config/+/994565 Use public IPs in system-config-run jobs
*** https://review.opendev.org/c/opendev/system-config/+/974942 More aggressive lure setup via comment in the html
+
** Larger VM sizes for tests (corvus 20260618)
** MariaDB backups over TCP instead of Unix Socket (clarkb 20260127)
+
*** corvus has been testing Python 3.14 with zuul; zuul unit tests now use slightly more than 8GB under 3.14
*** This seems to be working
+
*** We have 16GB nodes, but in two clouds, rax-classic and vexxhost, they have fewer vCPUs than their 8GB counterparts, so we need to use 32GB nodes to compensate
*** The one issue we ran into was the keycloak's db listens on ::1 not 127.0.0.1.
+
*** Are we okay with this?  Alternatives?
** Updating All of Our Containers to Trixie (clarkb 20260203)
+
** Dealing with alien zuul config errors in the openstack tenant (frickler 20260617)
*** Trixie has been working well on services like Etherpad and Gerrit
+
*** Currently there are still 185 zuul config errors in the openstack tenant, despite my year-long struggle to get rid of them.
*** Let's update all of our containers to Trixie so that we can drop bookworm builds
+
*** Most of these are from "alien" repos (74 airship, 29 starlingx) that I have no motivation to fix myself with my OpenStack hats on
*** https://review.opendev.org/q/hashtag:%22opendev-trixie%22+status:open
+
*** Efforts to motivate these projects to clean up their errors themselves have mostly failed
*** Should we retire Gear rather than update its container image? Or maybe just stop building a container for geard?
+
*** I still believe that cleaning these up and being able to easily identify fresh errors is important for the health of the CI setup as a whole
** Pre PTG Planning (clarkb 20260203)
+
*** One pretty strong action would be to move these repos into their own tenant(s) or a different shared one like opendev
*** Would March 2-4 or March 9-11 work for a few days of sync up on meetpad?
+
*** I acknowledge that without further work this would break their CI setup, but I'm questioning now whether that impact would be worse than the impact the current situation has on my ability to maintain the OpenStack CI
*** We would probably do two or three hours a day and the Tuesday sync up would replace our regularly scheduled team meeting.
+
*** Other ideas or opinions are welcome
 +
*** clarkb reached out to starlingx and airship about this
 +
**** Airship indicated they would like to avoid the extra work involved in setting up a separate tenant
 +
**** clarkb pointed out to them that they would need to fix their zuul config errors and be reachable via email or Matrix at a bare minimum if we want to make that work.
 +
**** https://lists.starlingx.io/archives/list/starlingx-discuss@lists.starlingx.io/thread/YQVACUR4OCX74ZULHAJ4AD44MHGY37YI/
 +
** Gitea 1.26.4 Upgrade (clarkb 20260622)
 +
*** https://review.opendev.org/c/opendev/system-config/+/994326 Upgrade Gitea to 1.26.4
 +
*** It's time to upgrade to the next Gitea bugfix release
 +
** Bump Anubis difficulty to 5 (clarkb 20260630)
 +
*** There is some evidence that bots are regularly solving the Anubis challenge
 +
*** The challenges are slowing them down enough that services continue to be mostly responsive
 +
*** Should we increase the difficulty one level to slow them down even further?
 +
*** This will impact regular users too, which is likely our primary consideration.
 +
*** https://review.opendev.org/c/opendev/system-config/+/995096
 +
** Planning Gerrit Project Renames (clarkb 20260622)
 +
*** We have a request to rename x/cursive to openstack/cursive
 +
*** Any concern with project ownership doing that? The current group membership includes people from Johns Hopkins University and OpenStack Barbican
 +
*** Aiming for July 9 at ~2100 UTC
  
 
* Open discussion
 
* Open discussion
Line 52: Line 64:
 
Changes should have their topic set to project-rename.
 
Changes should have their topic set to project-rename.
  
* Rename example/foo -> example/bar: https://review.opendev.org/c/openstack/project-config/+/123456
+
* Rename x/cursive -> openstack/cursive: https://review.opendev.org/c/openstack/project-config/+/990122 (stephenfin, fungi)
  
 
== Previous meetings ==
 
== Previous meetings ==
 
Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/
 
Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/

Latest revision as of 14:53, 30 June 2026

Weekly Project Infrastructure team meeting

The OpenDev Team holds public weekly meetings in #opendev-meeting on OFTC, Tuesdays at 1900 UTC. Everyone interested in infrastructure and process surrounding automated testing and deployment is encouraged to attend.

Please feel free to add agenda items (and your IRC nick in parentheses).

Agenda for next meeting

  • Announcements
  • Actions from last meeting
  • Specs Review
  • Topics
    • Upgrading Old Servers (clarkb 20230627)
    • Deploying a Prometheus for Server Metrics (clarkb 20260331)
    • Larger VM sizes for tests (corvus 20260618)
      • corvus has been testing Python 3.14 with zuul; zuul unit tests now use slightly more than 8GB under 3.14
      • We have 16GB nodes, but in two clouds, rax-classic and vexxhost, they have fewer vCPUs than their 8GB counterparts, so we need to use 32GB nodes to compensate
      • Are we okay with this? Alternatives?
    • Dealing with alien zuul config errors in the openstack tenant (frickler 20260617)
      • Currently there are still 185 zuul config errors in the openstack tenant, despite my year-long struggle to get rid of them.
      • Most of these are from "alien" repos (74 airship, 29 starlingx) that I have no motivation to fix myself with my OpenStack hats on
      • Efforts to motivate these projects to clean up their errors themselves have mostly failed
      • I still believe that cleaning these up and being able to easily identify fresh errors is important for the health of the CI setup as a whole
      • One pretty strong action would be to move these repos into their own tenant(s) or a different shared one like opendev
      • I acknowledge that without further work this would break their CI setup, but I'm questioning now whether that impact would be worse than the impact the current situation has on my ability to maintain the OpenStack CI
      • Other ideas or opinions are welcome
      • clarkb reached out to starlingx and airship about this
    • Gitea 1.26.4 Upgrade (clarkb 20260622)
    • Bump Anubis difficulty to 5 (clarkb 20260630)
      • There is some evidence that bots are regularly solving the Anubis challenge
      • The challenges are slowing them down enough that services continue to be mostly responsive
      • Should we increase the difficulty one level to slow them down even further?
      • This will impact regular users too, which is likely our primary consideration.
      • https://review.opendev.org/c/opendev/system-config/+/995096
    • Planning Gerrit Project Renames (clarkb 20260622)
      • We have a request to rename x/cursive to openstack/cursive
      • Any concern with project ownership doing that? The current group membership includes people from Johns Hopkins University and OpenStack Barbican
      • Aiming for July 9 at ~2100 UTC
  • Open discussion

Upcoming Project Renames

(any additions should mention original->new full names and link to the corresponding project-config rename change in Gerrit) Changes should have their topic set to project-rename.

Previous meetings

Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/