Difference between revisions of "Meetings/InfraTeamMeeting"

(Agenda for next meeting)
(Agenda for next meeting)
(30 intermediate revisions by 3 users not shown)
Line 10: Line 10:
  
 
* Announcements
 
* Announcements
** OpenInfra Foundation Individual Board member election is happening now. Look for your ballot via email and vote.
+
** OpenStack Release next week then PTG the week after
** OpenInfra Live will feature OpenDev January 18, 2024.
+
** Put your PTG agenda items on the etherpad: https://etherpad.opendev.org/p/apr2024-ptg-opendev
  
 
* Actions from last meeting
 
* Actions from last meeting
Line 20: Line 20:
 
** Upgrading Bionic servers to Focal/Jammy (clarkb 20230627)
 
** Upgrading Bionic servers to Focal/Jammy (clarkb 20230627)
 
*** https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
 
*** https://etherpad.opendev.org/p/opendev-bionic-server-upgrades
** Python container updates (tonyb 20230718)
+
*** https://review.opendev.org/q/topic:jitsi_meet-jammy-update
*** https://review.opendev.org/c/opendev/system-config/+/905018 Drop Bullseye python3.11 images
+
*** Started looking at the wiki there are rough notes at: https://etherpad.opendev.org/p/opendev-bionic-server-upgrades#L58
*** zuul-operator is the last hold out now
+
** MariaDB Upgrades (clarkb 20240220)
**** https://review.opendev.org/c/zuul/zuul-operator/+/881245 is the change we need to get landed.
+
*** Relying on the container image MARIADB_AUTO_UPGRADE flag
** Updating Zuul's database server (clarkb 20231121)
+
*** Etherpad, Gitea, Gerrit, and Mailman could use upgrades.
*** Currently this is an older mysql 5.7 trove instance
+
*** https://review.opendev.org/c/opendev/system-config/+/911000 Upgrade etherpad mariadb to 10.11
*** We can move it to a self hosted instance (maybe on a dedicated host?) running out of docker like many of our other services and get it more up to date.
+
** AFS Mirror cleanups (clarkb 20240220)
*** Are there other services we should consider this for as well?
+
*** Ubuntu Xenial is next but currently busy with PTG, Release, and other tasks.
*** Research/Planning questions: https://etherpad.opendev.org/p/opendev-zuul-mysql-upgrade
+
*** Can follow up with webserver log processing to determine which other mirrors may be dead.
** EMS discontinuing legacy/consumer hosting plans (fungi 20231219)
+
** Rebuilding Gerrit Images (clarkb 20240312)
*** We have until 2024-02-07 to upgrade to a business hosting plan (prepaying a year at 10x the current price) or move elsewhere.
+
*** Gerrit 3.9.2 has finally been released.
** Followup on 20231216 incident (frickler 20231217)
+
*** https://review.opendev.org/c/opendev/system-config/+/912470 Update our 3.9 image to 3.9.2
*** Do we want to pin external images like haproxy and only bump them after testing? (Not sure that would've helped for the current issue though)
+
**** This will also rebuild our 3.8.4 image so we should try and restart prod gerrit on the new 3.8.4 image when available.
*** Use docker prune less aggressively for easier rollback?
+
*** Sounds like there are a number of bugfixes that a rebuild will get us. May be worth doing this just after the openstack release completes?
**** We do so for some services, like https://opendev.org/opendev/system-config/src/branch/master/playbooks/roles/gitea/tasks/main.yaml#L71-L76, might want to duplicate for all containers? Bump the hold time to 7d?
+
** Review02 had an oops last night (clarkb 20240326)
*** Add timestamps to zuul_reboot.log?
+
*** Found the server was shut down. After giving it a few minutes to potentially resolve itself (mostly worried about cloud action), clarkb manually started the instance and then started the containers.
**** https://opendev.org/opendev/system-config/src/branch/master/playbooks/service-bridge.yaml#L41-L55
+
*** mnaser reports it may have been an OOM event on the hosting side.
**** Also this is running on Saturdays (weekday: 6), do we want to fix the comment or the dow?
+
** Rackspace MFA Requirement (clarkb 20240312)
*** Do we want to document or implement a procedure for rolling back zuul upgrades? Or do we assume that issues can always be fixed in a forward going way?
+
*** MFA is enabled. Enforcement day is today. Please look out for any issues.
** AFS quota issues (frickler 20231217)
+
** Project Renames (clarkb 20240227)
*** mirror.openeuler has reached its quota limit and the mirror job seems to have been failing for two weeks. I'm also a bit worried that they seem to have doubled their volume over the last 12 months
+
*** https://review.opendev.org/c/opendev/system-config/+/911622 Move gerrit replication queue aside during project renames.
*** ubuntu mirrors are also getting close, but we might have another couple of months time there
+
*** Penciled in for April 19, 2024; submit your rename requests now.
*** mirror.centos-stream seems to have a steep increase in the last two months and might also run into quota limits soon
+
** Nodepool image delete after upload (clarkb 20240319)
*** project.zuul with the latest releases is getting close to its tight limit of 1GB (sic), I suggest to simply double that
+
*** Nodepool can now delete the on-disk files for images after they are uploaded. We could potentially keep only small qcow2s using this functionality to save disk space.
** Broken wheel build issues (frickler 20231217)
 
*** wheel builds for centos >=8 seem broken, with nobody maintaining these it might be better to drop them?
 
** Gitea repo-archives filling server disk (clarkb 20240109)
 
*** https://review.opendev.org/c/opendev/system-config/+/904868 update robots.txt on upstream's suggestion
 
*** https://review.opendev.org/c/opendev/system-config/+/904874 Run weekly removal of all cached repo archives
 
** OpenDev Service Coordinator Election happening soon  (clarkb 20240109)
 
  
 
* Open discussion
 
* Open discussion
Line 57: Line 51:
 
Changes should have their topic set to project-rename.
 
Changes should have their topic set to project-rename.
  
* Rename foo/example -> bar/example: https://review.opendev.org/123456
+
* Rename vexxhost/ansible-role-frrouting -> openstack/ansible-role-frrouting: https://review.opendev.org/c/openstack/project-config/+/910018
  
 
== Previous meetings ==
 
== Previous meetings ==
 
Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/
 
Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/

Revision as of 16:08, 26 March 2024

Weekly Project Infrastructure team meeting

The OpenDev Team holds public weekly meetings in #opendev-meeting on OFTC, Tuesdays at 1900 UTC. Everyone interested in infrastructure and process surrounding automated testing and deployment is encouraged to attend.

Please feel free to add agenda items (and your IRC nick in parentheses).

Agenda for next meeting

  • Actions from last meeting
  • Specs Review
  • Topics
    • Upgrading Bionic servers to Focal/Jammy (clarkb 20230627)
    • MariaDB Upgrades (clarkb 20240220)
    • AFS Mirror cleanups (clarkb 20240220)
      • Ubuntu Xenial is next but currently busy with PTG, Release, and other tasks.
      • Can follow up with webserver log processing to determine which other mirrors may be dead.
    • Rebuilding Gerrit Images (clarkb 20240312)
      • Gerrit 3.9.2 has finally been released.
      • https://review.opendev.org/c/opendev/system-config/+/912470 Update our 3.9 image to 3.9.2
        • This will also rebuild our 3.8.4 image so we should try and restart prod gerrit on the new 3.8.4 image when available.
      • Sounds like there are a number of bugfixes that a rebuild will get us. May be worth doing this just after the openstack release completes?
    • Review02 had an oops last night (clarkb 20240326)
      • Found the server was shut down. After giving it a few minutes to potentially resolve itself (mostly worried about cloud action), clarkb manually started the instance and then started the containers.
      • mnaser reports it may have been an OOM event on the hosting side.
    • Rackspace MFA Requirement (clarkb 20240312)
      • MFA is enabled. Enforcement day is today. Please look out for any issues.
    • Project Renames (clarkb 20240227)
    • Nodepool image delete after upload (clarkb 20240319)
      • Nodepool can now delete the on-disk files for images after they are uploaded. We could potentially keep only small qcow2s using this functionality to save disk space.
  • Open discussion

Upcoming Project Renames

(any additions should mention original->new full names and link to the corresponding project-config rename change in Gerrit) Changes should have their topic set to project-rename.

Previous meetings

Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/