
Difference between revisions of "Meetings/InfraTeamMeeting"

(Agenda for next meeting)
(Agenda for next meeting)
Line 10: Line 10:
  
 
* Announcements
 
* Announcements
 +
** Gerrit User Summit happening December 2&3 virtually.
 +
** clarkb out next week. Should we skip the meeting November 23?
  
 
* Actions from last meeting
 
* Actions from last meeting
Line 16: Line 18:
  
 
* Topics
 
* Topics
** Improving OpenDev's CD throughput (clarkb 20211109)
+
** Improving OpenDev's CD throughput (clarkb 20211116)
 
*** We can run many of our jobs in parallel in all of our CD pipelines. But this requires we properly document/address dependencies
 
*** We can run many of our jobs in parallel in all of our CD pipelines. But this requires we properly document/address dependencies
 
**** Need to understand our job dependencies and properly note them in Zuul config or address them by combining jobs.
 
**** Need to understand our job dependencies and properly note them in Zuul config or address them by combining jobs.
Line 32: Line 34:
 
***** this is a follow-on that adds a base job to clone system-config, and stops the other production jobs re-cloning.
 
***** this is a follow-on that adds a base job to clone system-config, and stops the other production jobs re-cloning.
 
***** this job must run first, but then all other jobs can run in parallel, as they are all in the same buildset and using the same "view" of system-config for that particular run
 
***** this job must run first, but then all other jobs can run in parallel, as they are all in the same buildset and using the same "view" of system-config for that particular run
** Gerrit Account cleanups (clarkb 20211109)
+
** Gerrit Account cleanups (clarkb 20211116)
 
*** 33 conflicts remain. Clarkb has written notes on proposed plans for each user in the comments of review02:~clarkb/gerrit_user_cleanups/audit-results-annotated.yaml
 
*** 33 conflicts remain. Clarkb has written notes on proposed plans for each user in the comments of review02:~clarkb/gerrit_user_cleanups/audit-results-annotated.yaml
** Zuul multi scheduler setup (clarkb 20211109)
+
** Zuul multi scheduler setup (clarkb 20211116)
 
*** Zuul is currently running with two schedulers (zuul01.o.o and zuul02.o.o with zuul02.o.o being "primary")
 
*** Zuul is currently running with two schedulers (zuul01.o.o and zuul02.o.o with zuul02.o.o being "primary")
*** We have tracked down a number of bugs Monday with corvus fixing many of them.
+
*** Did first rolling restart of schedulers over the weekend.
*** Overall seems stable enough.
+
*** Zuul-web should return consistent results now as it talks to ZooKeeper directly.
*** Note the "flapping" status page can be weird.
+
** User management on our systems (clarkb 20211116)
** User management on our systems (clarkb 20211109)
 
 
*** Be explicit about uid/gid ranges: https://review.opendev.org/c/opendev/system-config/+/816869/
 
*** Be explicit about uid/gid ranges: https://review.opendev.org/c/opendev/system-config/+/816869/
 
**** 0-999 system, 1000-1999 unallocated, 2000-2999 for infra-root users, 3000-9999 host level users, 10k - 64k container users that need uids on the host as well for bind mounts.
 
**** 0-999 system, 1000-1999 unallocated, 2000-2999 for infra-root users, 3000-9999 host level users, 10k - 64k container users that need uids on the host as well for bind mounts.
*** Clean up unused bootstrapping users: https://review.opendev.org/c/opendev/system-config/+/816771
 
 
*** Give gerritbot and matrix-gerritbot a shared user: https://review.opendev.org/c/opendev/system-config/+/816769/
 
*** Give gerritbot and matrix-gerritbot a shared user: https://review.opendev.org/c/opendev/system-config/+/816769/
 
*** Eventually convert mariadb containers from uid 999 to something that makes more sense on the system.
 
*** Eventually convert mariadb containers from uid 999 to something that makes more sense on the system.
 +
** Caching openstack/openstack on our DIB images (clarkb 20211116)
 +
*** There are semi-frequent errors when updating the DIB cache for openstack/openstack
 +
*** Seems related to verifying or updating submodule content.
 +
*** One theory is that we replicate openstack/openstack's submodule updates before we push the new refs to the other repos. Then if DIB fetches in that window of time it is an error.
 +
*** Should we simply stop caching this repo entirely? It isn't really used for much.
  
 
* Open discussion
 
* Open discussion
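
The uid/gid ranges listed under "User management on our systems" can be written down as a small lookup. This is only a sketch in Python for illustration: the upper bound of the container range is an assumption (the agenda says "10k - 64k", read here as 10000 to 64000 inclusive), and the labels and the `classify_uid` helper are hypothetical names, not anything from system-config.

```python
# Proposed ranges from the agenda, as (low, high, label) with inclusive bounds.
# Assumption: "10k - 64k" is read as 10000..64000.
UID_RANGES = [
    (0, 999, "system"),
    (1000, 1999, "unallocated"),
    (2000, 2999, "infra-root"),
    (3000, 9999, "host-level"),
    (10000, 64000, "container"),
]


def classify_uid(uid):
    """Return the label of the range that contains uid, or 'out-of-range'."""
    for low, high, label in UID_RANGES:
        if low <= uid <= high:
            return label
    return "out-of-range"


if __name__ == "__main__":
    for uid in (999, 2001, 3000, 10001):
        print(uid, classify_uid(uid))
```

Under this reading, uid 999 (the one currently used by the mariadb containers) falls in the system range rather than the container range, which is the mismatch the last item of that topic wants to fix eventually.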

Revision as of 20:31, 15 November 2021

Weekly Project Infrastructure team meeting

The OpenDev Team holds public weekly meetings in #opendev-meeting on OFTC, Tuesdays at 1900 UTC. Everyone interested in infrastructure and process surrounding automated testing and deployment is encouraged to attend.

Please feel free to add agenda items (and your IRC nick in parentheses).

Agenda for next meeting

  • Announcements
    • Gerrit User Summit happening December 2&3 virtually.
    • clarkb out next week. Should we skip the meeting November 23?
  • Actions from last meeting
  • Specs Review
  • Topics
    • Improving OpenDev's CD throughput (clarkb 20211116)
      • We can run many of our jobs in parallel in all of our CD pipelines, but this requires that we properly document or address their dependencies.
        • Need to understand our job dependencies and properly note them in Zuul config or address them by combining jobs.
          • Example 1: Combine service-gitea-lb and service-gitea jobs.
          • Example 2: Combine letsencrypt and nameserver jobs
          • Example 3: Have all jobs with webserver config express a dependency on the letsencrypt job
        • Suggest we document the known job dependencies in a human readable format, then encode this into zuul, then we can switch to parallel runs.
        • https://review.opendev.org/c/opendev/system-config/+/807672
          • should list dependencies for all jobs
          • Zuul doesn't trigger on this? Not sure of the best approach to make it mergeable.
        • https://review.opendev.org/c/opendev/base-jobs/+/807807
          • currently every executor adds keys for bridge, then logs in and clones system-config before running playbooks
          • this change splits that work into separate jobs. However, production behavior remains the same because both jobs are still called.
        • https://review.opendev.org/c/opendev/system-config/+/807808
          • this is a follow-on that adds a base job to clone system-config, and stops the other production jobs re-cloning.
          • this job must run first, but then all other jobs can run in parallel, as they are all in the same buildset and using the same "view" of system-config for that particular run
    • Gerrit Account cleanups (clarkb 20211116)
      • 33 conflicts remain. Clarkb has written notes on proposed plans for each user in the comments of review02:~clarkb/gerrit_user_cleanups/audit-results-annotated.yaml
    • Zuul multi scheduler setup (clarkb 20211116)
      • Zuul is currently running with two schedulers (zuul01.o.o and zuul02.o.o with zuul02.o.o being "primary")
      • Did first rolling restart of schedulers over the weekend.
      • Zuul-web should return consistent results now as it talks to ZooKeeper directly.
    • User management on our systems (clarkb 20211116)
    • Caching openstack/openstack on our DIB images (clarkb 20211116)
      • There are semi-frequent errors when updating the DIB cache for openstack/openstack
      • Seems related to verifying or updating submodule content.
      • One theory is that we replicate openstack/openstack's submodule updates before we push the new refs to the other repos. If DIB fetches during that window, the fetch fails.
      • Should we simply stop caching this repo entirely? It isn't really used for much.
  • Open discussion
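
The plan under "Improving OpenDev's CD throughput" (document the job dependencies, encode them, then switch to parallel runs) can be illustrated with a short Python sketch that groups jobs into stages that could run concurrently. This is not OpenDev's Zuul configuration. The job names `letsencrypt`, `nameserver`, `service-gitea`, and `service-gitea-lb` come from the agenda, but `bootstrap-bridge` is a hypothetical stand-in for the base job that clones system-config, and every edge in `DEPS` is an assumption made up for the example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def parallel_stages(deps):
    """Group jobs into ordered stages.

    deps maps each job to the jobs it must wait for. Every job in a stage
    has all of its dependencies satisfied by earlier stages, so the jobs
    within one stage could run in parallel.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        stages.append(ready)
        sorter.done(*ready)
    return stages


# Hypothetical dependency map (job -> jobs it waits for); edges are assumed.
DEPS = {
    "bootstrap-bridge": [],
    "letsencrypt": ["bootstrap-bridge"],
    "nameserver": ["bootstrap-bridge"],
    "service-gitea": ["letsencrypt"],
    "service-gitea-lb": ["letsencrypt"],
}

if __name__ == "__main__":
    for number, stage in enumerate(parallel_stages(DEPS), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

With this sample map the bootstrap job forms the first stage on its own, and everything after it fans out, which mirrors the "this job must run first, but then all other jobs can run in parallel" note above. In Zuul itself the same information would be expressed as job dependencies in the project pipeline configuration rather than computed by a script.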

Upcoming Project Renames

(any additions should mention original->new full names and link to the corresponding project-config rename change in Gerrit)

Previous meetings

Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/