
Difference between revisions of "Meetings/InfraTeamMeeting"

(Weekly Project Infrastructure team meeting)
(26 intermediate revisions by 6 users not shown)
Old line 10, new line 10:

  * Announcements
- ** OpenStack Release October 14
- ** Summit next week. PTG the week after.
- ** Wallaby cycle signing key has been activated https://review.opendev.org/760364
- *** Please sign if you haven't yet https://docs.opendev.org/opendev/system-config/latest/signing.html

  * Actions from last meeting
Old line 24, new line 20:
  *** Zuul as CD engine
  ** OpenDev
- *** Preparing to upgrade Gerrit from 2.13 to 3.2
+ *** Gerrit account and group inconsistencies
- **** review-test.opendev.org is an upgraded snapshot of production from October 1. Please check it out
+ **** https://etherpad.opendev.org/p/gerrit-user-consistency-2021 High level notes.
- **** Basic functionality seems to be working
+ **** We have 17 accounts with preferred email addresses that don't have a matching external id
- ***** logging in, git review -s, git review to push, commenting on changes, ICLA signing, replication, change searching, and so on.
+ **** Need to correct the ~642 external id issues before we can push updates to refs/meta/external-ids with Gerrit online.
- **** jeepyb bug/spec update hooks and the welcome message hook rely on database access and will need to be updated or sunsetted
+ **** Workaround is we can stop Gerrit, push to external ids directly, reindex accounts (and groups?), start gerrit, then clear accounts caches (and groups caches?)
- **** Upgrade Process
+ **** Next steps
- ***** Backup then upgrade from 2.13 to 2.16. This is our fallback midpoint checkpoint
+ ***** Identify accounts that are inactive and can be more forcefully retired. Retire these to fix those errors.
- ***** Backup again then migrate to notedb on 2.16
+ ***** Identify accounts that are unlikely to be used anymore based on activity and more forcefully retire those to fix these errors. (We can always undo specific updates to these accounts if necessary)
- ***** Upgrade to 3.2
+ ***** Work with remaining accounts to figure out how to best resolve the account conflicts. This may take some time.
- ***** Upgrade to 2.16 along with backups should be doable in a day. Then notedb migration can happen overnight with 3.2 upgrade happening on day two.
+ ***** https://review.opendev.org/c/opendev/system-config/+/777846 Collecting scripting efforts here
- **** Unknowns
+ *** Configuration tuning
- ***** Storyboard integration
+ **** Using strong refs for jgit caches
- **** Can we start talking about scheduling the outage and upgrade?
+ **** Batch user groups and threads
- *** Luca has offered to do a conference call with us. Let me know if interested and I'll include you for scheduling if/when that happens.

  * General topics
- ** PTG PLanning (clarkb 20200929)
+ ** OpenAFS cluster status (clarkb 20210302)
- *** October PTG registration is now open: https://www.openstack.org/ptg/
+ *** Upgrading servers to Bionic then Focal next.
- *** OpenDev planning stats here: https://etherpad.opendev.org/opendev-ptg-planning-oct-2020
+ *** New third db server for proper quorum.
- ** Bup and Borg Backups (clarkb 20200929)
+ ** Bup and Borg Backups (clarkb 20210302)
- *** Ethercalc to be the first borg backed up service
+ *** gitea sql db backup issues.
- ** Splitting puppet else into specific infra-prod jobs (clarkb 20200929)
+ ** Picking up steam on Puppet -> Ansible rewrites (clarkb 20210302)
- *** Should be mostly mechanical
+ *** Enable Xenial -> Bionic/Focal system upgrades
- *** Does it make sense to try and sprint this? Have several people work on getting it done in a short period of time?
+ *** https://etherpad.opendev.org/p/infra-puppet-conversions-and-xenial-upgrades Start capturing TODO list here
- ** Trusty Upgrade Progress (clarkb 20200929)
+ *** Zuul service host updates in progress now. Mergers are done. Executors in progress.
- *** Wiki updates
+ ** Deploy a new refstack.openstack.org server (kopecmartin 20210302)
+ *** Ready for testing?
+ ** Bridge disk space (clarkb 20210302)
+ *** This appears at least partially related to ansible and python caching. Should we just clear those caches then profile them?

  * Open discussion
- ** meetpad was not useable for (some?) participants from China (frickler 20201030)
- *** neutron went back to using Zoom for their last sessions because of this
- *** can we (maybe via help from the foundation involving their local members) work on improving this?
- *** would be sad to see open tools not being able to be used because of this

  == Upcoming Project Renames ==

Revision as of 00:53, 2 March 2021

Weekly Project Infrastructure team meeting

The OpenDev Team holds public weekly meetings in #opendev-meeting, Tuesdays at 1900 UTC. Everyone interested in infrastructure and process surrounding automated testing and deployment is encouraged to attend.

Please feel free to add agenda items (and your IRC nick in parentheses).

Agenda for next meeting

  • Announcements
  • Actions from last meeting
  • Specs approval
  • Priority Efforts (Standing meeting agenda items. Please expand if you have subtopics.)
    • Update Config Management
      • topic:update-cfg-mgmt
      • Zuul as CD engine
    • OpenDev
      • Gerrit account and group inconsistencies
        • https://etherpad.opendev.org/p/gerrit-user-consistency-2021 High level notes.
        • We have 17 accounts with preferred email addresses that don't have a matching external id
        • Need to correct the ~642 external id issues before we can push updates to refs/meta/external-ids with Gerrit online.
        • Workaround is we can stop Gerrit, push to external ids directly, reindex accounts (and groups?), start gerrit, then clear accounts caches (and groups caches?)
        • Next steps
          • Identify accounts that are inactive and can be more forcefully retired. Retire these to fix those errors.
          • Identify accounts that are unlikely to be used anymore based on activity and more forcefully retire those to fix these errors. (We can always undo specific updates to these accounts if necessary)
          • Work with remaining accounts to figure out how to best resolve the account conflicts. This may take some time.
          • https://review.opendev.org/c/opendev/system-config/+/777846 Collecting scripting efforts here
      • Configuration tuning
        • Using strong refs for jgit caches
        • Batch user groups and threads
  • General topics
    • OpenAFS cluster status (clarkb 20210302)
      • Upgrading servers to Bionic then Focal next.
      • New third db server for proper quorum.
    • Bup and Borg Backups (clarkb 20210302)
      • gitea sql db backup issues.
    • Picking up steam on Puppet -> Ansible rewrites (clarkb 20210302)
    • Deploy a new refstack.openstack.org server (kopecmartin 20210302)
      • Ready for testing?
    • Bridge disk space (clarkb 20210302)
      • This appears at least partially related to ansible and python caching. Should we just clear those caches then profile them?
  • Open discussion
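
For the "preferred email addresses that don't have a matching external id" item above, the kind of consistency check involved can be sketched roughly as follows. This is only an illustration under stated assumptions, not the scripting collected in the linked system-config change: the `Account` class, its field names, and the sample data are hypothetical, and the sketch assumes an exact (case-sensitive) string match between the preferred email and the emails carried by the account's external ids.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Account:
    """Hypothetical in-memory stand-in for an account record (not a Gerrit API)."""
    account_id: int
    preferred_email: Optional[str]
    # Email addresses carried by this account's external ids (assumed field).
    external_id_emails: Set[str] = field(default_factory=set)


def accounts_missing_external_id(accounts: List[Account]) -> List[int]:
    """Return ids of accounts whose preferred email matches none of their external ids."""
    missing = []
    for account in accounts:
        if account.preferred_email is None:
            # No preferred email set, so there is nothing to match against.
            continue
        # Assumption: exact string comparison, no case folding or normalization.
        if account.preferred_email not in account.external_id_emails:
            missing.append(account.account_id)
    return missing


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        Account(1, "a@example.org", {"a@example.org"}),
        Account(2, "b@example.org", {"old-b@example.org"}),
        Account(3, None),
    ]
    for account_id in accounts_missing_external_id(sample):
        print(f"account {account_id}: preferred email has no matching external id")
```

A report like this would only identify candidate accounts; deciding whether to retire an account or contact its owner remains the manual step described under "Next steps".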

Upcoming Project Renames

(any additions should mention original->new full names and link to the corresponding project-config rename change in Gerrit)

Previous meetings

Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/