
Meetings/InfraTeamMeeting

*** https://review.opendev.org/c/opendev/system-config/+/980840
 
*** This change and its child deploy Prometheus with node_exporter to collect server metrics.
*** Napkin math says that a 1TB volume should get us about 60 days of metrics. mnasiadka also indicates that Prometheus doesn't handle longer-term metrics very well.
*** mnasiadka has proposed that we use GreptimeDB in conjunction with Prometheus to collect metrics over a longer period of time.
*** Ideally we would collect at least a year's worth of data. Can we make that happen with Prometheus?
 
*** Do we need to look at Prometheus adjacent tools like Mimir or Thanos?
 
**** Both of these appear to use Prometheus as the data collection system and then store the data in a separate system that can handle long-term storage more nimbly. For queries they speak PromQL and the Prometheus APIs, so tools like Grafana can be pointed at them as if they were Prometheus.
 
 
** Upgrade Ansible to v9 (clarkb 20260310)
 
*** https://docs.ansible.com/projects/ansible/latest/reference_appendices/release_and_maintenance.html#ansible-core-support-matrix
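The Prometheus napkin math above (a 1TB volume holding about 60 days of metrics) can be stretched to other retention targets with a minimal sketch like the following. It assumes disk use grows linearly with retention at a constant scrape load, which matches the rough sizing formula in the Prometheus storage documentation (retention time × ingested samples per second × bytes per sample). It ignores WAL, compaction headroom, and filesystem overhead. The helper names are hypothetical, and the 1TB / 60 day inputs are the only figures taken from this agenda.

```python
# Hypothetical helpers: scale the agenda's napkin math (1 TB ~= 60 days of
# metrics) to other retention targets, assuming disk use grows linearly with
# retention time at a constant ingest rate. WAL, compaction and filesystem
# overhead are ignored.

TB = 10**12  # decimal terabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def daily_ingest_bytes(volume_bytes: float, retention_days: float) -> float:
    """Bytes of TSDB data written per day implied by a volume/retention pair."""
    return volume_bytes / retention_days


def volume_needed(daily_bytes: float, target_days: float) -> float:
    """Disk space needed to keep `target_days` of data at `daily_bytes` per day."""
    return daily_bytes * target_days


if __name__ == "__main__":
    per_day = daily_ingest_bytes(1 * TB, 60)
    one_year = volume_needed(per_day, 365)
    print(f"implied ingest: {per_day / GB:.1f} GB/day")       # 16.7 GB/day
    print(f"one year of retention: {one_year / TB:.2f} TB")   # 6.08 TB
```

Under those assumptions, a year of retention on a single Prometheus server works out to roughly 6 TB (365/60 of the 1TB volume), which gives a baseline to weigh against the long-term storage options discussed above.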

Revision as of 23:24, 18 May 2026

Weekly Project Infrastructure team meeting

The OpenDev Team holds public weekly meetings in #opendev-meeting on OFTC, Tuesdays at 1900 UTC. Everyone interested in infrastructure and process surrounding automated testing and deployment is encouraged to attend.

Please feel free to add agenda items (and your IRC nick in parentheses).

Agenda for next meeting

  • Announcements
  • Actions from last meeting
  • Specs Review
  • Topics
    • Upgrading Old Servers (clarkb 20230627)
      • https://etherpad.opendev.org/p/opendev-server-upgrade-planning Central tracking document, which may link to more host-specific documents
      • Next on the list are graphite and backup servers
      • We can probably spin up new backup servers alongside the old ones, migrate the existing volumes off the old servers onto the new ones, and finally delete the old servers. We still need to double-check the borg version support matrix and what adding new backup servers will do to our backup cron job setup.
      • Remember to use launch-node's --config-drive flag when booting new Noble nodes in Rax Classic
    • Dealing with web crawlers (clarkb 20251216)
      • We have seen errors pulling the ghcr.io-hosted anubis images during some deployment jobs. If this becomes consistent, we may need to mirror the image.
      • static02 and static04 are being cleaned up.
      • Anything else to monitor or can we close this item up for now?
    • Deploying a Prometheus for Server Metrics (clarkb 20260331)
    • Upgrade Ansible to v9 (clarkb 20260310)
    • Gerrit Account Cleanups (clarkb 20260317)
      • Since the upgrade to Gerrit NoteDb we've had account inconsistencies that prevent us from pushing to the external ids ref/table directly.
      • clarkb did a bunch of work to reduce the count from hundreds of consistency errors to about 33 before stalling out.
      • The remaining tail was the most difficult, as it wasn't clear what the most appropriate fix for each account would be.
      • Years have passed since then, and those accounts are likely inactive and unused. We can rerun the Gerrit consistency check, feed the results back through our audit script, and then decide whether we need to be careful with any of these accounts.
      • Chances are we can simply disable them all and remove the conflicting external ids.
      • If we take good notes, we can reconstruct the accounts as appropriate after the fact, without Gerrit downtime, should one of these users show up and wonder what happened.
    • Gerrit 3.13 Upgrade Planning (clarkb 20260414)
      • clarkb would like to target the 3.13 upgrade for the end of May or early June. How does Friday June 5 look for others?
      • Gerrit 3.13 removes support for robot comments, so Zuul will start leaving normal inline comments instead.
      • This also means that the Zuul restarts performed as part of the upgrade process are actually required for 3.13, so that Zuul's Gerrit version detection correctly picks up the upgraded Gerrit.
      • https://etherpad.opendev.org/p/gerrit-upgrade-3.13 Beginnings of an upgrade plan document
      • clarkb will retest the upgrade process now that the 3.12.7 and 3.13.6 images are available.
    • Etherpad 3.1.0 Upgrade (clarkb 20260519)
    • Zuul reporting empty public_v6 addresses for test nodes (clarkb 20260519)
      • Zuul is reporting empty public_v6 values for test nodes that do have working IPv6 in clouds like rax classic and ovh.
      • This may be an openstack api bug, an openstacksdk bug, or a zuul-launcher bug.
      • Be aware this may impact the behavior of some test jobs.
      • We will need to dig into why this is happening to understand it better.
  • Open discussion

Upcoming Project Renames

(Any additions should mention the original->new full names and link to the corresponding project-config rename change in Gerrit.) Changes should have their topic set to project-rename.

Previous meetings

Previous meetings, with their notes and logs, can be found at http://eavesdrop.openstack.org/meetings/infra/ and earlier at http://eavesdrop.openstack.org/meetings/ci/