
Infrastructure Status

  • 2024-10-09 21:59:21 UTC started zuul-launcher on zl01
  • 2024-10-08 14:15:55 UTC Rebooted bridge01
  • 2024-10-07 18:53:06 UTC Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org, reducing volume utilization from 91% to 75% (see the prune sketch after this list)
  • 2024-10-07 12:52:51 UTC Rebooted nb04 to clear leaked loop devices after cleaning up /opt
  • 2024-09-18 15:50:15 UTC Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org reducing volume utilization from 91% to 75%
  • 2024-09-17 14:10:10 UTC Deleted the pyparsing-update branch from openstack/cliff (formerly 993972982739b2db3028278cb4d99be2a713d09c) at Stephen Finucane's request
  • 2024-09-10 13:05:13 UTC Released gerritlib 0.11.0
  • 2024-09-03 14:40:15 UTC Removed the late Ilya Etingof from the collaborators list on https://pypi.org/p/sushy-oem-idrac/ at Dmitry Tantsur's request in IRC
  • 2024-08-30 01:20:43 UTC Pruned backups on backup02.ca-ymq-1.vexxhost bringing volume usage down from 90% to 75%
  • 2024-08-08 15:26:21 UTC Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org, bringing volume utilization down from 90% to 73%
  • 2024-07-30 17:43:26 UTC Ran nodepool erase routine to clean up left over records for linaro-regionone cloud resources after its removal
  • 2024-07-25 20:52:36 UTC Repaired data corruption for a repository on the gitea09 and gitea14 backends, root cause seems to be from an unexpected hypervisor host outage
  • 2024-07-24 22:34:42 UTC Closed persistent SSH API connections from Gerrit account 34377 in order to end a Git fetch task that had been hung for the past month (see the connection cleanup sketch after this list)
  • 2024-07-24 19:14:22 UTC Converted database contents to utf8mb4 and case-sensitive collations on each gitea backend using the gitea doctor convert tool included in v1.22 (see the sketch after this list)
  • 2024-07-22 14:32:48 UTC Repaired data corruption for two repositories on the gitea12 backend, root cause seems to be from an unexpected hypervisor host outage
  • 2024-07-19 18:46:10 UTC Pruned backups on backup02.ca-ymq-1.vexxhost bringing volume usage down from 91% to 75%
  • 2024-07-18 14:41:05 UTC restarted zuul to pick up 924114
  • 2024-07-17 13:58:10 UTC restarted schedulers/web for 924292
  • 2024-07-16 19:51:58 UTC restarted zuul for retry fixes
  • 2024-07-15 15:32:28 UTC restarted zuul schedulers/web to pick up 924152
  • 2024-07-14 01:42:34 UTC restarted zuul-scheduler and -web to pick up 924116
  • 2024-07-12 13:13:25 UTC Added Mohammed Naser to inactive x/ospurge reviewer groups following positive feedback from existing group members
  • 2024-07-11 01:28:56 UTC Bypass Zuul to merge https://review.opendev.org/c/openstack/openstack-zuul-jobs/+/922649
  • 2024-07-10 23:04:53 UTC restarted zuul to pick up 923903
  • 2024-07-10 20:11:15 UTC restarted zuul to pick up 923874
  • 2024-07-01 14:52:32 UTC Manually completed Debian mirror updates, which had been hung due to timeouts from the latest stable point release over the weekend
  • 2024-06-27 15:07:19 UTC Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org reducing volume utilization from 93% to 72%
  • 2024-06-25 17:10:46 UTC restarted zuul schedulers/web
  • 2024-06-24 21:49:37 UTC restarted zuul schedulers and web to clear spurious secret duplication errors
  • 2024-06-07 16:46:53 UTC Gerrit will be restarted at around 17:45 UTC to pick up some small image updates
  • 2024-05-31 16:02:51 UTC Gerrit on review.opendev.org is being upgraded to version 3.9 and will be offline. We have allocated an hour for the outage window lasting until 1700 UTC
  • 2024-05-29 22:02:47 UTC paused all image builds: https://paste.opendev.org/show/bxOJQAnEGwCHmeBs4tiU/
  • 2024-05-29 21:47:40 UTC deleted the ubuntu-jammy-e57f97d15e0b4878afd4d262b4f8ba75 dib build
  • 2024-05-29 21:39:40 UTC paused ubuntu-jammy builds and deleted all most-recent ubuntu-jammy images due to git package update
  • 2024-05-27 22:00:13 UTC Set Storpool CI's Gerrit account (15670) back to active at their request after they indicated changes have been made to the CI system to address prior concerns
  • 2024-05-22 17:03:02 UTC There will be a short Gerrit outage while we update to the latest 3.8 release in preparation for next week's 3.9 upgrade.
  • 2024-05-22 14:42:21 UTC Deleted the obsolete logscraper01.openstack.org server instance and associated DNS records
  • 2024-05-18 15:02:22 UTC Yanked the 2015.1.0, 2015.1.0rc1, 2015.1.0b3, 2015.1.0b2 and 2015.1.0b1 releases of Mistral from PyPI to alleviate user confusion around version ordering
  • 2024-05-15 16:48:34 UTC Deleted 41 obsolete/unused DNS records from the openstack.org domain: https://paste.opendev.org/show/bk9sSCPn5j4dZEbDGz8A/
  • 2024-05-10 17:15:52 UTC There will be a short Gerrit downtime while we update a database and our container image
  • 2024-05-09 21:10:09 UTC Rotated wiki's ssl cert
  • 2024-05-02 20:04:59 UTC There will be a short etherpad outage while the service restarts to accommodate new configuration.
  • 2024-05-02 15:45:54 UTC Also added noonedeadpunk to freezer-stable-maint
  • 2024-05-02 15:40:42 UTC Added noonedeadpunk to freezer-core, freezer-tempest-core, and freezer-release to enact https://review.opendev.org/c/openstack/governance/+/914911
  • 2024-04-22 20:03:16 UTC Gerrit will be offline for a short time while we rename a project repo. See https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/KP6NCOKJEYRGFD5FS26CZPVLEKFSY2ZO/ for more details
  • 2024-04-18 17:29:27 UTC stopped apache on cacti for maintenance
  • 2024-04-17 14:09:58 UTC Rebooted and deleted repository archives on gitea11 after they filled up its rootfs
  • 2024-04-17 13:58:40 UTC Rebooted and deleted repository archives on gitea10 after they filled up its rootfs
  • 2024-04-15 18:37:41 UTC The Gerrit service on review.opendev.org will be offline momentarily for a restart in order to apply a patch update
  • 2024-04-15 00:25:54 UTC Restarted Gerrit and Apache on review02 to unstick Gerrit, which was in an unhappy state consuming all the CPU
  • 2024-04-11 13:45:12 UTC force-merged 913418,6 to work around zuul config error issue
  • 2024-04-10 20:35:04 UTC Pruned backup02.ca-ymq-1.vexxhost reducing backup volume utilization there from 90% to 70%
  • 2024-04-09 22:31:43 UTC restarted zuul schedulers and web to pick up split sql query change
  • 2024-04-07 17:42:19 UTC restarted zuul-web and zuul-scheduler to pick up https://review.opendev.org/915203
  • 2024-04-07 00:00:44 UTC Rebooted afs01.dfw after a block storage outage in its cloud provider at 21:30 UTC, and manually re-released all volumes to make sure none were left in a broken state
  • 2024-04-06 16:50:53 UTC Zuul job results from 14:10 to 16:43 on 2024-04-06 may be unavailable in Zuul's web UI. Recheck changes affected by this if necessary.
  • 2024-04-05 17:30:14 UTC Reset acme.sh on nb02 as a full disk appears to have corrupted it
  • 2024-03-27 16:56:08 UTC Set Storpool CI's Gerrit account (15670) to inactive due to lack of response to our emails asking them to address Zuul configuration issues.
  • 2024-03-26 03:56:53 UTC review02.opendev.org was in a shutdown state for nearly an hour. Manually starting the instance then manually restarting containers appears to have restored services
  • 2024-03-26 03:29:43 UTC OpenDev is experiencing connectivity issues to several key services including review.opendev.org. Admins are monitoring.
  • 2024-03-11 15:23:23 UTC Pruned backup volume on backup02.ca-ymq-1.vexxhost.opendev.org reducing it from 92% to 69% utilization
  • 2024-03-07 16:56:15 UTC Jobs that fail due to being unable to resolve mirror.dfw.rackspace.opendev.org can be rechecked. This error was an unexpected side effect of some nodepool configuration changes which have been reverted.
  • 2024-03-06 14:01:13 UTC Restarted the ptgbot and statusbot containers forcing them to reconnect to IRC since both came loose during a netsplit at 00:49 UTC
  • 2024-03-04 22:23:31 UTC began graceful restart of zuul cluster
  • 2024-02-26 22:35:24 UTC Gerrit on review.opendev.org will be restarted to perform a minor upgrade to the service.
  • 2024-02-21 13:36:47 UTC force-merged https://review.opendev.org/c/openstack/openstack-ansible-ops/+/909655 in order to fix train-eol tagging
  • 2024-02-14 15:08:06 UTC Increased mirror.centos-stream AFS volume from 320GB to 350GB while we work out how to trim down the subset of packages we rsync
  • 2024-02-12 21:28:44 UTC restarted zuul-web to pick up webui fixes
  • 2024-02-11 15:42:08 UTC restarted all of zuul post bundle-refactor; cleared zk state except config cache.
  • 2024-02-05 23:42:46 UTC Gracefully restarted the gitea cluster to pick up new database connection limit configs
  • 2024-01-23 21:04:32 UTC Restarting web services on review.opendev.org to clear stale workers
  • 2024-01-23 20:04:30 UTC OpenID logins for the Gerrit WebUI on review.opendev.org should be working normally again since the recent service restart
  • 2024-01-23 19:38:13 UTC The Gerrit service on review.opendev.org will be offline momentarily for a restart, in order to attempt to restore OpenID login functionality
  • 2024-01-23 08:56:20 UTC all new logins to https://review.opendev.org are currently failing. investigation is ongoing, please be patient
  • 2024-01-20 19:22:09 UTC restarted zuul schedulers/web to pick up new github api key
  • 2024-01-13 21:54:09 UTC restarted all of zuul on 3a4ad1c46ca55fdc8afb4997482a0695f6ac9ec2
  • 2024-01-12 20:06:31 UTC Retired Gerrit account 35105 at the request of ihalomi, who has changed their UbuntuOne login e-mail, resulting in a new OpenID
  • 2024-01-09 16:31:39 UTC increased project.zuul afs volume quota to 5gb
  • 2024-01-08 23:57:05 UTC restarted nl01 to release leaked zk request lock
  • 2024-01-05 14:56:52 UTC Temporarily disabled gitea09 from the load balancer pools while investigating a full rootfs on it
  • 2024-01-04 07:02:48 UTC etherpad services upgrade complete.
  • 2024-01-04 06:17:17 UTC etherpad services will be unavailable briefly while upgraded
  • 2023-12-18 16:11:06 UTC restarted zuul-web to match zuul-scheduler (zuul-scheduler was previously restarted on 2023-12-17)
  • 2023-12-16 16:47:52 UTC Service for Git repository hosting on https://opendev.org/ has been restored by rolling back an haproxy upgrade; Zuul jobs which failed with connection timeouts occurring between 04:00 and 16:15 UTC today can be safely rechecked now
  • 2023-12-16 07:21:52 UTC The gitea load balancer on opendev.org is saturated and therefore new connections are timing out. Admins are investigating.
  • 2023-12-16 05:21:30 UTC Web services (and possibly others) on opendev.org appear to be down. Admins are investigating.
  • 2023-12-11 23:08:55 UTC Started the review.opendev.org server which spontaneously shut down at 21:28 UTC, corrected the fsck passno in its fstab, and restarted the Gerrit IRC/Matrix bots so they'll start seeing change events again
  • 2023-12-11 15:46:14 UTC Zuul jobs reporting POST_FAILURE were due to an incident with one of our cloud providers; this provider has been temporarily disabled and changes can be rechecked.
  • 2023-12-06 21:38:05 UTC The Gerrit service on review.opendev.org will be offline momentarily to restart it onto an updated replication key
  • 2023-12-05 20:37:13 UTC Added (neutral) SPF records for all Mailman sites in order to comply with delivery requirements for some mass mail providers (example record after this list)
  • 2023-12-02 00:29:24 UTC The Gerrit service on review.opendev.org will be offline momentarily to restart it onto an updated replication key
  • 2023-11-30 18:23:55 UTC restarted zuul-web to pick up js changes
  • 2023-11-29 21:10:57 UTC The Gerrit service on review.opendev.org will be restarting momentarily for a patch update to address a recently observed regression preventing some changes from merging
  • 2023-11-27 18:33:13 UTC Zuul build urls should be working again (browser refresh may be required)
  • 2023-11-27 18:27:48 UTC restarted zuul-web to fix js errors
  • 2023-11-17 17:14:28 UTC Zuul is fully back in service now, but any events occurring prior to 17:05 UTC may need a recheck to trigger jobs.
  • 2023-11-17 16:35:45 UTC The Gerrit upgrade is complete, however we have Zuul offline in parallel for a schema migration, so any events occurring during this time will be lost (requiring a recheck or similar to trigger jobs once it returns to service); we'll update again once this is complete.
  • 2023-11-17 14:08:16 UTC Gerrit will be unavailable for a short time starting at 15:30 UTC as it is upgraded to the 3.8 release. https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/XT26HFG2FOZL3UHZVLXCCANDZ3TJZM7Q/
  • 2023-11-16 22:41:18 UTC added zuul hosts to ansible emergency file to prepare for 2023-11-17 maintenance
  • 2023-11-06 20:37:38 UTC Bypassed testing to merge change 900243 as a temporary workaround for an outage in one of our log storage providers
  • 2023-11-02 17:38:47 UTC The Etherpad service on etherpad.opendev.org has been upgraded to the 1.9.4 release
  • 2023-11-02 16:09:19 UTC Completed upgrade of mailing list sites to Mailman 3.3.9 as announced in https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/message/MCCGA5276DKUHPGZTATXU7M3VJGFGSXF/
  • 2023-11-01 13:13:01 UTC Deleted the original lists.openstack.org server after creating an archival image, 11.5 years after its creation (you'll be missed, old friend!)
  • 2023-10-31 22:03:20 UTC Gerrit on review.opendev.org will be restarted to pick up a configuration change required as part of Gerrit 3.8 upgrade preparations.
  • 2023-10-31 11:00:51 UTC deleted the "feature/zookeeper" branch from openstack/kayobe, which was an unwanted leftover from importing the project
  • 2023-10-30 17:04:13 UTC Moved private host vars for lists01.opendev.org to a mailman3 group vars file on bridge01
  • 2023-10-25 14:25:21 UTC Manually closed 96 stale SSH connections to Gerrit for account 33746
  • 2023-10-23 13:24:29 UTC Issued and deployed a new openinfraci.linaro.cloud ssl cert for the arm64 linaro cloud
  • 2023-10-23 10:59:13 UTC restored nova ptg etherpad to the previous version as all content had been deleted
  • 2023-10-23 06:47:07 UTC restored kolla ptg etherpad to the previous version as all content had been deleted
  • 2023-10-17 16:21:35 UTC Upgraded our Zookeeper cluster to Zookeeper 3.8.3
  • 2023-10-12 20:20:45 UTC Archive imports for lists.openstack.org are taking longer than anticipated to complete... revised maintenance conclusion estimate is 21:00 UTC
  • 2023-10-12 15:32:53 UTC The lists.openstack.org site will be offline over the next few hours for migration to a new server
  • 2023-10-11 23:47:52 UTC Another short Gerrit outage for updates on review.opendev.org. This update ensures we are using the current versions of all Gerrit plugins.
  • 2023-10-10 15:00:15 UTC Manually refreshed the Debian mirror after cleaning up a stale reprepro lockfile left behind when the Debian 12.2 point release resulted in the process getting killed by the managing script's timeout wrapper
  • 2023-10-10 13:28:17 UTC Pruned backup volume on backup02.ca-ymq-1.vexxhost from 93% down to 64% utilization
  • 2023-10-09 16:35:03 UTC The Gerrit service on review.opendev.org will be offline momentarily while we restart it for a combined runtime and platform upgrade
  • 2023-10-04 15:22:32 UTC Cleared /opt/dib_tmp on nb04 and rebooted the server to reset mounts. This should fix arm64 image builds
  • 2023-09-29 11:37:02 UTC Deleted the old lists.katacontainers.io server now that it's been migrated to its new home for over a week
  • 2023-09-21 15:32:53 UTC The lists.openinfra.dev and lists.starlingx.io sites will be offline briefly for migration to a new server
  • 2023-09-14 16:49:22 UTC The lists.airshipit.org and lists.katacontainers.io sites will be offline briefly for migration to a new server
  • 2023-09-12 20:26:02 UTC Deleted a stray spam post from the service-discuss list archives
  • 2023-09-11 21:04:18 UTC Requested delisting for lists.katacontainers.io IPv4 address from SpamHaus PBL
  • 2023-09-06 15:37:33 UTC restarted zuul schedulers/web to pick up default branch bugfix
  • 2023-09-04 22:39:22 UTC Gerrit changes with updates to Zuul's configuration should now be handled correctly. Recheck any changes to Zuul configuration which did not report results.
  • 2023-09-04 20:07:09 UTC Some Gerrit changes that update Zuul configuration may fail with no response from Zuul. A fix is in progress.
  • 2023-08-29 21:39:07 UTC Deleted insecure-ci-registry01.opendev.org (e8753b25-4743-402e-8ec5-8987d11202eb) as it has been replaced by a newer server.
  • 2023-08-28 18:48:42 UTC Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org as the volume had reached 90% utilization
  • 2023-08-28 16:27:57 UTC Increased quotas of the AFS mirror volumes for centos from 400GB to 450GB and centos-stream from 250GB to 300GB (see the quota sketch after this list)
  • 2023-08-23 15:34:50 UTC Gerrit is going to be restarted to pick up a small config update. You will notice a short outage of the service.
  • 2023-08-15 20:42:31 UTC restarted the rest of zuul so all components are at the same version
  • 2023-08-15 17:45:23 UTC Zuul job execution has resumed with additional disk space on the servers
  • 2023-08-15 16:56:02 UTC Zuul job execution is temporarily paused while we rearrange local storage on the servers
  • 2023-08-10 13:12:22 UTC manually updated the jitsi web cert on meetpad01 and restarted the jitsi web container
  • 2023-08-07 15:19:51 UTC Cleared out /opt/dib_tmp on nb01 and nb02, restoring their ability to build new amd64 images for the first time since early on 2023-08-02
  • 2023-08-03 15:53:40 UTC Retired and cleaned external IDs from Gerrit account 36249 at its owner's request
  • 2023-08-03 13:56:50 UTC Moved launchpadlib's cache aside on the Gerrit server in order to address failures related to https://ubuntu.social/@launchpadstatus/110594525393361192
  • 2023-08-02 21:57:51 UTC restarted zuul-web to pick up ui changes
  • 2023-07-29 19:54:12 UTC restarted all of zuul on 6c0ffe565f1d0025ccee08a697cc73b4594942e5
  • 2023-07-29 16:56:40 UTC Zuul schedulers have been updated to fixed images and everything's moving normally again
  • 2023-07-29 15:53:00 UTC Some changes and pipelines have been stuck for the past 8 hours due to an upgrade-related Zuul bug, fix is in flight at https://review.opendev.org/890026
  • 2023-07-26 21:03:13 UTC The Gerrit service on review.opendev.org will be offline briefly for a minor upgrade, but should return shortly
  • 2023-07-26 20:04:26 UTC The Gerrit service on review.opendev.org will be offline briefly for a minor upgrade at 21:00 utc, approximately an hour from now
  • 2023-07-25 18:48:21 UTC Manually added a Django "site" for lists.zuul-ci.org and associated the corresponding Mailman mail domain with that site
  • 2023-07-24 19:34:29 UTC Set Gerrit's receive.rejectImplicitMerges option per https://review.opendev.org/885318 (see the config sketch after this list)
  • 2023-07-24 16:44:02 UTC deleted old zuul executor servers
  • 2023-07-16 15:45:48 UTC started ze07-ze12 on new jammy hosts
  • 2023-07-15 21:44:40 UTC gracefully shut down ze07-ze12 for server replacement
  • 2023-07-11 17:38:05 UTC started zuul on replacement ze04-ze06 servers
  • 2023-07-10 21:38:43 UTC began gracefully shutting down ze04-ze06 for server replacement
  • 2023-06-29 18:41:48 UTC Deleted an etherpad at the request of its authors.
  • 2023-06-28 20:25:02 UTC started ze02 and ze03 on new jammy hosts
  • 2023-06-27 18:13:22 UTC paused ze01-ze03 to prepare for server replacement
  • 2023-06-26 15:38:19 UTC Restored /etc/hosts for pypi hosts on mirror.dfw.rax.opendev.org as ipv6 connectivity appears to be working there again
  • 2023-06-23 15:06:32 UTC updated quota for multiple AFS mirror.* volumes to adapt to current demand
  • 2023-06-23 07:38:15 UTC added another 1T volume for /vicepa on afs01/02.dfw to increase AFS capacity
  • 2023-06-22 17:09:09 UTC increased the quota for the mirror/ubuntu-ports volume from 470G to 550G
  • 2023-06-22 12:57:52 UTC Rolled the resf-rocky-linux-git-c.o-changes etherpad back to revision 2474 since it fell victim to an accidental bulk auto-translation
  • 2023-06-20 15:32:04 UTC Rolled back the state of the keystone-weekly-meeting etherpad a few revisions to 62833 in order to undo some corruption introduced by a translator extension
  • 2023-06-15 19:34:56 UTC Edited /etc/hosts on the dfw.rax mirror in order to force connectivity to PyPI to occur over IPv4. Apache2 was reloaded as well to speed things up.
  • 2023-06-12 16:55:26 UTC force-merged another batch of zuul config error cleanup patches
  • 2023-06-05 20:29:34 UTC Deleted zp01.opendev.org (0eb65b92-2ccc-4fc1-a410-c240c96851f0) as it has been replaced by a newer server.
  • 2023-05-26 07:34:20 UTC force-merged https://review.opendev.org/c/openstack/horizon/+/883995 to unblock horizon CI at PTL request
  • 2023-05-25 00:03:41 UTC replaced all zuul-merger hosts with new jammy nodes
  • 2023-05-18 17:50:38 UTC Manually updated the SSL cert on wiki.openstack.org
  • 2023-05-17 19:53:18 UTC restarted remaining zuul executors
  • 2023-05-17 19:00:06 UTC hard restarted ze01
  • 2023-05-12 20:42:46 UTC The Gerrit service on review.opendev.org will be offline briefly for a patchlevel update, but should return to service in a few minutes
  • 2023-05-03 16:43:52 UTC restarted zuul web components
  • 2023-05-01 02:13:44 UTC shutdown ns1.opendev.org, ns2.opendev.org and adns1.opendev.org that have been replaced with ns03.opendev.org, ns04.opendev.org and adns02.opendev.org
  • 2023-04-26 20:21:19 UTC Deleted etherpad01.opendev.org (648795e3-a523-4998-8256-8e40c6e6f222) and its volume (020a2963-1d11-4665-bfdf-1fefb74c8a9f) to complete the etherpad server replacement and cleanup
  • 2023-04-26 04:45:32 UTC mirror-update02 placed in the ansible emergency file while it runs a full release of project.tarballs, after the volume became locked during a prior operation
  • 2023-04-24 16:22:26 UTC Shutdown etherpad01.opendev.org in final preparation of its removal.
  • 2023-04-19 23:32:24 UTC Moved the etherpad service from etherpad01.opendev.org to etherpad02.opendev.org
  • 2023-04-19 21:59:53 UTC The Etherpad service on etherpad.opendev.org will be offline for the next 90 minutes for a server replacement and operating system upgrade
  • 2023-04-16 22:56:21 UTC This is a test of the status logging system. If this were a real status update, there would be status update information here.
  • 2023-04-13 23:06:26 UTC Deleted static01.opendev.org (ae2fe734-cf8f-4ead-91bf-5e4e627c8d2c) as it has been replaced by static02.opendev.org
  • 2023-04-11 15:02:06 UTC stopped apache on static01.opendev.org
  • 2023-04-11 14:07:27 UTC started apache on static01.opendev.org since it's serving zuul-ci.org
  • 2023-04-06 22:00:51 UTC The Gerrit service on review.opendev.org will be offline for extended periods over the next two hours for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/
  • 2023-04-06 21:06:22 UTC The Gerrit service on review.opendev.org will be offline for extended periods between 22:00 and 00:00 UTC for software upgrades and project renames: https://lists.opendev.org/archives/list/service-announce@lists.opendev.org/thread/VW2O56AXI4OX34CWDNRNZDCWJDZR3QJP/
  • 2023-03-29 12:31:27 UTC removed stale lockfile dating from Feb 28: /afs/.openstack.org/mirror/ubuntu-ports/db/lockfile
  • 2023-03-29 06:06:02 UTC force-merged https://review.opendev.org/c/openstack/horizon/+/875326 to resolve deadlock caused by multi-branch failures
  • 2023-03-28 07:23:32 UTC Switched Gerrit account for "YADRO TATLIN CI" (33746) back to active because the vendor CI should have been fixed
  • 2023-03-27 18:29:00 UTC Gitea01-04 have been deleted. Gitea is entirely running off of the new replacement servers at this point.
  • 2023-03-22 15:13:40 UTC Switched Gerrit account for "YADRO TATLIN CI" (33746) to inactive because it seems to be misconfigured and leaving noise comments on many projects' changes
  • 2023-03-15 03:10:02 UTC pruned backups on backup02
  • 2023-03-10 18:41:18 UTC Deleted gitea05-07 and their associated boot volumes as they have been replaced with gitea10-12.
  • 2023-03-10 17:22:52 UTC force-merged https://review.opendev.org/877107 to bootstrap nebulous tenant
  • 2023-03-09 17:25:25 UTC Yesterday's change to Gerrit configs to use submit-requirements had a boolean logic bug. This has now been corrected and any changes that did not merge as a result can be rechecked. We have reenqueued the changes we identified as being affected.
  • 2023-03-08 22:22:18 UTC switched Gerrit ACLs to submit-requirements. You may see a status of "n/a" in https://review.opendev.org/dashboard/self; this should resolve as changes are updated and reindexed by Gerrit
  • 2023-03-08 19:13:21 UTC Deleted gitea08 and its associated boot volume as part of gitea server replacements
  • 2023-03-07 22:51:13 UTC Increased afs quotas for ubuntu-ports, debian-security, centos, and centos-stream mirrors
  • 2023-03-07 17:52:42 UTC Restarted the nodepool-launcher container on nl02 in order to force a config reload, as a workaround until https://review.opendev.org/875250 is deployed
  • 2023-03-06 19:26:42 UTC Manually disabled gitea01-04 in haproxy to force traffic to go to the new larger gitea servers. This can be undone if the larger servers are not large enough to handle the load.
  • 2023-03-03 14:18:41 UTC Booted mirror.iad3.inmotion via Nova API after it was found in power_state=Shutdown since 13:39:47 UTC
  • 2023-02-27 20:23:52 UTC Restarted the ptgbot service, apparently hung and serving a dead web page at ptg.opendev.org since 2023-02-07
  • 2023-02-27 19:56:41 UTC The Gerrit service on review.opendev.org experienced severe performance degradation between 17:50 and 19:45 due to excessive API query activity; the addresses involved are now blocked but any changes missing job results from that timeframe should be rechecked
  • 2023-02-27 18:57:21 UTC The Gerrit service on review.opendev.org has been restarted to clear an as-yet-undiagnosed condition which led to a prolonged period of unresponsiveness
  • 2023-02-27 13:50:07 UTC Booted mirror.iad3.inmotion via Nova API after it was found in power_state=Shutdown since 22:19:58 UTC yesterday
  • 2023-02-23 00:23:11 UTC Increased the project.starlingx quota in AFS from 1GB to 2GB
  • 2023-02-10 18:15:02 UTC manually dequeued 863252 due to zuul serialization error
  • 2023-02-10 16:07:58 UTC deleted pipeline state for openstack/release-post and openstack/deploy due to data corruption in zk
  • 2023-02-10 00:44:13 UTC all production hosts updated to docker 23
  • 2023-02-09 19:14:45 UTC Bypassed testing to merge urgent change 873320 temporarily disabling uploads to one of our log storage providers which is exhibiting problems
  • 2023-02-02 05:29:53 UTC restarted gerrit @ https://review.opendev.org/c/opendev/system-config/+/870874
  • 2023-01-30 20:43:43 UTC ran backup prune on backup02.ca-ymq-1
  • 2023-01-25 12:09:45 UTC pruned backups on backup01.ord.rax.opendev.org
  • 2023-01-20 03:33:33 UTC restarted gerrit to pick up https://review.opendev.org/c/opendev/system-config/+/871202
  • 2023-01-16 03:34:19 UTC manually restarted iptables on graphite02, tracing01, zk<04,05,06> to pick up new rules for nb04 that were not applied due to failures fixed by https://review.opendev.org/c/opendev/system-config/+/869888
  • 2023-01-13 17:51:50 UTC restarted all nodepool launchers to pick up private_ipv4 bugfix
  • 2023-01-10 22:46:03 UTC One of our CI job log storage providers appears to be having trouble with log uploads and retrievals. We are in the process of removing that provider from the pool.
  • 2023-01-10 19:24:04 UTC Restarted services on lists.openstack.org since some mailman processes were terminated earlier today by out-of-memory events
  • 2022-12-27 15:06:37 UTC Manually synchronized CentOS Stream 9 mirrors after https://review.opendev.org/868392 was deployed
  • 2022-12-26 02:13:39 UTC Restarted services on lists.openstack.org since some mailman processes were terminated by out-of-memory events
  • 2022-12-25 16:42:17 UTC Restarted services on lists.openstack.org since some mailman processes were terminated earlier today by out-of-memory events
  • 2022-12-24 14:08:32 UTC Restarted services on lists.openstack.org since some mailman processes were terminated yesterday by out-of-memory events
  • 2022-12-22 04:20:57 UTC gitea08 restarted and resynced to gerrit
  • 2022-12-12 20:00:29 UTC Gerrit will be unavailable for a short time as it is upgraded to the 3.6 release
  • 2022-12-09 14:41:39 UTC Pruned backups on backup02.ca-ymq-1.vexxhost reducing /opt/backups-202010 utilization from 90% to 58%
  • 2022-12-09 00:03:45 UTC manually purged yaml-mode package on previously in-place upgraded hosts afs01, afs02, afsdb02, lists.openstack.org, lists.katacontainers.io
  • 2022-12-05 20:03:03 UTC The lists.opendev.org and lists.zuul-ci.org sites will be offline briefly for migration to a new server
  • 2022-12-05 06:39:13 UTC delisted review.opendev.org from the Spamhaus blocklist; several corporate domains were rejecting Gerrit mail
  • 2022-12-02 02:35:19 UTC restarted gerrit to pick up a fix to automated blueprint updates (https://review.opendev.org/c/opendev/jeepyb/+/866237)
  • 2022-11-30 23:26:15 UTC Cleaned up leaked nodepool instances in rax. Nodepool couldn't clean them up automatically due to missing metadata.
  • 2022-11-17 01:07:42 UTC Status logs now appearing at https://fosstodon.org/@opendevinfra
  • 2022-11-16 05:37:24 UTC restarted gerrit to pick up 3.5.4 (https://review.opendev.org/c/opendev/system-config/+/864217)
  • 2022-11-16 03:04:31 UTC test
  • 2022-11-16 01:50:54 UTC test
  • 2022-11-03 20:15:17 UTC restarted zuul-web on current master to pick up bugfixes
  • 2022-11-02 18:59:33 UTC Zuul's ZK cluster has been upgraded to 3.7 via 3.6.
  • 2022-11-01 14:27:56 UTC review.opendev.org (Gerrit) is back online
  • 2022-11-01 14:13:52 UTC restarted docker containers on review02 which were not running after a crash/reboot
  • 2022-11-01 07:33:17 UTC review.opendev.org (Gerrit) is currently down, we are working to restore service as soon as possible
  • 2022-10-31 15:39:25 UTC Rebooted etherpad.o.o after the db cinder volume remounted RO due to errors. After reboot it came back RW and services were restarted.
  • 2022-10-28 17:18:03 UTC Deleted jvb02.opendev.org (a93ef02b-4e8b-4ace-a2b4-cb7742cdb3e3) as we don't need this extra jitsi meet jvb to meet ptg demands
  • 2022-10-28 17:10:44 UTC Deleted gitea-lb01 (e65dc9f4-b1d4-4e18-bf26-13af30dc3dd6) and its BFV volume (41553c15-6b12-4137-a318-7caf6a9eb44c) as this server has been replaced with gitea-lb02.
  • 2022-10-24 13:32:26 UTC Pruned backups on backup02.ca-ymq-1.vexxhost, reducing volume utilization by 36% (from 93% to 57%)
  • 2022-10-21 12:22:07 UTC Reenqueued 862029,1 in the openstack tenant's promote pipeline in order to address a publication race with an earlier change
  • 2022-10-20 23:22:28 UTC Restarted Gerrit on our latest gerrit 3.5 image (change 861270) which resyncs our plugin versions to 3.5.3.
  • 2022-10-18 18:25:38 UTC Restarted the services on etherpad.opendev.org in order to free up some disk space
  • 2022-10-14 18:16:48 UTC Blocked two IPs using iptables that were having SSH connectivity issues to Gerrit, in order to quiet our logs (see the iptables sketch after this list)
  • 2022-10-07 03:59:15 UTC restarted gerrit @ 3.5.3 (https://review.opendev.org/c/opendev/system-config/+/859885)
  • 2022-09-27 19:48:05 UTC Added mirror volume in AFS for Ceph Quincy Debian/Ubuntu packages
  • 2022-09-27 16:51:06 UTC Added a 1TB volume to each of nb01 and nb02 in order to accommodate the recent increase in built nodepool images
  • 2022-09-20 12:41:42 UTC Manually patched configuration on nl04 in order to get the nodepool-launcher service running again while we wait for change 858523 to be reviewed
  • 2022-09-20 03:10:31 UTC Restarted the meetbot container on eavesdrop01 in order to pick up new channel additions
  • 2022-09-19 13:40:01 UTC As of the weekend, Zuul only supports queue declarations at the project level; if expected jobs aren't running, see this announcement: https://lists.opendev.org/pipermail/service-announce/2022-September/000044.html
  • 2022-09-15 23:13:15 UTC restarted zuul-registry container
  • 2022-09-14 22:29:39 UTC performed reprepro db recovery on debian mirror; has been synced and volume released
  • 2022-09-09 21:55:04 UTC Deleted unused OpenStack "test-list" mailing list
  • 2022-09-08 03:16:44 UTC translate and translate-dev RAX hosted mysql instances upgraded to mysql 5.7
  • 2022-09-07 04:34:11 UTC pruned backups on backup02
  • 2022-09-05 21:37:20 UTC removed `msg_footer` option from zuul-discuss and zuul-announce mailing lists
  • 2022-08-30 00:06:37 UTC Restarted the ptgbot container on eavesdrop01 since it seems to have fallen off the IRC network on 2022-05-25 and never realized it needed to reconnect
  • 2022-08-24 18:02:30 UTC Retired and cleaned external refs from abandoned Gerrit account 34566 at the owner's request
  • 2022-08-24 18:02:23 UTC Restarted the statusbot container on eavesdrop01 after it fell off the network around 11:00 UTC
  • 2022-08-23 07:11:18 UTC restarted gerrit to deploy https://review.opendev.org/c/opendev/system-config/+/853528
  • 2022-08-17 10:45:07 UTC added slaweq to newly created whitebox-neutron-tempest-plugin-core group (https://review.opendev.org/c/openstack/project-config/+/851031)
  • 2022-08-17 10:07:35 UTC removed procedural -2 from https://review.opendev.org/c/openstack/nova/+/826523 since after the freeze is before the freeze
  • 2022-08-12 12:46:04 UTC Restarted meetbot container to pick up recently deployed channel addition in its config
  • 2022-08-12 06:34:22 UTC resynced centos-stream mirror from RAX
  • 2022-07-28 15:03:07 UTC force-merged https://review.opendev.org/851414 due to regression in existing jobs
  • 2022-07-28 13:22:52 UTC Temporarily removed one of our Swift providers from the Zuul build log upload list because their API cert expired, resulting in POST_FAILURE errors starting at 12:00 UTC
  • 2022-07-28 00:10:42 UTC Replaced 3 Cinder volumes across afs01.ord and backup01.ord servers in order to avoid impact from upcoming scheduled maintenance for the service provider
  • 2022-07-27 17:29:16 UTC restarted zuul02 to resolve conflicting change key error
  • 2022-07-18 02:15:10 UTC pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org
  • 2022-07-15 19:32:01 UTC Replaced the 250GB volume for Gerrit data on review.opendev.org with a 1TB volume, to handle future cache growth
  • 2022-07-14 07:57:16 UTC freed up some space on the Gerrit partition on review.opendev.org after full-disk errors
  • 2022-07-13 19:21:27 UTC The afs01.dfw server is back in full operation and writes are successfully replicating once more
  • 2022-07-13 14:54:39 UTC Due to an incident in our hosting provider, the tarballs.opendev.org site (and possibly other sites served from static.opendev.org) is offline while we attempt recovery
  • 2022-07-12 15:02:25 UTC Log uploads to OVH's Swift are resuming and our voucher is renewed; thanks again amorin!
  • 2022-07-12 11:51:44 UTC Temporarily disabled log uploads to OVH's Swift while we look into an account authorization problem
  • 2022-07-12 00:22:51 UTC Added 22.03-LTS to the openEuler mirror volume
  • 2022-07-11 02:44:58 UTC Zuul now defaults to using Ansible 5; see https://lists.opendev.org/pipermail/service-announce/2022-June/000041.html
  • 2022-07-08 16:47:39 UTC Converted 18 OpenStack stable branch maintenance groups in Gerrit from centrally-managed to self-owned at gmann's request
  • 2022-07-06 23:00:46 UTC all stats on graphite.opendev.org reset with an xfilesfactor of 0; c.f. https://review.opendev.org/c/opendev/system-config/+/847876
  • 2022-07-05 23:09:57 UTC restarted all of zuul on 78b14ec3c196e7533ac2c72d95fba09c936e625a
  • 2022-07-05 15:57:36 UTC Moved all mailing list sites entirely to HTTPS
  • 2022-06-30 17:56:00 UTC removed windmill/*, x/ansible-role-shade, and x/neutron-classifier from Zuul openstack tenant config.
  • 2022-06-29 22:53:17 UTC Restarted apache2 on static.opendev.org since all sites seem to have been hung and timing out requests as of 22:35 UTC
  • 2022-06-25 22:03:21 UTC Bypassed Zuul in order to remove indefinitely failing periodic jobs from old stable branches of openstack/{ceilometer,networking,nova}-powervm at the request of elodilles
  • 2022-06-21 06:59:51 UTC restarted gerrit @ https://review.opendev.org/c/opendev/system-config/+/846809
  • 2022-06-19 21:12:44 UTC Gerrit 3.5 upgrade is complete. Please reach us in #opendev if you see any issues
  • 2022-06-19 20:03:21 UTC Gerrit will be unavailable for a short time as it is upgraded to the 3.5 release
  • 2022-06-17 07:44:35 UTC pushed openstack/requirements 846277,1 to gate in order to unblock neutron
  • 2022-06-09 09:44:12 UTC restored two nova etherpads that had been mangled
  • 2022-06-08 12:19:38 UTC paused centos-9-stream image builds due to https://bugzilla.redhat.com/show_bug.cgi?id=2094683
  • 2022-06-08 10:31:33 UTC cleaned up /opt on nb01 and nb02 and rebooted them
  • 2022-06-02 16:07:20 UTC Deleted ethercalc02.openstack.org (be4f91d1-30d6-4db9-8e93-8af932d0633a) after shutting it down for a couple of days and snapshotting it.
  • 2022-06-01 20:22:55 UTC restarted nodepool launchers on 6.0.0 after encountering suspected sdk 0.99 bug
  • 2022-06-01 20:16:15 UTC Restarted all of zuul on 6.0.1.dev54 69199c6fa
  • 2022-06-01 20:15:26 UTC restarted nodepool launchers on 6416b1483821912ac7a0d954aeb6e864eafdb819, likely with sdk 0.99
  • 2022-06-01 04:04:21 UTC Restarted gerrit with 3.4.5 (https://review.opendev.org/c/opendev/system-config/+/843298)
  • 2022-05-31 22:28:24 UTC restarted zuul mergers on 6.0.1.dev54 7842e3fcf10e116ca47cfffbd82022802b53432d which includes merger graceful fix in preparation for rolling restart
  • 2022-05-27 17:37:49 UTC Upgraded all of Zuul to 6.0.1.dev34 b1311a590. There was a minor hiccup with the new deduplicate attribute on jobs that forced us to dequeue/enqueue two buildsets. Otherwise seems to be running.
  • 2022-05-26 15:40:08 UTC Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org bringing filesystem utilization down from 90% to 54%
  • 2022-05-24 17:56:13 UTC Replaced manually-obtained X.509 HTTPS cert on legacy wiki.openstack.org server
  • 2022-05-24 17:55:54 UTC Restarted statusbot container on eavesdrop01 since it appeared to disconnect from the IRC server around 2022-05-17 16:30 UTC and did not reconnect on its own (but also logged nothing to explain why)
  • 2022-05-16 23:31:39 UTC manually fixed kernel install and rebooted kdc04.openstack.org
  • 2022-05-16 16:36:35 UTC Updated OpenAFS quotas for distro mirrors to better reflect current usage.
  • 2022-05-12 18:05:19 UTC Updated zuul-web and zuul-fingergw to 6.0.1.dev14 60e59ba67. This fixes scrolling to specific line numbers on log files.
  • 2022-05-09 19:18:14 UTC Updated database-wide default encoding for Etherpad server to utf8mb4, as setting it only on the table is insufficient to satisfy its safety checks (see the sketch after this list)
  • 2022-05-04 13:00:05 UTC Retired 11 OpenStack mailing lists which were unused for 3 years or longer: http://lists.openstack.org/pipermail/openstack-discuss/2022-May/028406.html
  • 2022-05-03 04:57:35 UTC restarted gerrit to pick up changes deployed with stack @ https://review.opendev.org/c/opendev/system-config/+/839251/
  • 2022-05-02 22:52:21 UTC restarted nodepool launchers on a2e5e640ad13b5bf3e7322eb3b62005484e21765
  • 2022-05-01 00:28:56 UTC rolled back to zuul 5.2.5 after finding latest images are missing netaddr in the ansible container
  • 2022-04-29 16:13:40 UTC Decommissioned the status.openstack.org server as it was no longer hosting any working services: http://lists.openstack.org/pipermail/openstack-discuss/2022-April/028279.html
  • 2022-04-29 14:25:48 UTC Replaced block storage volume mirror01.ord.rax.opendev.org/main01 with main02 in order to avoid service disruption from upcoming provider maintenance activity
  • 2022-04-29 02:00:10 UTC Replaced block storage volume backup01.ord.rax.opendev.org/main02 with main04 in order to avoid service disruption from upcoming provider maintenance activity
  • 2022-04-28 22:18:12 UTC Replaced block storage volume afs01.ord.openstack.org/main02 with main03 in order to avoid service disruption from upcoming provider maintenance activity
  • 2022-04-27 17:02:49 UTC Deleted the old subunit2sql database now that the OpenStack CI Health dashboard and subunit2sql workers have been removed.
  • 2022-04-27 06:20:06 UTC To save mirror volume space, we have removed source packages from the ubuntu-ports repository
  • 2022-04-26 22:23:54 UTC restarted zuul schedulers/web/finger on 77524b359cf427bca16d2a3339be9c1976755bc8
  • 2022-04-25 17:45:59 UTC The retired ELK, subunit2sql, and health api services have now been deleted.
  • 2022-04-22 19:47:37 UTC Sent periodic summary of activities to announcements list at https://lists.opendev.org/pipermail/service-announce/2022-April/000037.html
  • 2022-04-22 17:21:28 UTC The old ELK services have been stopped and disabled in preparation for deletion which should occur early next week if no unexpected issues arise.
  • 2022-04-22 12:19:00 UTC Manually restarted the apache2 service on lists.openstack.org because at least one worker was serving a stale certificate several days after successful rotation
  • 2022-04-21 19:56:34 UTC restarted all of zuul on dd0135baa51cdf21a18831926c04227caa060878
  • 2022-04-21 16:22:44 UTC bumped AFS quota for the ubuntu volume by 300G in preparation for mirroring 22.04
  • 2022-04-20 23:52:43 UTC rolling restarted all of zuul on d8011793f94f82452338ee3e0b193928f80a4a46
  • 2022-04-20 16:24:06 UTC Manually completed a CentOS Stream 8 mirror rsync into AFS in order to bypass the safety timeout and work around a large amount of package churn
  • 2022-04-20 02:27:56 UTC test
  • 2022-04-19 18:35:39 UTC Released git-review 2.3.0 and 2.3.1 https://lists.opendev.org/pipermail/service-announce/2022-April/000036.html
  • 2022-04-19 18:35:08 UTC Released bindep 2.11.0 https://lists.opendev.org/pipermail/service-announce/2022-April/000035.html
  • 2022-04-19 14:00:21 UTC Restored the glance-team-meeting-agenda Etherpad to revision 103282 at pdeore's request (see the API sketch after this list)
  • 2022-04-19 12:55:08 UTC Manually deleted /afs/openstack.org/mirror/wheel/centos-8-aarch64/g/grpcio as it contained only corrupt (zero-byte) packages
  • 2022-04-15 15:10:43 UTC Reenabled deployments for all Gerrit, Gitea and StoryBoard servers in conclusion of today's service maintenance
  • 2022-04-15 15:01:09 UTC The Gerrit service at review.opendev.org is going offline now for scheduled maintenance, but should be available again in a few minutes
  • 2022-04-15 14:02:52 UTC Reminder: The Gerrit service at review.opendev.org will be offline briefly starting at 15:00 UTC (roughly an hour from now) for scheduled maintenance; see http://lists.opendev.org/pipermail/service-announce/2022-April/000034.html for details
  • 2022-04-15 13:30:25 UTC Temporarily disabled deployments for all Gerrit, Gitea and StoryBoard servers in preparation for today's upcoming service maintenance activity
  • 2022-04-14 15:16:00 UTC Restored the zed-glance-ptg Etherpad to revision 9801 at abhishekk's request
  • 2022-04-08 00:47:40 UTC created centos9 x86/aarch wheel volumes. removed weird extra focal wheel volume manually, and removed unused cent7a64 volumes
  • 2022-04-06 03:57:25 UTC testing statusbot
  • 2022-04-06 03:47:14 UTC testing statusbot
  • 2022-04-04 13:21:30 UTC Requested Spamhaus PBL exclusion for the IPv4 address of lists.kata-containers.io
  • 2022-04-01 22:38:53 UTC Restarted Zuul executors for kernel and docker updates, now running on Zuul 5.2.2.dev2 (08348143)
  • 2022-03-31 15:53:27 UTC Paused centos-9-stream image building to ensure it doesn't rebuild and delete our prior image before we can properly debug related NODE_FAILURES
  • 2022-03-30 19:20:44 UTC Pruned backups on backup02.ca-ymq-1.vexxhost in order to free up some storage, info in /opt/backups/prune-2022-03-30-17-05-53.log
  • 2022-03-28 22:06:53 UTC monkeypatched https://review.opendev.org/835518 into running zuul schedulers
  • 2022-03-28 07:19:54 UTC zuul isn't executing check jobs at the moment, investigation is ongoing, please be patient
  • 2022-03-25 16:04:55 UTC Deleted 9 pastes from paste.o.o at the request of the users.
  • 2022-03-25 15:06:40 UTC rolling restart of all of zuul on at least 5.2.0
  • 2022-03-25 14:10:16 UTC Bypassed Zuul to force the merging of 835193,10 due to circular dependency on changes for related repositories addressing regressions in Setuptools 61.0.0
  • 2022-03-21 21:45:39 UTC Updated Gerrit on review.o.o to 3.4.4-14-g76806c8046-dirty
  • 2022-03-21 21:34:29 UTC The Gerrit service on review.opendev.org will be offline momentarily for a Gerrit patch upgrade and kernel update, but should return again shortly
  • 2022-03-21 20:01:44 UTC manually published /afs/openstack.org/project/zuul-ci.org/www/docs/zuul/5.1.0
  • 2022-03-18 21:41:45 UTC Rebooted all 12 Zuul executors for Linux kernel updates to address CVE-2022-25636
  • 2022-03-17 15:40:07 UTC Pruned backups on backup01.ord.rax in order to free up some storage, info in /opt/backups/prune-2022-03-17-02-13-07.log
  • 2022-03-16 17:11:28 UTC Upgraded gitea cluster to gitea 1.16.4
  • 2022-03-12 09:56:17 UTC recreated neutron:stable/yoga branch @452a3093f62b314d0508bc92eee3e7912f12ecf1 in order to have zuul learn about this branch
  • 2022-03-09 17:12:52 UTC Restarted the ptgbot service on eavesdrop since it seems to have not started cleanly when the server was rebooted on 2022-01-27
  • 2022-03-08 17:33:09 UTC Restarted the ethercalc service on ethercalc.openstack.org, which was still running and hadn't logged any errors but was not responding to connections either
  • 2022-03-04 15:56:48 UTC restarted zuul and nodepool launchers; schedulers are at bb2b38c4be8e2592dd2fb7f1f4b631436338ec98 executors a few commits behind, and launchers at ac35b630dfbba7c6af90398b3ea3c82f14eabbde
  • 2022-03-01 23:03:29 UTC Deleted mirror01.kna1.airship-citycloud.opendev.org to finish up resource cleanup in that cloud provider now that it has been removed from Nodepool.
  • 2022-02-28 15:34:13 UTC Started the Ethercalc service after it crashed at 11:29:07 UTC
  • 2022-02-28 05:11:10 UTC stopped gerrit, moved jvm log files to ~gerrit2/tmp/jvm-logs, restarted gerrit to apply 830912
  • 2022-02-28 00:11:07 UTC redirected old jjb at docs.o.o/infra/jenkins-job-builder to RTD site; moved old content to ./attic/* and added a .htaccess and README in the original dir. see https://groups.google.com/g/jenkins-job-builder/c/U6VL3_ajoMA/m/SlYzxDJcAwAJ
  • 2022-02-19 19:42:28 UTC Restarted the Gerrit service on review.opendev.org to switch its code browsing links to the opendev.org Gitea instead of Gerrit's built-in Gitiles service
  • 2022-02-17 21:09:19 UTC restarted zuul-web on commit ba041a3d8ba31355a9057367c6b836589f9fe805 to address log streaming errors
  • 2022-02-15 04:29:43 UTC Updated OpenID provider for the refstack.openstack.org service from openstackid.org to id.openinfra.dev
  • 2022-02-14 16:59:08 UTC Updated OpenID provider for the Zanata service on translate.openstack.org from openstackid.org to id.openinfra.dev
  • 2022-02-11 16:59:26 UTC Manually deleted /nodepool/images/opensuse-15/builds/0000147440 from ZooKeeper, which had been holding back image cleanup for several weeks
  • 2022-02-10 16:54:37 UTC rolling restarted all of zuul on ad1351c225c8516a0281d5b7da173a75a60bf10d
  • 2022-02-09 23:47:49 UTC Restarted Gerrit to pick up sshd.batchThreads = 0 config update
  • 2022-02-06 01:41:44 UTC rolling restart of zuul onto commit 335502cb4fec439de37e46fec2ab676663b4f403
  • 2022-02-02 23:09:49 UTC in-place upgraded nb0<1,2,3>.opendev.org to focal to better match production to our gate test environment
  • 2022-02-01 04:11:46 UTC restarted gerrit to get changes from 827153
  • 2022-01-31 07:42:23 UTC bumped centos volume quota to 450gb, did a manual run to get it back in sync
  • 2022-01-30 21:10:49 UTC Bypassed gating to merge https://review.opendev.org/826974 and manually applied the results of https://review.opendev.org/826969 to files in AFS in order to avoid waiting for daily periodic jobs to trigger.
  • 2022-01-28 23:31:54 UTC restarted all of nodepool on 1a73a7a33ed63ad919377fae42c14390d8fb9eb5
  • 2022-01-28 23:25:24 UTC restarted all of zuul on 930ee8faa3076233614565fcfbf55a4ee74551a7
  • 2022-01-27 08:39:07 UTC restarted gerritbot which had gone missing at 04:56:40
  • 2022-01-24 22:03:13 UTC The review.opendev.org maintenance work is beginning now. Expect Gerrit outages over the next couple of hours. See https://lists.opendev.org/pipermail/service-announce/2022-January/000030.html for details.
  • 2022-01-24 21:05:27 UTC review.opendev.org will have a few short outages over the next few hours (beginning 22:00 UTC) while we rename projects and then upgrade to Gerrit 3.4. See https://lists.opendev.org/pipermail/service-announce/2022-January/000030.html for details.
  • 2022-01-22 22:14:50 UTC rolling restart of zuul onto commit 1ed186108956a1f7cc5fe34dc9d93731beaa56f6
  • 2022-01-21 23:04:18 UTC The Gerrit service on review.opendev.org is being restarted briefly to apply a bugfix
  • 2022-01-21 12:57:37 UTC Pruned backups on backup02.ca-ymq-1.vexxhost.opendev.org
  • 2022-01-20 23:00:05 UTC performed (mostly) rolling restart of zuul on commit 548eafe0b5729e78ab4024abea98b326678d83d8
  • 2022-01-20 21:15:21 UTC manually moved new rebuilds of old zuul docs into position on project.zuul afs volume
  • 2022-01-19 21:48:27 UTC performed (mostly) rolling restart of zuul onto commit d304f4134f05fa08aab70e9add6ec490370dc6e2
  • 2022-01-18 23:43:04 UTC upgraded Gerrit on review02 to 3.3.9
  • 2022-01-18 21:51:04 UTC The meetpad.opendev.org services have returned to working order, and are available for use once again
  • 2022-01-16 13:42:47 UTC Due to a hypervisor host problem in our donor provider, the afs01.dfw.openstack.org server was rebooted at 11:48 UTC
  • 2022-01-12 18:57:01 UTC The ethercalc server was rebooted at 11:17 UTC due to a hypervisor host problem in our donor provider
  • 2022-01-12 18:53:25 UTC Restarted statusbot and gerritbot as they did not seem to gracefully cope with an apparent netsplit we experienced around 18:30 UTC
  • 2021-12-17 22:30:12 UTC The review.opendev.org server is being rebooted to validate a routing configuration update, and should return to service shortly
  • 2021-12-16 01:38:11 UTC Our jitsi-meet services including meetpad.opendev.org are shut down temporarily again, out of an abundance of caution awaiting newer images
  • 2021-12-13 21:05:16 UTC Jitsi-Meet services on meetpad.opendev.org are back in service again following an upgrade to the most recent image builds
  • 2021-12-10 20:18:43 UTC The Gerrit service on review.opendev.org is being restarted again for a plugin change, and should be back shortly
  • 2021-12-10 17:48:53 UTC Restarted the Ethercalc service on ethercalc.openstack.org as it seems to have unceremoniously crashed and stopped on 2021-12-05 at 12:12:57 UTC.
  • 2021-12-10 17:28:16 UTC The Gerrit service on review.opendev.org is being quickly restarted for a configuration adjustment, and should return momentarily
  • 2021-12-10 05:27:08 UTC meetpad is currently down for some unexpected maintenance. there is no fixed time frame, however it is likely to be restored tomorrow during US hours
  • 2021-12-03 03:58:16 UTC performed rolling restart of zuul01/02 and zuul-web
  • 2021-12-03 02:54:15 UTC restarted gerrit with 3.3.8 from https://review.opendev.org/c/opendev/system-config/+/819733/
  • 2021-12-02 18:41:50 UTC Temporarily disabled ansible deployment through bridge.o.o while we troubleshoot system-config state there
  • 2021-12-01 23:54:54 UTC Restarted all of Zuul on ac9b62e4b5fb2f3c7fecfc1ac29c84c50293dafe to correct wedged config state in the openstack tenant and install bug fixes.
  • 2021-11-25 16:50:39 UTC The nl03 and zm01 servers were rebooted at 07:05 UTC today due to a host hardware issue at the provider.
  • 2021-11-16 18:41:24 UTC Rebooted main lists.o.o server onto new extracted vmlinuz-5.4.0-90-generic kernel
  • 2021-11-12 03:50:19 UTC debian-stretch has been yeeted from nodepool and AFS mirrors
  • 2021-11-08 02:09:57 UTC restarted gerrit to pick up plugin changes from #816618
  • 2021-11-04 06:12:16 UTC fedora 35 mirror finished syncing. fedora 33 removed
  • 2021-11-01 22:11:59 UTC The Gerrit service on review.opendev.org is being restarted quickly for some security updates, but should return to service momentarily
  • 2021-10-31 14:43:08 UTC restarted zuul on 4.10.4 due to bugs in master
  • 2021-10-29 06:48:43 UTC removed xenial from ubuntu-ports mirror
  • 2021-10-28 18:45:48 UTC mirror.bhs1.ovh.opendev.org filled its disk around 17:25 UTC. We have corrected this issue around 18:25 UTC and jobs that failed due to this mirror can be rechecked.
  • 2021-10-26 11:43:39 UTC Requested removal of review.opendev.org's IPv4 address from the barracudacentral.org RBL
  • 2021-10-25 22:34:56 UTC The OpenStack docs volume in AFS has been stuck for replication since 15:00 UTC, so a full release has been initiated which should complete in roughly 4 hours
  • 2021-10-22 23:51:17 UTC restarted all of zuul on commit 7c377a93e020f20d4535207d9b22bdc303af4050 for zk disconnect/threadpool fix
  • 2021-10-22 08:48:20 UTC zuul needed to be restarted, queues were lost, you may need to recheck your changes
  • 2021-10-21 01:22:13 UTC restarted all of zuul on commit 1df09a82ef67e9536bce76b9ef071756f9164faa
  • 2021-10-21 01:20:14 UTC deleted empty zk key directories for all old projects listed at https://opendev.org/opendev/project-config/src/branch/master/renames/20211015.yaml
  • 2021-10-19 23:19:59 UTC Manually restarted nsd on ns2.opendev.org, which seems to have failed to start at boot
  • 2021-10-19 17:02:20 UTC Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again
  • 2021-10-15 18:00:45 UTC The Gerrit service on review.opendev.org will be offline starting in 5 minutes, at 18:00 UTC, for scheduled project rename maintenance, which should last no more than an hour (but will likely be much shorter): http://lists.opendev.org/pipermail/service-announce/2021-October/000024.html
  • 2021-10-15 14:54:33 UTC Restarted the Ethercalc service on ethercalc.openstack.org as it was not responding to proxied connections from Apache any longer
  • 2021-10-14 10:03:50 UTC zuul was stuck processing jobs and has been restarted. pending jobs will be re-enqueued
  • 2021-10-13 23:05:48 UTC restarted all of zuul on commit 3066cbf9d60749ff74c1b1519e464f31f2132114
  • 2021-10-13 22:51:48 UTC Both Gerrit and Zuul services are being restarted briefly for minor updates, and should return to service momentarily; all previously running builds will be reenqueued once Zuul is fully started again
  • 2021-10-10 21:00:53 UTC Gerrit will be unavailable for maintenance for up to a few hours; see http://lists.opendev.org/pipermail/service-announce/2021-October/000024.html
  • 2021-10-09 18:49:32 UTC restarted all of zuul on commit a32753227be3a6a52d5e4a60ca4bd645823cfab0
  • 2021-10-07 21:05:00 UTC restarted all of zuul on commit 96c61208a064a2c5cabe1446bafdd58d1d1387c4
  • 2021-10-07 03:57:48 UTC restarted gerrit with sha256:4b708dd0... to pick up bullseye based images
  • 2021-10-06 17:56:59 UTC Manually corrected an incomplete unattended upgrade on afs01.ord.openstack.org
  • 2021-10-01 19:51:23 UTC Restarted Gerritbot to reestablish its connection to the Gerrit server
  • 2021-10-01 19:46:01 UTC The review.opendev.org Gerrit server has become unreachable as of approximately 19:10 UTC due to a networking issue in the provider, but should be reachable again shortly
  • 2021-09-30 17:12:51 UTC restarted all of zuul on commit 5858510e221bb4904d81331d681a68bb0c4b0f9a for further stabilization fixes
  • 2021-09-29 14:15:12 UTC Rolled back nova-yoga-ptg etherpad to revision 8272 via admin API at the request of bauzas
  • 2021-09-29 02:17:36 UTC restarted all of zuul on commit 659ba07f63dcc79bbfe62788d951a004ea4582f8 to pick up change cache fix for periodic jobs
  • 2021-09-28 20:41:16 UTC restarted all of zuul on commit 29d0534696b3b541701b863bef626f7c804b90f2 to pick up change cache fix
  • 2021-09-28 07:08:33 UTC restarted gerrit to pick up https://review.opendev.org/c/opendev/system-config/+/811233
  • 2021-09-27 21:11:02 UTC restarted all of zuul on 0928c397937da4129122b00d2288e582bc46aabc
  • 2021-09-27 20:11:21 UTC Gerrit and Zuul services are being restarted briefly for configuration and code updates but should return to service momentarily
  • 2021-09-27 18:04:07 UTC Deleted openstackid01.openstack.org, openstackid02.openstack.org, openstackid03.openstack.org, and openstackid-dev01.openstack.org at smarcet's request, after making a snapshot image of openstackid01 for posterity
  • 2021-09-23 22:40:34 UTC Replaced main04 volume on afs01.dfw with a new main05 volume in order to avoid impact from upcoming provider maintenance activity
  • 2021-09-23 18:43:09 UTC Restarted the ethercalc service on ethercalc.openstack.org as it seemed to be hung from the Apache proxy's perspective (though was not logging anything useful to journald)
  • 2021-09-22 23:22:07 UTC restarted all of zuul on master for change cache bug data collection
  • 2021-09-22 18:38:16 UTC Zuul has been restarted in order to address a performance regression related to event processing; any changes pushed or approved between roughly 17:00 and 18:30 UTC should be rechecked if they're not already enqueued according to the Zuul status page
  • 2021-09-21 23:00:30 UTC restarted all of zuul on commit 0c26b1570bdd3a4d4479fb8c88a8dca0e9e38b7f
  • 2021-09-20 17:11:56 UTC Started mirror02.iad3.inmotion.opendev.org via Nova API as it was in a SHUTOFF state
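      A minimal sketch of recovering a SHUTOFF instance with python-openstackclient, assuming admin credentials for the hosting cloud are sourced:
        openstack server start mirror02.iad3.inmotion.opendev.org
        openstack server show mirror02.iad3.inmotion.opendev.org -c status   # confirm it reaches ACTIVE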
  • 2021-09-20 16:00:29 UTC Ran filesystem checks on database volume for the paste.opendev.org service, and rebooted the server in order to attempt to clear any lingering I/O error state
  • 2021-09-13 00:56:07 UTC The mailing list services for lists.airshipit.org, lists.opendev.org, lists.openstack.org, lists.starlingx.io, and lists.zuul-ci.org are back in operation once again and successfully delivering messages
  • 2021-09-12 22:23:57 UTC restarted all of zuul on commit 9a27c447c159cd657735df66c87b6617c39169f6
  • 2021-09-12 21:48:29 UTC The mailing list services for lists.airshipit.org, lists.opendev.org, lists.openstack.org, lists.starlingx.io, and lists.zuul-ci.org are still offline while we finish addressing an unforeseen problem booting recent Ubuntu kernels from PV Xen
  • 2021-09-12 15:01:18 UTC The mailing list services for lists.airshipit.org, lists.opendev.org, lists.openstack.org, lists.starlingx.io, and lists.zuul-ci.org will be offline over the next 6 hours for server upgrades, messages will be sent to the primary discussion lists at each site once the maintenance concludes
  • 2021-09-09 23:07:55 UTC Restarted nl01-04.opendev.org on 5dc0aed2e6d2375de93c38f1389d19512c563b99 to pick up Node.user_data support. This will be required by Zuul shortly.
  • 2021-09-08 21:52:52 UTC restarted all of zuul on commit 04678e25e666c5e97b76e68838a0ce1cf0761144
  • 2021-09-08 21:07:03 UTC The Gerrit service on review.opendev.org is going offline momentarily for a host migration and zuul upgrade, downtime should be only a few minutes.
  • 2021-09-03 20:31:27 UTC Restarted nl01-04 on 4edaeba70265396de415d2c7519b4ff8415e7750
  • 2021-08-30 10:54:48 UTC restarted apache2 on ethercalc.o.o in order to resolve "scoreboard is full" situation
  • 2021-08-26 20:44:35 UTC Used rmlist to remove the openstack-i18n-de mailing list from lists.openstack.org now that https://review.opendev.org/805646 has merged
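      A sketch of the Mailman 2 operation (the binary path varies by install; whether the -a flag, which also deletes archives, was used here is not recorded):
        sudo rmlist openstack-i18n-de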
  • 2021-08-26 17:30:52 UTC Upgrades completed for the lists.katacontainers.io server, which is now fully back in service
  • 2021-08-26 16:05:28 UTC The lists.katacontainers.io service is offline for scheduled upgrades over the next two hours
  • 2021-08-25 20:20:20 UTC Accepted invitation on PyPI for openstackci to be an owner of ovn-bgp-agent
  • 2021-08-23 21:44:12 UTC The Gerrit service on review.opendev.org has been restarted for a patch version upgrade, resulting in a brief outage
  • 2021-08-23 17:36:56 UTC ze04 was rebooted at 02:16 UTC today due to a hypervisor host outage in that provider, but appears to be running normally
  • 2021-08-23 17:36:43 UTC Restarted the statusbot container on eavesdrop01 just now, since it seemed to not start cleanly immediately following a configuration update at 2021-08-21 17:12
  • 2021-08-21 14:30:52 UTC enabled "Guest access" on matrix synapse server EMS control panel to allow anonymous read access to rooms which allow that, and for federated homeservers to be able to browse the published rooms list
  • 2021-08-20 20:14:37 UTC restarted all of zuul on commit 919c5a36546117c4ad869ff9b580455970ecd268
  • 2021-08-19 18:34:21 UTC Completed manually-initiated host migration for afs02.dfw.openstack.org in preparation for upcoming provider maintenance
  • 2021-08-18 23:43:49 UTC restarted all of zuul on 598db8a78ba8fef9a29c35b9f86c9a62cf144f0c to correct tobiko config error
  • 2021-08-17 22:20:41 UTC restarted all of zuul on commit 6eb84eb4bd475e09498f1a32a49e92b814218942
  • 2021-08-13 15:14:32 UTC Hard rebooted elasticsearch02, elasticsearch06, and logstash-worker11 as all three seemed to be hung
  • 2021-08-11 14:24:14 UTC Killed an htcacheclean process on mirror01.bhs.ovh.opendev.org which had been squatting the flock since 2021-07-21, and then cleanly restarted the apache2 service since at least one of its workers logged a segfault in dmesg at 10:18:55 UTC when its cache volume filled completely
  • 2021-08-10 22:11:10 UTC restarted zuul on commit d07397a73c2551d5c77e0ffc3d98008337168902
  • 2021-08-09 13:49:22 UTC Requested Spamhaus XBL delisting for the IPv6 /64 CIDR containing the address of the lists.openstack.org server (seems to have been re-listed around 2021-08-06 07:00 UTC, roughly 2.5 days after the previous de-listing)
  • 2021-08-09 13:13:57 UTC Deleted stale vldb entry for AFS docs volume and ran vos release manually to catch up the read-only replicas
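      The general OpenAFS pattern, as a sketch (the exact stale VLDB entry is not recorded; -localauth assumes running as root on an AFS server):
        vos delentry <stale-volume-entry> -localauth    # drop the stale VLDB record
        vos release docs -localauth                     # push the RW volume out to its read-only replicas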
  • 2021-08-07 17:51:55 UTC restarted zuul on commit 14af6950b7e34b0cf3d3e813726e9cbc55a08764
  • 2021-08-05 20:04:47 UTC The Gerrit service on review.opendev.org is going down for a quick restart to adjust its database connection configuration, and should return to service momentarily
  • 2021-08-05 15:02:41 UTC A user account has been deleted from Zanata in an effort to fix the issue described in http://lists.openstack.org/pipermail/openstack-discuss/2021-August/023972.html
  • 2021-08-05 00:07:41 UTC restarted all of zuul on commit d87a9a8b8f00887c37bb382271181cb5d703ba3f
  • 2021-08-04 19:10:20 UTC Offline migrated elasticsearch07.openstack.org in order to mitigate an unplanned service outage
  • 2021-08-04 04:49:00 UTC cleaned up review-test and review01 servers and related volumes, databases, etc.
  • 2021-08-03 15:51:05 UTC Requested Spamhaus XBL delisting for the IPv6 address of the lists.openstack.org server
  • 2021-07-30 15:32:29 UTC Hard rebooted storyboard-dev01.opendev.org via Nova API due to what appears to be a CPU lockup.
  • 2021-07-30 15:02:07 UTC There will be a brief outage of the Gerrit service on review.opendev.org in the next few minutes as part of a routine project rename maintenance: http://lists.opendev.org/pipermail/service-announce/2021-July/000023.html
  • 2021-07-30 14:13:17 UTC There will be a brief outage of the Gerrit service on review.opendev.org starting at 15:00 UTC today as part of a routine project rename maintenance: http://lists.opendev.org/pipermail/service-announce/2021-July/000023.html
  • 2021-07-30 14:10:06 UTC There will be a brief outage of the Gerrit service on review.opendev.org starting at 15:00 UTC today as part of a routine project rename maintenance: http://lists.opendev.org/pipermail/service-announce/2021-July/000023.html
  • 2021-07-30 14:07:59 UTC Temporarily disabled automated deployment for review02.opendev.org, storyboard.openstack.org, and gitea*.opendev.org in preparation for 15:00 UTC project rename maintenance.
  • 2021-07-28 19:50:45 UTC restarted all of zuul on commit 8e4af0ce5e708ec6a8a2bf3a421b299f94704a7e
  • 2021-07-28 13:40:58 UTC Restarted the nodepool-launcher container on nl03.opendev.org in order to free stale node request locks
  • 2021-07-25 19:17:14 UTC Delisted the new Gerrit server's IPv4 address from Microsoft's E-mail service spam filter
  • 2021-07-24 19:32:51 UTC Completed restart of all servers for CVE-2021-33909 and CVE-2021-33910 (among other lower-priority vulnerabilities)
  • 2021-07-22 20:59:09 UTC Upgraded gitea to version 1.14.5
  • 2021-07-21 06:57:46 UTC Due to a configuration error, the Zuul queue was unfortunately lost; please recheck any in-flight changes
  • 2021-07-19 03:30:29 UTC The maintenance of the review.opendev.org Gerrit service is now complete and service has been restored. Please alert us in #opendev if you have any issues. Thank you
  • 2021-07-18 21:36:07 UTC Gerrit downtime -- over the next few hours the Gerrit service will be offline as we move it to a new home. Thank you for your patience; we will send an alert when things are restored in 4-6 hours
  • 2021-07-18 00:02:06 UTC restarted all of zuul on 7e1e5a0176620d877717e87035223e4f3195d267
  • 2021-07-17 02:38:22 UTC restarted all of zuul on commit 43a8e34559e594a33faafecfe9a0a33e52e25ee8
  • 2021-07-16 21:13:58 UTC restarted all of zuul on commit 43b7f7f22c74301e830b042357536e6b5357d6e8
  • 2021-07-14 07:06:34 UTC paste.openstack.org migrated to paste.opendev.org
  • 2021-07-13 20:08:54 UTC restarted all of zuul on commit f9bfac09dd47e7065cd588287706b6965baaae37 to fix depends-on error and pick up result event handler fix
  • 2021-07-13 17:43:03 UTC Depends-On footers using https://review.opendev.org URLs are currently not working. This is due to a config change in Zuul which we are reverting; we will restart Zuul to pick up the revert.
  • 2021-07-12 01:37:00 UTC pulled gerrit 3.2.11 image and restarted, restarted zuul to use full review01.opendev.org name when connecting
  • 2021-07-09 20:06:37 UTC restarted all of zuul on commit 657d8c6fb284261f1213b9eaf1cf5c51f47c383b
  • 2021-07-08 16:40:34 UTC All nodepool launchers restarted on "latest" docker image (built from change corresponding to the nodepool 4.2.0 release)
  • 2021-07-03 01:09:23 UTC restarted all of zuul on commit 10966948d723ea75ca845f77d22b8623cb44eba4 to pick up stats and zk watch bugfixes
  • 2021-07-02 14:22:33 UTC restarted all of zuul on commit cc3ab7ee3512421d7b2a6c78745ca618aa79fc52 (includes zk executor api and zuul vars changes)
  • 2021-07-01 15:39:15 UTC Stopped log workers and logstash daemons on logstash-worker11-20 to collect up to date data on how many indexer workers are necessary
  • 2021-06-25 04:19:56 UTC added openeuler service user/keytab/volume (https://review.opendev.org/c/opendev/system-config/+/784874)
  • 2021-06-24 13:58:34 UTC Our Zuul gating CI/CD services are being taken offline now in order to apply some critical security updates, and are not expected to remain offline for more than 30 minutes.
  • 2021-06-24 12:03:34 UTC Our Zuul gating CI/CD services will be offline starting around 14:00 UTC (in roughly two hours from now) in order to apply some critical security updates, and are not expected to remain offline for more than 30 minutes.
  • 2021-06-23 06:44:48 UTC manually corrected links in recent meetings/logs on eavesdrop01.opendev.org to meetings.opendev.org; see https://review.opendev.org/c/opendev/system-config/+/797550
  • 2021-06-23 05:40:36 UTC cleaned up and rebooted nb01/nb02
  • 2021-06-17 22:05:55 UTC manually performed uninstall/reinstall for bridge ansible upgrade from https://review.opendev.org/c/opendev/system-config/+/792866
  • 2021-06-17 13:15:27 UTC Restarted the nodepool-launcher container on nl04.opendev.org to release stale node request locks
  • 2021-06-16 19:21:43 UTC Restarted nodepool launcher on nl03 to free stale node request locks for arm64 nodes
  • 2021-06-15 12:29:46 UTC Killed a long-running extra htcacheclean process on mirror.regionone.limestone which was driving system load up around 100 from exceptional iowait contention and causing file retrieval problems for jobs run there
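      A shell sketch of this kind of cleanup (the process matching and service name are assumptions):
        sudo pkill -f htcacheclean        # kill the runaway cache cleaner
        sudo systemctl restart apache2    # restart the proxy once load subsides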
  • 2021-06-14 14:40:36 UTC Restarted the ircbot container on eavesdrop to troubleshoot why opendevmeet isn't joining all its configured channels
  • 2021-06-11 17:48:44 UTC Zuul is being restarted for server reboots
  • 2021-06-11 09:34:18 UTC statusbot running on eavesdrop01.opendev.org
  • 2021-06-11 06:23:10 UTC meetbot/logging now running from limnoria on eavesdrop01.opendev.org
  • 2021-06-11 02:02:21 UTC restarted all of zuul on commit dd45f931b62ef6a5362e39bdb56ee203b74e1381 (4.5.0 +1)
  • 2021-06-09 04:01:20 UTC restarted gerrit to pick up changes for https://review.opendev.org/c/opendev/system-config/+/791995
  • 2021-06-08 17:56:28 UTC Dumped archival copies of openstack-security ML configuration and subscriber list in /home/fungi on lists.o.o, then removed the ML from mailman (leaving archives intact)
  • 2021-06-08 04:37:38 UTC gerritbot now running from eavesdrop01.opendev.org
  • 2021-06-07 13:22:09 UTC Increased quota for mirror.centos AFS volume from 300000000 to 350000000 as the volume had filled recently
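      For reference, an OpenAFS quota bump of this sort looks roughly like the following (values are in KB; -localauth assumes running as root on a fileserver):
        vos setquota -id mirror.centos -maxquota 350000000 -localauth
        fs listquota /afs/.openstack.org/mirror/centos    # verify usage against the new quota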
  • 2021-06-04 13:47:17 UTC restarted zuul at commit 85e69c8eb04b2e059e4deaa4805978f6c0665c03 which caches unparsed config in zk. observed expected increase in zk usage after restart: 3x zk node count and 2x zk data size
  • 2021-06-02 06:54:43 UTC disabled limestone due to mirror issues, appears to be slow operations on cache volume
  • 2021-06-01 21:49:55 UTC Manually replaced the HTTPS certificate, key and CA intermediate bundle for the wiki.openstack.org site, which is still not under any configuration management
  • 2021-05-30 15:35:06 UTC restarted all of zuul on commit bd1a669cc8e4eb143ecc96b67031574968d51d1e
  • 2021-05-30 11:42:16 UTC Restarted statusbot in the foreground within a root screen session for better collection of possible crash data
  • 2021-05-30 11:19:11 UTC Restarted statusbot, it wasn't running and had stopped logging around 00:52
  • 2021-05-30 00:49:41 UTC Manually patched statusbot and forced an upgrade of simplemediawiki to a prerelease on eavesdrop to accommodate installation under Python 3.5
  • 2021-05-26 21:26:28 UTC Updated the /etc/hosts entry for freenode on eavesdrop since the server we were pinning to seems to have died without being removed from the round-robin DNS entry
  • 2021-05-26 12:46:49 UTC Restarted the nodepool launcher container on nl02 in order to free stuck node request locks
  • 2021-05-26 06:55:00 UTC ask.openstack.org retired and redirected to static site
  • 2021-05-25 15:05:30 UTC An unidentified incident in one region of one of our storage donors caused a small percentage of Zuul job builds to report a POST_FAILURE result with no uploaded logs between 13:14 and 14:18 UTC; these can be safely rechecked
  • 2021-05-25 06:55:54 UTC cleared leaked files and rebooted nb01/nb02
  • 2021-05-19 17:13:52 UTC Restarted mailman services on lists.openstack.org in order to verify configuration after config management changes
  • 2021-05-19 16:41:58 UTC Restarted mailman services on lists.katacontainers.io in order to verify configuration after config management changes
  • 2021-05-18 14:11:53 UTC Restarted "openstack" meetbot and "openstackstatus" statusbot processes after what appears to have been network disruption around 13:58 UTC
  • 2021-05-17 22:25:33 UTC Deleted zuul01.openstack.org (ef3deb18-e494-46eb-97a2-90fb8198b5d3) and its DNS records as zuul02.opendev.org has replaced it.
  • 2021-05-17 22:14:46 UTC Updated swap and log filesystem sizes on zuul02, and restarted all Zuul services on cdc99a3
  • 2021-05-17 21:34:11 UTC The Zuul service at zuul.opendev.org will be offline for a few minutes (starting now) in order for us to make some needed filesystem changes; if the outage lasts longer than anticipated we'll issue further notices
  • 2021-05-15 12:08:24 UTC The load balancer for opendev.org Git services was offline between 06:37 and 12:03 utc due to unanticipated changes in haproxy 2.4 container images, but everything is in service again now
  • 2021-05-14 04:27:48 UTC cleared out a range of old hosts on cacti.openstack.org
  • 2021-05-14 00:18:36 UTC swapped out zuul01.openstack.org for zuul02.opendev.org. The entire zuul + nodepool + zk cluster is now running on focal
  • 2021-05-13 23:51:40 UTC restarted zuul on commit ddb7259f0d4130f5fd5add84f82b0b9264589652 (revert of executor decrypt)
  • 2021-05-13 22:06:20 UTC We are cautiously optimistic that Zuul is functional now on the new server. We ran into some unexpected problems and want to do another restart in the near future to ensure a revert addresses the source of that problem.
  • 2021-05-13 20:41:54 UTC Zuul is in the process of migrating to a new VM and will be restarted shortly.
  • 2021-05-13 16:46:23 UTC Ran disable-ansible on bridge to avoid conflicts with reruns of playbooks to configure zuul02
  • 2021-05-12 15:33:52 UTC Any builds with POST_FAILURE result and no available logs between 11:41 and 14:41 UTC today were related to an authentication endpoint problem in one of our providers and can be safely rechecked now
  • 2021-05-12 12:56:34 UTC Bypassed gating to merge https://review.opendev.org/790961 for temporarily disabling log uploads to one of our providers
  • 2021-05-12 04:39:51 UTC Asterisk PBX service retired; see https://review.opendev.org/c/opendev/system-config/+/790190
  • 2021-05-06 21:42:17 UTC Stopped gerrit and apache on review-test in prep for future cleanup
  • 2021-05-06 04:47:54 UTC arm64 xenial images removed, mirror.wheel.xeniala64 volumes removed
  • 2021-05-05 18:03:42 UTC Deleted server instance survey01.openstack.org, database instance limesurvey, and related DNS records for survey.openstack.org and survey01.openstack.org
  • 2021-05-05 16:05:13 UTC Removed OpenStack Release Manager permissions from global Gerrit config as reflected in https://review.opendev.org/789383
  • 2021-04-30 19:41:06 UTC restarted zuul on commit b9a6190a452a428da43dc4ff3e6e388d4df41e8b
  • 2021-04-29 06:15:21 UTC updated the hosts entry for freenode on eavesdrop, restarted gerritbot
  • 2021-04-28 22:07:44 UTC Deleted zk01-zk03.openstack.org as they have been replaced with zk04-06.opendev.org
  • 2021-04-27 22:12:29 UTC Upgraded zuul zk cluster to focal. zk01-03.openstack.org have been replaced with zk04-06.opendev.org
  • 2021-04-27 01:50:43 UTC restarted gerrit due to inability of user to update account settings, logs consistent with lock errors detailed in https://bugs.chromium.org/p/gerrit/issues/detail?id=13726
  • 2021-04-26 23:20:20 UTC nb03: doubled /opt volume to 800G to allow for more images after enabling raw with https://review.opendev.org/c/opendev/system-config/+/787293
  • 2021-04-26 15:35:31 UTC Requested Spamhaus PBL delisting for the IPv4 address of nb01.opendev.org
  • 2021-04-26 15:23:33 UTC Requested Spamhaus PBL delisting for the IPv4 address of mirror01.dfw.rax.opendev.org
  • 2021-04-26 13:04:01 UTC Requested Spamhaus PBL delisting for the IPv4 address of review.opendev.org, which should take effect within the hour
  • 2021-04-23 19:03:42 UTC The Gerrit service on review.openstack.org is being restarted to pick up some updates, and should be available again momentarily
  • 2021-04-23 16:58:37 UTC Removed pypi.org address override on mirror01.dfw.rax.opendev.org now that v6 routing between them works again
  • 2021-04-23 16:05:45 UTC restarted zuul on commit d4c7d293609a151f0c58b2d8a6fe4ac2817ee501 to pick up global repo state changes
  • 2021-04-22 15:14:56 UTC Temporarily hard-coded the pypi.org hostname to a Fastly IPv4 address in /etc/hosts on mirror01.dfw.rax.opendev.org until v6 routing between them returns to working order
  • 2021-04-22 01:44:48 UTC deleted leaked zk node under /nodepool/images/fedora-32/builds to avoid many warnings in builder logs
  • 2021-04-21 21:16:33 UTC restarted zuul on commit 620d7291b9e7c24bb97633270492abaa74f5a72b
  • 2021-04-21 20:41:30 UTC Started mirror01.regionone.limestone.opendev.org, which seems to have spontaneously shutdown at 15:27:47 UTC today
  • 2021-04-21 20:09:52 UTC Deleted /afs/.openstack.org/project/tarballs.opendev.org/openstack/octavia/test-images/test-only-amphora-x64-haproxy-ubuntu-xenial.qcow2 as requested by johnsom
  • 2021-04-21 04:39:27 UTC Removed temporary block of 161.170.233.0/24 in iptables on gitea-lb01.opendev.org after discussion with operators of the systems therein
  • 2021-04-19 19:06:44 UTC Cleaned up external account id conflicts with two third party CI accounts. The involved parties were emailed a week ago with the proposed plan. No objection to that plan was received so cleanup proceeded today.
  • 2021-04-15 16:39:28 UTC Temporarily blocked 161.170.233.0/24 in iptables on gitea-lb01.opendev.org to limit impact from excessive git clone requests
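      A sketch of the temporary block and its later removal (see the 2021-04-21 entry above); inserting at the top of the INPUT chain is an assumption:
        sudo iptables -I INPUT -s 161.170.233.0/24 -j DROP    # apply the block
        sudo iptables -D INPUT -s 161.170.233.0/24 -j DROP    # remove it afterwards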
  • 2021-04-15 15:53:42 UTC Temporarily disabled the gitea02 backend in haproxy due to impending memory exhaustion
  • 2021-04-14 21:44:53 UTC Deleted firehose01.openstack.org (ddd5b4c1-37af-4973-b49c-b2023582b75f) as its deployment was unmaintained and it was never used in production
  • 2021-04-14 04:32:52 UTC planet.openstack.org redirected to opendev.org/openstack/openstack-planet via static.o.o, server removed and dns entries cleaned up
  • 2021-04-12 15:58:00 UTC Switched owner of devstack project on Launchpad from openstack-admins to devstack-drivers (the latter is still owned by openstack-admins)
  • 2021-04-09 18:44:03 UTC Mounted new AFS volumes mirror.wheel.deb11a64 at mirror/wheel/debian-11-aarch64 and mirror.wheel.deb11x64 at mirror/wheel/debian-11-x86_64 with our standard ACLs and base quotas
  • 2021-04-09 15:25:34 UTC Deleted server instance "test" (created 2020-10-23) from nodepool tenant in linaro-us
  • 2021-04-09 15:16:21 UTC restarted zuul at commit 9c3fce2820fb46aa39dbf89984386420fd7a7f70
  • 2021-04-09 15:10:45 UTC Restarted the nodepool-launcher container on nl03.opendev.org in order to free some indefinitely locked node requests
  • 2021-04-09 00:28:46 UTC reboot nb02.opendev.org after mystery slowdown, appears to be root disk related
  • 2021-04-08 17:40:13 UTC Promoted 785432,1 in the zuul tenant's gate pipeline due to indefinitely waiting builds ahead of it
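      A sketch of the promote operation with the Zuul admin client, typically run on the scheduler host (invocation details such as the config path or container wrapper are assumptions):
        zuul promote --tenant zuul --pipeline gate --changes 785432,1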
  • 2021-04-08 16:14:44 UTC Restarted ptgbot service on eavesdrop.o.o since the bot left channels during a 2021-02-25 netsplit and never returned
  • 2021-04-08 00:45:08 UTC backup prune on vexxhost backup server complete
  • 2021-04-07 15:34:22 UTC Restarted apache on gitea 04-07 to clean up additional stale processes which may have served old certs
  • 2021-04-07 13:59:04 UTC cold restarted apache on gitea08.opendev.org as there were some stale worker processes which seemed to be serving expired certs from more than a month ago
  • 2021-04-06 15:55:49 UTC POST_FAILURE results between 14:00 and 15:50 UTC can be safely rechecked, and were due to authentication problems in one of our storage donor regions
  • 2021-04-02 18:59:27 UTC Deleted diskimage centos-8-arm64-0000036820 on nb03.opendev.org in order to roll back to the previous centos-8-arm64-0000036819 because of repeated boot failures with the newer image
  • 2021-04-02 15:55:35 UTC restarted all of zuul on commit 991d8280ac54d22a8cd3ff545d3a5e9a2df76c4b to fix memory leak
  • 2021-04-02 14:31:18 UTC Restarted the nodepool-launcher container on nl02.opendev.org to free stuck node request locks
  • 2021-04-02 14:31:04 UTC Restarted statusbot after it never returned from a 2021-03-30 22:41:22 UTC connection timeout
  • 2021-03-30 20:32:43 UTC Restarted the Zuul scheduler to address problematic memory pressure, and reenqueued all in flight changes
  • 2021-03-29 04:08:38 UTC released all wheel mirrors, some of which appeared to be locked
  • 2021-03-29 04:07:14 UTC remove mirror.gem, test.fedora and a few out-of-date backup volumes used during various transitions to free up space on afs servers
  • 2021-03-27 14:12:55 UTC restarted zuul on commit 30959106601613974028cfb03d252db4bddf8888
  • 2021-03-25 22:59:06 UTC Restarted haproxy container on gitea-lb01 due to runaway CPU consumption
  • 2021-03-25 18:45:01 UTC Restarted gerritbot, since in the wake of IRC netsplits it seems to have forgotten it's not joined to at least some channels
  • 2021-03-25 18:13:19 UTC Restarted nodepool-launcher container on nl04.openstack.org in hopes of clearing stuck node requests from what looks like brief disruption in ovh-bhs1 around 03:30 UTC
  • 2021-03-25 11:41:10 UTC Restarted the haproxy container on gitea-lb01.opendev.org because it had been stuck in 100% cpu consumption since 04:00 UTC
  • 2021-03-24 22:56:16 UTC All gitea backends have been enabled in the haproxy LB once more
  • 2021-03-24 22:51:55 UTC Re-enabled gitea05 and 06 in pool, removed 02 due to memory exhaustion
  • 2021-03-24 22:30:14 UTC Temporarily removed gitea07 from the lb pool due to memory exhaustion
  • 2021-03-24 18:28:29 UTC Temporarily removed gitea05 from the balance_git_https pool
  • 2021-03-24 18:24:31 UTC Temporarily removed gitea06 from the balance_git_https pool
  • 2021-03-24 18:13:25 UTC A service anomaly on our Git load balancer has been disrupting access to opendev.org hosted repositories since 17:20 UTC; we've taken action to restore functionality, but have not yet identified a root cause
  • 2021-03-23 21:28:30 UTC restarted zuul on commit b268f71b233304dbbf2ce59846e47d0575b6b35b with recent scheduler bugfixes
  • 2021-03-23 18:34:23 UTC Restarted all Mailman queue processing daemons on lists.o.o in order to mitigate any fallout from a 2021-03-04 OOM event
  • 2021-03-22 20:55:38 UTC added zuul01 to emergency and restarted scheduler on 4.1.0 due to event queue bugs
  • 2021-03-22 14:38:39 UTC restarted zuul at 92f43d874ae8cc3e39d6455e3c8d9f8d0ca13eb7 (event queues are in zookeeper)
  • 2021-03-18 23:48:01 UTC all afs and kerberos servers migrated to focal, under ansible control
  • 2021-03-18 20:58:12 UTC Replaced nl01-04.openstack.org with new Focal nl01-04.opendev.org hosts.
  • 2021-03-18 19:38:12 UTC Restarted the gerritbot container after it disconnected from Freenode after trying to handle a very long commit message subject
  • 2021-03-18 00:31:31 UTC restarted zuul at commit 4bb45bf2a0223c1c624dbd8f44efff207e6b4097
  • 2021-03-17 21:46:50 UTC restarted zuul on commit 8a06dc90101c4b5285aaed858a62dadc5ae27868
  • 2021-03-17 13:49:44 UTC Restarted gerritbot container since it never returned after a 09:33 UTC server change
  • 2021-03-17 03:50:40 UTC kdc03/04 manually upgraded to focal. they are in emergency until 779890; we will run manually first time to confirm operation
  • 2021-03-16 15:51:18 UTC Re-enabled gitea06 in haproxy now that the crisis has passed
  • 2021-03-16 15:11:15 UTC Temporarily disabled balance_git_https,gitea06.opendev.org in haproxy on gitea-lb01
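      A sketch of draining a backend through the HAProxy runtime API (the admin socket path is an assumption for this deployment):
        echo "set server balance_git_https/gitea06.opendev.org state maint" | sudo socat stdio /var/haproxy/run/stats
        echo "set server balance_git_https/gitea06.opendev.org state ready" | sudo socat stdio /var/haproxy/run/stats   # re-enable later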
  • 2021-03-15 18:58:37 UTC Manually removed files matching glance-21.0.0.0b3* from the openstack/glance tree of the tarballs volume in AFS per discussion in #openstack-release
  • 2021-03-15 18:06:57 UTC Deleted errant files glance-21.0.0.0b3.tar.gz glance-21.0.0.0b3-py3-none-any.whl and the glance 21.0.0.0b3 record from PyPI per discussion in #openstack-release
  • 2021-03-15 14:06:05 UTC restarted gerritbot as it did not realize it was no longer connected to freenode
  • 2021-03-12 21:19:40 UTC restarted all of zuul at commit 13923aa7372fa3d181bbb1708263fb7d0ae1b449
  • 2021-03-12 19:14:16 UTC Corrected all Gerrit "preferred email lacks external id" account consistency problems.
  • 2021-03-12 18:15:20 UTC Restarted the containers on refstack01 to pick up configuration change from https://review.opendev.org/780272
  • 2021-03-11 21:31:40 UTC refstack.openstack.org CNAME created to the new refstack server. The A/AAAA records for the old server are renamed refstack-old until we decommission
  • 2021-03-09 21:00:17 UTC Cleaned up old DNS records for old openstack.org zuul mergers and executors
  • 2021-03-09 15:13:50 UTC Replaced ze09-12.openstack.org with ze09-12.opendev.org focal servers. This concludes the rolling replacement of the zuul executors.
  • 2021-03-08 17:01:12 UTC Deleted ze05-ze08.openstack.org as new ze05-ze08.opendev.org servers have taken over
  • 2021-03-08 16:20:40 UTC Removed the Dell Ironic CI account from the Third-Party CI group in Gerrit as approved by the Ironic team in their weekly meeting
  • 2021-03-05 16:38:13 UTC Deleted ze02-ze04.openstack.org as they have been replaced with new .opendev.org hosts.
  • 2021-03-04 23:05:00 UTC updated eavesdrop hosts entry and restarted gerritbot due to netsplit
  • 2021-03-04 18:57:52 UTC Removed old ze01.openstack.org in favor of ze01.opendev.org. More new zuul executors to arrive shortly.
  • 2021-03-03 07:32:12 UTC afsdb01 and afsdb02 in-place upgraded to focal
  • 2021-03-03 02:21:55 UTC released git-review 2.0.0
  • 2021-03-02 01:35:09 UTC afsdb03.openstack.org now online, SRV records added
  • 2021-03-01 18:47:41 UTC filed spamhaus pbl removal for lists.katacontainers.io ipv4 address
  • 2021-03-01 18:47:26 UTC filed spamhaus css removal for lists.katacontainers.io ipv6 address
  • 2021-02-26 16:24:26 UTC The refstack.openstack.org service was offline 12:57-13:35 UTC due to a localized outage in the cloud provider where it's hosted
  • 2021-02-26 16:02:34 UTC Added new focal ze01.opendev.org and stopped zuul-executor on ze01.openstack.org
  • 2021-02-24 21:24:05 UTC Replaced zm01-08.openstack.org with new zm01-08.opendev.org servers running on focal
  • 2021-02-24 00:48:52 UTC Old rax.ord bup backups mounted RO on the new rax.ord borg backup server @ /opt/bup-202007
  • 2021-02-23 18:01:30 UTC Deleted zm01.openstack.org 0dad8f01-389c-40f2-8796-57ee4901ce07 as it has been replaced by zm01.opendev.org
  • 2021-02-22 00:10:48 UTC Restarted the Gerrit container on review.o.o to address a recurrence of https://bugs.chromium.org/p/gerrit/issues/detail?id=13726
  • 2021-02-20 00:25:27 UTC restarted zuul on 4f897f8b9ff24797decaab5faa346bd72f110970 and nodepool on c3b68c1498cc87921c33737e8809fdabbf3db5d7
  • 2021-02-19 17:59:28 UTC The change to the upload role has been reverted. Jobs started since the revert appear to be functioning normally. You can now safely recheck changes that reported POST_FAILURE for builds started between 16:12 and 17:13 UTC.
  • 2021-02-19 17:19:07 UTC All jobs are failing with POST_FAILURE due to a backward-incompatible change made in the swift log upload library role. Working on a fix now.
  • 2021-02-16 17:31:23 UTC restarted all of zuul and nodepool at git shas bc4d0dd6140bb81fca1b8fcebe5b817838d2754b and 3d9914ab22b2205d9a70b4499e5e35a1a0cf6ed0 respectively
  • 2021-02-15 23:30:15 UTC restarted zuul scheduler and web at Zuul version: 3.19.2.dev377 c607884b
  • 2021-02-14 22:26:24 UTC All OpenAFS servers have now been removed from the Ansible emergency disable list and are being managed normally again.
  • 2021-02-12 04:21:49 UTC Added afsdb01 and afsdb02 servers to emergency disable list and added back missing public UDP ports in firewall rules while we work out what was missing from 775057
  • 2021-02-11 15:52:51 UTC Recent POST_FAILURE results from Zuul for builds started prior to 15:47 UTC were due to network connectivity issues reaching one of our log storage providers, and can be safely rechecked
  • 2021-02-10 16:32:04 UTC Grouped openinfraptg nick to existing openstackptg account in Freenode and updated ptgbot_nick in our private group_vars accordingly
  • 2021-02-09 15:16:23 UTC Manually reinstalled python-pymysql on storyboard.openstack.org after it was removed by unattended-upgrades at 06:21:48 UTC
  • 2021-02-06 02:18:18 UTC released bindep 2.9.0
  • 2021-02-05 02:47:12 UTC restarted the zuul-executor container on ze08 and ze12 to restore their console streamers, previously lost to recent oom events
  • 2021-02-03 17:48:44 UTC Requested Spamhaus SBL delisting for the lists.katacontainers.io IPv6 address
  • 2021-02-03 10:20:52 UTC restarted apache2 on static.opendev.org in order to resolve slow responses and timeouts
  • 2021-02-03 01:15:12 UTC afsdb01/02 restarted with afs 1.8 packages
  • 2021-02-02 00:58:17 UTC The Gerrit service on review.opendev.org is being quickly restarted to apply a new security patch
  • 2021-01-31 17:46:06 UTC Temporarily suspended Zuul build log uploads to OVH due to Keystone auth errors; POST_FAILURE results recorded between 16:30 and 17:40 UTC can be safely rechecked
  • 2021-01-29 16:48:19 UTC Added cinder-core as an included Gerrit group for rbd-iscsi-client-core as requested by rosmaita
  • 2021-01-27 17:41:46 UTC Blocked two IP addresses from Italy (one from a university and another from a phone provider network) at 16:45z because they seemed to be significantly increasing system load by crawling file download links for old changes
  • 2021-01-22 23:39:34 UTC Removed the email from DannyMassa's stale Gerrit account so that a new account can be created with that email. This is admittedly a workaround; we need to figure out external-id edits to solve this properly.
  • 2021-01-20 23:04:06 UTC Cleaned up /opt on nb01 and nb02 to remove stale image build data from dib_tmp and nodepool_dib. nb02's builder has been started as it has much more free space and we want it to "steal" builds from nb01.
  • 2021-01-20 23:03:20 UTC Upgraded gitea to 1.13.1
  • 2021-01-19 06:00:09 UTC restarted gerrit to get zuul-summary-results; see also http://lists.openstack.org/pipermail/openstack-discuss/2021-January/019885.html
  • 2021-01-17 03:12:37 UTC restarted zuul-web because 9000/tcp became unresponsive after the scheduler restart and did not recover on its own
  • 2021-01-17 03:03:46 UTC restarted zuul scheduler, now gerrit wip state should prevent zuul from attempting to merge
  • 2021-01-16 20:39:51 UTC deleted old aaaa records for nonexistent nb01.openstack.org and nb02.openstack.org servers
  • 2021-01-15 15:39:30 UTC rebooted static.o.o to pick up the recent openafs fixes
  • 2021-01-15 00:57:45 UTC restarted gerritbot since it was lost in a netsplit at 00:03 utc
  • 2021-01-13 17:35:00 UTC rebooted afs02.dfw following hung kernel tasks and apparent disconnect from a cinder volume starting at 03:32:42, volume re-releases are underway but some may be stale for the next hour or more
  • 2021-01-13 17:12:48 UTC Manually deleted an etherpad at the request of dmsimard.
  • 2021-01-13 13:04:16 UTC stopped and restarted mirror.regionone.limestone.opendev.org after it had become unresponsive. need afs cache cleanup, too.
  • 2021-01-12 19:41:17 UTC restarted gerritbot as it switched irc servers at 16:55 and never came back
  • 2021-01-12 15:38:49 UTC manually deleted contents of /var/cache/openafs on mirror.regionone.linaro-us and rebooted to recover from a previous unclean shutdown which was preventing afsd from working
  • 2021-01-11 18:42:26 UTC rebooted paste.o.o just now in order to recover from a hung userspace earlier around 00:15 utc today
  • 2021-01-08 17:40:00 UTC released glean 1.18.2
  • 2021-01-08 15:21:54 UTC killed a hung vos examine process from yesterday's afs issues which was preventing static site volume releases from proceeding
  • 2021-01-07 16:25:00 UTC rebooted afs01.dfw.o.o as it seemed to have been hung since 05:50 utc
  • 2021-01-05 16:20:47 UTC rebooted ethercalc.o.o because the server hung some time in the past 24 hours
  • 2021-01-05 16:06:48 UTC zm02 was rebooted by the provider at 13:14 utc following recovery from a host outage
  • 2021-01-02 16:13:28 UTC on-demand offline server migrations of pbx.o.o and wiki.o.o completed after provider-initiated live migrations failed
  • 2020-12-31 18:46:37 UTC An OVH network outage caused Zuul to report POST_FAILURE results with no summary/logs for some builds completing between 16:10 and 17:06 UTC; these should be safe to recheck now
  • 2020-12-30 21:28:20 UTC issued 'poweroff' on 77.81.189.96 (was mirror01.sto2.citycloud.openstack.org) since that region is not in use and the host is not under config management
  • 2020-12-30 21:23:02 UTC manually removed pabelanger's email address from openstackid01.openstack.org /etc/aliases since ansipuppet isn't running there
  • 2020-12-29 15:35:07 UTC arm64 builds are completing successfully again
  • 2020-12-28 14:20:03 UTC arm64 job nodes are currently unavailable since 2020-12-27 due to an expired cert for the linaro-us cloud; the provider admin has been notified by irc and e-mail
  • 2020-12-27 23:05:19 UTC performed on-demand cold migration of backup01.ord.rax.ci.openstack.org to resolve provider ticket #201226-ord-0000452
  • 2020-12-22 18:52:30 UTC dequeued refs/heads/master of openstack/openstack-helm-images from the periodic pipeline of the openstack zuul tenant after determining that it was wedged due to capacity issues in the selected node provider
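      A sketch of the dequeue with the Zuul admin client, run on the scheduler host (wrapper and config details are assumptions):
        zuul dequeue --tenant openstack --pipeline periodic --project openstack/openstack-helm-images --ref refs/heads/master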
  • 2020-12-21 16:31:13 UTC rebooted nb01.o.o to recover from a block device outage around 2020-12-19 03:00z
  • 2020-12-21 16:15:54 UTC lists.openstack.org rebooted at 16:00z to remedy complete hang around 04:10z today
  • 2020-12-21 14:59:17 UTC restarted gerritbot on eavesdrop.o.o which seemed to have lost connections
  • 2020-12-19 17:46:03 UTC rebooting mirror02.regionone.linaro-us.opendev.org in order to attempt to free an unkillable afsd process
  • 2020-12-17 05:30:34 UTC restarted nb01/02 with dib 3.5.0 in builder container to fix centos-8 builds
  • 2020-12-16 05:07:46 UTC zuul restarted to pickup https://review.opendev.org/c/zuul/zuul/+/711002
  • 2020-12-14 18:16:53 UTC enqueued tripleo job stability fixes 757836,22 and 757821,16 into the gate and promoted them both to the front of the queue at weshay's request
  • 2020-12-13 18:00:14 UTC e-mailed kevinz about apparent nat problem in linaro-us cloud, cc'd infra-root inbox
  • 2020-12-12 03:24:02 UTC all zuul mergers and executors have been restarted to enable git protocol v2
  • 2020-12-11 02:55:53 UTC The Gerrit service on review.opendev.org is being restarted quickly to enable support for Git protocol v2, downtime should be less than 5 minutes
  • 2020-12-10 20:11:55 UTC added mattmceuen to new sip-core and vino-core groups in gerrit
  • 2020-12-09 18:17:27 UTC The Gerrit service on review.opendev.org is currently responding slowly or timing out due to resource starvation, investigation is underway
  • 2020-12-09 01:10:55 UTC The Gerrit service on review.opendev.org is being restarted quickly to make heap memory and jgit config adjustments, downtime should be less than 5 minutes
  • 2020-12-08 08:27:45 UTC gerrit restarted after a short outage
  • 2020-12-07 02:00:50 UTC restarted gerrit with theming from #765422
  • 2020-12-04 14:37:47 UTC restarted gerritbot as it did not realize it was no longer connected to freenode
  • 2020-12-03 17:53:19 UTC released gerritlib 0.10.0 to get a gerrit upgrade related group management fix into our manage-projects image
  • 2020-12-03 03:46:29 UTC restarted the gerrit service on review.o.o for the openjdk 11 upgrade from https://review.opendev.org/763656
  • 2020-12-02 05:29:51 UTC restarted the gerrit service on review.o.o for the config change from https://review.opendev.org/765004
  • 2020-11-30 22:38:32 UTC The Gerrit service on review.opendev.org is being restarted quickly to make further query caching and Git garbage collection adjustments, downtime should be less than 5 minutes
  • 2020-11-30 16:22:46 UTC The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot high load and poor query caching performance, downtime should be less than 5 minutes
  • 2020-11-27 15:31:46 UTC restarted apache2 on static.opendev.org in order to troubleshoot very long response times
  • 2020-11-26 20:39:10 UTC cleaned up stray artifacts published under https://tarballs.opendev.org/openstack/ironic-python-agent{,-builder}/ at dtantsur's request
  • 2020-11-25 00:29:57 UTC removed eavesdrop from emergency, gerritbot should be fixed after 763892 & 763927
  • 2020-11-24 16:43:55 UTC The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an SMTP queuing backlog, downtime should be less than 5 minutes
  • 2020-11-23 20:02:16 UTC The Gerrit service on review.opendev.org is being restarted quickly to troubleshoot an unexpected error condition, downtime should be less than 5 minutes
  • 2020-11-23 00:55:40 UTC Our Gerrit upgrade maintenance has concluded successfully; please see the maintenance wrap-up announcement for additional details: http://lists.opendev.org/pipermail/service-announce/2020-November/000014.html
  • 2020-11-22 00:02:31 UTC Gerrit is up and running again on version 3.2. Zuul is talking to it and running jobs. You can push and review changes. We are still working through things and there may be additional service restarts during our upgrade window ending 01:00 UTC November 23. http://lists.opendev.org/pipermail/service-announce/2020-November/000013.html for more details
  • 2020-11-21 17:44:27 UTC The Gerrit service on review.opendev.org is accepting connections but is still in the process of post-upgrade sanity checks and data replication, so Zuul will not see any changes uploaded or rechecked at this time; we will provide additional updates when all services are restored.
  • 2020-11-20 15:05:54 UTC The Gerrit service at review.opendev.org is offline for a weekend upgrade maintenance, updates will be provided once it's available again: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html
  • 2020-11-20 14:02:42 UTC The Gerrit service at review.opendev.org will be offline starting at 15:00 UTC (roughly one hour from now) for a weekend upgrade maintenance: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html
  • 2020-11-20 13:04:31 UTC The Gerrit service at review.opendev.org will be offline starting at 15:00 UTC (roughly two hours from now) for a weekend upgrade maintenance: http://lists.opendev.org/pipermail/service-announce/2020-October/000012.html
  • 2020-11-20 03:19:25 UTC codesearch.openstack.org replaced by codesearch.opendev.org
  • 2020-11-20 00:51:05 UTC cleared /opt on nb01 & nb02 which had filled up, and restarted
  • 2020-11-19 18:32:33 UTC rebooted ns1.opendev.org after it became unresponsive
  • 2020-11-18 18:37:47 UTC The Gerrit service at review.opendev.org is being restarted quickly as a pre-upgrade sanity check, estimated downtime is less than 5 minutes.
  • 2020-11-17 17:47:31 UTC added pabelanger as an initial member of the new ansible-role-zuul-registry-core and ansible-role-zuul-registry-release groups in gerrit
  • 2020-11-10 00:24:31 UTC removed old mirror-update*.openstack.org servers and dns entries
  • 2020-11-09 20:29:53 UTC added founder access to #openstack-masakari for ptl yoctozepto
  • 2020-11-09 03:12:35 UTC collecting netconsole logs from regionone.linaro-us mirror on temp host @ 104.239.145.100
  • 2020-11-06 21:23:42 UTC Restarted nodepool launchers to pick up an ssh keyscanning fix. This should grab all valid ssh hostkeys for test nodes enabling testing with fips in jobs.
  • 2020-11-05 02:42:21 UTC removed grafana02.openstack.org, CNAME now goes to grafana.opendev.org
  • 2020-11-05 02:41:58 UTC removed old graphite01.opendev.org server and storage
  • 2020-11-04 18:12:47 UTC deleted broken 7t0pduk9xwiy spreadsheet from ethercalc via rest api, since it was malformed in such a way that it was crashing the service on page load
  • 2020-10-31 12:15:23 UTC restarted ethercalc service again following similar 11:48:56 utc crash
  • 2020-10-31 11:50:31 UTC restarted ethercalc service following 11:20:32 utc crash
  • 2020-10-29 00:29:54 UTC reprepro mirroring moved to mirror-update.opendev.org. logs now available at https://static.opendev.org/mirror/logs/
  • 2020-10-28 21:13:54 UTC powered off mirror-update.openstack.org (with its root crontab content commented out) in preparation for merging https://review.opendev.org/759965
  • 2020-10-28 10:43:53 UTC force-merged https://review.opendev.org/759831 at the request of nova ptl in order to unblock integrated gate
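      One common way to bypass gating with sufficient Gerrit permissions is the SSH review API, sketched here (the patchset number is not recorded in the entry):
        ssh -p 29418 review.opendev.org gerrit review --submit 759831,<patchset>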
  • 2020-10-28 05:47:09 UTC restarted gerritbot which got confused reading the event stream while gerrit was restarted
  • 2020-10-28 05:25:04 UTC restarted gerrit at 04:55 to pick up its-storyboard plugin config update
  • 2020-10-27 04:02:27 UTC removed ceph h/j/l/m AFS volumes and mirroring jobs
  • 2020-10-26 13:59:56 UTC added gtema (openstacksdk ptl) to the python-openstackclient-stable-maint group in gerrit at his request
  • 2020-10-23 12:03:37 UTC rebooted mirror.regionone.limestone after it was found to be in a hung state since roughly 09:00 utc on 2020-10-22
  • 2020-10-21 21:52:14 UTC handed off ownership of https://pypi.org/project/pynotedb/ to softwarefactory account
  • 2020-10-21 00:40:55 UTC The Gerrit service at review.opendev.org is back up and running; for outage details see analysis here: http://lists.opendev.org/pipermail/service-announce/2020-October/000011.html
  • 2020-10-20 18:04:50 UTC Gerrit is offline due to a security compromise. Please refer to https://review.opendev.org/maintenance.html or #opendev for the latest updates.
  • 2020-10-20 15:44:39 UTC Auditing is progressing but not particularly quickly. We'll keep updating every 2 hours or so.
  • 2020-10-20 13:37:08 UTC We've confirmed that known compromised identities have been reset or had their accounts disabled, and we are auditing other service accounts for signs of compromise before we prepare to restore Gerrit to working order. We will update again in roughly 2 hours.
  • 2020-10-20 11:08:44 UTC Update on gerrit downtime: After investigation, we believe the incident is related to a compromised Gerrit user account rather than a vulnerability in Gerrit software. We are continuing to review activity to verify the integrity of git data and expect to have an additional update with possible service restoration in approximately 2 hours.
  • 2020-10-20 08:40:13 UTC We identified a possible vulnerability in Gerrit and are investigating the potential impact on our services. Out of an abundance of caution we have taken our OpenDev hosted Gerrit system offline. We will update with more information once we are able.
  • 2020-10-20 04:31:38 UTC We identified a possible vulnerability in Gerrit and are investigating the potential impact on our services. Out of an abundance of caution we have taken our OpenDev hosted Gerrit system offline. We will update with more information once we are able.
  • 2020-10-20 03:26:44 UTC We are investigating an issue with our hosted Gerrit services. We will provide an update as soon as we can. If you want to follow the latest, feel free to join #opendev
  • 2020-10-19 13:36:47 UTC Open Infrastructure Summit platform issues are being worked on by OSF events and webdev teams, status updates will be available in the conference "lobby" page as well as the #openinfra-summit channel on Freenode (though it is presently not logged)
  • 2020-10-16 18:46:27 UTC deleted trove instances of "testmt" percona ha cluster from 2020-06-27
  • 2020-10-16 14:05:47 UTC added slittle11 as initial member of starlingx-snmp-armada-app-core group in gerrit
  • 2020-10-16 12:55:45 UTC added noonedeadpunk to access list for #openstack-ansible
  • 2020-10-16 00:45:42 UTC restarted gerrit container. cpu pegged and jstack of the busy threads in the container showed all were gc related
  • 2020-10-15 15:26:31 UTC restarted apache on static.opendev.org (serving most static content and documentation sites) at 15:09 utc to recover from an unexplained hung process causing site content not to be served
  • 2020-10-15 09:16:25 UTC rebooted nb03.opendev.org via openstack API after it seemed to have gotten stuck due to disk IO issues
  • 2020-10-14 22:21:52 UTC moved x/pyeclib x/whitebox-tempest-plugin x/kayobe back under openstack/ in tarballs AFS (https://review.opendev.org/758259)
  • 2020-10-14 13:06:27 UTC abandoned open changes for retired project openstack/openstack-ansible-os_monasca-ui
  • 2020-10-14 12:10:32 UTC manually moved files from project/tarballs.opendev.org/x/compute-hyperv into .../openstack/compute-hyperv per 758096
  • 2020-10-13 19:01:13 UTC increased mirror.ubuntu-ports afs quota from 500000000 to 550000000 (93%->84% used)
  • 2020-10-13 19:00:43 UTC increased mirror.ubuntu afs quota from 550000000 to 650000000 (99%->84% used)
  • 2020-10-13 13:19:17 UTC restarted gerritbot on eavesdrop.o.o and germqtt on firehose.o.o following gerrit outage
  • 2020-10-13 12:21:53 UTC restarted gerrit container on review.opendev.org after it stopped responding to apache
  • 2020-10-12 09:17:33 UTC restarted gerritbot on eavesdrop once again
  • 2020-10-12 02:46:51 UTC cleared openafs cache on mirror01.bhs1.ovh.opendev.org and rebooted
  • 2020-10-12 02:46:23 UTC cleared full storage on nb01 and rebooted
  • 2020-10-10 20:34:59 UTC downed and upped gerritbot container on eavesdrop following irc timeout at 00:36:44 utc
  • 2020-10-09 18:26:10 UTC hard rebooted mirror01.bhs1.ovh to recover from high load average (apparently resulting from too many hung reads from afs)
  • 2020-10-09 18:17:12 UTC hard rebooted afs02.dfw.o.o to address a server hung condition
  • 2020-10-09 03:56:16 UTC rebooted logstash-worker02.openstack.org
  • 2020-10-08 15:29:54 UTC added jpena (packaging sig chair) as the initial rpm-packaging-release group member in gerrit
  • 2020-10-07 18:06:30 UTC restored vandalized victoria-ptg-manila etherpad to revision 5171 at the request of gouthamr
  • 2020-10-07 11:37:22 UTC killed a hung vos release from the afs02.dfw outage two days ago which was causing the release cronjob on mirror-update to never release its lockfile
  • 2020-10-06 08:33:48 UTC restarted the gerritbot docker container on eavesdrop because it seemed broken
  • 2020-10-05 20:07:22 UTC CORRECTION: hard rebooted afs02.dfw.o.o to address a server hung condition which seems to have started at roughly 18:50 utc
  • 2020-10-05 20:06:49 UTC hard rebooted afs03.dfw.o.o to address a server hung condition which seems to have started at roughly 18:50 utc
  • 2020-10-05 06:17:40 UTC rebooted translate01.openstack.org which had gone into error state
  • 2020-10-02 22:35:47 UTC Restarted gitea cluster in sequence to pick up new config that emits tracebacks to logs on errors
  • 2020-10-01 22:49:30 UTC review-test has been synced from prod. The sync playbook won't work as is due to db backups including extra databases. We just did it manually.
  • 2020-10-01 16:37:12 UTC restarted all zuul executors on latest docker image for https://review.opendev.org/755518 to fix the ensure-twine role failures
  • 2020-09-28 17:30:36 UTC redeployed zuul-web to pick up javascript updates for proper external link rendering on builds and buildsets
  • 2020-09-28 16:02:58 UTC hard rebooted afsdb02 via nova api following hung kernel tasks around 13:10 utc
  • 2020-09-24 21:10:43 UTC Upgraded gitea to version 1.12.4 from 1.12.3
  • 2020-09-24 15:49:39 UTC added yoctozepto to https://launchpad.net/~masakari-drivers at his request, as a de facto infra liaison for that project
  • 2020-09-24 11:50:20 UTC disabled gitea02 in the load balancer due to missing projects (see previous note about 04 and 05)
  • 2020-09-24 11:43:49 UTC restarted ethercalc service on ethercalc.openstack.org
  • 2020-09-24 11:40:05 UTC took gitea 04 and 05 out of the load-balancer because they are out of sync
  • 2020-09-23 04:46:45 UTC A failing log storage endpoint has been removed, you can recheck any recent jobs with POST_FAILURE where logs have failed to upload
  • 2020-09-23 04:37:14 UTC disabled openedge log upload due to a 503 error from API
  • 2020-09-21 19:51:22 UTC provider maintenance 2020-10-02 01:00-05:00 utc involving ~5-minute outages for databases used by health, review-dev, zuul
  • 2020-09-21 18:27:44 UTC Deleted nb04.opendev.org (9f51d27e-abe7-4e4c-9246-15cfa08718fb) and its cinder volume, nb04.opendev.org/main (df50b5c7-1908-4719-9116-06ae5154387a), as this server is no longer necessary (replaced with nb01 and nb02.opendev.org).
  • 2020-09-18 01:03:58 UTC cleared old docker repo from /afs/.openstack.org/mirror/deb-docker ; jobs should now be using repo-specific dirs
  • 2020-09-15 16:12:03 UTC Our PyPI caching proxies are serving stale package indexes for some packages, apparently because PyPI's CDN itself is serving stale indexes. We are sorting out how we can either fix or work around that. In the meantime, updating requirements is likely the wrong option.
  • 2020-09-15 12:34:38 UTC reinstalled the kernel, kernel headers, and openafs-client on mirror.kna1.airship-citycloud.opendev.org and rebooted it, as it seems to have possibly been previously rebooted after an incomplete package update
  • 2020-09-14 23:48:43 UTC rebooted elasticsearch06.openstack.org, which was hung
  • 2020-09-14 18:50:15 UTC cinder volume for wiki.o.o has been replaced and cleaned up
  • 2020-09-14 18:48:31 UTC deleted old 2017-01-04 snapshot of wiki.openstack.org/main01 in rax-dfw
  • 2020-09-14 18:44:07 UTC provider maintenance 2020-09-30 01:00-05:00 utc involving ~5-minute outages for databases used by cacti, refstack, translate, translate-dev, wiki, wiki-dev
  • 2020-09-14 14:01:21 UTC restarted houndd on codesearch.o.o following a json encoding panic at 10:03:40z http://paste.openstack.org/show/797837/
  • 2020-09-13 22:47:11 UTC started nb03.opendev.org which had gone into shutdown
  • 2020-09-13 22:40:36 UTC rebooted regionone.linaro-us mirror as it had gone into shutdown
  • 2020-09-11 15:34:36 UTC corrected quota on mirror.deb-octopus afs volume from 5mb to 50gb
  • 2020-09-11 14:59:55 UTC deleted errant zuul-0.0.0-py3-none-any.whl and zuul-0.0.0.tar.gz files and the corresponding 0.0.0 release from the zuul project on pypi
  • 2020-09-10 23:00:58 UTC Deleted nb03.openstack.org. It has been replaced by nb03.opendev.org.
  • 2020-09-10 17:58:14 UTC cinder volume for wiki-dev has been replaced and cleaned up
  • 2020-09-10 17:33:46 UTC Stopped nodepool-builder on nb03.openstack.org in preparation for its deletion
  • 2020-09-10 17:27:53 UTC cinder volumes for nb01 and nb02 have been replaced and cleaned up
  • 2020-09-10 16:41:07 UTC added new mirror.deb-octopus volume mounted at /afs/.openstack.org/mirror/ceph-deb-octopus with replicas and set acls consistent with other reprepro mirrors
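      The general OpenAFS pattern for such a volume, as a sketch (the fileserver, partition, and ACL principals shown are assumptions, not the real values):
        vos create afs01.dfw.openstack.org a mirror.deb-octopus -localauth
        fs mkmount /afs/.openstack.org/mirror/ceph-deb-octopus mirror.deb-octopus
        fs setacl /afs/.openstack.org/mirror/ceph-deb-octopus system:anyuser rl    # anonymous read, matching the other reprepro mirrors
        vos addsite afs01.dfw.openstack.org a mirror.deb-octopus -localauth        # register a read-only replica site
        vos release mirror.deb-octopus -localauth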
  • 2020-09-10 02:46:50 UTC removed old nb01/2.openstack.org dns entries
  • 2020-09-09 23:01:33 UTC cinder volume for graphite01 (current production server) has been replaced and cleaned up
  • 2020-09-09 21:47:27 UTC restarted zuul scheduler and web at commit 6a5b97572f73a70f72a0795d5e517cff96740887 to pick up held db attribute
  • 2020-09-09 19:29:56 UTC Reconfigured mirror01.ca-ymq-1.vexxhost.opendev.org to set its IPv6 networking statically with netplan rather than listen to router advertisements.
  • 2020-09-09 16:29:12 UTC rebooted graphite01 to resolve a page allocation failure during volume attach
  • 2020-09-09 15:00:24 UTC cinder volume for graphite02 (not yet in production) has been replaced and cleaned up
  • 2020-09-09 14:59:56 UTC cinder volumes for all six elasticsearch servers have been replaced and cleaned up
  • 2020-09-08 13:05:31 UTC elasticsearch04 rebooted after it was found to be hung
  • 2020-09-08 13:00:27 UTC elasticsearch03 was rebooted 2020-09-07 23:48z after it was found to be hung
  • 2020-09-07 23:49:16 UTC rebooted hung elasticsearch03.openstack.org
  • 2020-09-07 09:58:40 UTC deleted bogus 2001:db8:: addresses on mirror01.ca-ymq-1.vexxhost.opendev.org once more
  • 2020-09-05 20:29:26 UTC cinder volume for eavesdrop.o.o has been replaced and cleaned up
  • 2020-09-05 20:13:25 UTC cinder volume for cacti.o.o has been replaced and cleaned up
  • 2020-09-04 21:32:39 UTC cinder volume for mirror01.dfw.rax.o.o has been replaced and cleaned up
  • 2020-09-04 19:32:15 UTC cinder volume for review.o.o has been replaced, upgraded from 200gb sata to 256gb ssd, and cleaned up
  • 2020-09-04 19:13:01 UTC rebooted mirror01.dfw.rax to resolve a page allocation failure during volume attach
  • 2020-09-04 17:28:06 UTC cinder volume for etherpad01 has been replaced and cleaned up
  • 2020-09-04 15:26:19 UTC all four cinder volumes for afs02.dfw have been replaced and cleaned up
  • 2020-09-03 21:08:09 UTC all four cinder volumes for afs01.dfw have been replaced and cleaned up
  • 2020-09-02 17:37:41 UTC restarted ethercalc service following crash at 17:25:29
  • 2020-09-02 12:45:30 UTC zm07 was rebooted at 01:54 and again at 04:08 by the cloud provider because of unspecified hypervisor host issues
  • 2020-09-01 20:43:17 UTC restarted ethercalc service after a crash at 18:04:01z
  • 2020-09-01 18:45:36 UTC marked old gerrit account 10697 inactive at the request of thiagop
  • 2020-09-01 14:37:01 UTC deleted unused cinder volumes in rax-dfw control plane tenant: wiki-dev.openstack.org/main01 (55db9e89-9cb4-4202-af88-d8c4a174998e), review-dev.openstack.org/main01 (66fea64f-9220-4c53-8988-deb32477ada7), static.openstack.org/main01 (0c0c7fb5-146b-4ecf-baf6-4ac9eaa4f277)
  • 2020-08-31 16:06:53 UTC restarted gerritbot following a netsplit at 14:37:31 from which it never returned
  • 2020-08-31 14:12:00 UTC zm07 was rebooted by the provider at 04:00z due to unspecified hypervisor host issues
  • 2020-08-31 09:18:26 UTC due to a new release of setuptools (50.0.0), a lot of jobs are currently broken, please do not recheck blindly. see http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016905.html
  • 2020-08-28 22:15:13 UTC A zuul server ended up with read only filesystems which caused many jobs to hit retry_limit. The server has been rebooted and appears happy. Jobs can be rechecked.
  • 2020-08-28 22:09:30 UTC ze08 ended up with read only filesystems and has been rebooted. This resulted in many retry_limit errors.
  • 2020-08-28 17:58:14 UTC Restarted zuul-web and zuul-fingergw to pick up web updates, primarily the one that fixes keyboard scrolling.
  • 2020-08-25 01:41:18 UTC rebooted mirror01.us-east.openedge.opendev.org due to it corrupting its disk; seems ok after reboot
  • 2020-08-21 10:19:37 UTC restarted crashed ethercalc service, log info at http://paste.openstack.org/show/797024/
  • 2020-08-20 22:17:56 UTC hard rebooted afs02.dfw.openstack.org after it became entirely unresponsive (hung kernel tasks on console too)
  • 2020-08-20 18:52:15 UTC edited /etc/nodepool/nodepool.yaml on nb03 to pause all image builds for now, since it's in the emergency disable list
  • 2020-08-20 16:31:33 UTC all nodepool builders stopped in preparation for image rollback and pause config deployment
  • 2020-08-20 03:13:25 UTC reboot zm05.openstack.org that had hung
  • 2020-08-19 17:18:34 UTC ethercalc service restarted following exportCSV crash at 16:42:59 utc
  • 2020-08-18 18:28:30 UTC Updated elastic-recheck-core to include tripleo-ci-core as the tripleo-ci team intends to maintain the elastic-recheck tool
  • 2020-08-17 22:31:39 UTC restarted ethercalc service on ethercalc02.o.o following unexplained crash at 17:58:42 utc
  • 2020-08-16 18:39:13 UTC added cloudnull and mnaser as initial members of the ansible-collections-openstack-release group in openstack, as it is associated with the openstack/ansible-collections-openstack repo maintained by the openstack ansible sig
  • 2020-08-13 20:36:25 UTC manually restarted nodepool-launcher container on nl03 to pick up changed catalog entries in vexxhost ca-ymq-1 (aka mtl1)
  • 2020-08-12 23:15:11 UTC manually pulled, downed and upped gerritbot container on eavesdrop for recent config parsing fix
  • 2020-08-12 17:01:38 UTC added pierre riteau to cloudkitty-core group in gerrit per openstack tc and https://review.opendev.org/745653
  • 2020-08-11 23:20:19 UTC Moved gerritbot from review.openstack.org to eavesdrop.openstack.org. Cleanup on old server needs to be done and we need to have project-config run infra-prod-service-eavesdrop when the gerritbot config updates.
  • 2020-08-11 20:54:00 UTC The openstackgerrit IRC bot (gerritbot) will be offline for a short period while we redeploy it on a new server
  • 2020-08-08 02:56:51 UTC ze06 rebooted by provider 2020-08-08 02:59 utc in an offline migration to address a hypervisor host failure
  • 2020-08-08 02:27:24 UTC ze01 rebooted by provider 2020-08-07 21:38 utc in an offline migration to address a hypervisor host failure
  • 2020-08-07 20:03:21 UTC rebooted ze06 after it started complaining about i/o errors for /dev/xvde and eventually set the filesystem read-only, impacting job execution resulting in retry_limit results in some cases
  • 2020-08-06 00:47:50 UTC updated review.openstack.org cname to refer to review.opendev.org rather than review01.openstack.org
  • 2020-08-04 15:22:30 UTC manually updated /etc/gerritbot/channel_config.yaml on review01 with latest content from openstack/project-config:gerritbot/channels.yaml (for the first time since 2020-03-17) and restarted the gerritbot service
  • 2020-08-03 20:59:32 UTC Restarted elasticsearch on elasticsearch05 and elasticsearch07 as they had stopped. Rebooted logstash-worker01-20 as their logstash daemons had failed after the elasticsearch issues.
  • 2020-08-03 16:16:45 UTC deleted corrupt znode /nodepool/images/fedora-31/builds/0000011944 to unblock image cleanup threads
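      For reference, one way to remove such a znode is with the ZooKeeper CLI; the server address below is illustrative and the exact method used here is not recorded:
        zkCli.sh -server zk01.openstack.org:2181
        # then, inside the CLI shell:
        deleteall /nodepool/images/fedora-31/builds/0000011944   # `delete` also works if the znode has no children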
  • 2020-07-31 15:28:23 UTC seeded python-tempestconf-release gerrit group by including python-tempestconf-core
  • 2020-07-30 16:36:46 UTC deleted (driverfixes|stable)/(mitaka|newton|ocata|pike) branches from manila, manila-ui and python-manilaclient repos per http://lists.openstack.org/pipermail/openstack-discuss/2020-July/016188.html
  • 2020-07-27 19:43:37 UTC copied python-tripleoclient 13.4.0 wheel/sdist from pypi to the tarballs site after verifying with release key signatures (also uploaded) from build failure log
  • 2020-07-27 19:28:54 UTC reenqueued ref for os-net-config 12.4.0 into the tag pipeline to rerun failed release notes job
  • 2020-07-27 19:25:06 UTC reenqueued refs for tripleo-image-elements 12.1.0 and python-tripleoclient 13.5.0 into the tag pipeline to rerun failed release notes jobs
  • 2020-07-27 19:20:12 UTC copied tripleo-image-elements 12.1.0 and os-apply-config 11.3.0 wheels/sdists from pypi to the tarballs site after verifying with release key signatures (also uploaded) from build failure logs
  • 2020-07-27 18:03:52 UTC took the zuul-executor container down and back up on ze11 in order to make sure /afs is available
  • 2020-07-27 17:40:08 UTC Restarted logstash geard, logstash workers, and logstash now that neutron logs have been trimmed in size.
  • 2020-07-27 15:47:25 UTC took the zuul-executor container down and back up on ze10 in order to try to narrow down possible causes for afs permission errors
  • 2020-07-27 15:23:44 UTC rebooted ze11 in hopes of eliminating random afs directory creation permission errors
  • 2020-07-25 00:14:30 UTC rebooted zm01 which had been hung since 2020-07-22 14:10z
  • 2020-07-24 17:47:40 UTC Moved files out of gerrit production fs and onto ephemeral drive. Assuming this causes no immediate problems those files can be removed in the near future. This has freed up space for gerrit production efforts.
  • 2020-07-24 16:22:52 UTC copied nova 19.3.0 wheel/sdist from pypi to the tarballs site after verifying with release key signatures (also uploaded) from build failure log
  • 2020-07-24 15:24:00 UTC We are renaming projects in Gerrit and review.opendev.org will experience a short outage. Thank you for your patience.
  • 2020-07-23 18:38:11 UTC copied oslo.messaging 12.2.2 and designate 8.0.1 wheels/sdists from pypi to the tarballs site after verifying with release key signatures (also uploaded) from build failure logs
  • 2020-07-23 01:24:07 UTC removed the r/w requirements for "contents" on the github zuul app (https://github.com/apps/opendev-zuul)
  • 2020-07-22 21:25:09 UTC hard rebooted ask.o.o after it was hung for nearly two hours
  • 2020-07-21 19:47:22 UTC restarted all of zuul at 9a9b690dc22c6e6fec43bf22dbbffa67b6d92c0a to fix github events and shell task vulnerability
  • 2020-07-21 19:21:02 UTC performed hard reboot of zm03,4,7 via api after they became unresponsive earlier today
  • 2020-07-21 07:32:08 UTC rebooted eavesdrop01.openstack.org as it was not responding to network or console
  • 2020-07-17 22:54:09 UTC restarted all of zuul using tls zookeeper and executors in containers
  • 2020-07-15 15:01:13 UTC deleted configuration for openstack-infra mailing list for the lists.openstack.org site, leaving archives intact
  • 2020-07-13 22:24:06 UTC rotated backup volume to main-202007/backups-202007 logical volume on backup01.ord.rax.ci.openstack.org
  • 2020-07-13 18:16:55 UTC old volume and volume group for main/backups unmounted, deactivated and deleted on backup01.ord.rax.ci.openstack.org
  • 2020-07-13 17:05:05 UTC Restarted zuul-executor container on ze01 now that we vendor gear in the logstash job submission role.
  • 2020-07-13 15:14:50 UTC zm05 rebooted by provider at 12:02 utc due to hypervisor host problem, provider trouble ticket 200713-ord-0000367
  • 2020-07-10 16:40:47 UTC bypassed zuul and merged 740324,2 to openstack/tripleo-ci at weshay's request
  • 2020-07-09 22:15:29 UTC The connection flood from AS4837 (China Unicom) has lessened in recent days, so we have removed its temporary access restriction for the Git service at opendev.org as of 18:24 UTC today.
  • 2020-07-08 16:14:11 UTC removed zuul01:/root/.bup and ran 'bup init' to clear the client-side bup cache which was very large (25G)
  • 2020-07-08 15:48:52 UTC commented out old githubclosepull cronjob for github user on review.o.o
  • 2020-07-06 00:22:50 UTC rebooted unresponsive elasticsearch05.openstack.org
  • 2020-07-02 20:15:04 UTC Upgraded etherpad-lite to 1.8.4 on etherpad.opendev.org
  • 2020-07-01 15:23:14 UTC removed stray /srv/meetbot-openstack/meetings/cinder/2020/cinder.2020-07-01-14.15.* on eavesdrop.o.o at the request of smcginnis
  • 2020-06-30 20:14:51 UTC triggered full gerrit re-replication after gitea restarts with `replication start --all` via ssh command-line api
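      The Gerrit replication plugin exposes this through the SSH command-line API; roughly (the account name is illustrative, 29418 is the usual Gerrit SSH port):
        ssh -p 29418 <admin-account>@review.opendev.org replication start --all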
  • 2020-06-30 18:24:33 UTC Due to a flood of connections from random prefixes, we have temporarily blocked all AS4837 (China Unicom) source addresses from access to the Git service at opendev.org while we investigate further options.
  • 2020-06-30 10:45:57 UTC stopped zuul-executor on ze01 with "docker-compose down" to allow for debugging ModuleNotFoundError
  • 2020-06-29 17:26:44 UTC Restarted zuul-executor on ze01 with new docker image that includes openafs-client
  • 2020-06-29 13:06:44 UTC Stopped zuul-executor on ze01 as the container image it is running lacks `unlog`
  • 2020-06-25 17:26:28 UTC ze01 is running via docker now, ze* is still in emergency so we can watch ze01
  • 2020-06-25 14:54:12 UTC nl02.openstack.org rebooted by provider at 14:46z due to a hypervisor host outage
  • 2020-06-25 14:53:37 UTC logstash-worker02.openstack.org rebooted by provider at 14:46z due to a hypervisor host outage
  • 2020-06-24 13:54:28 UTC restarted nodepool to pick up latest openstacksdk
  • 2020-06-22 23:09:22 UTC reset membership of https://launchpad.net/~trove-coresec to only include the current trove ptl
  • 2020-06-22 22:57:47 UTC closed down bug reporting for https://launchpad.net/trove and updated the project description to link to storyboard
  • 2020-06-22 17:08:02 UTC Upgraded gitea farm to gitea 1.12.0 with minor local edits
  • 2020-06-19 22:05:11 UTC put gitea01 in the emergency file and manually upgraded gitea to 1.12.0 there. This allows us to check the performance updates and memory use after the addition of caching.
  • 2020-06-19 03:26:06 UTC rebooted elasticsearch07.openstack.org and logstash-worker09.openstack.org, both had become unresponsive
  • 2020-06-18 17:24:38 UTC Set gerrit account 10874 to inactive at the request of the user. This was a follow-on to a similar request in April where we only managed to disable one of the two requested accounts.
  • 2020-06-18 16:57:05 UTC Stopped nodepool builders and manually cleared some disk space so that ansible can run successfully. Then reran service-nodepool.yaml against nodepool. This has begun clearing our old -plain images.
  • 2020-06-18 13:46:36 UTC set gerrit account for "Pratik Raj" back to active after contact was established with members of the community
  • 2020-06-18 12:18:32 UTC temporarily marked gerrit account for "Pratik Raj" inactive to stem an uncoordinated bulk change flood
  • 2020-06-17 21:50:49 UTC Cleaned up old unnecessary records in openstack.org DNS zone.
  • 2020-06-16 21:03:11 UTC Rebooted logstash-worker02 and 13 as ansible base.yaml complained it could not reach them
  • 2020-06-16 18:09:05 UTC Zuul is back online; changes uploaded or approved between 16:40 and 18:00 will need to be rechecked.
  • 2020-06-16 16:40:38 UTC Zuul is being restarted for an urgent configuration change and may be offline for 15-30 minutes. Patches uploaded or approved during that time will need to be rechecked.
  • 2020-06-16 15:54:40 UTC retrieved the following release artifacts from pypi, signed with the victoria cycle key and copied into afs volume for tarballs site: cliff 3.3.0, octavia 6.0.1, python-troveclient 4.1.0, zaqar 9.0.1
  • 2020-06-16 02:53:05 UTC rebooted mirror01.ca-ymq-1.vexxhost.opendev.org for afs connection issues
  • 2020-06-16 00:53:35 UTC ze04 rebooted to clear inconsistent afs rw volume access following saturday's outage
  • 2020-06-16 00:48:52 UTC deleted user-committee ml from openstack mailman site on lists.o.o
  • 2020-06-15 23:03:31 UTC disabled ansible on bridge due to 5+ hour backlog with potentially breaking change at end
  • 2020-06-15 22:05:51 UTC started mirror-update01.openstack.org
  • 2020-06-15 21:45:33 UTC re-keyed gearman tls certs (they expired)
  • 2020-06-15 14:28:33 UTC rebooted mirror01.ord.rax.opendev.org to clear hung openafs client state
  • 2020-06-15 08:15:40 UTC force-merged https://review.opendev.org/735517 and https://review.opendev.org/577955 to unblock devstack and all its consumers after a new uwsgi release
  • 2020-06-15 07:44:42 UTC uWSGI made a new release that breaks devstack, please refrain from rechecking until a devstack fix is merged.
  • 2020-06-15 06:56:04 UTC rebooted mirror.mtl01.inap.opendev.org due to unresponsive apache processes
  • 2020-06-14 23:12:55 UTC rebooted graphite.openstack.org as it was unresponsive
  • 2020-06-14 13:42:54 UTC Package mirrors should be back in working order; any jobs which logged package retrieval failures between 19:35 UTC yesterday and 13:20 UTC today can be safely rechecked
  • 2020-06-14 13:20:22 UTC performed hard reboot of afs01.dfw.openstack.org
  • 2020-06-14 13:14:24 UTC temporarily powered off mirror-update.opendev.org and mirror-update.openstack.org while working on afs01.dfw.openstack.org recovery process
  • 2020-06-14 08:38:20 UTC The opendev-specific CentOS and openSUSE mirrors disappeared and thus CentOS and openSUSE jobs are all broken.
  • 2020-06-13 13:04:28 UTC afs volume replica sync and content catch up for mirror.ubuntu completed
  • 2020-06-12 23:46:12 UTC project rename maintenance concluded: http://eavesdrop.openstack.org/meetings/opendev_maint/2020/opendev_maint.2020-06-12-20.27.html
  • 2020-06-12 22:50:40 UTC The Gerrit service on review.opendev.org is available again
  • 2020-06-12 22:00:52 UTC gerrit is being taken offline for emergency cleanup, will return to service again shortly
  • 2020-06-12 20:58:38 UTC The Gerrit service on review.opendev.org is going offline momentarily at 21:00 UTC for project rename maintenance, but should return within a few minutes: http://lists.opendev.org/pipermail/service-announce/2020-June/000004.html
  • 2020-06-12 17:54:36 UTC deleted openstack/octavia branch stable/pike (previously 2976a7f0f109e17930db8a61136526ead44ea7e5) as requested by johnsom and smcginnis
  • 2020-06-12 17:53:58 UTC deleted openstack/octavia branch stable/ocata (previously c2fdffc3b748f8007c72e52df257e38756923b40) as requested by johnsom and smcginnis
  • 2020-06-12 17:52:13 UTC deleted openstack/python-octaviaclient branch stable/pike (previously d4a5507c99430a7efb4a0ab83a47ca48937b23cf) as requested by johnsom and smcginnis
  • 2020-06-12 13:44:57 UTC full release of mirror.ubuntu afs volume is underway in a root screen session on afs01.dfw.openstack.org, mirror update flock is held in a root screen session on mirror-update.openstack.org
  • 2020-06-11 17:49:51 UTC cleaned up dangling fileserver entry with `vos changeaddr 127.0.1.1 -remove -localauth` as suggested by auristor
  • 2020-06-09 20:37:05 UTC started nb03.openstack.org that had shutoff
  • 2020-06-09 17:35:41 UTC updated intermediate cert bundle for openstackid.org and openstackid-dev.o.o to remove the expired addtrust ca cert, as its presence causes problems for requests on older python versions
  • 2020-06-09 14:26:51 UTC used pip to upgrade the versions of requests, requests-cache and certifi on refstack.openstack.org to get newer cert bundle not impacted by addtrust ca expiration
  • 2020-06-09 13:56:13 UTC used `dpkg-reconfigure ca-certificates` to manually deselect the expired addtrust roots on refstack.openstack.org, as a workaround for authentication failures (this is an ubuntu-trusty-specific problem)
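      A non-interactive equivalent of that workaround, assuming the stock Debian/Ubuntu ca-certificates layout (the certificate entry name may differ on other releases):
        # deselect the expired AddTrust root by prefixing its entry with '!', then rebuild the bundle
        sed -i 's|^mozilla/AddTrust_External_Root.crt|!mozilla/AddTrust_External_Root.crt|' /etc/ca-certificates.conf
        update-ca-certificates --fresh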
  • 2020-06-08 17:57:44 UTC deleted down instances jvb02.opendev.org, jvb03.opendev.org, jvb04.opendev.org after removal from inventory and dns
  • 2020-06-05 19:06:14 UTC powered off jvb02.opendev.org, jvb03.opendev.org and jvb04.opendev.org in preparation for their removal
  • 2020-06-03 22:53:15 UTC manually reloaded apache on graphite.o.o to pick up renewed le cert, https://review.opendev.org/733247 fixes longer term
  • 2020-06-01 13:35:20 UTC renamed etherpad SDK-VictoriaPTG-Planning to sdk-victoriaptg-planning and API-SIG-VictoriaPTG-Planning to api-sig-victoriaptg-planning per gtema's request at 08:07 utc today in #opendev
  • 2020-05-29 15:42:59 UTC manually ran `ansible-playbook playbooks/service-eavesdrop.yaml` as root from /home/zuul/src/opendev.org/opendev/system-config on bridge.o.o to deploy a ptgbot fix faster than the daily periodic pipeline run would have
  • 2020-05-26 20:30:45 UTC ze12 was rebooted around 18:50z due to a hypervisor host problem in the provider, ticket 200526-ord-0001037
  • 2020-05-22 18:05:05 UTC manually deleted empty znode /nodepool/images/centos-7/builds/0000124190
  • 2020-05-20 21:36:33 UTC added gearman certs to private hostvars for ease of management, and moved gearman client certs and keys to the zuul group (in private hostvars)
  • 2020-05-20 08:00:46 UTC rebooted mirror.kna1.airship-citycloud.opendev.org; it was refusing a few connections and had some old hung processes lying around
  • 2020-05-15 17:31:02 UTC Restarted All of Zuul on version: 3.18.1.dev166 319bbacf. This has scheduler, web, and mergers running under python3.7. We have also incorporated bug fixes for config loading (handle errors across tenants and don't load from tags) as well as improvements to the merger around resetting repos and setting HEAD.
  • 2020-05-15 09:14:36 UTC vos release for mirror.ubuntu completed successfully, dropped the lock on mirror-update to resume normal operations
  • 2020-05-13 23:41:49 UTC removed nb01/02.openstack.org servers and volumes
  • 2020-05-13 22:11:53 UTC Restarted zuul-web on zuul.opendev.org in order to switch back to the python3.7 based images. This will act as a canary for memory use.
  • 2020-05-13 20:58:25 UTC deleted obsolete servers mirror01.bhs1.ovh.openstack.org and mirror01.gra1.ovh.openstack.org
  • 2020-05-13 15:23:05 UTC Restarted apache2 on mirror.ord.rax.opendev.org. It had an apache worker running since April 14 that was presumed to be the cause of the problems talking to dockerhub, after independently verifying all round-robin backends using s_client.
  • 2020-05-11 21:32:52 UTC reenqueued 726907,3 for opendev/system-config into deploy pipeline after old .git dir was moved out of the way
  • 2020-05-11 21:25:36 UTC moved contents of /etc/ansible/roles on bridge to /etc/ansible/roles/old-2020-05-11 to allow ansible to recreate as git repos
  • 2020-05-11 21:03:39 UTC reenqueued 726848,1 for openstack/project-config into deploy pipeline after fix 726907 merged
  • 2020-05-11 16:14:14 UTC Restarted ptgbot on eavesdrop.openstack.org as it had netsplit into some alternate reality
  • 2020-05-11 15:22:12 UTC deleted unused mirror01.gra1.ovh.opendev.org server instance and associated main01 and tmpbuild cinder volumes
  • 2020-05-11 15:10:44 UTC all ovh mirror servers placed in emergency disable list in preparation for replacement
  • 2020-05-11 15:09:56 UTC Our CI mirrors in OVH BHS1 and GRA1 regions were offline between 12:55 and 14:35 UTC, any failures there due to unreachable mirrors can safely be rechecked
  • 2020-05-11 14:46:25 UTC deleted calebb-mirror-update-test server instance in ovh bhs1 region
  • 2020-05-10 13:29:38 UTC hard restarted apache2 on ethercalc.o.o to clear stale workers serving expired ssl cert, had to forcibly kill some workers which would not stop cleanly
  • 2020-05-08 20:21:25 UTC Restarted zuul-scheduler container on zuul01 to pick up the jemalloc removal in the containers which seems to address python memory leaks.
  • 2020-05-08 20:20:49 UTC Restarted gerrit container on review.opendev.org to pick up new replication config (no github, replication for github runs through zuul jobs now)
  • 2020-05-08 16:26:20 UTC terminated dstat process on lists.o.o after 10 days with no oom
  • 2020-05-08 12:55:49 UTC deleted stable/ussuri branch (7bbf84d3a1dee35eb231f4a459aa6f2cc6e7c811) from openstack/puppet-openstack_spec_helper at the request of tobias-urdin and smcginnis
  • 2020-05-07 16:03:24 UTC deleted openstack/rally branch stable/0.9 (304c76a939b013cbc4b5d0cbbaadecb6c3e88289) per https://review.opendev.org/721687
  • 2020-05-07 02:55:35 UTC nb01/02.openstack.org shutdown and in emergency file; nb01/02.opendev.org are replacements
  • 2020-05-06 22:34:16 UTC deleted openstack/rally branches stable/0.10 (67759651f129704242d346a2c045413fcdea912d) and stable/0.12 (99f13ca7972d7f64b84204c49f1ab91da6d6cb6b) per https://review.opendev.org/721687 (stable/0.9 left in place for now as it still has open changes)
  • 2020-05-06 21:39:38 UTC finished rolling out nodepool ansible change - all launchers now running in docker
  • 2020-05-06 13:22:28 UTC unlocked mirror.yum-puppetlabs afs volume and manually performed vos release to clear stale lock from 2020-04-28 afs01.dfw outage
  • 2020-05-06 02:55:31 UTC unlocked mirror.opensuse afs volume and manually performed vos release to clear stale lock from 2020-04-28 afs01.dfw outage
  • 2020-05-05 18:18:25 UTC unlocked mirror.fedora afs volume and manually performed vos release to clear stale lock from 2020-04-28 afs01.dfw outage
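      The manual recovery described in the entries above amounts to roughly the following, run with admin/localauth credentials on an AFS server (volume name per entry):
        vos unlock mirror.fedora -localauth    # clear the stale VLDB lock left by the outage
        vos release mirror.fedora -localauth   # re-sync the read-only replicas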
  • 2020-05-04 21:33:20 UTC restarted zuul-web without LD_PRELOAD var set for jemalloc.
  • 2020-05-04 19:32:08 UTC deployments unpaused by removing /home/zuul/DISABLE-ANSIBLE on bridge.o.o
  • 2020-05-04 18:35:58 UTC manually pulled updated gerrit image on review.o.o for recent jeepyb fix
  • 2020-05-04 18:24:17 UTC temporarily paused ansible deploys from zuul by touching /home/zuul/DISABLE-ANSIBLE on bridge.o.o
  • 2020-05-04 17:00:47 UTC Restarted zuul-web and zuul-fingergw on new container images on zuul01. We were running out of memory due to leak in zuul-web which may be caused by python3.7 and new images provide python3.8
  • 2020-05-04 14:45:20 UTC deleted centos-7-0000124082 image to force rebuild with newer virtualenv
  • 2020-05-04 14:45:08 UTC unlocked centos mirror openafs volume and manually started release
  • 2020-05-04 11:21:49 UTC mirror.us-east.openedge.opendev.org was down, donnyd restarted the node and openedge should be fine again
  • 2020-04-30 21:34:17 UTC restarted gerritbot after it failed to rejoin channels following a freenode netsplit
  • 2020-04-30 18:46:38 UTC restarted zuul-web with image built from equivalent of zuul commit d17b8e7f7f5b68c93c6d2bdba1b94df87f8ee93d (cherrypy pin 18.3.0)
  • 2020-04-29 15:49:37 UTC moved /var/cache/apache2 for mirror01.regionone.linaro-us.opendev.org onto separate 100gb cinder volume to free some of its rootfs
  • 2020-04-29 14:36:43 UTC Zuul had to be restarted; all changes submitted or approved between 14:00 and 14:30 UTC need to be rechecked. Changes already running at 14:00 were re-enqueued.
  • 2020-04-28 20:18:09 UTC follow-up reboot of static01 after a lot of hung i/o processes due to afs01 issues
  • 2020-04-28 20:14:42 UTC rebooted afs01.dfw.openstack.org due to hypervisor host issues killing the server
  • 2020-04-28 12:29:59 UTC Zuul has been restarted, all events are lost, recheck or re-approve any changes submitted since 9:50 UTC.
  • 2020-04-28 09:17:39 UTC Zuul is currently failing all testing, please refrain from approving, rechecking or submitting of new changes until this is solved.
  • 2020-04-28 09:06:54 UTC Zuul is currently failing testing, please refrain from recheck and submitting of new changes until this is solved.
  • 2020-04-27 15:51:19 UTC Updated lists.openstack.org to use Apache mpm_event instead of mpm_worker. mpm_worker was a holdover from doing in-place upgrades of this server. All other Xenial hosts default to mpm_event.
  • 2020-04-27 15:23:40 UTC running `dstat -tcmndrylpg --tcp --top-cpu-adv --top-mem-adv --swap --output dstat-csv.log` in a root screen session on lists.o.o
  • 2020-04-27 15:23:21 UTC lists.openstack.org rebooted for kernel update
  • 2020-04-26 23:33:23 UTC added _acme-challenge.zuul.openstack.org CNAME to acme.opendev.org
  • 2020-04-26 08:48:02 UTC Zuul is happy testing changes again, changes with MERGER_FAILURE can be rechecked.
  • 2020-04-25 21:09:13 UTC restarted all zuul executors and mergers on 9b300bc
  • 2020-04-25 14:19:34 UTC Zuul is currently failing some jobs with MERGER_FAILURE, this needs investigation by OpenDev team. Please refrain from rechecking until we give the all-clear.
  • 2020-04-24 22:59:16 UTC This Zuul outage was taken as an opportunity to perform impromptu maintenance for changing our service deployment model; any merge failures received from Zuul between 19:40 and 20:20 UTC were likely in error and those changes should be rechecked; any patches uploaded between 20:55 and 22:45 UTC were missed entirely by Zuul and should also be rechecked to get fresh test results
  • 2020-04-24 20:22:12 UTC The Zuul project gating service is reporting new patches in merge conflict erroneously due to a configuration error, fix in progress
  • 2020-04-24 15:30:10 UTC uploaded focal-minimal image to rax-dfw for opendev control plane use
  • 2020-04-23 21:10:09 UTC running `dstat -tcmndrylpg --tcp --output dstat-csv.log` in a root screen session on lists.o.o to diagnose recurring oom issue
  • 2020-04-23 14:13:33 UTC restarted all mailman sites on lists.openstack.org following oom events around 12:35-12:45 utc
  • 2020-04-22 21:57:05 UTC replaced content of starlingx/kernel f/centos8 branch with f/centos8 branch from github.com/dpanech/starlingx-kernel per sgw
  • 2020-04-22 20:50:19 UTC Removed Trusty from our Ubuntu mirrors and added Focal. Updates have been vos released and should be in production.
  • 2020-04-22 09:27:37 UTC restarted apache2 on ethercalc.openstack.org which seems to have gotten stuck during today's log rotation
  • 2020-04-21 17:54:52 UTC deleted old etherpad.openstack.org and etherpad-dev.openstack.org servers
  • 2020-04-20 15:48:59 UTC Gerrit will be restarted to correct a misconfiguration which caused some git mirrors to have outdated references.
  • 2020-04-20 14:00:47 UTC Zuul is temporarily offline; service should be restored in about 15 minutes.
  • 2020-04-17 15:35:32 UTC submitted and confirmed spamhaus PBL removal request for 104.130.246.32 (review01.openstack.org)
  • 2020-04-16 23:46:27 UTC upgraded zk ensemble to 3.5.7 running in containers
  • 2020-04-16 21:53:28 UTC marked gerrit account 9079 inactive at request of sreejithp
  • 2020-04-16 19:18:54 UTC restarted all mailman sites on lists.openstack.org following oom events around 12:45-12:50z
  • 2020-04-16 05:59:30 UTC restarted all nodepool builders to pickup https://review.opendev.org/#/c/713157/
  • 2020-04-14 09:59:01 UTC restarted statusbot and meetbot which both seem to have stopped working around 2020-04-13 21:40Z
  • 2020-04-13 15:46:16 UTC set unused gerrit account 31007 inactive for avass
  • 2020-04-13 14:37:19 UTC deleted stable/stein branch formerly at 0eb5127abc8e9275c8b6f1b2c5ac735b936cc001 from openstack/tripleo-ansible per EmilienM
  • 2020-04-11 14:35:20 UTC Restarting gerrit to fix an issue from yesterday's maintenance
  • 2020-04-10 22:26:57 UTC Maintenance on etherpad.opendev.org is complete and the service is available again
  • 2020-04-10 20:10:59 UTC Due to a database migration error, etherpad.opendev.org is offline until further notice.
  • 2020-04-10 19:50:28 UTC deleted openstack/cinder branch driverfixes/newton formerly at b9f6cd23ed7c806487d5df065b19741aecd36438
  • 2020-04-10 19:47:57 UTC deleted openstack/cinder branch driverfixes/mitaka formerly at a77f17e3778377a0d7aee9bf412554551a6b8435
  • 2020-04-10 17:54:45 UTC The etherpad migration is still in progress; revised estimated time of completion 18:30 UTC
  • 2020-04-10 17:04:07 UTC etherpad.openstack.org will be offline for about 30 minutes while it is migrated to a new server with a new hostname; see http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html
  • 2020-04-10 16:07:45 UTC review.opendev.org is being restarted for scheduled maintenance; see http://lists.opendev.org/pipermail/service-announce/2020-April/000003.html
  • 2020-04-09 20:51:31 UTC restarted all mailman sites on lists.openstack.org following oom events around 09:00z
  • 2020-04-09 20:28:55 UTC All ansible has been migrated to Zuul jobs, nothing is running via run_all.sh any more
  • 2020-04-08 15:51:24 UTC used the restoreRevision api call to reset the content of the magnum-weekly-meeting pad to old revision 9012 on etherpad.o.o
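      Etherpad exposes restoreRevision via its HTTP API; a sketch of such a call (the API version, key path and exact URL are illustrative):
        curl "https://etherpad.openstack.org/api/1.2.11/restoreRevision?apikey=$(cat /opt/etherpad-lite/APIKEY.txt)&padID=magnum-weekly-meeting&rev=9012"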
  • 2020-04-07 22:07:34 UTC lowered etherpad.openstack.org dns ttl to 300 seconds
  • 2020-04-07 14:18:00 UTC removed zp01 from emergency file since open-proxy issue is resolved in container image
  • 2020-04-07 00:51:04 UTC rebooted storyboard.openstack.org following a slew of out-of-memory conditions which killed various processes
  • 2020-04-06 22:11:10 UTC restarted all of logstash geard, workers and logstash itself to reset after nova fixed its n-api log files
  • 2020-04-06 11:31:41 UTC restarted ze05 which seems to have been dead since 2020-04-02 02:05Z
  • 2020-04-04 15:56:17 UTC launched etherpad01.opendev.org server
  • 2020-04-04 15:21:14 UTC added an acme cname for etherpad01.opendev.org
  • 2020-04-03 20:29:44 UTC deleted ubuntu-bionic-0000104264 image due to missing gpg-related packages
  • 2020-04-03 13:30:26 UTC restarted apache on mirror02.mtl01.inap since certcheck spotted a stale apache worker
  • 2020-04-02 21:42:32 UTC ze07 rebooted at 21:13 by the provider citing unspecified hypervisor host issues; ticket 200402-ord-0001040
  • 2020-04-01 00:04:31 UTC restarted queue managers for all 5 mailman sites on lists.o.o following a spate of oom conditions
  • 2020-03-24 00:43:42 UTC removed old files02 and static.openstack.org servers
  • 2020-03-23 20:23:29 UTC manually deleted old /afs/.openstack.org/project/tarballs.opendev.org/openstack/openstackid which was previously copied and switched to .../osf/openstackid/
  • 2020-03-22 17:30:12 UTC removed /afs/.openstack.org/project/tarballs.opendev.org/openstack/ironic-python-agent-builder/dib/{files/ipa-centos7-master.*,ipa-centos7-master.tar.gz*} ~2020-03-21 22:30z at TheJulia's request
  • 2020-03-21 12:12:44 UTC ze01 was rebooted at 04:08z and 06:26z in the process of a server migration off a problem hypervisor host
  • 2020-03-20 21:44:46 UTC removed openstackid01.openstack.org from ansible emergency disable list to apply https://review.opendev.org/714215
  • 2020-03-20 17:13:27 UTC restored revision 640 for Virtual_PTG_Planning etherpad following defacement
  • 2020-03-20 16:59:05 UTC corrected "chat.frenode.net" misspelling in /etc/hosts on eavesdrop.o.o
  • 2020-03-20 15:07:34 UTC Gerrit maintenance is concluded at this time and requested renames have been performed
  • 2020-03-20 14:10:31 UTC The Gerrit service on review.opendev.org is offline for maintenance until 15:00 UTC http://lists.opendev.org/pipermail/service-announce/2020-March/000001.html
  • 2020-03-20 13:44:58 UTC Gerrit (review.opendev.org) will be down for maintenance starting at 14:00 (in less than 20 mins), probably until 15:00 UTC
  • 2020-03-18 21:16:13 UTC nb01.opendev.org shutdown ... nb04 will be the replacement
  • 2020-03-18 21:16:04 UTC updated hostname override for statusbot on eavesdrop.openstack.org
  • 2020-03-11 15:26:42 UTC restarted all mailman queue runners on lists.o.o at 14:12z following an oom killer incident
  • 2020-03-11 10:28:00 UTC The mail server for lists.openstack.org is currently not handling emails. The infra team will investigate and fix during the US morning.
  • 2020-03-10 20:19:43 UTC Fedora 29 images are retired
  • 2020-03-05 00:34:51 UTC removed logs.openstack.org and logs-dev.openstack.org CNAMEs as there is nothing to serve any more (https://storyboard.openstack.org/#!/story/2006598 task #37735)
  • 2020-03-05 00:27:32 UTC removed project.gittest volume (https://storyboard.openstack.org/#!/story/2006598 task #38841)
  • 2020-03-05 00:16:13 UTC files02.openstack.org & static-old.openstack.org hosts in emergency file and shut down for retirement; old system-config configuration to be removed next week
  • 2020-03-04 11:56:42 UTC restarted statusbot, seemed to have stopped doing things at 2020-03-03 10:56:47
  • 2020-02-28 23:21:23 UTC Removed qa.openstack.org apache config from static.opendev.org. DNS still needs cleanup if we are comfortable with that.
  • 2020-02-28 20:40:53 UTC added missing reverse dns entries for mirror01.ord.rax.opendev.org
  • 2020-02-28 17:45:02 UTC restarted zuul executors on commit a25bab2f3c3b190fce5b872790dd60f955e5d29c
  • 2020-02-28 05:20:10 UTC domains from https://review.opendev.org/#/c/710160/ switched to CNAMEs for static.opendev.org
  • 2020-02-28 02:00:06 UTC switched files.openstack.org CNAME to static.opendev.org
  • 2020-02-28 00:36:17 UTC Restarted nodepool launcers on nodepool==3.11.1.dev35 # git sha aba9b4e to pick up proper fix for missing node issue that was previously worked around with a restart.
  • 2020-02-27 22:35:29 UTC Restarted nodepool launcher on nl01-nl04 to clear out state related to a deleted znode. Launchers now running nodepool==3.11.1.dev34 # git sha 5d37a0a
  • 2020-02-27 19:46:41 UTC The scheduler for zuul.opendev.org has been restarted; any changes which were in queues at the time of the restart have been reenqueued automatically, but any changes whose jobs failed with a RETRY_LIMIT, POST_FAILURE or NODE_FAILURE build result in the past 14 hours should be manually rechecked for fresh results
  • 2020-02-27 19:13:09 UTC Memory pressure on zuul.opendev.org is causing connection timeouts resulting in POST_FAILURE and RETRY_LIMIT results for some jobs since around 06:00 UTC today; we will be restarting the scheduler shortly to relieve the problem, and will follow up with another notice once running changes are reenqueued.
  • 2020-02-26 21:24:19 UTC git.openstack.org git.starlingx.io and git.airshipit.org updated to be CNAME to static.opendev.org (https://review.opendev.org/#/c/709403/)
  • 2020-02-26 00:28:11 UTC deleted unused branch feature/openid of project osf/openstackid previously at 7350bfc8f40a5735984271d2c13123df8c0872a0 with smarcet's approval
  • 2020-02-25 23:40:53 UTC opendev.org gitea has been upgraded to 1.11.1
  • 2020-02-25 15:52:43 UTC removed openstackid01.openstack.org from emergency disable list now that 709719 has merged
  • 2020-02-25 15:35:52 UTC deleted old review-dev01.openstack.org
  • 2020-02-25 15:13:35 UTC triggered a zuul-scheduler full-reconfigure on zuul01 to troubleshoot lack of job matches on openstack/openstack-ansible-rabbitmq_server changes
  • 2020-02-24 22:10:27 UTC On all Zuul hosts: uninstalled smmap, smmap2, gitdb, gitdb2, GitPython then ran pip3 install -U -r /opt/zuul/requirements.txt to reinstall Zuul's deps against current happy pypi state as package movement over the weekend left us in a sad state with smmap, gitdb, and GitPython.
  • 2020-02-24 21:43:18 UTC copied content from /afs/.openstack.org/project/tarballs.opendev.org/openstack/openstackid to ../../osf/openstackid to reflect new publication path
  • 2020-02-24 12:35:40 UTC force-merged requirements fix https://review.opendev.org/709389 to unblock gate
  • 2020-02-24 05:57:11 UTC switched developer.openstack.org docs.openstack.org and docs.starlingx.io CNAME to static.opendev.org (see https://review.opendev.org/709024)
  • 2020-02-23 12:18:53 UTC The failing Zuul process has been restarted, feel free to recheck.
  • 2020-02-23 09:58:31 UTC restarted zuul-executor on ze04 due to process pool failure
  • 2020-02-23 08:32:59 UTC Zuul is unhappy, lots of jobs are failing with RETRY_LIMIT. Please wait until the issue is found and fixed with rechecks and approvals.
  • 2020-02-22 17:47:09 UTC restarted zuul-executor on ze02 due to process pool failure
  • 2020-02-22 17:46:58 UTC restarted gerrit to correct cached git object error with openstack/openstack-ansible-rabbitmq_server (repo on disk appears normal)
  • 2020-02-22 17:46:51 UTC restarted statusbot because it disappeared
  • 2020-02-21 14:41:12 UTC restarted the mailman-openstack runners on lists.openstack.org following a 15:17:50 oom killer event yesterday
  • 2020-02-20 19:46:44 UTC gitea upgraded to 1.10.3
  • 2020-02-20 15:39:19 UTC rebooted gitea05 - adding back to haproxy
  • 2020-02-20 05:28:37 UTC tarballs.<openstack|opendev>.org are now served by static.opendev.org from /afs/openstack.org/project/tarballs.opendev.org. all jobs updated
  • 2020-02-20 05:19:18 UTC tarballs.opendev.org CNAME updated to static.opendev.org (change #708795) which is now serving its content from /afs/openstack.org/projects/tarballs.openstack.org/openstack/
  • 2020-02-20 05:14:29 UTC tarballs.openstack.org CNAME updated to static.opendev.org which is now serving its content from /afs/openstack.org/projects/tarballs.openstack.org/openstack/
  • 2020-02-20 03:32:18 UTC switched A/AAAA records for service-types.openstack.org to a CNAME for static.opendev.org, where it is now published from /afs/openstack.org/projects/service-types.openstack.org
  • 2020-02-20 02:58:46 UTC switched CNAME entry for specs.openstack.org to static.opendev.org where it is now published from /afs/openstack.org/project/specs.openstack.org
  • 2020-02-18 05:37:10 UTC added 100gb to yum-puppetlabs mirror to stop it hitting afs quota
  • 2020-02-17 23:02:19 UTC rebooted logstash-worker15 and etherpad-dev due to hung task kernel errors on console
  • 2020-02-14 10:58:13 UTC disabled gitea05 on the lb due to some brokenness
  • 2020-02-13 21:21:41 UTC temporarily deactivated gerrit account 30112 for "Aditi Pai Dukle" due to comment spam http://lists.openstack.org/pipermail/openstack-infra/2020-February/006600.html
  • 2020-02-13 20:44:49 UTC used mailman's `rmlist` tool to retire the abandoned openstack-sos mailing list on lists.openstack.org following confirmation from the list owner
  • 2020-02-13 04:59:29 UTC rebooted ze12 due to it hitting an OOM situation (see https://storyboard.openstack.org/#!/story/2007290). please recheck any changes with RETRY_LIMIT failed jobs
  • 2020-02-11 17:48:02 UTC restarted zuul scheduler at 3.16.0 with https://review.opendev.org/707205 manually applied
  • 2020-02-11 16:36:36 UTC restarted all of zuul at 3.16.0
  • 2020-02-11 16:08:31 UTC added LE records for review-dev.o.o and review.o.o to rax dns and made review-dev.o.o a cname to review-dev.opendev.org
  • 2020-02-11 03:22:43 UTC rebooted nb01 & nb02 and cleared out tmp directories as yum-based builds were getting stuck on old, unexited dib processes
  • 2020-02-10 11:09:13 UTC restarted zuul-executor on ze12 after it had OOMed
  • 2020-02-10 01:18:01 UTC switched nb03.openstack.org dns from old server in london (213.146.141.47) to new us-based host (139.178.85.141)
  • 2020-02-07 17:58:05 UTC created new project.airship volume mounted at /afs/.openstack.org/project/airshipit.org with two read-only replicas and set quota consistent with project.zuul and project.starlingx volumes
  • 2020-02-07 17:57:04 UTC increased the /afs/.openstack.org/project/zuul-ci.org quota from 100000kb to 1000000kb
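      For reference, an OpenAFS quota bump of this kind looks roughly like the following (quota is given in 1K blocks, run against the read-write path):
        fs setquota -path /afs/.openstack.org/project/zuul-ci.org -max 1000000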
  • 2020-02-07 14:28:48 UTC requested delisting of ipv4 address for pbx01.opendev.org from the spamhaus pbl
  • 2020-02-07 14:24:06 UTC requested delisting of ipv6 address for graphite01.opendev.org from the spamhaus sbl/css
  • 2020-02-06 21:45:51 UTC Deleted 6ed0e171-01c7-4d66-ac52-f27a3db8d6a7 status.openstack.org as it has been replaced with status01.openstack.org
  • 2020-02-06 20:01:19 UTC requested delisting of ipv4 address for mirror01.dfw.rax.opendev.org from the spamhaus pbl
  • 2020-02-06 19:59:09 UTC requested delisting of ipv4 address for mirror-update01.opendev.org from the spamhaus pbl
  • 2020-02-06 19:56:35 UTC requested delisting of ipv6 address for storyboard01.opendev.org from the spamhaus sbl/css
  • 2020-02-06 19:56:28 UTC restarted statusbot after failing to return from a 00:28:26 utc irc ctcp ping timeout
  • 2020-02-05 19:09:03 UTC delisted ipv4 address of nb01.opendev.org from the spamhaus pbl
  • 2020-02-05 18:54:54 UTC delisted ipv6 address of mirror-update01.opendev.org from the spamhaus pbl
  • 2020-02-05 18:38:11 UTC halted mirror01.mtl01.inap.opendev.org (which does not appear to be in production) in preparation for deletion
  • 2020-02-05 16:05:38 UTC delisted ipv6 address of storyboard01.opendev.org from the spamhaus pbl
  • 2020-02-05 16:05:32 UTC restarted statusbot after failing to return from a 2020-02-04 11:16:50 utc netsplit
  • 2020-02-03 23:26:55 UTC rebooted graphite.opendev.org due to it being hung with storage i/o errors
  • 2020-01-29 22:35:01 UTC filed pbl whitelisting request with spamhaus for lists.katacontainers.io ipv4 address
  • 2020-01-29 20:30:35 UTC governance.openstack.org and security.openstack.org switched CNAMEs to new static.opendev.org server
  • 2020-01-29 17:49:54 UTC Updated status.openstack.org to point at new Xenial status01.openstack.org server
  • 2020-01-28 18:01:37 UTC a reboot migration of wiki.openstack.org is scheduled for 2020-01-31 at 02:16 UTC per provider ticket #200127-ord-0000056
  • 2020-01-28 18:00:50 UTC a reboot migration of static.openstack.org is scheduled for 2020-01-30 at 20:15 UTC per provider ticket #200126-ord-0000316
  • 2020-01-27 20:27:44 UTC static.o.o lvm scaled back to just the main01 volume, and some ancient unused files deleted from remaining logical volumes
  • 2020-01-27 20:27:38 UTC statusbot restarted after 2020-01-23T21:20:50 irc ping timeout
  • 2020-01-21 18:29:10 UTC performed a hard reboot of zm06 after it lost the use of its rootfs (likely 2020-01-16 05:00z per gap in cacti graphs)
  • 2020-01-20 17:42:51 UTC deleted afs volume mirror.wheel.trustyx64
  • 2020-01-20 13:53:07 UTC Ubuntu Trusty images have been removed
  • 2020-01-18 15:47:28 UTC legacy-ubuntu-trusty nodeset is retired, ubuntu-trusty wheels are frozen
  • 2020-01-17 20:02:04 UTC restarted statusbot service on eavesdrop.o.o to recover following a 07:20z ctcp ping timeout
  • 2020-01-17 20:01:23 UTC restarted zuul scheduler at commit e6d8b210cc416ed494b0b0248404e3e6d7ce337c with 10190257f reverted to debug memory leak
  • 2020-01-15 19:22:15 UTC restarted statusbot service on eavesdrop.o.o to recover following a 18:32z ctcp ping timeout
  • 2020-01-14 23:30:55 UTC restarted nodepool launchers at nodepool==3.10.1.dev43 # git sha 9036dd7
  • 2020-01-14 23:30:32 UTC restarted all of zuul at e6d8b210cc416ed494b0b0248404e3e6d7ce337c
  • 2020-01-14 19:19:18 UTC removed gerrit sysvinit scripts from review-dev01
  • 2020-01-14 17:07:59 UTC Restarted nodepool-builders on nodepool==3.10.1.dev41 # git sha b0c0d27
  • 2020-01-13 18:33:42 UTC used mailman's `rmlist` tool to retire the following abandoned lists.openstack.org mailing lists at their respective owners' requests (archives were preserved and not removed): elections-committee, openstack-content, staffwithtrinet, transparency, women-of-openstack
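      Mailman 2's rmlist, run without -a, removes a list's configuration while leaving its pipermail archives in place; roughly (binary path depends on the installation):
        /usr/lib/mailman/bin/rmlist elections-committee   # omit -a so the archives are kept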
  • 2020-01-13 16:30:43 UTC manually removed setuptools 45.0.0 wheels from our afs wheel cache
  • 2020-01-10 20:07:49 UTC Rebooted logstash-worker05 via nova api after discovering it has stopped responding to ssh for several days.
  • 2020-01-10 20:07:14 UTC Added gmann to devstack-gate-core to help support fixes necessary for stable branches there.
  • 2020-01-10 02:41:39 UTC restarted zuul-executor service on ze02,03,04,05,12 to get log streamers running again after oom-killer got them; had to clear stale pidfile on zm04
  • 2020-01-09 18:15:54 UTC manually restarted zuul-merger service on zm05 at 18:02z because it did not start automatically following a host outage earlier at 02:10-03:15z
  • 2020-01-08 15:55:22 UTC trove instance subunit2sql-MySQL was down briefly between 05:00 and 05:25 utc due to an offline migration for impending host issues
  • 2020-01-03 23:30:50 UTC issued server reboot via api for zm02, hung and unresponsive since ~00:40z on 2020-01-03
  • 2019-12-19 15:38:17 UTC moved corrupted etherpad magnum-ussuri-virtual-ptg-planning to magnum-ussuri-virtual-ptg-planning_broken at brtknr's request
  • 2019-12-18 22:00:49 UTC restarted all of zuul at commit 84f6ea667c3453a75a7a0210ee08228c9eec167a
  • 2019-12-10 18:45:22 UTC restarted all of zuul at commit 15afed554e2bbac87aad83b0a42cd47c66858cfd
  • 2019-12-09 21:26:25 UTC Restarted Nodepool Launchers at nodepool==3.9.1.dev11 (e391572) and openstacksdk==0.39.0 to pick up OVH profile updates as well as fixes in nodepool to support newer openstacksdk
  • 2019-12-09 19:25:36 UTC Performed full Zuul restart at 19:00 UTC with zuul==3.11.2.dev72 (57aa3a0) for updated OpenStackSDK 0.39.0 release and Depends-On and GitHub driver improvements to avoid looking up extra unneeded changes via code review APIs
  • 2019-12-09 17:09:25 UTC /opt DIB workspace on nb0[123] was full. Removed all data in dib_tmp directory on each and restarted all builders on commit e391572495a18b8053a18f9b85beb97799f1126d
  • 2019-11-25 16:15:13 UTC registered #openstack-snaps with chanserv
  • 2019-11-25 16:15:02 UTC restarted statusbot following an irc ping timeout at 11:39:48z
  • 2019-11-24 22:37:53 UTC force merged pydistutils.cfg removal patch https://review.opendev.org/695821 ; see notes in change about reasoning
  • 2019-11-22 06:26:22 UTC mirror.fedora released manually successfully, manual lock dropped
  • 2019-11-21 14:42:33 UTC etherpad.openstack.org was offline between 01:55 and 03:00 utc due to a hypervisor host issue, provider incident id CSHD-53e26fd0 (no trouble ticket reference was provided this time)
  • 2019-11-20 18:24:12 UTC Added 200GB cinder volume to mirror01.mtl01.inap.opendev.org to be used as openafs and apache2 cache backend. Filesystems created and mounted at the appropriate cache locations and services restarted.
  • 2019-11-19 23:59:51 UTC restarted ze04, ze08 & ze09 due to OOM kills of the streaming daemon. ze09 zuul processes were completely stopped so rebooted the host
  • 2019-11-19 20:32:19 UTC manually killed all vos release processes running since 2019-11-15 on mirror-update.openstack.org and mirror-update.opendev.org servers
  • 2019-11-19 16:22:39 UTC all afs mirrors are stuck in the middle of vos release commands since early utc friday
  • 2019-11-19 06:09:03 UTC rebooted afs02.dfw.openstack.org after its console was full of I/O errors, very much like what we've seen before during host migrations that didn't go so well
  • 2019-11-15 02:03:26 UTC further investigation hasn't revealed solutions for gitea06's issues. https://storyboard.openstack.org/#!/story/2006849 is updated. host should probably be rebuilt. will wait to see if any comments on https://github.com/go-gitea/gitea/issues/9006
  • 2019-11-14 05:34:24 UTC gitea06 showing upload-pack errors per : https://storyboard.openstack.org/#!/story/2006849. i have disabled it in the load balancer so we can investigate
  • 2019-11-13 19:29:41 UTC openSUSE 15.0 has been removed from infra
  • 2019-11-13 17:03:04 UTC performed an etherpad restoreRevision for padID=Airship_bootstrap&rev=13875 per roman_g's request
  • 2019-11-12 13:15:54 UTC restarted gerritbot after it hit a freenode ping timeout at 12:41:15
  • 2019-11-11 12:21:03 UTC again restarted apache2 service on ask.o.o which had died during logrotation
  • 2019-11-06 01:44:42 UTC restarted executor on ze09 after oom killed finger daemon
  • 2019-11-03 01:53:46 UTC etherpad.o.o was back online and confirmed working as of 01:50 utc
  • 2019-11-03 01:45:05 UTC the hypervisor host for etherpad.o.o has suffered an outage as of 01:30 utc, provider trouble ticket 191103-ord-0000049
  • 2019-11-01 15:01:04 UTC tagged openstack/nose-html-output 0.0.7 release
  • 2019-11-01 14:49:37 UTC re-tagged openstack/nose-html-output 0.0.5 as 0.0.6 with detailed tag message
  • 2019-10-31 14:25:56 UTC manually updated the project description at https://storyboard.openstack.org/#!/project/openstack/whereto to indicate it uses launchpad
  • 2019-10-31 14:20:47 UTC corrected the https://launchpad.net/watcher owner from hudson-openstack to watcher-drivers at licanwei's request
  • 2019-10-30 17:04:07 UTC Rax quota set to 0 while we rebuild images with vhd-util instead of qemu-img so that they can be resized in rax. Once images are updated we should request the quota be reset back to normal
  • 2019-10-28 23:23:39 UTC manually unlocked mirror.ubuntu volume; have lock for fedora & ubuntu mirrors and running manual releases for both in screen session on afs01.dfw.openstack.org
  • 2019-10-28 17:04:31 UTC Services on graphite.o.o stopped, stats.timers resized to match new reduced retention config, services restarted, and files chowned back to www-data.
  • 2019-10-24 17:02:53 UTC reenabled inap mtl01 in nodepool
  • 2019-10-24 16:45:00 UTC updated the owner for https://launchpad.net/os-brick from https://launchpad.net/~openstack-admins to https://launchpad.net/~cinder-drivers at rosmaita's request (the former already owns the latter, so no control is lost)
  • 2019-10-24 16:17:42 UTC restarted all of zuul on commit 360f1e9fd32e7aa5eec7135838cb5e14f0cac6ae
  • 2019-10-18 17:07:23 UTC deleted erroneous puppet-mistral project from pypi.org per request from release team http://eavesdrop.openstack.org/irclogs/%23openstack-release/%23openstack-release.2019-10-18.log.html#t2019-10-18T13:00:50
  • 2019-10-17 16:58:41 UTC Restarted nodepool launchers on commit cec796eaf2419b8741d2ca10720601485a5d5750 to pick up new python-path support.
  • 2019-10-16 20:03:10 UTC restarted all of zuul on tag 3.11.1
  • 2019-10-16 19:11:15 UTC openSUSE 42.3 images have been removed from Zuul
  • 2019-10-16 14:18:55 UTC removed /srv/static/tarballs/ironic-python-agent{/dib/files/UPLOAD_RAW,/dib/UPLOAD_TAR,-builder/dib/files/UPLOAD_RAW,-builder/dib/UPLOAD_TAR} from static.openstack.org at dtantsur's request
  • 2019-10-15 18:34:11 UTC really disabled zuul-registry prune cronjob on intermediate registry
  • 2019-10-15 15:49:11 UTC Restarted nl01 and nl04 on nodepool==3.8.1.dev38 # git sha 0a010d9 and nl02 and nl03 on nodepool==3.8.1.dev36 # git sha 5a69b6a. The only difference between these commits is a test only change.
  • 2019-10-14 18:15:03 UTC moved /afs/.openstack.org/mirror/debian/db to /root/tmp/ on mirror-update01.openstack.org and invoked reprepro-mirror-update for debian in a root screen session
  • 2019-10-14 17:12:49 UTC moved /afs/.openstack.org/mirror/debian/dists/buster* to /root/tmp/ on mirror-update01.openstack.org and invoked reprepro-mirror-update for debian in a root screen session
  • 2019-10-14 16:03:41 UTC the trove instance for review-dev-MySQL was rebooted around 04:00 utc by the provider for cold migration due to impending host failure, trouble ticket #191014-ord-0000099
  • 2019-10-14 16:03:33 UTC statusbot restarted following its disappearance associated with a ping timeout at 13:15 utc, no indication of problems in its debug log to explain
  • 2019-10-11 21:02:37 UTC restarted all of zuul on commit b768ece2c0ecd235c418fe910b84ff88f69860d6
  • 2019-10-09 23:19:25 UTC manually installed latest unattended-upgrades on bridge.o.o to get it working again (https://bugs.launchpad.net/ubuntu/+source/unattended-upgrades/+bug/1823872)
  • 2019-10-09 23:19:06 UTC restarted statusbot as it was missing in #openstack-infra
  • 2019-10-07 15:48:49 UTC Restarted nodepool-builder on nb01, nb02, and nb03 to pick up image leak fix
  • 2019-10-05 16:00:52 UTC trove instance Etherpad-Dev-MySQL was shut down temporarily by the provider for a proactive cold migration due to an impending host failure, trouble ticket #191005-ord-0000331
  • 2019-10-04 21:18:38 UTC restarted all of zuul at commit e6496faf406529b4003ce7ebaa22eb1f2fa78929
  • 2019-10-04 21:18:22 UTC removed insecure-ci-registry01 from emergency
  • 2019-10-04 16:22:49 UTC nb03 dib_tmp partition was full; cleaned and restarted
  • 2019-10-02 14:54:10 UTC added flwang to #openstack-containers channel with permissions +Aeortv
  • 2019-10-01 13:59:57 UTC deleted feature/graphql branch of openstack/neutron previously at 9ee628dcc0eb61418e86fee355add38a6b09fab9 per https://review.opendev.org/685955
  • 2019-09-30 18:59:44 UTC Rebooted nb02 to deal with 100's of stray dib processes
  • 2019-09-30 18:00:25 UTC Restarted nodepool-builder on nb01 that OOM'd
  • 2019-09-27 22:31:18 UTC logstash-worker11 was rebooted by the provider at 19:12z due to a host problem, trouble ticket #190927-ord-0001006
  • 2019-09-27 16:46:16 UTC docs.starlingx.io cert updated in "hiera"
  • 2019-09-26 22:21:25 UTC removed obsolete /srv/static/tarballs/trove/images/README at trove ptl's request, as they've started publishing images there again so the message about no longer doing so was misleading
  • 2019-09-23 13:15:11 UTC restarted zuul scheduler on commit 80ef01534d90751aa2d9cd3bf4a356fca292bed8
  • 2019-09-21 21:09:14 UTC chat.freenode.net entry in /etc/hosts on eavesdrop has been updated from card.freenode.net (38.229.70.22), which was taken out of rotation and no longer serving clients, to orwell.freenode.net (185.30.166.37); following that, both statusbot and meetbot were restarted
  • 2019-09-21 21:06:52 UTC trove database instance subunit2sql-mysql is being cold-migrated to a new hypervisor host in dfw as of 20:45 utc per provider support ticket #190921-ord-0000519
  • 2019-09-17 20:01:27 UTC all zuul executors restarted on "zuul==3.10.2.dev79 # git sha e4f3f48"
  • 2019-09-17 17:56:36 UTC booted ze01 2019-09-17 17:45z after it entered a shutoff state for unidentified reasons around 2019-09-16 13:10z
  • 2019-09-16 14:40:22 UTC run_all.sh cronjob on bridge uncommented again
  • 2019-09-16 14:39:55 UTC The Gerrit outage portion of the current maintenance is complete and the service is back on line, however reindexing for renamed repositories is still underway and some Zuul job fixes are in the process of being applied
  • 2019-09-16 14:08:10 UTC The Gerrit service on review.opendev.org is offline briefly for maintenance: http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009064.html
  • 2019-09-16 13:30:36 UTC The Gerrit service on review.opendev.org will be offline briefly starting at 14:00 UTC (that's roughly 30 minutes from now) for maintenance: http://lists.openstack.org/pipermail/openstack-discuss/2019-September/009064.html
  • 2019-09-16 13:15:51 UTC disabled run_all.sh on bridge in preparation for project renames maintenance
  • 2019-09-13 17:00:15 UTC kdc04 in rax-ord rebooted at 15:46 for a hypervisor host problem, provider ticket 190913-ord-0000472
  • 2019-09-13 04:00:47 UTC hard-rebooted storyboard-dev due to hung kernel tasks rendering server unresponsive, killed stuck ansible connection to it from bridge to get configuration management to resume a timely cadence
  • 2019-09-12 12:55:28 UTC generated /etc/cron.d/mailman-senddigest-sites on lists.o.o in order to re-enable daily digests being sent
  • 2019-09-12 05:46:19 UTC force merged https://review.opendev.org/681630 "Revert configparser update" to avoid further breakage after it failed gate on its prior promoted run
  • 2019-09-11 07:18:34 UTC all volumes released and mirror-update.opendev.org returned to operation. for info on the debugging done with the fedora volume, see https://lists.openafs.org/pipermail/openafs-info/2019-September/042865.html
  • 2019-09-10 18:28:49 UTC Rebooted and cleaned up /opt/dib_tmp on nb01 and nb02 after their builders stopped running due to OOMs
  • 2019-09-09 23:56:39 UTC mirror-update.<opendev|openstack>.org shutdown during volume recovery. script running in root screen on afs01 unlocking and doing localauth releases on all affected volumes
  • 2019-09-09 17:00:42 UTC Restarted nodepool launchers with OpenstackSDK 0.35.0 and nodepool==3.8.1.dev10 # git sha f2a80ef
  • 2019-09-06 22:23:16 UTC upgraded exim on lists.openstack.org
  • 2019-09-06 18:23:59 UTC restarted afs02.dfw.openstack.org which was unresponsive (suspected live migration issue); manually fixed syntax error in BosConfig files on all fileservers
  • 2019-09-05 21:16:04 UTC Gerrit is being restarted to pick up configuration changes. Should be quick. Sorry for the interruption.
  • 2019-09-05 14:32:54 UTC restarted zuul executors with commit cfe6a7b985125325605ef192b2de5fe1986ef569
  • 2019-09-05 14:32:26 UTC restarted nl04 to deal with an apparently stuck keystone session after ovh auth was fixed
  • 2019-09-04 16:54:06 UTC Zuul job logs stored in OVH may fail. We have updated the base job to remove OVH from our storage location. If you have POST_FAILURES a recheck should fix them at this point.
  • 2019-09-03 01:17:04 UTC /var/gitea/data/git/.gitconfig.lock removed on gitea01, 05, 06, 07 after suspected i/o issues ~ 2-sep-17:20. gerrit replication restarted for each
  • 2019-09-02 21:16:41 UTC logstash-worker16 was rebooted by the provider at 21:11 utc due to a host problem, per provider ticket 190902-ord-0000585
  • 2019-08-30 00:58:57 UTC logstash-worker16 rebooted by the provider at 00:19 UTC due to a host problem (provider ticket 190829-ord-0001222)
  • 2019-08-28 18:58:50 UTC Restarted ze01-ze12 on zuul==3.10.2.dev47 # git sha 50a0f73; this fixes a bug in Zuul not testing the correct commit in some cases and may improve memory consumption of jobs due to changes in how json documents are manipulated.
  • 2019-08-27 01:45:11 UTC wrote to "/proc/sys/vm/drop_caches" on afs01 & afs02 dfw to reset very high kswapd0 cpu usage
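      For reference, the usual incantation is roughly:
        sync                               # flush dirty pages first
        echo 3 > /proc/sys/vm/drop_caches  # 3 = drop page cache plus dentries and inodes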
  • 2019-08-26 07:29:22 UTC unlocked mirror.fedora volume after afs release failure @ 2019-08-24T00:44:34
  • 2019-08-22 23:22:58 UTC opensuse mirror (except obs which is still broken) manually updated as of 23:20 utc
  • 2019-08-21 16:52:10 UTC updated diskimage-builder to 2.26.0 on nodepool builders to pick up centos network manager ipv4 fix
  • 2019-08-15 17:54:03 UTC Rebooted nb01.o.o to clear out stale dib mounts
  • 2019-08-15 15:40:36 UTC Restarted all nodepool builders to pick up openstacksdk 0.34.0
  • 2019-08-12 10:36:14 UTC restarted gerritbot after it suffered a ping timeout at 08:45 utc
  • 2019-08-09 20:19:28 UTC restarted all of zuul on commit a2018c599a364f316a682da64c775d390b054acd
  • 2019-08-09 00:04:49 UTC ran "pip3 install --upgrade --upgrade-strategy=only-if-needed pyfakefs" on all executors to upgrade to most recent pyfakefs release to fix ara static html generation
  • 2019-08-08 15:06:11 UTC shut down zuul-preview and added zp01 to emergency disable due to a suspected misconfiguration allowing it to act as an open proxy
  • 2019-08-06 04:15:30 UTC logs.openstack.org log volume offline; /srv/static/logs symlinked to /opt/tmp_srv_static_logs and fsck running in a screen
  • 2019-08-04 12:32:20 UTC log publishing is working again, you can recheck your jobs failed with "retry_limit"
  • 2019-08-04 05:41:17 UTC Our CI system has problems uploading job results to the log server and thus all jobs are failing. Do not recheck jobs until the situation is fixed.
  • 2019-08-02 06:33:02 UTC afs servers restarted without logging as kafs server currently out of rotation
  • 2019-08-01 14:04:24 UTC Force-Merged openstack/tempest master: Revert "Use memcached based cache in nova in all devstack-tempest jobs" https://review.opendev.org/673784 as requested by gmann to unblock other projects
  • 2019-08-01 06:07:08 UTC restarted ze02 to get log streaming working
  • 2019-07-30 15:57:39 UTC Increased AFS quota for /afs/.openstack.org/mirror/ubuntu and /afs/.openstack.org/mirror/centos by 50GB each.
  • 2019-07-30 06:34:25 UTC rebooted elasticsearch02 to clear old hung puppet processes
  • 2019-07-30 00:41:22 UTC nb02 rebooted after stuck processes sent nodepool-builder into deadlock
  • 2019-07-25 08:43:38 UTC The problem in our cloud provider has been fixed, services should be working again
  • 2019-07-25 08:36:34 UTC Services at opendev.org like our git server and at openstack.org are currently down, looks like an outage in one of our cloud providers.
  • 2019-07-25 07:58:27 UTC sent email update about opendev.org downtime, appears to be vexxhost region-wide http://lists.openstack.org/pipermail/openstack-infra/2019-July/006426.html
  • 2019-07-25 06:53:27 UTC The git service on opendev.org is currently down.
  • 2019-07-23 20:23:45 UTC openstack/doc8 in github has been transferred to the PyCQA organization
  • 2019-07-23 13:22:22 UTC manually disabled http and https backends for missing gitea01 in haproxy
  • 2019-07-23 13:22:10 UTC restarted statusbot after a 07:20z ctcp ping timeout
  • 2019-07-22 23:25:50 UTC deleted gitea01.opendev.org instance from vexxhost-sjc1 in preparation for replacement
  • 2019-07-22 06:27:58 UTC logs.openstack.org volume has been restored. please report any issues in #openstack-infra
  • 2019-07-22 06:08:32 UTC Due to a failure on the logs.openstack.org volume, old logs are unavailable while partition is recovered. New logs are being stored. ETA for restoration probably ~Mon Jul 22 12:00 UTC 2019
  • 2019-07-22 04:02:27 UTC logs.o.o : /srv/static/logs bind mounted to /opt for buffering, recovery of /dev/mapper/logs proceeding in a root screen session
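      The general approach, sketched here with an assumed scratch path since the exact directory isn't recorded:
        mkdir -p /opt/logs-buffer                        # scratch path is an assumption
        mount --bind /opt/logs-buffer /srv/static/logs   # keep new logs landing on local disk during the repair
        screen -dmS logs-fsck fsck -y /dev/mapper/logs   # run the check where an ssh disconnect can't kill it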
  • 2019-07-19 15:09:46 UTC restarted all of zuul on 3d9498f78db956cee366e3f5b633cf37df4e5bfa
  • 2019-07-18 18:20:11 UTC recovered corrupt https://etherpad.openstack.org/p/security-sig-newsletter content and restored it at the old name after moving the corrupt pad to "broken-security-sig-newsletter"
  • 2019-07-18 17:58:45 UTC deleted stale networking-mlnx repository mirror from the openstack organization on github at lennyb's request
  • 2019-07-16 00:19:33 UTC touched /home/gerrit2/review_site/etc/GerritSiteHeader.html after merge of Id0cd8429ee5ce914aebbbc4a24bef9ebf675e21c
  • 2019-07-15 13:07:41 UTC restarted gerritbot after freenode disruption/disconnect at 06:50-07:19
  • 2019-07-09 23:15:33 UTC restarted all of zuul at commit 86f071464dafd584c995790dce30e2e3ca98f5ac
  • 2019-07-09 18:53:55 UTC manually deleted 30 leaked nodepool images from vexxhost-sjc1
  • 2019-07-08 14:07:19 UTC restarted all of zuul on commit 5b851c14f2bd73039748fca71b5db3b05b697f7f
  • 2019-07-04 17:43:01 UTC started mirror.iad.rax.opendev.org which entered a shutoff state 2019-07-04T17:26:01Z associated with a fs-cache assertion failure kernel panic
  • 2019-07-03 15:58:49 UTC Restarted review.opendev.org's Gerrit service to restore git repo replication which appears to have deadlocked.
  • 2019-07-03 01:25:53 UTC following https://review.opendev.org/667782, rsync cron jobs commented out on mirror-update.openstack.org
  • 2019-07-02 23:46:03 UTC mirror.ord.rax.opendev.org rebooted and running openafs 1.8.3; should fix up periodic "hash mismatch" errors seen in jobs
  • 2019-07-02 06:05:11 UTC restarted afs file servers without auditlog. see notes in https://etherpad.openstack.org/p/opendev-mirror-afs for the recent log bundle
  • 2019-07-01 22:31:54 UTC restarted all of zuul on commit c5090244dc608a1ef232edded5cf92bf753dbb12
  • 2019-07-01 03:58:48 UTC AFS auditlog enabled for afs01/02.dfw and afs01.ord AFS servers. logging to /opt/dafileserver.audit.log. notes, including details on how to disable again @ https://etherpad.openstack.org/p/opendev-mirror-afs
  • 2019-06-28 13:13:23 UTC deleted /afs/.openstack.org/docs/tobiko at slaweq's request as a member of https://review.opendev.org/#/admin/groups/tobiko-core
  • 2019-06-28 02:46:55 UTC afs01/02.dfw & afs01.ord restarted with greater -cb values: see https://review.opendev.org/668078
  • 2019-06-27 22:12:16 UTC Gitea06 had a corrupted root disk around the time of the Denver summit. It has been replaced with a new server and added back to the haproxy config.
  • 2019-06-27 19:40:42 UTC deleted https://github.com/openstack/tobiko at slaweq's request as a member of https://review.opendev.org/#/admin/groups/tobiko-core
  • 2019-06-27 15:28:31 UTC mirror.iad.rax.opendev.org started again at 15:24z after mysteriously entering shutoff state some time after 15:11z
  • 2019-06-27 13:11:52 UTC changed chat.freenode.net alias on eavesdrop.o.o from 162.213.39.42 to 38.229.70.22 and restarted openstack-meetbot
  • 2019-06-27 09:40:09 UTC switched irc host on eavesdrop.openstack.org as last one went unresponsive; rebooted host for good measure
  • 2019-06-25 16:08:39 UTC CORRECTION: restarted zuul scheduler on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler)
  • 2019-06-25 16:05:05 UTC restarted all of zuul on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler)
  • 2019-06-21 15:55:10 UTC mirror.iad.rax.opendev.org restarted at 15:46 utc for host migration
  • 2019-06-21 13:20:45 UTC restarted hound on codesearch.o.o due to persistent "too many open files" errors
  • 2019-06-20 15:00:35 UTC manually deleted instance 4bbfd576-baa1-410f-8384-95c7fac8475b in ovh bhs1; it has a stale node lock from zuul which will be released at the next scheduler restart (for some reason it has lost its ip addresses too, no idea if that's related)
  • 2019-06-19 22:58:45 UTC apache restarted on mirror.iad.rax.opendev.org at 22:16 utc, clearing stale content state
  • 2019-06-18 14:49:38 UTC ran "modprobe kafs rootcell=openstack.org:104.130.136.20:23.253.200.228" and "mount -t afs "#openstack.org:root.afs." /afs" on mirror01.iad.rax.opendev.org after reboot
  • 2019-06-18 14:33:24 UTC mirror.iad.rax.opendev.org started again at 14:10z after mysteriously entering shutoff state at 10:00z
  • 2019-06-14 20:32:26 UTC Updated static.openstack.org (as well as security, specs, tarballs, service-types, governance, releases) ssl cert as part of normal refresh cycle
  • 2019-06-14 14:51:24 UTC restarted all of zuul on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler)
  • 2019-06-13 23:17:45 UTC developer.openstack.org docs.openstack.org ethercalc.openstack.org etherpad.openstack.org firehose.openstack.org git.openstack.org git.starlingx.io openstackid-dev.openstack.org openstackid.org refstack.openstack.org review.openstack.org storyboard.openstack.org translate.openstack.org wiki.openstack.org zuul.openstack.org ssl certs updated as part of normal refresh cycle.
  • 2019-06-13 19:00:41 UTC Updated ask.openstack.org's ssl cert as part of regular refresh cycle.
  • 2019-06-13 16:04:23 UTC restarted all of zuul on 7e45f84f056b3fa021aae1eecb0c23d9055656f3 (with repl change cherry-picked onto scheduler)
  • 2019-06-13 15:58:38 UTC Restarted nodepool launchers on commit 3412764a985b511cdc6b70dc801ffdb357ec02c2
  • 2019-06-12 03:44:26 UTC mirror01.iad.rax.opendev.org in emergency file, and serving mirror files via kafs which is currently hand-configured; see https://etherpad.openstack.org/p/opendev-mirror-afs
  • 2019-06-11 19:37:20 UTC flushed /afs/openstack.org/mirror/ubuntu/dists/xenial-{security,updates}/main/binary-amd64/Packages.gz on mirror.dfw.rax
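      Most likely done with the OpenAFS client cache flush, along the lines of:
        fs flush /afs/openstack.org/mirror/ubuntu/dists/xenial-security/main/binary-amd64/Packages.gz
        fs flush /afs/openstack.org/mirror/ubuntu/dists/xenial-updates/main/binary-amd64/Packages.gz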
  • 2019-06-10 16:07:00 UTC Downgraded openstacksdk to 0.27.0 on nb01 to test if the revert fixes rackspace image uploads
  • 2019-06-09 11:52:19 UTC restarted gerritbot now, as it seems to have silently dropped off freenode 2019-06-08 21:40:43 utc with a 248 second ping timeout
  • 2019-06-08 20:55:29 UTC vos release completed and cron locks released for mirror.fedora, mirror.ubuntu and mirror.ubuntu-ports
  • 2019-06-07 22:23:02 UTC fedora, ubuntu, ubuntu-ports mirrors are currently resyncing to afs02.dfw and won't update again until that is finished
  • 2019-06-07 21:36:38 UTC Exim on lists.openstack.org/lists.opendev.org/lists.starlingx.io/lists.airshipit.org is now enforcing spf failures (not soft failures). This means that if you send email from a host that isn't permitted by the spf record, that email will be rejected.
  • 2019-06-07 21:19:56 UTC Performed a full zuul service restart. This reset memory usage (we were swapping), installed the debugging repl, and gives us access to ansible 2.8. Scheduler is running ce30029 on top of e0c975a and mergers + executors are running 00d0abb
  • 2019-06-07 20:11:59 UTC filed a removal request from the spamhaus pbl for the ip address of the new ask.openstack.org server
  • 2019-06-07 15:04:28 UTC deleted ~6k messages matching 'From [0-9]\+@qq.com' in /srv/mailman/openstack/qfiles/in/ on lists.o.o
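      A sketch of how such a cleanup can be done, assuming the spam is identifiable by grepping the queued .pck files (the exact invocation used wasn't recorded):
        find /srv/mailman/openstack/qfiles/in/ -name '*.pck' -print0 \
          | xargs -0 grep -lZ 'From [0-9]\+@qq.com' \
          | xargs -0 rm -f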
  • 2019-06-07 14:38:18 UTC removed files02 from emergency file
  • 2019-06-06 23:25:00 UTC added files02.openstack.org to emergency file due to recent system-config changes breaking apache config
  • 2019-06-06 20:56:49 UTC rebooted afs02.dfw.openstack.org following a cinder volume outage for xvdc
  • 2019-06-05 16:36:26 UTC deleted old 2012.2 release from https://pypi.org/project/horizon/
  • 2019-06-03 14:23:26 UTC resized proxycache volume from 64 to 120GB on rax.dfw.opendev.org because it was 100% used (other mirrors report ~80G used)
  • 2019-05-31 19:44:10 UTC Performed repo renames. Some were cleanup from opendev migration and others were normal reorgs.
  • 2019-05-31 17:14:28 UTC manually removed nova 2013.1 release from pypi
  • 2019-05-31 16:48:57 UTC Gerrit is back up and running again. Thank you for your patience and sorry for the delay in this notification (we thought the statusbot was still busy updating channel topics).
  • 2019-05-31 15:11:05 UTC Gerrit is now entering its maintenance window. Expect Gerrit outages in the near future. We will notify when it is back up and running.
  • 2019-05-31 14:34:07 UTC The Gerrit service at https://review.opendev.org/ will be offline briefly for maintenance starting at 15:00 UTC (roughly 30 minutes from now); for details see http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006684.html
  • 2019-05-29 14:15:17 UTC enabled the slow query log on storyboard with a long query time of 1.0 seconds
  • 2019-05-23 00:00:53 UTC zuul scheduler was restarted at 20:41z
  • 2019-05-21 20:14:45 UTC upgraded skopeo on all zuul executors to 0.1.37-1~dev~ubuntu16.04.2~ppa2
  • 2019-05-21 16:29:42 UTC blocked all traffic to wiki.openstack.org from 61.164.47.194 using an iptables drop rule to quell a denial of service condition
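      The rule amounts to something like:
        iptables -I INPUT -s 61.164.47.194 -j DROP   # insert at the top of the chain so it matches before any accept rules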
  • 2019-05-21 06:39:51 UTC ask.openstack.org migrated to new xenial server
  • 2019-05-20 19:26:52 UTC Deleted git.openstack.org and git01-git08.openstack.org servers. These cgit cluster hosts have been replaced with the gitea cluster.
  • 2019-05-20 07:20:10 UTC removed ask-staging-old and ask-staging01 servers, and all related dns entries (see https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup)
  • 2019-05-17 20:18:35 UTC Transferred Airship repos from https://github.com/openstack to https://github.com/airshipit according to list at https://www.irccloud.com/pastebin/hBEV5zSm/
  • 2019-05-17 16:44:47 UTC Removed /var/lib/docker-old on gitea02-05,07-08 to free disk space and improve disk headroom on those servers. This was not required on 01 and 06 as the dir did not exist. Note the dir was created when we recovered from the accidental k8sification of our servers and should no longer be needed.
  • 2019-05-17 15:44:33 UTC applied opendev migration renames to the storyboard-dev database to stop puppet complaining
  • 2019-05-16 17:38:27 UTC Gerrit is being restarted to add gitweb links back to Gerrit. Sorry for the noise.
  • 2019-05-16 16:40:00 UTC Restarted zuul-scheduler to pick up a fix for zuul artifact handling that affected non change object types like tags as part of release pipelines. Now running on ee4b6b1a27d1de95a605e188ae9e36d7f1597ebb
  • 2019-05-16 15:57:39 UTC temporarily upgraded skopeo on all zuul executors to manually-built 0.1.36-1~dev~ubuntu16.04.2~ppa19.1 package with https://github.com/containers/skopeo/pull/653 applied
  • 2019-05-16 14:37:46 UTC deleted groups.openstack.org and groups-dev.openstack.org servers, along with their corresponding trove instances
  • 2019-05-16 14:08:15 UTC temporarily upgraded skopeo on ze01 to manually-built 0.1.36-1~dev~ubuntu16.04.2~ppa19.1 package with https://github.com/containers/skopeo/pull/653 applied
  • 2019-05-09 20:24:19 UTC corrected zuul:zuul directory (/var/lib/zuul/keys/secrets/project/gerrit) permissions on zuul.o.o
  • 2019-05-08 14:30:52 UTC set 139.162.227.51 chat.freenode.net in /etc/hosts on eavesdrop01
  • 2019-05-07 23:28:09 UTC If your jobs failed due to connectivity issues to opendev.org they can be rechecked now. Services have been restored at that domain.
  • 2019-05-07 23:24:22 UTC Ansible cron disabled on bridge until we remove the k8s management from run_all.sh. This is necessary to keep k8s installations from breaking standalone docker usage.
  • 2019-05-03 14:53:22 UTC Disabled gitea06 backends in gitea-lb01 haproxy because gitea06 has a sad filesystem according to dmesg and openstack/requirements stable/stein was affected.
  • 2019-05-03 01:15:57 UTC bumped fedora.mirror quota to 800000000 k
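      Quota bumps like this are typically done with fs setquota, which takes the limit in kilobyte blocks; a sketch, assuming the volume is mounted at its usual path:
        fs setquota -path /afs/.openstack.org/mirror/fedora -max 800000000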
  • 2019-05-02 15:06:00 UTC Gerrit is being restarted to pick up a (gitweb-related) configuration change
  • 2019-05-01 17:34:49 UTC Nodepool launchers restarted on commit f58a2a2b68c58f9626170795dba10b75e96cd551 to pick up memory leak fix
  • 2019-04-29 17:53:27 UTC restarted Zuul scheduler on zuul==3.8.1.dev29 # git sha 7e29b8a to pick up the fix for our memory leak
  • 2019-04-26 16:04:35 UTC restarted zuul scheduler at 6afa22c9949bbe769de8e54fd27bc0aad14298bc with a local revert of 3704095c7927568a1f32317337c3646a9d15769e to confirm it is cause of memory leak
  • 2019-04-24 16:54:25 UTC Moved /var/lib/planet/openstack aside on planet.o.o so that puppet will reclone it using the correct remote. This should fix planet publishing updates.
  • 2019-04-24 16:16:15 UTC Restarted nodepool launchers on commit f8ac79661a8d2e2ee980e511c9bdf19a41d156f8
  • 2019-04-23 19:32:53 UTC restarted all of Zuul at commit 6afa22c9949bbe769de8e54fd27bc0aad14298bc due to memory leak
  • 2019-04-23 19:10:55 UTC the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically
  • 2019-04-20 21:19:00 UTC re-enabled run_all.sh cron
  • 2019-04-20 03:51:35 UTC The OpenDev Gerrit and Zuul are back online; you may need to update your git remotes for projects which have moved.
  • 2019-04-19 15:07:37 UTC Gerrit is offline for several hours starting at 15:00 UTC to perform the opendev migration; see http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005011.html
  • 2019-04-17 04:53:38 UTC grafana02.openstack.org is now puppeting again with some manual intervention to install the grafana repo gpg key. this key should not change and can remain as is. will need a new version of puppetlabs-apt when puppet4 transition is complete to be fully automatic again. host running grafana 6.1.4
  • 2019-04-16 13:06:23 UTC nodepool-launcher was not running on nl04 due to OOM. Restarted
  • 2019-04-16 03:02:39 UTC Restarted zuul-scheduler and zuul-web on commit 0bb220c
  • 2019-04-16 02:17:29 UTC started re-replication of all projects on review.openstack.org to bring everything in sync due to some missing references on git servers; likely related to one-off replication reconfiguration previously
  • 2019-04-15 23:20:07 UTC restarted nodepool-launcher on nl01 and nl02 due to OOM; restarted n-l on nl03 due to limited memory
  • 2019-04-15 22:04:56 UTC Deleted clarkb-test-bridge-snapshot-boot (b1bbdf16-0669-4275-aa6a-cec31f3ee84b) and clarkb-test-lists-upgrade (40135a0e-4067-4682-875d-9a6cec6a999b) as both tasks they were set up to test for have been completed
  • 2019-04-12 19:17:42 UTC Upgraded lists.openstack.org from trusty to xenial
  • 2019-04-12 16:57:48 UTC Pre lists.openstack.org snapshot completed and is named lists.openstack.org-20190412-before-xenial-upgrade
  • 2019-04-10 19:06:54 UTC Restarting Gerrit on review.openstack.org to pick up new configuration for the replication plugin
  • 2019-04-10 08:15:42 UTC successfully submitted a request to remove 104.130.246.32 (review01.openstack.org) from Spamhaus PBL
  • 2019-04-09 18:52:16 UTC deleted old trusty-based openstackid-dev.openstack.org and openstackid.org servers now that the xenial-based replacements have been operating successfully in production for a while
  • 2019-04-09 07:39:14 UTC mirror01.nrt1.arm64ci.openstack.org currently offline pending (hopeful) recovery
  • 2019-04-04 13:56:59 UTC deleted nova instance "mirror01.la1.citycloud.openstack.org" and cinder volume "newvolume" from citycloud-la1 as that region is being taken permanently offline
  • 2019-04-03 18:14:13 UTC Added PyPI user 'spencer' as owner on PyPI package 'nimble' as we weren't using the name and were asked nicely to give it to another group.
  • 2019-04-03 17:22:16 UTC Nodepool builders restarted with new code for cleaning up leaked DIB builds
  • 2019-04-03 14:29:49 UTC added Trinh Nguyen (new telemetry PTL) to gerrit groups {ceilometer,aodh,panko}-{core,stable-maint} http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004486.html
  • 2019-04-02 02:07:05 UTC restarted gerritbot as it seems to have dropped out
  • 2019-03-27 17:45:35 UTC deleted openstack-infra/jenkins-job-builder branch stable/1.6 previously at c9eb8936f3fbb00e7edfa749f6379f531bbf3b1d as requested by zbr
  • 2019-03-27 02:12:06 UTC renewed git.zuul-ci.org cert (which had expired)
  • 2019-03-26 14:22:40 UTC deleted 2012.2 from the cinder releases on pypi.org per http://lists.openstack.org/pipermail/openstack-discuss/2019-March/004224.html
  • 2019-03-25 09:46:28 UTC cleared out dib_tmp, leaked images and rebooted nb01/02/03 (all had full /opt)
  • 2019-03-22 15:21:05 UTC deactivated duplicate gerrit accounts 19705, 23817, 25259, 25412 and 30149 at the request of lmiccini
  • 2019-03-21 23:08:31 UTC restarted zuul executors at git commit efae4deec5b538e90b88d690346a58538bd5cfff
  • 2019-03-21 14:53:29 UTC frickler force-merging https://review.openstack.org/644842 in order to unblock neutron after ansible upgrade
  • 2019-03-21 10:48:33 UTC restarted gerritbot which seems to have dropped out at 08:24:30,089 and not recovered after that
  • 2019-03-20 14:20:25 UTC restarted mysql and apache2 services on storyboard01.opendev.org to investigate cache memory pressure
  • 2019-03-19 21:11:31 UTC restarted all of zuul at commit 77ffb70104959803a8ee70076845c185bd17ddc1
  • 2019-03-19 01:14:21 UTC volumes on backup01.ord.rax.ci.openstack.org "rotated" and new backups now going to a fresh volume. See #644457 for notes for future rotations
  • 2019-03-18 22:31:13 UTC modified jeepyb files on review01.o.o to debug why manage-projects isn't setting retired project acls. I have since restored those file contents and `diff /opt/jeepyb/jeepyb/cmd/manage_projects.py /usr/local/lib/python2.7/dist-packages/jeepyb/cmd/manage_projects.py` shows no delta.
  • 2019-03-15 19:43:55 UTC Upgraded pyopenssl and cryptography on eavesdrop.openstack.org to work around https://tickets.puppetlabs.com/browse/PUP-8986 after the puppet 4 upgrade on this host
  • 2019-03-14 23:07:49 UTC bridge.o.o resized to a 8gb instance
  • 2019-03-13 22:47:48 UTC Deleted tag debian/1%2.1.2-1 in openstack/deb-python-mistralclient to workaround gitea bug with % in ref names. There was already a debian/2.1.2-1 replacement tag pointing to the same ref.
  • 2019-03-13 22:47:09 UTC Deleted tag debian/1%3.1.0-3 in openstack/deb-python-swiftclient to workaround gitea bug with % in ref names. There was already a debian/3.1.0-3 replacement tag pointing to the same ref.
  • 2019-03-13 22:46:18 UTC Added tag debian/1.6.0-2 to openstack/python-deb-oslotest as replacement for deleted tag debian/1%1.6.0-2 to workaround a gitea bug with % in ref names
  • 2019-03-13 21:50:37 UTC Upgraded afsdb01 and afsdb02 servers to Xenial from Trusty.
  • 2019-03-13 19:12:40 UTC Replaced deb-oslotest's debian/1%1.6.0-2 tag as the % is making gitea unhappy and the project is retired. Gitea bug: https://github.com/go-gitea/gitea/issues/6321 New tag: debian/1.6.0-2
  • 2019-03-13 18:22:46 UTC snapshotted and increased the paste-mysql-5.6 trove instance from 5gb disk/2gb ram to 10gb disk/4gb ram
  • 2019-03-12 18:18:27 UTC Rebooted refstack.openstack.org via openstack api as it was in a shutdown state
  • 2019-03-12 17:13:07 UTC changed intermediate registry password in bridge hostvars
  • 2019-03-12 16:26:37 UTC restarted meetbot and statusbot at ~1540z, switching them from card.freenode.net to niven.freenode.net as the former was taken out of rotation
  • 2019-03-12 07:06:43 UTC graphite-old.openstack.org server, volumes & dns records removed (replaced by graphite.opendev.org)
  • 2019-03-11 23:01:09 UTC Upgraded afs01.dfw, afs02.dfw, and afs01.ord to Xenial from Trusty
  • 2019-03-11 21:32:17 UTC restarted zuul executors for security fix 5ae25f0
  • 2019-03-08 17:57:40 UTC restarted all of zuul at commit 603ce6f474ef70439bfa3adcaa27d806c23511f7
  • 2019-03-06 09:18:38 UTC manually removed /var/log/exim4/paniclog on new g*.opendev.org servers after crosschecking the contents to reduce spam
  • 2019-03-05 17:12:35 UTC Gerrit is being restarted for a configuration change, it will be briefly offline.
  • 2019-03-05 14:06:52 UTC houndd restarted on codesearch to correct "too many open files" error
  • 2019-03-05 02:02:46 UTC afs02.dfw rebooted after hanging, likely related to outage for main02 block device on 2019-03-02
  • 2019-03-04 17:55:07 UTC restarted zuul at commit d298cb12e09d7533fbf161448cf2fc297d9fd138
  • 2019-03-04 17:25:11 UTC restarted nodepool launchers and builder at commit 3561e278c6178436aa1d8d673f839a676598ea17
  • 2019-03-04 05:14:58 UTC graphite.opendev.org is now the active replacement for graphite.openstack.org. everything on the firewall list that might need a restart to pick up the new address has been done
  • 2019-03-04 00:36:22 UTC graphite.o.o A/AAAA records renamed to graphite-old.o.o, graphite.o.o now a CNAME to these until switch to graphite.opendev.org
  • 2019-02-27 22:05:01 UTC Removed kdc01.openstack.org and puppetmaster.openstack.org A/AAAA records from DNS
  • 2019-02-27 22:00:02 UTC Deleted Old kdc01.openstack.org (859d5e9c-193c-4c1b-8cb3-4da8316d060c) as it has been replaced by kdc03.openstack.org
  • 2019-02-27 21:30:51 UTC Deleted Old health.openstack.org (9adaa457-16ab-48ea-9618-54af6edd798b) as it has been replaced by health01.openstack.org
  • 2019-02-26 16:17:25 UTC restarted the gerritbot service on review01 to resolve its 12:30:30 ping timeout
  • 2019-02-25 21:07:46 UTC kdc03 is now our kerberos master/admin server. kdc01 is not yet deleted but is not running any services or the propagation cron. Will clean up kdc01 after a day or two of happy afs services.
  • 2019-02-24 23:04:29 UTC leaked images and temp dirs cleared from nb01/nb02.o.o; reboots to clear orphaned mounts from failed builds, both have plenty of disk now and are building images
  • 2019-02-22 14:32:54 UTC deleted unused nb03.openstack.org/main01 cinder volume from vexxhost ca-ymq-1
  • 2019-02-22 14:30:55 UTC deleted old mirror01.ca-ymq-1.vexxhost.openstack.org server, long since replaced by mirror02
  • 2019-02-21 17:06:41 UTC (dmsimard) nb03 was found out of disk space on /opt, there is now 120GB available after cleaning up leaked images
  • 2019-02-20 19:25:23 UTC deleted openstack/cinderlib project and cinder project group (along with the associated group mapping entry) from storyboard.openstack.org's backend database
  • 2019-02-19 23:53:22 UTC Restarted zuul-scheduler at 5271b592afe708d33fc4b3d08d9a2cc97ae0ddfc and zuul mergers + executors at 6bc25035dd8a41c0522fbe43d149303c426cbc5a.
  • 2019-02-18 16:37:35 UTC Deleted Trusty pbx.openstack.org (038e80f5-15aa-4f69-8c6c-0f43b3587778) as new Xenial pbx01.opendev.org is up and running
  • 2019-02-18 15:42:48 UTC according to ovh, there was a ceph outage which affected the rootfs for our gra1 mirror there on 2018-01-27 between 11:20 and 16:20 utc
  • 2019-02-16 18:04:29 UTC old storyboard.openstack.org server and storyboard-mysql trove database snapshotted and deleted
  • 2019-02-15 23:28:38 UTC pbx01.opendev.org now hosting pbx.openstack.org. Old server to be deleted next week if all looks well then.
  • 2019-02-15 20:05:24 UTC The StoryBoard service on storyboard.openstack.org is offline momentarily for maintenance: http://lists.openstack.org/pipermail/openstack-discuss/2019-February/002666.html
  • 2019-02-14 22:15:57 UTC The test cloud region using duplicate IPs has been removed from nodepool. Jobs can be rechecked now.
  • 2019-02-14 21:33:54 UTC Jobs are failing due to ssh host key mismatches caused by duplicate IPs in a test cloud region. We are disabling the region and will let you know when jobs can be rechecked.
  • 2019-02-14 00:41:17 UTC manually ran "pip3 install kubernetes==9.0.0b1" on bridge to see if newer version avoids deadlock on k8s api calls
  • 2019-02-13 02:14:59 UTC zuul scheduler restarted for debug logging enhancements
  • 2019-02-13 01:34:58 UTC per the prior update, removed acme-opendev.org from openstackci user's domains in rax
  • 2019-02-12 23:01:23 UTC for testing purposes i have registered acme-opendev.org and set it up in the openstackci account's rax clouddns. this is for testing rax api integration without me worrying about wiping out openstack.org by accident
  • 2019-02-12 22:26:59 UTC restarted zuul-web at commit 6d6c69f93e9755b3b812c85ffceb1b830bd75d6f
  • 2019-02-12 15:31:02 UTC Set new github admin account to owner on the openstack-infra org.
  • 2019-02-12 14:48:27 UTC replaced openstackid-dev.openstack.org address records with a cname to openstackid-dev01.openstack.org
  • 2019-02-11 23:21:48 UTC restarted all of zuul with git sha 5957d7a95e677116f39e52c2a44d4ca8b795da34; ze08 is in the disabled list and configured to use jemalloc
  • 2019-02-11 21:07:21 UTC pbx.openstack.org's POTS number updated (see wiki for new number) due to an account shuffle.
  • 2019-02-11 18:22:55 UTC Installed libjemalloc1 on ze08.openstack.org to experiment with alternate malloc implementations. Other executors will act as controls
  • 2019-02-07 20:02:08 UTC Cleaned up Elasticsearch indexes from the future. One was from the year 2106 (job logs actually had those timestamps) and others from November 2019. Total data was a few megabytes.
  • 2019-02-06 21:41:25 UTC Restarted Geard on logstash01 and log processing workers on logstash-workerXX. Geard appeared to be out to lunch with a large queue and workers were not reconnecting on their own.
  • 2019-02-06 19:41:55 UTC Nodepool builders all restarted on commit 6a4a8f with new build timeout feature
  • 2019-02-06 17:29:32 UTC Any changes failed around 16:30 UTC today with a review comment from Zuul like "ERROR Unable to find playbook" can be safely rechecked; this was an unanticipated side effect of our work to move base job definitions between configuration repositories.
  • 2019-02-03 14:50:17 UTC Puppet on etherpad-dev01 failing due to broken hieradata lookups with puppet 4. Fix is https://review.openstack.org/634601
  • 2019-01-30 20:48:56 UTC kata-containers zuul tenant in production, along with the opendev/base-jobs repo
  • 2019-01-30 20:48:18 UTC inap-mtl01 upgraded from neutron mitaka to queens; orphaned ports removed
  • 2019-01-29 23:15:30 UTC http://zuul.openstack.org is not working. https://zuul.openstack.org does work. Please use that while we investigate.
  • 2019-01-28 22:18:36 UTC Deleted old 2014.* Sahara releases from pypi so that modern releases are sorted properly. Old sahara releases are available on the tarballs server.
  • 2019-01-28 20:58:06 UTC Cleaned up leaked images on nb01, nb02, and nb03 to free up disk space for dib.
  • 2019-01-23 20:52:47 UTC deactivated defunct gerrit account 28694 at the request of todin
  • 2019-01-23 16:54:32 UTC restarted zuul at 9e679eadedf2b64955b0511cada91018a1a0e30a
  • 2019-01-22 23:30:16 UTC cleared all nomail[B] subscription flags on openstack-discuss
  • 2019-01-22 23:03:21 UTC disabled automatic bounce processing on openstack-discuss so that we can investigate dmarc issues without everyone having their subscription disabled
  • 2019-01-21 19:30:46 UTC restarted zuul at 691b1bc17c77ebce5b2a568e586d19b77cebbc7b
  • 2019-01-21 19:19:12 UTC The error causing post failures on jobs has been corrected. It is safe to recheck these jobs.
  • 2019-01-20 17:06:05 UTC switched storyboard-dev.o.o to using a local database service instead of trove
  • 2019-01-20 16:13:25 UTC replaced expired zuul-ci.org ssl cert (and associated key/chain) due to unanticipated expiration
  • 2019-01-19 14:48:23 UTC deleted the old storyboard-dev.openstack.org server now that storyboard-dev01.opendev.org has been serving that vhost for a few days
  • 2019-01-18 17:09:33 UTC Upgraded review.openstack.org Gerrit to 2.13.12.ourlocalpatches. Please keep an eye on Gerrit for any unexpected behavior but initial indications look good.
  • 2019-01-18 15:59:33 UTC Cleaned up review.openstack.org:/home/gerrit2/review_site/lib in preparation for the Gerrit 2.13.12 upgrade via https://review.openstack.org/#/c/631346/1
  • 2019-01-18 07:58:55 UTC restarted gerritbot as it had dropped out
  • 2019-01-17 23:39:08 UTC On status.o.o apt-get removed python-httplib2 python-launchpadlib and python-lazr.restfulclient then reinstalled /opt/elastic-recheck with pip which resulted in Successfully installed elastic-recheck-0.0.1.dev2210 httplib2-0.12.0 launchpadlib-1.10.6 lazr.restfulclient-0.14.2. This was to work around newer pip refusing to touch distutils-installed packages.
  • 2019-01-16 19:52:03 UTC Submitted PTG attendance survey for the Infra team. Requested 2 days for ~10 people but mentioned we can be flexible. Hope to see you there.
  • 2019-01-16 19:15:56 UTC removed pabelanger admin user from openstack orgs (all) on github.com
  • 2019-01-15 23:39:15 UTC storyboard-dev.openstack.org dns updated to cname to storyboard-dev01.opendev.org
  • 2019-01-15 14:39:20 UTC restarted zuul executors and mergers to pick up git connection config fix
  • 2019-01-15 00:01:18 UTC restarted all of zuul with commit sha 67ef71d2a2d6b5b06e2355eefff00ae3df24bbf7
  • 2019-01-14 19:24:47 UTC updated openid provider for wiki.openstack.org from login.launchpad.net to login.ubuntu.com
  • 2019-01-10 17:30:16 UTC restarted gerrit for updated replication config
  • 2019-01-09 22:23:26 UTC removed ns3.openstack.org A 166.78.116.117 and ns3.openstack.org AAAA 2001:4801:7825:103:be76:4eff:fe10:4f7a as these IPs don't seem to belong to us anymore.
  • 2019-01-09 22:07:08 UTC deleted f8c9e5d6-818f-4168-a53e-414c7e3ccb34 adns1.openstack.org in ORD, af56cafc-6a3d-4ffb-b443-932ece962673 ns1.openstack.org in DFW, and c4f203c0-5315-45ac-ab11-dbce5bd33d67 ns2.openstack.org in ca-ymq-1. DNS now hosted by opendev nameservers
  • 2019-01-08 17:03:11 UTC changed vexxhost openstackci password
  • 2019-01-08 15:43:22 UTC restarted bind9 service on adns1.opendev.org in order to get it to properly recognize and re-sign the updated zuulci.org zone
  • 2019-01-07 23:51:03 UTC The Etherpad service at https://etherpad.openstack.org/ has been offline since 23:22 UTC due to a hypervisor issue in our service provider, but should hopefully return to service shortly.
  • 2019-01-07 22:41:56 UTC nl02 has been removed from the emergency maintenance list now that the filesystems on mirror02.regionone.limestone have been repaired and checked out
  • 2019-01-07 21:15:35 UTC generated and added dnssec keys for zuulci.org to /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o
  • 2019-01-07 19:41:32 UTC mirror02.regionone.limestone.openstack.org's filesystem on the additional cinder volume went read only for >1 week (total duration unknown) causing errors when apache was attempting to update its cache files.
  • 2019-01-07 19:32:29 UTC temporarily lowered max-servers to 0 in limestone-regionone in preparation for a mirror instance reboot to clear a cinder volume issue
  • 2019-01-04 22:26:20 UTC restarted gerrit to restart gitea replication thread
  • 2019-01-04 21:07:14 UTC updated zuul-ci.org domain registration to use ns[12].opendev.org as nameservers
  • 2019-01-03 23:57:28 UTC restarted zuul scheduler at 2fd688352f5e220fda0dfc72b164144910670d95
  • 2019-01-02 22:47:32 UTC Restarted nodepool launchers nl01-nl04 to pick up hypervisor host id logging and update openstacksdk. Now running nodepool==3.3.2.dev88 # git sha f8bf6af and openstacksdk==0.22.0
  • 2019-01-02 22:36:58 UTC restarted all of zuul at commit 4540b71
  • 2019-01-02 22:16:05 UTC restarted gerrit to clear stuck gitea replication task
  • 2018-12-21 22:58:29 UTC the gerrit service on review.openstack.org is being restarted to pick up new configuration changes, and will return momentarily
  • 2018-12-21 22:54:52 UTC the gerrit service on review.openstack.org is being restarted to pick up new configuration changes, and will return momentarily
  • 2018-12-21 15:00:59 UTC approved changes 626391, 1626392, 626633, 626393 which expand puppet node definitions and ansible hostgroup patterns to also match opendev.org hostnames
  • 2018-12-20 13:07:54 UTC filed to exclude storyboard.openstack.org from spamhaus pbl
  • 2018-12-19 22:19:08 UTC added openstackadmin account to the following additional github orgs: gtest-org, openstack-ci, openstack-dev, stackforge, openstack-infra, openstack-attic, stackforge-attic
  • 2018-12-19 20:57:05 UTC deleted old puppetmaster.openstack.org and review.openstack.org servers in rackspace dfw after creating final snapshots
  • 2018-12-19 20:52:10 UTC converted opendevorg user on dockerhub to an organization owned by openstackzuul
  • 2018-12-19 20:23:06 UTC deleted unattached eavesdrop.openstack.org/main01 (50gb ssd), mirror01.dfw.rax.openstack.org/main01 (100gb), mirror01.dfw.rax.openstack.org/main02 (100gb), nb04.openstack.org/main01 (1tb) and nodepool.openstack.org/main01 (1tb) volumes in rackspace dfw
  • 2018-12-19 20:13:12 UTC deleted 1tb "bandersnatch-temp" volume in rackspace dfw
  • 2018-12-19 20:03:34 UTC deleted 1tb "before-run" snapshot of "bandersnatch-temp" volume in rackspace dfw
  • 2018-12-14 20:39:03 UTC started the new opendev mailing list manager process with `sudo service mailman-opendev start` on lists.openstack.org
  • 2018-12-11 14:34:53 UTC added openstackid.org to the emergency disable list while smarcet tests out php7.2 on openstackid-dev.openstack.org
  • 2018-12-10 23:14:26 UTC Restarted Zuul scheduler to pick up changes to how projects are grouped into relative priority queues.
  • 2018-12-10 20:19:14 UTC manually invoked `apt upgrade` on ns1 and ns2.opendev.org in order to silence cronspam about unattended-upgrades not upgrading netplan.io due to introducing a new dependency on python3-netifaces
  • 2018-12-10 19:35:39 UTC provider indicates the host on which ze01 resides has gone offline
  • 2018-12-10 14:18:14 UTC upgraded openafs on ze12 with `sudo apt install openafs-modules-dkms=1.6.22.2-1ppa1` and rebooted onto the latest hwe kernel
  • 2018-12-10 14:18:06 UTC restarted statusbot to recover from connectivity issues from saturday
  • 2018-12-06 20:00:02 UTC added ze12 to zuul executor pool to reduce memory pressure
  • 2018-12-06 18:47:00 UTC unblocked stackalytics-bot-2 access to review.o.o since the performance problems observed leading up to addition of the rule on 2018-11-23 seem to be unrelated (it eventually fell back to connecting via ipv4 and no recurrence was reported)
  • 2018-12-06 13:17:49 UTC deleted stale /var/log/exim4/paniclog on ns2.opendev.org to silence nightly cron alert e-mails about it
  • 2018-12-06 00:40:27 UTC rebooted all zuul executors (ze01-ze11) due to suspected performance degradation from swapping. underlying cause is unclear, but may be due to a regression in zuul introduced since 3.3.0, or in dependencies (including ansible). objgraph installed on all executors to support future memory profiling.
  • 2018-12-05 20:09:31 UTC removed lxd and lxd-client packages from ns1 and ns2.opendev.org, autoremoved, upgraded and rebooted
  • 2018-12-05 18:44:53 UTC Nodepool launchers restarted and now running with commit ee8ca083a23d5684d62b6a9709f068c59d7383e0
  • 2018-12-04 19:26:26 UTC moved bridge.o.o /etc/ansible/hosts/openstack.yaml to a .old file for clarity, as it is not (and perhaps was never) used
  • 2018-12-04 06:52:45 UTC fixed emergency file to re-enable bridge.o.o puppet runs (which stopped in http://grafana.openstack.org/d/qzQ_v2oiz/bridge-runtime?orgId=1&from=1543888040274&to=1543889448699)
  • 2018-12-04 01:52:44 UTC used rmlist to delete the openstack, openstack-dev, openstack-operators and openstack-sigs mailing lists on lists.o.o while leaving their archives in place
  • 2018-12-04 00:19:56 UTC clarkb upgraded the Nodepool magnum k8s cluster by pulling images and rebasing/restarting services for k8s on the master and minion nodes. Magnum doesn't support these upgrades via the API yet. Note that due to disk space issues the master node had its journal cleaned up in order to pull the new images down
  • 2018-12-03 20:59:50 UTC CentOS 7.6 appears to have been released. Our mirrors seem to have synced this release. This is creating a variety of fallout in projects such as tripleo and octavia. Considering that 7.5 is now no longer supported we should address this by rolling forward and fixing problems.
  • 2018-12-03 20:58:18 UTC removed static.openstack.org from the emergency disable list now that ara configuration for logs.o.o site has merged
  • 2018-11-30 00:34:34 UTC manual reboot of mirror01.nrt1.arm64ci.openstack.org after a lot of i/o failures
  • 2018-11-29 22:16:30 UTC manually restarted elastic-recheck service on status.openstack.org to clear event backlog
  • 2018-11-29 14:42:47 UTC temporarily added nl03.o.o to the emergency disable list and manually applied https://review.openstack.org/620924 in advance of it merging
  • 2018-11-29 14:29:53 UTC rebooted mirror01.sjc1.vexxhost.openstack.org via api as it seems to have been unreachable since ~02:30z
  • 2018-11-28 02:37:56 UTC removed f27 directories after https://review.openstack.org/618416
  • 2018-11-28 01:04:05 UTC pypi volume removed from afs : afs01.dfw much happier -> /dev/mapper/main-vicepa 4.0T 2.1T 1.9T 53% /vicepa
  • 2018-11-23 15:56:02 UTC temporarily blocked stackalytics-bot-2 access to review.o.o to confirm whether the errors reported for it are related to current performance problems
  • 2018-11-22 18:00:28 UTC manually triggered gerrit's jvm garbage collection from the javamelody interface, freeing some 40gb of used memory within the jvm
  • 2018-11-22 15:58:49 UTC We have recovered from high cpu usage on review.openstack.org by killing several requests in melody that had been running for several hours and brought gerrit to a crawl with proxy errors. Requests looked like this: "/changes/?q=is:watched+is:merged&n=25&O=81 GET" but we haven't been able to identify where these requests came from.
  • 2018-11-21 15:05:58 UTC rolled back garbled summit feedback pad using: wget -qO- 'http://localhost:9001/api/1.2.11/restoreRevision?apikey='$(cat /opt/etherpad-lite/etherpad-lite/APIKEY.txt)'&padID=BER-Feedback-Session&rev=5564'
  • 2018-11-20 18:28:53 UTC ran `rmlist openstack-tc` on lists.o.o to retire the openstack-tc ml without removing its archives
  • 2018-11-19 22:41:50 UTC clarkb force merged https://review.openstack.org/618849 to fix a bug in zuul-jobs that was affecting many jobs. Long story short we have to accommodate ipv6 addresses in zuul-jobs.
  • 2018-11-19 00:23:49 UTC started phase 2 of openstack-discuss ml combining as described at http://lists.openstack.org/pipermail/openstack-dev/2018-September/134911.html
  • 2018-11-16 10:34:10 UTC restarted gerritbot as it seemed to have dropped out of at least this channel. didn't see anything particularly helpful in logs
  • 2018-11-15 14:50:42 UTC ran "systemctl --global mask --now gpg-agent.service gpg-agent.socket gpg-agent-ssh.socket gpg-agent-extra.socket gpg-agent-browser.socket" on bridge to disable gpg-agent socket activation
  • 2018-11-14 06:59:31 UTC after receiving several 500 errors from storyboard.o.o, i restarted the worker services and apache2 on the server. backtrace is in the apache error logs and matches what was seen in the client error box
  • 2018-11-14 05:57:02 UTC rebooting mirror01.iad.rax.openstack.org to see if it helps with persistent pypi.org connection resets -- see comment in https://storyboard.openstack.org/#!/story/2004334#comment-110394
  • 2018-11-14 01:19:24 UTC force-merged 617852 & 617845 at the request of triple-o to help with long gate backlog
  • 2018-11-13 03:49:00 UTC removed bridge.o.o /opt/system-config/playbooks/roles/exim/filter_plugins/__pycache__/filters.cpython-36.pyc file which was stopping exim role from running, see also https://github.com/ansible/ansible/pull/48587
  • 2018-11-09 03:40:24 UTC mirror.nrt1.arm64ci.openstack.org up and running!
  • 2018-11-07 20:39:39 UTC logs.o.o was put in the emergency file to test if bumping to 16 wsgi processes addresses timeout issues pending https://review.openstack.org/616297
  • 2018-11-07 05:42:28 UTC planet01.o.o in the emergency file, pending investigation with vexxhost
  • 2018-11-07 05:30:24 UTC planet.o.o shutdown and in error state, vexxhost.com currently not responding (planet.o.o is hosted in ca-ymq-2)
  • 2018-11-06 21:36:44 UTC infra-core added to ansible-role-cloud-launcher-core after getting rcarrillocruz's go ahead
  • 2018-11-05 16:28:03 UTC Added stephenfin and ssbarnea to git-review-core in Gerrit. Both have agreed to focus on bug fixes, stability, and improved testing. Or as corvus put it "to be really clear about that, i think any change which requires us to alter our contributor docs should have a nearly impossible hill to climb for acceptance".
  • 2018-11-02 18:55:58 UTC The firewall situation with ports 8080, 8081, and 8082 on mirror nodes has been resolved. You can recheck jobs that have failed to communicate to the mirrors on those ports now.
  • 2018-11-02 18:11:11 UTC OpenStack infra's mirror nodes stopped accepting connections on ports 8080, 8081, and 8082. We will notify when this is fixed and jobs can be rechecked if they failed to communicate with a mirror on these ports.
  • 2018-11-01 21:24:09 UTC openstacksdk 0.19.0 installed on nl01-04 and nb01-03 and all nodepool launchers and builders have been restarted
  • 2018-10-31 16:19:47 UTC manually installed linux-image-virtual-hwe-16.04 on etherpad01.openstack.org to test out theory about cache memory and system cpu utilization
  • 2018-10-30 23:16:27 UTC Old nodepool.openstack.org acfa8539-10a2-4bc4-aabc-e324aa855c70 deleted as we no longer use any services on this host. Filesystem snapshot saved and called nodepool.openstack.org-20181030.2
  • 2018-10-26 08:41:21 UTC restarted apache2 service on mirror.regionone.limestone.openstack.org
  • 2018-10-25 22:33:30 UTC Old nodepool images cleared out of cloud providers as part of the post ZK db transition cleanup.
  • 2018-10-25 19:02:08 UTC Old dib images cleared out of /opt/nodepool_dib on nb01, nb02, and nb03. Need to remove them from cloud providers next.
  • 2018-10-25 15:59:17 UTC Zuul and Nodepool running against the new three node zookeeper cluster at zk01 + zk02 + zk03 .openstack.org. Old server at nodepool.openstack.org will be deleted in the near future
  • 2018-10-25 15:32:59 UTC The Zuul and Nodepool database transition is complete. Changes updated during the Zuul outage may need to be rechecked.
  • 2018-10-25 14:41:30 UTC Zuul and Nodepool services are being restarted to migrate them to a new Zookeeper cluster. THis brings us an HA database running on newer servers.
  • 2018-10-24 22:46:31 UTC nb01 and nb02 patched to have https://review.openstack.org/#/c/613141/ installed so that image uploads to rax work. Both nodes are in ansible emergency file so this won't be undone automatically. Will need openstacksdk release before removing them from the emergency file
  • 2018-10-23 20:00:28 UTC nb04.openstack.org removed from emergency
  • 2018-10-23 16:18:23 UTC increased quota for project.starlingx volume from 100mb to 1gb
  • 2018-10-23 15:02:26 UTC doubled memory allocation for etherpad-mysql-5.6 trove instance from 2gb to 4gb (~2gb active use indicated)
  • 2018-10-23 14:55:58 UTC doubled size of disk for etherpad-mysql-5.6 trove instance from 20gb to 40gb (contains 17.7gb data)
  • 2018-10-23 09:43:08 UTC nl04 in emergency with ovh-gra1 set to 0 for now
  • 2018-10-19 20:30:11 UTC Old logstash.openstack.org server (08c356e5-d225-4163-9dce-c57b4d68eb55) running trusty has been deleted in favor of new logstash01.openstack.org server running xenial
  • 2018-10-19 17:18:29 UTC Old etherpad.openstack.org server (8e3ab3b5-b264-494a-abfc-026ad29744da) deleted as it has been replaced by a new etherpad01.openstack.org server running Xenial.
  • 2018-10-18 20:03:25 UTC Old Trusty etherpad-dev server (85140e9f-9759-4c8b-aca1-bd92ad1cb6b3) deleted now that new Xenial etherpad-dev01 server has been running for a few days without apparent issue
  • 2018-10-18 13:05:40 UTC manually deleted corrupt /afs/.openstack.org/mirror/wheel/ubuntu-16.04-x86_64/s/sqlalchemy-utils/SQLAlchemy_Utils-0.33.6-py2.py3-none-any.whl and released mirror.wheel.xenialx64 volume
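      The general pattern for this kind of repair is to remove the bad file from the read-write tree and then publish the fix to the read-only replicas:
        rm /afs/.openstack.org/mirror/wheel/ubuntu-16.04-x86_64/s/sqlalchemy-utils/SQLAlchemy_Utils-0.33.6-py2.py3-none-any.whl
        vos release mirror.wheel.xenialx64   # sync the RO replicas with the corrected RW volume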
  • 2018-10-17 18:43:53 UTC manually deleted corrupt /afs/.openstack.org/mirror/wheel/ubuntu-16.04-x86_64/s/sqlalchemy-utils/SQLAlchemy_Utils-0.33.6-py2.py3-none-any.whl and released mirror.wheel.xenialx64 volume
  • 2018-10-16 00:05:29 UTC nl04 in emergency while I fiddle with ovh-gra quotas to see what works
  • 2018-10-15 15:16:32 UTC force-merged https://review.openstack.org/610484 in order to work around gate issue for OpenStack Chef cookbook CI
  • 2018-10-11 20:17:31 UTC all nodepool-builders / nodepool-launchers restarted to pick up latest code base (32b8f58)
  • 2018-10-09 21:52:35 UTC OVH BHS1 manual port cleanup is running periodically (every 20 minutes) in a root screen on bridge.o.o until a better solution appears
  • 2018-10-08 23:19:04 UTC preallocated remaining space on mirror02.us-west-1.packethost.openstack.org rootfs by writing /dev/zero to a file and then removing it
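      Presumably done with something along these lines, which forces the backing store to allocate every remaining block (the filename is an assumption):
        dd if=/dev/zero of=/zerofill bs=1M || true   # runs until the filesystem is full
        rm -f /zerofill && sync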
  • 2018-10-08 23:11:17 UTC started mirror02.us-west-1.packethost.openstack.org via openstackclient after ~4 hours in SHUTOFF state
  • 2018-10-08 17:04:29 UTC started mirror02.us-west-1.packethost.openstack.org via openstackclient after ~3 hours in SHUTOFF state
  • 2018-10-05 21:51:50 UTC rebooted logstash.openstack.org to "fix" broken layer 2 connectivity to backend gateway
  • 2018-10-02 18:46:48 UTC manually deleted /afs/.openstack.org/docs/charm-deployment-guide/latest and performed a vos release of the docs volume
  • 2018-10-02 16:10:58 UTC deleted openstack/cinder driverfixes/ocata branch formerly at 486c00794b1401077bd0c9a6071135c149382958
  • 2018-10-02 06:24:46 UTC We merged change https://review.openstack.org/606129 to change precedence of pipelines and I'm curious to see this in practice.
  • 2018-09-28 12:55:32 UTC (dmsimard) enqueued https://review.openstack.org/606058 to gate and promoted it to increase nodepool capacity
  • 2018-09-25 07:32:13 UTC graphite.o.o removed, puppet has run and config file looks ok
  • 2018-09-25 06:43:13 UTC graphite.o.o in emergency until merge of https://review.openstack.org/604972
  • 2018-09-20 06:23:03 UTC disabled bhs1.ovh again since mirror is not reachable
  • 2018-09-17 15:56:02 UTC Removed openstack/cinder driverfixes/ocata branch with HEAD a37cc259f197e1a515cf82deb342739a125b65c6
  • 2018-09-17 15:12:24 UTC manually deleted /afs/.openstack.org/mirror/wheel/ubuntu-16.04-x86_64/s/sqlalchemy-utils/SQLAlchemy_Utils-0.33.4-py2.py3-none-any.whl and released the mirror.wheel.xenialx64 volume
  • 2018-09-12 06:42:50 UTC nb03.o.o has been transitioned into the new linaro london cloud. hopefully this will stay more reliably attached to zookeeper
  • 2018-09-11 16:16:15 UTC storyboard.o.o webclient revert has been pushed out and its emergency list entry has been removed
  • 2018-09-11 15:44:29 UTC temporarily added storyboard.o.o to emergency disable list while manually applying https://review.openstack.org/601618 until it merges
  • 2018-09-10 21:51:09 UTC INAP will be upgrading Keystone to Pike (from Mitaka) tomorrow at 10pm UTC. Maintenance window is ~1 hour. This will impact inap-mtl01.
  • 2018-09-10 06:49:02 UTC manually uploaded ceilometer 11.0.0, 10.0.1, 9.0.6 and 8.1.5 pypi releases per http://lists.openstack.org/pipermail/openstack-dev/2018-September/134496.html
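      Assuming the sdist tarballs for those releases were already present locally, the upload itself is a single twine call:
        twine upload ceilometer-11.0.0.tar.gz ceilometer-10.0.1.tar.gz \
                     ceilometer-9.0.6.tar.gz ceilometer-8.1.5.tar.gz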
  • 2018-09-06 23:01:52 UTC rebuilt mirror.mtl01.inap.openstack.org and removed unused volumes mirror.mtl01.internap.openstack.org/main02 and mirror.mtl01.internap.openstack.org/main01
  • 2018-08-31 09:59:00 UTC Jobs using devstack-gate (legacy devstack jobs) have been failing due to an ara update. We now use a newer ansible version; it's safe to recheck if you see "ImportError: No module named manager" in the logs.
  • 2018-08-31 00:34:50 UTC restarted etherpad-lite service on etherpad.openstack.org to start making use of the 1.7.0 release in the wake of https://review.openstack.org/597544
  • 2018-08-29 14:29:44 UTC reset /opt/etherpad-lite/etherpad-lite on etherpad-dev.o.o to the 1.7.0 release and restarted the etherpad-lite service for testing
  • 2018-08-29 03:06:48 UTC the openstack meetbot is now delaying channel joins until nickserv confirms identification; all channels should once again be logged
  • 2018-08-29 03:05:43 UTC updated chat.freenode.net entry in eavesdrop.o.o's /etc/hosts file from no-longer-active 82.96.96.11 to 38.229.70.22 (card.freenode.net) and restarted the openstack-meetbot and statusbot services
  • 2018-08-24 13:27:21 UTC updated chat.freenode.net entry in eavesdrop.o.o's /etc/hosts file from no-longer-active 195.154.200.232 to 82.96.96.11 (kornbluth.freenode.net) and restarted the openstack-meetbot and statusbot services
  • 2018-08-22 22:22:11 UTC manually deleted redundant branch "stable/queen" (previously at e1146cc01bce2c1bd6eecb08d92281297218f884) from openstack/networking-vsphere as requested in http://lists.openstack.org/pipermail/openstack-infra/2018-August/006064.html
  • 2018-08-21 16:34:05 UTC Stopped ze01.o.o, deleted executor-git directory on filesystem, started ze01.o.o again. Zuul has properly repopulated the directory with right file permissions
  • 2018-08-21 13:12:57 UTC started apache service on ask.openstack.org (died around log rotation again leaving no information as to why)
  • 2018-08-18 12:58:55 UTC removed stale pidfile and started apache on ask.o.o after it died silently during log rotation
  • 2018-08-17 14:30:17 UTC the hypervisor host for ze02 was restarted, server up since 22:53z, seems to be running jobs normally
  • 2018-08-16 23:11:36 UTC This means that config changes will need to be manually applied while we work to get the puppet cron running on bridge.o.o. New projects won't be created for example.
  • 2018-08-16 23:10:51 UTC Puppetmaster is no longer running puppet for us. bridge.openstack.org is now our cfg mgmt control. It is currently in a state of transition while we test things and puppet is not being automatically executed.
  • 2018-08-16 22:56:16 UTC restarted all zuul executors with linux 4.15.0-32-generic
  • 2018-08-16 14:30:31 UTC manually deleted the stable/rocky branch previously at 90dfca5dfc60e48544ff25f63c3fa59cb88fc521 from openstack/ovsdbapp at the request of amoralej and smcginnis
  • 2018-08-15 16:59:13 UTC 93b2b91f-7d01-442b-8dff-96a53088654a ethercalc01.openstack.org has been deleted in favor of new xenial ethercalc02 server
  • 2018-08-14 22:35:08 UTC Ethercalc service migrated to Xenial on new ethercalc02 instance. Backups updated to push to bup-ethercalc02 remote as well. We should delete ethercalc01.openstack.org in the near future then bup-ethercalc01 in the later future.
  • 2018-08-14 09:45:35 UTC nodepool dib images centos-7-0000009152 debian-stretch-0000000171 ubuntu-trusty-0000003720 removed, see https://review.openstack.org/591588
  • 2018-08-08 14:39:39 UTC manually deleted branch stable/rocky previously at commit a33b1499d7c00e646a9b49715a8a7dbd4467ec91 from openstack/python-tripleoclient as requested by mwhahaha, EmilienM and smcginnis
  • 2018-08-07 20:44:34 UTC Due to a bug, Zuul has been unable to report on cherry-picked changes over the last 24 hours. This has now been fixed; if you encounter a cherry-picked change missing its results (or was unable to merge), please recheck now.
  • 2018-08-07 19:54:40 UTC deleted a/aaaa rrs for the long-gone odsreg.openstack.org
  • 2018-08-07 17:21:48 UTC Updated openstacksdk on nodepool-launchers to 0.17.2 to fix provider thread crashes that result in idle providers
  • 2018-08-07 06:08:34 UTC http://zuul.openstack.org/api/config-errors shows *no* errors
  • 2018-08-06 23:27:13 UTC zuul now reports to gerrit over HTTPS rather than ssh; please keep an eye out for any issues
  • 2018-08-05 08:39:28 UTC the periodic translation jobs are not running - help needed to figure out the failure
  • 2018-08-03 17:30:40 UTC Project renames and review.openstack.org downtime are complete without any major issue.
  • 2018-08-03 16:04:35 UTC The infra team is renaming projects in Gerrit. There will be a short ~10 minute Gerrit downtime in a few minutes as a result.
  • 2018-08-01 23:24:14 UTC set mlock +n on all channels (prevents sending to the channel without joining)
  • 2018-08-01 15:49:44 UTC Due to ongoing spam, all OpenStack-related channels now require authentication with nickserv. If an unauthenticated user joins a channel, they will be forwarded to #openstack-unregistered with a message about the problem and folks to help with any questions (volunteers welcome!).
  • 2018-08-01 04:58:43 UTC +r (registered users only) has been temporarily set on #openstack-infra due to incoming spam. this will be re-evaluated in a few hours
  • 2018-07-28 00:57:32 UTC all zuul-executors now running kernel 4.15.0-29-generic #31~16.04.1-Ubuntu
  • 2018-07-27 15:41:58 UTC A zuul config error slipped through and caused a pile of job failures with retry_limit - a fix is being applied and should be back up in a few minutes
  • 2018-07-26 22:46:09 UTC mirror.us-west-1.packethost.openstack.org cname updated to mirror02.us-west-1.packethost.openstack.org
  • 2018-07-25 20:28:05 UTC enqueued 585839 into gate to help fix tripleo queue
  • 2018-07-25 04:07:05 UTC upgraded openstacksdk to 0.17.0 on puppetmaster to resolve vexxhost issues (see 0.17.0 release notes currently @ https://docs.openstack.org/releasenotes/openstacksdk/unreleased.html#bug-fixes)
  • 2018-07-24 14:13:37 UTC mirror.us-west-1.packethost.openstack.org reboot via openstack API due to not responding to SSH / HTTP requests. Server now back online.
  • 2018-07-24 01:49:56 UTC ze11.openstack.org is online and running jobs.
  • 2018-07-23 20:06:26 UTC set forward_auto_discards=0 on openstack-qa@lists.openstack.org ml to combat spam backscatter
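  One hedged way to make that change with Mailman 2's config_list tool (paths assumed; the actual method used was not recorded):
    echo 'forward_auto_discards = 0' > /tmp/openstack-qa-settings.py
    /usr/lib/mailman/bin/config_list -i /tmp/openstack-qa-settings.py openstack-qa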
  • 2018-07-23 18:49:19 UTC All nodepool builders restarted with latest code, including the switch from shade to openstacksdk
  • 2018-07-23 18:39:19 UTC All nodepool launchers restarted with latest code, including the switch from shade to openstacksdk
  • 2018-07-19 13:43:11 UTC logs.openstack.org is back on-line. Changes with "POST_FAILURE" job results should be rechecked.
  • 2018-07-19 12:59:58 UTC logs.openstack.org is offline, causing POST_FAILURE results from Zuul. Cause and resolution timeframe currently unknown.
  • 2018-07-19 05:37:06 UTC grafana.o.o switched to new grafana02.o.o
  • 2018-07-17 15:11:10 UTC switched primary address for openstackci pypi account from review@o.o to infra-root@o.o so that it doesn't get mixed in with gerrit backscatter (we can switch to a dedicated alias later if needed)
  • 2018-07-17 15:05:03 UTC changed validated e-mail address for openstackci account on pypi per https://mail.python.org/mm3/archives/list/distutils-sig@python.org/thread/5ER2YET54CSX4FV2VP24JA57REDDW5OI/
  • 2018-07-13 23:38:10 UTC logs.openstack.org is back on-line. Changes with "POST_FAILURE" job results should be rechecked.
  • 2018-07-13 21:53:48 UTC logs.openstack.org is offline, causing POST_FAILURE results from Zuul. Cause and resolution timeframe currently unknown.
  • 2018-07-08 15:11:26 UTC touched GerritSiteHeader.html on review.openstack.org to get hideci.js working again after https://review.openstack.org/559634 was puppeted
  • 2018-07-06 14:14:58 UTC manually restarted mosquitto, lpmqtt and germqtt services on firehose01.openstack.org (mosquitto died again during log rotation due to its signal handling bug, and the other two services subsequently died from connection failures because the broker was down)
  • 2018-07-06 03:05:36 UTC old reviewday and bugday processes on status.o.o manually killed, normal runs should resume
  • 2018-07-05 16:26:54 UTC restarted rabbitmq-server service on storyboard.openstack.org to clear a "lock wait timeout exceeded" internal error condition blocking task status updates
  • 2018-06-29 03:20:12 UTC migrated /opt on nb03.o.o to a new cinder volume due to increasing space requirements from new builds
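  A rough sketch of that kind of /opt migration with the OpenStack CLI (volume size, device name and staging mount point are assumptions, not the recorded steps):
    openstack volume create --size 512 nb03-opt
    openstack server add volume nb03.openstack.org nb03-opt
    # then on nb03: format the new device, copy the data across, and swap the mounts
    mkfs.ext4 /dev/xvdb
    mount /dev/xvdb /mnt && rsync -a /opt/ /mnt/
    umount /mnt && mount /dev/xvdb /opt   # plus a matching /etc/fstab entry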
  • 2018-06-28 12:09:53 UTC lists.openstack.org has been removed from the emergency disable list now that https://review.openstack.org/576539 has merged
  • 2018-06-26 06:49:33 UTC project-config's zuul config is broken, it contains a removed job. https://review.openstack.org/577999 should fix it.
  • 2018-06-25 22:48:19 UTC puppetdb.openstack.org A and AAAA dns records removed
  • 2018-06-25 20:45:26 UTC Jenkins and Infracloud data removed from hieradata
  • 2018-06-25 16:43:25 UTC Nodepool launchers restarted with latest code
  • 2018-06-25 03:41:53 UTC nl03.o.o in emergency and vexxhost max-servers set to 0 temporarily
  • 2018-06-25 03:35:03 UTC storyboard.o.o works again & apache restarted after persistent errors that appeared to follow a rabbitmq disconnection. See log around Sun Jun 24 08:33:43.058712 for original error (http://paste.openstack.org/show/724194/)
  • 2018-06-22 21:25:18 UTC SSL cert rotation for June 2018 completed.
  • 2018-06-22 21:00:24 UTC Removed docs-draft CNAME record to static.o.o as the doc drafts are no longer hosted separately
  • 2018-06-19 14:40:20 UTC lists.openstack.org added to emergency disable list until https://review.openstack.org/576539 merges
  • 2018-06-18 23:23:31 UTC openstackid.org has been removed from the emergency disable list now that https://review.openstack.org/576248 has merged, and after confirming with smarcet that he will keep an eye on it
  • 2018-06-13 16:48:21 UTC switched lists.airshipit.org and lists.starlingx.io dns records from cname to a/aaaa for proper mail routing
  • 2018-06-13 01:57:09 UTC yum-cron is now active on all git* hosts. some may have not had package updates for a while, so look at that first in case of issues
  • 2018-06-12 00:44:28 UTC storyboard.openstack.org has been removed from the emergency disable list now that https://review.openstack.org/574468 has merged
  • 2018-06-11 19:58:40 UTC Zuul was restarted for a software upgrade; changes uploaded or approved between 19:30 and 19:50 will need to be rechecked
  • 2018-06-09 14:48:20 UTC manually started the unbound daemon on mirror.gra1.ovh.openstack.org due to https://launchpad.net/bugs/1775833
  • 2018-06-09 13:20:39 UTC temporarily added storyboard.openstack.org to the emergency disable list and manually reverted to openstack-infra/storyboard commit f38f3bc while working to bisect a database locking problem
  • 2018-06-08 21:21:34 UTC Manually applied https://review.openstack.org/#/c/573738/ to nl03 as nl* are disabled in puppet until we sort out the migration to no zk schema
  • 2018-06-08 19:25:19 UTC Nodepool issue from earlier today seems to have been caused by nl03 launcher restart. Mixed, incompatible versions of code caused us to create min-ready nodes continually until we reached full capacity. A full shutdown and restart of nodepool launchers is necessary to prevent this going forward.
  • 2018-06-08 17:25:59 UTC The Zuul scheduler was offline briefly to clean up from debugging a nodepool issue, so changes uploaded or approved between 16:50 and 17:15 UTC may need to be rechecked or reapproved (all already queued changes are in the process of being reenqueued now)
  • 2018-06-08 13:48:39 UTC unbound was manually restarted on many zuul executors following the 1.5.8-1ubuntu1.1 security update, due to https://launchpad.net/bugs/1775833
  • 2018-06-08 13:48:35 UTC A misapplied distro security package update caused many jobs to fail with a MERGER_FAILURE error between ~06:30-12:30 UTC; these can be safely rechecked now that the problem has been addressed
  • 2018-06-08 06:03:09 UTC Zuul stopped receiving gerrit events around 04:00UTC; any changes submitted between then and now will probably require a "recheck" comment to be requeued. Thanks!
  • 2018-06-07 22:21:22 UTC Added vexxhost back to nl03 but our mirror node is unhappy there so have temporarily disabled the cloud again until the mirror node is up and running
  • 2018-06-07 16:11:59 UTC The zuul upgrade to ansible 2.5 is complete and zuul is running again. Changes uploaded or approved between 15:25 and 15:45 will need to be rechecked. Please report any problems in #openstack-infra
  • 2018-06-07 15:32:32 UTC Zuul update for Ansible 2.5 in progress. Scheduler crashed as unexpected side effect of pip upgrade. Will be back and running shortly.
  • 2018-06-06 15:52:11 UTC deleted /afs/.openstack.org/docs/project-install-guide/baremetal/draft at the request of pkovar
  • 2018-06-06 13:43:41 UTC added lists.starlingx.io cname to lists.openstack.org and started the mailman-starlingx service on the server now that https://review.openstack.org/569545 has been applied
  • 2018-06-05 21:36:58 UTC Zuul job-output.txt files were incomplete if at any point the job stopped producing logs for more than 5 seconds. This happened due to a timeout in the log streaming daemon. This has been fixed and the zuul executors have been restarted. Jobs running after now should have complete logs.
  • 2018-06-04 00:37:01 UTC survey01.openstack.org is no longer in the emergency disable list now that https://review.openstack.org/571976 has merged
  • 2018-06-03 13:39:52 UTC survey01.openstack.org has been placed into the emergency disable list until https://review.openstack.org/571976 merges so that setup can resume
  • 2018-05-30 22:17:39 UTC storyboard is deploying latest webclient again after fixing the deployment process around the webclient.
  • 2018-05-30 20:54:01 UTC storyboard.openstack.org has been removed from the emergency disable list now that storyboard-webclient tarball deployment is fixed
  • 2018-05-30 17:42:52 UTC git08 added back to git.o.o haproxy and all git backends updated to make git.openstack.org vhost their default vhost. This means that https clients that don't speak SNI will get the cert for git.o.o (and talk to git.o.o vhost) by default.
  • 2018-05-29 23:45:05 UTC Bypassed zuul on https://review.openstack.org/570811 due to needing a circular fix for the cmd2 environment markers solution
  • 2018-05-29 23:45:01 UTC Restarted statusbot as it seemed to have gotten lost in the midst of Saturday's netsplits
  • 2018-05-24 22:38:46 UTC removed AAAA records for afsdb01 & afsdb02 per https://review.openstack.org/559851
  • 2018-05-24 11:38:16 UTC afs mirror.pypi quota exceeded; increased to 1.9T (2000000000)
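  For reference, AFS volume quotas are adjusted with the fs tool against the read-write path (path assumed here), e.g.:
    fs setquota -path /afs/.openstack.org/mirror/pypi -max 2000000000
    fs listquota /afs/.openstack.org/mirror/pypi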
  • 2018-05-24 05:17:33 UTC mirror-update.o.o placed into emergency file, and cron is stopped on the host, pending recovery of several afs volumes
  • 2018-05-11 18:20:14 UTC restarted the etherpad-lite service on etherpad.openstack.org for the upgrade to 1.6.6
  • 2018-05-11 13:17:11 UTC Due to a Zuul outage, patches uploaded to Gerrit between 09:00UTC and 12:50UTC, were not properly added to Zuul. Please recheck any patches during this window and apologies for the inconvenience.
  • 2018-05-10 14:42:27 UTC restarted the etherpad-lite service on etherpad-dev.openstack.org to test release 1.6.6
  • 2018-05-07 22:58:47 UTC Any devstack job failure due to rsync errors related to tripleo-incubator can safely be rechecked now
  • 2018-05-02 22:13:28 UTC Gerrit maintenance has concluded successfully
  • 2018-05-02 20:08:03 UTC The Gerrit service at review.openstack.org will be offline over the next 1-2 hours for a server move and operating system upgrade: http://lists.openstack.org/pipermail/openstack-dev/2018-May/130118.html
  • 2018-05-02 19:37:57 UTC The Gerrit service at review.openstack.org will be offline starting at 20:00 (in roughly 25 minutes) for a server move and operating system upgrade: http://lists.openstack.org/pipermail/openstack-dev/2018-May/130118.html
  • 2018-04-28 13:58:59 UTC Trove instance for storyboard.openstack.org was rebooted 2018-04-28 07:29z due to a provider incident (DBHDJ5WPgalxvZo) but is back in working order
  • 2018-04-27 12:15:52 UTC fedora-26 removed from mirror.fedora (AFS mirror) and rsync configuration on mirror-update.o.o
  • 2018-04-27 11:19:06 UTC jessie removed from mirror.debian (AFS mirror) and reprepro configuration on mirror-update.o.o
  • 2018-04-26 17:10:25 UTC nb0[12], nl0[1-4] restarted nodepool services to pick up recent changes to nodepool. All running e9b82226a5641042e1aad1329efa6e3b376e7f3a of nodepool now.
  • 2018-04-26 15:07:31 UTC We've successfully resolved the issue that prevented paste.openstack.org from loading and it's now back online, thank you for your patience.
  • 2018-04-26 14:29:42 UTC ze09 was rebooted 2018-04-26 08:15z due to a provider incident (CSHD-1AG24176PQz) but is back in working order
  • 2018-04-26 02:44:02 UTC restarted lodgeit on paste.o.o because it appeared hung
  • 2018-04-25 22:10:02 UTC ansible-role-puppet updated with new support for Puppet 4 (backward compatible with puppet 3)
  • 2018-04-25 22:09:40 UTC ic.openstack.org domain deleted from dns management as part of infracloud cleanup
  • 2018-04-25 13:32:43 UTC ze09 was rebooted 2018-04-25 01:39z due to a provider incident (CSHD-vwoxBJl5x7L) but is back in working order
  • 2018-04-25 13:31:49 UTC logstash-worker20 was rebooted 2018-04-22 20:50z due to a provider incident (CSHD-AjJP61XQ2n5) but is back in working order
  • 2018-04-19 18:19:51 UTC all DIB images (minus gentoo) have been unpaused for nodepool-builder. Latest release of diskimage-builder fixed our issues related to pip10 and glean failing to boot.
  • 2018-04-19 07:43:59 UTC 7000+ leaked images (~200TB of image files and objects) cleaned up from our 3 RAX regions. See https://review.openstack.org/#/c/562510/ for more details
  • 2018-04-18 21:21:31 UTC Pypi mirroring with bandersnatch is now running with Bandersnatch 2.2.0 under python3. This allows us to blacklist packages if necessary (which we are doing to exclude very large packages with very frequent updates to reduce disk needs)
  • 2018-04-17 14:15:36 UTC deactivated duplicate gerrit accounts 26191 and 27230, and reassigned their openids to older account 8866
  • 2018-04-17 13:40:38 UTC running `mosquitto -v -c /etc/mosquitto/mosquitto.conf` under a root screen session for crash debugging purposes
  • 2018-04-17 09:52:58 UTC nb03.o.o placed into emergency file and manually applied pause of builds, while project-config gating is broken
  • 2018-04-17 00:04:20 UTC PyPi mirror updating is on pause while we sort out updating bandersnatch in order to blacklist large packages that keep filling our mirror disk volumes.
  • 2018-04-16 18:09:22 UTC restarted the mosquitto service on firehose01.openstack.org to pick up a recent configuration change
  • 2018-04-16 16:46:40 UTC increased AFS pypi mirror volume quota to 1800000000 kbytes (that's just under 1.8TB) as previous value of 1700000000 was nearing capacity
  • 2018-04-14 16:52:53 UTC The Gerrit service at https://review.openstack.org/ will be offline for a minute while it is restarted to pick up a configuration change allowing it to start commenting on stories in StoryBoard, and will return to service momentarily
  • 2018-04-13 21:02:29 UTC holding lock on mirror.debian for reprepro while I repair debian-security database in reprepro
  • 2018-04-13 20:46:30 UTC openstack/os-client-config bugs have been imported to storyboard.o.o from the os-client-config lp project
  • 2018-04-13 20:40:46 UTC openstack/openstacksdk bugs have been imported to storyboard.o.o from the python-openstacksdk lp project
  • 2018-04-13 20:35:40 UTC openstack/python-openstackclient bugs have been imported to storyboard.o.o from the python-openstackclient lp project
  • 2018-04-13 20:35:07 UTC openstack/tripleo-validations bugs have been imported to storyboard.o.o from the tripleo lp project filtering on the validations bugtag
  • 2018-04-13 20:31:13 UTC review-dev01.openstack.org has been removed from the emergency disable list now that storyboard integration testing is finished
  • 2018-04-12 23:41:30 UTC The Etherpad service at https://etherpad.openstack.org/ is being restarted to pick up the latest release version; browsers should see only a brief ~1min blip before reconnecting automatically to active pads
  • 2018-04-12 22:42:29 UTC nodejs has been manually nudged on etherpad.o.o to upgrade to 6.x packages now that https://review.openstack.org/561031 is in place
  • 2018-04-12 20:19:53 UTC manually corrected the eplite homedir path in /etc/passwd on etherpad.o.o and created it on the filesystem with appropriate ownership following https://review.openstack.org/528625
  • 2018-04-12 19:34:00 UTC restarted etherpad-lite service on etherpad-dev.openstack.org (NOT review-dev!) to pick up commits related to latest 1.6.5 tag
  • 2018-04-12 19:33:38 UTC restarted etherpad-lite service on review-dev.openstack.org to pick up commits related to latest 1.6.5 tag
  • 2018-04-11 22:31:53 UTC zuul was restarted to update to the latest code; you may need to recheck changes uploaded or approvals added between 21:30 and 21:45
  • 2018-04-11 20:12:46 UTC added review-dev01.openstack.org to emergency disable list in preparation for manually experimenting with some configuration changes in an attempt to further diagnose the its-storyboard plugin
  • 2018-04-09 22:00:10 UTC removed AAAA RRs for afs01.dfw.o.o, afs02.dfw.o.o and afs01.ord.o.o per https://review.openstack.org/559851
  • 2018-04-09 16:53:40 UTC zuul was restarted to update to the latest code; please recheck any changes uploaded within the past 10 minutes
  • 2018-04-09 11:06:06 UTC ask-staging.o.o is pointing to a new xenial-based server. old server is at ask-staging-old.o.o for now
  • 2018-04-09 10:38:32 UTC afs02.dfw.o.o ran out of space in /vicepa. added +1tb volume (bringing in-line with afs01). two volumes appear to have become corrupt due to out-of-disk errors; ubuntu & debian. recovery involves a full release. ubuntu is done, debian is currently going; i have the cron lock
  • 2018-04-09 09:43:43 UTC the elasticsearch.service on es03.o.o was down since 02:00z, restarted now
  • 2018-04-04 20:27:56 UTC released bindep 2.7.0
  • 2018-04-04 08:32:36 UTC git08 is showing broken repos e.g. openstack/ara-clients is empty. placed git.openstack.org into emergency file and removed git08 from the list of backends for haproxy as temporary fix
  • 2018-04-02 16:06:04 UTC Cleaned up old unused dns records per http://paste.openstack.org/show/718183/ we no longer use the pypi hostname for our pypi mirrors and some of the clouds don't exist anymore.
  • 2018-03-29 19:45:13 UTC Clarkb has added zxiiro to the python-jenkins-core and release groups. ssbarnea and waynr added to the python-jenkins-core group after mailing list thread did not lead to any objections. I will let them coordinate and decide if other individuals are appropriate to add to python-jenkins-core
  • 2018-03-29 00:00:15 UTC Zuul has been restarted to update to the latest code; existing changes have been re-enqueued, you may need to recheck changes uploaded in the past 10 minutes
  • 2018-03-28 22:55:43 UTC removed tonyb from the Project Bootstrappers group on review.o.o
  • 2018-03-28 22:34:49 UTC added tonyb to the Project Bootstrappers group on review.o.o
  • 2018-03-28 21:53:56 UTC the zuul web dashboard will experience a short downtime as we roll out some changes - no job execution should be affected
  • 2018-03-27 16:21:25 UTC git08 removed from emergency file and added back to git loadbalancer
  • 2018-03-27 16:05:41 UTC set wgObjectCacheSessionExpiry to 86400 in /srv/mediawiki/Settings.php on the wiki to see if it effectively increases session duration to one day
  • 2018-03-26 21:50:23 UTC added git.zuul-ci.org cert to hieradata
  • 2018-03-26 21:35:32 UTC added git08.openstack.org to puppet emergency file for testing
  • 2018-03-26 21:06:29 UTC removed git08.openstack.org from git lb for manual testing
  • 2018-03-24 01:47:06 UTC CORRECTION: stray login.launchpad.net openids were rewritten to login.ubuntu.com
  • 2018-03-24 01:46:05 UTC Duplicate accounts on storyboard.openstack.org have been merged/cleaned up, and any stray login.launchpad.com openids rewritten to login.launchpad.net
  • 2018-03-23 15:51:08 UTC Gerrit will be temporarily unreachable as we restart it to complete the rename of some projects.
  • 2018-03-23 14:53:31 UTC zuul.o.o has been restarted to pick up latest code base and clear memory usage. Both check / gate queues were saved, be sure to check your patches and recheck when needed.
  • 2018-03-23 00:06:12 UTC graphite restarted on graphite.o.o to pick up logging changes from https://review.openstack.org/#/c/541488/
  • 2018-03-22 21:47:15 UTC killed a 21 day old puppet apply on nl03.o.o, which was using 100% CPU. strace showed a constant stream of "sched_yield" calls and nothing else, which seems to have been categorized as a ruby issue in https://tickets.puppetlabs.com/browse/PA-1743
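  A sketch of the diagnosis and cleanup described above (<pid> is a placeholder):
    ps -eo pid,etime,pcpu,args | grep '[p]uppet apply'
    strace -p <pid>   # in this case showed only a stream of sched_yield() calls
    kill <pid>        # escalate to kill -9 if SIGTERM is ignored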
  • 2018-03-22 21:23:16 UTC zuul executors have been restarted to pick up latest security fix for localhost execution
  • 2018-03-21 22:49:01 UTC review01-dev.o.o now online (ubuntu-xenial) and review-dev.o.o DNS redirected
  • 2018-03-21 04:44:06 UTC all today's builds deleted, and all image builds on hold until dib 2.12.1 release. dib fix is https://review.openstack.org/554771 ; however requires a tripleo fix in https://review.openstack.org/554705 to first unblock dib gate
  • 2018-03-20 15:04:47 UTC nl03.o.o removed from emergency file on puppetmaster.o.o
  • 2018-03-20 00:00:46 UTC all afs fileservers running with new settings per https://review.openstack.org/#/c/540198/6/doc/source/afs.rst ; monitoring but no current issues
  • 2018-03-19 20:51:14 UTC nl03.o.o added to emergency file and max-servers set to 0 for vexxhost until https://review.openstack.org/554354/ lands and new raw images are built / uploaded
  • 2018-03-19 14:56:07 UTC manually set ownership of /srv/static/tarballs/kolla/images/README.txt from root:root to jenkins:jenkins so that release jobs no longer fail uploading
  • 2018-03-16 18:42:58 UTC Restarted zuul-executors to pick up fix in https://review.openstack.org/553854
  • 2018-03-16 04:26:59 UTC mirror-update.o.o upgraded to bionic AFS packages (1.8.0~pre5-1ppa2). ubuntu-ports, ubuntu & debian mirrors recovered
  • 2018-03-15 19:28:17 UTC The regression stemming from one of yesterday's Zuul security fixes has been rectified, and Devstack/Tempest jobs which encountered POST_FAILURE results over the past 24 hours can safely be rechecked now
  • 2018-03-15 14:12:51 UTC POST_FAILURE results on Tempest-based jobs since the most recent Zuul security fixes are being investigated; rechecking those won't help for now but we'll keep you posted once a solution is identified
  • 2018-03-15 02:54:37 UTC mirror-update.o.o upgraded to xenial host mirror-update01.o.o. mirror-update turned into a cname for 01. old server remains at mirror-update-old.o.o but turned off, so as not to run conflicting jobs; will clean up later
  • 2018-03-14 13:22:43 UTC added frickler to https://launchpad.net/~openstack-ci-core and set as administrator to enable ppa management
  • 2018-03-13 08:06:47 UTC Removed typo'd branch stable/queen from openstack/networking-infoblox at revision f6779d525d9bc622b03eac9c72ab5d425fe1283f
  • 2018-03-12 18:43:57 UTC Zuul has been restarted without the breaking change; please recheck any changes which failed tests with the error "Accessing files from outside the working dir ... is prohibited."
  • 2018-03-12 18:21:13 UTC Most jobs in zuul are currently failing due to a recent change to zuul; we are evaluating the issue and will follow up with a recommendation shortly. For the moment, please do not recheck.
  • 2018-03-07 11:22:44 UTC force-merged https://review.openstack.org/550425 to unblock CI
  • 2018-03-06 21:19:56 UTC The infrastructure team is aware of replication issues between review.openstack.org and github.com repositories. We're planning a maintenance to try and address the issue. We recommend using our official supported mirrors instead located at https://git.openstack.org.
  • 2018-03-06 08:24:37 UTC i have applied a manual revert of 4a781a7f8699f5b483f79b1bdface0ba2ba92428 on zuul01.openstack.org and placed it in the emergency file
  • 2018-03-05 03:25:10 UTC gerrit restarted to get github replication going; see http://lists.openstack.org/pipermail/openstack-infra/2018-March/005842.html for some details
  • 2018-03-01 00:56:56 UTC translate.o.o upgraded to zanata 4.3.3. see notes in https://etherpad.openstack.org/p/zanata_upgrade_4xx
  • 2018-02-28 23:31:10 UTC removed old ci-backup-rs-ord.openstack.org dns entry (replaced by backup01.ord.rax.ci.openstack.org) and entry from emergency file. host was deleted some time ago
  • 2018-02-27 10:24:35 UTC gerrit is being restarted due to extreme slowness
  • 2018-02-24 01:07:04 UTC Had to start zuul-scheduler using the init script directly after running `export _SYSTEMCTL_SKIP_REDIRECT=1` to avoid the systemd sysv redirection. One theory is that systemd was unhappy after operating in the low memory environment. We need to get this working again as well as fix bugs with pid file handling in the init script. Might consider a server reboot.
  • 2018-02-24 01:05:53 UTC The zuul-scheduler init script on zuul01.o.o appeared to stop working when attempting to start zuul after stopping it for running out of memory. Systemctl would report the process started successfully then exited 0. There were no zuul logs and no additional info in journalctl to further debug. Had to start zuul-scheduler using the init script directly after running `export _SYSTEMCTL_SKIP_REDIRECT=1`
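  The workaround from the two entries above, as a sketch (init script path assumed):
    export _SYSTEMCTL_SKIP_REDIRECT=1
    /etc/init.d/zuul-scheduler start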
  • 2018-02-24 01:04:33 UTC Zuul was restarted to workaround a memory issue. If your jobs are not running they may need to be rechecked
  • 2018-02-22 16:30:26 UTC deleted http://paste.openstack.org/raw/665906/ from lodgeit openstack.pastes table (paste_id=665906) due to provider aup violation/takedown notice
  • 2018-02-22 02:30:16 UTC mirror.ubuntu reprepro has been repaired and back online
  • 2018-02-21 21:02:06 UTC bypassed ci testing for https://review.openstack.org/546758 to resolve a deadlock with imported zuul configuration in a new project
  • 2018-02-21 20:33:27 UTC bypassed ci testing for https://review.openstack.org/546746 to resolve a deadlock with imported zuul configuration in a new project
  • 2018-02-21 16:36:27 UTC manually removed /srv/static/tarballs/training-labs/dist/build, /srv/static/tarballs/training-labs/images, /srv/static/tarballs/dist and /srv/static/tarballs/build from static.openstack.org
  • 2018-02-21 14:18:04 UTC deleted stale afs lockfile for the ubuntu mirror and restarted a fresh reprepro-mirror-update of it on mirror-update.o.o
  • 2018-02-20 04:08:34 UTC mirror.ubuntu released in AFS and ubuntu bionic repos now online
  • 2018-02-20 02:08:01 UTC (dmsimard) The temporary server and volume clones from static.o.o have been deleted, their names were prefixed by "dmsimard-static".
  • 2018-02-19 15:17:03 UTC Zuul has been restarted to pick up latest memory fixes. Queues were saved however patches uploaded after 14:40UTC may have been missed. Please recheck if needed.
  • 2018-02-18 15:56:50 UTC Zuul has been restarted and queues were saved. However, patches uploaded after 14:40UTC may have been missed. Please recheck your patchsets where needed.
  • 2018-02-16 20:30:06 UTC replacement ze02.o.o server is online and processing jobs
  • 2018-02-14 15:06:26 UTC Due to a race in stable/queens branch creation and some job removals, Zuul has reported syntax errors for the past hour; if you saw a syntax error reported for "Job tripleo-ci-centos-7-ovb-containers-oooq not defined" you can safely recheck now
  • 2018-02-14 14:58:14 UTC bypassed zuul and directly submitted https://review.openstack.org/#/c/544359
  • 2018-02-14 14:54:11 UTC bypassed zuul and directly submitted https://review.openstack.org/#/c/544358
  • 2018-02-14 05:10:17 UTC ze02.o.o was hard rebooted but didn't fix ipv6 issues. i detached and re-attached the network from the existing vm, and that seemed to help. DNS has been updated
  • 2018-02-13 21:06:27 UTC restarted nl01 to nl04 to pick up latest fixes for nodepool
  • 2018-02-13 20:05:42 UTC planet01.o.o and ns2.o.o rebooted at provider's request
  • 2018-02-13 14:06:16 UTC temporarily added lists.openstack.org to the emergency maintenance list with https://review.openstack.org/543941 manually applied until it merges
  • 2018-02-12 19:40:32 UTC Ubuntu has published Trusty and Xenial updates for CVE-2018-6789; I have manually updated lists.openstack.org and lists.katacontainers.io with the new packages rather than waiting for unattended-upgrades to find them
  • 2018-02-12 04:41:10 UTC per previous message, seems host was rebooted. nl02.o.o looks ok; manually restarted zuul-merger on zm06. no more issues expected
  • 2018-02-12 03:25:47 UTC received rax notification that host of nl02.o.o and zm06.o.o has some sort of issue; currently can't log into either. updates from rax pending
  • 2018-02-07 21:07:22 UTC cleared stale reprepro update lockfile for debian, manually ran mirror update
  • 2018-02-07 19:51:02 UTC ipv6 addresses have been readded to all zuul executors
  • 2018-02-07 01:55:18 UTC (dmsimard) OVH BHS1 and GRA1 both recovered on their own and are back at full capacity.
  • 2018-02-06 23:02:12 UTC all nodepool launchers restarted to pick up https://review.openstack.org/541375
  • 2018-02-06 22:45:48 UTC provider ticket 180206-iad-0005440 has been opened to track ipv6 connectivity issues between some hosts in dfw; ze09.openstack.org has its zuul-executor process disabled so it can serve as an example while they investigate
  • 2018-02-06 22:43:44 UTC ze02.o.o rebooted with xenial 4.13 hwe kernel ... will monitor performance
  • 2018-02-06 17:53:57 UTC (dmsimard) High nodepool failure rates (500 errors) against OVH BHS1 and GRA1: http://paste.openstack.org/raw/663704/
  • 2018-02-06 17:53:11 UTC (dmsimard) zuul-scheduler issues with zookeeper ( kazoo.exceptions.NoNodeError / Exception: Node is not locked / kazoo.client: Connection dropped: socket connection broken ): https://etherpad.openstack.org/p/HRUjBTyabM
  • 2018-02-06 17:51:41 UTC (dmsimard) Different Zuul issues relative to ipv4/ipv6 connectivity, some executors have had their ipv6 removed: https://etherpad.openstack.org/p/HRUjBTyabM
  • 2018-02-06 17:49:03 UTC (dmsimard) CityCloud asked us to disable nodepool usage with them until July: https://review.openstack.org/#/c/541307/
  • 2018-02-06 10:31:35 UTC Our Zuul infrastructure is currently experiencing some problems and processing jobs very slowly, we're investigating. Please do not approve or recheck changes for now.
  • 2018-02-06 09:32:11 UTC graphite.o.o disk full. moved /var/log/graphite/carbon-cache-a/*201[67]* to the cinder-volume-based /var/lib/graphite/storage/carbon-cache-a.backup.2018-02-06 and rebooted the server
  • 2018-02-05 23:02:43 UTC removed lists.openstack.org from the emergency maintenance file
  • 2018-02-05 21:03:27 UTC removed static.openstack.org from emergency maintenance list
  • 2018-02-05 14:56:15 UTC temporarily added lists.openstack.org to the emergency maintenance list pending merger of https://review.openstack.org/540876
  • 2018-02-03 14:05:46 UTC gerrit ssh api on review.openstack.org is once again limited to 100 concurrent connections per source ip address per https://review.openstack.org/529712
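  One common way to impose a per-source-IP connection cap like this is an iptables connlimit rule; a hedged sketch only, since whether 529712 used this mechanism is not recorded here:
    iptables -A INPUT -p tcp --syn --dport 29418 -m connlimit --connlimit-above 100 -j REJECT --reject-with tcp-reset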
  • 2018-02-01 22:05:05 UTC deleted zuulv3.openstack.org and corresponding dns records now that zuul01.openstack.org has been in production for two weeks
  • 2018-02-01 17:57:16 UTC files02.openstack.org removed from emergency file after zuul-ci.org intermediate cert problem resolved
  • 2018-02-01 06:31:41 UTC mirror01.dfw.o.o has been retired due to performance issues; it is replaced by mirror02.dfw.o.o
  • 2018-01-31 21:22:06 UTC filed for removal of ask.openstack.org from mailspike blacklist per https://launchpad.net/bugs/1745512
  • 2018-01-31 02:55:44 UTC mirror.dfw.rax.openstack.org updated to mirror02.dfw.rax.openstack.org (TTL turned down to 5 minutes as we test)
  • 2018-01-31 02:53:34 UTC removed old A and AAAA records for mirror.dfw.openstack.org (note: not mirror.dfw.rax.openstack.org)
  • 2018-01-30 23:43:40 UTC scheduled maintenance will make the citycloud-kna1 api endpoint unavailable intermittently between 2018-02-07 07:00 and 2018-02-08 07:00 utc
  • 2018-01-30 23:42:51 UTC scheduled maintenance will make the citycloud-sto2 api endpoint unavailable intermittently between 2018-01-31 07:00 and 2018-02-01 07:00 utc
  • 2018-01-30 20:18:33 UTC reenqueued release pipeline jobs for openstack/tripleo-ipsec 8.0.0
  • 2018-01-30 17:34:55 UTC 537933 promoted to help address an integrated-gate timeout issue with nova
  • 2018-01-30 16:24:42 UTC ticket 180130-ord-0000697 filed to investigate an apparent 100Mbps rate limit on the mirror01.dfw.rax instance
  • 2018-01-30 16:04:21 UTC removed nb01, nb02 and nb03 from the emergency maintenance list now that it's safe to start building new ubuntu-xenial images again
  • 2018-01-30 14:24:08 UTC most recent ubuntu-xenial images have been deleted from nodepool, so future job builds should revert to booting from the previous (working) image while we debug
  • 2018-01-30 13:54:39 UTC nb01, nb02 and nb03 have been placed in the emergency maintenance list in preparation for a manual application of https://review.openstack.org/539213
  • 2018-01-30 13:44:45 UTC Our ubuntu-xenial images (used for e.g. unit tests and devstack) are currently failing to install any packages; please refrain from *recheck* or *approve* until the issue has been investigated and fixed.
  • 2018-01-29 16:24:17 UTC zuul.o.o is back online, feel free to recheck / approve patches.
  • 2018-01-29 14:31:21 UTC we've been able to restart zuul, and re-enqueue changes for the gate. Please hold off on rechecks or approvals, we are still recovering. More info shortly.
  • 2018-01-29 13:35:52 UTC Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead.
  • 2018-01-29 11:04:48 UTC Zuul is currently under heavy load. Do not *recheck* or *approve* any changes.
  • 2018-01-28 17:21:57 UTC Zuul has been restarted due to an outage in our cloud provider. Changes already in queues have been restored, but changes uploaded and approved since 12:30 UTC may need to be rechecked or reapproved.
  • 2018-01-28 15:05:28 UTC Jobs are currently not running and are staying queued in Zuul pending the completion of a maintenance at our cloud provider. Jobs will resume once this maintenance has been completed.
  • 2018-01-27 00:28:21 UTC logs.openstack.org crontab re-enabled, and static.o.o removed from emergency file
  • 2018-01-27 00:17:34 UTC mounted new logs filesystem on static.o.o
  • 2018-01-26 23:31:29 UTC created logs filesystem with "mkfs.ext4 -m 0 -j -i 14336 -L $NAME /dev/main/$NAME" http://paste.openstack.org/show/654140/
  • 2018-01-26 23:29:35 UTC cloned all volumes from static.openstack.org for later fsck; replaced main10 device because it seemed slow and recreated logs logical volume.
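  A condensed sketch of the rebuild described in the three entries above (volume group, LV size and mount point are assumptions beyond what the entries state):
    lvcreate -l 100%FREE -n logs main
    mkfs.ext4 -m 0 -j -i 14336 -L logs /dev/main/logs
    mount /dev/main/logs /srv/static/logs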
  • 2018-01-25 16:03:26 UTC logs.openstack.org is stabilized and there should no longer be *new* POST_FAILURE errors. Logs for jobs that ran in the past weeks until earlier today are currently unavailable pending FSCK completion. We're going to temporarily disable *successful* jobs from uploading their logs to reduce strain on our current limited capacity. Thanks for your patience !
  • 2018-01-25 15:43:52 UTC (dmsimard) We're running a modified log_archive_maintenance.sh from ~/corvus and https://review.openstack.org/#/c/537929/ as safety nets to keep us from running out of disk space
  • 2018-01-25 15:42:53 UTC (dmsimard) fsck started running in a screen on logs.o.o for /dev/mapper/main-logs at around 15:30UTC, logs are being sent straight to /srv/static
  • 2018-01-25 14:27:33 UTC We're currently experiencing issues with the logs.openstack.org server which will result in POST_FAILURE for jobs, please stand by and don't needlessly recheck jobs while we troubleshoot the problem.
  • 2018-01-24 19:01:37 UTC enqueued and promoted 537437,2 at the request of mriedem to avoid regression in gate
  • 2018-01-24 15:25:43 UTC gerrit has been suffering from a full disk, some mails may have been lost in the last couple of hours. we will now restart gerrit to address ongoing slowness, too
  • 2018-01-24 02:37:25 UTC manually removed infracloud chocolate from clouds.yaml (https://review.openstack.org/#/c/536989/) as it was holding up puppet runs
  • 2018-01-22 21:29:07 UTC restarted openstack-paste (lodgeit) service on paste.openstack.org as it was timing out responding to proxied requests from apache
  • 2018-01-22 18:56:19 UTC (dmsimard) files02.o.o was put in the emergency file pending a fix due to a missing zuul-ci.org_intermediate.pem file preventing apache from restarting properly
  • 2018-01-22 17:33:49 UTC deleted all contents in /afs/openstack.org/docs/draft at request of pkovar and AJaeger
  • 2018-01-22 12:16:04 UTC gerrit account 26576 has been set to inactive due to continued review spam
  • 2018-01-20 20:16:16 UTC zuul.openstack.org has been restarted due to an unexpected issue. We were able to save / reload queues however new patchsets during the restart may have been missed. Please recheck if needed.
  • 2018-01-20 01:01:09 UTC the zuulv3.openstack.org server has been replaced by a larger zuul01.openstack.org server
  • 2018-01-20 00:37:13 UTC ze* and zm* hosts removed from emergency disable list now that maintenance has concluded
  • 2018-01-19 23:44:18 UTC Zuul will be offline over the next 20 minutes to perform maintenance; active changes will be reenqueued once work completes, but new patch sets or approvals during that timeframe may need to be rechecked or reapplied as appropriate
  • 2018-01-19 21:00:03 UTC temporarily added ze*.openstack.org and zm*.openstack.org to the emergency disable list in preparation for replacing zuulv3 with zuul01
  • 2018-01-19 20:35:57 UTC nl03.o.o is now online and launching nodes
  • 2018-01-18 22:05:36 UTC deleted nodepool and zuul feature/zuulv3 branches
  • 2018-01-18 02:47:57 UTC mirror.bhs1.ovh.openstack.org was unresponsive ... hard reboot and it has reappeared. nothing useful in console logs unfortunately
  • 2018-01-18 02:41:49 UTC nb04.o.o stopped to prepare for nb01.o.o replacement tomorrow
  • 2018-01-17 20:40:45 UTC Zuul will be offline for a few minutes; existing changes will be re-enqueued; approvals during the downtime will need to be re-added.
  • 2018-01-17 00:11:48 UTC (dmsimard) Zuul scheduler status.jsons are now periodically backed up, provided by https://review.openstack.org/#/c/532955/ -- these are available over the vhost, ex: http://zuulv3.openstack.org/backup/status_1516146901.json
  • 2018-01-15 18:58:26 UTC updated github zuul app to use new hostname: zuul.openstack.org
  • 2018-01-15 18:24:36 UTC Zuul has been restarted and has lost queue contents; changes in progress will need to be rechecked.
  • 2018-01-15 04:51:34 UTC The logs.openstack.org filesystem has been restored to full health. We are attempting to keep logs uploaded between the prior alert and this one, however if your job logs are missing please issue a recheck.
  • 2018-01-14 22:38:52 UTC The filesystem for the logs.openstack.org site was marked read-only at 2018-01-14 16:47 UTC due to an outage incident at the service provider; a filesystem recovery is underway, but job logs uploaded between now and completion are unlikely to be retained so please refrain from rechecking due to POST_FAILURE results until this alert is rescinded.
  • 2018-01-14 22:27:22 UTC a `fsck -y` of /dev/mapper/main-logs is underway in a root screen session on static.openstack.org
  • 2018-01-14 22:25:30 UTC rebooted static.openstack.org to make sure disconnected volume /dev/xvdg reattaches correctly
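  Roughly the recovery steps from the two entries above (device name taken from the entries):
    screen -S fsck-logs
    fsck -y /dev/mapper/main-logs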
  • 2018-01-12 16:46:50 UTC Zuul has been restarted and lost queue information; changes in progress will need to be rechecked.
  • 2018-01-12 14:26:44 UTC manually started the apache2 service on ask.openstack.org since it seems to have segfaulted and died during log rotation
  • 2018-01-11 17:48:53 UTC Due to an unexpected issue with zuulv3.o.o, we were not able to preserve running jobs for a restart. As a result, you'll need to recheck your previous patchsets
  • 2018-01-11 17:03:14 UTC deleted old odsreg.openstack.org instance
  • 2018-01-11 16:56:32 UTC previously mentioned trove maintenance activities in rackspace have been postponed/cancelled and can be ignored
  • 2018-01-11 12:47:55 UTC nl01 and nl02 restarted to recover nodes in deletion
  • 2018-01-11 02:38:51 UTC zuul restarted due to the unexpected loss of ze04; jobs requeued
  • 2018-01-11 02:13:27 UTC zuul-executor stopped on ze04.o.o and it is placed in the emergency file, due to an external reboot applying https://review.openstack.org/#/c/532575/. we will need to more carefully consider the rollout of this code
  • 2018-01-10 23:20:00 UTC deleted old kdc02.openstack.org server
  • 2018-01-10 23:16:52 UTC deleted old eavesdrop.openstack.org server
  • 2018-01-10 23:14:42 UTC deleted old apps-dev.openstack.org server
  • 2018-01-10 22:24:55 UTC The zuul system is being restarted to apply security updates and will be offline for several minutes. It will be restarted and changes re-equeued; changes approved during the downtime will need to be rechecked or re-approved.
  • 2018-01-10 22:16:52 UTC deleted old stackalytics.openstack.org instance
  • 2018-01-10 22:14:54 UTC deleted old zuul.openstack.org instance
  • 2018-01-10 22:09:27 UTC manually reenqueued openstack/nova refs/tags/14.1.0 into the release pipeline
  • 2018-01-10 21:51:03 UTC deleted old zuul-dev.openstack.org instance
  • 2018-01-10 15:16:27 UTC manually started mirror.regionone.infracloud-vanilla which had been in shutoff state following application of meltdown patches to infracloud hosts
  • 2018-01-10 14:59:51 UTC Gerrit is being restarted due to slowness and to apply kernel patches
  • 2018-01-10 14:55:51 UTC manually started mirror.regionone.infracloud-chocolate which had been in shutoff state following application of meltdown patches to infracloud hosts
  • 2018-01-10 13:58:36 UTC another set of broken images has been in use from about 06:00-11:00 UTC, reverted once more to the previous ones
  • 2018-01-10 13:56:23 UTC zuul-scheduler has been restarted due to heavy swapping, queues have been restored.
  • 2018-01-10 04:59:44 UTC image builds are paused and we have reverted images to old ones after a dib release produced images without pip for python2. This lack of pip for python2 broke tox siblings in many tox jobs
  • 2018-01-09 16:16:44 UTC rebooted nb04 through the nova api; oob console content looked like a botched live migration
  • 2018-01-09 15:57:32 UTC Trove maintenance scheduled for 04:00-12:00 UTC on 2018-01-24 impacting paste_mysql_5.6 instance
  • 2018-01-09 15:56:50 UTC Trove maintenance scheduled for 04:00-12:00 UTC on 2018-01-23 impacting zuul_v3 instance
  • 2018-01-09 15:56:09 UTC Trove maintenance scheduled for 04:00-12:00 UTC on 2018-01-17 impacting Wiki_MySQL and cacti_MySQL instances
  • 2018-01-08 20:37:23 UTC The jobs and queues in Zuul between 19:55UTC and 20:20UTC have been lost after recovering from a crash, you might need to re-check your patches if they were being tested during that period.
  • 2018-01-08 20:33:57 UTC (dmsimard) the msgpack issue experienced yesterday on zm and ze nodes propagated to zuulv3.o.o and crashed zuul-web and zuul-scheduler with the same python general protection fault. They were started after re-installing msgpack but the contents of the queues were lost.
  • 2018-01-08 10:27:03 UTC zuul has been restarted, all queues have been reset. please recheck your patches when appropriate
  • 2018-01-07 21:06:56 UTC Parts of the Zuul infrastructure had to be restarted to pick up new jobs properly, it's possible you may have to recheck your changes if they did not get job results or if they failed due to network connectivity issues.
  • 2018-01-07 20:55:48 UTC (dmsimard) all zuul-mergers and zuul-executors stopped simultaneously after what seems to be a msgpack update which did not get installed correctly: http://paste.openstack.org/raw/640474/ everything is started after reinstalling msgpack properly.
  • 2018-01-07 19:54:35 UTC (dmsimard) ze10 has a broken dpkg transaction (perhaps due to recent outage), fixed with dpkg --configure -a and reinstalling unattended-upgrades http://paste.openstack.org/show/640466/
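  The repair amounted to finishing the interrupted dpkg transaction and reinstalling the affected package:
    dpkg --configure -a
    apt-get install --reinstall unattended-upgrades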
  • 2018-01-06 01:20:22 UTC (dmsimard) ze09.o.o was rebuilt from scratch after what seems to be a failed live migration which thrashed the root partition disk
  • 2018-01-05 23:00:17 UTC (dmsimard) ze10 was rebooted after being hung since january 4th
  • 2018-01-05 21:35:40 UTC added 2gb swap file to eavesdrop01 at /swapfile since it has no ephemeral disk
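  A typical way to add such a swap file (size and path per the entry above; the fstab persistence line is an assumption):
    dd if=/dev/zero of=/swapfile bs=1M count=2048
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    echo '/swapfile none swap sw 0 0' >> /etc/fstab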
  • 2018-01-05 21:14:27 UTC started openstack-meetbot service manually on eavesdrop01.o.o after it was sniped by the oom killer
  • 2018-01-05 17:40:21 UTC Old git.openstack.org server has been deleted. New server's A and AAAA dns record TTLs bumped to an hour. We are now running with PTI enabled on all CentOS control plane servers.
  • 2018-01-05 01:37:53 UTC git0*.openstack.org patched and kernels running with PTI enabled. git.openstack.org has been replaced with a new server running with PTI enabled. The old server is still in place for straggler clients. Will need to be deleted and DNS record TTLs set back to one hour
  • 2018-01-04 14:48:50 UTC zuul has been restarted, all queues have been reset. please recheck your patches when appropriate
  • 2018-01-03 18:27:29 UTC (dmsimard) +r applied to channels to mitigate ongoing freenode spam wave: http://paste.openstack.org/raw/629168/
  • 2018-01-03 12:37:50 UTC manually started apache2 service on ask.o.o, seems to have crashed/failed to correctly restart during log rotation
  • 2018-01-03 07:01:02 UTC We stopped publishing documents on the 23rd of December by accident, this is fixed now. Publishing to docs.o.o and developer.o.o is working again. If you are missing a document publish, the next merge should publish it.
  • 2018-01-03 00:50:57 UTC no zuul-executor is running on ze04 currently. /var/run/ is a tmpfs on xenial so /var/run/zuul does not exist on ze04 after a reboot preventing zuul-executor from starting. https://review.openstack.org/530820 is the proposed fix
  • 2018-01-03 00:49:52 UTC openstackci-ovh accounts are not working and ovh hosted mirrors are not pinging. We have removed OVH from nodepool via 530817 and this change has been manually applied (see previous message for why ansible puppet is not working)
  • 2018-01-03 00:49:11 UTC openstackci-ovh accounts are not working and ovh hosted mirrors are not pinging. The accounts breaking appears to prevent ansible puppet from running (due to failed inventory) this needs to be sorted out.
  • 2017-12-25 13:08:50 UTC zuul scheduler restarted and all changes reenqueued
  • 2017-12-22 19:21:07 UTC lists.openstack.org has been taken out of the emergency disable list now that the spam blackholing in /etc/aliases is managed by puppet
  • 2017-12-22 10:18:13 UTC zuul has been restarted, all queues have been reset. please recheck your patches when appropriate
  • 2017-12-22 06:45:18 UTC Zuul.openstack.org is currently under heavy load and not starting new jobs. We're waiting for an admin to restart Zuul.
  • 2017-12-21 14:51:11 UTC vexxhost temporarily disabled in nodepool via https://review.openstack.org/529572 to mitigate frequent job timeouts
  • 2017-12-21 14:47:42 UTC promoted 528823,2 in the gate to unblock projects relying on sphinxcontrib.datatemplates in their documentation builds
  • 2017-12-20 23:15:25 UTC updated storyboard-dev openids from login.launchpad.net to login.ubuntu.com to solve DBDuplicateEntry exceptions on login
  • 2017-12-20 22:49:29 UTC enqueued 529067,1 into the gate pipeline and promoted in order to unblock requirements and release changes
  • 2017-12-20 19:59:17 UTC Disabled compute026.vanilla.ic.o.o due to hard disk being in read-only mode
  • 2017-12-20 13:15:47 UTC gerrit is being restarted due to extreme slowness
  • 2017-12-20 00:37:29 UTC nl01 nl02 manually downgraded to afcb56e0fb887a090dbf1380217ebcc06ef6b66b due to broken quota handling on branch tip
  • 2017-12-19 23:56:21 UTC removed infra-files-ro and infra-files-rw from all-clouds.yaml as they are invalid, and cause issues deploying new keys. saved in a backup file on puppetmaster.o.o if required
  • 2017-12-19 20:39:55 UTC Manually repaired eavesdrop URLs in recent plaintext meeting minutes after https://review.openstack.org/529118 merged
  • 2017-12-18 14:44:46 UTC (dmsimard) the channel restrictions mentioned last night have been removed after #freenode confirmed the spam wave had stopped.
  • 2017-12-18 03:12:30 UTC (dmsimard) we re-ran mlock commands on OpenStack channels using the accessbot list of channels instead, here is the definitive list of channels that were made +r: http://paste.openstack.org/raw/629176/
  • 2017-12-18 02:55:19 UTC (dmsimard) all channels configured through gerritbot have been mlocked +r: http://paste.openstack.org/raw/629168/ we should remove this once the spam wave subsides
  • 2017-12-18 01:52:19 UTC (dmsimard) added +r to additional targeted channels: #openstack-keystone, #openstack-cinder, #openstack-telemetry, #openstack-requirements, #openstack-release, #tripleo
  • 2017-12-18 01:49:20 UTC The freenode network is currently the target of automated spam attacks, we have enabled temporary restrictions on targeted OpenStack channels which require users to be logged on to NickServ. If you see spam in your channel, please report it in #openstack-infra. Thanks.
  • 2017-12-18 01:40:13 UTC (dmsimard) enabled mode +r to prevent unregistered users from joining channels hit by spam bots: #openstack-ansible, #openstack-dev, #openstack-infra, #openstack-kolla, #openstack-operators, #puppet-openstack, #rdo
  • 2017-12-18 01:22:17 UTC Slowly deleting 4161 "jenkins" verify -1 votes from open changes in Gerrit with a 1-second delay between each
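  One possible shape for that cleanup over the Gerrit SSH API, heavily hedged since the actual script was not recorded (this overwrites each stale -1 with a 0 rather than deleting the vote outright):
    ssh -p 29418 review.openstack.org gerrit query --format=JSON --current-patch-set 'status:open label:Verified=-1,jenkins' > stale-votes.json
    # for each change,patchset pair extracted from the JSON:
    ssh -p 29418 review.openstack.org gerrit review --verified 0 <change>,<patchset>
    sleep 1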
  • 2017-12-17 22:18:04 UTC trusty-era codesearch.o.o (was 104.130.138.207) has been deleted
  • 2017-12-17 19:24:39 UTC zuul daemon stopped on zuul.openstack.org after AJaeger noticed jenkins was commenting about merge failures on changes
  • 2017-12-14 20:40:21 UTC eavesdrop01.o.o online and running xenial
  • 2017-12-14 05:13:08 UTC codesearch.o.o removed from the emergency file. after 527557 it should be fine to run under normal puppet conditions
  • 2017-12-12 20:16:59 UTC The zuul scheduler has been restarted after lengthy troubleshooting for a memory consumption issue; earlier changes have been reenqueued but if you notice jobs not running for a new or approved change you may want to leave a recheck comment or a new approval vote
  • 2017-12-12 14:40:45 UTC We're currently seeing an elevated rate of timeouts in jobs and the zuulv3.openstack.org dashboard is intermittently unresponsive, please stand by while we troubleshoot the issues.
  • 2017-12-12 09:10:05 UTC Zuul is back online, looks like a temporary network problem.
  • 2017-12-12 08:49:48 UTC Our CI system Zuul is currently not accessible. Wait with approving changes and rechecks until it's back online. Currently waiting for an admin to investigate.
  • 2017-12-11 02:06:47 UTC root keypairs manually updated in all clouds
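  A per-cloud sketch of that rotation with the OpenStack CLI (cloud name, keypair name and key path are placeholders):
    openstack --os-cloud <cloud> keypair delete root
    openstack --os-cloud <cloud> keypair create --public-key ~/.ssh/new-root-key.pub root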
  • 2017-12-09 00:05:04 UTC zuulv3.o.o removed from emergency file now that puppet-zuul is updated to match deployed zuul
  • 2017-12-08 22:47:45 UTC old docs-draft volume deleted from static.openstack.org, and the recovered extents divvied up between the tarballs and logs volumes (now 0.5tib and 13.4tib respectively)
  • 2017-12-08 20:31:46 UTC added zuulv3.openstack.org to emergency file due to manual fixes to apache rewrite rules
  • 2017-12-08 15:43:29 UTC zuul.openstack.org is scheduled to be rebooted as part of a provider host migration at 2017-12-12 at 04:00 UTC
  • 2017-12-08 15:43:09 UTC elasticsearch02, elasticsearch04 and review-dev are scheduled to be rebooted as part of a provider host migration at 2017-12-11 at 04:00 UTC
  • 2017-12-08 15:38:00 UTC the current stackalytics.openstack.org instance is not recovering via reboot after a failed host migration, and will likely need to be deleted and rebuilt when convenient
  • 2017-12-08 15:36:03 UTC rebooted zuul.openstack.org after it became unresponsive in what looked like a host migration activity
  • 2017-12-08 14:02:35 UTC The issues have been fixed, Zuul is operating fine again but has a large backlog. You can recheck jobs that failed.
  • 2017-12-08 07:06:10 UTC Due to some unforeseen Zuul issues the gate is under very high load and extremely unstable at the moment. This is likely to persist until PST morning
  • 2017-12-08 05:37:48 UTC due to stuck jobs seemingly related to ze04, zuul has been restarted. jobs have been requeued
  • 2017-12-08 05:16:35 UTC manually started zuul-executor on ze04
  • 2017-12-07 22:22:09 UTC logstash service stopped, killed and started again on all logstash-worker servers
  • 2017-12-07 17:40:02 UTC This message is to inform you that the host your cloud server 'ze04.openstack.org' resides on became unresponsive. We have rebooted the server and will continue to monitor it for any further alerts.
  • 2017-12-07 16:45:54 UTC This message is to inform you that the host your cloud server 'ze04.openstack.org' resides on alerted our monitoring systems at 16:35 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding what is causing the alert. Please do not access or modify 'ze04.openstack.org' during this process.
  • 2017-12-07 03:08:09 UTC fedora 27 mirroring complete, fedora 25 removed
  • 2017-12-06 21:25:07 UTC proposal.slave.o.o / release.slave.o.o / signin01.ci.o.o have all been deleted and DNS records removed
  • 2017-12-06 08:21:54 UTC zuul-scheduler restarted due to very high number of stuck jobs. check/gate/triple-o requeued
  • 2017-12-05 21:47:03 UTC zk01.o.o, zk02.o.o and zk03.o.o now online and SSH keys accepted into puppetmaster.o.o. Currently no production servers are connected to them.
  • 2017-12-05 06:39:56 UTC translate-dev.o.o removed from emergency list
  • 2017-12-04 23:47:51 UTC mirror.fedora afs volume was >95% full; upped it to 300000000.
  • 2017-12-04 17:53:48 UTC Manually pruned some larger Apache cache entries and flushed the pypi volume cache on mirror.regionone.tripleo-test-cloud-rh1.openstack.org following a full root filesystem event
  • 2017-12-04 17:51:04 UTC cleaned out old unused crm114 data dirs on logstash worker nodes using `sudo find /var/lib/crm114 -mtime +7 -delete` as recent changes to crm114 scripts mean we've collapsed those data dirs into a much smaller set at different paths ignoring the old data.
  • 2017-12-01 22:19:58 UTC Launched a new Mailman server corresponding to https://review.openstack.org/524322 and filed to exclude its ipv4 address from spamhaus record PBL1665489
  • 2017-12-01 13:53:14 UTC gerrit has been restarted to get it back to its normal speed.
  • 2017-11-30 15:39:15 UTC if you received a result of "RETRY_LIMIT" after 14:15 UTC, it was likely due to an error since corrected. please "recheck"
  • 2017-11-29 23:41:29 UTC Requested removal of storyboard.o.o ipv4 address from policy blacklist (pbl listing PBL1660430 for 23.253.84.0/22)
  • 2017-11-28 23:26:47 UTC #openstack-shade was retired, redirect put in place and users directed to join #openstack-sdks
  • 2017-11-28 18:12:42 UTC rebooting non-responsive codesearch.openstack.org. It pings but does not respond to http(s) or ssh. Probably another live migration gone bad
  • 2017-11-27 15:33:10 UTC Hard rebooted ze05.openstack.org after it was found hung (unresponsive even at console, determination of cause inconclusive, no smoking gun in kmesg entries)
  • 2017-11-27 04:34:59 UTC rebooted status.o.o as it was hung, likely a migration failure as it had xen timeout errors on the console
  • 2017-11-23 22:58:20 UTC zuulv3.o.o restarted to address memory issues, was a few hundred MB from swapping (15GB). Additionally, this cleans up leaked nodepool nodes from the previous ze03.o.o issues listed above (or below).
  • 2017-11-23 22:56:34 UTC Zuul has been restarted due to an unexpected issue. We're able to re-enqueue changes from check and gate pipelines, please check http://zuulv3.openstack.org/ for more information.
  • 2017-11-23 21:06:55 UTC We seem to have an issue with ze03.o.o losing ansible-playbook processes; as a result, jobs have continued to run in pipelines for 11+ hours. For now, I have stopped ze03 and will audit other servers
  • 2017-11-22 04:18:32 UTC had to revert the change to delete zuul-env on dib images as the zuul cloner shim does depend on that python installation and its pyyaml lib. Followed up by deleting the debian and centos images that were built without zuul envs
  • 2017-11-21 20:46:20 UTC deleted static wheel-build slaves from rax-dfw (centos / ubuntu-trusty / ubuntu-xenial) along with DNS entries
  • 2017-11-21 02:47:29 UTC ci-backup-rs-ord.openstack.org shutdown and all backup hosts migrated to run against backup01.ord.rax.ci.openstack.org. old backups remain at /opt/old-backups on the new server
  • 2017-11-20 01:25:18 UTC gerrit restarted; review.o.o 42,571 Mb / 48,551 Mb and persistent system load ~10, definite i/o spike on /dev/xvdb but nothing unusual to the naked eye
  • 2017-11-17 23:24:29 UTC git-review 1.26.0 released, adding support for Gerrit 2.14 and Git 2.15: https://pypi.org/project/git-review/1.26.0/
  • 2017-11-15 06:09:38 UTC zuulv3 stopped / started again. It appears an influx of commits like https://review.openstack.org/519924/ is causing zuul to burn memory quickly.
  • 2017-11-15 02:59:50 UTC Had to stop zuul-scheduler due to a memory issue; zuul pushed over 15GB of RAM and started swapping. https://review.openstack.org/513915/ then prevented zuul from starting, which required us to then land https://review.openstack.org/519949/
  • 2017-11-15 02:58:22 UTC Due to an unexpected outage with Zuul (1 hour), you'll need to recheck any jobs that were in progress. Sorry for the inconvenience.
  • 2017-11-10 05:23:58 UTC puppetmaster.o.o was hung with oom errors on the console. rax support rebooted it for me
  • 2017-11-10 05:19:13 UTC the zombie host @ 146.20.110.99 (443 days uptime!) has been shutdown. this was likely causing POST_FAILURES as jobs managed to get run on it
  • 2017-11-09 10:07:36 UTC ovh having global issues : https://twitter.com/olesovhcom/status/928559286288093184
  • 2017-11-09 06:35:18 UTC restarting gerrit. 502's reported. 33,603 Mb / 48,550 Mb (stable since last check 03:00UTC) , persistent system load ~9 and high cpu since around 2017-11-09 05:00 UTC
  • 2017-11-06 23:46:07 UTC openstackid.org temporarily added to the emergency disable list so puppet won't undo debugging settings while an issue is investigated for the conference schedule app
  • 2017-11-02 23:11:34 UTC increased mirror.pypi afs volume quota from 1000000000 to 1200000000
  • 2017-11-02 14:42:46 UTC killed stuck gerrit to github replication task on nova-specs repo
  • 2017-11-01 21:57:53 UTC jessie mirror not updated since Oct 10 due to a reboot of the server mid-update. manually removed the stale lockfile for debian jessie reprepro; mirror updated and released successfully.
  • 2017-11-01 17:52:08 UTC logstash-worker16.o.o to logstash-worker20.o.o now online and SSH keys accepted
  • 2017-10-31 17:16:47 UTC removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements on ze07
  • 2017-10-31 17:16:27 UTC removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/neutron on ze10
  • 2017-10-31 17:16:16 UTC removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/python-glanceclient on ze05
  • 2017-10-31 17:16:00 UTC restarted all zuul executors and cleaned up old processes from previous restarts
  • 2017-10-30 23:19:52 UTC geard (really jenkins-log-client) restarted on logstash.o.o to pick up gear 0.11.0 performance improvements. https://review.openstack.org/516473 needed to workaround zuul transition there.
  • 2017-10-30 22:21:27 UTC gear 0.11.0 tagged with statsd performance improvements
  • 2017-10-30 11:06:07 UTC restarted all zuul executors and restarted scheduler
  • 2017-10-30 10:48:09 UTC Zuul has been restarted due to an unexpected issue. Please recheck any jobs that were in progress
  • 2017-10-27 17:00:46 UTC Restarted elasticsearch on elasticsearch07 as its process had crashed. The log doesn't give many clues as to why. Restarted log workers afterwards.
  • 2017-10-27 17:00:10 UTC Killed elastic-recheck static status page update processes from October 1st to release the processing lock. Status page updates seem to be processing now.
  • 2017-10-26 18:07:53 UTC zuul scheduler restarted and check/check-tripleo/gate pipeline contents successfully reenqueued
  • 2017-10-26 15:32:52 UTC Provider maintenance is scheduled for 2017-10-30 between 06:00-09:00 UTC which may result in up to a 5 minute connectivity outage for the production Gerrit server's Trove database instance
  • 2017-10-26 11:22:18 UTC docs.o.o index page was lost due to a broken build being published, suggested fix in https://review.openstack.org/515365
  • 2017-10-26 10:43:53 UTC we lost the docs.o.o central home page, somehow our publishing is broken
  • 2017-10-26 05:18:22 UTC zm[0-4].o.o rebuilt to xenial and added to zuulv3
  • 2017-10-25 19:30:28 UTC zl01.o.o to zl06.o.o, zlstatic01.o.o have been deleted and DNS entries removed from rackspace.
  • 2017-10-25 06:45:41 UTC zuul v2/jenkins config has been removed from project-config
  • 2017-10-24 01:45:35 UTC all zuul executors have been restarted to pick up the latest bubblewrap bindmount addition
  • 2017-10-21 08:26:51 UTC increased Zanata limit for concurrent requests to 20 using Zanata UI
  • 2017-10-21 02:33:01 UTC all zuul executors restarted to pick up the /usr/share/ca-certificates addition
  • 2017-10-18 14:56:18 UTC Gerrit account 8944 set to inactive to handle a duplicate account issue
  • 2017-10-18 04:45:12 UTC review.o.o hard rebooted due to failure during live migration (rax ticket: 171018-ord-0000074). manually restarted gerrit after boot, things seem ok now
  • 2017-10-18 00:33:55 UTC due to unscheduled restart of zuulv3.o.o you will need to 'recheck' your jobs that were last running. Sorry for the inconvenience.
  • 2017-10-16 15:21:53 UTC elasticsearch cluster is now green after triggering index curator early to clear out old indexes "lost" on es07
  • 2017-10-16 03:05:41 UTC elasticsearch07.o.o rebooted & elasticsearch started. data was migrated from SSD storage and "main" vg contains only one block device now
  • 2017-10-15 22:06:10 UTC Zuul v3 rollout maintenance is underway, scheduled to conclude by 23:00 UTC: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123618.html
  • 2017-10-15 21:20:10 UTC Zuul v3 rollout maintenance begins at 22:00 UTC (roughly 45 minutes from now): http://lists.openstack.org/pipermail/openstack-dev/2017-October/123618.html
  • 2017-10-12 23:06:18 UTC Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now
  • 2017-10-12 16:04:42 UTC removed mirror.npm volume from afs
  • 2017-10-12 14:57:16 UTC Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow.
  • 2017-10-11 17:26:50 UTC moved Gerrit account 27031's openid to account 21561 and marked 27031 inactive
  • 2017-10-11 13:07:12 UTC Due to unrelated emergencies, the Zuul v3 rollout has not started yet; stay tuned for further updates
  • 2017-10-11 11:13:10 UTC deleted the errant review/andreas_jaeger/zuulv3-unbound branch from the openstack-infra/project-config repository (formerly at commit 2e8ae4da5d422df4de0b9325bd9c54e2172f79a0)
  • 2017-10-11 10:10:02 UTC The CI system will be offline starting at 11:00 UTC (in just under an hour) for Zuul v3 rollout: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123337.html
  • 2017-10-11 07:46:41 UTC Lots of RETRY_LIMIT errors due to unbound usage with Zuul v3; we reverted the change, so recheck your changes
  • 2017-10-10 01:43:02 UTC manually rotated all logs on zuulv3.openstack.org as a stop-gap to prevent a full rootfs later when scheduled log rotation kicks in; an additional 14 GiB were freed as a result
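    A minimal sketch of forcing an early rotation, assuming the host's standard logrotate setup (the config path is an assumption):
      sudo logrotate -f /etc/logrotate.conf    # force rotation of everything logrotate manages
      df -h /                                  # confirm how much space was freed on the root filesystem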
  • 2017-10-10 00:43:23 UTC restart of *gerrit* complete
  • 2017-10-10 00:39:01 UTC restarting zuul, as a prolonged period of high GC activity is causing 502 errors
  • 2017-10-09 20:53:36 UTC cleared all old workspaces on signing01.ci to deal with those which had cached git remotes to some no-longer-existing zuul v2 mergers
  • 2017-10-05 00:51:41 UTC updated openids in the storyboard.openstack.org database from login.launchpad.net to login.ubuntu.com
  • 2017-10-04 06:31:26 UTC The special infra pipelines in zuul v3 have disappeared
  • 2017-10-03 03:00:20 UTC zuulv3 restarted with 508786 508787 508793 509014 509040 508955 manually applied; should fix branch matchers, use *slightly* less memory, and fix the 'base job not defined' error
  • 2017-10-02 12:50:51 UTC Restarted nodepool-launcher on nl01 and nl02 to fix zookeeper connection
  • 2017-10-02 12:45:00 UTC ran `sudo -u zookeeper ./zkCleanup.sh /var/lib/zookeeper 3` in /usr/share/zookeeper/bin on nodepool.openstack.org to free up 22gib of space for its / filesystem
  • 2017-09-28 22:41:03 UTC zuul.openstack.org has been added to the emergency disable list so that a temporary redirect to zuulv3 can be installed by hand
  • 2017-09-28 14:44:03 UTC The infra team is now taking Zuul v2 offline and bringing Zuul v3 online. Please see https://docs.openstack.org/infra/manual/zuulv3.html for more information, and ask us in #openstack-infra if you have any questions.
  • 2017-09-26 23:40:51 UTC project-config is unable to merge changes due to problems found during zuul v3 migration. for the time being, if any emergency changes are needed (eg, nodepool config), please discuss in #openstack-infra and force-merge them.
  • 2017-09-26 18:25:58 UTC The infra team is continuing work to bring Zuul v3 online; expect service disruptions and please see https://docs.openstack.org/infra/manual/zuulv3.html for more information.
  • 2017-09-25 23:37:33 UTC project-config is frozen until further notice for the zuul v3 transition; please don't approve any changes without discussion with folks familiar with the migration in #openstack-infra
  • 2017-09-25 20:52:05 UTC The infra team is bringing Zuul v3 online; expect service disruptions and please see https://docs.openstack.org/infra/manual/zuulv3.html for more information.
  • 2017-09-25 15:50:39 UTC deleted all workspaces from release.slave.openstack.org to deal with changes to zuul v2 mergers
  • 2017-09-22 21:33:40 UTC jeepyb and gerritlib fixes for adding the project creator to new groups on Gerrit project creation are in the process of landing. Please double-check group membership after the next project creation.
  • 2017-09-22 19:12:01 UTC the /vicepa filesystem on afs01.ord.openstack.org has been repaired and vos releases of the docs and docs.dev volumes have resumed at their normal frequency
  • 2017-09-22 17:39:22 UTC When seeding initial group members in Gerrit remove the openstack project creator account until jeepyb is updated to do so automatically
  • 2017-09-22 11:06:09 UTC no content is currently pushed to docs.openstack.org - post jobs run successfully but docs.o.o is not updated
  • 2017-09-21 19:23:16 UTC OpenIDs for the Gerrit service have been restored from a recent backup and the service is running again; before/after table states are being analyzed now to identify any remaining cleanup needed for changes made to accounts today
  • 2017-09-21 18:25:35 UTC The Gerrit service on review.openstack.org is being taken offline briefly to perform database repair work but should be back up shortly
  • 2017-09-21 18:19:03 UTC Gerrit OpenIDs have been accidentally overwritten and are in the process of being restored
  • 2017-09-21 17:54:32 UTC nl01.o.o and nl02.o.o are both back online with site-specific nodepool.yaml files.
  • 2017-09-21 14:08:07 UTC nodepool.o.o removed from emergency file, ovh-bhs1 came back online at 03:45z.
  • 2017-09-21 13:39:00 UTC Gerrit account 8971 for "Fuel CI" has been disabled due to excessive failure comments
  • 2017-09-21 02:50:04 UTC OVH-BHS1 mirror has disappeared unexpectedly. did not respond to hard reboot. nodepool.o.o in emergency file and region max-servers set to 0
  • 2017-09-20 23:17:13 UTC Please don't merge any new project creation changes until mordred gives the go ahead. We have new puppet problems on the git backends and there are staged jeepyb changes we want to watch before opening the flood gates
  • 2017-09-20 20:21:59 UTC nb03.o.o / nb04.o.o added to emergency file
  • 2017-09-19 23:42:19 UTC Gerrit is once again part of normal puppet config management. Problems with Gerrit gitweb links and Zuul post jobs have been addressed. We currently cannot create new gerrit projects (fixes in progress) and email sending is slow (being debugged).
  • 2017-09-19 22:34:37 UTC Gerrit is being restarted to address some final issues, review.openstack.org will be inaccessible for a few minutes while we restart
  • 2017-09-19 20:28:23 UTC Zuul and Gerrit are being restarted to address issues discovered with the Gerrit 2.13 upgrade. review.openstack.org will be inaccessible for a few minutes while we make these changes. Currently running jobs will be restarted for you once Zuul and Gerrit are running again.
  • 2017-09-19 07:25:16 UTC Post jobs are currently not being executed; do not tag any releases
  • 2017-09-19 07:13:26 UTC Zuul is not running any post jobs
  • 2017-09-19 02:42:08 UTC Gerrit is being restarted to feed its insatiable memory appetite
  • 2017-09-19 00:10:07 UTC please avoid merging new project creation changes until after we have the git backends puppeting properly
  • 2017-09-18 23:48:12 UTC review.openstack.org Gerrit 2.13 upgrade is functionally complete. The Infra team will be cleaning up bookkeeping items over the next couple days. If you have any questions please let us know
  • 2017-09-18 23:34:42 UTC review.openstack.org added to emergency file until git.o.o puppet is fixed and we can supervise a puppet run on review.o.o
  • 2017-09-18 16:40:08 UTC The Gerrit service at https://review.openstack.org/ is offline, upgrading to 2.13, for an indeterminate period of time hopefully not to exceed 23:59 UTC today: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-18 15:04:04 UTC The Gerrit service at https://review.openstack.org/ is offline, upgrading to 2.13, for an indeterminate period of time hopefully not to exceed 23:59 UTC today: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-18 14:33:25 UTC Gerrit will be offline for the upgrade to 2.13 starting at 15:00 UTC (in roughly 30 minutes) and is expected to probably be down/unusable for 8+ hours while an offline reindex is performed: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-18 13:48:14 UTC accountPatchReviewDb database created and gerrit2 account granted access in Review-MySQL trove instance, in preparation for upcoming gerrit upgrade maintenance
  • 2017-09-18 13:38:33 UTC updatepuppetmaster cron job on puppetmaster.openstack.org has been disabled in preparation for the upcoming gerrit upgrade maintenance
  • 2017-09-18 13:38:31 UTC Gerrit will be offline for the upgrade to 2.13 starting at 15:00 UTC (in roughly 1.5 hours) and is expected to probably be down/unusable for 8+ hours while an offline reindex is performed: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-18 12:07:34 UTC Gerrit will be offline for the upgrade to 2.13 starting at 15:00 UTC (in roughly 3 hours) and is expected to probably be down/unusable for 8+ hours while an offline reindex is performed: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-17 15:30:17 UTC Zuul has been fixed, you can approve changes again.
  • 2017-09-17 05:52:25 UTC Zuul is currently not moving any changes into the gate queue. Please hold off on approving changes until this is fixed.
  • 2017-09-17 01:06:37 UTC Zuul has been restarted to pick up a bug fix in prep for Gerrit upgrade. Changes have been reenqueued for you.
  • 2017-09-16 14:21:27 UTC OpenStack CI is fixed and fully operational again, feel free to "recheck" your jobs
  • 2017-09-16 09:12:28 UTC OpenStack CI is currently not recording any votes in gerrit. Do not recheck your changes until this is fixed.
  • 2017-09-14 23:12:24 UTC Artifact signing key for Pike has been retired; key for Queens is now in production
  • 2017-09-13 23:05:46 UTC CentOS 7.4 point release today has resulted in some mirror disruption, repair underway; expect jobs on centos7 nodes to potentially fail for a few hours longer
  • 2017-09-13 14:36:45 UTC increased ovh quotas to bhs1:80 gra1:50 as we haven't had launch errors recently according to grafana
  • 2017-09-11 22:50:48 UTC zm05.o.o - zm08.o.o now online running on ubuntu xenial
  • 2017-09-09 00:17:24 UTC nodepool.o.o added to ansible emergency file so that we can hand tune the max-servers in ovh. Using our previous numbers results in lots of 500 errors from the clouds
  • 2017-09-08 16:08:20 UTC New 1TB cinder volume attached to Rax ORD backup server and backups filesystem extended to include that space. This was done in response to a full filesystem. Backups should begin functioning again on the next pulse.
  • 2017-09-08 13:48:12 UTC nodepool issue related to bad images has been resolved, builds should be coming back online soon. Restarted gerrit due to reasons. Happy Friday.
  • 2017-09-08 10:48:41 UTC Our CI systems are experiencing a hiccup and no new jobs are being started. Please stay tuned and wait until this is resolved.
  • 2017-09-05 22:47:53 UTC logstash-worker16.o.o to logstash-worker20.o.o deleted in rackspace
  • 2017-09-04 19:18:46 UTC ubuntu-xenial nodepool-launcher (nl02.o.o) online
  • 2017-09-04 19:17:35 UTC logstash-worker16.o.o to logstash-worker20.o.o services stopped
  • 2017-08-29 18:00:17 UTC /etc/hosts on mirror.regionone.infracloud-vanilla.openstack.org has buildlogs.centos.org pinned to 38.110.33.4. This is temporary, to see if round-robin DNS is our issue when we proxy to buildlogs.centos.org
  • 2017-08-29 16:20:39 UTC replaced myself with clarkb at https://review.openstack.org/#/admin/groups/infra-ptl
  • 2017-08-28 12:11:46 UTC restarted ptgbot service on eavesdrop at 11:29 utc; was disconnected from freenode 2017-08-26 02:29 utc due to an irc ping timeout
  • 2017-08-24 16:00:19 UTC hound service on codesearch.o.o stopped / started to pick up new projects for indexing
  • 2017-08-23 23:17:52 UTC infracloud-vanilla is offline due to the keystone certificate expiring. this has also broken puppet-run-all on puppetmaster.
  • 2017-08-22 07:43:46 UTC Gerrit has been restarted successfully
  • 2017-08-22 07:37:59 UTC Gerrit is going to be restarted due to slow performance
  • 2017-08-17 16:10:10 UTC deleted mirror.mtl01.internap.openstack.org (internap -> inap rename)
  • 2017-08-17 04:21:46 UTC all RAX mirror hosts (iad, ord and dfw) migrated to new Xenial based hosts
  • 2017-08-16 23:43:32 UTC renamed nodepool internap provider to inap. new mirror server in use.
  • 2017-08-16 19:55:08 UTC zuul v3 executors ze02, ze03, ze04 are online
  • 2017-08-16 19:54:55 UTC zuul v2 launchers zl07, zl08, zl09 have been deleted due to reduced cloud capacity and to make way for zuul v3 executors
  • 2017-08-16 13:01:36 UTC trove configuration "sanity" created in rax dfw for mysql 5.7, setting our usual default overrides (wait_timeout=28800, character_set_server=utf8, collation_server=utf8_bin)
  • 2017-08-15 20:35:51 UTC created auto hold for gate-tripleo-ci-centos-7-containers-multinode to debug docker.io issues with reverse proxy
  • 2017-08-15 18:42:16 UTC mirror.sto2.citycloud.o.o DNS updated to 46.254.11.19 TTL 60
  • 2017-08-14 15:29:31 UTC mirror.kna1.citycloud.openstack.org DNS entry updated to 91.123.202.15
  • 2017-08-11 20:39:46 UTC created mirror.mtl01.inap.openstack.org to replace mirror.mtl01.internap.openstack.org (internap -> inap rename)
  • 2017-08-11 19:20:14 UTC The apps.openstack.org server has been stopped, snapshotted one last time, and deleted.
  • 2017-08-11 05:00:49 UTC restarted mirror.ord.rax.openstack.org per investigation in https://bugs.launchpad.net/openstack-gate/+bug/1708707 which suggested apache segfaults causing pypi download failures. Will monitor
  • 2017-08-10 23:47:46 UTC removed 8.8.8.8 dns servers from both infracloud-chocolate and infracloud-vanilla provider-subnet-infracloud subnet
  • 2017-08-10 20:03:12 UTC Image builds manually queued for centos-7, debian-jessie, fedora-25, fedora-26, opensuse-423, ubuntu-trusty and ubuntu-xenial to use latest glean (1.9.2)
  • 2017-08-10 19:50:10 UTC glean 1.9.2 released to properly support vfat configdrive labels
  • 2017-08-10 12:27:50 UTC mirror.lon1.citycloud.openstack.org migrated to a new compute node by Kim from citycloud. appears up. nodepool conf restored & nodepool.o.o taken out of emergency file
  • 2017-08-10 12:13:12 UTC nodepool in emergency file and citycloud-lon1 region commented out while we investigate issues with mirror
  • 2017-08-09 20:18:19 UTC OVH ticket 8344470555 has been opened to track voucher reinstatement/refresh
  • 2017-08-08 00:07:46 UTC Gerrit on review.openstack.org restarted just now, and is no longer using contact store functionality or configuration options
  • 2017-08-07 23:34:49 UTC The Gerrit service on review.openstack.org will be offline momentarily at 00:00 utc for a quick reconfiguration-related restart
  • 2017-08-07 16:38:16 UTC temporarily blocked 59.108.63.126 in iptables on static.openstack.org due to a denial of service condition involving tarballs.o.o/kolla/images/centos-source-registry-ocata.tar.gz
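    A minimal sketch of the kind of temporary iptables block described here (rule placement and later removal are assumptions):
      sudo iptables -I INPUT -s 59.108.63.126 -j DROP    # drop all traffic from the offending address
      sudo iptables -D INPUT -s 59.108.63.126 -j DROP    # delete the same rule once the condition clears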
  • 2017-08-04 20:37:45 UTC Gerrit is being restarted to pick up CSS changes and should be back momentarily
  • 2017-08-02 20:00:10 UTC OSIC environment is active in Nodepool and running jobs normally once more
  • 2017-08-02 17:29:57 UTC infracloud-vanilla back online
  • 2017-08-02 14:18:29 UTC mirror.regionone.infracloud-vanilla.openstack.org DNS updated to 15.184.65.187
  • 2017-08-02 13:59:00 UTC We have disabled infracloud-vanilla due to the compute host running mirror.regionone.infracloud-vanilla.o.o being offline. Please recheck your failed jobs to schedule them on another cloud.
  • 2017-08-01 23:49:09 UTC osic nodes have been removed from nodepool due to a problem with the mirror host beginning around 22:20 UTC. please recheck any jobs with failures installing packages.
  • 2017-08-01 22:16:19 UTC pypi mirror manually updated and released
  • 2017-08-01 21:28:46 UTC pypi mirrors have not updated since 2:15 UTC due to issue with pypi.python.org. reported issue, since corrected. mirror updates now in progress.
  • 2017-08-01 08:09:21 UTC Yolanda has started the nodepool-launcher process because it had been stopped for more than one hour
  • 2017-07-31 07:39:25 UTC Yolanda had to restart nodepool-launcher because VMs were not being spun up and the process had looked inactive for the last 90 minutes
  • 2017-07-28 17:14:32 UTC The Gerrit service on review.openstack.org is being taken offline for roughly 5 minutes to perform a database backup and reconfiguration
  • 2017-07-23 23:23:03 UTC Job triggering events between 21:00 and 23:15 UTC were lost, and any patch sets uploaded or approved during that timeframe will need rechecking or reapproval before their jobs will run
  • 2017-07-22 00:27:10 UTC restarted logstash and jenkins-log-worker-{A,B,C,D} services on all logstash-workerNN servers to get logs processing again
  • 2017-07-22 00:26:02 UTC manually expired old elasticsearch shards to get the cluster back into a sane state
  • 2017-07-21 19:24:23 UTC docs.o.o is up again, https://review.openstack.org/486196 fixes it - but needed manual applying since jobs depend on accessing docs.o.o
  • 2017-07-21 18:43:07 UTC kibana on logstash.o.o is currently missing entries past 21:25 utc yesterday
  • 2017-07-21 18:42:20 UTC elasticsearch02 has been hard-rebooted via nova after it hung at roughly 21:25 utc yesterday; elasticsearch service on elasticsearch05 also had to be manually started following a spontaneous reboot from 2017-07-14 01:39..18:27 (provider ticket from that date mentions an unresponsive hypervisor host); cluster is recovering now but kibana on logstash.o.o is currently missing entries past 21:25 utc yesterday
  • 2017-07-21 18:41:02 UTC docs.o.o is currently broken, we're investigating
  • 2017-07-21 17:07:30 UTC Restarting Gerrit for our weekly memory leak cleanup.
  • 2017-07-19 23:07:08 UTC restarted nodepool-launcher which had frozen (did not respond to SIGUSR2)
  • 2017-07-19 13:24:08 UTC the lists.o.o server is temporarily in emergency disable mode pending merger of https://review.openstack.org/484989
  • 2017-07-17 20:39:01 UTC /srv/static/tarballs/trove/images/ubuntu/mysql.qcow2 has been removed from static.openstack.org again
  • 2017-07-14 13:39:41 UTC deleted duplicate mirror.la1.citycloud and forced regeneration of dynamic inventory to get it to show up
  • 2017-07-13 19:09:37 UTC docs maintenance is complete and afsdb01 puppet and vos release cronjob have been reenabled
  • 2017-07-13 18:11:47 UTC puppet updates for afsdb01 have been temporarily suspended and its vos release cronjob disabled in preparation for manually reorganizing the docs volume
  • 2017-07-13 00:17:28 UTC zl08.o.o and zl09.o.o are now online and functional.
  • 2017-07-12 16:28:32 UTC both mirrors in infracloud-chocolate and infracloud-vanilla replaced with 250GB HDD mirror flavors now.
  • 2017-07-12 14:46:27 UTC DNS for mirror.regionone.infracloud-chocolate.openstack.org changed to 15.184.69.112, 60min TTL
  • 2017-07-12 13:22:05 UTC DNS for mirror.regionone.infracloud-vanilla.openstack.org changed to 15.184.66.172, 60min TTL
  • 2017-07-12 07:59:43 UTC Gerrit has been successfully restarted
  • 2017-07-12 07:51:20 UTC Gerrit is going to be restarted, due to low performance
  • 2017-07-12 06:53:30 UTC FYI, ask.openstack.org is down, review.o.o is slow - please have patience until this is fixed
  • 2017-07-11 18:00:06 UTC small hiccup in review-dev gerrit 2.13.8 -> 2.13.9 upgrade. Will be offline temporarily while we wait on puppet to curate lib installations
  • 2017-07-10 21:03:37 UTC 100gb cinder volume added and corresponding proxycache logical volume mounted at /var/cache/apache2 on mirrors for ca-ymq-1.vexxhost, dfw.rax, iad.rax, mtl01.internap, ord.rax, regionone.osic-cloud1
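    A rough sketch of how such a volume is typically attached and mounted as a proxy cache, assuming the new cinder device appears as /dev/xvdb (device, volume group, and logical volume names are assumptions):
      sudo pvcreate /dev/xvdb
      sudo vgcreate main /dev/xvdb                    # or vgextend if the volume group already exists
      sudo lvcreate -n proxycache -l 100%FREE main
      sudo mkfs.ext4 /dev/main/proxycache
      sudo mount /dev/main/proxycache /var/cache/apache2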
  • 2017-07-10 21:01:51 UTC zuul service on zuul.openstack.org restarted to clear memory utilization from slow leak
  • 2017-07-10 19:22:40 UTC similarly reinstalled tox on all other ubuntu-based zuul_nodes tracked in hiera (centos nodes seem to have been unaffected)
  • 2017-07-10 19:04:46 UTC reinstalled tox on proposal.slave.o.o using python 2.7, as it had defaulted to 3.4 at some point in the past (possibly related to the pip vs pip3 mixup last month)
  • 2017-07-10 17:01:01 UTC old mirror lv on static.o.o reclaimed to extend the tarballs lv by 150g
  • 2017-07-06 23:45:47 UTC nb03.openstack.org has been cleaned up and rebooted, and should return to building rotation
  • 2017-07-06 12:01:55 UTC docs.openstack.org is up again.
  • 2017-07-06 11:17:42 UTC docs.openstack.org has internal error (500). Fix is underway.
  • 2017-07-03 15:40:16 UTC "docs.openstack.org is working fine again; due to the move to a new location, each repo needs to merge one change to appear on docs.o.o"
  • 2017-07-03 15:26:19 UTC rebooting files01.openstack.org to clear up defunct apache2 zombies ignoring sigkill
  • 2017-07-03 15:21:17 UTC "We're experiencing a few problems with the reorg on docs.openstack.org and are looking into these..."
  • 2017-07-03 14:39:21 UTC We have switched now all docs publishing jobs to new documentation builds. For details see dhellmann's email http://lists.openstack.org/pipermail/openstack-dev/2017-July/119221.html . For problems, join us on #openstack-doc
  • 2017-07-01 00:33:44 UTC Reissued through June 2018 and manually tested all externally issued SSL/TLS certificates for our servers/services
  • 2017-06-29 18:03:43 UTC review-dev has been upgraded to gerrit 2.13.8. Please test behavior and functionality and note any abnormalities on https://etherpad.openstack.org/p/gerrit-2.13.-upgrade-steps
  • 2017-06-23 08:05:47 UTC ok git.openstack.org is working again, you can recheck failed jobs
  • 2017-06-23 06:06:21 UTC unknown issue with the git farm, everything broken - we're investigating
  • 2017-06-20 21:19:32 UTC The Gerrit service on review-dev.openstack.org is being taken offline for an upgrade to 2.13.7.4.988b40f
  • 2017-06-20 15:41:54 UTC Restarted the openstack-paste service on paste.openstack.org as the lodgeit runserver process was hung and unresponsive (required SIGTERM followed by SIGHUP before it would exit)
  • 2017-06-20 12:57:52 UTC restarting gerrit to address slowdown issues
  • 2017-06-18 21:29:58 UTC Image builds for ubuntu-trusty are paused and have been rolled back to yesterday until DNS issues can be unraveled
  • 2017-06-17 03:03:42 UTC zuulv3.o.o and ze01.o.o now using SSL/TLS for gearman operations
  • 2017-06-09 14:58:36 UTC The Gerrit service on review.openstack.org is being restarted now to clear an issue arising from an unanticipated SSH API connection flood
  • 2017-06-09 14:06:10 UTC Blocked 169.48.164.163 in iptables on review.o.o temporarily for excessive connection counts
  • 2017-06-07 20:40:18 UTC Blocked 60.251.195.198 in iptables on review.o.o temporarily for excessive connection counts
  • 2017-06-07 20:39:49 UTC Blocked 113.196.154.248 in iptables on review.o.o temporarily for excessive connection counts
  • 2017-06-07 20:07:25 UTC The Gerrit service on review.openstack.org is being restarted now to clear some excessive connection counts while we debug the intermittent request failures reported over the past few minutes
  • 2017-06-07 19:59:08 UTC Blocked 169.47.209.131, 169.47.209.133, 113.196.154.248 and 210.12.16.251 in iptables on review.o.o temporarily while debugging excessive connection counts
  • 2017-06-06 19:27:56 UTC both zuulv3.o.o and ze01.o.o are online and under puppet cfgmgmt
  • 2017-06-05 22:30:53 UTC Puppet updates are once again enabled for review-dev.openstack.org
  • 2017-06-05 14:37:25 UTC review-dev.openstack.org has been added to the emergency disable list for Puppet updates so additional trackingid entries can be tested there
  • 2017-06-01 14:35:16 UTC python-setuptools 36.0.1 has been released and is now making its way into jobs. Feel free to 'recheck' your failures. If you have any problems, please join #openstack-infra
  • 2017-06-01 09:46:17 UTC There is a known issue with setuptools 36.0.0 and errors about the "six" package. For current details see https://github.com/pypa/setuptools/issues/1042 and monitor #openstack-infra
  • 2017-05-27 12:05:22 UTC The Gerrit service on review.openstack.org is restarting to clear some hung API connections and should return to service momentarily.
  • 2017-05-26 20:58:41 UTC OpenStack general mailing list archives from Launchpad (July 2010 to July 2013) have been imported into the current general archive on lists.openstack.org.
  • 2017-05-26 09:57:14 UTC Free space for logs.openstack.org reached 40GiB, so an early log expiration run (45 days) is underway in a root screen session.
  • 2017-05-25 23:18:21 UTC The nodepool-dsvm jobs are failing for now, until we reimplement zookeeper handling in our devstack plugin
  • 2017-05-24 17:46:12 UTC nb03.o.o and nb04.o.o are online (upgraded to xenial). Will be waiting a day or 2 before deleting nb01.o.o and nb02.o.o.
  • 2017-05-24 14:52:39 UTC both nb01.o.o and nb02.o.o are stopped. This is to allow nb03.o.o to build todays images
  • 2017-05-24 04:10:31 UTC Sufficient free space has been reclaimed that jobs are passing again; any POST_FAILURE results can now be rechecked.
  • 2017-05-23 21:25:01 UTC The logserver has filled up, so jobs are currently aborting with POST_FAILURE results; remediation is underway.
  • 2017-05-23 14:04:47 UTC Disabled Gerrit account 10842 (Xiexianbin) for posting unrequested third-party CI results on changes
  • 2017-05-17 10:55:41 UTC gerrit is being restarted to help stuck git replication issues
  • 2017-05-15 07:02:20 UTC eavesdrop is up again, logs from Sunday 21:36 to Monday 7:01 are missing
  • 2017-05-15 06:42:55 UTC eavesdrop is currently not getting updated
  • 2017-05-12 13:39:24 UTC The Gerrit service on http://review.openstack.org is being restarted to address hung remote replication tasks.
  • 2017-05-11 18:42:55 UTC OpenID authentication through LP/UO SSO is working again
  • 2017-05-11 17:29:50 UTC The Launchpad/UbuntuOne SSO OpenID provider is offline, preventing logins to review.openstack.org, wiki.openstack.org, et cetera; ETA for fix is unknown
  • 2017-05-03 18:54:36 UTC Gerrit on review.openstack.org is being restarted to accommodate a memory leak in Gerrit. Service should return shortly.
  • 2017-05-01 18:15:44 UTC Upgraded wiki.openstack.org from MediaWiki 1.28.0 to 1.28.2 for CVE-2017-0372
  • 2017-04-27 17:52:33 UTC DNS has been updated for the new redirects added to static.openstack.org, moving them off old-wiki.openstack.org (which is now being taken offline)
  • 2017-04-25 15:52:41 UTC Released bindep 2.4.0
  • 2017-04-21 20:38:54 UTC Gerrit is back in service and generally usable, though remote Git replicas (git.openstack.org and github.com) will be stale for the next few hours until online reindexing completes
  • 2017-04-21 20:06:20 UTC Gerrit is offline briefly for scheduled maintenance http://lists.openstack.org/pipermail/openstack-dev/2017-April/115702.html
  • 2017-04-21 19:44:12 UTC Gerrit will be offline briefly starting at 20:00 for scheduled maintenance http://lists.openstack.org/pipermail/openstack-dev/2017-April/115702.html
  • 2017-04-18 21:51:51 UTC nodepool.o.o restarted to pick up https://review.openstack.org/#/c/455466/
  • 2017-04-14 17:23:54 UTC vos release npm.mirror --localauth currently running from screen in afsdb01
  • 2017-04-14 02:01:28 UTC wiki.o.o required a hard restart due to host issues following rackspace network maintenance
  • 2017-04-13 19:53:37 UTC The Gerrit service on http://review.openstack.org is being restarted to address hung remote replication tasks.
  • 2017-04-13 08:52:57 UTC zuul was restarted due to an unrecoverable disconnect from gerrit. If your change is missing a CI result and isn't listed in the pipelines on http://status.openstack.org/zuul/ , please recheck
  • 2017-04-12 21:27:31 UTC Restarting Gerrit for our weekly memory leak cleanup.
  • 2017-04-11 14:48:58 UTC we have rolled back centos-7, fedora-25 and ubuntu-xenial images to the previous days release. Feel free to recheck your jobs now.
  • 2017-04-11 14:28:32 UTC latest base images have mistakenly put python3 in some places expecting python2 causing widespread failure of docs patches - fixes are underway
  • 2017-04-11 02:17:51 UTC bindep 2.3.0 released to fix fedora 25 image issues
  • 2017-04-09 16:23:03 UTC lists.openstack.org is back online. Thanks for your patience.
  • 2017-04-09 15:18:22 UTC We are performing unscheduled maintenance on lists.openstack.org; the service is currently down. We'll post a follow up shortly
  • 2017-04-07 19:00:49 UTC ubuntu-precise has been removed from nodepool.o.o, thanks for the memories
  • 2017-04-06 15:00:18 UTC zuulv3 is offline awaiting a security update.
  • 2017-04-05 14:02:24 UTC git.openstack.org is synced up
  • 2017-04-05 12:53:14 UTC The Gerrit service on http://review.openstack.org is being restarted to address hung remote replication tasks, and should return to an operable state momentarily
  • 2017-04-05 11:16:06 UTC cgit.openstack.org is not up to date
  • 2017-04-04 16:13:40 UTC The openstackid-dev server has been temporarily rebuilt with a 15gb performance flavor in preparation for application load testing
  • 2017-04-01 13:29:37 UTC The http://logs.openstack.org/ site is back in operation; previous logs as well as any uploaded during the outage should be available again; jobs which failed with POST_FAILURE can also be safely rechecked.
  • 2017-03-31 21:52:06 UTC The upgrade maintenance for lists.openstack.org has been completed and it is back online.
  • 2017-03-31 20:00:04 UTC lists.openstack.org will be offline from 20:00 to 23:00 UTC for planned upgrade maintenance
  • 2017-03-31 08:27:06 UTC logs.openstack.org has corrupted disks, it's being repaired. Please avoid rechecking until this is fixed
  • 2017-03-31 07:46:38 UTC Jobs in gate are failing with POST_FAILURE. Infra roots are investigating
  • 2017-03-30 17:05:30 UTC The Gerrit service on review.openstack.org is being restarted briefly to relieve performance issues, and should return to service again momentarily.
  • 2017-03-29 18:47:18 UTC statusbot restarted since it seems to have fallen victim to a ping timeout (2017-03-26 20:55:32) and never realized it
  • 2017-03-23 19:13:06 UTC eavesdrop.o.o cinder volume rotated to avoid rackspace outage on Friday March 31 03:00-09:00 UTC
  • 2017-03-23 16:20:33 UTC Cinder volumes static.openstack.org/main08, eavesdrop.openstack.org/main01 and review-dev.openstack.org/main01 will lose connectivity Friday March 31 03:00-09:00 UTC unless replaced by Wednesday March 29.
  • 2017-03-21 08:43:22 UTC Wiki problems have been fixed, it's up and running
  • 2017-03-21 00:44:19 UTC LP bugs for monasca migrated to openstack/monasca-api in StoryBoard, defcore to openstack/defcore, refstack to openstack/refstack
  • 2017-03-16 15:59:20 UTC The Gerrit service on review.openstack.org is being restarted to address hung remote replication tasks, and should return to an operable state momentarily
  • 2017-03-16 11:49:38 UTC paste.openstack.org service is back up - turns out it was a networking issue, not a database issue. yay networks!
  • 2017-03-16 11:02:17 UTC paste.openstack.org is down, due to connectivity issues with backend database. support ticket has been created.
  • 2017-03-14 16:07:35 UTC Changes https://review.openstack.org/444323 and https://review.openstack.org/444342 have been approved, upgrading https://openstackid.org/ production to what's been running and tested on https://openstackid-dev.openstack.org/
  • 2017-03-14 13:55:27 UTC Gerrit has been successfully restarted
  • 2017-03-14 13:49:09 UTC Gerrit has been successfully restarted
  • 2017-03-14 13:42:50 UTC Gerrit is going to be restarted due to performance problems
  • 2017-03-14 04:22:30 UTC gerrit under load throwing 503 errors. Service restart fixed symptoms and appears to be running smoothly
  • 2017-03-13 17:46:25 UTC restarting gerrit to address performance problems
  • 2017-03-09 16:43:59 UTC nodepool-builder restarted on nb02.o.o after remounting /opt file system
  • 2017-03-07 15:59:57 UTC compute085.chocolate.ic.o.o back in service
  • 2017-03-07 15:46:03 UTC compute085.chocolate.ic.o.o currently disabled on controller00.chocolate.ic.o.o, investigating a failure with the neutron linuxbridge agent
  • 2017-03-06 21:33:48 UTC nova-compute for compute035.vanilla.ic.o.o has been disabled on controller.vanilla.ic.o.o. compute035.vanilla.ic.o.o appears to be having an HDD issue and is currently in read-only mode.
  • 2017-03-06 21:17:46 UTC restarting gerrit to address performance problems
  • 2017-03-04 14:36:00 UTC CORRECTION: The afs01.dfw.openstack.org/main01 volume has been successfully replaced by afs01.dfw.openstack.org/main04 and is therefore no longer impacted by the coming block storage maintenance.
  • 2017-03-04 13:35:22 UTC The afs01.dfw.openstack.org/main01 volume has been successfully replaced by review.openstack.org/main02 and is therefore no longer impacted by the coming block storage maintenance.
  • 2017-03-03 21:47:51 UTC The review.openstack.org/main01 volume has been successfully replaced by review.openstack.org/main02 and is therefore no longer impacted by the coming block storage maintenance.
  • 2017-03-03 16:39:58 UTC Upcoming provider maintenance 04:00-10:00 UTC Wednesday, March 8 impacting Cinder volumes for: afs01.dfw, nb02 and review
  • 2017-03-03 14:28:54 UTC integrated gate is blocked by job waiting for trusty-multinode node
  • 2017-03-01 14:26:12 UTC Provider maintenance resulted in loss of connectivity to the static.openstack.org/main06 block device taking our docs-draft logical volume offline; filesystem recovery has been completed and the volume brought back into service.
  • 2017-02-28 23:13:36 UTC manually installed paramiko 1.18.1 on nodepool.o.o and restarted nodepool (due to suspected bug related to https://github.com/paramiko/paramiko/issues/44 in 1.18.2)
  • 2017-02-28 13:45:41 UTC gerrit is back to normal and I don't know how to use the openstackstatus bot
  • 2017-02-28 13:39:11 UTC ok gerrit is back to normal
  • 2017-02-28 13:10:06 UTC restarting gerrit to address performance problems
  • 2017-02-23 14:40:37 UTC nodepool-builder (nb01.o.o / nb02.o.o) stopped again. As a result of zuulv3-dev.o.o usage of infra-chocolate, we are accumulating DIB images on disk
  • 2017-02-23 13:42:06 UTC The mirror update process has completed and resulting issue confirmed solved; any changes whose jobs failed on invalid qemu package dependencies can now be safely rechecked to obtain new results.
  • 2017-02-23 13:05:37 UTC Mirror update failures are causing some Ubuntu-based jobs to fail on invalid qemu package dependencies; the problem mirror is in the process of updating now, so this condition should clear shortly.
  • 2017-02-22 14:55:51 UTC Created Continuous Integration Tools Development in All-Projects.git (UI), added zuul gerrit user to the group.
  • 2017-02-17 19:05:17 UTC Restarting gerrit due to performance problems
  • 2017-02-17 07:48:00 UTC osic-cloud disabled again, see https://review.openstack.org/435250 for some background
  • 2017-02-16 21:37:58 UTC zuulv3-dev.o.o is now online. Zuul services are currently stopped.
  • 2017-02-16 18:19:17 UTC osic-cloud1 temporarily disabled. Currently waiting for the root cause of the networking issues.
  • 2017-02-15 23:18:25 UTC nl01.openstack.org (nodepool-launcher) is now online. Nodepool services are disabled.
  • 2017-02-15 20:58:25 UTC We're currently battling an increase in log volume which isn't leaving sufficient space for new jobs to upload logs and results in POST_FAILURE in those cases; recheck if necessary but keep spurious rebasing and rechecking to a minimum until we're in the clear.
  • 2017-02-14 23:08:17 UTC Hard rebooted mirror.ca-ymq-1.vexxhost.openstack.org because vgs was hanging indefinitely, impacting our ansible/puppet automation
  • 2017-02-13 17:20:54 UTC AFS replication issue has been addressed. Mirrors are currently re-syncing and coming back online.
  • 2017-02-13 15:51:28 UTC We are currently investigating an issue with our AFS mirrors which is causing some projects jobs to fail. We are working to correct the issue.
  • 2017-02-10 14:14:43 UTC The afs02.dfw.openstack.org/main02 volume in Rackspace DFW is expected to become unreachable between 04:00-10:00 UTC Sunday and may require corrective action on afs02.dfw.o.o as a result
  • 2017-02-10 14:12:44 UTC Rackspace will be performing Cinder maintenance in DFW from 04:00 UTC Saturday through 10:00 Sunday (two windows scheduled)
  • 2017-02-09 20:21:48 UTC Restarting gerrit due to performance problems
  • 2017-02-09 20:18:23 UTC Restarting gerrit due to performance problems
  • 2017-02-08 11:36:51 UTC The proposal node had disconnected from the static zuul-launcher. Restarting the launcher has restored connection and proposal jobs are running again
  • 2017-02-08 10:37:14 UTC post and periodic jobs are not running, seems proposal node is down
  • 2017-02-07 16:36:10 UTC restarted gerritbot since messages seemed to be going into a black hole
  • 2017-02-06 18:15:12 UTC rax notified us that the host groups.o.o is on was rebooted
  • 2017-02-04 17:44:18 UTC zuul-launchers restarted to pick up 428740
  • 2017-02-03 19:46:54 UTC elastic search delay (elastic-recheck) appears to have recovered. logstash daemon was stopped on logstash-workers, then started. Our logprocessors were also restarted
  • 2017-02-03 14:13:27 UTC static.o.o root partition was at 100%; deleted apache2 logs older than 5 days in /var/log/apache2 to free up space
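    A minimal sketch of the cleanup described above (the 5-day retention threshold is taken from the entry):
      sudo find /var/log/apache2 -type f -mtime +5 -delete    # remove apache logs older than 5 days
      df -h /                                                 # verify free space on the root partition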
  • 2017-02-02 22:53:06 UTC Restarting gerrit due to performance problems
  • 2017-01-30 21:12:09 UTC increased quota on AFS volume mirror.pypi from 500G to 1T
  • 2017-01-25 12:51:30 UTC Gerrit has been successfully restarted
  • 2017-01-25 12:48:18 UTC Gerrit is going to be restarted due to slow performance
  • 2017-01-24 18:16:30 UTC HTTPS cert and chain for zuul.openstack.org has been renewed and replaced.
  • 2017-01-24 18:16:22 UTC HTTPS cert and chain for ask.openstack.org has been renewed and replaced.
  • 2017-01-14 08:34:53 UTC OSIC cloud has been taken down temporarily, see https://review.openstack.org/420275
  • 2017-01-12 20:36:29 UTC Updated: Gerrit will be offline until 20:45 for scheduled maintenance (running longer than anticipated): http://lists.openstack.org/pipermail/openstack-dev/2017-January/109910.html
  • 2017-01-12 20:11:24 UTC Gerrit will be offline between now and 20:30 for scheduled maintenance: http://lists.openstack.org/pipermail/openstack-dev/2017-January/109910.html
  • 2017-01-12 17:41:11 UTC fedora (25) AFS mirror now online.
  • 2017-01-11 02:09:00 UTC manually disabled puppet ansible runs from puppetmaster.openstack.org in crontab due to CVE-2016-9587
  • 2017-01-11 02:08:10 UTC upgraded ansible on all zuul launchers due to CVE-2016-9587. see https://bugzilla.redhat.com/show_bug.cgi?id=1404378 and https://review.openstack.org/418636
  • 2017-01-10 20:14:26 UTC docs.openstack.org served from afs via files01.openstack.org
  • 2017-01-09 19:23:20 UTC Using `ironic node-set-maintenance $node off && ironic node-set-power-state $node reboot`, infracloud hypervisors that had disappeared were brought back to life. The mirror VM was then re-enabled with `openstack server set $vm_name active`.
  • 2017-01-09 15:09:23 UTC Nodepool use of Infra-cloud's chocolate region has been disabled with https://review.openstack.org/417904 while nova host issues impacting its mirror instance are investigated.
  • 2017-01-09 15:08:02 UTC All zuul-launcher services have been emergency restarted so that zuul.conf change https://review.openstack.org/417679 will take effect.
  • 2017-01-08 09:43:24 UTC AFS doc publishing is broken, we have read-only file systems.
  • 2017-01-07 01:03:27 UTC docs and docs.dev (developer.openstack.org) afs volumes now have read-only replicas in dfw and ord, and they are being served by files01.openstack.org. a script runs on afsdb01 every 5 minutes to release them if there are any changes.
  • 2017-01-04 22:18:51 UTC elasticsearch rolling upgrade to version 1.7.6 is complete and cluster is recovered
  • 2017-01-02 21:30:44 UTC logstash daemons were 'stuck' and have been restarted on the logstash-worker0X.o.o hosts. Events are being processed and indexed again as a result. Should probably look into upgrading the logstash install (and possibly elasticsearch).
  • 2016-12-29 11:11:50 UTC logs.openstack.org is up again. Feel free to recheck any failures.
  • 2016-12-29 08:20:50 UTC All CI tests are currently broken since logs.openstack.org is down. Refrain from recheck or approval until this is fixed.
  • 2016-12-29 03:00:42 UTC review.o.o (gerrit) restarted
  • 2016-12-21 18:00:07 UTC Gerrit is being restarted to update its OpenID SSO configuration
  • 2016-12-16 00:17:36 UTC nova services restarted on controller00.chocolate.ic.openstack.org to fix nodes failing to launch; unsure why this fixed our issue
  • 2016-12-14 23:06:05 UTC nb01.o.o and nb02.o.o added to emergency file on puppetmaster. To manually apply https://review.openstack.org/#/c/410988/
  • 2016-12-14 17:00:06 UTC nb01.o.o and nb02.o.o builders restarted and running from master again. nodepool.o.o did not restart, but /opt/nodepool is pointing to master branch
  • 2016-12-13 17:04:17 UTC Canonical admins have resolved the issue with login.launchpad.net, so authentication should be restored now.
  • 2016-12-13 16:27:33 UTC Launchpad SSO is not currently working, so logins to our services like review.openstack.org and wiki.openstack.org are failing; the admins at Canonical are looking into the issue but there is no estimated time for a fix yet.
  • 2016-12-12 15:08:04 UTC The Gerrit service on review.openstack.org is restarting now to address acute performance issues, and will be back online momentarily.
  • 2016-12-09 23:11:43 UTC manually ran "pip uninstall pyopenssl" on refstack.openstack.org to resolve a problem with requests/cryptography/pyopenssl/mod_wsgi
  • 2016-12-09 22:00:09 UTC elasticsearch has finished shard recovery and relocation. Cluster is now green
  • 2016-12-09 19:03:15 UTC launcher/deleter on nodepool.o.o are now running the zuulv3 branch. zookeeper based nodepool builders (nb01, nb02) are in production
  • 2016-12-09 18:57:39 UTC performed full elasticsearch cluster restart in an attempt to get it to fully recover and go green. Previously was yellow for days unable to initialize some replica shards. Recovery of shards in progress now.
  • 2016-12-08 19:48:07 UTC nb01.o.o / nb02.o.o removed from emergency file
  • 2016-12-08 19:16:13 UTC nb01.o.o / nb02.o.o added to emergency file on puppetmaster
  • 2016-12-07 19:00:56 UTC The zuul-launcher service on zlstatic01 has been restarted following application of fix https://review.openstack.org/408194
  • 2016-12-05 18:55:57 UTC Further project-config changes temporarily frozen for approval until xenial job cut-over changes merge, in an effort to avoid unnecessary merge conflicts.
  • 2016-11-30 16:43:16 UTC afs01.dfw.o.o / afs02.dfw.o.o /dev/mapper/main-vicepa increased to 3TB
  • 2016-11-24 14:49:29 UTC OpenStack CI is processing jobs again. Thanks to the Canadian admin "team" that had their Thanksgiving holiday already ;) Jobs are all enqueued, no need to recheck.
  • 2016-11-24 13:40:03 UTC OpenStack CI has taken a Thanksgiving break; no new jobs are currently launched. We're currently hoping for a friendly admin to come out of Thanksgiving and fix the system.
  • 2016-11-24 05:40:46 UTC The affected filesystems on the log server are repaired. Please leave 'recheck' comments on any changes which failed with POST_FAILURE.
  • 2016-11-24 00:14:50 UTC Due to a problem with the cinder volume backing the log server, jobs are failing with POST_FAILURE. Please avoid issuing 'recheck' commands until the issue is resolved.
  • 2016-11-23 22:56:05 UTC Configuration management updates are temporarily disabled for openstackid.org in preparation for validating change 399253.
  • 2016-11-23 22:56:01 UTC The affected filesystems on the log server are repaired. Please leave 'recheck' comments on any changes which failed with POST_FAILURE.
  • 2016-11-23 22:45:15 UTC This message is to inform you that your Cloud Block Storage device static.openstack.org/main05 has been returned to service.
  • 2016-11-23 21:11:19 UTC Due to a problem with the cinder volume backing the log server, jobs are failing with POST_FAILURE. Please avoid issuing 'recheck' commands until the issue is resolved.
  • 2016-11-23 20:57:14 UTC received at 20:41:09 UTC: This message is to inform you that our monitoring systems have detected a problem with the server which hosts your Cloud Block Storage device 'static.openstack.org/main05' at 20:41 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding the alert. Please do not access or modify 'static.openstack.org/main05' during this process.
  • 2016-11-22 21:12:27 UTC Gerrit is offline until 21:30 UTC for scheduled maintenance: http://lists.openstack.org/pipermail/openstack-dev/2016-November/107379.html
  • 2016-11-22 14:29:16 UTC rebooted ask.openstack.org for a kernel update
  • 2016-11-21 12:20:56 UTC We are currently having capacity issues with our ubuntu-xenial nodes. We have addressed the issue but it will be another few hours before new images have been uploaded to all cloud providers.
  • 2016-11-17 19:18:55 UTC zl04 is restarted now as well. This concludes the zuul launcher restarts for ansible synchronize logging workaround
  • 2016-11-17 19:06:28 UTC all zuul launchers except for zl04 restarted to pick up error logging fix for synchronize tasks. zl04 failed to stop and is being held aside for debugging purposes
  • 2016-11-15 18:58:00 UTC developer.openstack.org is now served from files.openstack.org
  • 2016-11-14 19:32:12 UTC Correction, https://review.openstack.org/396428 changes logs-DEV.openstack.org behavior, rewriting nonexistent files to their .gz compressed counterparts if available.
  • 2016-11-14 19:30:38 UTC https://review.openstack.org/396428 changes logs.openstack.org behavior, rewriting nonexistent files to their .gz compressed counterparts if available.
  • 2016-11-14 17:54:20 UTC Gerrit on review.o.o restarted to deal with GarbageCollection eating all the cpu. Previous restart was November 7th, so we lasted for one week.
  • 2016-11-11 18:43:11 UTC This message is to inform you that our monitoring systems have detected a problem with the server which hosts your Cloud Block Storage device 'wiki-dev.openstack.org/main01' at 18:27 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding the alert. Please do not access or modify 'wiki-dev.openstack.org/main01' during this process.
  • 2016-11-11 13:01:03 UTC Our OpenStack CI system is coming back online again. Thanks for your patience.
  • 2016-11-11 12:02:09 UTC Our OpenStack CI systems are stuck and no new jobs are submitted. Please do not recheck - and do not approve changes until this is fixed.
  • 2016-11-11 11:50:51 UTC nodepool/zuul look currently stuck, looks like no new jobs are started
  • 2016-11-10 17:09:24 UTC restarted all zuul-launchers to pick up https://review.openstack.org/394658
  • 2016-11-07 23:09:54 UTC removed the grafana keynote demo dashboard using curl -X DELETE http://grafyamlcreds@localhost:8080/api/dashboards/db/nodepool-new-clouds
  • 2016-11-07 08:47:58 UTC Gerrit is going to be restarted due to slowness and proxy errors
  • 2016-11-04 20:05:04 UTC The old phabricator demo server has been deleted.
  • 2016-11-04 20:04:39 UTC The old (smaller) review-dev server which was replaced in August has now been deleted.
  • 2016-11-02 14:47:47 UTC All hidden Gerrit groups owned by Administrators with no members or inclusions have been prefixed with "Unused-" for possible future (manual) deletion.
  • 2016-10-28 08:57:35 UTC restart apache2 on etherpad.o.o to clear out stale connections
  • 2016-10-27 11:23:46 UTC The nodepool-builder service on nodepool.o.o has been started again now that our keynote demo is complete.
  • 2016-10-26 05:42:32 UTC The Gerrit service on review.openstack.org is being restarted now to guard against potential performance issues later this week.
  • 2016-10-25 13:51:11 UTC The nodepool-builder process is intentionally stopped on nodepool.openstack.org and will be started again tomorrow after noon UTC.
  • 2016-10-21 20:44:36 UTC nodepool is in emergency file so that nodepool config can be more directly managed temporarily
  • 2016-10-20 18:10:09 UTC The Gerrit service on review.openstack.org is being restarted now in an attempt to resolve some mismatched merge states on a few changes, but should return momentarily.
  • 2016-10-20 17:26:37 UTC restarted ansible launchers with 2.5.2.dev31
  • 2016-10-18 23:42:50 UTC restarted logstash daemons as well to get logstash pipeline moving again. Appears they all went out to lunch for some reason (logstash logs not so great but they stopped reading from the tcp connection with log workers according to strace)
  • 2016-10-18 17:44:47 UTC logstash worker daemons restarted as they have all deadlocked. Proper fix in https://review.openstack.org/388122
  • 2016-10-18 16:12:40 UTC pycparser 2.16 released to fix assertion error from today.
  • 2016-10-18 14:06:54 UTC We are aware of pycparser failures in the gate and are working to address the issue.
  • 2016-10-12 21:33:19 UTC bandersnatch manually synced and mirror.pypi vos released to get around timeout on cron. Mirror appears to have reached steady state and should sync properly again.
  • 2016-10-11 02:49:34 UTC Jobs running on osic nodes are failing due to network issues with the mirror. We are temporarily disabling the cloud.
  • 2016-10-10 07:11:12 UTC Nodepool images can now be built for Gentoo as well - https://review.openstack.org/#/c/310865
  • 2016-10-07 16:46:26 UTC full sync of bandersnatch started, to pickup missing packages from AFS quota issue this morning
  • 2016-10-07 12:30:07 UTC mirror.pypi quota (AFS) bumped to 500GB (up from 400GB)
  • 2016-10-07 12:28:59 UTC mirror.pypi quota (AFS) bumped to 500MB (up from 400MB)
  • 2016-10-06 18:56:31 UTC nodepool is now running 3 separate daemons with configuration managed by puppet. If you can, always make sure there is a deleter running before we have a launcher, to avoid leaking nodes.
  • 2016-10-05 03:15:53 UTC X.509 certificate renewed and updated in private hiera for openstackid.org
  • 2016-10-04 14:02:29 UTC The Gerrit service on review.openstack.org is being restarted to address performance degradation and should return momentarily
  • 2016-09-29 15:01:26 UTC manually running log_archive_maintenance.sh to make room for logs on static.o.o
  • 2016-09-26 16:12:24 UTC Launchpad SSO logins are confirmed working correctly again
  • 2016-09-26 15:50:16 UTC gerrit login manually set to error page in apache config to avoid accidental account creation while lp sso is offline
  • 2016-09-26 15:50:13 UTC Launchpad SSO is offline, preventing login to https://review.openstack.org/, https://wiki.openstack.org/ and many other sites; no ETA has been provided by the LP admin team
  • 2016-09-26 15:44:08 UTC Earlier job failures for "zuul-cloner: error: too few arguments" should now be solved, and can safely be rechecked
  • 2016-09-26 15:37:34 UTC added review.openstack.org to emergency disabled file
  • 2016-09-26 15:28:35 UTC A 4gb swapfile has been added on cacti.openstack.org at /swap while we try to work out what flavor its replacement should run
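    A minimal sketch of adding a swap file like that (size and path from the entry; the exact commands used are an assumption):
        dd if=/dev/zero of=/swap bs=1M count=4096     # 4 GB file
        chmod 600 /swap
        mkswap /swap
        swapon /swap
        echo '/swap none swap sw 0 0' >> /etc/fstab   # keep it across reboots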
  • 2016-09-23 22:40:31 UTC mirror.iad.rax.openstack.org has been rebooted to restore sanity following connectivity issues to its cinder volume
  • 2016-09-22 14:50:38 UTC Rebooted wheel-mirror-centos-7-amd64.slave.openstack.org to clear persistent PAG creation error
  • 2016-09-22 04:44:55 UTC A bandersnatch update is running under a root screen session on mirror-update.openstack.org
  • 2016-09-21 13:44:26 UTC disabled apache2/puppetmaster processes on puppetmaster.openstack.org
  • 2016-09-20 14:44:06 UTC infra-cloud has been enabled again.
  • 2016-09-20 13:45:20 UTC OpenStack Infra now has a Twitter bot, follow it at https://twitter.com/openstackinfra
  • 2016-09-20 13:38:56 UTC infra-cloud temporarily taken off to debug some glance issues.
  • 2016-09-18 16:35:31 UTC The /srv/mediawiki filesystem for the production wiki site had communication errors, so has been manually put through an offline fsck and remounted again
  • 2016-09-13 17:12:12 UTC The Gerrit service on review.openstack.org is being restarted now to address current performance problems, but should return to a working state within a few minutes
  • 2016-09-09 16:59:50 UTC setuptools 27.1.2 addresses the circular import
  • 2016-09-09 15:56:05 UTC New setuptools release appears to have a circular import which is breaking many jobs - check for ImportError: cannot import name monkey.
  • 2016-09-08 01:26:02 UTC restarted nodepoold and nodepool builder to pick up a change that should prevent leaking images when we hit the 8 hour image timeout.
  • 2016-09-07 20:21:28 UTC controller00 of infracloud has been put on the emergency hosts list, as neutron debugging has been tweaked to investigate sporadic connection timeouts; please leave it as is until we see more errors in the logs
  • 2016-09-02 19:16:43 UTC Gerrit is completing an online re-index, you may encounter slowness until it is complete
  • 2016-09-02 18:07:50 UTC Gerrit is now going offline for maintenance, reserving a maintenance window through 22:00 UTC.
  • 2016-09-02 17:39:48 UTC The infrastructure team is taking Gerrit offline for maintenance, beginning shortly after 18:00 UTC for a potentially 4 hour maintenance window.
  • 2016-09-02 15:23:22 UTC The Gerrit service on review.openstack.org is restarting quickly to relieve resource pressure and restore normal performance
  • 2016-09-02 12:24:51 UTC restarted nodepool with the latest shade and nodepool changes. all looks well - floating-ips, images and flavors are not being hammered
  • 2016-09-02 05:38:24 UTC Space has been freed up on the log server. If you have POST_FAILURE results it is now safe to issue a 'recheck'
  • 2016-09-02 05:12:19 UTC The logs volume is full causing jobs to fail with POST_FAILURE. This is being worked on, please do not recheck until notified.
  • 2016-08-31 22:29:18 UTC that way the cloud8 people can work on getting the ips sorted in parallel
  • 2016-08-31 22:29:06 UTC in the meantime, it was suggested as a workaround to just use the cloud1 mirror since they're in the same data center by pointing the dns record there
  • 2016-08-31 22:28:50 UTC the networking in cloud8 is such that our mirror is behind the double nat - so our automation has no idea what the actual ip of the server is ... the cloud8 people are looking into fixing this, but there are things outside of their immediate control
  • 2016-08-29 17:43:00 UTC email sent to rackspace about rax-iad networking issue. The region is still disabled in nodepool
  • 2016-08-26 19:19:03 UTC restarted apache2 on health.o.o to remove a runaway apache process using all the cpu and memory. Looked like it may be related to mysql connection issues. DB currently looks happy.
  • 2016-08-25 23:20:30 UTC mirror.mtl01.internap.openstack.org now online
  • 2016-08-25 19:47:45 UTC The Gerrit service on review.openstack.org is restarting to implement some performance tuning adjustments, and should return to working order momentarily.
  • 2016-08-23 20:07:55 UTC mirror.regionone.osic-cloud1.openstack.org upgraded to support both ipv4 / ipv6. DNS has also been updated.
  • 2016-08-23 16:53:58 UTC The https://wiki.openstack.org/ site (temporarily hosted from wiki-upgrade-test.o.o) has been updated from Mediawiki 1.27.0 to 1.27.1 per https://lists.wikimedia.org/pipermail/mediawiki-announce/2016-August/000195.html
  • 2016-08-20 15:39:13 UTC The its-storyboard plugin has been enabled on review.openstack.org per http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-08-16-19.02.log.html#l-90
  • 2016-08-19 19:28:55 UTC nodepool.o.o added to emergency file on puppetmaster.o.o. So we can remove the ubuntu-xenial label from osic-cloud1
  • 2016-08-19 11:51:08 UTC OSIC has burned through the problematic IP range with failures, things should be back to normal now.
  • 2016-08-19 11:23:21 UTC DSVM jobs on OSIC currently failing because of IP collisions, fix is in the gate - https://review.openstack.org/#/c/357764/ - please hold rechecks until merged
  • 2016-08-19 11:18:22 UTC Precise tests on OSIC provider are currently failing, please stop your checks until the issue is resolved.
  • 2016-08-18 20:08:15 UTC mirror.nyj01.internap.openstack.org replacement server now online, DNS has been updated to 74.217.28.58
  • 2016-08-17 23:04:47 UTC osic-cloud8 credentials added to hieradata
  • 2016-08-17 19:46:43 UTC The volume for logs.openstack.org filled up rather suddenly, causing a number of jobs to fail with a POST_FAILURE result and no logs; we're manually expiring some logs now to buy breathing room, but any changes which hit that in the past few minutes will need to be rechecked and/or approved again
  • 2016-08-17 16:54:30 UTC tripleo-test-cloud-rh1 credentials updated on nodepool.o.o to use the openstackzuul project
  • 2016-08-17 02:37:29 UTC DNS for wiki.openstack.org currently goes to the wiki-upgrade-test.openstack.org server, as the former suffered a compromise due to missing iptables rules
  • 2016-08-15 22:45:15 UTC mirror.ord.rax.openstack.org upgraded to performance1-4 to address network bandwidth cap.
  • 2016-08-15 20:49:59 UTC gracefully restarting all zuul-launchers
  • 2016-08-15 20:34:14 UTC Installed ansible stable-2.1 branch on zuul launchers to pick up https://github.com/ansible/ansible/commit/d35377dac78a8fcc6e8acf0ffd92f47f44d70946
  • 2016-08-13 16:16:54 UTC The Gerrit service on review.openstack.org is online again
  • 2016-08-13 12:26:24 UTC gerrit is having issues ... it is being worked on, no ETA at the moment
  • 2016-08-12 23:09:05 UTC https://wiki.openstack.org/ is now running Mediawiki 1.27.0; please let us know in #openstack-infra if anything seems wrong
  • 2016-08-12 21:01:01 UTC The Mediawiki service at wiki.openstack.org will be offline from 21:00 UTC until approximately 23:00 UTC for a planned upgrade http://lists.openstack.org/pipermail/openstack-dev/2016-August/101395.html
  • 2016-08-12 20:51:18 UTC The Gerrit service on review.openstack.org is restarting for a scheduled upgrade, but should return to service momentarily: http://lists.openstack.org/pipermail/openstack-dev/2016-August/101394.html
  • 2016-08-12 18:36:06 UTC Added wiki.openstack.org to /etc/ansible/hosts/emergency on puppetmaster.openstack.org in preparation for 21:00 UTC upgrade maintenance
  • 2016-08-10 16:51:12 UTC nodepool-builder restarted on nodepool.o.o to pickup nodepool.yaml changes for bluebox-sjc1
  • 2016-08-10 05:26:14 UTC zuul is being restarted to reload configuration. Jobs should be re-enqueued but if you're missing anything (and it's not on http://status.openstack.org/zuul/) please issue a recheck in 30min.
  • 2016-08-08 08:40:29 UTC Gerrit is going to be restarted
  • 2016-08-02 23:50:13 UTC restarted zuul to clear geard function registration to fix inaccuracies with nodepool demand calculations
  • 2016-07-30 16:59:01 UTC Emergency filesystem repairs are complete; any changes which failed jobs with POST_FAILURE status or due to lack of access to tarballs can be safely rechecked now
  • 2016-07-30 14:25:39 UTC Cinder connectivity was lost to the volumes for sites served from static.openstack.org (logs, docs-draft, tarballs) and so they will remain offline until repairs are complete
  • 2016-07-30 10:00:23 UTC All jobs currently fail with POST_FAILURE
  • 2016-07-30 05:00:49 UTC zuul-launcher release ran on zl04-zl07; I've left the first 4 zuul-launchers as they were so we can debug the "too many ready nodes online" issue
  • 2016-07-29 16:47:09 UTC Our PyPI mirrors should be current again as of 16:10 UTC today
  • 2016-07-28 22:50:11 UTC performed full restart of elasticsearch cluster to get it indexing logs again.
  • 2016-07-27 21:26:43 UTC more carefully restarted logstash daemons again. Bigdesk reports significantly higher data transport rates indicating maybe it is happy now.
  • 2016-07-27 14:31:01 UTC auto-hold added to nodepool.o.o for gate-project-config-layout while we debug pypi mirror failures
  • 2016-07-27 13:54:13 UTC Gerrit is being restarted now to relieve performance degradation
  • 2016-07-27 04:19:26 UTC gate-tempest-dsvm-platform-fedora24 added to nodepool auto-hold to debug ansible failures
  • 2016-07-26 20:03:46 UTC restarted logstash worker and logstash indexer daemons to get logstash data flowing again.
  • 2016-07-22 15:29:01 UTC Up to one hour outage expected for static.openstack.org/main04 cinder volume on Saturday, July 30, starting at 08:00 UTC; log upload issues will probably break all CI jobs, and the filesystem will need remediation after the maintenance concludes
  • 2016-07-22 00:02:34 UTC gerrit/git gc change merged; gerrit and git.o.o repos should be gc'd at 04:07 UTC
  • 2016-07-21 00:00:31 UTC All file uploads are disabled on wiki.openstack.org by https://review.openstack.org/345100
  • 2016-07-20 20:07:42 UTC Wiki admins should watch https://wiki.openstack.org/w/index.php?title=Special%3AListUsers&username=&group=&creationSort=1&desc=1&limit=50 for signs of new accounts spamming (spot check linked "contribs" for them)
  • 2016-07-20 20:07:02 UTC New user account creation has been reenabled for the wiki by https://review.openstack.org/344502
  • 2016-07-19 20:20:36 UTC Puppet is reenabled on wiki.openstack.org, and is updating the page edit captcha from questy to recaptcha
  • 2016-07-16 17:34:08 UTC disabled "Microsoft Manila CI", account id 18128 because it was in a comment loop on change 294830
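    One way to deactivate a Gerrit account over the SSH admin interface; the admin user here is a placeholder and this may not be the exact method used:
        ssh -p 29418 admin@review.openstack.org gerrit set-account --inactive 18128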
  • 2016-07-15 14:19:47 UTC Gerrit is restarting to correct memory/performance issues.
  • 2016-07-12 01:11:05 UTC zlstatic01.o.o back online
  • 2016-07-11 23:51:57 UTC zlstatic01 in graceful mode
  • 2016-07-08 22:26:21 UTC manually downgraded elasticsearch-curator and ran it to clean out old indexes that were making cluster very slow and unhappy
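    curator automates the retention policy, but the effect is equivalent to deleting old date-suffixed indexes via the Elasticsearch REST API (host and index name are examples):
        curl -XDELETE 'http://localhost:9200/logstash-2016.06.01'
        curl -s 'http://localhost:9200/_cat/indices?v'   # check remaining indexes and cluster state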
  • 2016-07-08 21:51:39 UTC restarted logstash on logstash workers with some help from kill. The daemons were not processing events leading to the crazy logstash queue graphs and refused to restart normally.
  • 2016-07-08 16:38:05 UTC ran puppet on codesearch.openstack.org and manually restarted hound
  • 2016-07-06 06:29:08 UTC All python 3.5 jobs are failing today, we need to build new xenial images first.
  • 2016-07-05 18:15:59 UTC Job instability resulting from a block storage connectivity error on mirror.iad.rax.openstack.org has been corrected; jobs running in rax-iad should be more reliable again.
  • 2016-07-05 10:37:26 UTC we now have python35 jobs enabled
  • 2016-07-04 08:16:19 UTC setuptools 24.0.0 broke dsvm tests, we've gone back to old images, it's safe to recheck now if you had a failure related to setuptools 24.0.0 (processor_architecture) - see bug 1598525
  • 2016-07-04 00:56:10 UTC To work around the periodic group expansion issue causing puppet to run on hosts disabled in our groups.txt file in git, i have added the list of disabled hosts from it to the emergency disabled group on the puppetmaster for now
  • 2016-07-02 00:06:39 UTC Gerrit, Zuul and static.openstack.org now available following the scheduled maintenance window.
  • 2016-07-01 20:08:28 UTC Gerrit is offline for maintenance until approximately 22:00 UTC
  • 2016-07-01 19:54:58 UTC The infrastructure team is taking Gerrit offline for maintenance beginning shortly after 20:00 UTC to upgrade the Zuul and static.openstack.org servers. We aim to have it back online around 22:00 UTC.
  • 2016-06-30 16:22:04 UTC zlstatic01.o.o restarted to pick up zuul.NodeWorker.wheel-mirror-ubuntu-xenial-amd64.slave.openstack.org
  • 2016-06-29 21:30:29 UTC bindep 2.0.0 release and firefox/xvfb removal from bindep-fallback.txt should take effect in our next image update
  • 2016-06-29 18:59:30 UTC UCA AFS mirror online
  • 2016-06-29 18:29:58 UTC bindep 2.0.0 released
  • 2016-06-23 23:23:13 UTC https://github.com/Shrews/ansible-modules-core/commit/d11cb0d9a1c768735d9cb4b7acc32b971b524f13
  • 2016-06-23 23:22:23 UTC zuul launchers are all running locally patched ansible (source in ~root/ansible) to correct and/or further debug async timeout issue
  • 2016-06-22 22:09:48 UTC nodepool also supports auto-holding nodes for specific failed jobs (it will set the reason appropriately)
  • 2016-06-22 22:09:14 UTC nodepool now supports adding a reason when holding a node ("--reason <foo>"); please use that so that we can remember why they are held :)
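    Illustrative usage of the new flag (the node ID, job name, and reason text are made up):
        nodepool hold 1234567 --reason "debugging gate-tempest-dsvm-full timeout"
        nodepool list | grep -i hold   # held nodes stay out of the normal delete cycle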
  • 2016-06-21 16:07:09 UTC Gerrit is being restarted now to apply an emergency security-related configuration change
  • 2016-06-20 13:14:52 UTC OpenID logins are back to normal
  • 2016-06-20 13:01:26 UTC OpenID login from review.o.o is experiencing difficulties, possibly due to transatlantic network performance issues. Things are being investigated
  • 2016-06-20 10:40:50 UTC static.openstack.org is back up. If you have POST_FAILURE and are missing logs from your CI jobs, please leave a 'recheck'.
  • 2016-06-20 05:24:05 UTC static.openstack.org (which hosts logs.openstack.org and tarballs.openstack.org among others) is currently being rebuilt. As jobs can not upload logs they are failing with POST_FAILURE. This should be resolved soon. Please do not recheck until then.
  • 2016-06-20 03:11:54 UTC static.openstack.org (which hosts logs.openstack.org) is currently migrating due to a hardware failure. It should be back up shortly.
  • 2016-06-18 17:44:10 UTC zl01 restarted properly
  • 2016-06-18 17:21:20 UTC zl01 currently graceful restarting via 330184
  • 2016-06-18 16:38:42 UTC Gerrit is restarting now to relieve memory pressure and restore responsiveness
  • 2016-06-17 16:34:44 UTC zuul was restarted for a software upgrade; events between 16:08 and 16:30 were missed, please recheck any changes uploaded during that time
  • 2016-06-17 01:14:35 UTC follow-up mail about zuul-related changes: http://lists.openstack.org/pipermail/openstack-dev/2016-June/097595.html
  • 2016-06-16 23:56:49 UTC all jenkins servers have been deleted
  • 2016-06-16 22:43:06 UTC Jenkins is retired: http://lists.openstack.org/pipermail/openstack-dev/2016-June/097584.html
  • 2016-06-16 20:20:36 UTC zl05 - zl07 are in production; jenkins05 - jenkins07 are in prepare for shutdown mode pending decomissioning
  • 2016-06-15 18:52:04 UTC jenkins07 back online. Will manually clean up used nodes moving forward
  • 2016-06-15 18:40:21 UTC jenkins03 and jenkins04 are in prepare-for-shutdown mode in preparation for decomissioning
  • 2016-06-13 19:50:30 UTC zuul has been restarted with registration checks disabled -- we should no longer see NOT_REGISTERED errors after zuul restarts.
  • 2016-06-13 16:24:44 UTC jenkins02.openstack.org has been deleted
  • 2016-06-10 22:19:31 UTC jenkins02 is in prepare for shutdown in preparation for decomissioning
  • 2016-06-10 06:31:03 UTC All translation imports have broken UTF-8 encoding.
  • 2016-06-09 20:07:08 UTC jenkins.o.o is in prepare-for-shutdown in preparation for decomissioning. zlstatic01.openstack.org is running and attached to its workers instead.
  • 2016-06-09 17:42:26 UTC deleted jenkins01.openstack.org
  • 2016-06-08 18:12:10 UTC Zuul has been restarted to correct an error condition. Events since 17:30 may have been missed; please 'recheck' your changes if they were uploaded since then, or have "NOT_REGISTERED" errors.
  • 2016-06-08 00:24:27 UTC nodepool.o.o restarted to pick up review 326114
  • 2016-06-07 23:25:57 UTC jenkins01 is in prepare-for-shutdown mode in preparation for decommissioning.
  • 2016-06-07 08:13:44 UTC dib gate for project-config is fixed again with https://review.openstack.org/326273 merged.
  • 2016-06-07 07:12:13 UTC All project-config jobs fail - the dib gate is broken.
  • 2016-06-06 18:09:46 UTC zl01.openstack.org in production
  • 2016-06-04 01:23:46 UTC Gerrit maintenance concluded successfully
  • 2016-06-04 00:08:07 UTC Gerrit is offline for maintenance until 01:45 UTC (new ETA)
  • 2016-06-03 20:12:32 UTC Gerrit is offline for maintenance until 00:00 UTC
  • 2016-06-03 20:00:59 UTC The infrastructure team is taking Gerrit offline for maintenance this afternoon, beginning shortly after 20:00 UTC. We aim to have it back online around 00:00 UTC.
  • 2016-06-03 14:02:43 UTC The earlier block storage disruption on static.openstack.org has been repaired and cleaned up; any jobs which reported an "UNSTABLE" result or linked to missing logs between 08:00-14:00 UTC can be retriggered by leaving a "recheck" comment.
  • 2016-06-03 11:44:18 UTC CI is experiencing issues with test logs, all jobs are currently UNSTABLE as a result. No need to recheck until this is fixed! Thanks for your patience.
  • 2016-06-03 10:11:14 UTC CI is experiencing issues with test logs, all jobs are currently UNSTABLE as a result. No need to recheck until this is fixed! Thanks for your patience.
  • 2016-06-03 09:38:30 UTC CI is experiencing issues with test logs, all jobs are currently UNSTABLE as a result. No need to recheck until this is fixed! Thanks for your patience.
  • 2016-06-02 01:09:39 UTC nodepool.o.o restarted to fix jenkins01.o.o (wasn't launching jobs)
  • 2016-06-01 23:08:46 UTC zl01.openstack.org is back in production handling a portion of the job load
  • 2016-05-30 14:18:17 UTC openstack-meetbot back online, there was an issue with DNS.
  • 2016-05-30 13:13:53 UTC Statusbot has been restarted (no activity since 27/05)
  • 2016-05-27 23:00:57 UTC eavesdrop.o.o upgraded to ubuntu-trusty and online!
  • 2016-05-27 22:23:22 UTC statusbot back online
  • 2016-05-27 19:33:52 UTC elasticsearch07.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 18:59:58 UTC logstash.openstack.org upgraded to ubuntu trusty
  • 2016-05-27 18:51:40 UTC elasticsearch06.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 18:06:09 UTC jenkins06.o.o back online
  • 2016-05-27 17:58:01 UTC jenkins05.o.o back online
  • 2016-05-27 17:47:38 UTC elasticsearch05.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 17:24:17 UTC elasticsearch04.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 16:43:54 UTC elasticsearch03.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 16:20:16 UTC elasticsearch02.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 13:32:30 UTC nodepoold restarted to address zmq issue with jenkins02 and jenkins06
  • 2016-05-27 07:15:08 UTC zuul required a restart due to network outages. If your change is not listed on http://status.openstack.org/zuul/ and is missing results, please issue a 'recheck'.
  • 2016-05-27 03:23:11 UTC after a quick check, gerrit and its filesystem have been brought back online and should be working again
  • 2016-05-27 03:03:41 UTC Gerrit is going offline briefly to check possible filesystem corruption
  • 2016-05-27 00:48:13 UTC logstash-worker20.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-27 00:32:59 UTC logstash-worker19.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-27 00:18:55 UTC logstash-worker18.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-27 00:10:33 UTC puppetmaster.o.o removed from emergency file since OSIC is now back online
  • 2016-05-27 00:01:23 UTC logstash-worker17.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 23:29:01 UTC logstash-worker16.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 23:01:18 UTC logstash-worker15.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 22:33:27 UTC logstash-worker14.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 22:17:25 UTC zl01 removed from production
  • 2016-05-26 22:12:26 UTC logstash-worker13.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 21:59:00 UTC paste.openstack.org now running ubuntu-trusty and successfully responding to requests
  • 2016-05-26 21:43:25 UTC zuul launcher zl01.openstack.org is in production (handling load in parallel with jenkins)
  • 2016-05-26 21:05:10 UTC logstash-worker12.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 20:57:15 UTC puppet disabled on puppetmaster (for the puppetmaster host itself -- not globally) and OSIC manually removed from clouds.yaml, because OSIC is down which is causing the ansible openstack inventory to fail
  • 2016-05-26 20:21:28 UTC osic appears down at the moment. Following up with #osic for information
  • 2016-05-26 19:45:23 UTC logstash-worker11.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 18:36:29 UTC logstash-worker10.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 18:23:09 UTC logstash-worker09.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 18:11:40 UTC logstash-worker08.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 18:00:29 UTC logstash-worker07.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 17:47:15 UTC logstash-worker06.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 17:00:59 UTC logstash-worker05.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 16:26:21 UTC logstash-worker04.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 16:11:10 UTC logstash-worker03.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 15:50:12 UTC logstash-worker02.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 15:28:49 UTC logstash-worker01.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-25 21:05:16 UTC zuul has been restarted with a change that records and reports estimated job durations internally. job times will be under-estimated until zuul builds up its internal database
  • 2016-05-25 20:35:42 UTC status.o.o has been upgraded to ubuntu trusty
  • 2016-05-25 18:42:28 UTC storyboard.o.o has been upgraded to ubuntu trusty
  • 2016-05-25 18:42:06 UTC graphite.o.o has been upgraded to ubuntu trusty
  • 2016-05-24 22:28:54 UTC graphite.o.o is currently down, we have an open ticket with RAX regarding the detaching of cinder volumes. 160524-dfw-0003689
  • 2016-05-24 20:23:55 UTC zuul-dev.openstack.org now running on ubuntu-trusty
  • 2016-05-24 19:34:00 UTC zm08.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 19:19:27 UTC zm07.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 19:09:49 UTC zm06.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 18:53:11 UTC zm05.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 18:30:51 UTC zm04.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 18:12:01 UTC zm03.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 17:52:34 UTC zm02.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 17:32:53 UTC zm01.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 13:21:46 UTC nodepoold restarted to pick up new version of shade / clean-floating-ips
  • 2016-05-23 17:46:37 UTC changed cacti.openstack.org IP address (for upgrade to trusty); gap in data around this time while iptables updates everywhere to allow snmp
  • 2016-05-20 13:40:03 UTC I've stopped jenkins01.o.o, it doesn't appear to be working properly. Nodes attach to jenkins but are not launched by nodepool. I believe zl01 might be the issue
  • 2016-05-18 20:12:03 UTC ran restart_jenkins_masters.yaml on jenkins02.o.o
  • 2016-05-18 01:47:59 UTC Gerrit is about to be restarted to help with page timeouts
  • 2016-05-18 01:28:06 UTC ovh-bhs1 has been down for the better part of the last 12 hours. See http://paste.openstack.org/show/497434/ for info about the exception
  • 2016-05-18 00:55:21 UTC nodepool restarted to pickup clean-floating-ips patch
  • 2016-05-13 09:04:38 UTC tripleo-f22 nodes slowly coming online now in nodepool
  • 2016-05-13 08:32:35 UTC tripleo-test-cloud-rh1 added back to nodepool.o.o; however, it is currently having issues launching tripleo-f22 nodes. The TripleO CI team should be looking into it
  • 2016-05-13 07:03:46 UTC Removed nodepool.o.o from emergency file on puppetmaster.o.o
  • 2016-05-11 21:56:35 UTC nodepool restarted to pickup https://review.openstack.org/#/c/294339/
  • 2016-05-11 18:47:20 UTC npm mirror sync finished; lock is released
  • 2016-05-11 16:16:27 UTC all afs mirror volumes have been moved to afs01.dfw and afs02.dfw (so they are no longer in ord) to speed up vos release times. all are in regular service using read-only replicas except for npm.
  • 2016-05-11 12:00:27 UTC We have a workaround for our mirrors to attempt to translate package names if a match isn't immediately obvious. A more complete fix is yet to come. It is now safe to 'recheck' any jobs that failed due to "No matching distribution found". Please join #openstack-infra if you discover more problems.
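    The breakage came from pip 8.1.2 requesting PEP 503 normalized project names; the normalization rule itself is simply the following (a shell illustration of the rule, not the actual mirror workaround):
        # runs of '.', '_' and '-' collapse to a single '-', and the result is lowercased
        echo "oslo.config" | sed -E 's/[-_.]+/-/g' | tr '[:upper:]' '[:lower:]'   # -> oslo-config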
  • 2016-05-11 07:08:56 UTC pip 8.1.2 broke our local python mirror, some jobs will fail with "No matching distribution found". We're investigating. Do not "recheck" until the issue is solved
  • 2016-05-10 17:11:59 UTC created afs02.dfw.openstack.org fileserver
  • 2016-05-10 16:14:42 UTC afs update: the vos release -force completed in just under 59 hours, so i followed up with a normal vos release (no -force) thereafter to make sure it will complete without error now. it's been running for ~4.5 hours so far
  • 2016-05-09 12:54:07 UTC released bandersnatch lock on mirror-update.o.o to resume bandersnatch updates
  • 2016-05-07 23:21:13 UTC vos release of mirror.pypi is running with -force this time, under the usual root screen session on afs01.dfw.openstack.org
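    Rough shape of running that long release inside a detachable screen session (flags per the entry; -verbose is optional):
        screen -S pypi-release            # detach with Ctrl-a d, reattach later with: screen -r pypi-release
        vos release mirror.pypi -force -verbose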
  • 2016-05-06 23:57:19 UTC the Review-MySQL trove instance has now been expanded to 50gb (19% full) and /home/gerrit2 on review.openstack.org increased to 200gb (47% full)
  • 2016-05-06 19:06:56 UTC opened support ticket 160506-iad-0001201 for Review-MySQL trove instance taking >3 hours (so far) to resize its backing volume
  • 2016-05-06 16:56:58 UTC osic-cloud1 is coming back online. Thanks for the help #osic
  • 2016-05-06 16:46:35 UTC osic-cloud1 is down at the moment, #osic is looking into the issue. Will update shortly.
  • 2016-05-06 16:02:59 UTC OSIC leaked 21 FIPs; they have been deleted manually.
  • 2016-05-06 15:43:14 UTC the current 100gb /home/gerrit2 on review.openstack.org is 95% full, so i've added a new 200gb ssd volume to review.o.o as a replacement for the current 100gb ssd volume. once i'm comfortable that things are still stable after the trove volume resize, i'll pvmove the extents from the old cinder volume to the new one and then extend the lv/fs to 200gb
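    A sketch of that plan using hypothetical device and volume-group names (the real names on review.o.o may differ):
        pvcreate /dev/xvdc                  # new 200 GB cinder volume
        vgextend main /dev/xvdc             # add it to the volume group
        pvmove /dev/xvdb /dev/xvdc          # migrate extents off the old 100 GB volume
        vgreduce main /dev/xvdb             # release the old PV so its cinder volume can be detached
        lvextend -L 200G /dev/main/gerrit2  # grow the logical volume to 200 GB
        resize2fs /dev/main/gerrit2         # grow the (assumed ext4) filesystem online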
  • 2016-05-06 15:42:37 UTC the trove instance for review.openstack.org was 10gb and 90% full, so i'm upping it to 50gb (which is supposed to be a non-impacting online operation)
  • 2016-05-06 14:47:31 UTC Zuul has been restarted. As a result, we only preserved patches in the gate queue. Be sure to recheck your patches in gerrit if needed.
  • 2016-05-06 14:17:04 UTC Zuul is currently recovering from a large number of changes, it will take a few hours until your job is processed. Please have patience and enjoy a great weekend!
  • 2016-05-05 20:30:54 UTC Gerrit is restarting to revert incorrect changes to test result displays
  • 2016-05-05 19:22:43 UTC Gerrit is restarting to address performance issues related to a suspected memory leak
  • 2016-05-03 20:38:56 UTC through some careful scripting (which involved apache reconfiguration to stop holding an open file lock) i offlined the tarballs volume on static.openstack.org to repair its filesystem so it could be remounted read-write
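    Simplified sketch of that repair; the device and mount point are assumptions, and per the entry the real procedure reconfigured apache rather than stopping it outright:
        service apache2 stop                # or reconfigure it to drop the open file lock
        umount /srv/static/tarballs
        fsck.ext4 -y /dev/main/tarballs
        mount /srv/static/tarballs          # assumes an fstab entry; now read-write again
        service apache2 start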
  • 2016-05-03 20:28:58 UTC restarting apache on review.openstack.org to pick up security patches. Gerrit web ui may disappear for a short time.
  • 2016-05-03 09:24:59 UTC Docs-draft filesystem has been restored. Please check your affected jobs again
  • 2016-05-03 08:36:36 UTC The filesystem on docs-draft.openstack.org is broken; we are in the process of repairing it. Please hold rechecks of jobs using this filesystem until further notice
  • 2016-05-03 08:27:24 UTC Logs filesystem has been successfully restored, please recheck your jobs
  • 2016-05-03 06:47:23 UTC The filesystem on logs.openstack.org is broken; we are in the process of repairing it. Please hold rechecks of your jobs until further notice
  • 2016-05-03 00:37:42 UTC gerrit configuration update blocked on failing beaker tests due to missing bouncycastle releases; job being made nonvoting in https://review.openstack.org/311898
  • 2016-05-02 23:47:45 UTC due to an error in https://review.openstack.org/295530 which will be corrected in https://review.openstack.org/311888 gerrit should not be restarted until the second change lands
  • 2016-05-02 21:51:56 UTC manual vos release of pypi mirror started in screen on fileserver; see https://etherpad.openstack.org/p/fix-afs
  • 2016-05-02 15:19:44 UTC steps to fix the pypi mirror problem in progress: https://etherpad.openstack.org/p/fix-afs
  • 2016-05-02 06:53:53 UTC AFS mirrors are not publishing; they have been getting stuck on vos release since 29 April
  • 2016-04-22 15:03:19 UTC Log server was repaired as of 10:50 UTC and jobs have been stable since. If necessary, please recheck changes that have 'UNSTABLE' results.
  • 2016-04-22 10:54:56 UTC Log server has been repaired and jobs are stable again. If necessary please recheck changes that have 'UNSTABLE' results.
  • 2016-04-22 07:32:05 UTC Logs are failing to be uploaded causing jobs to be marked as UNSTABLE. We are working on repairing the log filesystem and will update when ready. Please do not recheck before then.
  • 2016-04-21 12:49:48 UTC OVH provider is enabled again, please wait for the job queue to be processed
  • 2016-04-21 10:38:33 UTC OVH servers are down, we are working to solve it. This will cause the job queue to be processed slowly; please have patience.
  • 2016-04-19 13:41:32 UTC We have recovered one of our cloud providers, but there is a huge backlog of jobs to process. Please have patience until your jobs are processed
  • 2016-04-15 09:51:47 UTC Zuul and gerrit are working normally now. Please recheck any jobs that may have been affected by this failure.
  • 2016-04-15 09:23:40 UTC No jobs are being processed by gerrit and zuul. We are working to solve the problem, please be aware that no changes have been sent to the queue in the last hour, so you will need to recheck jobs for that period.
  • 2016-04-15 09:06:29 UTC Gerrit is going to be restarted because it is not processing new changes
  • 2016-04-11 21:08:40 UTC Gerrit move maintenance completed successfully; note that DNS has been updated to new IP addresses as indicated in http://lists.openstack.org/pipermail/openstack-dev/2016-April/091274.html
  • 2016-04-11 20:08:57 UTC Gerrit is offline until 21:00 UTC for a server replacement http://lists.openstack.org/pipermail/openstack-dev/2016-April/091274.html
  • 2016-04-11 19:51:50 UTC Gerrit will be offline from 20:00 to 21:00 UTC (starting 10 minutes from now) for a server replacement http://lists.openstack.org/pipermail/openstack-dev/2016-April/091274.html
  • 2016-04-11 16:20:17 UTC Reminder, Gerrit will be offline from 20:00 to 21:00 UTC for a server replacement http://lists.openstack.org/pipermail/openstack-dev/2016-April/091274.html
  • 2016-04-07 08:36:04 UTC jobs depending on npm are now working again
  • 2016-04-06 10:20:39 UTC npm lint jobs are failing due to a problem with npm registry. The problem is under investigation, and we will update once the issue is solved.
  • 2016-04-05 20:01:57 UTC ubuntu xenial mirrors now online.
  • 2016-04-05 14:51:52 UTC dns for openstackid.org has been changed from 2001:4800:7817:102:be76:4eff:fe05:d9cd and 23.253.97.70 (openstackid 1.0.17 on ubuntu precise) to 2001:4800:7815:101:be76:4eff:fe04:7741 and 23.253.243.97 (openstackid 1.0.18 on ubuntu trusty). record ttls remain 300s for now
  • 2016-04-05 13:04:10 UTC jenkins06.o.o back online, appears to have run out of RAM
  • 2016-04-04 07:15:37 UTC Gerrit is going to be restarted due to bad performance
  • 2016-03-31 19:56:01 UTC Any jobs which erroneously failed on missing traceroute packages should be safe to recheck now
  • 2016-03-31 17:49:51 UTC Job failures for missing traceroute packages are in the process of being fixed now, ETA 30 minutes to effectiveness for new jobs
  • 2016-03-30 11:15:35 UTC Gate on project-config is currently broken due to IRC tests. The problem has been detected and we are working to fix the issue as soon as possible.
  • 2016-03-28 15:22:43 UTC Gerrit is restarting on review.openstack.org in an attempt to address an issue reading an object from the ec2-api repository
  • 2016-03-24 17:08:05 UTC restarted gerrit to address GC issue
  • 2016-03-21 14:59:32 UTC Rackspace has opened support tickets warning of disruptive maintenance March 22 05:00-07:00 UTC, March 24 03:00 to 07:00 UTC, and March 25 02:00 to 06:00 UTC which could impact network connectivity including disconnecting from Trove databases and Cinder block devices
  • 2016-03-19 22:25:25 UTC Gerrit is restarting to address performance issues
  • 2016-03-15 15:33:38 UTC Launchpad SSO is back to normal - happy hacking
  • 2016-03-15 15:00:29 UTC Launchpad OpenID SSO is currently experiencing issues preventing login. The Launchpad team is working on the issue
  • 2016-03-15 11:37:22 UTC Gerrit had to be restarted because it was not responsive. As a consequence, some of the test results have been lost, from 09:30 UTC to 11:30 UTC approximately. Please recheck any jobs affected by this problem.
  • 2016-03-15 11:34:39 UTC Gerrit had to be restarted because it was not responsive. As a consequence, some of the test results have been lost, from 08:30 UTC to 10:30 UTC approximately. Please recheck any jobs affected by this problem.
  • 2016-03-15 11:15:09 UTC Gerrit is going to be restarted
  • 2016-03-11 11:01:42 UTC Gerrit has been restarted successfully
  • 2016-03-11 10:56:07 UTC Gerrit is going to be restarted due to bad performance
  • 2016-03-07 07:25:45 UTC gerrit is going to be restarted due to bad performance
  • 2016-03-04 11:25:20 UTC testing status bot
  • 2016-03-01 10:45:18 UTC gerrit finished restarting
  • 2016-03-01 10:39:09 UTC Gerrit is going to be restarted due to poor performance
  • 2016-02-29 12:07:53 UTC Infra currently has a long backlog. Please be patient and where possible avoid rechecks while it catches up.
  • 2016-02-19 08:35:19 UTC Gerrit is going to be restarted due to performance problems
  • 2016-02-17 06:50:39 UTC A problem with the mirror used for CI jobs in the rax-iad region has been corrected. Please recheck changes that recently failed jobs on nodes in rax-iad.
  • 2016-02-13 17:42:02 UTC Gerrit is back up
  • 2016-02-13 15:11:57 UTC Gerrit is offline for filesystem repair
  • 2016-02-13 00:23:22 UTC Gerrit is offline for maintenance, ETA updated to 01:00 utc
  • 2016-02-12 23:43:30 UTC Gerrit is offline for maintenance, ETA updated to 23:59 utc
  • 2016-02-12 23:08:44 UTC Gerrit is offline for maintenance, ETA updated to 23:30 utc
  • 2016-02-12 22:07:37 UTC Gerrit is offline for maintenance until 23:00 utc
  • 2016-02-12 21:47:47 UTC The infrastructure team is taking gerrit offline for maintenance this afternoon, beginning at 22:00 utc. We should have it back online around 23:00 utc. http://lists.openstack.org/pipermail/openstack-dev/2016-February/086195.html
  • 2016-02-09 17:25:39 UTC Gerrit is restarting now, to alleviate current performance impact and WebUI errors.
  • 2016-02-03 12:41:39 UTC Infra running with lower capacity now, due to a temporary problem affecting one of our nodepool providers. Please expect some delays in your jobs. Apologies for any inconvenience caused.
  • 2016-01-30 09:23:17 UTC Testing status command
  • 2016-01-22 17:52:01 UTC Restarting zuul due to a memory leak
  • 2016-01-20 11:56:15 UTC Restart done, review.openstack.org is available
  • 2016-01-20 11:45:12 UTC review.openstack.org is being restarted to apply patches
  • 2016-01-18 16:50:38 UTC Gerrit is restarting quickly as a workaround for performance degradation
  • 2016-01-11 22:06:57 UTC Gerrit is restarting to resolve java memory issues
  • 2015-12-17 16:43:53 UTC Zuul is moving in very slow motion since roughly 13:30 UTC; the Infra team is investigating.
  • 2015-12-16 21:02:59 UTC Gerrit has been upgraded to 2.11. Please report any issues in #openstack-infra as soon as possible.
  • 2015-12-16 17:07:00 UTC Gerrit is offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html
  • 2015-12-16 16:21:49 UTC Gerrit will be offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html
  • 2015-12-04 16:55:08 UTC The earlier JJB bug which disrupted tox-based job configurations has been reverted and applied; jobs seem to be running successfully for the past two hours.
  • 2015-12-04 09:32:24 UTC Tox tests are broken at the moment. From openstack-infra we are working to fix them. Please don't approve changes until we notify that tox tests work again.
  • 2015-11-06 20:04:47 UTC Gerrit is offline until 20:15 UTC today for scheduled project rename maintenance
  • 2015-11-06 19:41:20 UTC Gerrit will be offline at 20:00-20:15 UTC today (starting 20 minutes from now) for scheduled project rename maintenance
  • 2015-10-27 06:32:40 UTC CI will be disrupted for an indeterminate period while our service provider reboots systems for a security fix
  • 2015-10-17 18:40:01 UTC Gerrit is back online. Github transfers are in progress and should be complete by 1900 UTC.
  • 2015-10-17 18:03:25 UTC Gerrit is offline for project renames.
  • 2015-10-17 17:11:10 UTC Gerrit will be offline for project renames starting at 1800 UTC.
  • 2015-10-13 11:19:47 UTC Gerrit has been restarted and is responding to normal load again.
  • 2015-10-13 09:44:48 UTC gerrit is undergoing an emergency restart to investigate load issues
  • 2015-10-05 14:03:13 UTC Gerrit was restarted to temporarily address performance problems
  • 2015-09-17 10:16:42 UTC Gate back to normal, thanks to the blacklisting of the problematic version
  • 2015-09-17 08:02:50 UTC Gate is currently stuck, failing grenade upgrade tests due to the release of oslo.utils 1.4.1 for Juno.
  • 2015-09-11 23:04:39 UTC Gerrit is offline from 23:00 to 23:30 UTC while some projects are renamed. http://lists.openstack.org/pipermail/openstack-dev/2015-September/074235.html
  • 2015-09-11 22:32:57 UTC 30 minute warning, Gerrit will be offline from 23:00 to 23:30 UTC while some projects are renamed http://lists.openstack.org/pipermail/openstack-dev/2015-September/074235.html
  • 2015-08-31 20:27:18 UTC puppet agent temporarily disabled on nodepool.openstack.org to avoid accidental upgrade to python-glanceclient 1.0.0
  • 2015-08-26 15:45:47 UTC restarting gerrit due to a slow memory leak
  • 2015-08-17 10:50:24 UTC Gerrit restart has resolved the issue and systems are back up and functioning
  • 2015-08-17 10:23:42 UTC review.openstack.org (aka gerrit) is going down for an emergency restart
  • 2015-08-17 07:07:38 UTC Gerrit is currently under very high load and may be unresponsive. infra are looking into the issue.
  • 2015-08-12 00:06:30 UTC Zuul was restarted due to an error; events (such as approvals or new patchsets) since 23:01 UTC have been lost and affected changes will need to be rechecked
  • 2015-08-05 21:11:30 UTC Correction: change events between 20:50-20:54 UTC (during the restart only) have been lost and will need to be rechecked or their approvals reapplied to trigger testing.
  • 2015-08-05 21:06:19 UTC Zuul has been restarted to resolve a reconfiguration failure: previously running jobs have been reenqueued but change events between 19:50-20:54 UTC have been lost and will need to be rechecked or their approvals reapplied to trigger testing.
  • 2015-08-03 13:41:37 UTC The Gerrit service on review.openstack.org has been restarted in an attempt to improve performance.
  • 2015-07-30 09:01:49 UTC CI is back online but has a huge backlog. Please be patient and if possible delay approving changes until it has caught up.
  • 2015-07-30 07:52:49 UTC CI system is broken and very far behind. Please do not approve any changes for a while.
  • 2015-07-30 07:43:12 UTC Our CI system is broken again today, jobs are not getting processed at all.
  • 2015-07-29 13:27:42 UTC zuul jobs after about 07:00 UTC may need a 'recheck' to enter the queue. Look if your change is in http://status.openstack.org/zuul/ and recheck if not.
  • 2015-07-29 12:52:20 UTC zuul's disks were at capacity. Space has been freed up and jobs are being re-queued.
  • 2015-07-29 09:30:59 UTC Currently our CI system is broken, jobs are not getting processed at all.
  • 2015-07-28 08:04:50 UTC zuul has been restarted and queues restored. It may take some time to work through the backlog.
  • 2015-07-28 06:48:20 UTC zuul is stuck and about to undergo an emergency restart, please be patient as job results may take a long time
  • 2015-07-22 14:35:43 UTC CI is slowly recovering, please be patient while the backlog is worked through.
  • 2015-07-22 14:17:30 UTC CI is currently recovering from an outage overnight. It is safe to recheck results with NOT_REGISTERED errors. It may take some time for zuul to work through the backlog.
  • 2015-07-22 08:16:50 UTC zuul jobs are currently stuck while problems with gearman are debugged
  • 2015-07-22 07:24:43 UTC zuul is undergoing an emergency restart. Jobs will be re-queued but some events may be lost.
  • 2015-07-10 22:00:47 UTC Gerrit is unavailable from approximately 22:00 to 22:30 UTC for project renames
  • 2015-07-10 21:04:01 UTC Gerrit will be unavailable from 22:00 to 22:30 UTC for project renames
  • 2015-07-03 19:33:46 UTC etherpad.openstack.org is still offline for scheduled database maintenance, ETA 19:45 UTC
  • 2015-07-03 19:05:45 UTC etherpad.openstack.org is offline for scheduled database maintenance, ETA 19:30 UTC
  • 2015-06-30 14:56:00 UTC The log volume was repaired and brought back online at 14:00 UTC. Log links today from before that time may be missing, and changes should be rechecked if fresh job logs are desired for them.
  • 2015-06-30 08:50:29 UTC OpenStack CI is down due to hard drive failures
  • 2015-06-12 22:45:07 UTC Gerrit is back online. Zuul reconfiguration for renamed projects is still in progress, ETA 23:30.
  • 2015-06-12 22:10:50 UTC Gerrit is offline for project renames. ETA 22:40
  • 2015-06-12 22:06:20 UTC Gerrit is offline for project renames. ETA 20:30
  • 2015-06-12 21:45:26 UTC Gerrit will be offline for project renames between 22:00 and 22:30 UTC
  • 2015-06-11 21:08:10 UTC Gerrit has been restarted to terminate a persistent looping third-party CI bot
  • 2015-06-04 18:43:17 UTC Gerrit has been restarted to clear an issue with its event stream. Any change events between 17:25 and 18:38 UTC should be rechecked or have their approvals reapplied to initiate testing.
  • 2015-05-13 23:00:05 UTC Gerrit and Zuul are back online.
  • 2015-05-13 22:42:09 UTC Gerrit and Zuul are going offline for reboots to fix a security vulnerability.
  • 2015-05-12 00:58:04 UTC Gerrit has been downgraded to version 2.8 due to the issues observed today. Please report further problems in #openstack-infra.
  • 2015-05-11 23:56:14 UTC Gerrit is going offline while we perform an emergency downgrade to version 2.8.
  • 2015-05-11 17:40:47 UTC We have discovered post-upgrade issues with Gerrit affecting nova (and potentially other projects). Some changes will not appear and some actions, such as queries, may return an error. We are continuing to investigate.
  • 2015-05-09 18:32:43 UTC Gerrit upgrade completed; please report problems in #openstack-infra
  • 2015-05-09 16:03:24 UTC Gerrit is offline from 16:00-20:00 UTC to upgrade to version 2.10.
  • 2015-05-09 15:18:16 UTC Gerrit will be offline from 1600-2000 UTC while it is upgraded to version 2.10
  • 2015-05-06 00:43:52 UTC Restarted gerrit due to stuck stream-events connections. Events since 23:49 were missed and changes uploaded since then will need to be rechecked.
  • 2015-05-05 17:05:25 UTC zuul has been restarted to troubleshoot an issue, gerrit events between 15:00-17:00 utc were lost and changes updated or approved during that time will need to be rechecked or have their approval votes readded to trigger testing
  • 2015-04-29 14:06:55 UTC gerrit has been restarted to clear a stuck events queue. any change events between 13:29-14:05 utc should be rechecked or have their approval votes reapplied to trigger jobs
  • 2015-04-28 15:38:04 UTC gerrit has been restarted to clear an issue with its event stream. any change events between 14:43-15:30 utc should be rechecked or have their approval votes reapplied to trigger jobs
  • 2015-04-28 12:43:46 UTC Gate is experiencing epic failures due to issues with mirrors, work is underway to mitigate and return to normal levels of sanity
  • 2015-04-27 13:48:14 UTC gerrit has been restarted to clear a problem with its event stream. change events between 13:09 and 13:36 utc should be rechecked or have approval votes reapplied as needed to trigger jobs
  • 2015-04-27 08:11:05 UTC Restarting gerrit because it stopped sending events (ETA 15 mins)
  • 2015-04-22 17:33:33 UTC gerrit is restarting to clear hung stream-events tasks. any review events between 16:48 and 17:32 utc will need to be rechecked or have their approval votes reapplied to trigger testing in zuul
  • 2015-04-18 15:11:25 UTC Gerrit is offline for emergency maintenance, ETA 15:30 UTC to completion
  • 2015-04-18 14:32:11 UTC Gerrit will be offline between 15:00-15:30 UTC today for emergency maintenance (starting half an hour from now)
  • 2015-04-18 14:02:07 UTC Gerrit will be offline between 15:00-15:30 UTC today for emergency maintenance (starting an hour from now)
  • 2015-04-18 02:29:15 UTC gerrit is undergoing a quick-ish restart to implement a debugging patch. should be back up in ~10 minutes. apologies for any inconvenience
  • 2015-04-17 23:07:06 UTC Gerrit is available again.
  • 2015-04-17 22:09:51 UTC Gerrit is unavailable until 23:59 UTC for project renames and a database update.
  • 2015-04-17 22:05:40 UTC Gerrit is unavailable until 23:59 UTC for project renames and a database update.
  • 2015-04-17 21:05:41 UTC Gerrit will be unavailable between 22:00 and 23:59 UTC for project renames and a database update.
  • 2015-04-16 19:48:11 UTC gerrit has been restarted to clear a problem with its event stream. any gerrit changes updated or approved between 19:14 and 19:46 utc will need to be rechecked or have their approval reapplied for zuul to pick them up
  • 2015-04-15 18:27:55 UTC Gerrit has been restarted. New patches, approvals, and rechecks between 17:30 and 18:20 UTC may have been missed by Zuul and will need rechecks or new approvals added.
  • 2015-04-15 18:05:15 UTC Gerrit has stopped emitting events so Zuul is not alerted to changes. We will restart Gerrit shortly to correct the problem.
  • 2015-04-10 15:45:54 UTC gerrit has been restarted to address a hung event stream. change events between 15:00 and 15:43 utc which were lost will need to be rechecked or have approval workflow votes reapplied for zuul to act on them
  • 2015-04-06 11:40:08 UTC gerrit has been restarted to restore event streaming. any change events missed by zuul (between 10:56 and 11:37 utc) will need to be rechecked or have new approval votes set
  • 2015-04-01 13:29:44 UTC gerrit has been restarted to restore event streaming. any change events missed by zuul (between 12:48 and 13:28 utc) will need to be rechecked or have new approval votes set
  • 2015-03-31 11:51:33 UTC Check/Gate unstuck, feel free to recheck your abusively-failed changes.
  • 2015-03-31 08:55:59 UTC CI Check/Gate pipelines currently stuck due to a bad dependency creeping in the system. No need to recheck your patches at the moment.
  • 2015-03-27 22:06:32 UTC Gerrit is offline for maintenance, ETA 22:30 UTC http://lists.openstack.org/pipermail/openstack-dev/2015-March/059948.html
  • 2015-03-27 21:02:04 UTC Gerrit maintenance commences in 1 hour at 22:00 UTC http://lists.openstack.org/pipermail/openstack-dev/2015-March/059948.html
  • 2015-03-26 13:13:33 UTC gerrit stopped emitting stream events around 11:30 utc and has now been restarted. please recheck any changes currently missing results from jenkins
  • 2015-03-21 16:07:02 UTC Gerrit is back online
  • 2015-03-21 15:08:01 UTC Gerrit is offline for scheduled maintenance || http://lists.openstack.org/pipermail/openstack-infra/2015-March/002540.html
  • 2015-03-21 14:54:23 UTC Gerrit will be offline starting at 1500 UTC for scheduled maintenance
  • 2015-03-04 17:17:49 UTC Issue solved, gate slowly digesting accumulated changes
  • 2015-03-04 08:32:42 UTC Zuul check queue stuck due to reboot maintenance window at one of our cloud providers - no need to recheck changes at the moment, they won't move forward.
  • 2015-01-30 19:32:23 UTC Gerrit is back online
  • 2015-01-30 19:10:04 UTC Gerrit and Zuul are offline until 1930 UTC for project renames
  • 2015-01-30 18:43:57 UTC Gerrit and Zuul will be offline from 1900 to 1930 UTC for project renames
  • 2015-01-30 16:15:03 UTC zuul is running again and changes have been re-enqueued. see http://status.openstack.org/zuul/ before rechecking if in doubt
  • 2015-01-30 14:26:56 UTC zuul isn't running jobs since ~10:30 utc, investigation underway
  • 2015-01-27 17:54:45 UTC Gerrit and Zuul will be offline for a few minutes for a security update
  • 2015-01-20 19:54:47 UTC Gerrit restarted to address likely memory leak leading to server slowness. Sorry if you were caught in the restart
  • 2015-01-09 18:59:29 UTC paste.openstack.org is going offline for a database migration (duration: ~2 minutes)
  • 2014-12-06 16:06:03 UTC gerrit will be offline for 30 minutes while we rename a few projects. eta 16:30 utc
  • 2014-12-06 15:21:31 UTC [reminder] gerrit will be offline for 30 minutes starting at 16:00 utc for project renames
  • 2014-11-22 00:33:53 UTC Gating and log storage offline due to block device error. Recovery in progress, ETA unknown.
  • 2014-11-21 21:46:58 UTC gating is going offline while we deal with a broken block device, eta unknown
  • 2014-10-29 20:58:17 UTC Restarting gerrit to get fixed CI javascript
  • 2014-10-20 21:22:38 UTC Zuul erroneously marked some changes as having merge conflicts. Those changes have been added to the check queue to be rechecked and will be automatically updated when complete.
  • 2014-10-17 21:27:06 UTC Gerrit is back online
  • 2014-10-17 21:04:39 UTC Gerrit is offline from 2100-2130 for project renames
  • 2014-10-17 20:35:12 UTC Gerrit will be offline from 2100-2130 for project renames
  • 2014-10-17 17:04:01 UTC upgraded wiki.openstack.org from Mediawiki 1.24wmf19 to 1.25wmf4 per http://ci.openstack.org/wiki.html
  • 2014-10-16 16:20:43 UTC An error in a configuration change to mitigate the poodle vulnerability caused a brief outage of git.openstack.org from 16:06-16:12. The problem has been corrected and git.openstack.org is working again.
  • 2014-09-24 21:59:06 UTC The openstack-infra/config repo will be frozen for project-configuration changes starting at 00:01 UTC. If you have a pending configuration change that has not merged or is not in the queue, please see us in #openstack-infra.
  • 2014-09-24 13:43:48 UTC removed 79 disassociated floating ips in hpcloud
  • 2014-09-22 15:52:51 UTC removed 431 disassociated floating ips in hpcloud
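    Roughly how leaked floating IPs like these can be found and released with today's openstack CLI (illustrative only; the 2014 cleanups predate this client):
        # list floating IPs with no fixed IP (i.e. not associated with anything) and delete them
        openstack floating ip list -f value -c ID -c "Fixed IP Address" \
          | awk '$2 == "None" {print $1}' \
          | xargs -r -n1 openstack floating ip delete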
  • 2014-09-22 15:52:23 UTC killed bandersnatch process on pypi.region-b.geo-1.openstack.org, hung since 2014-09-18 22:45 due to https://bitbucket.org/pypa/bandersnatch/issue/52
  • 2014-09-22 15:51:21 UTC restarted gerritbot to get it to rejoin channels
  • 2014-09-19 20:53:18 UTC Gerrit is back online
  • 2014-09-19 20:17:08 UTC Gerrit will be offline from 20:30 to 20:50 UTC for project renames
  • 2014-09-16 13:38:01 UTC jenkins ran out of jvm memory on jenkins06 at 01:42:20 http://paste.openstack.org/show/112155/
  • 2014-09-14 18:13:14 UTC all our pypi mirrors failed to update urllib3 properly, full mirror refresh underway now to correct, eta 20:00 utc
  • 2014-09-13 15:10:34 UTC shutting down all irc bots now to change their passwords (per the wallops a few minutes ago, everyone should do the same)
  • 2014-09-13 14:54:19 UTC rebooted puppetmaster.openstack.org due to out-of-memory condition
  • 2014-08-30 16:08:43 UTC Gerrit is offline for project renaming maintenance, ETA 1630
  • 2014-08-25 17:12:51 UTC restarted gerritbot
  • 2014-08-16 16:30:38 UTC Gerrit is offline for project renames. ETA 1645.
  • 2014-07-26 18:28:21 UTC Zuul has been restarted to move it beyond a change it was failing to report on
  • 2014-07-23 22:08:12 UTC zuul is working through a backlog of jobs due to an earlier problem with nodepool
  • 2014-07-23 20:42:47 UTC nodepool is unable to build test nodes so check and gate tests are delayed
  • 2014-07-15 18:23:58 UTC python2.6 jobs are failing due to bug 1342262 "virtualenv>=1.9.1 not found" A fix is out but there are still nodes built on the old stale images
  • 2014-06-28 14:40:16 UTC Gerrit will be offline from 1500-1515 UTC for project renames
  • 2014-06-15 15:30:13 UTC Launchpad is OK - statusbot lost the old channel statuses. They will need to be manually restored
  • 2014-06-15 02:32:57 UTC launchpad openid is down. login to openstack services will fail until launchpad openid is happy again
  • 2014-06-02 14:17:51 UTC setuptools issue was fixed in upstream in 3.7.1 and 4.0.1, please, recheck on bug 1325514
  • 2014-06-02 08:33:19 UTC setuptools upstream has broken the world. it's a known issue. we're hoping that a solution materializes soon
  • 2014-05-29 20:41:04 UTC Gerrit is back online
  • 2014-05-29 20:22:30 UTC Gerrit is going offline to correct an issue with a recent project rename. ETA 20:45 UTC.
  • 2014-05-28 00:08:31 UTC zuul is using a manually installed "gear" library with the timeout and logging changes
  • 2014-05-27 22:11:41 UTC Zuul is started and processing changes that were in the queue when it was stopped. Changes uploaded or approved since then will need to be re-approved or rechecked.
  • 2014-05-27 21:34:45 UTC Zuul is offline due to an operational issue; ETA 2200 UTC.
  • 2014-05-26 22:31:12 UTC stopping gerrit briefly to rebuild its search index in an attempt to fix post-rename oddities (will update with notices every 10 minutes until completed)
  • 2014-05-23 21:36:49 UTC Gerrit is offline in order to rename some projects. ETA: 22:00.
  • 2014-05-23 20:34:36 UTC Gerrit will be offline for about 20 minutes in order to rename some projects starting at 21:00 UTC.
  • 2014-05-09 16:44:31 UTC New contributors can't complete enrollment due to https://launchpad.net/bugs/1317957 (Gerrit is having trouble reaching the Foundation Member system)
  • 2014-05-07 13:12:58 UTC Zuul is processing changes now; some results were lost. Use "recheck bug 1317089" if needed.
  • 2014-05-07 13:04:11 UTC Zuul is stuck due to earlier networking issues with Gerrit server, work in progress.
  • 2014-05-02 23:27:29 UTC paste.openstack.org is going down for a short database upgrade
  • 2014-05-02 22:00:08 UTC Zuul is being restarted with some dependency upgrades and configuration changes; ETA 2215
  • 2014-05-01 00:06:18 UTC the gate is still fairly backed up, though nodepool is back on track and chipping away at remaining changes. some py3k/pypy node starvation is slowing recovery
  • 2014-04-30 20:26:57 UTC the gate is backed up due to broken nodepool images, fix in progress (eta 22:00 utc)
  • 2014-04-28 19:33:21 UTC Gerrit upgrade to 2.8 complete. See: https://wiki.openstack.org/wiki/GerritUpgrade Some cleanup tasks still ongoing; join #openstack-infra if you have any questions.
  • 2014-04-28 16:38:31 UTC Gerrit is unavailable until further notice for a major upgrade. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-28 15:31:50 UTC Gerrit downtime for upgrade begins in 30 minutes. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-28 14:31:51 UTC Gerrit downtime for upgrade begins in 90 minutes. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-25 20:59:57 UTC Gerrit will be unavailable for a few hours starting at 1600 UTC on Monday April 28th for an upgrade. See https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-25 17:17:55 UTC Gerrit will be unavailable for a few hours starting at 1600 UTC on Monday April 28th for an upgrade. See https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-16 00:00:14 UTC Restarting gerrit really quick to fix replication issue
  • 2014-04-08 01:33:50 UTC All services should be back up
  • 2014-04-08 00:22:30 UTC All of the project infrastructure hosts are being restarted for security updates.
  • 2014-03-25 13:30:44 UTC the issue with gerrit cleared on its own before any corrective action was taken
  • 2014-03-25 13:22:16 UTC the gerrit event stream is currently hung, blocking all testing. troubleshooting is in progress (next update at 14:00 utc)
  • 2014-03-12 12:24:44 UTC gerrit on review.openstack.org is down for maintenance (revised eta to resume is 13:00 utc)
  • 2014-03-12 12:07:18 UTC gerrit on review.openstack.org is down for maintenance (eta to resume is 12:30 utc)
  • 2014-03-12 11:28:08 UTC test/gate jobs are queuing now in preparation for gerrit maintenance at 12:00 utc (eta to resume is 12:30 utc)
  • 2014-02-26 22:25:55 UTC gerrit service on review.openstack.org will be down momentarily for another brief restart--apologies for the disruption
  • 2014-02-26 22:13:11 UTC gerrit service on review.openstack.org will be down momentarily for a restart to add an additional git server
  • 2014-02-21 17:36:50 UTC Git-related build issues should be resolved. If your job failed with no build output, use "recheck bug 1282876".
  • 2014-02-21 16:34:23 UTC Some builds are failing due to errors in worker images; fix eta 1700 UTC.
  • 2014-02-20 23:41:09 UTC A transient error caused Zuul to report jobs as LOST; if you were affected, leave a comment with "recheck no bug"
  • 2014-02-18 23:33:18 UTC Gerrit login issues should be resolved.
  • 2014-02-13 22:35:01 UTC restarting zuul for a configuration change
  • 2014-02-10 16:21:11 UTC jobs are running for changes again, but there's a bit of a backlog so it will still probably take a few hours for everything to catch up
  • 2014-02-10 15:16:33 UTC the gate is experiencing delays due to nodepool resource issues (fix in progress, eta 16:00 utc)
  • 2014-02-07 20:10:08 UTC Gerrit and Zuul are offline for project renames. ETA 20:30 UTC.
  • 2014-02-07 18:59:03 UTC Zuul is now in queue-only mode preparing for project renames at 20:00 UTC
  • 2014-02-07 17:35:36 UTC Gerrit and Zuul going offline at 20:00 UTC for ~15mins for project renames
  • 2014-02-07 17:34:07 UTC Gerrit and Zuul going offline at 20:00 UTC for ~15mins for project renames
  • 2014-01-29 17:09:18 UTC the gate is merging changes again... issues with tox/virtualenv versions can be rechecked or reverified against bug 1274135
  • 2014-01-29 14:37:42 UTC most tests are failing as a result of new tox and testtools releases (bug 1274135, in progress)
  • 2014-01-29 14:25:35 UTC most tests are failing as a result of new tox and testtools releases--investigation in progress
  • 2014-01-24 21:55:40 UTC Zuul is restarting to pick up a bug fix
  • 2014-01-24 21:39:11 UTC Zuul is ignoring some enqueue events; fix in progress
  • 2014-01-24 16:13:31 UTC restarted gerritbot because it seemed to be on the wrong side of a netsplit
  • 2014-01-23 23:51:14 UTC Zuul is being restarted for an upgrade
  • 2014-01-22 20:51:44 UTC Zuul is about to restart for an upgrade; changes will be re-enqueued
  • 2014-01-17 19:13:32 UTC zuul.openstack.org underwent maintenance today from 16:50 to 19:00 UTC; any changes approved during that window should be reapproved so they are added to the gate, and new patchsets uploaded during that window should be rechecked (no bug) if test results are desired
  • 2014-01-14 12:29:06 UTC Gate currently blocked due to slave node exhaustion
  • 2014-01-07 16:47:29 UTC unit tests seem to be passing consistently after the upgrade. use bug 1266711 for related rechecks
  • 2014-01-07 14:51:19 UTC working on undoing the accidental libvirt upgrade which is causing nova and keystone unit test failures (ETA 15:30 UTC)
  • 2014-01-06 21:20:09 UTC gracefully stopping jenkins01 now. it has many nodes in offline status and only a handful online, while nodepool thinks it has ~90 available to run jobs
  • 2014-01-06 19:37:28 UTC gracefully stopping jenkins02 now. it has many nodes in offline status and only a handful online, while nodepool thinks it has ~75 available to run jobs
  • 2014-01-06 19:36:12 UTC gating is operating at reduced capacity while we work through a systems problem (ETA 21:00 UTC)
  • 2014-01-03 00:13:32 UTC see: https://etherpad.openstack.org/p/pip1.5Upgrade
  • 2014-01-02 17:07:54 UTC gating is severely hampered while we attempt to sort out the impact of today's pip 1.5/virtualenv 1.11 releases... no ETA for solution yet
  • 2014-01-02 16:58:35 UTC gating is severely hampered while we attempt to sort out the impact of the pip 1.5 release... no ETA for solution yet
  • 2013-12-24 06:11:50 UTC fix for grenade euca/bundle failures is in the gate. changes failing on those issues in the past 7 hours should be rechecked or reverified against bug 1263824
  • 2013-12-24 05:31:47 UTC gating is currently wedged by consistent grenade job failures--proposed fix is being confirmed now--eta 06:00 utc
  • 2013-12-13 17:21:56 UTC restarted gerritbot
  • 2013-12-11 21:35:29 UTC test
  • 2013-12-11 21:34:09 UTC test
  • 2013-12-11 21:20:28 UTC test
  • 2013-12-11 18:03:36 UTC Grenade gate infra issues: use "reverify bug 1259911"
  • 2013-12-06 17:05:12 UTC i'm running statusbot in screen to try to catch why it dies after a while.
  • 2013-12-04 18:34:41 UTC gate failures due to django incompatibility, pip bugs, and node performance problems
  • 2013-12-03 16:56:59 UTC docs jobs are failing due to a full filesystem; fix eta 1750 UTC
  • 2013-11-26 14:25:11 UTC Gate should be unwedged now, thanks for your patience
  • 2013-11-26 11:29:13 UTC Gate wedged - Most Py26 jobs fail currently (https://bugs.launchpad.net/openstack-ci/+bug/1255041)
  • 2013-11-20 22:45:24 UTC Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html
  • 2013-11-06 00:03:44 UTC filesystem resize complete, logs uploading successfully again in the past few minutes--feel free to 'recheck no bug' or 'reverify no bug' if your change failed jobs with an "unstable" result
  • 2013-11-05 23:31:13 UTC Out of disk space on log server, blocking test result uploads--fix in progress, eta 2400 utc
  • 2013-10-13 16:25:59 UTC etherpad migration complete
  • 2013-10-13 16:05:03 UTC etherpad is down for software upgrade and migration
  • 2013-10-11 16:36:06 UTC the gate is moving again for the past half hour or so--thanks for your collective patience while we worked through the issue
  • 2013-10-11 14:14:17 UTC The Infrastructure team is working through some devstack node starvation issues which are currently holding up gating and slowing checks. ETA 1600 UTC
  • 2013-10-11 12:48:07 UTC Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun)
  • 2013-10-05 17:01:06 UTC puppet disabled on nodepool due to manually reverting gearman change
  • 2013-10-05 16:03:13 UTC Gerrit will be down for maintenance from 1600-1630 UTC
  • 2013-10-05 15:34:37 UTC Zuul is shutting down for Gerrit downtime from 1600-1630 UTC
  • 2013-10-02 09:54:09 UTC Jenkins01 is not failing, it's just very slow at the moment... so the gate is not completely stuck.
  • 2013-10-02 09:46:39 UTC One of our Jenkins masters is failing to return results, so the gate is currently stuck.
  • 2013-09-24 15:48:07 UTC changes seem to be making it through the gate once more, and so it should be safe to "recheck bug 1229797" or "reverify bug 1229797" on affected changes as needed
  • 2013-09-24 13:30:07 UTC dependency problems in gating, currently under investigation... more news as it unfolds
  • 2013-08-27 20:35:24 UTC Zuul has been restarted
  • 2013-08-27 20:10:38 UTC zuul is offline because of a pbr-related installation issue
  • 2013-08-24 22:40:08 UTC Zuul and nodepool are running again; rechecks have been issued (but double check your patch in case it was missed)
  • 2013-08-24 22:04:36 UTC Zuul and nodepool are being restarted
  • 2013-08-23 17:53:29 UTC recent UNSTABLE jobs were due to maintenance to expand capacity, which is now complete; recheck or reverify as needed
  • 2013-08-22 18:12:24 UTC stopping gerrit to correct a stackforge project rename error
  • 2013-08-22 17:55:56 UTC restarting gerrit to pick up a configuration change
  • 2013-08-22 15:06:03 UTC Zuul has been restarted; leave 'recheck no bug' or 'reverify no bug' comments to re-enqueue.
  • 2013-08-22 01:38:31 UTC Zuul is running again
  • 2013-08-22 01:02:06 UTC Zuul is offline for troubleshooting
  • 2013-08-21 21:10:59 UTC Restarting zuul, changes should be automatically re-enqueued
  • 2013-08-21 16:32:30 UTC LOST jobs are due to a known bug; use "recheck no bug"
  • 2013-08-19 20:27:37 UTC gate-grenade-devstack-vm is currently failing preventing merges. Proposed fix: https://review.openstack.org/#/c/42720/
  • 2013-08-16 13:53:35 UTC the gate seems to be moving properly now, but some changes which were in limbo earlier will probably come back with negative votes. rechecking/reverifying those too
  • 2013-08-16 13:34:05 UTC the earlier log server issues seem to have put one of the jenkins servers in a bad state, blocking the gate--working on that, ETA 14:00 UTC
  • 2013-08-16 12:41:10 UTC still rechecking/reverifying false negative results on changes, but the gate is moving again
  • 2013-08-16 12:00:34 UTC log server has a larger filesystem now--rechecking/reverifying jobs, ETA 12:30 UTC
  • 2013-08-16 12:00:22 UTC server has a larger filesystem now--rechecking/reverifying jobs, ETA 12:30 UTC
  • 2013-08-16 11:21:47 UTC the log server has filled up, disrupting job completion--working on it now, ETA 12:30 UTC
  • 2013-08-16 11:07:34 UTC some sort of gating disruption has been identified--looking into it now
  • 2013-07-28 15:30:29 UTC restarted zuul to upgrade
  • 2013-07-28 00:25:57 UTC restarted jenkins to update scp plugin
  • 2013-07-26 14:19:34 UTC Performing maintenance on docs-draft site, unstable docs jobs expected for the next few minutes; use "recheck no bug"
  • 2013-07-20 18:38:03 UTC devstack gate should be back to normal
  • 2013-07-20 17:02:31 UTC devstack-gate jobs broken due to setuptools brokenness; fix in progress.
  • 2013-07-20 01:41:30 UTC replaced ssl certs for jenkins, review, wiki, and etherpad
  • 2013-07-19 23:47:31 UTC Projects affected by the xattr cffi dependency issues should be able to run tests and have them pass. xattr has been fixed and the new version is on our mirror.
  • 2013-07-19 22:23:27 UTC Projects with a dependency on xattr are failing tests due to unresolved xattr dependencies. A fix should be in shortly
  • 2013-07-17 20:33:39 UTC Jenkins is running jobs again, some jobs are marked as UNSTABLE; fix in progress
  • 2013-07-17 18:43:20 UTC Zuul is queueing jobs while Jenkins is restarted for a security update
  • 2013-07-17 18:32:50 UTC Gerrit security updates have been applied
  • 2013-07-17 17:38:19 UTC Gerrit is being restarted to apply a security update
  • 2013-07-16 01:30:52 UTC Zuul is back up and outstanding changes have been re-enqueued in the gate queue.
  • 2013-07-16 00:23:27 UTC Zuul is down for an emergency load-related server upgrade. ETA 01:30 UTC.
  • 2013-07-06 16:29:49 UTC Neutron project rename in progress; see https://wiki.openstack.org/wiki/Network/neutron-renaming
  • 2013-07-06 16:29:32 UTC Gerrit and Zuul are back online, neutron rename still in progress
  • 2013-07-06 16:02:38 UTC Gerrit and Zuul are offline for neutron project rename; ETA 1630 UTC; see https://wiki.openstack.org/wiki/Network/neutron-renaming
  • 2013-06-14 23:28:41 UTC Zuul and Jenkins are back up (but somewhat backlogged). See http://status.openstack.org/zuul/
  • 2013-06-14 20:42:30 UTC Gerrit is back in service. Zuul and Jenkins are offline for further maintenance (ETA 22:00 UTC)
  • 2013-06-14 20:36:49 UTC Gerrit is back in service. Zuul and Jenkins are offline for further maintenance (ETA 22:00)
  • 2013-06-14 20:00:58 UTC Gerrit, Zuul and Jenkins are offline for maintenance (ETA 30 minutes)
  • 2013-06-14 18:29:37 UTC Zuul/Jenkins are gracefully shutting down in preparation for today's 20:00 UTC maintenance
  • 2013-06-11 17:32:14 UTC pbr 0.5.16 has been released and the gate should be back in business
  • 2013-06-11 16:00:10 UTC pbr change broke the gate, a fix is forthcoming
  • 2013-06-06 21:00:45 UTC jenkins log server is fixed; new builds should complete, old logs are being copied over slowly (you may encounter 404 errors following older links to logs.openstack.org until this completes)
  • 2013-06-06 19:38:01 UTC gating is currently broken due to a full log server (ETA 30 minutes)
  • 2013-05-16 20:02:47 UTC Gerrit, Zuul, and Jenkins are back online.
  • 2013-05-16 18:57:28 UTC Gerrit, Zuul, and Jenkins will all be shutting down for reboots at approximately 19:10 UTC.
  • 2013-05-16 18:46:38 UTC wiki.openstack.org and lists.openstack.org are back online
  • 2013-05-16 18:37:52 UTC wiki.openstack.org and lists.openstack.org are being rebooted. downtime should be < 5 min.
  • 2013-05-16 18:36:23 UTC eavesdrop.openstack.org is back online
  • 2013-05-16 18:31:14 UTC eavesdrop.openstack.org is being rebooted. downtime should be less than 5 minutes.
  • 2013-05-15 05:32:26 UTC upgraded gerrit to gerrit-2.4.2-17 to address a security issue: http://gerrit-documentation.googlecode.com/svn/ReleaseNotes/ReleaseNotes-2.5.3.html#_security_fixes
  • 2013-05-14 18:32:07 UTC gating is catching up queued jobs now and should be back to normal shortly (eta 30 minutes)
  • 2013-05-14 17:55:44 UTC gating is broken for a bit while we replace jenkins slaves (eta 30 minutes)
  • 2013-05-14 17:06:56 UTC gating is broken for a bit while we replace jenkins slaves (eta 30 minutes)
  • 2013-05-04 16:31:22 UTC lists.openstack.org and eavesdrop.openstack.org are back in service
  • 2013-05-04 16:19:45 UTC test
  • 2013-05-04 15:58:36 UTC eavesdrop and lists.openstack.org are offline for server upgrades and moves. ETA 1700 UTC.
  • 2013-05-02 20:20:45 UTC Jenkins is in shutdown mode so that we may perform an upgrade; builds will be delayed but should not be lost.
  • 2013-04-26 18:04:19 UTC We just added AAAA records (IPv6 addresses) to review.openstack.org and jenkins.openstack.org.
  • 2013-04-25 18:25:41 UTC meetbot is back on and confirmed to be working properly again... apologies for the disruption
  • 2013-04-25 17:40:34 UTC meetbot is on the wrong side of a netsplit; infra is working on getting it back
  • 2013-04-08 18:09:34 UTC A review.o.o repo needed to be reseeded for security reasons. To ensure that a force push did not miss anything, a nuke-from-orbit approach was taken instead: Gerrit was stopped, the old bad repo was removed, the new good repo was added, and Gerrit was started again.
  • 2013-04-08 17:50:57 UTC The infra team is restarting Gerrit for git repo maintenance. If Gerrit is not responding please try again in a few minutes.
  • 2013-04-03 01:07:50 UTC https://review.openstack.org/#/c/25939/ should fix the prettytable dependency problem when merged (https://bugs.launchpad.net/nova/+bug/1163631)
  • 2013-04-03 00:48:01 UTC Restarting gerrit to try to correct an error condition in the stackforge/diskimage-builder repo
  • 2013-03-29 23:01:04 UTC Testing alert status
  • 2013-03-29 22:58:24 UTC Testing statusbot
  • 2013-03-28 13:32:02 UTC Everything is okay now.