Infrastructure Status

Revision as of 18:53, 9 July 2019

  • 2019-07-09 18:53:55 UTC manually deleted 30 leaked nodepool images from vexxhost-sjc1
  • 2019-07-08 14:07:19 UTC restarted all of zuul on commit 5b851c14f2bd73039748fca71b5db3b05b697f7f
  • 2019-07-04 17:43:01 UTC started mirror.iad.rax.opendev.org which entered a shutoff state 2019-07-04T17:26:01Z associated with a fs-cache assertion failure kernel panic
  • 2019-07-03 15:58:49 UTC Restarted review.opendev.org's Gerrit service to restore git repo replication which appears to have deadlocked.
  • 2019-07-03 01:25:53 UTC post https://review.opendev.org/667782 rsync cron jobs commented out on mirror-update.openstack.org
  • 2019-07-02 23:46:03 UTC mirror.ord.rax.opendev.org rebooted and running openafs 1.8.3; should fix up periodic "hash mismatch" errors seen in jobs
  • 2019-07-02 06:05:11 UTC restarted afs file servers without auditlog. see notes in https://etherpad.openstack.org/p/opendev-mirror-afs for recent log bundle
  • 2019-07-01 22:31:54 UTC restarted all of zuul on commit c5090244dc608a1ef232edded5cf92bf753dbb12
  • 2019-07-01 03:58:48 UTC AFS auditlog enabled for afs01/02.dfw and afs01.ord AFS servers. logging to /opt/dafileserver.audit.log. notes, including details on how to disable again @ https://etherpad.openstack.org/p/opendev-mirror-afs
  • 2019-06-28 13:13:23 UTC deleted /afs/.openstack.org/docs/tobiko at slaweq's request as a member of https://review.opendev.org/#/admin/groups/tobiko-core
  • 2019-06-28 02:46:55 UTC afs01/02.dfw & afs01.ord restarted with greater -cb values: see https://review.opendev.org/668078
  • 2019-06-27 22:12:16 UTC Gitea06 had a corrupted root disk around the time of the Denver summit. It has been replaced with a new server and added back to the haproxy config.
  • 2019-06-27 19:40:42 UTC deleted https://github.com/openstack/tobiko at slaweq's request as a member of https://review.opendev.org/#/admin/groups/tobiko-core
  • 2019-06-27 15:28:31 UTC mirror.iad.rax.opendev.org started again at 15:24z after mysteriously entering shutoff state some time after 15:11z
  • 2019-06-27 13:11:52 UTC changed chat.freenode.net alias on eavesdrop.o.o from 162.213.39.42 to 38.229.70.22 and restarted openstack-meetbot
  • 2019-06-27 09:40:09 UTC switched irc host on eavesdrop.openstack.org as last one went unresponsive; rebooted host for good measure
  • 2019-06-25 16:08:39 UTC CORRECTION: restarted zuul scheduler on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler)
  • 2019-06-25 16:05:05 UTC restarted all of zuul on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler)
  • 2019-06-21 15:55:10 UTC mirror.iad.rax.opendev.org restarted at 15:46 utc for host migration
  • 2019-06-21 13:20:45 UTC restarted hound on codesearch.o.o due to persistent "too many open files" errors
  • 2019-06-20 15:00:35 UTC manually deleted instance 4bbfd576-baa1-410f-8384-95c7fac8475b in ovh bhs1; it has a stale node lock from zuul which will be released at the next scheduler restart (for some reason it has lost its ip addresses too, no idea if that's related)
  • 2019-06-19 22:58:45 UTC apache restarted on mirror.iad.rax.opendev.org at 22:16 utc, clearing stale content state
  • 2019-06-18 14:49:38 UTC ran "modprobe kafs rootcell=openstack.org:104.130.136.20:23.253.200.228" and "mount -t afs "#openstack.org:root.afs." /afs" on mirror01.iad.rax.opendev.org after reboot
  • 2019-06-18 14:33:24 UTC mirror.iad.rax.opendev.org started again at 14:10z after mysteriously entering shutoff state at 10:00z
  • 2019-06-14 20:32:26 UTC Updated static.openstack.org (as well as security, specs, tarballs, service-types, governance, releases) ssl cert as part of normal refresh cycle
  • 2019-06-14 14:51:24 UTC restarted all of zuul on commit 3b52a71ff2225f03143862c36224e18f90a7cfd0 (with repl cherry-picked on scheduler)
  • 2019-06-13 23:17:45 UTC developer.openstack.org docs.openstack.org ethercalc.openstack.org etherpad.openstack.org firehose.openstack.org git.openstack.org git.starlingx.io openstackid-dev.openstack.org openstackid.org refstack.openstack.org review.openstack.org storyboard.openstack.org translate.openstack.org wiki.openstack.org zuul.openstack.org ssl certs updated as part of normal refresh cycle.
  • 2019-06-13 19:00:41 UTC Updated ask.openstack.org's ssl cert as part of regular refresh cycle.
  • 2019-06-13 16:04:23 UTC restarted all of zuul on 7e45f84f056b3fa021aae1eecb0c23d9055656f3 (with repl change cherry-picked onto scheduler)
  • 2019-06-13 15:58:38 UTC Restarted nodepool launchers on commit 3412764a985b511cdc6b70dc801ffdb357ec02c2
  • 2019-06-12 03:44:26 UTC mirror01.iad.rax.opendev.org in emergency file, and serving mirror files via kafs which is currently hand-configured; see https://etherpad.openstack.org/p/opendev-mirror-afs
  • 2019-06-11 19:37:20 UTC flushed /afs/openstack.org/mirror/ubuntu/dists/xenial-{security,updates}/main/binary-amd64/Packages.gz on mirror.dfw.rax
  • 2019-06-10 16:07:00 UTC Downgraded openstacksdk to 0.27.0 on nb01 to test if the revert fixes rackspace image uploads
  • 2019-06-09 11:52:19 UTC restarted gerritbot now, as it seems to have silently dropped off freenode 2019-06-08 21:40:43 utc with a 248 second ping timeout
  • 2019-06-08 20:55:29 UTC vos release completed and cron locks released for mirror.fedora, mirror.ubuntu and mirror.ubuntu-ports
  • 2019-06-07 22:23:02 UTC fedora, ubuntu, ubuntu-ports mirrors are currently resyncing to afs02.dfw and won't update again until that is finished
  • 2019-06-07 21:36:38 UTC Exim on lists.openstack.org/lists.opendev.org/lists.starlingx.io/lists.airshipit.org is now enforcing spf failures (not soft failures). This means that if you send email from a host the spf record does not allow, that email will be rejected.
  • 2019-06-07 21:19:56 UTC Performed a full zuul service restart. This reset memory usage (we were swapping), installed the debugging repl, and gives us access to ansible 2.8. Scheduler is running ce30029 on top of e0c975a and mergers + executors are running 00d0abb
  • 2019-06-07 20:11:59 UTC filed a removal request from the spamhaus pbl for the ip address of the new ask.openstack.org server
  • 2019-06-07 15:04:28 UTC deleted ~6k messages matching 'From [0-9]\+@qq.com' in /srv/mailman/openstack/qfiles/in/ on lists.o.o
  • 2019-06-07 14:38:18 UTC removed files02 from emergency file
  • 2019-06-06 23:25:00 UTC added files02.openstack.org to emergency file due to recent system-config changes breaking apache config
  • 2019-06-06 20:56:49 UTC rebooted afs02.dfw.openstack.org following a cinder volume outage for xvdc
  • 2019-06-05 16:36:26 UTC deleted old 2012.2 release from https://pypi.org/project/horizon/
  • 2019-06-03 14:23:26 UTC resized proxycache volume from 64 to 120GB on rax.dfw.opendev.org because it was 100% used (other mirrors report ~80G used)
  • 2019-05-31 19:44:10 UTC Performed repo renames. Some were cleanup from opendev migration and others were normal reorgs.
  • 2019-05-31 17:14:28 UTC manually removed nova 2013.1 release from pypi
  • 2019-05-31 16:48:57 UTC Gerrit is back up and running again. Thank you for your patience and sorry for the delay in this notification (we thought the statusbot was still busy updating channel topics).
  • 2019-05-31 15:11:05 UTC Gerrit is now entering its maintenance window. Expect Gerrit outages in the near future. We will notify when it is back up and running.
  • 2019-05-31 14:34:07 UTC The Gerrit service at https://review.opendev.org/ will be offline briefly for maintenance starting at 15:00 UTC (roughly 30 minutes from now); for details see http://lists.openstack.org/pipermail/openstack-discuss/2019-May/006684.html
  • 2019-05-29 14:15:17 UTC enabled slow query log on storyboard with long query time of 1.0 seconds
  • 2019-05-23 00:00:53 UTC zuul scheduler was restarted at 20:41z
  • 2019-05-21 20:14:45 UTC upgraded skopeo on all zuul executors to 0.1.37-1~dev~ubuntu16.04.2~ppa2
  • 2019-05-21 16:29:42 UTC blocked all traffic to wiki.openstack.org from 61.164.47.194 using an iptables drop rule to quell a denial of service condition
  • 2019-05-21 06:39:51 UTC ask.openstack.org migrated to new xenial server
  • 2019-05-20 19:26:52 UTC Deleted git.openstack.org and git01-git08.openstack.org servers. These cgit cluster hosts have been replaced with the gitea cluster.
  • 2019-05-20 07:20:10 UTC removed ask-staging-old and ask-staging01 servers, and all related dns entries (see https://etherpad.openstack.org/p/201808-infra-server-upgrades-and-cleanup)
  • 2019-05-17 20:18:35 UTC Transferred Airship repos from https://github.com/openstack to https://github.com/airshipit according to list at https://www.irccloud.com/pastebin/hBEV5zSm/
  • 2019-05-17 16:44:47 UTC Removed /var/lib/docker-old on gitea02-05,07-08 to free disk space and improve disk headroom on those servers. This was not required on 01 and 06 as the dir did not exist. Note the dir was created when we recovered from the accidental k8sification of our servers and should no longer be needed.
  • 2019-05-17 15:44:33 UTC applied opendev migration renames to the storyboard-dev database to stop puppet complaining
  • 2019-05-16 17:38:27 UTC Gerrit is being restarted to add gitweb links back to Gerrit. Sorry for the noise.
  • 2019-05-16 16:40:00 UTC Restarted zuul-scheduler to pick up a fix for zuul artifact handling that affected non change object types like tags as part of release pipelines. Now running on ee4b6b1a27d1de95a605e188ae9e36d7f1597ebb
  • 2019-05-16 15:57:39 UTC temporarily upgraded skopeo on all zuul executors to manually-built 0.1.36-1~dev~ubuntu16.04.2~ppa19.1 package with https://github.com/containers/skopeo/pull/653 applied
  • 2019-05-16 14:37:46 UTC deleted groups.openstack.org and groups-dev.openstack.org servers, along with their corresponding trove instances
  • 2019-05-16 14:08:15 UTC temporarily upgraded skopeo on ze01 to manually-built 0.1.36-1~dev~ubuntu16.04.2~ppa19.1 package with https://github.com/containers/skopeo/pull/653 applied
  • 2019-05-09 20:24:19 UTC corrected zuul:zuul directory (/var/lib/zuul/keys/secrets/project/gerrit) permissions on zuul.o.o
  • 2019-05-08 14:30:52 UTC set 139.162.227.51 chat.freenode.net in /etc/hosts on eavesdrop01
  • 2019-05-07 23:28:09 UTC If your jobs failed due to connectivity issues to opendev.org they can be rechecked now. Services have been restored at that domain.
  • 2019-05-07 23:24:22 UTC Ansible cron disabled on bridge until we remove the k8s management from run_all.sh. This is necessary to keep k8s installations from breaking standalone docker usage.
  • 2019-05-03 14:53:22 UTC Disabled gitea06 backends in gitea-lb01 haproxy because gitea06 has a sad filesystem according to dmesg and openstack/requirements stable/stein was affected.
  • 2019-05-03 01:15:57 UTC bumped fedora.mirror quota to 800000000 k
  • 2019-05-02 15:06:00 UTC Gerrit is being restarted to pick up a (gitweb-related) configuration change
  • 2019-05-01 17:34:49 UTC Nodepool launchers restarted on commit f58a2a2b68c58f9626170795dba10b75e96cd551 to pick up memory leak fix
  • 2019-04-29 17:53:27 UTC restarted Zuul scheduler on zuul==3.8.1.dev29 # git sha 7e29b8a to pick up the fix for our memory leak
  • 2019-04-26 16:04:35 UTC restarted zuul scheduler at 6afa22c9949bbe769de8e54fd27bc0aad14298bc with a local revert of 3704095c7927568a1f32317337c3646a9d15769e to confirm it is cause of memory leak
  • 2019-04-24 16:54:25 UTC Moved /var/lib/planet/openstack aside on planet.o.o so that puppet will reclone it using the correct remote. This should fix planet publishing updates.
  • 2019-04-24 16:16:15 UTC Restarted nodepool launchers on commit f8ac79661a8d2e2ee980e511c9bdf19a41d156f8
  • 2019-04-23 19:32:53 UTC restarted all of Zuul at commit 6afa22c9949bbe769de8e54fd27bc0aad14298bc due to memory leak
  • 2019-04-23 19:10:55 UTC the zuul scheduler is being restarted now in order to address a memory utilization problem; changes under test will be reenqueued automatically
  • 2019-04-20 21:19:00 UTC re-enabled run_all.sh cron
  • 2019-04-20 03:51:35 UTC The OpenDev Gerrit and Zuul are back online; you may need to update your git remotes for projects which have moved.
  • 2019-04-19 15:07:37 UTC Gerrit is offline for several hours starting at 15:00 UTC to perform the opendev migration; see http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005011.html
  • 2019-04-17 04:53:38 UTC grafana02.openstack.org is now puppeting again with some manual intervention to install the grafana repo gpg key. this key should not change and can remain as is. will need a new version of puppetlabs-apt when puppet4 transition is complete to be fully automatic again. host running grafana 6.1.4
  • 2019-04-16 13:06:23 UTC nodepool-launcher was not running on nl04 due to OOM. Restarted
  • 2019-04-16 03:02:39 UTC Restarted zuul-scheduler and zuul-web on commit 0bb220c
  • 2019-04-16 02:17:29 UTC started re-replication of all projects on review.openstack.org to bring everything in sync due to some missing references on git servers; likely related to one-off replication reconfiguration previously
  • 2019-04-15 23:20:07 UTC restarted nodepool-launcher on nl01 and nl02 due to OOM; restarted n-l on nl03 due to limited memory
  • 2019-04-15 22:04:56 UTC Deleted clarkb-test-bridge-snapshot-boot (b1bbdf16-0669-4275-aa6a-cec31f3ee84b) and clarkb-test-lists-upgrade (40135a0e-4067-4682-875d-9a6cec6a999b) as both tasks they were set up to test for have been completed
  • 2019-04-12 19:17:42 UTC Upgraded lists.openstack.org from trusty to xenial
  • 2019-04-12 16:57:48 UTC Pre lists.openstack.org snapshot completed and is named lists.openstack.org-20190412-before-xenial-upgrade
  • 2019-04-10 19:06:54 UTC Restarting Gerrit on review.openstack.org to pick up new configuration for the replication plugin
  • 2019-04-10 08:15:42 UTC successfully submitted a request to remove 104.130.246.32 (review01.openstack.org) from Spamhaus PBL
  • 2019-04-09 18:52:16 UTC deleted old trusty-based openstackid-dev.openstack.org and openstackid.org servers now that the xenial-based replacements have been operating successfully in production for a while
  • 2019-04-09 07:39:14 UTC mirror01.nrt1.arm64ci.openstack.org currently offline pending (hopeful) recovery
  • 2019-04-04 13:56:59 UTC deleted nova instance "mirror01.la1.citycloud.openstack.org" and cinder volume "newvolume" from citycloud-la1 as that region is being taken permanently offline
  • 2019-04-03 18:14:13 UTC Added PyPI user 'spencer' as owner on PyPI package 'nimble' as we weren't using the name and were asked nicely to give it to another group.
  • 2019-04-03 17:22:16 UTC Nodepool builders restarted with new code for cleaning up leaked DIB builds
  • 2019-04-03 14:29:49 UTC added Trinh Nguyen (new telemetry PTL) to gerrit groups {ceilometer,aodh,panko}-{core,stable-maint} http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004486.html
  • 2019-04-02 02:07:05 UTC restarted gerritbot as it seems to have dropped out
  • 2019-03-27 17:45:35 UTC deleted openstack-infra/jenkins-job-builder branch stable/1.6 previously at c9eb8936f3fbb00e7edfa749f6379f531bbf3b1d as requested by zbr
  • 2019-03-27 02:12:06 UTC renewed git.zuul-ci.org cert (which had expired)
  • 2019-03-26 14:22:40 UTC deleted 2012.2 from the cinder releases on pypi.org per http://lists.openstack.org/pipermail/openstack-discuss/2019-March/004224.html
  • 2019-03-25 09:46:28 UTC cleared out dib_tmp, leaked images and rebooted nb01/02/03 (all had full /opt)
  • 2019-03-22 15:21:05 UTC deactivated duplicate gerrit accounts 19705, 23817, 25259, 25412 and 30149 at the request of lmiccini
  • 2019-03-21 23:08:31 UTC restarted zuul executors at git commit efae4deec5b538e90b88d690346a58538bd5cfff
  • 2019-03-21 14:53:29 UTC frickler force-merging https://review.openstack.org/644842 in order to unblock neutron after ansible upgrade
  • 2019-03-21 10:48:33 UTC restarted gerritbot which seems to have dropped out at 08:24:30,089 and not recovered after that
  • 2019-03-20 14:20:25 UTC restarted mysql and apache2 services on storyboard01.opendev.org to investigate cache memory pressure
  • 2019-03-19 21:11:31 UTC restarted all of zuul at commit 77ffb70104959803a8ee70076845c185bd17ddc1
  • 2019-03-19 01:14:21 UTC volumes on backup01.ord.rax.ci.openstack.org "rotated" and new backups now going to a fresh volume. See #644457 for notes for future rotations
  • 2019-03-18 22:31:13 UTC modified jeepyb files on review01.o.o to debug why manage-projects isn't setting retired project acls. I have since restored those file contents and `diff /opt/jeepyb/jeepyb/cmd/manage_projects.py /usr/local/lib/python2.7/dist-packages/jeepyb/cmd/manage_projects.py` shows no delta.
  • 2019-03-15 19:43:55 UTC Upgraded pyopenssl and cryptography on eavesdrop.openstack.org to work around https://tickets.puppetlabs.com/browse/PUP-8986 after the puppet 4 upgrade on this host
  • 2019-03-14 23:07:49 UTC bridge.o.o resized to a 8gb instance
  • 2019-03-13 22:47:48 UTC Deleted tag debian/1%2.1.2-1 in openstack/deb-python-mistralclient to workaround gitea bug with % in ref names. There was already a debian/2.1.2-1 replacement tag pointing to the same ref.
  • 2019-03-13 22:47:09 UTC Deleted tag debian/1%3.1.0-3 in openstack/deb-python-swiftclient to workaround gitea bug with % in ref names. There was already a debian/3.1.0-3 replacement tag pointing to the same ref.
  • 2019-03-13 22:46:18 UTC Added tag debian/1.6.0-2 to openstack/python-deb-oslotest as replacement for deleted tag debian/1%1.6.0-2 to workaround a gitea bug with % in ref names
  • 2019-03-13 21:50:37 UTC Upgraded afsdb01 and afsdb02 servers to Xenial from Trusty.
  • 2019-03-13 19:12:40 UTC Replaced deb-oslotest's debian/1%1.6.0-2 tag as the % is making gitea unhappy and the project is retired. Gitea bug: https://github.com/go-gitea/gitea/issues/6321 New tag: debian/1.6.0-2
  • 2019-03-13 18:22:46 UTC snapshotted and increased the paste-mysql-5.6 trove instance from 5gb disk/2gb ram to 10gb disk/4gb ram
  • 2019-03-12 18:18:27 UTC Rebooted refstack.openstack.org via openstack api as it was in a shutdown state
  • 2019-03-12 17:13:07 UTC changed intermediate registry password in bridge hostvars
  • 2019-03-12 16:26:37 UTC restarted meetbot and statusbot at ~1540z, switching them from card.freenode.net to niven.freenode.net as the former was taken out of rotation
  • 2019-03-12 07:06:43 UTC graphite-old.openstack.org server, volumes & dns records removed (replaced by graphite.opendev.org)
  • 2019-03-11 23:01:09 UTC Upgraded afs01.dfw, afs02.dfw, and afs01.ord to Xenial from Trusty
  • 2019-03-11 21:32:17 UTC restarted zuul executors for security fix 5ae25f0
  • 2019-03-08 17:57:40 UTC restarted all of zuul at commit 603ce6f474ef70439bfa3adcaa27d806c23511f7
  • 2019-03-06 09:18:38 UTC manually removed /var/log/exim4/paniclog on new g*.opendev.org servers after crosschecking the contents to reduce spam
  • 2019-03-05 17:12:35 UTC Gerrit is being restarted for a configuration change, it will be briefly offline.
  • 2019-03-05 14:06:52 UTC houndd restarted on codesearch to correct "too many open files" error
  • 2019-03-05 02:02:46 UTC afs02.dfw rebooted after hanging, likely related to outage for main02 block device on 2019-03-02
  • 2019-03-04 17:55:07 UTC restarted zuul at commit d298cb12e09d7533fbf161448cf2fc297d9fd138
  • 2019-03-04 17:25:11 UTC restarted nodepool launchers and builder at commit 3561e278c6178436aa1d8d673f839a676598ea17
  • 2019-03-04 05:14:58 UTC graphite.opendev.org now active replacement for graphite.openstack.org. everything on the firewall list that might need a restart to pickup new address has been done
  • 2019-03-04 00:36:22 UTC graphite.o.o A/AAAA records renamed to graphite-old.o.o, graphite.o.o now a CNAME to these until switch to graphite.opendev.org
  • 2019-02-27 22:05:01 UTC Removed kdc01.openstack.org and puppetmaster.openstack.org A/AAAA records from DNS
  • 2019-02-27 22:00:02 UTC Deleted Old kdc01.openstack.org (859d5e9c-193c-4c1b-8cb3-4da8316d060c) as it has been replaced by kdc03.openstack.org
  • 2019-02-27 21:30:51 UTC Deleted Old health.openstack.org (9adaa457-16ab-48ea-9618-54af6edd798b) as it has been replaced by health01.openstack.org
  • 2019-02-26 16:17:25 UTC restarted the gerritbot service on review01 to resolve its 12:30:30 ping timeout
  • 2019-02-25 21:07:46 UTC kdc03 is now our kerberos master/admin server. kdc01 is not yet deleted but is not running any services or the propagation cron. Will clean up kdc01 after a day or two of happy afs services.
  • 2019-02-24 23:04:29 UTC leaked images and temp dirs cleared from nb01/nb02.o.o; reboots to clear orphaned mounts from failed builds, both have plenty of disk now and are building images
  • 2019-02-22 14:32:54 UTC deleted unused nb03.openstack.org/main01 cinder volume from vexxhost ca-ymq-1
  • 2019-02-22 14:30:55 UTC deleted old mirror01.ca-ymq-1.vexxhost.openstack.org server, long since replaced by mirror02
  • 2019-02-21 17:06:41 UTC (dmsimard) nb03 was found out of disk space on /opt, there is now 120GB available after cleaning up leaked images
  • 2019-02-20 19:25:23 UTC deleted openstack/cinderlib project and cinder project group (along with the associated group mapping entry) from storyboard.openstack.org's backend database
  • 2019-02-19 23:53:22 UTC Restarted zuul-scheduler at 5271b592afe708d33fc4b3d08d9a2cc97ae0ddfc and zuul mergers + executors at 6bc25035dd8a41c0522fbe43d149303c426cbc5a.
  • 2019-02-18 16:37:35 UTC Deleted Trusty pbx.openstack.org (038e80f5-15aa-4f69-8c6c-0f43b3587778) as new Xenial pbx01.opendev.org is up and running
  • 2019-02-18 15:42:48 UTC according to ovh, there was a ceph outage which affected the rootfs for our gra1 mirror there on 2018-01-27 between 11:20 and 16:20 utc
  • 2019-02-16 18:04:29 UTC old storyboard.openstack.org server and storyboard-mysql trove database snapshotted and deleted
  • 2019-02-15 23:28:38 UTC pbx01.opendev.org now hosting pbx.openstack.org. Old server to be deleted next week if all looks well then.
  • 2019-02-15 20:05:24 UTC The StoryBoard service on storyboard.openstack.org is offline momentarily for maintenance: http://lists.openstack.org/pipermail/openstack-discuss/2019-February/002666.html
  • 2019-02-14 22:15:57 UTC The test cloud region using duplicate IPs has been removed from nodepool. Jobs can be rechecked now.
  • 2019-02-14 21:33:54 UTC Jobs are failing due to ssh host key mismatches caused by duplicate IPs in a test cloud region. We are disabling the region and will let you know when jobs can be rechecked.
  • 2019-02-14 00:41:17 UTC manually ran "pip3 install kubernetes==9.0.0b1" on bridge to see if newer version avoids deadlock on k8s api calls
  • 2019-02-13 02:14:59 UTC zuul scheduler restarted for debug logging enhancements
  • 2019-02-13 01:34:58 UTC per the prior update, removed acme-opendev.org from openstackci user's domains in rax
  • 2019-02-12 23:01:23 UTC for testing purposes i have registered acme-opendev.org and set it up in rax clouddns under the openstackci account. this is for testing rax api integration without worrying about wiping out openstack.org by accident
  • 2019-02-12 22:26:59 UTC restarted zuul-web at commit 6d6c69f93e9755b3b812c85ffceb1b830bd75d6f
  • 2019-02-12 15:31:02 UTC Set new github admin account to owner on the openstack-infra org.
  • 2019-02-12 14:48:27 UTC replaced openstackid-dev.openstack.org address records with a cname to openstackid-dev01.openstack.org
  • 2019-02-11 23:21:48 UTC restarted all of zuul with git sha 5957d7a95e677116f39e52c2a44d4ca8b795da34; ze08 is in the disabled list and configured to use jemalloc
  • 2019-02-11 21:07:21 UTC pbx.openstack.org's POTS number updated (see wiki for new number) due to an account shuffle.
  • 2019-02-11 18:22:55 UTC Installed libjemalloc1 on ze08.openstack.org to experiment with alternate malloc implementations. Other executors will act as controls
  • 2019-02-07 20:02:08 UTC Cleaned up Elasticsearch indexes from the future. One was from the year 2106 (job logs actually had those timestamps) and others from November 2019. Total data was a few megabytes.
  • 2019-02-06 21:41:25 UTC Restarted Geard on logstash01 and log processing workers on logstash-workerXX. Geard appeared to be out to lunch with a large queue and workers were not reconnecting on their own.
  • 2019-02-06 19:41:55 UTC Nodepool builders all restarted on commit 6a4a8f with new build timeout feature
  • 2019-02-06 17:29:32 UTC Any changes failed around 16:30 UTC today with a review comment from Zuul like "ERROR Unable to find playbook" can be safely rechecked; this was an unanticipated side effect of our work to move base job definitions between configuration repositories.
  • 2019-02-03 14:50:17 UTC Puppet on etherpad-dev01 failing due to broken hieradata lookups with puppet 4. Fix is https://review.openstack.org/634601
  • 2019-01-30 20:48:56 UTC kata-containers zuul tenant in production, along with the opendev/base-jobs repo
  • 2019-01-30 20:48:18 UTC inap-mtl01 upgraded from neutron mitaka to queens; orphaned ports removed
  • 2019-01-29 23:15:30 UTC http://zuul.openstack.org is not working. https://zuul.openstack.org does work. Please use that while we investigate.
  • 2019-01-28 22:18:36 UTC Deleted old 2014.* Sahara releases from pypi so that modern releases are sorted properly. Old sahara releases are available on the tarballs server.
  • 2019-01-28 20:58:06 UTC Cleaned up leaked images on nb01, nb02, and nb03 to free up disk space for dib.
  • 2019-01-23 20:52:47 UTC deactivated defunct gerrit account 28694 at the request of todin
  • 2019-01-23 16:54:32 UTC restarted zuul at 9e679eadedf2b64955b0511cada91018a1a0e30a
  • 2019-01-22 23:30:16 UTC cleared all nomail[B] subscription flags on openstack-discuss
  • 2019-01-22 23:03:21 UTC disabled automatic bounce processing on openstack-discuss so that we can investigate dmarc issues without everyone having their subscription disabled
  • 2019-01-21 19:30:46 UTC restarted zuul at 691b1bc17c77ebce5b2a568e586d19b77cebbc7b
  • 2019-01-21 19:19:12 UTC The error causing post failures on jobs has been corrected. It is safe to recheck these jobs.
  • 2019-01-20 17:06:05 UTC switched storyboard-dev.o.o to using a local database service instead of trove
  • 2019-01-20 16:13:25 UTC replaced expired zuul-ci.org ssl cert (and associated key/chain) due to unanticipated expiration
  • 2019-01-19 14:48:23 UTC deleted the old storyboard-dev.openstack.org server now that storyboard-dev01.opendev.org has been serving that vhost for a few days
  • 2019-01-18 17:09:33 UTC Upgraded review.openstack.org Gerrit to 2.13.12.ourlocalpatches. Please keep an eye on Gerrit for any unexpected behavior but initial indications look good.
  • 2019-01-18 15:59:33 UTC Cleaned up review.openstack.org:/home/gerrit2/review_site/lib in preparation for the Gerrit 2.13.12 upgrade via https://review.openstack.org/#/c/631346/1
  • 2019-01-18 07:58:55 UTC restarted gerritbot as it had dropped out
  • 2019-01-17 23:39:08 UTC On status.o.o apt-get removed python-httplib2, python-launchpadlib and python-lazr.restfulclient then reinstalled /opt/elastic-recheck with pip which resulted in Successfully installed elastic-recheck-0.0.1.dev2210 httplib2-0.12.0 launchpadlib-1.10.6 lazr.restfulclient-0.14.2. This was to work around newer pip refusing to touch distutils-installed packages.
  • 2019-01-16 19:52:03 UTC Submitted PTG attendance survey for the Infra team. Requested 2 days for ~10 people but mentioned we can be flexible. Hope to see you there.
  • 2019-01-16 19:15:56 UTC removed pabelanger admin user from openstack orgs (all) on github.com
  • 2019-01-15 23:39:15 UTC storyboard-dev.openstack.org dns updated to cname to storyboard-dev01.opendev.org
  • 2019-01-15 14:39:20 UTC restarted zuul executors and mergers to pick up git connection config fix
  • 2019-01-15 00:01:18 UTC restarted all of zuul with commit sha 67ef71d2a2d6b5b06e2355eefff00ae3df24bbf7
  • 2019-01-14 19:24:47 UTC updated openid provider for wiki.openstack.org from login.launchpad.net to login.ubuntu.com
  • 2019-01-10 17:30:16 UTC restarted gerrit for updated replication config
  • 2019-01-09 22:23:26 UTC removed ns3.openstack.org A 166.78.116.117 and ns3.openstack.org AAAA 2001:4801:7825:103:be76:4eff:fe10:4f7a as these IPs don't seem to belong to us anymore.
  • 2019-01-09 22:07:08 UTC deleted f8c9e5d6-818f-4168-a53e-414c7e3ccb34 adns1.openstack.org in ORD, af56cafc-6a3d-4ffb-b443-932ece962673 ns1.openstack.org in DFW, and c4f203c0-5315-45ac-ab11-dbce5bd33d67 ns2.openstack.org in ca-ymq-1. DNS now hosted by opendev nameservers
  • 2019-01-08 17:03:11 UTC changed vexxhost openstackci password
  • 2019-01-08 15:43:22 UTC restarted bind9 service on adns1.opendev.org in order to get it to properly recognize and re-sign the updated zuulci.org zone
  • 2019-01-07 23:51:03 UTC The Etherpad service at https://etherpad.openstack.org/ has been offline since 23:22 UTC due to a hypervisor issue in our service provider, but should hopefully return to service shortly.
  • 2019-01-07 22:41:56 UTC nl02 has been removed from the emergency maintenance list now that the filesystems on mirror02.regionone.limestone have been repaired and checked out
  • 2019-01-07 21:15:35 UTC generated and added dnssec keys for zuulci.org to /etc/ansible/hosts/group_vars/adns.yaml on bridge.o.o
  • 2019-01-07 19:41:32 UTC mirror02.regionone.limestone.openstack.org's filesystem on the additional cinder volume went read only for >1 week (total duration unknown) causing errors when apache was attempting to update its cache files.
  • 2019-01-07 19:32:29 UTC temporarily lowered max-servers to 0 in limestone-regionone in preparation for a mirror instance reboot to clear a cinder volume issue
  • 2019-01-04 22:26:20 UTC restarted gerrit to restart gitea replication thread
  • 2019-01-04 21:07:14 UTC updated zuul-ci.org domain registration to use ns[12].opendev.org as nameservers
  • 2019-01-03 23:57:28 UTC restarted zuul scheduler at 2fd688352f5e220fda0dfc72b164144910670d95
  • 2019-01-02 22:47:32 UTC Restarted nodepool launchers nl01-nl04 to pick up hypervisor host id logging and update openstacksdk. Now running nodepool==3.3.2.dev88 # git sha f8bf6af and openstacksdk==0.22.0
  • 2019-01-02 22:36:58 UTC restarted all of zuul at commit 4540b71
  • 2019-01-02 22:16:05 UTC restarted gerrit to clear stuck gitea replication task
  • 2018-12-21 22:58:29 UTC the gerrit service on review.openstack.org is being restarted to pick up new configuration changes, and will return momentarily
  • 2018-12-21 22:54:52 UTC the gerrit service on review.openstack.org is being restarted to pick up new configuration changes, and will return momentarily
  • 2018-12-21 15:00:59 UTC approved changes 626391, 1626392, 626633, 626393 which expand puppet node definitions and ansible hostgroup patterns to also match opendev.org hostnames
  • 2018-12-20 13:07:54 UTC filed to exclude storyboard.openstack.org from spamhaus pbl
  • 2018-12-19 22:19:08 UTC added openstackadmin account to the following additional github orgs: gtest-org, openstack-ci, openstack-dev, stackforge, openstack-infra, openstack-attic, stackforge-attic
  • 2018-12-19 20:57:05 UTC deleted old puppetmaster.openstack.org and review.openstack.org servers in rackspace dfw after creating final snapshots
  • 2018-12-19 20:52:10 UTC converted opendevorg user on dockerhub to an organization owned by openstackzuul
  • 2018-12-19 20:23:06 UTC deleted unattached eavesdrop.openstack.org/main01 (50gb ssd), mirror01.dfw.rax.openstack.org/main01 (100gb), mirror01.dfw.rax.openstack.org/main02 (100gb), nb04.openstack.org/main01 (1tb) and nodepool.openstack.org/main01 (1tb) volumes in rackspace dfw
  • 2018-12-19 20:13:12 UTC deleted 1tb "bandersnatch-temp" volume in rackspace dfw
  • 2018-12-19 20:03:34 UTC deleted 1tb "before-run" snapshot of "bandersnatch-temp" volume in rackspace dfw
  • 2018-12-14 20:39:03 UTC started the new opendev mailing list manager process with `sudo service mailman-opendev start` on lists.openstack.org
  • 2018-12-11 14:34:53 UTC added openstackid.org to the emergency disable list while smarcet tests out php7.2 on openstackid-dev.openstack.org
  • 2018-12-10 23:14:26 UTC Restarted Zuul scheduler to pick up changes to how projects are grouped into relative priority queues.
  • 2018-12-10 20:19:14 UTC manually invoked `apt upgrade` on ns1 and ns2.opendev.org in order to silence cronspam about unattended-upgrades not upgrading netplan.io due to introducing a new dependency on python3-netifaces
  • 2018-12-10 19:35:39 UTC provider indicates the host on which ze01 resides has gone offline
  • 2018-12-10 14:18:14 UTC upgraded openafs on ze12 with `sudo apt install openafs-modules-dkms=1.6.22.2-1ppa1` and rebooted onto the latest hwe kernel
  • 2018-12-10 14:18:06 UTC restarted statusbot to recover from connectivity issues from saturday
  • 2018-12-06 20:00:02 UTC added ze12 to zuul executor pool to reduce memory pressure
  • 2018-12-06 18:47:00 UTC unblocked stackalytics-bot-2 access to review.o.o since the performance problems observed leading up to addition of the rule on 2018-11-23 seem to be unrelated (it eventually fell back to connecting via ipv4 and no recurrence was reported)
  • 2018-12-06 13:17:49 UTC deleted stale /var/log/exim4/paniclog on ns2.opendev.org to silence nightly cron alert e-mails about it
  • 2018-12-06 00:40:27 UTC rebooted all zuul executors (ze01-ze11) due to suspected performance degradation from swapping. underlying cause is unclear, but may be due to a regression in zuul introduced since 3.3.0, or in dependencies (including ansible). objgraph installed on all executors to support future memory profiling.
  • 2018-12-05 20:09:31 UTC removed lxd and lxd-client packages from ns1 and ns2.opendev.org, autoremoved, upgraded and rebooted
  • 2018-12-05 18:44:53 UTC Nodepool launchers restarted and now running with commit ee8ca083a23d5684d62b6a9709f068c59d7383e0
  • 2018-12-04 19:26:26 UTC moved bridge.o.o /etc/ansible/hosts/openstack.yaml to a .old file for clarity, as it is not (and perhaps was never) used
  • 2018-12-04 06:52:45 UTC fixed emergency file to re-enable bridge.o.o puppet runs (which stopped in http://grafana.openstack.org/d/qzQ_v2oiz/bridge-runtime?orgId=1&from=1543888040274&to=1543889448699)
  • 2018-12-04 01:52:44 UTC used rmlist to delete the openstack, openstack-dev, openstack-operators and openstack-sigs mailing lists on lists.o.o while leaving their archives in place
  • 2018-12-04 00:19:56 UTC clarkb upgraded the Nodepool magnum k8s cluster by pulling images and rebasing/restarting services for k8s on the master and minion nodes. Magnum doesn't support these upgrades via the API yet. Note that due to disk space issues the master node had its journal cleaned up in order to pull the new images down
  • 2018-12-03 20:59:50 UTC CentOS 7.6 appears to have been released. Our mirrors seem to have synced this release. This is creating a variety of fallout in projects such as tripleo and octavia. Considering that 7.5 is now no longer supported we should address this by rolling forward and fixing problems.
  • 2018-12-03 20:58:18 UTC removed static.openstack.org from the emergency disable list now that ara configuration for logs.o.o site has merged
  • 2018-11-30 00:34:34 UTC manual reboot of mirror01.nrt1.arm64ci.openstack.org after a lot of i/o failures
  • 2018-11-29 22:16:30 UTC manually restarted elastic-recheck service on status.openstack.org to clear event backlog
  • 2018-11-29 14:42:47 UTC temporarily added nl03.o.o to the emergency disable list and manually applied https://review.openstack.org/620924 in advance of it merging
  • 2018-11-29 14:29:53 UTC rebooted mirror01.sjc1.vexxhost.openstack.org via api as it seems to have been unreachable since ~02:30z
  • 2018-11-28 02:37:56 UTC removed f27 directories after https://review.openstack.org/618416
  • 2018-11-28 01:04:05 UTC pypi volume removed from afs : afs01.dfw much happier -> /dev/mapper/main-vicepa 4.0T 2.1T 1.9T 53% /vicepa
  • 2018-11-23 15:56:02 UTC temporarily blocked stackalytics-bot-2 access to review.o.o to confirm whether the errors reported for it are related to current performance problems
  • 2018-11-22 18:00:28 UTC manually triggered gerrit's jvm garbage collection from the javamelody interface, freeing some 40gb of used memory within the jvm
  • 2018-11-22 15:58:49 UTC We have recovered from high cpu usage on review.openstack.org by killing several requests in melody that had been running for several hours and brought gerrit to a crawl with proxy errors. Requests looked like this: "/changes/?q=is:watched+is:merged&n=25&O=81 GET" but we haven't been able to identify where these requests came from.
  • 2018-11-21 15:05:58 UTC rolled back garbled summit feedback pad using: wget -qO- 'http://localhost:9001/api/1.2.11/restoreRevision?apikey='$(cat /opt/etherpad-lite/etherpad-lite/APIKEY.txt)'&padID=BER-Feedback-Session&rev=5564'
  • 2018-11-20 18:28:53 UTC ran `rmlist openstack-tc` on lists.o.o to retire the openstack-tc ml without removing its archives
  • 2018-11-19 22:41:50 UTC clarkb force merged https://review.openstack.org/618849 to fix a bug in zuul-jobs that was affecting many jobs. Long story short, we have to accommodate ipv6 addresses in zuul-jobs.
  • 2018-11-19 00:23:49 UTC started phase 2 of openstack-discuss ml combining as described at http://lists.openstack.org/pipermail/openstack-dev/2018-September/134911.html
  • 2018-11-16 10:34:10 UTC restarted gerritbot as it seemed to have dropped out of at least this channel. didn't see anything particularly helpful in logs
  • 2018-11-15 14:50:42 UTC ran "systemctl --global mask --now gpg-agent.service gpg-agent.socket gpg-agent-ssh.socket gpg-agent-extra.socket gpg-agent-browser.socket" on bridge to disable gpg-agent socket activation
  • 2018-11-14 06:59:31 UTC after receiving several 500 errors from storyboard.o.o, i restarted the worker services and apache2 on the server. backtrace is in apache error logs and matches what was seen on the client error box
  • 2018-11-14 05:57:02 UTC rebooting mirror01.iad.rax.openstack.org to see if it helps with persistent pypi.org connection resets -- see comment in https://storyboard.openstack.org/#!/story/2004334#comment-110394
  • 2018-11-14 01:19:24 UTC force-merged 617852 & 617845 at the request of triple-o to help with long gate backlog
  • 2018-11-13 03:49:00 UTC removed bridge.o.o /opt/system-config/playbooks/roles/exim/filter_plugins/__pycache__/filters.cpython-36.pyc file which was stopping exim role from running, see also https://github.com/ansible/ansible/pull/48587
  • 2018-11-09 03:40:24 UTC mirror.nrt1.arm64ci.openstack.org up and running!
  • 2018-11-07 20:39:39 UTC logs.o.o was put in the emergency file to test if bumping to 16 wsgi processes addresses timeout issues pending https://review.openstack.org/616297
  • 2018-11-07 05:42:28 UTC planet01.o.o in the emergency file, pending investigation with vexxhost
  • 2018-11-07 05:30:24 UTC planet.o.o shutdown and in error state, vexxhost.com currently not responding (planet.o.o is hosted in ca-ymq-2)
  • 2018-11-06 21:36:44 UTC infra-core added to ansible-role-cloud-launcher-core after getting rcarrillocruz's go ahead
  • 2018-11-05 16:28:03 UTC Added stephenfin and ssbarnea to git-review-core in Gerrit. Both have agreed to focus on bug fixes, stability, and improved testing. Or as corvus put it "to be really clear about that, i think any change which requires us to alter our contributor docs should have a nearly impossible hill to climb for acceptance".
  • 2018-11-02 18:55:58 UTC The firewall situation with ports 8080, 8081, and 8082 on mirror nodes has been resolved. You can recheck jobs that have failed to communicate to the mirrors on those ports now.
  • 2018-11-02 18:11:11 UTC OpenStack infra's mirror nodes stopped accepting connections on ports 8080, 8081, and 8082. We will notify when this is fixed and jobs can be rechecked if they failed to communicate with a mirror on these ports.
  • 2018-11-01 21:24:09 UTC openstacksdk 0.19.0 installed on nl01-04 and nb01-03 and all nodepool launchers and builders have been restarted
  • 2018-10-31 16:19:47 UTC manually installed linux-image-virtual-hwe-16.04 on etherpad01.openstack.org to test out theory about cache memory and system cpu utilization
  • 2018-10-30 23:16:27 UTC Old nodepool.openstack.org acfa8539-10a2-4bc4-aabc-e324aa855c70 deleted as we no longer use any services on this host. Filesystem snapshot saved and called nodepool.openstack.org-20181030.2
  • 2018-10-26 08:41:21 UTC restarted apache2 service on mirror.regionone.limestone.openstack.org
  • 2018-10-25 22:33:30 UTC Old nodepool images cleared out of cloud providers as part of the post ZK db transition cleanup.
  • 2018-10-25 19:02:08 UTC Old dib images cleared out of /opt/nodepool_dib on nb01, nb02, and nb03. Need to remove them from cloud providers next.
  • 2018-10-25 15:59:17 UTC Zuul and Nodepool running against the new three node zookeeper cluster at zk01 + zk02 + zk03 .openstack.org. Old server at nodepool.openstack.org will be deleted in the near future
  • 2018-10-25 15:32:59 UTC The Zuul and Nodepool database transition is complete. Changes updated during the Zuul outage may need to be rechecked.
  • 2018-10-25 14:41:30 UTC Zuul and Nodepool services are being restarted to migrate them to a new Zookeeper cluster. THis brings us an HA database running on newer servers.
  • 2018-10-24 22:46:31 UTC nb01 and nb02 patched to have https://review.openstack.org/#/c/613141/ installed so that image uploads to rax work. Both nodes are in ansible emergency file so this won't be undone automatically. Will need openstacksdk release before removing them from the emergency file
  • 2018-10-23 20:00:28 UTC nb04.openstack.org removed from emergency
  • 2018-10-23 16:18:23 UTC increased quota for project.starlingx volume from 100mb to 1gb
  • 2018-10-23 15:02:26 UTC doubled memory allocation for etherpad-mysql-5.6 trove instance from 2gb to 4gb (indicated ~2gb active use)
  • 2018-10-23 14:55:58 UTC doubled size of disk for etherpad-mysql-5.6 trove instance from 20gb to 40gb (contains 17.7gb data)
  • 2018-10-23 09:43:08 UTC nl04 in emergency with ovh-gra1 set to 0 for now
  • 2018-10-19 20:30:11 UTC Old logstash.openstack.org server (08c356e5-d225-4163-9dce-c57b4d68eb55) running trusty has been deleted in favor of new logstash01.openstack.org server running xenial
  • 2018-10-19 17:18:29 UTC Old etherpad.openstack.org server (8e3ab3b5-b264-494a-abfc-026ad29744da) deleted as it has been replaced by a new etherpad01.openstack.org server running Xenial.
  • 2018-10-18 20:03:25 UTC Old Trusty etherpad-dev server (85140e9f-9759-4c8b-aca1-bd92ad1cb6b3) deleted now that new Xenial etherpad-dev01 server has been running for a few days without apparent issue
  • 2018-10-18 13:05:40 UTC manually deleted corrupt /afs/.openstack.org/mirror/wheel/ubuntu-16.04-x86_64/s/sqlalchemy-utils/SQLAlchemy_Utils-0.33.6-py2.py3-none-any.whl and released mirror.wheel.xenialx64 volume
  • 2018-10-17 18:43:53 UTC manually deleted corrupt /afs/.openstack.org/mirror/wheel/ubuntu-16.04-x86_64/s/sqlalchemy-utils/SQLAlchemy_Utils-0.33.6-py2.py3-none-any.whl and released mirror.wheel.xenialx64 volume
  • 2018-10-16 00:05:29 UTC nl04 in emergency while I fiddle with ovh-gra quotas to see what works
  • 2018-10-15 15:16:32 UTC force-merged https://review.openstack.org/610484 in order to work around gate issue for OpenStack Chef cookbook CI
  • 2018-10-11 20:17:31 UTC all nodepool-builders / nodepool-launchers restarted to pick up latest code base (32b8f58)
  • 2018-10-09 21:52:35 UTC OVH BHS1 manual port cleanup is running periodically (every 20 minutes) in a root screen on bridge.o.o until a better solution appears
  • 2018-10-08 23:19:04 UTC preallocated remaining space on mirror02.us-west-1.packethost.openstack.org rootfs by writing /dev/zero to a file and then removing it
  • 2018-10-08 23:11:17 UTC started mirror02.us-west-1.packethost.openstack.org via openstackclient after ~4 hours in SHUTOFF state
  • 2018-10-08 17:04:29 UTC started mirror02.us-west-1.packethost.openstack.org via openstackclient after ~3 hours in SHUTOFF state
  • 2018-10-05 21:51:50 UTC rebooted logstash.openstack.org to "fix" broken layer 2 connectivity to backend gateway
  • 2018-10-02 18:46:48 UTC manually deleted /afs/.openstack.org/docs/charm-deployment-guide/latest and performed a vos release of the docs volume
  • 2018-10-02 16:10:58 UTC deleted openstack/cinder driverfixes/ocata branch formerly at 486c00794b1401077bd0c9a6071135c149382958
  • 2018-10-02 06:24:46 UTC We merged change https://review.openstack.org/606129 to change precedence of pipelines and I'm curious to see this in practice.
  • 2018-09-28 12:55:32 UTC (dmsimard) enqueued https://review.openstack.org/606058 to gate and promoted it to increase nodepool capacity
  • 2018-09-25 07:32:13 UTC graphite.o.o removed, puppet has run and config file looks ok
  • 2018-09-25 06:43:13 UTC graphite.o.o in emergency until merge of https://review.openstack.org/604972
  • 2018-09-20 06:23:03 UTC disabled bhs1.ovh again since mirror is not reachable
  • 2018-09-17 15:56:02 UTC Removed openstack/cinder driverfixes/ocata branch with HEAD a37cc259f197e1a515cf82deb342739a125b65c6
  • 2018-09-17 15:12:24 UTC manually deleted /afs/.openstack.org/mirror/wheel/ubuntu-16.04-x86_64/s/sqlalchemy-utils/SQLAlchemy_Utils-0.33.4-py2.py3-none-any.whl and released the mirror.wheel.xenialx64 volume
  • 2018-09-12 06:42:50 UTC nb03.o.o has been transitioned into the new linaro london cloud. hopefully this will stay more reliably attached to zookeeper
  • 2018-09-11 16:16:15 UTC storyboard.o.o webclient revert has been pushed out and its emergency list entry has been removed
  • 2018-09-11 15:44:29 UTC temporarily added storyboard.o.o to emergency disable list while manually applying https://review.openstack.org/601618 until it merges
  • 2018-09-10 21:51:09 UTC INAP will be upgrading Keystone to Pike (from Mitaka) tomorrow at 10pm UTC. Maintenance window is ~1 hour. This will impact inap-mtl01.
  • 2018-09-10 06:49:02 UTC manually uploaded 11.0.0, 10.0.1, 9.0.6, 8.1.5 ceilometer pypi releases per http://lists.openstack.org/pipermail/openstack-dev/2018-September/134496.html
  • 2018-09-06 23:01:52 UTC rebuilt mirror.mtl01.inap.openstack.org and removed unused volumes mirror.mtl01.internap.openstack.org/main02 and mirror.mtl01.internap.openstack.org/main01
  • 2018-08-31 09:59:00 UTC Jobs using devstack-gate (legacy devstack jobs) have been failing due to an ara update. We now use a newer ansible version; it's safe to recheck if you see "ImportError: No module named manager" in the logs.
  • 2018-08-31 00:34:50 UTC restarted etherpad-lite service on etherpad.openstack.org to start making use of the 1.7.0 release in the wake of https://review.openstack.org/597544
  • 2018-08-29 14:29:44 UTC reset /opt/etherpad-lite/etherpad-lite on etherpad-dev.o.o to the 1.7.0 release and restarted the etherpad-lite service for testing
  • 2018-08-29 03:06:48 UTC the openstack meetbot is now delaying channel joins until nickserv confirms identification; all channels should once again be logged
  • 2018-08-29 03:05:43 UTC updated chat.freenode.net entry in eavesdrop.o.o's /etc/hosts file from no-longer-active 82.96.96.11 to 38.229.70.22 (card.freenode.net) and restarted the openstack-meetbot and statusbot services
  • 2018-08-24 13:27:21 UTC updated chat.freenode.net entry in eavesdrop.o.o's /etc/hosts file from no-longer-active 195.154.200.232 to 82.96.96.11 (kornbluth.freenode.net) and restarted the openstack-meetbot and statusbot services
  • 2018-08-22 22:22:11 UTC manually deleted redundant branch "stable/queen" (previously at e1146cc01bce2c1bd6eecb08d92281297218f884) from openstack/networking-vsphere as requested in http://lists.openstack.org/pipermail/openstack-infra/2018-August/006064.html
  • 2018-08-21 16:34:05 UTC Stopped ze01.o.o, deleted executor-git directory on filesystem, started ze01.o.o again. Zuul has properly repopulated the directory with right file permissions
  • 2018-08-21 13:12:57 UTC started apache service on ask.openstack.org (died around log rotation again leaving no information as to why)
  • 2018-08-18 12:58:55 UTC removed stale pidfile and started apache on ask.o.o after it died silently during log rotation
  • 2018-08-17 14:30:17 UTC the hypervisor host for ze02 was restarted, server up since 22:53z, seems to be running jobs normally
  • 2018-08-16 23:11:36 UTC This means that config changes will need to be manually applied while we work to get the puppet cron running on bridge.o.o. New projects won't be created for example.
  • 2018-08-16 23:10:51 UTC Puppetmaster is no longer running puppet for us. bridge.openstack.org is now our cfg mgmt control. It is currently in a state of transition while we test things and puppet is not being automatically executed.
  • 2018-08-16 22:56:16 UTC restarted all zuul executors with linux 4.15.0-32-generic
  • 2018-08-16 14:30:31 UTC manually deleted the stable/rocky branch previously at 90dfca5dfc60e48544ff25f63c3fa59cb88fc521 from openstack/ovsdbapp at the request of amoralej and smcginnis
  • 2018-08-15 16:59:13 UTC 93b2b91f-7d01-442b-8dff-96a53088654a ethercalc01.openstack.org has been deleted in favor of new xenial ethercalc02 server
  • 2018-08-14 22:35:08 UTC Ethercalc service migrated to Xenial on new ethercalc02 instance. Backups updated to push to bup-ethercalc02 remote as well. We should delete ethercalc01.openstack.org in the near future then bup-ethercalc01 in the later future.
  • 2018-08-14 09:45:35 UTC nodepool dib images centos-7-0000009152 debian-stretch-0000000171 ubuntu-trusty-0000003720 removed, see https://review.openstack.org/591588
  • 2018-08-08 14:39:39 UTC manually deleted branch stable/rocky previously at commit a33b1499d7c00e646a9b49715a8a7dbd4467ec91 from openstack/python-tripleoclient as requested by mwhahaha, EmilienM and smcginnis
  • 2018-08-07 20:44:34 UTC Due to a bug, Zuul has been unable to report on cherry-picked changes over the last 24 hours. This has now been fixed; if you encounter a cherry-picked change missing its results (or was unable to merge), please recheck now.
  • 2018-08-07 19:54:40 UTC deleted a/aaaa rrs for the long-gone odsreg.openstack.org
  • 2018-08-07 17:21:48 UTC Updated openstacksdk on nodepool-launchers to 0.17.2 to fix provider thread crashes that result in idle providers
  • 2018-08-07 06:08:34 UTC http://zuul.openstack.org/api/config-errors shows *no* errors
  • 2018-08-06 23:27:13 UTC zuul now reports to gerrit over HTTPS rather than ssh; please keep an eye out for any issues
  • 2018-08-05 08:39:28 UTC the periodic translation jobs are not running - help needed to figure out the failure
  • 2018-08-03 17:30:40 UTC Project renames and review.openstack.org downtime are complete without any major issue.
  • 2018-08-03 16:04:35 UTC The infra team is renaming projects in Gerrit. There will be a short ~10 minute Gerrit downtime in a few minutes as a result.
  • 2018-08-01 23:24:14 UTC set mlock +n on all channels (prevents sending to the channel without joining)
  • 2018-08-01 15:49:44 UTC Due to ongoing spam, all OpenStack-related channels now require authentication with nickserv. If an unauthenticated user joins a channel, they will be forwarded to #openstack-unregistered with a message about the problem and folks to help with any questions (volunteers welcome!).
  • 2018-08-01 04:58:43 UTC +r (registered users only) has been temporarily set on #openstack-infra due to incoming spam. this will be re-evaluated in a few hours
  • 2018-07-28 00:57:32 UTC all zuul-executors now running kernel 4.15.0-29-generic #31~16.04.1-Ubuntu
  • 2018-07-27 15:41:58 UTC A zuul config error slipped through and caused a pile of job failures with retry_limit - a fix is being applied and should be back up in a few minutes
  • 2018-07-26 22:46:09 UTC mirror.us-west-1.packethost.openstack.org cname updated to mirror02.us-west-1.packethost.openstack.org
  • 2018-07-25 20:28:05 UTC enqueued 585839 into gate to help fix tripleo queue
  • 2018-07-25 04:07:05 UTC upgraded openstacksdk to 0.17.0 on puppetmaster to resolve vexxhost issues (see 0.17.0 release notes currently @ https://docs.openstack.org/releasenotes/openstacksdk/unreleased.html#bug-fixes)
  • 2018-07-24 14:13:37 UTC mirror.us-west-1.packethost.openstack.org reboot via openstack API due to not responding to SSH / HTTP requests. Server now back online.
  • 2018-07-24 01:49:56 UTC ze11.openstack.org is online and running jobs.
  • 2018-07-23 20:06:26 UTC set forward_auto_discards=0 on openstack-qa@lists.openstack.org ml to combat spam backscatter
  • 2018-07-23 18:49:19 UTC All nodepool builders restarted with latest code, include switch from shade to openstacksdk
  • 2018-07-23 18:39:19 UTC All nodepool launchers restarted with latest code, include switch from shade to openstacksdk
  • 2018-07-19 13:43:11 UTC logs.openstack.org is back on-line. Changes with "POST_FAILURE" job results should be rechecked.
  • 2018-07-19 12:59:58 UTC logs.openstack.org is offline, causing POST_FAILURE results from Zuul. Cause and resolution timeframe currently unknown.
  • 2018-07-19 05:37:06 UTC grafana.o.o switched to new grafana02.o.o
  • 2018-07-17 15:11:10 UTC switched primary address for openstackci pypi account from review@o.o to infra-root@o.o so that it doesn't get mixed in with gerrit backscatter (we can switch to a dedicated alias later if needed)
  • 2018-07-17 15:05:03 UTC changed validated e-mail address for openstackci account on pypi per https://mail.python.org/mm3/archives/list/distutils-sig@python.org/thread/5ER2YET54CSX4FV2VP24JA57REDDW5OI/
  • 2018-07-13 23:38:10 UTC logs.openstack.org is back on-line. Changes with "POST_FAILURE" job results should be rechecked.
  • 2018-07-13 21:53:48 UTC logs.openstack.org is offline, causing POST_FAILURE results from Zuul. Cause and resolution timeframe currently unknown.
  • 2018-07-08 15:11:26 UTC touched GerritSiteHeader.html on review.openstack.org to get hideci.js working again after https://review.openstack.org/559634 was puppeted
  • 2018-07-06 14:14:58 UTC manually restarted mosquitto, lpmqtt and germqtt services on firehose01.openstack.org (mosquitto died again during log rotation due to its signal handling bug, and the other two services subsequently died from connection failures because the broker was down)
  • 2018-07-06 03:05:36 UTC old reviewday and bugday processes on status.o.o manually killed, normal runs should resume
  • 2018-07-05 16:26:54 UTC restarted rabbitmq-server service on storyboard.openstack.org to clear a "lock wait timeout exceeded" internalerror condition blocking task status updates
  • 2018-06-29 03:20:12 UTC migrated /opt on nb03.o.o to a new cinder volume due to increasing space requirements from new builds
  • 2018-06-28 12:09:53 UTC lists.openstack.org has been removed from the emergency disable list now that https://review.openstack.org/576539 has merged
  • 2018-06-26 06:49:33 UTC project-config's zuul config is broken; it contains a removed job. https://review.openstack.org/577999 should fix it.
  • 2018-06-25 22:48:19 UTC puppetdb.openstack.org A and AAAA dns records removed
  • 2018-06-25 20:45:26 UTC Jenkins and Infracloud data removed from hieradata
  • 2018-06-25 16:43:25 UTC Nodepool launchers restarted with latest code
  • 2018-06-25 03:41:53 UTC nl03.o.o in emergency and vexxhost max-servers turned to 0 temporarily
  • 2018-06-25 03:35:03 UTC storyboard.o.o workers & apache restarted after persistent errors appearing to occur after rabbitmq disconnection. See log around Sun Jun 24 08:33:43.058712 for original error (http://paste.openstack.org/show/724194/)
  • 2018-06-22 21:25:18 UTC SSL cert rotation for June 2018 completed.
  • 2018-06-22 21:00:24 UTC Removed docs-draft CNAME record to static.o.o as the doc drafts are no longer hosted separately
  • 2018-06-19 14:40:20 UTC lists.openstack.org added to emergency disable list until https://review.openstack.org/576539 merges
  • 2018-06-18 23:23:31 UTC openstackid.org has been removed from the emergency disable list now that https://review.openstack.org/576248 has merged, and after confirming with smarcet that he will keep an eye on it
  • 2018-06-13 16:48:21 UTC switched lists.airshipit.org and lists.starlingx.io dns records from cname to a/aaaa for proper mail routing
  • 2018-06-13 01:57:09 UTC yum-cron is now active on all git* hosts. some may have not had package updates for a while, so look at that first in case of issues
  • 2018-06-12 00:44:28 UTC storyboard.openstack.org has been removed from the emergency disable list now that https://review.openstack.org/574468 has merged
  • 2018-06-11 19:58:40 UTC Zuul was restarted for a software upgrade; changes uploaded or approved between 19:30 and 19:50 will need to be rechecked
  • 2018-06-09 14:48:20 UTC manually started the unbound daemon on mirror.gra1.ovh.openstack.org due to https://launchpad.net/bugs/1775833
  • 2018-06-09 13:20:39 UTC temporarily added storyboard.openstack.org to the emergency disable list and manually reverted to openstack-infra/storyboard commit f38f3bc while working to bisect a database locking problem
  • 2018-06-08 21:21:34 UTC Manually applied https://review.openstack.org/#/c/573738/ to nl03 as nl* are disabled in puppet until we sort out the migration to no zk schema
  • 2018-06-08 19:25:19 UTC Nodepool issue from earlier today seems to have been caused by nl03 launcher restart. Mixed, incompatible versions of code caused us to create min-ready nodes continually until we reached full capacity. A full shutdown and restart of nodepool launchers is necessary to prevent this going forward.
  • 2018-06-08 17:25:59 UTC The Zuul scheduler was offline briefly to clean up from debugging a nodepool issue, so changes uploaded or approved between 16:50 and 17:15 UTC may need to be rechecked or reapproved (all already queued changes are in the process of being reenqueued now)
  • 2018-06-08 13:48:39 UTC unbound was manually restarted on many zuul executors following the 1.5.8-1ubuntu1.1 security update, due to https://launchpad.net/bugs/1775833
  • 2018-06-08 13:48:35 UTC A misapplied distro security package update caused many jobs to fail with a MERGER_FAILURE error between ~06:30-12:30 UTC; these can be safely rechecked now that the problem has been addressed
  • 2018-06-08 06:03:09 UTC Zuul stopped receiving gerrit events around 04:00UTC; any changes submitted between then and now will probably require a "recheck" comment to be requeued. Thanks!
  • 2018-06-07 22:21:22 UTC Added vexxhost back to nl03 but our mirror node is unhappy there so have temporarily disabled the cloud again until the mirror node is up and running
  • 2018-06-07 16:11:59 UTC The zuul upgrade to ansible 2.5 is complete and zuul is running again. Changes uploaded or approved between 15:25 and 15:45 will need to be rechecked. Please report any problems in #openstack-infra
  • 2018-06-07 15:32:32 UTC Zuul update for Ansible 2.5 in progress. Scheduler crashed as unexpected side effect of pip upgrade. Will be back and running shortly.
  • 2018-06-06 15:52:11 UTC deleted /afs/.openstack.org/docs/project-install-guide/baremetal/draft at the request of pkovar
  • 2018-06-06 13:43:41 UTC added lists.starlingx.io cname to lists.openstack.org and started the mailman-starlingx service on the server now that https://review.openstack.org/569545 has been applied
  • 2018-06-05 21:36:58 UTC Zuul job-output.txt files were incomplete if at any point the job stopped producing logs for more than 5 seconds. This happened due to a timeout in the log streaming daemon. This has been fixed and the zuul executors have been restarted. Jobs running after now should have complete logs.
  • 2018-06-04 00:37:01 UTC survey01.openstack.org is no longer in the emergency disable list now that https://review.openstack.org/571976 has merged
  • 2018-06-03 13:39:52 UTC survey01.openstack.org has been placed into the emergency disable list until https://review.openstack.org/571976 merges so that setup can resume
  • 2018-05-30 22:17:39 UTC storyboard is deploying latest webclient again after fixing the deployment process around the webclient.
  • 2018-05-30 20:54:01 UTC storyboard.openstack.org has been removed from the emergency disable list now that storyboard-webclient tarball deployment is fixed
  • 2018-05-30 17:42:52 UTC git08 added back to git.o.o haproxy and all git backends updated to make git.openstack.org vhost their default vhost. This means that https clients that don't speak SNI will get the cert for git.o.o (and talk to git.o.o vhost) by default.
  • 2018-05-29 23:45:05 UTC Bypassed zuul on https://review.openstack.org/570811 due to needing a circular fix for the cmd2 environment markers solution
  • 2018-05-29 23:45:01 UTC Restarted statusbot as it seemed to have gotten lost in the midst of Saturday's netsplits
  • 2018-05-24 22:38:46 UTC removed AAAA records for afsdb01 & afsdb02 per https://review.openstack.org/559851
  • 2018-05-24 11:38:16 UTC afs mirror.pypi quota exceeded; increased to 1.9T (2000000000)
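For reference, a quota bump like the one above is normally done with the OpenAFS fs tool; a sketch assuming the mirror volume is mounted at the usual read-write path (the path below is an assumption):
    # check usage, then raise the quota on the pypi mirror volume (values in 1K blocks)
    fs listquota /afs/.openstack.org/mirror/pypi
    fs setquota -path /afs/.openstack.org/mirror/pypi -max 2000000000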
  • 2018-05-24 05:17:33 UTC mirror-update.o.o placed into emergency file, and cron is stopped on the host, pending recovery of several afs volumes
  • 2018-05-11 18:20:14 UTC restarted the etherpad-lite service on etherpad.openstack.org for the upgrade to 1.6.6
  • 2018-05-11 13:17:11 UTC Due to a Zuul outage, patches uploaded to Gerrit between 09:00 UTC and 12:50 UTC were not properly added to Zuul. Please recheck any patches uploaded during this window; apologies for the inconvenience.
  • 2018-05-10 14:42:27 UTC restarted the etherpad-lite service on etherpad-dev.openstack.org to test release 1.6.6
  • 2018-05-07 22:58:47 UTC Any devstack job failure due to rsync errors related to tripleo-incubator can safely be rechecked now
  • 2018-05-02 22:13:28 UTC Gerrit maintenance has concluded successfully
  • 2018-05-02 20:08:03 UTC The Gerrit service at review.openstack.org will be offline over the next 1-2 hours for a server move and operating system upgrade: http://lists.openstack.org/pipermail/openstack-dev/2018-May/130118.html
  • 2018-05-02 19:37:57 UTC The Gerrit service at review.openstack.org will be offline starting at 20:00 (in roughly 25 minutes) for a server move and operating system upgrade: http://lists.openstack.org/pipermail/openstack-dev/2018-May/130118.html
  • 2018-04-28 13:58:59 UTC Trove instance for storyboard.openstack.org was rebooted 2018-04-28 07:29z due to a provider incident (DBHDJ5WPgalxvZo) but is back in working order
  • 2018-04-27 12:15:52 UTC fedora-26 removed from mirror.fedora (AFS mirror) and rsync configuration on mirror-update.o.o
  • 2018-04-27 11:19:06 UTC jessie removed from mirror.debian (AFS mirror) and reprepro configuration on mirror-update.o.o
  • 2018-04-26 17:10:25 UTC nb0[12], nl0[1-4] restarted nodepool services to pick up recent changes to nodepool. All running e9b82226a5641042e1aad1329efa6e3b376e7f3a of nodepool now.
  • 2018-04-26 15:07:31 UTC We've resolved the issue that prevented paste.openstack.org from loading and it's now back online, thank you for your patience.
  • 2018-04-26 14:29:42 UTC ze09 was rebooted 2018-04-26 08:15z due to a provider incident (CSHD-1AG24176PQz) but is back in working order
  • 2018-04-26 02:44:02 UTC restarted lodgeit on paste.o.o because it appeared hung
  • 2018-04-25 22:10:02 UTC ansible-role-puppet updated with new support for Puppet 4 (backward compatible with puppet 3)
  • 2018-04-25 22:09:40 UTC ic.openstack.org domain deleted from dns management as part of infracloud cleanup
  • 2018-04-25 13:32:43 UTC ze09 was rebooted 2018-04-25 01:39z due to a provider incident (CSHD-vwoxBJl5x7L) but is back in working order
  • 2018-04-25 13:31:49 UTC logstash-worker20 was rebooted 2018-04-22 20:50z due to a provider incident (CSHD-AjJP61XQ2n5) but is back in working order
  • 2018-04-19 18:19:51 UTC all DIB images (minus gentoo) have been unpaused for nodepool-builder. Latest release of diskimage-builder fixed our issues related to pip10 and glean failing to boot.
  • 2018-04-19 07:43:59 UTC 7000+ leaked images (~200TB of leaked image and object data) cleaned up from our 3 RAX regions. See https://review.openstack.org/#/c/562510/ for more details
  • 2018-04-18 21:21:31 UTC Pypi mirroring with bandersnatch is now running with Bandersnatch 2.2.0 under python3. This allows us to blacklist packages if necessary (which we are doing to exclude very large packages with very frequent updates to reduce disk needs)
  • 2018-04-17 14:15:36 UTC deactivated duplicate gerrit accounts 26191 and 27230, and reassigned their openids to older account 8866
  • 2018-04-17 13:40:38 UTC running `mosquitto -v -c /etc/mosquitto/mosquitto.conf` under a root screen session for crash debugging purposes
  • 2018-04-17 09:52:58 UTC nb03.o.o placed into emergency file and manually applied pause of builds, while project-config gating is broken
  • 2018-04-17 00:04:20 UTC PyPi mirror updating is on pause while we sort out updating bandersnatch in order to blacklist large packages that keep filling our mirror disk volumes.
  • 2018-04-16 18:09:22 UTC restarted the mosquitto service on firehose01.openstack.org to pick up a recent configuration change
  • 2018-04-16 16:46:40 UTC increased AFS pypi mirror volume quota to 1800000000 kbytes (thats just under 1.8TB) as previous value of 1700000000 was nearing capacity
  • 2018-04-14 16:52:53 UTC The Gerrit service at https://review.openstack.org/ will be offline for a minute while it is restarted to pick up a configuration change allowing it to start commenting on stories in StoryBoard, and will return to service momentarily
  • 2018-04-13 21:02:29 UTC holding lock on mirror.debian for reprepro while I repair debian-security database in reprepro
  • 2018-04-13 20:46:30 UTC openstack/os-client-config bugs have been imported to storyboard.o.o from the os-client-config lp project
  • 2018-04-13 20:40:46 UTC openstack/openstacksdk bugs have been imported to storyboard.o.o from the python-openstacksdk lp project
  • 2018-04-13 20:35:40 UTC openstack/python-openstackclient bugs have been imported to storyboard.o.o from the python-openstackclient lp project
  • 2018-04-13 20:35:07 UTC openstack/tripleo-validations bugs have been imported to storyboard.o.o from the tripleo lp project filtering on the validations bugtag
  • 2018-04-13 20:31:13 UTC review-dev01.openstack.org has been removed from the emergency disable list now that storyboard integration testing is finished
  • 2018-04-12 23:41:30 UTC The Etherpad service at https://etherpad.openstack.org/ is being restarted to pick up the latest release version; browsers should see only a brief ~1min blip before reconnecting automatically to active pads
  • 2018-04-12 22:42:29 UTC nodejs has been manually nudged on etherpad.o.o to upgrade to 6.x packages now that https://review.openstack.org/561031 is in place
  • 2018-04-12 20:19:53 UTC manually corrected the eplite homedir path in /etc/passwd on etherpad.o.o and created it on the filesystem with appropriate ownership following https://review.openstack.org/528625
  • 2018-04-12 19:34:00 UTC restarted etherpad-lite service on etherpad-dev.openstack.org (NOT review-dev!) to pick up commits related to latest 1.6.5 tag
  • 2018-04-12 19:33:38 UTC restarted etherpad-lite service on review-dev.openstack.org to pick up commits related to latest 1.6.5 tag
  • 2018-04-11 22:31:53 UTC zuul was restarted to update to the latest code; you may need to recheck changes uploaded or approvals added between 21:30 and 21:45
  • 2018-04-11 20:12:46 UTC added review-dev01.openstack.org to emergency disable list in preparation for manually experimenting with some configuration changes in an attempt to further diagnose the its-storyboard plugin
  • 2018-04-09 22:00:10 UTC removed AAAA RRs for afs01.dfw.o.o, afs02.dfw.o.o and afs01.ord.o.o per https://review.openstack.org/559851
  • 2018-04-09 16:53:40 UTC zuul was restarted to update to the latest code; please recheck any changes uploaded within the past 10 minutes
  • 2018-04-09 11:06:06 UTC ask-staging.o.o is pointing to a new xenial-based server. old server is at ask-staging-old.o.o for now
  • 2018-04-09 10:38:32 UTC afs02.dfw.o.o ran out of space in /vicepa; added a +1TB volume (bringing it in line with afs01). Two volumes (ubuntu & debian) appear to have become corrupt due to out-of-disk errors; recovery involves a full release. ubuntu is done, debian is currently in progress; I have the cron lock
  • 2018-04-09 09:43:43 UTC the elasticsearch.service on es03.o.o was down since 02:00z, restarted now
  • 2018-04-04 20:27:56 UTC released bindep 2.7.0
  • 2018-04-04 08:32:36 UTC git08 is showing broken repos e.g. openstack/ara-clients is empty. placed git.openstack.org into emergency file and removed git08 from the list of backends for haproxy as temporary fix
  • 2018-04-02 16:06:04 UTC Cleaned up old unused dns records per http://paste.openstack.org/show/718183/ we no longer use the pypi hostname for our pypi mirrors and some of the clouds don't exist anymore.
  • 2018-03-29 19:45:13 UTC Clarkb has added zxiiro to the python-jenkins-core and release groups. ssbarnea and waynr added to the python-jenkins-core group after mailing list thread did not lead to any objections. I will let them coordinate and decide if other individuals are appropriate to add to python-jenkins-core
  • 2018-03-29 00:00:15 UTC Zuul has been restarted to update to the latest code; existing changes have been re-enqueued, you may need to recheck changes uploaded in the past 10 minutes
  • 2018-03-28 22:55:43 UTC removed tonyb from the Project Bootstrappers group on review.o.o
  • 2018-03-28 22:34:49 UTC added tonyb to the Project Bootstrappers group on review.o.o
  • 2018-03-28 21:53:56 UTC the zuul web dashboard will experience a short downtime as we roll out some changes - no job execution should be affected
  • 2018-03-27 16:21:25 UTC git08 removed from emergency file and added back to git loadbalancer
  • 2018-03-27 16:05:41 UTC set wgObjectCacheSessionExpiry to 86400 in /srv/mediawiki/Settings.php on the wiki to see if it effectively increases session duration to one day
  • 2018-03-26 21:50:23 UTC added git.zuul-ci.org cert to hieradata
  • 2018-03-26 21:35:32 UTC added git08.openstack.org to puppet emergency file for testing
  • 2018-03-26 21:06:29 UTC removed git08.openstack.org from git lb for manual testing
  • 2018-03-24 01:47:06 UTC CORRECTION: stray login.launchpad.net openids were rewritten to login.ubuntu.com
  • 2018-03-24 01:46:05 UTC Duplicate accounts on storyboard.openstack.org have been merged/cleaned up, and any stray login.launchpad.com openids rewritten to login.launchpad.net
  • 2018-03-23 15:51:08 UTC Gerrit will be temporarily unreachable as we restart it to complete the rename of some projects.
  • 2018-03-23 14:53:31 UTC zuul.o.o has been restarted to pick up latest code base and clear memory usage. Both check / gate queues were saved, be sure to check your patches and recheck when needed.
  • 2018-03-23 14:49:37 UTC zuul.o.o has been restarted to pick up latest code base and clear memory usage. Both check / gate queues were saved, be sure to check your patches and recheck when needed.
  • 2018-03-23 00:06:12 UTC graphite restarted on graphite.o.o to pick up logging changes from https://review.openstack.org/#/c/541488/
  • 2018-03-22 21:47:15 UTC killed a 21-day-old puppet apply on nl03.o.o which was using 100% CPU. strace showed nothing but a flood of "sched_yield" calls, which seems to have been categorized as a ruby issue in https://tickets.puppetlabs.com/browse/PA-1743
  • 2018-03-22 21:23:16 UTC zuul executors have been restarted to pick up latest security fix for localhost execution
  • 2018-03-21 22:49:01 UTC review01-dev.o.o now online (ubuntu-xenial) and review-dev.o.o DNS redirected
  • 2018-03-21 04:44:06 UTC all of today's builds deleted, and all image builds on hold until the dib 2.12.1 release. The dib fix is https://review.openstack.org/554771; however it requires a tripleo fix in https://review.openstack.org/554705 to first unblock the dib gate
  • 2018-03-20 15:04:47 UTC nl03.o.o removed from emergency file on puppetmaster.o.o
  • 2018-03-20 00:00:46 UTC all afs fileservers running with new settings per https://review.openstack.org/#/c/540198/6/doc/source/afs.rst ; monitoring but no current issues
  • 2018-03-19 20:51:14 UTC nl03.o.o added to emergency file and max-server to 0 for vexxhost until https://review.openstack.org/554354/ land and new raw images built / uploaded
  • 2018-03-19 14:56:07 UTC manually set ownership of /srv/static/tarballs/kolla/images/README.txt from root:root to jenkins:jenkins so that release jobs no longer fail uploading
  • 2018-03-16 18:42:58 UTC Restarted zuul-executors to pick up fix in https://review.openstack.org/553854
  • 2018-03-16 04:26:59 UTC mirror-update.o.o upgraded to bionic AFS packages (1.8.0~pre5-1ppa2). ubuntu-ports, ubuntu & debian mirrors recovered
  • 2018-03-15 19:28:17 UTC The regression stemming from one of yesterday's Zuul security fixes has been rectified, and Devstack/Tempest jobs which encountered POST_FAILURE results over the past 24 hours can safely be rechecked now
  • 2018-03-15 14:12:51 UTC POST_FAILURE results on Tempest-based jobs since the most recent Zuul security fixes are being investigated; rechecking those won't help for now but we'll keep you posted once a solution is identified
  • 2018-03-15 02:54:37 UTC mirror-update.o.o upgraded to xenial host mirror-update01.o.o. mirror-update turned into a cname for 01. old server remains at mirror-update-old.o.o but turned off, so as not to run conflicting jobs; will clean up later
  • 2018-03-14 13:22:43 UTC added frickler to https://launchpad.net/~openstack-ci-core and set as administrator to enable ppa management
  • 2018-03-13 08:06:47 UTC Removed typo'd branch stable/queen from openstack/networking-infoblox at revision f6779d525d9bc622b03eac9c72ab5d425fe1283f
  • 2018-03-12 18:43:57 UTC Zuul has been restarted without the breaking change; please recheck any changes which failed tests with the error "Accessing files from outside the working dir ... is prohibited."
  • 2018-03-12 18:21:13 UTC Most jobs in zuul are currently failing due to a recent change to zuul; we are evaluating the issue and will follow up with a recommendation shortly. For the moment, please do not recheck.
  • 2018-03-07 11:22:44 UTC force-merged https://review.openstack.org/550425 to unblock CI
  • 2018-03-06 21:19:56 UTC The infrastructure team is aware of replication issues between review.openstack.org and github.com repositories. We're planning a maintenance to try and address the issue. We recommend using our official supported mirrors instead located at https://git.openstack.org.
  • 2018-03-06 08:24:37 UTC i have applied a manual revert of 4a781a7f8699f5b483f79b1bdface0ba2ba92428 on zuul01.openstack.org and placed it in the emergency file
  • 2018-03-05 03:25:10 UTC gerrit restarted to get github replication going; see http://lists.openstack.org/pipermail/openstack-infra/2018-March/005842.html for some details
  • 2018-03-01 00:56:56 UTC translate.o.o upgraded to zanata 4.3.3. see notes in https://etherpad.openstack.org/p/zanata_upgrade_4xx
  • 2018-02-28 23:31:10 UTC removed old ci-backup-rs-ord.openstack.org dns entry (replaced by backup01.ord.rax.ci.openstack.org) and entry from emergency file. host was deleted some time ago
  • 2018-02-27 10:24:35 UTC gerrit is being restarted due to extreme slowness
  • 2018-02-24 01:07:04 UTC Had to start zuul-scheduler using the init script directly after running `export _SYSTEMCTL_SKIP_REDIRECT=1` to avoid the systemd sysv redirection. One theory is that systemd was unhappy after operating in the low memory environment. We need to get this working again as well as fix bugs with pid file handling in the init script. Might consider a server reboot.
  • 2018-02-24 01:05:53 UTC The zuul-scheduler init script on zuul01.o.o appeared to stop working when attempting to start zuul after stopping it for running out of memory. Systemctl would report the process started successfully then exited 0. There were no zuul logs and no additional info in journalctl to further debug. Had to start zuul-scheduler using the init script directly after running `export _SYSTEMCTL_SKIP_REDIRECT=1` to avoid the systemd sysv redirection.
  • 2018-02-24 01:04:33 UTC Zuul was restarted to workaround a memory issue. If your jobs are not running they may need to be rechecked
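A minimal sketch of the workaround described in the 2018-02-24 entries above (service name and init script path assumed to be the standard ones):
    # bypass systemd's sysv-init redirection and invoke the init script directly
    export _SYSTEMCTL_SKIP_REDIRECT=1
    /etc/init.d/zuul-scheduler start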
  • 2018-02-22 16:30:26 UTC deleted http://paste.openstack.org/raw/665906/ from lodgeit openstack.pastes table (paste_id=665906) due to provider aup violation/takedown notice
  • 2018-02-22 02:30:16 UTC mirror.ubuntu reprepro has been repaired and back online
  • 2018-02-21 21:02:06 UTC bypassed ci testing for https://review.openstack.org/546758 to resolve a deadlock with imported zuul configuration in a new project
  • 2018-02-21 20:33:27 UTC bypassed ci testing for https://review.openstack.org/546746 to resolve a deadlock with imported zuul configuration in a new project
  • 2018-02-21 16:36:27 UTC manually removed /srv/static/tarballs/training-labs/dist/build, /srv/static/tarballs/training-labs/images, /srv/static/tarballs/dist and /srv/static/tarballs/build from static.openstack.org
  • 2018-02-21 14:18:04 UTC deleted stale afs lockfile for the ubuntu mirror and restarted a fresh reprepro-mirror-update of it on mirror-update.o.o
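Roughly what clearing a stale reprepro lock looks like; the lockfile path and invocation below are assumptions (only the reprepro-mirror-update name comes from the entry itself):
    # hypothetical paths, for illustration only
    rm /afs/.openstack.org/mirror/ubuntu/reprepro.lock   # remove the stale lock left by an interrupted run
    reprepro-mirror-update ubuntu                        # kick off a fresh mirror update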
  • 2018-02-20 04:08:34 UTC mirror.ubuntu released in AFS and ubuntu bionic repos now online
  • 2018-02-20 02:08:01 UTC (dmsimard) The temporary server and volume clones from static.o.o have been deleted, their names were prefixed by "dmsimard-static".
  • 2018-02-19 15:17:03 UTC Zuul has been restarted to pick up latest memory fixes. Queues were saved however patches uploaded after 14:40UTC may have been missed. Please recheck if needed.
  • 2018-02-18 15:56:50 UTC Zuul has been restarted and queues were saved. However, patches uploaded after 14:40UTC may have been missed. Please recheck your patchsets where needed.
  • 2018-02-16 20:30:06 UTC replacement ze02.o.o server is online and processing jobs
  • 2018-02-14 15:06:26 UTC Due to a race in stable/queens branch creation and some job removals, Zuul has reported syntax errors for the past hour; if you saw a syntax error reported for "Job tripleo-ci-centos-7-ovb-containers-oooq not defined" you can safely recheck now
  • 2018-02-14 14:58:14 UTC bypassed zuul and directly submitted https://review.openstack.org/#/c/544359
  • 2018-02-14 14:54:11 UTC bypassed zuul and directly submitted https://review.openstack.org/#/c/544358
  • 2018-02-14 05:10:17 UTC ze02.o.o was hard rebooted but didn't fix ipv6 issues. i detached and re-attached the network from the existing vm, and that seemed to help. DNS has been updated
  • 2018-02-13 21:06:27 UTC restarted nl01 to nl04 to pick up latest fixes for nodepool
  • 2018-02-13 20:05:42 UTC planet01.o.o and ns2.o.o rebooted at provider's request
  • 2018-02-13 14:06:16 UTC temporarily added lists.openstack.org to the emergency maintenance list with https://review.openstack.org/543941 manually applied until it merges
  • 2018-02-12 19:40:32 UTC Ubuntu has published Trusty and Xenial updates for CVE-2018-6789; I have manually updated lists.openstack.org and lists.katacontainers.io with the new packages rather than waiting for unattended-upgrades to find them
  • 2018-02-12 04:41:10 UTC per previous message, seems host was rebooted. nl02.o.o looks ok; manually restarted zuul-merger on zm06. no more issues expected
  • 2018-02-12 03:25:47 UTC received rax notification that host of nl02.o.o and zm06.o.o has some sort of issue; currently can't log into either. updates from rax pending
  • 2018-02-07 21:07:22 UTC cleared stale reprepro update lockfile for debian, manually ran mirror update
  • 2018-02-07 19:51:02 UTC ipv6 addresses have been readded to all zuul executors
  • 2018-02-07 01:55:18 UTC (dmsimard) OVH BHS1 and GRA1 both recovered on their own and are back at full capacity.
  • 2018-02-06 23:02:12 UTC all nodepool launchers restarted to pick up https://review.openstack.org/541375
  • 2018-02-06 22:45:48 UTC provider ticket 180206-iad-0005440 has been opened to track ipv6 connectivity issues between some hosts in dfw; ze09.openstack.org has its zuul-executor process disabled so it can serve as an example while they investigate
  • 2018-02-06 22:43:44 UTC ze02.o.o rebooted with xenial 4.13 hwe kernel ... will monitor performance
  • 2018-02-06 17:53:57 UTC (dmsimard) High nodepool failure rates (500 errors) against OVH BHS1 and GRA1: http://paste.openstack.org/raw/663704/
  • 2018-02-06 17:53:11 UTC (dmsimard) zuul-scheduler issues with zookeeper ( kazoo.exceptions.NoNodeError / Exception: Node is not locked / kazoo.client: Connection dropped: socket connection broken ): https://etherpad.openstack.org/p/HRUjBTyabM
  • 2018-02-06 17:51:41 UTC (dmsimard) Different Zuul issues relative to ipv4/ipv6 connectivity, some executors have had their ipv6 removed: https://etherpad.openstack.org/p/HRUjBTyabM
  • 2018-02-06 17:49:03 UTC (dmsimard) CityCloud asked us to disable nodepool usage with them until July: https://review.openstack.org/#/c/541307/
  • 2018-02-06 10:31:35 UTC Our Zuul infrastructure is currently experiencing some problems and processing jobs very slowly, we're investigating. Please do not approve or recheck changes for now.
  • 2018-02-06 09:32:11 UTC graphite.o.o disk full. Moved /var/log/graphite/carbon-cache-a/*201[67]* to the cinder-volume-based /var/lib/graphite/storage/carbon-cache-a.backup.2018-02-06 and rebooted the server
  • 2018-02-05 23:02:43 UTC removed lists.openstack.org from the emergency maintenance file
  • 2018-02-05 21:03:27 UTC removed static.openstack.org from emergency maintenance list
  • 2018-02-05 14:56:15 UTC temporarily added lists.openstack.org to the emergency maintenance list pending merger of https://review.openstack.org/540876
  • 2018-02-03 14:05:46 UTC gerrit ssh api on review.openstack.org is once again limited to 100 concurrent connections per source ip address per https://review.openstack.org/529712
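The linked review implements the limit above; purely as a generic illustration (not necessarily the mechanism used there), a per-source-IP concurrent connection cap on the Gerrit ssh port can be enforced with iptables connlimit:
    # illustrative only; whether review.o.o uses this exact rule is not stated in the entry
    iptables -I INPUT -p tcp --syn --dport 29418 \
        -m connlimit --connlimit-above 100 --connlimit-mask 32 \
        -j REJECT --reject-with tcp-reset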
  • 2018-02-01 22:05:05 UTC deleted zuulv3.openstack.org and corresponding dns records now that zuul01.openstack.org has been in production for two weeks
  • 2018-02-01 17:57:16 UTC files02.openstack.org removed from emergency file after zuul-ci.org intermediate cert problem resolved
  • 2018-02-01 06:31:41 UTC mirror01.dfw.o.o has been retired due to performance issues; it is replaced by mirror02.dfw.o.o
  • 2018-01-31 21:22:06 UTC filed for removal of ask.openstack.org from mailspike blacklist per https://launchpad.net/bugs/1745512
  • 2018-01-31 02:55:44 UTC mirror.dfw.rax.openstack.org updated to mirror02.dfw.rax.openstack.org (TTL turned down to 5 minutes as we test)
  • 2018-01-31 02:53:34 UTC remove old A and AAAA records for mirror.dfw.openstack.org (note: not mirror.dfw.rax.openstack.org)
  • 2018-01-30 23:43:40 UTC scheduled maintenance will make the citycloud-kna1 api endpoint unavailable intermittently between 2018-02-07 07:00 and 2018-02-08 07:00 utc
  • 2018-01-30 23:42:51 UTC scheduled maintenance will make the citycloud-sto2 api endpoint unavailable intermittently between 2018-01-31 07:00 and 2018-02-01 07:00 utc
  • 2018-01-30 20:18:33 UTC reenqueued release pipeline jobs for openstack/tripleo-ipsec 8.0.0
  • 2018-01-30 17:34:55 UTC 537933 promoted to help address an integrated gate timeout issue with nova
  • 2018-01-30 16:24:42 UTC ticket 180130-ord-0000697 filed to investigate an apparent 100Mbps rate limit on the mirror01.dfw.rax instance
  • 2018-01-30 16:04:21 UTC removed nb01, nb02 and nb03 from the emergency maintenance list now that it's safe to start building new ubuntu-xenial images again
  • 2018-01-30 14:24:08 UTC most recent ubuntu-xenial images have been deleted from nodepool, so future job builds should revert to booting from the previous (working) image while we debug
  • 2018-01-30 13:54:39 UTC nb01, nb02 and nb03 have been placed in the emergency maintenance list in preparation for a manual application of https://review.openstack.org/539213
  • 2018-01-30 13:44:45 UTC Our ubuntu-xenial images (used for e.g. unit tests and devstack) are currently failing to install any packages; please refrain from *recheck* or *approve* until the issue has been investigated and fixed.
  • 2018-01-29 16:24:17 UTC zuul.o.o is back online, feel free to recheck / approve patches.
  • 2018-01-29 14:31:21 UTC we've been able to restart zuul, and re-enqueue changes for gate. Please hold off on recheck or approves, we are still recovering. More info shortly.
  • 2018-01-29 13:35:52 UTC Zuul is currently under heavy load. Do not *recheck* or *approve* any changes until we give the go ahead.
  • 2018-01-29 11:04:48 UTC Zuul is currently under heavy load. Do not *recheck* or *approve* any changes.
  • 2018-01-28 17:21:57 UTC Zuul has been restarted due to an outage in our cloud provider. Changes already in queues have been restored, but changes uploaded and approved since 12:30 UTC may need to be rechecked or reapproved.
  • 2018-01-28 15:05:28 UTC Jobs are currently not running and are staying queued in Zuul pending the completion of a maintenance at our cloud provider. Jobs will resume once this maintenance has been completed.
  • 2018-01-27 00:28:21 UTC logs.openstack.org crontab re-enabled, and static.o.o removed from emergency file
  • 2018-01-27 00:17:34 UTC mounted new logs filesystem on static.o.o
  • 2018-01-26 23:31:29 UTC created logs filesystem with "mkfs.ext4 -m 0 -j -i 14336 -L $NAME /dev/main/$NAME" http://paste.openstack.org/show/654140/
  • 2018-01-26 23:29:35 UTC cloned all volumes from static.openstack.org for later fsck; replaced main10 device because it seemed slow and recreated logs logical volume.
  • 2018-01-25 16:03:26 UTC logs.openstack.org is stabilized and there should no longer be *new* POST_FAILURE errors. Logs for jobs that ran in the past weeks until earlier today are currently unavailable pending FSCK completion. We're going to temporarily disable *successful* jobs from uploading their logs to reduce strain on our current limited capacity. Thanks for your patience !
  • 2018-01-25 15:43:52 UTC (dmsimard) We're running a modified log_archive_maintenance.sh from ~/corvus and https://review.openstack.org/#/c/537929/ as safety nets to keep us from running out of disk space
  • 2018-01-25 15:42:53 UTC (dmsimard) fsck started running in a screen on logs.o.o for /dev/mapper/main-logs at around 15:30UTC, logs are being sent straight to /srv/static
  • 2018-01-25 14:27:33 UTC We're currently experiencing issues with the logs.openstack.org server which will result in POST_FAILURE for jobs, please stand by and don't needlessly recheck jobs while we troubleshoot the problem.
  • 2018-01-24 19:01:37 UTC enqueued and promoted 537437,2 at the request of mriedem to avoid regression in gate
  • 2018-01-24 15:25:43 UTC gerrit has been suffering from a full disk, some mails may have been lost in the last couple of hours. we will now restart gerrit to address ongoing slowness, too
  • 2018-01-24 02:37:25 UTC manually removed infracloud chocolate from clouds.yaml (https://review.openstack.org/#/c/536989/) as it was holding up puppet runs
  • 2018-01-22 21:29:07 UTC restarted openstack-paste (lodgeit) service on paste.openstack.org as it was timing out responding to proxied requests from apache
  • 2018-01-22 18:56:19 UTC (dmsimard) files02.o.o was put in the emergency file pending a fix due to a missing zuul-ci.org_intermediate.pem file preventing apache from restarting properly
  • 2018-01-22 17:33:49 UTC deleted all contents in /afs/openstack.org/docs/draft at request of pkovar and AJaeger
  • 2018-01-22 12:16:04 UTC gerrit account 26576 has been set to inactive due to continued review spam
  • 2018-01-20 20:16:16 UTC zuul.openstack.org has been restarted due to an unexpected issue. We were able to save / reload queues however new patchsets during the restart may have been missed. Please recheck if needed.
  • 2018-01-20 01:01:09 UTC the zuulv3.openstack.org server has been replaced by a larger zuul01.openstack.org server
  • 2018-01-20 00:37:13 UTC ze* and zm* hosts removed from emergency disable list now that maintenance has concluded
  • 2018-01-19 23:44:18 UTC Zuul will be offline over the next 20 minutes to perform maintenance; active changes will be reenqueued once work completes, but new patch sets or approvals during that timeframe may need to be rechecked or reapplied as appropriate
  • 2018-01-19 21:00:03 UTC temporarily added ze*.openstack.org and zm*.openstack.org to the emergency disable list in preparation for replacing zuulv3 with zuul01
  • 2018-01-19 20:35:57 UTC nl03.o.o is now online and launching nodes
  • 2018-01-18 22:05:36 UTC deleted nodepool and zuul feature/zuulv3 branches
  • 2018-01-18 02:47:57 UTC mirror.bhs1.ovh.openstack.org was unresponsive ... hard reboot and it has reappeared. nothing useful in console logs unfortunately
  • 2018-01-18 02:41:49 UTC nb04.o.o stopped to prepare for nb01.o.o replacement tomorrow
  • 2018-01-17 20:40:45 UTC Zuul will be offline for a few minutes; existing changes will be re-enqueued; approvals during the downtime will need to be re-added.
  • 2018-01-17 00:11:48 UTC (dmsimard) Zuul scheduler status.json files are now periodically backed up, as provided by https://review.openstack.org/#/c/532955/ -- these are available over the vhost, e.g.: http://zuulv3.openstack.org/backup/status_1516146901.json
  • 2018-01-15 18:58:26 UTC updated github zuul app to use new hostname: zuul.openstack.org
  • 2018-01-15 18:24:36 UTC Zuul has been restarted and has lost queue contents; changes in progress will need to be rechecked.
  • 2018-01-15 04:51:34 UTC The logs.openstack.org filesystem has been restored to full health. We are attempting to keep logs uploaded between the prior alert and this one, however if your job logs are missing please issue a recheck.
  • 2018-01-14 22:38:52 UTC The filesystem for the logs.openstack.org site was marked read-only at 2018-01-14 16:47 UTC due to an outage incident at the service provider; a filesystem recovery is underway, but job logs uploaded between now and completion are unlikely to be retained so please refrain from rechecking due to POST_FAILURE results until this alert is rescinded.
  • 2018-01-14 22:27:22 UTC a `fsck -y` of /dev/mapper/main-logs is underway in a root screen session on static.openstack.org
  • 2018-01-14 22:25:30 UTC rebooted static.openstack.org to make sure disconnected volume /dev/xvdg reattaches correctly
  • 2018-01-12 16:46:50 UTC Zuul has been restarted and lost queue information; changes in progress will need to be rechecked.
  • 2018-01-12 14:26:44 UTC manually started the apache2 service on ask.openstack.org since it seems to have segfaulted and died during log rotation
  • 2018-01-11 17:48:53 UTC Due to an unexpected issue with zuulv3.o.o, we were not able to preserve running jobs for a restart. As a result, you'll need to recheck your previous patchsets
  • 2018-01-11 17:03:14 UTC deleted old odsreg.openstack.org instance
  • 2018-01-11 16:56:32 UTC previously mentioned trove maintenance activities in rackspace have been postponed/cancelled and can be ignored
  • 2018-01-11 12:47:55 UTC nl01 and nl02 restarted to recover nodes in deletion
  • 2018-01-11 02:38:51 UTC zuul restarted due to the unexpected loss of ze04; jobs requeued
  • 2018-01-11 02:13:27 UTC zuul-executor stopped on ze04.o.o and it is placed in the emergency file, due to an external reboot applying https://review.openstack.org/#/c/532575/. we will need to more carefully consider the rollout of this code
  • 2018-01-10 23:20:00 UTC deleted old kdc02.openstack.org server
  • 2018-01-10 23:16:52 UTC deleted old eavesdrop.openstack.org server
  • 2018-01-10 23:14:42 UTC deleted old apps-dev.openstack.org server
  • 2018-01-10 22:24:55 UTC The zuul system is being restarted to apply security updates and will be offline for several minutes. It will be restarted and changes re-equeued; changes approved during the downtime will need to be rechecked or re-approved.
  • 2018-01-10 22:16:52 UTC deleted old stackalytics.openstack.org instance
  • 2018-01-10 22:14:54 UTC deleted old zuul.openstack.org instance
  • 2018-01-10 22:09:27 UTC manually reenqueued openstack/nova refs/tags/14.1.0 into the release pipeline
  • 2018-01-10 21:51:03 UTC deleted old zuul-dev.openstack.org instance
  • 2018-01-10 15:16:27 UTC manually started mirror.regionone.infracloud-vanilla which had been in shutoff state following application of meltdown patches to infracloud hosts
  • 2018-01-10 14:59:51 UTC Gerrit is being restarted due to slowness and to apply kernel patches
  • 2018-01-10 14:55:51 UTC manually started mirror.regionone.infracloud-chocolate which had been in shutoff state following application of meltdown patches to infracloud hosts
  • 2018-01-10 13:58:36 UTC another set of broken images has been in use from about 06:00-11:00 UTC, reverted once more to the previous ones
  • 2018-01-10 13:56:23 UTC zuul-scheduler has been restarted due to heavy swapping, queues have been restored.
  • 2018-01-10 04:59:44 UTC image builds are paused and we have reverted images to old ones after a dib release produced images without pip for python2. This lack of pip for python2 broke tox siblings in many tox jobs
  • 2018-01-09 16:16:44 UTC rebooted nb04 through the nova api; oob console content looked like a botched live migration
  • 2018-01-09 15:57:32 UTC Trove maintenance scheduled for 04:00-12:00 UTC on 2018-01-24 impacting paste_mysql_5.6 instance
  • 2018-01-09 15:56:50 UTC Trove maintenance scheduled for 04:00-12:00 UTC on 2018-01-23 impacting zuul_v3 instance
  • 2018-01-09 15:56:09 UTC Trove maintenance scheduled for 04:00-12:00 UTC on 2018-01-17 impacting Wiki_MySQL and cacti_MySQL instances
  • 2018-01-08 20:37:23 UTC The jobs and queues in Zuul between 19:55UTC and 20:20UTC have been lost after recovering from a crash, you might need to re-check your patches if they were being tested during that period.
  • 2018-01-08 20:33:57 UTC (dmsimard) the msgpack issue experienced yesterday on zm and ze nodes propagated to zuulv3.o.o and crashed zuul-web and zuul-scheduler with the same python general protection fault. They were started after re-installing msgpack but the contents of the queues were lost.
  • 2018-01-08 10:27:03 UTC zuul has been restarted, all queues have been reset. please recheck your patches when appropriate
  • 2018-01-07 21:06:56 UTC Parts of the Zuul infrastructure had to be restarted to pick up new jobs properly, it's possible you may have to recheck your changes if they did not get job results or if they failed due to network connectivity issues.
  • 2018-01-07 20:55:48 UTC (dmsimard) all zuul-mergers and zuul-executors stopped simultaneously after what seems to be a msgpack update which did not get installed correctly: http://paste.openstack.org/raw/640474/ everything is started after reinstalling msgpack properly.
  • 2018-01-07 19:54:35 UTC (dmsimard) ze10 has a broken dpkg transaction (perhaps due to recent outage), fixed with dpkg --configure -a and reinstalling unattended-upgrades http://paste.openstack.org/show/640466/
  • 2018-01-06 01:20:22 UTC (dmsimard) ze09.o.o was rebuilt from scratch after what seems to be a failed live migration which thrashed the root partition disk
  • 2018-01-05 23:00:17 UTC (dmsimard) ze10 was rebooted after being hung since january 4th
  • 2018-01-05 21:35:40 UTC added 2gb swap file to eavesdrop01 at /swapfile since it has no ephemeral disk
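A sketch of the standard swap-file procedure matching the entry above (the fstab line for persistence is an assumption about how it was made permanent):
    dd if=/dev/zero of=/swapfile bs=1M count=2048    # 2 GB swap file
    chmod 600 /swapfile
    mkswap /swapfile
    swapon /swapfile
    echo '/swapfile none swap sw 0 0' >> /etc/fstab  # persist across reboots (assumed)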
  • 2018-01-05 21:14:27 UTC started openstack-meetbot service manually on eavesdrop01.o.o after it was sniped by the oom killer
  • 2018-01-05 17:40:21 UTC Old git.openstack.org server has been deleted. New server's A and AAAA dns record TTLs bumped to an hour. We are now running with PTI enabled on all CentOS control plane servers.
  • 2018-01-05 01:37:53 UTC git0*.openstack.org patched and kernels running with PTI enabled. git.openstack.org has been replaced with a new server running with PTI enabled. The old server is still in places for straggler clients. Will need to be deleted and DNS record TTLs set back to one hour
  • 2018-01-04 14:48:50 UTC zuul has been restarted, all queues have been reset. please recheck your patches when appropriate
  • 2018-01-03 18:27:29 UTC (dmsimard) +r applied to channels to mitigate ongoing freenode spam wave: http://paste.openstack.org/raw/629168/
  • 2018-01-03 12:37:50 UTC manually started apache2 service on ask.o.o, seems to have crashed/failed to correctly restart during log rotation
  • 2018-01-03 07:01:02 UTC We accidentally stopped publishing documents on the 23rd of December; this is fixed now. Publishing to docs.o.o and developer.o.o is working again. If you are missing a document publish, the next merge should publish it.
  • 2018-01-03 00:50:57 UTC no zuul-executor is running on ze04 currently. /var/run/ is a tmpfs on xenial, so /var/run/zuul does not exist on ze04 after a reboot, preventing zuul-executor from starting. https://review.openstack.org/530820 is the proposed fix
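For context, the usual way to make a runtime directory survive reboots when /var/run is a tmpfs is a tmpfiles.d entry; this is an illustrative sketch, not the content of the linked fix (the zuul user/group ownership is an assumption):
    # create /var/run/zuul at boot; ownership is an assumption
    printf 'd /var/run/zuul 0755 zuul zuul -\n' > /etc/tmpfiles.d/zuul.conf
    systemd-tmpfiles --create /etc/tmpfiles.d/zuul.conf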
  • 2018-01-03 00:49:52 UTC openstackci-ovh accounts are not working and ovh hosted mirrors are not pinging. We have removed OVH from nodepool via 530817 and this change has been manually applied (see previous message for why ansible puppet is not working)
  • 2018-01-03 00:49:11 UTC openstackci-ovh accounts are not working and ovh hosted mirrors are not pinging. The accounts breaking appears to prevent ansible puppet from running (due to failed inventory) this needs to be sorted out.
  • 2017-12-25 13:08:50 UTC zuul scheduler restarted and all changes reenqueued
  • 2017-12-22 19:21:07 UTC lists.openstack.org has been taken out of the emergency disable list now that the spam blackholing in /etc/aliases is managed by puppet
  • 2017-12-22 10:18:13 UTC zuul has been restarted, all queues have been reset. please recheck your patches when appropriate
  • 2017-12-22 06:45:18 UTC Zuul.openstack.org is currently under heavy load and not starting new jobs. We're waiting for an admin to restart Zuul.
  • 2017-12-21 14:51:11 UTC vexxhost temporarily disabled in nodepool via https://review.openstack.org/529572 to mitigate frequent job timeouts
  • 2017-12-21 14:47:42 UTC promoted 528823,2 in the gate to unblock projects relying on sphinxcontrib.datatemplates in their documentation builds
  • 2017-12-20 23:15:25 UTC updated storyboard-dev openids from login.launchpad.net to login.ubuntu.com to solve DBDuplicateEntry exceptions on login
  • 2017-12-20 22:49:29 UTC enqueued 529067,1 into the gate pipeline and promoted in order to unblock requirements and release changes
  • 2017-12-20 19:59:17 UTC Disabled compute026.vanilla.ic.o.o due to hard disk being in read-only mode
  • 2017-12-20 13:15:47 UTC gerrit is being restarted due to extreme slowness
  • 2017-12-20 00:37:29 UTC nl01 nl02 manually downgraded to afcb56e0fb887a090dbf1380217ebcc06ef6b66b due to broken quota handling on branch tip
  • 2017-12-19 23:56:21 UTC removed infra-files-ro and infra-files-rw from all-clouds.yaml as they are invalid, and cause issues deploying new keys. saved in a backup file on puppetmaster.o.o if required
  • 2017-12-19 20:39:55 UTC Manually repaired eavesdrop URLs in recent plaintext meeting minutes after https://review.openstack.org/529118 merged
  • 2017-12-18 14:44:46 UTC (dmsimard) the channel restrictions mentioned last night have been removed after #freenode confirmed the spam wave had stopped.
  • 2017-12-18 03:12:30 UTC (dmsimard) we re-ran mlock commands on OpenStack channels using the accessbot list of channels instead, here is the definitive list of channels that were made +r: http://paste.openstack.org/raw/629176/
  • 2017-12-18 02:55:19 UTC (dmsimard) all channels configured through gerritbot have been mlocked +r: http://paste.openstack.org/raw/629168/ we should remove this once the spam wave subsides
  • 2017-12-18 01:52:19 UTC (dmsimard) added +r to additional targeted channels: #openstack-keystone, #openstack-cinder, #openstack-telemetry, #openstack-requirements, #openstack-release, #tripleo
  • 2017-12-18 01:49:20 UTC The freenode network is currently the target of automated spam attacks; we have enabled temporary restrictions on targeted OpenStack channels which require users to be logged in to NickServ. If you see spam in your channel, please report it in #openstack-infra. Thanks.
  • 2017-12-18 01:40:13 UTC (dmsimard) enabled mode +r to prevent unregistered users from joining channels hit by spam bots: #openstack-ansible, #openstack-dev, #openstack-infra, #openstack-kolla, #openstack-operators, #puppet-openstack, #rdo
  • 2017-12-18 01:22:17 UTC Slowly deleting 4161 "jenkins" verify -1 votes from open changes in Gerrit with a 1-second delay between each
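Roughly what the cleanup above looks like via Gerrit's ssh CLI; a sketch assuming a pre-built list of change,patchset pairs (the input file name and exact invocation are assumptions):
    # reset the Verified vote to 0 on each listed patchset, throttled to one per second
    while read change; do
        ssh -p 29418 review.openstack.org gerrit review --verified 0 "$change"
        sleep 1
    done < jenkins-minus-one-changes.txt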
  • 2017-12-17 22:18:04 UTC trusty-era codesearch.o.o (was 104.130.138.207) has been deleted
  • 2017-12-17 19:24:39 UTC zuul daemon stopped on zuul.openstack.org after AJaeger noticed jenkins was commenting about merge failures on changes
  • 2017-12-14 20:40:21 UTC eavesdrop01.o.o online and running xenial
  • 2017-12-14 05:13:08 UTC codesearch.o.o removed from the emergency file. after 527557 it should be fine to run under normal puppet conditions
  • 2017-12-12 20:16:59 UTC The zuul scheduler has been restarted after lengthy troubleshooting for a memory consumption issue; earlier changes have been reenqueued but if you notice jobs not running for a new or approved change you may want to leave a recheck comment or a new approval vote
  • 2017-12-12 14:40:45 UTC We're currently seeing an elevated rate of timeouts in jobs and the zuulv3.openstack.org dashboard is intermittently unresponsive, please stand by while we troubleshoot the issues.
  • 2017-12-12 09:10:05 UTC Zuul is back online, looks like a temporary network problem.
  • 2017-12-12 08:49:48 UTC Our CI system Zuul is currently not accessible. Wait with approving changes and rechecks until it's back online. Currently waiting for an admin to investigate.
  • 2017-12-11 02:06:47 UTC root keypairs manually updated in all clouds
  • 2017-12-09 00:05:04 UTC zuulv3.o.o removed from emergency file now that puppet-zuul is updated to match deployed zuul
  • 2017-12-08 22:47:45 UTC old docs-draft volume deleted from static.openstack.org, and the recovered extents divvied up between the tarballs and logs volumes (now 0.5tib and 13.4tib respectively)
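The reshuffle above is ordinary LVM work; a sketch assuming the volume group and logical volume names implied by nearby entries (the docs-draft volume name and exact sizes are assumptions):
    lvremove /dev/main/docs-draft                                    # free the old volume's extents
    lvextend -L 0.5T /dev/main/tarballs && resize2fs /dev/main/tarballs
    lvextend -l +100%FREE /dev/main/logs && resize2fs /dev/main/logs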
  • 2017-12-08 20:31:46 UTC added zuulv3.openstack.org to emergency file due to manual fixes to apache rewrite rules
  • 2017-12-08 15:43:29 UTC zuul.openstack.org is scheduled to be rebooted as part of a provider host migration at 2017-12-12 at 04:00 UTC
  • 2017-12-08 15:43:09 UTC elasticsearch02, elasticsearch04 and review-dev are scheduled to be rebooted as part of a provider host migration at 2017-12-11 at 04:00 UTC
  • 2017-12-08 15:38:00 UTC the current stackalytics.openstack.org instance is not recovering via reboot after a failed host migration, and will likely need to be deleted and rebuilt when convenient
  • 2017-12-08 15:36:03 UTC rebooted zuul.openstack.org after it became unresponsive in what looked like a host migration activity
  • 2017-12-08 14:02:35 UTC The issues have been fixed, Zuul is operating fine again but has a large backlog. You can recheck jobs that failed.
  • 2017-12-08 07:06:10 UTC Due to some unforseen Zuul issues the gate is under very high load and extremely unstable at the moment. This is likely to persist until PST morning
  • 2017-12-08 05:37:48 UTC due to stuck jobs seemingly related to ze04, zuul has been restarted. jobs have been requeued
  • 2017-12-08 05:16:35 UTC manually started zuul-executor on ze04
  • 2017-12-07 22:22:09 UTC logstash service stopped, killed and started again on all logstash-worker servers
  • 2017-12-07 17:40:02 UTC This message is to inform you that the host your cloud server 'ze04.openstack.org' resides on became unresponsive. We have rebooted the server and will continue to monitor it for any further alerts.
  • 2017-12-07 16:45:54 UTC This message is to inform you that the host your cloud server 'ze04.openstack.org' resides on alerted our monitoring systems at 16:35 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding what is causing the alert. Please do not access or modify 'ze04.openstack.org' during this process.
  • 2017-12-07 03:08:09 UTC fedora 27 mirroring complete, fedora 25 removed
  • 2017-12-06 21:25:07 UTC proposal.slave.o.o / release.slave.o.o / signin01.ci.o.o have all been deleted and DNS records removed
  • 2017-12-06 08:21:54 UTC zuul-scheduler restarted due to very high number of stuck jobs. check/gate/triple-o requeued
  • 2017-12-05 21:47:03 UTC zk01.o.o, zk02.o.o and zk03.o.o now online and SSH keys accepted into puppetmaster.o.o. Currently no production servers are connected to them.
  • 2017-12-05 06:39:56 UTC translate-dev.o.o removed from emergency list
  • 2017-12-04 23:47:51 UTC mirror.fedora afs volume was >95% full; upped it to 300000000.
  • 2017-12-04 17:53:48 UTC Manually pruned some larger Apache cache entries and flushed the pypi volume cache on mirror.regionone.tripleo-test-cloud-rh1.openstack.org following a full root filesystem event
  • 2017-12-04 17:51:04 UTC cleaned out old unused crm114 data dirs on logstash worker nodes using `sudo find /var/lib/crm114 -mtime +7 -delete` as recent changes to crm114 scripts mean we've collapsed those data dirs into a much smaller set at different paths ignoring the old data.
  • 2017-12-01 22:19:58 UTC Launched a new Mailman server corresponding to https://review.openstack.org/524322 and filed to exclude its ipv4 address from spamhaus record PBL1665489
  • 2017-12-01 13:53:14 UTC gerrit has been restarted to get it back to its normal speed.
  • 2017-11-30 15:39:15 UTC if you received a result of "RETRY_LIMIT" after 14:15 UTC, it was likely due to an error since corrected. please "recheck"
  • 2017-11-29 23:41:29 UTC Requested removal of storyboard.o.o ipv4 address from policy blacklist (pbl listing PBL1660430 for 23.253.84.0/22)
  • 2017-11-28 23:26:47 UTC #openstack-shade was retired, redirect put in place and users directed to join #openstack-sdks
  • 2017-11-28 18:12:42 UTC rebooting non-responsive codesearch.openstack.org. It responds to ping but not to http(s) or ssh. Probably another live migration gone bad
  • 2017-11-27 15:33:10 UTC Hard rebooted ze05.openstack.org after it was found hung (unresponsive even at console, determination of cause inconclusive, no smoking gun in kmesg entries)
  • 2017-11-27 04:34:59 UTC rebooted status.o.o as it was hung, likely migration failure as had xen timeout errors on the console
  • 2017-11-23 22:58:20 UTC zuulv3.o.o restarted to address memory issues, was a few 100MB from swapping (15GB). Additionally, this cleans up leaked nodepool nodes from the previous ze03.o.o issues listed above (or below).
  • 2017-11-23 22:56:34 UTC Zuul has been restarted due to an unexpected issue. We're able to re-enqueue changes from check and gate pipelines, please check http://zuulv3.openstack.org/ for more information.
  • 2017-11-23 21:06:55 UTC We seem to have an issue with ze03.o.o losing ansible-playbook processes; as a result, jobs have continued to run in pipelines for 11+ hours. For now, I have stopped ze03 and will audit the other servers
  • 2017-11-22 04:18:32 UTC had to revert the change to delete zuul-env on dib images as the zuul cloner shim does depend on that python installation and its pyyaml lib. Followed up by deleting the debian and centos images that were built without zuul envs
  • 2017-11-21 20:46:20 UTC deleted static wheel-build slaves from rax-dfw (centos / ubuntu-trusty / ubuntu-xenial) along with DNS entries
  • 2017-11-21 02:47:29 UTC ci-backup-rs-ord.openstack.org shutdown and all backup hosts migrated to run against backup01.ord.rax.ci.openstack.org. old backups remain at /opt/old-backups on the new server
  • 2017-11-20 01:25:18 UTC gerrit restarted; review.o.o at 42,571 Mb / 48,551 Mb and persistent system load ~10, definite i/o spike on /dev/xvb but nothing unusual to the naked eye
  • 2017-11-17 23:24:29 UTC git-review 1.26.0 released, adding support for Gerrit 2.14 and Git 2.15: https://pypi.org/project/git-review/1.26.0/
  • 2017-11-15 06:09:38 UTC zuulv3 stopped / started again. It appears an influx of commits like https://review.openstack.org/519924/ is causing zuul to burn memory quickly.
  • 2017-11-15 02:59:50 UTC Had to stop zuul-scheduler due to a memory issue; zuul pushed over 15GB of RAM and started swapping. https://review.openstack.org/513915/ then prevented zuul from starting, which needed us to then land https://review.openstack.org/519949/
  • 2017-11-15 02:58:22 UTC Due to an unexpected outage with Zuul (1 hour), you'll need to recheck any jobs that were in progress. Sorry for the inconvenience.
  • 2017-11-10 05:23:58 UTC puppetmaster.o.o was hung with oom errors on the console. rax support rebooted it for me
  • 2017-11-10 05:19:13 UTC the zombie host @ 146.20.110.99 (443 days uptime!) has been shut down. this was likely causing POST_FAILURES as jobs managed to get run on it
  • 2017-11-09 10:07:36 UTC ovh having global issues : https://twitter.com/olesovhcom/status/928559286288093184
  • 2017-11-09 06:35:18 UTC restarting gerrit. 502s reported. 33,603 Mb / 48,550 Mb (stable since last check at 03:00 UTC), persistent system load ~9 and high cpu since around 2017-11-09 05:00 UTC
  • 2017-11-06 23:46:07 UTC openstackid.org temporarily added to the emergency disable list so puppet won't undo debugging settings while an issue is investigated for the conference schedule app
  • 2017-11-02 23:11:34 UTC increased mirror.pypi afs volume quota from 1000000000 to 1200000000
  • 2017-11-02 14:42:46 UTC killed stuck gerrit to github replication task on nova-specs repo
  • 2017-11-01 21:57:53 UTC jessie mirror not updated since oct 10 due to a reboot of the server mid-update. manually removed the stale lockfile for debian jessie reprepro; mirror updated and released successfully.
  • 2017-11-01 17:52:08 UTC logstash-worker16.o.o to logstash-worker20.o.o now online and SSH keys accepted
  • 2017-10-31 17:16:47 UTC removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/requirements on ze07
  • 2017-10-31 17:16:27 UTC removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/neutron on ze10
  • 2017-10-31 17:16:16 UTC removed corrupted git repo /var/lib/zuul/executor-git/git.openstack.org/openstack/python-glanceclient on ze05
  • 2017-10-31 17:16:00 UTC restarted all zuul executors and cleaned up old processes from previous restarts
  • 2017-10-30 23:19:52 UTC geard (really jenkins-log-client) restarted on logstash.o.o to pick up gear 0.11.0 performance improvements. https://review.openstack.org/516473 needed to work around the zuul transition there.
  • 2017-10-30 22:21:27 UTC gear 0.11.0 tagged with statsd performance improvements
  • 2017-10-30 11:06:07 UTC restarted all zuul executors and restarted scheduler
  • 2017-10-30 10:48:09 UTC Zuul has been restarted due to an unexpected issue. Please recheck any jobs that were in progress
  • 2017-10-27 17:00:46 UTC Restarted elasticsearch on elasticsearch07 as its process had crashed. The log doesn't give many clues as to why. Restarted log workers afterwards.
  • 2017-10-27 17:00:10 UTC Killed elastic-recheck static status page update processes dating from October 1st to release the processing lock. Status page updates seem to be processing now.
  • 2017-10-26 18:07:53 UTC zuul scheduler restarted and check/check-tripleo/gate pipeline contents successfully reenqueued
  • 2017-10-26 15:32:52 UTC Provider maintenance is scheduled for 2017-10-30 between 06:00-09:00 UTC which may result in up to a 5 minute connectivity outage for the production Gerrit server's Trove database instance
  • 2017-10-26 11:22:18 UTC docs.o.o index page was lost due to a broken build being published; suggested fix in https://review.openstack.org/515365
  • 2017-10-26 10:43:53 UTC we lost the docs.o.o central home page, somehow our publishing is broken
  • 2017-10-26 05:18:22 UTC zm[0-4].o.o rebuilt to xenial and added to zuulv3
  • 2017-10-25 19:30:28 UTC zl01.o.o to zl06.o.o, zlstatic01.o.o have been deleted and DNS entries removed from rackspace.
  • 2017-10-25 06:45:41 UTC zuul v2/jenkins config has been removed from project-config
  • 2017-10-24 01:45:35 UTC all zuul executors have been restarted to pick up the latest bubblewrap bindmount addition
  • 2017-10-21 08:26:51 UTC increased Zanata limit for concurrent requests to 20 using Zanata UI
  • 2017-10-21 02:33:01 UTC all zuul executors restarted to pick up the /usr/share/ca-certificates addition
  • 2017-10-18 14:56:18 UTC Gerrit account 8944 set to inactive to handle a duplicate account issue (see the set-account sketch after this list)
  • 2017-10-18 04:45:12 UTC review.o.o hard rebooted due to failure during live migration (rax ticket: 171018-ord-0000074). manually restarted gerrit after boot, things seem ok now
  • 2017-10-18 00:33:55 UTC due to unscheduled restart of zuulv3.o.o you will need to 'recheck' your jobs that were last running. Sorry for the inconvenience.
  • 2017-10-16 15:21:53 UTC elasticsearch cluster is now green after triggering index curator early to clear out old indexes "lost" on es07
  • 2017-10-16 03:05:41 UTC elasticsearch07.o.o rebooted & elasticsearch started. data was migrated from SSD storage and "main" vg contains only one block device now
  • 2017-10-15 22:06:10 UTC Zuul v3 rollout maintenance is underway, scheduled to conclude by 23:00 UTC: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123618.html
  • 2017-10-15 21:20:10 UTC Zuul v3 rollout maintenance begins at 22:00 UTC (roughly 45 minutes from now): http://lists.openstack.org/pipermail/openstack-dev/2017-October/123618.html
  • 2017-10-12 23:06:18 UTC Workarounds are in place for libcurl and similar dependency errors due to stale ubuntu mirroring, and for POST_FAILURE results stemming from runaway inode utilization on the logs site; feel free to recheck failing changes for either of these problems now
  • 2017-10-12 16:04:42 UTC removed mirror.npm volume from afs
  • 2017-10-12 14:57:16 UTC Job log uploads are failing due to lack of inodes. Jobs also fail due to mismatches in gnutls packages. Workarounds for both in progress with proper fixes to follow.
  • 2017-10-11 17:26:50 UTC moved Gerrit account 27031's openid to account 21561 and marked 27031 inactive
  • 2017-10-11 13:07:12 UTC Due to unrelated emergencies, the Zuul v3 rollout has not started yet; stay tuned for further updates
  • 2017-10-11 11:13:10 UTC deleted the errant review/andreas_jaeger/zuulv3-unbound branch from the openstack-infra/project-config repository (formerly at commit 2e8ae4da5d422df4de0b9325bd9c54e2172f79a0)
  • 2017-10-11 10:10:02 UTC The CI system will be offline starting at 11:00 UTC (in just under an hour) for Zuul v3 rollout: http://lists.openstack.org/pipermail/openstack-dev/2017-October/123337.html
  • 2017-10-11 07:46:41 UTC Lots of RETRY_LIMIT errors due to unbound usage with Zuul v3, we reverted the change; recheck your changes
  • 2017-10-10 01:43:02 UTC manually rotated all logs on zuulv3.openstack.org as a stop-gap to prevent a full rootfs later when scheduled log rotation kicks in; an additional 14gib were freed as a result
  • 2017-10-10 00:43:23 UTC restart of *gerrit* complete
  • 2017-10-10 00:39:01 UTC restarting zuul after prolonged period of high GC activity is causing 502 errors
  • 2017-10-09 20:53:36 UTC cleared all old workspaces on signing01.ci to deal with those which had cached git remotes to some no-longer-existing zuul v2 mergers
  • 2017-10-05 00:51:41 UTC updated openids in the storyboard.openstack.org database from login.launchpad.net to login.ubuntu.com
  • 2017-10-04 06:31:26 UTC The special infra pipelines in zuul v3 have disappeared
  • 2017-10-03 03:00:20 UTC zuulv3 restarted with 508786 508787 508793 509014 509040 508955 manually applied; should fix branch matchers, use *slightly* less memory, and fix the 'base job not defined' error
  • 2017-10-02 12:50:51 UTC Restarted nodepool-launcher on nl01 and nl02 to fix zookeeper connection
  • 2017-10-02 12:45:00 UTC ran `sudo -u zookeeper ./zkCleanup.sh /var/lib/zookeeper 3` in /usr/share/zookeeper/bin on nodepool.openstack.org to free up 22gib of space for its / filesystem
  • 2017-09-28 22:41:03 UTC zuul.openstack.org has been added to the emergency disable list so that a temporary redirect to zuulv3 can be installed by hand
  • 2017-09-28 14:44:03 UTC The infra team is now taking Zuul v2 offline and bringing Zuul v3 online. Please see https://docs.openstack.org/infra/manual/zuulv3.html for more information, and ask us in #openstack-infra if you have any questions.
  • 2017-09-26 23:40:51 UTC project-config is unable to merge changes due to problems found during zuul v3 migration. for the time being, if any emergency changes are needed (eg, nodepool config), please discuss in #openstack-infra and force-merge them.
  • 2017-09-26 18:25:58 UTC The infra team is continuing work to bring Zuul v3 online; expect service disruptions and please see https://docs.openstack.org/infra/manual/zuulv3.html for more information.
  • 2017-09-25 23:37:33 UTC project-config is frozen until further notice for the zuul v3 transition; please don't approve any changes without discussion with folks familiar with the migration in #openstack-infra
  • 2017-09-25 20:52:05 UTC The infra team is bringing Zuul v3 online; expect service disruptions and please see https://docs.openstack.org/infra/manual/zuulv3.html for more information.
  • 2017-09-25 15:50:39 UTC deleted all workspaces from release.slave.openstack.org to deal with changes to zuul v2 mergers
  • 2017-09-22 21:33:40 UTC jeepyb and gerritlib fixes for adding project creator to new groups on Gerrit project creation in process of getting landed. Please double check group membership after the next project creation.
  • 2017-09-22 19:12:01 UTC /vicepa filesystem on afs01.ord.openstack.org has been repaired and vos release of docs and docs.dev volumes have resumed to normal frequency
  • 2017-09-22 17:39:22 UTC When seeding initial group members in Gerrit remove the openstack project creator account until jeepyb is updated to do so automatically
  • 2017-09-22 11:06:09 UTC no content is currently pushed to docs.openstack.org - post jobs run successfully but docs.o.o is not updated
  • 2017-09-21 19:23:16 UTC OpenIDs for the Gerrit service have been restored from a recent backup and the service is running again; before/after table states are being analyzed now to identify any remaining cleanup needed for changes made to accounts today
  • 2017-09-21 18:25:35 UTC The Gerrit service on review.openstack.org is being taken offline briefly to perform database repair work but should be back up shortly
  • 2017-09-21 18:19:03 UTC Gerrit OpenIDs have been accidentally overwritten and are in the process of being restored
  • 2017-09-21 17:54:32 UTC nl01.o.o and nl02.o.o are both back online with site-specific nodepool.yaml files.
  • 2017-09-21 14:08:07 UTC nodepool.o.o removed from emergency file, ovh-bhs1 came back online at 03:45z.
  • 2017-09-21 13:39:00 UTC Gerrit account 8971 for "Fuel CI" has been disabled due to excessive failure comments
  • 2017-09-21 02:50:04 UTC OVH-BHS1 mirror has disappeared unexpectedly. did not respond to hard reboot. nodepool.o.o in emergency file and region max-servers set to 0
  • 2017-09-20 23:17:13 UTC Please don't merge any new project creation changes until mordred gives the go ahead. We have new puppet problems on the git backends and there are staged jeepyb changes we want to watch before opening the flood gates
  • 2017-09-20 20:21:59 UTC nb03.o.o / nb04.o.o added to emergency file
  • 2017-09-19 23:42:19 UTC Gerrit is once again part of normal puppet config management. Problems with Gerrit gitweb links and Zuul post jobs have been addressed. We currently cannot create new gerrit projects (fixes in progress) and email sending is slow (being debugged).
  • 2017-09-19 22:34:37 UTC Gerrit is being restarted to address some final issues, review.openstack.org will be inaccessible for a few minutes while we restart
  • 2017-09-19 20:28:23 UTC Zuul and Gerrit are being restarted to address issues discovered with the Gerrit 2.13 upgrade. review.openstack.org will be inaccessible for a few minutes while we make these changes. Currently running jobs will be restarted for you once Zuul and Gerrit are running again.
  • 2017-09-19 07:25:16 UTC Post jobs are not executed currently, do not tag any releases
  • 2017-09-19 07:13:26 UTC Zuul is not running any post jobs
  • 2017-09-19 02:42:08 UTC Gerrit is being restarted to feed its insatiable memory appetite
  • 2017-09-19 00:10:07 UTC please avoid merging new project creation changes until after we have the git backends puppeting properly
  • 2017-09-18 23:48:12 UTC review.openstack.org Gerrit 2.13 upgrade is functionally complete. The Infra team will be cleaning up bookkeeping items over the next couple days. If you have any questions please let us know
  • 2017-09-18 23:34:42 UTC review.openstack.org added to emergency file until git.o.o puppet is fixed and we can supervise a puppet run on review.o.o
  • 2017-09-18 16:40:08 UTC The Gerrit service at https://review.openstack.org/ is offline, upgrading to 2.13, for an indeterminate period of time hopefully not to exceed 23:59 UTC today: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-18 15:04:04 UTC The Gerrit service at https://review.openstack.org/ is offline, upgrading to 2.13, for an indeterminate period of time hopefully not to exceed 23:59 UTC today: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-18 14:33:25 UTC Gerrit will be offline for the upgrade to 2.13 starting at 15:00 UTC (in roughly 30 minutes) and is expected to probably be down/unusable for 8+ hours while an offline reindex is performed: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-18 13:48:14 UTC accountPatchReviewDb database created and gerrit2 account granted access in Review-MySQL trove instance, in preparation for upcoming gerrit upgrade maintenance
  • 2017-09-18 13:38:33 UTC updatepuppetmaster cron job on puppetmaster.openstack.org has been disabled in preparation for the upcoming gerrit upgrade maintenance
  • 2017-09-18 13:38:31 UTC Gerrit will be offline for the upgrade to 2.13 starting at 15:00 UTC (in roughly 1.5 hours) and is expected to probably be down/unusable for 8+ hours while an offline reindex is performed: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-18 12:07:34 UTC Gerrit will be offline for the upgrade to 2.13 starting at 15:00 UTC (in roughly 3 hours) and is expected to probably be down/unusable for 8+ hours while an offline reindex is performed: http://lists.openstack.org/pipermail/openstack-dev/2017-August/120533.html
  • 2017-09-17 15:30:17 UTC Zuul has been fixed, you can approve changes again.
  • 2017-09-17 05:52:25 UTC Zuul is currently not moving any changes into the gate queue. Wait with approving changes until this is fixed.
  • 2017-09-17 01:06:37 UTC Zuul has been restarted to pick up a bug fix in prep for Gerrit upgrade. Changes have been reenqueued for you.
  • 2017-09-16 14:21:27 UTC OpenStack CI is fixed and fully operational again, feel free to "recheck" your jobs
  • 2017-09-16 09:12:28 UTC OpenStack CI is currently not recording any votes in gerrit. Do not recheck your changes until this is fixed.
  • 2017-09-14 23:12:24 UTC Artifact signing key for Pike has been retired; key for Queens is now in production
  • 2017-09-13 23:05:46 UTC CentOS 7.4 point release today has resulted in some mirror disruption, repair underway; expect jobs on centos7 nodes to potentially fail for a few hours longer
  • 2017-09-13 14:36:45 UTC increased ovh quotas to bhs1:80 gra1:50 as we haven't had launch errors recently according to grafana
  • 2017-09-11 22:50:48 UTC zm05.o.o - zm08.o.o now online running on ubuntu xenial
  • 2017-09-09 00:17:24 UTC nodepool.o.o added to ansible emergency file so that we can hand tune the max-servers in ovh. Using our previous numbers results in lots of 500 errors from the clouds
  • 2017-09-08 16:08:20 UTC New 1TB cinder volume attached to Rax ORD backup server and backups filesystem extended to include that space. This was done in response to a full filesystem. Backups should begin functioning again on the next pulse.
  • 2017-09-08 13:48:12 UTC nodepool issue related to bad images has been resolved, builds should be coming back online soon. Restarted gerrit due to reasons. Happy Friday.
  • 2017-09-08 10:48:41 UTC Our CI systems experienced a hiccup; no new jobs are being started. Please stay tuned and wait until this is resolved.
  • 2017-09-05 22:47:53 UTC logstash-worker16.o.o to logstash-worker20.o.o deleted in rackspace
  • 2017-09-04 19:18:46 UTC ubuntu-xenial nodepool-launcher (nl02.o.o) online
  • 2017-09-04 19:17:35 UTC logstash-worker16.o.o to logstash-worker20.o.o services stopped
  • 2017-08-29 18:00:17 UTC /etc/hosts on mirror.regionone.infracloud-vanilla.org has buildlogs.centos.org pinned to 38.110.33.4. This is temporary to see if round robin DNS is our issue when we proxy to buildlogs.centos.org
  • 2017-08-29 16:20:39 UTC replaced myself with clarkb at https://review.openstack.org/#/admin/groups/infra-ptl
  • 2017-08-28 12:11:46 UTC restarted ptgbot service on eavesdrop at 11:29 utc; was disconnected from freenode 2017-08-26 02:29 utc due to an irc ping timeout
  • 2017-08-24 16:00:19 UTC hound service on codesearch.o.o stopped / started to pick up new projects for indexing
  • 2017-08-23 23:17:52 UTC infracloud-vanilla is offline due to the keystone certificate expiring. this has also broken puppet-run-all on puppetmaster.
  • 2017-08-22 07:43:46 UTC Gerrit has been restarted successfully
  • 2017-08-22 07:37:59 UTC Gerrit is going to be restarted due to slow performance
  • 2017-08-17 16:10:10 UTC deleted mirror.mtl01.internap.openstack.org (internap -> inap rename)
  • 2017-08-17 04:21:46 UTC all RAX mirror hosts (iad, ord and dfw) migrated to new Xenial based hosts
  • 2017-08-16 23:43:32 UTC renamed nodepool internap provider to inap. new mirror server in use.
  • 2017-08-16 19:55:08 UTC zuul v3 executors ze02, ze03, ze04 are online
  • 2017-08-16 19:54:55 UTC zuul v2 launchers zl07, zl08, zl09 have been deleted due to reduced cloud capacity and to make way for zuul v3 executors
  • 2017-08-16 13:01:36 UTC trove configuration "sanity" created in rax dfw for mysql 5.7, setting our usual default overrides (wait_timeout=28800, character_set_server=utf8, collation_server=utf8_bin)
  • 2017-08-15 20:35:51 UTC created auto hold for gate-tripleo-ci-centos-7-containers-multinode to debug docker.io issues with reverse proxy
  • 2017-08-15 18:42:16 UTC mirror.sto2.citycloud.o.o DNS updated to 46.254.11.19 TTL 60
  • 2017-08-14 15:29:31 UTC mirror.kna1.citycloud.openstack.org DNS entry updated to 91.123.202.15
  • 2017-08-11 20:39:46 UTC created mirror.mtl01.inap.openstack.org to replace mirror.mtl01.internap.openstack.org (internap -> inap rename)
  • 2017-08-11 19:20:14 UTC The apps.openstack.org server has been stopped, snapshotted one last time, and deleted.
  • 2017-08-11 05:00:49 UTC restarted mirror.ord.rax.openstack.org per investigation in https://bugs.launchpad.net/openstack-gate/+bug/1708707 which suggested apache segfaults causing pypi download failures. Will monitor
  • 2017-08-10 23:47:46 UTC removed 8.8.8.8 dns servers from both infracloud-chocolate and infracloud-vanilla provider-subnet-infracloud subnet
  • 2017-08-10 20:03:12 UTC Image builds manually queued for centos-7, debian-jessie, fedora-25, fedora-26, opensuse-423, ubuntu-trusty and ubuntu-xenial to use latest glean (1.9.2)
  • 2017-08-10 19:50:10 UTC glean 1.9.2 released to properly support vfat configdrive labels
  • 2017-08-10 12:27:50 UTC mirror.lon1.citycloud.openstack.org migrated to a new compute node by Kim from citycloud. appears up. nodepool conf restored & nodepool.o.o taken out of emergency file
  • 2017-08-10 12:13:12 UTC nodepool in emergency file and citycloud-lon1 region commented out while we investigate issues with mirror
  • 2017-08-09 20:18:19 UTC OVH ticket 8344470555 has been opened to track voucher reinstatement/refresh
  • 2017-08-08 00:07:46 UTC Gerrit on review.openstack.org restarted just now, and is no longer using contact store functionality or configuration options
  • 2017-08-07 23:34:49 UTC The Gerrit service on review.openstack.org will be offline momentarily at 00:00 utc for a quick reconfiguration-related restart
  • 2017-08-07 16:38:16 UTC temporarily blocked 59.108.63.126 in iptables on static.openstack.org due to a denial of service condition involving tarballs.o.o/kolla/images/centos-source-registry-ocata.tar.gz (see the iptables sketch after this list)
  • 2017-08-04 20:37:45 UTC Gerrit is being restarted to pick up CSS changes and should be back momentarily
  • 2017-08-02 20:00:10 UTC OSIC environment is active in Nodepool and running jobs normally once more
  • 2017-08-02 17:29:57 UTC infracloud-vanilla back online
  • 2017-08-02 14:18:29 UTC mirror.regionone.infracloud-vanilla.openstack.org DNS updated to 15.184.65.187
  • 2017-08-02 13:59:00 UTC We have disabled infracloud-vanilla due to the compute host running mirror.regionone.infracloud-vanilla.o.o being offline. Please recheck your failed jobs to schedule them to another cloud.
  • 2017-08-01 23:49:09 UTC osic nodes have been removed from nodepool due to a problem with the mirror host beginning around 22:20 UTC. please recheck any jobs with failures installing packages.
  • 2017-08-01 22:16:19 UTC pypi mirror manually updated and released
  • 2017-08-01 21:28:46 UTC pypi mirrors have not updated since 2:15 UTC due to issue with pypi.python.org. reported issue, since corrected. mirror updates now in progress.
  • 2017-08-01 08:09:21 UTC Yolanda has started nodepool-launcher process because it was stopped for more than one hour
  • 2017-07-31 07:39:25 UTC Yolanda had to restart nodepool-launcher because VMs were not being spun up and the process had looked inactive for the last 90 min
  • 2017-07-28 17:14:32 UTC The Gerrit service on review.openstack.org is being taken offline for roughly 5 minutes to perform a database backup and reconfiguration
  • 2017-07-23 23:23:03 UTC Job triggering events between 21:00 and 23:15 UTC were lost, and any patch sets uploaded or approved during that timeframe will need rechecking or reapproval before their jobs will run
  • 2017-07-22 00:27:10 UTC restarted logstash and jenkins-log-worker-{A,B,C,D} services on all logstash-workerNN servers to get logs processing again
  • 2017-07-22 00:26:02 UTC manually expired old elasticsearch shards to get the cluster back into a sane state
  • 2017-07-21 19:24:23 UTC docs.o.o is up again, https://review.openstack.org/486196 fixes it - but it needed to be applied manually since jobs depend on accessing docs.o.o
  • 2017-07-21 18:43:07 UTC kibana on logstash.o.o is currently missing entries past 21:25 utc yesterday
  • 2017-07-21 18:42:20 UTC elasticsearch02 has been hard-rebooted via nova after it hung at roughly 21:25 utc yesterday; elasticsearch service on elasticsearch05 also had to be manually started following a spontaneous reboot from 2017-07-14 01:39..18:27 (provider ticket from that date mentions an unresponsive hypervisor host); cluster is recovering now but kibana on logstash.o.o is currently missing entries past 21:25 utc yesterday
  • 2017-07-21 18:41:02 UTC docs.o.o is currently broken, we're investigating
  • 2017-07-21 17:07:30 UTC Restarting Gerrit for our weekly memory leak cleanup.
  • 2017-07-19 23:07:08 UTC restarted nodepool-launcher which had frozen (did not respond to SIGUSR2)
  • 2017-07-19 13:24:08 UTC the lists.o.o server is temporarily in emergency disable mode pending merger of https://review.openstack.org/484989
  • 2017-07-17 20:39:01 UTC /srv/static/tarballs/trove/images/ubuntu/mysql.qcow2 has been removed from static.openstack.org again
  • 2017-07-14 13:39:41 UTC deleted duplicate mirror.la1.citycloud and forced regeneration of dynamic inventory to get it to show up
  • 2017-07-13 19:09:37 UTC docs maintenance is complete and afsdb01 puppet and vos release cronjob have been reenabled
  • 2017-07-13 18:11:47 UTC puppet updates for afsdb01 have been temporarily suspended and its vos release cronjob disabled in preparation for manually reorganizing the docs volume
  • 2017-07-13 00:17:28 UTC zl08.o.o and zl09.o.o are now online and functional.
  • 2017-07-12 16:28:32 UTC both mirrors in infracloud-chocolate and infracloud-vanilla replaced with 250GB HDD mirror flavors now.
  • 2017-07-12 14:46:27 UTC DNS for mirror.regionone.infracloud-chocolate.openstack.org changed to 15.184.69.112, 60min TTL
  • 2017-07-12 13:22:05 UTC DNS for mirror.regionone.infracloud-vanilla.openstack.org changed to 15.184.66.172, 60min TTL
  • 2017-07-12 07:59:43 UTC Gerrit has been successfully restarted
  • 2017-07-12 07:51:20 UTC Gerrit is going to be restarted, due to low performance
  • 2017-07-12 06:53:30 UTC FYI, ask.openstack.org is down, review.o.o is slow - please have patience until this is fixed
  • 2017-07-11 18:00:06 UTC small hiccup in review-dev gerrit 2.13.8 -> 2.13.9 upgrade. Will be offline temporarily while we wait on puppet to curate lib installations
  • 2017-07-10 21:03:37 UTC 100gb cinder volume added and corresponding proxycache logical volume mounted at /var/cache/apache2 on mirrors for ca-ymq-1.vexxhost, dfw.rax, iad.rax, mtl01.internap, ord.rax, regionone.osic-cloud1
  • 2017-07-10 21:01:51 UTC zuul service on zuul.openstack.org restarted to clear memory utilization from slow leak
  • 2017-07-10 19:22:40 UTC similarly reinstalled tox on all other ubuntu-based zuul_nodes tracked in hiera (centos nodes seem to have been unaffected)
  • 2017-07-10 19:04:46 UTC reinstalled tox on proposal.slave.o.o using python 2.7, as it had defaulted to 3.4 at some point in the past (possibly related to the pip vs pip3 mixup last month)
  • 2017-07-10 17:01:01 UTC old mirror lv on static.o.o reclaimed to extend the tarballs lv by 150g
  • 2017-07-06 23:45:47 UTC nb03.openstack.org has been cleaned up and rebooted, and should return to building rotation
  • 2017-07-06 12:01:55 UTC docs.openstack.org is up again.
  • 2017-07-06 11:17:42 UTC docs.openstack.org has internal error (500). Fix is underway.
  • 2017-07-03 15:40:16 UTC "docs.openstack.org is working fine again; due to the move to a new location, each repo needs to merge one change to appear on docs.o.o"
  • 2017-07-03 15:26:19 UTC rebooting files01.openstack.org to clear up defunct apache2 zombies ignoring sigkill
  • 2017-07-03 15:21:17 UTC "We're experiencing a few problems with the reorg on docs.openstack.org and are looking into these..."
  • 2017-07-03 14:39:21 UTC We have switched now all docs publishing jobs to new documentation builds. For details see dhellmann's email http://lists.openstack.org/pipermail/openstack-dev/2017-July/119221.html . For problems, join us on #openstack-doc
  • 2017-07-01 00:33:44 UTC Reissued through June 2018 and manually tested all externally issued SSL/TLS certificates for our servers/services
  • 2017-06-29 18:03:43 UTC review-dev has been upgraded to gerrit 2.13.8. Please test behavior and functionality and note any abnormalities on https://etherpad.openstack.org/p/gerrit-2.13.-upgrade-steps
  • 2017-06-23 08:05:47 UTC ok git.openstack.org is working again, you can recheck failed jobs
  • 2017-06-23 06:06:21 UTC unknown issue with the git farm, everything broken - we're investigating
  • 2017-06-20 21:19:32 UTC The Gerrit service on review-dev.openstack.org is being taken offline for an upgrade to 2.13.7.4.988b40f
  • 2017-06-20 15:41:54 UTC Restarted openstack-paste service on paste.openstack.org as the lodgeit runserver process was hung and unresponsive (required sigterm followed by sighup before it would exit)
  • 2017-06-20 12:57:52 UTC restarting gerrit to address slowdown issues
  • 2017-06-18 21:29:58 UTC Image builds for ubuntu-trusty are paused and have been rolled back to yesterday until DNS issues can be unraveled
  • 2017-06-17 03:03:42 UTC zuulv3.o.o and ze01.o.o now using SSL/TLS for gearman operations
  • 2017-06-09 14:58:36 UTC The Gerrit service on review.openstack.org is being restarted now to clear an issue arising from an unanticipated SSH API connection flood
  • 2017-06-09 14:06:10 UTC Blocked 169.48.164.163 in iptables on review.o.o temporarily for excessive connection counts
  • 2017-06-07 20:40:18 UTC Blocked 60.251.195.198 in iptables on review.o.o temporarily for excessive connection counts
  • 2017-06-07 20:39:49 UTC Blocked 113.196.154.248 in iptables on review.o.o temporarily for excessive connection counts
  • 2017-06-07 20:07:25 UTC The Gerrit service on review.openstack.org is being restarted now to clear some excessive connection counts while we debug the intermittent request failures reported over the past few minutes
  • 2017-06-07 19:59:08 UTC Blocked 169.47.209.131, 169.47.209.133, 113.196.154.248 and 210.12.16.251 in iptables on review.o.o temporarily while debugging excessive connection counts
  • 2017-06-06 19:27:56 UTC both zuulv3.o.o and ze01.o.o are online and under puppet cfgmgmt
  • 2017-06-05 22:30:53 UTC Puppet updates are once again enabled for review-dev.openstack.org
  • 2017-06-05 14:37:25 UTC review-dev.openstack.org has been added to the emergency disable list for Puppet updates so additional trackingid entries can be tested there
  • 2017-06-01 14:35:16 UTC python-setuptools 36.0.1 has been released and now making its way into jobs. Feel free to 'recheck' your failures. If you have any problems, please join #openstack-infra
  • 2017-06-01 09:46:17 UTC There is a known issue with setuptools 36.0.0 and errors about the "six" package. For current details see https://github.com/pypa/setuptools/issues/1042 and monitor #openstack-infra
  • 2017-05-27 12:05:22 UTC The Gerrit service on review.openstack.org is restarting to clear some hung API connections and should return to service momentarily.
  • 2017-05-26 20:58:41 UTC OpenStack general mailing list archives from Launchpad (July 2010 to July 2013) have been imported into the current general archive on lists.openstack.org.
  • 2017-05-26 09:57:14 UTC Free space for logs.openstack.org reached 40GiB, so an early log expiration run (45 days) is underway in a root screen session.
  • 2017-05-25 23:18:21 UTC The nodepool-dsvm jobs are failing for now, until we reimplement zookeeper handling in our devstack plugin
  • 2017-05-24 17:46:12 UTC nb03.o.o and nb04.o.o are online (upgraded to xenial). Will be waiting a day or 2 before deleting nb01.o.o and nb02.o.o.
  • 2017-05-24 14:52:39 UTC both nb01.o.o and nb02.o.o are stopped. This is to allow nb03.o.o to build today's images
  • 2017-05-24 04:10:31 UTC Sufficient free space has been reclaimed that jobs are passing again; any POST_FAILURE results can now be rechecked.
  • 2017-05-23 21:25:01 UTC The logserver has filled up, so jobs are currently aborting with POST_FAILURE results; remediation is underway.
  • 2017-05-23 14:04:47 UTC Disabled Gerrit account 10842 (Xiexianbin) for posting unrequested third-party CI results on changes
  • 2017-05-17 10:55:41 UTC gerrit is being restarted to help stuck git replication issues
  • 2017-05-15 07:02:20 UTC eavesdrop is up again, logs from Sunday 21:36 to Monday 7:01 are missing
  • 2017-05-15 06:42:55 UTC eavesdrop is currently not getting updated
  • 2017-05-12 13:39:24 UTC The Gerrit service on http://review.openstack.org is being restarted to address hung remote replication tasks.
  • 2017-05-11 18:42:55 UTC OpenID authentication through LP/UO SSO is working again
  • 2017-05-11 17:29:50 UTC The Launchpad/UbuntuOne SSO OpenID provider is offline, preventing logins to review.openstack.org, wiki.openstack.org, et cetera; ETA for fix is unknown
  • 2017-05-03 18:54:36 UTC Gerrit on review.openstack.org is being restarted to accommodate a memory leak in Gerrit. Service should return shortly.
  • 2017-05-01 18:15:44 UTC Upgraded wiki.openstack.org from MediaWiki 1.28.0 to 1.28.2 for CVE-2017-0372
  • 2017-04-27 17:52:33 UTC DNS has been updated for the new redirects added to static.openstack.org, moving them off old-wiki.openstack.org (which is now being taken offline)
  • 2017-04-25 15:52:41 UTC Released bindep 2.4.0
  • 2017-04-21 20:38:54 UTC Gerrit is back in service and generally usable, though remote Git replicas (git.openstack.org and github.com) will be stale for the next few hours until online reindexing completes
  • 2017-04-21 20:06:20 UTC Gerrit is offline briefly for scheduled maintenance http://lists.openstack.org/pipermail/openstack-dev/2017-April/115702.html
  • 2017-04-21 19:44:12 UTC Gerrit will be offline briefly starting at 20:00 for scheduled maintenance http://lists.openstack.org/pipermail/openstack-dev/2017-April/115702.html
  • 2017-04-18 21:51:51 UTC nodepool.o.o restarted to pick up https://review.openstack.org/#/c/455466/
  • 2017-04-14 17:23:54 UTC vos release npm.mirror --localauth currently running from screen in afsdb01
  • 2017-04-14 02:01:28 UTC wiki.o.o required a hard restart due to host issues following rackspace network maintenance
  • 2017-04-13 19:53:37 UTC The Gerrit service on http://review.openstack.org is being restarted to address hung remote replication tasks.
  • 2017-04-13 08:52:57 UTC zuul was restarted due to an unrecoverable disconnect from gerrit. If your change is missing a CI result and isn't listed in the pipelines on http://status.openstack.org/zuul/ , please recheck
  • 2017-04-12 21:27:31 UTC Restarting Gerrit for our weekly memory leak cleanup.
  • 2017-04-11 14:48:58 UTC we have rolled back centos-7, fedora-25 and ubuntu-xenial images to the previous days release. Feel free to recheck your jobs now.
  • 2017-04-11 14:28:32 UTC latest base images have mistakenly put python3 in some places expecting python2 causing widespread failure of docs patches - fixes are underway
  • 2017-04-11 02:17:51 UTC bindep 2.3.0 released to fix fedora 25 image issues
  • 2017-04-09 16:23:03 UTC lists.openstack.org is back online. Thanks for your patience.
  • 2017-04-09 15:18:22 UTC We are performing unscheduled maintenance on lists.openstack.org, the service is currently down. We'll post a follow up shortly
  • 2017-04-07 19:00:49 UTC ubuntu-precise has been removed from nodepool.o.o, thanks for the memories
  • 2017-04-06 15:00:18 UTC zuulv3 is offline awaiting a security update.
  • 2017-04-05 14:02:24 UTC git.openstack.org is synced up
  • 2017-04-05 12:53:14 UTC The Gerrit service on http://review.openstack.org is being restarted to address hung remote replication tasks, and should return to an operable state momentarily
  • 2017-04-05 11:16:06 UTC cgit.openstack.org is not up to date
  • 2017-04-04 16:13:40 UTC The openstackid-dev server has been temporarily rebuilt with a 15gb performance flavor in preparation for application load testing
  • 2017-04-01 13:29:37 UTC The http://logs.openstack.org/ site is back in operation; previous logs as well as any uploaded during the outage should be available again; jobs which failed with POST_FAILURE can also be safely rechecked.
  • 2017-03-31 21:52:06 UTC The upgrade maintenance for lists.openstack.org has been completed and it is back online.
  • 2017-03-31 20:00:04 UTC lists.openstack.org will be offline from 20:00 to 23:00 UTC for planned upgrade maintenance
  • 2017-03-31 08:27:06 UTC logs.openstack.org has corrupted disks, it's being repaired. Please avoid rechecking until this is fixed
  • 2017-03-31 07:46:38 UTC Jobs in gate are failing with POST_FAILURE. Infra roots are investigating
  • 2017-03-30 17:05:30 UTC The Gerrit service on review.openstack.org is being restarted briefly to relieve performance issues, and should return to service again momentarily.
  • 2017-03-29 18:47:18 UTC statusbot restarted since it seems to have fallen victim to a ping timeout (2017-03-26 20:55:32) and never realized it
  • 2017-03-23 19:13:06 UTC eavesdrop.o.o cinder volume rotated to avoid rackspace outage on Friday March 31 03:00-09:00 UTC
  • 2017-03-23 16:20:33 UTC Cinder volumes static.openstack.org/main08, eavesdrop.openstack.org/main01 and review-dev.openstack.org/main01 will lose connectivity Friday March 31 03:00-09:00 UTC unless replaced by Wednesday March 29.
  • 2017-03-21 08:43:22 UTC Wiki problems have been fixed, it's up and running
  • 2017-03-21 00:44:19 UTC LP bugs for monasca migrated to openstack/monasca-api in StoryBoard, defcore to openstack/defcore, refstack to openstack/refstack
  • 2017-03-16 15:59:20 UTC The Gerrit service on review.openstack.org is being restarted to address hung remote replication tasks, and should return to an operable state momentarily
  • 2017-03-16 11:49:38 UTC paste.openstack.org service is back up - turns out it was a networking issue, not a database issue. yay networks!
  • 2017-03-16 11:02:17 UTC paste.openstack.org is down, due to connectivity issues with backend database. support ticket has been created.
  • 2017-03-14 16:07:35 UTC Changes https://review.openstack.org/444323 and https://review.openstack.org/444342 have been approved, upgrading https://openstackid.org/ production to what's been running and tested on https://openstackid-dev.openstack.org/
  • 2017-03-14 13:55:27 UTC Gerrit has been successfully restarted
  • 2017-03-14 13:49:09 UTC Gerrit has been successfully restarted
  • 2017-03-14 13:42:50 UTC Gerrit is going to be restarted due to performance problems
  • 2017-03-14 04:22:30 UTC gerrit under load throwing 503 errors. Service restart fixed symptoms and appears to be running smoothly
  • 2017-03-13 17:46:25 UTC restarting gerrit to address performance problems
  • 2017-03-09 16:43:59 UTC nodepool-builder restarted on nb02.o.o after remounting /opt file system
  • 2017-03-07 15:59:57 UTC compute085.chocolate.ic.o.o back in service
  • 2017-03-07 15:46:03 UTC compute085.chocolate.ic.o.o currently disabled on controller00.chocolate.ic.o.o, investigating a failure with the neutron linuxbridge agent
  • 2017-03-06 21:33:48 UTC nova-compute for compute035.vanilla.ic.o.o has been disabled on controller.vanilla.ic.o.o. compute035.vanilla.ic.o.o appears to be having an HDD issue and is currently in read-only mode.
  • 2017-03-06 21:17:46 UTC restarting gerrit to address performance problems
  • 2017-03-04 14:36:00 UTC CORRECTION: The afs01.dfw.openstack.org/main01 volume has been successfully replaced by afs01.dfw.openstack.org/main04 and is therefore no longer impacted by the coming block storage maintenance.
  • 2017-03-04 13:35:22 UTC The afs01.dfw.openstack.org/main01 volume has been successfully replaced by review.openstack.org/main02 and is therefore no longer impacted by the coming block storage maintenance.
  • 2017-03-03 21:47:51 UTC The review.openstack.org/main01 volume has been successfully replaced by review.openstack.org/main02 and is therefore no longer impacted by the coming block storage maintenance.
  • 2017-03-03 16:39:58 UTC Upcoming provider maintenance 04:00-10:00 UTC Wednesday, March 8 impacting Cinder volumes for: afs01.dfw, nb02 and review
  • 2017-03-03 14:28:54 UTC integrated gate is blocked by a job waiting for a trusty-multinode node
  • 2017-03-01 14:26:12 UTC Provider maintenance resulted in loss of connectivity to the static.openstack.org/main06 block device taking our docs-draft logical volume offline; filesystem recovery has been completed and the volume brought back into service.
  • 2017-02-28 23:13:36 UTC manually installed paramiko 1.18.1 on nodepool.o.o and restarted nodepool (due to suspected bug related to https://github.com/paramiko/paramiko/issues/44 in 1.18.2)
  • 2017-02-28 13:45:41 UTC gerrit is back to normal and I don't know how to use the openstackstatus bot
  • 2017-02-28 13:39:11 UTC ok gerrit is back to normal
  • 2017-02-28 13:10:06 UTC restarting gerrit to address performance problems
  • 2017-02-23 14:40:37 UTC nodepool-builder (nb01.o.o / nb02.o.o) stopped again. As a result of zuulv3-dev.o.o usage of infra-chocolate, we are accumulating DIB images on disk
  • 2017-02-23 13:42:06 UTC The mirror update process has completed and resulting issue confirmed solved; any changes whose jobs failed on invalid qemu package dependencies can now be safely rechecked to obtain new results.
  • 2017-02-23 13:05:37 UTC Mirror update failures are causing some Ubuntu-based jobs to fail on invalid qemu package dependencies; the problem mirror is in the process of updating now, so this condition should clear shortly.
  • 2017-02-22 14:55:51 UTC Created Continuous Integration Tools Development in All-Projects.git (UI), added zuul gerrit user to the group.
  • 2017-02-17 19:05:17 UTC Restarting gerrit due to performance problems
  • 2017-02-17 07:48:00 UTC osic-cloud disabled again, see https://review.openstack.org/435250 for some background
  • 2017-02-16 21:37:58 UTC zuulv3-dev.o.o is now online. Zuul services are currently stopped.
  • 2017-02-16 18:19:17 UTC osic-cloud1 temporarily disable. Currently waiting for root cause of networking issues.
  • 2017-02-15 23:18:25 UTC nl01.openstack.org (nodepool-launcher) is now online. Nodepool services are disabled.
  • 2017-02-15 20:58:25 UTC We're currently battling an increase in log volume which isn't leaving sufficient space for new jobs to upload logs and results in POST_FAILURE in those cases; recheck if necessary but keep spurious rebasing and rechecking to a minimum until we're in the clear.
  • 2017-02-14 23:08:17 UTC Hard rebooted mirror.ca-ymq-1.vexxhost.openstack.org because vgs was hanging indefinitely, impacting our ansible/puppet automation
  • 2017-02-13 17:20:54 UTC AFS replication issue has been addressed. Mirrors are currently re-syncing and coming back online.
  • 2017-02-13 15:51:28 UTC We are currently investigating an issue with our AFS mirrors which is causing some projects jobs to fail. We are working to correct the issue.
  • 2017-02-10 14:14:43 UTC The afs02.dfw.openstack.org/main02 volume in Rackspace DFW is expected to become unreachable between 04:00-10:00 UTC Sunday and may require corrective action on afs02.dfw.o.o as a result
  • 2017-02-10 14:12:44 UTC Rackspace will be performing Cinder maintenance in DFW from 04:00 UTC Saturday through 10:00 Sunday (two windows scheduled)
  • 2017-02-09 20:21:48 UTC Restarting gerrit due to performance problems
  • 2017-02-09 20:18:23 UTC Restarting gerrit due to performance problems
  • 2017-02-08 11:36:51 UTC The proposal node had disconnected from the static zuul-launcher. Restarting the launcher has restored the connection and proposal jobs are running again
  • 2017-02-08 10:37:14 UTC post and periodic jobs are not running, seems proposal node is down
  • 2017-02-07 16:36:10 UTC restarted gerritbot since messages seemed to be going into a black hole
  • 2017-02-06 18:15:12 UTC rax notified us that the host groups.o.o is on was rebooted
  • 2017-02-04 17:44:18 UTC zuul-launchers restarted to pick up 428740
  • 2017-02-03 19:46:54 UTC elastic search delay (elastic-recheck) appears to have recovered. logstash daemon was stopped on logstash-workers, then started. Our logprocessors were also restarted
  • 2017-02-03 14:13:27 UTC static.o.o root partition at 100%, deleted apache2 logs greater than 5 days old in /var/log/apache2 to free up space (see the log cleanup sketch after this list)
  • 2017-02-02 22:53:06 UTC Restarting gerrit due to performance problems
  • 2017-01-30 21:12:09 UTC increased quota on afs volume mirror.pypi from 500G to 1T (see the quota sketch after this list)
  • 2017-01-25 12:51:30 UTC Gerrit has been successfully restarted
  • 2017-01-25 12:48:18 UTC Gerrit is going to be restarted due to slow performance
  • 2017-01-24 18:16:30 UTC HTTPS cert and chain for zuul.openstack.org has been renewed and replaced.
  • 2017-01-24 18:16:22 UTC HTTPS cert and chain for ask.openstack.org has been renewed and replaced.
  • 2017-01-14 08:34:53 UTC OSIC cloud has been taken down temporarily, see https://review.openstack.org/420275
  • 2017-01-12 20:36:29 UTC Updated: Gerrit will be offline until 20:45 for scheduled maintenance (running longer than anticipated): http://lists.openstack.org/pipermail/openstack-dev/2017-January/109910.html
  • 2017-01-12 20:11:24 UTC Gerrit will be offline between now and 20:30 for scheduled maintenance: http://lists.openstack.org/pipermail/openstack-dev/2017-January/109910.html
  • 2017-01-12 17:41:11 UTC fedora (25) AFS mirror now online.
  • 2017-01-11 02:09:00 UTC manually disabled puppet ansible runs from puppetmaster.openstack.org in crontab due to CVE-2016-9587
  • 2017-01-11 02:08:10 UTC upgraded ansible on all zuul launchers due to CVE-2016-9587. see https://bugzilla.redhat.com/show_bug.cgi?id=1404378 and https://review.openstack.org/418636
  • 2017-01-10 20:14:26 UTC docs.openstack.org served from afs via files01.openstack.org
  • 2017-01-09 19:23:20 UTC Using ironic node-set-maintenance $node off && ironic node-set-power-state $node reboot, infracloud hypervisors that had disappeared were brought back to life. The mirror VM was then reenabled with openstack server set $vm_name active.
  • 2017-01-09 15:09:23 UTC Nodepool use of Infra-cloud's chocolate region has been disabled with https://review.openstack.org/417904 while nova host issues impacting its mirror instance are investigated.
  • 2017-01-09 15:08:02 UTC All zuul-launcher services have been emergency restarted so that zuul.conf change https://review.openstack.org/417679 will take effect.
  • 2017-01-08 09:43:24 UTC AFS doc publishing is broken, we have read-only file systems.
  • 2017-01-07 01:03:27 UTC docs and docs.dev (developer.openstack.org) afs volumes now have read-only replicas in dfw and ord, and they are being served by files01.openstack.org. a script runs on afsdb01 every 5 minutes to release them if there are any changes.
  • 2017-01-04 22:18:51 UTC elasticsearch rolling upgrade to version 1.7.6 is complete and cluster is recovered
  • 2017-01-02 21:30:44 UTC logstash daemons were 'stuck' and have been restarted on logstash-worker0X.o.o hosts. Events are being processed and indexed again as a result. Should probably look into upgrading the logstash install (and possibly elasticsearch)
  • 2016-12-29 11:11:50 UTC logs.openstack.org is up again. Feel free to recheck any failures.
  • 2016-12-29 08:20:50 UTC All CI tests are currently broken since logs.openstack.org is down. Refrain from recheck or approval until this is fixed.
  • 2016-12-29 03:00:42 UTC review.o.o (gerrit) restarted
  • 2016-12-21 18:00:07 UTC Gerrit is being restarted to update its OpenID SSO configuration
  • 2016-12-16 00:17:36 UTC nova services restarted on controller00.chocolate.ic.openstack.org to fix nodes failing to launch; unsure why this fixed our issue
  • 2016-12-14 23:06:05 UTC nb01.o.o and nb02.o.o added to emergency file on puppetmaster. To manually apply https://review.openstack.org/#/c/410988/
  • 2016-12-14 17:00:06 UTC nb01.o.o and nb02.o.o builders restarted and running from master again. nodepool.o.o did not restart, but /opt/nodepool is pointing to master branch
  • 2016-12-13 17:04:17 UTC Canonical admins have resolved the issue with login.launchpad.net, so authentication should be restored now.
  • 2016-12-13 16:27:33 UTC Launchpad SSO is not currently working, so logins to our services like review.openstack.org and wiki.openstack.org are failing; the admins at Canonical are looking into the issue but there is no estimated time for a fix yet.
  • 2016-12-12 15:08:04 UTC The Gerrit service on review.openstack.org is restarting now to address acute performance issues, and will be back online momentarily.
  • 2016-12-09 23:11:43 UTC manually ran "pip uninstall pyopenssl" on refstack.openstack.org to resolve a problem with requests/cryptography/pyopenssl/mod_wsgi
  • 2016-12-09 22:00:09 UTC elasticsearch has finished shard recovery and relocation. Cluster is now green
  • 2016-12-09 19:03:15 UTC launcher/deleter on nodepool.o.o are now running the zuulv3 branch. zookeeper based nodepool builders (nb01, nb02) are in production
  • 2016-12-09 18:57:39 UTC performed full elasticsearch cluster restart in an attempt to get it to fully recover and go green. Previously was yellow for days unable to initialize some replica shards. Recovery of shards in progress now.
  • 2016-12-08 19:48:07 UTC nb01.o.o / nb02.o.o removed from emergency file
  • 2016-12-08 19:16:13 UTC nb01.o.o / nb02.o.o added to emergency file on puppetmaster
  • 2016-12-07 19:00:56 UTC The zuul-launcher service on zlstatic01 has been restarted following application of fix https://review.openstack.org/408194
  • 2016-12-05 18:55:57 UTC Further project-config changes temporarily frozen for approval until xenial job cut-over changes merge, in an effort to avoid unnecessary merge conflicts.
  • 2016-11-30 16:43:16 UTC afs01.dfw.o.o / afs02.dfw.o.o /dev/mapper/main-vicepa increased to 3TB (see the LVM sketch after this list)
  • 2016-11-24 14:49:29 UTC OpenStack CI is processing jobs again. Thanks to the Canadian admin "team" that had their Thanksgiving holiday already ;) Jobs are all enqueued, no need to recheck.
  • 2016-11-24 13:40:03 UTC OpenStack CI has taken a Thanksgiving break; no new jobs are currently launched. We're currently hoping for a friendly admin to come out of Thanksgiving and fix the system.
  • 2016-11-24 05:40:46 UTC The affected filesystems on the log server are repaired. Please leave 'recheck' comments on any changes which failed with POST_FAILURE.
  • 2016-11-24 00:14:50 UTC Due to a problem with the cinder volume backing the log server, jobs are failing with POST_FAILURE. Please avoid issuing 'recheck' commands until the issue is resolved.
  • 2016-11-23 22:56:05 UTC Configuration management updates are temporarily disabled for openstackid.org in preparation for validating change 399253.
  • 2016-11-23 22:56:01 UTC The affected filesystems on the log server are repaired. Please leave 'recheck' comments on any changes which failed with POST_FAILURE.
  • 2016-11-23 22:45:15 UTC This message is to inform you that your Cloud Block Storage device static.openstack.org/main05 has been returned to service.
  • 2016-11-23 21:11:19 UTC Due to a problem with the cinder volume backing the log server, jobs are failing with POST_FAILURE. Please avoid issuing 'recheck' commands until the issue is resolved.
  • 2016-11-23 20:57:14 UTC received at 20:41:09 UTC: This message is to inform you that our monitoring systems have detected a problem with the server which hosts your Cloud Block Storage device 'static.openstack.org/main05' at 20:41 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding the alert. Please do not access or modify 'static.openstack.org/main05' during this process.
  • 2016-11-22 21:12:27 UTC Gerrit is offline until 21:30 UTC for scheduled maintenance: http://lists.openstack.org/pipermail/openstack-dev/2016-November/107379.html
  • 2016-11-22 14:29:16 UTC rebooted ask.openstack.org for a kernel update
  • 2016-11-21 12:20:56 UTC We are currently having capacity issues with our ubuntu-xenial nodes. We have addressed the issue but it will be another few hours before new images have been uploaded to all cloud providers.
  • 2016-11-17 19:18:55 UTC zl04 is restarted now as well. This concludes the zuul launcher restarts for ansible synchronize logging workaround
  • 2016-11-17 19:06:28 UTC all zuul launchers except for zl04 restarted to pick up error logging fix for synchronize tasks. zl04 failed to stop and is being held aside for debugging purposes
  • 2016-11-15 18:58:00 UTC developer.openstack.org is now served from files.openstack.org
  • 2016-11-14 19:32:12 UTC Correction, https://review.openstack.org/396428 changes logs-DEV.openstack.org behavior, rewriting nonexistent files to their .gz compressed counterparts if available.
  • 2016-11-14 19:30:38 UTC https://review.openstack.org/396428 changes logs.openstack.org behavior, rewriting nonexistent files to their .gz compressed counterparts if available.
  • 2016-11-14 17:54:20 UTC Gerrit on review.o.o restarted to deal with GarbageCollection eating all the cpu. Previous restart was November 7th, so we lasted for one week.
  • 2016-11-11 18:43:11 UTC This message is to inform you that our monitoring systems have detected a problem with the server which hosts your Cloud Block Storage device 'wiki-dev.openstack.org/main01' at 18:27 UTC. We are currently investigating the issue and will update you as soon as we have additional information regarding the alert. Please do not access or modify 'wiki-dev.openstack.org/main01' during this process.
  • 2016-11-11 13:01:03 UTC Our OpenStack CI system is coming back online again. Thanks for your patience.
  • 2016-11-11 12:02:09 UTC Our OpenStack CI systems are stuck and no new jobs are submitted. Please do not recheck - and do not approve changes until this is fixed.
  • 2016-11-11 11:50:51 UTC nodepool/zuul look currently stuck, looks like no new jobs are started
  • 2016-11-10 17:09:24 UTC restarted all zuul-launchers to pick up https://review.openstack.org/394658
  • 2016-11-07 23:09:54 UTC removed the grafana keynote demo dashboard using curl -X DELETE http://grafyamlcreds@localhost:8080/api/dashboards/db/nodepool-new-clouds
  • 2016-11-07 08:47:58 UTC Gerrit is going to be restarted due to slowness and proxy errors
  • 2016-11-04 20:05:04 UTC The old phabricator demo server has been deleted.
  • 2016-11-04 20:04:39 UTC The old (smaller) review-dev server which was replaced in August has now been deleted.
  • 2016-11-02 14:47:47 UTC All hidden Gerrit groups owned by Administrators with no members or inclusions have been prefixed with "Unused-" for possible future (manual) deletion.
  • 2016-10-28 08:57:35 UTC restart apache2 on etherpad.o.o to clear out stale connections
  • 2016-10-27 11:23:46 UTC The nodepool-builder service on nodepool.o.o has been started again now that our keynote demo is complete.
  • 2016-10-26 05:42:32 UTC The Gerrit service on review.openstack.org is being restarted now to guard against potential performance issues later this week.
  • 2016-10-25 13:51:11 UTC The nodepool-builder process is intentionally stopped on nodepool.openstack.org and will be started again tomorrow after noon UTC.
  • 2016-10-21 20:44:36 UTC nodepool is in emergency file so that nodepool config can be more directly managed temporarily
  • 2016-10-20 18:10:09 UTC The Gerrit service on review.openstack.org is being restarted now in an attempt to resolve some mismatched merge states on a few changes, but should return momentarily.
  • 2016-10-20 17:26:37 UTC restarted ansible launchers with 2.5.2.dev31
  • 2016-10-18 23:42:50 UTC restarted logstash daemons as well to get logstash pipeline moving again. Appears they all went out to lunch for some reason (logstash logs not so great but they stopped reading from the tcp connection with log workers according to strace)
  • 2016-10-18 17:44:47 UTC logstash worker daemons restarted as they have all deadlocked. Proper fix in https://review.openstack.org/388122
  • 2016-10-18 16:12:40 UTC pycparser 2.16 released to fix assertion error from today.
  • 2016-10-18 14:06:54 UTC We are aware of pycparser failures in the gate and working to address the issue.
  • 2016-10-12 21:33:19 UTC bandersnatch manually synced and mirror.pypi vos released to get around timeout on cron. Mirror appears to have reached steady state and should sync properly again.
  • 2016-10-11 02:49:34 UTC Jobs running on osic nodes are failing due to network issues with the mirror. We are temporarily disabling the cloud.
  • 2016-10-10 07:11:12 UTC Nodepool images can now be built for Gentoo as well - https://review.openstack.org/#/c/310865
  • 2016-10-07 16:46:26 UTC full sync of bandersnatch started, to pick up missing packages from the AFS quota issue this morning (see the mirror sync sketch after this list)
  • 2016-10-07 12:30:07 UTC mirror.pypi quota (AFS) bumped to 500GB (up from 400GB)
  • 2016-10-07 12:28:59 UTC mirror.pypi quota (AFS) bumped to 500MB (up from 400MB)
  • 2016-10-06 18:56:31 UTC nodepool now running 3 separate daemons with configuration managed by puppet. If you can, always make sure there is a deleter running before a launcher, to avoid leaking nodes.
  • 2016-10-05 03:15:53 UTC X.509 certificate renewed and updated in private hiera for openstackid.org
  • 2016-10-04 14:02:29 UTC The Gerrit service on review.openstack.org is being restarted to address performance degradation and should return momentarily
  • 2016-09-29 15:01:26 UTC manually running log_archive_maintenance.sh to make room for logs on static.o.o
  • 2016-09-26 16:12:24 UTC Launchpad SSO logins are confirmed working correctly again
  • 2016-09-26 15:50:16 UTC gerrit login manually set to error page in apache config to avoid accidental account creation while lp sso is offline
  • 2016-09-26 15:50:13 UTC Launchpad SSO is offline, preventing login to https://review.openstack.org/, https://wiki.openstack.org/ and many other sites; no ETA has been provided by the LP admin team
  • 2016-09-26 15:44:08 UTC Earlier job failures for "zuul-cloner: error: too few arguments" should now be solved, and can safely be rechecked
  • 2016-09-26 15:37:34 UTC added review.openstack.org to emergency disabled file
  • 2016-09-26 15:28:35 UTC A 4gb swapfile has been added on cacti.openstack.org at /swap while we try to work out what flavor its replacement should run
  • 2016-09-23 22:40:31 UTC mirror.iad.rax.openstack.org has been rebooted to restore sanity following connectivity issues to its cinder volume
  • 2016-09-22 14:50:38 UTC Rebooted wheel-mirror-centos-7-amd64.slave.openstack.org to clear persistent PAG creation error
  • 2016-09-22 04:44:55 UTC A bandersnatch update is running under a root screen session on mirror-update.openstack.org
  • 2016-09-21 13:44:26 UTC disabled apache2/puppetmaster processes on puppetmaster.openstack.org
  • 2016-09-20 14:44:06 UTC infra-cloud has been enabled again.
  • 2016-09-20 13:45:20 UTC OpenStack Infra now has a Twitter bot, follow it at https://twitter.com/openstackinfra
  • 2016-09-20 13:38:56 UTC infra-cloud temporarily taken off to debug some glance issues.
  • 2016-09-20 13:37:49 UTC openstack infra now has a twitter bot, follow it at https://twitter.com/openstackinfra
  • 2016-09-18 16:35:31 UTC The /srv/mediawiki filesystem for the production wiki site had communication errors, so has been manually put through an offline fsck and remounted again
  • 2016-09-13 17:12:12 UTC The Gerrit service on review.openstack.org is being restarted now to address current performance problems, but should return to a working state within a few minutes
  • 2016-09-09 16:59:50 UTC setuptools 27.1.2 addresses the circular import
  • 2016-09-09 15:56:05 UTC New setuptools release appears to have a circular import which is breaking many jobs - check for ImportError: cannot import name monkey.
  • 2016-09-08 01:26:02 UTC restarted nodepoold and nodepool builder to pick up change that should prevent leaking images when we hit the 8 hour image timeout.
  • 2016-09-07 20:21:28 UTC controller00 of infracloud has been put on the emergency hosts list, as neutron debugging has been tweaked to investigate sporadic connect timeouts; please leave as is until we get more errors in the logs
  • 2016-09-02 19:16:43 UTC Gerrit is completing an online re-index, you may encounter slowness until it is complete
  • 2016-09-02 18:07:50 UTC Gerrit is now going offline for maintenance, reserving a maintenance window through 22:00 UTC.
  • 2016-09-02 17:39:48 UTC The infrastructure team is taking Gerrit offline for maintenance, beginning shortly after 18:00 UTC for a potentially 4 hour maintenance window.
  • 2016-09-02 15:23:22 UTC The Gerrit service on review.openstack.org is restarting quickly to relieve resource pressure and restore normal performance
  • 2016-09-02 12:24:51 UTC restarted nodepool with the latest shade and nodepool changes. all looks well - floating-ips, images and flavors are not being hammered
  • 2016-09-02 05:38:24 UTC Space has been freed up on the log server. If you have POST_FAILURE results it is now safe to issue a 'recheck'
  • 2016-09-02 05:12:19 UTC The logs volume is full causing jobs to fail with POST_FAILURE. This is being worked on, please do not recheck until notified.
  • 2016-08-31 22:29:18 UTC that way the cloud8 people can work on getting the ips sorted in parallel
  • 2016-08-31 22:29:06 UTC in the mean time, it was suggested as a workaround to just use the cloud1 mirror since they're in the same data center by pointing the dns record there
  • 2016-08-31 22:28:50 UTC the networking in cloud8 is such that our mirror is behind the double nat - so our automation has no idea what the actual ip of the server is ... the cloud8 people are looking into fixing this, but there are things outside of their immediate control
  • 2016-08-29 17:43:00 UTC email sent to rackspace about rax-iad networking issue. The region is still disabled in nodepool
  • 2016-08-26 19:19:03 UTC restarted apache2 on health.o.o to remove a runaway apache process using all the cpu and memory. Looked like it may be related to mysql connection issues. DB currently looks happy.
  • 2016-08-25 23:20:30 UTC mirror.mtl01.internap.openstack.org now online
  • 2016-08-25 19:47:45 UTC The Gerrit service on review.openstack.org is restarting to implement some performance tuning adjustments, and should return to working order momentarily.
  • 2016-08-23 20:07:55 UTC mirror.regionone.osic-cloud1.openstack.org upgraded to support both ipv4 / ipv6. DNS has also been updated.
  • 2016-08-23 16:53:58 UTC The https://wiki.openstack.org/ site (temporarily hosted from wiki-upgrade-test.o.o) has been updated from Mediawiki 1.27.0 to 1.27.1 per https://lists.wikimedia.org/pipermail/mediawiki-announce/2016-August/000195.html
  • 2016-08-20 15:39:13 UTC The its-storyboard plugin has been enabled on review.openstack.org per http://eavesdrop.openstack.org/meetings/infra/2016/infra.2016-08-16-19.02.log.html#l-90
  • 2016-08-19 19:28:55 UTC nodepool.o.o added to emergency file on puppetmaster.o.o so we can remove the ubuntu-xenial label from osic-cloud1
  • 2016-08-19 11:51:08 UTC OSIC has burned through the problematic IP range with failures, things should be back to normal now.
  • 2016-08-19 11:23:21 UTC DSVM jobs on OSIC currently failing because of IP collisions, fix is in the gate - https://review.openstack.org/#/c/357764/ - please hold rechecks until merged
  • 2016-08-19 11:18:22 UTC Precise tests on OSIC provider are currently failing, please stop your checks until the issue is resolved.
  • 2016-08-18 20:08:15 UTC mirror.nyj01.internap.openstack.org replacement server now online, DNS has been updated to 74.217.28.58
  • 2016-08-17 23:04:47 UTC osic-cloud8 credentials added to hieradata
  • 2016-08-17 19:46:43 UTC The volume for logs.openstack.org filled up rather suddenly, causing a number of jobs to fail with a POST_FAILURE result and no logs; we're manually expiring some logs now to buy breathing room, but any changes which hit that in the past few minutes will need to be rechecked and/or approved again
  • 2016-08-17 16:54:30 UTC tripleo-test-cloud-rh1 credentials updated on nodepool.o.o to use the openstackzuul project
  • 2016-08-17 02:37:29 UTC DNS for wiki.openstack.org currently goes to the wiki-upgrade-test.openstack.org server, as the former suffered a compromise due to missing iptables rules
  • 2016-08-15 22:45:15 UTC mirror.ord.rax.openstack.org upgraded to performance1-4 to address network bandwidth cap.
  • 2016-08-15 20:49:59 UTC gracefully restarting all zuul-launchers
  • 2016-08-15 20:34:14 UTC Installed ansible stable-2.1 branch on zuul launchers to pick up https://github.com/ansible/ansible/commit/d35377dac78a8fcc6e8acf0ffd92f47f44d70946
  • 2016-08-13 16:16:54 UTC The Gerrit service on review.openstack.org is online again
  • 2016-08-13 12:26:24 UTC gerrit is having issues ... it is being worked on, no ETA at the moment
  • 2016-08-12 23:09:05 UTC https://wiki.openstack.org/ is now running Mediawiki 1.27.0; please let us know in #openstack-infra if anything seems wrong
  • 2016-08-12 23:03:06 UTC ok https://wiki.openstack.org/ is now running Mediawiki 1.27.0; please let us know in #openstack-infra if anything seems wrong
  • 2016-08-12 21:01:01 UTC The Mediawiki service at wiki.openstack.org will be offline from 21:00 UTC until approximately 23:00 UTC for a planned upgrade http://lists.openstack.org/pipermail/openstack-dev/2016-August/101395.html
  • 2016-08-12 20:51:18 UTC The Gerrit service on review.openstack.org is restarting for a scheduled upgrade, but should return to service momentarily: http://lists.openstack.org/pipermail/openstack-dev/2016-August/101394.html
  • 2016-08-12 18:36:06 UTC Added wiki.openstack.org to /etc/ansible/hosts/emergency on puppetmaster.openstack.org in preparation for 21:00 UTC upgrade maintenance
  • 2016-08-10 16:51:12 UTC nodepool-builder restarted on nodepool.o.o to pickup nodepool.yaml changes for bluebox-sjc1
  • 2016-08-10 05:26:14 UTC zuul is being restarted to reload configuration. Jobs should be re-enqueued but if you're missing anything (and it's not on http://status.openstack.org/zuul/) please issue a recheck in 30min.
  • 2016-08-08 08:40:29 UTC Gerrit is going to be restarted
  • 2016-08-02 23:50:13 UTC restarted zuul to clear geard function registration to fix inaccuracies with nodepool demand calculations
  • 2016-07-30 16:59:01 UTC Emergency filesystem repairs are complete; any changes which failed jobs with POST_FAILURE status or due to lack of access to tarballs can be safely rechecked now
  • 2016-07-30 14:25:39 UTC Cinder connectivity was lost to the volumes for sites served from static.openstack.org (logs, docs-draft, tarballs) and so they will remain offline until repairs are complete
  • 2016-07-30 10:00:23 UTC All jobs currently fail with POST_FAILURE
  • 2016-07-30 05:00:49 UTC zuul-launcher release ran on zl04-zl07, I've left the first 4 zuul-launchers so we can debug the "too many ready node online" issue
  • 2016-07-29 16:47:09 UTC Our PyPI mirrors should be current again as of 16:10 UTC today
  • 2016-07-28 22:50:11 UTC performed full restart of elasticsearch cluster to get it indexing logs again.
  • 2016-07-27 21:26:43 UTC more carefully restarted logstash daemons again. Bigdesk reports significantly higher data transport rates indicating maybe it is happy now.
  • 2016-07-27 14:31:01 UTC auto-hold added to nodepool.o.o for gate-project-config-layout while we debug pypi mirror failures
  • 2016-07-27 13:54:13 UTC Gerrit is being restarted now to relieve performance degradation
  • 2016-07-27 04:19:26 UTC gate-tempest-dsvm-platform-fedora24 added to nodepool auto-hold to debug ansible failures
  • 2016-07-26 20:03:46 UTC restarted logstash worker and logstash indexer daemons to get logstash data flowing again.
  • 2016-07-22 15:29:01 UTC Up to one hour outage expected for static.openstack.org/main04 cinder volume on Saturday, July 30, starting at 08:00 UTC; log upload issues will probably break all ci jobs, and the volume will need filesystem remediation after the maintenance concludes
  • 2016-07-22 00:02:34 UTC gerrit/git gc change merged; gerrit and git.o.o repos should be gc'd at 04:07 UTC
  • 2016-07-21 00:00:31 UTC All file uploads are disabled on wiki.openstack.org by https://review.openstack.org/345100
  • 2016-07-20 20:07:42 UTC Wiki admins should watch https://wiki.openstack.org/w/index.php?title=Special%3AListUsers&username=&group=&creationSort=1&desc=1&limit=50 for signs of new accounts spamming (spot check linked "contribs" for them)
  • 2016-07-20 20:07:02 UTC New user account creation has been reenabled for the wiki by https://review.openstack.org/344502
  • 2016-07-19 20:20:36 UTC Puppet is reenabled on wiki.openstack.org, and is updating the page edit captcha from questy to recaptcha
  • 2016-07-16 17:34:08 UTC disabled "Microsoft Manila CI", account id 18128 because it was in a comment loop on change 294830
  • 2016-07-15 14:19:47 UTC Gerrit is restarting to correct memory/performance issues.
  • 2016-07-12 01:11:05 UTC zlstatic01.o.o back online
  • 2016-07-11 23:51:57 UTC zlstatic01 in graceful mode
  • 2016-07-08 22:26:21 UTC manually downgraded elasticsearch-curator and ran it to clean out old indexes that were making cluster very slow and unhappy
  • 2016-07-08 21:51:39 UTC restarted logstash on logstash workers with some help from kill. The daemons were not processing events leading to the crazy logstash queue graphs and refused to restart normally.
  • 2016-07-08 16:38:05 UTC ran puppet on codesearch.openstack.org and manually restarted hound
  • 2016-07-06 06:29:08 UTC All python 3.5 jobs are failing today, we need to build new xenial images first.
  • 2016-07-05 18:15:59 UTC Job instability resulting from a block storage connectivity error on mirror.iad.rax.openstack.org has been corrected; jobs running in rax-iad should be more reliable again.
  • 2016-07-05 10:37:26 UTC we now have python35 jobs enabled
  • 2016-07-04 08:16:19 UTC setuptools 24.0.0 broke dsvm tests, we've gone back to old images, it's safe to recheck now if you had a failure related to setuptools 24.0.0 (processor_architecture) - see bug 1598525
  • 2016-07-04 00:56:10 UTC To work around the periodic group expansion issue causing puppet to run on hosts disabled in our groups.txt file in git, i have added the list of disabled hosts from it to the emergency disabled group on the puppetmaster for now
  • 2016-07-02 00:06:39 UTC Gerrit, Zuul and static.openstack.org now available following the scheduled maintenance window.
  • 2016-07-01 20:08:28 UTC Gerrit is offline for maintenance until approximately 22:00 UTC
  • 2016-07-01 19:54:58 UTC The infrastructure team is taking Gerrit offline for maintenance beginning shortly after 20:00 UTC to upgrade the Zuul and static.openstack.org servers. We aim to have it back online around 22:00 UTC.
  • 2016-06-30 16:22:04 UTC zlstatic01.o.o restart to pick up zuul.NodeWorker.wheel-mirror-ubuntu-xenial-amd64.slave.openstack.org
  • 2016-06-29 21:30:29 UTC bindep 2.0.0 release and firefox/xvfb removal from bindep-fallback.txt should take effect in our next image update
  • 2016-06-29 18:59:30 UTC UCA AFS mirror online
  • 2016-06-29 18:29:58 UTC bindep 2.0.0 released
  • 2016-06-23 23:23:13 UTC https://github.com/Shrews/ansible-modules-core/commit/d11cb0d9a1c768735d9cb4b7acc32b971b524f13
  • 2016-06-23 23:22:23 UTC zuul launchers are all running locally patched ansible (source in ~root/ansible) to correct and/or further debug async timeout issue
  • 2016-06-22 22:09:48 UTC nodepool also supports auto-holding nodes for specific failed jobs (it will set the reason appropriately)
  • 2016-06-22 22:09:14 UTC nodepool now supports adding a reason when holding a node ("--reason <foo>"); please use that so that we can remember why they are held :)
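  A minimal sketch of the hold workflow described in the two entries above, driving the nodepool CLI from Python; the node ID and reason text are hypothetical examples, and the exact "nodepool hold" invocation is assumed from the CLI of that era rather than documented here.
      # Hold a node and record why, so other operators can later tell
      # why it was kept out of the pool.
      import subprocess

      node_id = "1234567"  # hypothetical ID, e.g. taken from `nodepool list`
      reason = "infra-root: debugging gate-tempest-dsvm-full timeout"  # example text

      subprocess.check_call(["nodepool", "hold", node_id, "--reason", reason])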
  • 2016-06-21 16:07:09 UTC Gerrit is being restarted now to apply an emergency security-related configuration change
  • 2016-06-20 13:14:52 UTC OpenID logins are back to normal
  • 2016-06-20 13:01:26 UTC OpenID login from review.o.o is experiencing difficulties, possibly due to transatlantic network performance issues. Things are being investigated
  • 2016-06-20 10:40:50 UTC static.openstack.org is back up. If you have POST_FAILURE and are missing logs from your CI jobs, please leave a 'recheck'.
  • 2016-06-20 05:24:05 UTC static.openstack.org (which hosts logs.openstack.org and tarballs.openstack.org among others) is currently being rebuilt. As jobs can not upload logs they are failing with POST_FAILURE. This should be resolved soon. Please do not recheck until then.
  • 2016-06-20 03:11:54 UTC static.openstack.org (which hosts logs.openstack.org) is currently migrating due to a hardware failure. It should be back up shortly.
  • 2016-06-18 17:44:10 UTC zl01 restarted properly
  • 2016-06-18 17:21:20 UTC zl01 currently gracefully restarting via 330184
  • 2016-06-18 16:38:42 UTC Gerrit is restarting now to relieve memory pressure and restore responsiveness
  • 2016-06-17 16:34:44 UTC zuul was restarted for a software upgrade; events between 16:08 and 16:30 were missed, please recheck any changes uploaded during that time
  • 2016-06-17 01:14:35 UTC follow-up mail about zuul-related changes: http://lists.openstack.org/pipermail/openstack-dev/2016-June/097595.html
  • 2016-06-16 23:56:49 UTC all jenkins servers have been deleted
  • 2016-06-16 22:43:06 UTC Jenkins is retired: http://lists.openstack.org/pipermail/openstack-dev/2016-June/097584.html
  • 2016-06-16 20:20:36 UTC zl05 - zl07 are in production; jenkins05 - jenkins07 are in prepare for shutdown mode pending decommissioning
  • 2016-06-15 18:52:04 UTC jenkins07 back online. Will manually cleanup used nodes moving forward
  • 2016-06-15 18:40:21 UTC jenkins03 and jenkins04 are in prepare-for-shutdown mode in preparation for decommissioning
  • 2016-06-13 19:50:30 UTC zuul has been restarted with registration checks disabled -- we should no longer see NOT_REGISTERED errors after zuul restarts.
  • 2016-06-13 16:24:44 UTC jenkins02.openstack.org has been deleted
  • 2016-06-10 22:19:31 UTC jenkins02 is in prepare for shutdown in preparation for decommissioning
  • 2016-06-10 06:31:03 UTC All translation imports have broken UTF-8 encoding.
  • 2016-06-09 20:07:08 UTC jenkins.o.o is in prepare-for-shutdown in preparation for decommissioning. zlstatic01.openstack.org is running and attached to its workers instead.
  • 2016-06-09 17:42:26 UTC deleted jenkins01.openstack.org
  • 2016-06-08 18:12:10 UTC Zuul has been restarted to correct an error condition. Events since 17:30 may have been missed; please 'recheck' your changes if they were uploaded since then, or have "NOT_REGISTERED" errors.
  • 2016-06-08 00:24:27 UTC nodepool.o.o restarted to pick up review 326114
  • 2016-06-07 23:25:57 UTC jenkins01 is in prepare-for-shutdown mode in preparation for decommissioning.
  • 2016-06-07 08:13:44 UTC dib gate for project-config is fixed again with https://review.openstack.org/326273 merged.
  • 2016-06-07 07:12:13 UTC All project-config jobs fail - the dib gate is broken.
  • 2016-06-06 18:09:46 UTC zl01.openstack.org in production
  • 2016-06-04 01:23:46 UTC Gerrit maintenance concluded successfully
  • 2016-06-04 00:08:07 UTC Gerrit is offline for maintenance until 01:45 UTC (new ETA)
  • 2016-06-03 20:12:32 UTC Gerrit is offline for maintenance until 00:00 UTC
  • 2016-06-03 20:00:59 UTC The infrastructure team is taking Gerrit offline for maintenance this afternoon, beginning shortly after 20:00 UTC. We aim to have it back online around 00:00 UTC.
  • 2016-06-03 14:02:43 UTC Cleanup from earlier block storage disruption on static.openstack.org has been repaired, and any jobs which reported an "UNSTABLE" result or linked to missing logs between 08:00-14:00 UTC can be retriggered by leaving a "recheck" comment.
  • 2016-06-03 11:44:18 UTC CI is experiencing issues with test logs, all jobs are currently UNSTABLE as a result. No need to recheck until this is fixed! Thanks for your patience.
  • 2016-06-03 10:11:14 UTC CI is experiencing issues with test logs, all jobs are currently UNSTABLE as a result. No need to recheck until this is fixed! Thanks for your patience.
  • 2016-06-03 09:38:30 UTC CI is experiencing issues with test logs, all jobs are currently UNSTABLE as a result. No need to recheck until this is fixed! Thanks for your patience.
  • 2016-06-02 01:09:39 UTC nodepool.o.o restarted to fix jenkins01.o.o (wasn't launching jobs)
  • 2016-06-01 23:08:46 UTC zl01.openstack.org is back in production handling a portion of the job load
  • 2016-05-30 14:18:17 UTC openstack-meetbot back online, there was an issue with DNS.
  • 2016-05-30 13:13:53 UTC Statusbot has been restarted (no activity since 27/05)
  • 2016-05-27 23:00:57 UTC eavesdrop.o.o upgraded to ubuntu-trusty and online!
  • 2016-05-27 22:23:22 UTC statusbot back online
  • 2016-05-27 19:33:52 UTC elasticsearch07.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 18:59:58 UTC logstash.openstack.org upgraded to ubuntu trusty
  • 2016-05-27 18:51:40 UTC elasticsearch06.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 18:06:09 UTC jenkins06.o.o back online
  • 2016-05-27 17:58:01 UTC jenkins05.o.o back online
  • 2016-05-27 17:47:38 UTC elasticsearch05.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 17:24:17 UTC elasticsearch04.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 16:43:54 UTC elasticsearch03.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 16:20:16 UTC elasticsearch02.o.o upgraded to ubuntu-trusty and cluster is green
  • 2016-05-27 13:32:30 UTC nodepoold restarted to address zmq issue with jenkins02 and jenkins06
  • 2016-05-27 07:15:08 UTC zuul required a restart due to network outages. If your change is not listed on http://status.openstack.org/zuul/ and is missing results, please issue a 'recheck'.
  • 2016-05-27 03:23:11 UTC after a quick check, gerrit and its filesystem have been brought back online and should be working again
  • 2016-05-27 03:03:41 UTC Gerrit is going offline briefly to check possible filesystem corruption
  • 2016-05-27 00:48:13 UTC logstash-worker20.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-27 00:32:59 UTC logstash-worker19.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-27 00:18:55 UTC logstash-worker18.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-27 00:10:33 UTC puppetmaster.o.o remove from emergency file since OSIC is now back online
  • 2016-05-27 00:01:23 UTC logstash-worker17.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 23:29:01 UTC logstash-worker16.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 23:01:18 UTC logstash-worker15.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 22:33:27 UTC logstash-worker14.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 22:17:25 UTC zl01 removed from production
  • 2016-05-26 22:12:26 UTC logstash-worker13.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 21:59:00 UTC paste.openstack.org now running ubuntu-trusty and successfully responding to requests
  • 2016-05-26 21:43:25 UTC zuul launcher zl01.openstack.org is in production (handling load in parallel with jenkins)
  • 2016-05-26 21:05:10 UTC logstash-worker12.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 20:57:15 UTC puppet disabled on puppetmaster (for the puppetmaster host itself -- not globally) and OSIC manually removed from clouds.yaml because OSIC is down which is causing ansible openstack inventory to fail
  • 2016-05-26 20:21:28 UTC osic appears down at the moment. Following up with #osic for information
  • 2016-05-26 19:45:23 UTC logstash-worker11.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 18:36:29 UTC logstash-worker10.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 18:23:09 UTC logstash-worker09.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 18:11:40 UTC logstash-worker08.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 18:00:29 UTC logstash-worker07.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 17:47:15 UTC logstash-worker06.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 17:00:59 UTC logstash-worker05.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 16:26:21 UTC logstash-worker04.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 16:11:10 UTC logstash-worker03.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 15:50:12 UTC logstash-worker02.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-26 15:28:49 UTC logstash-worker01.openstack.org now running ubuntu-trusty and processing requests
  • 2016-05-25 21:05:16 UTC zuul has been restarted with a change that records and reports estimated job durations internally. job times will be under-estimated until zuul builds up its internal database
  • 2016-05-25 20:35:42 UTC status.o.o has been upgraded to ubuntu trusty
  • 2016-05-25 18:42:28 UTC storyboard.o.o has been upgraded to ubuntu trusty
  • 2016-05-25 18:42:06 UTC graphite.o.o has been upgraded to ubuntu trusty
  • 2016-05-24 22:28:54 UTC graphite.o.o is currently down, we have an open ticket with RAX regarding the detaching of cinder volumes. 160524-dfw-0003689
  • 2016-05-24 20:23:55 UTC zuul-dev.openstack.org now running on ubuntu-trusty
  • 2016-05-24 19:34:00 UTC zm08.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 19:19:27 UTC zm07.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 19:09:49 UTC zm06.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 18:53:11 UTC zm05.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 18:30:51 UTC zm04.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 18:12:01 UTC zm03.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 17:52:34 UTC zm02.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 17:32:53 UTC zm01.openstack.org now running on ubuntu-trusty and processing gearman requests
  • 2016-05-24 13:21:46 UTC nodepoold restarted to pick up new version of shade / clean-floating-ips
  • 2016-05-23 17:46:37 UTC changed cacti.openstack.org IP address (for upgrade to trusty); gap in data around this time while iptables updates everywhere to allow snmp
  • 2016-05-20 13:40:03 UTC I've stopped jenkins01.o.o, it doesn't appear to be working properly. Nodes attach to jenkins but are not launched by nodepool. I believe zl01 might be the issue
  • 2016-05-18 20:12:03 UTC ran restart_jenkins_masters.yaml on jenkins02.o.o
  • 2016-05-18 01:47:59 UTC Gerrit is about to be restarted to help with page timeouts
  • 2016-05-18 01:28:06 UTC ovh-bhs1 has been down for better part of the last 12 hours. See http://paste.openstack.org/show/497434/ for info about the exception
  • 2016-05-18 00:55:21 UTC nodepool restarted to pickup clean-floating-ips patch
  • 2016-05-13 09:04:38 UTC tripleo-f22 nodes slowly coming online now in nodepool
  • 2016-05-13 08:32:35 UTC tripleo-test-cloud-rh1 added back to nodepool.o.o, however it is currently having issues launching tripleo-f22 nodes. TripleO CI team should be looking into it
  • 2016-05-13 07:03:46 UTC Removed nodepool.o.o from emergency file on puppetmaster.o.o
  • 2016-05-11 21:56:35 UTC nodepool restarted to pickup https://review.openstack.org/#/c/294339/
  • 2016-05-11 18:47:20 UTC npm mirror sync finished; lock is released
  • 2016-05-11 16:16:27 UTC all afs mirror volumes have been moved to afs01.dfw and afs02.dfw (so they are no longer in ord) to speed up vos release times. all are in regular service using read-only replicas except for npm.
  • 2016-05-11 12:00:27 UTC We have a workaround for our mirrors to attempt to translate package names if a match isn't immediately obvious. A more complete fix is yet to come. It is now safe to 'recheck' any jobs that failed due to "No matching distribution found". Please join #openstack-infra if you discover more problems.
  • 2016-05-11 07:08:56 UTC pip 8.1.2 broke our local python mirror, some jobs will fail with "No matching distribution found". We're investigating. Do not "recheck" until the issue is solved
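  For context on the two pip 8.1.2 entries above: the breakage is consistent with PEP 503 name normalization, where pip requests the normalized form of a project name from the simple index while the mirror is still laid out under the original name. A small illustrative Python sketch of that normalization rule (not the actual mirror workaround):
      # PEP 503: runs of '-', '_' and '.' collapse to a single '-' and the
      # name is lowercased, so "oslo.config" is requested as "oslo-config".
      import re

      def normalize(name):
          return re.sub(r"[-_.]+", "-", name).lower()

      assert normalize("oslo.config") == "oslo-config"
      assert normalize("Django_OpenStack_Auth") == "django-openstack-auth"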
  • 2016-05-10 17:11:59 UTC created afs02.dfw.openstack.org fileserver
  • 2016-05-10 16:14:42 UTC afs update: the vos release -force completed in just under 59 hours, so i followed up with a normal vos release (no -force) thereafter to make sure it will complete without error now. it's been running for ~4.5 hours so far
  • 2016-05-09 12:54:07 UTC released bandersnatch lock on mirror-update.o.o to resume bandersnatch updates
  • 2016-05-07 23:21:13 UTC vos release of mirror.pypi is running with -force this time, under the usual root screen session on afs01.dfw.openstack.org
  • 2016-05-06 23:57:19 UTC the Review-MySQL trove instance has now been expanded to 50gb (19% full) and /home/gerrit2 on review.openstack.org increased to 200gb (47% full)
  • 2016-05-06 19:06:56 UTC opened support ticket 160506-iad-0001201 for Review-MySQL trove instance taking >3 hours (so far) to resize its backing volume
  • 2016-05-06 16:56:58 UTC osic-cloud1 is coming back online. Thanks for the help #osic
  • 2016-05-06 16:46:35 UTC osic-cloud1 is down at the moment, #osic is looking into the issue. Will update shortly.
  • 2016-05-06 16:02:59 UTC OSIC leaked 21 FIPs; they have been deleted manually.
  • 2016-05-06 15:43:14 UTC the current 100gb /home/gerrit2 on review.openstack.org is 95% full, so I've added a new 200gb ssd volume to review.o.o as a replacement for the current 100gb ssd volume. Once I'm comfortable that things are still stable after the trove volume resize, I'll pvmove the extents from the old cinder volume to the new one and then extend the lv/fs to 200gb
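  A rough sketch of the pvmove/extend plan described in the entry above, expressed as LVM commands driven from Python; the device, volume group and logical volume names are hypothetical placeholders, not the actual layout on review.openstack.org.
      # Migrate extents from the old 100GB Cinder volume to the new 200GB one,
      # then grow the logical volume and filesystem into the extra space.
      import subprocess

      def run(*cmd):
          subprocess.check_call(cmd)

      old_pv, new_pv = "/dev/xvdc1", "/dev/xvdd1"   # hypothetical device names
      vg, lv = "main", "gerrit2"                    # hypothetical VG/LV names
      lv_path = "/dev/%s/%s" % (vg, lv)

      run("pvcreate", new_pv)            # prepare the new volume as a physical volume
      run("vgextend", vg, new_pv)        # add it to the volume group
      run("pvmove", old_pv, new_pv)      # move all extents off the old volume
      run("vgreduce", vg, old_pv)        # drop the old volume from the group
      run("lvextend", "-l", "+100%FREE", lv_path)
      run("resize2fs", lv_path)          # grow the ext filesystem online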
  • 2016-05-06 15:42:37 UTC the trove instance for review.openstack.org was 10gb and 90% full, so i'm upping it to 50gb (which is supposed to be a non-impacting online operation)
  • 2016-05-06 14:47:31 UTC Zuul has been restarted. As a result, we only preserved patches in the gate queue. Be sure to recheck your patches in gerrit if needed.
  • 2016-05-06 14:17:04 UTC Zuul is currently recovering from a large number of changes, it will take a few hours until your job is processed. Please have patience and enjoy a great weekend!
  • 2016-05-05 20:30:54 UTC Gerrit is restarting to revert incorrect changes to test result displays
  • 2016-05-05 19:22:43 UTC Gerrit is restarting to address performance issues related to a suspected memory leak
  • 2016-05-03 20:38:56 UTC through some careful scripting (which involved an apache reconfiguration to stop it holding an open file lock) I offlined the tarballs volume on static.openstack.org to repair its filesystem so it could be remounted read-write
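  A minimal sketch of that repair sequence, with hypothetical mountpoint and device names; the real steps also included the apache reconfiguration mentioned above so nothing kept the filesystem busy.
      # Take the volume offline, repair it, and remount it read-write.
      import subprocess

      mountpoint = "/srv/static/tarballs"   # hypothetical mountpoint
      device = "/dev/main/tarballs"         # hypothetical block device

      for cmd in (
          ["service", "apache2", "stop"],            # release open file handles
          ["umount", mountpoint],
          ["fsck", "-y", device],                    # repair unattended
          ["mount", "-o", "rw", device, mountpoint],
          ["service", "apache2", "start"],
      ):
          subprocess.check_call(cmd)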
  • 2016-05-03 20:28:58 UTC restarting apache on review.openstack.org to pick up security patches. Gerrit web ui may disappear for a short time.
  • 2016-05-03 09:24:59 UTC Docs-draft filesystem has been restored. Please check your affected jobs again
  • 2016-05-03 08:36:36 UTC Filesystem on docs-draft.openstack.org is broken, we are on the process of repairing it. Please stop checking jobs using this filesystem until further notice
  • 2016-05-03 08:27:24 UTC Logs filesystem has been successfully restored, please recheck your jobs
  • 2016-05-03 06:47:23 UTC Filesystem on logs.openstack.org is broken, we are on the process of repairing it. Please stop checking your jobs until further notice
  • 2016-05-03 00:37:42 UTC gerrit configuration update blocked on failing beaker tests due to missing bouncycastle releases; job being made nonvoting in https://review.openstack.org/311898
  • 2016-05-02 23:47:45 UTC due to an error in https://review.openstack.org/295530 which will be corrected in https://review.openstack.org/311888 gerrit should not be restarted until the second change lands
  • 2016-05-02 21:51:56 UTC manual vos release of pypi mirror started in screen on fileserver; see https://etherpad.openstack.org/p/fix-afs
  • 2016-05-02 15:19:44 UTC steps to fix the pypi mirror problem in progress: https://etherpad.openstack.org/p/fix-afs
  • 2016-05-02 06:53:53 UTC AFS mirrors are not publishing; vos release has been stuck since 29 April
  • 2016-04-22 15:03:19 UTC Log server was repaired as of 10:50 UTC and jobs have been stable since. If necessary, please recheck changes that have 'UNSTABLE' results.
  • 2016-04-22 10:54:56 UTC Log server has been repaired and jobs are stable again. If necessary please recheck changes that have 'UNSTABLE' results.
  • 2016-04-22 07:32:05 UTC Logs are failing to be uploaded causing jobs to be marked as UNSTABLE. We are working on repairing the log filesystem and will update when ready. Please do not recheck before then.
  • 2016-04-21 12:49:48 UTC OVH provider is enabled again, please wait for the job queue to be processed
  • 2016-04-21 10:38:33 UTC OVH servers are down, we are working to solve it. This will cause that jobs queue is processed slowly, please have patience.
  • 2016-04-19 13:41:32 UTC We have recovered one of our cloud providers, but there is a huge backlog of jobs to process. Please have patience until your jobs are processed
  • 2016-04-15 09:51:47 UTC Zuul and gerrit are working normally now. Please recheck any jobs that may have been affected by this failure.
  • 2016-04-15 09:23:40 UTC No jobs are being processed by gerrit and zuul. We are working to solve the problem; please be aware that no changes have been sent to the queue in the last hour, so you will need to recheck jobs for that period.
  • 2016-04-15 09:06:29 UTC Gerrit is going to be restarted because it is not processing new changes
  • 2016-04-11 21:08:40 UTC Gerrit move maintenance completed successfully; note that DNS has been updated to new IP addresses as indicated in http://lists.openstack.org/pipermail/openstack-dev/2016-April/091274.html
  • 2016-04-11 20:08:57 UTC Gerrit is offline until 21:00 UTC for a server replacement http://lists.openstack.org/pipermail/openstack-dev/2016-April/091274.html
  • 2016-04-11 19:51:50 UTC Gerrit will be offline from 20:00 to 21:00 UTC (starting 10 minutes from now) for a server replacement http://lists.openstack.org/pipermail/openstack-dev/2016-April/091274.html
  • 2016-04-11 16:20:17 UTC Reminder, Gerrit will be offline from 20:00 to 21:00 UTC for a server replacement http://lists.openstack.org/pipermail/openstack-dev/2016-April/091274.html
  • 2016-04-07 08:36:04 UTC jobs depending on npm are now working again
  • 2016-04-06 10:20:39 UTC npm lint jobs are failing due to a problem with npm registry. The problem is under investigation, and we will update once the issue is solved.
  • 2016-04-05 20:01:57 UTC ubuntu xenial mirrors now online.
  • 2016-04-05 14:51:52 UTC dns for openstackid.org has been changed from 2001:4800:7817:102:be76:4eff:fe05:d9cd and 23.253.97.70 (openstackid 1.0.17 on ubuntu precise) to 2001:4800:7815:101:be76:4eff:fe04:7741 and 23.253.243.97 (openstackid 1.0.18 on ubuntu trusty). record ttls remain 300s for now
  • 2016-04-05 13:04:10 UTC jenkins06.o.o back online, appears to have run out of RAM
  • 2016-04-04 07:15:37 UTC Gerrit is going to be restarted due to bad performance
  • 2016-03-31 19:56:01 UTC Any jobs which erroneously failed on missing traceroute packages should be safe to recheck now
  • 2016-03-31 17:49:51 UTC Job failures for missing traceroute packages are in the process of being fixed now, ETA 30 minutes to effectiveness for new jobs
  • 2016-03-30 11:15:35 UTC Gate on project-config is currently broken due to IRC tests. The problem has been detected and we are working to fix the issue as soon as possible.
  • 2016-03-28 15:22:43 UTC Gerrit is restarting on review.openstack.org in an attempt to address an issue reading an object from the ec2-api repository
  • 2016-03-24 17:08:05 UTC restarted gerrit to address GC issue
  • 2016-03-21 14:59:32 UTC Rackspace has opened support tickets warning of disruptive maintenance March 22 05:00-07:00 UTC, March 24 03:00 to 07:00 UTC, and March 25 02:00 to 06:00 UTC which could impact network connectivity including disconnecting from Trove databases and Cinder block devices
  • 2016-03-19 22:25:25 UTC Gerrit is restarting to address performance issues
  • 2016-03-15 15:33:38 UTC Launchpad SSO is back to normal - happy hacking
  • 2016-03-15 15:00:29 UTC Launchpad OpenID SSO is currently experiencing issues preventing login. The Launchpad team is working on the issue
  • 2016-03-15 11:37:22 UTC Gerrit had to be restarted because it was not responsive. As a consequence, some test results have been lost, from 09:30 UTC to 11:30 UTC approximately. Please recheck any jobs affected by this problem.
  • 2016-03-15 11:34:39 UTC Gerrit had to be restarted because it was not responsive. As a consequence, some test results have been lost, from 08:30 UTC to 10:30 UTC approximately. Please recheck any jobs affected by this problem.
  • 2016-03-15 11:15:09 UTC Gerrit is going to be restarted
  • 2016-03-11 11:01:42 UTC Gerrit has been restarted successfully
  • 2016-03-11 10:56:07 UTC Gerrit is going to be restarted due to bad performance
  • 2016-03-07 07:25:45 UTC gerrit is going to be restarted due to bad performance
  • 2016-03-04 11:25:20 UTC testing status bot
  • 2016-03-01 10:45:18 UTC gerrit finished restarting
  • 2016-03-01 10:39:09 UTC Gerrit is going to be restarted due to poor performance
  • 2016-02-29 12:07:53 UTC Infra currently has a long backlog. Please be patient and where possible avoid rechecks while it catches up.
  • 2016-02-19 08:35:19 UTC Gerrit is going to be restarted due to performance problems
  • 2016-02-17 06:50:39 UTC A problem with the mirror used for CI jobs in the rax-iad region has been corrected. Please recheck changes that recently failed jobs on nodes in rax-iad.
  • 2016-02-13 17:42:02 UTC Gerrit is back up
  • 2016-02-13 15:11:57 UTC Gerrit is offline for filesystem repair
  • 2016-02-13 00:23:22 UTC Gerrit is offline for maintenance, ETA updated to 01:00 utc
  • 2016-02-12 23:43:30 UTC Gerrit is offline for maintenance, ETA updated to 23:59 utc
  • 2016-02-12 23:08:44 UTC Gerrit is offline for maintenance, ETA updated to 23:30 utc
  • 2016-02-12 22:07:37 UTC Gerrit is offline for maintenance until 23:00 utc
  • 2016-02-12 21:47:47 UTC The infrastructure team is taking gerrit offline for maintenance this afternoon, beginning at 22:00 utc. We should have it back online around 23:00 utc. http://lists.openstack.org/pipermail/openstack-dev/2016-February/086195.html
  • 2016-02-09 17:25:39 UTC Gerrit is restarting now, to alleviate current performance impact and WebUI errors.
  • 2016-02-03 12:41:39 UTC Infra running with lower capacity now, due to a temporary problem affecting one of our nodepool providers. Please expect some delays in your jobs. Apologies for any inconvenience caused.
  • 2016-01-30 09:23:17 UTC Testing status command
  • 2016-01-22 17:52:01 UTC Restarting zuul due to a memory leak
  • 2016-01-20 11:56:15 UTC Restart done, review.openstack.org is available
  • 2016-01-20 11:45:12 UTC review.openstack.org is being restarted to apply patches
  • 2016-01-18 16:50:38 UTC Gerrit is restarting quickly as a workaround for performance degradation
  • 2016-01-11 22:06:57 UTC Gerrit is restarting to resolve java memory issues
  • 2015-12-17 16:43:53 UTC Zuul is moving in very slow motion since roughly 13:30 UTC; the Infra team is investigating.
  • 2015-12-16 21:02:59 UTC Gerrit has been upgraded to 2.11. Please report any issues in #openstack-infra as soon as possible.
  • 2015-12-16 17:07:00 UTC Gerrit is offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html
  • 2015-12-16 16:21:49 UTC Gerrit will be offline for a software upgrade from 17:00 to 21:00 UTC. See: http://lists.openstack.org/pipermail/openstack-dev/2015-December/081037.html
  • 2015-12-04 16:55:08 UTC The earlier JJB bug which disrupted tox-based job configurations has been reverted and applied; jobs seem to be running successfully for the past two hours.
  • 2015-12-04 09:32:24 UTC Tox tests are broken at the moment. From openstack-infra we are working to fix them. Please don't approve changes until we notify that tox tests work again.
  • 2015-11-06 20:04:47 UTC Gerrit is offline until 20:15 UTC today for scheduled project rename maintenance
  • 2015-11-06 19:41:20 UTC Gerrit will be offline at 20:00-20:15 UTC today (starting 20 minutes from now) for scheduled project rename maintenance
  • 2015-10-27 06:32:40 UTC CI will be disrupted for an indeterminate period while our service provider reboots systems for a security fix
  • 2015-10-17 18:40:01 UTC Gerrit is back online. Github transfers are in progress and should be complete by 1900 UTC.
  • 2015-10-17 18:03:25 UTC Gerrit is offline for project renames.
  • 2015-10-17 17:11:10 UTC Gerrit will be offline for project renames starting at 1800 UTC.
  • 2015-10-13 11:19:47 UTC Gerrit has been restarted and is responding to normal load again.
  • 2015-10-13 09:44:48 UTC gerrit is undergoing an emergency restart to investigate load issues
  • 2015-10-05 14:03:13 UTC Gerrit was restarted to temporarily address performance problems
  • 2015-09-17 10:16:42 UTC Gate back to normal, thanks to the backlisting of the problematic version
  • 2015-09-17 08:02:50 UTC Gate is currently stuck, failing grenade upgrade tests due the release of oslo.utils 1.4.1 for Juno.
  • 2015-09-11 23:04:39 UTC Gerrit is offline from 23:00 to 23:30 UTC while some projects are renamed. http://lists.openstack.org/pipermail/openstack-dev/2015-September/074235.html
  • 2015-09-11 22:32:57 UTC 30 minute warning, Gerrit will be offline from 23:00 to 23:30 UTC while some projects are renamed http://lists.openstack.org/pipermail/openstack-dev/2015-September/074235.html
  • 2015-08-31 20:27:18 UTC puppet agent temporarily disabled on nodepool.openstack.org to avoid accidental upgrade to python-glanceclient 1.0.0
  • 2015-08-26 15:45:47 UTC restarting gerrit due to a slow memory leak
  • 2015-08-17 10:50:24 UTC Gerrit restart has resolved the issue and systems are back up and functioning
  • 2015-08-17 10:23:42 UTC review.openstack.org (aka gerrit) is going down for an emergency restart
  • 2015-08-17 07:07:38 UTC Gerrit is currently under very high load and may be unresponsive. infra are looking into the issue.
  • 2015-08-12 00:06:30 UTC Zuul was restarted due to an error; events (such as approvals or new patchsets) since 23:01 UTC have been lost and affected changes will need to be rechecked
  • 2015-08-05 21:11:30 UTC Correction: change events between 20:50-20:54 UTC (during the restart only) have been lost and will need to be rechecked or their approvals reapplied to trigger testing.
  • 2015-08-05 21:06:19 UTC Zuul has been restarted to resolve a reconfiguration failure: previously running jobs have been reenqueued but change events between 19:50-20:54 UTC have been lost and will need to be rechecked or their approvals reapplied to trigger testing.
  • 2015-08-03 13:41:37 UTC The Gerrit service on review.openstack.org has been restarted in an attempt to improve performance.
  • 2015-07-30 09:01:49 UTC CI is back online but has a huge backlog. Please be patient and if possible delay approving changes until it has caught up.
  • 2015-07-30 07:52:49 UTC CI system is broken and very far behind. Please do not approve any changes for a while.
  • 2015-07-30 07:43:12 UTC Our CI system is broken again today, jobs are not getting processed at all.
  • 2015-07-29 13:27:42 UTC zuul jobs after about 07:00 UTC may need a 'recheck' to enter the queue. Look if your change is in http://status.openstack.org/zuul/ and recheck if not.
  • 2015-07-29 12:52:20 UTC zuul's disks were at capacity. Space has been freed up and jobs are being re-queued.
  • 2015-07-29 09:30:59 UTC Currently our CI system is broken, jobs are not getting processed at all.
  • 2015-07-28 08:04:50 UTC zuul has been restarted and queues restored. It may take some time to work through the backlog.
  • 2015-07-28 06:48:20 UTC zuul is stuck and about to undergo an emergency restart, please be patient as job results may take a long time
  • 2015-07-22 14:35:43 UTC CI is slowly recovering, please be patient while the backlog is worked through.
  • 2015-07-22 14:17:30 UTC CI is currently recovering from an outage overnight. It is safe to recheck results with NOT_REGISTERED errors. It may take some time for zuul to work through the backlog.
  • 2015-07-22 08:16:50 UTC zuul jobs are currently stuck while problems with gearman are debugged
  • 2015-07-22 07:24:43 UTC zuul is undergoing an emergency restart. Jobs will be re-queued but some events may be lost.
  • 2015-07-10 22:00:47 UTC Gerrit is unavailable from approximately 22:00 to 22:30 UTC for project renames
  • 2015-07-10 21:04:01 UTC Gerrit will be unavailable from 22:00 to 22:30 UTC for project renames
  • 2015-07-03 19:33:46 UTC etherpad.openstack.org is still offline for scheduled database maintenance, ETA 19:45 UTC
  • 2015-07-03 19:05:45 UTC etherpad.openstack.org is offline for scheduled database maintenance, ETA 19:30 UTC
  • 2015-06-30 14:56:00 UTC The log volume was repaired and brought back online at 14:00 UTC. Log links today from before that time may be missing, and changes should be rechecked if fresh job logs are desired for them.
  • 2015-06-30 08:50:29 UTC OpenStack CI is down due to hard drive failures
  • 2015-06-12 22:45:07 UTC Gerrit is back online. Zuul reconfiguration for renamed projects is still in progress, ETA 23:30.
  • 2015-06-12 22:10:50 UTC Gerrit is offline for project renames. ETA 22:40
  • 2015-06-12 22:06:20 UTC Gerrit is offline for project renames. ETA 20:30
  • 2015-06-12 21:45:26 UTC Gerrit will be offline for project renames between 22:00 and 22:30 UTC
  • 2015-06-11 21:08:10 UTC Gerrit has been restarted to terminate a persistent looping third-party CI bot
  • 2015-06-04 18:43:17 UTC Gerrit has been restarted to clear an issue with its event stream. Any change events between 17:25 and 18:38 UTC should be rechecked or have their approvals reapplied to initiate testing.
  • 2015-05-13 23:00:05 UTC Gerrit and Zuul are back online.
  • 2015-05-13 22:42:09 UTC Gerrit and Zuul are going offline for reboots to fix a security vulnerability.
  • 2015-05-12 00:58:04 UTC Gerrit has been downgraded to version 2.8 due to the issues observed today. Please report further problems in #openstack-infra.
  • 2015-05-11 23:56:14 UTC Gerrit is going offline while we perform an emergency downgrade to version 2.8.
  • 2015-05-11 17:40:47 UTC We have discovered post-upgrade issues with Gerrit affecting nova (and potentially other projects). Some changes will not appear and some actions, such as queries, may return an error. We are continuing to investigate.
  • 2015-05-09 18:32:43 UTC Gerrit upgrade completed; please report problems in #openstack-infra
  • 2015-05-09 16:03:24 UTC Gerrit is offline from 16:00-20:00 UTC to upgrade to version 2.10.
  • 2015-05-09 15:18:16 UTC Gerrit will be offline from 1600-2000 UTC while it is upgraded to version 2.10
  • 2015-05-06 00:43:52 UTC Restarted gerrit due to stuck stream-events connections. Events since 23:49 were missed and changes uploaded since then will need to be rechecked.
  • 2015-05-05 17:05:25 UTC zuul has been restarted to troubleshoot an issue, gerrit events between 15:00-17:00 utc were lost and changes updated or approved during that time will need to be rechecked or have their approval votes readded to trigger testing
  • 2015-04-29 14:06:55 UTC gerrit has been restarted to clear a stuck events queue. any change events between 13:29-14:05 utc should be rechecked or have their approval votes reapplied to trigger jobs
  • 2015-04-28 15:38:04 UTC gerrit has been restarted to clear an issue with its event stream. any change events between 14:43-15:30 utc should be rechecked or have their approval votes reapplied to trigger jobs
  • 2015-04-28 12:43:46 UTC Gate is experiencing epic failures due to issues with mirrors, work is underway to mitigate and return to normal levels of sanity
  • 2015-04-27 13:48:14 UTC gerrit has been restarted to clear a problem with its event stream. change events between 13:09 and 13:36 utc should be rechecked or have approval votes reapplied as needed to trigger jobs
  • 2015-04-27 08:11:05 UTC Restarting gerrit because it stopped sending events (ETA 15 mins)
  • 2015-04-22 17:33:33 UTC gerrit is restarting to clear hung stream-events tasks. any review events between 16:48 and 17:32 utc will need to be rechecked or have their approval votes reapplied to trigger testing in zuul
  • 2015-04-18 15:11:25 UTC Gerrit is offline for emergency maintenance, ETA 15:30 UTC to completion
  • 2015-04-18 14:32:11 UTC Gerrit will be offline between 15:00-15:30 UTC today for emergency maintenance (starting half an hour from now)
  • 2015-04-18 14:02:07 UTC Gerrit will be offline between 15:00-15:30 UTC today for emergency maintenance (starting an hour from now)
  • 2015-04-18 02:29:15 UTC gerrit is undergoing a quick-ish restart to implement a debugging patch. should be back up in ~10 minutes. apologies for any inconvenience
  • 2015-04-17 23:07:06 UTC Gerrit is available again.
  • 2015-04-17 22:09:51 UTC Gerrit is unavailable until 23:59 UTC for project renames and a database update.
  • 2015-04-17 22:05:40 UTC Gerrit is unavailable until 23:59 UTC for project renames and a database update.
  • 2015-04-17 21:05:41 UTC Gerrit will be unavailable between 22:00 and 23:59 UTC for project renames and a database update.
  • 2015-04-16 19:48:11 UTC gerrit has been restarted to clear a problem with its event stream. any gerrit changes updated or approved between 19:14 and 19:46 utc will need to be rechecked or have their approval reapplied for zuul to pick them up
  • 2015-04-15 18:27:55 UTC Gerrit has been restarted. New patches, approvals, and rechecks between 17:30 and 18:20 UTC may have been missed by Zuul and will need rechecks or new approvals added.
  • 2015-04-15 18:05:15 UTC Gerrit has stopped emitting events so Zuul is not alerted to changes. We will restart Gerrit shortly to correct the problem.
  • 2015-04-10 15:45:54 UTC gerrit has been restarted to address a hung event stream. change events between 15:00 and 15:43 utc which were lost will need to be rechecked or have approval workflow votes reapplied for zuul to act on them
  • 2015-04-06 11:40:08 UTC gerrit has been restarted to restore event streaming. any change events missed by zuul (between 10:56 and 11:37 utc) will need to be rechecked or have new approval votes set
  • 2015-04-01 13:29:44 UTC gerrit has been restarted to restore event streaming. any change events missed by zuul (between 12:48 and 13:28 utc) will need to be rechecked or have new approval votes set
  • 2015-03-31 11:51:33 UTC Check/Gate unstuck, feel free to recheck your abusively-failed changes.
  • 2015-03-31 08:55:59 UTC CI Check/Gate pipelines currently stuck due to a bad dependency creeping in the system. No need to recheck your patches at the moment.
  • 2015-03-27 22:06:32 UTC Gerrit is offline for maintenance, ETA 22:30 UTC http://lists.openstack.org/pipermail/openstack-dev/2015-March/059948.html
  • 2015-03-27 21:02:04 UTC Gerrit maintenance commences in 1 hour at 22:00 UTC http://lists.openstack.org/pipermail/openstack-dev/2015-March/059948.html
  • 2015-03-26 13:13:33 UTC gerrit stopped emitting stream events around 11:30 utc and has now been restarted. please recheck any changes currently missing results from jenkins
  • 2015-03-21 16:07:02 UTC Gerrit is back online
  • 2015-03-21 15:08:01 UTC Gerrit is offline for scheduled maintenance || http://lists.openstack.org/pipermail/openstack-infra/2015-March/002540.html
  • 2015-03-21 14:54:23 UTC Gerrit will be offline starting at 1500 UTC for scheduled maintenance
  • 2015-03-04 17:17:49 UTC Issue solved, gate slowly digesting accumulated changes
  • 2015-03-04 08:32:42 UTC Zuul check queue stuck due to reboot maintenance window at one of our cloud providers - no need to recheck changes at the moment, they won't move forward.
  • 2015-01-30 19:32:23 UTC Gerrit is back online
  • 2015-01-30 19:10:04 UTC Gerrit and Zuul are offline until 1930 UTC for project renames
  • 2015-01-30 18:43:57 UTC Gerrit and Zuul will be offline from 1900 to 1930 UTC for project renames
  • 2015-01-30 16:15:03 UTC zuul is running again and changes have been re-enqueued. See http://status.openstack.org/zuul/ before rechecking if in doubt
  • 2015-01-30 14:26:56 UTC zuul hasn't been running jobs since ~10:30 utc, investigation underway
  • 2015-01-27 17:54:45 UTC Gerrit and Zuul will be offline for a few minutes for a security update
  • 2015-01-20 19:54:47 UTC Gerrit restarted to address likely memory leak leading to server slowness. Sorry if you were caught in the restart
  • 2015-01-09 18:59:29 UTC paste.openstack.org is going offline for a database migration (duration: ~2 minutes)
  • 2014-12-06 16:06:03 UTC gerrit will be offline for 30 minutes while we rename a few projects. eta 16:30 utc
  • 2014-12-06 15:21:31 UTC [reminder] gerrit will be offline for 30 minutes starting at 16:00 utc for project renames
  • 2014-11-22 00:33:53 UTC Gating and log storage offline due to block device error. Recovery in progress, ETA unknown.
  • 2014-11-21 21:46:58 UTC gating is going offline while we deal with a broken block device, eta unknown
  • 2014-10-29 20:58:17 UTC Restarting gerrit to get fixed CI javascript
  • 2014-10-20 21:22:38 UTC Zuul erroneously marked some changes as having merge conflicts. Those changes have been added to the check queue to be rechecked and will be automatically updated when complete.
  • 2014-10-17 21:27:06 UTC Gerrit is back online
  • 2014-10-17 21:04:39 UTC Gerrit is offline from 2100-2130 for project renames
  • 2014-10-17 20:35:12 UTC Gerrit will be offline from 2100-2130 for project renames
  • 2014-10-17 17:04:01 UTC upgraded wiki.openstack.org from Mediawiki 1.24wmf19 to 1.25wmf4 per http://ci.openstack.org/wiki.html
  • 2014-10-16 16:20:43 UTC An error in a configuration change to mitigate the poodle vulnerability caused a brief outage of git.openstack.org from 16:06-16:12. The problem has been corrected and git.openstack.org is working again.
  • 2014-09-24 21:59:06 UTC The openstack-infra/config repo will be frozen for project-configuration changes starting at 00:01 UTC. If you have a pending configuration change that has not merged or is not in the queue, please see us in #openstack-infra.
  • 2014-09-24 13:43:48 UTC removed 79 disassociated floating ips in hpcloud
  • 2014-09-22 15:52:51 UTC removed 431 disassociated floating ips in hpcloud
  • 2014-09-22 15:52:23 UTC killed bandersnatch process on pypi.region-b.geo-1.openstack.org, hung since 2014-09-18 22:45 due to https://bitbucket.org/pypa/bandersnatch/issue/52
  • 2014-09-22 15:51:21 UTC restarted gerritbot to get it to rejoin channels
  • 2014-09-19 20:53:18 UTC Gerrit is back online
  • 2014-09-19 20:17:08 UTC Gerrit will be offline from 20:30 to 20:50 UTC for project renames
  • 2014-09-16 13:38:01 UTC jenkins ran out of jvm memory on jenkins06 at 01:42:20 http://paste.openstack.org/show/112155/
  • 2014-09-14 18:13:14 UTC all our pypi mirrors failed to update urllib3 properly, full mirror refresh underway now to correct, eta 20:00 utc
  • 2014-09-13 15:10:34 UTC shutting down all irc bots now to change their passwords (per the wallops a few minutes ago, everyone should do the same)
  • 2014-09-13 14:54:19 UTC rebooted puppetmaster.openstack.org due to out-of-memory condition
  • 2014-08-30 16:08:43 UTC Gerrit is offline for project renaming maintenance, ETA 1630
  • 2014-08-25 17:12:51 UTC restarted gerritbot
  • 2014-08-16 16:30:38 UTC Gerrit is offline for project renames. ETA 1645.
  • 2014-07-26 18:28:21 UTC Zuul has been restarted to move it beyond a change it was failing to report on
  • 2014-07-23 22:08:12 UTC zuul is working through a backlog of jobs due to an earlier problem with nodepool
  • 2014-07-23 20:42:47 UTC nodepool is unable to build test nodes so check and gate tests are delayed
  • 2014-07-15 18:23:58 UTC python2.6 jobs are failing due to bug 1342262 "virtualenv>=1.9.1 not found" A fix is out but there are still nodes built on the old stale images
  • 2014-06-28 14:40:16 UTC Gerrit will be offline from 1500-1515 UTC for project renames
  • 2014-06-15 15:30:13 UTC Launchpad is OK - statusbot lost the old channel statuses. They will need to be manually restored
  • 2014-06-15 02:32:57 UTC launchpad openid is down. login to openstack services will fail until launchpad openid is happy again
  • 2014-06-02 14:17:51 UTC setuptools issue was fixed upstream in 3.7.1 and 4.0.1; please recheck on bug 1325514
  • 2014-06-02 08:33:19 UTC setuptools upstream has broken the world. it's a known issue. we're hoping that a solution materializes soon
  • 2014-05-29 20:41:04 UTC Gerrit is back online
  • 2014-05-29 20:22:30 UTC Gerrit is going offline to correct an issue with a recent project rename. ETA 20:45 UTC.
  • 2014-05-28 00:08:31 UTC zuul is using a manually installed "gear" library with the timeout and logging changes
  • 2014-05-27 22:11:41 UTC Zuul is started and processing changes that were in the queue when it was stopped. Changes uploaded or approved since then will need to be re-approved or rechecked.
  • 2014-05-27 21:34:45 UTC Zuul is offline due to an operational issue; ETA 2200 UTC.
  • 2014-05-26 22:31:12 UTC stopping gerrit briefly to rebuild its search index in an attempt to fix post-rename oddities (will update with notices every 10 minutes until completed)
  • 2014-05-23 21:36:49 UTC Gerrit is offline in order to rename some projects. ETA: 22:00.
  • 2014-05-23 20:34:36 UTC Gerrit will be offline for about 20 minutes in order to rename some projects starting at 21:00 UTC.
  • 2014-05-09 16:44:31 UTC New contributors can't complete enrollment due to https://launchpad.net/bugs/1317957 (Gerrit is having trouble reaching the Foundation Member system)
  • 2014-05-07 13:12:58 UTC Zuul is processing changes now; some results were lost. Use "recheck bug 1317089" if needed.
  • 2014-05-07 13:04:11 UTC Zuul is stuck due to earlier networking issues with Gerrit server, work in progress.
  • 2014-05-02 23:27:29 UTC paste.openstack.org is going down for a short database upgrade
  • 2014-05-02 22:00:08 UTC Zuul is being restarted with some dependency upgrades and configuration changes; ETA 2215
  • 2014-05-01 00:06:18 UTC the gate is still fairly backed up, though nodepool is back on track and chipping away at remaining changes. some py3k/pypy node starvation is slowing recovery
  • 2014-04-30 20:26:57 UTC the gate is backed up due to broken nodepool images, fix in progress (eta 22:00 utc)
  • 2014-04-28 19:33:21 UTC Gerrit upgrade to 2.8 complete. See: https://wiki.openstack.org/wiki/GerritUpgrade Some cleanup tasks still ongoing; join #openstack-infra if you have any questions.
  • 2014-04-28 16:38:31 UTC Gerrit is unavailable until further notice for a major upgrade. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-28 15:31:50 UTC Gerrit downtime for upgrade begins in 30 minutes. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-28 14:31:51 UTC Gerrit downtime for upgrade begins in 90 minutes. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-25 20:59:57 UTC Gerrit will be unavailable for a few hours starting at 1600 UTC on Monday April 28th for an upgrade. See https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-25 17:17:55 UTC Gerrit will be unavailable for a few hours starting at 1600 UTC on Monday April 28th for an upgrade. See https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-16 00:00:14 UTC Restarting gerrit really quick to fix replication issue
  • 2014-04-08 01:33:50 UTC All services should be back up
  • 2014-04-08 00:22:30 UTC All of the project infrastructure hosts are being restarted for security updates.
  • 2014-03-25 13:30:44 UTC the issue with gerrit cleared on its own before any corrective action was taken
  • 2014-03-25 13:22:16 UTC the gerrit event stream is currently hung, blocking all testing. troubleshooting is in progress (next update at 14:00 utc)
  • 2014-03-12 12:24:44 UTC gerrit on review.openstack.org is down for maintenance (revised eta to resume is 13:00 utc)
  • 2014-03-12 12:07:18 UTC gerrit on review.openstack.org is down for maintenance (eta to resume is 12:30 utc)
  • 2014-03-12 11:28:08 UTC test/gate jobs are queuing now in preparation for gerrit maintenance at 12:00 utc (eta to resume is 12:30 utc)
  • 2014-02-26 22:25:55 UTC gerrit service on review.openstack.org will be down momentarily for a another brief restart--apologies for the disruption
  • 2014-02-26 22:13:11 UTC gerrit service on review.openstack.org will be down momentarily for a restart to add an additional git server
  • 2014-02-21 17:36:50 UTC Git-related build issues should be resolved. If your job failed with no build output, use "recheck bug 1282876".
  • 2014-02-21 16:34:23 UTC Some builds are failing due to errors in worker images; fix eta 1700 UTC.
  • 2014-02-20 23:41:09 UTC A transient error caused Zuul to report jobs as LOST; if you were affected, leave a comment with "recheck no bug"
  • 2014-02-18 23:33:18 UTC Gerrit login issues should be resolved.
  • 2014-02-13 22:35:01 UTC restarting zuul for a configuration change
  • 2014-02-10 16:21:11 UTC jobs are running for changes again, but there's a bit of a backlog so it will still probably take a few hours for everything to catch up
  • 2014-02-10 15:16:33 UTC the gate is experiencing delays due to nodepool resource issues (fix in progress, eta 16:00 utc)
  • 2014-02-07 20:10:08 UTC Gerrit and Zuul are offline for project renames. ETA 20:30 UTC.
  • 2014-02-07 18:59:03 UTC Zuul is now in queue-only mode preparing for project renames at 20:00 UTC
  • 2014-02-07 17:35:36 UTC Gerrit and Zuul going offline at 20:00 UTC for ~15mins for project renames
  • 2014-02-07 17:34:07 UTC Gerrit and Zuul going offline at 20:00 UTC for ~15mins for project renames
  • 2014-01-29 17:09:18 UTC the gate is merging changes again... issues with tox/virtualenv versions can be rechecked or reverified against bug 1274135
  • 2014-01-29 14:37:42 UTC most tests are failing as a result of new tox and testtools releases (bug 1274135, in progress)
  • 2014-01-29 14:25:35 UTC most tests are failing as a result of new tox and testtools releases--investigation in progress
  • 2014-01-24 21:55:40 UTC Zuul is restarting to pick up a bug fix
  • 2014-01-24 21:39:11 UTC Zuul is ignoring some enqueue events; fix in progress
  • 2014-01-24 16:13:31 UTC restarted gerritbot because it seemed to be on the wrong side of a netsplit
  • 2014-01-23 23:51:14 UTC Zuul is being restarted for an upgrade
  • 2014-01-22 20:51:44 UTC Zuul is about to restart for an upgrade; changes will be re-enqueued
  • 2014-01-17 19:13:32 UTC zuul.openstack.org underwent maintenance today from 16:50 to 19:00 UTC, so any changes approved during that timeframe should be reapproved to be added to the gate. new patchsets uploaded during those two hours should be rechecked (no bug) if test results are desired
  • 2014-01-14 12:29:06 UTC Gate currently blocked due to slave node exhaustion
  • 2014-01-07 16:47:29 UTC unit tests seem to be passing consistently after the upgrade. use bug 1266711 for related rechecks
  • 2014-01-07 14:51:19 UTC working on undoing the accidental libvirt upgrade which is causing nova and keystone unit test failures (ETA 15:30 UTC)
  • 2014-01-06 21:20:09 UTC gracefully stopping jenkins01 now. it has many nodes in offline status and only a handful online, while nodepool thinks it has ~90 available to run jobs
  • 2014-01-06 19:37:28 UTC gracefully stopping jenkins02 now. it has many nodes in offline status and only a handful online, while nodepool thinks it has ~75 available to run jobs
  • 2014-01-06 19:36:12 UTC gating is operating at reduced capacity while we work through a systems problem (ETA 21:00 UTC)
  • 2014-01-03 00:13:32 UTC see: https://etherpad.openstack.org/p/pip1.5Upgrade
  • 2014-01-02 17:07:54 UTC gating is severely hampered while we attempt to sort out the impact of today's pip 1.5/virtualenv 1.11 releases... no ETA for solution yet
  • 2014-01-02 16:58:35 UTC gating is severely hampered while we attempt to sort out the impact of the pip 1.5 release... no ETA for solution yet
  • 2013-12-24 06:11:50 UTC fix for grenade euca/bundle failures is in the gate. changes failing on those issues in the past 7 hours should be rechecked or reverified against bug 1263824
  • 2013-12-24 05:31:47 UTC gating is currently wedged by consistent grenade job failures--proposed fix is being confirmed now--eta 06:00 utc
  • 2013-12-13 17:21:56 UTC restarted gerritbot
  • 2013-12-11 21:35:29 UTC test
  • 2013-12-11 21:34:09 UTC test
  • 2013-12-11 21:20:28 UTC test
  • 2013-12-11 18:03:36 UTC Grenade gate infra issues: use "reverify bug 1259911"
  • 2013-12-06 17:05:12 UTC i'm running statusbot in screen to try to catch why it dies after a while.
  • 2013-12-04 18:34:41 UTC gate failures due to django incompatibility, pip bugs, and node performance problems
  • 2013-12-03 16:56:59 UTC docs jobs are failing due to a full filesystem; fix eta 1750 UTC
  • 2013-11-26 14:25:11 UTC Gate should be unwedged now, thanks for your patience
  • 2013-11-26 11:29:13 UTC Gate wedged - Most Py26 jobs fail currently (https://bugs.launchpad.net/openstack-ci/+bug/1255041)
  • 2013-11-20 22:45:24 UTC Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html
  • 2013-11-06 00:03:44 UTC filesystem resize complete, logs uploading successfully again in the past few minutes--feel free to 'recheck no bug' or 'reverify no bug' if your change failed jobs with an "unstable" result
  • 2013-11-05 23:31:13 UTC Out of disk space on log server, blocking test result uploads--fix in progress, eta 2400 utc
  • 2013-10-13 16:25:59 UTC etherpad migration complete
  • 2013-10-13 16:05:03 UTC etherpad is down for software upgrade and migration
  • 2013-10-11 16:36:06 UTC the gate is moving again for the past half hour or so--thanks for your collective patience while we worked through the issue
  • 2013-10-11 14:14:17 UTC The Infrastructure team is working through some devstack node starvation issues which are currently holding up gating and slowing checks. ETA 1600 UTC
  • 2013-10-11 12:48:07 UTC Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun)
  • 2013-10-05 17:01:06 UTC puppet disabled on nodepool due to manually reverting gearman change
  • 2013-10-05 16:03:13 UTC Gerrit will be down for maintenance from 1600-1630 UTC
  • 2013-10-05 15:34:37 UTC Zuul is shutting down for Gerrit downtime from 1600-1630 UTC
  • 2013-10-02 09:54:09 UTC Jenkins01 is not failing, it's just very slow at the moment... so the gate is not completely stuck.
  • 2013-10-02 09:46:39 UTC One of our Jenkins masters is failing to return results, so the gate is currently stuck.
  • 2013-09-24 15:48:07 UTC changes seem to be making it through the gate once more, and so it should be safe to "recheck bug 1229797" or "reverify bug 1229797" on affected changes as needed
  • 2013-09-24 13:30:07 UTC dependency problems in gating, currently under investigation... more news as it unfolds
  • 2013-08-27 20:35:24 UTC Zuul has been restarted
  • 2013-08-27 20:10:38 UTC zuul is offline because of a pbr-related installation issue
  • 2013-08-24 22:40:08 UTC Zuul and nodepool are running again; rechecks have been issued (but double check your patch in case it was missed)
  • 2013-08-24 22:04:36 UTC Zuul and nodepool are being restarted
  • 2013-08-23 17:53:29 UTC recent UNSTABLE jobs were due to maintenance to expand capacity, which is now complete; recheck or reverify as needed
  • 2013-08-22 18:12:24 UTC stopping gerrit to correct a stackforge project rename error
  • 2013-08-22 17:55:56 UTC restarting gerrit to pick up a configuration change
  • 2013-08-22 15:06:03 UTC Zuul has been restarted; leave 'recheck no bug' or 'reverify no bug' comments to re-enqueue.
  • 2013-08-22 01:38:31 UTC Zuul is running again
  • 2013-08-22 01:02:06 UTC Zuul is offline for troubleshooting
  • 2013-08-21 21:10:59 UTC Restarting zuul, changes should be automatically re-enqueued
  • 2013-08-21 16:32:30 UTC LOST jobs are due to a known bug; use "recheck no bug"
  • 2013-08-19 20:27:37 UTC gate-grenade-devstack-vm is currently failing preventing merges. Proposed fix: https://review.openstack.org/#/c/42720/
  • 2013-08-16 13:53:35 UTC the gate seems to be moving properly now, but some changes which were in limbo earlier are probably going to come back with negative votes; we're rechecking/reverifying those too
  • 2013-08-16 13:34:05 UTC the earlier log server issues seem to have put one of the jenkins servers in a bad state, blocking the gate--working on that, ETA 14:00 UTC
  • 2013-08-16 12:41:10 UTC still rechecking/reverifying false negative results on changes, but the gate is moving again
  • 2013-08-16 12:00:34 UTC log server has a larger filesystem now--rechecking/reverifying jobs, ETA 12:30 UTC
  • 2013-08-16 12:00:22 UTC server has a larger filesystem now--rechecking/reverifying jobs, ETA 12:30 UTC
  • 2013-08-16 11:21:47 UTC the log server has filled up, disrupting job completion--working on it now, ETA 12:30 UTC
  • 2013-08-16 11:07:34 UTC some sort of gating disruption has been identified--looking into it now
  • 2013-07-28 15:30:29 UTC restarted zuul to upgrade
  • 2013-07-28 00:25:57 UTC restarted jenkins to update scp plugin
  • 2013-07-26 14:19:34 UTC Performing maintenance on docs-draft site, unstable docs jobs expected for the next few minutes; use "recheck no bug"
  • 2013-07-20 18:38:03 UTC devstack gate should be back to normal
  • 2013-07-20 17:02:31 UTC devstack-gate jobs are broken due to a setuptools issue; fix in progress.
  • 2013-07-20 01:41:30 UTC replaced ssl certs for jenkins, review, wiki, and etherpad
  • 2013-07-19 23:47:31 UTC Projects affected by the xattr cffi dependency issues should be able to run tests and have them pass. xattr has been fixed and the new version is on our mirror.
  • 2013-07-19 22:23:27 UTC Projects with a dependency on xattr are failing tests due to unresolved xattr dependencies. Fix should be in shortly
  • 2013-07-17 20:33:39 UTC Jenkins is running jobs again, some jobs are marked as UNSTABLE; fix in progress
  • 2013-07-17 18:43:20 UTC Zuul is queueing jobs while Jenkins is restarted for a security update
  • 2013-07-17 18:32:50 UTC Gerrit security updates have been applied
  • 2013-07-17 17:38:19 UTC Gerrit is being restarted to apply a security update
  • 2013-07-16 01:30:52 UTC Zuul is back up and outstanding changes have been re-enqueued in the gate queue.
  • 2013-07-16 00:23:27 UTC Zuul is down for an emergency load-related server upgrade. ETA 01:30 UTC.
  • 2013-07-06 16:29:49 UTC Neutron project rename in progress; see https://wiki.openstack.org/wiki/Network/neutron-renaming
  • 2013-07-06 16:29:32 UTC Gerrit and Zuul are back online, neutron rename still in progress
  • 2013-07-06 16:02:38 UTC Gerrit and Zuul are offline for neutron project rename; ETA 1630 UTC; see https://wiki.openstack.org/wiki/Network/neutron-renaming
  • 2013-06-14 23:28:41 UTC Zuul and Jenkins are back up (but somewhat backlogged). See http://status.openstack.org/zuul/
  • 2013-06-14 20:42:30 UTC Gerrit is back in service. Zuul and Jenkins are offline for further maintenance (ETA 22:00 UTC)
  • 2013-06-14 20:36:49 UTC Gerrit is back in service. Zuul and Jenkins are offline for further maintenance (ETA 22:00)
  • 2013-06-14 20:00:58 UTC Gerrit, Zuul and Jenkins are offline for maintenance (ETA 30 minutes)
  • 2013-06-14 18:29:37 UTC Zuul/Jenkins are gracefully shutting down in preparation for today's 20:00 UTC maintenance
  • 2013-06-11 17:32:14 UTC pbr 0.5.16 has been released and the gate should be back in business
  • 2013-06-11 16:00:10 UTC pbr change broke the gate, a fix is forthcoming
  • 2013-06-06 21:00:45 UTC jenkins log server is fixed; new builds should complete, old logs are being copied over slowly (you may encounter 404 errors following older links to logs.openstack.org until this completes)
  • 2013-06-06 19:38:01 UTC gating is currently broken due to a full log server (ETA 30 minutes)
  • 2013-05-16 20:02:47 UTC Gerrit, Zuul, and Jenkins are back online.
  • 2013-05-16 18:57:28 UTC Gerrit, Zuul, and Jenkins will all be shutting down for reboots at approximately 19:10 UTC.
  • 2013-05-16 18:46:38 UTC wiki.openstack.org and lists.openstack.org are back online
  • 2013-05-16 18:37:52 UTC wiki.openstack.org and lists.openstack.org are being rebooted. downtime should be < 5 min.
  • 2013-05-16 18:36:23 UTC eavesdrop.openstack.org is back online
  • 2013-05-16 18:31:14 UTC eavesdrop.openstack.org is being rebooted. downtime should be less than 5 minutes.
  • 2013-05-15 05:32:26 UTC upgraded gerrit to gerrit-2.4.2-17 to address a security issue: http://gerrit-documentation.googlecode.com/svn/ReleaseNotes/ReleaseNotes-2.5.3.html#_security_fixes
  • 2013-05-14 18:32:07 UTC gating is catching up on queued jobs now and should be back to normal shortly (eta 30 minutes)
  • 2013-05-14 17:55:44 UTC gating is broken for a bit while we replace jenkins slaves (eta 30 minutes)
  • 2013-05-14 17:06:56 UTC gating is broken for a bit while we replace jenkins slaves (eta 30 minutes)
  • 2013-05-04 16:31:22 UTC lists.openstack.org and eavesdrop.openstack.org are back in service
  • 2013-05-04 16:19:45 UTC test
  • 2013-05-04 15:58:36 UTC eavesdrop and lists.openstack.org are offline for server upgrades and moves. ETA 1700 UTC.
  • 2013-05-02 20:20:45 UTC Jenkins is in shutdown mode so that we may perform an upgrade; builds will be delayed but should not be lost.
  • 2013-04-26 18:04:19 UTC We just added AAAA records (IPv6 addresses) to review.openstack.org and jenkins.openstack.org.
  • 2013-04-25 18:25:41 UTC meetbot is back on and confirmed to be working properly again... apologies for the disruption
  • 2013-04-25 17:40:34 UTC meetbot is on the wrong side of a netsplit; infra is working on getting it back
  • 2013-04-08 18:09:34 UTC A review.o.o repo needed to be reseeded for security reasons. To ensure that a force push did not miss anything, a nuke-from-orbit approach was taken instead: Gerrit was stopped, the old bad repo was removed, the new good repo was added, and Gerrit was started again.
  • 2013-04-08 17:50:57 UTC The infra team is restarting Gerrit for git repo maintenance. If Gerrit is not responding please try again in a few minutes.
  • 2013-04-03 01:07:50 UTC https://review.openstack.org/#/c/25939/ should fix the prettytable dependency problem when merged (https://bugs.launchpad.net/nova/+bug/1163631)
  • 2013-04-03 00:48:01 UTC Restarting gerrit to try to correct an error condition in the stackforge/diskimage-builder repo
  • 2013-03-29 23:01:04 UTC Testing alert status
  • 2013-03-29 22:58:24 UTC Testing statusbot
  • 2013-03-28 13:32:02 UTC Everything is okay now.