Infrastructure Status

  • 2014-09-13 14:54:19 UTC rebooted puppetmaster.openstack.org due to out-of-memory condition
  • 2014-08-30 16:08:43 UTC Gerrit is offline for project renaming maintenance, ETA 1630
  • 2014-08-25 17:12:51 UTC restarted gerritbot
  • 2014-08-16 16:30:38 UTC Gerrit is offline for project renames. ETA 1645.
  • 2014-07-26 18:28:21 UTC Zuul has been restarted to move it beyond a change it was failing to report on
  • 2014-07-23 22:08:12 UTC zuul is working through a backlog of jobs due to an earlier problem with nodepool
  • 2014-07-23 20:42:47 UTC nodepool is unable to build test nodes so check and gate tests are delayed
  • 2014-07-15 18:23:58 UTC python2.6 jobs are failing due to bug 1342262 "virtualenv>=1.9.1 not found". A fix is out, but there are still nodes built on the old stale images
  • 2014-06-28 14:40:16 UTC Gerrit will be offline from 1500-1515 UTC for project renames
  • 2014-06-15 15:30:13 UTC Launchpad is OK - statusbot lost the old channel statuses. They will need to be manually restored
  • 2014-06-15 02:32:57 UTC launchpad openid is down. login to openstack services will fail until launchpad openid is happy again
  • 2014-06-02 14:17:51 UTC setuptools issue was fixed upstream in 3.7.1 and 4.0.1; please recheck on bug 1325514
  • 2014-06-02 08:33:19 UTC setuptools upstream has broken the world. it's a known issue. we're hoping that a solution materializes soon
  • 2014-05-29 20:41:04 UTC Gerrit is back online
  • 2014-05-29 20:22:30 UTC Gerrit is going offline to correct an issue with a recent project rename. ETA 20:45 UTC.
  • 2014-05-28 00:08:31 UTC zuul is using a manually installed "gear" library with the timeout and logging changes
  • 2014-05-27 22:11:41 UTC Zuul is started and processing changes that were in the queue when it was stopped. Changes uploaded or approved since then will need to be re-approved or rechecked.
  • 2014-05-27 21:34:45 UTC Zuul is offline due to an operational issue; ETA 2200 UTC.
  • 2014-05-26 22:31:12 UTC stopping gerrit briefly to rebuild its search index in an attempt to fix post-rename oddities (will update with notices every 10 minutes until completed)
  • 2014-05-23 21:36:49 UTC Gerrit is offline in order to rename some projects. ETA: 22:00.
  • 2014-05-23 20:34:36 UTC Gerrit will be offline for about 20 minutes in order to rename some projects starting at 21:00 UTC.
  • 2014-05-09 16:44:31 UTC New contributors can't complete enrollment due to https://launchpad.net/bugs/1317957 (Gerrit is having trouble reaching the Foundation Member system)
  • 2014-05-07 13:12:58 UTC Zuul is processing changes now; some results were lost. Use "recheck bug 1317089" if needed.
  • 2014-05-07 13:04:11 UTC Zuul is stuck due to earlier networking issues with Gerrit server, work in progress.
  • 2014-05-02 23:27:29 UTC paste.openstack.org is going down for a short database upgrade
  • 2014-05-02 22:00:08 UTC Zuul is being restarted with some dependency upgrades and configuration changes; ETA 2215
  • 2014-05-01 00:06:18 UTC the gate is still fairly backed up, though nodepool is back on track and chipping away at remaining changes. some py3k/pypy node starvation is slowing recovery
  • 2014-04-30 20:26:57 UTC the gate is backed up due to broken nodepool images, fix in progress (eta 22:00 utc)
  • 2014-04-28 19:33:21 UTC Gerrit upgrade to 2.8 complete. See: https://wiki.openstack.org/wiki/GerritUpgrade Some cleanup tasks still ongoing; join #openstack-infra if you have any questions.
  • 2014-04-28 16:38:31 UTC Gerrit is unavailable until further notice for a major upgrade. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-28 15:31:50 UTC Gerrit downtime for upgrade begins in 30 minutes. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-28 14:31:51 UTC Gerrit downtime for upgrade begins in 90 minutes. See: https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-25 20:59:57 UTC Gerrit will be unavailable for a few hours starting at 1600 UTC on Monday April 28th for an upgrade. See https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-25 17:17:55 UTC Gerrit will be unavailable for a few hours starting at 1600 UTC on Monday April 28th for an upgrade. See https://wiki.openstack.org/wiki/GerritUpgrade
  • 2014-04-16 00:00:14 UTC Restarting gerrit really quickly to fix a replication issue
  • 2014-04-08 01:33:50 UTC All services should be back up
  • 2014-04-08 00:22:30 UTC All of the project infrastructure hosts are being restarted for security updates.
  • 2014-03-25 13:30:44 UTC the issue with gerrit cleared on its own before any corrective action was taken
  • 2014-03-25 13:22:16 UTC the gerrit event stream is currently hung, blocking all testing. troubleshooting is in progress (next update at 14:00 utc)
  • 2014-03-12 12:24:44 UTC gerrit on review.openstack.org is down for maintenance (revised eta to resume is 13:00 utc)
  • 2014-03-12 12:07:18 UTC gerrit on review.openstack.org is down for maintenance (eta to resume is 12:30 utc)
  • 2014-03-12 11:28:08 UTC test/gate jobs are queuing now in preparation for gerrit maintenance at 12:00 utc (eta to resume is 12:30 utc)
  • 2014-02-26 22:25:55 UTC gerrit service on review.openstack.org will be down momentarily for another brief restart--apologies for the disruption
  • 2014-02-26 22:13:11 UTC gerrit service on review.openstack.org will be down momentarily for a restart to add an additional git server
  • 2014-02-21 17:36:50 UTC Git-related build issues should be resolved. If your job failed with no build output, use "recheck bug 1282876".
  • 2014-02-21 16:34:23 UTC Some builds are failing due to errors in worker images; fix eta 1700 UTC.
  • 2014-02-20 23:41:09 UTC A transient error caused Zuul to report jobs as LOST; if you were affected, leave a comment with "recheck no bug"
  • 2014-02-18 23:33:18 UTC Gerrit login issues should be resolved.
  • 2014-02-13 22:35:01 UTC restarting zuul for a configuration change
  • 2014-02-10 16:21:11 UTC jobs are running for changes again, but there's a bit of a backlog so it will still probably take a few hours for everything to catch up
  • 2014-02-10 15:16:33 UTC the gate is experiencing delays due to nodepool resource issues (fix in progress, eta 16:00 utc)
  • 2014-02-07 20:10:08 UTC Gerrit and Zuul are offline for project renames. ETA 20:30 UTC.
  • 2014-02-07 18:59:03 UTC Zuul is now in queue-only mode preparing for project renames at 20:00 UTC
  • 2014-02-07 17:35:36 UTC Gerrit and Zuul going offline at 20:00 UTC for ~15mins for project renames
  • 2014-02-07 17:34:07 UTC Gerrit and Zuul going offline at 20:00 UTC for ~15mins for project renames
  • 2014-01-29 17:09:18 UTC the gate is merging changes again... issues with tox/virtualenv versions can be rechecked or reverified against bug 1274135
  • 2014-01-29 14:37:42 UTC most tests are failing as a result of new tox and testtools releases (bug 1274135, in progress)
  • 2014-01-29 14:25:35 UTC most tests are failing as a result of new tox and testtools releases--investigation in progress
  • 2014-01-24 21:55:40 UTC Zuul is restarting to pick up a bug fix
  • 2014-01-24 21:39:11 UTC Zuul is ignoring some enqueue events; fix in progress
  • 2014-01-24 16:13:31 UTC restarted gerritbot because it seemed to be on the wrong side of a netsplit
  • 2014-01-23 23:51:14 UTC Zuul is being restarted for an upgrade
  • 2014-01-22 20:51:44 UTC Zuul is about to restart for an upgrade; changes will be re-enqueued
  • 2014-01-17 19:13:32 UTC zuul.openstack.org underwent maintenance today from 16:50 to 19:00 UTC, so any changes approved during that timeframe should be reapproved in order to be added to the gate. New patchsets uploaded during those two hours should be rechecked (no bug) if test results are desired
  • 2014-01-14 12:29:06 UTC Gate currently blocked due to slave node exhaustion
  • 2014-01-07 16:47:29 UTC unit tests seem to be passing consistently after the upgrade. use bug 1266711 for related rechecks
  • 2014-01-07 14:51:19 UTC working on undoing the accidental libvirt upgrade which is causing nova and keystone unit test failures (ETA 15:30 UTC)
  • 2014-01-06 21:20:09 UTC gracefully stopping jenkins01 now. it has many nodes in offline status and only a handful online, while nodepool thinks it has ~90 available to run jobs
  • 2014-01-06 19:37:28 UTC gracefully stopping jenkins02 now. it has many nodes in offline status and only a handful online, while nodepool thinks it has ~75 available to run jobs
  • 2014-01-06 19:36:12 UTC gating is operating at reduced capacity while we work through a systems problem (ETA 21:00 UTC)
  • 2014-01-03 00:13:32 UTC see: https://etherpad.openstack.org/p/pip1.5Upgrade
  • 2014-01-02 17:07:54 UTC gating is severely hampered while we attempt to sort out the impact of today's pip 1.5/virtualenv 1.11 releases... no ETA for solution yet
  • 2014-01-02 16:58:35 UTC gating is severely hampered while we attempt to sort out the impact of the pip 1.5 release... no ETA for solution yet
  • 2013-12-24 06:11:50 UTC fix for grenade euca/bundle failures is in the gate. changes failing on those issues in the past 7 hours should be rechecked or reverified against bug 1263824
  • 2013-12-24 05:31:47 UTC gating is currently wedged by consistent grenade job failures--proposed fix is being confirmed now--eta 06:00 utc
  • 2013-12-13 17:21:56 UTC restarted gerritbot
  • 2013-12-11 21:35:29 UTC test
  • 2013-12-11 21:34:09 UTC test
  • 2013-12-11 21:20:28 UTC test
  • 2013-12-11 18:03:36 UTC Grenade gate infra issues: use "reverify bug 1259911"
  • 2013-12-06 17:05:12 UTC I'm running statusbot in screen to try to catch why it dies after a while.
  • 2013-12-04 18:34:41 UTC gate failures due to django incompatibility, pip bugs, and node performance problems
  • 2013-12-03 16:56:59 UTC docs jobs are failing due to a full filesystem; fix eta 1750 UTC
  • 2013-11-26 14:25:11 UTC Gate should be unwedged now, thanks for your patience
  • 2013-11-26 11:29:13 UTC Gate wedged - Most Py26 jobs fail currently (https://bugs.launchpad.net/openstack-ci/+bug/1255041)
  • 2013-11-20 22:45:24 UTC Please refrain from approving changes that don't fix gate-blocking issues -- http://lists.openstack.org/pipermail/openstack-dev/2013-November/019941.html
  • 2013-11-06 00:03:44 UTC filesystem resize complete, logs uploading successfully again in the past few minutes--feel free to 'recheck no bug' or 'reverify no bug' if your change failed jobs with an "unstable" result
  • 2013-11-05 23:31:13 UTC Out of disk space on log server, blocking test result uploads--fix in progress, eta 2400 utc
  • 2013-10-13 16:25:59 UTC etherpad migration complete
  • 2013-10-13 16:05:03 UTC etherpad is down for software upgrade and migration
  • 2013-10-11 16:36:06 UTC the gate has been moving again for the past half hour or so--thanks for your collective patience while we worked through the issue
  • 2013-10-11 14:14:17 UTC The Infrastructure team is working through some devstack node starvation issues which are currently holding up gating and slowing checks. ETA 1600 UTC
  • 2013-10-11 12:48:07 UTC Gate is currently stuck (probably due to networking issues preventing new test nodes from being spun up)
  • 2013-10-05 17:01:06 UTC puppet disabled on nodepool due to manually reverting gearman change
  • 2013-10-05 16:03:13 UTC Gerrit will be down for maintenance from 1600-1630 UTC
  • 2013-10-05 15:34:37 UTC Zuul is shutting down for Gerrit downtime from 1600-1630 UTC
  • 2013-10-02 09:54:09 UTC Jenkins01 is not failing, it's just very slow at the moment... so the gate is not completely stuck.
  • 2013-10-02 09:46:39 UTC One of our Jenkins masters is failing to return results, so the gate is currently stuck.
  • 2013-09-24 15:48:07 UTC changes seem to be making it through the gate once more, and so it should be safe to "recheck bug 1229797" or "reverify bug 1229797" on affected changes as needed
  • 2013-09-24 13:30:07 UTC dependency problems in gating, currently under investigation... more news as it unfolds
  • 2013-08-27 20:35:24 UTC Zuul has been restarted
  • 2013-08-27 20:10:38 UTC zuul is offline because of a pbr-related installation issue
  • 2013-08-24 22:40:08 UTC Zuul and nodepool are running again; rechecks have been issued (but double check your patch in case it was missed)
  • 2013-08-24 22:04:36 UTC Zuul and nodepool are being restarted
  • 2013-08-23 17:53:29 UTC recent UNSTABLE jobs were due to maintenance to expand capacity, which is now complete; recheck or reverify as needed
  • 2013-08-22 18:12:24 UTC stopping gerrit to correct a stackforge project rename error
  • 2013-08-22 17:55:56 UTC restarting gerrit to pick up a configuration change
  • 2013-08-22 15:06:03 UTC Zuul has been restarted; leave 'recheck no bug' or 'reverify no bug' comments to re-enqueue.
  • 2013-08-22 01:38:31 UTC Zuul is running again
  • 2013-08-22 01:02:06 UTC Zuul is offline for troubleshooting
  • 2013-08-21 21:10:59 UTC Restarting zuul, changes should be automatically re-enqueued
  • 2013-08-21 16:32:30 UTC LOST jobs are due to a known bug; use "recheck no bug"
  • 2013-08-19 20:27:37 UTC gate-grenade-devstack-vm is currently failing preventing merges. Proposed fix: https://review.openstack.org/#/c/42720/
  • 2013-08-16 13:53:35 UTC the gate seems to be properly moving now, but some changes which were in limbo earlier are probably going to come back with negative votes now. rechecking/reverifying those too
  • 2013-08-16 13:34:05 UTC the earlier log server issues seem to have put one of the jenkins servers in a bad state, blocking the gate--working on that, ETA 14:00 UTC
  • 2013-08-16 12:41:10 UTC still rechecking/reverifying false negative results on changes, but the gate is moving again
  • 2013-08-16 12:00:34 UTC log server has a larger filesystem now--rechecking/reverifying jobs, ETA 12:30 UTC
  • 2013-08-16 12:00:22 UTC server has a larger filesystem now--rechecking/reverifying jobs, ETA 12:30 UTC
  • 2013-08-16 11:21:47 UTC the log server has filled up, disrupting job completion--working on it now, ETA 12:30 UTC
  • 2013-08-16 11:07:34 UTC some sort of gating disruption has been identified--looking into it now
  • 2013-07-28 15:30:29 UTC restarted zuul to upgrade
  • 2013-07-28 00:25:57 UTC restarted jenkins to update scp plugin
  • 2013-07-26 14:19:34 UTC Performing maintenance on docs-draft site, unstable docs jobs expected for the next few minutes; use "recheck no bug"
  • 2013-07-20 18:38:03 UTC devstack gate should be back to normal
  • 2013-07-20 17:02:31 UTC devstack-gate jobs broken due to setuptools brokenness; fix in progress.
  • 2013-07-20 01:41:30 UTC replaced ssl certs for jenkins, review, wiki, and etherpad
  • 2013-07-19 23:47:31 UTC Projects affected by the xattr cffi dependency issues should be able to run tests and have them pass. xattr has been fixed and the new version is on our mirror.
  • 2013-07-19 22:23:27 UTC Projects with a dependency on xattr are failing tests due to unresolved xattr dependencies. Fix should be in shortly
  • 2013-07-17 20:33:39 UTC Jenkins is running jobs again, some jobs are marked as UNSTABLE; fix in progress
  • 2013-07-17 18:43:20 UTC Zuul is queueing jobs while Jenkins is restarted for a security update
  • 2013-07-17 18:32:50 UTC Gerrit security updates have been applied
  • 2013-07-17 17:38:19 UTC Gerrit is being restarted to apply a security update
  • 2013-07-16 01:30:52 UTC Zuul is back up and outstanding changes have been re-enqueued in the gate queue.
  • 2013-07-16 00:23:27 UTC Zuul is down for an emergency load-related server upgrade. ETA 01:30 UTC.
  • 2013-07-06 16:29:49 UTC Neutron project rename in progress; see https://wiki.openstack.org/wiki/Network/neutron-renaming
  • 2013-07-06 16:29:32 UTC Gerrit and Zuul are back online, neutron rename still in progress
  • 2013-07-06 16:02:38 UTC Gerrit and Zuul are offline for neutron project rename; ETA 1630 UTC; see https://wiki.openstack.org/wiki/Network/neutron-renaming
  • 2013-06-14 23:28:41 UTC Zuul and Jenkins are back up (but somewhat backlogged). See http://status.openstack.org/zuul/
  • 2013-06-14 20:42:30 UTC Gerrit is back in service. Zuul and Jenkins are offline for further maintenance (ETA 22:00 UTC)
  • 2013-06-14 20:36:49 UTC Gerrit is back in service. Zuul and Jenkins are offline for further maintenance (ETA 22:00)
  • 2013-06-14 20:00:58 UTC Gerrit, Zuul and Jenkins are offline for maintenance (ETA 30 minutes)
  • 2013-06-14 18:29:37 UTC Zuul/Jenkins are gracefully shutting down in preparation for today's 20:00 UTC maintenance
  • 2013-06-11 17:32:14 UTC pbr 0.5.16 has been released and the gate should be back in business
  • 2013-06-11 16:00:10 UTC pbr change broke the gate, a fix is forthcoming
  • 2013-06-06 21:00:45 UTC jenkins log server is fixed; new builds should complete, old logs are being copied over slowly (you may encounter 404 errors following older links to logs.openstack.org until this completes)
  • 2013-06-06 19:38:01 UTC gating is currently broken due to a full log server (ETA 30 minutes)
  • 2013-05-16 20:02:47 UTC Gerrit, Zuul, and Jenkins are back online.
  • 2013-05-16 18:57:28 UTC Gerrit, Zuul, and Jenkins will all be shutting down for reboots at approximately 19:10 UTC.
  • 2013-05-16 18:46:38 UTC wiki.openstack.org and lists.openstack.org are back online
  • 2013-05-16 18:37:52 UTC wiki.openstack.org and lists.openstack.org are being rebooted. downtime should be < 5 min.
  • 2013-05-16 18:36:23 UTC eavesdrop.openstack.org is back online
  • 2013-05-16 18:31:14 UTC eavesdrop.openstack.org is being rebooted. downtime should be less than 5 minutes.
  • 2013-05-15 05:32:26 UTC upgraded gerrit to gerrit-2.4.2-17 to address a security issue: http://gerrit-documentation.googlecode.com/svn/ReleaseNotes/ReleaseNotes-2.5.3.html#_security_fixes
  • 2013-05-14 18:32:07 UTC gating is catching up queued jobs now and should be back to normal shortly (eta 30 minutes)
  • 2013-05-14 17:55:44 UTC gating is broken for a bit while we replace jenkins slaves (eta 30 minutes)
  • 2013-05-14 17:06:56 UTC gating is broken for a bit while we replace jenkins slaves (eta 30 minutes)
  • 2013-05-04 16:31:22 UTC lists.openstack.org and eavesdrop.openstack.org are back in service
  • 2013-05-04 16:19:45 UTC test
  • 2013-05-04 15:58:36 UTC eavesdrop and lists.openstack.org are offline for server upgrades and moves. ETA 1700 UTC.
  • 2013-05-02 20:20:45 UTC Jenkins is in shutdown mode so that we may perform an upgrade; builds will be delayed but should not be lost.
  • 2013-04-26 18:04:19 UTC We just added AAAA records (IPv6 addresses) to review.openstack.org and jenkins.openstack.org.
  • 2013-04-25 18:25:41 UTC meetbot is back on and confirmed to be working properly again... apologies for the disruption
  • 2013-04-25 17:40:34 UTC meetbot is on the wrong side of a netsplit; infra is working on getting it back
  • 2013-04-08 18:09:34 UTC A review.o.o repo needed to be reseeded for security reasons. To ensure that a force push did not miss anything, a nuke-from-orbit approach was taken instead. Gerrit was stopped, the old bad repo was removed, the new good repo was added, and Gerrit was started again.
  • 2013-04-08 17:50:57 UTC The infra team is restarting Gerrit for git repo maintenance. If Gerrit is not responding please try again in a few minutes.
  • 2013-04-03 01:07:50 UTC https://review.openstack.org/#/c/25939/ should fix the prettytable dependency problem when merged (https://bugs.launchpad.net/nova/+bug/1163631)
  • 2013-04-03 00:48:01 UTC Restarting gerrit to try to correct an error condition in the stackforge/diskimage-builder repo
  • 2013-03-29 23:01:04 UTC Testing alert status
  • 2013-03-29 22:58:24 UTC Testing statusbot
  • 2013-03-28 13:32:02 UTC Everything is okay now.