Difference between revisions of "CinderVictoriaMidCycleSummary"
Revision as of 22:54, 13 August 2020
- 1 Introduction
- 2 Session One: R-16: 24 June 2020
- 3 Session Two: R-9: 12 August 2020
- 3.1 Cinder Project Updates
- 3.2 gate issues
- 3.3 Ceph iSCSI driver
- 3.4 review the Brocade FCZM driver situation
- 3.5 support volume re-image
- 3.6 Revisit specifying availability_zone or volume_type for backup restore
- 3.7 in-flight image encryption effort update
- 3.8 Sizing encrypted volumes (continued)
- 3.9 NFS online extend
- 3.10 os-brick filters
- 3.11 backup drivers
Introduction
At the Victoria (Virtual) PTG we decided to hold two mid-cycle meetings, each two hours long. The first meeting would be before the Cinder Spec Freeze, and the second would be around the New Feature Status Checkpoint. So the mid-cycle meetings will be the week of R-16 and the week of R-9 (so as not to conflict with KubeCon, which is happening at R-8).
Session One: R-16: 24 June 2020
We met in BlueJeans from 1400 to 1600 UTC.
Cinder Project Updates
New os-brick releases for stein (2.8.6) and train (2.10.4) have happened to address Bug #1883654 (fix for OSSN-0086 not working on Python 2.7). There have been some gate problems holding up the cinder releases containing the new os-brick libraries; train (15.3.0) should happen today, and stein (14.2.0) soon.
You can keep track of what's been released by looking at Launchpad:
I'm looking for a volunteer for the position of "release czar" for Cinder. You don't have to be a core contributor, you just need to be a responsible and active member of the Cinder community. Ping me in email or on IRC if you are interested in finding out more.
We agreed to try the experiment of a monthly video meeting; it will be at the regular weekly meeting time on the last Wednesday of each month (and we'll make adjustments as necessary as conflicts arise). So the first one will be 29 July. We'll take a poll about what videoconferencing software to use.
- rosmaita - send out survey about monthly video meeting
Ivan asked for some feedback for his "Backup Backends Configuration" spec, https://review.opendev.org/#/c/712301/
His question was about the data model: should he re-purpose the existing tables for volume_types, since what he needs for backup_types is very similar, or should he add new tables? After some discussion, the team agreed that new tables would be safer and more flexible in case volume_types and backup_types diverge as they undergo development.
- e0ne - will update https://review.opendev.org/#/c/712301/
Remove Brocade FCZM Driver?
The Brocade Fibre Channel Zone Manager driver was declared 'unsupported' in Ussuri and subject to removal in Victoria by https://review.opendev.org/#/c/696857/
The vendor announced no support after Train, and no intention to support Python 3: https://docs.broadcom.com/doc/12397527 (warning: opens a PDF). So we're in a weird situation in Ussuri and Victoria, because the vendor has explicitly declined to support Python 3, and we *only* support Python 3. But we don't want to be down to only one FCZM driver, so we had agreed earlier to keep the driver in-tree but marked 'unsupported' as long as we think it will run under Python 3.
Now we have evidence that initialize-connection will fail under python 3.6 (code expects a list, gets an iterator). We don't know at this point how pervasive a problem that is in the code, and we also don't have third-party CI to validate changes. But it doesn't look great for Cinder to only have one FCZM driver. Plus, we don't know how many people will be impacted by removing it.
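The list-versus-iterator failure is a common Python 2 to 3 porting problem. The sketch below is not taken from the Brocade driver; the function names and the sample WWN are hypothetical and only illustrate the general pattern. In Python 3, `map()` returns a lazy iterator rather than a list, so code that indexes the result breaks.

```python
# Hypothetical illustration; these names do not come from the Brocade driver.

def format_wwns(wwns):
    """Return colon-separated WWNs using map(), as older code often did."""
    return map(lambda w: ":".join(w[i:i + 2] for i in range(0, len(w), 2)), wwns)


def first_wwn_broken(wwns):
    # Worked under Python 2, where map() returned a list.
    # Under Python 3 this raises TypeError because a map object
    # cannot be indexed.
    return format_wwns(wwns)[0]


def first_wwn_fixed(wwns):
    # Materializing the iterator restores list semantics on Python 3.
    return list(format_wwns(wwns))[0]


if __name__ == "__main__":
    sample = ["10008c7cff523b01"]
    try:
        first_wwn_broken(sample)
    except TypeError as exc:
        print("broken:", exc)
    print("fixed:", first_wwn_fixed(sample))
```

Wrapping the call in `list()` is usually the smallest fix, but each call site still has to be found, which is why the extent of the problem in the driver was unclear.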
After some discussion, we decided to do the following:
1. rosmaita will put up a patch removing the Brocade FCZM driver, but we'll mark it as WIP.
2. Gorka will try to find some time to look into it and see if he can fix it. If he can't we'll go ahead and remove it.
3. In the meantime, rosmaita will send a note to the ML explaining the situation and that there's a removal patch and a date; hopefully, impacted people will speak up and let us know.
4. We will review the situation at part 2 of the mid-cycle (which is in roughly 7 weeks).
- rosmaita - put up WIP removal patch - https://review.opendev.org/#/c/738148/
- rosmaita - email to ML - http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015692.html
- geguileo - assess the fixability of the driver
- anyone interested - contact geguileo to find out what you can do to help
Volume List Query Optimization
This effort is being promoted by haixin. He's got a spec and a patch up:
Roughly, the problem is that if a user tries to filter the volume-detail-list for status=error, some volumes in error (namely, volumes that are in error status because the error occurred while they were being "managed") don't show up in the list. The proposal is to make all the volumes in error show up.
It looked like this might require an API change, so we had some discussion of a new microversion and maybe adding some kind of flag to show all errors. And there are several interesting points you can read on the etherpad, https://etherpad.opendev.org/p/cinder-victoria-mid-cycles
I had the action item of summarizing the discussion as a comment on the spec review, so you can go there to see a summary: https://review.opendev.org/#/c/726070/
- rosmaita - summarize the discussion on the spec (done!)
- haixin - needs to explain either on the spec, on an etherpad, on the ML, or in IRC how he plans to implement the change (it looks like there are 3 or so different bugs here)
- anyone interested - it would be good to make sure that when volumes go into these "managing" statuses, user messages are being created to give users more info about what has happened
Support Revert to Any Snapshot
This topic was proposed by xuanyd, who has a spec proposed: https://review.opendev.org/#/c/736111/
Currently, Cinder only supports revert-volume-to-most-recent-snapshot. Some (many?) storage vendors support reverting to any snapshot. Cinder should do this too.
Most of the discussion was about the Cinder project's policy that if a feature can be implemented in a generic way, then it should, and backends that support an optimized version can override the generic implementation to use native support. Since there's a generic (though inefficient) way to support revert-to-any-snapshot, exposing this feature must consist of supplying a generic implementation, and then backends that support it natively can advertise that. The key point is that the community is against adding this to the API with the generic implementation raising a 'not implemented' exception.
Then the discussion turned to how the generic implementation should go. A key issue is what happens to the snapshots that are more recent than the one you just reverted your volume to. That needs to be worked out in the spec, maybe by using RBD as the reference architecture.
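The policy discussed above (a generic implementation that backends may override with native support) can be sketched roughly as follows. This is a toy model under stated assumptions: the class names, the in-memory `snapshots` dictionary, and the returned strings are hypothetical and are not Cinder's driver API.

```python
# Hypothetical sketch; class and method names are illustrative, not Cinder's API.

class BaseDriver:
    """Driver base class carrying the generic fallback."""

    def __init__(self):
        # snapshot name -> volume contents captured at snapshot time
        self.snapshots = {}
        self.data = b""

    def create_snapshot(self, name):
        self.snapshots[name] = self.data

    def revert_to_snapshot(self, name):
        # Generic (inefficient) path: copy the snapshot contents back
        # over the volume.
        self.data = bytes(self.snapshots[name])
        return "generic"


class NativeRevertDriver(BaseDriver):
    """A backend that can revert natively overrides the generic method."""

    def revert_to_snapshot(self, name):
        # Stand-in for a backend-side rollback call.
        self.data = self.snapshots[name]
        return "native"
```

Note that this sketch leaves snapshots newer than the revert target untouched; whether they should be kept or deleted is exactly the question the spec still has to settle.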
The question came up of how many drivers already support this natively. The Inspur MCS driver, the RBD driver, and IBM Storwize driver have been tested; looks like the Dell EMC SC Series driver should also be able to do this.
The topic has already sparked a lively discussion on the spec review, so see https://review.opendev.org/#/c/736111/ for more details.
- all interested reviewers - leave comments on the spec
Victoria Milestone-1 Review
This is a short cycle and M-1 happened last week. The current situation is that we haven't hit *any* of our targets for M-1. So the top priorities for the next 2 weeks are the patches associated with the Victoria Milestone-1 Blueprints, in particular:
- https://review.opendev.org/#/c/663549/ os-brick
- https://review.opendev.org/#/c/700799/ cinder
- https://review.opendev.org/#/c/715762/ tempest test case
- for background:
- White paper (English): https://01.org/blogs/liangfang/2020/intel%C2%AE-optane%E2%84%A2-technology-equipped-storage-solution-accelerate-china-unicom
- White paper (Chinese): https://www.intel.cn/content/www/cn/zh/architecture-and-technology/wocloud-optimized-performance-with-intel-optane-ssd.html?wapkw=%E8%81%94%E9%80%9A%E4%BA%91
- NFS encrypted volume support
- brick gpg encryption support
- new backend driver - Hitachi
- everyone - review the above!
Session Two: R-9: 12 August 2020
We met in BlueJeans from 1400 to 1600 UTC.
recording: <not yet available>
Cinder Project Updates
The next PTG is scheduled for 26-30 October 2020, which is the week after the summit. There is no charge to attend, but the foundation would like you to register: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016424.html
Where we are in the Victoria cycle: https://releases.openstack.org/victoria/schedule.html
- this is week R-9 (ussuri cycle-trailing release deadline; ussuri cinderlib has been released; see https://launchpad.net/cinderlib/+series)
- 1 week to Cinder New Feature Status Checkpoint and Driver Features Declaration
- 3 weeks to final non-client library releases at R-6 (os-brick)
- 4 weeks to final client library release for Victoria at R-5
- 4 weeks to Milestone-3 and feature freeze (R-5)
- 4 weeks to 3rd Party CI Compliance Checkpoint (R-5)
- 4 weeks to Victoria community goal completion (R-5)
- 6 weeks to RC-1 target week (R-3)
- rosmaita - email about os-brick deadline
- rosmaita - email about Cinder New Feature Status Checkpoint
- rosmaita - email about Driver Features Declaration
- rosmaita - email about 3rd Party CI compliance Checkpoint
gate issues
We're currently seeing problems with the cinder-tempest-plugin-lvm-lio-barbican and cinder-grenade-mn-sub-volbak jobs.
The cinder-tempest-plugin-lvm-lio-barbican failure is connected to a new bindep that was added to os-brick but isn't a direct dependency for either cinder or cinderlib. We've tried various approaches to fixing this with mixed results. Luigi finally figured out the devstack-approved way to do this: https://review.opendev.org/#/c/745838/ This looks like it has fixed the zuul job, so once the QA team approves it, we should be back in business.
Melanie Witt figured out that the cinder-grenade-mn-sub-volbak failures are caused by an update to msgpack that breaks Keystone token handling: http://lists.openstack.org/pipermail/openstack-discuss/2020-August/016446.html Once her patch is merged, cinder-grenade-mn-sub-volbak will hopefully be clear for us.
- tosky - patch to os-brick bindep.txt adding a comment about updating devstack when a new bindep is added
Ceph iSCSI driver
The tl;dr is that we'll work on getting this reviewed and merged early in Wallaby.
There are a bunch of moving parts: the driver, the CI jobs, brick changes, and the ceph project. Walt reported that things are mostly coming together.
- the driver seems to be working, but it would be good to have reviews about the code structure -- https://review.opendev.org/#/c/662829/
- the ceph iscsi CI job is timing out, but it's not clear why or where -- could use some help looking it over: https://review.opendev.org/#/c/667108/
- need eyes on this devstack-plugin-ceph patch -- https://review.opendev.org/#/c/668667/
- reminder: the cinder core team has approval privileges for devstack-plugin-ceph
The driver depends on rbd-iscsi-client. Some current Cinder drivers have clients whose code lives in the cinder tree along with the driver code. Walt's preference is to keep this one separate because the rbd-target-api may change; a separate client lets us update the rbd-iscsi-client internals at will to keep up with rbd-target-api changes while the ceph-iscsi driver continues to work. The consensus was that this makes sense and we should keep the rbd-iscsi-client separate. We'll pull it in as a cinder project using the 'independent' release model, which makes sense because its changes are tied to the Ceph project, not OpenStack.
- cinder team - review the above
- rosmaita - get the paperwork started to make rbd-iscsi-client a cinder project deliverable
review the Brocade FCZM driver situation
The Brocade FCZM driver was not working under Python 3. (Brocade decided not to support anything beyond Python 2.7.) Gorka took the initiative to get his hands on a Brocade FC switch so he could test the FCZM driver out, and he's got patches up to fix it to run in Python 3: https://review.opendev.org/#/q/project:openstack/cinder+branch:master+topic:brocade
The driver was marked 'unsupported' in Ussuri and subject to removal in Victoria. We want to backport Gorka's patches to Ussuri (which is Python-3 only) and Train (because that's the release where a lot of people were making the transition to running in Python 3 even though 2.7 was still supported there).
We discussed what to do about the driver in Victoria. We adjusted the 'unsupported' driver removal policy about a year ago so that we don't immediately remove unsupported drivers at the earliest opportunity in order to give driver maintainers more time to get their third-party CI working (which has been the biggest problem). The Brocade situation is a bit different because the vendor has announced no interest in supporting the FCZM driver past Train.
Gorka proposed that he could run CI tests with Victoria RC-1 to verify the driver. We can make the situation clear in documentation and release notes. Historically, it's been a stable driver (except for the Python 3 business), so this is at least reasonable. We can revisit the situation at the Wallaby mid-cycle.
As a side note, the team announced the impending removal of the driver at the end of June: http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015692.html We were hoping to get some feedback from users about the desirability of maintaining this thing, but didn't hear back from anyone.
- rosmaita - put up a patch for docs and a release note as outlined above
- geguileo - run CI for the Brocade FCZM with cinder RC-1
- rosmaita - reply to ML posting announcing our decision
support volume re-image
We had a quick discussion of rambo-li's proposal to implement the volume re-image feature, which had been approved in Stein, re-targeted to Train, but never implemented.
- the cinder spec needs to be re-proposed for Wallaby
- include the proposal to use the Nova external events API to notify the result of the operation
- will need to include cinder-tempest-plugin tests to make sure everything works as expected
- revise the volume statuses for which a re-image will be attempted
Eric mentioned that the code will probably need explicit testing for NFS; we can't assume it will work the same way as with other drivers.
- rosmaita - get ^^ onto the current patch
- rambo-li - revise the spec
Revisit specifying availability_zone or volume_type for backup restore
Alan Bishop brought up an unimplemented Newton spec that would allow specifying an availability zone and/or volume type for the backup restore API, mainly to let people know he's interested in working on it and to make sure the community still thinks it's a good idea. He pointed out some recent Launchpad bugs that indicate there's user interest in this topic. The consensus was that this is still a feature worth implementing and Alan is just the person to do it!
- rosmaita - remove current assignee from spec and re-target it to Wallaby
in-flight image encryption effort update
Luzi brought us up to date on what's been going on with the in-flight encryption effort. The Barbican Secret Consumer API, which is a key part of the scheme, is nearly complete. With the Victoria os-brick release only 3 weeks away, we agreed that this is looking like a Wallaby feature at this point.
The WIP os-brick patch is https://review.opendev.org/#/c/709432
- rosmaita - re-target spec for Wallaby
Sizing encrypted volumes (continued)
The issue (roughly): an encrypted volume must have a header, which takes up some space; when you re-type a "full" volume from an unencrypted volume type to an encrypted volume type, there's not enough room for the header and so the retype fails.
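The arithmetic behind the failure can be sketched as follows. This is only an illustration: the 16 MiB header size is an assumption (the real size depends on the encryption format and its options), and the helper names are hypothetical rather than part of Cinder.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumption for illustration only: the real header size depends on the
# encryption format and its options.
ASSUMED_HEADER_BYTES = 16 * MIB


def retype_fits(volume_size_gib, used_bytes, header_bytes=ASSUMED_HEADER_BYTES):
    """True if the data and the header fit in the existing volume size."""
    return used_bytes + header_bytes <= volume_size_gib * GIB


def min_encrypted_size_gib(used_bytes, header_bytes=ASSUMED_HEADER_BYTES):
    """Smallest whole-GiB volume that holds the data plus the header."""
    total = used_bytes + header_bytes
    return -(-total // GIB)  # integer ceiling division
```

Because volume sizes are whole gibibytes, a completely full volume needs a full extra gibibyte after the retype even though the header itself is small, which is part of why the displayed size and the actual size may need to be treated separately.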
Sofia reported on her efforts to implement a scheme that was worked out at the cinder weekly meetings, namely, to allow a new size to be specified on retype (or an "allow-expansion" flag or something). The problem is that drivers optimize migration in different ways, and since we previously didn't have a new size parameter, there's no easy way to get this info into the drivers. The consensus was that Sofia will write up a spec to change the driver API to enable resize on migration and suggest how/whether to handle a displayed size of a volume vs. the actual size of the volume. She'll also put together an etherpad outlining what this would look like.
- enriquetaso - spec and etherpad as described above
NFS online extend
Lucio put together a dedicated etherpad with the discussion points, so see that for full details: https://etherpad.opendev.org/p/fix-nfs-online-extend
The plan is to have Nova do the online extend. Roughly, the workflow is: Cinder is told to extend to size N, updates the size of the volume in the DB, and asks Nova to perform the operation asynchronously. Cinder then polls the Nova server-action API to monitor the status of the operation. If the extend fails, Cinder needs to restore the original size in the DB. If the cinder service goes down while polling, the volume could be left with the incorrect size.
Gorka suggested that we could do this:
- Store the real current size in the volume's admin metadata
- Leave the volume status as extending
- Change the volume back to in-use when Nova completes the operation successfully
- Revert if nova fails
- Add cleanup on service start that checks Nova's action for each volume in extending status that has the admin metadata content, and either changes the status to in-use if the operation completed successfully or reverts it if it failed
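The steps in Gorka's suggestion can be sketched as a small state model. Everything here is hypothetical: the `Volume` class, the `previous_size` metadata key, and the result strings stand in for the real Cinder objects and the Nova server-action responses, and returning the volume to in-use after a failed extend is an assumption of this sketch.

```python
# Hypothetical sketch; Volume, the metadata key, and the result strings are
# illustrative and are not Cinder or Nova APIs.
from dataclasses import dataclass, field

PREVIOUS_SIZE_KEY = "previous_size"  # assumed admin-metadata key


@dataclass
class Volume:
    size: int
    status: str = "in-use"
    admin_metadata: dict = field(default_factory=dict)


def start_extend(volume, new_size):
    """Record the real current size, then mark the volume as extending."""
    volume.admin_metadata[PREVIOUS_SIZE_KEY] = volume.size
    volume.size = new_size
    volume.status = "extending"


def finish_extend(volume, nova_result):
    """Keep the new size on success; restore the recorded size on failure."""
    previous = volume.admin_metadata.pop(PREVIOUS_SIZE_KEY)
    if nova_result != "success":
        volume.size = previous
    # Assumption: the volume goes back to in-use in both cases.
    volume.status = "in-use"


def cleanup_on_start(volumes, lookup_nova_result):
    """On service start, resolve volumes left in 'extending' by a restart."""
    for volume in volumes:
        if volume.status == "extending" and PREVIOUS_SIZE_KEY in volume.admin_metadata:
            result = lookup_nova_result(volume)
            # Leave the volume alone if Nova has not finished yet.
            if result in ("success", "error"):
                finish_extend(volume, result)
```

Keeping the original size in the admin metadata is what makes the cleanup possible: after a restart, the service can tell which volumes were mid-extend and what size to restore.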
There had been some resistance to making this operation dependent on polling an external API, but the consensus seemed to be that it's the best we can do for this case.
Sofia has been working on cinder-tempest-plugin tests to test online extend for regular and encrypted volumes, so we'll be able to use them with the NFS backend to verify that it works. Lee Yarwood is currently working on a bug related to encrypted NFS volumes: https://bugs.launchpad.net/cinder/+bug/1888680
The question had come up earlier about making sure online extend wasn't done at the same time as snapshot creation; Lucio put up a patch to lock the volume during snapshot creation.
An open question is whether this operation can safely be done when multiattach is enabled; Lucio is going to look into that.
A final point was that the Windows SMB driver can apparently do what's outlined here (i.e., the extend happens on the Nova side). Could be worth looking into how they do it.
- lseki - ping lyarwood to see status of LP Bug #1888680
- lseki - PoC if Nova can extend a multiattached NetApp NFS volume
- lseki - take a look into Windows SMB driver (maybe talk to Lucian Petruț at Cloudbase)
os-brick filters
Rajat brought up a question about the os-brick rootwrap filters defined in the file etc/os-brick/rootwrap.d/os-brick.filters in the os-brick code repository: do we need them, given that nova, cinder, and glance_store define these filters in their own files?
Turns out that the reason why the os-brick files aren't actually used is that there is (or was) a feeling in the wider openstack community that libraries should not ship configuration files. There's an unapproved devstack patch from 2015 where you can follow the discussion: https://review.openstack.org/#/c/207677/
So what it comes down to is that whoever uses os-brick either needs to be root (as is suggested for using the python-brick-cinderclient-ext), in which case the filters aren't needed, or has to configure the filters themselves (in which case, how are they going to know which filters are needed?). The consensus is that we should leave these files in place for now as documentation of the filters that library consumers need to add to their rootwrap config in order to use os-brick.
- everyone - it would be nice if there were a more thorough solution for this; maybe someone can think of something
backup drivers
We ran out of time, so this topic will be postponed to the weekly Cinder meetings. Two issues:
- should we require third party CI for the backup drivers (we currently don't)?
- what about the IBM TSM backup driver, which Ivan discovered was broken recently? (I have an email out to someone at IBM trying to get some info.)
- rosmaita - follow up on the above