CinderWallabyMidCycleSummary

Introduction
At the Wallaby Virtual PTG we decided to hold two mid-cycle meetings, each two hours long. The first meeting would be before the Cinder Spec Freeze, and the second would be around the New Feature Status Checkpoint. So the mid-cycle meetings will be the week of R-18 and the week of R-9.

What follows is a brief summary of what was discussed. Full details are on the etherpad: https://etherpad.opendev.org/p/cinder-wallaby-mid-cycles

Session One: R-18: 9 December 2020
We met in BlueJeans from 1400 to 1600 UTC.
etherpad: https://etherpad.openstack.org/p/cinder-wallaby-mid-cycles
recording: https://www.youtube.com/watch?v=rKmlpBTIDGA

upcoming deadlines

 * 18 December - spec freeze
 * 21 January - Wallaby Milestone-2 and new driver merge deadline
 * 1 March - os-brick wallaby release
 * 8 March - python-cinderclient wallaby release

stable releases
Our current release cadence on the stable branches is (roughly) every two months. We've recently released:
 * os-brick: 4.1.0 (wallaby), 3.0.4 (ussuri)
 * cinder: 17.0.1 (victoria), 16.2.1 (ussuri), 15.4.1 (train)

We managed to slip a critical fix past the Train gate, but three or four other patches that had been approved and targeted for 15.4.1 are stuck in the gate. Once we get the Train gate sorted out, we'll most likely do an extra release from stable/train out of cadence.

end of year meetings
The last meeting of the year will be 23 December. Since it is also the last meeting of the month, it will be held in videoconference. We won't meet on 30 December.

requirements/lower-constraints update
The lower-constraints job was broken recently by a pip update that included a new dependency resolver. The new resolver is stricter than the old one, and it detected that cinder's lower constraints for some libraries were lower than the minimums cinder's own dependencies allow. We started to change the values in lower-constraints.txt one by one, but it quickly became obvious that a larger change was needed, so we merged this patch yesterday: https://review.opendev.org/c/openstack/cinder/+/766085

That patch raised the minimums in requirements.txt to the versions in a pip freeze of a recent py38 testenv and raised the values in lower-constraints.txt accordingly. This included some big jumps in major versions for some libraries, but it does bring our lower-constraints.txt much closer to reality. The lower-constraints job just uses the lower-constraints.txt as constraints and basically checks to make sure the requirements can be satisfied and runs unit tests. Since functional and tempest tests and the third party CIs are using versions of libraries much closer to the openstack upper-constraints, that left a big gap between what we were claiming as lower constraints and what was actually being tested.
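For context, the job's mechanics are simple; a typical lower-constraints tox environment looks something like this (a generic sketch of the standard OpenStack layout, not cinder's exact tox.ini):

```ini
[testenv:lower-constraints]
deps =
  -c{toxinidir}/lower-constraints.txt
  -r{toxinidir}/test-requirements.txt
  -r{toxinidir}/requirements.txt
```

With lower-constraints.txt passed as the constraints file, pip must satisfy requirements.txt at exactly those minimum versions, which is why the stricter resolver started failing when the two files disagreed.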

actions

 * rosmaita - look into what exactly is the function of lower-constraints; it was supposed to be a guideline for packagers, but we need to keep a balance between what's claimed and what's actually being tested
 * driver maintainers - check the values in setup.cfg for dependencies used by your driver and make sure that the minimum isn't too low; also make sure the values in driver-requirements.txt are consistent with setup.cfg (setup.cfg is what's actually used; the driver-requirements.txt is a legacy convenience file for packagers)

Block Storage API v2 removal
The goal of this session was to make a list of task/owner/completion dates so we can move this along and get it done during this cycle. Ivan (e0ne) is acting as v2 removal coordinator.

tempest
Whether tempest uses v2 or not is controlled by a configuration option. Tempest has a tempest-cinder-v2-api job: https://opendev.org/openstack/tempest/src/commit/dd84940a81316129bb6adccf7ae4b5cdcbb37382/zuul.d/project.yaml#L123 At some point during Wallaby, tempest will no longer be able to run this job against cinder master.
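The option lives in tempest's volume-feature-enabled group; disabling v2 testing looks roughly like this in tempest.conf (a sketch -- verify the option name against tempest's config reference):

```ini
[volume-feature-enabled]
# When False, tempest skips its Block Storage v2 API tests
api_v2 = False
```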

action

 * rosmaita - talk to gmann about this and figure out how he wants to handle the change

action

 * e0ne - is looking into this

action

 * e0ne has a WIP patch that will remove v2 endpoint creation from devstack: https://review.opendev.org/c/openstack/devstack/+/766017

other services
Ivan pointed out that when we removed the v1 API, several other services broke even though all the cinder tests were passing. So we need to test horizon, nova, glance, and heat gates when v2 is disabled.

actions

 * e0ne - will put up an empty patch for each of the other services with Depends-on: https://review.opendev.org/c/openstack/cinder/+/554372
 * e0ne - will look into whether this should also be done for rally and vitrage

openstackclient, openstacksdk
Give them a heads-up that if they plan to continue Block Storage API v2 support, they'll have to make arrangements to test against an older cinder release.

action

 * rosmaita - contact [osc][sdk] on the ML

action

 * rosmaita - general announcement to the ML

how much v2 code must be removed from Cinder in Wallaby?

 * We need to remove the v2 endpoint.
 * v3 uses a lot of v2 classes, which we'll eventually want to refactor. Before doing this, though, we need to verify that there is good v3 test coverage.
 * documentation
   * remove the v2 api-ref
   * refactor the api-ref functional tests to handle only v3
   * make sure the general documentation does not contain references to any v2-specific stuff

action

 * rosmaita - tag if necessary and then patch to make sure it only runs v3

python-cinderclient
We already announced on the ML that we'd be removing v2 support in Wallaby:
 * http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018531.html
 * http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018697.html
 * http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018876.html

This has to happen before the client library freeze, which is the week of 8 March.

The cinderclient is structured similarly to the Block Storage API in that a lot of v2 classes are reused for v3 functionality. This may make removing v2 functionality from the CLI tricky.

action

 * somebody - needs to dig in and see what needs to happen

consistent and secure policies
Keystone made some changes back in Queens (enhanced scoping) and Rocky (default roles). Combining roles and scoping provides common "personas" that can be supported by openstack services in the default policy configuration.

There's a patch up that describes what personas cinder will support and provides a matrix of cinder policies showing how each of the supported personas should behave by default: https://review.opendev.org/c/openstack/cinder/+/763306

This gives us a document under version control that describes the expected behavior and will make it easier to verify that the new policies are doing the correct thing.

In reviewing the permissions matrix, Rajat noticed that the default policy to reset the status of a group snapshot is currently admin-or-owner, which seemed weird. A bit of digging revealed that this may have been a mistake when the old policy.json was converted to policies-in-code:
 * policy.json (admin only): https://review.opendev.org/c/openstack/cinder/+/507812/6/etc/cinder/policy.json#b108
 * policy-in-code (admin or owner): https://review.opendev.org/c/openstack/cinder/+/507812/6/cinder/policies/group_snapshot_actions.py#27

We had a quick discussion about whether anyone remembered this being an intentional change (nobody did). Eric pointed out that exposing this to end-users can be dangerous if they set the state to something like 'CREATING'. The consensus was that this should really be an administrative action by default, that we should treat the current value as a bug, and that we should backport the fix as far as possible.
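For reference, the eventual fix would make the default rule admin-only again, which in a deployment's policy file would look something like this (a sketch; check the exact rule name and default against the policy-in-code definition):

```yaml
# policy.yaml override (sketch): restore admin-only reset of a
# group snapshot's status, matching the old policy.json default
"group:reset_group_snapshot_status": "rule:admin_api"
```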

We discussed a few other possible changes, but the key takeaway is that even though the "project-admin" persona has "admin" in its name, such a person is *not* an admin in the way we're used to thinking of an "admin". This is discussed in more detail on the permissions matrix patch: https://review.opendev.org/c/openstack/cinder/+/763306

actions

 * everyone - review (or at least look over) https://review.opendev.org/c/openstack/cinder/+/763306
 * rosmaita - file a bug about the default value for the group:reset_group_snapshot_status policy

Windows RBD os-brick connector
Lucian (lpetrut) led a follow-up discussion of a topic he brought up at last week's Cinder meeting, namely adding a mixin class to os-brick to provide a simplified RBD connector for Windows, given that Ceph Pacific will support mounting RBD images on Windows.
 * patch https://review.opendev.org/c/openstack/os-brick/+/718403
 * bp https://blueprints.launchpad.net/cinder/+spec/os-brick-windows-rbd
 * weekly meeting discussion http://eavesdrop.openstack.org/meetings/cinder/2020/cinder.2020-12-02-14.00.log.html#l-161

There was a concern that the Windows RBD connector would have to be version aware, or should maybe pass version info through the connection properties. We discussed this a bit; it wasn't clear that having this information would actually be helpful, and it might expose information via the connector that we don't want exposed. Also, Nova might use cached connector properties that contain old Ceph version info. The consensus was that we don't need to pass the Ceph version in the connection info at the current time.

migration support for a volume with replication status enabled
https://review.opendev.org/c/openstack/cinder-specs/+/766130

The problem is that the current workflow rejects the migration of a replication-enabled volume too high in the stack. The way cinder currently handles multiattach isn't a good model for what needs to be done here. Gorka suggested that this should probably be done the way snapshots are currently handled: go through the scheduler to find out whether the operation will be allowed. He dug up two old patches from when snapshots were changed to work this way:
 * Manage snapshot: https://review.opendev.org/c/openstack/cinder/+/538413
 * Create snapshot: https://review.opendev.org/c/openstack/cinder/+/509011

This looks like a promising model for the new spec.

storing volume format info
https://review.opendev.org/c/openstack/cinder-specs/+/760999

The format info will be stored as admin metadata, but shouldn't be exposed in the volume-show call; instead, it should show up in the Attachments API.

We had a brief discussion about what "admin metadata" is supposed to be, because it's ambiguous whether it's arbitrary info that administrators can add to volumes or info required for administering the volume. The consensus was that it should be more like the latter; some drivers use admin metadata to hold useful info for themselves. So the general idea is:
 * drivers can add stuff in the admin metadata
 * system-admins can see it for troubleshooting
 * system-admins should not touch the fields that they themselves don't create

Session Two: R-9: 10 February 2021
We met in BlueJeans from 1400 to 1600 UTC.
etherpad: https://etherpad.openstack.org/p/cinder-wallaby-mid-cycles
recordings:
 * part 1 (2 hours) https://www.youtube.com/watch?v=HCcJFGQdOmg
 * part 2 (15 minutes) https://www.youtube.com/watch?v=BaQAsOSD41M

Cinder Project Business

 * stable releases
   * planning stable releases in 2 weeks; tracking on https://etherpad.opendev.org/p/stable-releases-review-tracker-22-02-2021
 * new drivers status
   * community drivers: s3 backup, ceph iSCSI
     * CI-related patches have not merged yet
     * ceph-iscsi: https://review.opendev.org/q/topic:%22ceph-iscsi%22+(status:open%20OR%20status:merged)
     * s3 backup: https://review.opendev.org/q/topic:%22feature%252Fsupport-s3-backup-driver%22+(status:open%20OR%20status:merged)
   * third-party merged: open-e jovian dss, DellEMC PowerVault, TOYOU
   * third-party unmerged: kioxia kumoscale (proposed merge deadline of Friday 12 Feb)
     * http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020362.html
     * everyone seems OK with the extension, though we hope not to have this happen in non-pandemic times
 * proposal: release cinder 18.0.0.0b1 after all drivers have merged to give people an opportunity to consume & test the new drivers
   * will require:
     * release os-brick 4.2.0 from master (to make the nvmeof connector changes available)
     * update cinder requirements to use latest brick
     * release cinder 18.0.0.0b1
 * rbd-iscsi-client
   * all governance, project-config patches accepted
   * outstanding patches
     * reformat as a Cinder project: https://review.opendev.org/c/openstack/rbd-iscsi-client/+/774748
     * update Cinder docs: https://review.opendev.org/c/openstack/cinder/+/773158

FS backend format info documentation brainstorming
Background: in NFS-based backends, volume data may be stored as raw or qcow2 files, though this is supposed to be transparent to the user. Cinder uses qemu-img info to figure out what format the volume is stored in before it does operations on the volume. The problem is that it's possible for a volume consumer to store a qcow2 file in their volume, and in that case, Cinder is fooled into thinking that the *volume* itself is in qcow2 format and acts accordingly, whereas Cinder should be treating it as a raw format file.
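To make the failure mode concrete: a header probe (which is effectively what qemu-img info does) can be fooled by user data. A minimal Python sketch, not cinder's actual code:

```python
import tempfile

QCOW2_MAGIC = b"QFI\xfb"  # the four magic bytes at the start of every qcow2 file

def naive_format_probe(path: str) -> str:
    """Guess a file's format from its leading bytes, as a header probe would."""
    with open(path, "rb") as f:
        return "qcow2" if f.read(4) == QCOW2_MAGIC else "raw"

# A raw volume whose *contents* happen to start with a qcow2 header
# is misdetected, even though the volume itself is raw:
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(QCOW2_MAGIC + b"\x00" * 60)  # user data inside a raw volume
    path = f.name

print(naive_format_probe(path))  # reports "qcow2"
```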

The wallaby spec to address this issue is https://review.opendev.org/c/openstack/cinder-specs/+/760999, which addresses this issue by adding volume-admin-metadata specifying the "real" format of the file representing a volume on an NFS-based backend. The question here is: What do we do about legacy volumes on an NFS-based backend?
 * we could execute qemu-img info on the volume and store the result as the volume's format. The problem with this idea is that qemu-img info is giving us inaccurate info now (which is why we have a bug), and we might store the incorrect result, which would lead to incorrect behavior when cinder operates on that volume
 * the agreement is that the current behavior is probably OK for legacy volumes
 * there is a question about what to do when managing volumes on an NFS-based backend. We need to have a way to get the format info into cinder
 * one way to avoid the problem completely is to set the nfs_qcow2_volumes option to True; then Cinder will assume that all volumes on an NFS-based backend are qcow2 files and will treat them correctly no matter what the volume contents are.
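For reference, that workaround is a per-backend setting in cinder.conf, something like this (a sketch, with a hypothetical backend section name):

```ini
[nfs-backend]
volume_driver = cinder.volume.drivers.nfs.NfsDriver
# Assume every volume file on this backend is qcow2, so a qcow2
# header inside user data can no longer confuse format detection
nfs_qcow2_volumes = True
```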

Eric mentioned that we need to make sure that this doesn't impact encrypted volumes (probably not, but we should make sure). Some current NFS encryption patches:
 * https://review.opendev.org/c/openstack/cinder/+/597148
 * https://review.opendev.org/c/openstack/cinder/+/749155

Avoid raw to raw conversions when creating volume from raw image
Rajat wanted to discuss this part of the code: https://github.com/openstack/cinder/blob/master/cinder/image/image_utils.py#L716-L720

When creating a volume from an image, we download the image and convert it to raw format (because Glance can store images in various formats) before writing it into the volume. Obviously, this is wasteful when the Glance image is already in raw format -- not just in CPU power to do the conversion, but also with respect to the tmp space where the downloaded image is stored. Operators have encountered problems when the tmp space fills up.

On that note, Eric has a WIP patch up that would use sparse files in tmp space so that more images can fit before running out of disk space. But another issue is that cinder currently doesn't check how much space is available before it tries to do the download-and-convert process. We could do a check first before filling up the tmp space and crashing. (This is an open task for someone interested.)
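The pre-flight space check mentioned above could be as simple as the following sketch (the names and the doubling heuristic are hypothetical, not cinder's actual logic):

```python
import shutil

def has_room_for_image(image_size_bytes: int, tmp_dir: str = "/tmp",
                       headroom: float = 1.0) -> bool:
    """Return True if tmp_dir can hold the download plus conversion output.

    'headroom' reserves extra space for the converted copy; doubling the
    image size is an illustrative assumption, not cinder's policy.
    """
    free = shutil.disk_usage(tmp_dir).free
    return image_size_bytes * (1 + headroom) <= free

# e.g. decide up front whether a 20 GiB image can be staged, instead of
# crashing partway through the download-and-convert process:
can_stage = has_room_for_image(20 * 1024**3)
```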

This is a well-defined task that could be a good starter for Google Summer of Code. (The deadline for GSoC proposals from the mentor side is 19 Feb.)

"ospurge" via openstackclient via openstacksdk
There's a patch up adding a purge command to openstackclient using functionality that has already merged into openstacksdk. It would be nice for the Cinder team to review what this does and list what enhancements it needs so that operators using it will get a thorough purge of the resources consumed by a project that's being deleted.

This came up on the ML: http://lists.openstack.org/pipermail/openstack-discuss/2021-January/019874.html The use case in the email is: "I was wondering what the right approach to clean up all resources of a project was in order for it to be deleted?"

There's currently code in openstacksdk that does a purge operation for cinder: https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/block_storage/v3/_proxy.py#L547

There's a patch up to openstackclient that exposes this functionality in OSC: https://review.opendev.org/c/openstack/python-openstackclient/+/734485

We took a quick look at the code to see what else would need to be added to delete all cinder resources and database records associated with a project. Some missing items are:
 * quotas
 * groups
 * transfers
 * private volume type with only this project having access
 * group snapshots
 * qos
 * attachments

Additionally, the SDK code is doing the right thing by deleting backups and snapshots first and then deleting volumes, but it could use the cascading delete feature for volumes, letting cinder handle the backup and snapshot deletion for each volume. This functionality has been available in Cinder for a long time (it's in v2, so it's also in v3.0, and therefore doesn't require a microversion to use).
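For illustration, a cascading delete is just a query parameter on the normal volume-delete request; a small sketch (the helper name is hypothetical):

```python
def cascade_delete_url(endpoint: str, project_id: str, volume_id: str) -> str:
    """Build the Block Storage request URL for a cascading volume delete.

    With cascade=true, cinder deletes the volume's dependent snapshots
    itself, so the caller does not have to enumerate them first.
    """
    return f"{endpoint}/v3/{project_id}/volumes/{volume_id}?cascade=true"

url = cascade_delete_url("http://controller:8776", "proj", "vol-1")
# -> "http://controller:8776/v3/proj/volumes/vol-1?cascade=true"
```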

A question came up about whether the sequencing (with respect to other services) is correct, for example, you need to delete instances first so that any attached volumes can be released. Similarly, if Glance is using the cinder glance_store, you should delete the images first (which would then delete the volumes). This brought up another question, namely, what do you want to do if some of those images are public or shared with other projects? It's possible that the current strategy only addresses one use case really well, but could leave an operator in a bad situation for other use cases.

The conclusion of the discussion is that there is an opportunity for someone in the cinder community to "interface" with the OSC and SDK teams to work out a useful project-purge functionality, and also to figure out what the feature gaps are between python-cinderclient and OSC, and what exactly is being implemented in the OpenStack SDK.

Refactoring cinder.utils / volume_utils
Eric raised this issue, which has been discussed at previous cinder meetings, reminding us that we agreed this was a good idea and that there are patches that need to be reviewed, to wit:
 * https://review.opendev.org/c/openstack/cinder/+/771290

Note that there may be a race condition between these patches and the new driver patches which may require some changes.

Block Storage API v2 removal
We are still committed to doing this.

rosmaita will post a patch removing v2 support from the python-cinderclient.

e0ne will report on the status of removing v2 from cinder at the next cinder meeting.

Backup drivers issue with the container parameter
Gorka discussed this on the ML: http://lists.openstack.org/pipermail/openstack-discuss/2021-January/020124.html

The consensus is that Gorka has identified a real problem here, and that his proposed solution makes sense, namely:
 * Create a policy for accepting the `container` parameter on the API (defaulting to allow for backward compatibility).
 * maybe drop this param and introduce a new one: --backup-options which will have backend-specific options
 * Add a new configuration option `backup_container_regex` to control acceptable values for the `container` (defaults to `.*` for backward compatibility).
 * can error out right at the API level; can include the regex in the response to say "your container must be named like this"
 * this would happen *after* the policy check to see if the user is even allowed to specify a container
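A sketch of how the proposed check might look at the API layer (the option name follows the proposal; the helper and messages are illustrative):

```python
import re

def validate_container_name(name: str,
                            backup_container_regex: str = ".*") -> None:
    """Reject container names the operator has not allowed.

    The default '.*' accepts everything, preserving backward compatibility;
    a stricter deployment might set something like r'^[a-z0-9-]{1,63}$'.
    """
    if re.fullmatch(backup_container_regex, name) is None:
        # Including the regex in the error tells callers what names are valid.
        raise ValueError(
            f"container name {name!r} must match {backup_container_regex!r}")

validate_container_name("nightly-backups", r"^[a-z0-9-]{1,63}$")  # accepted
```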

Consistent & Secure RBAC status

 * it turns out that the URL structure of the Block Storage API (containing a project_id in the path) is a problem for using system-scoped tokens
 * we need to get agreement on what the behavior should be for each of the 'personas' for each policy
 * question: should a project-reader be able to see the password in an attachment?
 * question: is a 'reader' role supposed to only "see" the same data, regardless of scope?

mypy
Eric reminded us that we previously agreed to add type annotations to the cinder code, but the patches are still sitting there.

We tried to move this along by getting people to volunteer to be responsible for particular patches: http://eavesdrop.openstack.org/meetings/cinder/2020/cinder.2020-11-18-14.00.log.html#l-50

That was in November 2020 (!). We agreed that everyone should honor their commitments and try to get these patches merged soon.
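As a reminder of what these patches involve, adding annotations is usually a mechanical change; a hypothetical before/after on a simplified helper (the body here is illustrative, not cinder's real implementation):

```python
from typing import Optional

# before: def extract_host(host, level='backend'):
def extract_host(host: Optional[str], level: str = 'backend') -> Optional[str]:
    """Sketch of annotating a host-string helper ('host@backend#pool')."""
    if host is None:
        return None
    if level == 'host':
        return host.partition('@')[0]   # keep only the host part
    return host.partition('#')[0]       # keep host@backend, drop the pool

print(extract_host('node1@lvm#pool1', 'host'))  # -> node1
```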

Test with current versions of Ceph
Eric noted that Red Hat has begun testing with Ceph Pacific (currently in RC) and a few bugs have been reported. We need a way in upstream Cinder to make sure we are testing against the updated Ceph early enough to catch problems.

Ceph releases once a year, with the latest Ceph RC being published around the OpenStack spring cycle's M-2. So it would be good to run some tests around that time to see if anything breaks, because by the time of the OpenStack coordinated release, the latest Ceph will have been available for about one month already.


 * need to figure out how easy it is to configure a job to use Ceph $VERSION
 * it may be as simple as this: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/748245/2/devstack/lib/ceph
 * if it really is that simple, should we have a periodic job for all cinder-supported ceph versions? (We support "Ceph+2": the two currently supported ceph releases plus the next two older releases -- but we currently only test with one version, usually the most recent.) Or is that overkill?
 * short term action: add a note about this to the release cycle tasks
 * longer term: figure out something more specific

Reviewing XS patches
Eric pointed out that the new Gerrit interface does T-shirt sizing of patches, and that there are a whole bunch of extra-small patches sitting around that we could get reviewed fairly quickly if the team commits to reviewing them. For example: https://review.opendev.org/q/project:openstack/cinder+AND+status:open+AND+size:%253C%253D10

Further, some of these patches, though small, address some serious bugs that we really should fix ASAP. Everyone agreed that we should address this soon, but that good intentions to look at the reviews in the above link aren't sufficient. So we decided that we could have some kind of review session where we get together in a videoconference at some specific time and do the reviews. The first meeting will be a week from Friday (that is, on 19 February) for 2 hours. We'll see how it goes and set a frequency for these meetings after we've done the first one.

announcement: http://lists.openstack.org/pipermail/openstack-discuss/2021-February/020433.html

unit test CI failures
Sofia pointed out that a recent change (that has already been backported) has introduced instability into the unit tests. She's got a patch up to address this: https://review.opendev.org/c/openstack/cinder/+/774496/

We'll probably see job failures in master until the fix is merged. So it needs a quick review, with backports to follow the original patch into the stable branches.

R-9 Bug Review
Sofia noted that for week R-9, there are 6 new bugs in total: 5 for Cinder (all driver related) and one for os-brick: https://etherpad.opendev.org/p/cinder-wallaby-r9-bug-review

The bug against os-brick looked kind of concerning: https://bugs.launchpad.net/os-brick/+bug/1915196

But Eric thinks that it might be a speculative bug, because it looks like we already have code in place to address the issue: https://opendev.org/openstack/os-brick/src/branch/master/os_brick/initiator/linuxscsi.py#L635

Hacking update
Eric has patches up to update the version of hacking that Cinder is using: https://review.opendev.org/c/openstack/cinder/+/770145

He thinks there may already be similar patches for the other cinder deliverable repositories.

Eric also has a patch up to update the version of pylint we use with cinder: https://review.opendev.org/c/openstack/cinder/+/771508

It's a good idea to keep pylint up to date; the updated version of pylint located a few bugs waiting to happen.