CinderXenaMidCycleSummary

Introduction

At the Xena Virtual PTG we decided to continue the practice of holding two mid-cycle meetings, each two hours long, during weeks R-18 and R-9.

What follows is a brief summary of what was discussed. Full details are on the etherpad: https://etherpad.opendev.org/p/cinder-xena-mid-cycles

Session One: R-18: 2 June 2021

We met in BlueJeans from 1400 to 1600 UTC.
etherpad: https://etherpad.openstack.org/p/cinder-xena-mid-cycles
recording: https://www.youtube.com/watch?v=bSs5AHz2Iq8

General Cinder Project Business

Quick discussion of the following issues:

  • Reminder that we are in OFTC now for #openstack-cinder and #openstack-meeting-alt. There are 69 people showing in the old Freenode cinder room, but no one has said anything in there, so it looks like people have made the transition.
  • The TC has asked teams to consider holding their team meetings in their project channel instead of one of the dedicated #openstack-meeting* rooms. Sounded like most people are OK with that or have no opinion. Since we already have one change people need to be aware of right now (our next meeting will be the first in OFTC), we'll continue to meet in #openstack-meeting-alt for now and take a vote about moving in the next meeting or so.
  • The release team likes to make sure libraries have a release early in the cycle so that changes that have accumulated in master since the most recent stable branch was cut can get a workout and we can find out if we've broken anyone. We did a quick review of unmerged os-brick patches and picked two that should be included in this early release. Rajat will update the release patch when the changes have been merged. (We will wait to release an early python-cinderclient until after v2 support has been removed; see below.)
  • A question came up about a backport proposed to stable/queens: https://review.opendev.org/c/openstack/cinder/+/760362. The issue is that it's a pretty big patch (on the order of 1K lines, though half of that is unit tests) and that it's not a clean backport from rocky (and hence will require more careful reviewing). On the other hand, the change is isolated to a single driver. The general feeling was that since we've already allowed backporting from victoria (where it was introduced) back to rocky, we should be consistent and allow it to be backported to queens. Plus, it's a significant bugfix, and people are still using queens.
  • It turns out that os-brick is not tagged 'vulnerability:managed', which was a surprise to me. The question for the team was: did anyone remember this being done intentionally, or was it simply an oversight (and we didn't notice because most people file security bugs directly against cinder)? No one thought this had been done intentionally, and the team agreed that it makes sense for private security bugs for os-brick to be managed by the VMT, as is currently done for cinder and the cinderclient.
  • A discussion about the default envlist in tox.ini for cinder project repos was prompted by a proposed patch to os-brick: https://review.opendev.org/c/openstack/os-brick/+/793221. We're inconsistent about this across repos, although to a certain extent it doesn't matter because no one present admitted to ever using the default tox environment when running tests locally. The default environment is aimed at new contributors, and the general consensus is that we'd like them to run both (a) the unit tests with the latest python version we support, plus (b) pep8.
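
A minimal sketch of what that default could look like (purely illustrative; "py39" just stands in for the latest Python version we support, and the exact env names are up to each repo):

    [tox]
    # run the unit tests on the newest supported python, plus the pep8/style checks
    envlist = py39,pep8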

Block Storage API v2 Removal Update

Patches have been posted for cinder and the cinderclient, but they aren't seeing much review action. So we'll take an adopt-a-patch strategy so that people have responsibility for reviewing a specific patch. For each patch, we need two core reviewers (as usual), and it would be nice to have another reviewer (doesn't have to be core) do a sanity-check review, since most of the patches are large.


We discussed what to do about the cinder enable_v3_api option. See:


Some options are:

  1. Deprecate it in Xena for removal in Y, but honor it in Xena. This means that if it's set to false, the only Block Storage API call available on nodes running cinder-api will be the /versions request, which will return an empty list of available versions. Not sure there's any utility in that. (This is what's done in the current patch.)
  2. Deprecate it in Xena for removal in Y, but ignore it in Xena (that is, act as if enable_v3_api=true even if it's actually set false in cinder.conf).
  3. Just go ahead and remove it in Xena, because if it's ignored, there's no point in it being there at all. On the other hand, option 2 would log a deprecation message explaining that the option is now a no-op, so the log message would be available to people who didn't read the release notes carefully.
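
As a rough illustration of option 2 (a hypothetical sketch, not the actual patch), oslo.config already lets us keep accepting the option while flagging it for removal, so a warning shows up in the logs even for operators who miss the release note:

    # hypothetical sketch of option 2: keep accepting enable_v3_api but mark it
    # for removal and ignore its value
    from oslo_config import cfg

    api_opts = [
        cfg.BoolOpt('enable_v3_api',
                    default=True,
                    deprecated_for_removal=True,
                    deprecated_since='Xena',
                    deprecated_reason='The Block Storage API v3 is the only '
                                      'remaining API, so this option is now '
                                      'ignored and will be removed.',
                    help='DEPRECATED: Deploy the v3 API.'),
    ]

    CONF = cfg.CONF
    CONF.register_opts(api_opts)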


We decided to think about this some more and handle it in a follow-up patch.

cgroupv1 -> cgroupv2 update

Follow-up from the PTG: libcgroup v2.0, which supports cgroup v2 and provides the 'cgexec' command cinder uses, was released May 6, 2021. This would allow us to use the current code with minimal changes. However, it won't be packaged for some distros, so we need to switch to Plan B.

The alternative is to use systemd: we would define drop-in slice files that set the io throttling, and then use systemd-run to run the copy/convert commands currently being run with cgexec from libcgroup. Eric pointed out that we could have cinder generate the files on demand, as we do elsewhere in the code. That way we could continue to use the config option that sets the io throttling value, which would preserve backward compatibility with current config files.
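
As a rough illustration of the idea (a hypothetical sketch, not the planned implementation: the real code would go through cinder's rootwrap/privsep machinery and its existing throttling helpers, and could set the limits via a slice drop-in instead of directly on the scope), a copy command could be wrapped in a transient systemd scope carrying cgroup-v2 io limits:

    # hypothetical sketch: replace the libcgroup 'cgexec' prefix with a
    # transient systemd scope that applies cgroup-v2 io bandwidth limits
    # (IOReadBandwidthMax/IOWriteBandwidthMax from systemd.resource-control)
    import subprocess

    def run_throttled(cmd, dev_path, bps_limit):
        """Run cmd with read/write bandwidth on dev_path capped at bps_limit bytes/s."""
        prefix = [
            'systemd-run', '--scope', '--collect',
            '-p', 'IOReadBandwidthMax=%s %d' % (dev_path, bps_limit),
            '-p', 'IOWriteBandwidthMax=%s %d' % (dev_path, bps_limit),
        ]
        subprocess.check_call(prefix + cmd)

    # e.g. run_throttled(['dd', 'if=/dev/sdb', 'of=/dev/sdc', 'bs=1M'],
    #                    '/dev/sdb', 100 * 1024 * 1024)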

Just for the record, here are the places where throttling is handled in the current code:

Xena Specs Review

(Note: even though the R-18 midcycle is timed so that spec proposers can get feedback in advance of the spec freeze at the end of week R-15, the word apparently hasn't gotten out. We need to figure out a better way to communicate this, both so that proposers know to show up at the midcycle and so that they realize they should participate in the midcycle scheduling process very early in the cycle if the current times prevent them from attending.)

Here's a summary of the specs we discussed. Anyone with a spec proposal that wasn't discussed, and who needs more feedback than is currently on the Gerrit review, should reach out to the Cinder team for help (by putting a topic on the weekly meeting agenda, asking in the OFTC #openstack-cinder channel, or via the openstack-discuss mailing list).

Update original volume az

spec proposal: https://review.opendev.org/c/openstack/cinder-specs/+/778437

  • The team is generally supportive of this idea, but we need more details.
  • This spec could use a more accurate title, because it's more like "Migrate all the stuff associated with a backend to a new availability zone when the backend has been moved to a new AZ", which gives the reader a better idea of the complexity of the issue being addressed.
  • A question came up about in-use volumes: will this only work for available volumes? What steps does an operator need to take before running this cinder-manage operation?
  • We need a clear description of how other resources that are AZ-aware will be handled, for example:
    • snapshots
    • backups
    • groups
  • Need to add a documentation note that an operator will have to update any volume_types that include an availability zone.

Support revert any snapshot to the volume

spec proposal: https://review.opendev.org/c/openstack/cinder-specs/+/736111

  • The team remains supportive of this idea, but continues to stress that we need a much more thorough proposal, especially with respect to:
    • Vendors need to be aware of the implications of this functionality, especially for deletion
    • It would be extremely helpful to have an outline of tempest test scenarios sketched out in the spec, because it will give a clear statement of the expected pre-conditions and post-conditions of various applications of this operation (a rough illustration of the kind of outline we mean follows this list)
      • It must be clear that all the normal Cinder API semantics are respected when this operation is added
      • A suite of tests in the cinder-tempest-plugin (that can be implemented from the outline) will help driver maintainers assess the impact of this change on their drivers
  • A question came up about the 'safe_revert_middle_snapshot' property. Is it a static property that can be defined for each driver, or does it depend on other factors (for example, licensing, backend API version, etc.) that must be determined dynamically from the driver?
  • Please explain clearly what happens to the more recent snapshots under this proposal. For example,
    • volume -> snap A -> snap B -> snap C
    • revert the volume to snap B and then write data to the volume
    • what effect does this have on snap C?
  • All the normal cinder API semantics should still work. For example,
    • if you revert volume v1 to A, you should still be able to delete A
    • if you then revert volume v2 to A, you should get the original data from A (that is, it shouldn't contain anything that v1 changed after its revert)
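
To make the kind of test outline we're asking for concrete, here is a purely hypothetical sketch (the helpers write_data and revert_to_snapshot are invented names, not existing cinder-tempest-plugin code; the spec should define the real scenarios and expected outcomes):

    # hypothetical outline of a cinder-tempest-plugin scenario capturing the
    # semantics discussed above
    def test_revert_to_middle_snapshot(self):
        # volume -> snap A -> snap B -> snap C, as in the example above
        vol = self.create_volume()
        self.write_data(vol, 'data-0')        # invented helper
        snap_a = self.create_snapshot(vol)
        self.write_data(vol, 'data-1')
        snap_b = self.create_snapshot(vol)
        self.write_data(vol, 'data-2')
        snap_c = self.create_snapshot(vol)

        # revert to the middle snapshot, then keep writing to the volume
        self.revert_to_snapshot(vol, snap_b)  # invented helper for the new API
        self.write_data(vol, 'data-3')

        # post-conditions the spec must state explicitly, for example:
        # - is snap_c still present and usable, or is it invalidated/deleted?
        # - snap_a and snap_b are untouched; snap_a can still be deleted,
        #   and reverting to it later yields 'data-0', never 'data-3'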

Migration support for a volume with replication status enabled

spec proposal: https://review.opendev.org/c/openstack/cinder-specs/+/766130

  • A key question here is: What happens to the old replica?
  • There will need to be some documentation making operators aware of what questions to ask vendors about their specific backend. For example, if replication is async, there will be a time when the migrated volume is not replicated. There are probably some other issues that you can think of.
  • There's a comment at the end of the spec: "We could follow the multiattach implementation for this spec proposal". Could you please explain in what way multiattach is helpful to your implementation? It's not clear to us what you're thinking here, and an explanation will help us understand your proposal better.