- 1 Introduction
- 2 Session One: R-18: 2 June 2021
- 2.1 General Cinder Project Business
- 2.2 Block Storage API v2 Removal Update
- 2.3 what about the enable_v3_api option?
- 2.4 cgroupv1 -> cgroupv2 update
- 2.5 Xena Specs Review
- 3 Session Two: R-9: 4 August 2021
- 3.1 Reset state robustification for snapshot
- 3.2 what to do about the enable_v3_api option?
- 3.3 python-cinderclient next steps
- 3.4 os-brick Xena release
- 3.5 sqlalchemy-migrate -> alembic
- 3.6 what to do about qcow2-v2 support being yanked from various linux distros
- 3.7 keystone "trusts" support in cinder
- 3.8 surfacing backend info to operators
- 3.9 general project maintenance issues
At the Xena Virtual PTG we decided to continue the practice of holding two mid-cycle meetings, each two hours long, during weeks R-18 and R-9.
What follows is a brief summary of what was discussed. Full details are on the etherpad: https://etherpad.opendev.org/p/cinder-xena-mid-cycles
Session One: R-18: 2 June 2021
We met in BlueJeans from 1400 to 1600 UTC.
General Cinder Project Business
Quick discussion of the following issues:
- Reminder that we are in OFTC now for #openstack-cinder and #openstack-meeting-alt. There are 69 people showing in the old Freenode cinder room, but no one has said anything in there, so it looks like people have made the transition.
- The TC has asked teams to consider holding their team meetings in their project channel instead of one of the dedicated #openstack-meeting* rooms. Sounded like most people are OK with that or have no opinion. Since we already have one change people need to be aware of right now (our next meeting will be the first in OFTC), we'll continue to meet in #openstack-meeting-alt for now and take a vote about moving in the next meeting or so.
- also, see this short email thread on this issue: http://lists.openstack.org/pipermail/openstack-discuss/2021-June/022865.html
- The release team likes to make sure libraries have a release early in the cycle so that the changes that have accumulated in master since the most recent stable branch was cut get a workout, and we can find out whether we've broken anyone. We did a quick review of unmerged os-brick patches and picked two that should be included in this early release. Rajat will update the release patch when the changes have been merged. (We will wait to release an early python-cinderclient until after v2 support has been removed; see below.)
- A question came up about a backport proposed to stable/queens: https://review.opendev.org/c/openstack/cinder/+/760362. The issue is that it's a pretty big patch (on the order of 1K lines, though half of that is unit tests) and that it's not a clean backport from rocky (and hence will require more careful reviewing). On the other hand, the change is isolated to a single driver. The general feeling was that since we've already allowed backporting from victoria (where it was introduced) back to rocky, we should be consistent and allow it to be backported to queens. Plus, it's a significant bugfix, and people are still using queens.
- It turns out that os-brick is not tagged 'vulnerability:managed', which was a surprise to me. The question for the team was: did anyone remember this being done intentionally, or was it simply an oversight (and we didn't notice because most people file security bugs directly against cinder)? No one thought this had been done intentionally, and the team agreed that it makes sense for private security bugs for os-brick to be managed by the VMT, as is currently done for cinder and the cinderclient.
- A discussion about the default envlist in tox.ini for cinder project repos was prompted by a proposed patch to os-brick: https://review.opendev.org/c/openstack/os-brick/+/793221. We're inconsistent about this across repos, although to a certain extent it doesn't matter because no one present admitted to ever using the default tox environment when running tests locally. The default environment is aimed at new contributors, and the general consensus is that we'd like them to run both (a) the unit tests with the latest python version we support, plus (b) pep8.
- We should document this consensus in an appropriate place.
- We may need to think about this some more. See the discussion on https://review.opendev.org/c/openstack/cinder/+/794661.
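As a concrete illustration, a default envlist matching that consensus could look like the following tox.ini fragment (the python version is illustrative, not a settled decision):

```ini
[tox]
# new contributors running bare "tox" get unit tests on the latest
# supported python, plus the pep8 style checks
envlist = py39,pep8
```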
Block Storage API v2 Removal Update
Patches have been posted for cinder and the cinderclient, but they aren't seeing much review action, so we'll take an adopt-a-patch strategy so that specific people have responsibility for reviewing each patch. Each patch needs two core reviewers (as usual), and it would be nice to have another reviewer (who doesn't have to be core) do a sanity-check review, since most of the patches are large.
- "Remove v2 support from the shell"
- core 1: e0ne (reviewed!)
- core 2: hemna
- sanity checker: eharney
- "Remove v2 classes"
- core 1: e0ne (reviewed!)
- core 2: jungleboyj
- sanity checker:
- rajat: post a glance DNM patch to run the glance-with-cinder-store tests with no v2 available
- "Remove Block Storage API v2"
- core 1: whoami-rajat
- core 2: eharney
- sanity checker:
- "Update Block Storage API v2 api-ref"
- core 1: jungleboyj (reviewed!)
- core 2: hemna
- sanity checker:
what about the enable_v3_api option?
We discussed what to do about the cinder enable_v3_api option. See:
Some options are:
- Deprecate it in Xena for removal in Y, and honor it in Xena. This means that if it's disabled, the only Block Storage API call available on nodes running cinder-api will be the /versions request, which will return an empty list of available versions. It's not clear there's any utility in that. (This is what's done in the current patch.)
- Deprecate it in Xena for removal in Y, but ignore it in Xena (that is, act as if enable_v3_api=true even if it's actually set false in cinder.conf).
- Just go ahead and remove it in Xena, because if it's ignored, there's no point in it being there at all. On the other hand, option 2 would log a deprecation message that could explain that the option is now a no-op, so there would be the log message available to people who didn't read the release notes carefully.
We decided to think about this some more and handle it in a follow up patch.
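As a sketch of how the second option above could behave (names and messages are illustrative, not the actual cinder patch), the configured value would be read but ignored, with a deprecation warning emitted when an operator has explicitly disabled it:

```python
import warnings

# Illustrative sketch only: always act as if enable_v3_api=true, but warn
# when the operator set it to false so the no-op behavior is discoverable
# in the logs, not just in the release notes.
def effective_enable_v3_api(configured_value):
    if not configured_value:
        warnings.warn(
            "enable_v3_api is deprecated and now ignored; the Block "
            "Storage v3 API is always enabled", DeprecationWarning)
    return True
```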
cgroupv1 -> cgroupv2 update
Follow up from the PTG: libcgroup v2.0, which supports cgroup v2 and provides the 'cgexec' utility that cinder uses, was released May 6, 2021. This would have allowed us to use the current code with minimal changes. However, it won't be packaged for some distros, so we need to switch to Plan B.
The alternative is to use systemd: we would define drop-in slice files that set the io throttling, and then use systemd-run to run the copy/convert commands currently being run with cgexec from libcgroup. Eric pointed out that we could have cinder generate the files on demand, as we do elsewhere in the code. That way we could continue to use the config option that sets the io throttling value, which would preserve backward compatibility with current config files.
Just for the record, here are the places where throttling is handled in the current code:
- options: https://opendev.org/openstack/cinder/src/commit/d5f0e51879c03654db4bf89564695e01af593350/cinder/volume/driver.py#L116-L123
- set throttle: https://opendev.org/openstack/cinder/src/commit/d5f0e51879c03654db4bf89564695e01af593350/cinder/volume/driver.py#L560-L576
- copy volume: https://opendev.org/openstack/cinder/src/commit/d5f0e51879c03654db4bf89564695e01af593350/cinder/volume/volume_utils.py#L665-L669
- convert image: https://opendev.org/openstack/cinder/src/commit/d5f0e51879c03654db4bf89564695e01af593350/cinder/image/image_utils.py#L394-L406
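As a hedged sketch of Plan B (the helper name and throttle plumbing are illustrative, not cinder's actual implementation), the copy/convert commands could be wrapped with systemd-run so the cgroup v2 io controller enforces the bandwidth limits:

```python
# Hypothetical sketch: wrap a copy/convert command with systemd-run so the
# kernel enforces read/write bandwidth caps via the cgroup v2 io controller.
def throttled_cmd(cmd, dev_path, bps_limit):
    """Prefix 'cmd' so it runs in a transient systemd scope with IO
    bandwidth on 'dev_path' capped at 'bps_limit' bytes/sec."""
    prefix = [
        'systemd-run', '--scope',
        '--property=IOReadBandwidthMax=%s %d' % (dev_path, bps_limit),
        '--property=IOWriteBandwidthMax=%s %d' % (dev_path, bps_limit),
    ]
    return prefix + list(cmd)

# e.g. the qemu-img convert currently wrapped with cgexec:
cmd = throttled_cmd(
    ['qemu-img', 'convert', '-O', 'raw', 'src.qcow2', '/dev/sdb'],
    '/dev/sdb', 100 * 1024 * 1024)
```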
Xena Specs Review
(Note: even though the R-18 midcycle is timed so that spec proposers can get feedback before the spec freeze at the end of week R-15, the word apparently hasn't gotten out. We need to find a better way to communicate this, both so that proposers know to attend the midcycle and so that they realize they should participate in the midcycle scheduling process early in the cycle if the current times prevent their attendance.)
Here's a summary of the specs we discussed. Anyone with a spec proposal that wasn't discussed, and who needs more feedback than is currently on the Gerrit review, should reach out to the Cinder team for help (by putting a topic on the weekly meeting agenda, asking in the OFTC #openstack-cinder channel, or via the openstack-discuss mailing list).
Also, here's a link to the midcycle recording for more context to the summaries that follow: https://www.youtube.com/watch?v=bSs5AHz2Iq8
Update original volume az
spec proposal: https://review.opendev.org/c/openstack/cinder-specs/+/778437
- The team is generally supportive of this idea, but we need more details.
- This spec could use a more accurate title, because it's more like "Migrate all the stuff associated with a backend to a new availability zone when the backend has been moved to a new AZ", which gives the reader a better idea of the complexity of the issue being addressed.
- A question came up about in-use volumes: will this only work for available volumes? What steps does an operator need to take before running this cinder-manage operation?
- We need a clear description of how other resources that are AZ-aware will be handled, for example:
- Need to add a documentation note that an operator will have to update any volume_types that include an availability zone.
Support revert any snapshot to the volume
spec proposal: https://review.opendev.org/c/openstack/cinder-specs/+/736111
- The team remains supportive of this idea, but continues to stress that we need a much more thorough proposal, especially with respect to:
- Vendors need to be aware of the implications of this functionality, especially for deletion
- It would be extremely helpful to have an outline of tempest test scenarios sketched out in the spec, because it will give a clear statement of the expected pre-conditions and post-conditions of various applications of this operation
- It must be clear that all the normal Cinder API semantics are respected when this operation is added
- A suite of tests in the cinder-tempest-plugin (that can be implemented from the outline) will help driver maintainers assess the impact of this change on their drivers
- A question came up about the 'safe_revert_middle_snapshot' property. Is it a static property that can be defined for each driver, or does it depend on other factors (for example, licensing, backend API version, etc.) that must be determined dynamically from the driver?
- Please explain clearly what happens to the more recent snapshots under this proposal. For example,
- volume -> snap A -> snap B -> snap C
- revert the volume to snap B and then write data to the volume
- what effect does this have on snap C?
- All the normal cinder API semantics should still work. For example,
- if you revert volume v1 to A, you should still be able to delete A
- if you then revert volume v2 to A, you should have the original data from A (that is, it shouldn't contain anything that v1 has changed after its revert)
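To make those expected semantics concrete, here is a toy in-memory model (not cinder code): assuming each snapshot is an independent point-in-time copy, reverting the volume never alters other snapshots, and every snapshot stays deletable.

```python
# Toy model (NOT cinder code) of the expected revert semantics.
class Volume:
    def __init__(self, data):
        self.data = bytes(data)
        self.snapshots = {}

    def snapshot(self, name):
        self.snapshots[name] = self.data  # point-in-time copy

    def revert(self, name):
        self.data = self.snapshots[name]

    def delete_snapshot(self, name):
        del self.snapshots[name]

vol = Volume(b'v0')
vol.snapshot('A')        # volume -> snap A -> snap B -> snap C
vol.data = b'v1'
vol.snapshot('B')
vol.data = b'v2'
vol.snapshot('C')

vol.revert('B')          # revert the volume to snap B...
vol.data = b'vX'         # ...then write data to the volume
assert vol.snapshots['C'] == b'v2'  # snap C is unaffected

vol.revert('A')                  # a later revert to A...
assert vol.data == b'v0'         # ...restores A's original data
vol.delete_snapshot('A')         # and A stayed deletable throughout
```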
Migration support for a volume with replication status enabled
spec proposal: https://review.opendev.org/c/openstack/cinder-specs/+/766130
- A key question here is: What happens to the old replica?
- There will need to be some documentation making operators aware of what questions to ask vendors about their specific backend. For example, if replication is async, there will be a time when the migrated volume is not replicated. There are probably some other issues that you can think of.
- There's a comment at the end of the spec: "We could follow the multiattach implementation for this spec proposal". Could you please explain in what way multiattach is helpful to your implementation? It's not clear to us what you're thinking here, and an explanation will help us understand your proposal better.
Session Two: R-9: 4 August 2021
We met in BlueJeans from 1400 to 1600 UTC.
Reset state robustification for snapshot
- see questions on https://wiki.openstack.org/wiki/Reset_State_Robustification
- spreadsheet of cases: https://docs.google.com/spreadsheets/d/1cd3auGYk1xlushNA2kwK-nVAEE3p2nloAMNfC0km_3A/edit#gid=106594995
- cinder reset state command: https://paste.opendev.org/show/807889/ <-- see --type option
The issue is that there are a *lot* of state transitions to consider; it looks infeasible to account for them all. So what should we do?
Eric mentioned that an alternative would be to remove this capability from the REST API and make it available only from cinder-manage, where presumably it would be used only by people who know the implications of what they're doing. The problem with that is that you need access to a cinder node to run cinder-manage, which would mean only top-tier support would have access to it, and that probably isn't what we want.
The recommendation is to pick a small, clear case (for example, the two in the spec) and implement this to at least get started. Also, there are some destination states (particularly those ending in "-ing") that should *never* be transitioned to, so we can also block those. (These states are used by cinder to mark the middle of a workflow, and you don't want to allow someone to pretend to put a volume into the middle of a workflow simply by changing the state in the DB, because that's not really going to do anything.)
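A minimal sketch of the "block transient destinations" part of that recommendation (the status list here is an illustration, not cinder's definitive set):

```python
# Illustrative sketch only: reject "-ing" (transient) statuses as
# reset-state targets, since writing them to the DB doesn't actually put
# the resource into the middle of that workflow.
TRANSIENT_STATUSES = {
    'creating', 'deleting', 'attaching', 'detaching', 'extending',
    'downloading', 'uploading', 'backing-up', 'restoring-backup',
}

def validate_reset_target(status):
    if status in TRANSIENT_STATUSES:
        raise ValueError("cannot reset to transient status %r" % status)
    return status
```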
TusharTgite: do the implementation along the lines discussed above
what to do about the enable_v3_api option?
This is a follow up from xena R-18 midcycle, discussed here: https://wiki.openstack.org/wiki/CinderXenaMidCycleSummary#what_about_the_enable_v3_api_option.3F Asked the TC about this on the ML: http://lists.openstack.org/pipermail/openstack-discuss/2021-July/023898.html
rosmaita: put up a patch removing the option: https://review.opendev.org/c/openstack/cinder/+/803533
python-cinderclient next steps
- short-term (meaning, "immediate")
- seeing a lot of failures in py36 functional tests due to timeouts. Need to merge this to master and then backport: Unset tempest.lib timeout in functional tests: https://review.opendev.org/c/openstack/python-cinderclient/+/802966
- victoria: this had been stalled, but the dependency that enables building devstack on centos-8 (where we can run python 3.6) has merged, so we need to merge this one so that the python-cinderclient-functional-py36 Zuul job will actually test something: Remove skip_missing_interpreters: https://review.opendev.org/c/openstack/python-cinderclient/+/800635
- wallaby and then backport (probably): Add missing classes to cinderclient.v3: https://review.opendev.org/c/openstack/python-cinderclient/+/802904
- occasional functional test failures due to quota issues (the default volume quota is too low, and the tempest user hits the quota limit of 10 volumes, at which point the tests fail). We discussed some options:
- glance approach: https://opendev.org/openstack/devstack/src/commit/ba68a49598309c5f6c5e4a0ec9d2b13b8229eee0/lib/cinder#L569-L575
- expose quota config in devstack (which ones?)
- simply raise default quotas: https://opendev.org/openstack/cinder/src/commit/4896811eab0b80e0acd23654a1a36df1b2fdda7c/cinder/quota.py#L36-L58
- let's discuss at the PTG (unless someone here knows of someone just dying to work on cinderclient right now)
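For reference, the third option could look like the following cinder.conf fragment (values are illustrative only; the point is simply to lift the defaults above what the functional tests consume):

```ini
[DEFAULT]
# illustrative values; the shipped default for volumes is 10
quota_volumes = 50
quota_snapshots = 50
```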
rosmaita: put up a patch adjusting the config: https://review.opendev.org/c/openstack/python-cinderclient/+/803522
os-brick Xena release
This is coming up fast: os-brick Xena release is at R-7 (week of 16 August)
- NVMeoF connection agent
- CI doesn't test the agent. We need tempest integration for this because it's a completely new scope of work.
- discussion: this is a priority for os-brick for Xena ... need reviews this week to give Zohar and team time to revise
- LVM-related segfaults patches
- discussion: this is a squash of the cinder-side fixes, so only 1 patch to review!
everyone: review the 2 above patches pronto
sqlalchemy-migrate -> alembic
Stephen Finucane put up a bunch of patches to get this moving, but he probably won't be able to help further due to a job switch.
Luigi pointed out that sqlalchemy-migrate may already be unsupported. We could organize a review festival to make this happen, since we need it ASAP.
rosmaita: organize a review festival or something around this
what to do about qcow2-v2 support being yanked from various linux distros
"Yanked" may be a bit strong; the proposal in RHEL, for example, is to have read-only support.
Not really a cinder problem, because qcow2 v3 has been the default for qcow2 creation in qemu-img since QEMU 1.7 (Dec 2013); so for anyone running openstack since Icehouse (spring 2014), cinder has only been creating qcow2 v3 images for remotefs-based drivers.
The issue for cinder arises only if an operator tries to manage a qcow2 v2 format image.
Eric asked: does the cinder manage API even support managing qcow2 images with the NFS drivers? He suspects that it may not. If not, we should continue not to support this!
rosmaita: look into whether we already disallow an operator managing an NFS driver volume that's a qcow2, and report back
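If it turns out we do need to reject such images, a check along these lines might work (the helper is hypothetical, not existing cinder code): 'qemu-img info --output=json' reports the qcow2 compat level, which is '0.10' for v2 and '1.1' for v3.

```python
import json

# Hypothetical helper: given the JSON produced by
# 'qemu-img info --output=json', report whether the image is the
# legacy qcow2 v2 format.
def is_qcow2_v2(info_json):
    info = json.loads(info_json)
    compat = info.get('format-specific', {}).get('data', {}).get('compat')
    return info.get('format') == 'qcow2' and compat == '0.10'
```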
keystone "trusts" support in cinder
The context for this is that there's an etherpad collecting operator pain points to assess the feasibility of having a "fix some pain points in your project" community goal (that is, each project fixes its own pain points instead of trying to find an openstack-wide pain point to address). This patch is on the list: https://review.opendev.org/c/openstack/cinder/+/785362
It's still not clear that the problem reported requires trusts. I'd rather use service tokens, which we already support and which are less complicated.
rosmaita: leave a comment on the patch and/or associated bug explaining this and asking for a reason why what's proposed *cannot* be accomplished via service tokens
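For reference, service token support is configured via the [service_user] section of cinder.conf; the values below are illustrative:

```ini
[service_user]
# send the cinder service token along with the user's token so keystone
# middleware will accept an expired user token on long-running operations
send_service_user_token = true
auth_type = password
auth_url = http://keystone.example.com/identity
username = cinder
password = <service password>
project_name = service
user_domain_name = Default
project_domain_name = Default
```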
surfacing backend info to operators
The context for this one is a patch that is using the 'disabled_reason' field of the volume service to explain why replication is disabled on the backend, which seems to be an unintended use of that field: https://review.opendev.org/c/openstack/cinder/+/791281/5/cinder/volume/manager.py#636
GirishChilukuri explained: during initialization of a replicated backend, one of the sites may be down for some reason, in which case we set the service's replication_status to disabled. To surface that information to the user, we use disabled_reason to explain why replication is disabled. We saw that disabled_reason is already used during failover to record a reason, so we used the same field to record the replication status.
Some points that came up during discussion:
- this is not an appropriate use of disabled_reason on a service
- surfacing this info (particularly about replication status) is a common use case across drivers
- not clear that the service object is the place this should be recorded, though
Girish and team will make a proposal for how to surface this info, consulting with other driver developers (simondodsley from Pure expressed some interest)
general project maintenance issues
- We have some small patches addressing mostly testing infrastructure that need to be reviewed and merged, the sooner the better:
- third-party CI
- Request: looking for anyone interested in helping to monitor third-party CI. This could involve contributing ideas, doing follow-ups, helping manage the CIs, and more.
- issues noted
- no one seems to have followed up on adding the 'autogenerated' tag to Gerrit comments so that they can be easily filtered in the "new" gerrit interface: http://lists.openstack.org/pipermail/openstack-discuss/2021-May/022733.html
- the third-party CIs seem flakier than ever in terms of reliability and responsiveness
- need to enforce 'supported' status more carefully
- mypy patches need review
- Eric's not going to stop, so you might as well review these so we can get them merged and start taking advantage of the type-checking goodness mypy will give us
- Xena features!
- This week is R-9; feature freeze is R-5
- we are tracking them via Launchpad blueprints: https://blueprints.launchpad.net/cinder/xena
- if you're working on a feature, make sure you have a BP filed to help prioritize reviews
- if you have a BP filed and it's not showing up in the list, contact rosmaita to get it included
everyone: read through and act on the above!