CinderWallabyMidCycleSummary

Introduction

At the Wallaby Virtual PTG we decided to hold two mid-cycle meetings, each two hours long. The first meeting would be before the Cinder Spec Freeze, and the second would be around the New Feature Status Checkpoint. So the mid-cycle meetings were held the week of R-18 and the week of R-9.

What follows is a brief summary of what was discussed. Full details are on the etherpad: https://etherpad.opendev.org/p/cinder-wallaby-mid-cycles

Session One: R-18: 9 December 2020

We met in BlueJeans from 1400 to 1600 UTC.
etherpad: https://etherpad.openstack.org/p/cinder-wallaby-mid-cycles
recording: https://www.youtube.com/watch?v=rKmlpBTIDGA

Cinder Project Updates

upcoming deadlines

  • 18 December - spec freeze
  • 21 January - Wallaby Milestone-2 and new driver merge deadline
  • 1 March - os-brick wallaby release
  • 8 March - python-cinderclient wallaby release

stable releases

Our current release cadence on the stable branches is (roughly) every two months. We've recently released:

  • os-brick: 4.1.0 (wallaby), 3.0.4 (ussuri)
  • cinder: 17.0.1 (victoria), 16.2.1 (ussuri), 15.4.1 (train)

We managed to slip a critical fix through the Train gate, but three or four other patches that had been approved and targeted for 15.4.1 are stuck in the gate. Once we get the Train gate sorted out, we'll most likely do an extra out-of-cadence release from stable/train.

end of year meetings

The last meeting of the year will be on 23 December. Since it is also the last meeting of the month, it will be held by videoconference. We won't meet on 30 December.

requirements/lower-constraints update

The lower-constraints job was broken recently by a pip update that included a new dependency resolver that is stricter than the old one. As a result, it detected incompatibilities where cinder specified a lower constraint for some libraries than cinder's own dependencies allowed as a minimum. We started changing the values in lower-constraints.txt one by one, but it quickly became obvious that a larger change was needed, so we merged this patch yesterday: https://review.opendev.org/c/openstack/cinder/+/766085

That patch raised the minimums in requirements.txt to the versions from a pip freeze of a recent py38 testenv and raised the values in lower-constraints.txt accordingly. This included some big jumps in major versions for a few libraries, but it brings our lower-constraints.txt much closer to reality. The lower-constraints job just uses lower-constraints.txt as the constraints file, checks that the requirements can be satisfied, and runs the unit tests. Since the functional and tempest tests and the third-party CIs use library versions much closer to the openstack upper constraints, there was a big gap between what we were claiming as lower constraints and what was actually being tested.
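
To make the "keep the claims honest" task more concrete, a small script along these lines can flag places where a requirements-style file and lower-constraints.txt have drifted apart. This is purely an illustrative sketch (it is not part of any cinder or OpenStack tooling) using the `packaging` library:

    # Illustrative sketch, not cinder tooling: flag entries whose declared
    # minimum in a requirements-style file no longer matches the pin in
    # lower-constraints.txt.
    from packaging.requirements import Requirement
    from packaging.version import Version

    def parse_pins(path):
        """Read name==version pins from a lower-constraints-style file."""
        pins = {}
        for line in open(path):
            line = line.split('#')[0].split(';')[0].strip()
            if '==' in line:
                name, version = line.split('==', 1)
                pins[name.strip().lower()] = Version(version.strip())
        return pins

    def report_mismatches(requirements_path, lower_constraints_path):
        pins = parse_pins(lower_constraints_path)
        for line in open(requirements_path):
            line = line.split('#')[0].strip()
            if not line:
                continue
            req = Requirement(line)
            minimum = next((Version(s.version) for s in req.specifier
                            if s.operator in ('>=', '==')), None)
            pin = pins.get(req.name.lower())
            if minimum is not None and pin is not None and pin != minimum:
                print(f'{req.name}: declared minimum {minimum}, '
                      f'lower-constraints pins {pin}')

    report_mismatches('requirements.txt', 'lower-constraints.txt')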

actions
  • rosmaita - look into what exactly is the function of lower-constraints; it was supposed to be a guideline for packagers, but we need to keep a balance between what's claimed and what's actually being tested
  • driver maintainers - check the values in setup.cfg for dependencies used by your driver and make sure that the minimum isn't too low; also make sure the values in driver-requirements.txt are consistent with setup.cfg (setup.cfg is what's actually used; the driver-requirements.txt is a legacy convenience file for packagers)

Block Storage API v2 removal

The goal of this session was to make a list of task/owner/completion dates so we can move this along and get it done during this cycle. Ivan (e0ne) is acting as v2 removal coordinator.

other projects using v2

tempest

Whether tempest uses v2 or not is controlled by a variable. Tempest has a tempest-cinder-v2-api job: https://opendev.org/openstack/tempest/src/commit/dd84940a81316129bb6adccf7ae4b5cdcbb37382/zuul.d/project.yaml#L123
At some point during Wallaby, tempest will no longer be able to run this job against cinder master.

action
  • rosmaita - talk to gmann about this and figure out how he wants to handle the change
grenade
action
  • e0ne - is looking into this
devstack
actions
other services

Ivan pointed out that when we removed the v1 API, several other services broke even though all the cinder tests were passing. So we need to test horizon, nova, glance, and heat gates when v2 is disabled.

actions
openstackclient, openstackSDK

Give them a heads-up that if they plan to continue Block Storage API v2 support, they'll have to make arrangements to test against an older cinder release.

actions
  • rosmaita - contact [osc][sdk] on the ML
others?
actions
  • rosmaita - general announcement to the ML

how much v2 code must be removed from Cinder in Wallaby?

  • We need to remove the v2 endpoint.
  • v3 uses a lot of v2 classes, which we'll eventually want to refactor. Before doing this, though, we need to verify that there is good v3 test coverage.
  • documentation
    • remove the v2 api-ref
    • refactor the api-ref functional tests to handle only v3
    • make sure the general documentation does not contain references to any v2-specific stuff

cinder-tempest-plugin

actions

rosmaita - tag if necessary and then patch to make sure it only runs v3

python-cinderclient

We already announced on the ML that we'd be removing v2 support in Wallaby:


This has to happen before the client library freeze, which is the week of 8 March.

The cinderclient is structured similarly to the Block Storage API in that a lot of v2 classes are reused for v3 functionality. This may make removing v2 functionality from the CLI tricky.

actions

somebody - needs to dig in and see what needs to happen

consistent and secure policies

Keystone made some changes back in Queens (enhanced scoping) and Rocky (default roles). Combining roles and scoping provides common "personas" that can be supported by openstack services in the default policy configuration.

There's a patch up that describes what personas cinder will support, and which provides a matrix of cinder policies and how each of the supported personas should behave by default: https://review.opendev.org/c/openstack/cinder/+/763306

This will give us a document under version control that describes the expected behavior and will make it easier to verify that the new policies are doing the correct thing.

In reviewing the permissions matrix, Rajat noticed that the default policy to reset the status of a group snapshot is currently admin-or-owner, which seemed weird. A bit of digging revealed that this may have been a mistake when the old policy.json was converted to policies-in-code:

We had a quick discussion over whether anyone remembered if this was an intentional change (nobody remembered). Eric pointed out that exposing this to end users can be dangerous if they set the state to something like 'CREATING'. The consensus was that this should really be an administrative action by default; we should treat the current default as a bug, fix it, and backport the fix as far as possible.
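
As a rough illustration of what the fix would look like in policies-in-code (the policy name and check strings below are examples using oslo.policy, not necessarily cinder's exact definitions), the default for this action would move from an owner-or-admin rule to an admin-only rule:

    # Illustrative oslo.policy default; the policy name and check strings
    # are examples, not necessarily cinder's exact definitions.
    from oslo_policy import policy

    reset_group_snapshot_status = policy.DocumentedRuleDefault(
        name='group:reset_group_snapshot_status',
        # current (likely unintended) default:
        #     check_str='rule:admin_or_owner',
        # proposed default, treating this as an administrative action:
        check_str='rule:admin_api',
        description='Reset status of a group snapshot.',
        operations=[{
            'method': 'POST',
            'path': '/group_snapshots/{g_snapshot_id}/action (reset_status)',
        }],
    )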

We discussed a few other possible changes, but the key takeaway is that even though the "project-admin" persona has "admin" in its name, such a person is *not* an admin in the way we're used to thinking of an "admin". This is discussed in more detail on the permissions matrix patch: https://review.opendev.org/c/openstack/cinder/+/763306

actions

Windows RBD os-brick connector

Lucian (lpetrut) led a followup discussion of a topic he brought up at last week's Cinder meeting, namely adding a mixin class to os-brick to provide a simplified RBD connector for Windows given that Ceph Pacific will support mounting RBD images on Windows.


There was a concern that the Windows RBD connector would have to be version-aware, or should maybe pass version info through the connection properties. We discussed this a bit; it wasn't clear that having this information would actually be helpful, and it might expose information via the connector that we don't want exposed. Also, Nova might use cached connector properties that contain old ceph version info. The consensus was that we don't need to pass the ceph version in the connection info at the current time.

review of proposed specs

migration support for a volume with replication status enabled

https://review.opendev.org/c/openstack/cinder-specs/+/766130

The problem is that the current workflow disallows migration of a volume that has replication enabled, and that check happens too high in the stack. The way cinder currently handles multiattach isn't a good model for what needs to be done here. Gorka suggested that this should probably be done the way snapshots are currently handled: go through the scheduler to find out whether the operation will be allowed. He dug up two old patches from when snapshots were changed to work this way:

This looks like a promising model for the new spec.

storing volume format info

https://review.opendev.org/c/openstack/cinder-specs/+/760999

The format info will be stored as admin metadata, but shouldn't be exposed in the volume-show call; instead, it should show up in the Attachments API.

We had a brief discussion about what "admin metadata" is supposed to be, because it's ambiguous whether it's arbitrary info that administrators can add to volumes, or info that's required for administering the volume. The consensus was that it should be more like the latter; some drivers use admin metadata to hold useful info for themselves. So the general idea is:

  • drivers can add stuff in the admin metadata
  • system-admins can see it for troubleshooting
  • system-admins should not touch fields that they themselves did not create

Session Two: R-9: 10 February 2021

We met in BlueJeans from 1400 to 1600 UTC.
etherpad: https://etherpad.openstack.org/p/cinder-wallaby-mid-cycles
recording: <not yet available>

Cinder Project Business

FS backend format info documentation brainstorming

Background: in NFS-based backends, volume data may be stored as raw or qcow2 files, though this is supposed to be transparent to the user. Cinder uses `qemu-img info` to figure out what format the volume is stored in before it does operations on the volume. The problem is that it's possible for a volume consumer to store a qcow2 file in their volume, and in that case, Cinder is fooled into thinking that the *volume* is itself in qcow2 format and acts accordingly, when it should be treating the file as raw format.

The Wallaby spec addressing this issue is https://review.opendev.org/c/openstack/cinder-specs/+/760999, which adds volume admin metadata specifying the "real" format of the file representing a volume on an NFS-based backend. The question here is: what do we do about legacy volumes on an NFS-based backend?

  • we could execute `qemu-img info` on the volume and store the result as the volume's format. The problem with this idea is that `qemu-img info` is giving us inaccurate info now (which is why we have a bug), and we might store the incorrect result, which would lead to incorrect behavior when cinder operates on that volume
  • the agreement is that the current behavior is probably OK for legacy volumes
  • there is a question about what to do when managing volumes on an NFS-based backend. We need to have a way to get the format info into cinder
  • one way to avoid the problem completely is to set the nfs_qcow2_volumes option to True; Cinder will then assume that all volumes on an NFS-based backend are qcow2 files and will treat them correctly no matter what the volume contents are.
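
For context on why probing is unreliable for legacy volumes, here is a standalone illustration (not cinder code) of the problem described above: a raw file whose contents happen to be a qcow2 image is reported as qcow2 unless the known format is forced.

    # Standalone illustration (not cinder code): a raw volume file whose
    # contents are a qcow2 image is reported as qcow2 by format probing,
    # which is exactly how cinder gets fooled.
    import json
    import shutil
    import subprocess

    # Build a small qcow2 image, then copy its bytes into a "raw" volume file.
    subprocess.run(['qemu-img', 'create', '-f', 'qcow2', 'guest.qcow2', '1M'],
                   check=True)
    shutil.copyfile('guest.qcow2', 'volume-0001')

    # Probing (no -f flag) trusts the file contents ...
    probed = json.loads(subprocess.check_output(
        ['qemu-img', 'info', '--output=json', 'volume-0001']))
    print(probed['format'])   # -> 'qcow2'

    # ... whereas forcing the format cinder knows the file to be gives the
    # right answer; recording that format is what the spec proposes.
    forced = json.loads(subprocess.check_output(
        ['qemu-img', 'info', '-f', 'raw', '--output=json', 'volume-0001']))
    print(forced['format'])   # -> 'raw'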


Eric mentioned that we need to make sure that this doesn't impact encrypted volumes (probably not, but we should make sure). Some current NFS encryption patches:

Avoid raw to raw conversions when creating volume from raw image

Rajat wanted to discuss this part of the code: https://github.com/openstack/cinder/blob/master/cinder/image/image_utils.py#L716-L720

When creating a volume from an image, we download the image and convert it to raw format (because Glance can store images in various formats) before writing it into the volume. Obviously, this is wasteful when the Glance image is already in raw format, not just in CPU power to do the conversion, but also with respect to the tmp space where the downloaded image is stored. Operators have encountered problems when the tmp space fills up.
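
The idea under discussion, reduced to a sketch (a simplified illustration, not the actual cinder.image.image_utils code): skip the conversion entirely when the source is already raw, and otherwise convert using the known source format.

    # Simplified sketch of the idea (not the actual image_utils code):
    # only convert when the downloaded image is not already raw.
    import shutil
    import subprocess

    def write_image_to_volume(image_path, volume_path, source_format):
        if source_format == 'raw':
            # Nothing to convert: copy the downloaded bytes straight to the
            # volume instead of spending CPU and tmp space on a raw-to-raw
            # conversion.
            shutil.copyfile(image_path, volume_path)
            return
        # Otherwise convert, forcing the source format rather than probing.
        subprocess.run(['qemu-img', 'convert', '-f', source_format,
                        '-O', 'raw', image_path, volume_path], check=True)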

On that note, Eric has a WIP patch up that would use sparse files in tmp space so that more images can fit before running out of disk space. Another issue is that cinder currently doesn't check how much space is available before it starts the download-and-convert process. We could do that check before filling up the tmp space and crashing. (This is an open task for someone interested.)
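
A pre-flight check along the following lines is roughly what's being suggested; cinder has no such helper today (that's the open task), so the names here are purely illustrative.

    # Sketch of the missing pre-flight check (names are illustrative;
    # cinder has no such helper yet): refuse to start the download and
    # conversion when the temporary directory cannot hold the image.
    import shutil

    class NotEnoughTmpSpace(Exception):
        pass

    def check_tmp_space(tmp_dir, required_bytes, headroom=1.1):
        """Raise if tmp_dir lacks room for roughly required_bytes of data."""
        free = shutil.disk_usage(tmp_dir).free
        needed = int(required_bytes * headroom)
        if free < needed:
            raise NotEnoughTmpSpace(
                f'{tmp_dir} has {free} bytes free; about {needed} needed')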

This is a well-defined task that could be a starter project for Google Summer of Code. (The deadline for GSoC proposals from the mentor side is 19 Feb.)

"ospurge" via openstackclient via openstacksdk

There's a patch up adding a purge command to openstackclient using functionality that has already merged into openstacksdk. It would be nice for the Cinder team to review what this does and list what enhancements it needs so that operators using it will get a thorough purge of the resources consumed by a project that's being deleted.

This came up on the ML: http://lists.openstack.org/pipermail/openstack-discuss/2021-January/019874.html The use case in the email is: "I was wondering what the right approach to clean up all resources of a project was in order for it to be deleted?"

There's currently code in openstacksdk that does a purge operation for cinder: https://opendev.org/openstack/openstacksdk/src/branch/master/openstack/block_storage/v3/_proxy.py#L547

There's a patch up to openstackclient that exposes this functionality in OSC: https://review.opendev.org/c/openstack/python-openstackclient/+/734485

We took a quick look at the code to see what else would need to be added to delete all cinder resources and database records associated with a project. Some missing items are:

  • quotas
  • groups
  • transfers
  • private volume type with only this project having access
  • group snapshots
  • qos
  • attachments

Additionally, the SDK code is doing the right thing by deleting backups and snapshots first and then deleting volumes, but it could use the cascading delete feature for volumes, letting cinder handle the backup and snapshot deletion for each volume. This functionality has been available in Cinder for a long time (it's in v2 and hence in v3.0, so it doesn't require a microversion to use).
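
For reference, a cascading volume delete is just a flag on the delete call. A rough sketch with python-cinderclient (assuming an authenticated keystoneauth session named `sess`, and that the installed client exposes the cascade flag) looks like this:

    # Rough sketch (assumes an authenticated keystoneauth session `sess`
    # and a client that exposes the cascade flag): delete each volume in
    # the project with cascade=True so that cinder cleans up the volume's
    # dependent resources as part of the same request.
    from cinderclient import client

    cinder = client.Client('3', session=sess)

    for volume in cinder.volumes.list():
        cinder.volumes.delete(volume, cascade=True)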

A question came up about whether the sequencing (with respect to other services) is correct, for example, you need to delete instances first so that any attached volumes can be released. Similarly, if Glance is using the cinder glance_store, you should delete the images first (which would then delete the volumes). This brought up another question, namely, what do you want to do if some of those images are public or shared with other projects? It's possible that the current strategy only addresses one use case really well, but could leave an operator in a bad situation for other use cases.

The conclusion of the discussion is that there is an opportunity for someone in the cinder community to "interface" with the OSC and SDK teams to work out useful project-purge functionality, figure out what the feature gaps are between python-cinderclient and OSC, and pin down what exactly is being implemented in the OpenStack SDK.

Refactoring cinder.utils / volume_utils

Eric wanted to raise this issue that has been discussed at previous cinder meetings, reminding us that we agreed this was a good idea, and that there are patches that need to be reviewed, to wit:


Note that there may be a race condition between these patches and the new driver patches which may require some changes.

Block Storage API v2 removal

We are still committed to doing this.

rosmaita will post a patch removing v2 support from the python-cinderclient.

e0ne will report on the status of removing v2 from cinder at the next cinder meeting.

Backup drivers issue with the container parameter

Gorka discussed this on the ML: http://lists.openstack.org/pipermail/openstack-discuss/2021-January/020124.html

The consensus is that Gorka has identified a real problem here, and that his proposed solution makes sense, namely:

  • Create a policy for accepting the `container` parameter on the API (defaulting to allow for backward compatibility).
    • maybe drop this param and introduce a new one: --backup-options which will have backend-specific options
  • Add a new configuration option `backup_container_regex` to control acceptable values for the `container` (defaults to `.*` for backward compatibility).
    • can error out right at the API level; can include the regex in the response to say "your container must be named like this" (see the sketch after this list)
    • this would happen *after* the policy check to see if the user is even allowed to specify a container
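
A minimal sketch of what the API-side check could look like; `backup_container_regex` is the option name from the proposal, while the surrounding helper is illustrative and not an actual cinder patch.

    # Minimal sketch of the proposed check; backup_container_regex is the
    # option name from the proposal, everything else is illustrative.
    import re

    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opt(cfg.StrOpt(
        'backup_container_regex',
        default='.*',   # accept anything, for backward compatibility
        help='Regex that user-supplied backup container names must match.'))

    def validate_container(container):
        """Called after the policy check that allows passing a container."""
        pattern = CONF.backup_container_regex
        if container is not None and not re.fullmatch(pattern, container):
            # Include the regex in the error so the user knows what an
            # acceptable container name looks like.
            raise ValueError(
                f'Invalid container {container!r}: must match {pattern!r}')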

Consistent & Secure RBAC status

mypy

Test with current versions of Ceph

Reviewing XS patches

CI failures unittest

R-9 Bug Review