
CinderVictoriaPTGSummary


Introduction

This page contains a summary of the subjects covered during the Cinder project sessions at the Victoria PTG, held virtually June 2-5, 2020.

The Cinder Team at the Victoria (Virtual) PTG, June 2020.


This document aims to give a summary of each session. More context is available on the cinder PTG etherpad: https://etherpad.opendev.org/p/victoria-ptg-cinder


The sessions were recorded, so to get all the details of any discussion, you can watch/listen to the recording. Links to the recordings are located at appropriate places below.

Tuesday 02 June: Current Cinder Issues

recordings

Discuss the security issue when an attached volume exposes host information to non-admins

Discussion led by whoami-rajat

This was about https://bugs.launchpad.net/cinder/+bug/1740950 and Related-Bug: https://bugs.launchpad.net/cinder/+bug/1736773

The compute host is leaked in the volume-detail response, and also in the REST Attachments API.

We've already agreed that the volume-detail response will only display the host when called in an administrative context (the API response is already allowed to have a null value there, so there is no API change).

For the Attachments API response, we need to do some research to see whether this field is needed and in what contexts. It doesn't look like Nova needs it, and it doesn't look like any of our os-brick connectors use it, but on the other hand, the info has been available in the response for so long that we need to be careful: something could be using it. We also have to be careful about what info is left out of the response, because if it's too restricted, the caller won't be able to connect to the backend.

Conclusions

  • short term, we need to get policies in place to govern this (a rough sketch of the idea follows this list)
  • open issue: the proposal was to keep the default behavior, but that continues to allow the compute host name to leak; it would probably be better to ship a more restrictive default policy (since we don't believe any services depend on this info), and let operators relax the policy for users whose use cases require it
  • ACTION - whoami-rajat to continue working on this
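
As a rough, hedged sketch of the short-term idea (the rule name and check string below are illustrative, not necessarily what will merge), the policy could be defined with oslo.policy along these lines:

from oslo_policy import policy

# Illustrative only: gate visibility of host information behind an
# admin-only rule that operators can relax in their policy file.
SHOW_HOST_FIELDS = policy.DocumentedRuleDefault(
    name='volume:show_host_fields',          # hypothetical rule name
    check_str='rule:admin_api',
    description='Show backend/compute host fields in volume detail '
                'and attachment responses.',
    operations=[
        {'path': '/volumes/{volume_id}', 'method': 'GET'},
        {'path': '/attachments/{attachment_id}', 'method': 'GET'},
    ],
)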

Sizing encrypted volumes

Discussion led by eharney and enriquetaso

When you create an encrypted volume, you lose some space for the LUKS header. This has shown up in some migration bugs. If you have an unencrypted volume and migrate to an encrypted type, it fails because there isn't enough space on the target to fit the entire source volume.


The exact amount of space lost isn't clear; it may vary by LUKS format and by connection type (iSCSI with cryptsetup vs. RBD with native LUKS). It's probably on the order of 1-2 MB, but we allocate space in GB. Whether you can do really fine-grained space allocation probably depends on the backend containing the volume. We also need to keep the gigabyte boundaries for accounting purposes (usage, quotas, charges, etc.) because that's what everyone's tooling expects.

Walt suggested that we change the API to allow a new size to be specified during a volume migration. For retype, Gorka suggested that we could allow a flag similar to the current migration_policy in the retype request, which gives the user the option to say it's OK to perform a migration if it's required. We could have a flag indicating that it's OK for the volume to be increased in size if it's required for the retype to succeed. This way we don't go behind the user's back -- we fail if the volume won't fit, and allow them to retype to a new size if they want the retype to succeed.
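
For illustration only, the request body for such a retype might look roughly like the following; "new_type" and "migration_policy" are existing fields of the os-retype action, while the "allow_resize" flag is hypothetical and its name would be settled in a spec:

retype_request = {
    "os-retype": {
        "new_type": "encrypted-type",      # target (encrypted) volume type
        "migration_policy": "on-demand",   # existing flag: migration allowed if needed
        "allow_resize": True,              # hypothetical: OK to grow the volume if required
    }
}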

Conclusions

  • Take the Walt/Gorka approach. It will allow us to fix a bunch of bugs, and we can move to something more elaborate later if there's demand for it.
  • ACTION - Sofia - add a note to the docs explaining that retyping an unencrypted volume to the same size encrypted volume will most likely fail (so that users know not to try this in older releases)

Quotas!!!

Discussion led by rosmaita

The problem is that there are race conditions and bugs in the code, so operators see things like duplicate entries in the quotas table, incorrect usage reports, etc.

There is a spec proposed for removing the quota usage cache (database table): https://review.opendev.org/730701

The idea is that instead of trying to keep this table updated, we determine usage on the fly. Some people are suspicious that this could cause a performance problem, but Eric thinks that with appropriate indices, this would be very fast for a database to compute. It would certainly be a very small amount of time relative to the time it takes to actually build a volume.
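
A minimal sketch of the compute-usage-on-the-fly idea (this is not the code from the spec; the query assumes cinder's volumes table with its project_id, size, and deleted columns):

from sqlalchemy import text

def gigabytes_in_use(session, project_id):
    # Sum the sizes of all non-deleted volumes owned by the project.
    # With an index covering (project_id, deleted) this is a cheap query
    # compared to the time it takes to actually create a volume.
    return session.execute(
        text("SELECT COALESCE(SUM(size), 0) FROM volumes"
             " WHERE project_id = :pid AND deleted = 0"),
        {"pid": project_id},
    ).scalar()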

There may still be some race conditions we need to worry about with the new approach. Our usage code was based on Nova's, but they've rewritten theirs to eliminate the database table. But they also accepted that it's OK for users to occasionally be over quota. We may have to do something similar.

Two related issues came up during the discussion:

  • the keystone "unified limits" work -- this has to do with where a user's limits are stored, but doesn't impact usage computations. Moving the user limits to keystone would be an independent effort
  • the REST API currently returns 413 (or sometimes 500) for over-quota requests. The API SIG suggests returning 403 instead of 413, but there doesn't seem to be much point in changing: you'd need a new microversion to return the "correct" response code, but callers would still have to handle the "incorrect" one when that microversion isn't available, so we might as well stick with what we have. The 500s should be fixed, though (and that doesn't require a microversion change).

Conclusions

  • The approach in https://review.opendev.org/730701 looks promising. It would be nice if we had some performance numbers to assess the impact of the change.
  • Need to figure out what new race conditions we will see and how to deal with them. (Looking at what Nova did should help here.)
  • ACTION - continue discussion on the spec


(The team adjourned early so that the Cinder core security team could review the patches and plan for rolling out the changes for the bugfix for Launchpad Bug #1823200, which would become public the next day.)

Wednesday 03 June: Third Party Drivers

recordings

Bug: Fail to extend attached volume using generic NFS driver

Discussion led by lseki

The bug: https://bugs.launchpad.net/cinder/+bug/1870367

It affects the Quobyte and NetApp drivers that rely on the generic NFS implementation. We need to fix it in two steps:

(1) don't allow extend of an attached volume for nfs drivers: https://review.opendev.org/#/c/725805/

(2) figure out how to do the online extend safely, and then re-enable that functionality

This was working at some point, but has stopped. We suspect changes in qemu/qemu-img, which seem to be doing more locking (around versions 2.10 and 2.11). The hypothesis is that nova locks the file and then the cinder process can't write to it.

We need to research whether there is a qemu operation that we could ask nova to perform for us, for example:

virsh qemu-monitor-command <DOMAIN> --hmp "block_resize <DISK> 2G"

The idea would be to model this on snapshots: nova exposes an API that is implemented through libvirt.

Conclusions

  • continue the two-stage approach, that is, block the operation for nfs drivers so that you don't get failures, and research how to actually fix it
  • seems like we need to get nova cooperation on this, we shouldn't try to get around the locking somehow on our own
  • testing: we run devstack-plugin-nfs-tempest-full in the cinder gates; when ready, we should enable volume-feature-enabled.extend_attached_volume in tempest on that job

Improvement proposal for LVM volume driver, direct attached storage via cinder

The proposer wasn't available, but we discussed the issue anyway because it's come up several times.


The basic idea is that you can get some optimization by opening volumes as devices instead of connecting via iSCSI when they're on the same node. The problem is that this is a very special case.

The team discussed a few ways to do this:

  • introduce this as an optimization in the LVM driver: if you are on the same node, skip the iSCSI layer
    • pro: would allow more functionality
    • con: makes the driver more complex and likely to introduce bugs
  • add an option in cinder to enable "try local connection" on the LVM driver
    • if it's enabled, the driver adds the local device path to the connection info
    • if os-brick receives the path, it tries to use it if it exists; if it doesn't exist, it falls back to the normal iSCSI connection (a rough sketch of this idea follows this list)
  • do something similar in the lio target
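
A rough sketch of the "try local connection" option (the 'device_path' key and the fallback logic below are hypothetical, just to illustrate the flow):

import os

def connect_volume(connection_properties, iscsi_connector):
    # Hypothetical key the LVM driver would add when the option is enabled.
    local_path = connection_properties.get('device_path')
    if local_path and os.path.exists(local_path):
        # Same node: use the LV device directly and skip the iSCSI layer.
        return {'type': 'block', 'path': local_path}
    # Different node (or the path is missing): normal iSCSI attach.
    return iscsi_connector.connect_volume(connection_properties)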


What is the attitude of the project to this?

  • if it's a new target, could clone the lio job and run those tests (because it's a single node case)
    • need to have good documentation, especially about the limitations
  • if it's the "optional" one, it would use the same lio job but with the "try local connection" config enabled
  • bottom line is that we want as little interference with other drivers as possible

Conclusions

  • we want to know more about the use case -- it really sounds like using cinder to manage nova ephemeral storage
  • proposer can bring this up again after addressing the above points

Ceph iSCSI driver

Discussion led by hemna

Walt began work on this a while back but got stalled because some of the dependencies weren't packaged for a distribution, so there wasn't a way to get infra-approved CI for it.

It came up again on the mailing list in April: http://lists.openstack.org/pipermail/openstack-discuss/2020-April/014278.html

Walt wrote an update this morning: http://lists.openstack.org/pipermail/openstack-discuss/2020-June/015237.html

Ubuntu Focal now has packages, so we can aim for Focal as the CI target. Packages may be coming soon for Fedora.

There's an effort to get the ceph devstack plugin updated to a modern version of ceph: https://review.opendev.org/#/c/676722/ -- would be good to help get this merged so we have a modern ceph to work with.

Conclusions

  • Walt has some time to continue with this now; he'll coordinate with the Linaro developers to work on it

Remove volume driver abc classes

Discussion led by eharney

The design of the base driver was to include a bunch of interfaces as abstract classes, for example: https://opendev.org/openstack/cinder/src/commit/be9d5f3fa54b1b7948f44a9c91cf3f4753c0a03b/cinder/volume/driver.py#L1998

It's a whole bunch of code that didn't work out well (abc didn't do exactly what we had hoped), and it would be good to get rid of it. (This has been a TODO for a few years.)

The general idea is to start removing these various "xxxVD" abstract classes and have a single volume driver class instead of all these subclasses with abc on them.

General observations:

  • probably phase this over a cycle or so
  • will definitely affect out-of-tree drivers, will need to publicize this
  • maybe leave the classes there, but remove abc from them and make them no-op classes
  • change in-tree drivers to inherit only from the base driver class, instead of inheriting from the base class plus the various xxxVD/yyyVD classes (see the rough sketch below)
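
A rough illustration of the direction discussed (this is not actual cinder code, just the shape of the change):

# Current style (simplified):
#     class FooDriver(driver.BaseVD, driver.CloneableImageVD, driver.MigrateVD):
#         ...
#
# Proposed style: a single base class; optional features are ordinary methods.
class BaseVolumeDriver(object):
    def create_volume(self, volume):
        raise NotImplementedError()

    def migrate_volume(self, context, volume, host):
        # Optional capability: a driver that doesn't implement it simply
        # reports that it could not do the migration.
        return (False, None)


class FooDriver(BaseVolumeDriver):
    def create_volume(self, volume):
        pass  # backend-specific implementation goes here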

We have two sets of classes that describe the interface for drivers:

  • the abc classes in cinder.volume.driver
  • the classes in cinder.interface

It would be better to have a single source of truth: one interface class that defines what you need to do to implement a driver.

Conclusions

  • doesn't seem to be conceptually difficult, but will touch a lot of code
  • e0ne is interested in seeing this happen
  • need to work out the points in "General observations" above more clearly before starting, but the general sense of the community is this is a good idea

Backup Driver issues

Discussion led by Shatadru

Well, one issue anyway. The issue is the container parameter for the backup drivers. It's handled differently depending on what driver you're using:

  • swift creates a bucket
  • nfs creates a directory
  • ceph uses it as a pool name


What happens with Ceph is that if the pool doesn't exist, the backup will eventually fail, which can be mystifying to users:

  • they expect it to act like the other drivers, where you can use it to organize your backups
  • it's not documented anywhere that for Ceph, you need to give it an existing pool name or your request will fail


Most operators using Ceph probably don't want this used by end users because they've probably set aside a particular pool they want used for backups and don't want users to put their backups in some other pool. Also, they probably don't want users to even know any of the pool names.

We would like a configuration setting that allows the Ceph driver to ignore the container if it's passed. The problem is that this doesn't make much sense for the other backend drivers, so we don't want to make it a global setting -- and that means it won't be the kind of thing we can detect and reject at the API layer.

It's a real problem for Ceph because an end user can probe to find out what pools they have access to, and they can put things into the "wrong" pool. But it seems perfectly OK for the other drivers.
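
A minimal oslo.config sketch of the kind of option being discussed; the option name below is a guess along the lines of the 'ignore_container' suggestion (nothing here is merged code), while backup_ceph_pool is the existing option that names the dedicated backup pool:

from oslo_config import cfg

ceph_backup_opts = [
    cfg.BoolOpt('backup_ceph_ignore_container',
                default=False,
                help='When enabled, ignore any container supplied by the '
                     'user on backup create and always store backups in '
                     'the pool configured via backup_ceph_pool.'),
]

cfg.CONF.register_opts(ceph_backup_opts)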

Conclusions

  • should add a configuration option for the Ceph driver, something like 'ignore_container' and then the configured backup pool is used even when a user specifies something else
  • we'll track this as a Launchpad bug: https://bugs.launchpad.net/cinder/+bug/1881934
  • ACTION - Shatadru - will put up a documentation patch
  • ACTION - rosmaita - will update the bug with this discussion to point the direction the fix should take

Backporting 'supported' status

We allowed this during Ussuri in https://review.opendev.org/#/c/700775/, though that may have been an oversight. Then we did it again with https://review.opendev.org/#/c/718758/ because a precedent had been set. We just want to verify that we think this is a good idea.

Conclusions

  • it's OK to do this, but we'll only allow a backport to the most recent stable branch
  • need to see evidence that the driver actually runs correctly in that stable branch (because most of the third-party CI only test master)
  • ACTION - rosmaita - update the driver policy to clearly state the above

Keeping 'unsupported' drivers in-tree

We changed our next-cycle-removal policy in January 2020 to keep the drivers around longer, for the convenience of operators who want to use the driver, but also in the hope that the vendor would ultimately take responsibility for getting them 'supported' again. And in fact that did happen during Ussuri: several drivers that had problems meeting the CI-must-run-python-3 requirement in Train got their CIs fixed in Ussuri and their drivers marked 'supported' again.

However, we ran into a problem that caused some reviewers to express second thoughts, namely, a cinder feature entailed a change in some drivers, including at least one 'unsupported' driver: https://review.opendev.org/#/c/720722/

So the point of this discussion is to make sure that people are on-board with the idea of keeping 'unsupported' drivers in tree, even if that means the cinder team (not the specific driver maintainer) may have to make changes to the driver to support general cinder features.

Conclusions

  • we already know that we need to keep the 'unsupported' drivers passing tests, which means the cinder team (not the specific driver maintainer) may have to make changes to the driver code; adding code to support a cinder feature doesn't seem relevantly different
  • the current policy is that we can remove an 'unsupported' driver if maintenance becomes a burden, so we will continue to assess this on a case-by-case basis
    • the case-by-case approach is because, in the case discussed above, some unsupported drivers were fixed simply because they inherited from a class that was fixed, whereas at least one unsupported driver required its own change. If such a change were complicated, it would make sense to keep the former but remove the latter.
  • ACTION - looks like this is already covered by the current policy, so no action needed on this one

Ussuri Development Cycle Retrospective for Driver Developers

We didn't get a lot of feedback.

Broken RBD live migration

This wasn't actually a discussion, someone left a question on the etherpad and Gorka answered it. See the etherpad for details: https://etherpad.opendev.org/p/victoria-ptg-cinder

Thursday 04 June: Cinder Improvements

recordings

Recent Security Issues

Quick discussion of OSSN-0086, which was announced yesterday. We had a similar issue last cycle, which is described in OSSN-0085.

Conclusions

  • It would be good for driver maintainers to look over those OSSNs and make sure that their drivers aren't vulnerable to a similar problem.

Reviewing our reviewing strategies

Discussion led by eharney

Eric reminded the team about 'tox -e cover' and that you can use it to make sure that the code you're reviewing is being tested. It helped him find a problem with some code that looked like it was being tested, but actually was not.

This prompted a discussion of using spec or autospec when mocking things: without either, mock will let you call non-existent methods on a mocked object without complaining, which was the problem here. It allowed a call to .has_calls() to look like it was doing something when it wasn't (the correct method is .assert_has_calls(), which would have made the test fail, as it should have).
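
A small standalone illustration of the trap (the class and method names are generic, not from the cinder tests):

from unittest import mock

class Client(object):
    def send(self, data):
        pass

# Without a spec, attribute access silently creates child mocks, so a
# misspelled assertion like has_calls() "passes" without checking anything.
m = mock.Mock()
m.send('x')
m.has_calls([mock.call.send('x')])          # returns a Mock; asserts nothing

# With a spec (or autospec), attributes are restricted to the real object,
# so the typo is caught immediately.
m2 = mock.Mock(spec=Client)
m2.send('x')
m2.assert_has_calls([mock.call.send('x')])  # a real assertion
# m2.has_calls(...) would raise AttributeError, exposing the mistake.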

There was also some discussion about the pylint job, which is always failing; it would be good if it ran only on the changes in the current patch so that you'd know if the failure was being caused by the patch you're reviewing.

Conclusions

  • would be good to have the coverage results right there on the review; we can add a non-voting coverage job in Victoria and revisit at the next PTG to see whether people are finding it useful
  • ACTION - rosmaita - set up the job for ^^
  • we should make "patch" and "mock_object" methods in "TestCase" use "autospec=True" (though if you do it now, you get 77 unit test failures)
  • ACTION - ^^
  • need to revisit the pylint job, because it can detect problems, but in its current state it isn't that useful
  • ACTION - ^^
  • some teams have implemented a before-and-after coverage test with a threshold (so that you don't allow changes that decrease coverage)
  • ACTION - ^^
  • what it comes down to is that these static analysis jobs aren't going to increase the amount of time it takes for the current zuul "check" to complete, so they're worth a try

Interop WG interlude

This was a reminder from the Interoperability Working Group to make sure they are up-to-date with respect to Cinder. (We haven't had any major REST API changes during Ussuri, so we are probably OK.) A recent change removed Block Storage API v2 from the interop tests, which is fine, since the v2 API was deprecated in Pike.

Conclusions

  • the Interop WG Cinder project liaison position is still open!
  • ACTION - rosmaita - verify that the functionality covered by the removed v2 tests continues to be covered by the current v3 tests

Looking at type annotation

Discussion led by eharney

Eric discussed using type annotations, which are part of the Python 3 language, to help improve code by allowing static checking of parameters and return types.

He's got a patch up to introduce a 'mypy' tox environment that will execute mypy type checking against cinder code: https://review.opendev.org/#/c/733620/
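
A trivial example of the kind of annotation mypy can check statically (not taken from the patch):

from typing import Optional

def backup_file_path(volume_id: str, mount_point: Optional[str] = None) -> str:
    # mypy will flag callers that pass the wrong types or that treat the
    # return value as something other than a str.
    base = mount_point if mount_point is not None else '/var/lib/cinder'
    return '%s/%s.backup' % (base, volume_id)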

References:

The question came up that code using these annotations won't be directly backportable to stable branches that still support Python 2.7, and whether that conflicts with our policy of not using Python 3-only language features in code used by drivers. The consensus was that we should use Python 3. Some backports may be more complicated, but that's the way it goes.

Conclusions

Cinderclient CLI usability issue

Discussion led by rosmaita

Many Block Storage API requests return a 202 (Accepted) response. This doesn't really tell you whether your request has succeeded or not, and end users don't always understand this.

One thing we can do is enhance the 202 response so that it gives a user a hint about what to do next. In the cinderclient, we can add the --poll option to more commands.

The question came up, maybe we shouldn't enhance the cinderclient, instead we should put it into the openstackclient. And that led to ...

OpenStack Client tangent

Because we were talking about working on the cinderclient, the question came up: what about the openstackclient? Since Jay is on the TC, and the TC is thinking about the unified client issue, we talked about this a bit so he could bring our input back to the TC. Here are a few points; see the etherpad for more.

  • There is a feature gap right now between the openstackclient and the cinderclient. We can't just stop work on the cinderclient and instead put new commands into the openstackclient because a user will still need to use the cinderclient for the "gap" features that aren't in openstackclient yet. Before we stop work on the cinderclient, the openstackclient must have feature parity with the cinderclient.
  • People want access to new features now, not next cycle or the one after that. While it would be nice to have a unified client, when faced with the choice between the convenience of a CLI that doesn't do everything, and the inconvenience of having to use a different client for cinder in order to use new features, new features are going to win.
  • The osc, openstack sdk, and shade have been promoted as "opinionated" tools. While that may be appropriate for some users, operators at least are going to want full access to what the APIs offer. So it's not obvious that a move to the openstackclient makes sense until it's clear what the design philosophy of these tools is.

Conclusions

  • At this point, it doesn't make sense to stop working on the python-cinderclient.
  • To get back to the session topic, it doesn't look like it would be too difficult to implement --poll for other resources.

Community Goals

Discussion led by Jay Bryant.

There is one goal approved so far for Victoria: tosky's goal of moving all CI jobs to native Zuul v3 jobs. That's good from the cinder point of view, because tosky has already converted most of our jobs.

The likely second goal is to move all CI/CD jobs to Ubuntu 20.04 (Focal): https://review.opendev.org/#/c/731213/

For the W cycle, the goals on https://etherpad.opendev.org/p/YVR-v-series-goals are all candidates. An early favorite is the removal of rootwrap. There's some talk about the openstackclient, but that will probably require more discussion (see the previous session).

Ussuri Development Cycle Retrospective

Didn't get a lot of feedback on this one, either. One issue that did come up was that while we might not be able to speed up reviewing, we could make reviews more predictable.


(We adjourned early for the Virtual Happy Hour, which was not recorded.)

Friday 05 June 2020: XP & TCB (cross-project and taking care of business)

recordings

Cinder/Glance creating image from volume with Ceph

Discussion led by abhishekk

When glance doesn't know the size of an image being created in advance and is using the Ceph backend, you get thrashing because it acts conservatively and makes really small resize requests as the image is being written. At the Shanghai summit, we discussed two approaches to fix this: (1) pass the size in the request (since cinder already knows how large the volume needs to be; this is an Images API change), or (2) make a glance_store change (issue much bigger resize requests, say 1 GB at a time, and then shrink at the end). In Shanghai, the cinder team was satisfied with solution (2). The Glance team didn't have time to implement it in Ussuri and wanted to make sure we're still good with that option. We are.

Make cinder store of glance compatible with multiple stores

Discussion led by abhishekk

Glance now has the ability to offer different backend image stores to users. One use case is for an operator to offer different tiers which are presented to users as a store_id and a description (for example, 'bronze': 'Less expensive storage suitable for whatever'; an operator doesn't need to expose the backend technology to end users, just the fact that there are different stores available). If an operator is using the cinder glance_store driver, currently only one volume_type can be configured for use by that store, even though there may be multiple storage tiers defined by various volume_types in cinder.

It looks like one way to address this would be to continue to use a single cinder glance_store, but give it the ability to pass different volume_types when it makes a volume-create request. The volume_type would not be exposed directly to end users -- instead, each volume_type could be associated with a store_id in the glance stores-info response so that the stores abstraction layer would not be broken.
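
Purely to illustrate the mapping idea (the names below are made up, and the real configuration mechanism would live in glance_store):

STORE_TO_VOLUME_TYPE = {
    'bronze': 'cheap-nfs-type',
    'gold': 'fast-ssd-type',
}

def volume_type_for_store(store_id, default_type=None):
    # glance_store would look up the volume_type configured for the requested
    # store before making the volume-create call to cinder; end users only
    # ever see the store_id and its description.
    return STORE_TO_VOLUME_TYPE.get(store_id, default_type)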

Gate job for glance cinder store

Discussion led by whoami-rajat

Rajat says:

Currently we support a cinder store in glance, but there isn't any gate job to verify that image creation/deletion works as expected when glance receives a request. I have been working on a gate job on glance_store for cinder [1] (which is finally passing), but I feel it would make sense to have a job on the glance gate to test the whole flow of image creation when glance receives a request.

[1] https://review.opendev.org/#/c/329825/

Conclusions

  • The cinder glance_store seems to be increasing in use; both the Cinder and Glance teams agree that it would be good to make sure it's being tested well.

Glance image co-location

Discussion led by jokke_

In Ussuri, cinder implemented a scheme to assist Glance with image co-location. What this means is that if a volume was created from an image, and that volume is later uploaded as an image, cinder passes along the image_id of the original image so that Glance can determine what store(s) the original image is located in and put copies of the new image in the same locations. The Glance team would like to modify the implementation on the cinder side so that instead of passing the base_image_ref as a header in the image-create call to glance, cinder makes a call to the glance Import API when the base_image_ref is present. This would enable Glance to reuse a code path that is better tested and easier to maintain.

Conclusions

  • The proposal seems reasonable. It shouldn't make the image-create take longer on the cinder side; the Import API returns a 202 (Accepted) and does its work asynchronously, so it would just be a quick API call. We need to look more closely, though.

Refresh connection_info

Discussion led by eharney (lyarwood couldn't be here)

We briefly discussed the proposal Lee outlined on the ML and in a comment on a review:


It seems reasonable. Gorka mentioned that there's an email thread out there where he discusses some aspects of this in more detail.

Conclusions

  • Follow up with lyarwood when he gets back about putting this into a spec
  • Dig up Gorka's email thread to make sure those points are addressed

DR operations and broken volume connections

Related topic introduced by sfernand

Disaster recovery operations leave broken volume connections in the VMs. Maybe it could be useful for nova to automatically reestablish iSCSI connections after a failover/failback?

  • could maybe do this if multipath is enabled ... but is not going to be easy
  • Fernando is interested in working on this ... Gorka volunteered to talk through the problem with him and bring in Lee at some point because he's familiar with this too
  • work items:
    • Cinder drivers must report whether they do async or sync, A/A (nothing to do since the multipath already has access to both paths) or A/P (we need to refresh the connection) replication
    • Cinder must either tell Nova which volumes have a new "source" or automatically update the initialized connection and tell Nova to refresh specific instances
    • OS-Brick needs to be able to switch from the old to the new connection as cleanly as possible

Dynamic front-end QoS

We have the ability to specify front-end QoS specs in Cinder that are applied by Nova when a volume is attached. There have been requests for a feature that would allow these to be changed while a volume is attached. This would be useful when QoS specs are changed in Cinder, or when a volume is extended and is using iops_per_gb QoS specs.
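
For context, a front-end QoS spec in cinder looks roughly like this (the values are illustrative); consumer='front-end' is what tells Nova/libvirt to enforce the limits when the volume is attached:

gold_frontend_qos = {
    'name': 'gold-frontend',
    'consumer': 'front-end',
    'specs': {
        'total_iops_sec': '1000',        # enforced via libvirt iotune on attach
        'total_bytes_sec': '10485760',   # 10 MiB/s
    },
}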

This would require:

Conclusions

  • There was widespread agreement that this is a great idea, but no one present expressed interest in picking it up

Victoria spec review

Technical debt

Cycle Priorities and Schedule