CinderZedPTGSummary

Tuesday 05 April

recordings

For the benefit of people who haven't attended, this is the way the cinder team works at the PTG:

sessions are recorded
please sign in on the "Attendees" section of this etherpad for each day
all notes, questions, etc. happen in the etherpad; try to remember to preface your comment with your irc nick
anyone present can comment or ask questions in the etherpad
also, anyone present should feel free to ask questions or make comments during any of the discussions
we discuss topics in the order listed in the etherpad, making adjustments as we go for sessions that run longer or shorter
we stick to the scheduled times for cross-project sessions, but for everything else we are flexible

Release cadence discussion: tick-tock model

https://governance.openstack.org/tc/resolutions/20220210-release-cadence-adjustment.html

There was a PTL-TC session about this on 4th April, 2022 (Monday) and the following points were discussed:

This only affects the upgrade path and not the release model which remains same (i.e. 6 months)
It proposes a tick-tock release model where if a release is tick, the subsequent release will be tock and so on
This new effort provides the ability to upgrade from tick->tick release (skipping one release) but we cannot upgrade (directly) from tock-> tock release
There is a job in place, grenade-skip-level, that will run on tick releases and check upgrade from tick-> tick release (or N-2 to N release)

There's a patch up by Gorka documenting the impact of the new release cadence on Cinder and it requires changes based on the discussion at PTG about the following points: Patch: https://review.opendev.org/c/openstack/cinder/+/830283

removal of configuration option
deprecating or removing a driver
Python version support
Backports (works the same way)
New deprecation policy (deprecation policy in the project team guide: https://docs.openstack.org/project-team-guide/deprecation.html)

There will be a 2 cycle deprecation process which we can see with the following example, Suppose we have a config option "cinder_option_foo" deprecated in AA (tick), we need to continue the deprecation process in BB (tock), then we can remove that option in CC (tick + 1).

conclusions

action: geguileo to update the patch with the current discussion points

Best review practices doc

whoami-rajat is working on putting together a review doc that would help new reviewers to efficiently review changes hence increasing the quality of review. link: https://review.opendev.org/c/openstack/cinder/+/834448 The discussion had a great point that should be mentioned in the review doc regarding a reviewer doesn't have to review everything but mentioning what they reviewed would benefit the other reviewers a lot. Eg: If someone reviewed the releasenote, it saves other reviewers time looking at the releasenote. Also there is a suggestion regarding adding the tick/tock release cadence specific review points.

conclusions

action: whoami-rajat to update the review doc with the suggested points

Secure RBAC

We made the project ID optional in the url to support the system scope use case with the plan to expand scopes from project level to system level in Zed. System level personas will deal with system level resources that are not project specific, Eg: host information. We also have to take into account mixed personas for some resources like volume type is a system level resource but acts at project level if it is private and also needs to be listed by project members to create resources like volumes.

The community goal is divided into different phases and the goals for every phase are defined as follows:

Phase 1: project scope support -- COMPLETED
Phase 2: project manager and service role
Phase 3: (in AA) implement system-member and system-reader personas

The two new roles i.e. manager and service are intended to serve some use cases as follows:

Manager: It will have more authority than members but less authority than an admin. Currently, it is useful for set default volume type for a project.
Service: useful for service to service interaction. Eg: currently we requires an admin token for cinder-nova interaction that makes a service like cinder to be able to do anything in nova as an admin.

There were doubts regarding resource filtering which we can propose as extend work item to the current SRBAC goal. Currently our resource filtering has same functional structure i.e. if it doesn't work for non-admins then it doesn't work for admins either. There was another concern regarding attribute level granularity. Eg: the host field in the volume show response is a system scope entity which should be not be returned with a project scoped token response.

conclusions

action: rosmaita to update policy matrix
action: consider attributes (system level like host) associated to personas (for show, list, filtering...)

More "Cloudy" like actions for Cinder

Walt discussed that certain cases should automatically be handled by the cinder.

1) Certain actions should also invoke an automatic migration due to space limitations. For example, when there are multiple pools against the same backend and a user wants to extend a volume. If the volume doesn't fit on it's existing pool, but there is space for it on another pool on the same backend. There is a concern that If moving between pools take considerably long time then it would not be good that the operation takes that amount of time then we have the following points to consider:

moving between pools with dd will not be efficient
A user message could be useful in this case
Some backends (like RBD) backend can do this efficiently
would be better if we have a generic mechanism + efficient way in the driver to do it

There was a concern regarding a lot of concurrent migration happening due to this at the same time causing performance issue but we currently do that while migrating volumes and it works fine. There was also a suggestion to not migrate the original volume if it's a large one and rather migrate a smaller sized volume to free up the space required to extend but there are a lot of things to consider in this case, major being that the other volume might belong to another project and any failure during the operation might corrupt the other volume as well.

2) Backups, sometimes induce a snap of the volume. Snaps require living on the same pool as the original volume as the original volume. An optimization to this is the volume drivers can say whether they want to use snapshots or clones, depending on what's best for them. The driver can report if it will require the full space for a temp resource or not, if it requires it, it will go through the scheduler to check for free space, in the other case we will just proceed with the efficient cloning.

conclusions

action: Walt to write a spec describing the design and working of it

Unifying and fixing of Capacity factors calculations and reporting

There are some inconsistencies with our scheduler stats like the allocated_capacity_gb is created by the volume manger to let the scheduler know what cinder has already allocated against a backend/pool, this value isn't being updated for migrations. This value can go negative because the init_host calculations only account for in-use and available currently. Patch: https://review.opendev.org/c/openstack/cinder/+/826510

Also we've an issue where there are a few places in Cinder that try and calculate the virtual free space for a pool but the problem is the Capacity filter and Capacity weigher do it differently. Patch: https://review.opendev.org/c/openstack/cinder/+/831247

The backend's stats may show that there is lots of space free/available, but cinder's view might be different due to:

reserved_percentage
thin vs. thick provisioning
lazy volume creation (space unused until volume is actually created)
max_over_subscription_ratio

These calculations needs to be corrected so the operators have the accurate idea which backends are low on space and requires attention and we don't face resource creation failures even though when there is available space in the backend.

conclusions

action: review the patches proposed by Walt

Volume affinity/anti-affinity issues

We have the affinity filters but affinity is kept in check while creating volumes but not when migrating them. One way to handle this is we can preserve the scheduler hint (for affinity/anti-affinity) for later operations on the volume. There are a lot of things to consider with this approach:

What happens if the original volume (we kept affinity from) is deleted/migrated?
Should we keep it as a UUID or host?
Should we consider the scheduler hint only for the volume create operation or preserve it for all the operation for the rest of the life of that volume?
How to go about the design -- should we store it in the metadata or create a separate table to store the volume UUIDs provided for affinity/anti-affinity
What to do about when this requires cascade operations, should we move a lot of resources during the operation maintaining the affinity/anti-affinity?
Need to also think about cases like backup/restore, when replication is enabled

conclusions

action: define in our docs that we only honor hints on creation
action: ask Nova team if they have already solved this problem. Nova has a spec up for a similar case: https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/complex-anti-affinity-policies.html
action: continue discussion in upcoming meetings and collect the points gathered in a spec

Wednesday 06 April

recordings

For Driver Maintainers: How Cinder (the project) Works

Brian provided a quick overview of Cinder's software development cycle, when key deadlines occur in each cycle, the difference between features and bugfixes, when bugfixes are backportable, things you can do to make sure your patches are ready for review, where key information about the project is located, etc.

link: https://etherpad.opendev.org/p/how-cinder-works

Documenting the Driver Interface

During this session, the team reviewed documentation patches for the driver interface class which was a pending item from last PTG. We got valuable feedback and we are planning to do it again to get these type of changes in. Patches to review:

conclusions

action: do the review session again and review current patches

Third-party CI: testing

We found out that most of our third party CI drivers are not testing encryption. A fixed key should be enough to test it. Later this discussion was generalized to what should be tested in third party CI and following is the list:

compute API -- attachments, bfv
volume API
image API -- glance configured to use cinder as the backend (nice to have but not required)
scenario tests
cinder-tempest-plugin

A suggestion is a python script tool would be helpful to check what the CI promises and which tests are running in tempest (tool will check in tempest.conf and cinder.conf) for which we need to force 3rd party CI systems to store things in a specific location.

upstream tempest tests: https://etherpad.opendev.org/p/cinder-community-CI-tests example downstream tempest tests: https://etherpad.opendev.org/p/cinder-3rd-party-CI-tests-rh

conclusions

action add the current discussion points in the third party CI document
action list of desired tests for comments

Third-party CI: infrastructure

NetApp team provided a great presentation on Software Factory. <placeholder for link of presentation>

Thursday 07 April

recordings

Status update on the Quota fix effort

Gorka did a presentation on the new quota system and provided options to choose which system we would like in cinder. The team was in agreement that we want both drivers: counting & mixed mode and the following points were discussed:

We understand that this increases the code to be maintained and we are ok with it
We don't want a system to find that the performance is BAD and be stuck on that release until we release the more efficient driver
It's acceptable to implement the 2 drivers in 2 different releases, First the counting one

We need to add release note mentioning the potential performance problem for large projects and that they may want to skip a release

conclusions

action: Gorka to repropose the spec for dynamic resource counting + try to implement the mix quota system (if it goes into same release else next release) with a single spec

NetApp ONTAP - migration from ZAPI to REST API

NetApp team presented the solution proposed to make the migration from ZAPI to REST API. We discussed the possibility to backport this feature to older releases to ensure the customers will not have problems on future NetApp ONTAP versions that will not support ZAPI calls

link: https://etherpad.opendev.org/p/zed-ptg-netapp-rest-migration

There was a concern regarding we need to provide backward compatibility when Customers using a particular version of OpenStack but not upgrading ONTAP cluster. For this we can keep ZAPI calls for long and have a config option for customers to opt in to the REST API interface or ZAPI one

Backporting discussion:

Trains doesn't seem to be an option
Reasons that backporting seems reasonable:
- It's not a NEW feature that users will have/be able to see
- It's a feature today (that backend supports both protocols) but next year this will be a bug (OSP won't work with a newly bought system that comes with the latest firmware version).

We had a opposition regarding the backport reasoning:

OpenStack version should say which storage system versions are supported
e0ne made an analogy with Operating System versions and OpenStack: each OpenStack release is tied to particular versions of supported operating systems and python runtimes for example, https://governance.openstack.org/tc/reference/runtimes/wallaby.html
- should take a similar approach with what vendor backend software versions are supported
- don't necessarily need a list of versions published with each OpenStack release, but it is a good guideline for the cinder-stable-maint team to take into account when looking at backports
- we have already done this with Ceph: https://docs.openstack.org/cinder/latest/configuration/block-storage/drivers/ceph-rbd-volume-driver.html

conclusions

action: voting on the last IRC meeting of April (Video) to backport till the Wallaby backport

Update on Image Encryption

We are still waiting for Barbican to implement secret consumers which Prevents from deleting secrets accidentally. Alan has a concern that if barbican has multiple consumers then they need to be from same users and projects

ACLs should help in this case
If you're using an encrypted volume, cinder might allow you to have ownership of it but barbican may not

conclusions

action: Luzi to repropose the cinder spec for Zed

Restoring from cinder backups create thick volumes instead of thin

Gorka explained the problem statement is when we do migration from thin -> thin, it stays thin -- dd is smart enough to do it (if you ask it to) but when doing backup it stays thin but when restoring, it is a thin volume which behaves like a thick volume (because all gaps are filled and takes full space of the volume).

There was a suggestion that SCSI has commands to only read allocated blocks but not sure if it exists. Eric suggested that we can do this simply by reading 0 sized blocks but not write them and write other blocks. Gorka has concern if the 0 sized block is valid and we want to write it. Example of storage that doesn't return Zeroes:

There are different types of TRIM defined by SATA Words 69 and 169 returned from an ATA IDENTIFY DEVICE command:
- Non-deterministic TRIM: Each read command to the logical block address (LBA) after a TRIM may return different data.
- Deterministic TRIM (DRAT): All read commands to the LBA after a TRIM shall return the same data, or become determinate.
- Deterministic Read Zero after TRIM (RZAT): All read commands to the LBA after a TRIM shall return zero.

Further supporting the point of skipping zero blocks is a concern:

SCSI thin HAS to return zeroes as per specs with UNMAPed blocks.
SCSI will also do it on thick if it has LBPRZ set
NVMe seems to be the same with DEALLOCATEd blocks

conclusions

action discuss this again if anyone is interested in taking this up
- what's the deadline? - see spec deadline for e.g. Zed
action find the current cinder code that is currently doing this
- dd conv=sparse
- qemu-img convert (-S defaults to 4k)
When writing back we have to be CAREFUL:
- If it's an existing volume we cannot just skip a specific location (since it could have data)
action Zaitcev to take a look and report - for sure before next PTG, hopefully in a month!
action look for places where rbd sparsify is useful

Cross Project with Manila team

Devstack-plugin-ceph changes

Manila team explained that we're switching to cephadm to install/deploy the ceph cluster. Link to patch: https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484

DNM patch which switches the deployment to cephadm for all services
- https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/834223
Cinder change to not create backup pool when cephadm is used (since it's created automatically)
- https://review.opendev.org/c/openstack/devstack/+/834269

Failures are seen where we are occasionally losing connection to the database in c-vol logs
- https://zuul.opendev.org/t/openstack/build/aa1f3d81f4bd437ab938d2ada5eadb75/log/controller/logs/screen-c-vol.txt#1308-1371
- we may need a different node sizing for the devstack vm

conclusions

action: Test with centos as the devstack host, manila "service" image
action: Split CI changes into different test sets which focus on different services?
- not possible with the integrated jobs (we could run less tests and triage failures some at a time)
Turn on cephadm based deployment in the multi-node job: https://github.com/openstack/devstack-plugin-ceph/blob/b0418e177f6d6059063dd1da2e1a3d228b83b309/.zuul.yaml#L92

Cross project with Glance

New API to expose location information

We have OSSN-0065 describing the security risk of enabling ``show_multiple_locations`` option but this is required for cinder to perform certain optimizations when creating a volume from image (in case of cinder and RBD store). The proposal is to create a new admin only API to provide the location of image and avoid dependency on the config options.

We discussed that the OSSN is still valid and we looked at possible solutions to tackle it:

Create a new locations API and access it via a config group section in cinder.conf for cinder-glance interaction (currently we use it for cinder-nova interaction)
use service role -- when keystone implements it
nova might have an internal endpoint to expose for other services to use i.e. a different endpoint listed in keystone

conclusions

action: write a spec describing the current API design for the new locations API (alternative: nova's approach of using alternative endpoint and service role/token as well)

Clone v2: RBD deferred deletion

Recently cinder has utilized Ceph clone v2 support for its RBD backend, since then if you attempt to delete an image from glance that has a dependent volume, all future uses of that image will fail in error state. Despite the fact that image itself is still inside of Ceph/Glance. This issue is reproducible if you are using ceph client version greater than 'luminous'.

The idea is to implement deferred deletion for glance which will require us to first do the cinder work -- fix all cases of dependencies (volume->snapshot->volume) where we end up not being able to delete a particular resource independently of others. The issue can be seen in the following scenarios:

Creating bootable volume from image causes this issue RBD store (glance) -> RBD backend (cinder) (cinder volume depends on glance image)
new feature of glance/cinder to optimize volume upload to image will cause this issue for glance as well

RBD clone v2 doesn't solve all issues we're facing and needs more work. Currently in cinder we use RBD flatten operation to break dependency chains (after a certain depth) which is still in WIP. The current approach is to fix issues in cinder and apply the same approach in glance.

Currently we are not able to delete glance images that have a volume dependent on it which need to fix it where one option is to flatten when we want to delete the glance image.

conclusions

action fix things on cinder side and see how we can fix glance using the same techniques (also document it since customers face these issues all the time)
- Eric from cinder team and Abhishek from glance team will be driving this effort.

Friday 08 April

recordings

FIPS

Ade Lee is working on new FIPS jobs which run our current job tests with FIPS enabled. Link: https://review.opendev.org/c/openstack/cinder/+/790535

We discussed that the patch is adding a lot of jobs and which ones we wanted to keep:

openstack-tox-functional-py36-fips job is not required
cinder-plugin-ceph-tempest-mn-aa-fips -- the original job is not stable but if this one stays stable, we can have it
tempest-integrated-storage-fips -- not required since lvm-lio-barbican already tests this
devstack-plugin-nfs-tempest-full-fips -- doubts if this is required or not
- FIPS goal is to have as much coverage as possible but if team thinks it doesn't add much value then we can skip it
- would be good to have for the future NFS encryption work
cinder-tempest-lvm-multibackend-fips -- not required
ceph https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/826484 + (example on how to enable it) https://review.opendev.org/c/openstack/devstack-plugin-ceph/+/834223

We saw failures in test_boot_cloned_encrypted_volume tests. Gorka has a patch that should fix this.

Failure: https://zuul.opendev.org/t/openstack/build/27cc72c64e9f405098cebe0e29f3c39f

Fix: https://review.opendev.org/c/openstack/os-brick/+/836391

We also discussed FIPS testing in 3rd party jobs for which we need to encourage vendors to test their CIs with FIPS. The problem here is not all third party CI (or maybe none) run on CentOS so will discuss again when we have ubuntu support available.

conclusions

action: alee to update the current set of FIPS job as per the discussion

implement a force-delete-from-db command

The problem here is if a backend goes dead, we cannot delete it and its stuck volumes, because we cannot talk to it. Currently users directly run mysql commands in DB which is dangerous. There is a patch with the approach to add a cinder-manage command to fix this but it's probably not a good idea because:

it's hard to use in a deployment as we've go to that particular node to run it
not better maintained

Another suggestion is to add an API flag to cinder manage/unmanage command. unmanage removes the volume from DB but doesn't actually remove it from backend which is a good choice for this case but we have to consider case when volumes are part of a group.

We also have the concern that this won't be a good UX since we've to go resource by resource to do this. We should be able to remove all resources from DB for a particular backend. Also non-admins users can't see the host/backend but this is mostly an admin operation.

conclusions

action Eric to write a spec for this

Reporting of storage_protocol

Currently there are inconsistencies in drivers that use the same protocol like:

NFS, nfs
NVMeOF, NVMe-oF, nvmeof
'fibre_channel' vs 'FC'
- cinder/volume/drivers/ibm/ibm_storage/__init__.py

The proposed solution is to handle this via a tuple where we will return in the following format (old_type, new_format)

Patch: https://review.opendev.org/c/openstack/cinder/+/836069

There was a concern that this change could break goodness function (on existing deployments) in scheduler as we're changing the storage_protocol field from string to tuple. The finalized approach is:

get_pools -> string
goodness_function -> not use new mapping
capabilities -> we can deal with mapping

conclusions

action Gorka update the current patch with the mapping logic as discussed -- note we aren't going to fix all places
action Gorka to add constants for standard protocol names (in cinder)

NVMe-oF efforts

Issues: Many things are broken:

There are multiple variants of the connection_properties, and different code paths in the connector, so some bugs are only present in some cases.
On the os-brick connector
- Cannot connect and disconnect a volume, and then connect it again immediately (this used to work)
- Encryption never worked and we didn't know (unless we are using LUKSv1)
- In-use extend broken
- many other things

On Cinder:
- nvmet: Connection on nova is lost to ALL volumes are lost on create_export and remove_export
- nvmet: On create_export we see kernel warnings about the port
- nvmet: Premature call to terminate_connection. We don't currently see an issue because it's based on create_export, but we cannot add host based restrictions because of it (would break on multi-attach)

To fix many of those we stop using nvmetcli as a CLI program and use the nvmet as a Python Library

Testing:
- Missing integration test for some paths
  - Add NVMe-oF devstack support: https://review.opendev.org/c/openstack/devstack/+/814193
- Add support for all combinations to the nvmet target
  - New connection properties format: https://review.opendev.org/c/openstack/cinder/+/836072
  - Shared subsystems: https://review.opendev.org/c/openstack/cinder/+/836074
  - Prerequisite patches:

Add CI

Which combination should we do?
- old conn_info + non shared subsystems
- old conn_info + shared subsystems
- new conn_info + non shared
- new conn_info + shared

NVMe native multipathing support: https://review.opendev.org/c/openstack/os-brick/+/830800

NVMe-oF Agent: https://review.opendev.org/c/openstack/os-brick/+/802691

Suggestion to have a periodic collection of new data paths and inform nova about it and nova updates that info in os-brick

sounds good at a high level
could be tricky in cases like live-migration
need to discuss with nova team

conclusions

action review patches proposed by Gorka
action discuss with kioxia and other NVMe driver vendors to discuss about healing agent

os-brick rootwrap config

currently the os-brick rootwrap config is stored in nova and its also stored in cinder.

Nova: https://github.com/openstack/nova/blob/master/etc/nova/rootwrap.d/compute.filters#6

Cinder: https://github.com/openstack/cinder/blob/master/etc/cinder/rootwrap.d/volume.filters#L34-L38=

The idea is to move these to os-brick so that there is only one definition and so that nova can drop its rootwrap dependency going forward. nova moved to using privsep a long time ago for privileged escalation when required. Sean thinks that these filters are out of sync between nova and os-brick

Completely moving to privsep in os-brick seems not possible since in os-brick we are still keeping rootwrap and privsep is useful for python code but not for running shell commands. we use rootwrap to start privsep in os-brick.

We also need to test glance cinder configuration if we plan any changes in this area as Nova team wants to remove the rootwrap filters from it's requirements which is currently due to os-brick.

Keeping only os-brick filters, nova or cinder (initiator) needs to tell it where to find the filters. os-vif has a config option to handle this case and probably we can map same logic for os-brick.

os-vif implementation: https://github.com/openstack/os-vif/blob/master/os_vif/plugin.py#L70-L89=

conclusions

action (eharney) look into cinder db migration to migrate legacy luks encryption types to current os-brick types
- https://review.opendev.org/c/openstack/os-brick/+/791273/

Bugs found with glance backed by cinder (with multiattach) at high concurrency

Following are the details of the issue:

Operation: Image save
This has even failed with only concurrency 2
Backend? LVM

There are already proposed changes on glance side that should fix this and handle multiattach case efficiently.

conclusions

action review https://review.opendev.org/c/openstack/cinder/+/836753
action Rajat to try to reproduce this scenario and work on possible fix(es)
action review our attachment documentation and add it (if doesn't exist yet)

CI undeleted Tempest artifacts

We are seeing volumes being left on backends after both successful and unsuccessful CI runs. Would love to understand why these are left? Maybe some tests do not run the correct cleanup methods. Is there a way to cleanup the test in tempest/cinder-tempest to ensure they cleanup correctly?

This wasn't discussed in detail and Simon was not around but there are a few comments to the proposed agenda added which Gorka will update to Simon and we will discuss this in a followup meeting like midcycle PTG.

conclusions

action geguileo talk with Simon to ask for information about failing resource and trace back it to the failing test and tell him about Luigi's suggestions

CinderZedPTGSummary

Contents

Introduction

Tuesday 05 April

recordings

Release cadence discussion: tick-tock model

conclusions

Best review practices doc

conclusions

Secure RBAC

conclusions

More "Cloudy" like actions for Cinder

conclusions

Unifying and fixing of Capacity factors calculations and reporting

conclusions

Volume affinity/anti-affinity issues

conclusions

Wednesday 06 April

recordings

For Driver Maintainers: How Cinder (the project) Works

Documenting the Driver Interface

conclusions

Third-party CI: testing

conclusions

Third-party CI: infrastructure

Thursday 07 April

recordings

Status update on the Quota fix effort

conclusions

NetApp ONTAP - migration from ZAPI to REST API

conclusions

Update on Image Encryption

conclusions

Restoring from cinder backups create thick volumes instead of thin

conclusions

Cross Project with Manila team

Devstack-plugin-ceph changes

conclusions

Cross project with Glance

New API to expose location information

conclusions

Clone v2: RBD deferred deletion

conclusions

Friday 08 April

recordings

FIPS

conclusions

implement a force-delete-from-db command

conclusions

Reporting of storage_protocol

conclusions

NVMe-oF efforts

conclusions

os-brick rootwrap config

conclusions

Bugs found with glance backed by cinder (with multiattach) at high concurrency

conclusions

CI undeleted Tempest artifacts

conclusions