Cinder Antelope PTG Summary

Introduction
The sixth virtual PTG for the 2023.1 Antelope cycle of Cinder was conducted from Tuesday, 18th October 2022 to Friday, 21st October 2022, four hours each day (1300-1700 UTC). This page provides a summary of all the topics discussed throughout the PTG.



This document aims to give a summary of each session. More context is available on the cinder Antelope PTG etherpad:
 * https://etherpad.opendev.org/p/antelope-ptg-cinder

The sessions were recorded, so to get all the details of any discussion, you can watch/listen to the recording. Links to the recordings are located at appropriate places below.

recordings

 * https://www.youtube.com/watch?v=o26Zx-syG4g
 * https://www.youtube.com/watch?v=5IbCJT4pfow

User survey feedback
 * https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030843.html
 * https://docs.google.com/spreadsheets/d/1hHC4hg_Zt9FLYYJ7UA9iVomBbExUhhyJd2QpmrriiBQ/edit#gid=0

The user survey feedback comments are summarized in the following 3 sections:

1) Done:
 * Ceph QoS ==> Done in Zed https://docs.openstack.org/releasenotes/cinder/zed.html (code https://review.opendev.org/c/openstack/cinder/+/820027)
 * Shared volume ==> Multi-attach (if driver supports it)
 * "Images deployment" ==> cinder glance_store has been improved over the past few releases; not sure what this means exactly

2) Actionable:
 * Document HA deployments
 * Online retyping between different Ceph RBD backends (clusters) ==> Eric will check whether libvirt supports it now
 * Improvements on encryption: key rotation, multiple LUKS keys ==> Could explore some ideas

3) Questions:
 * Real Active/Active ==> What does this mean specifically?
 * Live migration with Pure iSCSI ==> This should work in new OpenStack releases
 * Error management:
 * Better attach/detach cleanup on failure ==> For example, not leaving volumes in reserved/detaching states?
 * Better error handling when create/mount/delete fails ==> User Messages?
 * Better support for cinder-backup services, especially the filesystem drivers ==> Bug in driver?
 * Volume Group expansion ==> Extend volumes? Or more operations (which)?

User survey question review
The details provided by operators in the user survey feedback were vague and the team agreed to revise the questions to yield more useful information in the feedback.

The team proposed the following ideas:


 * Ask operators to provide the driver along with the protocol
 * Revise the list to offer driver-protocol combinations for operators to select, like NetApp iSCSI, HPE 3PAR FC, etc.
 * Alphabetical ordering would make it easy to find the relevant driver-protocol combination
 * Be specific about the feedback: provide the release and a Launchpad bug link if there is an issue

Based on the points, we've revised the user survey feedback questions in the following etherpad.

https://etherpad.opendev.org/p/antelope-ptg-cinder-user-survey-current-questions

conclusion

 * action: Brian to talk to Allison regarding the revised survey feedback (status after PTG: Done)

SLURP release cadence
The concept of SLURP (Skip Level Upgrade Release Process) was introduced because six-month upgrades are difficult, infeasible, or undesirable for some operators. 2023.1 Antelope will be the first SLURP release of OpenStack. Following are some of the details to keep in mind with respect to SLURP and non-SLURP releases.


 * every other release will be considered to be a “SLURP (Skip Level Upgrade Release Process)” release
 * Upgrades will be supported between “SLURP” releases, in addition to between adjacent major releases
 * Deployments wishing to move to a one year upgrade cycle will synchronize on a “SLURP” release, and then skip the following “not-SLURP” release
 * Testing: test upgrade between SLURP releases
 * Deprecations: deprecation, waiting, and removal can only happen in “SLURP” releases
 * Data migrations: Part of supporting “SLURP to SLURP” upgrades involves keeping a stable (read “compatible” not “unchanging”) database schema from “SLURP to SLURP”
 * Releasenotes: https://review.opendev.org/c/openstack/project-team-guide/+/843457

For detailed info: https://governance.openstack.org/tc/resolutions/20220210-release-cadence-adjustment.html

Cinder's well-known encryption problem
Presentation: https://docs.google.com/presentation/d/1HOHnO9T3BD1KO5uk_y34aWhMs_A5i9ANPn6zIujQxCk/edit

This has been a complex issue to handle and has been discussed over multiple PTGs. Another topic discussed, "Allocation size vs requested size for specific storage provider like Dell PowerFlex", has work items that will act as an initial base for the encryption work:


 * Keep two DB fields for the user size and actual size
 * requested size -> user size
 * allocated size -> real size
 * Partition the volume and only the partition with user size should be visible inside the VM

The encryption work will follow up on this initial work to implement the following:


 * Calculate the encryption header size to know how much user-visible space the volume provides
 * Start encrypting the volume on creation instead of on first attachment
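As a rough illustration of the first item, the user-visible size is whatever remains after subtracting the encryption header. The header sizes below are typical LUKS defaults and are only assumptions for the sketch; the actual values depend on the format version and options:

```python
# Typical on-disk header sizes for LUKS; illustrative assumptions only.
LUKS1_HEADER_BYTES = 2 * 1024 ** 2    # ~2 MiB
LUKS2_HEADER_BYTES = 16 * 1024 ** 2   # ~16 MiB

def user_visible_bytes(volume_bytes, header_bytes=LUKS2_HEADER_BYTES):
    """Space left for the guest once the encryption header is accounted for."""
    return max(0, volume_bytes - header_bytes)
```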

conclusions
action: Sofia to work on the encryption work after the initial base work is completed

Operator hour
Christian Rohmann joined us and briefed us about their deployment and the current pain points they have with respect to cinder. They were mostly focused on backup related things and a number of backup topics were discussed.

1) State of non-rbd cinder-backup drivers such as S3

The current problem is that non-RBD backends are not optimized, as they copy data chunk by chunk. They also don't work well with different types of volumes, such as thin-provisioned or encrypted ones.

To address this issue, we will need to implement a generic block tracking feature. The feature can be split into two parts: backend and frontend.


 * action: Gorka agrees to do a brain dump of what he looked into for future reference
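To illustrate why chunked backup drivers are slow without block tracking: today they must read and hash every chunk just to decide what changed. A minimal sketch (helper names are hypothetical, not Cinder's actual code):

```python
import hashlib

def changed_chunks(chunks, prev_digests):
    """Yield only the chunks whose sha256 differs from the previous backup.

    Without backend change-block tracking, every chunk must still be read
    and hashed; real tracking would let the backend report changed regions
    directly, skipping the full read.
    """
    for idx, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).hexdigest()
        if idx >= len(prev_digests) or prev_digests[idx] != digest:
            yield idx, chunk, digest
```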

2) Encryption layer for backups

We can implement an encryption layer on backups using Barbican or a static key. The key scope could be global or per project.


 * action: Gorka agrees to write a spec and crohmann can find someone to work on it
 * Update after PTG: Gorka wrote a spec: https://review.opendev.org/c/openstack/cinder-specs/+/862601

3) Backup features we currently have

We also discussed the backup features we currently have, so operators can make good use of them:


 * Limit concurrent backup/restore operations
 * Scale backup service vertically and horizontally to improve performance
 * We can configure the worker processes for backup
 * Run cinder backup in Active-Active
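For reference, the concurrency and worker options above correspond to cinder.conf settings along these lines (the values shown are illustrative, not recommendations):

```ini
[DEFAULT]
# Number of cinder-backup processes to launch on this host
backup_workers = 4
# Maximum number of concurrent backup/restore operations per service
backup_max_operations = 15
# Size of the native thread pool used by the backup service
backup_native_threads_pool_size = 60
```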

4) Some other issues that were mentioned and are pain points of operators:


 * Cannot recover a backup process if the service dies, would be good to have it continue where it left off
 * RBD image has a lock and there is no way to know who/what left it there
 * Interested in 512e/4k support for RBD
 * https://review.opendev.org/c/openstack/cinder/+/658283
 * Resuming operations after restarts
 * auto migrate volumes in pool

Image cache issue when volume created from cache is of less size than cache
Image cache is a very useful feature that allows us to clone and extend a volume from the cache instead of downloading the whole image from Glance again and again.

The problem is that if the first volume created with the image cache enabled is large (say 100GB) and subsequent volumes created from the same image are small (say 10GB), the subsequent volumes will also be created with the same size as the first volume (i.e. 100GB instead of 10GB).

We discussed possible ways to fix it:


 * 1) Create the first entry with the requested size, if another request comes in with a smaller size then update the cache entry
 * 2) Create the cache entry with the minimum sized volume required by the image
 * 3) Use a tuple (image-id, size) to query the cache entries and have multiple cache entries associated to a single image

The solution described in (2) seems to be the simplest and most straightforward to implement.
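A sketch of option (2), with hypothetical helper names: the cache entry is created at the minimum size the image needs, and clones are extended up to the requested size, so a small request never inherits a large first-volume size:

```python
import math

GIB = 1024 ** 3

def cache_entry_size_gb(image_virtual_size):
    """Smallest whole-GiB volume that can hold the image (option 2)."""
    return max(1, math.ceil(image_virtual_size / GIB))

def clone_size_gb(requested_gb, image_virtual_size):
    """Size of the volume handed to the user: never smaller than the
    cache entry, extended after cloning when the request is larger."""
    return max(requested_gb, cache_entry_size_gb(image_virtual_size))
```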

conclusion

 * action: Pete (zaitcev) is to identify the original spec for image cache and update it.
 * https://specs.openstack.org/openstack/cinder-specs/specs/liberty/image-volume-cache.html
 * action: Pete (zaitcev) to propose a fix for the issue.

recordings

 * https://www.youtube.com/watch?v=gTTePvdbcYk
 * https://www.youtube.com/watch?v=LwXTci5p-5c
 * https://www.youtube.com/watch?v=yJLHOiH3RIE

image encryption work update
The idea of this feature is to provide support for encrypted images. The work is currently dependent on the secret consumer work on barbican side.

https://review.opendev.org/q/topic:secret-consumers

There is also a patch on os-brick side that can be reviewed: https://review.opendev.org/c/openstack/os-brick/+/709432

conclusion
action: Cinder team to review the os-brick change.

Scenarios where cinder does not check the integrity of image data
The issue is that when the show_multiple_locations config option is set to True in glance, malicious users are allowed to update the location information of a public/shared image.

This parameter is required by cinder to do certain optimizations when using the glance cinder store, so security is traded off for optimization. We also don't do image signature verification in the optimized path.

Related OSSN: https://wiki.openstack.org/wiki/OSSN/OSSN-0090

conclusion

 * action: Brian to file a bug about image signature verification to check for certificates with a link to the nova implementation
 * update after PTG: Brian filed the bug https://bugs.launchpad.net/cinder/+bug/1994150
 * action: check for documentation for optimization vs security tradeoff

Cinder isn't very cloud like with pools enabled
When a backend reports pools, a pool can become full of volumes. Once a pool is full, operations on volumes in that pool fail, while operations on volumes in pools with free space succeed. Cinder should mitigate this by migrating the affected volumes to a pool that has space for the operation. Today, operators end up doing this manually as the only way to fix the failed operations; that manual intervention is not scalable and is not 'cloud'. From a customer's perspective, operations on their volumes should just work.

Commands that fail on volumes where pool is full:
 * backup
 * clone
 * snapshot
 * extend

An operation like volume clone will take more time in this scheme, since the time taken to migrate the volume (when the selected pool is full) is added before the clone itself.

There was an idea to make the automatic migrations configurable:
 * embed it into the volume type, so volumes with a particular type can be migrated
 * keep it in the volume type vs a global config option
 * both can be done together

conclusion

 * action: Walt to write a spec for the extend case (it can be updated later when other operations are also ready)

Migrating from cinderclient to OSC
Current OSC and cinderclient gaps: https://docs.openstack.org/python-openstackclient/latest/cli/decoder.html#cinder-cli

Projects such as nova, neutron, and keystone are moving towards OpenStackClient, and glance is planning to as well. A lot of gaps between cinderclient and openstackclient have been bridged. The vision is to unify all the project-specific clients into OpenStackClient to provide a better UX.

Currently there are 2 ways to do an API call:
 * using python bindings in project specific clients
 * openstacksdk

openstacksdk has 3 layers:
 * resource layer: sets and gets attributes for resources
 * proxy layer: provides a client interface to each project's (nova, cinder) API
 * cloud layer: combines operations together

conclusion

 * action: Rajat (whoami-rajat) to create a parity doc/sheet for OSC-cinderclient -- check for openstacksdk as well as osc-cinderclient gaps

cinder-backup is blocking nova instance live-migrations
We can't live-migrate an instance while a volume backup is in progress, due to the volume status lock ('backing-up').

Existing spec: https://review.opendev.org/c/openstack/cinder-specs/+/818551/

Gorka thinks we could benefit from using the attachments API for internal attachment operations, like attaching during a migration.

conclusion

 * action: ask Christian about other cases where this feature would be useful, because it seems like a large feature just for 1 use case.
 * update after PTG: Christian left a comment on the existing spec stating that the purpose of the spec is to allow all independent operations to run in parallel. https://review.opendev.org/c/openstack/cinder-specs/+/818551/comments/6ade3ca0_d95e489d

recordings

 * https://www.youtube.com/watch?v=Q4XOHKkjSgw
 * https://www.youtube.com/watch?v=X4wqFpnzpC8

SRBAC update
Based on recent discussions with operators regarding scope, we will be confined to project scope only, and the personas to be implemented are project admin, project member, and project reader. Cinder has already implemented all the personas but is missing the ``scope_type`` restriction in the policies.

The Tempest team is testing policies with enforce scope and enforce new defaults set to True: https://review.opendev.org/c/openstack/tempest/+/614484

Following are the goals for 2023.1: (first 3 are important)
 * switch enforce scope to True by default
 * switch enforce new defaults to True (maybe, but definitely by 2023.2)
 * add scope type to policies -- for cinder
 * implement service role -- needed in keystone first

conclusion

 * update policy matrix
 * remove previously deprecated stuff -- this is not related to SRBAC but we split one policy into multiple to support granularity and now removing the old one (Eg: create_update policy split into create and update policies)
 * update the policy/base.py file so that generated strings in sample policy.yaml make sense
 * probably also change the names of the "constants" that are defined in the base file and used in all the individual policy files (because those are also misleading)
 * add the scope_type=['project'] to all rules
 * key thing: legacy admin (role:admin) should do everything in the new policy defaults that they could do in the old defaults
 * add tempest testing : https://review.opendev.org/c/openstack/tempest/+/614484

Assisted volume extend for remotefs drivers
Filesystem-type drivers don't support online extend today.

There is an approach being discussed to make the online extend synchronous: https://review.opendev.org/c/openstack/nova-specs/+/855490

There are concerns about network failures, and cinder might wait some amount of time for a reply from nova, making the operation slower. Another concern was that we might end up with two code paths for different drivers: one using the extended event and the other using the new synchronous extend.

conclusion

 * action: kgube to write a spec clarifying the design changes need to be done on the cinder side to support this for FS drivers

Allocation size vs requested size for specific storage provider like Dell PowerFlex
The Dell PowerFlex driver works differently in that it rounds every volume's capacity up to a multiple of 8GB. The problem with this behavior is that when a user creates a volume with a requested size, the size shown in the DB doesn't reflect what's actually created in the backend.

The following approach seems to solve the issue:
 * Keep two DB fields for the user size and actual size
 * requested size -> user size
 * allocated size -> real size
 * Partition the volume and only the partition with user size should be visible inside the VM

Only the admin will be able to see the actual size, which will require a new microversion to be reported in the response; the real size should also be sent in the notification.
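The rounding itself is simple to state. With the two proposed DB fields, the relationship between them would look like the following sketch (the 8GB allocation unit comes from the behavior described above; the helper name is hypothetical):

```python
import math

ALLOC_UNIT_GB = 8  # PowerFlex allocates capacity in 8 GB multiples

def real_size_gb(user_size_gb):
    """'Real size' the backend actually allocates for a requested 'user size'."""
    return math.ceil(user_size_gb / ALLOC_UNIT_GB) * ALLOC_UNIT_GB
```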

conclusion
action: JP to write a spec to mention the details of changes required to be done on:
 * os-brick side (only show user visible size partition)
 * cinder DB side (also including the new microversion to get the user visible and actual size)
 * Optimization: Partition on volume create operation instead of the attach operation on os-brick side
 * The partition should always exist (even when an 8GB volume is requested) because otherwise it will not be able to extend it
 * Use a partitioning method that allows recursive partitioning
 * On extend Cinder will need to extend that partition
 * os-brick will need to receive a new flag in the connection info to tell it to use the first partition and return it (because users can have volumes with partitions)

os-brick privsep conversion
Nova shifted from rootwrap to privsep many cycles ago, but it needs to keep the rootwrap files around because of os-brick. A recent security issue related to this was also reported: https://bugs.launchpad.net/os-brick/+bug/1989008

Stephen has proposed a couple of patches to migrate os-brick code from rootwrap to privsep.

conclusion
action: review changes proposed by stephen
 * https://review.opendev.org/c/openstack/os-brick/+/791271
 * https://review.opendev.org/c/openstack/os-brick/+/791272
 * https://review.opendev.org/c/openstack/os-brick/+/791273
 * https://review.opendev.org/c/openstack/os-brick/+/791274
 * https://review.opendev.org/c/openstack/os-brick/+/791275

Configurable soft delete
https://lists.openstack.org/pipermail/openstack-discuss/2022-October/030729.html

The idea is to make the soft delete of database records configurable. This will allow operators, who don't want to store the DB records as soft deleted entries, to remove the DB row along with the resource deletion.

The advantage of this is that there won't be a need to purge the database afterwards every time, and we won't run into failures during purge due to foreign key dependencies and other issues we've seen in the past.

Stephen has agreed to implement this.
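A minimal sketch of the proposed behavior (the names are hypothetical, not the actual implementation): when soft delete is disabled, the row is removed outright instead of being flagged.

```python
from datetime import datetime, timezone

def delete_record(session, record, soft_delete=True):
    """Flag the row as deleted, or remove it outright when soft delete is off."""
    if soft_delete:
        record.deleted = True
        record.deleted_at = datetime.now(timezone.utc)
    else:
        session.delete(record)
```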

conclusion

 * action: Stephen to propose a patch to implement this
 * action: Cinder team to review and add comments regarding any concerns

recordings

 * https://www.youtube.com/watch?v=mZziVEdRFQ4
 * https://www.youtube.com/watch?v=QU16M-p-IBw

reset state robustification
This feature improves our reset state handling for various resources.

Tushar added a reminder to review his patch series, as it has been pending for three releases.

conclusion

 * action: Review the patch series proposed by Tushar
 * https://review.opendev.org/q/topic:bp%252Freset-state-robustification

Read only attachments
This seems to be handled on the nova side if we pass the access_mode parameter in the connection information returned from the driver.

https://opendev.org/openstack/nova/src/commit/b1958b7cfa6b8aca5b76b3f133627bb733d29f00/nova/virt/libvirt/volume/volume.py

There is still value in pursuing this from cinder standpoint to restrict the access mode while exporting/mapping the volume.

conclusion

 * action: Rajat to ask nova team if they're currently doing RO attachments with libvirt (or is it supported with libvirt)
 * Gorka checked the nova code and it seems to be there
 * action: Rajat to continue looking into the lvm + lio + iscsi configuration for single attachment case

User messages
User messages still make sense, as they help detect failures in asynchronous operations. The message framework was modified to use the context to pass values, so create-message calls now take fewer parameters.

Brian also mentioned that the passed message and the exception are mutually exclusive: the exception has a generic map that allows it to be mapped to a generic message, so we should pass either the message or the exception, not both.

This would be a good task for an intern if guided properly.

conclusion

 * action: mention the new way of adding user messages (by passing context) in the contributor docs

Cross project with glance
1) Continuing discussion about the security issue https://wiki.openstack.org/wiki/OSSN/OSSN-0090

There was a spec proposed in the Zed cycle that tried to handle the same issue.

new location APIs spec: https://specs.openstack.org/openstack/glance-specs/specs/zed/approved/glance/new-location-info-apis.html

The discussion concluded in modifying the existing spec in the following way:

Location ADD API:
 * Erno mentioned that we only want to add the location once, during image create when the image is in QUEUED state and no-one should be allowed to add location after the image is active
 * This wouldn't require the service role and a basic check on the glance side to check image status should suffice

Location GET API:
 * This would still require the service role since we don't want to expose locations to end users

HTTP Store issue:
 * we want a flag while adding the location that "we want to checksum this image"
 * This will be possible with the new location create API by allowing to pass metadata into it

conclusion

 * action: Rajat to update the spec with latest discussion
 * action: Brian to interact with Keystone team for the service role

2) Refactoring cinder driver

Refactoring glance cinder store to make it more readable.

conclusion

 * action: Rajat to continue working on the existing patch
 * https://review.opendev.org/c/openstack/glance_store/+/843103

Discuss the approach of handling the volume format in FS drivers
When we create a snapshot of an attached volume, the active file changes to the snapshot, and the format should be updated to qcow2 on the cinder side along with the active file (which we should currently be doing).

We also need to modify the current snapshot deletion case for blockrebase and blockcommit to handle changes done in the proposed patch.

A tempest test would be good to test this scenario.

conclusion

 * action: Review Melanie's patch -- Rajat to add code for snapshot deletion case handling
 * https://review.opendev.org/c/openstack/cinder/+/857528
 * Add tempest tests for this case and similar scenarios

Reporting of pool capacity factors in API
We are proposing a change that will report the capacity factors on a per-pool basis in the API response. Adding this to the get-pools command seems like a good idea.

conclusion

 * action: Review patch proposed by Walt
 * https://review.opendev.org/c/openstack/cinder/+/844601

Mechanism for Marking Pools down
The general idea is to associate a state with a pool (e.g. a pool is down when it's full) so the scheduler can make decisions based on that status. For example, we mark a pool down when it is 100% full, and a PoolDownFilter in the scheduler ensures that provisioning isn't allowed against pools that are down.

Walt has done this in his repo:
 * https://github.com/sapcc/cinder/blob/stable/train-m3/cinder/volume/manager.py#L2714-L2775
 * https://github.com/sapcc/cinder/blob/stable/train-m3/cinder/scheduler/filters/sap_pool_down_filter.py

This will allow pools to be set to multiple states: UP, DOWN, DISABLE, DRAINING, MAINTENANCE etc
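The filter side of this could look roughly like the following (a standalone sketch only; a real implementation would subclass Cinder's scheduler filter base class and read the state from the stats the manager reports):

```python
class PoolDownFilter:
    """Reject pools whose operator-assigned state is anything but UP."""

    PASSING_STATES = {'UP'}

    def backend_passes(self, pool_state, filter_properties):
        # pool_state would come from the stats reported by the (leader) manager.
        return pool_state in self.PASSING_STATES
```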

conclusion

 * action: Walt to write a spec to mention the following cases:
 * Add a filter for scheduler to check for pool status
 * In case of multiple managers, add a leader election algorithm to find which manager should report the stats
 * A new API for operators to set the pool status

Backup restore into sparse volumes
The problem we are facing is that a backup restored into a thin volume isn't sparse.

The currently implemented approach looks for zero chunks and skips them: https://review.opendev.org/c/openstack/cinder/+/852654

The approach is good for new volumes but will cause issues for existing volumes, where we should still write the zeroes.

conclusion

 * action: Pete to figure out the case of restoring to existing volumes vs new volumes (for new ones, we can do it as we're doing it now)
 * action: Pete to compare sha256 solution to auto detect zeroes

force delete volume from db
This was discussed at the Zed PTG. The use case is being able to delete volumes from cinder when the volume doesn't exist in the backend.

The solution discussed was adding an argument to the unmanage operation to handle removing the DB record.

There was one proposal to add this to the cinder-manage command line, but that didn't seem like the right approach.

conclusion

 * action: Eric agrees to write a spec for details of the discussed solution