CinderDalmatianPTGSummary

Summary

Improve driver documentation

When proposing new or updating existing drivers, our interface can be confusing. We agreed on an effort to migrate class documentation to the interface definitions, make clear the mandatory and optional methods for drivers, and maintain this documentation going forward.

User visible information in volume types

Volume type metadata and extra specs are not visible to users, making it difficult to ascertain whether a volume type will lead to encryption or replication. Agreement was reached on metadata fields similar to Glance's metadefs, allowing drivers to report capabilities in a standard way and allowing that metadata to become visible to admins and users.

Optional backup driver dependencies

Although volume driver dependencies are optional, those of our backup drivers are listed in requirements and are therefore always installed irrespective of deployment configuration. Agreement was reached to move these dependencies to driver-requirements. This should simplify efforts of both deployers and packagers.

Simplify deployments

A few ideas were proposed to improve our deployment:

When cinder-volume is deployed in an active-active configuration, our backend_host parameter must be updated on every backend to have the same value. We would like to remove this additional step.
Use of predictable names to define tenants, instead of unique IDs.
Have a dedicated quota section in our configuration for quota-related options.

Recording - Day 1, Part 2

Response schema validation

A spec to add response schema validation in addition to our existing request validation was proposed. With adequate coverage, clients in various languages could be autogenerated (in theory). No significant objections were raised, and there was agreement to review related patches this cycle.

Documentation of Ceph auth requirements

We do not provide comprehensive and easy-to-find documentation on exactly what authentication/permission expectations there are between different services, leaving deployers to troubleshoot on their own without being able to rely on upstream best practices. Agreement to improve this situation in the current cycle.

Cinder backup improvements

An accumulation of bug fixes and performance improvements has stalled in the review queue. We went through the major ones as a team to attempt to bring awareness and unblock the remaining review requirements. See WIKI notes for specific details.

Migrating backups between multiple backends

There is a desire to support multiple backup backends where new backups go to a new backend while backups in an old backend remain readable. We want to avoid needing to create a full backup in the new backend to support incremental snapshots (and the associated charge). A spec will be proposed for review and work towards this goal will proceed during this cycle.

Recording - Day 2, Part 1

Cross-project with Glance

In a cross-project collaboration with the Glance team, an improved method for image migration was proposed. The teams agreed to introduce a new migration operation in Glance. Cinder context was provided and consensus on the path forward was reached.

Recording - Day 2, Part 2

Cross-project with Nova

In a cross-project collaboration with both Nova and Glance, the topic of image encryption was discussed. Dan Smith from the Nova team provided input from the Nova side. Cinder and Nova expect LUKS formatted images, but Glance currently supports GPG encrypted images - requiring re-encryption prior to use. It was noted that LUKS encrypted images can be created without root permission and the Glance team is now looking to drop GPG support and consolidate around LUKS as our unified format.

Recording - Day 3

Active-Active support for NetApp

A NetApp engineer raised questions about adding active-active support to the driver. Questions were answered and that work should proceed in this cycle.

Performance of parallel clone operations

For the clone operation in cinder we're using a single distributed lock to prevent the source volume from being deleted mid-operation. This causes multiple concurrent clone operations to block. Under the right conditions, this can cause a significant performance degradation. Multiple possible solutions were discussed (read-write locks) and a consensus to use the DB was reached. Some details are unclear, awaiting a spec before moving forward.

Volume encryption with user defined keys

Cinder does not currently support encryption with a key provided by the user. With such support, users could manage their own keys, and data could be recovered even if keys are lost at the deployment. There are several technical challenges to supporting this. Several of these hurdles were raised; more thought and research is needed before we have a spec that could be reviewed.

Recording - Day 4

Tuesday 9 April

Improve driver documentation (whoami-rajat)

This was initially discussed during the Zed PTG
- https://wiki.openstack.org/wiki/CinderZedPTGSummary#Documenting_the_Driver_Interface
It would help driver vendors and reviewers if we improve our driver interface documentation
Efforts are tracked here
- https://review.opendev.org/q/topic:%22improve-driver-docstring%22
Meeting Notes:
- when new drivers proposed, interface confusing
- trying to improve doc strings for driver authors to help implementors
- effort started, gorka has setup patch
- activity has paused
- proposal
  - more verbosity
  - mandatory, optional, etc for methods
  - need a list of items, assignments may be required to move this forward
  - this could be in a new etherpad
  - brian proposes moving class docs to interface
- example: https://review.opendev.org/c/openstack/cinder/+/836822/4/cinder/volume/driver.py
- yoga etherpad found that can be reused https://etherpad.opendev.org/p/yoga-volume-driver-API
- gorka and brian propose first pass to define work items
- will revisit this in the PTG in following days
- edit review policy to only need one +2 to help things go faster
- if something does merge that needs fixing, this can be done quickly
  - as long as it's before the doc freeze
- action item: review old etherpad, start assignments tomorrowish
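
The "mandatory, optional, etc for methods" proposal above could look roughly like the following sketch. It is only an illustration and not Cinder's actual interface code: the class name ExampleVolumeDriverInterface, the `optional` decorator and the `describe_methods` helper are all hypothetical, and the two methods are picked purely as examples.

```python
import abc


def optional(method):
    """Mark an interface method as optional (hypothetical helper)."""
    method.is_optional = True
    return method


class ExampleVolumeDriverInterface(abc.ABC):
    """Hypothetical interface definition that carries the driver docs."""

    @abc.abstractmethod
    def create_volume(self, volume):
        """Create a volume on the backend.

        Mandatory: every driver must implement this method.
        """

    @optional
    def extend_volume(self, volume, new_size):
        """Grow an existing volume to ``new_size``.

        Optional: drivers that cannot resize volumes may leave this out.
        """
        raise NotImplementedError()


def describe_methods(interface):
    """Return {method name: 'mandatory' | 'optional'} for an interface."""
    result = {}
    for name, member in vars(interface).items():
        # Skip private attributes and anything that is not a method.
        if name.startswith('_') or not callable(member):
            continue
        if getattr(member, '__isabstractmethod__', False):
            result[name] = 'mandatory'
        elif getattr(member, 'is_optional', False):
            result[name] = 'optional'
    return result
```

Keeping the mandatory/optional marker next to the docstring means a reviewer of a new driver could list what is required without reading the base class implementation.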

User visible information in volume types

Spec: https://review.opendev.org/c/openstack/cinder-specs/+/909195
many possible ways to realize this are outlined in the spec - please read the options
Needs a decision in which direction this should be going
Presented in cinder meeting previously
usecase: whether a volume type leads to encryption or replication in the backend or not
need encryption and replication extra specs to be visible for users
extra table in DB for extra spec metadata
- whitelist blacklist in API to evaluate if it should be visible for users or not
Alan already worked on user visible extra specs
- it shows replication
  - Josephine says it is partial info
    - replication can be from cinder side - volume type
    - can be from backend side - this part is not visible currently
Brian proposes a metadata field in volume type that shows these properties
- operators might have to duplicate info - in metadata and extra specs
metadefs in glance could be a reference
- catalog of metadata for various resources
- encryption - true/false
Gorka says it seems to be a 2 part problem
- show extra specs to admins
- human operator knows about it but it isn't reported anywhere - like RBD supporting replication
we need drivers to report the backend information and it can be shown to users with user visible extra specs
we need a standardized way to describe the properties like encryption/replication etc -- it won't be a good idea to do it in the description
define keys and validate in metadefs to maintain standardized key names
we can do the metadef things in parts
- define keys that should be used
- then start validating + more features
example from glance/nova image properties
- https://docs.openstack.org/glance/latest/admin/useful-image-properties.html
we will be going with the approach of metadefs
#action: update the spec to leverage metadefs to achieve this functionality
- the metadefs can be part 2 (nice to have) for now
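
The "define keys, then start validating" idea above can be sketched as follows. This is a minimal illustration under assumptions: the catalog STANDARD_PROPERTIES, its two keys and the `validate_properties` function are hypothetical and do not reflect the spec or Glance's metadefs API.

```python
# Hypothetical catalog of standardized, user-visible volume type properties.
STANDARD_PROPERTIES = {
    'encryption': bool,
    'replication': bool,
}


def validate_properties(properties):
    """Return a list of problems found in a volume type's metadata."""
    problems = []
    for key, value in properties.items():
        expected = STANDARD_PROPERTIES.get(key)
        if expected is None:
            # Step 1 of the plan: only agreed key names are allowed.
            problems.append('unknown key: %s' % key)
        elif not isinstance(value, expected):
            # Step 2 of the plan: values are validated as well.
            problems.append('%s must be %s' % (key, expected.__name__))
    return problems
```

For example, `validate_properties({'encrypted': True})` reports an unknown key, which is the kind of drift in key names that a free-form description field would not catch.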

Make backup driver's dependencies optional

Bug: https://bugs.launchpad.net/cinder/+bug/2058601
While dependencies of volume drivers are optional, dependencies of backup drivers are mandatory so are installed always
Can we apply the same approach to avoid installing unused packages ?
- gcs: google-api-python-client, oauth2client https://review.opendev.org/c/openstack/cinder/+/902122
- s3: boto3 https://review.opendev.org/c/openstack/cinder/+/902104
- swift: python-swiftclient TBD
- ceph: rados and rbd are already optional dependencies
- glusterfs, nfs and posix: No additional dependency is needed
proposal to move dependencies to driver-requirements
will packages like boto3 still be checked by requirements checks?
modification to setup.cfg
comments to denote which requirements are for volume drivers vs. backup drivers
will start with s3 patch as POC, others to follow
concerns about centralized checks (licenses, etc) if packages optional
Brian mentions: https://review.opendev.org/c/openstack/requirements/+/915165
- this removes driver specific dependencies from global requirements and UC
glance_store maintains driver dependencies in test-requirements
- https://github.com/openstack/glance_store/blob/master/test-requirements.txt#L20-L28
we should have a centralized way for monitoring these dependencies so we don't end up in a conflict where glance and cinder are not able to install a package due to version conflict?
team thinks the content of setup.cfg is not checked by the requirements tooling
#action: takashi to revise his patches (separate out the backend driver requirements in setup.cfg by a comment)
- takashi to check on whether requirements job pays attention to setup.cfg "extras"
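
Once a backup driver's library is no longer installed unconditionally, the driver has to cope with it being absent. A minimal sketch of that pattern, assuming a hypothetical helper `load_optional_dependency` (this is not code from the patches above):

```python
import importlib


def load_optional_dependency(module_name, driver_name):
    """Import a driver-specific library only when the driver is used."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fail with an actionable message instead of a bare ImportError.
        raise RuntimeError(
            '%s backup driver requires the optional package %r; '
            'install it to enable this driver' % (driver_name, module_name)
        ) from exc
```

A driver would call something like `load_optional_dependency('boto3', 's3')` at initialization time, so deployments that never enable the s3 backup driver never need the package.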

Simplify deployments

Having worked with TripleO / Puppet for some time, I found a few points which can be improved to simplify deployments
The objective of this topic is to gather ideas and agree on the approach
Common backend_host for all backends
- When c-vol is deployed in act/sby mode, backend_host is configured properly so that volumes can be managed after failover
- However backend_host needs to be configured in every single backend section which seems to be redundant
- https://review.opendev.org/c/openstack/cinder/+/789089
  - (whoami-rajat) isn't this achievable with backend_defaults section?
    - No. This does not work now and that's what I'm proposing in the above patch
  - Gorka agrees makes sense to have this ability
    - It would have 3 ways of configuring it (in order of preference)
      - In the specific driver section
      - In the backend_defaults section -> This needs the proposed patch
      - In the host option of the [DEFAULT] section -> This needs dedicated config file for cinder-volume, otherwise it affects the other services like c-scheduler or c-backup
Use predictable names to define internal tenants, instead of ids
- cinder has cinder_internal_tenant_project_id and cinder_internal_tenant_user_id to define internal tenants
  - https://review.opendev.org/c/openstack/cinder/+/797245/12/cinder/context.py
- However this is not predictable until internal tenant is actually created
- https://review.opendev.org/c/openstack/cinder/+/797245
  - TODO(tkajinam) Add config example . Probably before/after
  - TODO(tkajinam) Needs update in https://docs.openstack.org/cinder/latest/admin/image-volume-cache.html
Dedicated quotas section for quota-related options
- https://review.opendev.org/c/openstack/cinder/+/772181
- The same was done in nova, manila. Neutron uses the dedicated [QUOTAS] section. Can we make the same change in cinder to make the interface consistent ?
- Probably more options can be migrated to dedicated sections to slim up the DEFAULT section ?
  - https://github.com/openstack/cinder/blob/b0f0b9015b9dfa228dff98eeee5116d8eca1c3cc/cinder/opts.py#L240-L326
Duplicate options to parse http forward header
- https://launchpad.net/bugs/1967686
- https://review.opendev.org/c/openstack/cinder/+/836252
Consensus to continue discussion in the reviews, no objections to ideas proposed
priority for reviews may be appropriate, patches have been posted for some time
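
The three-way lookup order for backend_host described above (driver section, then backend_defaults, then host in [DEFAULT]) can be sketched with the standard library. This illustrates the intended precedence only and is not how oslo.config resolves options; the section names "ceph" and "lvm", the sample values and the helper names are assumptions, and the backend_defaults step reflects the proposed patch rather than current behavior.

```python
import configparser

SAMPLE_CONF = """
[DEFAULT]
host = controller-0

[backend_defaults]
backend_host = cinder-cluster

[ceph]
volume_backend_name = ceph

[lvm]
backend_host = lvm-special
"""


def load(text):
    """Parse config text, treating [DEFAULT] as an ordinary section."""
    # configparser normally merges [DEFAULT] into every section, which
    # would hide the explicit lookup order shown below.
    conf = configparser.ConfigParser(default_section='__unused__')
    conf.read_string(text)
    return conf


def resolve_backend_host(conf, backend):
    """Resolve backend_host in the order of preference from the notes."""
    for section in (backend, 'backend_defaults'):
        value = conf.get(section, 'backend_host', fallback=None)
        if value is not None:
            return value
    # Last resort: the host option, which also affects other services.
    return conf.get('DEFAULT', 'host', fallback=None)
```

With the sample above, the "lvm" backend keeps its own value while "ceph" picks up the shared one from backend_defaults, which is the redundancy the proposal wants to remove.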

Add response schema validation (and fix gaps in our request body and query string validation)

Spec: https://review.opendev.org/c/openstack/cinder-specs/+/914543
We (the SDK team) would like to generate OpenAPI schemas (with extensions to support e.g. microversions and actions) for core OpenStack services. We'd like these to be stored in-tree to ensure things are complete and up-to-date, to avoid landing another large deliverable on the SDK team, and to allow Cinder to fix their own issues
OpenAPI 3.1 is a superset of JSONSchema, which means we can use the same tooling we currently use for this (read: JSON Schema *everywhere*)
This will take the form of a glob of new JSON Schema dictionaries in 'cinder/api/schemas' plus decorators for our various (non-deprecated) APIs
- We will also add decorators to indicate other things that will be useful in spec generation, such as highlighting removed APIs (HTTP 410 (Gone)) and resource actions (HTTP 400 (Bad Request))
Validation of response bodies will be enabled by a new config option and will be opt-in to avoid breaking production. We will however turn it on by default in our unit, functional and integration tests
Eventually API documentation will switch from os-api-ref to a new tool developed and owned by the SDK team, but this is a stretch goal. When this happens, only the Sphinx extension itself will live out-of-tree (like os-api-ref today)
Advantages:
- We can start auto-generating API bindings in a load of languages (Go, Python, Rust, ...)
- We will have a mechanism to avoid accidentally introducing API changes
- Our API documentation will be (automatically) validated
- We will likely highlight bugs and issues with the API
Disadvantages
- There will be a lot of "large" reviews to be attended to (but see point about self-validation above)
Open questions:
- Do we add schemas for deprecated but not removed APIs/actions (e.g. proxy APIs)
  - Brian suggests no, to discourage use of deprecated APIs
    - example: consistency group https://docs.openstack.org/api-ref/block-storage/v3/#consistency-groups-deprecated
- Do we want to publish the combined API schema via an API endpoint, or just statically in our docs?
  - My gut says no, since this will force API (micro)versioning constraints upon us, making it hard to fix bugs or tighten validation in the schema, change the OpenAPI schema version, etc.
  - Consensus is no (brian, rajat)
- Meta point: ideally, nova and cinder should come to the same decisions on the above questions
  - Nova on Friday
- Does the pending deprecation of Paste affect any of this?
  - https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/message/P4KN435GV2SET2J6IQC773QF37IRFVFW/
Meeting Notes:
- Rajat to review spec, there is some schema validation in cinder already
- what we have does not handle responses
  - we do handle responses but not as part of cinder schema validation, it is done with API sample tests (and also validations on tempest side)
- Stephen will clean up existing patch as POC

Documentation of Ceph auth caps for RBD clients used by Cinder / Glance / Nova is missing or inconsistent

ML post: https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/E3VYY24HUGBNH7626ALOGZMJRVX5VOSZ/
leading to bug report: https://bugs.launchpad.net/glance/+bug/2051244
Let us write down the proper caps for once and document them!
- Ensure we only grant services the permissions, and access to the pools, that they actually require
- Let's properly use the Ceph managed caps / profiles (https://docs.ceph.com/en/latest/rados/operations/user-management/#authorization-capabilities) like "rbd".
  - Especially the resulting blocklist capability is important to not run into weird lock issues (https://docs.ceph.com/en/latest/rbd/rbd-exclusive-locks/#rbd-exclusive-locks)
Each project should use correct profile and documentation should be consistent.
What is current state?
Someone to go through a deployment and document the profiles and permissions necessary, and then translate that into documentation, this would be helpful
Request seems very reasonable, no objections.
Would be great if devstack-plugin-ceph could deploy in this way as well, for consistency.
crohmann: Kindly keep me in the loop, I'd really love this cleaned up and gladly help writing the patches / docs once proper caps have been determined!

Cinder Backup improvements

There is a pile of bugfixes and optimizations to cinder-backup. (very sorry about the delays -- zaitcev)
Bugfixes
- Optimize getting parent backup for new incremental
  - https://review.opendev.org/c/openstack/cinder/+/484729
- Speed up starting cinder-backup
  - https://review.opendev.org/c/openstack/cinder/+/657543
    - zaitcev took over so cannot vote +2, someone else please review
- Ceph: Catch more failure conditions on volume backup
  - https://review.opendev.org/c/openstack/cinder/+/897245
- Fix snapshot status is always backing-up
  - https://review.opendev.org/c/openstack/cinder/+/806665
- Cinder backup DB table lacks any indices
  - https://bugs.launchpad.net/cinder/+bug/2048396
  - This should already be merged that adds those indexes: https://review.opendev.org/c/openstack/cinder/+/819669
    - so, code is probably out of date at Christian's installation
      - https://paste.opendev.org/show/bRnPc1qFNuu5jNw4t8eQ/
    - crohmann: I did indeed miss this patch, yes. But this "only" add an index for "deleted", would it not make sense to have indexes on volume_id, project_id and also the deleted column to cover the most common selections?
- Idea to use this week's Review meeting on Friday to focus on pending backup patches
  - https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open,50
  - https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open+branch:master+label:Verified%3E%3D1+not+label%3ACode-Review%3C%3D-1
Features / Specs
- Ceph: Add option to keep only last n snapshots per backup
  - This was discussed over a loooong period of time and I'd appreciate for this to be either merged or rejected for good
  - Another installation was asking about this as well https://review.opendev.org/c/openstack/cinder/+/810457/comments/19fa14ba_e49102ad
  - Bug: https://bugs.launchpad.net/cinder/+bug/1703011
  - Topic: https://review.opendev.org/q/topic:ceph_keep_snapshots
    - Patch: https://review.opendev.org/c/openstack/cinder/+/810457
- Spec to introduce new backup_status field for volume
  - https://review.opendev.org/c/openstack/cinder-specs/+/868761
  - Accepted and merged a while ago, but not implemented yet.
  - jbernard and happystacker mentioned in past weeklies they would pick this one up
- Volume backup timeout for large volumes when using backend based on chunkeddriver
  - Actually a bug, https://bugs.launchpad.net/cinder/+bug/1918119
  - See Enrico Bocchi and Luis Fernandez Alvarez from CERN about their endeavours with Cinder-Backup (Recording: https://www.youtube.com/watch?v=ni-UgftgAy0)
  - Jon Bernard from RedHat offered to look into the performance issues with chunkeddriver we discussed before https://etherpad.opendev.org/p/cinder-bobcat-meetings#L464
  - Later zaitcev mentioned he was going to work on this issue.
    - (confirmed, although it took a back seat to encrypted backups for now)
      - Which are also great to have! Really, thanks for working on them!
      - But backups via chunked driver also have to be fast enough for them to even be considered instead of ceph/rbd
    - All backup drivers targeting object storage or NFS, which are based on the abstract chunked driver, are really slow - so only Ceph RBD is fast enough to be used for realistically large volumes
    - This causes cinder-backup to not really support "offsite-backups"
    - Bug https://bugs.launchpad.net/cinder/+bug/1918119
    - Cern talk about it being too slow to use
    - See past discussion https://meetings.opendev.org/meetings/cinder/2024/cinder.2024-01-24-14.01.log.html#l-64
    - zaitcev was working on this last I believe, but jbernard was also showing some interest in fixing this
  - The "chunkeddriver", either in the way it's implemented or even the whole approach, needs to be reworked to make it feasible to use offsite locations
- mypy, Ceph: correct types of file-likes in backup and restore (by zaitcev)
  - https://review.opendev.org/c/openstack/cinder/+/866093
  - See discussion around modernizing Python stack (e.g. by adding Types) - https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/4V63CHMZ4GPC4IYN7JCJPVKHLZAHN5BL/
  - Could / should we add typing to cinder (-backup) and will you accept patches? - YES
    - That would be great to have, we would like to have typing in all Cinder if possible
zaitcev is working on encrypting backups in chunkeddriver. See spec https://review.opendev.org/c/openstack/cinder-specs/+/862601 and review placeholder https://review.opendev.org/c/openstack/cinder/+/915192

Migrate cinder backup backend transparently

use case: migrate a backup backend data (like swift) to another (like S3) without downtime
- new data should go to s3 but old data should be intact
It's not really a migration in the sense of moving data from one backend to the other
- It's about using a new backup backend and still have the old one readable
- Ideally they want to be able to do incremental backups in the new backend and restore using both (following the chain)
Alan mentions supporting multiple backup backends as a solution
- this will allow access to old backups (with the swift backend)
- new backups will go into the new backend (with the s3 backend)
- https://specs.openstack.org/openstack/cinder-specs/specs/victoria/backup-backends-configuration.html
Gorka mentions the need of backup_type (similar to volume_type for volume backends)
this is an example of spec template
- https://github.com/openstack/cinder-specs/blob/master/specs/template.rst
We used to have another spec that tried to do something similar
- https://specs.openstack.org/openstack/cinder-specs/specs/untargeted/generic-backup-implementation.html
Old patch that started this work: https://review.opendev.org/c/openstack/cinder/+/519355
limitation: do incremental backup on new backend (s3) which has parent backup in old backend (swift)
- could be a followup feature
- workaround: create a full backup in new backend -- customers will be charged more (2TB full backup vs 200MB incremental backup)
Minimum implementation
- create backup_type and associate it to a particular backup backend
- In create backup, scheduler will check if the backup_type is associated to a backend, then create it there, else randomize the create between the available backup backends
- Configuration of enabled_backup_backends and the sections? Maybe this can be optional and we just use [DEFAULT] for now in all of them
- create, delete, restore etc features should work based on the backup type
#action: write a high level spec for discussion based on the above points
reach out to us on #openstack-cinder channel on OFTC
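
The scheduling rule from the minimum implementation above (use the backend associated with the backup_type, otherwise pick one of the available backends at random) can be sketched as follows. The mapping, the type names "legacy" and "current", and the `pick_backend` function are hypothetical; the real design is still to be written up in the spec.

```python
import random

# Hypothetical association of backup types with configured backup backends.
BACKUP_TYPE_TO_BACKEND = {'legacy': 'swift', 'current': 's3'}
AVAILABLE_BACKENDS = ['swift', 's3']


def pick_backend(backup_type=None, rng=random):
    """Choose the backend that should receive a new backup."""
    backend = BACKUP_TYPE_TO_BACKEND.get(backup_type)
    if backend is not None:
        return backend
    # No association: spread new backups across the available backends.
    return rng.choice(AVAILABLE_BACKENDS)
```

Delete and restore would use the same association in reverse, so a backup created under the "legacy" type stays readable from swift while new backups land in s3.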

Wednesday 10 April

general info/observations

QA team is discussing Replace paramiko with libssh2
cinder CI updates for Dalmatian: https://review.opendev.org/q/topic:%22dalmatian-ci-update%22
we should probably discuss this FIXME: https://opendev.org/openstack/cinder-tempest-plugin/src/commit/a259e8d43eb2ee78fe3ad6d0424ecd42bfd18cbd/.zuul.yaml#L11-L13
- action: rosmaita - deprecate the tgt iscsi helper

Continue doc conversation from day before

https://etherpad.opendev.org/p/document-driver-interface
Deadlines: To be decided in the next upstream meeting

Cross project with glance about same store image migration (whoami-rajat)

Meeting will be in the Cinder "room"
- https://etherpad.opendev.org/p/apr2024-ptg-glance#L157
- https://review.opendev.org/c/openstack/glance-specs/+/914639
- Generic Image migration + Optimization for same store migration
- WIP Spec (high level details for discussion purposes): https://review.opendev.org/c/openstack/glance-specs/+/914639
- Description:
  - Currently the preferred way of migration in glance is two step
    - 1. Copy the image from source store to destination store
    - 2. delete the image from source store
  - As I see it, this approach has two problems:
    - 1. requires manual intervention from operators after waiting for image copy to finish and then delete the image from source store
    - 2. No way for stores to optimize the operation
  - This can be addressed by introducing a migration operation in glance which will have two features
    - 1. A generic migration workflow where we will perform the image copy and delete in the same API operation
    - 2. Allow an interface for glance store methods to optimize the migration if possible else it will fall back to the generic workflow
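
The two features of the proposed migration operation (a generic copy-then-delete workflow, plus a hook that lets a store optimize it) can be sketched as below. The classes GenericStore and SameBackendStore, the `migrate_from` hook and the in-memory image dictionaries are hypothetical and only model the control flow, not the glance_store API.

```python
class GenericStore:
    """Hypothetical in-memory store; real Glance stores differ."""

    def __init__(self, name):
        self.name = name
        self.images = {}

    def migrate_from(self, source, image_id):
        """Store-specific fast path; the base class offers none."""
        raise NotImplementedError()


class SameBackendStore(GenericStore):
    """Hypothetical store that can move data without a full copy."""

    def migrate_from(self, source, image_id):
        self.images[image_id] = source.images.pop(image_id)


def migrate_image(image_id, source, destination):
    """Migrate in one operation and report which path was used."""
    try:
        destination.migrate_from(source, image_id)
        return 'optimized'
    except NotImplementedError:
        # Generic workflow: copy first, then delete the source copy.
        destination.images[image_id] = source.images[image_id]
        del source.images[image_id]
        return 'generic'
```

Either way the operator issues a single call, which removes the manual "wait for the copy, then delete" step listed as the first problem.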

Cross project session with nova and glance about in-flight image encryption (rosmaita)

https://review.opendev.org/c/openstack/glance-specs/+/609667
Nova and Cinder require LUKS format
- for encryption, cinder gets the binary secret (a byte array) from barbican and converts it into a string of hex digits (which is used as the luks passphrase), Nova doesn't - it generates passphrases directly and stores them in Barbican
current proposal/patchset:
- Glance uses GPG encryption, conversion to LUKS could happen at upload-time to prevent performance impact on every boot
- https://specs.openstack.org/openstack/cinder-specs/specs/zed/image-encryption.html#proposed-change
- using container_format seems good for LUKS but it's not technically correct since container format should sit below the image format and not above it
new proposal (as interpreted by mhen)
- get rid of GPG encryption and vastly simplify the patchset by using LUKS encryption for images like Cinder and Nova already do when creating images of encrypted disks
  - as proposed by Dan Smith (Nova)
- figure out which metadata to add to Glance images to properly reflect the new use cases
  - maybe streamline existing attributes like "cinder_encryption_key_id" and rename it to the same as Nova and the (to-be-introduced) user-side are using
- Cinder would keep its behavior for Cinder-created encrypted volumes
  - Cinder lets Barbican generate a binary key as secret_type=symmetric
  - Cinder uses binascii.hexlify() on the binary key and passes the result as passphrase to LUKS
  - any image created from such volume would keep the reference to the key that is marked as secret_type=symmetric and would trigger the binascii.hexlify() call before use
- support for Nova- or user-supplied images is to be added to Cinder
  - secret_type=passphrase indicates that the Barbican secret carries the final passphrase (not binary), this instructs Cinder *not* to binascii.hexlify() the secret payload before passing it to LUKS
  - users can use qemu tooling to create a LUKS image and put the passphrase into Barbican by specifying secret_type=passphrase
    - as an alternative users can have Barbican create the key, do hexlify themselves and specify secret_type=symmetric to imitate what Cinder does, if they want the entropy of Barbican (e.g. HSM)
  - Cinder can use the LUKS encryption contained in the image as-is and copy the LUKS-encrypted blocks 1:1 into the volume backend storage, it just needs to differentiate between secret_types to handle the key/passphrase correctly
    - it already does this when restoring images it created itself during the "os-volume_upload_image" Cinder API action from encrypted volumes
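
The difference between the two secret types discussed above comes down to whether binascii.hexlify() is applied before the value reaches LUKS. A minimal sketch, assuming a hypothetical helper `luks_passphrase` that receives the secret type and payload already fetched from Barbican:

```python
import binascii


def luks_passphrase(secret_type, payload):
    """Derive the LUKS passphrase from a secret payload (bytes)."""
    if secret_type == 'symmetric':
        # Binary key: converted into a string of hex digits, as Cinder
        # does today for the keys it creates itself.
        return binascii.hexlify(payload).decode('utf-8')
    if secret_type == 'passphrase':
        # The payload already is the final passphrase: use it as-is.
        return payload.decode('utf-8')
    raise ValueError('unsupported secret_type: %s' % secret_type)
```

A two-byte key `b'\x00\xff'` of type "symmetric" therefore yields the passphrase "00ff", while a "passphrase" secret is passed through unchanged.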

Thursday 11 April

Add Active-Active support for NetApp iSCSI/FCP drivers

As part of this release, we will implement active-active support for NetApp iSCSI and FCP drivers. This will allow users to configure NetApp iSCSI/FCP backends in cinder clustered environments.
failover and failover_completed methods will be implemented as proposed in this spec https://specs.openstack.org/openstack/cinder-specs/specs/ocata/ha-aa-replication.html
geguileo: Sounds good, and they already have experience since they did it for the NFS driver
Mind the release schedule and deadlines (feature freeze)

Discuss bug https://bugs.launchpad.net/cinder/+bug/2060830

create volume from volume/snapshot creates a lock with delete_volume
operations are serialized due to single lock for clone operations
lock prevents source volume from being deleted during operation (same for snapshots)
other operations managed by status field to handle this
why do we use a shared lock for this particular one?
we could update the status as with other operations
problem: no reference counting for nested operations, no way to reach original state
gorka empathizes
a read-write lock would be quite appropriate for this case
tooz makes this ^ complicated
consensus growing around cinder-specific solution using the DB to implement a rw-lock
DB will be mysql/mariadb as it's the one we officially support
alternative is to set status field and implement ref counts, not ideal but consistent with code base
#action: revisit in upstream meeting if anyone interested can assemble a solution using db locking semantics
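
To make the rw-lock idea above concrete, here is a rough sketch of read-write lock semantics kept in a database table: clone operations take a shared lock on the source volume and can run concurrently, while delete needs the exclusive lock. This is not the agreed design (a spec is still pending). It uses sqlite3 purely so the example is self-contained, whereas the notes target mysql/mariadb, and the table name, column names and functions are all hypothetical. The functions are non-blocking "try" operations; waiting, retries and cleanup after a crashed service are left out.

```python
import sqlite3


def init(conn):
    """Create the hypothetical lock table."""
    conn.execute(
        'CREATE TABLE resource_locks ('
        ' resource_id TEXT PRIMARY KEY,'
        ' readers INTEGER NOT NULL DEFAULT 0,'
        ' writer INTEGER NOT NULL DEFAULT 0)')


def _ensure_row(conn, resource_id):
    conn.execute(
        'INSERT OR IGNORE INTO resource_locks (resource_id) VALUES (?)',
        (resource_id,))


def try_acquire_shared(conn, resource_id):
    """Take a read lock, e.g. while cloning from a source volume."""
    _ensure_row(conn, resource_id)
    # The WHERE clause makes check-and-increment a single atomic statement.
    cur = conn.execute(
        'UPDATE resource_locks SET readers = readers + 1 '
        'WHERE resource_id = ? AND writer = 0', (resource_id,))
    conn.commit()
    return cur.rowcount == 1


def release_shared(conn, resource_id):
    """Drop one read lock."""
    conn.execute(
        'UPDATE resource_locks SET readers = readers - 1 '
        'WHERE resource_id = ? AND readers > 0', (resource_id,))
    conn.commit()


def try_acquire_exclusive(conn, resource_id):
    """Take the write lock, e.g. before deleting the volume."""
    _ensure_row(conn, resource_id)
    cur = conn.execute(
        'UPDATE resource_locks SET writer = 1 '
        'WHERE resource_id = ? AND writer = 0 AND readers = 0',
        (resource_id,))
    conn.commit()
    return cur.rowcount == 1
```

The readers column is effectively the reference count that the status-field approach lacks, which is why several clones can hold the source at once without blocking each other.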

Volume encryption with user defined keys

Spec: https://review.opendev.org/c/openstack/cinder-specs/+/914513/1
Cinder currently lacks API support for creating a volume with a predefined (e.g. already stored in Barbican) encryption key.
Meeting Notes:
- The idea is to create volumes from pre-existing keys from barbican
- The preferred way is to ask cinder to create an encrypted volume and cinder communicates with barbican to create the key
- Cinder creates the passphrase by converting the barbican key into a hex value (binascii.hexlify())
- As long as Cinder strictly transforms the key using binascii.hexlify() like it does currently, users cannot decrypt the volume directly with their keys in barbican; they would need to mimic cinder's procedure to derive the passphrase
- here's the info about the secret types:
  - https://docs.openstack.org/barbican/latest/api/reference/secret_types.html
  - https://docs.openstack.org/cinder/latest/contributor/contributing.html (it outlines how we work and the role of specs, etc)
- The proposal is to have the development in parallel of
  - 1. API change to allow creating volumes with pre-existing secret in barbican
    - implement passing Barbican secret ids during volume creation API call and skip secret order create done by Cinder internally
    - but check received secrets in regards to their metadata (cipher, mode, bit length) to be compatible with the volume type's encryption specification
      - (which Cinder didn't need to do before since it always created secrets itself in a closed ecosystem)
  - 2. support for "passphrase" secret types from Barbican which will circumvent Cinder's binascii.hexlify() conversion and used as passphrases as-is in Cinder
    - currently only "symmetric" is supported, which is transformed using binascii.hexlify() by Cinder before passing it to LUKS
- need to review the documentation around this, particularly, what an encryption type is and what the fields are used for (admin facing), and also what needs to be supplied as end user facing docs
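
The compatibility check mentioned in part 1 of the proposal (comparing a user-supplied secret's metadata with the volume type's encryption specification) could look roughly like this. It is a sketch under assumptions: both inputs are plain dictionaries, and the field names "cipher", "mode" and "bit_length" simply follow the wording of the notes rather than any actual Barbican or Cinder data model.

```python
# Field names taken from the notes; the real attribute names may differ.
CHECKED_FIELDS = ('cipher', 'mode', 'bit_length')


def secret_mismatches(secret_metadata, encryption_spec):
    """Return the fields on which a secret does not match the spec."""
    return [
        field for field in CHECKED_FIELDS
        if secret_metadata.get(field) != encryption_spec.get(field)
    ]
```

An empty result means the pre-existing secret can be used for the volume. Any listed field would be a reason to reject the create request up front, since Cinder no longer controls how the secret was generated.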

Review CI updates

https://review.opendev.org/q/topic:%22dalmatian-ci-update%22

Friday 12 April

Review development cycle processes & schedule & documentation --> postpone

https://docs.openstack.org/cinder/latest/contributor/index.html#managing-development
some stuff is out of date, some things may be missing
should be interesting to anyone who wants to know what goes into making cinder and all its associated products

Cinder Backup improvements

There is a pile of bugfixes and optimizations to cinder-backup. (very sorry about the delays -- zaitcev)
Improvements
- Optimize getting parent backup for new incremental
  - https://review.opendev.org/c/openstack/cinder/+/484729
- Speed up starting cinder-backup
  - https://review.opendev.org/c/openstack/cinder/+/657543
    - zaitcev took over so cannot vote +2, someone else please review

Bugfixes
- Ceph: Catch more failure conditions on volume backup
  - https://review.opendev.org/c/openstack/cinder/+/897245
- Fix snapshot status is always backing-up
  - https://review.opendev.org/c/openstack/cinder/+/806665
- Cinder backup DB table lacks any indices -- this is already done
  - https://bugs.launchpad.net/cinder/+bug/2048396
  - A patch that adds those indexes should already be merged: https://review.opendev.org/c/openstack/cinder/+/819669
    - so, code is probably out of date at Christian's installation
      - https://paste.opendev.org/show/bRnPc1qFNuu5jNw4t8eQ/
    - crohmann: I did indeed miss this patch, yes. But this "only" adds an index for "deleted"; would it not make sense to have indexes on volume_id, project_id and also the deleted column to cover the most common selections?
      - It also adds it on the project_id
      - Adding an index on the volume_id makes sense for deployments with a lot of backups per project (<- volume ?)
- Idea to use this week's Review meeting on Friday to focus on pending backup patches
  - https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open,50
  - https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open+branch:master+label:Verified%3E%3D1+not+label%3ACode-Review%3C%3D-1
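
To illustrate the index discussion above, here is a small self-contained sketch using Python's sqlite3 module. The table layout and the index names are hypothetical (Cinder's real schema and migrations differ); it only shows how a composite index on (deleted, project_id) and a separate index on volume_id would each serve the selections mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE backups ("
    "id TEXT PRIMARY KEY, project_id TEXT, volume_id TEXT, deleted INTEGER)"
)
# Hypothetical index names, chosen for illustration only.
conn.execute(
    "CREATE INDEX backups_deleted_project_id_idx ON backups (deleted, project_id)"
)
conn.execute("CREATE INDEX backups_volume_id_idx ON backups (volume_id)")


def plan(sql: str) -> str:
    """Return SQLite's query plan details for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


by_project = plan("SELECT id FROM backups WHERE deleted = 0 AND project_id = 'p1'")
by_volume = plan("SELECT id FROM backups WHERE volume_id = 'v1'")
print(by_project)
print(by_volume)
```

Whether the extra volume_id index pays off in a real deployment depends on the number of backups per volume, which is the question raised in the discussion.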
Features / Specs
- Ceph: Add option to keep only last n snapshots per backup
  - This was discussed over a loooong period of time and I'd appreciate for this to be either merged or rejected for good
  - Another installation was asking about this as well https://review.opendev.org/c/openstack/cinder/+/810457/comments/19fa14ba_e49102ad
  - Bug: https://bugs.launchpad.net/cinder/+bug/1703011
  - Topic: https://review.opendev.org/q/topic:ceph_keep_snapshots
    - Patch: https://review.opendev.org/c/openstack/cinder/+/810457
- Spec to introduce new backup_status field for volume
  - https://review.opendev.org/c/openstack/cinder-specs/+/868761
  - Accepted and merged a while ago, but not implemented yet.
  - Jbernard and happystacker mentioned in past weeklies they would pick this one up
- Volume backup timeout for large volumes when using backend based on chunkeddriver
  - Actually a bug, https://bugs.launchpad.net/cinder/+bug/1918119
  - See Enrico Bocchi and Luis Fernandez Alvarez from CERN about their endeavours with Cinder-Backup
  - Jon Bernard from Red Hat offered to look into the performance issues with chunkeddriver we discussed before https://etherpad.opendev.org/p/cinder-bobcat-meetings#L464
  - Later zaitcev mentioned he was going to work on this issue.
    - (confirmed, although it took a back seat to encrypted backups for now)
    - Which are also great to have! Really, thanks for working on them!
    - But backups via chunked driver also have to be fast enough for them to even be considered instead of ceph/rbd
  - All backup drivers targeting object storage or NFS, which are based on the abstract chunked driver, are really slow - so only Ceph RBD is fast enough to be used for realistically large volumes
    - This causes cinder-backup to not really support "offsite-backups"
    - Bug https://bugs.launchpad.net/cinder/+bug/1918119
    - CERN talk about it being too slow to use
    - See past discussion https://meetings.opendev.org/meetings/cinder/2024/cinder.2024-01-24-14.01.log.html#l-64
    - zaitcev was working on this last I believe, but jbernard was also showing some interest in fixing this
  - The "chunkeddriver", either in the way it's implemented or even in its whole approach, needs to be reworked to make it feasible to use offsite locations
- mypy, Ceph: correct types of file-likes in backup and restore (by zaitcev)
  - https://review.opendev.org/c/openstack/cinder/+/866093
  - See discussion around modernizing the Python stack (e.g. by adding types) - https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/4V63CHMZ4GPC4IYN7JCJPVKHLZAHN5BL/
  - Could / should we add typing to cinder (-backup) and will you accept patches? - YES
    - That would be great to have, we would like to have typing in all Cinder if possible
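
As a hedged illustration of what typing file-like objects can look like, here is a minimal sketch using typing.Protocol. The names ReadableChunks and copy_chunks are hypothetical and are not taken from the patch under review; the point is only that a structural type lets mypy check any object with a compatible read() method.

```python
import io
from typing import BinaryIO, Protocol


class ReadableChunks(Protocol):
    """Structural type for anything that can be read in byte chunks."""

    def read(self, size: int = -1) -> bytes: ...


def copy_chunks(src: ReadableChunks, dest: BinaryIO, chunk_size: int = 4) -> int:
    """Copy src to dest chunk by chunk and return the number of bytes written."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty read signals end of data
            break
        dest.write(chunk)
        total += len(chunk)
    return total


source = io.BytesIO(b"0123456789")
target = io.BytesIO()
copied = copy_chunks(source, target)
```

Here io.BytesIO stands in for a volume file or backup object; any class with a matching read() signature would type-check without inheriting from a common base.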
zaitcev is working on encrypting backups in chunkeddriver. See spec https://review.opendev.org/c/openstack/cinder-specs/+/862601 and review placeholder https://review.opendev.org/c/openstack/cinder/+/915192

Just FYI: new Spec for Image Encryption with LUKS

https://review.opendev.org/c/openstack/glance-specs/+/915726
Should there be a spec for Cinder too?
