Difference between revisions of "CinderDalmatianPTGSummary"

Revision as of 20:42, 22 April 2024

Tuesday 9 April

TODO

Wednesday 10 April

general info/observations

QA team is discussing Replace paramiko with libssh2
cinder CI updates for Dalmatian: https://review.opendev.org/q/topic:%22dalmatian-ci-update%22
we should probably discuss this FIXME: https://opendev.org/openstack/cinder-tempest-plugin/src/commit/a259e8d43eb2ee78fe3ad6d0424ecd42bfd18cbd/.zuul.yaml#L11-L13
- action: rosmaita - deprecate the tgt iscsi helper

Continue doc conversation from day before

https://etherpad.opendev.org/p/document-driver-interface
Deadlines: To be decided in the next upstream meeting

Cross project with glance about same store image migration (whoami-rajat)

Meeting will be in the Cinder "room"
- https://etherpad.opendev.org/p/apr2024-ptg-glance#L157
- https://review.opendev.org/c/openstack/glance-specs/+/914639
- Generic Image migration + Optimization for same store migration
- WIP Spec (high level details for discussion purposes): https://review.opendev.org/c/openstack/glance-specs/+/914639
- Description:
  - Currently the preferred way of migration in glance is two step
    - 1. Copy the image from source store to destination store
    - 2. delete the image from source store
  - As i can see, it has two problems with this approach:
    - 1. requires manual intervention from operators after waiting for image copy to finish and then delete the image from source store
    - 2. No way for stores to optimize the operation
  - This can be addressed by introducing a migration operation in glance which will have two features
    - 1. A generic migration workflow where we will perform the image copy and delete in the same API operation
    - 2. Allow an interface for glance store methods to optimize the migration if possible else it will fall back to the generic workflow

Cross project session with nova and glance about in-flight image encryption (rosmaita)

https://review.opendev.org/c/openstack/glance-specs/+/609667
Nova and Cinder require LUKS format
- for encryption, cinder gets the binary secret (a byte array) from barbican and converts it into a string of hex digits (which is used as the luks passphrase), Nova doesn't - it generates passphrases directly and stores them in Barbican
current proposal/patchset:
- Glance uses GPG encryption, conversion to LUKS could happen at upload-time to prevent performance impact on every boot
- https://specs.openstack.org/openstack/cinder-specs/specs/zed/image-encryption.html#proposed-change
- using container_format seems good for LUKS but it's not technically correct since container format should sit below the image format and not above it
new proposal (as interpreted by mhen)
- get rid of GPG encryption and vastly simplify the patchset by using LUKS encryption for images like Cinder and Nova already do when creating images of encrypted disks
  - as proposed by Dan Smith (Nova)
- figure out which metadata to add to Glance images to properly reflect the new use cases
  - maybe streamline existing attributes like "cinder_encryption_key_id" and rename it to the same as Nova and the (to-be-introduced) user-side are using
- Cinder would keep its behavior for Cinder-created encrypted volumes
  - Cinder lets Barbican generate a binary key as secret_type=symmetric
  - Cinder uses binascii.hexlify() on the binary key and passes the result as passphrase to LUKS
  - any image created from such volume would keep the reference to the key that is marked as secret_type=symmetric and would trigger the binascii.hexlify() call before use
- support for Nova- or user-supplied images is to be added to Cinder
  - secret_type=passphrase indicates that the Barbican secret carries the final passphrase (not binary), this instructs Cinder *not* to binascii.hexlify() the secret payload before passing it to LUKS
  - users can use qemu tooling to create a LUKS image and put the passphrase into Barbican by specifying secret_type=passphrase
    - as an alternative users can have Barbican create the key, do hexlify themselves and specify secret_type=symmetric to imitate what Cinder does, if they want the entropy of Barbican (e.g. HSM)
  - Cinder can use the LUKS encryption contained in the image as-is and copy the LUKS-encrypted blocks 1:1 into the volume backend storage, it just needs to differentiate between secret_types to handle the key/passphrase correctly
    - it already does this when restoring images it created itself during the "os-volume_upload_image" Cinder API action from encrypted volumes

Thursday 11 April

Add Active-Active support for NetApp iSCSI/FCP drivers

As part of this release, we will implement active-active support for NetApp iSCSI and FCP drivers. This will allow users to configure NetApp iSCSI/FCP backends in cinder clustered environments.
failover and failover_completed methods will be implemented as proposed in this spec https://specs.openstack.org/openstack/cinder-specs/specs/ocata/ha-aa-replication.html
geguileo: Sounds good, and they already have experience since they did it for the NFS driver
Mind the release schedule and deadlines (feature freeze)

Discuss bug https://bugs.launchpad.net/cinder/+bug/2060830

create volume from volume/snapshot creates a lock with delete_volume
operations are serialized due to single lock for clone operations
lock prevents source volume from being deleted during operation (same for snapshots)
other operations managed by status field to handle this
why do we use a shared lock for this particilar one?
we could update the status as with other operations
problem: no reference counting for nested operations, no way to reach original state
gorka empathizes
a read-write lock would be quite approproate for this case
tooz makes this ^ complicated
consensus growing around cinder-specific solution using the DB to implement a rw-lock
DB will be mysql/mariadb as it's the one we officially support
alternative is to set status field and implement ref counts, not ideal but consistent with code base
#action: revisit in upstream meeting if anyone interested can assemble a solution using db locking semantics

Volume encryption with user defined keys

Spec: https://review.opendev.org/c/openstack/cinder-specs/+/914513/1
Cinder currently lags support the API to create a volume with a predefined (e.g. already stored in Barbican) encryption key.
Meeting Notes:
- The idea is to create volumes from pre-existing keys from barbican
- The preferred way is to ask cinder to create an encrypted volume and cinder communicates with barbican to create the key
- Cinder creates the passphrase by converting the barbican key into a hex value (binascii.hexlify())
- The user will not be able to decrypt the volume with their keys in barbican since they need to mimic cinder's procedure for encryption using the custom passphrase as long as Cinder strictly transforms it using binascii.hexlify() like it does currently
- here's the info about the secret types:
  - https://docs.openstack.org/barbican/latest/api/reference/secret_types.html
  - https://docs.openstack.org/cinder/latest/contributor/contributing.html (it outlines how we work and the role of specs, etc)
- The proposal is to have the development in parallel of
  - 1. API change to allow creating volumes with pre-existing secret in barbican
    - implement passing Barbican secret ids during volume creation API call and skip secret order create done by Cinder internally
    - but check received secrets in regards to their metadata (cipher, mode, bit length) to be compatible with the volume type's encryption specification
      - (which Cinder didn't need to do before since it always created secrets itself in a closed ecosystem)
  - 2. support for "passphrase" secret types from Barbican which will circumvent Cinder's binascii.hexlify() conversion and used as passphrases as-is in Cinder
    - currently only "symmetric" is supported, which is transformed using binascii.hexlify() by Cinder before passing it to LUKS
- need to review the documentation around this, particularly, what an encryption type is and what the fields are used for (admin facing), and also what needs to be supplied as end user facing docs

Review CI updates

https://review.opendev.org/q/topic:%22dalmatian-ci-update%22

Friday 12 April

Review development cycle processes & schedule & documentation --> postpone

https://docs.openstack.org/cinder/latest/contributor/index.html#managing-development
some stuff is out of date, some things may be missing
should be interesting to anyone who wants to know what goes into making cinder and all it associated products

Cinder Backup improvements

There is a pile of bugfixes and optimizations to cinder-backup. (very sorry about the delays -- zaitcev)
Improvements
- Optimize getting parent backup for new incremental
  - https://review.opendev.org/c/openstack/cinder/+/484729
- Speed up starting cinder-backup
  - https://review.opendev.org/c/openstack/cinder/+/657543
    - zaitcev took over so cannot vote +2, someone else please review

Bugfixes
- Ceph: Catch more failure conditions on volume backup
  - https://review.opendev.org/c/openstack/cinder/+/897245
- Fix snapshot status is always backing-up
  - https://review.opendev.org/c/openstack/cinder/+/806665
- Cinder backup DB table lacks any indices -- this is already done
  - https://bugs.launchpad.net/cinder/+bug/2048396
  - This should already be merged that adds those indexes: https://review.opendev.org/c/openstack/cinder/+/819669
    - so, code is probably out of date at Christian's installation
      - https://paste.opendev.org/show/bRnPc1qFNuu5jNw4t8eQ/
    - crohmann: I did indeed miss this patch, yes. But this "only" add an index for "deleted", would it not make sense to have indexes on volume_id, project_id and also the deleted column to cover the most common selections?
      - It also adds it on the project_id
      - Adding an index on the volume_id makes sense for deployments with a lot of backups per project (<- volume ?)
- Idea to use this week's Review meeting on Friday to focus on pending backup patches
  - https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open,50
  - https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open+branch:master+label:Verified%3E%3D1+not+label%3ACode-Review%3C%3D-1
Features / Specs
- Ceph: Add option to keep only last n snapshots per backup
  - This was discussed over a loooong period of time and I'd appreciate for this to be either merged or rejected for good
  - Another installation was asking about this as well https://review.opendev.org/c/openstack/cinder/+/810457/comments/19fa14ba_e49102ad
  - Bug: https://bugs.launchpad.net/cinder/+bug/1703011
  - Topic: https://review.opendev.org/q/topic:ceph_keep_snapshots
    - Patch: https://review.opendev.org/c/openstack/cinder/+/810457
- Spec to introduce new backup_status field for volume
  - https://review.opendev.org/c/openstack/cinder-specs/+/868761
  - Accepted and merged a while ago, but not implemented yet.
  - Jbernard and happystacker mentioned in past weeklies they would pick this one up
- Volume backup timeout for large volumes when using backend based on chunkeddriver
  - Actually a bug, https://bugs.launchpad.net/cinder/+bug/1918119
  - See Enrico Bocchi and Luis Fernandez Alvarez from Cern about their endevours with Cinder-Backup
  - Jon Bernard from RedHat offered to look into the performance issues with chunkeddriver we discussed before https://etherpad.opendev.org/p/cinder-bobcat-meetings#L464
  - Later zaitcev mentioned he was going to work on this issue.
    - (confirmed, although it took a back seat to encrypted backups for now)
    - Which are also great to have! Really, thanks for working on them!
    - But backups via chunked driver also have to be fast enough for them to even be considered instead of ceph/rbd
  - All backup drivers targeting object storage or NFS, which are based on the abstract chunked driver, are really slow - so only Ceph RBD is fast enough to be used for realistically large volumes
    - This causes cinder-backup to not really support "offsite-backups"
    - Bug https://bugs.launchpad.net/cinder/+bug/1918119
    - Cern talk about it being too slow to use
    - See past discussion https://meetings.opendev.org/meetings/cinder/2024/cinder.2024-01-24-14.01.log.html#l-64
    - zaitcev was working on this last I believe, but jbernard was also showing some interest in fixing this
  - The "chunkeddriver" either in the way it's implemented or even the whole approach needs to be reworked to make it feasable to use offsite-locations
- mypy, Ceph: correct types of file-likes in backup and restore (by zaitcev)
  - https://review.opendev.org/c/openstack/cinder/+/866093
  - See discussion around moderizing Python stack (e.g by adding Types) - https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/4V63CHMZ4GPC4IYN7JCJPVKHLZAHN5BL/
  - Could / should we add typing to cinder (-backup) and will you accept patches? - YES
    - That would be great to have, we would like to have typing in all Cinder if possible
zaitcev is working on encrypting backups in chunkdriver. See spec https://review.opendev.org/c/openstack/cinder-specs/+/862601 and review placeholder https://review.opendev.org/c/openstack/cinder/+/915192

Just FYI: new Spec for Image Encryption with LUKS

https://review.opendev.org/c/openstack/glance-specs/+/915726
Should there be a spec for Cinder too?

@@ Line 24: / Line 24: @@
 ==Wednesday 10 April==
-TODO
+===general info/observations===
+*    QA team is discussing Replace paramiko with libssh2
+*    cinder CI updates for Dalmatian: https://review.opendev.org/q/topic:%22dalmatian-ci-update%22
+*    we should probably discuss this FIXME: https://opendev.org/openstack/cinder-tempest-plugin/src/commit/a259e8d43eb2ee78fe3ad6d0424ecd42bfd18cbd/.zuul.yaml#L11-L13
+**    action: rosmaita - deprecate the tgt iscsi helper
+===    Continue doc conversation from day before===
+*    https://etherpad.opendev.org/p/document-driver-interface
+*    Deadlines: To be decided in the next upstream meeting
+===     Cross project with glance about same store image migration (whoami-rajat)===
+*    Meeting will be in the Cinder "room"
+**    https://etherpad.opendev.org/p/apr2024-ptg-glance#L157
+**    https://review.opendev.org/c/openstack/glance-specs/+/914639
+**    Generic Image migration + Optimization for same store migration
+**    WIP Spec (high level details for discussion purposes): https://review.opendev.org/c/openstack/glance-specs/+/914639
+**    Description:
+***    Currently the preferred way of migration in glance is two step
+****    1. Copy the image from source store to destination store
+****    2. delete the image from source store
+***    As i can see, it has two problems with this approach:
+****    1. requires manual intervention from operators after waiting for image copy to finish and then delete the image from source store
+****    2. No way for stores to optimize the operation
+***    This can be addressed by introducing a migration operation in glance which will have two features
+****    1. A generic migration workflow where we will perform the image copy and delete in the same API operation
+****    2. Allow an interface for glance store methods to optimize the migration if possible else it will fall back to the generic workflow
+===    Cross project session with nova and glance about in-flight image encryption (rosmaita)===
+*    https://review.opendev.org/c/openstack/glance-specs/+/609667
+*    Nova and Cinder require LUKS format
+**    for encryption, cinder gets the binary secret (a byte array) from barbican and converts it into a string of hex digits (which is used as the luks passphrase), Nova doesn't - it generates passphrases directly and stores them in Barbican
+*    current proposal/patchset:
+**    Glance uses GPG encryption, conversion to LUKS could happen at upload-time to prevent performance impact on every boot
+**    https://specs.openstack.org/openstack/cinder-specs/specs/zed/image-encryption.html#proposed-change
+**    using container_format seems good for LUKS but it's not technically correct since container format should sit below the image format and not above it
+*    new proposal (as interpreted by mhen)
+**    get rid of GPG encryption and vastly simplify the patchset by using LUKS encryption for images like Cinder and Nova already do when creating images of encrypted disks
+***    as proposed by Dan Smith (Nova)
+**    figure out which metadata to add to Glance images to properly reflect the new use cases
+***    maybe streamline existing attributes like "cinder_encryption_key_id" and rename it to the same as Nova and the (to-be-introduced) user-side are using
+**    Cinder would keep its behavior for Cinder-created encrypted volumes
+***    Cinder lets Barbican generate a binary key as secret_type=symmetric
+***    Cinder uses binascii.hexlify() on the binary key and passes the result as passphrase to LUKS
+***    any image created from such volume would keep the reference to the key that is marked as secret_type=symmetric and would trigger the binascii.hexlify() call before use
+**    support for Nova- or user-supplied images is to be added to Cinder
+***    secret_type=passphrase indicates that the Barbican secret carries the final passphrase (not binary), this instructs Cinder *not* to binascii.hexlify() the secret payload before passing it to LUKS
+***    users can use qemu tooling to create a LUKS image and put the passphrase into Barbican by specifying secret_type=passphrase
+****    as an alternative users can have Barbican create the key, do hexlify themselves and specify secret_type=symmetric to imitate what Cinder does, if they want the entropy of Barbican (e.g. HSM)
+***    Cinder can use the LUKS encryption contained in the image as-is and copy the LUKS-encrypted blocks 1:1 into the volume backend storage, it just needs to differentiate between secret_types to handle the key/passphrase correctly
+****    it already does this when restoring images it created itself during the "os-volume_upload_image" Cinder API action from encrypted volumes
 ==Thursday 11 April==
@@ Line 79: / Line 128: @@
 ==Friday 12 April==
-* Review development cycle processes & schedule & documentation --> postpone
+===Review development cycle processes & schedule & documentation --> postpone===
-**https://docs.openstack.org/cinder/latest/contributor/index.html#managing-development
+*https://docs.openstack.org/cinder/latest/contributor/index.html#managing-development
-**some stuff is out of date, some things may be missing
+*some stuff is out of date, some things may be missing
-**should be interesting to anyone who wants to know what goes into making cinder and all it associated products
+*should be interesting to anyone who wants to know what goes into making cinder and all it associated products
+===Cinder Backup improvements===
+*There is a pile of bugfixes and optimizations to cinder-backup. (very sorry about the delays -- zaitcev)
+*Improvements
+**Optimize getting parent backup for new incremental
+***https://review.opendev.org/c/openstack/cinder/+/484729
+**Speed up starting cinder-backup
+***https://review.opendev.org/c/openstack/cinder/+/657543
+****zaitcev took over so cannot vote +2, someone else please review
-*Cinder Backup improvements
+*Bugfixes
-**There is a pile of bugfixes and optimizations to cinder-backup. (very sorry about the delays -- zaitcev)
+**Ceph: Catch more failure conditions on volume backup
-**Improvements
+***https://review.opendev.org/c/openstack/cinder/+/897245
-***Optimize getting parent backup for new incremental
+**Fix snapshot status is always backing-up
-****https://review.opendev.org/c/openstack/cinder/+/484729
+***https://review.opendev.org/c/openstack/cinder/+/806665
-***Speed up starting cinder-backup
+**Cinder backup DB table lacks any indices -- this is already done
-****https://review.opendev.org/c/openstack/cinder/+/657543
+***https://bugs.launchpad.net/cinder/+bug/2048396
-*****zaitcev took over so cannot vote +2, someone else please review
+***This should already be merged that adds those indexes: https://review.opendev.org/c/openstack/cinder/+/819669
+****so, code is probably out of date at Christian's installation
+*****https://paste.opendev.org/show/bRnPc1qFNuu5jNw4t8eQ/
+****crohmann: I did indeed miss this patch, yes. But this "only" add an index for "deleted", would it not make sense to have indexes on volume_id, project_id and also the deleted column to cover the most common selections?
+*****It also adds it on the project_id
+*****Adding an index on the volume_id makes sense for deployments with a lot of backups per project (<- volume ?)
+**Idea to use this week's Review meeting on Friday to focus on pending backup patches
+***https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open,50
+***https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open+branch:master+label:Verified%3E%3D1+not+label%3ACode-Review%3C%3D-1
+*    Features / Specs
+**    Ceph: Add option to keep only last n snapshots per backup
+***    This was discussed over a loooong period of time and I'd appreciate for this to be either merged or rejected for good
+***    Another installation was asking about this as well https://review.opendev.org/c/openstack/cinder/+/810457/comments/19fa14ba_e49102ad
+***    Bug: https://bugs.launchpad.net/cinder/+bug/1703011
+***    Topic: https://review.opendev.org/q/topic:ceph_keep_snapshots
+****    Patch: https://review.opendev.org/c/openstack/cinder/+/810457
+**    Spec to introduce new backup_status field for volume
+***    https://review.opendev.org/c/openstack/cinder-specs/+/868761
+***    Accepted and merged a while ago, but not implemented yet.
+***    Jbernard and happystacker mentioned in past weeklies they would pick this one up
+**    Volume backup timeout for large volumes when using backend based on chunkeddriver
+***    Actually a bug, https://bugs.launchpad.net/cinder/+bug/1918119
+***    See Enrico Bocchi and Luis Fernandez Alvarez from Cern about their endevours with Cinder-Backup
+***    Jon Bernard from RedHat offered to look into the performance issues with chunkeddriver we discussed before https://etherpad.opendev.org/p/cinder-bobcat-meetings#L464
+***    Later zaitcev mentioned he was going to work on this issue.
+****    (confirmed, although it took a back seat to encrypted backups for now)
+****    Which are also great to have! Really, thanks for working on them!
+****    But backups via chunked driver also have to be fast enough for them to even be considered instead of ceph/rbd
+***    All backup drivers targeting object storage or NFS, which are based on the abstract chunked driver, are really slow - so only Ceph RBD is fast enough to be used for realistically large volumes
+****    This causes cinder-backup to not really support "offsite-backups"
+****    Bug https://bugs.launchpad.net/cinder/+bug/1918119
+****    Cern talk about it being too slow to use
+****    See past discussion https://meetings.opendev.org/meetings/cinder/2024/cinder.2024-01-24-14.01.log.html#l-64
+****    zaitcev was working on this last I believe, but jbernard was also showing some interest in fixing this
+***    The "chunkeddriver" either in the way it's implemented or even the whole approach needs to be reworked to make it feasable to use offsite-locations
+**    mypy, Ceph: correct types of file-likes in backup and restore (by zaitcev)
+***    https://review.opendev.org/c/openstack/cinder/+/866093
+***    See discussion around moderizing Python stack (e.g by adding Types) - https://lists.openstack.org/archives/list/openstack-discuss@lists.openstack.org/thread/4V63CHMZ4GPC4IYN7JCJPVKHLZAHN5BL/
+***    Could / should we add typing to cinder (-backup) and will you accept patches? - YES
+****    That would be great to have, we would like to have typing in all Cinder if possible
+*    zaitcev is working on encrypting backups in chunkdriver. See  spec https://review.opendev.org/c/openstack/cinder-specs/+/862601 and review placeholder https://review.opendev.org/c/openstack/cinder/+/915192
-**Bugfixes
+===Just FYI: new Spec for Image Encryption with LUKS===
-***Ceph: Catch more failure conditions on volume backup
+*    https://review.opendev.org/c/openstack/glance-specs/+/915726
-****https://review.opendev.org/c/openstack/cinder/+/897245
+*    Should there be a spec for Cinder too?
-***Fix snapshot status is always backing-up
-****https://review.opendev.org/c/openstack/cinder/+/806665
-***Cinder backup DB table lacks any indices -- this is already done
-****https://bugs.launchpad.net/cinder/+bug/2048396
-****This should already be merged that adds those indexes: https://review.opendev.org/c/openstack/cinder/+/819669
-*****so, code is probably out of date at Christian's installation
-******https://paste.opendev.org/show/bRnPc1qFNuu5jNw4t8eQ/
-*****crohmann: I did indeed miss this patch, yes. But this "only" add an index for "deleted", would it not make sense to have indexes on volume_id, project_id and also the deleted column to cover the most common selections?
-******It also adds it on the project_id
-******Adding an index on the volume_id makes sense for deployments with a lot of backups per project (<- volume ?)
-***Idea to use this week's Review meeting on Friday to focus on pending backup patches
-****https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open,50
-****https://review.opendev.org/q/project:openstack/cinder+dir:cinder/backup+status:open+branch:master+label:Verified%3E%3D1+not+label%3ACode-Review%3C%3D-1

Difference between revisions of "CinderDalmatianPTGSummary"

Revision as of 20:42, 22 April 2024

Contents

Introduction

Tuesday 9 April

Wednesday 10 April

general info/observations

Continue doc conversation from day before

Cross project with glance about same store image migration (whoami-rajat)

Cross project session with nova and glance about in-flight image encryption (rosmaita)

Thursday 11 April

Add Active-Active support for NetApp iSCSI/FCP drivers

Discuss bug https://bugs.launchpad.net/cinder/+bug/2060830

Volume encryption with user defined keys

Review CI updates

Friday 12 April

Review development cycle processes & schedule & documentation --> postpone

Cinder Backup improvements

Just FYI: new Spec for Image Encryption with LUKS