CinderYogaPTGSummary

Revision as of 13:25, 28 October 2021 by Brian-rosmaita (talk | contribs) (added "default type retrospective")

Introduction

This page contains a summary of the subjects covered during the Cinder project sessions at the Project Team Gathering for the Yoga development cycle, held virtually October 18-22, 2021. The Cinder project team met from Tuesday 19 October to Friday 22 October, for 4 hours each day (1300-1700 UTC).

Subset of the Cinder Team at the Yoga (Virtual) PTG, October 2021.


This document aims to give a summary of each session. More context is available on the cinder Yoga PTG etherpad:


The sessions were recorded, so to get all the details of any discussion, you can watch/listen to the recording. Links to the recordings are located at appropriate places below.

Tuesday 19 October

recordings

Greetings, user survey discussion

For the benefit of people who haven't attended, this is the way the cinder team works at the PTG:

  • sessions are recorded
  • please sign in on the "Attendees" section of this etherpad for each day
  • all notes, questions, etc. happen in the etherpad; try to remember to preface your comment with your irc nick
  • anyone present can comment or ask questions in the etherpad
  • also, anyone present should feel free to ask questions or make comments during any of the discussions
  • we discuss topics in the order listed in the etherpad, making adjustments as we go for sessions that run longer or shorter
  • we stick to the scheduled times for cross-project sessions, but for everything else we are flexible


Next, we took a look at the Project Specific Feedback Responses from the latest User Survey. Here's an ethercalc that's organized to make the cinder-relevant responses easier to see: https://ethercalc.openstack.org/=2021-user-survey

Our question on the survey was: "If there was one thing you would like to see changed (added, removed, fixed) in Cinder, what would it be?"

We received 39 responses (out of about 425 responses to the survey).

Looking through the responses, there were requests for features that we already have and some comments that we didn't understand. We decided to send a response to the mailing list mentioning the implemented features and starting a discussion on the items we didn't understand. (Survey responses are anonymous, so we can't contact operators directly.) Amy Marrich (spotz, who facilitates the operator meetups) has mentioned that operators tend to follow the meetup twitter account, so a good way to contact operators is to post to the ML and then ask her to tweet the link from the ops meetup account.

The quantitative responses (for example, how many deployments include cinder) are on the OpenStack analytics page: https://www.openstack.org/analytics/

Some feedback for the User Survey team:

  • The data in the report on the website is really difficult to consume (it's displayed in non-resizable graphs, and it's difficult to distinguish the percentages for "interested", "testing" and "production"). There's an option to download as PDF, but that gives you the same non-resizable graphs. It would be helpful to be able to download the data as a CSV file.
  • For the next survey, we want to add the question: What driver(s) are you using for your Cinder environment?
  • We like our current question, but a lot of the answers are too vague -- do you have any suggestions on how to indicate to people that they should be clear and specific?

conclusions

  • action (rosmaita): start an etherpad for the response to operators
  • action (rosmaita): communicate our feedback to the User Survey team

In-flight image encryption update

Josephine Seifert (Luzi) updated us on the status of the in-flight encryption effort. The current plan is to have "experimental Image Encryption without Secret Consumers". The reason is to allow coding and reviewing of the Image Encryption work (and setting up CI) while waiting for the Secret Consumers API. (The Secret Consumers API in Barbican will allow services to register that a secret is in use, though the secret owner can still delete it by using a --force flag. The holdup is that microversioning needs to be introduced to the Barbican API before the new API can be added.)

The current idea is to release this as an Experimental feature and "officially" release when the Secret Consumers API is ready. This strategy is described in a Glance spec-lite: https://review.opendev.org/c/openstack/glance-specs/+/792134/

What's required from the cinder team for this work is:

  • os-brick -- will have the PGP encryption code to be used by the services. The patch for this is available for review: https://review.opendev.org/709432
  • cinder -- download image from glance will need to decrypt such images to write them to volumes
  • cinder -- upload volume to image (maybe? need to check the spec)
  • cinder -- will also need to use Secret Consumers when available (if cinder does encryption on upload)
    • may also want to add Secret registration to our current luks encrypted volume code to protect encryption key ids
  • what about the glance cinder backend?
    • we have an optimized path that clones instead of downloads and copies onto the volume
    • need to handle this case
  • what about the image-volume cache?
    • should these things not be included in the cache?
    • need to see what the spec says about this


The current cinder spec is: https://specs.openstack.org/openstack/cinder-specs/specs/xena/image-encryption.html

There isn't a patch yet for the cinder changes, though there is a glance PoC patch with placeholders for Secret Consumers: https://review.opendev.org/c/openstack/glance/+/705445

conclusions

  • action (rosmaita) Review the cinder spec again. It was approved in Train, and hasn't really been looked at since.
  • action (cinder team) Interested parties should also look at the spec again.
  • action (whoami-rajat) Review the spec again specifically with cinder glance_store cases in mind.
  • action (Luzi) Gorka pointed out that it will be easier to review if some cinder POC patches are available:
    • cinder patch (pending?)
    • cinder-spec (done)
    • os-brick patch (ready)
    • glance patch (ready)

Support for quiesced snapshot/backup

Arthur Outhenin-Chalandre (Mr_Freezeex) wants to support quiescing volumes for backup or snapshot, following up on some previous proposals:


The netapp and vmware volume drivers already support quiesced snapshots without nova calls, and this could be extended to other drivers.

Some points that came up during discussion were:

  • consistency groups: give you only crash consistency, the same as current regular snapshots made from the cinder side. If crash consistency is all people need, then this proposal isn't necessary
  • question: how is this supposed to work for multiple volumes attached to an instance?
  • generic groups allow arbitrary grouping of volumes, how will this impact them?
  • is cinder the correct entrypoint for this request?
  • Simon mentioned that quiescing the entire VM is overkill for taking a snapshot (and is possibly risky), especially if most people are set up to deal with crash-consistent snapshots anyway
  • Gorka pointed out that we need to consider some broader use cases and decide whether they need to be addressed, for example:
    • Single volume: snapshot with quiesce (easy case)
    • All volumes in a single VM
      • Belong to the same group (generic or consistency)
      • Are independent volumes (including boot)
    • All volumes in a group (could be attached to different VMs)
    • multiattach (single volume attached to multiple VMs)


An implementation doesn't have to handle all of the above (and there are probably some more use cases that will come up), but we do want to be clear on exactly what use cases are being addressed and which ones we aren't going to handle.

conclusions

  • action (Mr_Freezeex) will propose a spec addressing the above issues

default types retrospective

Rajat Dhasmana (whoami-rajat) led a discussion about where we are currently with default volume types. Back in Train, we decided that cinder would no longer allow untyped volumes (there are places in the code, particularly in some drivers, where a volume type is assumed, and when it's not there bad stuff happens). So to address this, a __DEFAULT__ type (a very minimal type, basically just name, id, and description) was introduced in the Train release to guarantee that there was at least one volume type in every deployment, and any untyped volumes were assigned this type in the database. Further, if the default_volume_type cinder option wasn't set, the __DEFAULT__ type would be used.

A problem was that some deployment tools (and operators) had already created and set a default_volume_type. Operators reported that end users were confused when they saw __DEFAULT__ in the volume-type-list response and would explicitly create volumes of type __DEFAULT__. That wasn't what operators wanted; they wanted end users to simply use the configured default type. On top of that, the __DEFAULT__ type couldn't be deleted (because its purpose was to make sure that there was always some volume type available in the deployment).

We reworked this logic later (and backported it to Train) so that the default_volume_type cinder option is required (with a default value of __DEFAULT__), cinder will not allow the volume type named by default_volume_type to be deleted while it's the default, and there will always be at least one volume type. So it's possible for operators to treat the __DEFAULT__ type like any other type (for example, it can be deleted if there are no existing volumes of that type).
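
The reworked rules can be illustrated with a small in-memory model. This is only a sketch of the behavior described above, not cinder's actual code: the TypeRegistry class, its method names, and the VolumeTypeError exception are all hypothetical names made up for the illustration.

```python
class VolumeTypeError(Exception):
    """Raised when an operation would break a volume type rule."""


class TypeRegistry:
    """Toy model of volume types (hypothetical, not cinder's real code)."""

    def __init__(self, default_volume_type="__DEFAULT__"):
        # The option is required; out of the box its value is __DEFAULT__.
        self.default_volume_type = default_volume_type
        self.types = {default_volume_type}
        self.volumes = {}  # volume name -> volume type name

    def create_type(self, name):
        self.types.add(name)

    def set_default(self, name):
        # Models an operator changing default_volume_type in cinder.conf.
        if name not in self.types:
            raise VolumeTypeError(f"unknown volume type: {name}")
        self.default_volume_type = name

    def create_volume(self, volume, volume_type=None):
        # A request without a type gets the configured default type,
        # so no untyped volumes are ever created.
        chosen = volume_type or self.default_volume_type
        if chosen not in self.types:
            raise VolumeTypeError(f"unknown volume type: {chosen}")
        self.volumes[volume] = chosen
        return chosen

    def delete_type(self, name):
        # The current default can never be deleted, which also guarantees
        # that at least one volume type always exists.
        if name == self.default_volume_type:
            raise VolumeTypeError("cannot delete the current default type")
        if name in self.volumes.values():
            raise VolumeTypeError("volumes of this type still exist")
        self.types.discard(name)


registry = TypeRegistry()
registry.create_type("fast-ssd")      # hypothetical operator-defined type
registry.set_default("fast-ssd")
registry.delete_type("__DEFAULT__")   # allowed: not the default, no volumes
```

In this model, __DEFAULT__ becomes deletable as soon as another type is the configured default and no volumes use it, which matches the "treat it like any other type" outcome described above.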

However, __DEFAULT__ is still there out of the box, and it's causing confusion for deployments that don't want to use it, or where it's unnecessary.

We discussed this a bit and concluded that it's a deployment responsibility to decide what to do about the __DEFAULT__ volume type in a particular deployment.

But there are definitely some things we can still do on the cinder side to improve the situation:

  • Make sure it's clear in the operator docs that we already have a mechanism in place to avoid creating untyped volumes, so it's OK to delete the __DEFAULT__ type if it's not used anymore. (We picked its name so that it wouldn't clash with any existing volume types, but "__DEFAULT__" looks official and scary, and can lead operators to think that it's necessary for cinder's correct functioning.)
    • add this info somewhere in the operator configuration docs
  • In victoria, cinder introduced default types per project: https://specs.openstack.org/openstack/cinder-specs/specs/victoria/default-volume-type-overrides.html. We need to promote the idea that if a user wants to see what the effective default volume type is, they need to make the GET /v3/{project_id}/types/default API call, not look at the volume-type-list and try to figure it out from the name or description. We can improve the documentation to promote this call:
    • update the API-REF
      • add something into the volume-create section
      • probably also in the type list
    • update Client help as well?
      • check to see what the volume-create help text says, add something there
    • in the installation docs, say something about the importance of the default type config and that the __DEFAULT__ type can (and should) be removed (by the operator/deployment tool) once a default type has been created and set in cinder.conf
  • Gorka suggested that maybe we could introduce a microversion that somehow highlights your effective default volume type when you make the volume-type-list request
    • horizon may already be doing something like this (where "like this" means highlighting the default volume type)? In any case, we should do it too.
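
For documentation purposes, the effective-default lookup mentioned above could be shown with something like the following sketch. It only builds the GET /v3/{project_id}/types/default request and parses a response body; it does not contact a server. The endpoint URL, project id, and token values are placeholders, the helper names are hypothetical, and the assumption that the response wraps the type in a "volume_type" object should be checked against the API-REF.

```python
import json
import urllib.request


def build_default_type_request(endpoint, project_id, token):
    """Build (but do not send) the effective-default-type request."""
    url = f"{endpoint.rstrip('/')}/v3/{project_id}/types/default"
    headers = {
        "X-Auth-Token": token,         # keystone token (placeholder value)
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def default_type_name(response_body):
    """Extract the type name, assuming a {"volume_type": {...}} body."""
    return json.loads(response_body)["volume_type"]["name"]


# Placeholder values; a real deployment supplies its own.
request = build_default_type_request(
    "http://controller:8776", "my-project-id", "my-token")
print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) and passing the body to default_type_name would then give the type that an untyped volume-create request ends up with, without guessing from names in the volume-type-list output.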

conclusions

  • action (rosmaita): make sure people follow up on this. We've had inquiries about small features and documentation work from various community members, and this would be ideal for such people.

Wednesday 20 October

recordings


Thursday 21 October

recordings


Friday 22 October

recordings