
CinderUssuriPTGSummary

Revision as of 13:18, 23 November 2019 by Brian-rosmaita (talk | contribs)

Introduction

This page contains a summary of the subjects covered during the Ussuri PTG held in Shanghai, China, November 7-8, 2019. (It will also eventually contain a summary of the Virtual PTG held November 25 and 27, 2019.)

The full etherpad and all associated notes may be found here: https://etherpad.openstack.org/p/shanghai-ptg-cinder

Thursday

Cinder project onboarding and "meet the Cinder developers" session

Everyone who attended was 100% satisfied and very complimentary about Cinder. Unfortunately, no one attended, so we spent the time figuring out how to get the recording equipment connected and positioned properly.

(recording 1 starts here)

Python 2 support & Work remaining to remove Py27 support

I came into this arguing that we need to keep Python 2 testing in master for a while -- at least while we are still supporting Python 2 in stable branches, because otherwise backports become a big problem (won't have a clean backport if any py3-only language features are used in a patch to master). Pretty much no one agreed with this.

Sean pointed out that as libraries drop py2 support, we won't be able to use them in py2 testing anyway. Ivan and Sean can't wait to start ripping out py2-compatibility code. Gorka didn't think that the extra effort to modify backports would be that big a deal, and thought that if we're going to start using py3 for real, we might as well start now.

Actions

  • Reminder to reviewers: check the code coverage of test cases very carefully. New code needs excellent coverage so that it is likely to fail under py2 testing in stable branches when a backport is proposed
  • Reminder to committers: be patient with reviewers when they ask for more tests!
  • Reminder to community: new features can use py3 language constructs; bugfixes likely to be backported should be more conservative and be written for py2 compatibility
  • Reminder to driver maintainers: ^^
  • Ivan and Sean have a green light to start removing py2 compatibility code
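
To make the backport concern concrete, here is a small sketch of the same helper written twice: once with py3-only constructs (keyword-only arguments and an f-string), and once conservatively so that it would also parse under Python 2.7. The function names are hypothetical and are not taken from the Cinder tree.

```python
def describe_volume_py3(name, *, size_gb, unit="GB"):
    """py3-only style: keyword-only args and an f-string (SyntaxError on py27)."""
    return f"{name}: {size_gb}{unit}"


def describe_volume_compat(name, size_gb=None, unit="GB"):
    """Conservative style: plain defaults and %-formatting also parse on py27."""
    return "%s: %s%s" % (name, size_gb, unit)


# Both spellings give the same result on py3; only the second backports cleanly.
print(describe_volume_py3("vol1", size_gb=10))
print(describe_volume_compat("vol1", size_gb=10))
```

A bugfix written in the first style would need to be reworked into something like the second before it could land in a stable branch that still supports py2.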

Policy migration

Some background: https://etherpad.openstack.org/p/policy-migration-steps Keystone has added a default read-only role and service-scoped roles, but they don't do anything until projects write policies that use them. The oslo.policy code has a way to define a default policy plus a deprecated default policy; during deprecation, the most permissive wins. This will allow operators to migrate easily to the new policies.
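
The "most permissive wins" behavior can be pictured as a logical OR between the new default and the deprecated default. The following is a toy model in plain Python, not the oslo.policy API; the helper names and the role names used for the two defaults are assumptions chosen for illustration.

```python
def make_check(allowed_roles):
    """Return a check that passes when the caller holds any allowed role."""
    allowed = frozenset(allowed_roles)
    return lambda roles: bool(allowed & set(roles))


def during_deprecation(new_check, deprecated_check):
    """Most permissive wins: pass if either the new or the old default passes."""
    return lambda roles: new_check(roles) or deprecated_check(roles)


# Hypothetical defaults for a read-style policy.
new_default = make_check({"reader"})            # new default uses the read-only role
old_default = make_check({"admin", "member"})   # deprecated default

effective = during_deprecation(new_default, old_default)

print(effective(["reader"]))   # allowed by the new default
print(effective(["member"]))   # still allowed by the deprecated default
print(effective(["auditor"]))  # allowed by neither
```

Under this model, a caller who satisfied the old default keeps access until the deprecated default is removed, which is what gives operators time to migrate.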

There are still questions about how to set up testing for these. Keystone did only unit tests, but the tests were very heavyweight because all the users had to be set up in the DB each time; they wish they had done things in tempest. But there is a concern that it may not be practical to do in tempest, either. (It may depend on the project.)

A pop-up team is going to be started to help get the larger projects moved to the new Policy Code.

Actions

  • rosmaita will investigate the different testing approaches. Note: It's possible that tempest will add the methods to create different users and the different projects will have to do their own testing using those.
  • rosmaita to look at the scoping options and understand what the impact on Cinder will be.
    • Need to create a matrix of our policies and different scopes.
    • Need to figure out how the administrative context fits in
    • Need to check current test coverage and where testing needs to be enhanced.
    • Don't have to have one person do it. It is possible to split up the work.
  • rosmaita and e0ne (and anyone else interested in this) to join the pop-up team to get more info and help get Cinder started.

Cinder V2 API removal

We can't just remove V2 code right now (e.g., the V2 extensions need to be moved to V3) but we can remove access to the API. Actually, that's not quite true either: there is a lot of background work that needs to be done before we remove the V2 API:

  • Tempest still assumes that the V2 API will be there. Need to fix it.
  • OpenStack Client also has some V2 API assumptions.
  • Devstack also will not work.

Sean has a patch to see how badly things break with V2 removal: https://review.opendev.org/554372

V3 is pretty much exactly the same as V2. We should be able to change and just have people switch the endpoint and have it work. It would be nice if we could just update the catalog but that doesn't appear to be the case.
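
As an illustration of what "switch the endpoint" would mean for a client, here is a minimal sketch that rewrites a V2 endpoint URL into its V3 equivalent. The function name and the example URL are hypothetical, and the sketch assumes the version appears as its own path segment.

```python
from urllib.parse import urlsplit, urlunsplit


def to_v3_endpoint(url):
    """Swap the first 'v2' path segment of an endpoint URL for 'v3'."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    for index, segment in enumerate(segments):
        if segment == "v2":
            segments[index] = "v3"
            break  # only the version segment, not a later 'v2' in the path
    return urlunsplit(parts._replace(path="/".join(segments)))


print(to_v3_endpoint("http://controller:8776/v2/abc123"))
```

This only covers the client side of the URL; it does not address the catalog, Tempest, OpenStack Client, or Devstack assumptions listed above.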

Actions

  • follow up on this for the virtual PTG. What did we find with Sean's patch?
    • Create a list of the specific work items that need to be completed.
    • At that point we may be able to split the work up to an intern (if we have an intern).

Cinder REST API V4

We had talked in Vancouver about moving to V4 once enough microversions have piled up.

Actions

  • We don't need to do that in this release (let's get rid of V2 first), but it is something we need to keep in mind as a future goal.

(recording 2 starts here)

Volume local cache

Requires both Cinder and Nova work:

Currently there are different types of fast NVMe SSDs, such as the Intel Optane SSD, whose read/write throughput can be 2.x~3.x GB/s and whose latency can be ~10 us. A typical remote volume for a VM, by contrast, delivers hundreds of MB/s with millisecond-level latency (iSCSI / RBD). So these fast SSDs can be mounted locally on the compute node and used as a cache for remote volumes. On the storage team's side, we need to add support in os-brick.
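
A back-of-the-envelope calculation shows why the cache is attractive. The sketch below uses the ~10 us local latency quoted above and assumes 1 ms for the remote volume; the hit ratios are made-up inputs, not measurements, and the function name is hypothetical.

```python
def effective_latency_us(hit_ratio, cache_us=10.0, remote_us=1000.0):
    """Average read latency when a fraction of reads is served from the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_us + (1.0 - hit_ratio) * remote_us


# Assumed hit ratios, purely for illustration.
for ratio in (0.0, 0.5, 0.9):
    print(ratio, effective_latency_us(ratio))
```

With these assumed numbers, a 90% hit ratio brings the average read latency from 1000 us down to roughly 109 us.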

Consensus was: there are some storage solutions this cannot be done for (Ceph, no mount point on host machine), some that might not require this (some vendors already have super-fast caching), and some it's worth doing for, so the overall feeling was supportive of this effort.

See the PTG etherpad for details. Picture of the flip chart used during the discussion: https://twitter.com/jungleboyj/status/1192323512238776320

Actions

  • Liang Fang to continue working on this

Mutable options

The context for this is a request from a NetApp customer who wants to be able to change backend credentials without restarting any services.

The problem is that the current mutable config support works for the REST API but doesn't extend beyond it. Further, changing driver credentials is a little more work since it may require reloading the driver or having a mechanism in all drivers to recognize and handle that change. Also, we don't want config options that are shared across drivers to be mutable.

Gorka pointed out that a driver supporting Active-Active would not need mutable options for this purpose. It would be better to implement Active-Active than to refresh credentials this way. A/A HA support has been ready for several releases now, but so far RBD has been the only driver to test and enable it.

The team feels that using Active/Active is the best way to go.

Actions

  • Gorka volunteered to support the NetApp team if they choose to implement A/A
  • need to add to the developer docs that just making an option mutable in oslo.config does not solve the problem for drivers (more info on the etherpad)

(recording 3 starts here)

Cross Project Discussion with Edge Working Group

Apparently the next version of TripleO will support storage at the edge. They were wondering if we knew anything about that. We don't.

As far as edge persistent storage goes, telcos are thinking about using NFS only and having it in the core, which is a scary concept.

In considering the edge use case, it is important to understand the physical limitations of what people have in mind. For example, one small telco rack, or a smaller DC with air conditioning, or a bigger DC with AC and bigger storage unit, etc. You really can't talk about "the edge" (insert U2 joke here).

See the etherpad for more.

Default volume types depending on project or user

Having a single volume type default is too restrictive for bigger clouds with multiple AZs and many tenants/projects. Operators want more defaults to use in particular situations.

The selection of which default to use is easy; the hard part of this will be the code enabling creation of the default at the end user/project level. Will need:

  • new API calls (create, show, list, update, delete), new microversion
  • client support
  • tell horizon about it
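
The "easy" selection step might look something like the sketch below. The function name, the plain-dict storage, and the precedence order (user, then project, then the single global default) are all assumptions made for illustration; the spec has not been written yet.

```python
def pick_default_volume_type(user_id, project_id, user_defaults,
                             project_defaults, global_default):
    """Return the most specific default volume type that has been configured."""
    if user_id in user_defaults:
        return user_defaults[user_id]
    if project_id in project_defaults:
        return project_defaults[project_id]
    return global_default


# Hypothetical data: one user-level and one project-level default.
user_defaults = {"alice": "fast-ssd"}
project_defaults = {"proj-a": "replicated"}

print(pick_default_volume_type("alice", "proj-a", user_defaults,
                               project_defaults, "standard"))
print(pick_default_volume_type("bob", "proj-a", user_defaults,
                               project_defaults, "standard"))
print(pick_default_volume_type("bob", "proj-b", user_defaults,
                               project_defaults, "standard"))
```

The hard part identified above, the API for creating and managing these defaults, is not covered by this sketch.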

Actions

  • geguileo - write the spec
    • request from Glance: we may also want a per-service default (triggered when a service token is passed)

Cinder retype doesn't use driver assisted migration

Gorka thinks this doesn't depend on the driver; he thinks it's broken for all drivers. There is code in the manager that prevents the efficient path from being taken: https://github.com/openstack/cinder/blob/ca5c2ce4e8ae9fbc92181ac4ba09cec3429a71e6/cinder/volume/manager.py#L2490 There was a reason for it; we need to review and see if it still holds.

Ivan thinks this is just a bug. Though we don't have a bug open for it.

Actions

  • e0ne to investigate and fix it if he can verify that it is broken.

EOL some of the currently open branches

We have 8 open branches plus master (ussuri). Sent an email to the ML asking for data so we can make a good decision about this: http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010385.html

Got zero responses, so this apparently isn't seen as a big deal by the community.

The policy is that we need to announce 6 months ahead of time the fact that we are planning to EOL a branch. This allows time for a vendor to come in and pick it up if necessary. So, if we want to drop branches we just need to announce that we are planning to EOL branches and then we can do it in 6 months.

The driverfixes branches have not been used in quite a while.

  • Should we delete those? No, we don't really want to lose that history of commits.
  • Could we rename them? Put 'archived' in the title or something to make it clear that they no longer accept code. (Or just document that they are an archive of old driver fixes.)
  • When we EOL a driver we should probably make it a driverfixes branch. (Not clear on exactly what's being proposed here, need to follow up at VPTG.)

Actions

  • rosmaita - find out about renaming branches from infra team; also, about read-only branches (change to gerrit so no patches can be proposed to the branch)?
    • proposal: EOL o, p and rename them archived-ocata, archived-pike
  • rosmaita - send proposal to ML that o, p are due to exit EM status in 6 months
  • revisit this at the Virtual PTG
    • the EOL policy was revised recently, no longer requires the 6 month waiting period
    • want to reconsider whether not deleting the EOL branches is a good idea if we're not going to merge anything into them

(recording 4 starts here)

Discuss the latest User Survey Results

Here's a handy compiled list of only the Cinder responses: https://etherpad.openstack.org/p/cinder-2019-user-survey-question-responses

Actions

  • replication needs better documentation so that people know we can fail over and fail back correctly
  • ivan is planning to continue the generic backup driver work

Meeting with the Nova team

When we fail over in Cinder, volumes are no longer usable in Nova, but we don't tell Nova that the failover has occurred. Any procedure in Nova to correct the situation needs to be done manually. It would be better if we let Nova know that a failover has occurred so they can do something.

A complication is that Nova can't simply detach and attach the volume because data that is in flight would be lost.

How about boot from volume? In that case the instance is dead anyway because access to the volume has been lost. Could go through the shutdown, detach, attach, reboot path. Problem is that detach is going to fail. Need to force it or handle the failure. But we aren't sure that Nova will allow a detach of a boot volume. And we don't currently have a force detach API.

Also discussed a possible Nova bug for images created from encrypted volumes: https://bugs.launchpad.net/nova/+bug/1852106 , though it's not clear that the scenario described in the bug can actually happen

Actions

  • need to figure out how to pass force through to os-brick when detaching a volume and when rebooting
  • rosmaita to investigate Bug #1852106

Friday

Meeting with the Glance team

Support for Glance multiple stores in Cinder

References: (cinder spec) https://review.openstack.org/#/c/641267/

The Cinder team is still OK with this idea (which was approved for Train).

Actions

  • retarget spec for Ussuri
  • get Abhishek's patch reviewed

Image snapshot co-location

For the Edge use case, Glance is planning to use info provided by Nova about what image a server was booted from to co-locate snapshots of that server in the same store as the original image. Would like to do the same with Cinder volumes uploaded as images. Just need a header that specifies the "base" image of the volume being uploaded as an image. We agreed that this is a separate use case from the above.

Actions

  • Abhishek will write the spec for Cinder

Glance Cinder driver is very limited

We think it uses only the default volume type, and it is also not very well tested. We all agreed that this is a sad state of affairs.

Actions

  • somebody should do something

Meet with Horizon about their proposed implementation of Cinder user messages

Horizon is interested in exposing the User Messages API. We agreed that this is a great idea.

There's a question about having the message displayed in a requested language. It's possible that this is already handled at the REST API layer via the "Accept-Language" header. If it's not, that's probably the place to support this.
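
Whether the existing API code already does this is still to be determined. As a rough picture of what "Accept-Language" handling involves, the sketch below parses the header and picks the best language for which a translation is available. The function names are hypothetical, this is not Cinder code, and the wildcard form ("*") is deliberately not handled.

```python
def parse_accept_language(header):
    """Return the language tags from an Accept-Language header, best first."""
    weighted = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # a tag without an explicit q-value has weight 1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        weighted.append((quality, tag.strip().lower()))
    # list.sort is stable, so equal weights keep their order in the header.
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [tag for quality, tag in weighted if quality > 0]


def choose_language(header, available, default="en"):
    """Pick the first requested language that has a translation available."""
    for tag in parse_accept_language(header):
        if tag in available:
            return tag
        primary = tag.split("-")[0]  # fall back from e.g. 'zh-cn' to 'zh'
        if primary in available:
            return primary
    return default


print(choose_language("ja;q=0.8, zh-CN, en;q=0.5", {"en", "ja", "zh"}))
```

A message would then be rendered in the chosen language, falling back to the default when none of the requested languages is available.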

Actions

  • rosmaita determine whether this would require a change to the API code, or whether existing code handles this already

Attach/Detach speed

Gorka was wondering whether there are any complaints about attach/detach speed in OpenStack, particularly since people are now using Cinder to provide volumes for Kubernetes (cinder in-tree driver, Cinder-CSI, Ember-CSI) and may be seeing a lot more attach/detach requests.

Everybody seems to be OK with it; it's only geguileo who's complaining.

Actions

  • not a concern at the moment

Topics from Train mid-cycle: status and carry-over to Ussuri

Notes about the Train mid-cycle: https://wiki.openstack.org/wiki/CinderTrainMidCycleSummary