CinderUssuriPTGSummary

Introduction
This page contains a summary of the subjects covered during the Ussuri PTG held in Shanghai, China, November 7-8, 2019. It also contains a summary of the Virtual PTG held November 25 and 27, 2019.

The sessions were recorded. Links to the recordings are located at appropriate places below.

The full etherpad and all associated notes may be found here:
 * Shanghai: https://etherpad.openstack.org/p/shanghai-ptg-cinder
 * Virtual: https://etherpad.openstack.org/p/cinder-ussuri-virtual-ptg-planning

An etherpad of action items looking for owners is here: https://etherpad.openstack.org/p/cinder-ussuri-ptg-actions



Cinder project onboarding and "meet the Cinder developers" session
Everyone who attended was 100% satisfied and very complimentary about Cinder. Unfortunately, no one attended, so we spent the time figuring out how to get the recording equipment connected and positioned properly.

 Video Recording Part 1

Python 2 support & Work remaining to remove Py27 support
I came into this arguing that we need to keep Python 2 testing in master for a while -- at least while we are still supporting Python 2 in stable branches, because otherwise backports become a big problem (won't have a clean backport if any py3-only language features are used in a patch to master). Pretty much no one agreed with this.

Sean pointed out that as libraries drop py2 support, we won't be able to use them in py2 testing anyway. Ivan and Sean can't wait to start ripping out py2-compatibility code. Gorka didn't think that the extra effort to modify backports would be that big a deal, and that if we're going to start using py3 for real, we might as well start now.

Actions

 * Reminder to reviewers: we need to be checking the code coverage of test cases very carefully so that new code has excellent coverage and will be likely to fail when tested with py2 in stable branches when a backport is proposed
 * Reminder to committers: be patient with reviewers when they ask for more tests!
 * Reminder to community: new features can use py3 language constructs; bugfixes likely to be backported should be more conservative and written for py2 compatibility
 * Reminder to driver maintainers: ^^
 * Ivan and Sean have a green light to start removing py2 compatibility
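
To make the advice to committers concrete, here is a minimal sketch contrasting py3-only constructs with a more conservative form that would backport cleanly to a branch still supporting py2. The function names are hypothetical and not taken from Cinder.

```python
# Hypothetical helpers, not taken from Cinder.

def describe_volume_py3(name, *, size_gb):
    # Keyword-only arguments and f-strings are Python 3 only:
    # this function is a SyntaxError on Python 2.7.
    return f"{name}: {size_gb} GiB"


def describe_volume_compat(name, size_gb=None):
    # Same result using constructs that also parse on Python 2.7,
    # so a backport to a py2-supporting stable branch applies cleanly.
    return "%s: %d GiB" % (name, size_gb)


print(describe_volume_py3("vol-1", size_gb=10))
print(describe_volume_compat("vol-1", size_gb=10))
```

The first style is fine for new features; the second is the safer choice for a bugfix that is likely to be backported.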

Policy migration
Some background: https://etherpad.openstack.org/p/policy-migration-steps Keystone has added a default read-only role and service-scoped roles, but they don't do anything until projects write policies that use them. The oslo.policy code has a way to define a default policy plus a deprecated default policy; during deprecation, the most permissive wins. This will allow easy migration to new policies for operators.
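
The "most permissive wins" behavior can be pictured with a toy model. This is only an illustration of the rule described above, not the oslo.policy API; the role names and the shape of the credentials dict are assumptions.

```python
# Toy model of the deprecation behavior described above; this is NOT the
# oslo.policy API. Role names and rule shapes are illustrative assumptions.

def new_default(creds):
    # Hypothetical new default: a read-only "reader" role is enough.
    return "reader" in creds["roles"]


def deprecated_default(creds):
    # Hypothetical old default: admin only.
    return "admin" in creds["roles"]


def allowed_during_deprecation(creds):
    # "Most permissive wins": access is granted if either the new or the
    # deprecated default would allow it.
    return new_default(creds) or deprecated_default(creds)


for roles in (["reader"], ["admin"], ["member"]):
    print(roles, allowed_during_deprecation({"roles": roles}))
```

Under this model an operator who has not yet adopted the new roles keeps working during the deprecation period, because the old default still grants access.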

There are still questions about how to set up testing for these. Keystone did only unit tests, but the tests were very heavyweight; had to set up all the users in the DB each time; they wish they had done things in tempest. But there is a concern that it may not be practical to do in tempest, either. (It may depend on the project.)

A pop-up team is going to be started to help get the larger projects moved to the new Policy Code.

Actions

 * rosmaita will investigate the different testing approaches. Note: It's possible that tempest will add the methods to create different users and the different projects will have to do their own testing using those.
 * rosmaita to look at the scoping options and understand what the impact on Cinder will be.
 * Need to create a matrix of our policies and different scopes.
 * Need to figure out how the administrative context fits in
 * Need to check current test coverage and where testing needs to be enhanced.
 * Don't have to have one person do it. It is possible to split up the work.
 * rosmaita and e0ne (and anyone else interested in this) to join the pop-up team to get more info and help get Cinder started.

Cinder V2 API removal
We can't just remove V2 code right now (e.g., the V2 extensions need to be moved to V3) but we can remove access to the API. Though actually that's not true either, there is a lot of background work that needs to be done before we remove V2 API:
 * Tempest still assumes that the V2 API will be there. Need to fix it.
 * OpenStack Client also has some V2 API assumptions.
 * Devstack also will not work.

Sean has a patch to see how badly things break with V2 removal: https://review.opendev.org/554372

V3 is pretty much exactly the same as V2. We should be able to change and just have people switch the endpoint and have it work. It would be nice if we could just update the catalog but that doesn't appear to be the case.

Actions

 * follow up on this for the virtual PTG. What did we find with Sean's patch?
 * Create a list of the specific work items that need to be completed.
 * At that point we may be able to split the work up to an intern (if we have an intern).

Cinder REST API V4
We had talked in Vancouver about getting to a point after we have enough micro-versions piling up to move to V4.

Actions
Video Recording Part 2
 * We don't need to do that in this release (let's get rid of V2 first), but it is something we need to keep in mind as a future goal.

Volume local cache
Requires both Cinder and Nova work:
 * Cinder spec: https://review.opendev.org/#/c/684556/
 * Nova spec: https://review.opendev.org/689070

Fast NVMe SSDs such as the Intel Optane SSD offer read/write throughput of 2.x~3.x GB/s with latency around 10 us, whereas a typical remote volume for a VM (iSCSI / RBD) delivers hundreds of MB/s with millisecond-level latency. These fast SSDs can therefore be mounted locally on the compute node and used as a cache for remote volumes. On the storage side, we need to add support in os-brick.
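
As a rough worked example of why a local cache is attractive, the sketch below estimates average read latency from the figures quoted above. It is a simplified model: the hit ratios are assumptions chosen for illustration, 1 ms stands in for "millisecond level", and the cost of populating the cache on a miss is ignored.

```python
# Back-of-the-envelope model; the hit ratios are assumptions chosen for
# illustration, and 1 ms stands in for "millisecond level" remote latency.
LOCAL_CACHE_LATENCY_US = 10.0
REMOTE_VOLUME_LATENCY_US = 1000.0


def average_read_latency_us(hit_ratio):
    # Hits are served from the local SSD; misses go to the remote volume.
    miss_ratio = 1.0 - hit_ratio
    return (hit_ratio * LOCAL_CACHE_LATENCY_US
            + miss_ratio * REMOTE_VOLUME_LATENCY_US)


for hit_ratio in (0.0, 0.5, 0.9):
    print(hit_ratio, average_read_latency_us(hit_ratio))
```

With these numbers a 90% hit ratio brings the average down to roughly a tenth of the remote latency, which is why the benefit depends heavily on the workload.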

Consensus was: there are some storage solutions this cannot be done for (Ceph, no mount point on host machine), some that might not require this (some vendors already have super-fast caching), and some it's worth doing for, so the overall feeling was supportive for this effort.

See the PTG etherpad for details. Picture of the flip chart used during the discussion: https://twitter.com/jungleboyj/status/1192323512238776320

Actions

 * Liang Fang to continue working on this

Mutable options
The context for this is a request from a NetApp customer who wants to be able to change backend credentials without restarting any services.

Problem is that the current mutable config can be done for the REST API, but doesn't extend beyond that. Further, changing driver credentials is a little more work since it may require reloading the driver or having a mechanism in all drivers to recognize and handle that change. Also, we don't want config options that are shared across drivers to be mutable.

Gorka pointed out that a driver supporting Active-Active would not need mutable options for this purpose. It would be better to implement Active-Active instead of refreshing credentials this way. A/A HA support has been ready for several releases now, but so far RBD has been the only driver to test and enable it.

The team feels that using Active/Active is the best way to go.

Actions

 * Gorka volunteered to support the NetApp team if they choose to implement A/A
 * need to add to the developer docs that just making an option mutable in oslo.config does not solve the problem for drivers (more info on the etherpad)

Video Recording Part 3

Cross Project Discussion with Edge Working Group
Apparently the next version of TripleO will support storage at the edge. They were wondering if we knew anything about that. We don't.

As far as edge persistent storage goes, telcos think about having NFS-only and have it in the core - scary concept.

In considering the edge use case, it is important to understand the physical limitations of what people have in mind. For example, one small telco rack, or a smaller DC with air conditioning, or a bigger DC with AC and bigger storage unit, etc. You really can't talk about "the edge" (insert U2 joke here).

See the etherpad for more.

Default volume types depending on project or user
Having a single volume type default is too restrictive for bigger clouds with multiple AZs and many tenants/projects. Operators want more defaults to use in particular situations.

The selection of which default to use is easy; the hard part of this will be the code enabling creation of the default at the end user/project level. Will need:
 * new API calls (create, show, list, update, delete), new microversion
 * client support
 * tell horizon about it
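
The "easy" selection step might look something like the sketch below. The lookup tables are hypothetical, and the precedence shown (user, then project, then the single global default) is an assumption for illustration, not something decided in the session.

```python
# Hypothetical lookup tables; the precedence (user, then project, then the
# single global default) is an assumption, not something the team decided.
GLOBAL_DEFAULT = "standard"
PROJECT_DEFAULTS = {"project-a": "fast-ssd"}
USER_DEFAULTS = {"alice": "encrypted"}


def default_volume_type(user_id, project_id):
    # Most specific match wins; fall back to the cloud-wide default.
    if user_id in USER_DEFAULTS:
        return USER_DEFAULTS[user_id]
    if project_id in PROJECT_DEFAULTS:
        return PROJECT_DEFAULTS[project_id]
    return GLOBAL_DEFAULT


print(default_volume_type("alice", "project-a"))
print(default_volume_type("bob", "project-a"))
print(default_volume_type("bob", "project-b"))
```

The harder work listed above is what populates tables like these: the API calls, the microversion, and the client and Horizon support.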

Actions

 * geguileo - write the spec
 * request from Glance: we may also want a per-service default (triggered when a service token is passed)

Cinder retype doesn't use driver assisted migration
Gorka thinks this doesn't depend on the driver; he thinks it's broken for all drivers. There is code in the manager that prevents the efficient path from being taken: https://github.com/openstack/cinder/blob/ca5c2ce4e8ae9fbc92181ac4ba09cec3429a71e6/cinder/volume/manager.py#L2490 There was a reason for it; we need to review and see if it still holds.

Ivan thinks this is just a bug. Though we don't have a bug open for it.

Actions

 * e0ne to investigate and fix it if he can verify that it is broken.

EOL some of the currently open branches
We have 8 open branches plus master (ussuri). Sent an email to the ML asking for data so we can make a good decision about this: http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010385.html

Got zero responses, so this apparently isn't seen as a big deal by the community.

The policy is that we need to announce 6 months ahead of time the fact that we are planning to EOL a branch. This allows time for a vendor to come in and pick it up if necessary. So, if we want to drop branches we just need to announce that we are planning to EOL branches and then we can do it in 6 months.

The driverfixes branches have not been used in quite a while.
 * Should we delete those? No, we don't really want to lose that history of commits.
 * Could we re-name them? Put 'archived' in the title or something to make it clear that it doesn't still take code.  (Or just document that they are an archive of old driver fixes.)
 * When we EOL a driver we should probably make it a driverfixes branch. (Not clear on exactly what's being proposed here, need to follow up at VPTG.)

Actions

 * rosmaita - find out about renaming branches from infra team; also, about read-only branches (change to gerrit so no patches can be proposed to the branch)?
 * proposal: EOL o, p and rename them archived-ocata, archived-pike
 * rosmaita - send proposal to ML that o, p are due to exit EM status in 6 months
 * revisit this at the Virtual PTG
 * the EOL policy was revised recently, no longer requires the 6 month waiting period
 * want to reconsider whether not deleting the EOL branches is a good idea if we're not going to merge anything into them

Video Recording Part 4

Discuss the latest User Survey Results
Here's a handy compiled list of only the Cinder responses: https://etherpad.openstack.org/p/cinder-2019-user-survey-question-responses

Actions
Video Recording Part 5
 * replication needs better documentation so that people know we can failover and fail back correctly
 * ivan is planning to continue the generic backup driver work

Meeting with the Nova team
When we fail over in Cinder, volumes are no longer usable in Nova, but we don't tell Nova that the failover has occurred. Any procedure in Nova to correct the situation needs to be done manually. It would be better if we let Nova know that a failover has occurred so they can do something.

A complication is that Nova can't simply detach and attach the volume because data that is in flight would be lost.

How about boot from volume? In that case the instance is dead anyway because access to the volume has been lost. Could go through the shutdown, detach, attach, reboot path. Problem is that detach is going to fail. Need to force it or handle the failure. But we aren't sure that Nova will allow a detach of a boot volume. And we don't currently have a force detach API.

Also discussed a possible Nova bug for images created from encrypted volumes: https://bugs.launchpad.net/nova/+bug/1852106 (though it's not clear that the scenario described in the bug can actually happen).

Actions

 * need to figure out how to pass the force to os-brick to detach volume and when rebooting a volume
 * rosmaita to investigate Bug #1852106

Meeting with the Glance team
Video Recording Part 1

Support for Glance multiple stores in Cinder
References: (cinder spec) https://review.openstack.org/#/c/641267/

The Cinder team is still OK with this idea (which was approved for Train).

Actions

 * retarget spec for Ussuri
 * get Abhishek's patch reviewed

Image snapshot co-location
For the Edge use case, Glance is planning to use info provided by Nova about what image a server was booted from to co-locate snapshots of that server in the same store as the original image. Would like to do the same with Cinder volumes uploaded as images. Just need a header that specifies the "base" image of the volume being uploaded as an image. We agreed that this is a separate use case from the above.

Actions

 * Abhishek will write the spec for Cinder

Glance Cinder driver is very limited
We think it uses only the default volume type, and it is also not very well tested. We all agreed that this is a sad state of affairs.

Actions

 * somebody should do something

Meet with Horizon about their proposed implementation of Cinder user messages
Horizon is interested in exposing the User Messages API. We agreed that this is a great idea.

There's a question about having the message displayed in a requested language. It's possible that this is already handled at the REST API layer via the "Accept-Language" header. If it's not, that's probably the place to support this.
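
For reference, a client request carrying the header would look roughly like the sketch below. The endpoint, project ID and token are placeholders, nothing is sent over the network, and whether the server honors the header for user messages is exactly the open question noted above.

```python
import urllib.request

# Placeholder endpoint and project ID; substitute values from your own
# service catalog. Nothing is sent over the network in this sketch.
CINDER_ENDPOINT = "http://cinder.example.com/volume/v3"
PROJECT_ID = "PROJECT_ID"


def build_messages_request(token, language):
    # Build (but do not send) a GET for the user messages list.
    url = "%s/%s/messages" % (CINDER_ENDPOINT, PROJECT_ID)
    headers = {
        "X-Auth-Token": token,
        # User messages were introduced in microversion 3.3.
        "OpenStack-API-Version": "volume 3.3",
        # Standard HTTP header naming the preferred response language.
        "Accept-Language": language,
    }
    return urllib.request.Request(url, headers=headers, method="GET")


request = build_messages_request("TOKEN", "de")
print(request.full_url)
# urllib normalizes stored header names to "Accept-language".
print(request.get_header("Accept-language"))
```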

Actions

 * rosmaita determine whether this would require a change to the API code, or whether existing code handles this already

Attach/Detach speed
Gorka was wondering whether there are any complaints about attach/detach speed in OpenStack, particularly since people are now using Cinder to provide volumes for Kubernetes (cinder in-tree driver, Cinder-CSI, Ember-CSI) and may be seeing a lot more attach/detach requests.

Everybody seems to be OK with it, it's only geguileo who's complaining.

Actions

 * not a concern at the moment

Topics from Train mid-cycle: status and carry-over to Ussuri
Notes about the Train mid-cycle: https://wiki.openstack.org/wiki/CinderTrainMidCycleSummary

Mid-cycle etherpad: https://etherpad.openstack.org/p/cinder-train-mid-cycle-planning

Multiattach
All items need followup. Goals are:
 * short-term: document some guidance for how this feature should be tested
 * long-term: get some new tests into the cinder-tempest-plugin for this

Actions

 * rosmaita draft the short-term document

iSCSI Ceph driver
Due to some downstream priority changes, Walt is having trouble finding time to work on this. Ivan suggested that we encourage Walt to post whatever he's got, even if it's not working, so what he's learned isn't lost. There are some patches up and a github repo for some code Walt had to write that doesn't have a home in OpenStack or Ceph yet.

Actions

 * rosmaita: follow up with Walt
 * rosmaita: put together an etherpad with links to the work done so far

3rd Party CI Irregularities
Third-party testing by backend vendors of their driver code is very important to the project. But most of the 3rd Party CI systems appear to be pretty unstable.

For most vendors, updating their 3rd Party CI to run python 3.7 in Train was not a simple task. It would be good if we could offer them better guidance about how to set up & maintain their 3rd Party CI. Would also like vendors to be running the cinder-tempest-plugin, but don't want to make it a demand unless we can make the path easier. (BTW, Datera is running the cinder-tempest-plugin in their CI!)

Third Party CI Docs (partial list)
 * https://wiki.openstack.org/wiki/Cinder/how-to-contribute-a-driver
 * https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers
 * https://docs.openstack.org/infra/system-config/third_party.html
 * https://docs.openstack.org/cinder/latest/contributor/drivers.html

Actions

 * Luigi has some ideas about using RDO Software Factory as a basis for 3rd Party CI; need to follow up with him on that
 * Gorka: will check about what RDO has available
 * e0ne: will look to see who's using cinder-tempest-plugin
 * the team: after gorka and e0ne report back, reorganize & update the 3rd party CI docs

Improve Automated Test Coverage
We want to do this via the cinder-tempest-plugin. Sofia (enriquetaso) is mentoring an Outreachy intern who has begun some work on this. Eric has been writing bugs to suggest test cases that need to be addressed.

SQLAlchemy to Alembic migration
No progress on this. Put in a proposal for a summer intern to work on this; maybe we'll get lucky.

See https://etherpad.openstack.org/p/cinder-train-ptg-planning (line #247) for more info.

Capabilities Reporting
Operators need to read the vendor's manual to figure out which extra specs they can write for a particular backend, and what they're used for. It would be nice to have drivers report their capabilities in a way that lets the operator get this info from the CLI.

Everyone agreed that we still want to do this. It will require an API change and there's already a spec for this: https://review.opendev.org/#/c/655939/1/specs/train/backend_capabilities.rst

Actions

 * revisit at the Virtual PTG and figure out who's interested in working on it

Cinder Business
Video Recording Part 2

Cinder Ussuri Priorities
We will finalize this after the Virtual PTG, but here's the initial list:
 * Increase testing coverage
 * Increase number of CIs running cinder-tempest-plugin
 * Better support for third party CIs: Make their life easier by having a way to deploy a robust system
 * Volume types per user/project/service-token
 * better documentation
 * Generic Backups
 * Improve HA Active-Active documentation
 * want to make it easier to test it
 * remove V2 API
 * remove python 2 support

Cinder-core update
See http://lists.openstack.org/pipermail/openstack-discuss/2019-November/010519.html for details. We are at roughly the same review strength we had in Train.

Meeting Time Change update
We were holding off on this until after the Summit so that new contributors could participate in a poll. We'll consider the options from Liang Fang's original proposal at the Cinder weekly meeting: http://eavesdrop.openstack.org/meetings/cinder/2019/cinder.2019-10-23-16.00.log.html#l-166 These are to move the meeting 1 or 2 hours earlier. There has also been some discussion on the ML: http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010328.html

Actions

 * rosmaita put together a community poll

Virtual PTG
We discussed what the format should be. Consensus was to do it over 2 consecutive days, using 2 hours each day. This should make it easier for people to participate in at least part of the meeting. We want to do it soon; consensus was the week after KubeCon to avoid conflicts. So that would be the last week in November.

Actions

 * rosmaita put together a community poll to determine days/times

Virtual Mid-Cycle
There is interest in having a midcycle. Although everyone recognizes that face-to-face is the best, contributors have been having trouble getting travel support. So we decided to do a completely Virtual Mid-Cycle meetup for Ussuri. We decided to figure out the format after we see how the Virtual PTG works out.

Monday (Virtual)
Video Recording

Forum session recap: Are You Using Upgrade Checks?
Jay gave a quick recap of the Forum session. There are a number of action items in the etherpad below. They are assigned to jungleboyj right now as a TC action.
 * https://etherpad.openstack.org/p/PVG-upgrade-check-forum

There are still some questions about (a) how operators are using these, and (b) what kind of checks we should be providing from the development side. The Cinder team was seeing this as pre-check. Others are seeing it as a check that is used along the way while an upgrade is in process to ensure that things are ready before operators start up their services. Pre-checks seem to make sense for us; Sean noted that we could add an option to do some pre-checks to the cinder-status command.

So what should the Cinder team do during the Ussuri cycle (before we have the above issues settled)? At the very least, we should still add them when a driver is unsupported and subject to removal:
 * inform operators that in order to use an unsupported driver, a flag has to be set in cinder.conf
 * inform operators that they need to contact the vendor about whether they have plans to have the driver re-instated; otherwise, the operator needs to prepare to migrate the affected volumes to a backend with a supported driver for the next Cinder release

Actions

 * jungleboyj - Start a discussion on the mailing list to find out if anyone is actually using or has used the upgrade checks in production
 * need to figure out where the documentation for this goes

Snapshot co-location
The spec for this is: https://review.opendev.org/#/c/695630/

This is related to glance multi-store support in Cinder, but the spec needs to specify more carefully what the use case is. We think it is this: a user has a volume that was created from a glance image and wants to upload it as an image; we want to give Glance info so that it can put the new image in the same store as the original image. (So use of the term "snapshot" here may be inaccurate.)

This feature depends on the implementation of the other glance multistore spec: https://review.opendev.org/#/c/661676/

Actions

 * Rajat, Abhishek - update the spec

Python 2 support removal
Gave a quick summary of what we discussed in Shanghai (see above) so that we're all on the same page.

There's a patch up now removing py2 testing from Cinder: https://review.opendev.org/695317. Once that's approved, will do the same for the other components.

General advice to Cinder developers about using Python 3 language features: https://wiki.openstack.org/wiki/CinderUssuriPTGSummary#Actions

Actions

 * rosmaita - get the testing/gate patches merged, then let the good times roll

User messages
Quick discussion of the admin action "leakage" issue discussed on https://review.opendev.org/#/c/694954/

Consensus was that it would be useful to expose admin-oriented actions in user messages that only admins would be able to view. Maybe set a special flag when the message is created, and then use the admin context to decide whether this gets shown or not. Agreed that the message content will be the same as we have currently (that is, don't expose any sensitive information even to admins). We can wait until admin-facing user messages are being used and get feedback about whether more info is required or not.
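
A minimal sketch of the flag idea follows. The message shape, the "admin_only" field name and the sample texts are all hypothetical; the actual design is left to the spec.

```python
from dataclasses import dataclass


@dataclass
class UserMessage:
    # Hypothetical shape; "admin_only" is the "special flag" floated in the
    # discussion, not an existing field.
    text: str
    admin_only: bool = False


def visible_messages(messages, is_admin):
    # Admins see everything; other users see only unflagged messages.
    return [m for m in messages if is_admin or not m.admin_only]


messages = [
    UserMessage("create volume failed: no valid backend was found"),
    UserMessage("volume migration failed", admin_only=True),
]
print([m.text for m in visible_messages(messages, is_admin=False)])
print([m.text for m in visible_messages(messages, is_admin=True)])
```

Because the filtering happens on the server side and the response format is unchanged, this fits the observation that no new microversion should be needed.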

Ivan pointed out that this change should not require a new microversion, since there's no change to the user message API and no change to the current response.

Actions

 * rosmaita - write up a spec

3rd Party CI irregularities
The issue we want to address is that the 3rd Party CI systems seem pretty unstable. We'd like to be able to provide some more support to make the infrastructures more reliable. Luigi suggested using RDO Software Factory as a basis for 3rd Party CI.

References:
 * https://opendev.org/x/third-party-ci-tools
 * https://zuul-ci.org/docs/zuul/admin/quick-start.html

Actions

 * Luigi - follow up with RDO team and get some feedback on how plausible this scenario is
 * e0ne - will look to see who's using cinder-tempest-plugin

Extending default volume type support for tenants
Quick recap of the Shanghai discussion (see above). Simon had mentioned that he might have a developer at Pure who'd be interested in doing the implementation. Rajat volunteered to help support the implementation.

Actions

 * Gorka - write up the spec
 * rosmaita - follow up with Simon

Quotas!
Eric has a patch up that may fix one of many problems: https://review.opendev.org/#/c/695096/ Eric thinks the patch could be optimized if someone is interested.

The general problem is that we update multiple tables and there can be (are) race conditions, so you wind up with strange situations like negative quota values or multiple quotas for the same project. Operators have posted some scripts that can be used to occasionally clean up the database, but it would be better to fix this in Cinder.
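
The kind of race involved can be shown with a deterministic toy example. The dict below stands in for the quota tables and is not Cinder's schema; the interleaving is forced by hand rather than produced by real concurrency.

```python
# Deterministic illustration of a check-then-act race; the dict stands in
# for quota tables and is not Cinder's schema.
quota = {"limit": 10, "in_use": 8}


def read_in_use():
    return quota["in_use"]


def reserve_if_room(seen_in_use, size):
    # The check uses a possibly stale read, then writes back blindly.
    if seen_in_use + size <= quota["limit"]:
        quota["in_use"] = seen_in_use + size
        return True
    return False


# Two requests both read before either writes.
seen_a = read_in_use()
seen_b = read_in_use()
ok_a = reserve_if_room(seen_a, 2)
ok_b = reserve_if_room(seen_b, 2)

# Both succeed, yet only one reservation is recorded.
print(ok_a, ok_b, quota["in_use"])

# Releasing everything that was actually allocated (8 + 2 + 2) now
# underflows the recorded usage.
quota["in_use"] -= 8 + 2 + 2
print(quota["in_use"])
```

The second print shows the negative usage value that operators' cleanup scripts currently have to repair.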

EOL for driverfixes/{m,n} and stable/{o,p}
Since the Shanghai discussion, a patch has merged that removes the 6 month waiting period for the transition from EM -> EOL: https://review.opendev.org/#/c/682381/

There was a discussion about this in #openstack-tc last week: http://eavesdrop.openstack.org/irclogs/%23openstack-tc/%23openstack-tc.2019-11-22.log.html#t2019-11-22T15:35:01

Consensus is that we should go ahead and do this.

Actions

 * rosmaita - send notice on the ML that we are going to do this in one week
 * http://lists.openstack.org/pipermail/openstack-discuss/2019-November/011136.html
 * rosmaita - put up a release patch to EOL the branches
 * https://review.opendev.org/#/c/696173/

Driver support matrix
Follow-up from the discussion in Shanghai. A suggestion was made that multipath should be a specific category in the support matrix.

Consensus is that multipath is more a feature of the backend than of the driver. It is useful to know if drivers do it, but unlike replication it's not something a driver simply does or doesn't do. Also, there are options that have to be set in nova in order for it to be useful, namely volume_use_multipath in the [libvirt] section of nova.conf. So there doesn't seem to be a point in adding this to the support matrix.

Wednesday (Virtual)
Video Recording

v2 API removal update
Outward facing issues:
 * Sean's patch that removes v2 from the 'versions' response had some strange failures (but Rajat thinks there may be a quick fix).
 * Right now, devstack expects both v2 and v3 to be available and creates endpoints for both in the service catalog: https://opendev.org/openstack/devstack/src/branch/master/lib/cinder
 * On the other hand, it looks like tempest is v3 ready: https://review.opendev.org/#/c/530702/
 * Also, Ivan is pretty sure that Horizon uses v3 only.
 * We should notify Nova and Glance to make sure they don't rely on v2 for anything.

Internal issues:
 * we should be able to clean up the stuff that v3 inherited from v2
 * we may not be able to clean up the v2/contrib stuff yet because of microversion reliance

Actions

 * Rajat take a shot at fixing Sean's patch
 * rosmaita - work on the devstack stuff (service catalog)
 * we'll be optimistic about tempest
 * rosmaita - send a general email to the ML saying that we plan to do this

Forum session recap: How are you using Cinder's Volume Types?
Session etherpad: https://etherpad.openstack.org/p/PVG-how-using-cinder-volume-types

Sean mentioned some highlights of the Forum session.
 * NECTAR is running a patch that allows a volume type to be assigned to an AZ: https://github.com/NeCTAR-RC/cinder/commit/d5a3d938a8e0934d31b5a3c568846b3d32843866
 * There were some questions about whether RBD supports volume online migration
 * Operators are interested in the "Support filter backend based on operation type" spec that was implemented in Rocky, but need some documentation explaining how to use it. The implementation is https://github.com/openstack/cinder/commit/e1ec4b4c2e1f0de512f09e38824c1d7e2fa38617

Actions

 * e0ne is planning to do some testing around the RBD volume migration
 * need someone to pick up the documentation of "Support filter backend based on operation type"

Ceph iSCSI work
This is an important feature for Ironic. There's an etherpad gathering some of the work Walt's done on this: https://etherpad.openstack.org/p/cinder-ceph-iscsi-driver

Actions

 * rosmaita - follow up with Walt about his bandwidth and ask him to add missing stuff to the etherpad

Ussuri community goals
Goal 1: Drop Python 2.7 Support -- we're going to do this in 2 phases. First is to get the python 2 check and gate jobs removed so we aren't depending on any py27 in the gate. Second will be to make the changes that will only allow Cinder to be installed with at least py36. That will follow in January or so when any other project that needs to install Cinder in py27 for their own testing has removed that dependency.

The cinder patch to drop py2 testing is https://review.opendev.org/#/c/695317/ -- once that's merged, we'll do the same for os-brick, cinderclient, the brick-client-ext, and cinderlib.

Goal 2: Project Specific New Contributor & PTL Docs -- the goal is not yet approved; it's at the formal vote stage: https://review.opendev.org/#/c/691737/. From comments on the patch, expectations are that the current PTL will do this with help from former PTLs. Luckily, we have 2 former PTLs who are still very active with the project. The open issue right now is that the docs are supposed to be consistent across projects, but there isn't a template for this yet.

There's a pre-selected goal for V to migrate all legacy zuul jobs: https://review.opendev.org/#/c/691278/. gmann has a patch up moving our legacy jobs (grenade) to the cinder repo and making them py3: https://review.opendev.org/#/c/695787/. At some point, we'll need to convert them to bona fide Zuul v3 jobs. Luigi left a bunch of info on the etherpad about what the moving parts for this are, and he pointed out that reviews are welcome.

It looks like another V goal is going to be the "Consistent and secure default policies" goal, which was floated on the ML: http://lists.openstack.org/pipermail/openstack-discuss/2019-October/010291.html. A "Policy Popup Team" is being organized now to do some work during this cycle: https://review.opendev.org/#/c/695993/

Actions

 * everyone - keep an eye on the "remove py2 support" patches
 * everyone - reviews welcome on https://review.opendev.org/#/q/status:open+branch:master+topic:grenade_zuulv3

Backup Service Testing
This was a follow-up from the Train mid-cycle. The backup tests fail intermittently with timeouts. The situation now is that the cinder backup tests have been removed from tempest-full, but they are being run with tempest-integrated-storage (which basically means that failures will hit us but shouldn't block other projects). The issue could still use some investigation; Eric has a suggestion on the etherpad to write an elasticsearch query to look in the c-bak log for a specific IOError instead of looking for the timeout.

Eric also noted that the test jobs run a long time and fail with a not helpful message -- they look for volume going active but don't notice that it's in an error state and should fail sooner. Failing fast could conserve some gate resources. Maybe someone wants to follow up with something like https://review.opendev.org/#/c/565766/ ?
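
The fail-fast idea can be sketched as a polling loop that treats an error state as terminal. This is a generic illustration, not tempest code: `get_status` is a caller-supplied callable, and the status strings "available" and "error" are assumed to be the ones of interest.

```python
import time


class VolumeInErrorState(Exception):
    pass


def wait_for_volume_available(get_status, timeout=300.0, interval=1.0):
    # get_status is a caller-supplied callable returning the current
    # volume status string (hypothetical; not a tempest API).
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "available":
            return
        if status == "error":
            # Fail immediately instead of waiting out the full timeout.
            raise VolumeInErrorState("volume went to 'error'")
        time.sleep(interval)
    raise TimeoutError("volume did not become available in time")


# Usage: a canned sequence of statuses stands in for real API polling.
statuses = iter(["creating", "error"])
try:
    wait_for_volume_available(lambda: next(statuses), timeout=5.0, interval=0.0)
except VolumeInErrorState as exc:
    print("failed fast:", exc)
```

A waiter like this returns a specific error as soon as the volume fails, rather than a generic timeout several minutes later.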

Actions

 * anyone who's interested - follow up on this
 * rosmaita - (came up during this discussion but not related to this topic) update etherpads with nondestructive translation instructions (get the read-only link to the etherpad and use translations tools there instead of in the writeable etherpad)

Increase testing coverage
We want to get more thorough tests into the cinder-tempest-plugin. Sofia (enriquetaso) is mentoring an Outreachy intern, Anastasiya (anastzhyr), who will focus on this during her internship (3 Dec 2019 to 3 March 2020). The Cinder community can assist in this by (A) writing bugs tagged 'test-coverage' with specific ideas for tests that can be added, and (B) timely reviews of Anastasiya's patches.

Increase number of 3rd party CIs running cinder-tempest-plugin
This should be easy™ for current 3rd party CIs (and actually may already be a requirement). Need to get a baseline for how many CIs are running it now.

Better support for 3rd Party CIs
We'd like to make their lives easier by having a standard way to deploy a robust system. This may be possible using RDO Software Factory. Luigi has already brought this up at the RDO meeting and RDO (and the SF people) are supportive of this idea: http://eavesdrop.openstack.org/meetings/rdo_meeting___2019_11_27/2019/rdo_meeting___2019_11_27.2019-11-27-15.00.log.html#l-76

WIP 3rd party CI doc section for SoftwareFactory:
 * https://softwarefactory-project.io/r/#/c/17097/
 * https://softwarefactory-project.io/logs/97/17097/2/check/sf-docs-build/ae24483/docs-html/guides/third_party_ci.html

Default volume-type enhancement
Gorka's working on a spec for having per user/project/service-token default volume-type; basic idea is what was discussed at the PTG (see above). Related to this is improving the documentation around volume types.

Generic Backups
Related patches:
 * https://review.opendev.org/#/c/620881/
 * https://review.opendev.org/#/c/630305/

Improve Active-Active (HA) Documentation
We're anticipating that operators will want to run Cinder in HA mode (Cinder active-active). Currently RBD supports running in HA.

A driver has to set a flag in order for the service to run in active-active. But in addition to setting the flag, it would be good for the feature to actually work with that driver. The problem is that what you need to do is very driver dependent. We don't have tempest tests that verify that a driver can run in HA (but it should be possible to add tests such that if they fail, then you know HA is not happening).

We need docs aimed at two audiences:
 * driver developers: want to clarify what you need to do to implement active-active and claim HA support. Should be able to give some general advice (like "watch out for race conditions on connection") to help in implementing and testing that their driver supports active-active.
 * operators: need to provide some advice about how to deploy Cinder in active-active mode (for example, you should have 3 API nodes, 3 scheduler nodes, 3 volume services)

Remove the v2 API
Should at least be able to get it out of the service catalog and remove the option to run it (and remove the option to not run v3), so that from an external point of view, all that's available is the v3 API. The refactoring to remove all v2 code from the API doesn't have to happen immediately.

Remove Python 2 support
This is a community goal (and it looks like we're in good shape to make this happen very early in the cycle).

Move away from sqlalchemy-migrate to alembic
We're getting closer to the point where we will have no choice.

Should we hold a Virtual Mid-Cycle?
Cycle Schedule: https://releases.openstack.org/ussuri/schedule.html

The consensus was that we should have a Ussuri Mid-Cycle and that it should be virtual. While we were discussing the timing (have it close to the spec freeze? or closer to M-2?), Eric pointed out that if we were going to use the model that we used for this Virtual PTG, namely, 2-hour sessions spread over a couple of days, there's no reason why the sessions have to be close together since we don't need to arrange any physical facilities. So we can be flexible and have 2-hour sessions whenever it makes sense.

Right now, the sensible times to have 2-hour virtual meetings are:
 * around the Cinder Spec Freeze. The spec freeze is the last week in January and is at 15 weeks, which is exactly the middle of the cycle.  Maybe have the Virtual meet-up the week before so unmerged specs can have some discussion if necessary?
 * at the "Cinder New Feature Status Checkpoint" (week of 16 March 2020), which is 3 weeks before the final release for client libraries.

Actions

 * rosmaita - get feedback from the wider team at the weekly meeting and organize polls to determine the day of week and time