Nova Cells v2

Latest revision as of 22:41, 12 December 2017

=== Bugs ===
* https://bugs.launchpad.net/nova/+bug/1656276 - Error running nova-manage cell_v2 simple_cell_setup when configuring nova with puppet-nova
* https://bugs.launchpad.net/openstack-manuals/+bug/1673616 - Scaling in Operations Guide - cells section needs to be updated
** The create_cell CLI should go toward fixing this: https://review.openstack.org/#/c/332713/
* https://bugs.launchpad.net/nova/+bug/1682060 - empty nova service and hypervisor list
* https://bugs.launchpad.net/nova/+bug/1649341 - Undercloud upgrade fails with "Cell mappings are not created, but required for Ocata"
** Fixed with a docs patch: https://review.openstack.org/#/c/456923/ (mriedem)
** Fixed on master in nova; a backport is proposed to stable/newton. However, this change caused issues for TripleO on master, which is bug 1656276 above.
* https://bugs.launchpad.net/nova/+bug/1682693 - tags and not-tags cannot work properly
* https://bugs.launchpad.net/nova/+bug/1656017 - nova-manage cell_v2 map_cell0 always returns a non-0 exit code
** Patch from Kevin Zheng: https://review.openstack.org/#/c/456872/ - needs to be backported to ocata and newton
** dtp has a fix here: https://review.openstack.org/#/c/420132/
* https://bugs.launchpad.net/trove/+bug/1682845 - nova's server group API returns deleted instances as members
* https://bugs.launchpad.net/nova/+bug/1656673 - map_cell0 should use the main DB connection, not the API DB
** The original regression is reverted: https://review.openstack.org/#/c/457097/
** dansmith has a fix here: https://review.openstack.org/#/c/420439/
** TODO(dansmith) to provide a proper fix
* https://bugs.launchpad.net/nova/+bug/1656675 - There is no way to list cell mappings besides looking into the DB
** mriedem has a fix here: https://review.openstack.org/#/c/420440/
* https://bugs.launchpad.net/nova/+bug/1656691 - There is no way to delete a cell mapping except via the DB directly
** mriedem has a fix here: https://review.openstack.org/#/c/420451/

=== Blueprints ===
These are all currently targeted for the Pike release.

* https://blueprints.launchpad.net/nova/+spec/discover-hosts-faster
* https://blueprints.launchpad.net/nova/+spec/cells-aware-api
* https://blueprints.launchpad.net/nova/+spec/cells-count-resources-to-check-quota-in-api
* https://blueprints.launchpad.net/nova/+spec/list-instances-using-searchlight
* https://blueprints.launchpad.net/nova/+spec/service-hyper-uuid-in-api
* https://blueprints.launchpad.net/nova/+spec/convert-consoles-to-objects
  
 
=== TODOs ===
* The deployment/upgrade process needs to be documented in more than just the release notes.
** dansmith has a start on the docs here: https://review.openstack.org/#/c/420198/
** (diana will take this) On a side note, we should also have man pages for the cell_v2 commands, because there is confusion around the inputs and outputs and how return codes should be treated, i.e. is 1 an error or not? Put the CLI docs here: http://docs.openstack.org/developer/nova/man/nova-manage.html
*** Reviews for those docs: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:man
*** Let me (diana_clarke) know if you want them changed in any way. I did not document every option, as that doesn't appear to be the precedent in these docs. I suspect it's supposed to just be high-level, and that you're supposed to refer to the actual CLI help for more information.
** alaski's older docs patch (which is probably out of date now but might be useful) is here: https://review.openstack.org/#/c/267153/
** Summary of current commands:
*** map_cell0: creates a cell mapping for cell0
*** simple_cell_setup: creates a cell mapping for cell0, then creates a cell mapping and associates hosts with it (requires unmapped compute hosts to be registered already). Intended as a lightweight way for non-cells-v1 users to set up cells v2 during an upgrade.
*** map_cell_and_hosts: creates a cell mapping and associates hosts with it (requires unmapped compute hosts to be registered already)
*** discover_hosts: associates unmapped hosts with an existing cell mapping (or all cell mappings if a specific cell isn't specified)
** Commands TODO?:
*** create_cell: creates a cell mapping intended for association with hosts later via discover_hosts: https://review.openstack.org/#/c/332713/
*** list_cells: lists current cell mappings, noting any that are empty (to help with 1. knowing whether a cell needs to be created and 2. finding empty cells for use or for cleanup): https://review.openstack.org/420440
*** delete_cell: deletes a cell mapping, for use in cleaning up erroneously created or wrong cell mappings, etc.: https://review.openstack.org/#/c/420451/
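The discover_hosts rule summarized above can be sketched as a tiny simulation. Everything here (the data model, the function signature, the host and cell names) is invented for illustration; it is not nova's actual implementation:

```python
def discover_hosts(cell_mappings, host_mappings, registered_hosts, cell=None):
    """Associate registered-but-unmapped compute hosts with a cell mapping.

    cell_mappings: list of cell names, e.g. ['cell0', 'cell1']
    host_mappings: dict of host -> cell, updated in place
    registered_hosts: dict of cell -> set of compute hosts registered
        in that cell's database
    cell: optionally restrict discovery to one cell; by default all
        cell mappings are scanned, as the summary above describes.
    """
    targets = [cell] if cell else list(cell_mappings)
    newly_mapped = []
    for c in targets:
        for host in sorted(registered_hosts.get(c, ())):
            if host not in host_mappings:  # only unmapped hosts are touched
                host_mappings[host] = c
                newly_mapped.append(host)
    return newly_mapped

host_mappings = {'node1': 'cell1'}
found = discover_hosts(
    cell_mappings=['cell0', 'cell1'],
    host_mappings=host_mappings,
    registered_hosts={'cell1': {'node1', 'node2'}},
)
print(found)          # ['node2']
print(host_mappings)  # {'node1': 'cell1', 'node2': 'cell1'}
```

Note the idempotence: already-mapped hosts are skipped, so re-running discovery is safe.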
 
* Integrate the 'nova-status upgrade check' CLI into the CI/QA system (grenade).
* Older tracking etherpads (these may be out of date):
** https://etherpad.openstack.org/p/cellsV2-remaining-work-items
=== Open Questions ===
* How will we handle multiple cells where each cell has its own independent ceph cluster? (brought up in #openstack-nova by mnaser)
** If glance has its own ceph cluster where it stores images and each cell has its own ceph cluster, then each instance create will require a download of the image from glance, since the glance ceph cluster can't be reached by any cell. How can we handle the inefficiency?
*** Idea from mnaser: could we cache images in the imagebackend (instead of on the hypervisor disk) so that each cell gets a copy of the image and can re-use it instead of re-downloading from glance every time?
*** Workaround from mnaser: store images multiple times in glance (glance supports multiple image locations), once per cell ceph cluster, and nova could try locations until it finds an image whose ceph cluster it can access.
**** (melwitt): I'm not sure how that would work in glance; how could it access multiple ceph clusters and track separate credentials per ceph cluster?
**** (mnaser): Glance exposes a 'locations' attribute in the API, which is a list of locations. In the [https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L922-L925 clone] function for the RBD image driver, nova checks whether it can clone from each location using is_cloneable(). You can see from the [https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L199-L225 is_cloneable] code that one of the checks is whether the fsid of the ceph cluster is the same as the one nova connects to. FSIDs are supposed to be globally unique, so it would return False for a foreign cluster. I am assuming it'll keep looping until it hits the location that matches the fsid of the ceph cluster inside the cell!
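mnaser's location-probing workaround could look roughly like the following sketch. The location format, function names, and fsid values are invented for illustration; only the fsid comparison mirrors the real is_cloneable() check:

```python
def pick_clonable_location(locations, local_fsid):
    """Return the first image location backed by the local ceph cluster."""

    def is_cloneable(location):
        # The real nova check also inspects the URL scheme and image
        # format; here we model only the fsid comparison discussed above.
        return location['fsid'] == local_fsid

    for location in locations:
        if is_cloneable(location):
            return location
    return None  # no local copy; fall back to downloading from glance

# One glance location per cell's ceph cluster (made-up fsids).
locations = [
    {'fsid': 'aaaa-1111', 'url': 'rbd://aaaa-1111/images/img/snap'},
    {'fsid': 'bbbb-2222', 'url': 'rbd://bbbb-2222/images/img/snap'},
]
chosen = pick_clonable_location(locations, local_fsid='bbbb-2222')
print(chosen['url'])  # rbd://bbbb-2222/images/img/snap
```

Because fsids are globally unique, at most one location can match, so the loop's first hit is also the only possible hit.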
  
 
* Should the computes self-register with a cell when the compute_nodes record is created from the ResourceTracker? https://review.openstack.org/#/c/369634/
** How would the computes know which cell to map to? We could add something to the model to flag a 'default' or 'staging' cell mapping, or put something into nova.conf on the compute node.
** If we auto-register into a default/staging cell, how do we move hosts to other cells? nova-manage CLI?
* Why can't we create an empty cell, i.e. a cell mapping with no computes? This is a fresh-install scenario.
** We have had an option to auto-map hosts from the scheduler since Ocata, with improvements being made in Pike: https://blueprints.launchpad.net/nova/+spec/discover-hosts-faster
** Note that the nova-status upgrade check command does not consider it a failure if there are cell mappings but no compute nodes yet, but simple_cell_setup does consider that a failure; see bug 1656276.
** There has been a review up for this for a while: https://review.openstack.org/#/c/332713/
*** This way, a fresh install would do something like 'nova-manage cell_v2 map_cell0' and 'nova-manage cell_v2 create_cell', and then once compute hosts are available, the operator runs 'nova-manage cell_v2 discover_hosts'.
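As a rough illustration of that fresh-install ordering, here is a toy model where the "API database" is just a dict and the function names merely echo the nova-manage cell_v2 commands; the bodies are invented for illustration, not nova's code:

```python
# Toy API-database state: cell mappings plus host-to-cell mappings.
api_db = {'cell_mappings': [], 'host_mappings': {}}

def map_cell0():
    # Step 1: create the cell0 mapping; cell0 never has compute hosts.
    api_db['cell_mappings'].append('cell0')

def create_cell(name):
    # Step 2: an empty cell mapping (no computes yet) is valid here.
    api_db['cell_mappings'].append(name)

def discover_hosts(registered_hosts):
    # Step 3: once computes have registered, map them into their cell.
    for cell, hosts in registered_hosts.items():
        for host in sorted(hosts):
            api_db['host_mappings'].setdefault(host, cell)

map_cell0()
create_cell('cell1')  # empty cell: no failure expected on a fresh install
discover_hosts({'cell1': {'compute1', 'compute2'}})
print(api_db['cell_mappings'])          # ['cell0', 'cell1']
print(sorted(api_db['host_mappings']))  # ['compute1', 'compute2']
```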
 
  
 
=== Manifesto ===
 
=== Manifesto ===
Line 65: Line 60:
  
 
=== Code Review ===
* https://review.openstack.org/#/q/topic:bp/cells-scheduling-interaction
* See the cells v2 section in the Pike review priorities etherpad: https://etherpad.openstack.org/p/pike-nova-priorities-tracking
* Otherwise see the cells v2 section in the Ocata review priorities etherpad: https://etherpad.openstack.org/p/ocata-nova-priorities-tracking
 
  
 
=== References ===