Latest revision as of 22:41, 12 December 2017
Nova Cells v2
Bugs
- https://bugs.launchpad.net/openstack-manuals/+bug/1673616 - Scaling in Operations Guide - cells section needs to be updated
- https://bugs.launchpad.net/nova/+bug/1682060 - empty nova service and hypervisor list
- Fixed with a docs patch: https://review.openstack.org/#/c/456923/ (mriedem)
- https://bugs.launchpad.net/nova/+bug/1682693 - tags and not-tags cannot work properly
- Patch from Kevin Zheng: https://review.openstack.org/#/c/456872/ - needs to be backported to ocata and newton
- https://bugs.launchpad.net/trove/+bug/1682845 - nova's server group API returns deleted instances as members
- Original regression is reverted: https://review.openstack.org/#/c/457097/
- TODO(dansmith) to provide a proper fix
Blueprints
These are all currently targeted for the Pike release.
- https://blueprints.launchpad.net/nova/+spec/discover-hosts-faster
- https://blueprints.launchpad.net/nova/+spec/cells-aware-api
- https://blueprints.launchpad.net/nova/+spec/cells-count-resources-to-check-quota-in-api
- https://blueprints.launchpad.net/nova/+spec/list-instances-using-searchlight
- https://blueprints.launchpad.net/nova/+spec/service-hyper-uuid-in-api
- https://blueprints.launchpad.net/nova/+spec/convert-consoles-to-objects
TODOs
- Older tracking etherpads (these may be out of date):
Open Questions
- How will we handle multiple cells where each cell has its own independent ceph cluster? (brought up in #openstack-nova by mnaser)
- If glance has its own ceph cluster where it stores images and each cell has its own ceph cluster, then each instance create will require a download of the image from glance since the glance ceph cluster can't be reached by any cell. How can we handle the inefficiency?
- Idea from mnaser: could we cache images in the imagebackend (instead of on the hypervisor disk) so that each cell gets a copy of the image and can re-use it instead of re-downloading from glance every time?
- Workaround from mnaser: store images multiple times in glance (glance supports multiple image locations), once per cell ceph cluster, and nova could try locations until it finds an image whose ceph cluster it can access.
- (melwitt): I'm not sure how that works in glance: how could it access multiple ceph clusters and track separate credentials per ceph cluster?
- (mnaser): Glance exposes a 'locations' attribute in the API, which is a list of locations. In the clone function for the RBD image driver (https://github.com/openstack/nova/blob/master/nova/virt/libvirt/imagebackend.py#L922-L925), Nova checks whether it can clone from a location using `is_cloneable()`. You can see from the is_cloneable code (https://github.com/openstack/nova/blob/master/nova/virt/libvirt/storage/rbd_utils.py#L199-L225) that one of the checks is whether the fsid of the Ceph cluster is the same as the one Nova connects to. FSIDs are supposed to be globally unique, so it would return false for a location in any other cluster. I am assuming it'll keep looping until it hits the one that matches the fsid of the ceph cluster inside the cell!
- Should the computes self-register with a cell when the compute_nodes record is created from the ResourceTracker? https://review.openstack.org/#/c/369634/
- How would the computes know which cell to map to? We could add something to the model to flag a 'default' or 'staging' cell mapping, or put something into nova.conf on the compute node.
- If we auto-register into a default/staging cell, how do we move hosts to other cells? nova-manage CLI?
- We have an option to auto-map hosts from the scheduler since Ocata, with improvements being made in Pike: https://blueprints.launchpad.net/nova/+spec/discover-hosts-faster
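The multiple-locations workaround discussed for the per-cell ceph question can be sketched roughly as follows. This is a minimal, standalone illustration and not Nova's actual code: the helper names `fsid_from_rbd_url` and `pick_cloneable_location` are hypothetical, and the `rbd://<fsid>/<pool>/<image>/<snapshot>` URL layout and the shape of the location dicts are assumptions made for the example.

```python
from typing import Iterable, Mapping, Optional


def fsid_from_rbd_url(url: str) -> Optional[str]:
    """Return the fsid from an assumed 'rbd://<fsid>/<pool>/<image>/<snapshot>' URL.

    Returns None for anything that does not match that layout.
    """
    prefix = "rbd://"
    if not url.startswith(prefix):
        return None
    pieces = url[len(prefix):].split("/")
    # Require exactly four non-empty components.
    if len(pieces) != 4 or any(not piece for piece in pieces):
        return None
    return pieces[0]


def pick_cloneable_location(
    locations: Iterable[Mapping[str, str]], local_fsid: str
) -> Optional[Mapping[str, str]]:
    """Return the first image location stored in the cell's own ceph cluster.

    'locations' stands in for the list glance exposes per image; each entry
    is assumed to be a dict with a 'url' key. Locations in other clusters
    (different fsid) or on other backends are skipped.
    """
    for location in locations:
        fsid = fsid_from_rbd_url(location.get("url", ""))
        if fsid is not None and fsid == local_fsid:
            return location
    return None


if __name__ == "__main__":
    image_locations = [
        {"url": "rbd://fsid-cell1/images/img-1/snap"},
        {"url": "rbd://fsid-cell2/images/img-1/snap"},
    ]
    # A compute in cell2 would skip the cell1 copy and pick its own.
    print(pick_cloneable_location(image_locations, "fsid-cell2"))
```

If no location matches the cell's fsid, the function returns None, which corresponds to the inefficient case raised in the question: the image would have to be downloaded from glance instead of cloned.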
Manifesto
http://docs.openstack.org/developer/nova/cells.html#manifesto
Testing
https://etherpad.openstack.org/p/nova-cells-testing
DB Table Analysis
https://etherpad.openstack.org/p/nova-cells-table-analysis
Scheduling requirements
https://etherpad.openstack.org/p/nova-cells-scheduling-requirements
Code Review
- See the cells v2 section in the Pike review priorities etherpad: https://etherpad.openstack.org/p/pike-nova-priorities-tracking
References
- Note the original cells wiki is here: https://wiki.openstack.org/wiki/Blueprint-nova-compute-cells
- Kilo design summit etherpad: https://etherpad.openstack.org/p/kilo-nova-cells
- nova-specs: https://review.openstack.org/#/q/status:open+project:openstack/nova-specs+branch:master+topic:bp/cells-instance-mapping,n,z
- Flow diagrams: http://paste.openstack.org/show/144068/
- Commentable version of the flow diagrams: https://etherpad.openstack.org/p/nova-cells-flow-diagram