TricircleBigTentQnA

Questions and Answers in the Tricircle Big Tent project application

How much of the API specifics needs to be re-implemented in the Cinder / Nova APIGW components? How much maintenance is needed there in case of changes in the bottom APIs?

For VM/volume related APIs (VM, volume, backup, snapshot, ...), nothing needs to be re-implemented in the Cinder/Nova APIGW, because the Tricircle just forwards the request. Only the APIs that manage common attributes such as Cinder volume types, Nova flavors and quotas, which are merely objects in the database, need to be re-implemented. The maintenance effort for changes in the bottom APIs is quite small. The Tricircle reuses the Nova/Cinder/Neutron tempest test cases to guarantee that if a change in the bottom APIs impacts the Tricircle implementation, the check/gate test for each patch submitted to the Tricircle will fail, so that the contributor can correct the Tricircle in time. The check and gate test has just been added in this patch (https://review.openstack.org/#/c/339332/), and more test cases will be enabled to cover the features coming to the Tricircle.
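
Purely as an illustration of that split (this is not the actual Tricircle controller code; LOCAL_RESOURCES, handle_request and handle_locally are invented names for the sketch), the forwarding idea looks roughly like this:

 # Rough sketch of the "forward vs. re-implement" split described above.
 # Not the real Tricircle code: LOCAL_RESOURCES, handle_request and
 # handle_locally are invented names for illustration only.
 import requests
 
 # Attributes kept only in the top database, so the APIGW handles them itself.
 LOCAL_RESOURCES = {'flavors', 'os-volume-types', 'os-quota-sets'}
 
 def handle_request(resource, method, path, body, bottom_endpoint, token):
     if resource in LOCAL_RESOURCES:
         # e.g. flavor / volume type / quota CRUD against the top database
         return handle_locally(resource, method, body)
     # Servers, volumes, backups, snapshots, ... are simply proxied to the
     # bottom OpenStack instance selected for this tenant/AZ.
     return requests.request(method, bottom_endpoint + path, json=body,
                             headers={'X-Auth-Token': token})
 
 def handle_locally(resource, method, body):
     raise NotImplementedError  # placeholder for the top-database handling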

You're using an independent release model, which means you do not follow the OpenStack development cycles. Nova, Cinder, Neutron and Keystone follow a cycle-based release model. How does the Tricircle release map to supported releases in bottom instances? How does it map to the supported Keystone/Neutron implementations running in the top Tricircle instance?

The Tricircle will move to the same cycle-based release model, and will create a branch whenever Nova, Cinder, Neutron and Keystone create a new branch. In this patch the "independent" release model is configured only temporarily, because the Tricircle is at an early stage of development and we currently want to focus on developing features rather than strictly following milestones like Newton-1, Newton-2 and Newton-3. The "independent" release model may last one or two releases; once most of the basic features are ready, the Tricircle will use the same release model as Nova, Cinder, Neutron and Keystone. What would you suggest for the Tricircle release model?

The milestone model is actually less strict than the other release models, because the milestones are picked based on the schedule rather than the stability of the code. If you follow an independent or cycle-with-intermediary model then you are telling your users that all releases are ready to be used in production. At this point, we are past the second milestone, and so I don't think Tricircle would be considered part of Newton anyway just because of the timing. That said, if you intend to try to follow the release cycle, choosing one of those models instead of independent will help users understand that.

Thank you for your comment. OK, we will use the cycle-with-intermediary release model instead. The comment about a more concise mission statement will be addressed in the next patch.

I don't like how this attempts to re-implement our core APIs: https://github.com/openstack/tricircle/blob/master/tricircle/nova_apigw/controllers/server.py#L121 The above shows many "expected" 500 errors, which is something we explicitly call a bug in OpenStack APIs. I am curious if DefCore tests pass when using Tricircle. It certainly fails the requirement for using certain upstream code sections.

The Tricircle does not have proper error handling yet; this needs to be fixed, and thank you for pointing out the "500" error handling issue. The check and gate test that reuses the Nova, Cinder and Neutron tempest test cases has just been added to the Tricircle in this patch (https://review.openstack.org/#/c/339332/). Because the job was only merged last week, currently only the volume list/get test cases are enabled to test the Tricircle (https://github.com/openstack/tricircle/blob/master/tricircle/tempestplugin/tempest_volume.sh):

 ostestr --regex '(tempest.api.volume.test_volumes_list|tempest.api.volume.test_volumes_get)'

Server-related and other test cases will be added to the job step by step. If the tempest test cases pass, then the DefCore tests should also pass. The Tricircle does need its own error handling mechanism, otherwise errors could be mishandled, but the output will be kept consistent with Nova, Cinder and Neutron. This is because its main feature is to handle remote resources running on independent OpenStack instances, which means the Tricircle needs to know what is happening on the remote sites in order to keep things consistent.
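
To make the "consistent output" point concrete, here is a minimal sketch only (the function name and the pecan-style response object are assumptions, not the real Tricircle error-handling code) of propagating the bottom service's status instead of collapsing failures into blanket 500s:

 # Sketch: propagate whatever the bottom Cinder returned (200, 404, 403, ...)
 # and reserve 5xx for genuinely unexpected gateway failures.
 # Illustrative only; not the actual Tricircle implementation.
 import requests
 
 def proxy_show_volume(bottom_endpoint, volume_id, token, response):
     try:
         bottom = requests.get('%s/volumes/%s' % (bottom_endpoint, volume_id),
                               headers={'X-Auth-Token': token}, timeout=30)
     except requests.RequestException:
         response.status = 502  # bottom OpenStack unreachable
         return {'badGateway': {'message': 'bottom OpenStack unreachable'}}
     response.status = bottom.status_code
     return bottom.json() if bottom.content else None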

We have discussed cascading at a previous design summit session, and on the ML. There were questions from that session around use cases that were never answered. In particular, why simply exposing the geographical regions and AZs was not acceptable. The cases where a proxy approach seemed to be required didn't appear to be the target use cases. I don't like how this dilutes the per-project efforts around Federation, multi-Region support and scaling patterns like Cells v2 and Routed Networks. It would be better if, as a community, there were a single way to consume large collections of OpenStack clouds. Federation appears to be the current approach, although there is still work needed around Quotas and having an integrated network service to bridge remote isolated networks.

There are four use cases describing why we need the Tricircle project in the reference material [1]: https://docs.google.com/presentation/d/1Zkoi4vMOGN713Vv_YO0GP6YLyjLpQ7fRbHlirpq6ZK4/edit?usp=sharing, and exposing independent geographical regions and AZs is not enough for them. Should I describe all the use cases in the commit message? That would make the commit message quite long, so here is just one use case in short: in an OpenStack based public cloud, for one region spanning one or multiple sites, the end user only wants to see one endpoint. But one OpenStack instance will eventually reach its capacity limit, and we have to add more OpenStack instances into the cloud for capacity expansion, so how do we expose one endpoint to the end user? The end user also still wants to add new virtual machines to the same network, with security groups working for virtual machines in different OpenStack instances. The other use cases are described in the material mentioned above. In the financial sector, applications are often deployed across two sites and three data centers for high reliability, availability and durability, so one cloud region often has to support multiple data centers and sites. Besides the four use cases in the material [1], there is another use case reported at the OpenStack Austin summit: https://www.openstack.org/videos/video/distributed-nfv-and-openstack-challenges-and-potential-solutions .

Federation and multi-region are good solutions, but they do not provide the single endpoint required by the use cases above, nor do they provide networking automation (for example tenant-level L2/L3 networking automation and security group handling) or quota control across OpenStack instances. Cells is a good enhancement for Nova scalability, but it has some deployment limitations: 1) only Nova supports cells; 2) using RPC for inter-data-center communication makes inter-DC troubleshooting and maintenance difficult, since there is no CLI, RESTful API or other tool to manage a child cell directly, and if the link between the API cell and a child cell is broken, the child cell in the remote site becomes unmanageable. An analysis and comparison of these candidate solutions is also provided in the material [1].

The Tricircle is just applying to become a big-tent project, to be a member of and complement to the OpenStack ecosystem. The Tricircle does not require any modification of existing components; it makes use of existing or updated features of those components and re-uses their tempest test cases to ensure API compliance and consistency, so no conflict will arise. As you note, multi-region, cells and federation are possible ways to address these use cases, although some requirements are still not fulfilled. There are many options for cloud operators, and there is no harm in the Tricircle providing one more.

If the intent is to hide orchestration complexity from the user, it feels like this would be better as extensions to Heat.

Heat doesn't expose the Nova, Cinder and Neutron APIs to the end user; instead Heat provides its own APIs. But end users and software still want to use the CLIs, APIs or SDKs of Nova, Cinder and Neutron. Especially in public clouds, some PaaS platforms talk to the Nova, Cinder and Neutron APIs directly.

Is Tricircle planning to be a gateway for every OpenStack project?

No: Nova, Cinder and Neutron only, at most plus Glance and Ceilometer, no more.

How can we verify that the APIs exposed by Tricircle are indeed identical to the services' own?

As explained in the commit message and comments many times: the tempest test cases of these services are reused to test the Tricircle.

How does this impact DefCore?

If the tempest tests can pass, then DefCore passes too.

What happens if a cloud exposes Tricircle instead of exposing, say, nova directly?

It adds more cross-OpenStack scheduling and networking automation capability.

"To provide an API gateway and networking automation to allow multiple OpenStack instances to act as a single cloud" How about if we say "OpenStack clouds" then instead of "OpenStack instances"? Because I'm worried that folks will misunderstand "instance" here to mean "compute instance" and that's not at all what you're doing.

We will update it in the next patch.

By "networking automation" do you mean automation of the physical networking constructing the underlying HW these OpenStack instances are installed on?

Thank you for your comment. The networking automation means calling the bottom Neutron APIs (L2GW APIs in some scenarios) to establish L2 or L3 networks across the bottom Neutrons for the same tenant.
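
Purely as an illustration of "calling the bottom Neutron APIs" (the endpoints, segment values and function name below are made up; the real Tricircle logic lives in its Neutron plugin and asynchronous jobs), a sketch with python-neutronclient:

 # Sketch only: create "the same" L2 network in each bottom Neutron so that
 # ports in different OpenStack instances land on one stretched segment.
 # Endpoints and segment values are hypothetical; this is not the Tricircle code.
 from neutronclient.v2_0 import client as neutron_client
 
 BOTTOM_NEUTRONS = ['http://pod1:9696', 'http://pod2:9696']  # assumed endpoints
 
 def stretch_shared_vlan_network(token, name, vlan_id, physnet):
     network_ids = []
     for endpoint in BOTTOM_NEUTRONS:
         neutron = neutron_client.Client(endpoint_url=endpoint, token=token)
         net = neutron.create_network({'network': {
             'name': name,
             'provider:network_type': 'vlan',
             'provider:physical_network': physnet,
             'provider:segmentation_id': vlan_id,  # same VLAN in every pod
         }})
         network_ids.append(net['network']['id'])
     return network_ids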

"Neutron API Server with Neutron Tricircle plugin", Is this a pure Neutron API? Because I would guess with Tricircle you're interacting not with Neutron but instead with Tricircle through the API GW, and I'm concerned about diverging from the underlying Neutron API here.

Thank you for your comment. This is a pure Neutron API server, which in the Tricircle is configured to use the Neutron Tricircle plugin. It is easier to explain with the deployment scenario: when the cloud operator wants to install the Tricircle, Neutron (https://github.com/openstack/neutron) is installed first, then the Neutron Tricircle plugin is installed and Neutron is configured to use it. Once installation and configuration are finished, Neutron is started and loads the Neutron Tricircle plugin, just like the OVN, Dragonflow or ODL plugins run under Neutron. The Neutron API server and the Neutron database are required for Neutron to run, but the Neutron Tricircle plugin needs no agent nodes; it calls the bottom Neutron APIs as needed (L2GW APIs in some scenarios) through the Neutron RESTful API. In fact, in the Tricircle repository (https://github.com/openstack/tricircle) only the Neutron Tricircle plugin source code is developed and delivered; the Neutron API server and database are what the Neutron project develops today, in the repository https://github.com/openstack/neutron. The Tricircle project does not touch the Neutron source code in the Tricircle repository.
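
As a rough illustration of that configuration step (the plugin path below is an assumption based on the Tricircle repository layout and may differ between releases; the Tricircle installation guide is authoritative), the top Neutron's neutron.conf points its core plugin at the Tricircle plugin:

 # neutron.conf on the top (Tricircle) Neutron API server -- illustrative only,
 # the exact plugin path may vary by release
 [DEFAULT]
 core_plugin = tricircle.network.plugin.TricirclePlugin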

Are you mapping Neutron networks across Neutron installs using L2GW? If so, if a tenant interacts with the individual neutron servers, they will see different values for things like VLANs/VNIs, which may be confusing.

Good question, thank you for your comment. The Tricircle plans to support several cross-OpenStack L2 networking models:

1) local_network: the network only spreads within one bottom OpenStack instance.
2) shared_vlan: a network with the same VLAN segment can spread into multiple bottom OpenStack clouds.
3) shared_vxlan: a network with the same VxLAN segment can spread into multiple bottom OpenStack clouds.
4) mixed_vlan_vxlan: an L2 network that spreads into multiple OpenStack clouds and consists of different physical network types and segment IDs.

Usually the tenant cannot see the network type and segment ID. If the tenant is allowed to see them, then in cases 1), 2) and 3) the tenant sees the same network segment ID everywhere; case 4) is like hierarchical port binding in Neutron: the network has one master VxLAN network with several dynamically bound VLAN or VxLAN networks in the different bottom Neutrons. This matches the handling in current Neutron: if the network is a multi-provider network and the tenant is allowed to see the provider network information, the tenant sees that the network consists of different network types and segment IDs, especially with a hierarchical port binding implementation. The Tricircle handles this similarly. The spec for cross-OpenStack L2 networking is here: https://github.com/openstack/tricircle/blob/master/specs/cross-pod-l2-networking.rst The original discussion about cross-OpenStack L2 networking is at [21] https://etherpad.openstack.org/p/TricircleCrossPodL2Networking
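
As an illustration of case 4) only (all values are invented; the multi-provider "segments" attribute itself is standard Neutron), an admin-visible view of a network stretched over two pods might look roughly like this:

 # Invented example values; shows the Neutron multi-provider "segments" shape
 # that a mixed_vlan_vxlan network could expose to an admin.
 network = {
     'name': 'cross-pod-net',
     'segments': [
         {'provider:network_type': 'vxlan',   # master segment
          'provider:physical_network': None,
          'provider:segmentation_id': 1001},
         {'provider:network_type': 'vlan',    # dynamically bound in another pod
          'provider:physical_network': 'physnet1',
          'provider:segmentation_id': 120},
     ],
 }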

I'm curious how Tricircle does the scheduling. Does it choose an OpenStack instance at random, or does it actually inspect the resources available in each instance and schedule based on that? I worry that this will eventually reimplement Nova's scheduler, or even be different enough that a request would be scheduled to an OpenStack instance that cannot satisfy the request. For instance, imagine that instance A has compute nodes with capability X, and instance B does not. How does tricircle ensure that a request for a VM with capability X gets to instance A, without reimplementing Nova's scheduler? Also, are the AZs discussed here the group of AZs defined in Nova, or does tricircle have its own idea of what an AZ is?

Thank you for your comment. In fact, the scheduling in the Tricircle is mostly for capacity expansion within one AZ ("availability zone") when that AZ includes more than one bottom OpenStack instance. In a production cloud, at first only one OpenStack instance is put in an AZ, and the preference is to add more compute nodes to that instance; once almost all compute nodes are occupied with VMs and no more compute nodes can be added, we have to add another OpenStack instance to the AZ. Because a new OpenStack instance has been added, new VM/volume provisioning requests need to be forwarded to it, so we have to build a binding between tenant ID and OpenStack instance. This binding can be established dynamically by the Tricircle, or by the admin through the admin API for maintenance purposes. Moreover, the OpenStack instances inside one AZ can be classified into different categories: for example, the servers in one OpenStack instance may be for general purposes, while another may be built for heavy-load CAD modeling with GPUs. So the OpenStack instances in one AZ can be divided into groups for different purposes, with different VM cost and performance. A resource_affinity_tag will be used to organize the OpenStack groups by category, and the binding relationship will be built based on the same tag in the flavor/volume extra specs.

All of the binding is for request-forwarding purposes, and the most common binding policy is based on the OpenStack instance establishment time, which is quite reasonable for capacity expansion: if one OpenStack instance's resources are exhausted, a new one is added, and all new requests should be forwarded to the new one. This is unlike the scheduling in Nova, which is based on many factors and tries to balance load among compute nodes. Load balancing among OpenStack instances is not the purpose of the scheduling (binding relationship) in the Tricircle, although it could do that, and the Tricircle never deals with scheduling at the compute-node level, which is what Nova does. You can refer to the "dynamic pod binding" spec: https://github.com/openstack/tricircle/blob/master/specs/dynamic-pod-binding.rst The AZ is the same parameter used in the Nova/Cinder API, so it is the same concept, but in the Tricircle one AZ may include more than one bottom OpenStack instance.
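
For illustration only (the function and field names below, such as pods, bindings and established_at, are simplified assumptions rather than the actual schema described in the dynamic-pod-binding spec), the binding policy sketched above amounts to something like:

 # Simplified sketch of "bind the tenant to the newest matching pod":
 # not the real Tricircle code; field names are assumptions.
 def select_pod(pods, tenant_id, bindings, flavor_extra_specs):
     # 1. Reuse an existing tenant -> pod binding if one is already recorded.
     if tenant_id in bindings:
         return bindings[tenant_id]
 
     # 2. Otherwise filter pods by resource affinity tag (e.g. 'gpu' flavors
     #    only land on pods tagged for GPU workloads).
     wanted_tag = flavor_extra_specs.get('resource_affinity_tag')
     candidates = [p for p in pods
                   if wanted_tag is None
                   or p.get('resource_affinity_tag') == wanted_tag]
 
     # 3. Prefer the most recently established pod: older pods are assumed to
     #    be (nearly) full, which is why a new one was added.
     pod = max(candidates, key=lambda p: p['established_at'])
     bindings[tenant_id] = pod   # record the new binding
     return pod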