TricircleBigTentQnA

Questions and Answers in the Tricircle Big Tent project application

How much of the API specifics need to be reimplemented in the Cinder / Nova APIGW components? How much maintenance is needed there in case of changes in the bottom APIs?

For VM/volume-related APIs (server, volume, backup, snapshot, ...), nothing needs to be re-implemented in the Cinder/Nova APIGW, because the Tricircle just forwards the request. Only the APIs that manage common attributes such as Cinder volume types, Nova flavors and quotas, which are just objects in the database, need to be re-implemented. The maintenance burden for changes in the bottom APIs is quite small: the Tricircle reuses the Nova/Cinder/Neutron tempest test cases, so if a change in the bottom APIs affects the Tricircle implementation, the check/gate test for each patch submitted to the Tricircle will fail, and the contributor can correct the Tricircle in time. The check and gate tests have just been added in this patch (https://review.openstack.org/#/c/339332/); more test cases will be enabled to cover the features coming to the Tricircle.
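
To illustrate what "just forwards the request" means, here is a minimal pass-through sketch, not the actual Tricircle Nova APIGW code; the endpoint-lookup step and the function names are hypothetical:

    # Minimal forwarding sketch, NOT the actual Nova APIGW implementation.
    import requests

    def forward(method, path, headers, body, bottom_endpoint):
        # bottom_endpoint would come from a lookup of which bottom
        # OpenStack instance serves this tenant/AZ (hypothetical step).
        url = bottom_endpoint.rstrip('/') + path
        resp = requests.request(method, url, headers=headers, data=body)
        # Status, headers and body are returned verbatim, so the caller
        # sees exactly the response the bottom Nova produced.
        return resp.status_code, resp.headers, resp.content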

You're using an independent release model, which means you do not follow the OpenStack development cycles. Nova, Cinder, Neutron and Keystone follow a cycle-based release model. How does the Tricircle release map to supported releases in bottom instances? How does it map to the supported Keystone/Neutron implementations running in the top Tricircle instance?

The Tricircle will release on the same cycle-based model and will create a branch whenever Nova, Cinder, Neutron and Keystone create a new branch. In this patch, the "independent" release model is configured only temporarily because the Tricircle is at an early stage of development: currently we want to develop more features rather than strictly follow milestones like Newton-1, Newton-2, Newton-3. The "independent" release model may last one or two releases; once most of the basic features are ready, the Tricircle will use the same release model as Nova, Cinder, Neutron and Keystone. What would you suggest for the Tricircle release model?

The milestone model is actually less strict than the other release models, because the milestones are picked based on the schedule rather than the stability of the code. If you follow an independent or cycle-with-intermediary model then you are telling your users that all releases are ready to be used in production. At this point, we are past the second milestone, and so I don't think Tricircle would be considered part of Newton anyway just because of the timing. That said, if you intend to try to follow the release cycle, choosing one of those models instead of independent will help users understand that.

Thank you for your comment. OK, we will use the cycle-with-intermediary release model instead. The comment about a concise mission statement will be addressed in the next patch.

I don't like how this attempts to re-implement our core APIs: https://github.com/openstack/tricircle/blob/master/tricircle/nova_apigw/controllers/server.py#L121 The above shows many "expected" 500 errors, which is something we explicitly call a bug in OpenStack APIs. I am curious if DefCore tests pass when using Tricircle. It certainly fails the requirement for using certain upstream code sections.

The Tricircle does not have good error handling yet; this needs to be fixed, and thank you for pointing out the "500" error-handling issue. The check and gate tests, which reuse the Nova, Cinder and Neutron tempest test cases, have just been added to the Tricircle in this patch (https://review.openstack.org/#/c/339332/). Because the job was merged only last week, currently only the volume list/get test cases are enabled to test the Tricircle (https://github.com/openstack/tricircle/blob/master/tricircle/tempestplugin/tempest_volume.sh):

    ostestr --regex '(tempest.api.volume.test_volumes_list|tempest.api.volume.test_volumes_get)'

Server-related and other test cases will be added to the job step by step. If the tempest test cases pass, then the DefCore tests should also pass. The Tricircle does need its own error-handling mechanism, otherwise errors could be mishandled, but the output will be kept consistent with Nova, Cinder and Neutron. This is because its main feature is to handle remote resources running on independent OpenStack instances, which means the Tricircle needs some awareness of what is happening on the remote site to maintain consistency.
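
For illustration, proper handling means translating failures into the 4xx responses the bottom service would itself return, so that only truly unexpected faults surface as 500. A minimal sketch, with hypothetical exception names that are placeholders rather than Tricircle classes:

    # Sketch only: map errors to Nova-style fault bodies instead of
    # letting every exception become a 500.
    class ResourceNotFound(Exception):
        pass

    class BadRequestInput(Exception):
        pass

    def to_http_error(exc):
        if isinstance(exc, ResourceNotFound):
            return 404, {'itemNotFound': {'message': str(exc), 'code': 404}}
        if isinstance(exc, BadRequestInput):
            return 400, {'badRequest': {'message': str(exc), 'code': 400}}
        # Anything unexpected is a genuine server fault.
        return 500, {'computeFault': {'message': 'internal error', 'code': 500}}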

We have discussed cascading at a previous design summit session, and on the ML. There were questions from that session around use cases that were never answered; in particular, why directly exposing the geographical regions and AZs was not acceptable. The cases where a proxy approach seemed to be required didn't appear to be the target use cases. I don't like how this dilutes the per-project efforts around Federation, multi-region support and scaling patterns like Cells v2 and Routed Networks. It would be better if as a community there is a single way to consume large collections of OpenStack clouds. Federation appears to be the current approach, although there is still work needed around quotas and having an integrated network service to bridge remote isolated networks.

There are four use cases described in the reference material [1] (https://docs.google.com/presentation/d/1Zkoi4vMOGN713Vv_YO0GP6YLyjLpQ7fRbHlirpq6ZK4/edit?usp=sharing) explaining why we need the Tricircle project, and exposing independent geographical regions and AZs was not enough. Should I describe all the use cases in the commit message? That would make the commit message quite long. One use case in short: in an OpenStack-based public cloud, for one region in one site or multiple sites, the end user only wants to see one endpoint. But one OpenStack instance will eventually reach its capacity limit, so we have to add more OpenStack instances to the cloud for capacity expansion. How then do we expose one endpoint to the end user? And the end user still wants to add new virtual machines to the same network, and security groups should work for virtual machines in different OpenStack instances. Other use cases are described in the communication material mentioned above. In the financial sector, applications are often deployed across two sites and three data centers for high reliability, availability and durability, so one cloud region often has to support multiple data centers and multiple sites. Besides the four use cases in the material [1], there is another use case reported at the OpenStack Austin summit: https://www.openstack.org/videos/video/distributed-nfv-and-openstack-challenges-and-potential-solutions

Federation and multi-region are good solutions, but they do not provide a single endpoint to the end user, which is one requirement of the use cases above, nor do they provide networking automation (for example, tenant-level L2/L3 networking automation and security-group handling) or quota control across OpenStack instances. Cells is a good enhancement for Nova scalability, but it has some deployment limitations: 1) only Nova supports cells; 2) using RPC for inter-data-center communication makes inter-DC troubleshooting and maintenance difficult, as there is no CLI, RESTful API or other tool to manage a child cell directly, and if the link between the API cell and a child cell is broken, the child cell in the remote site is unmanageable. An analysis and comparison of these candidate solutions is also provided in the material [1].

The Tricircle is just applying to be a big-tent project, to be a member of and complement to the OpenStack ecosystem. The Tricircle will not require any modification to existing components; it will make use of existing or updated features of those components and reuse tempest test cases to ensure API compliance and consistency, so no conflict will happen. As you propose here, multi-region, cells and federation are possible ways to address these use cases, although some requirements are still not fulfilled. So there are many options for cloud operators, and there is no harm in the Tricircle providing one more option.

If the intent is to hide orchestration complexity from the user, it feels like this would be better as extensions to Heat.

Heat doesn't provide the Nova, Cinder, Neutron APIs to the end user; instead, Heat provides its own APIs. But the end user or software still wants to use the CLIs, APIs or SDKs of Nova, Cinder and Neutron. Especially in public clouds, some PaaS platforms talk to the Nova, Cinder and Neutron APIs directly.

Is Tricircle planning to be gateway for every OpenStack project?

No: Nova, Cinder and Neutron only, at most plus Glance and Ceilometer, no more.

How can we verify that the APIs exposed by Tricircle are indeed identical to the services'?

As explained in the commit message and comments many times: we reuse the tempest test cases of these services to test the Tricircle.

How does this impact defcore?

If the tempest tests can pass, then DefCore passes.

What happens if a cloud exposes Tricircle instead of exposing, say, nova directly?

It adds cross-OpenStack scheduling and networking automation capability.

"To provide an API gateway and networking automation to allow multiple OpenStack instances to act as a single cloud" How about if we say "OpenStack clouds" then instead of "OpenStack instances"? Because I'm worried that folks will misunderstand "instance" here to mean "compute instance" and that's not at all what you're doing.

We will update it in the next patch.

By "networking automation" do you mean automation of the physical networking constructing the underlying HW these OpenStack instances are installed on?

Thank you for your comment. The networking automation means calling the bottom Neutron APIs (L2GW APIs in some scenarios) to establish an L2 or L3 network across Neutron instances for the same tenant, as the sketch below illustrates.
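
A minimal sketch of that idea, assuming made-up endpoints, token and provider values (the real plugin involves more machinery, e.g. asynchronous jobs):

    # Sketch: create the "same" L2 network in two bottom Neutron instances
    # through their standard REST APIs. Endpoints, token and the physical
    # network mapping are hypothetical.
    import requests

    BOTTOM_NEUTRONS = ['http://pod1:9696', 'http://pod2:9696']  # hypothetical
    TOKEN = 'admin-token'                                       # hypothetical

    def create_shared_vlan_network(name, vlan_id):
        body = {'network': {
            'name': name,
            'provider:network_type': 'vlan',
            'provider:physical_network': 'bridge',  # hypothetical mapping
            'provider:segmentation_id': vlan_id,
        }}
        for endpoint in BOTTOM_NEUTRONS:
            requests.post(endpoint + '/v2.0/networks', json=body,
                          headers={'X-Auth-Token': TOKEN})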

"Neutron API Server with Neutron Tricircle plugin", Is this a pure Neutron API? Because I would guess with Tricircle you're interacting not with Neutron but instead with Tricircle through the API GW, and I'm concerned about diverging from the underlying Neutron API here.

Thank you for your comment. This is a pure Neutron API server, which in the Tricircle is configured with the Neutron Tricircle plugin. It is easier to explain via the deployment scenario: when the cloud operator wants to install the Tricircle, Neutron (https://github.com/openstack/neutron) is installed first, then the Neutron Tricircle plugin is installed and Neutron is configured to use it. After installation and configuration, Neutron loads the Neutron Tricircle plugin, just like the OVN, Dragonflow or ODL plugins running under Neutron. The Neutron API server and the Neutron database are required for Neutron to run, but the Neutron Tricircle plugin needs no agent nodes: it calls the bottom Neutron APIs as needed (L2GW APIs in some scenarios) through the Neutron RESTful API. In fact, in the Tricircle repository (https://github.com/openstack/tricircle), only the Neutron Tricircle plugin source code is developed and delivered; the Neutron API server and database are what the Neutron project develops today, in the repository https://github.com/openstack/neutron. The Tricircle project does not touch the Neutron source code.
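
In configuration terms, the deployment above amounts to pointing the top Neutron at the Tricircle plugin. This snippet is illustrative only; the exact plugin path should be checked against the Tricircle documentation:

    # /etc/neutron/neutron.conf on the top Neutron API server
    # (illustrative snippet; verify the plugin path in the Tricircle docs)
    [DEFAULT]
    core_plugin = tricircle.network.plugin.TricirclePlugin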

Are you mapping Neutron networks across Neutron installs using L2GW? If so, if a tenant interacts with the individual neutron servers, they will see different values for things like VLANs/VNIs, which may be confusing.

Good question, thank you for your comment. The Tricircle plans to support several cross-OpenStack L2 networking models:

1) local_network: a network spreads only within one bottom OpenStack cloud.
2) shared_vlan: a network with the same VLAN segment can spread into multiple bottom OpenStack clouds.
3) shared_vxlan: a network with the same VxLAN segment can spread into multiple bottom OpenStack clouds.
4) mixed_vlan_vxlan: an L2 network that can spread into multiple OpenStack clouds and consists of different physical network types and different segment IDs.

Usually the tenant cannot see the network type and segment ID. If the tenant is allowed to see them, then in cases 1), 2) and 3) the tenant sees the same network segment ID; in case 4), it is like hierarchical port binding in Neutron: the network has one master VxLAN network with several dynamically bound VLAN or VxLAN networks in different bottom Neutron instances. Just as in current Neutron, if the network is a multi-provider network and the tenant is allowed to see the provider network information, the tenant sees that the network consists of different network types and segment IDs, especially in the hierarchical port binding implementation; the Tricircle handles this similarly (see the example below). The spec for cross-OpenStack L2 networking is here: https://github.com/openstack/tricircle/blob/master/specs/cross-pod-l2-networking.rst The original discussion is at [21] https://etherpad.openstack.org/p/TricircleCrossPodL2Networking
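
For illustration, here is what a mixed_vlan_vxlan network might look like to an admin who is allowed to see provider details, expressed with Neutron's existing multi-provider "segments" format; all values below are made up:

    # Hypothetical network body in Neutron's multi-provider "segments"
    # format: one master VxLAN segment plus a dynamically bound VLAN
    # segment in a bottom Neutron. All values are made up.
    network = {
        'id': 'net-uuid',
        'name': 'cross-pod-net',
        'segments': [
            {'provider:network_type': 'vxlan',
             'provider:physical_network': None,
             'provider:segmentation_id': 1001},
            {'provider:network_type': 'vlan',
             'provider:physical_network': 'bridge',
             'provider:segmentation_id': 2001},
        ],
    }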

I'm curious how Tricircle does the scheduling. Does it choose an OpenStack instance at random, or does it actually inspect the resources available in each instance and schedule based on that? I worry that this will eventually reimplement Nova's scheduler, or even be different enough that a request would be scheduled to an OpenStack instance that cannot satisfy the request. For instance, imagine that instance A has compute nodes with capability X, and instance B does not. How does tricircle ensure that a request for a VM with capability X gets to instance A, without reimplementing Nova's scheduler? Also, are the AZs discussed here the group of AZs defined in Nova, or does tricircle have its own idea of what an AZ is?

Thank you for your comment. In fact, scheduling in the Tricircle is mostly for capacity expansion within one AZ (AZ is short for "availability zone") when one AZ includes more than one bottom OpenStack instance. In a production cloud, at first only one OpenStack instance is put in one AZ, and the preference is to add more compute nodes to that instance; only when almost all compute nodes are occupied by VMs and no more compute nodes can be added to that OpenStack instance do we have to add one more OpenStack instance to the AZ. Because one more OpenStack instance has been added, we need to forward all new VM/volume provisioning requests to the new instance, so we have to build a binding between tenant ID and OpenStack instance; this binding can be created dynamically by the Tricircle, or by the admin through the admin API for maintenance purposes. Moreover, the OpenStack instances inside one AZ can be classified into different categories: for example, servers in one OpenStack instance may be for general purposes only, while another may be built for heavy-load CAD modeling with GPUs. So the OpenStack instances in one AZ can be divided into groups for different purposes, with different VM cost and performance. A resource_affinity_tag will be used to organize the OpenStack groups by category, and the binding relationship will be built based on the same tag in the flavor/volume extra specs.

All the binding is for request-forwarding purposes, and the most common binding policy is based on the OpenStack instance's establishment time. This is quite reasonable for capacity expansion: if one OpenStack instance's resources are exhausted, a new one is added, and all new requests should be forwarded to the new one (a sketch of this lookup is given below). Scheduling in Nova, by contrast, is based on many factors and tries to balance load among compute nodes. Load balancing among OpenStack instances is not the purpose of the scheduling (binding relationship) in the Tricircle, although it could do that, and the Tricircle never deals with scheduling at the compute-node level, which is what Nova does. You can refer to the "dynamic pod binding" spec: https://github.com/openstack/tricircle/blob/master/specs/dynamic-pod-binding.rst The AZ is the same parameter used in the Nova/Cinder APIs, so it is the same concept, but in the Tricircle one AZ may include more than one bottom OpenStack instance.
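
A minimal sketch of that binding policy, with hypothetical data structures (the real design is in the dynamic-pod-binding spec linked above):

    # Sketch of tenant-to-pod binding: reuse an existing binding if one
    # exists, otherwise bind the tenant to the most recently established
    # pod in the AZ. Data structures are hypothetical.
    def get_pod_for_request(bindings, pods, tenant_id, az):
        # bindings: {(tenant_id, az): pod}
        # pods: list of dicts with 'az' and 'established_at' keys
        key = (tenant_id, az)
        if key in bindings:
            return bindings[key]
        candidates = [p for p in pods if p['az'] == az]
        # Capacity expansion: new tenants/requests go to the newest pod.
        pod = max(candidates, key=lambda p: p['established_at'])
        bindings[key] = pod
        return pod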

I agree with the concerns that this may end in less interoperability. The big red flag to me is that tricircle doesn't do microversions yet. These have been around for multiple cycles, and omitting them shows me just how far behind the API proxy implementations may get, and how wide the delta might be between a cloud running nova-api and a cloud running tricircle's API.

Thank you for your comment, and thank you for pointing out the gap in micro-version support in the Tricircle. The Tricircle will raise the priority of micro-version support in the Nova API-GW and Cinder API-GW. On the other hand, the Tricircle is just applying to be a big-tent project; that means the Tricircle service is just one optional service for deploying an OpenStack cloud. If cloud operators think the Tricircle is not mature, especially in micro-version support, they don't have to deploy it. Nova/Cinder/Neutron are fundamental OpenStack services and support micro-versions. When new features are added to these fundamental services and the micro-version changes, there are so many other core services and big-tent projects that I don't think all of them can immediately utilize and expose the new features in the same cycle; if they had to, then any feature in Nova/Cinder/Neutron would need to be coordinated with all other related projects. So it is reasonable that some new features (which lead to micro-version changes) are supported later in other core services and big-tent projects, including the Tricircle. One example: Glance implemented its v2 API long ago, and Keystone implemented its v3 API long ago, yet other services, even core projects like Nova/Cinder/Neutron, took several development cycles to adopt the new versions. So it is reasonable that the Tricircle may support features that lead to micro-version changes in Nova/Cinder later than Nova/Cinder themselves; there is no need to support them synchronously. The Tricircle will provide release notes stating which micro-versions are supported (which may lag Nova/Cinder), so that cloud operators can decide whether to use the Tricircle.

"I'll be honest, it's the proxies that worry me, it reimplements APIs in a not very interop friendly way." "I think the proxying part is the one that really worries me. It's good that they are using tempest. " "which makes it less worrisome as it actually tests the API. However, I've the feeling that won't be enough And we'll make clouds not interoperable"

As answered for the first question above: for VM/volume-related APIs, the Tricircle just forwards the request, so nothing is re-implemented in the Cinder/Nova APIGW; only APIs that manage common attributes such as volume types, flavors and quotas, which are just objects in the database, are re-implemented. The maintenance burden for changes in the bottom APIs is quite small, because the reused Nova/Cinder/Neutron tempest test cases in the check/gate jobs (https://review.openstack.org/#/c/339332/) will fail whenever a bottom-API change affects the Tricircle implementation, so the contributor can correct the Tricircle in time.

Tempest test cases will be enabled step by step in this folder: https://github.com/openstack/tricircle/tree/master/tricircle/tempestplugin. The check and gate tests, which reuse the Nova, Cinder and Neutron tempest test cases, have just been added to the Tricircle in this patch (https://review.openstack.org/#/c/339332/). Because the job was merged only last week, currently only the volume list/get test cases are enabled to test the Tricircle (https://github.com/openstack/tricircle/blob/master/tricircle/tempestplugin/tempest_volume.sh):

    ostestr --regex '(tempest.api.volume.test_volumes_list|tempest.api.volume.test_volumes_get)'

Server-related and other test cases will be added to the job step by step.

Cloud interoperability is based on the APIs. The tempest test cases test API consistency across different clouds, so they help guarantee interoperability.

"I worry we will end up with two ways of scaling openstack that are not API compatible"

First, the bottom OpenStack instances can still use Cells v2 in Nova. Second, the Tricircle deals with cross-Neutron networking. Third, some use cases (for example, use cases 2, 3 and 4) require multiple OpenStack instances, not just scaling a single OpenStack; the deployment decision has already been made for multiple OpenStack instances. Especially in use case 2, the financial sector is serious about security.

We've previously declared that DefCore should not test via proxies, because it removes control of the API definition from the team implementing the API. http://governance.openstack.org/resolutions/20160504-defcore-proxy-tests.html Regardless of whether the intent is to be absolutely compatible or not, the practice of using a "smart" proxy introduces the chance that some incompatibility will be there, and so a cloud with Tricircle and a cloud without Tricircle will behave differently. So while the team itself seems to be doing things a good way, I'm afraid adding this project will break our previous proxy rule.

We'll use the tempest test cases to test the Tricircle for compatibility, and the DefCore tests if needed; DefCore tests can be added.

The point is that we have already very clearly said that DefCore should not test projects through a proxy like this, so whether or not the tests pass isn't the point.

Has the Tricircle team been able to influence other teams, to make things easier for you?

Currently there is interaction with the L2GW project. There are no feature requirements on Nova yet, so no new requirements for Nova. Also, Tacker has talked to the Tricircle team about multi-site support.

you said that api compat is important, but then you're hiding region names from your api proxy - can you explain the difference?

The region name is not used in any API to Nova/Cinder/Neutron yet :). But it is used in the Keystone connection, and it is pretty important data for an end user to understand where their resources are. In Keystone, all regions and sub-regions can be shown by the admin; Keystone supports a region/sub-region model, so you will see a region/sub-region tree. Which I think is why I was asking about the region name thing - it seems like a difference in conceptual model which could lead to a competing view of how we should think of resources.
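
For reference, Keystone's region/sub-region tree can be built with the standard OpenStack client; the region names below are made up:

    openstack region create RegionOne
    openstack region create --parent-region RegionOne RegionOne-Pod1
    openstack region create --parent-region RegionOne RegionOne-Pod2
    # "openstack region list" then shows each region's parent region.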

Still on the proxy argument: I wonder how Tricircle is planning to keep up with projects adding new APIs. It takes a bit longer for DefCore to add an API, but projects can add new APIs every cycle. In addition to this, I'd like to remind people that Glance is still paying the price of Nova's image proxy. If someone uses Tricircle internally, I think it's less of a problem; Tricircle as a public service is probably what worries me the most. I'm worried about the proxy, the technical impact that has on the projects, and the duplication of efforts.

First, this is a big-tent application, and I answered the concerns about one big-tent project; am I wrong? Second, the reference material [1] lists the use cases where we have to use the Tricircle. As for adding new APIs: the Tricircle, as a big-tent project (if accepted), will not block Nova/Cinder/Neutron from adding new APIs; if these projects add a new API, the Tricircle will implement it later (not re-implementing all the code).

(The following answer was added after the IRC meeting.) This is the same answer on micro-version support as given above: the Tricircle will raise the priority of micro-version support in the Nova API-GW and Cinder API-GW; as one optional big-tent service, it may support micro-versions later than Nova/Cinder rather than in sync with them, just as other core and big-tent projects adopted Glance v2 and Keystone v3 over several cycles; and its release notes will state which micro-versions are supported so that cloud operators can decide whether to deploy it.

Note that Nova adds lots of API microversions in every cycle (http://docs.openstack.org/developer/nova/api_microversion_history.html), so I am worried this hurts the OpenStack mission.

Those who use Nova/Cinder/Neutron directly can have the latest APIs; the Tricircle will introduce the features and micro-versions later.

Now if it's a different API that does a specific thing, then that's not so bad either; it's more like competing with Heat orchestration. It feels like Heat should orchestrate setting up the security groups and L2 gateways between regions.

As answered above: Heat doesn't provide the Nova, Cinder, Neutron APIs to the end user; instead, Heat provides its own APIs. But the end user or software still wants to use the CLIs, APIs or SDKs of Nova, Cinder and Neutron; especially in public clouds, some PaaS platforms talk to these APIs directly. And we need to consider the use cases in the material [1].

Firstly I think the concept would be a great addition to OpenStack. My question is about the DB. You mentioned in the commit that Tricircle uses its own DB - can you elaborate a bit more on that? What DB is used? Are you re-using existing DB technologies already present in other OpenStack projects?

I think he means a separate database, not a special one; the database has several tables for the Tricircle. Right, it appears to use SQLAlchemy: https://github.com/openstack/tricircle/blob/master/tricircle/db/models.py (see the illustrative sketch below).
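
As an illustration of what such a table looks like, here is a minimal SQLAlchemy sketch in the style of models.py; the table and column names are hypothetical, so consult the linked file for the real schema:

    # Illustrative only: a SQLAlchemy model sketch. Table and column
    # names are hypothetical; see tricircle/db/models.py for the real
    # schema.
    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class PodBinding(Base):
        """Binds a tenant to a bottom OpenStack instance (pod)."""
        __tablename__ = 'pod_binding'
        id = sa.Column(sa.String(36), primary_key=True)
        tenant_id = sa.Column(sa.String(36), nullable=False, index=True)
        pod_id = sa.Column(sa.String(36), nullable=False)
        created_at = sa.Column(sa.DateTime)

    # Any SQLAlchemy backend works; OpenStack services typically use
    # MySQL through oslo.db. An in-memory SQLite engine for testing:
    engine = sa.create_engine('sqlite://')
    Base.metadata.create_all(engine)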