
Tricircle


Tricircle provides an OpenStack API gateway and networking automation that allow multiple OpenStack instances, whether deployed in one site, across multiple sites, or in a hybrid cloud, to be managed as a single OpenStack cloud.

Use Cases

Massive distributed edge cloud

Building massive distributed edge clouds in edge data centers, with computing and storage close to end users, is an emerging trend driven by enterprise applications, NFV services and personal services.

Enterprise Application

Some enterprises have found that bandwidth- and latency-sensitive applications, such as video editing, 3D modeling and IoT services, run poorly in a remote centralized cloud.

The high bandwidth and low latency provided by the edge cloud are critical for enterprise-level applications like video editing, 3D modeling and IoT services.

In an enterprise, employees typically work in different branches and access the nearby edge cloud. Collaboration among employees from different branches leads to requirements for cross-edge-cloud functionality, such as tenant-level networking, data distribution and migration.

NFV and Edge Cloud Service

NFV (network function virtualization) provides more flexible and better-customized networking capability, for example dynamic, customized network bandwidth management, and also helps to move computing and storage close to end users. With the shortest path from end users to storage and computing, uplink speed can be higher and bandwidth consumption can be terminated as early as possible. This brings a better user experience and changes the way content is generated and stored: in real time, with all data in the cloud.

For example, a user or enterprise can temporarily request high bandwidth and storage to stream HD video/AR/VR data into the cloud, then, after streaming finishes, request more computing resources for post-processing and redistribute the video to other sites. When a user wants to move or redistribute applications and data from one edge cloud to another, they should be able to dynamically request cross-edge-cloud bandwidth managed by NFV.

For VNFs (telecom virtualized applications), a VNF with a distributed design will be placed in multiple edge data centers for higher reliability and availability. Supporting this typically requires state replication between application instances (directly, via replicated database services, or via a privately designed message format), so a tenant-level isolated networking plane across data centers is needed for application state replication.

Personal service

The current Internet is good at serving down-link traffic. All content is stored in remote centralized data centers, and access is accelerated to some extent with CDNs.

As more and more users generate content that is uploaded or streamed to the cloud and to web sites, this content and data still has to travel to centralized data centers; the path is long and the bandwidth is limited and slow. For example, it is very slow to upload or stream HD/2K/4K video for every user concurrently. Pictures and videos have to be uploaded slowly and with quality loss, so using the cloud as the primary storage for user data is not yet an option; currently it is mainly used for backup and for data that is not time- or latency-sensitive. Video captured and stored with quality loss may even be unusable as crime evidence or for other purposes. The last mile of network access (fixed or mobile) is wide enough; the main hindrance is that bandwidth in the MAN (Metropolitan Area Network), backbone and WAN is limited and expensive. Real-time video/data uploading/streaming from the end user or terminal to the local edge cloud is therefore a very attractive cloud service.

From a family or personal point of view, moving or distributing applications and storage from one edge data center to another is also needed. For example, video taken while travelling in Hawaii is stored and processed locally, but after processing the user may want it moved to Shenzhen, China upon returning home. In Shenzhen, the user may want to share the video via a streaming service not only locally but also with friends in Shanghai and Beijing, so the data and the streaming service should be available in Shenzhen, Shanghai and Beijing as well. Dynamic bandwidth increase and application/data movement and replication can be supported by the NFV edge cloud.

Requirements

The emerging massive distributed edge clouds will not simply be a set of independent clouds; they also introduce new requirements:

  • Tenant level L2/L3 networking across data centers for isolation of the tenant's E-W traffic
  • Tenant level Volume/VM/object storage backup/migration/distribution
  • Distributed image management
  • Distributed quota management
  • ...

Large scale cloud

Compared with Amazon, the scalability of OpenStack is still not good enough. One Amazon AZ can support more than 50,000 servers (http://www.slideshare.net/AmazonWebServices/spot301-aws-innovation-at-scale-aws-reinvent-2014).

Cells is a good enhancement, but it has shortcomings: 1) only Nova supports cells; 2) using RPC for inter-datacenter communication makes inter-DC troubleshooting and maintenance difficult, there is no CLI or other tool to manage a child cell directly, and if the link between the API cell and a child cell is broken, the child cell becomes unmanageable; 3) upgrades have to deal with DB and RPC changes; 4) multi-vendor integration across different cells is difficult.

Experience from production large-scale public clouds shows that a large-scale cloud can only be built by expanding capacity step by step (intra-AZ and inter-AZ). The challenge in capacity expansion is how to do the sizing:

  • Number of Nova-API Server...
  • Number of Cinder-API Server..
  • Number of Neutron-API Server…
  • Number of Scheduler..
  • Number of Conductor…
  • Specification of physical switch…
  • Size of storage for Image..
  • Size of management plane bandwidth…
  • Size of data plane bandwidth…
  • Reservation of rack space …
  • Reservation of networking slots…
  • ….

Whenever you add new machines to the cloud, you have to estimate, calculate, monitor, simulate, test and perform online grey expansion of the controller nodes and network nodes. The difficulty is that you cannot test and verify at every size.

The feasible way to expand a large-scale cloud is to add already-tested building blocks. That means we would prefer to build a large-scale public cloud by adding tested OpenStack instances (each including controller and compute nodes) one by one, rather than unconditionally enlarging the capacity of a single OpenStack instance. This keeps cloud construction under control.

When a large-scale cloud is built by adding tested OpenStack instances one by one, a tenant's VMs may still need to join the same network even after a new building block is added, and networks may need to be added to the same router even though the tenant's networks are located in different OpenStack building blocks. From the end-user and PaaS point of view, users still want to use the OpenStack API with already-developed CLIs, SDKs, portals, PaaS, Heat, Magnum, Murano, etc. This way of building a large-scale public cloud therefore brings new requirements to an OpenStack-based cloud, quite similar to those in massive distributed edge clouds:

  • Tenant level L2/L3 networking across OpenStack instances for isolation of the tenant's E-W traffic
  • Distributed quota management
  • Global resource view of the tenant
  • Tenant level Volume/VM migration/backup
  • Multi-DC image import/clone/export
  • ...

OpenStack API enabled hybrid cloud

Refer to https://wiki.openstack.org/wiki/Jacket


The detailed use cases can be found in this presentation: https://docs.google.com/presentation/d/1UQWeAMIJgJsWw-cyz9R7NvcAuSWUnKvaZFXLfRAQ6fI/edit?usp=sharing

These use cases also address the demands of several working groups: Telco WG documents, Large Deployment Team Use Cases, and OPNFV Multisite Use Cases.

Overview

The Tricircle provides an OpenStack API gateway and networking automation that allow multiple OpenStack instances, whether deployed in one site, across multiple sites, or in a hybrid cloud, to be managed as a single OpenStack cloud.

The Tricircle and the managed OpenStack instances use a shared Keystone (deployed centrally or distributed) or federated Keystones for identity management. The Tricircle presents one big region to the end user in Keystone. Each OpenStack instance, called a pod, is a sub-region of the Tricircle in Keystone and is usually not visible to the end user directly.

Acting as an OpenStack API gateway, the Tricircle handles OpenStack API calls, schedules a proper OpenStack instance if needed while handling a call, forwards the call to the appropriate OpenStack instance, and automatically deals with tenant-level L2/L3 networking across OpenStack instances, so that the tenant's VMs can communicate with each other via L2 or L3 no matter which bottom OpenStack instance they are located in.

The end user can see availability zones (AZs) and use an AZ to provision VMs, volumes and even networks through the Tricircle. One AZ can include many OpenStack instances; the Tricircle schedules and binds an OpenStack instance for the tenant inside the AZ. A tenant's resources can be bound automatically to multiple specific bottom OpenStack instances in one or multiple AZs.
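
Below is a minimal sketch of the end-user view: provisioning a VM through the Tricircle with nothing but the standard Nova client and an availability zone. The Keystone URL, region name, AZ name and credentials are placeholders and assumptions, not values defined by the Tricircle.

  from keystoneauth1 import loading, session
  from novaclient import client

  # Authenticate against the shared Keystone; the Tricircle's Nova API-GW is
  # registered as the Nova endpoint of the single big region (all names below
  # are illustrative only).
  loader = loading.get_plugin_loader('password')
  auth = loader.load_from_options(auth_url='http://keystone.example.com:5000/v3',
                                  username='demo', password='secret',
                                  project_name='demo',
                                  user_domain_id='default',
                                  project_domain_id='default')
  nova = client.Client('2.1', session=session.Session(auth=auth),
                       region_name='CentralRegion')

  # The only hint the user gives is the availability zone; the Tricircle
  # selects (or reuses) a bound pod inside that AZ and forwards the request.
  server = nova.servers.create(name='demo-vm',
                               image='<image-id>',
                               flavor='<flavor-id>',
                               availability_zone='az1')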

The Tricircle is the formal open source project for the OpenStack cascading solution ( https://wiki.openstack.org/wiki/OpenStack_cascading_solution ), but with an enhanced and decoupled design.

The Tricircle could be extended to support more powerful capabilities, for example virtually splitting a Tricircle instance into multiple micro instances, which would give users finer granularity in tenancy and service. The Tricircle also enables an OpenStack-based hybrid cloud.

Architecture

The cascading solution, based on the PoC design with enhancements, is running in several production clouds such as Huawei Enterprise Cloud in China, which gives confidence in the value of the cascading model. Here the focus is on how to design and develop a solid solution in open source.

The initial architecture in the PoC was stateful (see https://wiki.openstack.org/wiki/OpenStack_cascading_solution). The major headaches identified in the PoC were status synchronization for VMs, volumes, etc., UUID mapping, and coupling with existing OpenStack services like Nova and Cinder.

Now the Tricircle is being developed with a stateless design to remove these challenges, fully decoupled from OpenStack services. An improved design document is being developed at https://docs.google.com/document/d/18kZZ1snMOCD9IQvUKI5NVDzSASpw-QKj7l2zNqMEd3g/edit?usp=sharing.

Stateless Architecture

Tricircle improved architecture design - stateless


  • Nova API-GW
  1. A standalone web service that receives all Nova API requests and routes each request to the appropriate bottom OpenStack instance according to availability zone (during creation) or resource id (during operation and query). If there is more than one pod in an availability zone, it schedules and forwards the request to a proper pod and records the tenant-to-pod binding relationship (see the sketch after this list).
  2. The Nova API-GW triggers automatic networking when new VMs are being provisioned.
  3. Works as a stateless service and can run with processes distributed across multiple hosts.
  • Cinder API-GW
  1. A standalone web service that receives all Cinder API requests and routes each request to the appropriate bottom OpenStack instance according to availability zone (during creation) or resource id (during operation and query). If there is more than one pod in an availability zone, it schedules and forwards the request to a proper pod and records the tenant-to-pod binding relationship.
  2. The Cinder API-GW and Nova API-GW make sure the volumes for the same VM are co-located in the same OpenStack instance.
  3. Works as a stateless service and can run with processes distributed across multiple hosts.
  • Neutron API Server
  1. The Neutron API server is reused from Neutron to receive and handle Neutron API requests.
  2. Neutron Tricircle plugin: it runs under the Neutron API server in the same process, like the OVN Neutron plugin. The Tricircle plugin serves tenant-level L2/L3 networking automation across multiple OpenStack instances. It uses a driver interface to call the L2GW API, especially for cross-OpenStack mixed VLAN/VxLAN L2 networking.
  • Admin API
  1. Manages sites (bottom OpenStack instances) and availability zone mappings.
  2. Retrieves object UUID routing.
  3. Exposes an API for maintenance.
  • XJob
  1. Receives and processes cross-OpenStack functionality and other async jobs from the Nova API-GW, Cinder API-GW, Admin API or Neutron Tricircle plugin. For example, when a VM is booted for the first time for a project, the router, security group rules, FIP and other resources may not yet have been created in the bottom OpenStack instance, even though they are required. Unlike the network, security group and SSH keypair, which must be created before the VM boots, these resources can be created asynchronously to accelerate the response to the first VM boot request.
  2. Cross-OpenStack networking is also done in async jobs.
  3. Any of the Admin API, Nova API-GW, Cinder API-GW or Neutron Tricircle plugin can send an async job to XJob through the message bus using the RPC API provided by XJob.
  • Database
  1. The Tricircle has its own database to store pods, pod bindings, jobs and resource routing tables.
  2. The Neutron Tricircle plugin reuses the Neutron DB, since one tenant's networks and routers are spread across multiple OpenStack instances and tenant-level IP/MAC addresses must be managed to avoid conflicts.

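The following is a simplified, illustrative sketch of the pod scheduling/binding and asynchronous-job ideas described above. It is not the actual Tricircle code; all class, function and field names are hypothetical, and a real pod would be reached over HTTP rather than held as an in-memory object.

  from collections import namedtuple

  # Hypothetical stand-in; the real Tricircle stores pods and bindings in its DB.
  Pod = namedtuple('Pod', ['name', 'endpoint'])


  class PodBindingStore:
      """In-memory stand-in for the Tricircle pod / pod-binding tables."""

      def __init__(self, pods_by_az):
          self.pods_by_az = pods_by_az      # AZ name -> list of Pods
          self.bindings = {}                # (tenant id, AZ name) -> Pod

      def get_or_bind_pod(self, tenant_id, az_name):
          key = (tenant_id, az_name)
          if key not in self.bindings:
              # Schedule one pod inside the AZ (trivially the first one here)
              # and remember the binding so later requests from this tenant
              # land in the same pod.
              self.bindings[key] = self.pods_by_az[az_name][0]
          return self.bindings[key]


  def handle_boot_request(store, xjob_queue, tenant_id, az_name, body):
      """Sketch of what the Nova API-GW does for a 'create server' call."""
      pod = store.get_or_bind_pod(tenant_id, az_name)

      # 1. Forward the original Nova API request (body) to the chosen bottom
      #    OpenStack instance (real HTTP forwarding to pod.endpoint omitted).
      # 2. Queue the cross-pod networking work (router, security group rules,
      #    FIP, ...) as an async job for XJob; the list stands in for the
      #    RPC message bus.
      xjob_queue.append({'job': 'setup_cross_pod_networking',
                         'tenant': tenant_id,
                         'pod': pod.name})
      return pod


  # Example: two pods in one AZ; the tenant gets bound to the first pod.
  store = PodBindingStore({'az1': [Pod('Pod1', 'http://pod1:8774'),
                                   Pod('Pod2', 'http://pod2:8774')]})
  jobs = []
  chosen = handle_boot_request(store, jobs, 'tenant-a', 'az1', {'name': 'vm1'})
  print(chosen.name, jobs)
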
FAQ

Q: What is the difference between Tricircle and OpenStack Cascading?

OpenStack Cascading was mainly an implementation method used in a PoC done in late 2014 and early 2015, which aimed to test the idea that multiple OpenStack instances COULD be deployed across multiple geo-diverse sites. After the PoC was carried out successfully, the team planned to contribute the core idea to the community.

The Tricircle project was born out of that idea, but took a different shape and focus. Unlike a typical PoC, which is full of twists and plumbing for feature enhancements, Tricircle in its earliest stage tries to build a clean architecture that is extensible, pluggable and reusable in nature.

In short, OpenStack Cascading is a specific deployment solution used for production purposes, while Tricircle represents one type of service, like Neutron or Murano, that in the future could be applied to the OpenStack ecosystem.

Q: What is the goal of Tricircle?

In the short term, Tricircle focuses on developing a robust architecture and related features; in the long run, we hope to establish a paradigm that can be applied to the whole OpenStack community.

Q: How can I set up Tricircle by hand?

Yes, some volunteers have successfully set up the Tricircle in 3 VMs with VirtualBox on Ubuntu 14.04 LTS. The blog can be found in this

Or refer to the README.md at https://github.com/openstack/tricircle for a single-node setup using DevStack.
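
If the repository ships a DevStack plugin (the README describes the DevStack-based setup), a single-node setup would roughly amount to a local.conf entry like the sketch below; treat the plugin name and branch as assumptions and follow the README for the authoritative steps.

  [[local|localrc]]
  # Assumption: the Tricircle repository provides a DevStack plugin that can be
  # enabled like any other plugin; the README is the authoritative reference.
  enable_plugin tricircle https://github.com/openstack/tricircle master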

How to read source code

To read the source code, it's much easier if you follow this blueprint:

Implement Stateless Architecture: https://blueprints.launchpad.net/tricircle/+spec/implement-stateless

This blueprint describes building Tricircle from scratch.

Resources

Tricircle is designed to use the same tools for submission and review as other OpenStack projects. As such we follow the OpenStack development workflow. New contributors should follow the getting started steps before proceeding, as a Launchpad ID and signed contributor license are required to add new entries.

Meeting minutes and logs

All meeting logs and minutes can be found at:
2016: http://eavesdrop.openstack.org/meetings/tricircle/2016/
2015: http://eavesdrop.openstack.org/meetings/tricircle/2015/

To do list

To do list is in the etherpad: https://etherpad.openstack.org/p/TricircleToDo

Team Member

Contact team members in IRC channel: #openstack-tricircle

Current active participants

Joe Huang, Huawei

Khayam Gondal, Dell

Shinobu Kinjo, RedHat

Ge Li, China UnionPay

Vega Cai, Huawei

Pengfei Shi, OMNI Lab

Bean Zhang, OMNI Lab

Yipei Niu, Huazhong University of Science and Technology

Ronghui Chao, Hunan University

Xiongqiu Long, Hunan University

Zhuo Tang, Hunan University

Howard Huang, Huawei