Tricircle before splitting

The Tricircle provides an OpenStack API gateway and networking automation to allow multiple OpenStack instances, spanning one site, multiple sites, or a hybrid cloud, to be managed as a single OpenStack cloud.

Use Cases

Massive distributed edge cloud

Massive distributed edge clouds, built in edge data centers with computing and storage close to end users, are emerging for enterprise applications, NFV services and personal services.

Enterprise Application

Some enterprises have also found issues with applications running in a remote centralized cloud, for example video editing, 3D modeling and IoT services, which are sensitive to bandwidth and latency.

The high bandwidth and low latency provided by the edge cloud are critical for enterprise level applications like video editing, 3D modeling and IoT services.

In an enterprise, most employees work in different branches and access the nearby edge cloud. Collaboration among employees from different branches leads to requirements on cross edge cloud functionality, such as tenant level networking, data distribution and migration.

NFV and Edge Cloud Service

NFV (network function virtualization) will provide more flexible and better customized networking capability, for example dynamic customized network bandwidth management, and will also help to move computing and storage close to end users. With the shortest path from end users to storage and computing, the uplink speed can be higher and bandwidth consumption is terminated as early as possible. This brings a better user experience and changes the way content is generated and stored: in real time, with all data in the cloud.

For example, a user or enterprise can dynamically ask for high bandwidth and storage to stream HD video/AR/VR data into the cloud temporarily and, after the streaming finishes, ask for more computing resources to do the post processing and re-distribute the video to other sites. When a user wants to move or re-distribute an application and its data from one edge cloud to another, they should be able to dynamically ask for cross edge cloud bandwidth managed by NFV.

For VNFs (telecom virtualized applications), a VNF with a distributed design will be placed in multiple edge data centers for higher reliability/availability. Supporting this typically requires state replication between application instances (directly, via replicated database services, or via a privately designed message format), so a tenant level isolated networking plane across data centers is needed for application state replication.

Personal service

The current Internet is good at handling downlink services. All content is stored in remote centralized data centers, and to some extent access is accelerated with CDNs.

As more and more users generate content that is uploaded or streamed to the cloud and to web sites, this content and data still has to travel to some centralized data center; the path is long and the bandwidth is limited and slow. For example, it is very slow to upload or stream HD/2k/4k video for every user concurrently. Pictures and videos have to be uploaded slowly and with quality loss. Using the cloud as the first storage for user data is not yet an option; currently it is mainly used for backup and for data that is not time or latency sensitive. Video captured and stored with quality loss can even be difficult to use as crime evidence or for other purposes. The last mile of network access (fixed or mobile) is wide enough; the main hindrance is that bandwidth in the MAN (Metropolitan Area Network), backbone and WAN is limited and expensive. Real time video/data uploading and streaming from the end user or terminal to the local edge cloud is therefore quite an attractive cloud service.

From a family or personal point of view, the movement or distribution of apps and storage from one edge data center to another is also needed. For example, all video will be stored and processed locally in Hawaii while I am taking video on a trip, but I want the processed video to be moved to Shenzhen, China when I come back to China. In Shenzhen, I want to share the video through a streaming service not only in Shenzhen but also with friends in Shanghai and Beijing, so the data and the streaming service can be built in Shenzhen, Shanghai and Beijing too. The dynamic bandwidth increase and the app/data movement and replication can be supported by the NFV edge cloud.

Requirements

The emerging massive distributed edge clouds will not be just a set of independent clouds; they also bring some new requirements:

Tenant level L2/L3 networking across data centers to isolate the tenant's E-W traffic
Tenant level Volume/VM/object storage backup/migration/distribution
Distributed image management
Distributed quota management
...

Large scale cloud

Compared with Amazon, the scalability of OpenStack is still not good enough. One Amazon AZ can support more than 50,000 servers (http://www.slideshare.net/AmazonWebServices/spot301-aws-innovation-at-scale-aws-reinvent-2014).

Cells is a good enhancement for Nova scalability, but it has shortcomings: 1) only Nova supports cells; 2) using RPC for inter-data center communication makes inter-DC troubleshooting and maintenance difficult, and there is no CLI or other tool to manage a child cell directly. If the link between the API cell and a child cell is broken, the child cell becomes unmanageable.

From the experience of running a production large scale public cloud, a large scale cloud can be built by expanding capacity step by step (intra-AZ and inter-AZ). The challenge in capacity expansion is how to do the sizing:

Number of Nova-API Servers
Number of Cinder-API Servers
Number of Neutron-API Servers
Number of Schedulers
Number of Conductors
Specification of physical switch
Size of storage for Image
Size of management plane bandwidth
Size of data plane bandwidth
Reservation of rack space
Reservation of networking slots
...

You have to estimate, calculate, monitor, simulate, test and do online grey expansion for controller nodes and network nodes whenever you add new machines to the cloud. The difficulty is that you can't test and verify at every size.

The feasible way to expand a large scale cloud is to add already tested building blocks. That means we prefer to build a large scale public cloud by adding tested OpenStack instances (including controller and compute nodes) one by one, rather than unconditionally enlarging the capacity of one OpenStack instance. This approach keeps the cloud construction under control.

When a large scale cloud is built by adding tested OpenStack instances one by one, a tenant's VMs may still need to be added to the same network after a new OpenStack building block is added, or networks may need to be attached to the same router even if the tenant's networks are located in different OpenStack building blocks. From the end user and PaaS point of view, they still want to use the OpenStack API for the already developed CLI, SDK, Portal, PaaS, Heat, Magnum, Murano etc. This way of building a large scale public cloud also brings some new requirements to an OpenStack based cloud, which are quite similar to those of massive distributed edge clouds:

Tenant level L2/L3 networking across OpenStack instances to isolate the tenant's E-W traffic
Distributed quota management
Global resource view of the tenant
Tenant level Volume/VM migration/backup
Multi-DC image import/clone/export
...

OpenStack API enabled hybrid cloud

Refer to https://wiki.openstack.org/wiki/Jacket

The detailed use cases can be found in this presentation: https://docs.google.com/presentation/d/1UQWeAMIJgJsWw-cyz9R7NvcAuSWUnKvaZFXLfRAQ6fI/edit?usp=sharing

More technical use cases can be found in the communication material used in the Tricircle big-tent project application: https://docs.google.com/presentation/d/1Zkoi4vMOGN713Vv_YO0GP6YLyjLpQ7fRbHlirpq6ZK4/edit?usp=sharing

The Tricircle can also meet the demands of several working groups: Telco WG documents, Large Deployment Team Use Cases, and OPNFV Multisite Use Cases.

Overview

The Tricircle provides an OpenStack API gateway and networking automation to allow multiple OpenStack instances, spanning one site, multiple sites, or a hybrid cloud, to be managed as a single OpenStack cloud.

The Tricircle and the managed OpenStack instances use a shared KeyStone (with centralized or distributed deployment) or federated KeyStones for identity management. The Tricircle presents one big region to the end user in KeyStone. Each OpenStack instance, called a pod, is a sub-region of the Tricircle in KeyStone and is usually not visible to the end user directly.

The Tricircle acts as an OpenStack API gateway: it handles OpenStack API calls, schedules a proper OpenStack instance if needed while handling a call, forwards the API calls to the appropriate OpenStack instance, and deals with tenant level L2/L3 networking across OpenStack instances automatically, so that the tenant's VMs, no matter which bottom OpenStack instance they are located in, can communicate with each other via L2 or L3.

The end user can see availability zones (AZs) and use an AZ to provision VMs, volumes and even networks through the Tricircle. One AZ can include many OpenStack instances; the Tricircle can schedule and bind an OpenStack instance for the tenant inside one AZ. A tenant's resources can be bound automatically to multiple specific bottom OpenStack instances in one or multiple AZs.

The Tricircle is the formal open source project for the OpenStack cascading solution (https://wiki.openstack.org/wiki/OpenStack_cascading_solution), but with an enhanced and decoupled design.

The Tricircle could be extended to support more powerful capabilities, such as virtually splitting a Tricircle instance into multiple micro instances, which could give users finer granularity on tenancy and service. The Tricircle also enables OpenStack based hybrid cloud.

Architecture

The Tricircle is now designed as a standalone service that is decoupled from existing OpenStack services like Nova, Cinder and Neutron. The design blueprint has been developed, with ongoing improvement, in https://docs.google.com/document/d/18kZZ1snMOCD9IQvUKI5NVDzSASpw-QKj7l2zNqMEd3g/edit?usp=sharing

A shared KeyStone (centralized or distributed deployment) or federated KeyStones can be used for identity management for the Tricircle and the managed OpenStack instances. The Tricircle presents one big region to the end user in KeyStone. Each OpenStack instance, called a pod, is a sub-region of the Tricircle in KeyStone and is usually not visible to the end user directly.

Nova API-GW

A standalone web service that receives all Nova API requests and routes each request to the appropriate bottom OpenStack instance according to the Availability Zone (during creation) or the VM's uuid (during operation and query). If there is more than one OpenStack instance in an Availability Zone, it schedules one, forwards the request to that OpenStack instance, and builds the binding relationship between the tenant ID and the OpenStack instance.
Nova API-GW is the component that triggers networking automation when new VMs are being provisioned.
It works as a stateless service and can run with processes distributed across multiple hosts.
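
The routing rule described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the Tricircle code: the class name `ApiGatewayRouter`, its methods, and the "fewest bound tenants" scheduling policy are all assumptions made for the example, and a real stateless gateway would keep the bindings and routes in the Tricircle database rather than in process memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ApiGatewayRouter:
    """Hypothetical in-memory router; not the actual Tricircle code."""

    # AZ name -> pods (bottom OpenStack instances) in that AZ
    az_pods: Dict[str, List[str]]
    # (tenant_id, az) -> pod bound to the tenant in that AZ
    bindings: Dict[Tuple[str, str], str] = field(default_factory=dict)
    # resource uuid -> pod that owns the resource
    routes: Dict[str, str] = field(default_factory=dict)

    def _schedule(self, az: str) -> str:
        """Assumed policy: pick the pod in the AZ with the fewest bound tenants."""
        pods = self.az_pods.get(az)
        if not pods:
            raise LookupError(f"no pod registered for AZ {az!r}")
        load = {pod: 0 for pod in pods}
        for pod in self.bindings.values():
            if pod in load:
                load[pod] += 1
        return min(pods, key=lambda pod: load[pod])

    def route_create(self, tenant_id: str, az: str, uuid: str) -> str:
        """Creation requests are routed by AZ; the tenant is bound on first use."""
        key = (tenant_id, az)
        if key not in self.bindings:
            self.bindings[key] = self._schedule(az)
        pod = self.bindings[key]
        self.routes[uuid] = pod  # remember where the new resource lives
        return pod

    def route_operation(self, uuid: str) -> str:
        """Operation and query requests are routed by the resource uuid."""
        try:
            return self.routes[uuid]
        except KeyError:
            raise LookupError(f"unknown resource {uuid!r}") from None
```

With two pods in one AZ, the first tenant would be bound to one pod and the second tenant to the other, while later requests for an existing VM are resolved purely from its uuid.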

Cinder API-GW

A standalone web service that receives all Cinder API requests and routes each request to the appropriate bottom OpenStack instance according to the Availability Zone (during creation) or a resource id such as a volume/backup/snapshot uuid (during operation and query). If there is more than one OpenStack instance in an Availability Zone, it schedules one OpenStack instance, forwards the request to that OpenStack instance, and builds the binding relationship between the tenant ID and the OpenStack instance.
Cinder API-GW and Nova API-GW make sure that the volumes for a VM are co-located in the same OpenStack instance as the VM.
It works as a stateless service and can run with processes distributed across multiple hosts.
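
One way the co-location guarantee could follow from the tenant binding is sketched below: if both gateways consult the same (tenant, AZ) to pod binding, a tenant's servers and volumes in an AZ land in the same pod. The function names and the "first pod in the AZ" default are hypothetical, and the source does not state that co-location is implemented this way.

```python
from typing import Dict, List, Tuple

# Hypothetical shared store: (tenant_id, az) -> pod name.
Bindings = Dict[Tuple[str, str], str]


def bound_pod(bindings: Bindings, az_pods: Dict[str, List[str]],
              tenant_id: str, az: str) -> str:
    """Return the tenant's pod in this AZ, binding the AZ's first pod if unbound."""
    pods = az_pods[az]
    if not pods:
        raise LookupError(f"no pod registered for AZ {az!r}")
    return bindings.setdefault((tenant_id, az), pods[0])


def create_server(bindings: Bindings, az_pods: Dict[str, List[str]],
                  tenant_id: str, az: str) -> Tuple[str, str]:
    """Stand-in for the Nova gateway: returns (resource kind, target pod)."""
    return ("server", bound_pod(bindings, az_pods, tenant_id, az))


def create_volume(bindings: Bindings, az_pods: Dict[str, List[str]],
                  tenant_id: str, az: str) -> Tuple[str, str]:
    """Stand-in for the Cinder gateway: consults the same binding store."""
    return ("volume", bound_pod(bindings, az_pods, tenant_id, az))
```

Because both functions go through `bound_pod`, a tenant already bound to a pod gets that pod for volumes and servers alike, whichever gateway saw the tenant first.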

Neutron API Server

The Neutron API Server is reused from Neutron to receive and handle Neutron API requests.
Neutron Tricircle Plugin: it runs under the Neutron API server in the same process, like the OVN Neutron plugin. The Tricircle plugin serves tenant level L2/L3 networking automation across multiple OpenStack instances. It uses a driver interface to call the bottom OpenStack Neutron API, and the L2GW API if needed, especially for cross OpenStack mixed VLAN / VxLAN L2 networking.

Admin API

Manage sites (bottom OpenStack instances) and availability zone mappings.
Retrieve object uuid routing.
Expose API for maintenance.
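
The first two responsibilities (site/AZ mappings and uuid routing lookup) could be backed by two small tables, as in the sketch below. It uses an in-memory SQLite database for illustration only; the table names, column names and helper functions are hypothetical and are not taken from the Tricircle schema.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: one row per pod, one row per routed resource.
SCHEMA = """
CREATE TABLE pods (
    pod_name TEXT PRIMARY KEY,
    az_name  TEXT NOT NULL
);
CREATE TABLE resource_routing (
    resource_id   TEXT PRIMARY KEY,
    resource_type TEXT NOT NULL,
    pod_name      TEXT NOT NULL REFERENCES pods(pod_name)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_pod(conn: sqlite3.Connection, pod_name: str, az_name: str) -> None:
    """Register a bottom OpenStack instance under an availability zone."""
    conn.execute("INSERT INTO pods VALUES (?, ?)", (pod_name, az_name))


def add_route(conn: sqlite3.Connection, resource_id: str,
              resource_type: str, pod_name: str) -> None:
    """Record which pod owns a resource."""
    conn.execute("INSERT INTO resource_routing VALUES (?, ?, ?)",
                 (resource_id, resource_type, pod_name))


def pods_in_az(conn: sqlite3.Connection, az_name: str) -> List[str]:
    """List the pods mapped to an availability zone."""
    rows = conn.execute(
        "SELECT pod_name FROM pods WHERE az_name = ? ORDER BY pod_name",
        (az_name,))
    return [row[0] for row in rows]


def lookup_route(conn: sqlite3.Connection, resource_id: str) -> Optional[str]:
    """Return the owning pod for a resource uuid, or None if it is unknown."""
    row = conn.execute(
        "SELECT pod_name FROM resource_routing WHERE resource_id = ?",
        (resource_id,)).fetchone()
    return row[0] if row else None
```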

XJob

Receive and process cross OpenStack functionality and other async jobs from Nova API-GW, Cinder API-GW, Admin API or the Neutron Tricircle Plugin. For example, when a VM is booted for the first time for a tenant, the router, security group rules, FIP and other resources may not yet have been created in the bottom OpenStack instance, although they are required. Unlike the network, security group and ssh keypair, which must be created before a VM boots, these resources can be created asynchronously to accelerate the response to the first VM boot request.
Cross OpenStack networking is also done in async jobs.
Any of Admin API, Nova API-GW, Cinder API-GW and the Neutron Tricircle plugin can send an async job to XJob through the message bus with the RPC API provided by XJob.
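
The fire-and-forget pattern described for XJob can be illustrated with a small worker. In this sketch a `queue.Queue` and a thread stand in for the message bus and the XJob service; `XJobWorker`, `register`, `cast` and the job type names are hypothetical and do not reflect the real XJob RPC API.

```python
import queue
import threading
from typing import Callable, Dict, List, Tuple

Handler = Callable[[dict], None]


class XJobWorker:
    """Hypothetical stand-in for XJob: a local queue replaces the message bus."""

    def __init__(self) -> None:
        self._jobs: queue.Queue = queue.Queue()
        self._handlers: Dict[str, Handler] = {}
        self.failed: List[Tuple[str, str]] = []
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def register(self, job_type: str, handler: Handler) -> None:
        """Associate a job type with the function that processes it."""
        self._handlers[job_type] = handler

    def cast(self, job_type: str, payload: dict) -> None:
        """Enqueue a job and return immediately, like a one-way RPC call."""
        self._jobs.put((job_type, payload))

    def wait(self) -> None:
        """Block until every queued job has been processed."""
        self._jobs.join()

    def _run(self) -> None:
        while True:
            job_type, payload = self._jobs.get()
            try:
                self._handlers[job_type](payload)
            except Exception as exc:  # keep the worker alive on bad jobs
                self.failed.append((job_type, repr(exc)))
            finally:
                self._jobs.task_done()
```

A gateway handling the first boot request for a tenant would call something like `cast("setup_router", {...})` and answer the user right away, leaving the router and FIP setup to the worker.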

Database

The Tricircle has its own database to store pods, pod bindings, jobs and resource routing tables.
The Neutron Tricircle plugin reuses the Neutron DB, because one tenant's networks and routers will be spread across multiple OpenStack instances, and tenant level IP/MAC addresses must be managed centrally to avoid conflicts across different OpenStack instances.
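
Why a single allocation record avoids conflicts can be shown with a toy allocator for one tenant subnet that spans several pods. This is a sketch under stated assumptions: `TenantSubnetAllocator` is a hypothetical name, the state lives in memory rather than in the Neutron DB, and details such as reserving a gateway address are ignored.

```python
import ipaddress
from typing import Dict


class TenantSubnetAllocator:
    """Hypothetical central allocator for one tenant subnet shared by several pods."""

    def __init__(self, cidr: str) -> None:
        self._network = ipaddress.ip_network(cidr)
        self._allocated: Dict[str, str] = {}  # ip -> pod using it

    def allocate(self, pod: str) -> str:
        """Hand out the lowest free host address, whichever pod asks."""
        for host in self._network.hosts():
            ip = str(host)
            if ip not in self._allocated:
                self._allocated[ip] = pod
                return ip
        raise RuntimeError(f"subnet {self._network} is exhausted")

    def release(self, ip: str) -> None:
        """Return an address to the pool; unknown addresses are ignored."""
        self._allocated.pop(ip, None)

    def owner(self, ip: str) -> str:
        """Return the pod currently using the address."""
        return self._allocated[ip]
```

Because every pod draws from the same record, two VMs of the same tenant in different pods can never receive the same address, which is what makes a stretched L2 network across OpenStack instances workable.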

For Glance deployment, there are several choices:

Shared Glance, if all OpenStack instances are located inside a high bandwidth, low latency site.
Shared Glance with distributed back-end, if OpenStack instances are located in several sites.
Distributed Glance deployment: the Glance service is deployed in a distributed manner across multiple sites, with a distributed back-end.
Separate Glance deployment: each site is installed with a separate Glance instance and back-end; no cross site image sharing is needed.

Value

The motivation to develop the Tricircle open source project:

The cascading solution, based on the PoC design with enhancements, has been running in several production clouds. This showed the value of one OpenStack API gateway layer with networking automation functionality above multiple OpenStack instances, whether in a large scale centralized cloud, for distributed enterprise applications located in distributed edge clouds, or even in a hybrid cloud:

OpenStack API eco-system preserved: from CLI and SDK to Heat, Murano, Magnum etc., all of these can be reused seamlessly.
Support for modularized capacity expansion in large scale cloud.
L2/L3 networking automation across OpenStack instances.
Tenant's VMs communicate with each other via L2 or L3 networking across OpenStack instances.
Security group applied across OpenStack instances.
Tenant level IP/MAC address management to avoid conflicts across OpenStack instances.
Tenant level quota control across OpenStack instances.
Global resource usage view across OpenStack instances.
User level KeyPair management across OpenStack instances.
Tenant's data movement across OpenStack instances thanks to the tenant level L2/L3 networking.
...

Installation and Play

Set up the Tricircle in 3 VMs with VirtualBox on Ubuntu 14.04 LTS; see "Install Tricircle in VirtualBox".
Or refer to the installation guide at https://github.com/openstack/tricircle for a single node or two node setup using devstack.

FAQ

Q: What is the difference between Tricircle and OpenStack Cascading?

OpenStack Cascading was mainly a solution used in a PoC done in late 2014 and early 2015, which aimed to test the idea that multiple OpenStack instances could be deployed across multiple geo-diverse sites and managed by an OpenStack API layer that was itself based on OpenStack services. After the PoC was carried out successfully, the team planned to contribute the core idea to the community.

Tricircle was born out of that idea, but took a different shape and focus. Unlike the V1 of the OpenStack cascading solution implemented in the PoC, which had plenty of twists and plumbing for feature enhancements, Tricircle in its earliest stage tries to build a clean architecture that is extendable, pluggable and reusable in nature.

In short, OpenStack Cascading is a specific deployment solution, while Tricircle is a standalone project with a decoupled group of services, like any other OpenStack project (for example Neutron, Nova or Glance), that in the future could be applied to the OpenStack ecosystem.

Q: What is the goal of Tricircle?

In the short term, Tricircle will focus on developing a robust architecture and related features; in the long run, we hope to establish a paradigm that could be applied to the whole OpenStack community.

How to read the source code

To read the source code, it's much easier if you follow this blueprint:

Implement Stateless Architecture: https://blueprints.launchpad.net/tricircle/+spec/implement-stateless

This blueprint covers building Tricircle from scratch.

Resources

Design documentation: Tricircle Design Blueprint
Wiki: https://wiki.openstack.org/wiki/tricircle
Source: https://github.com/openstack/tricircle
Bugs: http://bugs.launchpad.net/tricircle
Blueprints: https://launchpad.net/tricircle
Review Board: https://review.openstack.org/#/q/project:openstack/tricircle
Weekly meeting IRC channel: #openstack-meeting on irc.freenode.net, every Wednesday from UTC 13:00 to UTC 14:00
Weekly meeting IRC log: https://wiki.openstack.org/wiki/Meetings/Tricircle
Tricircle project IRC channel: #openstack-tricircle on irc.freenode.net
Tricircle project IRC channel log: http://eavesdrop.openstack.org/irclogs/%23openstack-tricircle/
Mail list: openstack-dev@lists.openstack.org, with [openstack-dev][tricircle] in the mail subject
New contributor's guide: http://docs.openstack.org/infra/manual/developers.html
Documentation: http://docs.openstack.org/developer/tricircle

Tricircle big-tent application defense: https://review.openstack.org/#/c/338796 (a lot of comments and discussion that help in learning about Tricircle from many aspects)

Tricircle is designed to use the same tools for submission and review as other OpenStack projects. As such we follow the OpenStack development workflow. New contributors should follow the getting started steps before proceeding, as a Launchpad ID and signed contributor license are required to add new entries.

History

1. Tricircle before splitting (valid before Oct. 2016): https://wiki.openstack.org/wiki/tricircle_before_splitting

Meeting minutes and logs

All meeting logs and minutes can be found at:
2016: http://eavesdrop.openstack.org/meetings/tricircle/2016/
2015: http://eavesdrop.openstack.org/meetings/tricircle/2015/

To do list

The to do list is in the etherpad: https://etherpad.openstack.org/p/TricircleToDo

Splitting Tricircle into two projects: https://etherpad.openstack.org/p/TricircleSplitting

Team Members

Contact team members in IRC channel: #openstack-tricircle

Current active participants

Joe Huang, Huawei

Khayam Gondal, Dell

Shinobu Kinjo, RedHat

Ge Li, China UnionPay

Vega Cai, Huawei

Pengfei Shi, OMNI Lab

Bean Zhang, OMNI Lab

Yipei Niu, Huazhong University of Science and Technology

Ronghui Cao, Hunan University

Xiongqiu Long, Hunan University

Zhuo Tang, Hunan University

Liuzyu, Hunan University

Jiawei He, Hunan University

KunKun Liu, Hunan University

Yangkai Shi, Hunan University

Yuquan Yue, Hunan University

Howard Huang, Huawei