Trio2o

Trio2o provides an API gateway for multiple OpenStack clouds, whether they span one site, multiple sites, or a hybrid cloud, so that together they act as a single OpenStack cloud.

Use Cases
Whether multiple OpenStack instances need a single endpoint is up to the cloud operator. Some operators prefer a single endpoint, while others do not. There are many scenarios in which a cloud contains multiple OpenStack instances.

Large scale cloud
Compared with Amazon, the scalability of OpenStack is still not good enough. One Amazon AZ can support more than 50,000 servers (http://www.slideshare.net/AmazonWebServices/spot301-aws-innovation-at-scale-aws-reinvent-2014).

Cells is a good enhancement for Nova scalability, but it has several shortcomings:
 * 1) Using RPC for inter-data-center communication makes inter-DC troubleshooting and maintenance difficult and causes some critical operational issues. There is no CLI, RESTful API, or other tool to manage a child cell directly. If the link between the API cell and a child cell is broken, the child cell in the remote edge cloud becomes unmanageable, both locally and remotely.
 * 2) Security management for inter-site RPC communication is a challenge. Please refer to the slides[1], challenge 3, "Securing OpenStack over the Internet": over 500 pinholes had to be opened in the firewall to allow this to work, including ports for VNC and SSH for CLIs. Using RPC in cells for edge clouds will face the same security challenges.
 * 3) Only Nova supports cells. Nova is not the only service that needs to support edge clouds; Neutron and Cinder should be taken into account too. How would Neutron support service function chaining in edge clouds? By using RPC? How would the challenges mentioned above be addressed? And what about Cinder?
 * 4) Using RPC for the production integration of hundreds of edge clouds is quite a challenging idea, because a basic requirement is that these edge clouds may be bought from multiple vendors, for hardware, software, or both.

From the experience of running a large-scale public cloud in production, a large-scale cloud can be built by expanding capacity step by step (intra-AZ and inter-AZ). The challenge in capacity expansion is sizing:
 * Number of Nova-API servers
 * Number of Cinder-API servers
 * Number of Neutron-API servers
 * Number of schedulers
 * Number of conductors
 * Specification of physical switches
 * Size of storage for images
 * Size of management plane bandwidth
 * Size of data plane bandwidth
 * Reservation of rack space
 * Reservation of networking slots

Whenever you add new machines to the cloud, you have to estimate, calculate, monitor, simulate, test, and perform online gray-scale expansion for the controller nodes and network nodes. The difficulty is that you can't test and verify every size.

The feasible way to expand a large-scale cloud is to add already tested building blocks. That means we would prefer to build a large-scale cloud by adding tested OpenStack instances (including controller and compute nodes) one by one, rather than unconditionally enlarging the capacity of a single OpenStack instance. This approach keeps cloud construction under control.

Even when a large-scale cloud is built by adding tested OpenStack instances one by one, end users and PaaS still want to use the OpenStack API through the already developed CLI, SDK, Portal, PaaS, Heat, Magnum, Murano, etc. This way of building a large-scale public cloud also brings some new requirements to OpenStack-based clouds, which are quite similar to those of massive distributed edge clouds:


 * Single endpoint for the large scale cloud
 * Distributed quota management
 * Global resource view of the tenant
 * Tenant level Volume/VM migration/backup
 * Multi-DC image import/clone/export

Massive distributed edge cloud
Massive distributed edge clouds, built in edge data centers with computing and storage close to end users, are now emerging for enterprise applications, NFV services and personal services.

Enterprise Application
Some enterprises have also found issues with applications running in a remote centralized cloud, for example video editing, 3D modeling and IoT services, which are sensitive to bandwidth and latency.

The high bandwidth and low latency provided by the edge cloud are critical for enterprise-level applications like video editing, 3D modeling, AR/VR, IoT services, etc.

In an enterprise, most employees work in different branches and access the nearby edge cloud. Collaboration among employees from different branches leads to requirements for cross-edge-cloud functionality, such as data distribution and migration.

NFV and Edge Cloud Service
NFV (network function virtualization) will provide more flexible and better customized networking capabilities, for example dynamic, customized network bandwidth management, and will also help to move computing and storage closer to end users. With the shortest path from end users to storage and computing, the uplink speed can be higher and bandwidth consumption is terminated as early as possible. This will bring a better user experience and change the way content is generated and stored: in real time, with all data in the cloud.

For example, a user or enterprise can dynamically ask for high bandwidth and storage to temporarily stream HD video or AR/VR data into the cloud, then, after streaming finishes, ask for more computing resources to do the post-processing and redistribute the video to other sites. When a user wants to move or redistribute an application and its data from one edge cloud to another, the user should be able to dynamically ask for cross-edge-cloud bandwidth managed by NFV.

Personal service
The current Internet is good at handling downlink services. All content is stored in remote centralized data centers, and access is accelerated to some extent by CDNs.

As more and more users generate content that is uploaded or streamed to clouds and web sites, this content and data still has to travel to some centralized data center; the path is long and the bandwidth is limited and slow. For example, it is very slow for every user to upload or stream HD/2K/4K video concurrently. Pictures and videos have to be uploaded slowly and with quality loss. Using the cloud as the primary storage for user data is not yet an option; currently the cloud is mainly used for backup and for data that is not time or latency sensitive. Video captured and stored with quality loss can even make it difficult to provide crime evidence or serve other purposes. The last mile of network access (fixed or mobile) is wide enough; the main hindrance is that bandwidth in the MAN (Metropolitan Area Network), backbone and WAN is limited and expensive. Real-time video and data uploading or streaming from the end user or terminal to the local edge cloud is therefore quite an attractive cloud service.

Requirements
The emerging massive distributed edge clouds should not be just isolated cloud islands; they also bring some new requirements:
 * Single endpoint for a large number of small edge clouds in specific deployment scenarios
 * Tenant level Volume/VM/object storage backup/migration/distribution
 * Distributed image management
 * Distributed quota management

OpenStack API enabled hybrid cloud
Refer to https://wiki.openstack.org/wiki/Jacket

There is a strong requirement for a single endpoint for the hybrid cloud.

The detailed use cases can be found in this presentation: https://docs.google.com/presentation/d/16laTyn4ra-446v4p0kwMnpgHqwzMsz1r6QeiSI2Kq2M/

More technical use cases can be found in the communication material used in the Tricircle big-tent project application; some deployments described there also require a single endpoint: https://docs.google.com/presentation/d/1Zkoi4vMOGN713Vv_YO0GP6YLyjLpQ7fRbHlirpq6ZK4/edit?usp=sharing

Overview
Trio2o provides an API gateway for multiple OpenStack clouds, whether they span one site, multiple sites, or a hybrid cloud, so that together they act as a single OpenStack cloud.

Trio2o and the managed OpenStack instances use a shared Keystone (with centralized or distributed deployment) or federated Keystones for identity management. Trio2o presents one big region to the end user in Keystone. Each OpenStack instance, called a pod, is a sub-region of Trio2o in Keystone and is usually not directly visible to the end user.

Trio2o acts as an OpenStack API gateway: it handles OpenStack API calls, schedules a suitable OpenStack instance if needed while handling a call, and forwards the call to the appropriate OpenStack instance.

The end user can see availability zones (AZs) and use an AZ to provision VMs and volumes through Trio2o. One AZ can include many OpenStack instances; Trio2o can schedule and bind an OpenStack instance for the tenant inside one AZ. A tenant's resources can be bound automatically to multiple specific bottom OpenStack instances in one or multiple AZs.

Trio2o is derived from the old Tricircle and is dedicated to the API gateway.

Trio2o could be extended to support more powerful capabilities, such as virtually splitting a Trio2o instance into multiple micro instances, which would give users finer granularity over tenancy and services. Trio2o also enables OpenStack-based hybrid cloud.
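The scheduling and binding described above can be sketched roughly as follows. This is only an illustration, not Trio2o's actual code: the `Pod` and `PodBinder` names, the one-pod-per-tenant-per-AZ simplification, and the "fewest bound tenants" selection policy are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Pod:
    """One bottom OpenStack instance (hypothetical model)."""
    name: str
    az: str


@dataclass
class PodBinder:
    """Binds a tenant to one pod per AZ (simplified, illustrative only)."""
    pods: list
    bindings: dict = field(default_factory=dict)  # (tenant_id, az) -> Pod

    def get_pod(self, tenant_id, az):
        key = (tenant_id, az)
        if key in self.bindings:
            # Reuse the existing binding so the tenant's resources stay together.
            return self.bindings[key]
        candidates = [p for p in self.pods if p.az == az]
        if not candidates:
            raise LookupError(f"no pod registered for AZ {az!r}")
        # Assumed policy: choose the pod with the fewest tenants bound so far.
        load = {p: 0 for p in candidates}
        for pod in self.bindings.values():
            if pod in load:
                load[pod] += 1
        chosen = min(candidates, key=lambda p: load[p])
        self.bindings[key] = chosen
        return chosen


binder = PodBinder([Pod("pod1", "az1"), Pod("pod2", "az1"), Pod("pod3", "az2")])
print(binder.get_pod("tenant-a", "az1").name)
print(binder.get_pod("tenant-b", "az1").name)
```

Once a binding exists, later requests from the same tenant for the same AZ go to the same pod; a real scheduler would likely also weigh capacity and allow a tenant to be bound to several pods in one AZ.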

Architecture
Trio2o is currently designed as a standalone API gateway service that is decoupled from existing OpenStack services such as Nova and Cinder. The design blueprint is being developed and continuously improved at: https://docs.google.com/document/d/1cmIUsClw964hJxuwj3ild87rcHL8JLC-c7T-DUQzd4k/



A shared Keystone (centralized or distributed deployment) or federated Keystones can be used for identity management for Trio2o and the managed OpenStack instances. Trio2o presents one big region to the end user in Keystone. Each OpenStack instance, called a pod, is a sub-region of Trio2o in Keystone and is usually not directly visible to the end user.


 * Nova API-GW
 * 1) A standalone web service that receives all Nova API requests and routes each request to the appropriate bottom OpenStack instance according to the Availability Zone (during creation) or the VM's UUID (during operation and query). If there is more than one OpenStack instance in an Availability Zone, it schedules one, forwards the request to it, and builds the binding relationship between the tenant ID and the OpenStack instance.
 * 2) Works as a stateless service and can run as processes distributed across multiple hosts.
 * Cinder API-GW
 * 1) A standalone web service that receives all Cinder API requests and routes each request to the appropriate bottom OpenStack instance according to the Availability Zone (during creation) or a resource ID such as a volume/backup/snapshot UUID (during operation and query). If there is more than one OpenStack instance in an Availability Zone, it schedules one, forwards the request to it, and builds the binding relationship between the tenant ID and the OpenStack instance.
 * 2) Cinder API-GW and Nova API-GW make sure that the volumes for the same VM are co-located in the same OpenStack instance.
 * 3) Works as a stateless service and can run as processes distributed across multiple hosts.
 * Admin API
 * 1) Manage sites (bottom OpenStack instances) and availability zone mappings.
 * 2) Retrieve object UUID routing.
 * 3) Expose API for maintenance.
 * XJob
 * 1) Receives and processes cross-OpenStack functionality and other asynchronous jobs from Nova API-GW, Cinder API-GW or Admin API.
 * 2) Any of Admin API, Nova API-GW and Cinder API-GW can send an asynchronous job to XJob through the message bus, using the RPC API provided by XJob.
 * Database
 * 1) Trio2o has its own database to store pods, pod bindings, jobs and resource routing tables.
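
The routing rules in the list above (route by Availability Zone on creation, by resource UUID afterwards, and keep a VM's volumes in the same OpenStack instance) might look something like the sketch below. The `ResourceRouter` class and the helper functions are hypothetical names for this example, and an in-memory dict stands in for the resource routing table kept in the database.

```python
class ResourceRouter:
    """In-memory stand-in for the resource routing table (illustrative only)."""

    def __init__(self):
        self._routes = {}  # resource UUID -> pod name

    def record(self, resource_id, pod_name):
        self._routes[resource_id] = pod_name

    def pod_for(self, resource_id):
        if resource_id not in self._routes:
            raise LookupError(f"no route for resource {resource_id!r}")
        return self._routes[resource_id]


def route_request(router, action, resource_id=None, az=None, schedule=None):
    """Return the name of the pod a request should be forwarded to."""
    if action == "create":
        # Creation: there is no UUID yet, so a pod is scheduled from the AZ.
        return schedule(az)
    # Operation or query: follow the route recorded for the UUID.
    return router.pod_for(resource_id)


def pod_for_attach(router, server_id, volume_id):
    """Return the shared pod of a VM and a volume, or fail if they differ."""
    server_pod = router.pod_for(server_id)
    volume_pod = router.pod_for(volume_id)
    if server_pod != volume_pod:
        raise ValueError(
            f"volume {volume_id!r} is in {volume_pod!r}, "
            f"but server {server_id!r} is in {server_pod!r}"
        )
    return server_pod


router = ResourceRouter()
pod = route_request(router, "create", az="az1", schedule=lambda az: "pod1")
router.record("vm-1", pod)  # remember where the new VM was created
print(route_request(router, "get", resource_id="vm-1"))
```

Because the gateways are described as stateless, the routing data has to live outside the process; here the `router` object plays that role only for demonstration.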

For Glance deployment, there are several choices:
 * Shared Glance, if all OpenStack instances are located inside one high-bandwidth, low-latency site.
 * Shared Glance with a distributed back-end, if the OpenStack instances are located in several sites.
 * Distributed Glance deployment: the Glance service is deployed across multiple sites with a distributed back-end.
 * Separate Glance deployment: each site is installed with a separate Glance instance and back-end; no cross-site image sharing is needed.

Value
The motivation for developing the Trio2o open source project:


 * The OpenStack API ecosystem is preserved: everything from the CLI and SDK to Heat, Murano, Magnum, etc. can be reused seamlessly.
 * Support for modularized capacity expansion in large-scale clouds.
 * Tenant level quota control across OpenStack instances.
 * Global resource usage view across OpenStack instances.
 * User level KeyPair management across OpenStack instances.
 * Tenant's data movement across OpenStack instances thanks to the tenant level L2/L3 networking.
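
As a rough illustration of the tenant-level quota control and global resource usage view listed above, per-pod usage can be summed into one tenant-wide view and checked against a single limit. The function names and the data layout are assumptions for this sketch, not Trio2o's API.

```python
from collections import Counter


def global_usage(usage_by_pod):
    """Sum per-pod usage dicts into one tenant-wide view."""
    total = Counter()
    for usage in usage_by_pod.values():
        total.update(usage)  # Counter.update adds the counts together
    return dict(total)


def within_quota(limits, usage_by_pod, request):
    """Return True if the request fits under the tenant-wide limits.

    Assumption: a resource with no configured limit is treated as limit 0.
    """
    used = global_usage(usage_by_pod)
    return all(
        used.get(resource, 0) + amount <= limits.get(resource, 0)
        for resource, amount in request.items()
    )


usage = {
    "pod1": {"instances": 3, "cores": 8},
    "pod2": {"instances": 2, "cores": 4},
}
limits = {"instances": 6, "cores": 16}
print(global_usage(usage))
print(within_quota(limits, usage, {"instances": 1, "cores": 4}))
```

The point of the sketch is that the limit is checked against the sum over all OpenStack instances rather than against each instance separately.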

Installation and Play
Refer to the installation guide at https://github.com/openstack/trio2o for a single-node or two-node setup using DevStack.

Resources

 * Design documentation: Trio2o Design Blueprint
 * Wiki: https://wiki.openstack.org/wiki/trio2o
 * Source: https://github.com/openstack/trio2o
 * Bugs: http://bugs.launchpad.net/trio2o
 * Blueprints: https://launchpad.net/trio2o
 * Review Board: https://review.openstack.org/#/q/project:openstack/trio2o
 * Weekly meeting IRC channel: #openstack-meeting on irc.freenode.net, every Wednesday from 13:00 to 14:00 UTC
 * Weekly meeting IRC log: https://wiki.openstack.org/wiki/Meetings/Trio2o
 * Trio2o project IRC channel: #openstack-trio2o on irc.freenode.net
 * Trio2o project IRC channel log: http://eavesdrop.openstack.org/irclogs/%23openstack-trio2o/
 * Mail list: openstack-dev@lists.openstack.org, with [openstack-dev][trio2o] in the mail subject
 * New contributor's guide: http://docs.openstack.org/infra/manual/developers.html

Trio2o is designed to use the same tools for submission and review as other OpenStack projects. As such we follow the OpenStack development workflow. New contributors should follow the getting started steps before proceeding, as a Launchpad ID and signed contributor license are required to add new entries.

How to read the source code
The source code is much easier to read if you follow this blueprint:

Implement Stateless Architecture: https://blueprints.launchpad.net/tricircle/+spec/implement-stateless

This blueprint was used to build Tricircle from scratch, and the result was also the code base for the Trio2o project.

History
Q: What is the difference between Trio2o, Tricircle and OpenStack Cascading?

OpenStack Cascading was mainly a solution used in a PoC done in late 2014 and early 2015, which aimed to test the idea that multiple OpenStack instances COULD be deployed across multiple geo-diverse sites and managed by an OpenStack API layer based on OpenStack services. After the PoC was carried out successfully, the team planned to contribute the core idea to the community.

Tricircle was born out of that idea. Unlike V1 of the OpenStack Cascading solution implemented in the PoC, which had plenty of twists and plumbing for feature enhancements, Tricircle in its earliest stage tried to build a clean architecture that is extendable, pluggable and reusable in nature. It included the OpenStack API gateway and networking automation functionality.

In September 2016, following feedback from the TC on the Tricircle big-tent application, Trio2o, which provides the API gateway functionality, was split out of Tricircle. This leaves Tricircle dedicated to networking automation across Neutron.

Tricircle: Dedicated to cross-Neutron networking automation in multi-region OpenStack deployments; runs with or without Trio2o.

Trio2o: Dedicated to providing an API gateway for those who need a single Nova/Cinder API endpoint in multi-region OpenStack deployments; runs with or without Tricircle.

The wiki for Tricircle before splitting is linked here: https://wiki.openstack.org/wiki/tricircle_before_splitting

Q: Where does the source code come from?

The Trio2o source code was forked from Tricircle and then cleaned up:

https://etherpad.openstack.org/p/Trio2oCleaning

To do list
The to-do list is in the etherpad: https://etherpad.openstack.org/p/Trio2oToDo

Sync patch from Tricircle for Nova API-GW/Cinder API-GW part: http://lists.openstack.org/pipermail/openstack-dev/2016-December/108552.html

Meeting minutes and logs
All meeting logs and minutes can be found at:

2016: http://eavesdrop.openstack.org/meetings/trio2o/2016/

Team Member
Contact team members in IRC channel: #openstack-trio2o

Current active participants
Joe Huang, Huawei

Khayam Gondal, Dell

Shinobu Kinjo, RedHat

Ge Li, China UnionPay

Vega Cai, Huawei

Pengfei Shi, OMNI Lab

Bean Zhang, OMNI Lab

Yipei Niu, Huazhong University of Science and Technology

Ronghui Cao, Hunan University

Xiongqiu Long, Hunan University

Zhuo Tang, Hunan University

Liuzyu, Hunan University

Jiawei He, Hunan University

KunKun Liu, Hunan University

Yangkai Shi, Hunan University

Yuquan Yue, Hunan University

Howard Huang, Huawei