OpenStack cascading solution

Overview

OpenStack cascading solution is designed for multi-site OpenStack cloud integration, and along the way it also solves OpenStack scalability, for example a cloud with millions of VMs geographically distributed across many data centers.

  • The parent OpenStack exposes the standard OpenStack API
  • The parent OpenStack manages many child OpenStacks by using the standard OpenStack API
  • Each child OpenStack functions as an Amazon-like availability zone and is hidden behind the parent OpenStack
Cascading01.png
  • Cascading OpenStack: the parent OpenStack, providing the API plus scheduling and orchestration of the cascaded OpenStacks
  • Cascaded OpenStack: the child OpenStack, provisioning the VM, volume, and virtual networking resources


OpenStack cascading is "OpenStack orchestrating OpenStacks". It concentrates mainly on API aggregation, and provides tenant-level cross-OpenStack IP address management, networking automation, image replication, etc. It also gives the tenant a virtual-OpenStack experience, even though the tenant's resources are distributed across multiple OpenStack instances. After cascading, the tenant only needs to access one API endpoint. To the tenant it looks like a single virtual region; the multiple child OpenStacks are not visible, having already been integrated and hidden by the cascading OpenStack. The tenant can see multiple "availability zones" into which the resources are distributed, and internally each child OpenStack works exactly like an Amazon-like availability zone.
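
As a hedged illustration of this single-endpoint experience, here is a minimal sketch using python-novaclient. The endpoint URL, credentials, image, flavor, and zone name are placeholder assumptions, not part of the solution itself:

 from novaclient import client

 # Placeholder credentials and the cascading OpenStack's Keystone URL.
 nova = client.Client('2', 'tenant_user', 'password', 'tenant_project',
                      'http://cascading-openstack:5000/v2.0')

 # Each availability zone listed here is backed by an entire cascaded
 # OpenStack, hidden behind the single cascading API endpoint.
 for az in nova.availability_zones.list():
     print(az.zoneName, az.zoneState)

 # Booting into a chosen zone places the VM in that cascaded OpenStack.
 image = nova.images.find(name='cirros')
 flavor = nova.flavors.find(name='m1.small')
 nova.servers.create(name='demo-vm', image=image, flavor=flavor,
                     availability_zone='cascaded-az-1')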

Use Case

  • Tenant-level virtual OpenStack service over hybrid, federated, or multiple OpenStack-based clouds:

There are many OpenStack-based clouds. Each tenant is allocated one cascading OpenStack as a virtual OpenStack service, with a single OpenStack API endpoint serving that tenant. The tenant's resources can be distributed or dynamically scaled across multiple OpenStack-based clouds; these clouds may be federated with Keystone, use a shared Keystone, or even be OpenStack clouds built on AWS, Azure, or VMware vSphere.

Under this deployment scenario, practically unlimited scalability in a cloud can be achieved: there is no unified cascading layer, tenant-level resource orchestration among the multiple OpenStack clouds is fully distributed (even geographically), and there is no central point at all. The database and load of one cascading OpenStack are very small, making disaster recovery and backup easy. Multiple tenants may share one cascading OpenStack to reduce resource waste, but the principle is to keep the cascading OpenStack as thin as possible.

Cascading15.png


  • Large-scale cloud working like one OpenStack instance:

The cloud admin wants to provide tenants with one cloud behind one OpenStack API, working as a single OpenStack instance. The cloud is distributed across multiple data centers or in a single very large data center (for example, 100k nodes). The cloud grows gradually as capacity expands, and to avoid vendor lock-in, building the cloud from multiple vendors' OpenStack distributions is also required.

Cascading10.png

Motivation

The requirement and driving force for multi-site cloud integration is cross-DC / cross-OpenStack resource orchestration: globally addressable tenants, which result in global services. Tenant virtual resources will be distributed across multiple sites but connected by L2/L3 networking.

  • Ecosystem-friendly open API for unified multi-site resource orchestration

Ecosystem-friendly open API: it took OpenStack almost four years to grow its ecosystem, so the OpenStack API must be retained for distributed but unified multi-site resource orchestration.

  • A multi-site cloud requires multi-vendor OpenStack distributions, multiple OpenStack instances, and multiple OpenStack versions to co-exist

Multi-vendor: an anti-vendor-lock-in business policy.
Multi-instance: each vendor has its own OpenStack distribution, with a different OpenStack instance for each site.
Multi-version: step-wise cloud construction, upgraded gradually.

  • RESTful open API/CLI for each site

OpenStack API in each site: an open, de facto standard API.
It keeps the cloud workable and manageable standalone in each site.
Each site can be installed, upgraded, and maintained independently by a different vendor or cloud admin.

Refer to the slides Building Multi-Site and Multi-OpenStack Cloud with OpenStack cascading for more background.


From a technology point of view, building a large-scale distributed OpenStack-based cloud, for example a multi-site cloud with one million VMs or 100k hosts, poses big challenges.

Naturally, there are two ways to do that:

1. Scale up a single monolithic OpenStack region, but:

  • It's a big challenge for a single OpenStack to manage, for example, one million VMs or 100k hosts.
  • Real fault isolation areas like EC2's availability zones cannot be obtained; the whole cloud is tied into one OpenStack because of the shared RPC message bus and database.
  • A single huge monolithic system brings high risk in O&M and troubleshooting, and is a big challenge for even the most skilled operations team to handle software rolling upgrades, configuration changes, etc.
  • Integrating heterogeneous vendors' infrastructure is time-consuming, while multi-vendor infrastructure co-existence is in high demand for a large-scale cloud.

2. Set up hundreds of OpenStack regions with discrete API endpoints, but:

  • The operator has to buy or develop their own cloud management platform to integrate the discrete clouds into one cloud, and the OpenStack API ecosystem is lost.
  • Or the cloud is left as split resource islands without any association.

Inspiration

It's not reinventing the wheel. The OpenStack cascading solution is inspired by:

  • remote clustered hypervisors, like vCenter and Ironic running under Nova, which let Nova scale to much larger sizes
  • the pluggable driver/agent architecture of Nova/Cinder/Neutron
  • the magic of the FRACTAL, which is recursively self-similar and grows to scale. Please refer to OpenStack cascading and fractal for more information.

From this inspiration comes the conclusion:

  • Handle cascaded Nova/Cinder similarly to what has been done with vCenter and Ironic.
  • Handle cascaded Neutron similarly to the L2 OVS agent / L3 DVR agent. The challenge is to make cross-cascaded-OpenStack L2/L3 networking work like the networking inside one OpenStack.


Now the cascaded OpenStack looks simply like a huge compute node, and the cascading OpenStack like the controller node.

Architecture

For the detailed architecture design, please refer to the following links.


Normally, Nova uses KVM or another hypervisor as the compute virtualization backend, Cinder uses LVM or other block storage as the storage backend, and Neutron uses OVS or another L2 backend and the Linux router as the L3 agent, etc.

The core architectural idea of OpenStack cascading is to add Nova as a hypervisor backend of Nova, Cinder as a block storage backend of Cinder, Neutron as a backend of Neutron, Glance as one image location of Glance, and Ceilometer as a store of Ceilometer.

Cascading12.png

OpenStack cascading covers the cascading of Nova, Cinder, Neutron, Glance, and Ceilometer. Keystone is a global service shared by the cascading and cascaded OpenStacks (or Keystone federation is used), and Heat consumes the cascading OpenStack API, so no cascading is required for Keystone or Heat. The following picture shows the architecture for Nova/Cinder/Neutron cascading.

Cascading02.png


  • Nova-Proxy: the hypervisor driver for Nova, running on the Nova-Compute node. Nova-Proxy makes the cascaded Nova the hypervisor backend of Nova, transferring VM operations to the corresponding cascaded Nova. It is also responsible for attaching volumes and networks to the VM in the cascaded OpenStack. (A simplified sketch follows.)
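
The following is a much-simplified, hedged sketch of the Nova-Proxy idea, not the actual Tricircle code: a Nova virt driver whose "hypervisor" is an entire cascaded Nova reached through python-novaclient. The credentials, endpoint URL, and pre-synced flavor are illustrative assumptions.

 from nova.virt import driver
 from novaclient import client as nova_client


 class CascadedNovaDriver(driver.ComputeDriver):
     """Presents a whole cascaded OpenStack to the cascading Nova
     as one very large hypervisor."""

     def __init__(self, virtapi):
         super(CascadedNovaDriver, self).__init__(virtapi)
         # Placeholder credentials for the cascaded OpenStack.
         self._cascaded = nova_client.Client(
             '2', 'proxy_user', 'password', 'proxy_project',
             'http://cascaded-openstack:5000/v2.0')

     def spawn(self, context, instance, image_meta, injected_files,
               admin_password, network_info=None, block_device_info=None):
         # Instead of calling libvirt/KVM, forward the boot request to
         # the cascaded Nova via the standard OpenStack API.  The real
         # proxy also maps networks/volumes and records the UUID
         # correspondence between the two levels.
         flavor = self._cascaded.flavors.find(name='m1.small')  # pre-synced
         self._cascaded.servers.create(name=instance.display_name,
                                       image=image_meta['id'],
                                       flavor=flavor)

     def list_instances(self):
         # The whole cascaded OpenStack reports back as one huge host.
         return [s.name for s in self._cascaded.servers.list()]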


  • Cinder-Proxy: the Cinder-Volume driver for Cinder, running on the Cinder-Volume node. Cinder-Proxy makes the cascaded Cinder the block storage backend of Cinder, transferring volume operations to the corresponding cascaded Cinder. (A similar sketch follows.)
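
The same proxy pattern, sketched for block storage (again hedged and simplified, not the actual Tricircle code; credentials, endpoint, and the name-based volume lookup are placeholder assumptions): a Cinder volume driver forwarding operations to a cascaded Cinder via python-cinderclient.

 from cinder.volume import driver
 from cinderclient import client as cinder_client


 class CascadedCinderDriver(driver.VolumeDriver):
     def __init__(self, *args, **kwargs):
         super(CascadedCinderDriver, self).__init__(*args, **kwargs)
         # Placeholder credentials for the cascaded OpenStack.
         self._cascaded = cinder_client.Client(
             '2', 'proxy_user', 'password', 'proxy_project',
             'http://cascaded-openstack:5000/v2.0')

     def create_volume(self, volume):
         # Forward the create to the cascaded Cinder instead of LVM.
         self._cascaded.volumes.create(size=volume['size'],
                                       name=volume['display_name'])

     def delete_volume(self, volume):
         # Find the mirrored volume in the cascaded Cinder and delete it.
         for v in self._cascaded.volumes.list(
                 search_opts={'name': volume['display_name']}):
             self._cascaded.volumes.delete(v)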


  • L2-Proxy: a role similar to the OVS agent. L2-Proxy makes the cascaded Neutron the L2 backend of Neutron and completes the L2 networking in the cascaded OpenStack, including cross-OpenStack networking.


  • L3-Proxy: a role similar to the DVR L3 agent. L3-Proxy makes the cascaded Neutron the L3 backend of Neutron and completes the L3 networking in the cascaded OpenStack, including cross-OpenStack networking.


  • FW-Proxy: a role similar to the FWaaS agent. FW-Proxy makes the cascaded Neutron the FWaaS backend of Neutron.


  • LB-Proxy: a role similar to the LBaaS agent. LB-Proxy makes the cascaded Neutron the LBaaS backend of Neutron.


  • VPN-Proxy: a role similar to the VPNaaS agent. VPN-Proxy makes the cascaded Neutron the VPNaaS backend of Neutron.



For Glance, cascading is not a must; it depends on the backend store and the deployment decision. Both a global Glance and Glance with cascading work for the OpenStack cascading solution. There are three ways to do Glance cascading, each of which makes the cascaded Glance one of the image locations:

  1. Replicate the image when the image data is uploaded, an image location is patched, or a snapshot is created. This gives a better end-user experience (a shorter VM boot period), but the technology used is more complex.
  2. Replicate the image data only when the image is used for the first time in the corresponding cascaded OpenStack. This makes the first VM boot take longer, but it is much simpler and more robust (see the sketch after this list).
  3. Just register the location of an image that already exists in the cascaded Glance in the cascading Glance.
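
A minimal, hedged sketch of the second (lazy) option, assuming python-glanceclient v1 endpoints for both levels; the endpoint URLs, the token handling, and the name-based replica lookup are illustrative assumptions:

 from glanceclient import Client as GlanceClient


 def ensure_image_in_cascaded(image_id, cascading_ep, cascaded_ep, token):
     """Copy an image into a cascaded Glance on first use."""
     top = GlanceClient('1', endpoint=cascading_ep, token=token)
     bottom = GlanceClient('1', endpoint=cascaded_ep, token=token)

     src = top.images.get(image_id)

     # Skip the copy if a replica already exists in the cascaded Glance.
     for img in bottom.images.list(filters={'name': src.name}):
         return img.id

     # First use in this cascaded OpenStack: stream the bits down once.
     replica = bottom.images.create(
         name=src.name,
         disk_format=src.disk_format,
         container_format=src.container_format,
         data=top.images.data(image_id))
     return replica.id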


For Ceilometer, a cascading solution has to be introduced: a huge volume of data is generated inside Ceilometer, so it is impossible to use one shared Ceilometer for all sites.

Cascading03.png

  • Repli-Manager: replicates images among the cascading OpenStack and the policy-determined cascaded OpenStacks. It is only required when image synchronization is done at image data upload, image location patch, or snapshot creation time.
  • Ceilometer-Proxy: transfers requests to the destined Ceilometer, or collects information from several Ceilometers (see the sketch below).
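
A minimal, hedged sketch of the fan-out side of Ceilometer-Proxy, assuming python-ceilometerclient; the endpoint list and token handling are illustrative assumptions:

 from ceilometerclient import client as ceilo_client

 # Hypothetical Ceilometer API endpoints, one per cascaded OpenStack.
 CASCADED_CEILOMETERS = [
     'http://cascaded-1:8777',
     'http://cascaded-2:8777',
 ]


 def collect_samples(meter_name, token):
     """Gather samples for one meter from every cascaded Ceilometer."""
     merged = []
     for url in CASCADED_CEILOMETERS:
         c = ceilo_client.get_client('2', ceilometer_url=url,
                                     os_auth_token=token)
         # Each cascaded OpenStack keeps its metering data locally; the
         # proxy only aggregates on demand.
         merged.extend(c.samples.list(meter_name=meter_name))
     return merged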

Value to end user

  • Tenant-level global IP address management

The cascading OpenStack can act as the global IP address manager for the tenant across multiple cascaded OpenStacks.
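
To illustrate (a hedged sketch using python-neutronclient; the endpoint, credentials, and CIDR are placeholder assumptions): the tenant creates one network and subnet at the cascading Neutron, and VMs booted into different cascaded OpenStacks draw non-overlapping addresses from that single address space, because the cascading Neutron owns the allocation.

 from neutronclient.v2_0 import client as neutron_client

 neutron = neutron_client.Client(
     username='tenant_user', password='password',
     tenant_name='tenant_project',
     auth_url='http://cascading-openstack:5000/v2.0')

 # One network and subnet, created once at the cascading level.
 net = neutron.create_network({'network': {'name': 'global-net'}})
 neutron.create_subnet({'subnet': {
     'network_id': net['network']['id'],
     'ip_version': 4,
     'cidr': '10.0.0.0/16'}})
 # Ports for VMs in any cascaded OpenStack are allocated from this one
 # 10.0.0.0/16 range by the cascading Neutron.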

  • Consolidate multi-cloud resources into a virtual region

Through Keystone federation, OpenStack cascading can consolidate a tenant's resources in multiple hybrid OpenStack-based clouds into one virtual region.

  • Isolated virtual data center across different physical data centers

A big cloud operator wants to create a virtual data center over distributed physical data centers. The virtual data center includes secure and isolated tenant resources: VMs, volumes, and cross-data-center L2/L3 networking with advanced services like FW, LB, and VPN. OpenStack cascading can provide a virtual isolated network, plus VMs and volumes, across different physical data centers to the end user through the standard OpenStack API.

  • Virtual machine, volume, and vApp migration from one data center to another

With the help of OpenStack cascading, VM/volume migration from one DC to another is feasible. Refer to the blog cross-dc-app-migration-over-openstack-cascading.

  • Highly available applications across different physical data centers

With the aid of overlay virtual L2/L3 networking across data centers and the image synchronization function, application backup, disaster recovery, and load balancing are easy to implement in the distributed cloud.

Value to cloud admin

See more detailed information in the documents linked in the "Architecture" section. The major advantages of the architecture are listed here:

  1. The cascading OpenStack aggregates many child OpenStack clouds into one cloud via the standard OpenStack API, and exposes one OpenStack API endpoint to the tenant.
  2. If one cascaded OpenStack fails, the other parts of the cloud still work and remain accessible; this lets each cascaded OpenStack act as an Amazon-like availability zone. If the cascading OpenStack fails, all cascaded OpenStacks are still manageable and workable independently via the OpenStack API. In phase I, provisioning is not allowed in that situation, for consistency between the cascading and cascaded OpenStacks; in phase II, after the consistency issue is solved, provisioning can be allowed even if the cascading OpenStack has failed.
  3. The cascading OpenStack manages cascaded OpenStacks via the standard OpenStack API, a RESTful API with backward compatibility and multiple API versions in parallel. Therefore multi-vendor/multi-version OpenStack is feasible in one large-scale cloud.
  4. Each cascaded OpenStack and the cascading OpenStack can be managed independently as standalone OpenStacks, so upgrades, operation, and maintenance can be done separately at availability-zone granularity.
  5. Because the cascading OpenStack manages everything via the standard OpenStack API, a new vendor's physical resources with an OpenStack distribution built in can be integrated into the cloud in a plug-and-play manner, just like a USB device plugged into a PC. This makes the OpenStack API a software-defined "PCI bus" for the cloud era.
  6. Scalability not only within one cascaded OpenStack, but also across multiple vendors' cascaded OpenStacks and federated OpenStacks spread over many data centers. Because the OpenStack API is RESTful, one cascading OpenStack can manage multiple OpenStacks distributed in multiple data centers across a WAN or LAN.

Blueprints

Whether to start an incubation project or to register blueprints under different projects with one umbrella blueprint is a tough decision. None of the current contribution methodologies works for OpenStack cascading: many projects (Nova/Cinder/Neutron/Glance/Ceilometer) are involved, and they must work together to make cascading work. After OpenStack cascading is introduced, CI is quite different from the current single-OpenStack-instance CI environment: the CI environment for OpenStack cascading needs at least one cascading OpenStack and two cascaded OpenStacks.

Your comments are welcome.

The blueprints are:

[tbd]

Proof of Concept

The PoC gives us confidence that the OpenStack cascading solution is feasible. Please refer to Tricircle in StackForge for the PoC: https://github.com/stackforge/tricircle

The primary contact for the PoC is Chaoyi Huang (joehuang@huawei.com), the inventor, designer, and PoC source code reviewer.

Scalability Test Report

Please refer to: Test report for OpenStack cascading solution to support 1 million VMs in 100 data centers


Hands on lab

Only a live demo video is available now; it is difficult for us to set up a globally accessible online lab.

YouTube: https://www.youtube.com/watch?v=OSU6PYRz5qY

Youku (low quality, for those who can't access YouTube): http://v.youku.com/v_show/id_XNzkzNDQ3MDg4.html

Vimeo: http://vimeo.com/107453159

Source Code Repository

https://github.com/stackforge/tricircle

The project name "Tricircle" comes from a fractal. See the blog OpenStack Cascading and Fractal for more information.

CI Environment

We are asking for help here: a CI guru to help build the CI environment for the OpenStack cascading solution is warmly welcome.

How to Play

Your comments are welcome. [tbd]

Mail-list

The OpenStack cascading solution PoC team has no dedicated mailing list currently. You can use the OpenStack dev list to reach us; please include [openstack-dev] [tricircle] in the mail subject.

Meetings

[tbd]

Technology Hub

Last update

--joehuang (talk) 12:57, 11 October 2014 (UTC)