Fog Edge Massively Distributed Clouds


The goal of the Fog/Edge/Massively Distributed Clouds working group is to debate and investigate how OpenStack can address Fog/Edge Computing use-cases (i.e. the supervision and use of a large number of remote data centers through a single distributed OpenStack system).

Status: active
Contact: Adrien Lebre <adrien.lebre@inria.fr>

Meetings

Chaired by: Adrien Lebre (ad_rien, France), Paul Andre Raymond (parnexius, USA)

Alternate member: Anthony Simonet (menthos, France)

Meeting agendas: https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017

Problem description

More and more academics and industry experts are advocating a shift from large centralized Cloud Computing infrastructures to smaller ones massively distributed at the edge of the network. Referred to as “fog/edge computing”, this emerging paradigm is attracting growing interest as it improves overall service agility in addition to bringing computing resources closer to end-users. However, to favor the adoption of this decentralized model of the Cloud Computing paradigm, the development of a system in charge of turning a complex and diverse network of resources into a global Cloud is critical.

Instead of developing yet another brokering system, the ambition of the Fog/Edge/Massively Distributed Clouds WG is:

  • to study to what extent the current OpenStack mechanisms can handle such massively distributed infrastructures;
  • to propose revisions/extensions of internal mechanisms when appropriate;
  • to study how current cloud APIs should be extended to take advantage of geo-distribution (latency-aware applications, …).

Brokering/orchestration of clouds is the first approach considered when it comes to operating and using distinct clouds. Each micro DC hosts and supervises its own cloud, and a brokering service is in charge of provisioning resources by picking them on each cloud. While such top-down approaches with a simple centralized broker can be acceptable for basic use cases, advanced brokering services become mandatory to meet the requirements of production environments (monitoring, scheduling, automated provisioning, SLA enforcement, quota management, tenant networks, …). In addition to facing scalability, latency and single-point-of-failure issues, brokering services grow more and more complex, ultimately re-integrating most of the mechanisms already implemented by the IaaS manager in charge of operating each site.
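
As a rough illustration, consider the following sketch of such a centralized broker (hypothetical Python code, not an existing OpenStack component; the client methods free_capacity() and boot() are assumptions for the example):

  # Hypothetical top-down broker provisioning a VM across several
  # independent OpenStack sites. Note how it must re-implement its own
  # placement logic on top of what Nova already does inside each site.
  class NaiveBroker:
      def __init__(self, clouds):
          # 'clouds' maps a site name to a client object for that
          # site's API (free_capacity()/boot() are hypothetical).
          self.clouds = clouds

      def provision(self, flavor, image):
          # The broker needs its own view of each site's free capacity
          # and its own placement policy (here: most free capacity)...
          candidates = [c for c in self.clouds.values()
                        if c.free_capacity(flavor) > 0]
          if not candidates:
              raise RuntimeError("no site can host this VM")
          target = max(candidates, key=lambda c: c.free_capacity(flavor))
          # ...and quota, monitoring and SLA logic would follow, again
          # duplicating mechanisms of the per-site IaaS manager.
          return target.boot(flavor, image)

The broker is also a single point of failure, and every request crosses the WAN to reach it, which illustrates the scalability and latency issues mentioned above.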

Upstream first: the vision of the F/E/MDC WG is to revise OpenStack through a bottom-up approach, with the ultimate objective of delivering an OpenStack architecture that can natively cooperate with other instances, giving the illusion of a global cloud. Such an approach [1, 2] should enable the community to mitigate development efforts by reusing as much as possible of the existing and future OpenStack ecosystem.

It is noteworthy that OpenStack already proposes initial mechanisms to deal with WAN-wide deployments [3]. However, it is unclear whether current internal mechanisms enable the management of larger distributed cloud computing platforms (i.e. composed of hundreds of distinct sites). In addition to identifying representative use-cases for massively distributed infrastructures, the first action the WG wants to perform is an analysis, in terms of scalability as well as communication patterns, of the different core services of OpenStack (nova, keystone, horizon, glance, cinder, neutron and swift) in a multi-site context. Although some initiatives have already investigated the massively distributed use-case in an OpenStack context [4, 5, 6], such a rigorous analysis of the vanilla stack is missing.

Such a study would enable the community to identify major challenges and answer questions such as:

  • Scalability of the controller: How many controllers should/could be deployed to supervise the whole infrastructure? In which location(s)? One per site, or one for several sites? How many compute nodes per controller would be necessary?
  • Should we have a single endpoint or multiple endpoints? Why?
  • Wide Area Network limitations (in terms of latency/bandwidth): Are there critical latency constraints that may prevent the correct functioning of core components? Are current services efficient enough to deal with WAN constraints (VM images, …)?
  • Consistency: How can we guarantee the consistency of core-service states? If one project/VM/… is created on one site, the states of the other sites should be consistent to avoid, for instance, double assignment of IDs/IPs/… (see the sketch after this list).
  • Security management: Do Fog/Edge infrastructures create new security issues? How can we ensure the security of communications inside and between the different locations?
  • Fault tolerance issues: How can we revise OpenStack in a way that guarantees that the crash or isolation of one site (or several) does not impact other DCs? (Each site should be able to run independently.)
  • Maintainability: How can we upgrade the system in a consistent way, considering that upgrading the complete infrastructure can take a significant amount of time while facing crashes and disconnections? In other words, we should propose mechanisms that allow OpenStack to behave correctly even when different versions of the core services coexist.
  • Interconnection between multiple vendors (peering agreement challenges, interoperability, …)
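
To make the double-assignment concern of the consistency question concrete, here is a minimal, illustrative Python sketch (not OpenStack code): two sites allocating identifiers from independent local counters immediately conflict, whereas random UUIDs need no cross-site coordination.

  # Illustrative only: site-local sequential IDs collide across sites,
  # while UUID4 identifiers make collisions overwhelmingly unlikely.
  import itertools
  import uuid

  class SiteWithLocalCounter:
      def __init__(self):
          self._counter = itertools.count(1)

      def new_id(self):
          return next(self._counter)

  site_a, site_b = SiteWithLocalCounter(), SiteWithLocalCounter()
  assert site_a.new_id() == site_b.new_id()   # both return 1: a conflict

  assert uuid.uuid4() != uuid.uuid4()         # no coordination required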

Following such a study, it will be possible to propose revisions/extensions and to debate the different approaches.

[1] http://people.rennes.inria.fr/Adrien.Lebre/PUBLIC/MassivelyDistributed-101.pdf
[2] https://etherpad.openstack.org/p/massively-distributed-clouds-overview (Initial massively distributed WG proposal made in Austin).
[3] http://docs.openstack.org/arch-design/multi-site.html
[4] https://www.openstack.org/assets/presentation-media/OpenStack-2016-Austin-D-NFV-vM.pdf
[5] https://wiki.openstack.org/wiki/Tricircle
[6] http://beyondtheclouds.github.io

Mission

Be the recognized forum of expertise for OpenStack deployments over multiple sites and provide advice and inputs to the Architecture WG and the entire OpenStack community. Be a catalyst for actions that deal with massively distributed cloud computing challenges, in particular by identifying cooperation opportunities.

Interactions with other WGs

How to participate

  • Sign up to the openstack-dev mailing list and look for posts with "[Massively distributed]" in the subject
  • Take part in our bi-monthly meetings on IRC (#openstack-distributed); suggest your agenda items and take part in current discussions (meeting times should be defined according to the different timezones of participants).
  • Share particular use-cases or superuser stories
  • Review specs and provide your input
  • Email Adrien Lebre <adrien.lebre@inria.fr> with your suggestions, questions, …
  • Link to IRC meetings: please see https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2016


Planned Actions for Queens cycle (additional actions can be proposed during our IRC meetings)

  • Evaluations of alternative communication solutions (qpid-router, ZMQ) and comparisons with the default RabbitMQ bus (Inria, Orange, Redhat); see the sketch below
  • CockroachDB as a MySQL replacement: a prospective analysis (Inria, Orange); see the sketch below
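
For context on these two actions: OpenStack services reach both the message bus and the database through pluggable layers (oslo.messaging and oslo.db/SQLAlchemy), so these evaluations largely come down to changing a driver URL. A minimal sketch, assuming the corresponding drivers are installed (URL schemes, hosts and credentials below are illustrative placeholders, not tested configurations):

  # oslo.messaging picks its driver from the transport URL, so
  # comparing RabbitMQ with alternatives mostly means changing it.
  from oslo_config import cfg
  import oslo_messaging

  conf = cfg.CONF

  # Default AMQP 0.9 bus (RabbitMQ):
  transport = oslo_messaging.get_transport(
      conf, url='rabbit://user:pass@rabbit-host:5672/')
  # ZeroMQ driver:          url='zmq://zmq-host/'
  # AMQP 1.0 driver (usable with qpid-dispatch-router):
  #                         url='amqp://user:pass@router-host:5672/'

Similarly, since CockroachDB speaks the PostgreSQL wire protocol, a prospective test would swap the SQLAlchemy connection URL rather than the database-access code:

  from sqlalchemy import create_engine

  mysql_engine = create_engine('mysql+pymysql://nova:secret@db-host/nova')
  cockroach_engine = create_engine(
      'postgresql+psycopg2://nova@cockroach-host:26257/nova')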

Achieved Actions

Cross-cycle actions

  • Identifying/studying/discussing proposals and new building blocks to distribute OpenStack across several geographically distant regions.
  • For each OpenStack core service, conducting a thorough evaluation to identify
    • bottlenecks
    • blocking design choices (such as the RabbitMQ/RPC question, ZeroMQ, …)
  • Producing visible results, available to the whole community (wiki pages, summit presentations).
  • Analyzing the pros/cons of on-going actions and identifying cooperation opportunities.

Previous documents

Boston Summit etherpads, May 2017

Meeting agendas for 2016

Barcelona Summit etherpad, Oct 2016:


Initial proposal: