Fog Edge Massively Distributed Clouds

The goal of the Fog/Edge/Massively Distributed Clouds SIG is to guide the OpenStack community to best address fog/edge computing use cases, defined as the supervision and use of a large number of remote mini/micro/nano data centers through a collaborative OpenStack system. The FEMDC SIG advances the topic through debate and investigation of requirements for various implementation options.

Status: active
Contact: Adrien Lebre, Paul-André Raymond

Meetings
Chaired by: Adrien Lebre (ad_rien, France), Paul-André Raymond (b-yond, USA)

Meeting agendas: https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2018

Problem description
More and more academics and industry experts advocate moving from large, centralized Cloud Computing infrastructures to smaller ones massively distributed at the edge of the network. Referred to as “fog/edge computing”, this emerging paradigm is attracting growing interest because it improves overall service agility and brings computing resources closer to end users. However, to favor the adoption of this decentralized model of the Cloud Computing paradigm, it is critical to develop a system in charge of turning a complex and diverse network of resources into a global cloud.

Instead of developing yet another brokering system, the ambition of the Fog/Edge/Massively Distributed Clouds SIG is:
 * to study to what extent the current OpenStack mechanisms can handle such massively distributed infrastructures;
 * to propose revisions/extensions of internal mechanisms when appropriate;
 * to study how current cloud APIs should be extended to take advantage of geo-distribution (latency-aware applications, …).

Brokering and orchestration are the first approaches considered when it comes to operating and using distinct clouds. Each micro DC hosts and supervises its own cloud, and a brokering service is in charge of provisioning resources by picking them on each cloud. While these top-down approaches with a simple centralized broker can be acceptable for basic use cases, advanced brokering services become mandatory to meet the requirements of production environments (monitoring, scheduling, automated provisioning, SLA enforcement, quota management, tenant networks…). In addition to having to deal with scalability, latency and single-point-of-failure issues, brokering services grow more and more complex until they end up integrating most of the mechanisms that are already implemented by the IaaS manager in charge of operating each site.
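As an illustration of the simple centralized broker described above, here is a minimal Python sketch. Everything in it is hypothetical (the `Site` and `CentralBroker` names, the site names, the capacity and latency figures); it does not use any OpenStack API. It only shows the pattern: the broker keeps its own view of every site and picks one for each request.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """One micro DC running its own independent cloud (hypothetical model)."""
    name: str
    free_vcpus: int   # capacity as seen by the broker, not by the site itself
    rtt_ms: float     # assumed round-trip time between the broker and the site


class CentralBroker:
    """Single, centralized broker that picks a site for each request."""

    def __init__(self, sites):
        self.sites = list(sites)

    def provision(self, vcpus):
        """Return the name of the lowest-latency site with enough free vCPUs."""
        candidates = [s for s in self.sites if s.free_vcpus >= vcpus]
        if not candidates:
            raise RuntimeError("no site can host the request")
        chosen = min(candidates, key=lambda s: s.rtt_ms)
        # The broker has to track capacity on its own, duplicating
        # bookkeeping that each site's scheduler already performs.
        chosen.free_vcpus -= vcpus
        return chosen.name


# Illustrative values only.
sites = [Site("paris", 8, 5.0), Site("lyon", 64, 12.0), Site("nantes", 4, 3.0)]
broker = CentralBroker(sites)
first = broker.provision(4)
second = broker.provision(4)
third = broker.provision(16)
```

Even in this toy version the broker duplicates scheduling state that each site already maintains, and every request goes through one process. Those are the complexity and single-point-of-failure concerns raised above.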

Upstream first: the vision of the F/E/MDC SIG is to revise OpenStack through a bottom-up approach with the ultimate objective of delivering an OpenStack architecture that can natively cooperate with other instances, giving the illusion of a global cloud. Such an approach [1, 2] should enable the community to limit development effort by reusing as much as possible of the existing and future OpenStack ecosystem.

It is worth noting that OpenStack already provides initial mechanisms to deal with WAN-wide deployments [3]. However, it is unclear whether the current internal mechanisms can manage larger distributed cloud computing platforms (i.e. composed of hundreds of distinct sites). In addition to identifying representative use cases for massively distributed infrastructures, the first action the SIG wants to perform is an analysis, in terms of scalability and communication patterns, of the different core services of OpenStack (nova, keystone, horizon, glance, cinder, neutron and swift) in a multi-site context. Although some initiatives have already investigated the massively distributed use case in an OpenStack context [4, 5, 6], such a rigorous analysis of the vanilla stack is missing.

Such a study would enable the community to identify major challenges and answer questions such as:
 * Scalability of the controller: How many controllers should/could be deployed to supervise the whole infrastructure? At which location(s)? One per site, or one for several sites? How many compute nodes can each controller handle?
 * Should we have a single endpoint or multiple endpoints? Why?
 * Wide Area Network limitations (in terms of latency/bandwidth): Are there critical latency constraints that may prevent the correct functioning of core components? Are current services efficient enough to deal with WAN constraints (VM images, …)?
 * Consistency: How can we guarantee the consistency of core-service states? If a project/VM/… is created on one site, the states of the other sites should remain consistent, to avoid for instance double assignment of IDs/IPs/…
 * Security management: Do Fog/Edge infrastructures create new security issues? How can we ensure the security of communications inside and between the different locations?
 * Fault tolerance issues: How can we revise OpenStack in a way that guarantees that the crash or the isolation of one (or several) sites does not impact other DCs? (Each site should be able to run independently.)
 * Maintainability: How can we upgrade the system in a consistent way, considering that upgrading the complete infrastructure can take a significant amount of time while facing crash and disconnection issues? In other words, we should propose mechanisms that allow OpenStack to behave correctly even when sites run different versions of the core services.
 * Interconnection between multiple vendors (peering agreement challenges, interoperability…)

Following such a study, it will be possible to propose revisions/extensions and debate on the different approaches.
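To make the consistency question concrete, the following Python sketch shows one possible way to avoid double assignment of IPs without any cross-site coordination: statically partitioning a shared pool so that each site allocates from its own disjoint subnet. This is an assumption chosen for illustration, not an approach endorsed by the SIG; the `partition_pool` and `SiteAllocator` names, the site names and the address range are all hypothetical. Only the standard `ipaddress` module is used.

```python
import ipaddress


def partition_pool(pool_cidr, site_names, new_prefix):
    """Give each site its own disjoint subnet carved out of a shared pool."""
    pool = ipaddress.ip_network(pool_cidr)
    available = 2 ** (new_prefix - pool.prefixlen)
    if len(site_names) > available:
        raise ValueError("not enough subnets for the requested sites")
    return dict(zip(site_names, pool.subnets(new_prefix=new_prefix)))


class SiteAllocator:
    """Hands out addresses from one site's subnet using only local state."""

    def __init__(self, subnet):
        self._hosts = iter(subnet.hosts())

    def allocate(self):
        # Raises StopIteration once the site's subnet is exhausted.
        return next(self._hosts)


# Illustrative values only.
plan = partition_pool("10.0.0.0/16", ["paris", "lyon"], 24)
paris = SiteAllocator(plan["paris"])
lyon = SiteAllocator(plan["lyon"])
paris_ip = paris.allocate()
lyon_ip = lyon.allocate()
```

Static partitioning keeps sites independent, which also fits the fault-tolerance goal that each site can run on its own. The cost is that addresses left unused at one site cannot be borrowed by another, which is the kind of trade-off the study is meant to evaluate.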

[1] http://people.rennes.inria.fr/Adrien.Lebre/PUBLIC/MassivelyDistributed-101.pdf

[2] https://etherpad.openstack.org/p/massively-distributed-clouds-overview (Initial massively distributed WG proposal made in Austin).

[3] http://docs.openstack.org/arch-design/multi-site.html

[4] https://www.openstack.org/assets/presentation-media/OpenStack-2016-Austin-D-NFV-vM.pdf

[5] https://wiki.openstack.org/wiki/Tricircle

[6] http://beyondtheclouds.github.io

Mission
Be the recognized forum of expertise for OpenStack deployments over multiple sites and provide advice and input to the OpenStack community. Be a catalyst for actions that deal with massively distributed cloud computing challenges, in particular by identifying cooperation opportunities.

Interactions with other Groups

 * Edge working session: https://etherpad.openstack.org/p/2017_edge_computing_working_sessions
 * OpenStack Edge Discussions Dublin PTG


 * Performance WG: https://wiki.openstack.org/wiki/Performance_Team
 * LDT WG: https://wiki.openstack.org/wiki/Large_Deployment_Team
 * We used to collaborate with the NFV Telcos (https://etherpad.openstack.org/p/ops-telco-nfv-meeting-agenda) and Meghdwar (https://wiki.openstack.org/wiki/Meghdwar) WGs.

How to participate

 * Sign up to the openstack-dev mailing list and look for posts with "[FEMDC]" in the subject
 * Take part in our bi-monthly IRC meetings on Wednesdays at 15:00 UTC in #openstack-meeting (suggest your agenda items and take part in current discussions)
 * Share particular use-cases or superuser stories
 * Review specs and provide your input
 * Email Adrien Lebre with your suggestions, questions, …
 * IRC meeting agendas: see https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2018 (go to the bottom to see/complete the next agenda)

Planned Actions for the Queens cycle
(additional actions can be proposed during our IRC meetings)
 * Evaluation of alternative communication solutions (qpid-router, ZMQ) and comparison with the default RabbitMQ bus (Inria, Orange, Red Hat)
 * CockroachDB as a MySQL replacement: a prospective analysis (Inria, Orange)
 * Use-cases analysis / Fog/Edge requirements (B-Yond, Inria, FBK)

Achieved Actions

 * Pike
   * Consolidation of the EnOS framework: http://enos.readthedocs.io/en/latest/
   * First evaluations of OpenStack WANWide: https://www.openstack.org/videos/boston-2017/toward-fog-edge-and-nfv-deployments-evaluating-openstack-wanwide
 * Ocata
   * Identification of a set of scenarios in which a distributed cloud is required.
   * Development of the EnOS framework: http://enos.readthedocs.io/en/latest/
   * Evaluation of OpenStack scalability (collaboration with the Performance team): https://www.openstack.org/videos/barcelona-2016/chasing-1000-nodes-scale
 * Newton
   * A PoC of Nova on top of Redis: https://www.openstack.org/videos/austin-2016/a-ring-to-rule-them-all-revising-openstack-internals-to-operate-massively-distributed-clouds

Cross-cycle actions

 * Identifying/studying/discussing proposals and new building blocks to distribute OpenStack across several geographically distant regions.
 * For each OpenStack core service, conducting a rigorous evaluation to identify:
   * bottlenecks
   * blocking design choices (such as the rabbit+rpc question, ZeroMQ, …)
 * Producing visible results available to the whole community (wiki pages, summit presentations).
 * Analyzing the pros/cons of ongoing actions and identifying cooperation opportunities.

Previous documents
Sydney, Oct 2017:
 * Sydney Face-to-face meeting: https://etherpad.openstack.org/p/FEMDC-F2F-meeting-sydney-summit

Boston Summit etherpads, May 2017:
 * BoF: https://etherpad.openstack.org/p/BOS-Fog-Edge-MassivelyDistributed-BoF
 * F2F Meeting: https://etherpad.openstack.org/p/Massively_distributed_wg_boston_summit

Meeting agendas/minutes for previous years:
 * https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017
 * https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2016

Barcelona Summit etherpad, Oct 2016:
 * https://etherpad.openstack.org/p/massively_distribute-barcelona_working_sessions

Initial proposal:
 * https://etherpad.openstack.org/p/massively-distributed_WG_description