
Massively Distributed Clouds


The goal of the massively distributed group is to debate and investigate how OpenStack can address Fog/Edge Computing use-cases (i.e. the supervision and use of a large number of remote data centers through a single distributed OpenStack system).

Status: active
Contact: Adrien Lebre <adrien.lebre@inria.fr>

Meetings

Chaired by: Adrien Lebre (ad_rien) or Anthony Simonet (menthos)

Meeting agendas: https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017

Problem description

More and more academics and industry experts are advocating a shift from large, centralized Cloud Computing infrastructures to smaller ones massively distributed at the edge of the network. Referred to as “fog/edge computing”, this emerging paradigm is attracting growing interest because it improves overall service agility in addition to bringing computing resources closer to end users. However, to favor the adoption of this decentralized model of Cloud Computing, it is critical to develop a system capable of turning a complex and diverse network of resources into a global Cloud.

Instead of developing yet another brokering system, the ambition of the Massively Distributed WG is:

  • to study to what extent the current OpenStack mechanisms can handle such massively distributed infrastructures;
  • to propose revisions/extensions of internal mechanisms when appropriate;
  • to study how current cloud APIs should be extended to take advantage of geo-distribution (latency-aware applications, …).

Brokering/orchestration of clouds is the first approach considered when it comes to operating and using distinct clouds. Each micro DC hosts and supervises its own cloud, and a brokering service is in charge of provisioning resources by picking them from each cloud. While such top-down approaches with a simple centralized broker can be acceptable for basic use cases, advanced brokering services become mandatory to meet the requirements of production environments (monitoring, scheduling, automated provisioning, SLA enforcement, quota management, tenant networks…). In addition to facing scalability, latency and single-point-of-failure issues, brokering services grow increasingly complex, until they end up integrating most of the mechanisms already implemented by the IaaS manager operating each site.
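To make the top-down model concrete, here is a minimal sketch of the "simple centralized broker" case, assuming a hypothetical model where each site only exposes a free vCPU count. The `Site` class, `broker_provision` function and site names are illustrative inventions, not an OpenStack API. Note how even this toy broker must keep its own view of each site's capacity, i.e. it already starts duplicating scheduling logic that each site's IaaS manager implements.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Hypothetical model of one micro DC running its own independent cloud."""
    name: str
    free_vcpus: int


def broker_provision(sites: list[Site], vcpus: int) -> str:
    """Naive centralized broker: place the request on the site with the most free vCPUs."""
    candidates = [s for s in sites if s.free_vcpus >= vcpus]
    if not candidates:
        raise RuntimeError("no site can host the request")
    # max() returns the first site among those tied for the most free vCPUs
    chosen = max(candidates, key=lambda s: s.free_vcpus)
    # The broker keeps its own copy of each site's capacity, duplicating
    # bookkeeping that each site's IaaS manager already does.
    chosen.free_vcpus -= vcpus
    return chosen.name


sites = [Site("site-a", 8), Site("site-b", 16)]
print([broker_provision(sites, 4) for _ in range(3)])  # ['site-b', 'site-b', 'site-a']
```

Every production requirement listed above (quotas, SLAs, monitoring, tenant networks) would need a similar re-implementation inside the broker, which is the complexity the paragraph warns about.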

Upstream first: the vision of the Massively Distributed WG is to revise OpenStack through a bottom-up approach, with the ultimate objective of delivering an OpenStack architecture that can natively cooperate with other instances, giving the illusion of a global cloud. Such an approach [1, 2] should enable the community to reduce development effort by reusing as much as possible of the existing and future OpenStack ecosystem.

It is worth noting that OpenStack already provides initial mechanisms to deal with WAN-wide deployments [3]. However, it is unclear whether the current internal mechanisms enable the management of larger distributed cloud computing platforms (i.e. composed of hundreds of distinct sites). In addition to identifying representative use-cases for massively distributed infrastructures, the first action of the WG is to analyze the scalability and communication patterns of the core OpenStack services (nova, keystone, horizon, glance, cinder, neutron and swift) in a multi-site context. Although some initiatives have already investigated the massively distributed use-case in an OpenStack context [4, 5, 6], such a rigorous analysis of the vanilla stack is still missing.

Such a study would enable the community to identify major challenges and answer questions such as:

  • Scalability of the controller: How many controllers should/could be deployed to supervise the whole infrastructure? At which location(s)? One per site, or one for several sites? How many compute nodes per controller would be necessary?
  • Should we have a single endpoint or multiple endpoints? Why?
  • Wide Area Network limitations (in terms of latency/bandwidth): Are there critical latency constraints that may prevent core components from functioning correctly? Are current services efficient enough to deal with WAN constraints (VM images, …)?
  • Consistency: How can we guarantee the consistency of core-service states? If one project/VM/… is created on one site, the states of the other sites should be consistent, for instance to avoid double assignment of IDs/IPs/…
  • Security management: Do Fog/Edge infrastructures create new security issues? How can we secure communications inside and between the different locations?
  • Fault tolerance: How can we revise OpenStack in a way that guarantees that the crash or isolation of one site (or several sites) does not impact other DCs? (Each site should be able to run independently.)
  • Maintainability: How can we upgrade the system in a consistent way, considering that upgrading the complete infrastructure can take a significant amount of time while facing crashes and disconnections? In other words, we should propose mechanisms that allow OpenStack to behave correctly even when different versions of the core services coexist.
  • Interconnection between multiple vendors (peering agreement challenges, interoperability…)
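The double-assignment risk raised in the Consistency question can be illustrated with a small sketch. It assumes a hypothetical per-site allocator (`SiteAllocator`, not an OpenStack/Neutron API) that only sees local state, and uses the standard `ipaddress` module. The second function shows one possible mitigation, static partitioning of the pool per site; this is an illustrative assumption, not a WG recommendation.

```python
import ipaddress


class SiteAllocator:
    """Hypothetical IP allocator for one site; it only sees its own local state."""

    def __init__(self, cidr: str) -> None:
        # hosts() yields usable addresses in order (network/broadcast excluded)
        self._hosts = ipaddress.ip_network(cidr).hosts()

    def allocate(self) -> str:
        return str(next(self._hosts))


def shared_pool_demo() -> tuple[str, str]:
    """Two sites allocate from the same pool without coordinating."""
    site_a = SiteAllocator("10.0.0.0/24")
    site_b = SiteAllocator("10.0.0.0/24")
    return site_a.allocate(), site_b.allocate()


def partitioned_pool_demo() -> tuple[str, str]:
    """Each site owns a disjoint slice of the pool, so no cross-site lookup is needed."""
    sub_a, sub_b = ipaddress.ip_network("10.0.0.0/24").subnets(new_prefix=25)
    return SiteAllocator(str(sub_a)).allocate(), SiteAllocator(str(sub_b)).allocate()


print(shared_pool_demo())       # ('10.0.0.1', '10.0.0.1'): double assignment
print(partitioned_pool_demo())  # ('10.0.0.1', '10.0.0.129')
```

Static partitioning avoids coordination but trades away flexibility (one site can exhaust its slice while another has spare addresses); weighing such trade-offs is part of what the study above is meant to do.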

Following such a study, it will be possible to propose revisions/extensions and to debate the different approaches.

[1] http://people.rennes.inria.fr/Adrien.Lebre/PUBLIC/MassivelyDistributed-101.pdf
[2] https://etherpad.openstack.org/p/massively-distributed-clouds-overview (Initial massively distributed WG proposal made in Austin).
[3] http://docs.openstack.org/arch-design/multi-site.html
[4] https://www.openstack.org/assets/presentation-media/OpenStack-2016-Austin-D-NFV-vM.pdf
[5] https://wiki.openstack.org/wiki/Tricircle
[6] http://beyondtheclouds.github.io

Mission

Be the recognized forum of expertise for OpenStack deployments over multiple sites and provide advice and input to the Architecture WG and the entire OpenStack community. Be a catalyst for actions that deal with massively distributed cloud computing challenges, in particular by identifying cooperation opportunities.

Interactions with other WGs

How to participate

  • Sign up to the openstack-dev mailing list and look for posts with "[Massively distributed]" in the subject
  • Take part in our bi-monthly IRC meetings on #openstack-distributed: suggest agenda items and join current discussions (the meeting time should be set according to participants' time zones).
  • Share particular use-cases or superuser stories
  • Review specs and provide your input
  • Email Adrien Lebre <adrien.lebre@inria.fr> with your suggestions, questions, …
  • IRC meeting agendas: see https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2016


Planned Actions for the Ocata cycle (proposals, to be discussed/confirmed during the next summit)

  • Identify a set of scenarios in which a distributed cloud is required, and for each one the level of distribution at which it is optimal.
  • Start by studying/discussing the mechanisms already available to distribute OpenStack across several geographically distant regions.
  • Conduct a thorough evaluation to identify
    • bottlenecks
    • blocking design choices (such as the RabbitMQ+RPC question, ZeroMQ, …)
  • Produce visible results available to the whole community (wiki pages, summit presentations).
  • Analyze the pros/cons of ongoing actions and identify cooperation opportunities.

Previous documents

Etherpad for Barcelona's sessions: https://etherpad.openstack.org/p/massively_distribute-barcelona_working_sessions

Initial proposal: [1]

Meeting agendas for 2016: https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2016