
Difference between revisions of "Self-healing SIG"

(Upcoming events)
** [https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21620/mistral-project-update Mistral - Project Update]
** [https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21638/mistral-project-onboarding Mistral - Project Onboarding]
** [https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21594/monasca-project-update Monasca - Project Update]
** [https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21639/monasca-project-onboarding Monasca - Project Onboarding]
* Additionally, brainstorming for topics for the Vancouver Forum has started:
** https://etherpad.openstack.org/p/YVR-self-healing-brainstorming

Revision as of 13:09, 18 May 2018

Self-healing SIG

Status: Formed

Original proposal: http://lists.openstack.org/pipermail/openstack-sigs/2017-September/000054.html


This SIG aims to coordinate the use and development of several OpenStack projects which can be combined in various ways to manage OpenStack infrastructure in a policy-driven fashion, reacting to failures and other events by automatically healing services.


One of the biggest promises of the cloud vision was the idea that all the infrastructure could be managed in a policy-driven fashion, reacting to failures and other events by automatically healing and optimising services.  Most of the components required to implement such an architecture already exist within OpenStack:

However, there is not yet a clear strategy within the community for how these should all tie together. This SIG aims to address that.


The original proposal defined the SIG's scope as self-healing of cloud infrastructure, so for now it is primarily of interest to developers and operators, not end users. However, it is also possible that in the future we will extend the scope to self-healing of cloud applications (e.g. see https://www.openstack.org/videos/barcelona-2016/building-self-healing-applications-with-aodh-zaqar-and-mistral), in which case end users could also find the SIG useful.

The scope could encompass not only self-healing of failures and service degradations, but also automatic optimization such as that performed by Watcher. However, this would raise the issue that the name "self-healing" is not ideal: "healing" implies that something is sick or broken, whereas optimization occurs even when the cloud is perfectly healthy. At the Sydney Forum session it was decided to be pragmatic and start small by focusing on hard failures; optimization can easily be introduced later if required.



  • Developers working on the OpenStack projects listed above
  • Architects responsible for designing OpenStack deployments
  • Operators responsible for deploying and managing OpenStack

As the scope increases in the future, we may also want to include:

  • Architects responsible for designing applications which run on OpenStack clouds
  • Developers responsible for developing applications which run on OpenStack clouds
  • End users of applications which run on OpenStack clouds

SIG Leads

Community Infrastructure

  • Wiki: this page
  • openstack-sigs mailing list; use the [self-healing] tag
  • StoryBoard project
  • IRC channel: #openstack-self-healing on Freenode IRC
  • IRC meetings: TBD, in #openstack-self-healing; agenda and details to be linked from the SIG page and the meetings list

Upcoming events

Past events

Project contacts

As a small measure of protection against email crawlers, emails are kept at https://ethercalc.openstack.org/docID where docID is e6retozlgrf8

Project              Contact               IRC handle
Ansible (OpenStack)  Jean-Philippe Evrard  evrardjp
Congress             Eric Kao              ekcs
Fault Genes WG
Freezer-DR           Saad Zaher            szaher
Heat                 Rico Lin
Masakari             Adam Spiers           aspiers
Mistral              Dougal Matthews       d0ugal
Monasca              Witold Bedyk          witek
Neutron              Yushiro Furukawa
OPNFV                Georg Kunz            georgk
OPNFV Doctor
Senlin               Qi Ming Teng          Qiming
Senlin               XueFeng Liu           XueFeng
Senlin               Yuanbin Chen          chenyb4
TripleO              Michele Baldessari    bandini
TripleO              Damien Ciabrini
Vitrage              Ifat Afek             ifat_afek
Watcher              Alexander Chadin      alexchadin


The idea for the SIG was born out of long-standing efforts to unify the OpenStack HA community around a single solution for instance HA, coupled with the realisation that this was just one of many self-healing use cases required in order for OpenStack infrastructure to be robust and performant.

The first meeting took place at the Denver PTG and was minuted in this etherpad. The SIG was formally proposed as a result of this meeting.

A Sydney Forum session was proposed, accepted, and took place, after which the SIG was officially formed.

A longer description of the history is in this blog post.