
Revision as of 00:50, 23 May 2018


Self-healing SIG

Status: Formed

Original proposal: http://lists.openstack.org/pipermail/openstack-sigs/2017-September/000054.html

Mission

This SIG aims to coordinate the use and development of several OpenStack projects which can be combined in various ways to manage OpenStack infrastructure in a policy-driven fashion, reacting to failures and other events by automatically healing services.

Background

One of the biggest promises of the cloud vision was the idea that all the infrastructure could be managed in a policy-driven fashion, reacting to failures and other events by automatically healing and optimising services. Most of the components required to implement such an architecture already exist within OpenStack:

HA of individual services
Monasca: monitoring
Aodh: alarming
Congress: policy-based governance
Mistral: workflow
Senlin: clustering service
Vitrage: root cause analysis
Watcher: optimization
Masakari: compute plane HA
Freezer-dr: compute plane HA
Heat: orchestration (normally used for cloud applications, but can also deploy cloud infrastructure via TripleO)
Doctor: fault management and maintenance for NFV
Fault Genes Working Group: Fault classification & Recovery Strategy
Craton: Fleet management
Kolla: containerized OpenStack deployment tool
kolla-k8s: same as above, but deploying into a Kubernetes cluster

However, there is not yet a clear strategy within the community for how these should all tie together. This SIG aims to address that.

Scope

The original proposal defined the SIG's scope as self-healing of cloud infrastructure, so for now it is primarily of interest to developers and operators, not end users. However, the scope may later be extended to self-healing of cloud applications (e.g. see https://www.openstack.org/videos/barcelona-2016/building-self-healing-applications-with-aodh-zaqar-and-mistral), in which case end users could also find the SIG useful.

The scope could encompass not only self-healing of failures and service degradations, but also automatic optimization such as that performed by Watcher. However, this would make the name "self-healing" an imperfect fit: "healing" implies that something is sick or broken, whereas optimization occurs even when the cloud is perfectly healthy. At the Sydney Forum session it was decided to be pragmatic and start small by focusing on hard failures. Optimization can easily be introduced later if required.

Goals

Document reference stacks describing what use cases can already be addressed with the existing projects. (Even better if some of these stacks have already been tested in the wild.)
Document what integrations between the projects already exist at a technical level.
Collect real-world use cases from operators, including ones which they would like to accomplish but cannot yet.
From the collected use cases, perform gap analysis to help shape the future direction of these projects, e.g. through specs targeting those gaps.
Perform overlap analysis to help ensure that the projects are correctly scoped and integrate well without duplicating any significant effort.
Ensure that operators and developers are connecting on this topic on a regular basis, so that project development is steered in directions which will meet real-world requirements.

Audience

Developers working on the OpenStack projects listed above
Architects responsible for designing OpenStack deployments
Operators responsible for deploying and managing OpenStack

As the scope increases in the future, we may also want to include:

Architects responsible for designing applications which run on OpenStack clouds
Developers responsible for developing applications which run on OpenStack clouds
End users of applications which run on OpenStack clouds

SIG Leads

Adam Spiers
Co-lead: Eric Kao

Community Infrastructure

Wiki: this page
openstack-sigs mailing list; use the [self-healing] tag
StoryBoard project
IRC channel: #openstack-self-healing on Freenode IRC
IRC meetings: time TBD, in #openstack-self-healing; the agenda and details will be linked from this page and from the meetings list

Upcoming events

There is a BoF session scheduled for the Vancouver Forum on Thursday, May 24, 1:50pm-2:30pm. Topics are being captured in the YVR-self-healing-brainstorming etherpad.

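Several other sessions relating to self-healing have been accepted for the Vancouver summit:

Cloud Monitoring with Vitrage – Hands-On Lab
Vitrage - Project Update
Closing the Loop: VNF end-to-end Failure Detection and Auto Healing
Proactive Root Cause Analysis with Vitrage, Kubernetes, Zabbix and Prometheus
Vitrage - Project Onboarding
Masakari - Project Update
Masakari - Project Onboarding
Congress - Project Update
Mistral - Project Update
Mistral - Project Onboarding
Monasca - Project Update
Monasca - Project Onboarding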

Past events

Project contacts

As a small measure of protection against email crawlers, email addresses are kept in an EtherCalc sheet at https://ethercalc.openstack.org/docID, where docID is e6retozlgrf8.

Project	Contact	IRC Handle
Ansible (OpenStack)	Jean-Philippe Evrard	evrardjp
Aodh
Cinder
Congress	Eric Kao	ekcs
Craton
Fault Genes WG
Freezer-DR	Saad Zaher	szaher
Heat	Rico Lin
Kolla
Masakari	Adam Spiers	aspiers
Mistral	Dougal Matthews	d0ugal
Monasca	Witold Bedyk	witek
Neutron	Yushiro Furukawa
Nova
OPNFV	Georg Kunz	georgk
OPNFV Doctor
Senlin	Qi Ming Teng	Qiming
Senlin	XueFeng Liu	XueFeng
Senlin	Yuanbin Chen	chenyb4
TripleO	Michele Baldessari	bandini
TripleO	Damien Ciabrini
Vitrage	Ifat Afek	ifat_afek
Watcher	Alexander Chadin	alexchadin

History

The idea for the SIG was born out of long-standing efforts to unify the OpenStack HA community around a single solution for instance HA, coupled with the realisation that this was just one of many self-healing use cases required in order for OpenStack infrastructure to be robust and performant.

The first meeting happened at the Denver PTG, and was minuted in this etherpad. The SIG was formally proposed as a result of this meeting.

A Sydney Forum session was proposed, accepted, and took place, after which the SIG was officially formed.

A longer description of the history is in this blog post.

Retrieved from "https://wiki.openstack.org/w/index.php?title=Self-healing_SIG&oldid=161411"
