
Rubick

Revision as of 09:51, 23 October 2013 by Oleg Gelbukh (Add link to rule engine description page)

Project Name

Official: OpenStack Diagnostics

Codename: Rubick

Overview

The typical OpenStack cloud life cycle consists of 2 phases:

  • initial deployment and
  • operation maintenance


OpenStack cloud operators usually rely on deployment tools to configure all the platform components correctly and efficiently in the initial deployment phase. Multiple OpenStack projects cover that area: TripleO/Tuskar, Fuel and Devstack, to name a few.

However, once you have installed and kicked off the cloud, platform configurations and operational conditions begin to change. These changes can break the consistency and integration of cloud platform components. Keeping the cloud up and running is the essence of the operation maintenance phase.

A cloud operator must quickly and efficiently identify the root cause of such failures and respond to it. To do so, the operator must check whether the OpenStack configuration is sane and consistent. These checks can be thought of as the rules of a diagnostic system.

There are not many projects in the OpenStack ecosystem that aim to increase the reliability and resilience of the cloud at the operation stage. With this proposal we want to introduce a project that will help operators diagnose their OpenStack platform, reduce response time to known and unknown failures, and effectively support the desired SLA.

Mission

The mission of Diagnostics is to provide OpenStack cloud operators with tools that minimize the time and effort needed to identify and fix errors in the operation maintenance phase of the cloud life cycle.

User Stories

  • As a cloud operator, I want to make sure that my OpenStack architecture and configuration are sane and consistent across all platform components and services.
  • As a cloud architect, I want to make sure that my OpenStack architecture and configuration are compliant with best practices.
  • As a cloud architect, I need a knowledge base of troubleshooting scenarios and best practices for my OpenStack cloud which I can reuse and update with my own scenarios and practices.
  • As a cloud operator, I want to be able to automatically extract configuration parameters from all OpenStack components to verify their correctness, consistency and integrity.
  • As a cloud operator, I want an automatic diagnostics tool that can inspect the configuration of my OpenStack cloud and report whether it is sane and/or compliant with community-defined best practices.
  • As a cloud operator, I want to be able to define rules used to inspect and verify the configuration of OpenStack components, and to store them for verifying future configuration changes.
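
To make the consistency stories above more concrete, here is a minimal Python sketch of what one such check might look like. It is not Rubick's actual rule engine: the function names, the `[messaging]` section and the `broker_host` option are all hypothetical, and the sketch assumes the configuration files have already been collected as INI-style text.

```python
import configparser


def parse_config(text):
    """Parse INI-style configuration text into a ConfigParser object."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return parser


def check_same_value(configs, section, option):
    """Hypothetical consistency rule: all components must agree on one option.

    `configs` maps a component name to its parsed configuration.
    Returns a list of findings; an empty list means the rule passed.
    """
    values = {
        name: parser.get(section, option, fallback=None)
        for name, parser in configs.items()
    }
    if len(set(values.values())) <= 1:
        return []
    # Report every component's value so the operator can spot the outlier.
    return [
        "%s: [%s] %s = %r" % (name, section, option, value)
        for name, value in sorted(values.items())
    ]
```

Calling `check_same_value` with the parsed configurations of two components returns one finding per component when their values differ, and an empty list when they agree.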

Roadmap

Proof of concept implementation

Targeted for the end of October 2013. PoC implementation scope includes:

  1. Open source code in stackforge repository
  2. Standalone service with REST API v0.1
  3. Simple SSH-based configuration data extraction
  4. Rules engine with grammatical analysis
  5. Basic healthcheck ruleset v0.1 with example rules of different types
  6. Filesystem-based ruleset store
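
As an illustration of how a filesystem-based ruleset store and a simple rule check could fit together, the following Python sketch loads rules from a directory and applies them to extracted configuration text. The on-disk format (one JSON file per rule with `name`, `section`, `option` and `expected` keys) and both function names are assumptions made for this example; the page does not specify the format the PoC uses.

```python
import configparser
import json
from pathlib import Path


def load_ruleset(directory):
    """Load every *.json file in `directory` as one rule (assumed format)."""
    rules = []
    for path in sorted(Path(directory).glob("*.json")):
        with path.open() as handle:
            rules.append(json.load(handle))
    return rules


def apply_ruleset(rules, config_text):
    """Return the names of rules whose expected value is not met.

    `config_text` is INI-style configuration text, for example the
    contents of a file fetched from a node over SSH.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    failed = []
    for rule in rules:
        actual = parser.get(rule["section"], rule["option"], fallback=None)
        if actual != rule["expected"]:
            failed.append(rule["name"])
    return failed
```

Keeping each rule in its own file means operators could add, review and version rules with ordinary filesystem and source-control tools, which is one plausible reason to start with a filesystem store before adding other back-ends.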

MVP1 implementation

Targeted for mid-November 2013. MVP1 implementation scope includes:

  1. Basic integration with OpenStack Deployment program projects (Tuskar, TripleO)
  2. Extraction of configuration data from Heat metadata
  3. Extended ruleset with example best practices
  4. Healthcheck ruleset v1.0
  5. Ruleset store back-ends

Links

  1. Source code on GitHub: https://github.com/MirantisLabs/rubick
  2. Launchpad project: https://launchpad.net/Rubick
  3. Service architecture: Rubick/Service architecture
  4. OpenStack integration use cases: Rubick/OpenStack Integration
  5. Rule engine description: Rubick/Rules engine