Official: OpenStack Diagnostics
The typical OpenStack cloud life cycle consists of two phases:
- initial deployment and
- operation maintenance
OpenStack cloud operators usually rely on deployment tools to configure all platform components correctly and efficiently in the initial deployment phase. Multiple OpenStack projects cover that area: TripleO/Tuskar, Fuel and Devstack, to name a few.
However, once the cloud is installed and running, platform configurations and operational conditions begin to change. These changes can break the consistency and integration of cloud platform components. Keeping the cloud up and running is the essence of the operation maintenance phase.
A cloud operator must quickly and efficiently identify the root cause of such failures and respond to it. To do so, the operator must check whether the OpenStack configuration is sane and consistent. These checks can be thought of as the rules of a diagnostic system.
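To make the idea of "checks as rules" concrete, here is a minimal sketch in Python. It is not the project's rules engine: the `Rule` class, the `run_rules` helper, and the example rule comparing an `auth_host` option between two components are all hypothetical names chosen for illustration, and the configuration is assumed to be already available as nested dictionaries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed in-memory shape: component name -> {option name: value}
Config = Dict[str, Dict[str, str]]


@dataclass
class Rule:
    """A named check over the whole configuration (hypothetical structure)."""
    name: str
    check: Callable[[Config], bool]  # returns True when the configuration passes
    message: str                     # reported when the check fails


def run_rules(config: Config, rules: List[Rule]) -> List[str]:
    """Return one message per rule that the configuration violates."""
    return [f"{rule.name}: {rule.message}" for rule in rules if not rule.check(config)]


# Illustrative rule: two services should point at the same identity host.
same_auth_host = Rule(
    name="same_auth_host",
    check=lambda cfg: cfg["nova"].get("auth_host") == cfg["glance"].get("auth_host"),
    message="nova and glance use different auth_host values",
)
```

Running `run_rules(config, [same_auth_host])` yields an empty list for a consistent configuration and a list of messages otherwise, which is roughly the kind of report an operator would want from a diagnostic pass.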
There are not many projects in the OpenStack ecosystem that aim to increase the reliability and resilience of the cloud at the operation stage. With this proposal we want to introduce a project that will help operators diagnose their OpenStack platform, reduce response time to known and unknown failures, and effectively support the desired SLA.
Diagnostics' mission is to provide OpenStack cloud operators with tools that minimize the time and effort needed to identify and fix errors in the operation maintenance phase of the cloud life cycle.
- As a cloud operator, I want to make sure that my OpenStack architecture and configuration are sane and consistent across all platform components and services.
- As a cloud architect, I want to make sure that my OpenStack architecture and configuration are compliant with best practices.
- As a cloud architect, I need a knowledge base of troubleshooting scenarios and best practices for my OpenStack cloud which I can reuse and update with my own scenarios and practices.
- As a cloud operator, I want to be able to automatically extract configuration parameters from all OpenStack components to verify their correctness, consistency and integrity.
- As a cloud operator, I want an automatic diagnostics tool that can inspect the configuration of my OpenStack cloud and report whether it is sane and/or compliant with community-defined best practices.
- As a cloud operator, I want to be able to define rules used to inspect and verify the configuration of OpenStack components and store them for verifying future configuration changes.
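The user stories about extracting configuration parameters and verifying their consistency can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard `configparser` to read INI-style text, the helper names `parse_ini`, `collect_values` and `is_consistent` are hypothetical, and the configuration text is assumed to have been fetched from each component already.

```python
import configparser
from typing import Dict


def parse_ini(text: str) -> configparser.ConfigParser:
    """Parse INI-style configuration text without value interpolation."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(text)
    return parser


def collect_values(configs: Dict[str, str], section: str, option: str) -> Dict[str, str]:
    """Map each component to its value of section/option, skipping components that lack it."""
    values = {}
    for component, text in configs.items():
        parser = parse_ini(text)
        if parser.has_option(section, option):
            values[component] = parser.get(section, option)
    return values


def is_consistent(values: Dict[str, str]) -> bool:
    """True when every component that sets the option uses the same value."""
    return len(set(values.values())) <= 1
```

For example, passing the text of two components' configuration files to `collect_values(configs, "keystone_authtoken", "auth_host")` returns the per-component values, and `is_consistent` then tells whether they agree. The section and option names here are placeholders for whatever parameter a rule needs to compare.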
Proof of concept implementation
Targeted for the end of October 2013. The PoC implementation scope includes:
- Open source code in stackforge repository
- Standalone service with REST API v0.1
- Simple SSH-based configuration data extraction
- Rules engine with grammatical analysis
- Basic healthcheck ruleset v0.1 with example rules of different types
- Filesystem-based ruleset store
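As one possible reading of the "filesystem-based ruleset store" item, the sketch below keeps each rule as a JSON file in a directory. The file layout, the `save_rule` and `load_ruleset` helpers, and the fields of a rule are assumptions made for illustration; the proposal does not specify a storage format.

```python
import json
from pathlib import Path
from typing import Dict, List


def save_rule(store_dir: Path, rule: Dict) -> Path:
    """Write one rule to <store_dir>/<rule name>.json and return the file path."""
    store_dir.mkdir(parents=True, exist_ok=True)
    path = store_dir / f"{rule['name']}.json"
    path.write_text(json.dumps(rule, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_ruleset(store_dir: Path) -> List[Dict]:
    """Load every *.json rule in the directory, ordered by file name."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(store_dir.glob("*.json"))
    ]
```

Keeping one rule per file makes a ruleset easy to version, review and share, which fits the idea of a reusable knowledge base; other store back-ends could expose the same two operations.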
Targeted for mid-November 2013. The MVP1 implementation scope includes:
- Basic integration with OpenStack Deployment program projects (Tuskar, TripleO)
- Extraction of configuration data from Heat metadata
- Extended ruleset with example best practices
- Healthcheck ruleset v1.0
- Ruleset store back-ends