The typical OpenStack cloud life cycle consists of 2 phases:
- initial deployment and
- operation maintenance
OpenStack cloud operators usually rely on deployment tools to configure all the platform components correctly and efficiently in the initial deployment phase. Several OpenStack projects cover that area: TripleO/Tuskar, Fuel, and DevStack, to name a few.
However, once the cloud is installed and running, platform configurations and operational conditions begin to change. These changes can break the consistency and integration of cloud platform components. Keeping the cloud up and running is the essence of the operation maintenance phase.
A cloud operator must quickly and efficiently identify the root cause of such failures and respond to it. To do so, the operator must check whether the OpenStack configuration is sane and consistent. These checks can be thought of as the rules of a diagnostic system.
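To make the idea of checks as rules concrete, here is a minimal sketch in Python. It is not Rubick's actual implementation: the `Rule` type, the `run_rules` helper, and the choice of the `auth_strategy` option are assumptions made only for illustration. Each rule inspects a parsed configuration and returns a failure message, or `None` if the check passes.

```python
import configparser
from typing import Callable, List, NamedTuple, Optional


class Rule(NamedTuple):
    """Hypothetical rule type: a name plus a check over a parsed config."""
    name: str
    check: Callable[[configparser.ConfigParser], Optional[str]]


def auth_strategy_is_keystone(cfg: configparser.ConfigParser) -> Optional[str]:
    # Illustrative check only: the option name is an assumption.
    value = cfg.get("DEFAULT", "auth_strategy", fallback=None)
    if value != "keystone":
        return "auth_strategy is %r, expected 'keystone'" % value
    return None


RULES = [Rule("auth_strategy", auth_strategy_is_keystone)]


def run_rules(config_text: str, rules: List[Rule]) -> List[str]:
    """Parse INI-style text and return one message per failed rule."""
    # Interpolation is disabled so that '%' in option values is kept as-is.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(config_text)
    failures = []
    for rule in rules:
        message = rule.check(cfg)
        if message is not None:
            failures.append("%s: %s" % (rule.name, message))
    return failures


if __name__ == "__main__":
    for failure in run_rules("[DEFAULT]\nauth_strategy = noauth\n", RULES):
        print(failure)
```

Keeping each rule a small independent function means new checks can be added to the list without touching the code that runs them.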
There are not many projects in the OpenStack ecosystem that aim to increase the reliability and resilience of the cloud at the operation stage. With this proposal we want to introduce a project that will help operators diagnose their OpenStack platform, reduce response time to known and unknown failures, and effectively support the desired SLA.
The mission of Diagnostics is to provide OpenStack cloud operators with tools that minimize the time and effort needed to identify and fix errors in the operation maintenance phase of the cloud life cycle.
More on use cases: Rubick/OpenStack Integration
- Stand-alone tool for validating configuration consistency across OpenStack services for an individual service instance. For example, if we want to start nova-compute with the VMware driver, Nova will talk to Rubick during initialization, send in its configuration file, and request validation of it. If the configuration is inconsistent with the configurations of other services (e.g. the Keystone or Glance endpoints are not correct), Nova startup should fail with a message.
- Validation of configurations of the whole OpenStack platform as a pre- or post-deployment action. For example, TripleO could include the validation as a final step pri
- Diagnostic API to increase debuggability of OpenStack. For example, white-box testing in Tempest could be simplified if the Diagnostic API allowed tracking the actual state of resources when a test class performs a Nova API request.
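The first use case, failing service startup on an inconsistent configuration, could look roughly like the sketch below. This is only an illustration under stated assumptions: `validate_endpoints`, `start_service`, and `InconsistentConfiguration` are hypothetical names, and the catalog is a plain dictionary standing in for whatever endpoint data the validating service would actually hold.

```python
from typing import Dict, List


class InconsistentConfiguration(Exception):
    """Hypothetical error raised when validation fails at startup."""


def validate_endpoints(configured: Dict[str, str],
                       catalog: Dict[str, str]) -> List[str]:
    """Compare the endpoints a service is configured with to the catalog.

    Both arguments map a service name to an endpoint URL. Trailing
    slashes are ignored when comparing.
    """
    errors = []
    for name, url in sorted(configured.items()):
        registered = catalog.get(name)
        if registered is None:
            errors.append("%s: no endpoint registered" % name)
        elif registered.rstrip("/") != url.rstrip("/"):
            errors.append("%s: configured %s, registered %s"
                          % (name, url, registered))
    return errors


def start_service(configured: Dict[str, str], catalog: Dict[str, str]) -> str:
    """Refuse to start if the configuration disagrees with the catalog."""
    errors = validate_endpoints(configured, catalog)
    if errors:
        raise InconsistentConfiguration("; ".join(errors))
    return "started"


if __name__ == "__main__":
    # Example values are made up for the demonstration.
    catalog = {"glance": "http://10.0.0.5:9292",
               "keystone": "http://10.0.0.2:5000"}
    nova_view = {"glance": "http://10.0.0.9:9292",
                 "keystone": "http://10.0.0.2:5000/"}
    try:
        start_service(nova_view, catalog)
    except InconsistentConfiguration as exc:
        print("startup aborted: %s" % exc)
```

Collecting every mismatch before raising lets the operator see all inconsistencies in a single failed start, not one per attempt.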
Service architecture: Rubick/Service architecture
- Source code in Stackforge on GitHub: https://github.com/stackforge/rubick
- Patches in review in Gerrit: https://review.openstack.org/#/q/status:open+project:stackforge/rubick,n,z
- Launchpad project: https://launchpad.net/Rubick