Monasca/Monitoring Of Monasca
Goals and Deliverables
- Out-of-the-box, general-purpose monitoring metrics and alarms for every part (services, applications, OS) of a Monasca installation.
- A dashboard for monitoring the health of the Monasca-specific components.
- Each component should expose metrics that give a view of the service useful for thresholds, debugging, and capacity planning.
- CLI tools that complement the UI and can display Monasca details
  - monasca-collector info
  - monasca-forwarder info
- Metrics
- Pre-configured alarm definitions for all core services, with reasonable general-purpose thresholds
- Easy to see if the service is up or down
- Status, capacity, throughput, and latency with reasonable defaults out of the box
- Standard naming convention for metrics, with some names reserved for monasca-agent
Shared components such as MySQL are an exception, because other OpenStack components may also influence their performance or availability. A shared database would be labeled generically rather than identified as a Monasca component.
User Stories
- As an end user, the first thing I want to see after installing Monasca is a dashboard showing the status, capacity, and latency of my Monasca installation.
- As an end user deploying Monasca individually, via CI, with Vagrant, or using the installer, I want an initial dashboard showing the status of Monasca.
- As an operator, I want a simple and concise view of the health of the Monasca service.
- As an operator or provider, I want metrics for all Monasca components that describe the status, capacity, and latency of each component.
StackForge / OpenStack
Blueprints
Bugs
Reviews / Repos
- GitHub: Added the default alarms role
Architectural Components
Off-the-shelf open components
- Apache Kafka (message queue)
- MySQL (alarm, notifications database)
- InfluxDB (metrics, logging, events database)
- Apache Storm (realtime stream processor)
- Apache ZooKeeper (resource coordinator)
- Operating System
Monasca components
- API
- Agent
- Notification engine
- Threshold engine
- Persister
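
The labeling rule from the goals (Monasca-owned components are identified as Monasca, while a shared database such as MySQL gets a generic label) can be sketched as a small helper that builds metric dimensions. This is only an illustration: the dimension keys (`service`, `component`, `hostname`), the label values `monitoring` and `database`, and the function name `dimensions_for` are assumptions made for the example, not names defined on this page.

```python
# Component names come from the lists above. The dimension keys and the
# service labels are hypothetical and chosen only for illustration.
MONASCA_COMPONENTS = {"api", "agent", "notification", "threshold", "persister"}
SHARED_COMPONENTS = {"mysql"}  # may also be used by other OpenStack services


def dimensions_for(component: str, hostname: str) -> dict:
    """Return the metric dimensions for a component running on a host."""
    name = component.lower()
    if name in MONASCA_COMPONENTS:
        service = "monitoring"  # assumed label for Monasca-owned parts
    elif name in SHARED_COMPONENTS:
        service = "database"    # generic label, not tied to Monasca
    else:
        raise ValueError(f"unknown component: {component!r}")
    return {"service": service, "component": name, "hostname": hostname}


print(dimensions_for("Persister", "mon-1"))
print(dimensions_for("MySQL", "db-1"))
```

With this split, a dashboard query could select everything labeled as Monasca without pulling in a database whose load may come from other services.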
# | Alarm Definition Name | Category | Provider | Component | Subcomponent | Type (status, capacity, throughput, latency) | Measurement |
---|---|---|---|---|---|---|---|
1 | HTTP Status Alarm | System | Application | Monasca | API | Status | Up / Down |
2 | Host Alive Alarm | System | OS | Processor | Hardware | Status | Up / Down |
3 | Disk Usage | System | OS | Disk | Hardware | Capacity | Percentage |
4 | Disk Inode Usage | System | OS | Disk | Hardware | Capacity | Percentage |
5 | High CPU Usage | System | OS | Processor | Hardware | Capacity | Percentage |
6 | Network Errors | System | OS | Network | Hardware | Status | Count |
7 | Memory Usage | System | OS | Memory | Hardware | Capacity | Percentage |
8 | Kafka Consumer Lag | Monasca | Application | Message Queue | Consumer | Latency | Time |
9 | Monasca Agent emit time | Monasca | Application | Monasca | Agent | Latency | Time |
10 | Monasca Notification Configuration DB query time | Monasca | Application | Monasca | Notification | Latency | Time |
11 | Monasca Agent collection time | Monasca | Application | Monasca | Agent | Latency | Time |
12 | Zookeeper Average Latency | Monasca | Application | Resource Coordinator | ? | Latency | Time |
13 | Monasca Notification email time | Monasca | Application | Monasca | Notification | Latency | Time |
14 | Process not found | System | OS | Processor | Process | Status | Count |
15 | VM CPU usage | OpenStack | OS | Processor | Hardware | Capacity | Percentage |
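
As a rough sketch of how a few rows of the table could become pre-configured alarm definitions, the snippet below builds dictionaries with `name`, `expression`, `match_by`, and `severity` fields in the general shape of a Monasca alarm definition. The alarm names come from the table. The metric names (`disk.space_used_perc`, `mem.usable_perc`, `cpu.idle_perc`), the thresholds, and the helper `alarm_definition` are assumptions for illustration and should be checked against the agent and API documentation before use.

```python
import json

VALID_OPERATORS = {"<", "<=", ">", ">="}


def alarm_definition(name, metric, operator, threshold,
                     function="avg", severity="HIGH", match_by=("hostname",)):
    """Build a dict shaped like an alarm-definition request body."""
    if operator not in VALID_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return {
        "name": name,
        # Expression form: function(metric) operator threshold
        "expression": f"{function}({metric}) {operator} {threshold}",
        "match_by": list(match_by),
        "severity": severity,
    }


# Metric names and thresholds below are assumed, not taken from this page.
DEFAULT_ALARMS = [
    alarm_definition("Disk Usage", "disk.space_used_perc", ">", 90, function="max"),
    alarm_definition("Memory Usage", "mem.usable_perc", "<", 10),
    alarm_definition("High CPU Usage", "cpu.idle_perc", "<", 10),
]

print(json.dumps(DEFAULT_ALARMS, indent=2))
```

Keeping the defaults in one list like this makes it easy to review the thresholds in a single place and to load them from a deployment role.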