Why and how to use Intel Node Manager in OpenStack
This wiki page shows how Intel Node Manager can be used in OpenStack.
It first gives a brief introduction to Intel Node Manager and then describes how its features can be leveraged in OpenStack.
Introduction to Intel Node Manager
Intel Node Manager is a server management technology that allows management software to accurately monitor and control a platform's power and thermal behavior through industry-standard interfaces: the Intelligent Platform Management Interface (IPMI) and the Data Center Manageability Interface (DCMI).
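As an illustration of reading power through the standard DCMI interface, the sketch below runs `ipmitool dcmi power reading` against a BMC and parses the wattage fields. It is a minimal sketch, not part of OpenStack: the helper names (`parse_dcmi_power_reading`, `read_power`) are hypothetical, and the field labels are assumed from typical ipmitool output and may vary between ipmitool versions.

```python
import re
import subprocess

# Field labels assumed from typical `ipmitool dcmi power reading` output;
# check them against the ipmitool version you actually run.
_FIELDS = {
    "Instantaneous power reading": "current_w",
    "Minimum during sampling period": "min_w",
    "Maximum during sampling period": "max_w",
    "Average power reading over sample period": "average_w",
}


def parse_dcmi_power_reading(text):
    """Return a dict of wattages parsed from `ipmitool dcmi power reading` output."""
    reading = {}
    for line in text.splitlines():
        # Split on the first colon only; other lines (e.g. timestamps) are ignored.
        label, sep, value = line.partition(":")
        if not sep:
            continue
        key = _FIELDS.get(label.strip())
        match = re.search(r"(\d+)\s*Watts", value)
        if key and match:
            reading[key] = int(match.group(1))
    return reading


def read_power(host, user, password):
    """Query a BMC over IPMI-over-LAN (lanplus) and parse its DCMI power reading."""
    output = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
         "dcmi", "power", "reading"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dcmi_power_reading(output)
```

For example, output lines such as `Instantaneous power reading:   112 Watts` and `Maximum during sampling period:   184 Watts` yield `{"current_w": 112, "max_w": 184}`. Parsing is kept separate from the subprocess call so it can be tested without hardware.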
It allows IT administrators to monitor the power and thermal behavior of servers and to take appropriate actions or set policies based on the real-time power and thermal status of the data center. With Intel Node Manager, the power and thermal information of these servers can be used to improve overall data center efficiency and utilization. Data center managers can maximize rack density with confidence that the rack power budget will not be exceeded. During a power or thermal emergency, Intel Node Manager can automatically limit server power consumption and send alerts to administrators according to predefined policies.
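Node Manager also exposes its own power and inlet-temperature statistics through an Intel OEM IPMI command. The sketch below builds the `ipmitool raw` arguments for a "Get Node Manager Statistics" request and decodes the response. The NetFn/command values (0x2E/0xC8), the Intel manufacturer ID bytes, the mode and domain codes, and the response byte layout are assumptions taken from Intel's Node Manager interface specification as commonly used; verify them against the NM version your platform implements. On many platforms the request must also be bridged to the Management Engine with ipmitool's `-b` (channel) and `-t` (target address) options, whose values are platform-specific and omitted here. The helper names are hypothetical.

```python
import struct

# Assumed constants; verify against the Node Manager spec for your platform.
NM_NETFN = 0x2E
NM_GET_STATISTICS = 0xC8
INTEL_MANUFACTURER_ID = (0x57, 0x01, 0x00)  # 0x000157, least significant byte first
MODE_GLOBAL_POWER = 0x01         # power statistics in watts
MODE_GLOBAL_INLET_TEMP = 0x02    # inlet temperature statistics in degrees Celsius
DOMAIN_PLATFORM, DOMAIN_CPU, DOMAIN_MEMORY = 0x00, 0x01, 0x02


def statistics_request(mode, domain=DOMAIN_PLATFORM, policy_id=0x00):
    """Build the `ipmitool raw` arguments (policy ID is ignored in global modes)."""
    data = [NM_NETFN, NM_GET_STATISTICS, *INTEL_MANUFACTURER_ID, mode, domain, policy_id]
    return ["0x%02x" % b for b in data]


def parse_statistics(hex_output):
    """Decode the data bytes printed by `ipmitool raw` (completion code already stripped)."""
    raw = bytes(int(tok, 16) for tok in hex_output.split())
    if tuple(raw[0:3]) != INTEL_MANUFACTURER_ID:
        raise ValueError("unexpected manufacturer ID")
    # Assumed layout after the ID: four little-endian 16-bit values
    # (current, min, max, average), then a 32-bit timestamp and a
    # 32-bit statistics reporting period.
    current, minimum, maximum, average, timestamp, period = struct.unpack_from("<4H2I", raw, 3)
    return {"current": current, "min": minimum, "max": maximum,
            "average": average, "timestamp": timestamp, "period": period}
```

Selecting a different domain (for example `DOMAIN_CPU` or `DOMAIN_MEMORY`) is how per-subsystem data would be requested, if the platform supports that domain.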
The main features of Intel Node Manager are as follows.
1) Monitor power and temperature (implemented in OpenStack; see the following link for the patches).
- Collect power and thermal information via IPMI commands, including data for individual subsystems such as CPU, memory, or I/O.
2) Define policies based on power and thermal data (not yet implemented in OpenStack; third-party or Intel software can be used to manage policies).
- Policies can be predefined based on power and thermal data.
- A power budget can be set for a server; if its power consumption exceeds the threshold, the CPU frequency is lowered to bring power consumption down. This is also called "power capping".
- Policies can also be defined for power or thermal emergencies: if a reading exceeds its threshold, a predefined action is executed, such as shutting down the server, lowering the CPU frequency, or sending an alert.
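To make the policy semantics above concrete, the sketch below models (1) threshold policies that map a reading to a predefined action and (2) power capping as stepping the CPU frequency down until power fits the budget. It is an illustrative model only: Node Manager enforces capping in platform firmware, and this is neither its algorithm nor an OpenStack API. `Policy`, `evaluate`, `cap_frequency`, the metric and action names, and the `power_at` power model are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Policy:
    """Hypothetical predefined policy: trigger an action when a metric exceeds a threshold."""
    name: str
    metric: str        # e.g. "power_w" or "inlet_temp_c"
    threshold: float
    action: str        # e.g. "cap_cpu_frequency", "send_alert", "shutdown"


def evaluate(policies, readings):
    """Return the actions triggered by the current readings, in policy order."""
    return [p.action for p in policies
            if p.metric in readings and readings[p.metric] > p.threshold]


def cap_frequency(power_at, budget_w, freq_mhz, min_freq_mhz, step_mhz=100):
    """Step the frequency down while modelled power exceeds the budget.

    `power_at` is a hypothetical callable mapping a frequency (MHz) to power (W).
    The loop stops at the first frequency that fits the budget, or at the
    lowest step that does not go below `min_freq_mhz`.
    """
    while power_at(freq_mhz) > budget_w and freq_mhz - step_mhz >= min_freq_mhz:
        freq_mhz -= step_mhz
    return freq_mhz
```

With a toy model such as `power_at = lambda f: 50 + f // 20` and a 160 W budget, starting at 2600 MHz the loop settles at 2200 MHz; if the floor were 2400 MHz, it would stop there even though the budget is still exceeded, which is the case where an emergency policy (alert or shutdown) would take over.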