
Difference between revisions of "Nova-scheduler-HostState"


Revision as of 05:58, 21 June 2013

Nova Scheduler HostState Change Proposal

--lianhao-lu (talk) 04:27, 8 June 2013 (UTC) Created.

Here we list the current HostState fields in the nova scheduler and propose the potential changes required for the following blueprints:

Current HostState

{| class="wikitable"
|-
! Field !! Read/Write !! Initial Source !! Description !! Comment
|-
| host/nodename
| n/a
| __init__()
| identifies the host/node
|
|-
| capabilities
| ro
| nova compute manager
| dictionary containing the following keys:
* cpu_info
* vcpus, vcpu_used
* disk_total, disk_used, disk_available
* host_memory_total, host_memory_free
* hypervisor_type, hypervisor_version, hypervisor_hostname
* supported_instances
| The compute node polls periodically and sends the data directly back to the scheduler through RPC.
|-
| service
| ro
| DB table - services
| nova compute service
|
|-
|
total_usable_disk_gb
disk_mb_used
free_ram_mb
free_disk_mb
vcpus_total
vcpus_used
| rw
| DB table - compute_nodes
|
| The compute node periodically polls and saves into the DB. Modified by the scheduler according to the scheduling situation.
|-
|
num_instances
num_io_ops
| rw
| DB table - compute_node_stats
| statistics
| The compute node periodically polls and saves into the DB. Modified by the scheduler according to the scheduling situation.
|-
|
num_instances_by_project
num_instances_by_os_type
vm_states
task_states
| rw
| DB table - compute_node_stats
| number of instances by project_id, os_type, vm_state and task_state respectively
| The compute node periodically polls and saves into the DB. Modified by the scheduler according to the scheduling situation.
|-
| limits
| rw
| from schedulers
| resource oversubscription value
| used by the compute node when building a new instance
|-
| updated
| rw
|
| last update timestamp
|
|}

Proposed changes to the HostState fields

We plan to replace the current HostState fields with the following fields, which are extensible enough to store more information for the scheduler. Every nova compute host will have a corresponding HostState instance.

1. A new dictionary 'resources' will contain the resource usage information (e.g. free_ram_mb, vcpus_used, etc.) about the platform in the following format:

{
   <resource_name>: {
                  'value': <value of the resource>,
                  'timestamp': <last update time stamp>,
                  'source': <source of the data>,  # e.g. nova-compute, ceilometer, etc.
                  }
}

2. The statistics-related fields (e.g. num_instances, vm_states, etc.) might need to be grouped into a new dictionary 'stats', which would look something like the following:

{
 'num_instances': 1,
 'num_instances_by_project': {
                               'project-id1': 2,
                               'project-id2': 1,
                             },
 'vm_states': {
                'active': 1,
              },
}

3. The existing 'capabilities' will contain only feature information about the compute node platform, e.g. CPU features.

4. Other fields will remain unchanged.

5. For compatibility, the new HostState should also support the current way of accessing its fields as attributes, e.g. host_state.num_instances, host_state.free_ram_mb, etc.
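
The five points above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual nova class: the update_resource() helper and the __getattr__ fallback are hypothetical names and design choices introduced here to show how the 'resources' and 'stats' dictionaries might coexist with the existing attribute-style access.

```python
import time


class HostState(object):
    """Hypothetical sketch of the proposed HostState, not nova's real class."""

    def __init__(self, host, nodename):
        self.host = host
        self.nodename = nodename
        self.resources = {}     # item 1: per-resource value/timestamp/source
        self.stats = {}         # item 2: grouped statistics
        self.capabilities = {}  # item 3: platform feature information only

    def update_resource(self, name, value, source, timestamp=None):
        # Hypothetical helper: store one entry in the proposed format.
        self.resources[name] = {
            'value': value,
            'timestamp': time.time() if timestamp is None else timestamp,
            'source': source,
        }

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so it can serve
        # as a read-only compatibility path for the old field names.
        resources = self.__dict__.get('resources', {})
        if name in resources:
            return resources[name]['value']
        stats = self.__dict__.get('stats', {})
        if name in stats:
            return stats[name]
        raise AttributeError(name)


host_state = HostState('host1', 'node1')
host_state.update_resource('free_ram_mb', 2048, 'nova-compute')
host_state.stats['num_instances'] = 3

print(host_state.free_ram_mb)    # read through the 'resources' dictionary
print(host_state.num_instances)  # read through the 'stats' dictionary
```

The sketch covers reads only. Since the scheduler also modifies these fields, a real implementation would need a matching write path (for example properties or __setattr__) so that an assignment such as host_state.free_ram_mb = 1024 updates the dictionary instead of creating a shadowing attribute.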

How to get the data (initial data source)

The data stored in the 'resources' dictionary in HostState could be reported periodically from the compute node to the scheduler by RPC, as mentioned in UtilizationAwareScheduling according to blueprint#1. The data could also be collected from other services, e.g. ceilometer. However, some in the community have argued that the data should be saved into the DB first:

 http://lists.openstack.org/pipermail/openstack-dev/2013-June/010653.html

If the community decides to store the data in the DB and have the scheduler load it from there, as is done for the 'resource_tracker', the following extensible DB schema is needed:

{| class="wikitable"
|-
! Table Name !! Fields
|-
| resource_names
||
id:   primary_key
name: string
|-
| compute_node_resources
||
id:               primary_key
resource_name_id: foreign key to table resource_names
compute_node_id:  foreign key to table compute_nodes
value:            text, json encoded value
timestamp:        timestamp
source:           string
|}
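
To make the schema above more concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. Several things here are assumptions for illustration only: the SQL column types, the one-column stand-in for the existing compute_nodes table, and the load_resources() function, which shows one way the scheduler might turn the rows back into the 'resources' dictionary format. Nova itself would go through its own DB layer rather than raw SQL.

```python
import json
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE compute_nodes (id INTEGER PRIMARY KEY);  -- stand-in only
CREATE TABLE resource_names (
    id   INTEGER PRIMARY KEY,
    name TEXT
);
CREATE TABLE compute_node_resources (
    id               INTEGER PRIMARY KEY,
    resource_name_id INTEGER REFERENCES resource_names(id),
    compute_node_id  INTEGER REFERENCES compute_nodes(id),
    value            TEXT,       -- JSON encoded value
    timestamp        TIMESTAMP,
    source           TEXT
);
""")


def load_resources(conn, compute_node_id):
    """Hypothetical loader: build the 'resources' dict for one compute node."""
    rows = conn.execute(
        "SELECT n.name, r.value, r.timestamp, r.source "
        "FROM compute_node_resources AS r "
        "JOIN resource_names AS n ON n.id = r.resource_name_id "
        "WHERE r.compute_node_id = ?", (compute_node_id,))
    return {
        name: {'value': json.loads(value), 'timestamp': ts, 'source': source}
        for name, value, ts, source in rows
    }


# Sample rows, invented purely for the example.
conn.execute("INSERT INTO compute_nodes (id) VALUES (1)")
conn.execute("INSERT INTO resource_names (id, name) VALUES (1, 'free_ram_mb')")
conn.execute(
    "INSERT INTO compute_node_resources "
    "(resource_name_id, compute_node_id, value, timestamp, source) "
    "VALUES (?, ?, ?, ?, ?)",
    (1, 1, json.dumps(2048), '2013-06-21 05:58:00', 'nova-compute'))

resources = load_resources(conn, 1)
print(resources)
```

Keeping resource names in their own table means a new kind of resource only needs a new row in resource_names, not a schema change, which is the extensibility the proposal asks for.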