
Summary

[ New update - see section: Relation to utilization aware scheduling ]

There are a number of defined resources in Nova, including disk, memory and number of vCPU. The purpose of this specification is to support an extensible set of resources so that new resources can be defined for compute nodes and resource requirements for instances. These will be made available to the scheduler and allocated by the compute manager.

The resources of a compute node have limited capacity. Most, including disk, memory and vCPU, map to a physical counterpart, typically via a virtualisation technology, but it is also possible to create entirely abstract resource types that do not. As an example, a provider may choose that only one instance of a certain type should run on any compute node. This trivial example can be supported by defining a new resource for compute nodes and associating a requirement for that resource with the instance type.

Each instance has resource requirements specified as the quantity of each type of resource it needs. These are used by the scheduler to match it to compute nodes with sufficient available resources.

The compute manager on a host is aware of the resources available at the compute node and is responsible for their allocation and management. Because it has definitive knowledge of what resources are actually in use, the compute manager has the final say as to whether a compute node is able to host an instance.

Fortunately, a lot of what we need already exists. Flavors have the extra_specs attribute allowing us to define resource requirements for instances. The filter scheduler has a plug-in framework allowing new filters and weighers that use extended resources to be added. The area that does not have an extension mechanism, and the subject of this specification, is the compute node and its resource tracker.

We will need:

  • a model of a resource
  • a model of a resource requirement
  • a naming scheme to allow resource requirements to be related to resources
  • changes for compute nodes
    • resource configuration - to define extended resources
    • resource_tracker - to track extended resources
    • claims - to test requirements against available extended resources
  • provision of resource requirement information
    • resource requirements in flavors
    • resource requirements in requests
    • resource requirements in instances
  • changes for the database
    • extended resources at compute_nodes
    • extended resource requirements in instances
  • changes for scheduler (filter scheduler)
    • host manager - to access extended resources
    • filters - examples using extended resources
    • weighers - examples using extended resources

Resource Models and Naming

Resource Model

A resource has the following attributes:

  • name - used to identify this resource
  • type - a resource type name for human viewing
  • description - a short description for human viewing


Each resource at a compute node is represented by its capacity in some form. This is recorded by the resource tracker, used in scheduling and allocation decisions, and passed via the database to the host_manager at the scheduler. A general form for existing resources uses the total amount available and the amount used, but any arbitrary representation could be used for extended resources.

Resource capacity (example):

  • resource total = value (configured or discovered - total amount of resource)
  • resource used = value (tracked - amount of resource currently used)


Each resource required for a particular instance is represented by a set of requirements; generally this is a single value in existing resources, such as amount of memory. This information is defined for a flavor, passed in create requests to be used at the scheduler and the resource_tracker/claims at a compute manager, and is inherited by an instance.

Resource requirement (example):

  • resource required = value (configured - amount required)


A final quantity associated with resources is the limit, the total amount that can be committed. Again, this can be arbitrary or complex information, but in general it is a single value. It can differ from the actual capacity because it reflects the policy of over- or under-committing the resource. The limit is calculated per compute node at the scheduler as a function of other attributes and of policy implemented in the filters. The limit is communicated with a create request and used by the resource_tracker claims.

Resource limit (example):

  • resource limit = value (calculated - the total amount of resource that can be allocated)
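
To make the three quantities concrete, here is how they might look for the standard memory resource, with hypothetical values and an assumed 1.5x overcommit policy for the limit:

  # Hypothetical values for the standard memory resource (MB).
  capacity    = {"memory_mb": 4096, "memory_mb_used": 1024}   # tracked at the node
  requirement = {"memory_mb": 512}                            # configured in the flavor
  limit       = {"memory_mb": int(4096 * 1.5)}                # calculated: 1.5x overcommit = 6144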

Naming

The names of resources should be unique in a given OpenStack deployment. In all cases a resource is known by its name and this is used to perform comparisons between information from different sources at the scheduler and the resource tracker.

In some cases (such as flavors, where extra specs will be used to hold the information) a prefix is required: prefix:name. The prefix should also be different from that used by any existing feature.

Changes for compute nodes

Resource configuration

The resource tracker at a compute manager needs to know the resources and their quantities. The way that the quantity is obtained may vary by resource type, some being configured, others being discovered programmatically. The way resources are represented and capacity calculated may also vary. So we will use a plug-in framework for resources.

A base class will be implemented for new resource types to extend. These classes will be loaded at run time as they are discovered in the configuration. This follows the model of filter/weight classes at the filter scheduler.

Some methods in the existing resource tracker and claims classes will be moved into the resource base class so they can be implemented in a resource specific way. The class will also provide a method to add arbitrary information to a dictionary: this is based on the extra_specs attribute of flavors and will be used to communicate resource capacity to the scheduler. The extra_specs prefix convention does not apply but can be used.

The standard memory, disk and vCPU resources can be implemented in this form.
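
A minimal sketch of what such a base class might look like; all method names here are illustrative assumptions, not a final API:

  class Resource(object):
      """Hypothetical base class for extended resource plug-ins."""

      def reset(self, resources):
          """Initialize tracked usage before instance data is applied."""
          raise NotImplementedError()

      def add_instance(self, usage):
          """Account for one instance's allocation of this resource."""
          raise NotImplementedError()

      def test(self, usage, limits):
          """Return None if a claim fits, or a reason string if it does not."""
          raise NotImplementedError()

      def write(self, resources):
          """Add this resource's capacity data to a dict, in the manner of
          a flavor's extra_specs, for communication to the scheduler."""
          raise NotImplementedError()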

Resource tracker

The resource tracker will load the resource classes. These will be used to obtain the local capacity. In addition the resource tracker will pass the extra_specs data to the resource classes.

Claims

The claims class will be modified to use test methods provided by the resource classes. These tests will take the extra_specs data as a parameter.
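
A rough sketch of how the modified claims test could consult the plug-ins; the resource_plugins attribute and the test() signature are assumptions consistent with the base class sketched above:

  def _test_extended_resources(self, usage, limits):
      # Ask each loaded resource plug-in whether the claim fits; an
      # empty list of reasons means the claim succeeds.
      reasons = []
      for plugin in self.resource_plugins:
          reason = plugin.test(usage, limits)
          if reason:
              reasons.append(reason)
      return reasons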

Provision for resource requirements

Resource requirements in flavors

Flavors already have attributes defined for standard resource types. These will continue to be used for backward compatibility. The extra_specs extension will be used to incorporate extended resource requirements.

Flavors have an attribute called extra_specs that can contain arbitrary attributes. If the attributes in extra_specs do not have a prefix of the form prefix:, they are interpreted as specifying capabilities in the scheduler, so all extensions require a prefix in the name. However, for our purposes there is no need to specify the prefix further, as these attributes will be interpreted by plug-in code specific to the resource type. An example of resource requirements in extra_specs could be:

"extra_specs" : { "provider_name:my_resource" : 5, "std_name:another_resource" : 10}

Resource requirements in requests

The flavor details are added to the filter_properties in create requests and so the extra_specs attribute carries through to the scheduler and is available to filters and weighers, and to the compute manager, where it can be made available to the resource_tracker.

Resource requirements in instances

The instance object holds the standard resource requirements inherited from the flavor directly as attributes. It also has a reference to its flavor id (instance_type_id), and the flavor holds the extra_specs information. Whenever the instance needs to be scheduled, at creation or migration, the flavor details are copied into the filter properties for the request. The requirements are therefore always available when needed, either directly from the filter_properties of a request or indirectly by retrieving the flavor given the instance_type_id.

It is not yet determined if there is a performance advantage in copying the requirements to be held in the instance object.

As a consequence, the extended resource requirements will not be presented when the user accesses the instance details. They can only be seen by accessing the flavor details.

Changes for the database

The database is currently used to record the state of compute nodes. This also provides a path for information to pass from the compute nodes to the scheduler. In the future this is likely to change to avoid the database as a bottleneck in scheduling.

There will be a database schema change to support extensible resource capacities in the compute_nodes table. It is becoming common practice in Nova to represent extensible data structures as JSON strings rather than additional attribute-value tables, for reasons of performance. The same will be done here.

Extended resources at compute_nodes

The compute_nodes table already contains columns for the standard resource types. An additional column called extra_resources will be added to hold a JSON string serialising the capacity information for extended attributes at the node. The information will be arbitrary and only interpreted by resource specific plug-ins at the scheduler.
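
For illustration, the extra_resources column might then hold a JSON string like the following; the resource names and the total/used structure are assumptions, interpreted only by the matching plug-ins:

  import json

  # Hypothetical serialized capacity data for two extended resources.
  extra_resources = json.dumps({
      "provider_name:my_resource": {"total": 8, "used": 3},
      "std_name:another_resource": {"total": 100, "used": 40},
  })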

Changes for scheduler

The scheduler already has an extension framework for filters and weighers and already places flavor attributes into filter properties. The host manager extracts host state from the database.

Host State

The HostState class will provide access to the extended resource capacity information. This requires a minor change to the update_from_compute_node() method. HostState also has a consume_from_instance() method that updates the resources recorded by the class in response to allocation decisions in the scheduler. This will be changed to allow for the new extended resources, again with plug-ins implementing the method. The same pattern as the filter classes will be used for the plug-ins.

Filters

Examples of filters will be provided that use extended resource types.
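
As a rough sketch of the shape such a filter might take (the extra_resources attribute on HostState and all resource names are assumptions):

  from nova.scheduler import filters


  class MyResourceFilter(filters.BaseHostFilter):
      """Hypothetical filter: pass hosts with enough free extended resource."""

      def host_passes(self, host_state, filter_properties):
          spec = filter_properties.get("instance_type", {}).get("extra_specs", {})
          required = int(spec.get("provider_name:my_resource", 0))
          cap = getattr(host_state, "extra_resources", {}).get(
              "provider_name:my_resource", {})
          return cap.get("total", 0) - cap.get("used", 0) >= required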

Weighers

Examples of weighers will be provided that use extended resource types.
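
Similarly, a sketch of a weigher that prefers hosts with more of the extended resource free (again, names are assumptions):

  from nova.scheduler import weights


  class MyResourceWeigher(weights.BaseHostWeigher):
      """Hypothetical weigher: more free extended resource scores higher."""

      def _weigh_object(self, host_state, weight_properties):
          cap = getattr(host_state, "extra_resources", {}).get(
              "provider_name:my_resource", {})
          return cap.get("total", 0) - cap.get("used", 0)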

Consumers

Examples of consumers (plug-in classes implementing the consume_from_instance() method for HostState) will be provided that use the extended resource types.
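
In sketch form, a consumer plug-in might look like this; the plug-in interface and the extra_resources structure on HostState are assumptions:

  class MyResourceConsumer(object):
      """Hypothetical consumer plug-in: updates HostState after the
      scheduler decides to place an instance on a host."""

      def consume_from_instance(self, host_state, instance_type):
          # Deduct this instance's requirement from the host's free amount.
          required = int(instance_type.get("extra_specs", {}).get(
              "provider_name:my_resource", 0))
          host_state.extra_resources["provider_name:my_resource"]["used"] += required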

Relation to Utilization Aware Scheduling

The purpose of this note is to clarify similarities and differences between bp utilization-aware-scheduling and bp extensible-resource-tracking. Some comments posted have referred to UBS (utilization based scheduling). The bp utilisation-based-scheduling has not been worked on since May 2013 and was dropped in favour of bp utilization-aware-scheduling, so I assume that "UBS" really refers to the latter.

On the surface, extensible-resource-tracking and utilization-aware-scheduling appear similar: they both collect data at the compute node and communicate it to the scheduler. However, that is not the whole picture, and I would argue that what they do is sufficiently different that they should be kept separate. So, let's just look at what they do and how.

utilization-aware-scheduling uses "resource monitor" plugins to obtain actual usage information from the host environment, operating system, or devices.

extensible-resource-tracking uses "resource allocator" plugins to calculate allocation information from instances and to determine resource allocation.

The significant points of implementation to consider are around the respective plugins at the compute node and how they interact with the update procedure and the resource claims procedure.

ResourceTracker.update_available_resources() - this is the main method for updating information about resources. During this method the two approaches do the following:


utilization-aware-scheduling:

  • during the method each resource monitor is called to obtain its data. The monitor may collect data at this point, or may have been doing so in an ongoing way in the background (although I am not aware of one implemented this way so far).


extensible-resource-tracking:

  • the resource allocators are updated in three steps (following the existing process; see the sketch after this list):
    • call method to initialize
    • for each instance at the node (running or not), call method to calculate allocation info
    • call method to obtain final data
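
In code terms, the update sequence might look roughly like this; the resource_plugins attribute and the method names follow the hypothetical base class sketched earlier:

  def _update_extended_resources(self, resources, instances):
      for plugin in self.resource_plugins:
          plugin.reset(resources)              # step 1: initialize
      for instance in instances:               # step 2: every instance at the node
          for plugin in self.resource_plugins:
              plugin.add_instance(instance)
      for plugin in self.resource_plugins:
          plugin.write(resources)              # step 3: obtain final data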

ResourceTracker.instance_claim() - this is the main method for determining if a node has sufficient unallocated resources to support a new instance. During this method the two approaches do the following:


utilization-aware-scheduling:

  • not applicable


extensible-resource-tracking:

  • call the test method for the claim, passing the instance and limits as input, to determine whether there is sufficient free allocation

Potential for a combined monitor/allocator implementation at the compute node:

Monitors could be extended to implement methods that support allocation functions as well. However, these extended monitors would be called from a different part of the update procedure, in a different way; they would represent a different type of information that changes at a different rate; and they would also be listed separately in configuration. So, I would argue that monitors and allocators are sufficiently different to be kept separate.

Potential for combined data fields in database:

It is also possible to combine the data fields in the database. However, there is no particular performance advantage in combining them, and they represent different types of information. Moreover, it may make sense to generate them at different rates: more frequently for usage statistics, and only on change for allocation information. So I would argue for retaining two separate fields for monitored usage statistics and allocation information.