Multiple Active Scheduler Drivers/Policies

Summary

Support for multiple active scheduler policies and/or drivers associated with different host aggregates within a single Nova deployment.

Blueprint: https://blueprints.launchpad.net/nova/+spec/multiple-scheduler-drivers

Rationale

In heterogeneous environments, it is often required that different hardware pools are managed under different policies. In Grizzly, basic partitioning of hosts and enforcement of compatibility between flavors and hosts during instance scheduling can already be implemented using host aggregates and FilterScheduler with AggregateInstanceExtraSpecsFilter. However, it is not possible to define, for example, different sets of filters and weights, or even entirely different scheduler drivers, associated with different aggregates.
For example, the admin may want to have one pool with a conservative CPU overcommit (e.g., for CPU-intensive workloads) and another pool with an aggressive CPU overcommit (for workloads which are less CPU-bound).
This blueprint introduces a mechanism to overcome this limitation.
Note: while in large-scale geo-distributed environments this can be done with Cells, there is no existing solution within a single (potentially small) Nova deployment.

User Stories

  1. An administrator partitions the managed environment into host aggregates and associates specialized scheduler configurations (policies) with some or all of the aggregates.
  2. On instance provisioning, the name of a policy is specified using a new scheduler hint.
Note: more options to determine the desired policy, perhaps derived from other parameters/properties rather than explicitly specified in the provisioning request, will be considered in the future.

Usage Details

Configuration (user story 1)

The administrator will:

  1. Specify the 'default' scheduler driver and its policy settings under the [DEFAULT] section of nova.conf (e.g., FilterScheduler with CoreFilter), as usual.
  2. Add to nova.conf one or more new sections dedicated to the different scheduling policy configurations, overriding the defaults (driver and/or associated properties). For example, [high_cpu_density] specifying FilterScheduler with CoreFilter and cpu_allocation_ratio=8, and [low_cpu_density] specifying FilterScheduler with CoreFilter and cpu_allocation_ratio=1. Note that in this example, since the driver and filters are the same as the defaults, it would not be mandatory to specify them in the policy-specific sections.
  3. Specify in nova.conf which policies are enabled, using a new property (e.g., enabled_scheduler_policies=low_cpu_density, high_cpu_density).
  4. Create one or more host aggregates and populate them with hosts, as usual.
  5. Set a new metadata key-value pair on one or more of the aggregates, specifying the policy to be used for scheduling instances in the corresponding aggregate (e.g., "policy=high_cpu_density"). A CLI sketch of steps 4 and 5 is shown after the configuration example below.

Example (partial) nova.conf:

[DEFAULT]
scheduler_driver=nova.scheduler.filter_scheduler.FilterScheduler
scheduler_default_filters = CoreFilter, SchedulerPolicyFilter
cpu_allocation_ratio = 4.0

# a list of policies that will be used by this scheduler
enabled_scheduler_policies = low_cpu_density, high_cpu_density

[low_cpu_density]
cpu_allocation_ratio = 1.0

[high_cpu_density]
cpu_allocation_ratio = 8.0
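
For steps 4 and 5 above, aggregate creation and policy assignment could look roughly as follows with the nova CLI. This is only a sketch, not part of the blueprint: the aggregate name and host name are placeholders, and depending on the novaclient version the aggregate may need to be referenced by the numeric id printed by aggregate-create rather than by its name.

$ nova aggregate-create aggr2
$ nova aggregate-add-host aggr2 compute-01
$ nova aggregate-set-metadata aggr2 policy=high_cpu_density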

Invocation (user story 2)

The user will invoke an instance provisioning request specifying the desired policy via a dedicated scheduler hint. For example:

$ nova boot --image 1 --flavor 1 --hint target_policy=low_cpu_density my-first-server

Note: when no policy is specified, the default scheduler configuration will be used, as before.

Design Considerations

Policy selection

To allow flexibility in how the scheduling policy is selected, the selection logic will be encapsulated in a separate class, specified in nova.conf. In the first implementation, a single selection class will be provided, which chooses the scheduling policy based on the explicit scheduler hint described above. For example:

[DEFAULT]
scheduler_policy_selection=SchedulerHintTargetPolicySelection
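
Such a class would presumably only need to map a provisioning request to the name of an enabled policy. The following Python sketch illustrates one possible shape of the hint-based selector; the method name and arguments are illustrative assumptions rather than the actual Nova implementation, while the 'target_policy' hint key comes from the invocation example above.

class SchedulerHintTargetPolicySelection(object):
    """Pick a scheduling policy based on an explicit scheduler hint."""

    def select_policy(self, filter_properties, enabled_policies):
        # Hypothetical interface: look up the 'target_policy' scheduler hint
        # and return its value if it names an enabled policy.
        hints = filter_properties.get('scheduler_hints') or {}
        policy = hints.get('target_policy')
        if policy in enabled_policies:
            return policy
        # No hint, or an unknown policy name: fall back to the [DEFAULT]
        # scheduler configuration.
        return None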

Host selection

Once a scheduling policy is specified, the scheduler needs to restrict the applicable target hosts to those that should be managed under this policy, i.e., hosts in host aggregate(s) that specify the given policy. To implement this restriction in FilterScheduler, a new scheduler filter, SchedulerPolicyFilter, has been implemented, which filters out hosts associated with different policies. As a special case, hosts which do not belong to any aggregate, or belong to an aggregate without a specified scheduling policy, are treated as if they were associated with a default policy and are handled by the scheduler configuration in the [DEFAULT] section. Provisioning requests that do not specify a scheduling policy automatically map to the default policy.
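
A minimal sketch of what such a filter might look like is given below, assuming the Grizzly/Havana-era host filter interface (a BaseHostFilter subclass implementing host_passes(host_state, filter_properties)). The helper that collects the 'policy' metadata of the aggregates a host belongs to is left as a placeholder, and everything except the filter name and the intended pass/fail semantics is an illustrative assumption, not the actual implementation.

from nova.scheduler import filters


class SchedulerPolicyFilter(filters.BaseHostFilter):
    """Only pass hosts whose aggregate policy matches the requested one."""

    def host_passes(self, host_state, filter_properties):
        hints = filter_properties.get('scheduler_hints') or {}
        requested = hints.get('target_policy')
        host_policies = self._policies_for_host(host_state)
        if not requested:
            # No policy requested: the request maps to the default policy,
            # so only hosts without an associated policy should pass.
            return not host_policies
        return requested in host_policies

    def _policies_for_host(self, host_state):
        # Placeholder for the actual lookup of the 'policy' key in the
        # metadata of the aggregates this host belongs to; returns a set
        # of policy names (empty if none are set).
        return set()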

Potential conflicts between policies

Once multiple scheduling policies are defined in the environment, in some cases it is important to make sure that there are no conflicts between them. For example, the policy that was used to place an instance at provisioning time should also be used for other operations, such as instance migration. Another example is that the same policy should be used for provisioning all the instances running on a given physical machine. However, today it is possible to include the same physical host in two different host aggregates and to associate a different scheduling policy with each of them. In such a case the current implementation will choose the first applicable policy for a given operation, which may cause conflicts. One way to avoid conflicts is to ensure that host aggregates that specify a scheduling policy are disjoint. However, this may be too strong a requirement in some cases. Therefore, the most flexible approach at the moment is to document this as a best practice and let the administrator decide how host aggregates and policies are used in a particular deployment.