[https://blueprints.launchpad.net/nova/+spec/scheduler-resource-race Blueprint]
Revision as of 21:23, 11 June 2012
SchedulerRaceReduction
Time: 2012-06-11T20:42:43Z
Drafter: belliott
Overview
The scheduler is subject to a race condition that can cause it to misjudge the resources available on a particular compute host. The problem occurs when multiple scheduler instances or threads concurrently issue an instance build request (i.e. run_instance) to the same compute host, each believing the host still has room. This can oversubscribe the compute host and cause one or more run_instance requests to fail.
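The race can be illustrated with a small, deterministic sketch. Everything here is hypothetical and for illustration only: `HostState`, `scheduler_picks_host`, `simulate_race`, and the memory figures are assumptions, not Nova's actual data structures or scheduler code. Two schedulers take a snapshot of the same host before either request lands, both conclude the host fits, and the host ends up oversubscribed.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    """Hypothetical view of a compute host's free memory."""
    name: str
    free_ram_mb: int


def scheduler_picks_host(snapshot: HostState, requested_mb: int) -> bool:
    """A scheduler decides using its own, possibly stale, snapshot."""
    return snapshot.free_ram_mb >= requested_mb


def simulate_race() -> int:
    """Return the host's free RAM after two racing build requests."""
    host = HostState("compute1", free_ram_mb=1024)

    # Both schedulers snapshot the host before either request is applied.
    view_a = HostState(host.name, host.free_ram_mb)
    view_b = HostState(host.name, host.free_ram_mb)

    for view in (view_a, view_b):
        if scheduler_picks_host(view, requested_mb=768):
            # The host accepts the build without re-checking its resources.
            host.free_ram_mb -= 768

    return host.free_ram_mb


if __name__ == "__main__":
    print(simulate_race())
```

A negative result means the host was asked to provide more memory than it has, which is the oversubscription described above.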
Impact
Instance build requests may fail, even if other compute hosts are available with free resources.
Solution
- Compute hosts should have the final say over whether a run_instance request can be properly serviced. To this end, the compute host must be capable of identifying whether it has free resources when a new run_instance request arrives.
- Compute hosts should verify available resources for run_instance requests serially, one request at a time, so that concurrent callers cannot compete for the same resources.
- Schedulers should read the response to run_instance and, if the compute host rejects the request, possibly retry it at a different compute host.
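The three points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the blueprint's implementation: the `ComputeHost` class, its boolean return value, and `schedule_with_retry` are hypothetical names, and a real deployment would carry the accept/reject response over RPC rather than a direct method call. A lock serializes the resource check on the host, and the scheduler moves on to the next candidate when a host rejects the request.

```python
import threading


class ComputeHost:
    """Hypothetical compute host that serializes its resource checks."""

    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb
        self._lock = threading.Lock()

    def run_instance(self, requested_mb):
        """Claim resources if available; return True on success, False on rejection."""
        # Only one request at a time may check and claim resources,
        # so the check and the decrement cannot interleave.
        with self._lock:
            if self.free_ram_mb < requested_mb:
                return False
            self.free_ram_mb -= requested_mb
            return True


def schedule_with_retry(hosts, requested_mb):
    """Try each candidate host in order; return the accepting host's name or None."""
    for host in hosts:
        if host.run_instance(requested_mb):
            return host.name
        # Rejected: the host has the final say, so try the next candidate.
    return None


if __name__ == "__main__":
    candidates = [ComputeHost("compute1", 1024), ComputeHost("compute2", 1024)]
    print(schedule_with_retry(candidates, 768))
    print(schedule_with_retry(candidates, 768))
```

With this arrangement the host never goes below zero free memory, and a request that loses the race on one host can still succeed on another.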