Latest revision as of 23:30, 17 February 2013
SchedulerRaceReduction
Drafter: belliott
Overview
The scheduler is subject to a race condition which can cause it to incorrectly identify available resources on a particular compute host. The problem occurs if multiple scheduler instances/threads concurrently issue an instance build request (i.e. run_instance) to the same compute host. This situation may oversubscribe the given compute host and cause one or more run_instance requests to fail.
Example
Compute host C has 3 GB of RAM free.
- Scheduler A sends a run_instance request R1 to C to build a 2 GB instance.
- Scheduler B sends a run_instance request R2 to C to build a 2 GB instance.
- Assume processing of R1 and R2 begins concurrently on C.
C cannot satisfy both requests (4 GB requested against 3 GB free), so at least one will fail.
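The arithmetic behind this example can be written out as a small deterministic sketch. It is illustrative only: `host_fits` and the constants are hypothetical names, not taken from the scheduler code, and the shared constant stands in for whatever view of C's free memory each scheduler holds before either request has been accounted for.

```python
# Illustrative sketch only; names are hypothetical, not from the scheduler code.
HOST_FREE_MB = 3 * 1024   # compute host C: 3 GB of RAM free
REQUEST_MB = 2 * 1024     # R1 and R2 each ask for a 2 GB instance


def host_fits(free_mb, request_mb):
    """Return True if a request fits in the free memory the caller sees."""
    return request_mb <= free_mb


# Schedulers A and B each check against the same view of C's free memory,
# taken before either request has been accounted for.
a_sees_fit = host_fits(HOST_FREE_MB, REQUEST_MB)
b_sees_fit = host_fits(HOST_FREE_MB, REQUEST_MB)

# Both checks pass, so both requests are sent to C.
total_requested_mb = REQUEST_MB * (a_sees_fit + b_sees_fit)
oversubscribed_mb = total_requested_mb - HOST_FREE_MB

print(a_sees_fit, b_sees_fit, oversubscribed_mb)
```

Each check is correct in isolation; the oversubscription appears only when the two decisions are combined.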
Impact
Instance build requests may fail, even if other compute hosts are available with free resources.
Solution
- Compute hosts should have the final say over whether a run_instance request can be properly serviced. To this end, the compute host must be capable of identifying whether it has free resources when a new run_instance request arrives.
- Compute hosts should verify the resources available for run_instance requests serially, so that concurrent callers cannot compete for the same resources.
- Schedulers should read the response to run_instance and, if the request was rejected, possibly retry it at a different compute host.
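The three points above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the proposed implementation: the `ComputeHost` class, its `run_instance` method, the `schedule` helper, and the second host D are all hypothetical, and a `threading.Lock` stands in for whatever serialization mechanism the compute host would actually use.

```python
import threading


class ComputeHost:
    """Hypothetical stand-in for a compute host that tracks its own free RAM."""

    def __init__(self, name, free_mb):
        self.name = name
        self.free_mb = free_mb
        self._lock = threading.Lock()

    def run_instance(self, request_mb):
        """Claim memory for a build; return True if accepted, False if rejected."""
        # The lock serializes the check-and-claim step, so two concurrent
        # requests can never both pass the check against the same free memory.
        with self._lock:
            if request_mb > self.free_mb:
                return False
            self.free_mb -= request_mb
            return True


def schedule(hosts, request_mb):
    """Try candidate hosts in order; return the accepting host's name, or None."""
    for host in hosts:
        if host.run_instance(request_mb):
            return host.name
    return None


# Host C matches the example (3 GB free); host D is an assumed second host.
host_c = ComputeHost("C", free_mb=3 * 1024)
host_d = ComputeHost("D", free_mb=4 * 1024)
results = []


def scheduler(label):
    # Each scheduler prefers C and falls back to D if C rejects the request.
    results.append((label, schedule([host_c, host_d], 2 * 1024)))


threads = [threading.Thread(target=scheduler, args=(label,)) for label in ("A", "B")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(host for _, host in results))
```

Whichever scheduler reaches C first is accepted there; the other is rejected by C and retries at D, so neither build fails and C is never oversubscribed.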