Nova/CachingScheduler

This wiki page describes the experimental scheduler that is being suggested here: https://blueprints.launchpad.net/nova/+spec/caching-scheduler

How it works
Let's take a look at what we have now, and what this blueprint changes...

What we have today...
We can roughly describe scheduling as:
 * load the current state of all hosts (expensive, and very expensive with the current DB call)
 * run filters and weights to pick a host (expensive with lots of hosts)

With the Caching Scheduler we do this...
The caching scheduler splits this into:
 * populate cache in periodic task
 * when we get a user's request, pick the best host from the list of cached hosts

Some observations about this approach:
 * the user no longer waits for the expensive get_all_hosts call; it runs before their request arrives
 * the user still runs through the weights, but on a much smaller list: only the hosts in the cache
 * maintaining the cache is tricky, and tuning it may be trickier; we cover this next

Let's look at what happens in the periodic background task on its first run:
 * the admin configures a list of flavors, and how many "slots" to reserve for each flavor
 * a periodic task populates the cache
 * we try to fill the requested number of slots for each flavor, iterating in the order of the cache list
 * if you want to reserve slots for larger instance types, put them early in the list
 * from each flavor, we generate a partial request_spec and run it through the existing filter and weight logic in the host manager
 * the best match is saved in the cache
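The first populate run above can be sketched roughly as follows. This is only an illustration, not Nova's code: `best_host`, `consume` and `partial_request_spec` are hypothetical stand-ins for the real host manager, filter/weight, and claim logic.

```python
# Hypothetical sketch of the first periodic populate run.

def partial_request_spec(flavor):
    # Only the flavor is known; no real instance exists yet.
    return {"instance_type": flavor, "num_instances": 1}

def populate(cache, cache_list, slots_wanted, host_manager):
    # cache_list is ordered: flavors early in the list get first
    # pick of capacity, so larger instance types should go first.
    for flavor in cache_list:
        slots = cache.setdefault(flavor["id"], [])
        while len(slots) < slots_wanted[flavor["id"]]:
            spec = partial_request_spec(flavor)
            # run the existing filters and weights to find the best match
            host = host_manager.best_host(spec)
            if host is None:
                break  # no remaining capacity for this flavor
            host.consume(spec)  # claim, so later picks see less space
            slots.append(host)  # save the best match in the cache
```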

On the second run of the periodic task:
 * loop through the slots for each flavor, in the order defined in the cache list
 * check that we still have capacity for each slot on its host
 * slots for hosts that are dead or now full are deleted from the cache
 * we validate existing slots rather than replacing them all, to reduce races with in-flight user requests
 * during this loop we claim resources as if we were building an instance on that host, so the next steps don't pick space we have already claimed
 * finally, we loop over the flavors again, adding extra slots to get back up to the requested number of slots
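The refresh pass above can be sketched like this; again, the names (`is_up`, `has_capacity`, `consume`, `top_up`) are hypothetical stand-ins for the real host state checks, claims, and the populate step.

```python
# Hypothetical sketch of the second and later periodic runs: validate
# the slots we already have instead of rebuilding the cache wholesale.

def refresh(cache, cache_list, top_up):
    for flavor in cache_list:
        kept = []
        for host in cache.get(flavor["id"], []):
            # drop slots whose host is dead or now full
            if host.is_up() and host.has_capacity(flavor):
                # claim the slot's resources, as if building an
                # instance, so later checks see the space as used
                host.consume(flavor)
                kept.append(host)
        # keep the surviving slots rather than replacing them all,
        # which reduces races with in-flight user requests
        cache[flavor["id"]] = kept
    # finally, add extra slots to get back to the requested number
    top_up(cache)
```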

When a user requests a server, this is how we pick a host:
 * Look up the list of slots for the given flavor
 * if the cache is empty, attempt to repopulate it and try again; if it is still empty, fail with NoValidHost
 * generate a list of host info to send through the weights
 * pick the best host (with an added bit of randomness, in the usual way)
 * remove the associated slot from the cache
 * if successful, return the picked host
 * if we raced on claiming the slot, retry the above process, but with no need to repopulate the cache
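The pick path above can be sketched as follows; `weigh_slots`, `claim`, and `repopulate` are hypothetical hooks for the real weighing, resource-claim, and cache-populate code.

```python
# Hypothetical sketch of picking a host for a user request.

def pick_host(cache, flavor_id, weigh_slots, claim, repopulate):
    for attempt in (1, 2):
        slots = cache.get(flavor_id, [])
        while slots:
            # run the weights over the (small) cached list and pick
            # the best host, with a bit of added randomness
            best = weigh_slots(slots)
            slots.remove(best)  # remove the slot from the cache
            if claim(best):
                return best
            # raced on the claim: just try the remaining slots,
            # no need to repopulate the cache
        if attempt == 1:
            repopulate()  # cache empty: refill once, then try again
    raise Exception("NoValidHost")
```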

This just gives you an idea of what is happening. Over time, we can try different strategies, and hopefully evolve a better strategy. As part of this optimisation work, we will develop an automatic test harness to compare different schedulers.

Performance Comparison to existing Filter Scheduler
Some notes on what is planned:
 * There is no attempt to match all the features of the filter scheduler; the aim is to be very fast and scalable
 * Currently, it performs very badly when the cache is empty, i.e. under conditions where you are low on capacity
 * It is likely a better retry strategy could reduce the cost of a cache miss

We need to compare this to the existing scheduler, but for now it can be assumed to be worse in every way.

Right now, the caching scheduler is just an interesting toy that may lead to something great...

Other Notes about the Caching Scheduler
Please note:
 * it is likely to be experimental in Icehouse
 * we are looking into testing this in pre-production at Rackspace, and hope to have more numbers soon
 * it re-uses the host manager, so we can use the existing filters and weights
 * the current way groups and affinity are handled appears to be lost

Configuration
Long term, it would be good if the cache could learn how it should be setup.

Right now, we are making all the tuning static configuration. Please note, as it is experimental, this configuration may change at any time. Hopefully it will be stable before Icehouse is released.

Let's consider asking for a cache of 10 m1.tiny (flavor id 1) and 5 m1.small (flavor id 2), refreshing the cache every 60 seconds. The config would look like this:

[caching_scheduler]

# Options defined in nova.scheduler.caching_scheduler

# Specification of the cache lists, a list of flavor_id. (multi valued)
cache_list=1,2

# Proportion by which to weight the number of slots for each flavor.
# If not defined, all are evenly weighted. (multi valued)
weights=2,1

# Number of slots to keep for each flavor with a weight of one.
# (floating point value)
factor=5.0

# How often to refresh the cached slots, in seconds. (integer value)
poll_period=60
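The way weights and factor combine in the sample config works out as below, assuming (from the option descriptions) that each flavor gets weight x factor slots:

```python
# Slot counts implied by the sample config above, assuming each
# flavor gets weight * factor slots (as the option help suggests).
cache_list = ["1", "2"]          # flavor ids: m1.tiny, m1.small
weights = {"1": 2.0, "2": 1.0}   # from weights=2,1
factor = 5.0                     # slots per unit of weight

slots = {fid: int(weights[fid] * factor) for fid in cache_list}
print(slots)  # {'1': 10, '2': 5} -> 10 m1.tiny slots, 5 m1.small
```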

Future work
We currently have blueprints to look at:
 * Currently this only works with a single scheduler; we need to look at one of:
   * sharing the cache between schedulers, with a worker populating the cache
   * locking resources between schedulers (two-phase commit), with each scheduler keeping its own cache
   * sharding hosts so each scheduler deals with a separate list of hosts
   * https://blueprints.launchpad.net/nova/+spec/caching-scheduler-multi-host-decentralised
 * Caching by flavor is probably not always enough; we can look at a more complex cache key, at least including os_type
   * https://blueprints.launchpad.net/nova/+spec/caching-scheduler-custom-cache-key