Latest revision as of 23:30, 17 February 2013

Volume Type aware Scheduler

At the Essex Design Summit it was agreed that a working group would be established to improve nova-volume and volume-related scheduler code.

This page covers the group's first effort: a basic volume-type aware scheduler. More advanced volume schedulers will follow, as will integration with Distributed/Zone-aware Schedulers.

Overview

Just as instances can be scheduled on nodes with particular HW properties, there is a need to create volumes on nodes connected to storage of a particular type. The storage might be directly connected to a single node (internal disks, DAS JBODs/arrays, etc.) or to multiple nodes (shared DAS, SAN, etc.).

The idea of this approach is to use the flexible mechanism of volume types, which allows storage types to be defined with different extra-specs (key/value pairs).

Nova volume drivers will report properties/specs of connected storage on every node. They may add some additional properties like quantities, access details, etc. This information will be forwarded to schedulers through the same mechanism as used for compute nodes.

It will be up to the schedulers to find the best node matching the volume creation criteria. In the simplest case the scheduler could just find a node supporting the properties of a particular volume type. More advanced schedulers will be able to load-balance volumes based on quantities or access patterns. It will also be possible to create generic schedulers and schedulers for each volume type.

Design

Volume types

The cloud administrator will need to create volume types that are compatible with the volume drivers they plan to use. All volume type properties (one or more) will be stored in the extra_specs table in the form of key/value pairs.

Example keys might include properties like type of drive (SATA/SAS/SSD), RPM, etc. For some environments it will also be necessary to store things like "storage class": iSCSI, FC SAN, local drives.

Nova-volume & drivers

The volume manager on every participating node will collect information about supported capabilities from all its drivers. Such capabilities might be reported together with quantities and other additional information.

Note: we will need to reserve some keywords like 'storage_class', 'total', 'free'. We could either use total/free as abstract values affecting scheduling decisions or express them in particular units (like GBs). In the latter case the scheduler will be able to check the availability of the requested amount of storage.

The current Diablo code supports reporting volume capabilities through the get_volume_stats() method. It has an optional parameter 'refresh' that might be used to perform a rescan/discovery of the underlying H/W (not used currently).

All capabilities will be reported to schedulers using update_service_capabilities().
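As a rough illustration of the reporting side, a driver's get_volume_stats() might return a dictionary of capabilities like the one below. Only the method name and its 'refresh' parameter come from the text above; the class name FakeSataDriver, the _rescan() helper and all values are assumptions made for this sketch.

```python
class FakeSataDriver(object):
    """Hypothetical driver, used only to illustrate capability reporting."""

    def __init__(self):
        self._stats = None

    def _rescan(self):
        # Assumption: a real driver would query the underlying hardware here.
        return {'storage_class': 'local', 'type': 'SATA', 'RPM': 7200,
                'total': 4096, 'free': 1024}

    def get_volume_stats(self, refresh=False):
        # Rescan on first use, or whenever the caller asks for fresh data.
        if refresh or self._stats is None:
            self._stats = self._rescan()
        # Return a copy so callers cannot modify the cached values.
        return dict(self._stats)


stats = FakeSataDriver().get_volume_stats(refresh=True)
```

The resulting dictionary is what the volume manager would forward to the schedulers.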

Question: should we support multiple volume drivers per node?
Question: As part of reporting capabilities, drivers could check the DB and assign volume types to the reported storage classes. In this case matching at the scheduler level might be easier (but not as flexible). Do we want it?

Scheduler

At the scheduler level, capabilities from all nova-volume nodes are automatically stored in an in-memory repository, in the same way as capabilities from "compute" and other nodes (in zone_manager.service_states).

Create Volume requests will arrive with all necessary volume_type information. The generic volume-type aware scheduler will:

1. Retrieve the volume type key/value pairs for the requested volume type
2. Filter the nodes reporting availability of these pairs
3. Select the most appropriate node (pluggable sub-classes):
   - any random node
   - the node with the minimum number of scheduled volumes (based on DB data)
   - the node with the minimum used capacity (same as above, based on DB data)
   - the node with the maximum available capacity (based on data reported by volume drivers)
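The filter and select steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual scheduler code: all function names are hypothetical, each host is assumed to report a single capability dictionary, and only two of the selection strategies (random and maximum available capacity) are shown.

```python
import random


def host_matches(capabilities, extra_specs):
    """Return True if a host reports every key/value pair of the volume type."""
    return all(capabilities.get(key) == value
               for key, value in extra_specs.items())


def filter_hosts(service_states, extra_specs):
    """Keep only hosts whose reported capabilities satisfy extra_specs."""
    return [host for host, caps in service_states.items()
            if host_matches(caps, extra_specs)]


def pick_random(hosts, service_states):
    """Selection strategy: any random node."""
    return random.choice(hosts)


def pick_max_free(hosts, service_states):
    """Selection strategy: node with the largest reported 'free' value."""
    return max(hosts, key=lambda host: service_states[host].get('free', 0))


def schedule(service_states, extra_specs, select=pick_max_free):
    hosts = filter_hosts(service_states, extra_specs)
    if not hosts:
        raise ValueError('no host matches the requested volume type')
    return select(hosts, service_states)


# Hypothetical snapshot of reported capabilities, keyed by host name.
service_states = {
    'node1': {'type': 'SATA', 'RPM': 7200, 'total': 4096, 'free': 1024},
    'node2': {'type': 'SATA', 'RPM': 7200, 'total': 4096, 'free': 2048},
    'node3': {'type': 'SAS', 'RPM': 15000, 'total': 1500, 'free': 500},
}
chosen = schedule(service_states, {'type': 'SATA'})  # 'node2'
```

Passing the selection strategy as a function is one way to model the pluggable sub-classes; strategies based on DB data would need a database lookup that is not shown here.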

We could allow schedulers to be registered for particular volume types. In this case the generic volume-type scheduler will pass the request to such a scheduler.

If the volume type for a volume was not set we could either:

- pick any node
- register some properties for a "default" volume type

To make the system even more flexible we could add additional specs that are specified on a "per volume" basis. For that, the volume's create() API will receive a new optional "specs" parameter that will be forwarded to the scheduler as part of args. At the scheduler level we will combine the extra_specs from the volume type with these new specs and try to find a node matching the combined criteria.
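Combining the two sets of specs could be as simple as a dictionary merge. The function name combine_specs() is hypothetical, and the rule that per-volume specs win on conflicting keys is an assumption; the text above does not say which side should take precedence.

```python
def combine_specs(type_extra_specs, volume_specs=None):
    """Merge volume-type extra_specs with optional per-volume specs."""
    combined = dict(type_extra_specs or {})
    # Assumption: per-volume specs override the volume type on conflicts.
    combined.update(volume_specs or {})
    return combined


criteria = combine_specs({'type': 'SATA'}, {'RPM': 7200})
no_type = combine_specs(None, None)
```

With neither a volume type nor per-volume specs, the combined criteria are empty, so every node would match.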

Examples

In the simplest case the list of all supported volume types might look like:

[ {'id': 1, 'name': 'SATA volume', 'extra_specs': {'type': 'SATA'}},

{'id': 2, 'name': 'SAS volume', 'extra_specs': {'type': 'SAS'}}, ... ]

Volume drivers could report capabilities like:

[ {'type': 'SATA', 'RPM': 7200, ..., 'total': 4096, 'free': 1024},

{'type': 'SAS', 'RPM': 15000, ..., 'total': 1500, 'free': 500}, ... ]

If the scheduler receives a request to create a volume of type 'SATA volume', it will select all hosts reporting 'type': 'SATA' in their capabilities and choose the most suitable one from among them.
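The lookup described in this example can be sketched end to end with the sample data. The host names, the 'reported' dictionary and the helper names are assumptions for illustration; here each host is assumed to report a list of capability entries, and a host qualifies if any one entry matches.

```python
volume_types = [
    {'id': 1, 'name': 'SATA volume', 'extra_specs': {'type': 'SATA'}},
    {'id': 2, 'name': 'SAS volume', 'extra_specs': {'type': 'SAS'}},
]

# Hypothetical hosts; each reports a list of capability entries.
reported = {
    'node1': [{'type': 'SATA', 'RPM': 7200, 'total': 4096, 'free': 1024}],
    'node2': [{'type': 'SATA', 'RPM': 7200, 'total': 4096, 'free': 1024},
              {'type': 'SAS', 'RPM': 15000, 'total': 1500, 'free': 500}],
}


def specs_for(type_name):
    """Look up the extra_specs of a volume type by its name."""
    for volume_type in volume_types:
        if volume_type['name'] == type_name:
            return volume_type['extra_specs']
    raise KeyError(type_name)


def candidate_hosts(type_name):
    """Return hosts with at least one entry matching all extra_specs."""
    specs = specs_for(type_name)
    return sorted(
        host for host, entries in reported.items()
        if any(all(entry.get(key) == value for key, value in specs.items())
               for entry in entries))


sata_hosts = candidate_hosts('SATA volume')  # ['node1', 'node2']
sas_hosts = candidate_hosts('SAS volume')    # ['node2']
```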
