VolumeTypeScheduler: Volume Type aware Scheduler

Revision as of 19:19, 18 October 2011

At the Essex Design Summit it was agreed that a working group would be established to improve nova-volume and volume-related scheduler code.

This page covers the group's first effort: a basic volume-type-aware scheduler. More advanced volume schedulers will follow, along with integration with Distributed Schedulers.

Overview

Just as instances can be scheduled onto nodes with particular hardware properties, there is a need to create volumes on nodes connected to storage of a particular type. The storage might be directly attached to a single node (internal disks, DAS JBODs/arrays, etc.) or shared by multiple nodes (shared DAS, SAN, etc.).

The idea of this approach is to use the flexible volume-type mechanism, which allows storage types to be defined with arbitrary extra specs (key/value pairs).

Nova volume drivers will report the properties/specs of the storage connected to each node, and may add further properties such as quantities, access details, etc. This information will be forwarded to the schedulers through the same mechanism used for compute nodes.

It will be up to the schedulers to find the best node matching the volume creation criteria. In the simplest case the scheduler could just find a node supporting the properties of the requested volume type; more advanced schedulers will be able to load-balance volumes based on quantities or access patterns. It will also be possible to create both generic schedulers and per-volume-type schedulers.

Design

Volume types

Cloud administrators will need to create volume types that are compatible with the volume drivers they plan to use. All volume-type properties (one or more) will be stored in the extra_specs table as key/value pairs.

Example keys might include properties such as drive type (SATA/SAS/SSD), RPM, etc. Some environments will also need to store a "storage class" such as iSCSI, FC SAN, or local drives.
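As a hypothetical sketch (the table layout below is an assumption, not the actual Nova schema), a volume type's properties flatten naturally into one extra_specs row per key/value pair:

```python
# Illustrative only: a volume type and the extra_specs rows it would produce,
# assuming a simple (volume_type_name, key, value) table layout.
volume_type = {
    "name": "fast SAS volume",
    "extra_specs": {"type": "SAS", "RPM": "15000", "storage_class": "FC SAN"},
}

# Flatten the extra specs into one row per key/value pair.
rows = [(volume_type["name"], k, v) for k, v in volume_type["extra_specs"].items()]
for row in rows:
    print(row)
```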

Nova-volume & drivers

The volume manager on every participating node will collect information from all of its drivers about supported capabilities. These capabilities might be reported together with quantities and other additional information.

  • Note: we will need to reserve some keywords such as 'storage_class', 'total', and 'free'. We could either treat them as abstract values affecting scheduling decisions or express them in particular units (like GB); in the latter case the scheduler will be able to check whether the requested amount of storage is available.

The current Diablo code supports reporting volume capabilities through the get_volume_stats() method. It has an optional 'refresh' parameter that can be used to trigger a rescan/discovery of the underlying hardware (currently unused).

All capabilities will be reported to the schedulers using update_service_capabilities().
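The reporting path described above can be sketched as follows. Only get_volume_stats() and update_service_capabilities() come from the text; the classes and other names are illustrative stand-ins, not actual Nova APIs:

```python
# Hypothetical sketch: the volume manager polls each driver's
# get_volume_stats() and forwards the results to the schedulers through
# update_service_capabilities(), the same channel used by compute nodes.

class FakeSATADriver:
    def get_volume_stats(self, refresh=False):
        # refresh=True would trigger a rescan of the underlying hardware
        # (the parameter exists in Diablo but is currently unused).
        return {"type": "SATA", "RPM": 7200, "total": 4096, "free": 1024}

class FakeSchedulerAPI:
    def update_service_capabilities(self, service_name, capabilities):
        # Stand-in for the real RPC call; just record what was reported.
        self.last_report = (service_name, capabilities)

class VolumeManager:
    def __init__(self, drivers, scheduler_api):
        self.drivers = drivers
        self.scheduler_api = scheduler_api

    def report_capabilities(self):
        # Collect capabilities from every driver on this node...
        capabilities = [d.get_volume_stats(refresh=False) for d in self.drivers]
        # ...and forward them to the schedulers.
        self.scheduler_api.update_service_capabilities("volume", capabilities)

api = FakeSchedulerAPI()
VolumeManager([FakeSATADriver()], api).report_capabilities()
print(api.last_report)
```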

  • Question: should we support multiple volume drivers per node?
  • Question: As part of the capability-reporting functionality, drivers could check the DB and assign volume types to the reported storage classes. Matching at the scheduler level would then be easier (but less flexible). Do we want this?

Scheduler

At the scheduler level, capabilities from all nova-volume nodes are automatically stored in an in-memory repository (zone_manager.service_states), just like capabilities from compute and other nodes.

Create Volume requests will arrive with all necessary volume_type information. The generic volume-type-aware scheduler will:

  • Retrieve volume type key/value pairs for requested volume type
  • Filter nodes reporting availability of these pairs
  • Select the most appropriate node (pluggable sub-classes):
    • any random node
    • the node with min number of scheduled volumes (based on DB data)
    • the node with min used capacity (same as above, based on DB data)
    • the node with max available capacity (based on data reported by volume drivers)
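The filter-then-select flow above can be sketched in a few lines. This is a minimal illustration under assumed data structures, not the actual scheduler implementation; the "max available capacity" policy is shown as one pluggable selector:

```python
# Minimal sketch of the generic volume-type-aware scheduler: filter hosts
# whose reported capabilities contain all of the volume type's extra-spec
# pairs, then pick one with a pluggable selection policy.

def filter_hosts(service_states, extra_specs):
    """Keep hosts whose capabilities include every requested key/value pair."""
    return [
        host for host, caps in service_states.items()
        if all(caps.get(k) == v for k, v in extra_specs.items())
    ]

def pick_max_free(hosts, service_states):
    """Selection policy: the node with max driver-reported free capacity."""
    return max(hosts, key=lambda h: service_states[h].get("free", 0))

# Illustrative capability repository (stand-in for zone_manager.service_states).
service_states = {
    "node1": {"type": "SATA", "total": 4096, "free": 1024},
    "node2": {"type": "SAS",  "total": 1500, "free": 500},
    "node3": {"type": "SATA", "total": 2048, "free": 2000},
}

candidates = filter_hosts(service_states, {"type": "SATA"})
print(pick_max_free(candidates, service_states))  # node3 has the most free space
```

Swapping pick_max_free for a random choice or a DB-backed volume count gives the other selection policies listed above.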

We could allow registration of schedulers for particular volume types; the generic volume-type scheduler would then pass the request on to the type-specific scheduler.

If no volume type was set for the volume, we could either:

  • pick any node
  • register some properties for a "default" volume type

Examples

In the simplest case, the list of all supported volume types might look like:

[
  {'id': 1, 'name': 'SATA volume', 'extra_specs': {'type': 'SATA'}},
  {'id': 2, 'name': 'SAS volume',  'extra_specs': {'type': 'SAS'}},
  ...
]

Volume drivers could report capabilities in the form of key/value pairs:

[
  {'type': 'SATA', 'RPM': 7200, ..., 'total': 4096, 'free': 1024},
  {'type': 'SAS', 'RPM': 15000, ..., 'total': 1500, 'free': 500},
  ...
]

If the scheduler receives a request to create a volume of type 'SATA volume', it will filter all hosts reporting 'type': 'SATA' in their capabilities and choose the most suitable one among them.