Blueprint-nova-planned-resource-reservation-api

Latest revision as of 14:11, 14 December 2013

  • Created: Julien Danjou
  • Contributors: Sylvain Bauza, Swann Croiset, François Rossigneux, Julien Danjou, Patrick Petit, Julien Carpentier
  • Project: Climate (see the Climate project on Launchpad)

Summary

This blueprint introduces the concept of a Capacity Leasing Service (CLS) for OpenStack. With that service, a user with admin privileges can reserve hardware resources that are dedicated to the sole use of a tenant. A lease is a negotiated agreement between the provider and the consumer in which the former agrees to make a set of hardware resources (compute and possibly storage) available to the latter, based on a set of lease terms presented by the consumer. The lease terms include a description of the required host capabilities and the availability period during which the hardware is reserved.

We are thinking of three kinds of lease terms:

  • Schedule lease: where resources must be provisioned at a specific date and time
  • Best-effort lease: where resources are provisioned as soon as possible
  • Immediate lease: where resources are provisioned immediately or not at all

Once a lease is created, nobody but the users of the tenant can use the reserved resources during the period of the lease. When a lease ends, nothing disruptive happens to the instances that have been scheduled on the reserved resources. The hardware resources simply return to the common pool and so become available to other tenants for instance scheduling.

The CLS should allow the consumer to renegotiate a lease, for example to extend the reservation period, provided that the resources are available to satisfy the renegotiation request.

A lease is potentially a billable item for which customers can be charged a flat fee or a premium price for each instance scheduled on reserved resources, so the usage of leased resources should be accounted for through Ceilometer.

See Launchpad Blueprint: Planned resource reservation API

Rationale

There are situations where hardware resources need to be reserved to absorb peaks of load that are known in advance. This is especially true for small-scale cloud infrastructures where a large number of compute instances must be co-scheduled. The service is also expected to meet the needs of urgent computations, to comply with regulations or security policies that prohibit multiple tenants from cohabiting on the same physical host, and to address a class of high performance computing requirements, such as avoiding the disturbing noise generated by multi-tenant activity.

In addition, the CLS is expected to foster energy-efficient (greener) behavior by promoting the nodes with the highest MFLOPS/Watt ratio, and to save electricity by automatically powering machines on and off according to their leasing schedule. The CLS should also help operators manage their physical assets through more effective capacity planning based on the reservation schedules.

Last but not least, the Capacity Leasing Service is also expected to address the performance needs of workloads typical of the high performance computing world, where applications must run on dedicated nodes with the same hardware specification and speed (CPU architecture, model, and clock frequency).

Detailed Description

The Capacity Leasing Service (CLS) behaves like the Filter Scheduler when matching the properties of the lease against the capabilities of the hosts. In fact, the CLS should accept all the standard filters used by the Filter Scheduler. A lease request should include the following properties:

  • the region
  • the availability zone
  • the host capabilities extra specs (scoped and non-scoped format should be accepted)
  • the number of CPU cores
  • the amount of free RAM
  • the amount of free disk space
  • the number of hosts
  • the type of lease (SCHEDULE, BEST-EFFORT, IMMEDIATE)
  • the starting date of the lease, if it is of type SCHEDULE
  • the duration in days and hours of the lease
  • a timeout
  • ...
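A lease request carrying the properties listed above could be modeled as in the following minimal Python sketch. The class name `LeaseRequest`, its field names, the assumption that the CPU, RAM and disk figures apply per host, and the validation rules are all hypothetical. None of them come from the blueprint or from Climate's API, and the open-ended "..." in the list is deliberately left unfilled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, Optional


class LeaseType(Enum):
    SCHEDULE = "SCHEDULE"
    BEST_EFFORT = "BEST-EFFORT"
    IMMEDIATE = "IMMEDIATE"


@dataclass
class LeaseRequest:
    """Hypothetical lease request mirroring the property list above."""
    region: str
    availability_zone: str
    lease_type: LeaseType
    duration: timedelta                   # duration in days and hours
    host_count: int = 1                   # number of hosts
    vcpus: int = 0                        # CPU cores (assumed per host)
    memory_mb: int = 0                    # free RAM (assumed per host)
    disk_gb: int = 0                      # free disk space (assumed per host)
    # Scoped ("capabilities:cpu_arch") or non-scoped ("cpu_arch") extra specs.
    extra_specs: Dict[str, str] = field(default_factory=dict)
    start: Optional[datetime] = None      # only used by SCHEDULE leases
    timeout: Optional[timedelta] = None

    def validate(self) -> None:
        if self.host_count < 1:
            raise ValueError("at least one host must be requested")
        if self.duration <= timedelta(0):
            raise ValueError("the duration must be positive")
        if self.lease_type is LeaseType.SCHEDULE and self.start is None:
            raise ValueError("a SCHEDULE lease needs a starting date")


# Example: four x86_64 hosts for 2 days and 6 hours from 6 January 2014, 08:00.
request = LeaseRequest(
    region="RegionOne",
    availability_zone="nova",
    lease_type=LeaseType.SCHEDULE,
    duration=timedelta(days=2, hours=6),
    host_count=4,
    extra_specs={"capabilities:cpu_arch": "x86_64"},
    start=datetime(2014, 1, 6, 8, 0),
)
request.validate()
```

An enum keeps the three lease types closed and maps them to the exact strings used in the text.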

The CLS first checks that the capabilities provided by each host satisfy any extra specifications associated with the properties of the lease, then applies any other enabled filters.
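The matching step can be sketched in Python on the model of how Nova's ComputeCapabilitiesFilter treats flavor extra specs. A key is either non-scoped (`cpu_arch`) or scoped with the `capabilities:` prefix. Keys carrying any other scope are left to other filters, and a value may start with an operator. The sketch implements only a subset of those operators. Host capabilities are represented as a plain dictionary, the function names are hypothetical, and the subsequent filters are not shown.

```python
import operator
from typing import Callable, Dict, Mapping

# Simplified subset of the operators understood by Nova's ComputeCapabilitiesFilter:
# numeric operators compare floats, "s"-prefixed ones compare strings.
_OPS: Dict[str, Callable[[str, str], bool]] = {
    "==": lambda have, want: float(have) == float(want),
    "!=": lambda have, want: float(have) != float(want),
    ">=": lambda have, want: float(have) >= float(want),
    "<=": lambda have, want: float(have) <= float(want),
    "s==": operator.eq,
    "s!=": operator.ne,
    "<in>": lambda have, want: want in have,
}


def match_value(have: str, requirement: str) -> bool:
    """Check one host capability value against one requirement string."""
    words = requirement.split(None, 1)
    if len(words) == 2 and words[0] in _OPS:
        try:
            return _OPS[words[0]](have, words[1])
        except ValueError:
            return False  # non-numeric value used with a numeric operator
    return have == requirement  # no operator: plain string equality


def host_satisfies(capabilities: Mapping[str, str],
                   extra_specs: Mapping[str, str]) -> bool:
    """Return True if the host meets every capability extra spec."""
    for key, requirement in extra_specs.items():
        if ":" in key:
            scope, name = key.split(":", 1)
            if scope != "capabilities":
                continue  # scoped for another filter, e.g. an aggregate filter
        else:
            name = key
        if name not in capabilities:
            return False
        if not match_value(str(capabilities[name]), requirement):
            return False
    return True


host = {"vcpus": "16", "memory_mb": "32768", "cpu_arch": "x86_64",
        "hypervisor_type": "QEMU"}
specs = {"vcpus": ">= 8",
         "capabilities:cpu_arch": "s== x86_64",
         "hypervisor_type": "QEMU",
         "aggregate_instance_extra_specs:ssd": "true"}
print(host_satisfies(host, specs))  # True
```

Accepting both the scoped and the non-scoped form, as the property list requires, lets the same extra specs serve the CLS and Nova's own filters.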

The CLS then checks in its database which of the eligible hosts are not already reserved for the requested period. Depending on the type of lease and on whether the request can be fulfilled, the CLS then acts as follows.

  1. If the lease is IMMEDIATE and the request can be fulfilled, the CLS creates an aggregate for the eligible hosts with a special metadata key filter_lease_id containing the unique ID of the lease, returns SUCCESS with the ID of the lease, and marks the lease as ACTIVE. If the request cannot be fulfilled, the CLS returns a FAILURE status.
  2. If the lease is BEST-EFFORT and the request cannot be fulfilled immediately, the CLS starts a kind of scavenger hunt whose objective is to migrate any instances belonging to other tenants off the eligible hosts. This operation can be time-consuming and fairly complex, so different strategies may be applied depending on heuristic factors such as the number, type and state of the instances to be migrated. Before starting the actual migration, the CLS should assert that there are enough potential candidates for the migration. If the CLS decides to start the migration, it returns a SUCCESS status with the ID of the lease and marks the lease as IN-PROGRESS. If it decides not to start the migration, it immediately returns a FAILURE status. If the scavenger hunt manages to free the eligible hosts for the lease before the timeout is triggered, the CLS marks the lease as ACTIVE; otherwise, it marks the lease as TIMEDOUT.
  3. Finally, if the lease is SCHEDULE and the request can be fulfilled for the requested period according to the reservation schedule, the CLS creates an aggregate for the eligible hosts with a special metadata key filter_lease_id containing the unique ID of the lease, returns SUCCESS with the ID of the lease, and marks the lease as INACTIVE until the reservation begins, at which point the CLS marks it as ACTIVE. If the request cannot be fulfilled, the CLS simply returns a FAILURE status.
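The database check and the three outcomes above can be summarized in a small Python sketch. The in-memory `reservations` dictionary stands in for the CLS database, reservation periods are treated as half-open intervals, and all names are hypothetical. Creating the aggregate tagged with filter_lease_id and the migration strategy itself are left out. The sketch also makes one assumption that the text does not state: a BEST-EFFORT request that can be satisfied immediately is handled like an IMMEDIATE one.

```python
from datetime import datetime
from enum import Enum
from typing import Dict, List, Optional, Sequence, Tuple

Interval = Tuple[datetime, datetime]  # half-open period [start, end)


class LeaseState(Enum):
    INACTIVE = "INACTIVE"
    IN_PROGRESS = "IN-PROGRESS"
    ACTIVE = "ACTIVE"
    TIMEDOUT = "TIMEDOUT"


def overlaps(a: Interval, b: Interval) -> bool:
    # Two half-open periods overlap iff each one starts before the other ends.
    return a[0] < b[1] and b[0] < a[1]


def free_hosts(eligible: Sequence[str], reservations: Dict[str, List[Interval]],
               period: Interval) -> List[str]:
    # Keep only the eligible hosts with no reservation overlapping the period.
    return [h for h in eligible
            if not any(overlaps(period, r) for r in reservations.get(h, []))]


def handle_request(lease_type: str, eligible: Sequence[str],
                   reservations: Dict[str, List[Interval]], period: Interval,
                   host_count: int, enough_migration_candidates: bool = False
                   ) -> Tuple[str, Optional[LeaseState], List[str]]:
    """Return (status, initial lease state, hosts for the lease aggregate)."""
    free = free_hosts(eligible, reservations, period)
    fits = len(free) >= host_count
    chosen = free[:host_count] if fits else []
    if lease_type == "SCHEDULE":
        return ("SUCCESS", LeaseState.INACTIVE, chosen) if fits else ("FAILURE", None, [])
    if lease_type == "IMMEDIATE" or (lease_type == "BEST-EFFORT" and fits):
        # Assumption: a BEST-EFFORT request that fits right away is handled like IMMEDIATE.
        return ("SUCCESS", LeaseState.ACTIVE, chosen) if fits else ("FAILURE", None, [])
    if lease_type == "BEST-EFFORT":
        # Start the scavenger hunt only if there are enough migration candidates;
        # the hosts are only known once the migration has freed them.
        if enough_migration_candidates:
            return ("SUCCESS", LeaseState.IN_PROGRESS, [])
        return ("FAILURE", None, [])
    raise ValueError(f"unknown lease type: {lease_type}")


# Transitions that happen after creation, driven by time or by the scavenger hunt.
_ALLOWED = {
    (LeaseState.INACTIVE, LeaseState.ACTIVE),       # SCHEDULE: reservation begins
    (LeaseState.IN_PROGRESS, LeaseState.ACTIVE),    # BEST-EFFORT: hosts freed in time
    (LeaseState.IN_PROGRESS, LeaseState.TIMEDOUT),  # BEST-EFFORT: timeout triggered
}


def transition(current: LeaseState, new: LeaseState) -> LeaseState:
    if (current, new) not in _ALLOWED:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new


reserved = {"host1": [(datetime(2014, 1, 6), datetime(2014, 1, 8))]}
period = (datetime(2014, 1, 7), datetime(2014, 1, 9))
# host1 is already reserved on 7 January, so host2 and host3 are chosen.
print(handle_request("SCHEDULE", ["host1", "host2", "host3"], reserved, period, 2))
```

A renegotiation that extends a lease could reuse `free_hosts` with the extended period, after excluding the lease's own reservations from the check.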

On the Nova Scheduler side, a filter is in charge of enforcing the rule that

Open Issues

It is unclear how to guarantee the consistency of a lease in the presence of race conditions between the CLS and the Nova Scheduler.

Design

A reservation, or lease, is tied to a project. It has a start and an end timestamp, between which the lease is valid. It also has an associated number of nodes and their flavors, so that it can be quantified. A lease has an immutable set of scheduler hints. An API call allows a user to retrieve the list and combinations of applicable hints. When a user tries to create a lease, its scheduler hints are checked for validity: an operator can refuse a lease with invalid or overly strict hints.
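The lease model described above might look like the following Python sketch. A frozen dataclass keeps the lease fields fixed after creation, and the nodes and hints are stored as read-only mappings. `APPLICABLE_HINTS` is a hypothetical operator policy: its entries are ordinary Nova scheduler hint names chosen only for illustration. `list_applicable_hints` stands in for the API call that lists them. Refusing hints that are "too strict" would need an operator-specific rule and is not modeled.

```python
from dataclasses import dataclass
from datetime import datetime
from types import MappingProxyType
from typing import FrozenSet, List, Mapping

# Hypothetical operator policy: the scheduler hints this deployment accepts in a lease.
APPLICABLE_HINTS: FrozenSet[str] = frozenset({"same_host", "different_host", "group"})


def list_applicable_hints() -> List[str]:
    """Stand-in for the API call that lists the applicable hints."""
    return sorted(APPLICABLE_HINTS)


@dataclass(frozen=True)
class Lease:
    project_id: str                       # a lease is tied to one project
    start: datetime                       # the lease is valid from start ...
    end: datetime                         # ... until end
    nodes: Mapping[str, int]              # flavor name -> number of nodes
    scheduler_hints: Mapping[str, str]    # immutable once the lease exists

    def __post_init__(self) -> None:
        if self.end <= self.start:
            raise ValueError("a lease must end after it starts")
        if not self.nodes or any(n < 1 for n in self.nodes.values()):
            raise ValueError("a lease needs at least one node per flavor")
        invalid = set(self.scheduler_hints) - APPLICABLE_HINTS
        if invalid:
            raise ValueError(f"refused scheduler hints: {sorted(invalid)}")
        # Store read-only copies so neither mapping can be changed later.
        object.__setattr__(self, "nodes", MappingProxyType(dict(self.nodes)))
        object.__setattr__(self, "scheduler_hints",
                           MappingProxyType(dict(self.scheduler_hints)))


lease = Lease("demo-project", datetime(2014, 1, 6), datetime(2014, 1, 8),
              {"m1.large": 4}, {"group": "hpc-group"})
```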

An instance is registered as part of a lease when the user passes the lease information at creation time, and it is removed from the lease when it is destroyed. If an instance is created as part of a lease, the scheduler has to launch it with the requirements fulfilled.
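One plausible placement check can be sketched in Python. It assumes that the lease ID travels with the boot request as a scheduler hint named `lease_id` (a hypothetical name). It then compares that hint with the filter_lease_id metadata on the host's aggregates. This is a plain function rather than Nova's filter plugin interface, and the exact rule the Nova Scheduler filter should enforce is not specified here, so the check is only one reading of it.

```python
from typing import Iterable, Mapping, Optional

LEASE_METADATA_KEY = "filter_lease_id"  # aggregate metadata key from the blueprint
LEASE_HINT_KEY = "lease_id"             # hypothetical scheduler hint name


def host_lease(aggregate_metadata: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the lease ID of the first lease aggregate the host belongs to."""
    for metadata in aggregate_metadata:
        if LEASE_METADATA_KEY in metadata:
            return metadata[LEASE_METADATA_KEY]
    return None


def host_passes(aggregate_metadata: Iterable[Mapping[str, str]],
                scheduler_hints: Mapping[str, str]) -> bool:
    """Leased hosts accept only instances of their lease, and vice versa.

    Instances without a lease hint may only land on hosts outside any lease.
    """
    return host_lease(aggregate_metadata) == scheduler_hints.get(LEASE_HINT_KEY)


leased = [{"availability_zone": "nova"}, {LEASE_METADATA_KEY: "lease-42"}]
print(host_passes(leased, {"lease_id": "lease-42"}))  # True
print(host_passes(leased, {}))                        # False
print(host_passes([], {}))                            # True
```

The sketch ignores lease states. An INACTIVE SCHEDULE lease whose aggregate already exists would therefore block other tenants before the reservation begins, so a real filter would likely also need to consult the lease state.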

Nova-reservation-design.png