Persistent resource claim

Summary

When a resource claim happens through an instance_claim() or resize_claim() request, the compute resource tracker will take the COMPUTE_RESOURCE_SEMAPHORE, allocate the resource, persist the allocation result (the claim) in the DB, and then return the ID of the claim.

When a resource is given back to the compute resource tracker through drop_resize_claim() or abort_instance_claim(), the claim (or simply the claim ID) is passed to the resource tracker, which frees the resource.

See below for the benefits of this change.

Implementation Plan

Data Structure

  • claim object:
  • uuid: the ID of the claim
  • host: the host that supports this claim
  • owner_type: the type of the resource claim owner; only 'server' is supported now
  • owner_id: the ID of the resource claim owner; only instance_uuid is supported now
  • vcpu: the number of vCPUs claimed
  • memory: the MB of memory claimed
  • disk: the GB of disk claimed
  • pci: a JSON blob of the PCI devices claimed (nullable)
  • extra_resource: a JSON blob for other resource types (nullable)
  • tag: a tag provided by the caller, to be utilized/interpreted by the caller

DB

  • A claims table will be created to track all claims.
It's an option to keep the claim information in the instance table (e.g. through system metadata), but I'd prefer a separate table.
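
A minimal sketch of what the claims table model might look like, following the style of Nova's existing SQLAlchemy models. The table name, column names, and sizes are assumptions derived from the field list above, not a final schema:

 # Hypothetical SQLAlchemy model for the proposed claims table; column
 # names mirror the claim object fields listed above.
 from sqlalchemy import Column, Integer, String, Text

 from nova.db.sqlalchemy import models


 class Claim(models.BASE, models.NovaBase):
     """Represents a persistent resource claim on a compute host."""
     __tablename__ = 'claims'

     id = Column(Integer, primary_key=True)
     uuid = Column(String(36), nullable=False)        # identification of the claim
     host = Column(String(255), nullable=False)       # host backing this claim
     owner_type = Column(String(16), nullable=False)  # only 'server' for now
     owner_id = Column(String(36), nullable=False)    # only instance_uuid for now
     vcpus = Column(Integer, nullable=False)          # number of vCPUs claimed
     memory_mb = Column(Integer, nullable=False)      # MB of memory claimed
     disk_gb = Column(Integer, nullable=False)        # GB of disk claimed
     pci = Column(Text, nullable=True)                # JSON blob of PCI devices
     extra_resource = Column(Text, nullable=True)     # JSON blob of other resources
     tag = Column(String(255), nullable=True)         # caller-defined tag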

API

Claim operations
  • get_claims_for_instance(instance_uuid): return the claims for the instance.
  • get_claims_for_host(host): return the claims for the host.
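
A minimal sketch of these two lookups, assuming hypothetical db.claim_get_all_by_* helpers named in the style of Nova's existing DB API:

 # Hypothetical read-side API for persistent claims.  The db.claim_*
 # helpers are assumptions that would be added alongside the claims table.
 from nova import db


 def get_claims_for_instance(context, instance_uuid):
     """Return all claims owned by the given instance."""
     return db.claim_get_all_by_owner(context, 'server', instance_uuid)


 def get_claims_for_host(context, host):
     """Return all claims held against the given host."""
     return db.claim_get_all_by_host(context, host)
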
Changes to resource tracker API
  • instance_claim/resize_claim:
Add a step to sync the claim information to the DB (see the sketch after this list).
Consideration: does the relative order of syncing the claim and syncing the compute node matter? It should not, because both happen while the lock is held.
  • update_available_resource:
This function may change a lot. The resource tracker gets the resource information from the hypervisor, then gets all claims from the DB, and updates the available resources by deducting the claims from the hypervisor's resource information, and also deducting the usage of orphan instances.
Question: should we create claims in the DB for orphan instances? I assume we should not.
  • drop_resize_claim()/abort_instance_claim():
Add a step to remove the claim information from the DB.
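
A condensed sketch of the resource tracker changes above, in one place. The db.claim_create/claim_destroy/claim_get_all_by_host helpers are hypothetical (they would be added along with the claims table); the locking and the virt driver call follow the existing resource tracker, and unchanged logic is elided in comments:

 # Sketch only: db.claim_* helpers are hypothetical, and the existing
 # usage-accounting logic of the resource tracker is elided.
 import uuid

 from nova import db
 from nova import utils

 COMPUTE_RESOURCE_SEMAPHORE = 'compute_resources'


 class ResourceTracker(object):

     @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
     def instance_claim(self, context, instance_ref):
         claim = {'uuid': str(uuid.uuid4()),
                  'host': self.host,
                  'owner_type': 'server',              # only servers for now
                  'owner_id': instance_ref['uuid'],
                  'vcpus': instance_ref['vcpus'],
                  'memory_mb': instance_ref['memory_mb'],
                  'disk_gb': (instance_ref['root_gb'] +
                              instance_ref['ephemeral_gb'])}
         # New step: persist the claim.  Claim sync and compute node sync
         # both happen under the same lock, so their order should not matter.
         db.claim_create(context, claim)
         # ... existing usage accounting and compute node sync ...
         return claim['uuid']

     @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
     def abort_instance_claim(self, context, claim_uuid):
         # New step: remove the persisted claim, then free the resources.
         db.claim_destroy(context, claim_uuid)
         # ... existing logic to give the resources back ...

     @utils.synchronized(COMPUTE_RESOURCE_SEMAPHORE)
     def update_available_resource(self, context):
         # Start from what the hypervisor reports ...
         resources = self.driver.get_available_resource(self.nodename)
         # ... then deduct every persisted claim, instead of inferring
         # usage from the instance state.
         for claim in db.claim_get_all_by_host(context, self.host):
             resources['vcpus_used'] += claim['vcpus']
             resources['memory_mb_used'] += claim['memory_mb']
             resources['local_gb_used'] += claim['disk_gb']
         # Orphan instances are still deducted separately; per the open
         # question above, no claims are created for them.
         # ... existing compute node update ...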

Benefit

The key benefit of this change is that update_available_resource() updates the resource usage information according to the claims in the DB, instead of the instance state. Thus the callers of the resource tracker API (instance_claim/resize_claim/drop_claim, etc.) can allocate the resource first and change the instance state afterwards, with no race condition against update_available_resource() anymore.

This benefit enables several potential changes:

  • We can move the migration object creation out of the resource tracker, by having the conductor claim the resource on the source/target host and then create the migration object.
  • We can move _set_instance_host_and_node out of the resource tracker, by having the conductor claim the resource and then set the host and node.
  • After moving the migration object creation out of the resource tracker, the resource tracker no longer needs to understand the migration process, and we can combine instance_claim/resize_claim.
  • We can support resource claims through the rpcapi. The caller sends an RPC request to the compute node; the compute node invokes the resource tracker, which takes the COMPUTE_RESOURCE_SEMAPHORE, allocates the resource, persists the claim, and returns the ID of the claim (or the whole claim object). With this change, we can move prep_resize etc. out of the compute manager into the conductor (see the sketch after this list).
  • Also, with this change it will be easy to check the resource usage for all instances, all hosts, etc., even including the overhead.
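
A rough sketch of what claiming over RPC might look like on the client side, written in the style of Nova's oslo.messaging-based rpcapi classes. The method name, signature, and version are assumptions, not a settled interface:

 # Hypothetical client-side rpcapi method for remote resource claims.
 from oslo import messaging

 from nova import rpc


 class ComputeAPI(object):
     """Sketch of a compute RPC client with a resource claim call."""

     def __init__(self):
         target = messaging.Target(topic='compute', version='3.0')
         self.client = rpc.get_client(target)

     def claim_resource(self, context, host, owner_type, owner_id,
                        resources, tag=None):
         # A call (not a cast): the caller needs the claim id back.
         cctxt = self.client.prepare(server=host)
         return cctxt.call(context, 'claim_resource',
                           owner_type=owner_type, owner_id=owner_id,
                           resources=resources, tag=tag)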

Potential Issue

  • We will have one more DB access for each instance claim on the compute node; it is not clear whether this will have a performance impact.
  • One more table is created.

Upgrade/Compatibility

This change itself is local to the compute node and will have no upgrade/compatibility issue. In the next step, when we move the migration object creation/_set_instance_host_and_node out of the resource tracker, there may be an impact, but that should be discussed at that time.