Difference between revisions of "Nova/Object Cache"

Latest revision as of 09:33, 10 January 2014

The move to objects reduces pressure on the database. Caching objects can additionally reduce network pressure, which is especially useful for large clouds composed of many nodes. The cache relies on the updated_at field in the database, which tracks when an object last changed. If the local copy of an object carries an earlier updated_at than the current value, it must be refreshed; otherwise it can be used with full confidence. Management interfaces such as Horizon retrieve large sets of objects and, when they are not event based, refresh them frequently.

An object header is a tuple of object-id and updated_at; an object header list is a list of such tuples.

Consider, for instance, the Horizon web server caching the list of virtual machine instance objects that it displays. Assume the cached headers are {(VM1, t1), (VM2, t2), (VM3, t3)}. Assume further that only VM2 was updated since then and that a new instance VMk was created. It is then sufficient to retrieve only these two objects and use them together with the existing local copies of VM1 and VM3.
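The comparison in this example can be sketched as follows. This is only an illustration under stated assumptions: the function name ids_to_fetch is hypothetical, headers are modelled as (object-id, updated_at) tuples, and timestamps are plain integers rather than real database values.

```python
def ids_to_fetch(cached_headers, current_headers):
    """Return the object-ids whose full objects must be retrieved.

    An object is fetched when it is missing from the cache (newly created)
    or when the cached updated_at is earlier than the current one.
    """
    cached = dict(cached_headers)
    return [
        obj_id
        for obj_id, updated_at in current_headers
        if obj_id not in cached or cached[obj_id] < updated_at
    ]


# Integer timestamps are an assumption made purely for illustration.
cached = [("VM1", 1), ("VM2", 2), ("VM3", 3)]
current = [("VM1", 1), ("VM2", 5), ("VM3", 3), ("VMk", 4)]

to_fetch = ids_to_fetch(cached, current)
print(to_fetch)  # ['VM2', 'VMk']
```

Only VM2 (updated) and VMk (new) are requested; VM1 and VM3 are served from the local copies.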

Yet another optimization is to retrieve only the fields that are expected to be used. This would be useful in, say, a table view that shows only a summary of each object and provides a more detailed view when the user drills down.
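A minimal sketch of field-limited retrieval, assuming objects can be represented as dictionaries. The helper select_fields and the field names used here are hypothetical and only illustrate the idea of a summary view versus a detail view.

```python
def select_fields(obj, fields):
    """Return a copy of obj restricted to the requested field names."""
    return {name: obj[name] for name in fields if name in obj}


# Hypothetical instance record; the field names are illustrative only.
vm = {
    "id": "VM1",
    "name": "web-1",
    "status": "ACTIVE",
    "flavor": "m1.small",
    "updated_at": 1,
}

# The table view needs only a few fields; the full record is used on drill-down.
summary = select_fields(vm, ["id", "name", "status"])
print(summary)
```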

Last but not least, because caching requires extra logic to decide what to retrieve and how to update the cache, it is worth implementing only for objects where we expect savings in network traffic or in the latency of retrieving information, that is, for objects that do not change too rapidly.

API changes

  • get_instances(headers_only=True/False) or get_objects(object_type, headers_only=True/False)
    When headers_only=True, the result is a list of (object-id, updated_at) tuples.
    Otherwise, the result is a list of complete object instances.
  • get_instances(object_id_list)
    Retrieves only the objects whose object-ids are listed.
  • cache_refresh(local_cache, object_list)
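How the three calls might fit together can be sketched against an in-memory stand-in for the object repository. Everything here is an assumption made for illustration: the InMemoryRepository class, the merging of both get_instances variants into one method with keyword arguments, the dictionary representation of objects, and the exact behaviour of cache_refresh are not specified by this page.

```python
class InMemoryRepository:
    """Hypothetical stand-in for the object repository."""

    def __init__(self, objects):
        self._objects = {obj["id"]: obj for obj in objects}

    def get_instances(self, object_id_list=None, headers_only=False):
        # With no id list, every object is selected.
        if object_id_list is None:
            selected = list(self._objects.values())
        else:
            selected = [self._objects[i] for i in object_id_list
                        if i in self._objects]
        if headers_only:
            return [(obj["id"], obj["updated_at"]) for obj in selected]
        return [dict(obj) for obj in selected]


def cache_refresh(local_cache, object_list):
    """Merge freshly retrieved objects into local_cache (keyed by object-id)."""
    for obj in object_list:
        local_cache[obj["id"]] = obj
    return local_cache


repo = InMemoryRepository([
    {"id": "VM1", "updated_at": 1},
    {"id": "VM2", "updated_at": 5},
    {"id": "VM3", "updated_at": 3},
])
local_cache = {
    "VM1": {"id": "VM1", "updated_at": 1},
    "VM2": {"id": "VM2", "updated_at": 2},
}

# Step 1: fetch headers only. Step 2: find stale or missing ids.
headers = repo.get_instances(headers_only=True)
stale = [obj_id for obj_id, updated_at in headers
         if obj_id not in local_cache
         or local_cache[obj_id]["updated_at"] < updated_at]

# Step 3: retrieve just those objects and merge them into the cache.
cache_refresh(local_cache, repo.get_instances(object_id_list=stale))
print(stale)  # ['VM2', 'VM3']
```

This sketch does not evict objects that were deleted at the repository; a caller could drop cache entries whose ids no longer appear in the header list.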

Deciding how to refresh the cache should happen at the point of use, which also saves compute resources at the object repository.