Difference between revisions of "Nova/Object Cache"

Latest revision as of 09:33, 10 January 2014

With the move to objects, there is reduced pressure on the database. We can additionally reduce network pressure by caching objects, which is especially useful for large clouds composed of many nodes. The scheme leverages the updated_at field in the database, which tracks when an object last changed. If the local copy of an object carries an earlier updated_at than the repository's copy, it must be refreshed; otherwise it can be used with full confidence. Management interfaces such as Horizon retrieve large sets of objects and, when they are not event based, refresh them frequently.

An object header list is a list of (object-id, updated_at) tuples.

Consider, for instance, the Horizon web server caching a list of virtual machine instance objects that it displays. Assume the cached headers are {(VM1, t1), (VM2, t2), (VM3, t3)}, and that since then only VM2 was updated and a new instance VMk was created. It is sufficient to retrieve only these two objects and use them in conjunction with the existing local copies of VM1 and VM3.
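The comparison in this example can be sketched as follows. This is a minimal illustration rather than Nova code: the function name ids_to_fetch is hypothetical, and plain integers stand in for the updated_at timestamps t1, t2, t3.

```python
def ids_to_fetch(cached_headers, current_headers):
    """Return the object-ids whose full objects must be retrieved.

    Both arguments are iterables of (object_id, updated_at) tuples.
    An id is returned when it is not cached yet, or when the
    repository's updated_at is later than the cached one.
    (Hypothetical helper, not part of any existing Nova API.)
    """
    cached = dict(cached_headers)
    return [
        obj_id
        for obj_id, updated_at in current_headers
        if obj_id not in cached or cached[obj_id] < updated_at
    ]


# Cached headers from the example; integers stand in for t1, t2, t3.
cached_headers = [("VM1", 1), ("VM2", 2), ("VM3", 3)]
# At the repository, VM2 was updated (2 -> 5) and VMk was created.
current_headers = [("VM1", 1), ("VM2", 5), ("VM3", 3), ("VMk", 6)]

print(ids_to_fetch(cached_headers, current_headers))  # ['VM2', 'VMk']
```

Only VM2 and VMk need to cross the network; VM1 and VM3 are served from the local copies.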

Yet another optimization is to retrieve only the fields that are expected to be used. This would be useful in, say, a table view where only an abstract view of each object is shown; should the user drill down, a more detailed view is provided.
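A minimal sketch of the idea, assuming objects are represented as dictionaries. The helper name project_fields and the field names used here are illustrative assumptions, not an existing Nova interface.

```python
def project_fields(obj, fields):
    """Return a copy of obj containing only the requested fields.

    Hypothetical helper: fields that the object does not have
    are silently skipped.
    """
    return {name: obj[name] for name in fields if name in obj}


# Illustrative instance record; the field names are assumptions.
instance = {
    "id": "VM1",
    "name": "web-1",
    "status": "ACTIVE",
    "flavor": "m1.small",
    "updated_at": 1,
}

# Abstract (table) view: only a few columns are needed.
row = project_fields(instance, ["id", "name", "status"])
print(row)  # {'id': 'VM1', 'name': 'web-1', 'status': 'ACTIVE'}
```

The detailed view would request the full object only when the user drills down.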

Last but not least, since some logic is required to decide what to retrieve and how to update the cache, it may be useful to implement caching for those objects where we expect savings in network traffic or in the latency of retrieving information. Caching is most useful for objects that do not change too rapidly.

API changes

get_instances(headers_only=True/False) or get_objects(object_type, headers_only=True/False)

   When headers_only=True, the result is a list of (object-id, updated_at) tuples.
   Otherwise it is a list of object instances in their entirety.

get_instances(object_id_list)

    Only the objects with the listed object-ids are retrieved.

cache_refresh(local_cache, object_list)

How to refresh the cache should be determined at the point of use. This also saves compute resources at the object repository.
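The three calls above could fit together as in the following sketch. It is an illustration under stated assumptions, not the proposed implementation: FakeRepository is a hypothetical in-memory stand-in for the object repository, objects are plain dictionaries, the two get_instances forms are folded into one method with keyword arguments, and refresh_from is an invented client-side helper. Objects deleted at the repository are not evicted here, since the text above does not cover that case.

```python
class FakeRepository:
    """Hypothetical in-memory stand-in for the object repository."""

    def __init__(self, objects):
        self._objects = {o["id"]: o for o in objects}

    def get_instances(self, headers_only=False, object_id_list=None):
        objs = list(self._objects.values())
        if object_id_list is not None:
            wanted = set(object_id_list)
            objs = [o for o in objs if o["id"] in wanted]
        if headers_only:
            # Header list: (object-id, updated_at) tuples only.
            return [(o["id"], o["updated_at"]) for o in objs]
        return objs


def cache_refresh(local_cache, object_list):
    """Merge retrieved objects into local_cache (a dict keyed by id)."""
    for obj in object_list:
        cached = local_cache.get(obj["id"])
        if cached is None or cached["updated_at"] < obj["updated_at"]:
            local_cache[obj["id"]] = obj
    return local_cache


def refresh_from(repo, local_cache):
    """Client-side refresh: fetch headers, then only the stale objects."""
    headers = repo.get_instances(headers_only=True)
    stale = [
        obj_id
        for obj_id, updated_at in headers
        if obj_id not in local_cache
        or local_cache[obj_id]["updated_at"] < updated_at
    ]
    if stale:
        cache_refresh(local_cache, repo.get_instances(object_id_list=stale))
    return stale


repo = FakeRepository([
    {"id": "VM1", "updated_at": 1},
    {"id": "VM2", "updated_at": 5},  # updated since it was cached
    {"id": "VM3", "updated_at": 3},
    {"id": "VMk", "updated_at": 6},  # newly created
])
local_cache = {
    "VM1": {"id": "VM1", "updated_at": 1},
    "VM2": {"id": "VM2", "updated_at": 2},
    "VM3": {"id": "VM3", "updated_at": 3},
}

print(refresh_from(repo, local_cache))  # ['VM2', 'VMk']
```

Here the decision about what to fetch is made entirely by the client; the repository only answers a header query and a query by id list.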
