Revision as of 14:58, 27 October 2010

HTTP Caching In Teller

Introduction

Glance consists of two services: Parallax, the image registry, which stores metadata describing each image and where to fetch it, and Teller, which acts as a proxy for the object store containing the actual image data. Both Parallax and Teller are HTTP servers and can therefore benefit from the performance improvements offered by HTTP caching. The following is a proposal for adding HTTP caching to the Glance project, and in particular to the Teller sub-project (for Parallax, see ParallaxHttpCaching).

It should be noted that HTTP caching is not the only type of caching that could improve the speed of OpenStack builds. Down the road, we leave open the option of adding memcached to Parallax, a BitTorrent distribution system within the cluster, and any number of other options. We are starting with HTTP caching since it will offer dramatic bandwidth savings and performance gains without a lot of work, and it has a clear implementation path dictated by RFC 2616.

Background on HTTP Caching

HTTP caching is built on two fundamental concepts: freshness (also known as cache expiration) and validation. The expiration policy is governed by the max-age directive of the Cache-Control header (we are not using the Expires header, since it requires clock synchronization between the client and the origin server). Validation, the process of verifying that cached data is still accurate, relies on a validator header: either Last-Modified or the ETag header added by HTTP/1.1. For this spec, we will use only ETag (Last-Modified suffers from the same clock synchronization issues as Expires).

These two headers, Cache-Control: max-age and ETag, provide the information that client caches and transparent caching proxies (such as Squid or Varnish) need to make informed cache decisions regarding eviction and validation.
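As a rough illustration of how these two headers fit together, the following Python sketch builds them for a response and shows the freshness test a cache would apply. The function names, and the choice of a SHA-256 digest as the ETag value, are assumptions made for this example and are not part of the proposal.

```python
import hashlib


def caching_headers(image_data: bytes, max_age_seconds: int) -> dict:
    """Build the two caching headers for a response (hypothetical helper)."""
    # Assumption: the ETag is derived from a digest of the image data.
    digest = hashlib.sha256(image_data).hexdigest()
    return {
        "Cache-Control": f"max-age={max_age_seconds}",
        # Entity tags are quoted strings in HTTP/1.1.
        "ETag": f'"{digest}"',
    }


def is_fresh(current_age_seconds: int, max_age_seconds: int) -> bool:
    """A cached response is fresh while its age is below its max-age."""
    return current_age_seconds < max_age_seconds
```

A cache that finds `is_fresh(...)` to be false would revalidate by sending the stored ETag back to the server in an If-None-Match request header.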

Caching In Teller

Teller's main role in OpenStack is to resolve image_uris (which act as globally unique identifiers for images) into image data. It does this by looking up the image_uri in an image registry (in this case Parallax) and then fetching the data from a backend. The lookups to Parallax will also benefit from caching, but that is covered in a separate blueprint (ParallaxHttpCaching). In this blueprint, we discuss how to cache the image data itself. Since this data may be on the order of gigabytes, the savings in bandwidth and transfer time are potentially very large.
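The two-step resolution described above can be sketched as follows. This is a minimal illustration only: plain dictionaries stand in for Parallax and the backend, and the names `resolve_image`, `ImageNotFound`, and the `location` metadata key are hypothetical.

```python
class ImageNotFound(Exception):
    """Raised when the registry has no entry for an image_uri."""


def resolve_image(image_uri: str, registry: dict, backend: dict) -> bytes:
    """Resolve an image_uri into image data (illustrative sketch)."""
    # Step 1: look up the image_uri in the registry (Parallax stand-in).
    metadata = registry.get(image_uri)
    if metadata is None:
        raise ImageNotFound(image_uri)
    # Step 2: fetch the data from the backend at the recorded location.
    return backend[metadata["location"]]
```

The second step is the expensive one for large images, which is why it is the step this blueprint aims to avoid repeating.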

Caching in Teller is predicated on a single, extremely important assumption: image data is immutable. This means that if a user would like to modify an image, the image will need to be re-registered in Parallax. The benefit of making this assumption, aside from simplicity, is that we can cache the image safely at various layers without having to make validation requests to the origin object store (usually Swift). Of course, an image may be deleted or have some other relevant part of its metadata change, so Teller will still need to make lookups in Parallax to ensure the image is available. (NOTE: for security reasons, whenever we fetch an image from the object store, we still need to perform checksum validation to ensure that the image described by Parallax matches what actually resides in the backend object store. In other words, the assumption above states that once image data is determined to be valid, it is by definition valid for as long as the image is available.)
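One way to picture the consequences of the immutability assumption is the request handler sketched below. It always consults the registry for availability, answers a matching If-None-Match with 304 without touching the object store, and verifies the checksum whenever it does fetch. Everything named here is an assumption for illustration: the `handle_get` function, the dictionary stand-ins, the use of the registry checksum as the ETag, the `MAX_AGE_SECONDS` value, and the 500 status chosen for a checksum mismatch.

```python
import hashlib

MAX_AGE_SECONDS = 3600  # arbitrary value chosen for this example


def handle_get(image_uri, if_none_match, registry, object_store):
    """Return (status, headers, body) for an image request (sketch)."""
    # The registry lookup always happens: the image may have been deleted.
    metadata = registry.get(image_uri)
    if metadata is None:
        return 404, {}, b""

    # Assumption: the checksum recorded in the registry doubles as the ETag.
    etag = '"' + metadata["checksum"] + '"'
    headers = {"Cache-Control": f"max-age={MAX_AGE_SECONDS}", "ETag": etag}

    # Immutable data: a matching validator needs no object-store request.
    if if_none_match == etag:
        return 304, headers, b""

    # On a real fetch, verify the data against the registry checksum.
    data = object_store[metadata["location"]]
    if hashlib.sha256(data).hexdigest() != metadata["checksum"]:
        return 500, {}, b""
    return 200, headers, data
```

Because the data for a given image_uri never changes, the 304 branch is safe as long as the registry still lists the image.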