MetadataSearchAPI

Note: This is a historical document and is not official documentation for all or part of any OpenStack project.

=HTTP REST API for OpenStack Object Storage Metadata Search (OSMS)=

''Refer to the MetadataSearch Wiki page for this project's home page and further project information. This page is specifically the API specification.''

=Introduction=

This document specifies a set of REST API additions to OpenStack's Object Storage API to support a new metadata search feature, abbreviated OSMS, for Object Storage Metadata Search.

The OSMS API extends the set of features described in the base OpenStack Object Storage API, which can be found at http://docs.openstack.org/api/openstack-object-storage/1.0.

The OSMS feature maintains metadata for Object Storage accounts, containers, and objects. OSMS provides an HTTP REST API with a rich set of metadata search parameters for users and applications to search the stored metadata.

We use the term "item" to refer to an account, container, or object, when the distinction is unnecessary.

The HTTP REST request defines the base URI to search, the search criteria, the list of metadata fields to return for the items matching the criteria, and how to filter and format the resulting output.

Wiki page origin
This Wiki page was originally created by exporting the MS Word doc v0.8 of the API spec by HP's Lincoln Thomas, presented at the Icehouse Design Summit's Metadata Search session. It was exported to this Wiki page to enable community editing and discussion, with saved Wiki history.

The PDF export of this Word doc is still available on the MetadataSearch Wiki page. No further updates of the Word/PDF document are planned. The Microsoft Office Word Add-in For MediaWiki was used to export it, so please forgive the occasional unfortunate formatting (or feel free to improve it!).

When this feature's spec and reference implementation are finalized and ready for publication as an API, they will likely be converted to DocBook and WADL for the OpenStack API page.

=System and custom metadata=

Two types of metadata are supported for searches, and both can be referenced in the same search:


 * System metadata applies to all items (accounts, containers, and objects). Each item stored includes a fixed set of attributes comprising its system metadata. System metadata attributes cannot be deleted by the user through the API.


 * Custom metadata applies only to items where the user assigns it using the "X-[Account|Container|Object]-Meta-&lt;key&gt;: &lt;value&gt;" HTTP header. Custom metadata names are user-defined, with value strings also defined by the user. Custom metadata can be added, replaced, or deleted by the user as specified in the OpenStack API. Custom metadata attributes are distinguished from system metadata attributes by the [account|container|object]_meta_ prefix.
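The prefix rule above lends itself to a simple client-side check. The following is a minimal sketch, not part of the API itself; the function name is hypothetical, and the pattern is taken directly from the [account|container|object]_meta_ prefix described in the text.

```python
import re

# Prefix pattern from the text: [account|container|object]_meta_
_CUSTOM_PREFIX = re.compile(r"^(account|container|object)_meta_")


def is_custom_attribute(name):
    """Return True if the attribute name carries a custom-metadata prefix."""
    return _CUSTOM_PREFIX.match(name) is not None


# Attribute names below appear elsewhere in this specification.
print(is_custom_attribute("account_meta_billing_method"))
print(is_custom_attribute("account_container_count"))
```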

=Metadata attributes available=

The following table describes the system and custom metadata attributes available for searches using the OSMS API.

For descriptions of the Types column, see the section "Data types."

Live and deleted items
With the exception of the *delete_time attributes, all of the system metadata attributes listed in this table are valid for live (not-yet-deleted) items. For deleted items, only the following attributes are valid: *uri, *name, *delete_time, and *last_activity_time.

By default, only results for live items are returned. To include deleted items in the results, include the appropriate *delete_time system attribute(s), either in the list of attributes to be returned or as a query criterion. For example, including delete_time or account_delete_time causes account results to include both live and deleted accounts that meet the query criteria.
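As a rough illustration of this rule, a client could predict whether a request will include deleted items by looking for a *delete_time attribute in either list. This is a sketch under the assumption that matching on the delete_time suffix is sufficient; the function name and its two arguments are hypothetical, and the behavior of the superset attributes is not modeled.

```python
def includes_deleted_items(attributes, query_attributes):
    """Return True if any requested or queried attribute is a *delete_time attribute.

    attributes:       attribute names requested for return
    query_attributes: attribute names used as query criteria
    """
    names = list(attributes) + list(query_attributes)
    return any(name.endswith("delete_time") for name in names)


print(includes_deleted_items(["account_delete_time"], []))
print(includes_deleted_items(["account_container_count"], ["object_content_length"]))
```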

=Superset metadata attributes available=

The OSMS API provides the following special attribute names that can be requested in the &attributes parameter list, to return a superset of certain system and/or custom attributes. These attributes cannot be used as query criteria or as sort parameters.

The same rules about live and deleted items that apply to system attributes apply to these attributes as well.

=HTTP syntax=

The HTTP request format for obtaining the list of services for this implementation is the following:

GET /services HTTP/1.1

The HTTP request for performing metadata searches has the following syntax, as one request but shown on multiple lines for readability:

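GET /&lt;API version&gt;[/&lt;account&gt;[/&lt;container&gt;[/&lt;object&gt;]]]
?metadata:&lt;search API version&gt;
[&attributes=&lt;attr1&gt;[,&lt;attr2&gt;][,…]]
[&query=[(]&lt;query expr1&gt;[%20[AND|OR]%20&lt;query expr2&gt;][)][%20[AND|OR]%20…]]
[&sorted[=&lt;attr1&gt;[,&lt;attr2&gt;,…]]]
[&limit=&lt;number&gt;]
[&all_results]
[&marker='&lt;string&gt;']
[&end_marker='&lt;string&gt;']
[&offset=&lt;number&gt;]
[&prefix='&lt;string&gt;']
[&path='&lt;string&gt;']
[&delimiter='&lt;string&gt;']
[&format=[json|xml]]
HTTP/1.1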

The equivalent curl command formats for the two request types are the following, each as one command line:

curl -g "http[s]://&lt;IP address|hostname&gt;[:&lt;port&gt;]/services"

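curl -g "http[s]://&lt;IP address|hostname&gt;[:&lt;port&gt;]/&lt;API version&gt;[/&lt;account&gt;[/&lt;container&gt;[/&lt;object&gt;]]]?metadata:&lt;search API version&gt;[&attributes=&lt;attr1&gt;[,&lt;attr2&gt;][,…]][&query=[(]&lt;query expr1&gt;[ [AND|OR] &lt;query expr2&gt;][)][ [AND|OR]…]][&sorted[=&lt;attr1&gt;[,&lt;attr2&gt;,…]]][&limit=&lt;number&gt;][&all_results][&marker='&lt;string&gt;'][&end_marker='&lt;string&gt;'][&offset=&lt;number&gt;][&prefix='&lt;string&gt;'][&path='&lt;string&gt;'][&delimiter='&lt;string&gt;'][&format=[json|xml]]"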

Syntax notes

 * Only the HTTP GET verb is used for metadata searches.
 * Optional parameters are shown in square brackets [ and ]. Everything enclosed in the brackets can be omitted from the request. Do not include the square brackets in the request. For example, the query parameter is optional and is not required to be in the HTTP request.
 * Parameters are shown in angle brackets &lt; and &gt;. Replace the parameter with the actual value, without the angle brackets.
 * Other characters shown in the syntax (such as =, ?, &, and /) must also be entered as-is in the request, and sometimes must be URL-encoded.
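To make the request syntax concrete, here is a minimal sketch of a helper that assembles the path and query portion of a search request. The function name and its keyword arguments are hypothetical; the parameter names it emits (attributes, query, sorted, limit, all_results, offset, format) come from the syntax above. Values are assumed to be already quoted and URL-encoded by the caller, and the marker, prefix, path, and delimiter parameters are left out for brevity.

```python
def build_search_url(path, search_version, attributes=None, query=None,
                     sorted_by=None, limit=None, all_results=False,
                     offset=None, fmt=None):
    """Assemble '<path>?metadata:<version>[&param...]' in the documented order.

    sorted_by: None omits the parameter, an empty list emits the bare
               'sorted' flag, a non-empty list emits 'sorted=a,b'.
    """
    parts = ["metadata:" + search_version]
    if attributes:
        parts.append("attributes=" + ",".join(attributes))
    if query:
        parts.append("query=" + query)
    if sorted_by is not None:
        if sorted_by:
            parts.append("sorted=" + ",".join(sorted_by))
        else:
            parts.append("sorted")
    if limit is not None:
        parts.append("limit=%d" % limit)
    if all_results:
        parts.append("all_results")
    if offset is not None:
        parts.append("offset=%d" % offset)
    if fmt:
        parts.append("format=" + fmt)
    return path + "?" + "&".join(parts)


print(build_search_url("/v1/acc1/ctr2", "v1",
                       attributes=["account_container_count", "object_content_type"],
                       sorted_by=[], all_results=True))
```

The result is only the part of the URL after the host and port; prepend the scheme and server address before passing it to an HTTP client.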

Quoting

 * Numeric values for parameters must not be quoted.
 * Date values for parameters must be enclosed in single quotes, e.g.:

'2013-06-09Z'


 * Reserved strings, including API versions, attribute names, and format types, must not be quoted, e.g.:

&attributes=objectLastModifiedTime,objectContentType


 * User-defined string values for parameters must be enclosed in single quotes, e.g.:

prefix='employees/'


 * Any single quotes that are part of a quoted string value must be escaped with a second single quote. For example:

'Dave''s book'


 * In the curl syntax shown above, the -g option and the double quotes around the entire URL ensure that the URL contents will be parsed correctly by curl. Any double quotes that are part of a quoted string value in a double-quoted curl command must be escaped with a backslash: the \"right\" way
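The quoting rules for numeric and user-defined string values can be captured in a few lines. This is a minimal sketch with a hypothetical function name: numbers are passed through unquoted, and any other value is wrapped in single quotes with embedded single quotes doubled. Reserved strings such as attribute names and format types must not be passed through this helper, since they are never quoted.

```python
def quote_value(value):
    """Format a parameter value according to the quoting rules above."""
    if isinstance(value, (int, float)):
        # Numeric values must not be quoted.
        return str(value)
    # String and date values: single-quote, doubling any embedded quote.
    return "'" + str(value).replace("'", "''") + "'"


print(quote_value("Dave's book"))   # 'Dave''s book'
print(quote_value("employees/"))    # 'employees/'
print(quote_value(100))             # 100
```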

URL encoding
HTTP request strings are URL-decoded by the API code. API clients must encode special characters, such as the greater-than character (&gt;), by replacing them with their hexadecimal equivalent values as shown by the examples in this section.

The API’s URL decoder interprets certain special characters properly without being URL encoded. Before the question mark character (?) in any HTTP request URL, the following characters are safe and do not need to be URL encoded:

/ : - _ . ~ @ #

After the question mark character (?), the following characters are safe and do not need to be URL encoded:

= & #

All other characters must be URL encoded as their hexadecimal value as described in the ISO-8859-1 (ISO-Latin) standard. For example, the plus character (+) must be encoded as %2B, and the greater than character (&gt;) must be encoded as %3E.

Spaces can be encoded as either %20 or as the plus character (+), such as "my%20file.txt" or "my+file.txt" for the file "my file.txt". The plus character (+) is converted to a space when the URL is decoded by the API code. To include a plus character (+) in the URL, encode it as %2B, such as "A%2B" instead of "A+".
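As one possible way to apply these rules on the client side, the sketch below uses Python's standard `urllib.parse.quote`, keeping only the characters listed as safe after the question mark (= & #). The helper name is hypothetical. Note that `quote` also leaves letters, digits, and the characters `_ . - ~` unencoded, and that a literal = or & inside a value would need to be encoded separately before the string is assembled.

```python
from urllib.parse import quote


def encode_query_string(raw):
    """Percent-encode the part of the URL after '?', leaving '=', '&' and '#' as-is."""
    return quote(raw, safe="=&#")


print(encode_query_string("my file.txt"))                            # my%20file.txt
print(encode_query_string("A+"))                                     # A%2B
print(encode_query_string("query=object_content_length>1000000"))    # query=object_content_length%3E1000000
```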

Data types
The type of each attribute will be one of:

Date formats
All date/time values accepted by the API in HTTP requests must be in ISO 8601:2004 format. See http://www.iso.org/iso/catalogue_detail?csnumber=40874 or http://en.wikipedia.org/wiki/ISO_8601 for details on the format.

Examples of values accepted in HTTP requests:

'2013-06-09'

(9-Jun-2013, at time 00:00:00Z, where Z = Zulu a.k.a. GMT a.k.a. UTC time zone)

'2013-06-09T09:02:26Z'

(9-Jun-2013 at 9:02:26am in Zulu/GMT/UTC time zone)

'20130609T090226Z'

(The same date/time without separators)

'2013-06-09T02:02:26-0700'

(9-Jun-2013 at 2:02:26am in time zone -0700 = 7 hours behind UTC, which could be e.g. Pacific Daylight Time, or Mountain Standard Time. Equivalent to T09:02:26Z)

Example values not accepted in HTTP requests:

'2013-06-09T09:02:26'

(No timezone indicator, will return an error)

'Mon, 17 Oct 2011 14:31:11 GMT'

(RFC 5322 format, not ISO 8601 format, will return an error)

'1346895723.552374000'

(UNIX epoch format, not ISO 8601 format, will return an error)

All date/time values returned in HTTP responses also will be in ISO 8601:2004 format, and will always be in the UTC time zone.

The granularity of the metadata time values stored and returned in HTTP responses may vary depending on how the value is stored and obtained. The granularity of a given time value may be in nanoseconds, microseconds, or in seconds only. The fractional seconds shown in the response, however, will always be in nanoseconds.

Example values returned include:

"2013-06-09T19:02:22.000000000Z"

(seconds granularity)

"2013-06-09T19:02:22.817493000Z"

(microseconds granularity)

"2013-06-09T19:02:22.359070125Z"

(nanoseconds granularity)
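The request and response date rules can be sketched as two small helpers. Both function names are hypothetical. The first refuses datetimes without a time zone (mirroring the API's rejection of values with no timezone indicator) and emits a single-quoted UTC value with seconds precision; the second splits a response value into a UTC datetime and an integer nanosecond count, since Python's `datetime` only holds microseconds. It assumes the response always carries nine fractional digits and a trailing Z, as in the examples above.

```python
from datetime import datetime, timedelta, timezone


def to_request_date(dt):
    """Format a timezone-aware datetime as a quoted ISO 8601 UTC value."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("a time zone is required for request date values")
    return "'" + dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ") + "'"


def parse_response_time(text):
    """Split e.g. '2013-06-09T19:02:22.359070125Z' into (UTC datetime, nanoseconds)."""
    if not text.endswith("Z"):
        raise ValueError("expected a UTC value ending in 'Z'")
    seconds_part, fraction = text[:-1].split(".")
    base = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S")
    return base.replace(tzinfo=timezone.utc), int(fraction)


# 2:02:26am at -0700 is the same instant as 09:02:26Z.
local = datetime(2013, 6, 9, 2, 2, 26, tzinfo=timezone(timedelta(hours=-7)))
print(to_request_date(local))   # '2013-06-09T09:02:26Z'
print(parse_response_time("2013-06-09T19:02:22.817493000Z"))
```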

URIs and metadata search scope
A URI uniquely defines an account, container, or object. In an HTTP request, the URI to the left of the '?' character defines the scope of the metadata search.

The URI can specify any of these combinations to the left of the '?':


 * 1) An account, container, and object, known as an object-level URI. Searches will operate only upon that object.
 * 2) An account and container only, known as a container-level URI. Searches will be limited to the container and the objects within it.
 * 3) An account only, known as an account-level URI. Searches will be limited to the account, and all containers and objects within it.
 * 4) None of these three parameters. Searches will operate on all accounts, containers, and objects in the object store.
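The four scope levels map directly onto how the path is assembled. The following sketch uses hypothetical function and argument names; it builds the portion of the URI to the left of the '?' and rejects combinations that skip a level, such as an object without a container.

```python
def build_scope_path(api_version, account=None, container=None, obj=None):
    """Return the search scope path, e.g. '/v1/acc1/ctr2/obj3'."""
    if obj is not None and container is None:
        raise ValueError("an object-level URI also needs a container")
    if container is not None and account is None:
        raise ValueError("a container-level URI also needs an account")
    parts = [api_version]
    parts.extend(p for p in (account, container, obj) if p is not None)
    return "/" + "/".join(parts)


print(build_scope_path("v1", "acc1", "ctr2", "obj3"))   # object-level:    /v1/acc1/ctr2/obj3
print(build_scope_path("v1", "acc1", "ctr2"))           # container-level: /v1/acc1/ctr2
print(build_scope_path("v1", "acc1"))                   # account-level:   /v1/acc1
print(build_scope_path("v1"))                           # whole store:     /v1
```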

The search behavior depends on the parameters specified in the request. See the table below for details on each of these parameters.

=HTTP request parameters=

=Authorized searchers=

The concept of authorized searchers provides a way to maximize query speed by bypassing the need to check the searching user's access to each account, for a set of users defined by the storage administrator. These users must authenticate via the OpenStack authentication mechanism to be able to perform any query, just like all users. Once authenticated, however, the authorized searcher can access any metadata for all accounts, containers, and objects via the API.

For users that are not listed as authorized searchers, the engine will check the access to each account and container to be queried, and will only return results for accounts and containers to which the user has access. This access checking can significantly impact the performance of the query.

A storage administrator defines this list of users that are authorized searchers using a management console or using CLI commands. The administrator may distribute different user credentials to different users, for example to allow each user to manage their own password, or for audit tracking.

The authorized searchers feature is optional. If not used, access to metadata will be limited for all users, according to the user's access to accounts and containers.

=Output formats=

The default output format for the query results is plain text, if no "format" parameter is supplied.

Plain Text format
The plain text output format is human readable, with indent levels of 4 spaces per indent. The top level has no indent, and always represents the URI of an item matching the query criteria. The requested attributes for that item are listed under the URI, indented.

If &lt;sorted&gt; or &lt;sorted=uri&gt; is specified, then the results will be in lexicographic order by URI. Example:

/account1
    account_container_count:15
/account1/container1
    container_last_modified_time:2013-07-23T13:17:55.435654031Z
/account1/container1/objectdir1/subdir1/photo.jpg
    object_last_changed_time:2012-12-02T00:53:29.849922518Z
    object_content_length:194532
/account1/container2
    container_last_modified_time:2013-07-23T13:17:55.435654031Z
/account1/container2/anotherObject
    object_last_changed_time:2012-12-02T00:53:29.849922518Z
    object_content_length:194532

If &lt;sorted&gt; or &lt;sorted=uri&gt; is not specified, then the results will not be sorted, or will be sorted by the attribute defined in &lt;sorted=attr&gt;. Example of unsorted output:

/account1
    account_container_count:15
/account1/container1
    container_last_modified_time:2013-07-23T13:17:55.435654031Z
/account1/container2
    container_last_modified_time:2013-07-23T13:17:55.435654031Z
/account1/container1/objectdir1/subdir1/photo.jpg
    object_last_changed_time:2012-12-02T00:53:29.849922518Z
    object_content_length:194532
/account1/container2/anotherObject
    object_last_changed_time:2012-12-02T00:53:29.849922518Z
    object_content_length:194532

Each line of output is terminated by a single UNIX-style end-of-line character, UTF-8 value 10 (0x0a).
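A client that consumes the plain text format could parse it along the following lines. This is a sketch, assuming the layout described above: unindented lines are URIs, indented lines are name:value pairs, and lines end with a single newline. The function name is hypothetical. The value is split on the first colon only, because time values themselves contain colons.

```python
def parse_plain_text(body):
    """Parse plain text search output into a list of (uri, attributes) pairs."""
    results = []
    for line in body.split("\n"):
        if not line.strip():
            continue  # skip the empty string after the final newline
        if line.startswith(" "):
            if not results:
                raise ValueError("attribute line found before any URI line")
            name, _, value = line.strip().partition(":")
            results[-1][1][name] = value
        else:
            results.append((line, {}))
    return results


sample = (
    "/account1\n"
    "    account_container_count:15\n"
    "/account1/container1\n"
    "    container_last_modified_time:2013-07-23T13:17:55.435654031Z\n"
)
for uri, attrs in parse_plain_text(sample):
    print(uri, attrs)
```

A list of pairs is used rather than a dictionary so that the order of the response, sorted or not, is preserved. All values are returned as strings.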

JSON format
The JSON output format for query responses conforms to standard and well-formed JSON. The first level always represents the URI of an item matching the query criteria. The requested attributes for that item are listed as the second level after the URI.

See the description of sorted vs. unsorted output in the "Plain text" section above.

Example for sorted output:

[
  { "/account1" : {
      "account_container_count" : "15" } },
  { "/account1/container1" : {
      "container_last_modified_time" : "2013-07-23T13:17:55.435654031Z" } },
  { "/account1/container1/objectdir1/subdir1/photo.jpg" : {
      "object_last_changed_time" : "2012-12-02T00:53:29.849922518Z",
      "object_content_length" : 194532 } },
  { "/account1/container2" : {
      "container_last_modified_time" : "2013-07-23T13:17:55.435654031Z" } },
  { "/account1/container2/anotherObject" : {
      "object_last_changed_time" : "2012-12-02T00:53:29.849922518Z",
      "object_content_length" : 194532 } }
]
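Because each element of the JSON array is an object with a single URI key, a client typically flattens it before use. The sketch below shows one way to do that with the standard `json` module; the function name is hypothetical, and the sample body is a shortened form of the example above.

```python
import json


def parse_json_results(body):
    """Flatten the JSON response into a list of (uri, attributes) pairs."""
    results = []
    for entry in json.loads(body):
        # Each entry maps one URI to a dictionary of its attributes.
        for uri, attrs in entry.items():
            results.append((uri, attrs))
    return results


sample = """
[
  { "/account1" : { "account_container_count" : "15" } },
  { "/account1/container1/objectdir1/subdir1/photo.jpg" : {
      "object_last_changed_time" : "2012-12-02T00:53:29.849922518Z",
      "object_content_length" : 194532 } }
]
"""
for uri, attrs in parse_json_results(sample):
    print(uri, attrs)
```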

XML format
The XML output format for query responses conforms to standard and well-formed XML. The output is flat, not hierarchical. Objects are not nested in containers, and containers are not nested in accounts. Each item is provided separately, which allows for unsorted and sorted outputs based on arbitrary sorting criteria.

See the description of sorted vs. unsorted output in the "Plain text" section above.

Example for sorted output:



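<account uri="/account1">
    <account_container_count>15</account_container_count>
</account>
<container uri="/account1/container1">
    <container_last_modified_time>2013-07-23T13:17:55.435654031Z</container_last_modified_time>
</container>
<object uri="/account1/container1/objectdir1/subdir1/photo.jpg">
    <object_last_changed_time>2012-12-02T00:53:29.849922518Z</object_last_changed_time>
    <object_content_length>194532</object_content_length>
</object>
<container uri="/account1/container2">
    <container_last_modified_time>2013-07-23T13:17:55.435654031Z</container_last_modified_time>
</container>
<object uri="/account1/container2/anotherObject">
    <object_last_changed_time>2012-12-02T00:53:29.849922518Z</object_last_changed_time>
    <object_content_length>194532</object_content_length>
</object>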

=Example searches=

Get all metadata for all accounts, containers, and objects
curl -g "http://99.226.50.92/v1?metadata:v1&attributes=all_attrs"

The search returns an entry for every account, every container, and every object. Each entry contains all system attributes, and any custom attributes. The output format is plain text (see the section "Plain Text format").

Note that unless the user issuing the search is an authorized searcher, the search must authenticate the user against each account, and each container in non-authorized accounts. The results will contain metadata only from authorized accounts, and containers from non-authorized accounts that provide read permission to the user in the container read ACL. Thus, searches by non-authorized searchers across multiple accounts may be significantly slower than searches by authorized searchers.

Get selected metadata for an object, its container, and its account
curl -g "http://99.226.50.92/v1/acc1/ctr2/obj3?metadata:v1&attributes=account_container_count,account_object_count,account_meta_billing_method,all_container_system_attrs,all_object_meta_attrs&format=json"

The search returns first an entry for account acc1 with the three named attributes (two system attributes account_container_count and account_object_count, and one custom attribute account_meta_billing_method). The next entry is for the container ctr2, returning all system attributes (all_container_system_attrs) but no custom attributes. If the object obj3 has any custom metadata attributes, the next entry is for obj3 with its custom metadata but no system attributes (all_object_meta_attrs). If obj3 has no custom attributes, then no results are returned for obj3.

The output format is JSON (see the section "JSON format").

Get metadata for an account, and objects meeting a set of criteria
curl -g "http://99.226.50.92/v1/acc1?metadata:v1&attributes=account_object_count,object_last_changed_time&query=container_create_time&gt;'2013-08-01' AND object_manifest_type=1 and object_manifest~'segctr1/.*'"

The search returns first the object count for the acc1 account. Next, it returns the last changed time for all objects in the account where the object's container was created after 1-Aug-2013 at 00:00:00 UTC and the object is a manifest for a DLO object that has segments in the container segctr1.

This example also shows how the result set is limited by the attributes list. Although a container attribute is present in the query, no container entries exist in the final result set because the attribute list contains no container attributes. But because the URI of each item is always returned, the account and container for each object will be shown.

Note that the 'AND' operator is case-insensitive.

Identify accounts and containers meeting a set of criteria
curl -g "http://99.226.50.92/v1?metadata:v1&query=account_object_count&lt;100 AND container_meta_customer_name~'.*'"

The search returns only the URIs for all accounts and containers (but not objects), where the account's object count is less than 100, and containers have the custom attribute "customer_name" regardless of its value.

This example also shows how the result set is limited by the item types in the query expressions, in the absence of a list of &lt;attributes&gt; to be returned. Since no object attributes exist in the query, no object items are returned.

Get metadata for all results, sorted by URI
curl -g "http://99.226.50.92/v1/acc1/ctr2?metadata:v1&attributes=account_container_count,object_content_type&sorted&all_results"

The search first returns the account container count for acc1. Next, it returns the content type of each object in ctr2, sorted by the object URI. The all_results parameter overrides the default limit of 10,000 items in the response. All objects in ctr2 will be returned, regardless of the number of objects in it. The client accepting the HTTP response must be able to receive an arbitrarily large amount of data in this case.

Get the 3rd page of 100 results, sorted by multiple attributes
curl -g "http://99.226.50.92/v1/acc1?metadata:v1&attributes=all_system_attrs&query=object_content_length&gt;1000000&sorted=container_object_count,object_uri_create_time&limit=100&offset=201"

The search returns results 201 through 300 of the sorted result set defined by the scope and query. The results in total are sorted first by container object count, then by the object URI's first creation time.

Note that because the superset attribute all_system_attrs in the attributes list applies to all item levels, an entry for acc1 will be returned, as well as entries for all containers and objects satisfying the query.

For example, suppose two containers exist, with 300 and 200 objects respectively. First, the system attributes for the acc1 account are returned. Next, the second container is returned, followed by its objects that are greater than 1,000,000 bytes (of the 200 that exist), in order of the object URIs' creation times. Next, the first container and all of its objects satisfying the query (of the 300 that exist) are returned, sorted the same way.
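The paging arithmetic in this example can be written down explicitly. The sketch below assumes that offset is the 1-based position of the first result to return, which is inferred from this example only (offset=201 with limit=100 yields results 201 through 300); the function name is hypothetical.

```python
def page_offset(page, limit):
    """Return the offset value for a 1-based page number, assuming a 1-based offset."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive")
    return (page - 1) * limit + 1


print(page_offset(3, 100))   # 201, as in the example request above
print(page_offset(1, 100))   # 1
```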

Get a portion of the sorted results, between the given URIs
curl -g "http://99.226.50.92/v1/acc1?metadata:v1&attributes=all_system_attrs&sorted&marker='acc1/ctr13/obj_x'&end_marker='acc1/ctr17/obj_y'"

The search returns the account, containers, and objects in the acc1 account, between the URIs 'acc1/ctr13/obj_x' and 'acc1/ctr17/obj_y' (non-inclusive) in the sorted (by URI) list of results.
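The non-inclusive marker semantics can be illustrated on a small, already sorted list of URIs. This is a client-side sketch of the selection rule only, not the server's implementation; it uses Python's default string ordering as a stand-in for the lexicographic URI order, and the function name and sample URIs other than the two markers are hypothetical.

```python
def between_markers(sorted_uris, marker, end_marker):
    """Keep URIs strictly after marker and strictly before end_marker."""
    return [uri for uri in sorted_uris if marker < uri < end_marker]


uris = [
    "acc1/ctr13/obj_a",
    "acc1/ctr13/obj_x",   # equal to marker, so excluded
    "acc1/ctr13/obj_z",
    "acc1/ctr14",
    "acc1/ctr17/obj_y",   # equal to end_marker, so excluded
    "acc1/ctr18",
]
print(between_markers(uris, "acc1/ctr13/obj_x", "acc1/ctr17/obj_y"))
# ['acc1/ctr13/obj_z', 'acc1/ctr14']
```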

=Services request=

The &lt;services&gt; parameter requests the API versions and list of services provided by the server's implementation of the given Object Storage API version and Metadata Search API version. Following are the elements of the response.

This request was created with the intention of generalizing the OSMS API to become a standard OpenStack metadata search API. That API would have a reference implementation provided by OpenStack, but the API could also be implemented by other vendors supplying a search provider with different levels of API support. A search provider may choose not to implement all of the API, and/or may add elements to the API. This request defines the elements provided with the implementation. It allows clients to access elements of the API appropriate to different search providers on different OpenStack instances.

Responses use the JSON format shown in the following example. No other formats, such as XML or plain text, are supported.

[
  { "min_base_api_version" : "v1" },
  { "max_base_api_version" : "v1" },
  { "search_provider" : "HP" },
  { "search enabled" : "true" },
  { "min_search_api_version" : "v2.1" },
  { "max_search_api_version" : "v2.1" },
  { "freshness_complete" : "true" },
  { "freshness_partial" : "" },
  { "complex_boolean_expr" : "false" },
  { "attributes" : {
      { "attr_name" : "account_uri" },
      { "data_type" : "string" },
      { "sortable" : "true" }
    },
    … more attributes …
    {
      { "attr_name" : "size" },
      { "data_type" : "numeric" },
      { "sortable" : "false" }
    },
  }
]

This concludes the HTTP REST API for OpenStack Object Storage Metadata Search (OSMS)