Latest revision as of 10:57, 9 September 2014
= [Draft] MagnetoDB Streaming Bulk Load workflow and API =
==== Workflow ====
This page describes the process of loading large amounts of data into MagnetoDB.
Before uploading the data, one should first make sure that the destination table exists.
Data is uploaded in one streaming HTTP request.
==== URL ====
<pre>
POST v1/{project_id}/data/tables/{table_name}/bulk_load
</pre>
==== Headers ====
* User-Agent
* Content-Type: application/json
* Accept: application/json
* X-Auth-Token: Keystone auth token
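As a client-side illustration, the request above can be issued as one streaming upload. This is a sketch, not part of the spec: the host, project id, table name and token below are placeholders, and only the URL pattern and headers come from this page.

```python
import http.client
import json

def item_stream(items):
    """Yield one newline-terminated JSON document per item, as the spec requires."""
    for item in items:
        yield (json.dumps(item) + "\n").encode("utf-8")

def bulk_load(host, project_id, table_name, items, token):
    """POST all items in a single streaming HTTP request and parse the JSON reply."""
    conn = http.client.HTTPConnection(host)
    # An iterable body with no Content-Length is sent with chunked transfer
    # encoding, so items are streamed as the generator produces them.
    conn.request(
        "POST",
        "/v1/%s/data/tables/%s/bulk_load" % (project_id, table_name),
        body=item_stream(items),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "X-Auth-Token": token,
        },
    )
    return json.loads(conn.getresponse().read())
```

Passing a generator as the body keeps memory usage flat regardless of how many items are uploaded, which is the point of the streaming design.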
==== Request Syntax ====
The data stream is plain text containing a '\n'-separated sequence of JSON representations of the items to be inserted:
<pre>
{ "attribute_name": { "attribute_type": "attribute_value"}, "attribute_name2": { "attribute_type": "attribute_value"}...}
{ "attribute_name": { "attribute_type": "attribute_value"}, "attribute_name2": { "attribute_type": "attribute_value"}...}
</pre>
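For illustration, a small helper that serializes plain Python dicts into this stream. The "S" (string) and "N" (number) type codes are an assumption borrowed from the DynamoDB-style convention and are not defined on this page:

```python
import json

def to_typed(value):
    """Wrap a plain value in {"attribute_type": "attribute_value"} form.

    The "S"/"N" type codes here are assumptions, not part of this spec.
    """
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}
    raise TypeError("unsupported attribute value: %r" % (value,))

def encode_stream(items):
    """Serialize items into the '\n'-separated request body."""
    return "".join(
        json.dumps({name: to_typed(v) for name, v in item.items()},
                   sort_keys=True) + "\n"
        for item in items
    )
```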
==== Response Syntax ====
<pre>
{
    "read": <number>,
    "processed": <number>,
    "unprocessed": <number>,
    "failed": <number>,
    "last_item": <string>,
    "failed_items": {
        "item": <string>,
        "item": <string>,
        ...
    }
}
</pre>
In case of an error, the incoming data stream will continue to be read, but the received items won't be processed. The response will contain counts of read, processed (successfully inserted), unprocessed, and failed items, the last processed item, and error messages for the failed items.

Because received items are processed asynchronously, 'PutItem' operations for several items may already be enqueued when an error is found. In that case the server waits for the results of all enqueued operations; some of those results may be errors too, so the response may contain more than one error.
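Since 'read' counts every received item and 'processed' only the successfully inserted ones, a client can compare the counters to decide whether a retry (for example, resuming after 'last_item') is needed. A minimal sketch, assuming the response has already been parsed into a dict:

```python
def fully_loaded(resp):
    """True when every read item was successfully inserted."""
    return resp["processed"] == resp["read"]

def describe(resp):
    """One-line summary of a bulk_load response, using the spec's field names."""
    return ("read=%(read)d processed=%(processed)d "
            "unprocessed=%(unprocessed)d failed=%(failed)d" % resp)
```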