Meteos/Howto

This section explains how to increase the accuracy of a prediction model.



step 1. Parse Dataset
Parsing the dataset is one of the most efficient ways to increase model accuracy.

You can see an example [here].

In this example, the user eliminates exception data from the dataset using the filter method.
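
The linked example is not reproduced here. As a rough illustration of the idea only, the following Python sketch drops exception records with the built-in filter function. The record layout (comma-separated numeric fields), the sample values, and the helper name is_valid_record are assumptions made for this sketch and are not part of Meteos.

```python
def is_valid_record(line, expected_fields=3):
    """Return True when a line has the expected number of numeric fields.

    Hypothetical helper: the field count and the numeric-only rule are
    assumptions made for this sketch.
    """
    fields = line.strip().split(",")
    if len(fields) != expected_fields:
        return False
    try:
        for field in fields:
            float(field)
    except ValueError:
        return False
    return True


# Made-up sample data containing two exception records.
raw_lines = [
    "1.0,2.0,3.0",
    "4.0,abc,6.0",   # non-numeric field
    "7.0,8.0",       # missing field
    "9.0,10.0,11.0",
]

# Keep only the records that pass the validity check.
clean_lines = list(filter(is_valid_record, raw_lines))
print(clean_lines)
```

Only the two well-formed records remain in clean_lines. What counts as exception data depends on the dataset, so the check itself has to be adapted.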

step 2. Tune model parameters
Parameter tuning is very important for increasing model accuracy. The available parameters differ depending on the prediction model.

(List of Parameters will be pasted here later.)

"numIterations" is a parameter common to all models except DecisionTreeModel.

numIterations is the number of iterations run for each batch of data.

In general, you can increase model accuracy by specifying a larger value for numIterations. However, creating the model then takes more time.
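
To illustrate this trade-off, the following sketch fits a one-parameter linear model with plain gradient descent and compares the error after different numbers of iterations. It is a toy example with made-up data and an assumed learning rate; it is not how Meteos or its backend trains models.

```python
def fit_slope(xs, ys, num_iterations, learning_rate=0.01):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(num_iterations):
        gradient = (2.0 / n) * sum((w * x - y) * x for x, y in zip(xs, ys))
        w -= learning_rate * gradient
    return w


def mean_squared_error(w, xs, ys):
    """Average squared difference between w * x and y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up data that follows y = 2 * x exactly.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# More iterations give a smaller error, at the cost of more loop passes.
for num_iterations in (1, 10, 100):
    w = fit_slope(xs, ys, num_iterations)
    print(num_iterations, w, mean_squared_error(w, xs, ys))
```

In this toy case the error shrinks as the iteration count grows, while the work done grows in proportion to it. Real models may stop improving well before a large iteration count is reached.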

You can specify model parameters in the "model_params" section when creating a model.

You can see an example [here].
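
As a sketch of how such a request could be prepared, the Python snippet below builds a model-creation request body with a "model_params" entry. The field names and values mirror the sample request used in step 3 of this page. Generating the JSON from Python is only one possible approach and is an assumption of this sketch.

```python
import json

# Parameters to tune. The value 100 is only an example.
model_params = {"numIterations": 100}

request_body = {
    "display_name": "sample-tree-model",
    "display_description": "Sample Decision Tree Model",
    "source_dataset_url": "internal://da78d06e-0966-4a6d-b499-337da4f636b3",
    "model_type": "DecisionTreeRegression",
    # model_params is passed as a string holding a Python-style dict literal.
    "model_params": repr(model_params),
    "dataset_format": "libsvm",
    "experiment_id": "c2287865-150a-4850-a724-946a7b6125a5",
}

print(json.dumps(request_body, indent=4))
```

The printed JSON can be saved to a file and passed to "meteos model-create" with the --json option.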

step 3. Evaluate model accuracy
After creating a prediction model, you can evaluate it using the "meteos model-evaluation-create" command.

Before evaluating the model, the user has to create the datasets for evaluation in advance.

In machine learning, the dataset is generally split into two parts: one for creating the prediction model and the other for evaluating it.
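
As a rough picture of what such a split does, the sketch below shuffles records and divides them by a training fraction. This is a generic illustration with a hypothetical helper name, split_dataset; it does not reproduce how Meteos performs the split.

```python
import random


def split_dataset(records, train_fraction, seed=0):
    """Shuffle records and split them into (train, test) lists."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(records)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


records = list(range(100))  # stand-in for 100 data records
train, test = split_dataset(records, train_fraction=0.9)
print(len(train), len(test))
```

Every record ends up in exactly one of the two parts, so the model is evaluated on data it was not trained on.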

You can specify the split ratio with the percent_train and percent_test parameters. Both are given as fractions, for example 0.9 and 0.1.

    $ cat sample/json/dataset_split.json
    {
        "display_name": "sample-data",
        "display_description": "This is a sample dataset",
        "method": "split",
        "source_dataset_url": "swift://meteos/decision_tree_data.txt",
        "experiment_id": "c2287865-150a-4850-a724-946a7b6125a5",
        "percent_train": "0.9",
        "percent_test": "0.1",
        "swift_tenant": "demo",
        "swift_username": "demo",
        "swift_password": "nova"
    }

    $ meteos dataset-create --json sample/json/dataset_split.json
    +-------------+--------------------------------------+
    | Property    | Value                                |
    +-------------+--------------------------------------+
    | created_at  | 2017-02-10T02:12:50.000000           |
    | description | This is a sample dataset             |
    | head        | None                                 |
    | id          | da78d06e-0966-4a6d-b499-337da4f636b3 |
    | name        | sample-data_train_0.9                |
    | project_id  | 9bbef6d798f24736aecb506764af84bf     |
    | status      | creating                             |
    | stderr      | None                                 |
    | user_id     | 1b046b966c9141779a271c3747c135e8     |
    +-------------+--------------------------------------+

    $ meteos dataset-list
    +--------------------------------------+-----------------------+-----------+---------------------------------------+
    | id                                   | name                  | status    | source_dataset_url                    |
    +--------------------------------------+-----------------------+-----------+---------------------------------------+
    | da78d06e-0966-4a6d-b499-337da4f636b3 | sample-data_train_0.9 | available | swift://meteos/decision_tree_data.txt |
    | e9356aef-1a5a-43ad-8c47-2df55443fbdd | sample-data_test_0.1  | available | swift://meteos/decision_tree_data.txt |
    +--------------------------------------+-----------------------+-----------+---------------------------------------+

Create a prediction model using the training part of the split dataset.

    $ cat sample/json/model_decision_tree.json
    {
        "display_name": "sample-tree-model",
        "display_description": "Sample Decision Tree Model",
        "source_dataset_url": "internal://da78d06e-0966-4a6d-b499-337da4f636b3",
        "model_type": "DecisionTreeRegression",
        "model_params": "{'numIterations': 100}",
        "dataset_format": "libsvm",
        "experiment_id": "c2287865-150a-4850-a724-946a7b6125a5"
    }

    $ meteos model-create --json sample/json/model_decision_tree.json
    +-------------+--------------------------------------+
    | Property    | Value                                |
    +-------------+--------------------------------------+
    | created_at  | 2017-02-10T02:19:08.000000           |
    | description | Sample Decision Tree Model           |
    | id          | 0c502657-1440-4263-a718-bf54c35fd500 |
    | name        | sample-tree-model                    |
    | params      | eydudW1JdGVyYXRpb25zJzogMTAwfQ==     |
    | project_id  | 9bbef6d798f24736aecb506764af84bf     |
    | status      | creating                             |
    | stderr      | None                                 |
    | stdout      | None                                 |
    | type        | DecisionTreeRegression               |
    | user_id     | 1b046b966c9141779a271c3747c135e8     |
    +-------------+--------------------------------------+
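
The params value in the output above looks like Base64. Decoding it with Python's standard library gives back the model_params string from the request. This is an observation about the sample output, not documented behaviour.

```python
import ast
import base64

# Value copied from the "params" row of the model-create output.
encoded_params = "eydudW1JdGVyYXRpb25zJzogMTAwfQ=="

decoded_params = base64.b64decode(encoded_params).decode("utf-8")
print(decoded_params)  # {'numIterations': 100}

# The decoded text is a Python dict literal, so it can be parsed safely
# with ast.literal_eval instead of eval.
params = ast.literal_eval(decoded_params)
print(params["numIterations"])  # 100
```

This can be a quick way to check which parameters an existing model was created with.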

Create a model evaluation using the test part of the split dataset.

    $ cat sample/json/model_evaluation.json
    {
        "display_name": "sample-evaluation",
        "source_dataset_url": "internal://e9356aef-1a5a-43ad-8c47-2df55443fbdd",
        "model_id": "0c502657-1440-4263-a718-bf54c35fd500",
        "swift_tenant": "demo",
        "swift_username": "demo",
        "swift_password": "nova"
    }

    $ meteos model-evaluation-create --json sample/json/model_evaluation.json
    +--------------------+-------------------------------------------------+
    | Property           | Value                                           |
    +--------------------+-------------------------------------------------+
    | created_at         | 2017-02-10T02:22:59.000000                      |
    | id                 | 0c20784d-376e-4155-b51f-7d653739b445            |
    | model_id           | 0c502657-1440-4263-a718-bf54c35fd500            |
    | model_type         | DecisionTreeRegression                          |
    | name               | sample-evaluation                               |
    | project_id         | 9bbef6d798f24736aecb506764af84bf                |
    | source_dataset_url | internal://e9356aef-1a5a-43ad-8c47-2df55443fbdd |
    | status             | creating                                        |
    | stderr             | None                                            |
    | stdout             | None                                            |
    | user_id            | 1b046b966c9141779a271c3747c135e8                |
    +--------------------+-------------------------------------------------+

You can see the evaluation score in the stdout property.

    $ meteos model-evaluation-show 15a1e95e-ce66-4ea4-9ec4-a486da298a6e
    +--------------------+-------------------------------------------------+
    | Property           | Value                                           |
    +--------------------+-------------------------------------------------+
    | created_at         | 2017-02-09T12:07:05.000000                      |
    | id                 | 15a1e95e-ce66-4ea4-9ec4-a486da298a6e            |
    | model_id           | 31e9f2cc-25e8-4100-b926-9218fa1dc68d            |
    | model_type         | DecisionTreeRegression                          |
    | name               | eva1                                            |
    | project_id         | 9bbef6d798f24736aecb506764af84bf                |
    | source_dataset_url | internal://746ad46c-c66a-4603-aaac-45197a782ff8 |
    | status             | available                                       |
    | stderr             |                                                 |
    | stdout             | Precision: 1.0                                  |
    |                    | Recall: 1.0                                     |
    |                    | F1 Score: 1.0                                   |
    | user_id            | 1b046b966c9141779a271c3747c135e8                |
    +--------------------+-------------------------------------------------+
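
For reference, precision, recall, and F1 score have standard definitions for binary classification. The sketch below computes them from made-up labels. It illustrates the metrics only and is not the code Meteos runs; the function name precision_recall_f1 is hypothetical.

```python
def precision_recall_f1(actual, predicted, positive=1):
    """Compute precision, recall and F1 score for one positive class."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == positive and p == positive)
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)

    # Guard against division by zero when a class is never predicted or seen.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Made-up labels: one false negative and one false positive.
actual = [1, 0, 1, 1, 0, 1]
predicted = [1, 0, 0, 1, 1, 1]
print(precision_recall_f1(actual, predicted))

# Perfect predictions score 1.0 on all three metrics.
print(precision_recall_f1(actual, actual))
```

A score of 1.0 on all three metrics, as in the sample output, means every evaluation record was predicted correctly.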

step 4. Recreate model with new datasets
It is desirable for prediction models to increase their accuracy continuously.

You can re-create a model with new datasets using the "meteos model-recreate" command.

After you upload new datasets, you have to recreate the prediction model with them to keep increasing model accuracy.

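    $ meteos model-list
    +--------------------------------------+----------------------+-----------+----------------+----------------------------------------+
    | id                                   | name                 | status    | type           | source_dataset_url                     |
    +--------------------------------------+----------------------+-----------+----------------+----------------------------------------+
    | 4ece0645-f9ed-4677-8548-b369f6b3835c | Movie Recommendation | available | Recommendation | swift://meteos/recommendation_data.txt |
    +--------------------------------------+----------------------+-----------+----------------+----------------------------------------+

Upload the new dataset.

    $ swift upload meteos recommendation_data.txt
    recommendation_data.txt

Recreate the prediction model.

    $ vim sample/json/model_recreate.json
    $ cat sample/json/model_recreate.json
    {
        "source_dataset_url": "swift://meteos/recommendation_data.txt",
        "swift_tenant": "demo",
        "swift_username": "demo",
        "swift_password": "nova"
    }

    $ meteos model-recreate 4ece0645-f9ed-4677-8548-b369f6b3835c --json sample/json/model_recreate.json

    $ meteos model-list
    +--------------------------------------+----------------------+------------+----------------+----------------------------------------+
    | id                                   | name                 | status     | type           | source_dataset_url                     |
    +--------------------------------------+----------------------+------------+----------------+----------------------------------------+
    | 4ece0645-f9ed-4677-8548-b369f6b3835c | Movie Recommendation | recreating | Recommendation | swift://meteos/recommendation_data.txt |
    +--------------------------------------+----------------------+------------+----------------+----------------------------------------+

    $ meteos model-list
    +--------------------------------------+----------------------+-----------+----------------+----------------------------------------+
    | id                                   | name                 | status    | type           | source_dataset_url                     |
    +--------------------------------------+----------------------+-----------+----------------+----------------------------------------+
    | 4ece0645-f9ed-4677-8548-b369f6b3835c | Movie Recommendation | available | Recommendation | swift://meteos/recommendation_data.txt |
    +--------------------------------------+----------------------+-----------+----------------+----------------------------------------+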