Meteos/DatasetsandModels


Latest revision as of 06:29, 23 March 2017

Meteos Dataset

A dataset is the data used to create a prediction model.

Dataset Format

Meteos currently supports the following data formats.

The supported dataset formats differ depending on the prediction model.

Model/Dataset Format               | CSV data format | LibSVM data format | Text data format
-----------------------------------+-----------------+--------------------+-----------------
Logistic Regression Model          | YES             | YES                | NO
Naive Bayes Model                  | YES             | NO                 | YES
Linear Regression Model            | YES             | YES                | NO
Ridge Regression Model             | YES             | YES                | NO
Decision Tree Classification Model | YES             | YES                | NO
Decision Tree Regression Model     | YES             | YES                | NO
Random Forest Classification Model | YES             | YES                | NO
Random Forest Regression Model     | YES             | YES                | NO
Kmeans Model                       | YES             | NO                 | YES
Recommendation Model               | YES             | NO                 | NO
Word2Vec Model                     | NO              | NO                 | YES
FP-growth Model                    | NO              | NO                 | YES


Most models support the CSV dataset format. The LibSVM dataset format is supported only by the Logistic Regression, Linear Regression, Ridge Regression, Decision Tree and Random Forest models.
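
The support matrix above can be written down as a small lookup, which may be handy for checking a dataset format before requesting a model. This is only a sketch: the dictionary SUPPORTED_FORMATS, the function supports_format and the lowercase format keys ("csv", "libsvm", "text") are hypothetical names chosen for illustration and are not part of the Meteos API. The YES/NO values are copied from the table.

```python
# Hypothetical lookup built from the table above; not part of Meteos itself.
SUPPORTED_FORMATS = {
    "Logistic Regression Model": {"csv", "libsvm"},
    "Naive Bayes Model": {"csv", "text"},
    "Linear Regression Model": {"csv", "libsvm"},
    "Ridge Regression Model": {"csv", "libsvm"},
    "Decision Tree Classification Model": {"csv", "libsvm"},
    "Decision Tree Regression Model": {"csv", "libsvm"},
    "Random Forest Classification Model": {"csv", "libsvm"},
    "Random Forest Regression Model": {"csv", "libsvm"},
    "Kmeans Model": {"csv", "text"},
    "Recommendation Model": {"csv"},
    "Word2Vec Model": {"text"},
    "FP-growth Model": {"text"},
}


def supports_format(model_name, data_format):
    """Return True if the table lists data_format as supported for model_name."""
    # Unknown model names are treated as supporting nothing.
    return data_format in SUPPORTED_FORMATS.get(model_name, set())


print(supports_format("Kmeans Model", "libsvm"))
print(supports_format("Word2Vec Model", "text"))
```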

CSV data format

<label>,<value1>,<value2>, ... <valueN>
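
As an illustration of the layout above, the following sketch splits one CSV line into its label and feature values. It assumes that the label and all values are numeric, which the page does not state, and parse_csv_line is a hypothetical helper name rather than the parser Meteos uses.

```python
def parse_csv_line(line):
    """Split one '<label>,<value1>,...,<valueN>' line into (label, features)."""
    fields = [field.strip() for field in line.split(",")]
    # The first field is the label; the remaining fields are the feature values.
    label = float(fields[0])
    features = [float(value) for value in fields[1:]]
    return label, features


print(parse_csv_line("1,0.5,2.0,3.5"))  # (1.0, [0.5, 2.0, 3.5])
```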

LibSVM data format

<label> <index1>:<value1> <index2>:<value2> ... <indexN>:<valueN>
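
The same idea applies to a LibSVM line, where each feature is written as an index:value pair. The sketch below assumes numeric labels and values and integer indices; parse_libsvm_line is a hypothetical helper name, not a Meteos or Spark function. Indices are kept exactly as written in the file, without converting between 1-based and 0-based numbering.

```python
def parse_libsvm_line(line):
    """Split '<label> <index1>:<value1> ...' into (label, {index: value})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for token in tokens[1:]:
        # Each remaining token has the form <index>:<value>.
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


print(parse_libsvm_line("1 1:0.5 3:2.0"))  # (1.0, {1: 0.5, 3: 2.0})
```

A dictionary is used here because LibSVM files usually list only some of the feature indices, so a dense list would need the total feature count to be known in advance.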

According to the table above, the text data format is supported only by the Naive Bayes, Kmeans, Word2Vec and FP-growth models. No other model can handle text data.

Text data format

This is a text data format

Dataset URL

When creating a prediction model, the user specifies a "source_dataset_url" parameter that indicates where the dataset is located.

The source_dataset_url takes one of two URL types:

Swift URL

A Swift URL is used when creating a model from a dataset stored in Swift.

If the dataset does not need to be parsed, the user can create a model directly from the dataset in Swift by specifying source_dataset_url as below.

swift://<container_name>/<object_name>

Internal HDFS URL

An internal HDFS URL is used when creating a model from a dataset in the internal HDFS of a Meteos experiment.

"Internal" means that the dataset has already been downloaded or parsed by Meteos.

When creating a model from a dataset in HDFS, the user has to specify a URL as below.

internal://<dataset_id>
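
The two URL types described above can be told apart by their scheme. The following is a minimal sketch of such a check, not Meteos's own validation code: parse_source_dataset_url and the keys of the returned dictionary are hypothetical names, and the sketch assumes that everything after the first "/" in a Swift URL is the object name.

```python
def parse_source_dataset_url(url):
    """Classify a source_dataset_url as a Swift URL or an internal HDFS URL."""
    scheme, separator, rest = url.partition("://")
    if not separator or not rest:
        raise ValueError("not a valid source_dataset_url: %r" % url)

    if scheme == "swift":
        # swift://<container_name>/<object_name>
        container_name, _, object_name = rest.partition("/")
        if not container_name or not object_name:
            raise ValueError("Swift URL needs a container and an object: %r" % url)
        return {
            "type": "swift",
            "container_name": container_name,
            "object_name": object_name,
        }

    if scheme == "internal":
        # internal://<dataset_id>
        return {"type": "internal", "dataset_id": rest}

    raise ValueError("unsupported URL scheme: %r" % scheme)


print(parse_source_dataset_url("swift://mycontainer/data.csv"))
print(parse_source_dataset_url("internal://1234"))
```

The container name, object name and dataset ID in the usage lines are placeholder values.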

Meteos Prediction Model

Meteos currently supports the following prediction models of Apache Spark.

Apache Spark has two machine learning libraries (MLlib and ML).

MLlib and ML provide multiple prediction models based on data mining and machine learning algorithms.

MLlib

Supervised learning

Classification

Logistic Regression Model

Naive Bayes Model

Regression

Linear Regression Model

Ridge Regression Model

Tree

Decision Tree Classification Model

Decision Tree Regression Model

Random Forest Classification Model

Random Forest Regression Model

Unsupervised learning

Kmeans Model
Recommendation Model
Word2Vec Model
FP-growth Model

ML

Not Supported.