Meteos/DatasetsandModels
Latest revision as of 06:29, 23 March 2017
Meteos Dataset
A dataset is the data used to create a prediction model.
DataSet Format
Meteos currently supports the following data formats.
The supported dataset formats differ depending on the prediction model.
Model/Dataset Format | CSV data format | LibSVM data format | Text data format |
---|---|---|---|
Logistic Regression Model | YES | YES | NO |
Naive Bayes Model | YES | NO | YES |
Linear Regression Model | YES | YES | NO |
Ridge Regression Model | YES | YES | NO |
Decision Tree Classification Model | YES | YES | NO |
Decision Tree Regression Model | YES | YES | NO |
Random Forest Classification Model | YES | YES | NO |
Random Forest Regression Model | YES | YES | NO |
Kmeans Model | YES | NO | YES |
Recommendation Model | YES | NO | NO |
Word2Vec Model | NO | NO | YES |
FP-growth Model | NO | NO | YES |
Most models support the CSV dataset format. The LibSVM dataset format, however, is supported only by the Logistic Regression, Linear Regression, Ridge Regression, Decision Tree and Random Forest models.
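The support table above can be written as a small lookup, which may be handy for validating a request before a model is created. This is a minimal sketch and not part of Meteos itself: the format labels `csv`, `libsvm` and `text`, the `SUPPORTED_FORMATS` dictionary and the `supports` helper are all names assumed here for illustration.

```python
# Hypothetical format labels; Meteos may use different identifiers.
CSV, LIBSVM, TEXT = "csv", "libsvm", "text"

# Transcription of the support table: model name -> supported formats.
SUPPORTED_FORMATS = {
    "Logistic Regression Model": {CSV, LIBSVM},
    "Naive Bayes Model": {CSV, TEXT},
    "Linear Regression Model": {CSV, LIBSVM},
    "Ridge Regression Model": {CSV, LIBSVM},
    "Decision Tree Classification Model": {CSV, LIBSVM},
    "Decision Tree Regression Model": {CSV, LIBSVM},
    "Random Forest Classification Model": {CSV, LIBSVM},
    "Random Forest Regression Model": {CSV, LIBSVM},
    "Kmeans Model": {CSV, TEXT},
    "Recommendation Model": {CSV},
    "Word2Vec Model": {TEXT},
    "FP-growth Model": {TEXT},
}


def supports(model, data_format):
    """Return True if the table lists data_format as supported for model."""
    return data_format in SUPPORTED_FORMATS.get(model, set())
```

For example, `supports("Kmeans Model", "libsvm")` is false, while `supports("Kmeans Model", "text")` is true.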
CSV data format
<label>,<value1>,<value2>, ... <valueN>
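To make the CSV layout concrete, the following sketch splits one line into its label and feature values. It assumes every field is numeric, which the format description does not state explicitly; `parse_csv_line` is a hypothetical helper, not a Meteos function.

```python
def parse_csv_line(line):
    """Split '<label>,<value1>,...,<valueN>' into (label, [values]).

    Assumes all fields are numeric (an assumption made for this sketch).
    """
    fields = [float(field) for field in line.strip().split(",")]
    return fields[0], fields[1:]


label, features = parse_csv_line("1,0.5,2.0,3.5")
```

Here the first field becomes the label and the remaining fields become the feature vector.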
LibSVM data format
<label> <index1>:<value1> <index2>:<value2> ... <indexN>:<valueN>
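The LibSVM layout is sparse: only the listed indices carry values. A minimal sketch of reading one such line is shown below, assuming a numeric label and integer indices; `parse_libsvm_line` is a hypothetical helper used only for illustration.

```python
def parse_libsvm_line(line):
    """Parse '<label> <index1>:<value1> ...' into (label, {index: value})."""
    parts = line.strip().split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        # Each item has the form '<index>:<value>'.
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


label, features = parse_libsvm_line("1 1:0.5 3:2.0")
```

Indices that do not appear on the line (index 2 in this example) are simply absent from the result.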
The text data format is supported only by the Word2Vec, FP-growth, Naive Bayes and Kmeans models. The other models cannot handle text data.
Text data format
This is a text data format
DataSet URL
When creating a prediction model, the user specifies a "source_dataset_url" parameter, which indicates where the dataset is located.
A source_dataset_url takes one of the following two URL types:
Swift URL
A Swift URL is used when creating a model from a dataset in Swift.
If it is not necessary to parse the dataset, the user can create a model directly from a dataset in Swift by specifying source_dataset_url as below.
swift://<container_name>/<object_name>
Internal HDFS URL
An internal HDFS URL is used when creating a model from the internal HDFS of a Meteos experiment.
"Internal" means that the dataset has already been downloaded or parsed by Meteos.
When creating a model from a dataset in HDFS, the user has to specify a URL as below.
internal://<dataset_id>
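The two URL types can be told apart by their scheme. The sketch below shows one way to split a source_dataset_url into its parts; it is an illustration under the URL layouts given above, not the parsing code Meteos actually uses, and `parse_source_dataset_url` and the returned dictionary keys are hypothetical.

```python
def parse_source_dataset_url(url):
    """Classify a source_dataset_url as a Swift or internal HDFS URL."""
    scheme, sep, rest = url.partition("://")
    if not sep or not rest:
        raise ValueError("malformed source_dataset_url: %r" % url)

    if scheme == "swift":
        # swift://<container_name>/<object_name>
        container, _, obj = rest.partition("/")
        if not container or not obj:
            raise ValueError("Swift URL needs a container and an object")
        return {"type": "swift",
                "container_name": container,
                "object_name": obj}

    if scheme == "internal":
        # internal://<dataset_id>
        return {"type": "internal", "dataset_id": rest}

    raise ValueError("unsupported URL scheme: %r" % scheme)
```

A call such as `parse_source_dataset_url("swift://mycontainer/data.csv")` yields the container and object names, where `mycontainer` and `data.csv` are placeholder values.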
Meteos Prediction Model
Meteos currently supports the following prediction models of Apache Spark.
Apache Spark has two machine learning libraries (MLlib and ML).
MLlib and ML provide multiple prediction models based on data mining and machine learning algorithms.
MLlib
Supervised learning
Classification
Logistic Regression Model
Naive Bayes Model
Regression
Linear Regression Model
Ridge Regression Model
Tree
Decision Tree Classification Model
Decision Tree Regression Model
Random Forest Classification Model
Random Forest Regression Model
Unsupervised learning
Kmeans Model
Recommendation Model
Word2Vec Model
FP-growth Model
ML
Not Supported.