Difference between revisions of "Sahara/SparkPlugin"
Revision as of 14:45, 24 October 2014
Introduction
Spark is a fast and general engine for large-scale data processing.
This blueprint proposes a Sahara provisioning plugin for Spark that can launch and resize Spark clusters and run EDP jobs.
EDP support is in progress, as some Sahara core code changes are needed to support Spark jobs.
We are currently testing a more general plugin to support Shark, one of the Spark-related projects. Shark is an open source distributed SQL query engine for Hadoop data. It brings state-of-the-art performance and advanced analytics to Hive users.
Supported releases
This plugin supports Spark version 1.0.1. Currently, the only deployment mode is "standalone": the Spark cluster is suitable for EDP jobs and for individual Spark applications, but it is not intended for a multi-tenant setup. There is no support for Mesos-based or YARN-based deployments. Additionally, this plugin supports only a Cloudera-based HDFS (CDH4, CDH5) data layer. Future releases will relax these limitations.
The companion DIB element provided with this plugin generates disk images according to the configuration described above.
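The constraints listed above can be summarized as a small check. The following is only an illustrative sketch: the names `ClusterRequest` and `unsupported_reasons` are hypothetical and are not part of the Sahara or Spark plugin API; only the supported values (Spark 1.0.1, standalone mode, CDH4/CDH5) come from the text above.

```python
from dataclasses import dataclass

# Supported values as stated in "Supported releases".
# All identifiers here are hypothetical, chosen for illustration only.
SUPPORTED_SPARK_VERSIONS = {"1.0.1"}
SUPPORTED_DEPLOY_MODES = {"standalone"}   # no Mesos or YARN support
SUPPORTED_HDFS_LAYERS = {"CDH4", "CDH5"}  # Cloudera-based HDFS only


@dataclass(frozen=True)
class ClusterRequest:
    """A requested cluster configuration (hypothetical structure)."""
    spark_version: str
    deploy_mode: str
    hdfs_layer: str


def unsupported_reasons(req):
    """Return a list of reasons why the request falls outside the
    configuration described above; an empty list means it fits."""
    reasons = []
    if req.spark_version not in SUPPORTED_SPARK_VERSIONS:
        reasons.append("Spark version %s is not supported" % req.spark_version)
    if req.deploy_mode.lower() not in SUPPORTED_DEPLOY_MODES:
        reasons.append("deployment mode %s is not supported" % req.deploy_mode)
    if req.hdfs_layer.upper() not in SUPPORTED_HDFS_LAYERS:
        reasons.append("HDFS data layer %s is not supported" % req.hdfs_layer)
    return reasons


if __name__ == "__main__":
    # A request matching the supported configuration yields no reasons.
    print(unsupported_reasons(ClusterRequest("1.0.1", "standalone", "CDH5")))
    # A YARN-based request is reported as unsupported.
    print(unsupported_reasons(ClusterRequest("1.0.1", "yarn", "CDH5")))
```

Keeping the supported values in plain sets means that relaxing a limitation in a future release would only require adding an entry, for example a new deployment mode.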
Documentation
- How to use the Spark plugin: Sahara/SparkPluginNotes
- Notes about the changes to sahara-image-elements: Sahara/SparkImageBuilder
Status
Development is done by Daniele Venzano (Research Engineer at Eurecom) and Pietro Michiardi (Prof. at Eurecom). A preliminary version of the plugin was developed with the additional help of two Master's students at Eurecom, Do Huy-Hoang and Vo Thanh Phuc. This work is partially supported by the BigFoot project, an EC-funded research project with grant agreement n. 317858.