
Difference between revisions of "Sahara/SparkPlugin"

m (Sergey Lukjanov moved page Savanna/SparkPlugin to Sahara/SparkPlugin: Savanna project was renamed due to the trademark issues.)
 
(10 intermediate revisions by 2 users not shown)
 
== Introduction ==
 
  
[http://spark.incubator.apache.org/ Spark] is an in-memory implementation of MapReduce written in Scala.<br/>
+
[http://spark.apache.org/ Spark] is a fast and general engine for large-scale data processing.<br/>
[https://blueprints.launchpad.net/savanna/+spec/spark-plugin This blueprint] proposes a Savanna provisioning plugin for Spark that can launch and resize Spark clusters and run EDP jobs.
+
[https://blueprints.launchpad.net/sahara/+spec/spark-plugin This blueprint] proposes a Sahara provisioning plugin for Spark that can launch and resize Spark clusters and run EDP jobs.
  
== Requirements ==
+
Currently, Spark is used in "standalone" deployment mode: as such, the Spark cluster is suitable for EDP jobs and for individual Spark applications (the cluster is not intended for a multi-tenant setup). There is currently no support for Mesos- or YARN-based deployments.
  
Support for version 0.8.0 of Spark and later is planned, since it has relaxed dependencies on Hadoop and HDFS library versions. Spark in ''standalone'' mode is targeted; there will be no support for Mesos or YARN modes for now.
+
== Supported releases ==
 +
 
 +
This plugin only supports a Cloudera-based HDFS (CDH4, CDH5) data layer, but this limitation will be addressed by future releases.
 +
 
 +
The companion [https://github.com/openstack/sahara-image-elements Disk Image builder element], provided with this plugin, by default generates disk images containing Spark and Hadoop versions known to work with the corresponding release of the Spark plugin. The following table shows supported versions for each OpenStack release:
 +
 
 +
{| class="wikitable"
 +
|-
 +
! OpenStack release !! Spark version !! Hadoop version !! Notes
 +
|-
 +
| Kilo and previous || 1.0.2 || CDH4 || EDP mostly working, Swift data source may not work out of the box.
 +
|-
 +
| Liberty (planned) || 1.3.1 (1.4.0) || CDH 5.3 || 1.3.1 has been merged, 1.4 under test, 1.0 has been deprecated
 +
|}
  
 
== Documentation ==
 
Notes about the changes to savanna-image-elements: [[Savanna/SparkImageBuilder]]<br/>
+
* How to use the Spark plugin: [[Sahara/SparkPluginNotes]]<br/>
Notes on using the Spark plugin: [[Savanna/SparkPluginNotes]]
+
* Notes about the changes to sahara-image-elements: [[Sahara/SparkImageBuilder]]
  
 
== Status ==
 
We are running unit and integration tests on the plugin, which is almost finished. In January we plan to publish the code for feedback and review.
+
Bleeding-edge development is done on the [https://github.com/bigfootproject/sahara Bigfoot project Sahara page] on GitHub. Please check that version for support for more recent versions of Spark, bug fixes, and optimizations.
  
Development is done by Do Huy-Hoang and Vo Thanh Phuc (Master students at Eurecom) and Daniele Venzano (Research Engineer at Eurecom), under the supervision of Prof. Pietro Michiardi (at Eurecom). This work is partially supported by the BigFoot project, an EC-funded research project.
+
Development is done by Daniele Venzano (Research Engineer at Eurecom) and Pietro Michiardi (Prof. at Eurecom). A preliminary version of the plugin was developed with the additional help of two Master students at Eurecom, Do Huy-Hoang and Vo Thanh Phuc.
 +
This work is partially supported by the BigFoot project, an EC-funded research project with grant agreement n. 317858.
  
 
== Related Resources ==
 
* [[Savanna/PluggableProvisioning/PluginAPI]]
+
* [[Sahara/PluggableProvisioning/PluginAPI]]
* [https://blueprints.launchpad.net/savanna/+spec/spark-plugin Blueprint]
+
* [https://blueprints.launchpad.net/sahara/+spec/spark-plugin Blueprint]

Latest revision as of 07:18, 17 July 2015

Introduction

Spark is a fast and general engine for large-scale data processing.
This blueprint proposes a Sahara provisioning plugin for Spark that can launch and resize Spark clusters and run EDP jobs.

Currently, Spark is used in "standalone" deployment mode: as such, the Spark cluster is suitable for EDP jobs and for individual Spark applications (the cluster is not intended for a multi-tenant setup). There is currently no support for Mesos- or YARN-based deployments.
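
As an illustration of what running an individual application against a standalone cluster looks like, the sketch below builds a spark-submit command line that targets a standalone master. It is not taken from the plugin code. The host name, application file, and input path are hypothetical placeholders, and the helper functions are invented for this example; only the spark:// URL scheme, the default master port 7077, and the --master and --class options of spark-submit are standard Spark conventions.

```python
import shlex

# 7077 is the default port of a Spark standalone master.
DEFAULT_MASTER_PORT = 7077


def standalone_master_url(host, port=DEFAULT_MASTER_PORT):
    """Return the spark:// URL used to reach a standalone master."""
    return "spark://{}:{}".format(host, port)


def spark_submit_command(master_host, app_path, app_args=(), main_class=None):
    """Build the argument list for spark-submit (hypothetical helper)."""
    cmd = ["spark-submit", "--master", standalone_master_url(master_host)]
    if main_class is not None:
        # --class is only needed for JVM applications packaged as a JAR.
        cmd += ["--class", main_class]
    cmd.append(app_path)
    cmd.extend(app_args)
    return cmd


if __name__ == "__main__":
    # Hypothetical master host, application, and input path.
    cmd = spark_submit_command(
        "spark-master.example", "wordcount.py", ["hdfs:///input"]
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

The command is returned as a list so it can be passed directly to subprocess.run without shell quoting issues. A Mesos or YARN deployment would use a different master URL, which is outside what this plugin supports.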

Supported releases

This plugin only supports a Cloudera-based HDFS (CDH4, CDH5) data layer, but this limitation will be addressed by future releases.

The companion Disk Image builder element, provided with this plugin, by default generates disk images containing Spark and Hadoop versions known to work with the corresponding release of the Spark plugin. The following table shows supported versions for each OpenStack release:

{| class="wikitable"
|-
! OpenStack release !! Spark version !! Hadoop version !! Notes
|-
| Kilo and previous || 1.0.2 || CDH4 || EDP mostly working, Swift data source may not work out of the box.
|-
| Liberty (planned) || 1.3.1 (1.4.0) || CDH 5.3 || 1.3.1 has been merged, 1.4 under test, 1.0 has been deprecated
|}

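Documentation

* How to use the Spark plugin: Sahara/SparkPluginNotes
* Notes about the changes to sahara-image-elements: Sahara/SparkImageBuilder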

Status

Bleeding-edge development is done on the Bigfoot project Sahara page on GitHub. Please check that version for support for more recent versions of Spark, bug fixes, and optimizations.

Development is done by Daniele Venzano (Research Engineer at Eurecom) and Pietro Michiardi (Prof. at Eurecom). A preliminary version of the plugin was developed with the additional help of two Master students at Eurecom, Do Huy-Hoang and Vo Thanh Phuc. This work is partially supported by the BigFoot project, an EC-funded research project with grant agreement n. 317858.

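Related Resources

* Sahara/PluggableProvisioning/PluginAPI
* Blueprint (https://blueprints.launchpad.net/sahara/+spec/spark-plugin)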