
 
== Sahara image elements for Spark ==
 
This page provides documentation on the image builder utility for Sahara, with a focus on the Hadoop CDH and Spark elements, together with links to the maintained documents below.
  
* [[Sahara/SparkPlugin]] wiki page containing a table with supported versions
* [https://github.com/openstack/sahara-image-elements/tree/master/elements/spark/README.rst Spark element README]
* [https://github.com/openstack/sahara-image-elements/blob/master/diskimage-create/README.rst DIB README]
* [https://github.com/bigfootproject/sahara-image-elements Spark experimental DIB version]

=== Hadoop CDH ===

Spark can be deployed alongside the two main Hadoop distributions, CDH and HDP. For this reason, the image builder contains a CDH element that deploys a Cloudera-based Hadoop installation. Note that the element uses Ubuntu packages, not Cloudera parcels or the Cloudera Manager.

=== Spark ===
Currently, the image builder supports the 0.9.1 release of Spark. By default, the official binary distribution is downloaded from the Spark website. By setting environment variables, a different distribution package can be used instead, for example one created by compiling Spark with the "make_distribution" script.
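As a sketch of that workflow: the commands below build a custom Spark 0.9.1 tarball from source and then point the image builder at it. The `make-distribution.sh` flags reflect the Spark 0.9 source tree, and the `SPARK_DOWNLOAD_URL` variable name is an assumption — check the Spark element README for the variable the element actually reads.

```shell
# Build a custom Spark 0.9.1 distribution tarball from the Spark
# source tree (make-distribution.sh ships with the Spark sources).
cd spark-0.9.1
./make-distribution.sh --hadoop 2.0.0-mr1-cdh4.2.0 --tgz

# Point the image builder at the custom package instead of the
# official download. NOTE: the variable name below is an assumption
# for illustration; the Spark element README documents the real one.
export SPARK_DOWNLOAD_URL="file:///path/to/spark-0.9.1-bin-2.0.0-mr1-cdh4.2.0.tgz"
```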
 
 
 
Note that Spark uses the Hadoop client library to talk to HDFS. Because the HDFS protocol has changed across Hadoop versions, you must build Spark against the same version that your cluster uses. By default, Spark links against Hadoop 1.0.4; you can change this by setting the SPARK_HADOOP_VERSION variable when compiling. A list of supported Hadoop distributions is available [http://spark.incubator.apache.org/docs/latest/hadoop-third-party-distributions.html here].
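For example, a Spark 0.9-era build against the Hadoop version shipped by a CDH4 cluster would look roughly like this (the CDH version string is illustrative; pick the one matching your cluster):

```shell
# Build the Spark assembly against a specific Hadoop version so the
# bundled Hadoop client library speaks the same HDFS protocol as the
# cluster. Run from the top of the Spark source tree.
SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.2.0 sbt/sbt assembly
```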
 
 
 
=== Additional notes ===
 
Spark is deployed in the standalone operational mode. The default configuration is (in summary) the following:
 
* The default number of cores to give to applications (if they don't set spark.cores.max) is all available cores
 
* The default total amount of memory for Spark applications is the total memory on the machine (VM) minus 1 GB
 
* The default number of worker instances to run on each machine is 1. A single worker will try to use all the available cores
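These defaults can be overridden through Spark's standalone-mode settings. A minimal <code>conf/spark-env.sh</code> sketch, with illustrative values, could look like this:

```shell
# conf/spark-env.sh -- standalone-mode worker settings (values are
# illustrative, not recommendations)

# Cores each worker offers to applications
# (default: all available cores on the machine)
export SPARK_WORKER_CORES=4

# Total memory a worker can hand out to applications
# (default: total machine memory minus 1 GB)
export SPARK_WORKER_MEMORY=6g

# Number of worker processes to run on each machine (default: 1)
export SPARK_WORKER_INSTANCES=2
```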
 
 
 
Currently, the Sahara Spark plugin exposes configuration options to modify the most important parameters.
 
 
 
=== Experimental version ===
 
On the [https://github.com/bigfootproject/sahara-image-elements Bigfoot project image builder page] on GitHub you can find an updated version of the image builder, which generates images with more recent versions of Spark and CDH.
 

Latest revision as of 08:21, 8 July 2015
