== Sahara image elements for Spark ==

This page documents the Spark element in the image builder utility for Sahara and links to the maintained documents:

* [[Sahara/SparkPlugin]] wiki page containing a table with supported versions
* [https://github.com/openstack/sahara-image-elements/tree/master/elements/spark/README.rst Spark element README]
* [https://github.com/openstack/sahara-image-elements/blob/master/diskimage-create/README.rst DIB README]
* [https://github.com/bigfootproject/sahara-image-elements Spark experimental DIB version]

=== Hadoop version ===

Spark can be deployed alongside either of the two main Hadoop distributions, CDH and HDP. For this reason, the image builder contains a CDH element that deploys a Cloudera-based Hadoop installation. Note that the element uses Ubuntu packages, not Cloudera parcels or the Cloudera Manager.
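
For illustration, here is a minimal sketch of how such elements could be combined with diskimage-builder's disk-image-create tool. The element names "hadoop-cdh" and "spark", the output image name, and the DIB_HADOOP_VERSION variable are assumptions; see the element and DIB READMEs linked above for the exact invocation and supported options.

<pre>
import os
import subprocess

# Illustrative only: compose diskimage-builder elements into a single Ubuntu
# image containing a CDH-based Hadoop install plus the Spark element.
# Element names and DIB_HADOOP_VERSION below are assumptions, not the
# authoritative invocation documented in the READMEs.
env = dict(os.environ, DIB_HADOOP_VERSION="2.0.0-mr1-cdh4.5.0")

subprocess.check_call(
    ["disk-image-create", "-o", "ubuntu-spark",
     "vm", "ubuntu", "hadoop-cdh", "spark"],
    env=env,
)
</pre>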
 
 
=== Spark ===
 
Currently, the image builder supports the 0.8.1 release of Spark. Downloading pre-built binaries is supported; compiling from source during image creation is not, since that would require running javac from a chroot environment. Spark is therefore built from source beforehand and packaged into a "distribution" (using make-distribution.sh from the Spark source tree); the distribution contains only the jars needed for a standalone Spark deployment, compiled for CDH 4.5, Hadoop 1 (*). This distribution is downloaded from a repository during VM image creation.
 
 
 
(*) Note that Spark uses the hadoop-client library to talk to HDFS. Because the HDFS protocol has changed across Hadoop versions, you must build Spark against the same version that your cluster runs. By default, Spark links against Hadoop 1.0.4; you can change this by setting the SPARK_HADOOP_VERSION variable when compiling. A list of supported Hadoop distributions is available in the [http://spark.incubator.apache.org/docs/latest/hadoop-third-party-distributions.html Spark documentation].
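
As a concrete illustration, assuming a Spark 0.8.1 source checkout at /opt/spark-0.8.1 (a hypothetical path) and the CDH 4.5 MRv1 client version string, the compile step could be driven as in the sketch below; the result would then be packaged with make-distribution.sh and uploaded to the repository used during image creation.

<pre>
import os
import subprocess

SPARK_HOME = "/opt/spark-0.8.1"  # assumed location of the Spark source checkout

# Compile the Spark assembly against the Hadoop client version used by the
# cluster; the version string below is an illustrative CDH 4.5 MRv1 value.
# Packaging into the standalone "distribution" is done afterwards with
# make-distribution.sh, as described above.
env = dict(os.environ, SPARK_HADOOP_VERSION="2.0.0-mr1-cdh4.5.0")
subprocess.check_call(["sbt/sbt", "assembly"], cwd=SPARK_HOME, env=env)
</pre>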
 
 
 
=== Additional notes ===
 
Spark is deployed in standalone mode, which means there is a single Spark worker process on each slave machine. If not configured properly, this worker process may underutilize the provisioned VM, depending on its flavor. Sahara exposes the relevant Spark configuration options, which can be set to match the cluster it deploys.
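
As an illustration of the kind of tuning involved, the sketch below appends worker sizing settings to spark-env.sh so that the standalone worker matches the VM flavor. The path and flavor values are assumptions; in practice these settings are applied through the configuration options exposed by Sahara.

<pre>
import multiprocessing

SPARK_ENV = "/opt/spark/conf/spark-env.sh"  # assumed Spark configuration path

flavor_ram_mb = 8192       # assumed RAM of the VM flavor
os_headroom_mb = 1024      # leave room for the OS and the HDFS daemons

# SPARK_WORKER_CORES / SPARK_WORKER_MEMORY are standard Spark standalone
# settings; sizing them to the flavor avoids underutilizing the VM.
settings = [
    "SPARK_WORKER_CORES=%d" % multiprocessing.cpu_count(),
    "SPARK_WORKER_MEMORY=%dm" % (flavor_ram_mb - os_headroom_mb),
]

with open(SPARK_ENV, "a") as f:
    f.write("\n".join(settings) + "\n")
</pre>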
 
