Savanna image elements for SPARK

This page documents the image builder utility for Savanna, with a focus on the Spark element.

Hadoop version:

Spark can be deployed alongside the two main Hadoop distributions, CDH and HDP. For this reason, the image builder contains a new CDH element that deploys a Cloudera-based Hadoop installation. Note that the element uses Ubuntu packages, not Cloudera parcels or Cloudera Manager.
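
As an illustration, an image with these elements could be built with diskimage-builder along the following lines. This is only a sketch: the element names (hadoop-cdh, spark) and the elements path are assumptions, so check the Savanna image-elements repository for the exact names and variables.

 # Sketch: build an Ubuntu-based Spark image with diskimage-builder.
 # Element names and paths are assumptions; see the image-elements repository.
 export ELEMENTS_PATH=/path/to/savanna-image-elements/elements
 disk-image-create ubuntu vm hadoop-cdh spark -o savanna-spark-ubuntu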

Spark:

Currently, the image builder supports the 0.8.1 release of Spark. Downloading binaries is an option; compiling from source inside the image build is not (it would require running javac from within a chroot environment). In practice, Spark is built from source and packaged into a "distribution" (using make-distribution.sh in the Spark tree): the distribution contains only the jars for a standalone Spark deployment, compiled for CDH 4.5, Hadoop 1 (*). The distribution is downloaded from a repository during the VM image creation.
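
For illustration, a distribution of this kind can be produced from a Spark source checkout roughly as follows. The repository URL, tag and make-distribution.sh options below are a sketch based on the 0.8.x series; check the script's usage output for the exact options.

 # Build a standalone Spark 0.8.1 distribution tarball (sketch; options may differ).
 git clone https://github.com/apache/incubator-spark.git spark
 cd spark
 git checkout v0.8.1-incubating
 # --hadoop selects the Hadoop/CDH version to link against, --tgz produces a tarball
 ./make-distribution.sh --hadoop 2.0.0-mr1-cdh4.5.0 --tgz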

(*) Note that Spark uses the Hadoop client library to talk to HDFS. Because the HDFS protocol has changed across Hadoop versions, you must build Spark against the same version that your cluster runs. By default, Spark links against Hadoop 1.0.4. You can change this by setting the SPARK_HADOOP_VERSION variable when compiling. A list of supported Hadoop distributions is available here: [link]
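
For example, when building with sbt the target Hadoop version is selected through this variable; the CDH version string below is only an example and should match the cluster.

 # Link Spark against the cluster's Hadoop version at compile time.
 SPARK_HADOOP_VERSION=2.0.0-mr1-cdh4.5.0 sbt/sbt assembly
 # If the variable is unset, Spark links against Hadoop 1.0.4.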

Additional notes:

Spark is deployed in the standalone operational mode: this means there is an individual Spark worker process per slave machine. If not configured properly, the Spark worker process may underutilize the provisioned VM, depending on its flavor. Currently, Savanna exposes Spark configuration options that can be set according to the cluster it deploys.
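
For example, in standalone mode the resources offered by each worker are typically set in conf/spark-env.sh; the values below are placeholders that should be sized to the VM flavor.

 # conf/spark-env.sh on a slave node (placeholder values, to be sized to the flavor)
 export SPARK_WORKER_CORES=4       # cores the worker offers to applications
 export SPARK_WORKER_MEMORY=6g     # memory the worker offers to applications
 export SPARK_WORKER_INSTANCES=1   # worker processes per machine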