Sahara/SparkImageBuilder

Revision as of 12:28, 23 April 2015 by Daniele Venzano (talk | contribs) (Mention the experimental version)

Sahara image elements for Spark

This page documents the image builder utility for Sahara, focusing on the Hadoop CDH and Spark elements.

Hadoop CDH

Spark can be deployed alongside the two main distributions for Hadoop, namely CDH and HDP. For this reason, the image builder contains a new CDH element that deploys a Cloudera-based Hadoop install. Note that the element uses Ubuntu packages; it uses neither Cloudera parcels nor Cloudera Manager.

Spark

Currently, the image builder supports the 0.9.1 release of Spark. By default, the official binary distribution is downloaded from the Spark website. Environment variables can be set to use a different distribution package instead, for example one created by compiling Spark with the "make-distribution.sh" script.

(*) Note that Spark uses the hadoop-client library to talk to HDFS. Because the HDFS protocol has changed between versions of Hadoop, you must build Spark against the same version that your cluster uses. By default, Spark links to Hadoop 1.0.4. You can change this by setting the SPARK_HADOOP_VERSION variable when compiling. A list of supported Hadoop distributions is available here: [1]
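As an illustration of the note above, the following Python sketch prepares an environment with SPARK_HADOOP_VERSION set and runs a build from a Spark source tree. The helper names (spark_build_env, build_spark), the "sbt/sbt assembly" build command and the example version string are assumptions made for this sketch, not something the image builder provides; check the build documentation of your Spark release for the exact command.

```python
import os
import subprocess


def spark_build_env(hadoop_version, base_env=None):
    """Return a copy of the environment with SPARK_HADOOP_VERSION set.

    hadoop_version must match the Hadoop version running on the cluster.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["SPARK_HADOOP_VERSION"] = hadoop_version
    return env


def build_spark(spark_source_dir, hadoop_version):
    """Compile Spark against the given Hadoop version (hypothetical helper).

    Assumption: the source tree is built with "sbt/sbt assembly",
    as in the Spark 0.9.x build instructions.
    """
    subprocess.check_call(
        ["sbt/sbt", "assembly"],
        cwd=spark_source_dir,
        env=spark_build_env(hadoop_version),
    )
```

For example, build_spark("/path/to/spark-0.9.1", "2.0.0-mr1-cdh4.2.0") would compile against a CDH4 release of Hadoop; both the path and the version string are placeholders to be replaced with your own values.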

Additional notes

Spark is deployed in standalone mode. In summary, the default configuration is the following:

  • The default number of cores to give to applications (if they don't set spark.cores.max) is all available cores
  • The default total amount of memory for Spark applications is the total memory on the machine (VM) minus 1 GB
  • The default number of worker instances to run on each machine is 1. A single worker will try to use all the available cores
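The defaults listed above can be summarized with a small Python sketch that derives them from the resources of a single machine. The function and key names are hypothetical and only mirror the list; the clamp at zero is a choice of this sketch, not documented Spark behavior.

```python
ONE_GB_IN_MB = 1024


def standalone_defaults(total_memory_mb, total_cores):
    """Return the default standalone settings for one machine (VM)."""
    return {
        # One worker instance per machine.
        "worker_instances": 1,
        # The single worker tries to use all the available cores.
        "worker_cores": total_cores,
        # Total memory minus 1 GB; never negative (sketch assumption).
        "worker_memory_mb": max(total_memory_mb - ONE_GB_IN_MB, 0),
    }


defaults = standalone_defaults(total_memory_mb=8192, total_cores=4)
print(defaults["worker_memory_mb"])  # 7168
```

On a VM with 8 GB of RAM and 4 cores, this gives one worker with 4 cores and 7168 MB of memory for Spark applications.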

Currently, the Sahara Spark plugin exposes configuration options to modify the most important parameters.

Experimental version

The Bigfoot project image builder page on GitHub hosts an updated version of the image builder that generates images with more recent versions of Spark and CDH.