Apache Spark Installation and Programming Guide

This document provides a step-by-step guide to install and configure Apache Spark in standalone mode on a single machine. It explains how to download and extract the Spark files, set the SPARK_HOME environment variable, and launch the Spark shell. It also provides a basic Java code example to count the words in a file using Spark.

Uploaded by

suneha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
149 views

Apache Spark Installation and Programming Guide

This document provides a step-by-step guide to install and configure Apache Spark in standalone mode on a single machine. It explains how to download and extract the Spark files, set the SPARK_HOME environment variable, and launch the Spark shell. It also provides a basic Java code example to count the words in a file using Spark.

Uploaded by

suneha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

This is a step-by-step guide to installing Apache Spark. Spark can run under several cluster
managers, such as YARN, or on its own in local mode and standalone mode.

Standalone Deploy Mode
In this practical, you will configure Spark to run in standalone mode, where both the driver and
the worker run on the same machine.
Since we use Java to write and run programs on Spark, ensure that Java 8 is pre-installed on
every machine on which you will run Spark jobs.
To install Spark on the machine, download a prebuilt binary of Spark from the
http://spark.apache.org/downloads.html page.

Select the Spark distribution as shown in the snapshot below:

You can also download Spark 1.6.1 directly by using the following command:

wget http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.4.tgz

Decompress the Spark archive into the directory where you want to store Spark.

tar xvf spark-1.6.1-bin-hadoop2.4.tgz -C /DeZyre

Make a soft link to the actual Spark directory (this will be helpful for any version upgrade in future):

ln -s spark-1.6.1-bin-hadoop2.4 spark

Make an entry for Spark in the .bashrc file:

export SPARK_HOME=/mydirectory/spark

export PATH=$SPARK_HOME/bin:$PATH
Source the changed .bashrc file with the command:

source ~/.bashrc

We have now configured Spark in standalone mode. To verify, launch the Spark shell with
the following command:

spark-shell

To check the Spark version, run the following command in the Scala shell:

sc.version

Writing a Program
Next we will write a basic Java application that counts the words in a file. Below is the source code
for the word count program in Apache Spark. You need to import some Spark classes into your
program, and include the path of the file to be processed.

import java.util.Arrays;
import scala.Tuple2;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.api.java.function.Function2;

// Read the input file (sc is the JavaSparkContext provided by the shell or driver)
JavaRDD<String> textFile = sc.textFile("hdfs://...");

// Split each line into words
JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
  public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
});

// Map each word to a (word, 1) pair
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
  public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});

// Sum the counts for each word
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
  public Integer call(Integer a, Integer b) { return a + b; }
});

// Save the (word, count) pairs back to HDFS
counts.saveAsTextFile("hdfs://...");
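The Spark pipeline above is a distributed version of an ordinary word count: split lines into words, pair each word with 1, then sum the counts per word. As a point of reference, here is a minimal sketch of that same flatMap / mapToPair / reduceByKey logic in plain Java on a single machine (the class name `LocalWordCount` and the sample input are illustrative only, not part of the Spark program):

```java
import java.util.HashMap;
import java.util.Map;

public class LocalWordCount {
    // Mirrors the Spark pipeline:
    // flatMap (split lines into words) -> mapToPair (word, 1) -> reduceByKey (sum)
    static Map<String, Integer> countWords(String[] lines) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String line : lines) {
            for (String word : line.split(" ")) {            // flatMap step
                Integer current = counts.get(word);          // existing count, if any
                counts.put(word, current == null ? 1 : current + 1); // reduce-by-key step
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] lines = { "to be or not", "to be" };
        // Prints the word counts, e.g. to=2, be=2, or=1, not=1 (map order may vary)
        System.out.println(countWords(lines));
    }
}
```

Spark's version distributes exactly this work: each partition of the file is split and counted in parallel, and reduceByKey merges the per-partition counts across the cluster.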
