Apache Spark Installation and Programming Guide
This is a step-by-step guide to installing Apache Spark. Spark can run in local mode, in
standalone mode, or under an external cluster manager such as YARN.
Standalone Deploy Mode
In this practical, you will configure Spark to run in standalone mode, where both the driver and
worker processes run on the same machine.
Since we use Java to write and run programs on Spark, ensure that Java 8 is pre-installed on
every machine on which you will run Spark jobs.
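You can confirm which Java version is installed with the standard version flag (the exact output format varies by JDK vendor):

```shell
# Prints the installed Java version; look for "1.8" (Java 8) in the output.
java -version
```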
To install Spark on the machine, download a prebuilt binary of Spark from the
http://spark.apache.org/downloads.html page.
You can also download Spark 1.6.1 directly with the following command:
wget http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.4.tgz
Decompress the downloaded archive into the directory where you want to store Spark.
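Assuming the archive was downloaded to the current directory, a typical extraction looks like this (the target directory /mydirectory is an example; use your own path):

```shell
# Extract the Spark tarball into the chosen install directory.
tar -xzf spark-1.6.1-bin-hadoop2.4.tgz -C /mydirectory
```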
Make a softlink to the actual Spark directory (this will be helpful for any version upgrade in future):
ln -s spark-1.6.1-bin-hadoop2.4 spark
Add the following lines to your ~/.bashrc so that Spark is on your PATH:
export SPARK_HOME=/mydirectory/spark
export PATH=$SPARK_HOME/bin:$PATH
Source the changed .bashrc file with the command:
source ~/.bashrc
We have now successfully configured Spark in standalone mode. To verify the setup, launch the
Spark shell with the following command:
spark-shell
Inside the shell, check the Spark version:
sc.version
Writing a Program
Next, we will write a basic Java application that counts word occurrences in a file. Below is the
source code for the word count program in Apache Spark. Note that you need to import some Spark
classes into your program, and you need to set the path of the file to be processed.
import java.util.Arrays;
import scala.Tuple2;
import org.apache.spark.api.java.*;
import org.apache.spark.api.java.function.*;

// sc is an existing JavaSparkContext.
JavaRDD<String> textFile = sc.textFile("hdfs://...");
// Split each line into words.
JavaRDD<String> words = textFile.flatMap(new FlatMapFunction<String, String>() {
    public Iterable<String> call(String s) { return Arrays.asList(s.split(" ")); }
});
// Map each word to a (word, 1) pair.
JavaPairRDD<String, Integer> pairs = words.mapToPair(new PairFunction<String, String, Integer>() {
    public Tuple2<String, Integer> call(String s) { return new Tuple2<String, Integer>(s, 1); }
});
// Sum the counts for each word.
JavaPairRDD<String, Integer> counts = pairs.reduceByKey(new Function2<Integer, Integer, Integer>() {
    public Integer call(Integer a, Integer b) { return a + b; }
});
counts.saveAsTextFile("hdfs://...");
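To run the program on your standalone installation, package it as a jar and submit it with spark-submit. A minimal sketch follows; WordCount and wordcount.jar are placeholder names for your own main class and jar file:

```shell
# Submit the word count job to Spark running locally with two worker threads.
# --class names the main class; the last argument is the application jar.
spark-submit \
  --class WordCount \
  --master local[2] \
  wordcount.jar
```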