Spark 101 - Overview and Efficient Use

Uploaded by divya kolluri (original title: Spark Session 4)

This document provides an overview of Spark and how to use it efficiently. It discusses Spark's RDD and DataFrame models of parallel computing, the wide range of input formats Spark supports, and persisting data for faster reuse. It also covers the difference between narrow and wide dependencies and the anatomy of a Spark job in terms of DAGs, jobs, stages, and tasks.

Spark Model of Parallel Computing: RDDs & DataFrames

• Immutable and fault-tolerant
• Support a wide range of input sources and formats, such as local files, HDFS, and Elasticsearch (ES)
• Can be persisted (cached) for faster reuse
• A DataFrame is the distributed equivalent of a pandas or R data frame
• Lazily evaluated
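The immutability and persistence properties above can be illustrated with a small, single-machine toy model. This is not Spark's API: `ToyDataset`, `map`, `persist`, `collect`, and `compute_count` are hypothetical stand-ins named after their Spark counterparts. The sketch assumes a dataset is just a recipe for rebuilding its rows from its parent. `map` returns a new object instead of changing the old one, and `persist` keeps the computed rows so later actions skip recomputation. Rebuilding data from its recipe is also, loosely, how Spark recovers lost partitions from lineage, which is where the fault tolerance comes from.

```python
from typing import Callable, List, Optional


class ToyDataset:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, compute: Callable[[], List[int]]):
        self._compute = compute   # recipe that (re)builds the rows
        self._cache: Optional[List[int]] = None
        self._persisted = False
        self.compute_count = 0    # how many times the recipe actually ran

    def map(self, f: Callable[[int], int]) -> "ToyDataset":
        # Returns a NEW dataset; `self` is never modified (immutability).
        # Nothing runs yet: the new recipe only refers back to its parent.
        return ToyDataset(lambda: [f(x) for x in self._materialize()])

    def persist(self) -> "ToyDataset":
        # Mark the rows to be kept after the first computation.
        self._persisted = True
        return self

    def _materialize(self) -> List[int]:
        if self._cache is not None:
            return self._cache        # reuse persisted rows
        self.compute_count += 1
        rows = self._compute()
        if self._persisted:
            self._cache = rows
        return rows

    def collect(self) -> List[int]:
        # "Action": forces computation and returns a copy of the rows.
        return list(self._materialize())


base = ToyDataset(lambda: [1, 2, 3])

plain = base.map(lambda x: x * x)
plain.collect()
plain.collect()
print(plain.compute_count)    # → 2  (recomputed on every action)

cached = base.map(lambda x: x * x).persist()
cached.collect()
cached.collect()
print(cached.compute_count)   # → 1  (computed once, then reused)

print(base.collect())         # → [1, 2, 3]  (the parent is unchanged by map)
```

In PySpark the corresponding calls are `rdd.map(...)`, `rdd.persist()` (or `rdd.cache()`), and `rdd.collect()`; real persisted data lives in executor memory and/or disk according to a storage level.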

Performance Comparison: RDD & DataFrame

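Lazy Evaluation

• Transformations (e.g., map, filter) are recorded but not executed immediately
• Actions (e.g., count, collect) trigger the actual computation
• The recorded transformations form a Directed Acyclic Graph (DAG)
• For DataFrames, Spark compiles this into a logical plan and then an optimized physical (execution) plan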
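To make the transformation/action split concrete, here is a toy lineage graph in plain Python. It is a hypothetical sketch, not Spark: `LazyDataset`, `parallelize`, `lineage`, and `times_ten` are illustrative names, and the graph is a simple chain, whereas real Spark DAGs can branch (for example, at a join). Transformations only record a node; the action walks back to the source and executes the chain.

```python
from typing import Any, Callable, List, Optional

Rows = List[Any]


class LazyDataset:
    """Hypothetical toy (not Spark's API): each node is one step in the lineage."""

    def __init__(self, op: str, parent: Optional["LazyDataset"] = None,
                 fn: Optional[Callable[[Rows], Rows]] = None,
                 source: Optional[Rows] = None):
        self.op = op            # name of the step, e.g. "map"
        self.parent = parent    # upstream node (None for the source)
        self.fn = fn            # turns the parent's rows into this node's rows
        self.source = source    # in-memory rows for the source node

    # Transformations: only add a node to the graph; nothing is executed.
    def map(self, f: Callable[[Any], Any]) -> "LazyDataset":
        return LazyDataset("map", self, lambda rows: [f(r) for r in rows])

    def filter(self, pred: Callable[[Any], bool]) -> "LazyDataset":
        return LazyDataset("filter", self, lambda rows: [r for r in rows if pred(r)])

    def lineage(self) -> List[str]:
        # Walk back to the source to show the recorded plan, oldest step first.
        node, ops = self, []
        while node is not None:
            ops.append(node.op)
            node = node.parent
        return ops[::-1]

    # Actions: evaluate the whole chain, starting from the source.
    def collect(self) -> Rows:
        if self.parent is None:
            return list(self.source or [])
        return self.fn(self.parent.collect())

    def count(self) -> int:
        return len(self.collect())


def parallelize(data: Rows) -> LazyDataset:
    return LazyDataset("source", source=list(data))


seen: List[int] = []   # records which inputs the map function has touched


def times_ten(x: int) -> int:
    seen.append(x)
    return x * 10


ds = parallelize([1, 2, 3, 4]).map(times_ten).filter(lambda x: x > 15)
print(seen)            # → []  (transformations are lazy)
print(ds.lineage())    # → ['source', 'map', 'filter']
print(ds.collect())    # → [20, 30, 40]  (the action runs the chain)
print(seen)            # → [1, 2, 3, 4]
```

In PySpark the recorded plan can be inspected with `RDD.toDebugString()` for RDDs, or `DataFrame.explain(True)` for DataFrames, which prints the logical plans together with the physical plan.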

Unified Spark Stack

Spark SQL Flow

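Wide Versus Narrow Dependencies

• Narrow dependencies (e.g., map, filter, flatMap): each partition of the parent RDD is used by at most one partition of the child RDD, so no shuffle is needed
• Wide dependencies (e.g., sort, groupByKey, and join unless both inputs are already co-partitioned): a child partition can depend on many parent partitions, so data must be shuffled across the cluster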
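A toy, single-process sketch of the difference, assuming data is a list of partitions of (key, value) pairs. `narrow_map`, `wide_group_by_key`, and the CRC32-based `key_to_partition` are hypothetical helpers, not Spark functions; the CRC32 partitioner is only a deterministic stand-in for Spark's own hash partitioning.

```python
import zlib
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, int]
Partition = List[Record]


def narrow_map(partitions: List[Partition],
               f: Callable[[Record], Record]) -> List[Partition]:
    # Narrow dependency: output partition i is built only from input
    # partition i, so every partition can be processed in place.
    return [[f(rec) for rec in part] for part in partitions]


def key_to_partition(key: str, num_partitions: int) -> int:
    # Deterministic hash partitioner (Python's built-in str hash is randomized per run).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def wide_group_by_key(partitions: List[Partition],
                      num_partitions: int) -> List[Dict[str, List[int]]]:
    # Wide dependency: an output partition may need records from EVERY input
    # partition, so records are first redistributed by key (the "shuffle").
    shuffled: List[Partition] = [[] for _ in range(num_partitions)]
    for part in partitions:
        for key, value in part:
            shuffled[key_to_partition(key, num_partitions)].append((key, value))
    # After the shuffle, each output partition can be grouped independently.
    result: List[Dict[str, List[int]]] = []
    for bucket in shuffled:
        groups: Dict[str, List[int]] = defaultdict(list)
        for key, value in bucket:
            groups[key].append(value)
        result.append(dict(groups))
    return result


parts = [[("a", 1), ("b", 2)], [("a", 3), ("c", 4)]]
print(narrow_map(parts, lambda kv: (kv[0], kv[1] * 10)))
# → [[('a', 10), ('b', 20)], [('a', 30), ('c', 40)]]
print(wide_group_by_key(parts, 2))
```

The key point is where records move: `narrow_map` never moves a record between partitions, while `wide_group_by_key` sends `("a", 1)` and `("a", 3)` to the same output partition even though they started in different input partitions. In Spark that movement is a shuffle, which involves disk and network I/O and is why wide dependencies are typically the expensive ones.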
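
The Anatomy of a Spark Job

• The DAG: the graph of RDD dependencies (lineage) that Spark builds from the transformations
• Job: launched when an action is called; typically one job per action
• Stages: a job is divided into stages at wide (shuffle) dependencies
• Tasks: each stage runs one task per partition, and tasks are the units of work executors actually run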
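The job/stage/task breakdown can be sketched as a small planning function, assuming a job is a linear chain of operations, each tagged as having a narrow or wide dependency on its parent. `split_into_stages`, `plan_job`, and the partition counts are hypothetical; Spark's DAGScheduler does the real work and also handles branching DAGs, but the rule shown here (start a new stage at every shuffle, run one task per partition) is the same idea. The source step is tagged "narrow" only so that it joins the first stage.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # (operation name, "narrow" or "wide"): hypothetical encoding


def split_into_stages(ops: List[Op]) -> List[List[str]]:
    """Cut a linear chain of operations at every wide (shuffle) dependency."""
    stages: List[List[str]] = [[]]
    for name, dep in ops:
        if dep == "wide" and stages[-1]:
            stages.append([])   # a shuffle boundary starts a new stage
        stages[-1].append(name)
    return stages


def plan_job(ops: List[Op], partitions_per_stage: List[int]) -> List[Tuple[List[str], int]]:
    """Pair each stage with its task count: one task per partition."""
    stages = split_into_stages(ops)
    if len(stages) != len(partitions_per_stage):
        raise ValueError("expected one partition count per stage")
    return list(zip(stages, partitions_per_stage))


# One action (e.g. collect) on this chain would launch one job.
word_count = [
    ("textFile", "narrow"),
    ("flatMap", "narrow"),
    ("map", "narrow"),
    ("reduceByKey", "wide"),
    ("filter", "narrow"),
]
plan = plan_job(word_count, partitions_per_stage=[4, 2])
for i, (stage_ops, tasks) in enumerate(plan):
    print(f"stage {i}: {stage_ops} -> {tasks} tasks")
print("total tasks:", sum(t for _, t in plan))   # → total tasks: 6
```

Here the single job splits into two stages at `reduceByKey`: the first runs 4 tasks (one per input partition) and the second runs 2.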
