Pyspark Vs Spark SQL

Uploaded by

Sozha Vendhan


Scenario Based Interview

PySpark vs Spark SQL

Ganesh. R
Problem Statement

You are the restaurant owner and you want to analyze a possible expansion (there will be at least one customer every day).

Compute the moving average of how much customers paid over a seven-day window (i.e., the current day plus the 6 days before). average_amount should be rounded to two decimal places.

Return the result table ordered by visited_on in ascending order.
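As a cross-check on the expected answer, here is a minimal plain-Python sketch (no Spark involved) of the same computation, using the sample rows that the post defines further down. The function name `moving_average` and its `size` parameter are hypothetical helpers introduced only for this illustration.

```python
from collections import defaultdict

# Sample rows from the post: (customer_id, name, visited_on, amount)
rows = [
    (1, "Jhon", "2019-01-01", 100),
    (2, "Daniel", "2019-01-02", 110),
    (3, "Jade", "2019-01-03", 120),
    (4, "Khaled", "2019-01-04", 130),
    (5, "Winston", "2019-01-05", 110),
    (6, "Elvis", "2019-01-06", 140),
    (7, "Anna", "2019-01-07", 150),
    (8, "Maria", "2019-01-08", 80),
    (9, "Jaze", "2019-01-09", 110),
    (1, "Jhon", "2019-01-10", 130),
    (3, "Jade", "2019-01-10", 150),
]


def moving_average(rows, size=7):
    """Hypothetical helper: per-day totals, then a trailing `size`-day sum and average."""
    daily = defaultdict(int)
    for _, _, visited_on, amount in rows:
        daily[visited_on] += amount  # several customers may visit on the same day
    days = sorted(daily)  # ISO date strings sort chronologically
    result = []
    # Start at the first day that has a full window behind it
    for i in range(size - 1, len(days)):
        total = sum(daily[d] for d in days[i - size + 1 : i + 1])
        result.append((days[i], total, round(total / size, 2)))
    return result


expected = moving_average(rows)
for row in expected:
    print(row)
# ('2019-01-07', 860, 122.86)
# ('2019-01-08', 840, 120.0)
# ('2019-01-09', 840, 120.0)
# ('2019-01-10', 1000, 142.86)
```

The PySpark and Spark SQL solutions should return these same four rows for the sample data.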

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
# Note: sum and round imported here shadow the Python built-ins of the same name
from pyspark.sql.functions import avg, col, round, row_number, sum
from pyspark.sql.window import Window

# Initialize Spark session

spark = SparkSession.builder.appName("MovingAverage").getOrCreate()

# Sample data
data = [
    (1, "Jhon", "2019-01-01", 100),
    (2, "Daniel", "2019-01-02", 110),
    (3, "Jade", "2019-01-03", 120),
    (4, "Khaled", "2019-01-04", 130),
    (5, "Winston", "2019-01-05", 110),
    (6, "Elvis", "2019-01-06", 140),
    (7, "Anna", "2019-01-07", 150),
    (8, "Maria", "2019-01-08", 80),
    (9, "Jaze", "2019-01-09", 110),
    (1, "Jhon", "2019-01-10", 130),
    (3, "Jade", "2019-01-10", 150),
]

# Create DataFrame
columns = ["customer_id", "name", "visited_on", "amount"]
df = spark.createDataFrame(data, schema=columns)

df.display()  # display() is available in Databricks notebooks; use df.show() elsewhere
df.printSchema()

root
|-- customer_id: long (nullable = true)
|-- name: string (nullable = true)
|-- visited_on: string (nullable = true)
|-- amount: long (nullable = true)

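# Define a window specification: the current row plus the 6 rows before it.
# This equals 7 calendar days only because the data is grouped to one row per day.

window_spec = Window.orderBy("visited_on").rowsBetween(-6, 0)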

# Calculate the rolling sum and average

result_df = (
    df.groupBy("visited_on")
    .agg(sum("amount").alias("daily_amount"))
    .withColumn("amount", sum("daily_amount").over(window_spec))
    .withColumn(
        "average_amount",
        round(avg("daily_amount").over(window_spec), 2),
    )
)

# Keep only days with a full 7-day window (row_number >= 7), ordered by visited_on

result_df = (
    result_df.withColumn(
        "row_number", row_number().over(Window.orderBy("visited_on"))
    )
    .filter(col("row_number") >= 7)
    .select("visited_on", "amount", "average_amount")
    .orderBy("visited_on")
)

# Show the result

result_df.display()

df.createOrReplaceTempView("Customer")

%sql
WITH CustomerGrouped AS (
    SELECT
        visited_on,
        SUM(amount) AS total_amount
    FROM Customer
    GROUP BY visited_on
),
MovingAverage AS (
    SELECT
        visited_on,
        total_amount,
        SUM(total_amount) OVER (
            ORDER BY visited_on
            ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS sum_amount_7d
    FROM CustomerGrouped
)
SELECT
    visited_on,
    sum_amount_7d AS amount,
    ROUND(sum_amount_7d / 7, 2) AS average_amount
FROM MovingAverage
WHERE DATEDIFF(
    visited_on,
    (SELECT MIN(visited_on) FROM CustomerGrouped)
) >= 6
ORDER BY visited_on;
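Both solutions use a row-based frame (the current row plus 6 preceding rows), which matches a 7-day window only because the problem guarantees at least one customer every day. The plain-Python sketch below illustrates what happens if that guarantee is broken. The helper names `rows_window_sum` and `range_window_sum` and the flat 100-per-day totals are hypothetical, made up for this illustration; they are not Spark APIs or data from the post.

```python
from datetime import date, timedelta


def rows_window_sum(daily, size=7):
    """Sum the current row and the (size - 1) rows before it, by position."""
    days = sorted(daily)
    return {
        d: sum(daily[x] for x in days[max(0, i - size + 1) : i + 1])
        for i, d in enumerate(days)
    }


def range_window_sum(daily, size=7):
    """Sum every row whose date falls in the calendar range [d - (size - 1) days, d]."""
    span = timedelta(days=size - 1)
    return {
        d: sum(v for x, v in daily.items() if d - span <= x <= d)
        for d in sorted(daily)
    }


# Hypothetical daily totals: 100 per day for 10 days, with 2019-01-05 missing
start = date(2019, 1, 1)
daily = {start + timedelta(days=i): 100 for i in range(10) if i != 4}
last = max(daily)  # 2019-01-10

print(rows_window_sum(daily)[last], range_window_sum(daily)[last])
# 700 600
```

With the gap, the row-based frame reaches back to 2019-01-03 to collect seven rows, while the calendar-based frame covers only the six rows that exist between 2019-01-04 and 2019-01-10. When no day is missing, as the problem statement promises, the two agree.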
Ganesh. R
+91-9030485102. Hyderabad, Telangana. rganesh0203@gmail.com

https://medium.com/@rganesh0203 https://rganesh203.github.io/Portfolio/
https://github.com/rganesh203. https://www.linkedin.com/in/r-ganesh-a86418155/

https://www.instagram.com/rg_data_talks/ https://topmate.io/ganesh_r0203
