DataFrame Operations Using A Json File

This Python code uses Spark SQL to read employee data from a JSON file (emp.json) into a DataFrame, display it, and write it with coalesce(1) as a single-part Parquet output named "Employees". It then reads that Parquet data back, keeps only the rows where the stream column equals "JAVA", displays the result, and writes the filtered DataFrame to a second Parquet output named "JavaEmployees".

Uploaded by

Arpita Das

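# Put your code here

from pyspark.sql import SparkSession

# Create (or reuse) the Spark session for this job.
spark = (
    SparkSession.builder
    .appName("Data Frame EMPLOYEE")
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)

# Read the employee records from JSON and display them.
df = spark.read.json("emp.json")
df.show()

# Write the full DataFrame to Parquet as a single part file.
df.coalesce(1).write.parquet("Employees")

# Read the Parquet data back and keep only the rows whose stream is "JAVA".
pf = spark.read.parquet("Employees")
dfNew = pf.filter(pf.stream == 'JAVA')
dfNew.show()

# Write the filtered rows to a second Parquet directory.
dfNew.coalesce(1).write.parquet("JavaEmployees")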
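
The script does not show what emp.json looks like. By default, spark.read.json expects one JSON object per line (the JSON Lines layout). The sketch below uses only the Python standard library to write a small sample file in that layout and to preview which rows the "JAVA" filter should keep. Only the stream field comes from the code above. The id and name fields, the sample values, and the helper names write_json_lines, read_json_lines, and filter_stream are assumptions made for illustration.

```python
import json
import os
import tempfile

# Hypothetical sample records: only "stream" is named in the original
# script; "id" and "name" are made up for illustration.
employees = [
    {"id": 1, "name": "Asha", "stream": "JAVA"},
    {"id": 2, "name": "Ravi", "stream": "PYTHON"},
    {"id": 3, "name": "Meena", "stream": "JAVA"},
]


def write_json_lines(path, records):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_json_lines(path):
    """Read a JSON Lines file back into a list of dicts."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def filter_stream(records, stream):
    """Plain-Python counterpart of pf.filter(pf.stream == stream)."""
    return [r for r in records if r.get("stream") == stream]


# Write the sample to a temporary directory so no existing file is touched.
sample_path = os.path.join(tempfile.mkdtemp(), "emp.json")
write_json_lines(sample_path, employees)

java_rows = filter_stream(read_json_lines(sample_path), "JAVA")
print(java_rows)
```

If the real emp.json is instead a single pretty-printed JSON document spanning several lines, the reader would likely need the multiLine option, for example spark.read.option("multiLine", True).json("emp.json").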

