Create A DataFrame

This document shows code to create a Spark DataFrame from Row objects, display the DataFrame, and write it out as a Parquet file. It creates a SparkSession as the entry point to Spark SQL, imports Row from pyspark.sql, defines a Row class for passenger records, builds two Row objects and collects them in a list, creates a DataFrame from that list, displays its contents, and coalesces the DataFrame to a single partition before writing it to a Parquet file called PassengerData.


# Put your code here

from pyspark.sql import SparkSession, Row

# Create (or reuse) a SparkSession as the entry point to Spark SQL
spark = SparkSession \
    .builder \
    .appName("Data Frame Passenger") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

# Define a Row class for passenger records and create two rows of sample data
passenger = Row("Name", "age", "source", "destination")
data1 = passenger("David", 22, "London", "Paris")
data2 = passenger("Steve", 22, "New York", "Sydney")

# Build a DataFrame from the list of Row objects and display its contents
passengerData = [data1, data2]
df = spark.createDataFrame(passengerData)
df.show()

# Don't Remove this line


# Coalesce to a single partition and write the DataFrame out as a Parquet file
df.coalesce(1).write.parquet("PassengerData")
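
# Not part of the original document: a minimal check, assuming the write above
# succeeded and the PassengerData directory is in the current working directory,
# that reads the Parquet output back and verifies its schema and rows.
df_check = spark.read.parquet("PassengerData")
df_check.printSchema()
df_check.show()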
