Advance Python Unit 4

This document provides an overview of advanced Python programming concepts, specifically focusing on the Pandas and SciPy libraries. It covers topics such as creating and manipulating Pandas Series and DataFrames, reading CSV and JSON files, data analysis techniques, data cleaning methods, and plotting data visualizations. Practical examples are included to illustrate the usage of various functions and methods within these libraries.

Uploaded by

devr07j

Unit -4 Advance Python Programming B.Sc. CA&IT Sem 6
Working with Pandas and SciPy
Working with Pandas – Introduction to Pandas, Pandas Series, Pandas
DataFrames, Reading CSV and JSON files, Analyzing data, Cleaning data,
Correlations, Plotting, Working with SciPy
------------------------------------------------------------------------------------------------
Introduction to Pandas :
What is Pandas? State its functions.

Pandas is a Python library used for working with data sets.

It has functions for analyzing, cleaning, exploring, and manipulating data.

Pandas Series :

A Pandas Series is like a column in a table.

It is a one-dimensional array holding data of any type.

Example:
import pandas as pd
data = [54, 65, 76,98,12]
myvar = pd.Series(data)
print(myvar)
output :

0 54
1 65
2 76
3 98
4 12
dtype: int64
• Label : A label can be used to access a specified value. If no labels are
given, the values are labeled with their index number (0, 1, ... n-1).
For example : To print the value at index 1
print(myvar[1])


• Use the index argument to create user-defined labels.

import pandas as pd
data = [54, 65, 76,98,12]
myvar = pd.Series(data,index=["a","b","c","d","e"])
print(myvar["b"])
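Once custom labels are set, a value can be reached either by its label or by its position. The following is a minimal sketch of that idea; the names by_label and by_position are illustrative only, and .loc / .iloc are the standard Pandas accessors rather than something these notes rely on elsewhere.

```python
import pandas as pd

data = [54, 65, 76, 98, 12]
myvar = pd.Series(data, index=["a", "b", "c", "d", "e"])

# .loc selects by label, .iloc selects by integer position.
by_label = myvar.loc["b"]
by_position = myvar.iloc[1]

print(by_label)     # 65
print(by_position)  # 65
```

Both lines print the same value because label "b" sits at position 1.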
-------------------------------------------------------------------

Creating Key/Value Objects as Series

----------------------------------------------------------------

You can also use a key/value object, like a dictionary, to create a Series.

Example :

import pandas as pd
data = {'Name':['Jay', 'Nick', 'Krishna', 'Jhon'],
'Age':[20, 21, 19, 18]}
myvar = pd.Series(data)
print(myvar)
output:
Name [Jay, Nick, Krishna, Jhon]
Age [20, 21, 19, 18]
dtype: object

Selecting particular items from Series

To select only some of the items of a dictionary, use the index argument and
list only the keys you want to include in the Series.

Example :

import pandas as pd
data= {"London": 4.20, "Paris": 3.80, "Rome": 3.90}
myvar = pd.Series(data, index = ["London", "Rome"])
print(myvar)

Example :
import pandas as pd
data= {1: "Jack", 2 :"Jhon" , 3 : "Ajay" }
myvar = pd.Series(data, index = [2,3])
print(myvar)
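A short sketch of how the index argument behaves when selecting from a dictionary. The label "Berlin" is a made-up key used only to show what happens when a requested label is not present in the data.

```python
import pandas as pd

data = {"London": 4.20, "Paris": 3.80, "Rome": 3.90}

# Only the keys listed in index are kept, in the order given.
selected = pd.Series(data, index=["Rome", "London"])
print(selected)

# A label that is not a key of the dictionary ("Berlin" is hypothetical)
# does not raise an error; its value becomes NaN.
with_missing = pd.Series(data, index=["London", "Berlin"])
print(with_missing)
```

The second Series is a reminder to check spelling of the keys, since a mistyped label silently produces an empty (NaN) value.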

-----------------------------------------------------------------------------------------------
Reading CSV Files

What is a CSV file?

A simple way to store big data sets is to use CSV files (comma-separated
values files).

CSV files contain plain text and are a well-known format that can be read by
everyone, including Pandas.

Load data from a CSV file into a DataFrame : use to_string() to print the entire DataFrame

import pandas as pd
df = pd.read_csv('data.csv')
print(df.to_string())

Load data from a CSV file into a DataFrame : print without to_string(). For a large DataFrame, Pandas shortens the printed output and shows only the first and last rows.

import pandas as pd
df = pd.read_csv('data.csv')
print(df)
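The examples above need a data.csv file on disk. The sketch below shows the same read_csv() call on an in-memory text, so it can be run without any file. The three rows are hypothetical sample values, not the contents of the real data.csv.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for data.csv.
csv_text = """Duration,Pulse,Maxpulse,Calories
60,110,130,409.1
60,117,145,479.0
45,109,175,282.4
"""

# read_csv() accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)              # default view; long DataFrames are shortened
print(df.to_string())  # always renders every row and column
```

The first line of the text becomes the column names, and each following line becomes one row of the DataFrame.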

Reading JSON File :

What is a JSON file?
JSON stands for JavaScript Object Notation. A JSON file is a plain-text file
(it is not an executable script) that uses a syntax derived from JavaScript
objects to store and transfer data.

To work with JSON, we import the json package in the Python script. JSON text
is written as key-value mappings within { }, where each key is a
double-quoted string.
Deserialization of JSON
Deserialization of JSON means the conversion of JSON data into the
respective Python objects. The load() method (for a file object) and the
loads() method (for a string) are used for it.

The following table shows each JSON object and its equivalent Python
object :

JSON OBJECT      PYTHON OBJECT
-----------      -------------
object           dict
array            list
string           str
null             None
number (int)     int
number (real)    float
true             True
false            False
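The mapping in the table can be checked with json.loads(), which parses a JSON string instead of a file. This is a small sketch; the JSON text and its keys are invented for illustration so that every row of the table appears once.

```python
import json

# Hypothetical JSON text that contains every JSON type from the table.
text = (
    '{"name": "Rajesh", "tags": ["a", "b"], "age": 30, '
    '"score": 7.5, "active": true, "left": false, "manager": null}'
)

obj = json.loads(text)

# Print each key together with the Python type it was converted to.
for key, value in obj.items():
    print(key, type(value).__name__)
```

The top-level JSON object becomes a dict, and each value inside it is converted according to the table.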

json.load(): json.load() accepts a file object, parses the JSON data, and
returns the equivalent Python object (a dictionary when the top level of the
file is a JSON object).
Syntax:
json.load(file object)
Example : Read and print the content of data.json file
import json
# data.json is expected to hold an "employee" list at the top level
with open('data.json') as f:
    data = json.load(f)
for i in data['employee']:
    print(i)
Example : Writing the content to data1.json file
import json
d = {
    "id": "04",
    "name": "Rajesh",
    "department": "Production"
}
# dumps() converts the dictionary to a JSON-formatted string
json_object = json.dumps(d, indent=4)
with open("data1.json", "w") as f:
    f.write(json_object)
Appending data to json object
import json
def write_json(new_data, filename='data.json'):
    with open(filename, 'r+') as f:
        # load the existing data into a dictionary
        file_data = json.load(f)
        # add the new record to the "employee" list
        file_data["employee"].append(new_data)
        # move back to the start of the file before rewriting it
        f.seek(0)
        json.dump(file_data, f, indent=4)

d = {
    "id": "05",
    "name": "Deep",
    "department": "Marketing"}

write_json(d)
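The append example assumes that data.json already exists and holds an "employee" list. The sketch below makes that assumption explicit by first creating a small temporary file with that layout. The function name append_employee, the temporary path, and the call to truncate() are additions for illustration and are not part of the notes.

```python
import json
import os
import tempfile

def append_employee(new_data, filename):
    """Append new_data to the "employee" list stored in a JSON file."""
    with open(filename, "r+") as f:
        file_data = json.load(f)
        file_data["employee"].append(new_data)
        f.seek(0)       # go back to the start before rewriting
        json.dump(file_data, f, indent=4)
        f.truncate()    # discard leftover bytes if the new text is shorter

# Build a throwaway file so the sketch does not touch a real data.json.
path = os.path.join(tempfile.mkdtemp(), "employees.json")
with open(path, "w") as f:
    json.dump({"employee": [{"id": "04", "name": "Rajesh",
                             "department": "Production"}]}, f)

append_employee({"id": "05", "name": "Deep", "department": "Marketing"}, path)

with open(path) as f:
    result = json.load(f)
print(len(result["employee"]))  # 2
```

Reading the file back with json.load() is a simple way to confirm that the file is still valid JSON after the append.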
Analyzing data – head(), tail(),info()

head() – prints rows from top of the DataFrame.

By default head() prints 5 rows from top.

to print top 5 rows from DataFrame: (By default)

import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())

to print top 10 rows from DataFrame:

import pandas as pd
df = pd.read_csv('data.csv')
print(df.head(10))

tail() – prints rows from bottom of the DataFrame.

By default tail() prints 5 rows from bottom.

to print last 5 rows from DataFrame: (By default)

import pandas as pd
df = pd.read_csv('data.csv')
print(df.tail())

to print last 3 rows from DataFrame:

import pandas as pd
df = pd.read_csv('data.csv')
print(df.tail(3))

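to print information about the DataFrame:

df.info()

info() prints the summary itself and returns None, so it does not need to be wrapped in print().

output (excerpt):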
0 Duration 169 non-null int64
1 Pulse 169 non-null int64
2 Maxpulse 169 non-null int64
3 Calories 164 non-null float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB
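The three methods can be tried without data.csv on a small DataFrame built in memory. This is only a sketch; the 12 rows of numbers are hypothetical values chosen so that the selected rows are easy to recognise.

```python
import pandas as pd

# Hypothetical data: 12 rows with index 0..11.
df = pd.DataFrame({"Duration": range(12), "Pulse": range(100, 112)})

top = df.head()        # first 5 rows (default)
top_three = df.head(3) # first 3 rows
bottom = df.tail(3)    # last 3 rows

print(top)
print(bottom)
df.info()              # prints column names, non-null counts and dtypes
```

head() and tail() return new DataFrames, so their results can be stored in variables and used further.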
------------------------------------------------------------------------------------------------
Data Cleaning

Data cleaning means fixing bad data in your data set.

Bad data could be:

1. Empty cells
2. Data in wrong format
3. Wrong data
4. Duplicates

1. Cleaning Empty Cells

Empty cells can potentially give you a wrong result when you analyze
data.

For Example in data.csv file : The data set contains some empty cells
("Date" in row 22, and "Calories" in row 18 and 28).

Different ways to deal with empty cells:
i) Remove rows containing empty cells. dropna()
ii) Replace empty cells with a value in the whole DataFrame. fillna()
iii) Replace empty cells in a specific column only. fillna()
iv) Replace empty cells with the mean, median or mode of the
column.

i) Remove Rows containing empty cells - dropna()

One way to deal with empty cells is to remove rows that contain empty
cells.

If data sets are very big, removing a few rows will not have a big impact on
the result.

By default, dropna() returns a new DataFrame and does not change the original:
new_df = df.dropna()

If you want to make the modifications in the original DataFrame, use inplace = True:
df.dropna(inplace = True)
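The difference between the two forms of dropna() can be seen on a tiny DataFrame. This is a minimal sketch with made-up values, where one "Calories" cell is left empty on purpose.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one empty cell (NaN) in row 1.
df = pd.DataFrame({"Duration": [60, 45, 30],
                   "Calories": [409.1, np.nan, 195.1]})

# Without inplace, the original df keeps all 3 rows.
new_df = df.dropna()
rows_before = len(df)
print(rows_before, len(new_df))  # 3 2

# With inplace = True, df itself loses the row.
df.dropna(inplace=True)
print(len(df))                   # 2
```

Note that the remaining rows keep their original index labels (0 and 2).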

ii) Replace Empty Values

Empty cells can be replaced with a new value in the whole DataFrame.

So you do not have to delete entire rows just because of some empty cells.

The fillna() method allows us to replace empty cells with a value. It returns
a new DataFrame unless inplace = True is given:

df.fillna(130, inplace = True)

Replace Only For Specified Columns

To replace empty values in only one column, specify the column name for
the DataFrame and assign the result back to that column:

df["Calories"] = df["Calories"].fillna(130)
Replace Using Mean, Median, or Mode
Pandas uses the mean(), median() and mode() methods to calculate the
respective values for a specified column. The calculated value x can then be
passed to fillna():
x = df["Calories"].mean()

x = df["Calories"].median()

x = df["Calories"].mode()[0]

df["Calories"] = df["Calories"].fillna(x)
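A worked sketch of filling empty cells with a calculated value. The four "Calories" values are hypothetical and chosen so the arithmetic is easy to follow: the mean of 100, 200 and 300 is 200.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one empty cell in row 2.
df = pd.DataFrame({"Calories": [100.0, 200.0, np.nan, 300.0]})

# Empty cells are skipped when the statistics are calculated.
mean_value = df["Calories"].mean()      # (100 + 200 + 300) / 3 = 200.0
median_value = df["Calories"].median()  # middle value = 200.0
mode_value = df["Calories"].mode()[0]   # all values occur once, so the smallest: 100.0

# Fill the empty cell with the mean and store the result back.
df["Calories"] = df["Calories"].fillna(mean_value)
print(df)
```

mode() returns a Series because several values can be equally frequent, which is why [0] is used to take the first one.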

2) Cleaning Data of Wrong Format

Cells with data of wrong format can make it difficult, or even impossible, to
analyze data.

To solve the problem :

1. Remove the rows.

2. Convert the cells into the same format.

1. Remove the rows that have an empty value in the "Date" column :

df.dropna(subset=['Date'], inplace = True)

2. Convert all cells in the column into the same format :

df['Date'] = pd.to_datetime(df['Date'])
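The two steps can also be combined: convert first, then remove the rows that could not be converted. The sketch below assumes dates written as year/month/day and uses the errors="coerce" option of to_datetime(), which turns unreadable values into NaT (Not a Time) instead of raising an error. The sample values and the chosen format string are assumptions for illustration.

```python
import pandas as pd

# Hypothetical data: one valid date, one unreadable text, one empty cell.
df = pd.DataFrame({"Date": ["2020/12/01", "not a date", None],
                   "Duration": [60, 45, 30]})

# Values that do not match the format become NaT.
df["Date"] = pd.to_datetime(df["Date"], format="%Y/%m/%d", errors="coerce")

# NaT counts as an empty cell, so dropna() removes those rows.
df = df.dropna(subset=["Date"])
print(df)
```

Only the first row survives, and its "Date" value is now a real date object rather than a text.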

3 ) Fixing Wrong Data

"Wrong data" does not have to be "empty cells" or "wrong format", it can just
be wrong, like if someone registered "100" instead of "1.00".

Sometimes you can spot wrong data by looking at the data set, because you
have an expectation of what it should be.

If you take a look at our data set, you can see that in row 7, the duration is 450,
but for all the other rows the duration is between 30 and 60.

Ways to fix wrong data :

1. Replacing Values
2. Remove the rows containing wrong data

1. Replace values.

In our example, it is most likely a typo, and the value should be "45" instead of
"450", and we could just insert "45" in row 7:

df.loc[7,'Duration'] = 45

For small data sets you might be able to replace the wrong data one by one, but
not for big data sets.

To replace wrong data for larger data sets you can create some rules, e.g. set
some boundaries for legal values, and replace any values that are outside of the
boundaries.

import pandas as pd
df = pd.read_csv('data.csv')

# set every Duration above 120 to 120
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
print(df.to_string())

2. Removing Rows

Another way of handling wrong data is to remove the rows that contain wrong
data.

# drop every row whose Duration is above 120
for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.drop(x, inplace = True)
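The same boundary rule of 120 can be applied to a whole column at once, without a loop. This is an alternative sketch, not the method used in the notes: clip() limits the values, and a condition inside df[...] keeps only the matching rows. The sample durations are hypothetical.

```python
import pandas as pd

# Hypothetical data with two values above the boundary of 120.
df = pd.DataFrame({"Duration": [60, 450, 45, 130]})

# Option 1: replace every value above 120 with 120.
capped = df.copy()
capped["Duration"] = capped["Duration"].clip(upper=120)

# Option 2: keep only the rows whose value is within the boundary.
filtered = df[df["Duration"] <= 120]

print(capped)
print(filtered)
```

For big data sets these column-wise operations are usually shorter to write and faster than visiting each row in a loop.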
4 ) Removing Duplicates

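• The data set contains duplicates (row 11 and 12). The duplicated() method
returns True for every row that is a duplicate of an earlier row, otherwise False.

print(df.duplicated())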

Removing duplicates :

import pandas as pd
df = pd.read_csv('data.csv')
df.drop_duplicates(inplace = True)
print(df.to_string())
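A small sketch of finding and removing duplicates on data built in memory. The three rows are hypothetical; row 1 is an exact copy of row 0.

```python
import pandas as pd

# Hypothetical data: row 1 repeats row 0.
df = pd.DataFrame({"Duration": [60, 60, 45],
                   "Pulse": [110, 110, 109]})

# True marks a row that repeats an earlier row.
flags = df.duplicated()
print(flags)

# Without inplace, drop_duplicates() returns a new DataFrame.
deduped = df.drop_duplicates()
print(deduped)
```

The first occurrence of a repeated row is kept, and only the later copies are removed.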

Correlations :
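The corr() method calculates the relationship (correlation) between each pair
of columns in your data set. Only numeric columns can be correlated.

df.corr()

The result is a table of numbers, each of which varies from -1 to 1.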

1 means that there is a 1 to 1 relationship (a perfect correlation): each time
a value goes up in the first column, the value in the other column goes up as
well.

0.9 is also a good relationship: if you increase one value, the other will
probably increase as well.

-0.9 is just as good a relationship as 0.9, but if you increase one value, the
other will probably go down.

0.2 means NOT a good relationship: if one value goes up, it does not
mean that the other will.
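The two extreme values, 1 and -1, can be reproduced with columns that depend exactly on each other. The sketch below uses invented data: "Calories" is always 5 times "Duration", and "Remaining" is always 100 minus "Duration". All column names and numbers are assumptions for illustration, and every column is numeric so that corr() can use all of them.

```python
import pandas as pd

df = pd.DataFrame({
    "Duration": [30, 45, 60, 90],
    "Calories": [150, 225, 300, 450],  # always 5 * Duration
    "Remaining": [70, 55, 40, 10],     # always 100 - Duration
})

corr = df.corr()
print(corr.round(2))

# Read single values from the correlation table by row and column name.
up = corr.loc["Duration", "Calories"]     # rises together: close to 1
down = corr.loc["Duration", "Remaining"]  # one rises, one falls: close to -1
```

Each column compared with itself always gives 1, which is why the diagonal of the printed table contains only 1.0.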

Plotting :
----------------------------------------------------------------------------------------------
plot()

Pandas uses the plot() method to create diagrams.

We can use Pyplot, a submodule of the Matplotlib library, to visualize the
diagram on the screen.

Example: plotting graph using data from data.csv file

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot()
plt.show()

----------------------------------------------------------------------------------------------

Scatter Plot :

To display a scatter plot use argument kind = 'scatter' .

A scatter plot needs an x- and a y-axis.

Example: generating scatter chart from data.csv file

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('data.csv')
df.plot(kind = 'scatter', x = 'Duration', y = 'Pulse')
plt.show()
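plot() returns a Matplotlib Axes object, which can be used to add a title or check the axis labels. The sketch below uses hypothetical in-memory data instead of data.csv and selects the non-interactive "Agg" backend so that it also runs where no screen is available; both choices are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # draw without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for data.csv.
df = pd.DataFrame({"Duration": [30, 45, 60],
                   "Pulse": [100, 110, 120]})

# The x and y column names become the axis labels.
ax = df.plot(kind="scatter", x="Duration", y="Pulse")
ax.set_title("Pulse vs Duration")

print(ax.get_xlabel(), ax.get_ylabel())
```

With an interactive backend, calling plt.show() after these lines displays the same chart on the screen.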
----------------------------------------------------------------------------------------------
Examples for practical reference only

Example 1 : Return a new Data Frame with no empty cells:

import pandas as pd
df = pd.read_csv('data.csv')
new_df = df.dropna()
print(new_df.to_string())

Example 2 : Fill empty cells with value 130.

import pandas as pd
df = pd.read_csv('data.csv')
df.fillna(130, inplace = True)

Example 3 : Remove the rows with an empty date :

import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.dropna(subset=['Date'], inplace = True)
print(df.to_string())

Example 4 : convert all cells in the columns into the same format :

import pandas as pd
df = pd.read_csv('data.csv')
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())

Example 5: Fixing wrong data with correct value

import pandas as pd
df = pd.read_csv('data.csv')
df.loc[7,'Duration'] = 45
print(df.to_string())

Example 6: Fixing wrong data with correct value considering the boundary
ranges
import pandas as pd
df = pd.read_csv('data.csv')

for x in df.index:
    if df.loc[x, "Duration"] > 120:
        df.loc[x, "Duration"] = 120
print(df.to_string())

Example 7 : Find duplicates

import pandas as pd
df = pd.read_csv('data.csv')
print(df.duplicated())
Example 8 : Remove duplicates
import pandas as pd
df = pd.read_csv('data.csv')
df.drop_duplicates(inplace = True)
print(df.to_string())

pie() - draw pie chart

import matplotlib.pyplot as plt

cars = ['AUDI', 'BMW', 'FORD',
'TESLA', 'JAGUAR', 'MERCEDES']
data = [23, 17, 35, 29, 12, 41]
fig = plt.figure(figsize=(10, 7))
plt.pie(data, labels=cars)
plt.show()
------------------------------------------
bar chart
-------------------------------------------
import pandas as pd
import matplotlib.pyplot as plt
data = {"City": ["London", "Paris", "Rome"],
        "Tourist": [22.42, 18.95, 10.7]}
df = pd.DataFrame(data=data)
# rot=50 rotates the x-axis labels by 50 degrees
df.plot.bar(x="City", y="Tourist", rot=50, title="Number of tourist visits - Year 2022")
plt.show()
----------------------------------------------------
import pandas as pd
import matplotlib.pyplot as plt
plotdata = pd.DataFrame({
    "Maruti": [40, 25, 10, 26, 36],
    "Kia": [6, 8, 10, 12, 15],
    "Honda": [30, 27, 42, 17, 37]
    },
    index=["2018", "2019", "2020", "2021", "2022"]
)
plotdata.plot(kind="bar")
plt.title("Sales of Car")
plt.xlabel("Year")
plt.ylabel("Sales in Lakhs")
plt.show()

Short Questions:

1. What is Pandas? State its functions.

2. What is Pandas Series ? Give one example.
3. Explain Deserialization of JSON.
4. Explain json.load() function
5. Explain head() function with example.
6. Explain tail() function with example.
7. Explain how to remove duplicates from csv file.
8. Explain Data Cleaning in brief.

Long Questions :
1. Explain correlations in detail
2. Explain Creating Key/Value Objects as Series and selecting
particular items from a Series.
3. Explain the plot() method and scatter plots.
4. Explain how to fix wrong data in .csv file.
5. Explain how to clean empty cells in .csv file.
6. Explain how to clean data of wrong format in .csv file.
