CO3_3_Indexing and Sorting, Loading Data From CSV

Uploaded by

nadimpalligeethikasaiswetha


Department of CSE

COURSE NAME: DATA ANALYTICS AND VISUALIZATION

COURSE CODE: 22CS2227R
Topic: Loading data in CSV, Indexing & Sorting

Session - 11

AIM OF THE SESSION

To familiarize students with loading CSV files and with indexing and sorting data and lists.

INSTRUCTIONAL OBJECTIVES

This session is designed to help students understand the importance of indexing and its real-time applications in sorting lists and DataFrames.

LEARNING OUTCOMES

At the end of this session, you should be able to:

• Sort a pandas DataFrame by the values of one or more columns.
• Sort a DataFrame by its index using .sort_index().
• Organize missing data while sorting values.
• Sort a DataFrame in place using inplace set to True.
• Sort a pandas DataFrame.

Session Content

• Loading data from CSV files
• Use the ascending parameter to change the sort order
• Sort a DataFrame by its index using .sort_index()
• Organize missing data while sorting values
• Sort a DataFrame in place using inplace set to True
• 'Lists' in Python
• Inferential statistics

Preparing the Dataset
This session uses fuel economy data compiled by the US Environmental Protection Agency (EPA) on
vehicles made between 1984 and 2021.

The EPA fuel economy dataset is great because it has many different types of information
that you can sort on, from textual to numeric data types.

The dataset contains eighty-three columns in total.

For analysis purposes, you’ll be looking at MPG (miles per gallon) data on vehicles by
make, model, year, and other vehicle attributes.

Python Code

By calling .read_csv() with the dataset URL, you're able to load the data into a DataFrame.
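
The loading step can be sketched as follows. This is a minimal illustration, not the session's original code: the rows are made-up sample data, and the in-memory text stands in for the real EPA file so that the example runs without a network connection. Only the column names (make, model, year, city08, highway08) come from the slides.

```python
import io

import pandas as pd

# Hypothetical stand-in for the EPA file: a few invented rows that reuse
# column names mentioned in the slides.
csv_text = """make,model,year,city08,highway08
Toyota,Corolla,2020,30,38
Ford,F150,2019,17,23
Honda,Civic,2021,31,40
"""

# pd.read_csv() accepts a file path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # number of rows and columns
print(df.head())  # first few rows
```

With real data, the io.StringIO(...) argument would be replaced by the path or URL of the CSV file.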

Getting Familiar With .sort_values()
We use .sort_values() to sort values in a DataFrame along either axis (columns or rows).

For example, you can use .sort_values() to sort the DataFrame's
rows based on the values in the highway08 column.
Getting Familiar With .sort_index()
You use .sort_index() to sort a DataFrame by its row index or column labels.
The difference from using .sort_values() is that you’re sorting the DataFrame
based on its row index or column names, not by the values in these rows or
columns.
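
The difference between the two methods can be seen on a small example. The sketch below uses an invented three-row DataFrame with a deliberately shuffled index; the data and variable names are assumptions for illustration only.

```python
import pandas as pd

# Made-up data; the index labels are deliberately out of order.
df = pd.DataFrame(
    {"make": ["Toyota", "Ford", "Honda"], "city08": [30, 17, 31]},
    index=[2, 0, 1],
)

by_value = df.sort_values("city08")  # rows ordered by the values in city08
by_label = df.sort_index()           # rows ordered by the index labels
by_column = df.sort_index(axis=1)    # columns ordered by their names

print(by_value)
print(by_label)
print(by_column)
```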

Sorting Your DataFrame on a Single Column
To sort the DataFrame based on the values in a single column, you'll use .sort_values().
By default, this will return a new DataFrame sorted in ascending order.
It does not modify the original DataFrame.

Sorting by a Column in Ascending Order
• To use .sort_values(), you pass a single argument to the method containing
the name of the column you want to sort by. In this example, you sort the
DataFrame by the city08 column, which represents city MPG for fuel-only
cars.
• This sorts your DataFrame using the column values from city08, showing the
vehicles with the lowest MPG first.
• By default, .sort_values() sorts your data in ascending order.
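
A minimal sketch of a single-column sort, assuming a small made-up DataFrame in place of the EPA data (only the column name city08 comes from the slides):

```python
import pandas as pd

# Invented sample rows standing in for the fuel economy dataset.
df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Honda", "Mazda"],
        "city08": [30, 17, 31, 22],
    }
)

# Returns a new DataFrame ordered from lowest to highest city MPG.
sorted_df = df.sort_values("city08")

print(sorted_df)
print(df)  # the original keeps its row order
```

Because .sort_values() returns a new object, the result has to be assigned to a name if it is needed later.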

Changing the Sort Order
Another parameter of .sort_values() is ascending.
By default .sort_values() has ascending set to True.
If you want the DataFrame sorted in descending order, then you can pass False to
this parameter.
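
As an illustration, the sketch below reverses the sort order on an invented four-row DataFrame (the data is hypothetical; city08 is the column name used in the slides):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Honda", "Mazda"],
        "city08": [30, 17, 31, 22],
    }
)

# ascending=False puts the highest city MPG first.
descending = df.sort_values("city08", ascending=False)

print(descending)
```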

Choosing a Sorting Algorithm
Pandas allows you to choose different sorting algorithms to use with
both .sort_values() and .sort_index().
The available algorithms are quicksort, mergesort, and heapsort.

Using kind, you set the sorting algorithm to mergesort.
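
A short sketch of choosing the algorithm, using made-up data with tied values. Mergesort is a stable sort, so rows with equal keys keep their original relative order; the example is built to make that visible. In pandas, kind is only applied when sorting on a single column or label.

```python
import pandas as pd

# Invented rows; city08 contains ties on purpose.
df = pd.DataFrame(
    {
        "model": ["a", "b", "c", "d"],
        "city08": [20, 15, 20, 15],
    }
)

# Select the sorting algorithm with the kind parameter.
stable_sorted = df.sort_values("city08", kind="mergesort")

print(stable_sorted)
```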

Sorting Your DataFrame on Multiple Columns
In data analysis, it’s common to want to sort your data based on the values of
multiple columns.
Imagine you have a dataset with people’s first and last names. It would make
sense to sort by last name and then first name, so that people with the same last
name are arranged alphabetically according to their first names.

Sorting Your DataFrame on Multiple Columns
In addition to the MPG in city conditions,
you may also want to look at MPG for
highway conditions. To sort by two keys,
you can pass a list of column names to by:
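
A minimal sketch of a two-key sort, assuming invented values for the city08 and highway08 columns named in the slides:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "city08": [20, 15, 20, 15],
        "highway08": [30, 25, 28, 22],
    }
)

# Rows are ordered by city08 first; highway08 breaks ties.
two_keys = df.sort_values(by=["city08", "highway08"])

print(two_keys)
```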

Sorting by Multiple Columns in Ascending Order
To sort the DataFrame on multiple columns, you must provide a list of
column names. For example, to sort by make and model, you should
create the following list and then pass it to .sort_values():
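
For instance, with a few made-up vehicles (the rows are hypothetical; make and model are the column names from the slides), the list of sort keys could be built and passed like this:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Toyota", "Ford"],
        "model": ["Corolla", "Mustang", "Camry", "F150"],
    }
)

# The order of the list sets the priority: make first, then model.
sort_columns = ["make", "model"]
by_make_model = df.sort_values(by=sort_columns)

print(by_make_model)
```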

Sorting by Multiple Columns in Descending Order
You can also sort in descending order based on the make and model columns.
To sort in descending order, set ascending to False:
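
A sketch of the descending multi-column sort on invented data. The second call is an addition beyond the slide: ascending also accepts a list with one boolean per sort column, which allows mixed directions.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Toyota", "Ford"],
        "model": ["Corolla", "Mustang", "Camry", "F150"],
    }
)

# Both columns in descending order.
descending = df.sort_values(by=["make", "model"], ascending=False)

# make ascending, model descending (one flag per column).
mixed = df.sort_values(by=["make", "model"], ascending=[True, False])

print(descending)
print(mixed)
```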

Sorting Your DataFrame on Its Index
Before sorting on the index, it’s a good idea to know what an index represents.
A DataFrame has an .index property, which by default is a numerical
representation of its rows’ locations.
You can think of the index as the row numbers. It helps in quick row lookup and
identification.
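
The sketch below, on a made-up DataFrame, shows the default index and how .sort_index() puts rows back in index order after a sort by values has shuffled them:

```python
import pandas as pd

df = pd.DataFrame(
    {"make": ["Toyota", "Ford", "Honda"], "city08": [30, 17, 31]}
)

print(df.index)  # default index: row positions 0, 1, 2

# Sorting by values reorders the rows; each row keeps its index label.
shuffled = df.sort_values("city08")
print(list(shuffled.index))

# Sorting by index restores the original row order.
restored = shuffled.sort_index()
print(restored)
```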


Sorting by Index in Descending Order

Now we will sort your DataFrame by its index in descending order.

Remember from sorting your DataFrame with .sort_values() that you can reverse
the sort order by setting ascending to False.
This parameter also works with .sort_index(), so you can sort your DataFrame in
reverse order like this:
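
A minimal sketch, assuming a small invented DataFrame with the default index:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "make": ["Toyota", "Ford", "Honda", "Mazda"],
        "city08": [30, 17, 31, 22],
    }
)

# Highest index label first.
reversed_df = df.sort_index(ascending=False)

print(reversed_df)
```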

Sorting by Index in Descending Order

Now your DataFrame is sorted by its index in descending order.

One difference between using .sort_index() and .sort_values() is
that .sort_index() has no by parameter since it sorts a DataFrame on the row
index by default.
Merge/Join Datasets
• Joining and merging DataFrames is a core step when starting data analysis
and machine learning tasks.
• It is a skill every data analyst or data scientist should master, because in
almost all cases data comes from multiple sources and files.
• You may need to bring all the data into one place with some join logic and
then start your analysis.
• Pandas provides various facilities for easily combining different datasets.

Understanding the different types of merge
• We can merge two DataFrames in pandas by using the merge() function.
• The different arguments to merge() allow you to perform a natural join, left
join, right join, and full outer join in pandas.
• Before you perform join operations, let's first load the two CSV files and
convert them into DataFrames df1 and df2.
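
Loading the two files could look like the sketch below. The slides do not show the files, so the contents here are invented: the column names id, name, and city and all the rows are assumptions, and in-memory text stands in for files on disk.

```python
import io

import pandas as pd

# Hypothetical contents of the two CSV files.
left_csv = """id,name
1,Asha
2,Ravi
3,Meena
"""
right_csv = """id,city
2,Chennai
3,Pune
4,Delhi
"""

df1 = pd.read_csv(io.StringIO(left_csv))
df2 = pd.read_csv(io.StringIO(right_csv))

print(df1)
print(df2)
```

With real files, each io.StringIO(...) argument would be replaced by the file's path.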

Natural join
• Natural join keeps only the rows that match in both DataFrames (df1 and df2);
specify the argument how='inner'.
Syntax: pd.merge(df1, df2, on='column', how='inner')
• Returns only the rows in which the left table has matching keys in the right
table.
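
A small sketch of the inner join, using invented DataFrames whose key column is hypothetically named id:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Chennai", "Pune", "Delhi"]})

# Only ids present in both DataFrames survive.
inner = pd.merge(df1, df2, on="id", how="inner")

print(inner)
```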

Full outer join
• Full outer join keeps all rows from both DataFrames; specify how='outer'.
Syntax: pd.merge(df1, df2, on='column', how='outer')
• Returns all rows from both tables, joining records from the left that have
matching keys in the right table.
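
A small sketch of the full outer join on invented data (the key column name id is an assumption). Where one side has no matching row, pandas fills that side's columns with NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Chennai", "Pune", "Delhi"]})

# Every id from either DataFrame appears in the result.
outer = pd.merge(df1, df2, on="id", how="outer")

print(outer)
```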

Left outer join
• Left outer join includes all the rows of your DataFrame df1 and only those
from df2 that match; specify how='left'.
Syntax: pd.merge(df1, df2, on='column', how='left')
• Returns all rows from the left table, and any rows with matching keys from the
right table.
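
A small sketch of the left outer join on invented data (the key column name id is an assumption). Note that the how value is lowercase.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Chennai", "Pune", "Delhi"]})

# Every row of df1 is kept; city is NaN where df2 has no matching id.
left = pd.merge(df1, df2, on="id", how="left")

print(left)
```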

Right outer join
• Right outer join includes all the rows of df2 and only those from df1 that
match; specify how='right'.
Syntax: pd.merge(df1, df2, on='column', how='right')
• Returns all rows from the right table, and any rows with matching keys from
the left table.
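
A small sketch of the right outer join on invented data (the key column name id is an assumption):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["Asha", "Ravi", "Meena"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "city": ["Chennai", "Pune", "Delhi"]})

# Every row of df2 is kept; name is NaN where df1 has no matching id.
right = pd.merge(df1, df2, on="id", how="right")

print(right)
```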

SELF-ASSESSMENT QUESTIONS

1. What is the syntax to create a Pandas Series from a Python list? ___________________

2. What is a correct syntax to return the first value of a Pandas Series? _________________

3. What is a correct syntax to add the labels "x", "y", and "z" to a Pandas Series? __________________

4. What is a correct syntax to create a Pandas DataFrame? _______________

5. What is a correct syntax to return the first row in a Pandas DataFrame? ____________________

TERMINAL QUESTIONS

1) Critique the pandas groupby function, with examples.

2) Deduce the steps to write CSV files to pandas.

3) Defend your steps on a delimited text file that uses a comma to separate the values.

4) Compare and contrast the different types of joins.

REFERENCES FOR FURTHER LEARNING OF THE SESSION

Reference Books:
1) Biological Data Exploration with Python, pandas and seaborn by Martin Jones. June 2020.
(https://pythonforbiologists.com/biological-data-exploration-book) ISBN-13: 979-8612757238.
2) Hands-On Machine Learning with Scikit-Learn & TensorFlow by Aurélien Géron. March
2017. Publisher: O'Reilly Media, Inc. ISBN: 9781491962299.
3) Python Crash Course: A Hands-On, Project-Based Introduction to Programming (2nd
Edition).
Sites and Web links:
1. https://mu.ac.in/wp-content/uploads/2022/10/Big-Data-Analytics-and-Visualization.pdf
THANK YOU

Team – DAV
