CO3_3_Indexing and Sorting, Loading Data From CSV
Session - 11
AIM OF THE SESSION
To familiarize students with loading CSV files and with indexing and sorting data and lists.
INSTRUCTIONAL OBJECTIVES
This Session is designed to: understand importance of Indexing – its real time
LEARNING OUTCOMES
At the end of this session, you should be able to: Understand the
‘Lists’ in Python
Inferential statistics
Preparing the Dataset
The dataset is fuel economy data compiled by the US Environmental Protection Agency (EPA) on
vehicles made between 1984 and 2021.
The EPA fuel economy dataset is great because it has many different types of information
that you can sort on, from textual to numeric data types.
For analysis purposes, you’ll be looking at MPG (miles per gallon) data on vehicles by
make, model, year, and other vehicle attributes.
Python Code
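As a minimal sketch of the loading step, the snippet below reads CSV data into a DataFrame with pd.read_csv(). The rows are hypothetical stand-ins for the EPA file (the values are made up for illustration), and the column subset is an assumption based on the columns named in this session.

```python
import io
import pandas as pd

# Hypothetical sample rows standing in for the EPA file; values are made up.
csv_text = """make,model,year,city08,highway08
Toyota,Corolla,1995,23,30
Ford,F150,2010,14,19
Honda,Civic,2001,26,34
Ford,Escort,1990,21,28
"""

# read_csv accepts a file path, a URL, or a file-like object.
# usecols limits which columns are loaded; nrows caps the number of rows read.
df = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["make", "model", "year", "city08", "highway08"],
    nrows=100,
)
print(df.head())
```

With a real file, you would pass its path or URL in place of the io.StringIO object.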
Getting Familiar With .sort_values()
We use .sort_values() to sort values in a DataFrame along either axis (columns or rows).
The figure above shows the results of using .sort_values() to sort the DataFrame’s
rows based on the values in the highway08 column.
Getting Familiar With .sort_index()
You use .sort_index() to sort a DataFrame by its row index or column labels.
The difference from using .sort_values() is that you’re sorting the DataFrame
based on its row index or column names, not by the values in these rows or
columns.
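The difference can be sketched on a small, hypothetical DataFrame whose row labels are deliberately out of order. Sorting on axis 0 reorders rows by their labels, and sorting on axis 1 reorders columns by their names; the cell values play no part in either.

```python
import pandas as pd

# Hypothetical data with a shuffled row index.
df = pd.DataFrame(
    {"model": ["Corolla", "F150", "Civic"], "city08": [23, 14, 26]},
    index=[2, 0, 1],
)

# Sort rows by their index labels (axis=0 is the default).
by_row_label = df.sort_index()

# Sort columns by their names.
by_col_label = df.sort_index(axis=1)

print(by_row_label)
print(by_col_label)
```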
Sorting Your DataFrame on a Single Column
To sort the DataFrame based on the values in a single column, you’ll
use .sort_values(). By default, this will return a new DataFrame sorted in
ascending order.
It does not modify the original DataFrame.
Sorting by a Column in Ascending Order
• To use .sort_values(), you pass a single argument to the method containing
the name of the column you want to sort by. In this example, you sort the
DataFrame by the city08 column, which represents city MPG for fuel-only
cars.
This sorts your DataFrame using the column values from city08, showing the vehicles with the lowest MPG first. By default, .sort_values() sorts your data in ascending order.
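A minimal sketch of this ascending sort, using a small hypothetical sample in place of the EPA data. It also checks that the original DataFrame keeps its order, since .sort_values() returns a new object.

```python
import pandas as pd

# Hypothetical sample; the MPG values are made up.
df = pd.DataFrame({
    "model": ["Corolla", "F150", "Civic", "Escort"],
    "city08": [23, 14, 26, 21],
})

# Ascending sort on one column; the result is a new DataFrame.
sorted_df = df.sort_values("city08")

print(sorted_df)  # lowest city MPG first
print(df)         # original row order is unchanged
```

Note that each row keeps its original index label after sorting, so the index of sorted_df is no longer in numerical order.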
Changing the Sort Order
Another parameter of .sort_values() is ascending.
By default .sort_values() has ascending set to True.
If you want the DataFrame sorted in descending order, then you can pass False to
this parameter.
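A short sketch of the descending sort, again on hypothetical sample values:

```python
import pandas as pd

# Hypothetical sample; the MPG values are made up.
df = pd.DataFrame({
    "model": ["Corolla", "F150", "Civic", "Escort"],
    "city08": [23, 14, 26, 21],
})

# ascending=False puts the highest city MPG first.
sorted_desc = df.sort_values("city08", ascending=False)
print(sorted_desc)
```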
Choosing a Sorting Algorithm
Pandas allows you to choose different sorting algorithms to use with
both .sort_values() and .sort_index().
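The available algorithms are quicksort (the default), mergesort, and heapsort; pandas also accepts the value stable. You choose one with the kind parameter. For a DataFrame, kind only takes effect when you sort on a single column or label.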
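One reason to pick an algorithm is stability. The sketch below, on hypothetical data with tied values, uses mergesort, which is a stable sort: rows with equal city08 values keep their original relative order.

```python
import pandas as pd

# Hypothetical data with ties in city08.
df = pd.DataFrame({
    "model": ["A", "B", "C", "D"],
    "city08": [20, 15, 20, 15],
})

# mergesort is stable, so B stays ahead of D and A stays ahead of C.
stable_sorted = df.sort_values(by="city08", kind="mergesort")
print(stable_sorted)
```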
Sorting Your DataFrame on Multiple Columns
In data analysis, it’s common to want to sort your data based on the values of
multiple columns.
Imagine you have a dataset with people’s first and last names. It would make
sense to sort by last name and then first name, so that people with the same last
name are arranged alphabetically according to their first names.
Sorting Your DataFrame on Multiple Columns
In addition to the MPG in city conditions, you may also want to look at MPG for highway conditions. To sort by two keys, you can pass a list of column names, such as ["city08", "highway08"], to the by parameter.
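A minimal sketch of a two-key sort on hypothetical values. Rows are ordered by city08 first, and highway08 breaks ties between rows with the same city08.

```python
import pandas as pd

# Hypothetical data with ties in city08.
df = pd.DataFrame({
    "model": ["A", "B", "C", "D"],
    "city08": [20, 15, 20, 15],
    "highway08": [28, 22, 25, 24],
})

# The first name in the list is the primary key, the second the tie-breaker.
two_key = df.sort_values(by=["city08", "highway08"])
print(two_key)
```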
Sorting by Multiple Columns in Ascending Order
To sort the DataFrame on multiple columns, you must provide a list of
column names. For example, to sort by make and model, create the list
["make", "model"] and pass it to .sort_values() through the by parameter.
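Sorting by make and then model can be sketched as follows; the rows are hypothetical examples.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "make": ["Toyota", "Ford", "Honda", "Ford"],
    "model": ["Corolla", "F150", "Civic", "Escort"],
})

# Sort alphabetically by make, then by model within each make.
by_make_model = df.sort_values(by=["make", "model"])
print(by_make_model)
```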
Sorting by Multiple Columns in Descending Order
To sort in descending order based on the make and model columns, set ascending to False.
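A sketch of the descending multi-column sort on hypothetical rows. It also shows one variation that goes slightly beyond the text above: ascending can take a list of booleans, one per column, to mix sort directions.

```python
import pandas as pd

# Hypothetical sample rows.
df = pd.DataFrame({
    "make": ["Toyota", "Ford", "Honda", "Ford"],
    "model": ["Corolla", "F150", "Civic", "Escort"],
})

# Both columns in descending order.
both_desc = df.sort_values(by=["make", "model"], ascending=False)

# make ascending, model descending within each make.
mixed = df.sort_values(by=["make", "model"], ascending=[True, False])

print(both_desc)
print(mixed)
```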
Sorting Your DataFrame on Its Index
Before sorting on the index, it’s a good idea to know what an index represents.
A DataFrame has an .index property, which by default is a numerical
representation of its rows’ locations.
You can think of the index as the row numbers. It helps in quick row lookup and
identification.
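To see the default index, you can inspect .index on a freshly built DataFrame and use a label for a quick row lookup. The data here is hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; no index is given, so pandas assigns 0, 1, 2.
df = pd.DataFrame({
    "model": ["Corolla", "F150", "Civic"],
    "city08": [23, 14, 26],
})

print(df.index)   # the default integer index
row = df.loc[1]   # look up the row whose index label is 1
print(row)
```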
Sorting Your DataFrame on Its Index
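A minimal sketch of sorting on the index, using hypothetical values: after a sort on values leaves the index labels out of order, .sort_index() puts the rows back in index order.

```python
import pandas as pd

# Hypothetical sample; the MPG values are made up.
df = pd.DataFrame({
    "model": ["Corolla", "F150", "Civic", "Escort"],
    "city08": [23, 14, 26, 21],
})

# Sorting by value shuffles the index labels.
sorted_df = df.sort_values("city08")

# Sorting by index restores the original row order.
restored = sorted_df.sort_index()
print(restored)
```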
Sorting by Index in Descending Order
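As with .sort_values(), .sort_index() takes an ascending parameter. A short sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample; the MPG values are made up.
df = pd.DataFrame({
    "model": ["Corolla", "F150", "Civic", "Escort"],
    "city08": [23, 14, 26, 21],
})

# ascending=False puts the highest index label first.
index_desc = df.sort_index(ascending=False)
print(index_desc)
```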
Understanding the different types of merge
You can merge two DataFrames in pandas by using the merge() function.
The different arguments to merge() allow you to perform a natural join, left
join, right join, and full outer join.
Before you perform join operations, first load the two CSV files and
convert them into DataFrames df1 and df2.
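A minimal sketch of the loading step. The two CSV sources are hypothetical in-memory strings with made-up contents; with real files you would pass the file paths to pd.read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical contents of the two CSV files; both share the key column "make".
csv_1 = """make,model
Toyota,Corolla
Ford,F150
Honda,Civic
"""
csv_2 = """make,country
Ford,USA
Honda,Japan
Tesla,USA
"""

df1 = pd.read_csv(io.StringIO(csv_1))
df2 = pd.read_csv(io.StringIO(csv_2))

print(df1)
print(df2)
```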
Natural join
A natural join keeps only the rows whose keys match in both DataFrames (df1 and df2).
Specify the argument how='inner'.
Syntax: pd.merge(df1, df2, on='column', how='inner')
Returns only the rows in which the left table has matching keys in the right
table.
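A sketch of the inner join on two small hypothetical DataFrames that share the key column make. Only the makes present in both tables survive.

```python
import pandas as pd

# Hypothetical tables; "make" is the shared key column.
df1 = pd.DataFrame({"make": ["Toyota", "Ford", "Honda"],
                    "model": ["Corolla", "F150", "Civic"]})
df2 = pd.DataFrame({"make": ["Ford", "Honda", "Tesla"],
                    "country": ["USA", "Japan", "USA"]})

# Keep only rows whose key appears in both tables.
inner = pd.merge(df1, df2, on="make", how="inner")
print(inner)
```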
Full outer join
A full outer join keeps all rows from both DataFrames. Specify how='outer'.
Syntax: pd.merge(df1, df2, on='column', how='outer')
Returns all rows from both tables, joining records from the left table that have
matching keys in the right table. Where a row has no match, the missing columns are filled with NaN.
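A sketch of the full outer join on the same kind of hypothetical tables. Every make from either table appears once, and unmatched cells become NaN.

```python
import pandas as pd

# Hypothetical tables; "make" is the shared key column.
df1 = pd.DataFrame({"make": ["Toyota", "Ford", "Honda"],
                    "model": ["Corolla", "F150", "Civic"]})
df2 = pd.DataFrame({"make": ["Ford", "Honda", "Tesla"],
                    "country": ["USA", "Japan", "USA"]})

# Keep every row from both tables.
outer = pd.merge(df1, df2, on="make", how="outer")
print(outer)
```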
Left outer join
A left outer join includes all the rows of your DataFrame df1 and only those
from df2 that match. Specify how='left'.
Syntax: pd.merge(df1, df2, on='column', how='left')
Returns all rows from the left table, and any rows with matching keys from the
right table.
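A sketch of the left outer join on hypothetical tables. All rows of df1 are kept in their original order, and Toyota, which has no match in df2, gets NaN for country.

```python
import pandas as pd

# Hypothetical tables; "make" is the shared key column.
df1 = pd.DataFrame({"make": ["Toyota", "Ford", "Honda"],
                    "model": ["Corolla", "F150", "Civic"]})
df2 = pd.DataFrame({"make": ["Ford", "Honda", "Tesla"],
                    "country": ["USA", "Japan", "USA"]})

# Keep every row of df1; attach df2 columns where the key matches.
left = pd.merge(df1, df2, on="make", how="left")
print(left)
```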
Right outer join
A right outer join includes all the rows of df2 and only those from df1 that
match. Specify how='right'.
Syntax: pd.merge(df1, df2, on='column', how='right')
Returns all rows from the right table, and any rows with matching keys from
the left table.
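A sketch of the right outer join on hypothetical tables. All rows of df2 are kept, and Tesla, which has no match in df1, gets NaN for model.

```python
import pandas as pd

# Hypothetical tables; "make" is the shared key column.
df1 = pd.DataFrame({"make": ["Toyota", "Ford", "Honda"],
                    "model": ["Corolla", "F150", "Civic"]})
df2 = pd.DataFrame({"make": ["Ford", "Honda", "Tesla"],
                    "country": ["USA", "Japan", "USA"]})

# Keep every row of df2; attach df1 columns where the key matches.
right = pd.merge(df1, df2, on="make", how="right")
print(right)
```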
SELF-ASSESSMENT QUESTIONS
1. What is the syntax to create a pandas Series from a Python list? ___________________
2. What is the correct syntax to add the labels "x", "y", and "z" to a pandas Series? __________________
3. Explain and justify the steps you would follow to load a delimited text file that uses a comma to separate
the values.
Team – DAV