Introduction To Data Science Using Python Part2

Uploaded by

salahmohamed38

Introduction to Data Science Using Python, Part 2

Pandas
Reading in Data From Excel
I have the following data saved in the file “Grades_Short.csv” (a CSV file, which Excel can open and save):

Let’s see how we read this data into pandas:

• Before you use pandas you must import it. Any time you use pandas, put the import line at the top of your code.

• The built-in read_csv method takes the path to the file. Here the data is read into a variable called df_grades.
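A minimal sketch of the import and read step. The slides do not show the contents of Grades_Short.csv, so the column names and values below are made up for illustration, and a file-like object stands in for the real file so the sketch runs on its own:

```python
import io
import pandas as pd  # import pandas once, at the top of your code

# Hypothetical stand-in for the contents of Grades_Short.csv;
# the real file's columns and values are not shown in the slides.
csv_text = """Name,Mini Exam 1,Final
Alice,9.5,30
Bob,7.0,24
Carla,8.5,33
"""

# read_csv normally takes a path such as "Grades_Short.csv".
# A file-like object is used here so no real file is needed.
df_grades = pd.read_csv(io.StringIO(csv_text))

# In a notebook, ending the cell with just `df_grades` displays it;
# print works everywhere.
print(df_grades)
```

With the real file next to the notebook, the call would simply be pd.read_csv("Grades_Short.csv").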

Reading in Data From Excel
So, what is df_grades and how does it store the data?

• Typing the name of any variable at the end of a code cell will display the contents of the variable.

• df_grades is a pandas dataframe.

• The data is stored in a tabular format very similar to Excel.

Reading in Data From Excel

• Data file and Jupyter notebook in the same folder: the file name alone is the path.

• Now Grades_Short.csv is in the Data folder, next to the Jupyter notebook: “/” separates directories.

• Now Grades_Short.csv is in the Data folder and the Jupyter notebook is in the folder Notebooks: “..” = go back one directory.
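The two path cases can be sketched as follows. The folder layout is rebuilt in a temporary directory so the example runs anywhere; the file contents are made up, and changing the working directory stands in for "where the notebook lives":

```python
import os
import tempfile
from pathlib import Path

import pandas as pd

# Build a throwaway project layout (hypothetical file contents):
#   project/Data/Grades_Short.csv
#   project/Notebooks/   <- pretend the notebook runs from here
project = Path(tempfile.mkdtemp())
(project / "Data").mkdir()
(project / "Notebooks").mkdir()
(project / "Data" / "Grades_Short.csv").write_text("Name,Final\nAlice,30\nBob,24\n")

start_dir = os.getcwd()
try:
    # Notebook sits next to the Data folder: "/" steps down into it.
    os.chdir(project)
    df_from_project = pd.read_csv("Data/Grades_Short.csv")

    # Notebook sits inside Notebooks: ".." goes back one directory first.
    os.chdir(project / "Notebooks")
    df_from_notebooks = pd.read_csv("../Data/Grades_Short.csv")
finally:
    os.chdir(start_dir)  # restore the original working directory

print(df_from_project.equals(df_from_notebooks))
```

Both calls reach the same file; only the starting folder differs.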
The head() Method
Using the head() method

• If the data is really large, you don’t want to print the entire dataframe to your output.

• The head(n) method outputs the first n rows of the dataframe. If n is not supplied, the default is the first 5 rows.

• I like to run the head() method after I read in the dataframe to check that everything was read in correctly.

• There is also a tail(n) method that returns the last n rows of the dataframe.
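A short sketch of head() and tail() on a made-up dataframe (the names and scores are hypothetical, not the data from the slides):

```python
import pandas as pd

# Hypothetical data: 8 students with made-up scores.
df_grades = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "Final": [30, 24, 33, 28, 19, 36, 27, 31],
})

first_five = df_grades.head()    # no argument: first 5 rows
first_two = df_grades.head(2)    # first 2 rows
last_three = df_grades.tail(3)   # last 3 rows

print(first_two)
print(last_three)
```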
Basic Features

Think of this as a list

Column data types:
object = string
float64 = decimal
int64 = integer
Basic Features
column names

row names = index

• By default, pandas uses the row number as the index and automatically treats the first row of the file as the column names.

• Next we discuss how to pick out various pieces of the dataframe.
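These basic features can be inspected directly. The sketch below uses a small made-up dataframe; the columns attribute, the index attribute, and the dtypes attribute are standard pandas, but the data is hypothetical:

```python
import pandas as pd

# Hypothetical data with one column of each kind discussed above.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla"],   # text
    "Mini Exam 1": [9.5, 7.0, 8.5],      # decimals -> float64
    "Final": [30, 24, 33],               # whole numbers -> int64
})

print(df_grades.columns)  # the column names
print(df_grades.index)    # the row labels; defaults to 0, 1, 2, ...

# One data type per column. Text columns show as object in the pandas
# versions the slides assume; newer releases may report a string type.
print(df_grades.dtypes)
```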

Selecting a Single Column

• Between square brackets, the column must be given as a string.

• Outputs the column as a series.
• A series is a one-dimensional dataframe. More on this in the slicing section.

• Exactly equivalent way to get the Name column: dot notation (the dataframe name, a dot, then the column name).

• +: don’t have to type brackets or quotes
• -: won’t generalize to selecting multiple columns, won’t work if column names have spaces, can’t create new columns this way
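Both ways of selecting a single column, sketched on made-up data (only the column name Name comes from the slides; the other names and all values are hypothetical):

```python
import pandas as pd

# Hypothetical data for illustration.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla"],
    "Mini Exam 1": [9.5, 7.0, 8.5],
    "Final": [30, 24, 33],
})

names = df_grades["Name"]    # bracket notation: column name as a string
names_dot = df_grades.Name   # dot notation: same column, same result

print(type(names))               # a pandas Series
print(names.equals(names_dot))

# Dot notation cannot reach a column whose name contains a space,
# so brackets are required here.
mini_exam = df_grades["Mini Exam 1"]
```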
Selecting Multiple Columns

• Pass a list of strings, which correspond to column names.
• You can select as many columns as you want.
• Columns don’t have to be contiguous.
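A sketch of selecting several columns at once, again on hypothetical data. Note the double brackets: the outer pair does the selecting, the inner pair is the list of column names:

```python
import pandas as pd

# Hypothetical data for illustration.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla"],
    "Mini Exam 1": [9.5, 7.0, 8.5],
    "Final": [30, 24, 33],
})

# "Name" and "Final" are not next to each other, which is fine.
subset = df_grades[["Name", "Final"]]

print(type(subset))  # selecting with a list returns a dataframe
print(subset)
```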
Storing Result

Why store a slice?

• We might want/have to do our analysis in steps.
• Less error prone
• More readable

The variable name stores a series.
Slicing a Series

Slice/index through the index, which is usually numbers.

• Picking out a single element
• Contiguous slice (the end point is non-inclusive)
• Arbitrary slice
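The three kinds of series slicing, sketched on a made-up column stored in a variable. This assumes the default index of row numbers 0, 1, 2, ...; the names are hypothetical:

```python
import pandas as pd

# Hypothetical data for illustration.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev", "Eli"],
    "Final": [30, 24, 33, 28, 19],
})

names = df_grades["Name"]      # store the column (a series) in a variable

single = names[2]              # the element whose index label is 2
contiguous = names[0:3]        # rows 0, 1, 2; the end point 3 is excluded
arbitrary = names[[0, 2, 4]]   # any index labels, passed as a list

print(single)
print(contiguous)
print(arbitrary)
```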
Slicing a Data Frame

• There are a few ways to slice a data frame; we will use .loc.

• Access elements through the index labels and column names.

• We will see how to change both of these labels later on.

• Pick a single value out: give the index label (number) and the column name (string).

• Pick out an entire row: “pick out all columns”. first_row is a series.

• Pick out a contiguous chunk: endpoints are inclusive!

• Pick out an arbitrary chunk:
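A sketch of the four .loc patterns on made-up data. The variable name first_row comes from the slides; the other variable names, the column names other than Name, and all values are hypothetical:

```python
import pandas as pd

# Hypothetical data for illustration.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev"],
    "Mini Exam 1": [9.5, 7.0, 8.5, 10.0],
    "Final": [30, 24, 33, 27],
})

# Single value: index label first, then column name.
single_value = df_grades.loc[0, "Name"]

# Entire row: ":" means "all columns". The result is a series.
first_row = df_grades.loc[0, :]

# Contiguous chunk: with .loc, both endpoints are included.
chunk = df_grades.loc[0:2, "Name":"Mini Exam 1"]

# Arbitrary chunk: lists of index labels and column names.
picked = df_grades.loc[[0, 3], ["Name", "Final"]]

print(single_value)
print(chunk)
print(picked)
```

Compare the contiguous chunk here with slicing a series using plain brackets: 0:2 with .loc returns three rows because the end label is included.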

Built in Functions

How do I compute the average score on the final? Use the built-in mean() method.

How do I compute the highest Mini Exam 1 score? The max() method works the same way.

I can actually get all key stats for numeric columns at once with the describe() method. summary_df is a dataframe! Notice here the index is not row numbers…
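A sketch of these summary calls on made-up scores. The name summary_df comes from the slides; the other variable names, the exact column names, and the values are assumptions:

```python
import pandas as pd

# Hypothetical data for illustration.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev"],
    "Mini Exam 1": [9.5, 7.0, 8.5, 10.0],
    "Final": [30, 24, 33, 27],
})

average_final = df_grades["Final"].mean()      # average of one column
highest_mini = df_grades["Mini Exam 1"].max()  # largest value in one column

# describe() summarizes every numeric column at once.
summary_df = df_grades.describe()

print(average_final)
print(highest_mini)
print(summary_df)

# The index of summary_df holds statistic names, not row numbers,
# so a single statistic is picked out by its label.
print(summary_df.loc["mean", "Final"])
```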
Built in Functions

Other useful built-in methods:

value_counts(): Gives a count of the number of times each unique value appears in the column. Returns a series where the indices are the unique column values.

unique(): Returns an array of all of the unique values.
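A sketch of both methods. The Section column is hypothetical, added only so that there are repeated values to count:

```python
import pandas as pd

# Hypothetical data: a made-up "Section" column with repeated values.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla", "Dev"],
    "Section": ["A", "B", "A", "A"],
})

# A series indexed by the distinct values, holding how often each appears.
counts = df_grades["Section"].value_counts()

# The distinct values themselves, in order of first appearance.
sections = df_grades["Section"].unique()

print(counts)
print(counts["A"])   # look up one count by its value
print(sections)
```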

Attributes vs. Methods

When do I put a ()?

• dataframe attributes: features of the dataframe, so no ()
• dataframe methods: require computation for output, so they take ()
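The difference shows up directly in code. The sketch below uses made-up data; shape and columns are attributes, while head() and mean() are methods:

```python
import pandas as pd

# Hypothetical data for illustration.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla"],
    "Final": [30, 24, 33],
})

# Attributes: stored features of the dataframe, written without ().
print(df_grades.shape)     # (number of rows, number of columns)
print(df_grades.columns)

# Methods: they compute something, so they are called with ().
print(df_grades.head(2))
print(df_grades["Final"].mean())
```

Leaving the () off a method does not run it; Python just shows the method object itself.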
Creating New Columns

Let’s create a useless new column of all 1s:

We can also create a column as a function of another column. The Final was worth 36 points; let’s create a column for each student’s percentage.
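Both steps can be sketched as below. The 36-point total comes from the slides; the column names Final, Ones, and Final_Percentage and the scores are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical data for illustration.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla"],
    "Final": [36, 27, 18],
})

# Assigning a single value fills the whole new column with it.
df_grades["Ones"] = 1

# Arithmetic on a column is applied row by row.
# The Final was worth 36 points.
df_grades["Final_Percentage"] = df_grades["Final"] / 36 * 100

print(df_grades)
```

New columns must be created with bracket notation; dot notation does not work for this.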
Deleting Columns
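The slide content for this topic did not survive, so the approach the slides use is unknown. As an assumption, here are two standard pandas ways to delete a column, sketched on made-up data:

```python
import pandas as pd

# Hypothetical data, including a throwaway column to delete.
df_grades = pd.DataFrame({
    "Name": ["Alice", "Bob", "Carla"],
    "Final": [30, 24, 33],
    "Ones": [1, 1, 1],
})

# drop returns a new dataframe and leaves the original untouched.
without_ones = df_grades.drop(columns=["Ones"])

# del removes the column from the original dataframe in place.
del df_grades["Ones"]

print(without_ones.columns)
print(df_grades.columns)
```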
