Pandas Notes

Pandas is a Python library used for working with structured data and tables. It allows users to clean, analyze, and visualize data. Pandas provides Series and DataFrame objects for working with one-dimensional and two-dimensional labeled data structures. DataFrames can be created from various data sources like lists, dictionaries, and CSV/JSON files. Pandas offers methods for cleaning data by handling missing values, reformatting data types, and removing duplicates. It also provides functions for analyzing data through descriptive statistics, grouping, and plotting visualizations.

Uploaded by

Edu Costa

PANDAS NOTES

import pandas as pd

0. INTRODUCTION

Pandas is used to analyze large tabular data sets. It allows us to clean messy data
sets and make them readable and relevant.

1. PANDAS SERIES

A Series is a single column of a table. It is created with 'pd.Series(list, index=)'
or 'pd.Series(dict, index=)'. If no index is given, the labels are the positions
0, 1, 2, ..., but we can pass the exact label we want for each value. We then access
a value through its label, which is either the default position or the label we gave
it. When the series is created from a dictionary, the keys become the labels, and the
index argument selects only the keys we want to keep.
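
As a minimal sketch of the three ways of building a Series described above (the
values and the labels 'day1' to 'day3' are made up for illustration):

```python
import pandas as pd

# Series from a list: default labels are the positions 0, 1, 2
calories = pd.Series([420, 380, 390])
first = calories[0]

# Series from a list with custom labels
labeled = pd.Series([420, 380, 390], index=["day1", "day2", "day3"])
day2 = labeled["day2"]

# Series from a dict: keys become labels; index= keeps only the listed keys
data = {"day1": 420, "day2": 380, "day3": 390}
subset = pd.Series(data, index=["day1", "day3"])
```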

2. PANDAS DATAFRAMES

A DataFrame is a full table of data. We use 'pd.DataFrame(data, index=)', where data
is a dictionary whose values are lists holding the elements of each column of the
table. As with the series, we can label the rows by passing a list as index and then
refer to the rows by those labels.

We can return a row with '.loc[label]', or several rows by passing a list of their
labels, '.loc[list]'. The returned value is a Series in the first case and a
DataFrame in the second. Note that plain indexing on a series, 'series[label]',
returns a value, whereas plain indexing on a DataFrame, 'df[label]', selects a column.
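
The following sketch illustrates creating a DataFrame from a dictionary and the
difference between '.loc' and plain indexing. The column names and row labels are
hypothetical.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

row = df.loc["day2"]              # one label -> Series
rows = df.loc[["day1", "day3"]]   # list of labels -> DataFrame
col = df["calories"]              # plain indexing selects a column
```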

We can use the method '.rename(columns=, inplace=)' to replace the names of the
columns, where columns is a dictionary mapping each old name to its new name.
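
A short sketch of renaming, with made-up column names, showing the difference between
the default behaviour and 'inplace=True':

```python
import pandas as pd

df = pd.DataFrame({"cal": [420, 380], "dur": [50, 40]})

# Map old names to new names; by default a new DataFrame is returned
renamed = df.rename(columns={"cal": "calories", "dur": "duration"})

# inplace=True changes df itself and returns None
df.rename(columns={"cal": "calories"}, inplace=True)
```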

3. READ CSV (AND JSON)

A CSV file can be imported into a DataFrame with 'pd.read_csv(file)'. We can print
the entire DataFrame with '.to_string()'; printing the DataFrame directly shows only
part of the rows when it is long. We can check the maximum number of rows displayed
with 'pd.options.display.max_rows' and change it with
'pd.options.display.max_rows = 1000'. For JSON files the command is
'pd.read_json(file)'. JSON objects have the same structure as Python dictionaries.
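
A possible sketch of reading CSV and JSON data. To keep it self-contained, the file
contents are given as in-memory text through 'io.StringIO' (an assumption made for
this example; a file path is passed in the same way), and the data values are made up.

```python
import io
import pandas as pd

csv_text = "Duration,Calories\n60,409.1\n45,282.4\n30,195.0\n"
df = pd.read_csv(io.StringIO(csv_text))   # a file path works the same way

full_text = df.to_string()                # every row, whatever the display limit

default_limit = pd.options.display.max_rows   # current display limit
pd.options.display.max_rows = 1000            # raise the limit

# JSON laid out like a dictionary of columns
json_text = '{"Duration": {"0": 60, "1": 45}, "Calories": {"0": 409.1, "1": 282.4}}'
df_json = pd.read_json(io.StringIO(json_text))
```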

4. ANALYZING DATA

To get a quick overview of the data we use '.head(n)', where n is the number of rows
shown (5 by default). '.tail(n)' does the same but starting from the end.
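
For instance, on a small made-up DataFrame with 20 rows:

```python
import pandas as pd

df = pd.DataFrame({"x": range(20)})

top = df.head()       # first 5 rows (default n=5)
top3 = df.head(3)     # first 3 rows
bottom = df.tail(2)   # last 2 rows
```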

The function '.info()' gives some information about the data set like rows,
columns, labels, non-null counts and data types.

We can convert the data in a column to a NumPy array with the attribute '.values',
applied to a column of a DataFrame or to a Series.

We can get the labels of the columns with '.columns', which returns an Index
object. We convert it to a list with the method '.tolist()'.

The method '.describe()' gives summary statistics for each numeric column: count,
mean, std, min, max and the values at the 25%, 50% and 75% percentiles.
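
The inspection tools of this section can be sketched together as follows (the data
are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 45, 30, 45],
                   "Calories": [400.0, 300.0, 200.0, 300.0]})

df.info()                       # prints rows, columns, non-null counts and dtypes

arr = df["Duration"].values     # NumPy array holding the column
labels = df.columns.tolist()    # Index object -> plain Python list
stats = df.describe()           # one row per statistic, one column per numeric column
mean_duration = stats.loc["mean", "Duration"]
```

The result of '.describe()' is itself a DataFrame, so a single statistic can be read
from it with '.loc'.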

5. CLEANING DATA

It is important to clean bad data from the set before computing with it. We
can find empty cells, data in wrong format, wrong data and duplicates.

Also, indexing with a boolean mask does not change the structure of the data:
it does not flatten it. It only discards the rows that do not satisfy the mask.
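
A minimal sketch of mask indexing, assuming a made-up condition on a 'Duration'
column:

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 45, 30],
                   "Calories": [409.1, 282.4, 195.0]})

mask = df["Duration"] > 40    # boolean Series, one entry per row
filtered = df[mask]           # still a DataFrame with the same columns
```

The rows that remain keep their original index labels.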

6. CLEANING EMPTY CELLS

For empty cells we can either remove the whole row with that value or replace
the value.

To remove the row, we use '.dropna(subset=, inplace=True)', where subset lists the
columns to check for empty cells and the inplace argument, when set to True, modifies
the original DataFrame instead of returning a new one.

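Empty cells can be replaced with a desired value with '.fillna(value, inplace=)'. To
do that for a single column, we select the column by indexing the DataFrame with its
label and fill that column. The value can be a chosen constant, the '.mean()' (average
value), the '.median()' (value in the middle once the values are sorted in ascending
order) or the '.mode()' (the value that appears most frequently). Note that '.mode()'
returns a Series, since several values can tie, so we take its first element with
'.mode()[0]'. In recent pandas versions, calling '.fillna(value, inplace=True)' on a
selected column may not update the DataFrame, so assigning the result back to the
column is the safer form.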
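
The two options for empty cells can be sketched as follows. The data are made up, and
the filled column is assigned back rather than modified in place, which is a choice
made for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Calories": [400.0, np.nan, 300.0, 300.0],
                   "Pulse": [110, 120, 115, 125]})

# Option 1: remove the rows whose Calories cell is empty
dropped = df.dropna(subset=["Calories"])

# Option 2: replace the empty cells with a statistic of the column
mean_val = df["Calories"].mean()       # NaN is skipped
median_val = df["Calories"].median()   # middle value of the sorted column
mode_val = df["Calories"].mode()[0]    # most frequent value; mode() returns a Series

filled = df.copy()
filled["Calories"] = filled["Calories"].fillna(mean_val)
```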

7. CLEANING WRONG FORMAT

Fixing a wrong format is a little more complicated. We can either remove the rows or
try to correct the value. Dates that are not stored as date strings but whose numbers
are valid can often be converted with the function 'pd.to_datetime(column)',
assigning the result back to the column. With the argument errors='coerce', any value
that still cannot be converted becomes a null value (NaT), which can then be removed
with '.dropna(subset=[column], inplace=)'.
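
A hedged sketch of this two-step cleanup, using a made-up 'Date' column in which one
entry cannot be parsed:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020/12/01", "2020/12/02", "not a date"],
                   "Calories": [409.1, 282.4, 195.0]})

# Unparseable entries become NaT instead of raising an error
df["Date"] = pd.to_datetime(df["Date"], errors="coerce")

# Rows whose date could not be recovered are dropped
clean = df.dropna(subset=["Date"])
```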

8. CLEANING WRONG DATA

If we spot a wrong value, we can simply overwrite it by indexing with
'.loc[indexoftherow, labelofcolumn] = newvalue'.

If we want to do this at a larger scale, we loop over the indices (with '.index') and
use an if statement on the value at '.loc[index, label]'. We could also remove the
whole row with '.drop(index, inplace=)'.
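
The approaches above might look like this in practice. The threshold of 120 and the
data are hypothetical, chosen only for illustration; the mask-based line is an
alternative to the loop that is not part of the original notes.

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 450, 45, 200]})

# Fix one known bad cell
df.loc[1, "Duration"] = 45

# Rule-based fix over every row: cap values above 120
for i in df.index:
    if df.loc[i, "Duration"] > 120:
        df.loc[i, "Duration"] = 120

# The same rule without an explicit loop, using a boolean mask
df.loc[df["Duration"] > 120, "Duration"] = 120

# Remove a row by its index label instead of correcting it
without_row = df.drop(3)
```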

9. REMOVING DUPLICATES

We find duplicates with '.duplicated()', which returns a boolean Series that is True
for every row that repeats an earlier one. We remove them with
'.drop_duplicates(inplace=True)'.
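
For example, with a made-up table whose third row repeats the first:

```python
import pandas as pd

df = pd.DataFrame({"Duration": [60, 45, 60],
                   "Calories": [409.1, 282.4, 409.1]})

flags = df.duplicated()          # the first occurrence is not flagged
deduped = df.drop_duplicates()   # returns a new DataFrame without the repeats
```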

10. ADDING DATA

We can easily add new data based on existing data in the DataFrame. We can use this,
for example, to create columns with normalized values. The procedure is the following:

import numpy as np
v = df[col].values                         # assumed: v is column 'col' as an array
v_mean = np.mean(v)
v_rms = np.sqrt(np.mean((v-v_mean)**2))    # RMS deviation from the mean
df[col+'_normalized'] = (v-v_mean)/v_rms

Before doing so, we use 'np.isnan(array)' to find possible NaN values and replace them
with 0. To store the result in a new column, we just compute the array of values and
assign it to a new column label, with the same syntax used to change existing values.
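
Put together, the procedure could be sketched like this. The column name 'speed' and
its values are made up, and copying the array before replacing NaN is a choice made
here so that the original column is left untouched.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"speed": [1.0, 2.0, np.nan, 3.0]})

for col in ["speed"]:
    v = df[col].to_numpy(copy=True)   # copy, so the original column is not modified
    v[np.isnan(v)] = 0                # replace NaN by 0, as described in the notes
    v_mean = np.mean(v)
    v_rms = np.sqrt(np.mean((v - v_mean) ** 2))
    df[col + "_normalized"] = (v - v_mean) / v_rms
```

The new column has mean 0 and unit RMS deviation by construction.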

11. PANDAS PLOTTING

Although Pandas can build plots, we need the Pyplot library to show them, imported
with 'import matplotlib.pyplot as plt'.

We can simply plot with '.plot()' and show the result with 'plt.show()'.

We can make scatter plots, which need an x and a y axis, with
'.plot(kind='scatter', x='Duration', y='Calories')'.

We can also plot a histogram of one column (it shows how often the values fall in
each interval) with 'column.plot(kind='hist')'. Histograms can also be made with
'plt.hist(column)'.

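Some other commands appear in the code: 'import sys', 'import matplotlib',
'matplotlib.use('Agg')', 'plt.savefig(sys.stdout.buffer)' and 'sys.stdout.flush()'.
They select a non-interactive backend and write the figure to standard output instead
of opening a window, which is useful when no display is available.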
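
A sketch of the scatter plot and the histogram drawn without a display. It assumes
matplotlib is installed, uses made-up data, and saves the figure to an in-memory
buffer ('io.BytesIO') instead of 'sys.stdout.buffer', which is a substitution made
for this example.

```python
import io

import matplotlib
matplotlib.use("Agg")              # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Duration": [60, 45, 30, 45],
                   "Calories": [409.1, 282.4, 195.0, 300.0]})

# Two axes side by side: scatter plot on the left, histogram on the right
fig, (ax1, ax2) = plt.subplots(1, 2)
df.plot(kind="scatter", x="Duration", y="Calories", ax=ax1)
df["Duration"].plot(kind="hist", ax=ax2)

# Write the figure as PNG data into a buffer instead of showing it
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
png_bytes = buf.getvalue()
```

In an interactive session, 'plt.show()' would replace the saving step.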

Other plotting options are, for example, scatter matrices, imported with:

from pandas.plotting import scatter_matrix

and used with 'scatter_matrix(df[normalized_cols][:2000], figsize=(12, 12), alpha=0.2,
s=50, diagonal='kde')'. It draws a grid of scatter plots, one for each pair of the
selected columns, with a density estimate (KDE) of each column on the diagonal.

12. PANDAS CORRELATION? Looks interesting but has not been done in class
