Pandas Notes
Here are detailed notes on the pandas basics we’ve covered. Each section follows
the same five-point pattern, with very simple examples and full code you can run.
1. Creating a DataFrame
1. Real-Life Example
You run a roadside tea stall and note today’s sales of two items:
Cups of tea sold
Packets of biscuits sold
You write it on paper:
Tea – 30 cups
Biscuits – 20 packets
To analyse sales, you want this in a neat computer table.
2. Concise Definition
A DataFrame is pandas’ way to store table data (rows × columns), just like an
Excel sheet.
3. Syntax Breakdown
pd.DataFrame(data,          # your raw data (dict or list of dicts)
             columns=None,  # (optional) list of column names
             index=None)    # (optional) list of row labels
data: often a dict of equal-length lists, e.g. {'item': [...], 'count': [...]}
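columns: lets you choose which columns appear, and in what order (with dict data it selects keys rather than renaming them; a name that is not a key in the data becomes an empty NaN column)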
index: lets you label rows (e.g. dates or IDs)
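As a small illustration of the two optional arguments above, the sketch below reuses the tea-stall data. The row labels 'A' and 'B' are made up for this example and are not part of the original notes.

```python
import pandas as pd

data = {'item': ['Tea', 'Biscuits'], 'sold': [30, 20]}

# columns= picks the column order; index= replaces the default 0, 1 row labels.
# The labels 'A' and 'B' are hypothetical, chosen only for illustration.
df = pd.DataFrame(data, columns=['sold', 'item'], index=['A', 'B'])

print(df)
print(df.loc['B', 'sold'])  # look up one value by row label and column name
```

With row labels in place, df.loc[label, column] reads a single cell without counting positions.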
4. Full, Working Code
import pandas as pd
# 1) Raw sales data as a dictionary
data = {
    'item': ['Tea', 'Biscuits'],
    'sold': [30, 20]
}
# 2) Create the DataFrame
df = pd.DataFrame(data)
# 3) Show the table
print(df)
Output:
       item  sold
0       Tea    30
1  Biscuits    20
5. Three Key Takeaways
1. Dict → Table: A dict of lists becomes a neat table.
2. Check with print(df): Always look at your table right after creating it.
3. Flexible Input: You can also start from a list of records ( [{'item':'Tea','sold':30}, …] ).
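To make the "list of records" idea concrete, here is a minimal sketch that builds the same tea-stall table both ways and compares them. The variable names are chosen only for this example.

```python
import pandas as pd

# One dict per row (a "record")
records = [
    {'item': 'Tea', 'sold': 30},
    {'item': 'Biscuits', 'sold': 20},
]
df_from_records = pd.DataFrame(records)

# The same data as a dict of lists (one list per column)
df_from_dict = pd.DataFrame({'item': ['Tea', 'Biscuits'], 'sold': [30, 20]})

# equals() checks that both tables have the same shape, labels, and values
same = df_from_records.equals(df_from_dict)
print(same)
```

Records are often the handier shape when rows arrive one at a time, for example from a form or an API.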
2. Inspecting with head()
1. Real-Life Example
You have a big list of student marks. You want to peek at the first few entries to
confirm you loaded them correctly.
2. Concise Definition
df.head(n) shows the first n rows of your DataFrame (default n=5).
3. Syntax Breakdown
df.head(n)
df: your DataFrame
.head: the method that returns the top rows
(n): number of rows to show (optional; default = 5)
4. Full, Working Code
import pandas as pd
data = {
    'student': ['Amit', 'Bina', 'Chirag', 'Deepa', 'Esha', 'Farhan'],
    'marks': [85, 90, 78, 92, 88, 75]
}
df = pd.DataFrame(data)
# Show the first 3 students
print(df.head(3))
Output:
   student  marks
0     Amit     85
1     Bina     90
2   Chirag     78
5. Three Key Takeaways
1. Quick Peek: head() avoids scrolling through hundreds of rows.
2. Default = 5: If you leave out n, you see the first 5 rows.
3. Errors Show Early: If your data header is wrong, you spot it immediately.
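The default behaviour can be checked with a short sketch on the same six students. It also shows what happens when you ask for more rows than the table has.

```python
import pandas as pd

df = pd.DataFrame({
    'student': ['Amit', 'Bina', 'Chirag', 'Deepa', 'Esha', 'Farhan'],
    'marks': [85, 90, 78, 92, 88, 75],
})

first_five = df.head()     # no n given, so at most 5 rows come back
everything = df.head(10)   # asking for more rows than exist returns all 6

print(len(first_five), len(everything))
```

Because head() never raises an error for a large n, it is safe to call on tables of any size.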
3. Checking Structure (shape, columns, dtypes)
1. Real-Life Example
You have a guest list for a family function with names, ages, and gifts they bring.
You want to know:
How many guests?
What columns do you have?
Are ages stored as numbers or text?
2. Concise Definition
df.shape → returns (rows, columns)
df.columns → lists the column names
df.dtypes → shows each column’s data type (int, float, object)
3. Syntax Breakdown
df.shape # no (), returns a tuple like (10, 3)
df.columns # no (), returns an Index of column names
df.dtypes # no (), returns a Series of column:data_type
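Since these three are attributes rather than methods, their values can be stored and reused directly. A minimal sketch using the guest list (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Ravi', 'Sara', 'Manoj'],
    'age': [28, 25, 30],
    'gift': ['Flowers', 'Chocolates', 'Book'],
})

# shape is a plain tuple, so it can be unpacked into two numbers
n_rows, n_cols = df.shape

# columns is an Index; list() turns it into an ordinary Python list
column_names = list(df.columns)

print(f"{n_rows} guests, {n_cols} columns: {column_names}")
```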
4. Full, Working Code
import pandas as pd
guests = {
    'name': ['Ravi', 'Sara', 'Manoj'],
    'age': [28, 25, 30],
    'gift': ['Flowers', 'Chocolates', 'Book']
}
df = pd.DataFrame(guests)
# Check structure
print("Shape :", df.shape)
print("Columns :", df.columns)
print("Data types:\n", df.dtypes)
Output:
Shape : (3, 3)
Columns : Index(['name', 'age', 'gift'], dtype='object')
Data types:
name object
age int64
gift object
dtype: object
5. Three Key Takeaways
1. Know Size: shape tells you exactly how many rows and columns.
2. See Fields: columns avoids guessing field names.
3. Type Safety: dtypes lets you catch “ages as text” before you do math.
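The "ages as text" problem can be sketched as follows. The example assumes the ages were typed in as strings, and it uses pd.to_numeric, which these notes have not covered, as one possible way to convert them.

```python
import pandas as pd

# Assumption for illustration: ages were entered as text, not numbers
guests = pd.DataFrame({
    'name': ['Ravi', 'Sara', 'Manoj'],
    'age': ['28', '25', '30'],
})

print(guests.dtypes)  # the age column shows up as a text type, not int64
numeric_before = pd.api.types.is_numeric_dtype(guests['age'])

# Convert the text to real numbers so that maths works on the column
guests['age'] = pd.to_numeric(guests['age'])
numeric_after = pd.api.types.is_numeric_dtype(guests['age'])

average_age = guests['age'].mean()
print(numeric_before, numeric_after, average_age)
```

Checking dtypes first avoids confusing errors or wrong answers when text columns are summed or averaged.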
4. Aggregation with agg()
1. Real-Life Example
You track daily sales of two sweets at your mithai shop. After a week, you want:
Total sweets sold
Average price you charged
Day with maximum laddoos sold
Instead of manual sums, you use pandas to tell you in one step.
2. Concise Definition
df.aggregate() (or df.agg()) computes summary numbers (sum, mean, max, min) across the entire DataFrame.
Combined with groupby(), it does the same per category (e.g., per sweet type).
3. Syntax Breakdown
# Overall summary
df.agg({'sold':'sum', 'price':'mean'})
# By category
df.groupby('sweet').agg({
    'sold': ['sum', 'max'],
    'price': ['mean', 'min']
})
df: your table
.agg / .aggregate: the summary function
groupby('col'): first split rows by that column
func dict/list: choose which statistics you want
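The two call patterns above return different shapes, which affects how you read values back out. The sketch below uses a shortened, four-row version of the sweets data (an assumption made to keep the arithmetic easy to follow).

```python
import pandas as pd

# Shortened sample data, for illustration only
df = pd.DataFrame({
    'sweet': ['laddoo', 'laddoo', 'gulab', 'gulab'],
    'sold': [10, 12, 8, 10],
    'price': [20, 20, 25, 25],
})

# Overall summary: one function per column gives a Series
overall = df.agg({'sold': 'sum', 'price': 'mean'})
total_sold = overall['sold']
average_price = overall['price']

# Per-category summary: a list of functions per column gives a table
# whose column labels are (column, function) pairs
per_sweet = df.groupby('sweet').agg({'sold': ['sum', 'max'],
                                     'price': ['mean', 'min']})
laddoo_total = per_sweet.loc['laddoo', ('sold', 'sum')]

print(total_sold, average_price, laddoo_total)
```

Individual numbers in the grouped result are addressed by row label (the sweet) plus a (column, function) pair.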
4. Full, Working Code
import pandas as pd
data = {
    'day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
    'sweet': ['laddoo', 'laddoo', 'gulab', 'laddoo', 'gulab', 'gulab', 'laddoo'],
    'sold': [10, 12, 8, 15, 10, 9, 11],
    'price': [20, 20, 25, 20, 25, 25, 20]
}
df = pd.DataFrame(data)
# 1) Overall summary
print(df.agg({'sold':'sum','price':'mean'}))
# 2) Summary by sweet
print(df.groupby('sweet').agg({'sold':['sum','max'],'price':['mean','min']}))
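The 'max' above gives the largest number sold, not the day it happened. One possible way to answer the "day with maximum laddoos sold" question from the real-life example is idxmax(), which these notes have not covered; the sketch below is a suggestion, not the only approach.

```python
import pandas as pd

df = pd.DataFrame({
    'day': ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun'],
    'sweet': ['laddoo', 'laddoo', 'gulab', 'laddoo', 'gulab', 'gulab', 'laddoo'],
    'sold': [10, 12, 8, 15, 10, 9, 11],
    'price': [20, 20, 25, 20, 25, 25, 20],
})

# Keep only the laddoo rows
laddoos = df[df['sweet'] == 'laddoo']

# idxmax() returns the row label where 'sold' is largest
best_row = laddoos['sold'].idxmax()

# Use that label to look up the day and the amount
best_day = df.loc[best_row, 'day']
best_amount = df.loc[best_row, 'sold']

print(best_day, best_amount)
```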
5. Three Key Takeaways
1. One-Step Summary: .agg() gives totals and averages in one command.
2. Compare Groups: groupby()+agg() shows stats per category (like laddoo vs
gulab).
3. Customizable: Pass your own list or dict of functions, for example .agg(['min','max','mean']) on numeric columns, or even your own Python function.
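A custom function can sit in the same list as the built-in names. In the sketch below, sold_range is a hypothetical helper written for this example; it reports the gap between the best and worst day for each sweet.

```python
import pandas as pd

df = pd.DataFrame({
    'sweet': ['laddoo', 'laddoo', 'gulab', 'laddoo', 'gulab', 'gulab', 'laddoo'],
    'sold': [10, 12, 8, 15, 10, 9, 11],
})

def sold_range(values):
    """Hypothetical helper: difference between the largest and smallest value."""
    return values.max() - values.min()

# Built-in names and a custom function together, applied to one numeric column
stats = df.groupby('sweet')['sold'].agg(['min', 'max', 'mean', sold_range])

print(stats)
```

The custom function receives the 'sold' values of one group at a time, and its result appears in a column named after the function.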
End of Notes
Keep practicing with your own data every day!