Introduction to Pandas
1. Introduction to Pandas
Pandas is built on top of NumPy and provides two primary data structures: Series and DataFrame.
Series
A Series is a one-dimensional labeled array capable of holding any data type.
import pandas as pd
# Creating a Series
s = pd.Series([1, 3, 5, 7, 9])
print(s)
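The "labeled" part of the definition is easier to see with an explicit index. The following is a minimal sketch; the labels 'a' through 'e' are made up for illustration and do not come from the example above.

```python
import pandas as pd

# Hypothetical labels 'a'..'e', chosen only for illustration
s = pd.Series([1, 3, 5, 7, 9], index=['a', 'b', 'c', 'd', 'e'])

print(s['c'])      # access a value by its label
print(s.iloc[0])   # access a value by its integer position
print(s.dtype)     # element type inferred from the data
```

Without the index argument, pandas assigns the default integer labels 0, 1, 2, and so on.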
DataFrame
A DataFrame is a two-dimensional labeled data structure with columns of potentially different types.
# Creating a DataFrame
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
print(df)
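To check that the columns really do have different types, the DataFrame can be inspected through a few attributes. This sketch rebuilds the same example data so that it runs on its own.

```python
import pandas as pd

data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
}
df = pd.DataFrame(data)

print(df.shape)             # (number of rows, number of columns)
print(df.dtypes)            # one dtype per column
print(df.columns.tolist())  # column labels in order
```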
2. Basic Operations on DataFrames
Viewing Data
- head(): View the first few rows of the DataFrame.
- tail(): View the last few rows of the DataFrame.
- info(): Get a summary of the DataFrame.
- describe(): Get descriptive statistics.
print(df.head())
print(df.tail())
df.info()  # info() prints its summary directly and returns None
print(df.describe())
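The viewing methods accept a few useful arguments. As a short sketch using the same example data: head() and tail() take a row count, and describe() summarizes only numeric columns unless asked otherwise.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

first_two = df.head(2)   # first 2 rows instead of the default 5
last_one = df.tail(1)    # last row only

numeric_stats = df.describe()            # numeric columns only ('Age')
all_stats = df.describe(include='all')   # also summarizes text columns

print(first_two)
print(last_one)
print(numeric_stats)
print(all_stats)
```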
Selecting Data
- Using column names.
- Using iloc (integer positions) and loc (labels) to select rows.
# Select a column
print(df['Name'])
# Select multiple columns
print(df[['Name', 'City']])
# Select rows by integer position (the end of the slice is excluded)
print(df.iloc[1:3])
# Select rows and columns by labels (the end of the slice is included)
print(df.loc[0:2, ['Name', 'City']])
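A common source of confusion is that iloc and loc treat the end of a slice differently. The sketch below uses the same example data to show the difference; the df_named variable is introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

by_position = df.iloc[0:2]   # positions 0 and 1; the end is excluded
by_label = df.loc[0:2]       # labels 0, 1 and 2; the end is included

print(len(by_position))      # 2 rows
print(len(by_label))         # 3 rows

# With a non-integer index, loc slices by label in the same inclusive way
df_named = df.set_index('Name')
print(df_named.loc['Anna':'Peter', 'City'])
```

With the default integer index, the labels happen to equal the positions, which is why the two accessors are easy to mix up.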
3. Data Manipulation
Adding and Dropping Columns
- Adding new columns.
- Dropping columns.
# Adding a new column
df['Country'] = ['USA', 'France', 'Germany', 'UK']
print(df)
# Dropping a column
df = df.drop(columns=['Country'])
print(df)
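New columns are often computed from existing ones rather than typed in by hand. The following sketch assumes the same example data; the column names AgeInMonths and IsAdult are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
})

# Derive a column from an existing one (modifies df in place)
df['AgeInMonths'] = df['Age'] * 12

# assign() returns a new DataFrame and leaves df unchanged
df_flagged = df.assign(IsAdult=df['Age'] >= 18)

# drop() also returns a new DataFrame unless the result is reassigned
df_trimmed = df.drop(columns=['AgeInMonths'])

print(df_flagged)
print(df_trimmed)
```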
Filtering Data
- Using conditions to filter rows.
# Filtering rows where Age > 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)
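Conditions can be combined. As a sketch on the same example data: use & for "and" and | for "or", wrap each condition in parentheses, and use isin() to match against a list of values.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
})

# Both conditions must hold; the parentheses are required
older_outside_ny = df[(df['Age'] > 25) & (df['City'] != 'New York')]

# Either condition may hold
youngest_or_oldest = df[(df['Age'] < 25) | (df['Age'] > 33)]

# Match any value from a list
in_selected_cities = df[df['City'].isin(['Paris', 'London'])]

print(older_outside_ny)
print(youngest_or_oldest)
print(in_selected_cities)
```

Python's plain `and` and `or` keywords do not work element by element on a Series, which is why the bitwise operators are used here.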
4. Handling Missing Data
- Checking for missing data.
- Filling missing data.
- Dropping missing data.
# Creating a DataFrame with missing values
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, 32],
    'City': ['New York', 'Paris', None, 'London']
}
df = pd.DataFrame(data)
# Checking for missing data
print(df.isnull())
# Filling missing data
df_filled = df.fillna({'Age': df['Age'].mean(), 'City': 'Unknown'})
print(df_filled)
# Dropping missing data
df_dropped = df.dropna()
print(df_dropped)
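On larger tables, a count of missing values per column is usually easier to read than the full True/False grid from isnull(). The sketch below also shows dropping rows only when a specific column is missing; it rebuilds the same example data so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, None, 35, 32],
    'City': ['New York', 'Paris', None, 'London'],
})

# Number of missing values in each column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Drop a row only if 'Age' is missing; a missing 'City' is tolerated
df_age_known = df.dropna(subset=['Age'])
print(df_age_known)
```

Note that dropna() without arguments removes every row that has at least one missing value, so it can discard more data than intended.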
5. Data Aggregation and Grouping
- Using groupby to group data and perform aggregation.
data = {
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40]
}
df = pd.DataFrame(data)
# Grouping by 'Category' and calculating the sum of 'Value'
grouped_df = df.groupby('Category').sum()
print(grouped_df)
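A group can be summarized with several aggregations at once. This is a minimal sketch on the same example data, using agg() with a list of function names.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

# One row per category, one column per aggregation
summary = df.groupby('Category')['Value'].agg(['sum', 'mean', 'count'])
print(summary)
```

Selecting the 'Value' column before aggregating keeps the result limited to the column of interest.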
6. Merging and Joining DataFrames
- Concatenation.
- Merging based on keys.
# Concatenation
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
result = pd.concat([df1, df2])
print(result)
# Merging
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'B': ['B0', 'B1', 'B2']})
result = pd.merge(left, right, on='key')
print(result)
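The merge above works cleanly because both tables contain the same keys. The sketch below changes the right-hand table so that the keys only partly overlap (an assumption made for illustration) and shows how the how parameter controls which rows are kept. It also shows ignore_index for concat, which avoids repeated index labels in the combined result.

```python
import pandas as pd

left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
# Hypothetical variation: 'K3' has no match in the left table
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'B': ['B0', 'B1', 'B3']})

# Default is an inner join: only keys present in both tables
inner = pd.merge(left, right, on='key')

# Outer join keeps all keys; indicator=True adds a '_merge' column
# recording where each row came from
outer = pd.merge(left, right, on='key', how='outer', indicator=True)

print(inner)
print(outer)

# Concatenation with a fresh 0..n-1 index
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked)
```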
7. Advanced Data Operations
Pivot Tables
- Creating pivot tables to summarize data.
data = {
    'Date': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'City': ['New York', 'Paris', 'Berlin', 'London'],
    'Sales': [200, 150, 300, 250]
}
df = pd.DataFrame(data)
pivot_table = df.pivot_table(values='Sales', index='City', columns='Date')
print(pivot_table)
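A pivot table becomes more informative when several rows fall into the same cell. The data in this sketch is made up for illustration: it has repeated city and date combinations so that aggfunc has something to combine, and fill_value replaces the empty cells.

```python
import pandas as pd

# Hypothetical data with two New York sales on the same date
df = pd.DataFrame({
    'Date': ['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-02'],
    'City': ['New York', 'New York', 'Paris', 'New York'],
    'Sales': [200, 150, 300, 250],
})

pivot = df.pivot_table(
    values='Sales',
    index='City',
    columns='Date',
    aggfunc='sum',   # the default is the mean
    fill_value=0,    # used where a city has no sales on a date
)
print(pivot)
```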
Applying Functions
- Using apply to apply functions to data.
# Applying a lambda function to a column
df['Sales'] = df['Sales'].apply(lambda x: x * 1.1)
print(df)
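For simple arithmetic, operating on the whole column at once gives the same result as apply and is usually faster. apply is more useful when the function needs several columns of a row, which is what axis=1 provides. The sketch below uses a small made-up table.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Paris'],
    'Sales': [200, 150],
})

# Element-wise function via apply
with_apply = df['Sales'].apply(lambda x: x * 2)

# Equivalent vectorized operation on the whole column
vectorized = df['Sales'] * 2

# axis=1 passes each row to the function, so it can use several columns
labels = df.apply(lambda row: f"{row['City']}: {row['Sales']}", axis=1)

print(with_apply)
print(vectorized)
print(labels)
```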
Conclusion
This is a brief overview of some of the basic and intermediate features of pandas. As you work
more with pandas, you will discover many more methods that can help you manipulate and
analyze data efficiently. Practice is key, so try working with different datasets and consult
the pandas documentation for further reference.