One-Day Intensive Python Data Analysis and Visualization Workshop
Welcome! This curriculum is designed for absolute beginners and introduces the essential
Python libraries for data analysis and visualization: Pandas, NumPy, Matplotlib, Seaborn,
and Plotly. By the end of the day, you'll be able to manipulate, analyze, and visualize data
with confidence.
1. Pandas: Data Manipulation Made Easy
What is Pandas & Why It’s Important?
Pandas is one of the most widely used Python libraries for data analysis. It provides two
intuitive structures, Series and DataFrame, for cleaning, transforming, and analyzing tabular
data efficiently.
A. Core Data Structures
Series: A one-dimensional labeled array.
DataFrame: A two-dimensional labeled data table (think spreadsheet).
import pandas as pd
# Series example
grades = pd.Series([85, 90, 95], index=['Alice', 'Bob', 'Charlie'])
# DataFrame example
data = {'Name': ['Alice', 'Bob'], 'Grade': [85, 90]}
df = pd.DataFrame(data)
B. Reading and Writing Data
# Read from CSV
df = pd.read_csv('data.csv')
# Write to Excel
df.to_excel('output.xlsx', index=False)
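To try reading and writing without an existing data file, a minimal sketch can write a small DataFrame to a temporary CSV and read it back. The file name scores.csv and the sample values are assumptions for illustration, not workshop data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for illustration
df_out = pd.DataFrame({"Name": ["Alice", "Bob"], "Grade": [85, 90]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "scores.csv"
    # index=False keeps the row index out of the file
    df_out.to_csv(csv_path, index=False)
    # Read the same file back into a new DataFrame
    df_in = pd.read_csv(csv_path)

print(df_in)
```

Writing .xlsx files with to_excel additionally needs an Excel engine such as openpyxl installed.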
C. Data Cleaning and Preprocessing
Handling missing values:
df.isnull().sum() # Find missing
df['col'] = df['col'].fillna(0) # Replace missing (assign back; inplace on a selected column is unreliable)
Dropping duplicates/rows:
df.drop_duplicates(inplace=True)
df.dropna(subset=['col'], inplace=True)
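The cleaning steps can be tried end to end on a tiny table. This is a sketch on hypothetical data; the column names player and runs are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one duplicated row and one missing value
df_raw = pd.DataFrame({
    "player": ["A", "A", "B", "C"],
    "runs": [40.0, 40.0, np.nan, 55.0],
})

# Count missing values per column
missing_counts = df_raw.isnull().sum()
print(missing_counts)

# Drop the repeated ("A", 40.0) row, then replace the missing value with 0
df_clean = df_raw.drop_duplicates()
df_clean = df_clean.assign(runs=df_clean["runs"].fillna(0))
print(df_clean)
```

Whether 0 is a sensible replacement depends on the data: filling with 0 lowers any average computed later, so dropping the row with dropna may be the better choice in some analyses.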
D. Filtering, Grouping, and Aggregating
# Filtering rows
df_filtered = df[df['Grade'] > 85]
# Grouping and aggregating
df_grouped = df.groupby('Name')['Grade'].mean()
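Grouping is easier to follow when a key repeats across rows. The sketch below uses a hypothetical scores table (an assumption for illustration) to show a boolean filter, a single aggregate, and several aggregates at once.

```python
import pandas as pd

# Hypothetical grades: two rows per student
scores = pd.DataFrame({
    "Name": ["Alice", "Alice", "Bob", "Bob"],
    "Grade": [80, 90, 70, 100],
})

# Boolean mask: keep only rows where Grade is above 85
high = scores[scores["Grade"] > 85]

# One mean per student
mean_by_name = scores.groupby("Name")["Grade"].mean()

# Several aggregates per student in one call
summary = scores.groupby("Name")["Grade"].agg(["mean", "max", "count"])
print(summary)
```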
E. Efficient Manipulation Tips
Use df.loc and df.iloc for indexing
Prefer vectorized operations over loops for speed
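As a minimal sketch of both tips, assuming a small hypothetical table indexed by student name: loc selects by label, iloc selects by integer position, and a vectorized expression replaces an explicit loop.

```python
import pandas as pd

# Hypothetical data for illustration
df_demo = pd.DataFrame(
    {"Grade": [85, 90, 95]},
    index=["Alice", "Bob", "Charlie"],
)

by_label = df_demo.loc["Bob", "Grade"]   # label-based: row "Bob", column "Grade"
by_position = df_demo.iloc[1, 0]         # position-based: second row, first column

# Vectorized: one expression applies to the whole column
df_demo["Curved"] = df_demo["Grade"] + 5

# Equivalent loop, shown only for comparison; slower on large data
curved_loop = [g + 5 for g in df_demo["Grade"]]
```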
Exercise: Load a CSV of IPL player scores, clean missing values, and compute average runs per
player.
2. NumPy: Fast Numerical Computing
What is NumPy?
NumPy is the backbone of fast numerical computing in Python. Its arrays support calculations
that run much faster than equivalent loops over standard lists.
A. Arrays vs. Lists
import numpy as np
arr = np.array([1, 2, 3, 4])
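Arrays store elements of a single type in one block of memory, so for large numeric data they
typically use less memory than lists and are much faster for calculations.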
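A quick way to see the difference in behavior is to apply the same operator to a list and to an array. This is an illustrative sketch; the variable names are arbitrary.

```python
import numpy as np

values = [1, 2, 3, 4]
arr_demo = np.array(values)

doubled_list = values * 2     # list repetition: [1, 2, 3, 4, 1, 2, 3, 4]
doubled_arr = arr_demo * 2    # element-wise math: array([2, 4, 6, 8])

# Every element shares one dtype; nbytes is the size of the raw data buffer
print(arr_demo.dtype, arr_demo.itemsize, arr_demo.nbytes)
```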
B. Basic Operations
arr2 = arr * 2 # Vectorized multiplication
sum_arr = np.sum(arr)
mean_arr = np.mean(arr)
C. Shapes & Dimensions
matrix = np.array([[1, 2], [3, 4]])
matrix.shape # (2, 2)
matrix.flatten() # 1D version
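Shapes can also be changed after an array is created. The following sketch, using arbitrary example values, reshapes a 1D array into a grid and shows that flatten() returns a copy.

```python
import numpy as np

flat = np.arange(6)          # array([0, 1, 2, 3, 4, 5])
grid = flat.reshape(2, 3)    # 2 rows, 3 columns

print(grid.ndim)             # number of dimensions: 2
print(grid.shape)            # (2, 3)
print(grid.T.shape)          # transpose swaps the axes: (3, 2)

# flatten() returns a copy, so editing it leaves grid unchanged
copy_1d = grid.flatten()
copy_1d[0] = 99
print(grid[0, 0])            # still 0
```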
D. Indexing & Slicing
arr[:2] # first two elements
matrix[1, 0] # row 1, column 0
E. Broadcasting
Broadcasting lets NumPy operate on arrays of different but compatible shapes by virtually
stretching the smaller array to match the larger one, without copying data.
arr = np.array([1, 2, 3])
arr + 10 # [11, 12, 13]
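Broadcasting also works between arrays, not only between an array and a single number. NumPy compares shapes from the last dimension backwards, and two dimensions are compatible when they are equal or one of them is 1. A small sketch with made-up values:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])        # shape (3,)
col = np.array([[100], [200]])      # shape (2, 1)

plus_row = table + row   # row is added to every row of table
plus_col = table + col   # col is added to every column of table

# Shapes (2, 3) and (2,) are not compatible: 3 and 2 differ and neither is 1
try:
    table + np.array([1, 2])
    shapes_ok = True
except ValueError:
    shapes_ok = False
```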
Exercise: Create a 2D array and compute the sum of each row.
3. Matplotlib: Foundation of Data Visualization
Purpose:
Matplotlib is the classic plotting library for static, publication-quality graphics.
A. Basic Plotting
import matplotlib.pyplot as plt
# Line plot
plt.plot([1, 2, 3, 4], [10, 20, 15, 25])
plt.title('Growth Over Time')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
# Scatter plot
plt.scatter([1,2,3], [4,5,6])
plt.show()
# Bar chart
plt.bar(['A', 'B', 'C'], [10, 20, 5])
plt.show()
B. Customization
Add legend: plt.legend(['Label'])
Change colors, linewidths, markers
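These options are passed as keyword arguments to the plotting call. The sketch below is one possible styling; the series values and the labels "Series A" and "Series B" are hypothetical.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# color, linewidth, linestyle, and marker are keyword arguments of plot()
ax.plot([1, 2, 3, 4], [10, 20, 15, 25],
        color="tab:blue", linewidth=2, linestyle="--", marker="o",
        label="Series A")   # hypothetical label
ax.plot([1, 2, 3, 4], [5, 12, 18, 22],
        color="tab:orange", linewidth=1, marker="s",
        label="Series B")   # hypothetical label

ax.set_title("Growth Over Time")
ax.legend()   # builds the legend from the label= values
```

Call plt.show() afterwards to display the figure. Setting label= on each line and then calling legend() keeps labels attached to the right series even if the plotting order changes.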
C. Saving Plots
plt.savefig('my_plot.png') # Call before plt.show(); saving afterwards can produce a blank image
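savefig accepts options that control the output quality. A minimal sketch, assuming a temporary folder as the destination so that nothing is left behind; in practice you would pass an ordinary file path.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(["A", "B", "C"], [10, 20, 5])
ax.set_title("Example bar chart")

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = Path(tmp_dir) / "my_plot.png"
    # dpi sets the resolution; bbox_inches="tight" trims extra margins
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    saved_ok = out_path.exists() and out_path.stat().st_size > 0

plt.close(fig)  # release the figure once it has been written
```

The file format follows the extension, so a name ending in .pdf or .svg produces a vector file instead of a PNG.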
Exercise: Visualize IPL top run scorers as a bar plot.
4. Seaborn: Beautiful Statistical Plots
What is Seaborn?
Built on Matplotlib, Seaborn automates attractive formatting and provides statistical
visualizations.
A. Attractive Themes
import seaborn as sns
sns.set_style('darkgrid')
B. Key Plot Types
Distribution Plot: Shows data spread
sns.histplot(df['runs'])
Categorical Plot: Compare categories
sns.boxplot(x='team', y='runs', data=df)
Heatmap: Display matrix data
sns.heatmap(df.corr(numeric_only=True)) # correlation matrix of df's numeric columns
C. Combining with Matplotlib
fig, ax = plt.subplots()
sns.violinplot(x='team', y='runs', data=df, ax=ax)
plt.show()
D. Customizing Colors
sns.set_palette('coolwarm')
Exercise: Make a boxplot comparing player runs by team.
5. Plotly: Interactive, Web-Ready Visualizations
What is Plotly?
Plotly creates interactive charts that you can hover over, zoom into, or embed in web apps,
which makes it well suited to modern dashboards.
A. Interactive Plots
import plotly.express as px
# Interactive line plot
fig = px.line(df, x='Match', y='Score', title='IPL Match Scores')
fig.show()
# Interactive bar chart
fig = px.bar(df, x='Player', y='Runs', color='Team')
fig.show()
B. Embedding in Web Apps
Dashboards via Plotly Dash
Save as HTML: fig.write_html('plot.html')
C. Customizing Interactivity
Tooltips with additional info
Enable/disable zoom/pan
D. Plotly vs. Others
Plotly: interactive, web-based
Matplotlib/Seaborn: static, publication-friendly
Exercise: Create an interactive IPL run distribution that lets you filter by year.
Final Tips and Pitfalls
Always check data shapes before analysis.
Use .head(), .info(), .describe() for exploration.
Avoid explicit Python loops over pandas and NumPy data; use vectorized functions instead.
Plot small examples to test before applying on large datasets.
Break problems into single steps; ask questions when stuck!
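The exploration tips can be run as a first step on any new dataset. A minimal sketch on a hypothetical players table:

```python
import pandas as pd

# Hypothetical data for illustration
players = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "runs": [40, 55, 23, 71],
})

print(players.shape)         # (rows, columns)
print(players.head(2))       # first two rows
players.info()               # column dtypes and non-null counts (prints directly)
stats = players.describe()   # count, mean, std, min, quartiles, max for numeric columns
print(stats)
```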
Conclusion and Next Steps
By practicing with these libraries, trying the hands-on exercises above, and consulting official
tutorials, you'll build strong foundations for all future Python data projects. Success comes from
experimenting, asking questions, and building real projects. Keep exploring!