Uploaded by

Meet Pardeshi
One-Day Intensive Python Data Analysis and Visualization Workshop

Welcome! This expertly crafted curriculum is designed for absolute beginners to master the
essential Python libraries for data analysis and visualization: Pandas, NumPy, Matplotlib,
Seaborn, and Plotly. By the end, you’ll be able to manipulate, analyze, and visualize data
confidently.

1. Pandas: Data Manipulation Made Easy


What is Pandas & Why It’s Important?
Pandas is the most widely used Python library for data analysis. It provides intuitive structures
(Series & DataFrame) to clean, transform, and analyze tabular data efficiently.

A. Core Data Structures


Series: A one-dimensional labeled array.
DataFrame: A two-dimensional labeled data table (think spreadsheet).

import pandas as pd

# Series example
grades = pd.Series([85, 90, 95], index=['Alice', 'Bob', 'Charlie'])

# DataFrame example
data = {'Name': ['Alice', 'Bob'], 'Grade': [85, 90]}
df = pd.DataFrame(data)
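
To see how the two structures relate, here is a minimal sketch that reuses the grades from the example above. The variable names bob_grade and grade_column are only illustrative.

```python
import pandas as pd

# Same illustrative grades as in the example above
grades = pd.Series([85, 90, 95], index=["Alice", "Bob", "Charlie"])
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Grade": [85, 90]})

bob_grade = grades["Bob"]      # label-based lookup on a Series
grade_column = df["Grade"]     # selecting one column returns a Series
n_rows, n_cols = df.shape      # (rows, columns)

print(bob_grade)                      # 90
print(type(grade_column).__name__)    # Series
print(n_rows, n_cols)                 # 2 2
```

Each column of a DataFrame is itself a Series, so most of what you learn on one applies to the other.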

B. Reading and Writing Data

# Read from CSV
df = pd.read_csv('data.csv')

# Write to Excel
df.to_excel('output.xlsx', index=False)
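
If you do not have a data.csv at hand, the read/write cycle can be tried entirely in memory. The sketch below uses made-up CSV text and io.StringIO in place of real files. Note that to_excel additionally needs an Excel engine such as openpyxl to be installed, so this sketch sticks to CSV.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a file on disk
csv_text = "Name,Grade\nAlice,85\nBob,90\n"
df = pd.read_csv(io.StringIO(csv_text))

# Write the DataFrame back out as CSV text, without the index column
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Reading the written text again should give back the same table
round_trip = pd.read_csv(io.StringIO(buffer.getvalue()))
print(round_trip.equals(df))
```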

C. Data Cleaning and Preprocessing
Handling missing values:

df.isnull().sum() # Count missing values per column
df['col'] = df['col'].fillna(0) # Replace missing values with 0

Dropping duplicates/rows:

df.drop_duplicates(inplace=True)
df.dropna(subset=['col'], inplace=True)
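
Here is a small worked example of the cleaning steps above, as a sketch on made-up data. The column names player, runs, and team are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Made-up data with missing values and one duplicated row
df = pd.DataFrame({
    "player": ["A", "B", "B", "C", "D"],
    "runs": [50.0, np.nan, np.nan, 30.0, 70.0],
    "team": ["X", "Y", "Y", None, "X"],
})

missing_per_column = df.isnull().sum()   # runs: 2, team: 1
df["runs"] = df["runs"].fillna(0)        # missing runs become 0
df = df.drop_duplicates()                # the repeated "B" row is dropped
df = df.dropna(subset=["team"])          # the row without a team is dropped

print(missing_per_column)
print(df)
```

Assigning the result back (df = ...) instead of using inplace=True makes each step explicit, which some people find easier to follow.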

D. Filtering, Grouping, and Aggregating

# Filtering rows
df_filtered = df[df['Grade'] > 85]

# Grouping and aggregating
df_grouped = df.groupby('Name')['Grade'].mean()
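
Grouping is easier to see when a name appears more than once. The sketch below uses made-up grades with two rows per student. The agg call is one possible way to compute several statistics at once.

```python
import pandas as pd

# Made-up grades: two rows per student
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice", "Bob"],
    "Grade": [85, 90, 95, 70],
})

# Keep only rows with a grade above 85 (Bob 90 and Alice 95)
df_filtered = df[df["Grade"] > 85]

# Average grade per student: Alice 90.0, Bob 80.0
df_grouped = df.groupby("Name")["Grade"].mean()

# Several aggregations in one call
summary = df.groupby("Name")["Grade"].agg(["mean", "max", "count"])

print(df_filtered)
print(df_grouped)
print(summary)
```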

E. Efficient Manipulation Tips


Use df.loc and df.iloc for indexing
Prefer vectorized operations over loops for speed
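
Both tips can be sketched on a tiny made-up table. The index labels p1 to p3 and the strike_rate column are hypothetical names used only for this illustration.

```python
import pandas as pd

# Made-up batting data with string index labels
df = pd.DataFrame(
    {"runs": [40, 55, 10], "balls": [30, 40, 12]},
    index=["p1", "p2", "p3"],
)

by_label = df.loc["p2", "runs"]   # loc selects by label -> 55
by_position = df.iloc[0, 1]       # iloc selects by integer position -> 30

# Vectorized: one expression operates on whole columns at once
df["strike_rate"] = df["runs"] / df["balls"] * 100

# Equivalent explicit loop, shown only for comparison
looped = [r / b * 100 for r, b in zip(df["runs"], df["balls"])]

print(by_label, by_position)
print(df)
```

The vectorized line and the loop compute the same numbers, but the vectorized form is shorter and usually much faster on large tables.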
Exercise: Load a CSV of IPL player scores, clean missing values, and compute average runs per
player.

2. NumPy: Fast Numerical Computing


What is NumPy?
NumPy is the backbone of high-speed numerical computing in Python. Its arrays are
much faster than standard Python lists for numerical work.

A. Arrays vs. Lists

import numpy as np
arr = np.array([1, 2, 3, 4])

Arrays store elements of a single type in one contiguous block, so they typically use less memory and are much faster for calculations.
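
A quick way to see the difference is to apply the same operator to a list and to an array. This is a minimal sketch, and the printed dtype and byte count depend on your platform.

```python
import numpy as np

values = [1, 2, 3, 4]
arr = np.array(values)

# On a list, * repeats the list; on an array, it multiplies each element
repeated = values * 2                   # [1, 2, 3, 4, 1, 2, 3, 4]
doubled_arr = arr * 2                   # array([2, 4, 6, 8])
doubled_list = [v * 2 for v in values]  # a list needs an explicit loop

# Every element shares one dtype, so total memory is size * itemsize
print(arr.dtype, arr.nbytes)
print(repeated)
print(doubled_arr)
```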
B. Basic Operations

arr2 = arr * 2 # Vectorized multiplication
sum_arr = np.sum(arr)
mean_arr = np.mean(arr)

C. Shapes & Dimensions

matrix = np.array([[1, 2], [3, 4]])
matrix.shape # (2, 2)
matrix.flatten() # 1D copy: [1, 2, 3, 4]
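
The sketch below explores a few more shape-related attributes on a small array built with np.arange. It also shows that flatten() returns a copy, so editing the flattened array leaves the original untouched.

```python
import numpy as np

arr = np.arange(6)            # array([0, 1, 2, 3, 4, 5])
matrix = arr.reshape(2, 3)    # 2 rows, 3 columns

print(matrix.shape)           # (2, 3)
print(matrix.ndim)            # 2
print(matrix.size)            # 6
print(matrix.T.shape)         # (3, 2), the transpose swaps the axes

flat = matrix.flatten()       # flatten() returns a copy
flat[0] = 99
print(matrix[0, 0])           # 0, the original is unchanged
```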

D. Indexing & Slicing

arr[:2] # first two elements
matrix[1, 0] # row 1, column 0
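
A few more indexing patterns are worth trying, sketched here on made-up arrays. Remember that positions start at 0.

```python
import numpy as np

arr = np.array([10, 20, 30, 40])
matrix = np.array([[1, 2], [3, 4]])

first_two = arr[:2]        # array([10, 20])
last = arr[-1]             # 40, negative indices count from the end
second_row = matrix[1]     # array([3, 4])
first_col = matrix[:, 0]   # array([1, 3]), all rows of column 0
large = arr[arr > 15]      # array([20, 30, 40]), boolean mask

print(first_two, last, second_row, first_col, large)
```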

E. Broadcasting
Broadcasting lets NumPy combine arrays of different but compatible shapes by virtually stretching the smaller one to match the larger.

arr = np.array([1, 2, 3])
arr + 10 # [11, 12, 13]
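
Broadcasting also works between arrays, not just between an array and a single number. The sketch below adds a row and a column to a made-up 2x3 matrix, and shows that incompatible shapes raise an error.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
col = np.array([[100], [200]])              # shape (2, 1)

plus_row = matrix + row   # row is added to every row of matrix
plus_col = matrix + col   # col is added to every column of matrix

print(plus_row)   # [[11 22 33] [14 25 36]]
print(plus_col)   # [[101 102 103] [204 205 206]]

# Shapes (2, 3) and (2,) do not line up, so NumPy refuses to guess
try:
    matrix + np.array([1, 2])
    failed = False
except ValueError:
    failed = True
print(failed)     # True
```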

Exercise: Create a 2D array and compute the sum of each row.

3. Matplotlib: Foundation of Data Visualization


Purpose:
Matplotlib is the classic plotting library for static, publication-quality graphics.

A. Basic Plotting

import matplotlib.pyplot as plt

# Line plot
plt.plot([1, 2, 3, 4], [10, 20, 15, 25])
plt.title('Growth Over Time')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

# Scatter plot
plt.scatter([1,2,3], [4,5,6])
plt.show()

# Bar chart
plt.bar(['A', 'B', 'C'], [10, 20, 5])
plt.show()

B. Customization
Add legend: plt.legend(['Label'])
Change colors, linewidths, and markers with the color, linewidth, and marker arguments
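
These options can be combined in a single plot. The sketch below uses made-up values and hypothetical series labels ("Team A", "Team B"). It selects the non-interactive Agg backend so it runs without opening a window, so drop that line if you want plt.show() to display the figure.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display windows
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
fig, ax = plt.subplots()

# label= gives each line its legend entry
ax.plot(x, [10, 20, 15, 25], color="tab:blue", linewidth=2,
        marker="o", label="Team A")
ax.plot(x, [5, 15, 20, 30], color="tab:orange", linestyle="--",
        marker="s", label="Team B")

ax.set_title("Growth Over Time")
ax.set_xlabel("Time")
ax.set_ylabel("Value")
ax.legend()      # picks up the label= values
ax.grid(True)
```

Passing label= to each plot call and then calling legend() with no arguments keeps labels next to the data they describe.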

C. Saving Plots

plt.savefig('my_plot.png') # Call before plt.show(); afterwards the figure may be empty
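
A slightly fuller sketch of saving a figure follows. It writes to a temporary folder so it does not clutter your working directory. The dpi and bbox_inches values are just reasonable choices, not requirements.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window needed to save
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["A", "B", "C"], [10, 20, 5])
ax.set_title("Example bar chart")

# Save into a temporary directory; use any path you like in practice
out_path = os.path.join(tempfile.mkdtemp(), "my_plot.png")
fig.savefig(out_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # free the figure once it is saved

print(os.path.getsize(out_path) > 0)
```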

Exercise: Visualize IPL top run scorers as a bar plot.

4. Seaborn: Beautiful Statistical Plots


What is Seaborn?
Built on Matplotlib, Seaborn automates attractive formatting and provides statistical
visualizations.

A. Attractive Themes

import seaborn as sns

sns.set_style('darkgrid')

B. Key Plot Types


Distribution Plot: Shows data spread

sns.histplot(df['runs'])

Categorical Plot: Compare categories

sns.boxplot(x='team', y='runs', data=df)

Heatmap: Display matrix data

sns.heatmap(df.corr(numeric_only=True)) # correlations between numeric columns

C. Combining with Matplotlib

fig, ax = plt.subplots()
sns.violinplot(x='team', y='runs', data=df, ax=ax)
plt.show()
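
Because Seaborn draws onto Matplotlib axes, several Seaborn plots can share one figure. The sketch below uses a made-up DataFrame with hypothetical team, runs, and balls columns, and places a boxplot next to a correlation heatmap. It uses the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to display windows
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up scores for two hypothetical teams
df = pd.DataFrame({
    "team": ["X", "X", "X", "Y", "Y", "Y"],
    "runs": [30, 45, 60, 20, 35, 80],
    "balls": [25, 30, 40, 22, 30, 50],
})

sns.set_style("darkgrid")
fig, (ax_box, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Left panel: runs per team
sns.boxplot(x="team", y="runs", data=df, ax=ax_box)
ax_box.set_title("Runs by team")

# Right panel: correlations between the numeric columns only
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap="coolwarm", ax=ax_heat)
ax_heat.set_title("Correlation")

fig.tight_layout()
```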

D. Customizing Colors

sns.set_palette('coolwarm')

Exercise: Make a boxplot comparing player runs by team.

5. Plotly: Interactive, Web-Ready Visualizations


What is Plotly?
Plotly lets you create interactive charts that you can hover over, zoom into, or embed in web apps,
which makes it essential for modern dashboards.

A. Interactive Plots

import plotly.express as px

# Interactive line plot
fig = px.line(df, x='Match', y='Score', title='IPL Match Scores')
fig.show()

# Interactive bar chart
fig = px.bar(df, x='Player', y='Runs', color='Team')
fig.show()

B. Embedding in Web Apps


Build dashboards with Plotly Dash
Save as HTML: fig.write_html('plot.html')

C. Customizing Interactivity
Tooltips with additional info
Enable/disable zoom/pan

D. Plotly vs. Others


Plotly: interactive, web-based
Matplotlib/Seaborn: static, publication-friendly
Exercise: Create an interactive IPL run distribution that lets you filter by year.

Final Tips and Pitfalls
Always check data shapes before analysis.
Use .head(), .info(), .describe() for exploration.
Avoid loops within pandas and NumPy; use vectorized functions.
Plot small examples to test before applying on large datasets.
Break problems into single steps; ask questions when stuck!
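
The exploration helpers from the tips above can be tried on any DataFrame. Here is a minimal sketch on made-up data.

```python
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({
    "player": ["A", "B", "C", "D"],
    "runs": [10, 20, 30, 40],
})

print(df.shape)       # (4, 2): rows, columns
print(df.head(3))     # first three rows
df.info()             # column names, non-null counts, dtypes
stats = df.describe() # summary statistics for numeric columns
print(stats)
```

Running these four calls right after loading a dataset catches most surprises, such as unexpected missing values or wrong column types, before they affect your analysis.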

Conclusion and Next Steps


By practicing with these libraries, trying the hands-on exercises above, and consulting the official
tutorials, you’ll build strong foundations for all future Python data projects. Success comes from
experimenting, asking questions, and building real projects, so keep exploring!
