Data Visualization in Python - Lesson & Review
1. Introduction to Data Visualization
Data visualization is the graphical representation of information and data. It helps you see patterns, trends,
and outliers in datasets.
Why it's important:
- Makes data easier to understand
- Helps in decision-making
- Communicates findings effectively
Popular Python Libraries:
- matplotlib
- seaborn
- plotly
- pandas (built-in plotting, powered by matplotlib)
- plotly.express (plotly's high-level interface for interactive visuals)
2. Getting Started with Matplotlib
Example Code:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]
plt.plot(x, y)
plt.title('Simple Line Chart')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.show()
Common Plot Types:
- Line plot: plt.plot()
- Bar chart: plt.bar()
- Histogram: plt.hist()
- Scatter plot: plt.scatter()
- Pie chart: plt.pie()
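The plot types listed above can be compared side by side. The following is a minimal sketch, not a definitive recipe: the sample data is invented for illustration, and the Agg backend line is an assumption that lets the code run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical sample data, invented for illustration
categories = ["A", "B", "C"]
counts = [5, 9, 3]
values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]

# One row of four panels, one plot type per panel
fig, axes = plt.subplots(1, 4, figsize=(14, 3))

axes[0].bar(categories, counts)          # bar chart: one bar per category
axes[0].set_title("Bar chart")

axes[1].hist(values, bins=5)             # histogram: counts of values per bin
axes[1].set_title("Histogram")

axes[2].scatter(x, y)                    # scatter plot: one point per (x, y) pair
axes[2].set_title("Scatter plot")

axes[3].pie(counts, labels=categories)   # pie chart: share of each category
axes[3].set_title("Pie chart")

fig.tight_layout()
```

To view the figure in a window, remove the matplotlib.use("Agg") line and call plt.show() at the end.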
3. Seaborn for Statistical Plots
Example Code:
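import matplotlib.pyplot as plt
import seaborn as sns
# load_dataset downloads the sample 'tips' dataset and returns a pandas DataFrame
df = sns.load_dataset('tips')
# kde=True overlays a smoothed density curve on the histogram
sns.histplot(data=df, x='total_bill', kde=True)
plt.title('Histogram of Total Bill')
plt.show()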
Popular Plots:
- Histogram: sns.histplot()
- Boxplot: sns.boxplot()
- Violin plot: sns.violinplot()
- Scatter with regression: sns.lmplot()
- Heatmap: sns.heatmap()
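Two of the plots listed above, the boxplot and the heatmap, can be sketched on a small inline table. This is an illustrative sketch only: the DataFrame is hypothetical (made up so the code runs without downloading a dataset), and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical restaurant-style data, made up for illustration
df = pd.DataFrame({
    "day": ["Thu", "Thu", "Fri", "Fri", "Sat", "Sat", "Sun", "Sun"],
    "total_bill": [12.5, 18.0, 22.3, 15.8, 30.1, 27.4, 19.9, 24.6],
    "tip": [2.0, 3.0, 3.5, 2.5, 5.0, 4.0, 3.0, 4.2],
})

fig, (ax_box, ax_heat) = plt.subplots(1, 2, figsize=(10, 4))

# Boxplot: one box of total_bill per day
sns.boxplot(data=df, x="day", y="total_bill", ax=ax_box)
ax_box.set_title("Total Bill by Day")

# Heatmap: correlation matrix of the numeric columns
corr = df[["total_bill", "tip"]].corr()
sns.heatmap(corr, annot=True, vmin=-1, vmax=1, ax=ax_heat)
ax_heat.set_title("Correlation Heatmap")

fig.tight_layout()
```

Passing ax= places each seaborn plot on a specific Matplotlib panel, which is one common way to combine several statistical plots in a single figure.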
4. Customizing Your Plots
Key Customizations:
- Titles and labels
- Colors: color='red'
- Line styles: linestyle='--'
- Markers: marker='o'
- Legends: plt.legend()
Example:
# Reuses x and y from the line chart in Section 2
plt.plot(x, y, color='green', linestyle='--', marker='o', label='Growth')
plt.legend()
plt.show()
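The customizations in this section matter most when a chart has more than one series. Here is a small self-contained sketch, assuming a hypothetical second series named target that is not part of the original lesson, and using the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
growth = [2, 4, 1, 8, 7]   # same values as the line chart in Section 2
target = [3, 3, 4, 6, 8]   # hypothetical second series for comparison

fig, ax = plt.subplots()

# Distinct color, line style, and marker make the two series easy to tell apart
ax.plot(x, growth, color="green", linestyle="--", marker="o", label="Growth")
ax.plot(x, target, color="red", linestyle="-", marker="s", label="Target")

ax.set_title("Growth vs Target")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")

# The legend is built from the label= arguments passed to plot
ax.legend()
```

The ax.set_title, ax.set_xlabel, and ax.set_ylabel methods are the Axes-level counterparts of plt.title, plt.xlabel, and plt.ylabel.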
5. Using Pandas for Quick Visuals
Example Code:
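import matplotlib.pyplot as plt
import pandas as pd
data = {'sales': [100, 200, 150], 'profit': [30, 70, 50]}
df = pd.DataFrame(data, index=['Jan', 'Feb', 'Mar'])
# Each column becomes a group of bars; the index supplies the x-axis labels
df.plot(kind='bar')
plt.title('Sales vs Profit')
plt.show()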
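Because pandas plotting is built on Matplotlib, the object returned by df.plot() can be customized like any other Matplotlib Axes. The sketch below reuses the sales and profit figures from the example above; the axis labels are illustrative choices, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = {"sales": [100, 200, 150], "profit": [30, 70, 50]}
df = pd.DataFrame(data, index=["Jan", "Feb", "Mar"])

# df.plot returns a Matplotlib Axes, so the usual Axes methods apply
ax = df.plot(kind="line", marker="o")
ax.set_title("Sales and Profit by Month")
ax.set_xlabel("Month")
ax.set_ylabel("Amount")
```

Changing kind to 'bar', 'hist', or 'scatter' switches the chart type without changing the rest of the code (a scatter plot also needs x= and y= column names).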
6. Interactive Visuals with Plotly
Example Code:
import plotly.express as px
df = px.data.iris()
fig = px.scatter(df, x='sepal_width', y='sepal_length', color='species')
fig.show()
Review Questions and Answers
Q1. What is data visualization and why is it important?
Answer: It is the graphical representation of data to help people understand patterns, trends, and insights. It
makes complex data easier to interpret and communicate.
Q2. Name three Python libraries used for data visualization.
Answer: matplotlib, seaborn, plotly
Q3. Write Python code to create a bar chart of fruits and their quantities.
Answer:
import matplotlib.pyplot as plt
fruits = ['Apples', 'Bananas', 'Cherries']
quantities = [10, 15, 7]
plt.bar(fruits, quantities)
plt.title('Fruit Quantities')
plt.show()
Q4. What is the difference between sns.histplot() and sns.boxplot()?
Answer: sns.histplot() shows the frequency distribution of a single variable as binned counts, while
sns.boxplot() summarizes a distribution through its median, quartiles, and outliers.
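The difference can be made concrete by computing the numbers each plot is built from. This is an illustrative sketch using NumPy rather than seaborn itself: the bill amounts are hypothetical, and the 1.5 * IQR cutoff is the commonly used default rule for flagging boxplot outliers.

```python
import numpy as np

# Hypothetical bill amounts with one unusually large value
bills = np.array([10, 12, 13, 15, 16, 18, 20, 22, 25, 60])

# What a histogram reports: how many values fall in each bin
counts, bin_edges = np.histogram(bills, bins=5)

# What a boxplot reports: quartiles, plus points beyond 1.5 * IQR as outliers
q1, median, q3 = np.percentile(bills, [25, 50, 75])
iqr = q3 - q1
outliers = bills[(bills < q1 - 1.5 * iqr) | (bills > q3 + 1.5 * iqr)]

print("bin counts:", counts)
print("median:", median, "quartiles:", q1, q3)
print("outliers:", outliers)
```

The histogram shows the overall shape of the data, while the boxplot summary singles out the value 60 as an outlier.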
Q5. How can you make an interactive plot in Python?
Answer: Use plotly.express. Example:
import plotly.express as px
df = px.data.iris()
px.scatter(df, x='sepal_width', y='sepal_length', color='species').show()
Q6. Which function would you use to label the x-axis in Matplotlib?
Answer: plt.xlabel('Your label')