Plotting and Data Visualization
Introduction to Data Visualization
•Definition: Data visualization is the graphical representation of data and
information.
•Purpose: Helps in identifying patterns, trends, and outliers in data.
•Key Benefits:
•Makes complex data easier to understand.
•Improves decision-making.
•Reveals relationships between variables.
•Example: Visualizing the distribution of test scores in a class.
Types of Data Visualizations
•1. Bar Chart:
•Used for categorical data.
•Displays the frequency or value of different categories.
•2. Line Plot:
•Shows data trends over time (time series).
•Best for continuous data.
•3. Histogram:
•Used for continuous data to show the distribution.
•Bins data into intervals.
•4. Scatter Plot:
•Shows the relationship between two continuous variables.
•Useful for correlation and regression analysis.
•5. Box Plot:
•Displays the distribution summary (median, quartiles, outliers).
Creating a Bar Chart
•Bar charts are ideal for categorical data.
•Example: A bar chart showing the number of students in different grade categories (A, B, C, etc.).
Python code for creating a bar chart
import matplotlib.pyplot as plt
categories = ['A', 'B', 'C', 'D']
values = [25, 30, 15, 10]
plt.bar(categories, values, color='skyblue')
plt.xlabel('Grades')
plt.ylabel('Number of Students')
plt.title('Distribution of Grades in Class')
plt.show()
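The values list above is typed in by hand. When the raw data has one grade per student, the frequencies can be counted first. A minimal sketch, assuming a hypothetical list named grades (the sample data is invented for illustration):

```python
from collections import Counter

# Hypothetical raw data: one letter grade per student
grades = ['A', 'B', 'B', 'C', 'A', 'D', 'B', 'C', 'A', 'B']

# Count how many times each grade occurs
counts = Counter(grades)

# Sort the categories so the bars appear in a predictable order
categories = sorted(counts)
values = [counts[c] for c in categories]

print(categories)  # prints ['A', 'B', 'C', 'D']
print(values)      # prints [3, 4, 2, 1]
```

The resulting categories and values lists can be passed to plt.bar(categories, values) in the same way as the hand-typed lists.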
Creating a Line Plot
•Line plots are used to display trends over time.
•Example: A line plot showing the change in temperature over the course of a week.
Python code for creating a line plot
import matplotlib.pyplot as plt
days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
temperature = [22, 24, 19, 23, 25, 28, 26]
plt.plot(days, temperature, marker='o', linestyle='-', color='orange')
plt.xlabel('Day of Week')
plt.ylabel('Temperature (°C)')
plt.title('Temperature Trend Over a Week')
plt.show()
Creating a Histogram
•Histograms are used for continuous data and display the frequency of data points within certain
ranges (bins).
•Example: A histogram showing the distribution of heights of students in a class.
Python code for plotting a histogram
import matplotlib.pyplot as plt
import numpy as np
data = np.random.normal(170, 10, 1000) # Simulated data: Mean = 170, SD = 10
plt.hist(data, bins=30, color='green', edgecolor='black')
plt.xlabel('Height (cm)')
plt.ylabel('Frequency')
plt.title('Height Distribution of Students')
plt.show()
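To make the idea of bins concrete, the sketch below counts how many values fall into equal-width intervals, which is the calculation a histogram displays. The function name bin_counts and the sample heights are hypothetical and chosen only for illustration; plt.hist performs this step internally.

```python
def bin_counts(data, low, high, num_bins):
    """Count values in num_bins equal-width intervals covering [low, high]."""
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for x in data:
        if low <= x <= high:
            # The last bin also includes the upper edge (x == high)
            idx = min(int((x - low) / width), num_bins - 1)
            counts[idx] += 1
    return counts

# Hypothetical heights in cm
heights = [150, 155, 158, 160, 162, 165, 165, 170, 172, 180]

# Three bins: [150, 160), [160, 170), [170, 180]
counts = bin_counts(heights, 150, 180, 3)
print(counts)  # prints [3, 4, 3]
```

Fewer bins give a coarser picture of the distribution and more bins give a finer one, so it is worth trying several values of the bins argument.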
Creating a Scatter Plot
•Scatter plots are used to examine the relationship between two continuous variables.
•Example: Scatter plot showing the relationship between hours studied and exam scores.
Python code for a scatter plot
import matplotlib.pyplot as plt
hours_studied = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [55, 60, 65, 70, 72, 75, 78, 80, 85]
plt.scatter(hours_studied, exam_scores, color='red')
plt.xlabel('Hours Studied')
plt.ylabel('Exam Score')
plt.title('Relationship Between Hours Studied and Exam Scores')
plt.show()
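A scatter plot is often paired with a numerical measure of the relationship. The sketch below computes the Pearson correlation coefficient and a least-squares line for the same data using only the standard library. The variable names (sxx, syy, sxy, r, slope, intercept) are illustrative choices, not part of the original example.

```python
import math

hours_studied = [1, 2, 3, 4, 5, 6, 7, 8, 9]
exam_scores = [55, 60, 65, 70, 72, 75, 78, 80, 85]

n = len(hours_studied)
mean_x = sum(hours_studied) / n
mean_y = sum(exam_scores) / n

# Sums of squared deviations and of cross-products
sxx = sum((x - mean_x) ** 2 for x in hours_studied)
syy = sum((y - mean_y) ** 2 for y in exam_scores)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(hours_studied, exam_scores))

r = sxy / math.sqrt(sxx * syy)       # Pearson correlation coefficient
slope = sxy / sxx                    # slope of the least-squares line
intercept = mean_y - slope * mean_x  # intercept of the least-squares line

print(f"r = {r:.2f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

For this data r is close to 1, which indicates a strong positive linear relationship. The fitted line can be drawn over the scatter plot with plt.plot(hours_studied, [slope * x + intercept for x in hours_studied]).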
Creating a Box Plot
•Box plots (or box-and-whisker plots) summarize the distribution of data, highlighting the median, quartiles,
and potential outliers.
•Example: Box plot showing the test scores of students in multiple classes.
Python code for a box plot
import matplotlib.pyplot as plt
data = [ [70, 80, 90, 85, 88], [60, 65, 70, 75, 80], [85, 90, 95, 92, 98] ]
plt.boxplot(data, patch_artist=True, boxprops=dict(facecolor='skyblue', color='black'))
plt.xlabel('Classes')
plt.ylabel('Test Scores')
plt.title('Test Scores Distribution Across Three Classes')
plt.show()
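The numbers a box plot draws can also be computed directly. The sketch below finds the quartiles and flags outliers with the common 1.5 × IQR rule, which is also the default whisker length in plt.boxplot. The function name box_summary and the second data set are hypothetical. The 'inclusive' method of statistics.quantiles is used because it interpolates linearly between data points; treat exact agreement with the plotted box as an assumption to check.

```python
import statistics

def box_summary(scores):
    """Return the quartiles and the outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(scores, n=4, method='inclusive')
    iqr = q3 - q1                  # interquartile range
    lower_fence = q1 - 1.5 * iqr   # values below this are outliers
    upper_fence = q3 + 1.5 * iqr   # values above this are outliers
    outliers = [s for s in scores if s < lower_fence or s > upper_fence]
    return {'q1': q1, 'median': median, 'q3': q3, 'outliers': outliers}

# First class from the box plot example
summary = box_summary([70, 80, 90, 85, 88])
print(summary)

# Hypothetical variation: the same class with one very low score added
with_low_score = box_summary([70, 80, 90, 85, 88, 20])
print(with_low_score['outliers'])  # prints [20]
```

Points flagged as outliers are the ones a box plot draws as individual markers beyond the whiskers.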
Best Practices for Data Visualization
• Choose the right chart type: Select the visualization that best suits your data (e.g., bar chart for
categories, line plot for trends).
• Keep it simple: Avoid unnecessary clutter. Focus on the message.
• Label everything: Include axis labels, a chart title, a legend, and data-point labels where they add clarity.
• Use color effectively: Ensure enough color contrast for readability, and use color consistently (for example, the same color for the same category across charts).
• Be mindful of scale: Make sure the axis scales are appropriate for the data range.
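The practices above can be sketched in one short example: two labeled series, a legend, contrasting colors and markers, and an explicit y-axis range. The second series (city_b), the labels 'City A' and 'City B', and the chosen y-limits are hypothetical and added only for illustration.

```python
import matplotlib.pyplot as plt

days = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
city_a = [22, 24, 19, 23, 25, 28, 26]  # temperatures from the line plot example
city_b = [18, 20, 17, 19, 21, 24, 22]  # hypothetical second series

fig, ax = plt.subplots()

# Contrasting colors and different markers keep the two series distinguishable
ax.plot(days, city_a, marker='o', color='orange', label='City A')
ax.plot(days, city_b, marker='s', color='blue', label='City B')

# Label everything: axes, title, and legend
ax.set_xlabel('Day of Week')
ax.set_ylabel('Temperature (°C)')
ax.set_title('Temperature Trend Over a Week')
ax.legend()

# Be mindful of scale: a fixed range starting at zero avoids exaggerating small changes
ax.set_ylim(0, 35)
```

Call plt.show() to display the figure, or fig.savefig('temperatures.png') to write it to a file.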