IT_R23_Skills Development-DATA VISUALIZATION Lab
LABORATORY MANUAL
B.TECH IT
(II YEAR–I SEM)(2024-2025)
PREPARED BY
Mr. K Venkaiah
Assistant Professor
COURSE OBJECTIVES:
Students will be able to
1. Understand the importance of data visualization for business intelligence and
decision making.
2. Know approaches to understand visual perception
3. Learn about categories of visualization and application areas
4. Familiarize with the data visualization tools
5. Gain knowledge of effective data visuals to solve workplace problems
COURSE OUTCOMES:
At the end of the course, Students will be able to:
1. Use Python, R and Tableau for data visualization
2. Apply data visuals to convey trends in data over time using tableau
3. Construct effective data visuals to solve workplace problems
4. Explore and work with different plotting libraries
5. Learn and create effective visualizations
Reference Books:
1. Data Visualization with Python: Create an Impact with Meaningful Data Insights Using
Interactive and Engaging Visuals, Mario Dobler, Tim Großmann, Packt Publishing, 2019
2. Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master, Ryan
Sleeper, O'Reilly Media, 2018
3. Data Visualization with R: 111 Examples, Thomas Rahlf, Springer, 2020
DV Lab 2023-24
TABLE OF CONTENTS
3 Basic Visualization in R
Data visualization is the practice of translating information into a visual context, such as a
map or graph, to make data easier for the human brain to understand and pull insights from. It
is the representation of information and data through use of common graphics, such as charts,
plots, infographics, and animations. Data visualization is a powerful way for people, especially
data professionals, to display data so that it can be interpreted easily.
Data Visualization enables decision-makers of any enterprise or industry to look into analytical
reports and understand concepts that might otherwise be difficult to grasp.
Benefits of Data Visualization:
1. Graphics make information easier to understand
2. Presents data in an attractive way
3. Shows complex relationships
4. Helps to process large datasets
5. Useful for identifying trends
6. Minimizes ambiguity
Data visualization tools provide the ability to see and understand trends, outliers, and
patterns in data in an easy, intuitive way. Various data visualization tools are available. One
should choose a tool based on factors such as its ease of use, the types of graphical
representations it can produce, and the size of dataset it can handle. Some data
visualization tools are Tableau, Power BI, Google Charts, Jupyter, and Grafana.
Department of IT Page 1
Python has several modules for visualizing data, such as Matplotlib and Seaborn. Matplotlib is a
comprehensive library for creating static, animated, and interactive visualizations in Python. It
primarily produces 2D graphics. Seaborn is a visualization library built on top of Matplotlib.
It provides a higher-level interface for statistical graphics with more polished default styling.
Matplotlib can be installed using the following command:
pip install matplotlib
Once the module is installed, it must be imported into the program using the following command:
import matplotlib as mpl
where mpl is the alias given to the matplotlib library.
matplotlib.pyplot is a state-based interface to matplotlib: a collection of functions that make
matplotlib work like MATLAB. Each pyplot function makes some change to a figure, e.g., it
creates a figure, creates a plotting area in a figure, plots some lines in a plotting area, or
decorates the plot with labels. pyplot can be imported into the program using the following
command:
import matplotlib.pyplot as plt
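To make the "state-based" idea concrete, the following minimal sketch draws the same line twice: once through pyplot's implicit current figure, and once through explicit Figure and Axes objects. The non-interactive Agg backend and the in-memory buffer are assumptions made here so that the snippet runs without a display; in the lab you would normally call plt.show() instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 5, 7, 8]

# State-based style: pyplot keeps track of a "current" figure and axes.
plt.figure()
plt.plot(x, y)
plt.title("State-based interface")
implicit_axes = plt.gca()  # the axes pyplot has been drawing on
implicit_line_count = len(implicit_axes.lines)

# Object-oriented style: the figure and axes are held explicitly.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("Object-oriented interface")
explicit_line_count = len(ax.lines)

# Render the second figure to an in-memory PNG instead of opening a window.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_size = buffer.getbuffer().nbytes
plt.close("all")
```

Both styles produce the same kind of plot. The state-based style is shorter for quick scripts, while the explicit style is easier to follow once a program has several figures.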
The following are some of the basic data visualization plots:
1. Line plots
2. Area plots
3. Histograms
4. Bar charts
5. Pie charts
6. Box plots
7. Scatter plots
Line Plots:
A line plot is used to represent quantitative values over a continuous interval or time period. It
is generally used to show how data has changed over time.
Example:
Program:
import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5, 6]
y = [1, 5, 3, 5, 7, 8]
plt.plot(x, y)
plt.show()
Output:
Area Plots:
An area plot, also called an area chart, displays the magnitude and proportion of
multiple variables.
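The stacking behind an area plot can be sketched without any plotting library. The snippet below uses the activity data from the example program in this section and computes, for each day, the upper edge of every stacked band as a running total. The variable names upper_edges and share_sleeping are introduced here for illustration only.

```python
days = [1, 2, 3, 4, 5]
series = {
    "Sleeping": [7, 8, 6, 11, 7],
    "Eating": [2, 3, 4, 3, 2],
    "Working": [7, 8, 7, 2, 2],
    "Playing": [8, 5, 7, 8, 13],
}

# The running total per day gives the upper edge of each stacked band.
upper_edges = {}
running = [0] * len(days)
for name, values in series.items():
    running = [base + value for base, value in zip(running, values)]
    upper_edges[name] = running

# The top band's upper edge is the daily total (magnitude);
# each band's height divided by that total is its proportion.
totals = upper_edges["Playing"]
share_sleeping = [round(s / t, 2) for s, t in zip(series["Sleeping"], totals)]

print(totals)          # [24, 24, 24, 24, 24]
print(share_sleeping)  # [0.29, 0.33, 0.25, 0.46, 0.29]
```

Every day sums to 24 hours, so here the height of each band can be read directly as a share of the day.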
Example:
Program:
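import matplotlib.pyplot as plt
days = [1,2,3,4,5]
sleeping =[7,8,6,11,7]
eating = [2,3,4,3,2]
working =[7,8,7,2,2]
playing = [8,5,7,8,13]
# Empty plots only register legend entries for the stacked areas
plt.plot([],[],color='m', label='Sleeping', linewidth=5)
plt.plot([],[],color='c', label='Eating', linewidth=5)
# Assumed completion (the printed listing stops after 'Eating')
plt.plot([],[],color='r', label='Working', linewidth=5)
plt.plot([],[],color='k', label='Playing', linewidth=5)
plt.stackplot(days, sleeping, eating, working, playing, colors=['m','c','r','k'])
plt.legend()
plt.show()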
Histograms:
A histogram represents the frequency distribution of a dataset. It is a graph showing the number
of observations within each given interval.
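The counting that a histogram performs can be sketched in plain Python. The snippet below uses the ages and bin edges from the example program in this section. It assumes the usual convention that each bin includes its left edge and excludes its right edge, except the last bin, which includes both; values beyond the last edge are tallied separately in a variable named outside, introduced here for illustration.

```python
population_age = [22, 55, 62, 45, 21, 22, 34, 42, 42, 4, 2, 102, 95, 85,
                  55, 110, 120, 70, 65, 55, 111, 115, 80]
bin_edges = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

counts = [0] * (len(bin_edges) - 1)
outside = 0  # observations that fall in no bin

for age in population_age:
    placed = False
    for i in range(len(counts)):
        low, high = bin_edges[i], bin_edges[i + 1]
        is_last_bin = i == len(counts) - 1
        # Half-open bins [low, high); the last bin also includes its right edge.
        if low <= age < high or (is_last_bin and age == high):
            counts[i] += 1
            placed = True
            break
    if not placed:
        outside += 1

print(counts)   # [2, 0, 3, 1, 3, 3, 2, 1, 2, 1]
print(outside)  # 5
```

Five of the ages are above 100, so with these bin edges they do not appear in any bar.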
Example:
Program:
import matplotlib.pyplot as plt
population_age=[22,55,62,45,21,22,34,42,42,4,2,102,95,85,55,110,120,70,65,55,111,115,80]
# Bin edges in steps of 10; ages above 100 fall outside the last bin and are not counted
bins = [0,10,20,30,40,50,60,70,80,90,100]
plt.hist(population_age, bins, histtype='bar', rwidth=0.8)
plt.xlabel('age groups')
plt.ylabel('Number of people')
plt.title('Histogram')
plt.show()
Output:
Bar Charts:
A bar chart (or bar graph) presents categorical data with rectangular bars whose heights or
lengths are proportional to the values they represent. In other words, the length of each bar
shows the magnitude of the feature or variable for that category.
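When two series are drawn side by side, each bar needs its own x position within the group. The example program in this section hard-codes those positions; the sketch below shows one way they could be derived from the bar width. The formula and the names bar_width, n_days and positions are assumptions made for illustration.

```python
bar_width = 0.5
n_days = 5
series_names = ["BMW", "Audi"]

# Centre of bar k (0-based) in the group starting at x = day:
#   day + (k + 0.5) * bar_width
positions = {
    name: [day + (k + 0.5) * bar_width for day in range(n_days)]
    for k, name in enumerate(series_names)
}

print(positions["BMW"])   # [0.25, 1.25, 2.25, 3.25, 4.25]
print(positions["Audi"])  # [0.75, 1.75, 2.75, 3.75, 4.75]
```

With a width of 0.5 the two bars of each day exactly fill one unit on the x-axis, which is why the bars of neighbouring days touch in the plotted chart.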
Example:
Program:
from matplotlib import pyplot as plt
plt.bar([0.25,1.25,2.25,3.25,4.25],[50,40,70,80,20],label="BMW",width=.5)
plt.bar([.75,1.75,2.75,3.75,4.75],[80,20,20,50,60],label="Audi", color='r',width=.5)
plt.legend()
plt.xlabel('Days')
plt.ylabel('Distance (kms)')
plt.title('Information')
plt.show()
Output:
Pie Charts:
A Pie chart is a circular statistical chart, which is divided into sectors to illustrate numerical
proportion.
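The "numerical proportion" of each sector is simply its value divided by the total. The short sketch below uses the slice values from the example program in this section and computes both the percentage label (in the same '%1.1f%%' format) and the angle of each sector. The names percent_labels and angles are introduced here for illustration.

```python
slices = [7, 2, 2, 13]
activities = ["sleeping", "eating", "working", "playing"]

total = sum(slices)

# Percentage of the whole, formatted to one decimal place.
percent_labels = ["%1.1f%%" % (100 * s / total) for s in slices]

# Each sector's angle is the same proportion of the full 360 degrees.
angles = [360 * s / total for s in slices]

for name, label, angle in zip(activities, percent_labels, angles):
    print(name, label, angle)
```

The angles add up to 360 degrees and the percentages to roughly 100, apart from rounding in the labels.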
Example:
Program:
import matplotlib.pyplot as plt
days = [1,2,3,4,5]
sleeping =[7,8,6,11,7]
eating = [2,3,4,3,2]
working =[7,8,7,2,2]
playing = [8,5,7,8,13]
# Hours spent on each activity on day 5 (the last value of each list above)
slices = [7,2,2,13]
activities = ['sleeping','eating','working','playing']
cols = ['c','m','r','b']
plt.pie(slices, labels=activities, colors=cols, startangle=90, shadow= True,
explode=(0,0.1,0,0), autopct='%1.1f%%')
plt.title('Pie Plot')
plt.show()
Output:
Box Plots:
A box plot (or box-and-whisker plot) shows the distribution of quantitative data in a way that
facilitates comparisons between variables or across levels of a categorical variable. The box
shows the quartiles of the dataset, while the whiskers extend to cover the rest of the
distribution, except for points that are classified as outliers.
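The numbers a box plot draws can be computed by hand. The sketch below uses the list y from the example program in this section and applies the common 1.5 × IQR rule for whiskers, which is also Matplotlib's default. Quartile conventions differ between tools; the "inclusive" method used here is an assumption chosen because it interpolates linearly between data points, so other software may report slightly different quartiles.

```python
import statistics

y = [1, 2, 4, 5, 3, 6, 9]

# Quartiles: the box spans q1..q3 and the line inside it marks the median.
q1, median, q3 = statistics.quantiles(y, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond these fences are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers reach the most extreme data points that are still inside the fences.
inside = [v for v in y if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)
outliers = [v for v in y if v < lower_fence or v > upper_fence]

print(q1, median, q3)             # 2.5 4.0 5.5
print(whisker_low, whisker_high)  # 1 9
print(outliers)                   # []
```

For this data the fences are -2.0 and 10.0, so every value lies inside them and the whiskers simply reach the minimum and maximum.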
Example:
Program:
import matplotlib.pyplot as plt
x=[1,2,3,4,5,6,7]
y=[1,2,4,5,3,6,9]
z=[x,y]
plt.boxplot(z,labels=["A","B"],showmeans=True)
plt.show()
Output:
Scatter Plots:
A Scatter chart, also called a scatter plot, is a chart that shows the relationship between two
variables.
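The strength of a linear relationship that a scatter plot shows visually can also be summarized with a single number, the Pearson correlation coefficient. The sketch below computes it for the first pair of lists in this section's example program (called saving and income here, following the program's axis labels). The helper pearson_r is written for illustration and is not part of Matplotlib.

```python
import math

saving = [1, 1.5, 2, 2.5, 3, 3.5, 3.6]
income = [7.5, 8, 8.5, 9, 9.5, 10, 10.5]


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(saving, income)
print(f"r = {r:.3f}")
```

A value close to +1 means the points lie almost on a rising straight line, a value close to -1 means a falling line, and a value near 0 means no linear relationship. For this data the result is close to +1.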
Program:
import matplotlib.pyplot as plt
x=[1,1.5,2,2.5,3,3.5,3.6]
y=[7.5,8,8.5,9,9.5,10,10.5]
x1=[8,8.5,9,9.5,10,10.5,11]
y1=[3,3.5,3.7,4,4.5,5,5.2]
plt.scatter(x,y, label='high income low saving',color='r')
plt.scatter(x1,y1,label='low income high savings',color='b')
plt.xlabel('saving*100')
plt.ylabel('income*1000')
plt.title('Scatter Plot')
plt.legend()
plt.show()
Output:
ggplot2 is an open-source data visualization package for the statistical programming language
R. It is a system for declaratively creating graphics, based on The Grammar of Graphics, and
its many customization options can greatly improve the quality and aesthetics of graphics.
The ggplot2 package can be installed using the following R function:
install.packages("ggplot2")
The package must then be loaded in the program with the following command:
library(ggplot2)
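Consider the following dataset named surveys. The plot types introduced above are applied to
this dataset. The data frame as listed has no weight column, which the scatter plot and box
plot below refer to; those plots assume a version of surveys that includes it.
surveys <- data.frame(record_id = c(1, 2, 3, 4, 5),
  month = c(7, 7, 7, 7, 7), day = c(16, 16, 16, 17, 17),
  year = c(1977, 1977, 1977, 1977, 1977), plot_id = c(2, 3, 2, 7, 3),
  species_id = c("NL", "NL", "DM", "DM", "DM"), sex = c("M", "M", "F", "M", "M"),
  hindfoot_length = c(32, 33, 37, 36, 35))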
Scatter plot:
ggplot(data = surveys, mapping = aes(x = weight, y = hindfoot_length)) +
  geom_point(alpha = 0.1, color = "blue")
Output:
Histogram:
ggplot(surveys, aes(hindfoot_length)) + geom_histogram(binwidth = 2) + labs(title = "Histogram")
Output:
Bar chart:
ggplot(surveys, aes(species_id)) + geom_bar(fill = "red") + labs(title = "Bar Chart")
Output:
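For reference, geom_bar with a single categorical aesthetic draws one bar per category, with a height equal to the number of rows in that category. The small Python sketch below reproduces that tally for the species_id values of the five sample records listed earlier in this section; it is shown in Python only to keep the added examples in one language and is not part of ggplot2.

```python
from collections import Counter

# species_id values of the five sample records
species_id = ["NL", "NL", "DM", "DM", "DM"]

# One bar per distinct category; bar height = number of rows in the category.
bar_heights = Counter(species_id)

for species, count in sorted(bar_heights.items()):
    print(species, count)
```

With the sample records this gives a bar of height 3 for DM and a bar of height 2 for NL.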
Box plot:
ggplot(data = surveys, mapping = aes(x = species_id, y = weight)) + geom_boxplot()
Output:
Line plot:
# yearly_counts is assumed to hold one row per year and species_id, with the count in column n
ggplot(data = yearly_counts, aes(x = year, y = n, group = species_id, colour = species_id)) +
  geom_line()
Output:
Tableau is a data visualization tool that provides pictorial and graphical representations of data.
It is used for data analytics and business intelligence. With an intuitive drag-and-drop
interface, users can explore data without interrupting the flow of analysis, uncover hidden
insights, and make decisions faster.
Tableau Public can be downloaded from the following website:
https://www.tableau.com/products/public/download
After downloading, run the installer; the following screen appears.
Select the licence agreement checkbox and then click the Install button. After installation, click
the Tableau Public icon to run Tableau. The following is the Tableau Public home screen.
Tableau supports connecting to a wide variety of data stored in a variety of places. For
example, data might be stored on a computer in a spreadsheet or a text file; in a big data,
relational, or cube (multidimensional) database on an enterprise server; or in a public
dataset available on the web.
Data can be imported into Tableau Public from the Connect panel on the left side. For example,
an Excel sample dataset was loaded into Tableau as follows:
The Data Source page appears as above. The left pane shows that the dataset consists of 3
worksheets. If we drag the Orders table onto the canvas, the screen appears as follows. Tableau
automatically identifies the data type of each column.
Now drag the Returns table onto the canvas to the right of the Orders table. This shows the
relationship between the two tables, Orders and Returns.
Clicking the link between the Orders and Returns tables displays a summary of the
relationship between them. Now rename the data source and click on Sheet1 at the bottom
left to proceed. This step creates a data extract, which improves query performance.
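Conceptually, a relationship tells Tableau which field links the rows of the two tables. The plain-Python sketch below illustrates the idea with a few made-up rows, assuming that both tables share a field called "Order ID". The rows, the field names and the matching logic are hypothetical illustrations, not Tableau's internal mechanism.

```python
# Hypothetical sample rows; only the shared "Order ID" field matters here.
orders = [
    {"Order ID": "A-100", "Sales": 250.0},
    {"Order ID": "A-101", "Sales": 40.0},
    {"Order ID": "A-102", "Sales": 125.5},
]
returns = [
    {"Order ID": "A-101", "Returned": "Yes"},
]

# Collect the keys present in Returns, then flag each order that matches.
returned_ids = {row["Order ID"] for row in returns}
orders_with_returns = [
    {**order, "Returned": order["Order ID"] in returned_ids}
    for order in orders
]

for row in orders_with_returns:
    print(row["Order ID"], row["Sales"], row["Returned"])
```

Every order is kept, and only the one whose key also appears in Returns is flagged as returned.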
Alternatively, the above operation can be done by creating a calculated field, as shown below.
To create a calculated field, click the down arrow beside the search box above the Tables
panel, choose Create Calculated Field, and drag a field into the calculated field window.
In the same way, any aggregate or statistical function can be applied to the data with the help
of calculated fields.
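As an illustration of what an aggregate calculated field computes, consider a formula such as SUM([Profit]) / SUM([Sales]) evaluated per category. The Python sketch below performs the same aggregation on a few made-up rows. The field names Category, Sales and Profit and all of the values are hypothetical and are not taken from the dataset used in this manual.

```python
from collections import defaultdict

# Hypothetical rows standing in for a data source.
rows = [
    {"Category": "Furniture", "Sales": 200.0, "Profit": 20.0},
    {"Category": "Furniture", "Sales": 300.0, "Profit": 30.0},
    {"Category": "Technology", "Sales": 400.0, "Profit": 100.0},
]

# Aggregate first (SUM per category), then divide the two sums.
sales_by_category = defaultdict(float)
profit_by_category = defaultdict(float)
for row in rows:
    sales_by_category[row["Category"]] += row["Sales"]
    profit_by_category[row["Category"]] += row["Profit"]

profit_ratio = {
    category: profit_by_category[category] / sales_by_category[category]
    for category in sales_by_category
}

for category, ratio in profit_ratio.items():
    print(category, f"{ratio:.2%}")
```

The sums are computed first and only then divided, which gives the ratio of totals for each category and not the average of row-level ratios.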
Bar chart:
Bar charts can be created in three variations in Tableau: horizontal bars, stacked bars, and
side-by-side bars.
Horizontal bars can be created by selecting that chart type from the Show Me menu on the
right-hand side of the canvas. The chart type highlighted in the box on the right-hand side
represents the horizontal bar graph.
Similarly, a stacked bar graph can be created; the result is shown below.
Pie chart:
Bubble chart:
Heat map:
Week 8: Dashboards
A dashboard is a way of displaying various types of visual data in one place. Usually, a
dashboard is intended to convey different but related information in an easy-to-digest form.
This often includes key performance indicators (KPIs) or other important business metrics
that stakeholders need to see and understand at a glance.
Dashboards are useful across different industries and verticals because they’re highly
customizable. They can include data of all sorts with varying date ranges to help you
understand: what happened, why it happened, what may happen, and what action should be
taken.
For example, the first view shows sales by category across the months of a year, with Region
added as a field. It is shown below. The sheet can be renamed at the bottom of the screen.
Now go to the 2nd sheet to create the 2nd view, which is shown below. A bubble chart
was drawn between profit and subcategory. Then rename the sheet.
Next, the 3rd view is created as follows, showing profit for each subcategory within each
category, with averages. After creating the individual views, a dashboard can be created by
clicking the create dashboard button on the toolbar.
Now the sheets (views) created earlier can be dragged and dropped onto this dashboard.
The three views created above are placed in the dashboard as follows. Users can arrange the
sheets on the dashboard in whatever way they prefer. After creating the dashboard, a title can
be given to it from the Dashboard tab. The dashboard's appearance can be customized if
required. Once created, a dashboard can be saved on the user's system and retrieved
whenever required.