Lesson 4 part 1-3


Data Visualization and

Communication
Lesson 4

IT Specialist: Data Analytics


Topics Covered
• Skill 4.1: Report data
• Skill 4.2a and 4.3a: Create and derive conclusions from
visualizations that compare one or more categories of data
• Skill 4.2b and 4.3b: Create and derive conclusions from
visualizations that show how individual parts make up the whole
• Skill 4.2c and 4.3c: Create and derive conclusions from
visualizations that analyze trends
• Skill 4.2d and 4.3d: Create and derive conclusions from
visualizations that determine the distribution of data
• Skill 4.2e and 4.3e: Create and derive conclusions from
visualizations that analyze the relationship between sets of values

2
Skill 4.1: Report data
• Data reporting is the process of collecting and organizing raw
data and representing it with a suitable visualization to analyze
the data. There are different visualization methods to organize
and represent data for analysis. This section helps you
understand how data can be organized by using tables and
charts.

3
Data visualization and communication
• Data visualization is the process of showing data using different
visuals like graphs and charts. Organizing the data in a pattern
helps you analyze, evaluate, and generate reports. By
becoming familiar with and practicing different methods of
visualization, you will be able to choose and create the
appropriate visualization method for different kinds of data.

4
Using tables and charts to display information
• R can be used to organize data into a table. First, you load
each column into a vector, and then use the data.frame() command to
combine the vectors into a table (a data frame).
• Charts are another way of representing data. Using charts, data
can be represented with different color codes and patterns,
which makes it easier to analyze the data. There are different
types of graphical representations used to visualize data, such
as column charts, bar charts, pie charts, line charts, etc.
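The table-building step described above can be sketched as follows. This is a minimal illustration: the student names and scores are invented placeholders, and only the data.frame() call and the ScoreTable, Name, and Score names are taken from the lesson.

```r
# Hypothetical data: these names and scores are placeholders, not Table 4-1
Name  <- c("Ana", "Ben", "Chloe", "Dev", "Elena")
Score <- c(88, 72, 95, 64, 81)

# Combine the two vectors into a table (data frame), one column per vector
ScoreTable <- data.frame(Name, Score)

# Tables suit sorting and searching: order rows from highest to lowest score
SortedTable <- ScoreTable[order(-ScoreTable$Score), ]
print(SortedTable)
```

Sorting is shown here because it is the kind of task the lesson assigns to tables rather than charts.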

5
Use tables and charts to display information
• Figure 4-1 Example of a table • Figure 4-7 Example of a chart

6
Use tables and charts to display information
• The students’ scores can be represented in the form of an Excel
chart, using the following steps.
• After entering the data from Table 4-1, highlight the data and
headers (here A1:C11), click Insert, and choose the 2-D
column chart.
• Next, click Select Data and change the chart data range to
“=Sheetx!$B$1:$C$11”, where x is the sheet containing the data,
then click OK.
• Data labels can be added to this chart by clicking Add Chart
Element, then Data Labels, and finally Center.
7
Use tables and charts to display information
• In R, a basic chart can be made after loading the data into a data
frame by using the barplot() command, as below.

• barplot(height=ScoreTable$Score, names.arg=ScoreTable$Name,
ylim=c(0,100), xlab="", ylab="Score", space=0.05, las=2)
• mtext("Name", side=1, line=4)

• Both tables and charts can be used to visualize data. Depending on
the purpose, different visualization methods (tables or charts) can be
used to display and analyze the data. If the purpose of the data
analysis is to sort or search, tables can be used. However, charts
can be better suited to interpreting the data visually.

8
Use tables and charts to display information
• Example: The following data report of a retail shop is
represented using both a table and a chart.
• First, detailed information about the items, unit price, units sold,
purchase date, revenue, total cost, and total price can be
organized and formatted in a table.
• In Excel, this is accomplished by entering the data in rows and
columns and highlighting the dataset. On the Home tab,
choose Format as Table and pick the desired style.
• Then check the data range and select the “My table has headers”
box.
9
Use tables and charts to display information
• Using this table, the profit for each item can be easily compared by using
the Profit column to sort the dataset from highest to lowest (or vice versa).
However, the relationship between cost, profit and revenue cannot be
easily seen without a graphical representation.
• Using a chart, the relationship between these key economic
measurements is visualized.
• For instance, we can assess how profit levels relate to revenue levels for
this retail shop.
• Furthermore, we can start to ask questions such as: do low profit levels
match low revenue? Or do high profit levels closely match low cost levels?
By asking these questions and using a graphical representation to answer
them, we can begin to draw conclusions about how these economic
measurements are related to each other, if they are related at all.

10
Use tables and charts to display information
• In Excel, highlight the dataset and on the Insert tab choose 2-D
Column.
• Right-click on the chart that appears and choose Select Data.
• Then under Horizontal (Category) Axis Labels, choose the data
for the labels (here A2:A8).
• The resulting chart makes it much easier to visually inspect
trends and comparisons in data.

11
Use tables and charts to display information
• In R, a chart like this can be created by rearranging the data
slightly and using the graphics library ggplot2. First, the data is
entered in long form: one column holds the Revenue, Cost, and Profit
amounts, and a second column indicates which of those categories
each amount belongs to.
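The long-form layout described above can be sketched as follows. The items and dollar amounts are hypothetical, chosen only for illustration; the StoreTable, Items, Attribute, and Dollars names match the ones the lesson's ggplot2 code expects.

```r
# Hypothetical figures for three items; not the lesson's retail data
Items   <- c("Dairy", "Cosmetics", "Snacks")
Revenue <- c(500, 650, 300)
Cost    <- c(300, 470, 210)
Profit  <- Revenue - Cost

# Long format: one Dollars column plus an Attribute column naming the measure
StoreTable <- data.frame(
  Items     = rep(Items, times = 3),
  Attribute = rep(c("Revenue", "Cost", "Profit"), each = 3),
  Dollars   = c(Revenue, Cost, Profit)
)
print(StoreTable)
```

Each item now appears on three rows, one per measure, which is the shape a grouped column chart needs when the fill color is mapped to Attribute.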

12
Use tables and charts to display information
• Immediately after creating the table, you can build the chart with
the ggplot2 package using the code below:

• library(ggplot2)
• ggplot(StoreTable, aes(fill=Attribute, y=Dollars, x=Items)) +
geom_bar(position='dodge', stat='identity')

13
Use tables and charts to display information
• From these charts (R or Excel), you can notice that the profit for
the two highest cost categories is a fair bit higher than those for
the lower cost categories. This can help direct efforts in the
retail operations. A column chart is an appropriate visualization
in this scenario because it makes it easier for the audience to
compare profit, cost, and revenue (sales) across multiple
categories by comparing the heights of the bars.
• For example, in this scenario, the business can see that,
although cosmetics generate more sales than dairy, the higher
cost of cosmetics makes dairy a slightly more profitable product.

14
Disaggregate data
• Disaggregated data is aggregate data (sums, totals, averages,
rates, etc.) that retains some of its original information about
the different subgroups (gender, age, economic status, etc.) linked
to these aggregated measures.
• Analyzing disaggregated data allows you to retain the simplicity
of summarized (aggregate) data metrics while still being able
to compare these measurements between and within the subgroups.

15
Disaggregate data
• A large data set can have a number of factors or attributes for
each data point. For example, an aggregated view might report the
average annual income of individuals based solely on their country.

Country    Annual Income in USD ($)
USA        36832.37
MEX        5141.96

16
Disaggregate data
• However, by disaggregating this data, you can retain
information about additional factors such as skill level and
gender.
• This disaggregated view enables you to gain insights into the
variations and differences based on these specific factors,
offering a more detailed analysis of income patterns and
disparities.

17
Disaggregate data
• Figure 4-10 Aggregated data using Excel

18
Disaggregate data
• Plotting and analyzing the data in the complete table is
complicated because of the volume of data that is available. In
a disaggregated dataset, the data is broken down into smaller,
more manageable subsets based on specific criteria. It allows
you to extract detailed insights by diving deeper into specific
groups or categories.
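The difference between the aggregated and disaggregated views can be sketched with base R's aggregate() function. The income values below are hypothetical and are not the figures from the lesson's tables; IncomeData, ByCountry, and ByCountryGender are illustrative names.

```r
# Hypothetical incomes (USD); the groups and values are illustrative only
IncomeData <- data.frame(
  Country = c("USA", "USA", "USA", "USA", "MEX", "MEX", "MEX", "MEX"),
  Gender  = c("F", "F", "M", "M", "F", "F", "M", "M"),
  Income  = c(34000, 36000, 38000, 40000, 4500, 5000, 5200, 5700)
)

# Aggregated view: one average per country
ByCountry <- aggregate(Income ~ Country, data = IncomeData, FUN = mean)
print(ByCountry)

# Disaggregated view: the same average, split by gender within each country
ByCountryGender <- aggregate(Income ~ Country + Gender, data = IncomeData, FUN = mean)
print(ByCountryGender)
```

The second table keeps the simplicity of an average while still allowing a comparison between subgroups inside each country.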

19
Disaggregate data

20
Skill 4.2a and 4.3a: Create and derive conclusions from
visualizations that compare one or more categories of data
• This skill covers how to:
• Use different types of charts:
Figure 4-14 Example of a Column Chart Figure 4-15 Example of a Bar Chart

21
Column Chart
• Advantages of a Column Chart
• Column charts should be used for small datasets with discrete
categorical variables.
• They are good for comparing categories and identifying broad
trends within a dataset.
• Disadvantages of a Column Chart
• Column charts are not suitable for data sets with a large number
of different categories.
• They can be used to manipulate the viewer's perception of the data.

22
Bar Chart
• The Excel steps for creating a bar chart are almost the same as
generating a column chart. The key difference is that a 2-D
Bar must be chosen as the chart type instead of a 2-D Column
chart.
• In R, the code is the same with the addition of + coord_flip() at
the end of the column chart code provided earlier. The
advantages and disadvantages for bar charts are the same as
for column charts. The choice of a bar chart over a column chart
is mainly preference for its appearance when used in a
presentation or report.

23
Identify Data Visualization Practices That
Minimize The Potential For Misinterpretation
• Visualizing data using different methods helps you to compare,
summarize, analyze, and evaluate a set of data. However,
tables or charts can mislead if not constructed with care. As a
result, the analysis of a data set can be faulty. Hence, it is very
important to understand how to identify and avoid practices that
lead to misinterpretation of data when it is presented visually.
• Figures 4-16 and 4-17 illustrate an example of visually
displayed data that is presented in a misleading manner.

24
Identify Data Visualization Practices That
Minimize The Potential For Misinterpretation
Figure 4-16 Figure 4-17

25
Best practices of data visualization for
column, bar, and line charts
• Choose the correct chart for the appropriate task.
• Use different patterns or color codes to represent data which will help
in analyzing and comparing categories.
• Choose a scale that begins with 0 for bars and columns in order to give
an accurate depiction of data.
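The zero-baseline guideline can be illustrated with a small calculation. The two values below are hypothetical; the sketch compares how tall the bars appear relative to each other when the axis starts at 0 and when it is truncated at 90.

```r
# Hypothetical values for two categories
values <- c(A = 95, B = 100)

# With a baseline of 0, drawn bar heights equal the values themselves
ratio_zero_baseline <- values[["A"]] / values[["B"]]

# With the axis truncated at 90, only the part above 90 is drawn
baseline <- 90
ratio_truncated <- (values[["A"]] - baseline) / (values[["B"]] - baseline)

print(c(zero = ratio_zero_baseline, truncated = ratio_truncated))

# Starting the axis at 0 keeps the bar heights proportional to the data
barplot(values, ylim = c(0, 100), ylab = "Value")
```

With the truncated axis, bar A is drawn at half the height of bar B even though the underlying values differ by only 5%.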

26
Skill 4.2b and 4.3b: Create and derive conclusions from
visualizations that show how individual parts make up the whole

• This skill covers how to:


• Differentiate between the following types of graphical representations:
• Pie Chart
• Donut Chart
• Other variations on bar and column charts such as stacked bar and column
charts

27
Pie Chart
• Figure 4-20 Example of a Pie
Chart

28
Pie Chart in Excel
• In Excel enter the data into a table, highlight the table and on
the Insert tab choose the drop-down next to the Pie icon and
choose 2-D Pie.
• Then on the Chart Design tab choose Add Chart Element and
choose Data Labels. Position the labels in the center.
• Then right-click on one of the labels (it will be the raw population
number) and choose Format Data Labels. In the formatting
menu, uncheck Value and check Percentage.

29
Pie Chart in R
• In R, load the data into vectors and then use the pie()
command to plot the data as shown below:
• Country <- c("China", "India", "USA", "Indonesia", "Pakistan")
• Population <- c(1439323776, 1380004385, 331002651,
273523615, 220892340)
• pie(Population, labels=Country)
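The Excel steps label each slice with a percentage, while a plain pie() call shows only the country names. One possible way to get percentage labels in base R is sketched below; the Percent and SliceLabels names are illustrative and are not part of the lesson.

```r
Country <- c("China", "India", "USA", "Indonesia", "Pakistan")
Population <- c(1439323776, 1380004385, 331002651, 273523615, 220892340)

# Each slice's share of the five-country total, as a percentage
Percent <- 100 * Population / sum(Population)

# Label each slice with the country name and its rounded percentage
SliceLabels <- paste0(Country, " (", round(Percent, 1), "%)")
pie(Population, labels = SliceLabels)
```

The percentages are shares of the five listed countries only, not of the world population.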

30
Pie Chart
• Advantages of a Pie Chart
• The relationship of each category to the total can be easily seen.
• Comparisons between different categories are clearer and simpler
than when using a line, column, or bar graph.
• Disadvantages of a Pie Chart
• They can be used only for one attribute of a set of data.
• If too many sections of data are included, it is more difficult to
compare the categories.

31
Donut Chart
• Figure 4-22 Example of a
Donut Chart

32
Donut Chart in Excel
• To represent this data as a donut chart in Excel, follow the steps
for a pie chart and its labels; however, choose the Donut chart
option.

33
Donut Chart in R
• Categories <- c("Music", "Dance", "Theatre", "Singing", "Drawing")
• StudentCount <- c(26, 36, 22, 38, 28)
• Activity <- data.frame(Categories, StudentCount)

• Activity$fraction <- Activity$StudentCount /
sum(Activity$StudentCount)
• Activity$ymax <- cumsum(Activity$fraction) # top of each rectangle
• Activity$ymin <- c(0, head(Activity$ymax, n=-1)) # bottom of each
rectangle

34
Donut Chart in R
• library(ggplot2)
• # prepare the labels before building the chart
• Activity$labelPosition <- (Activity$ymax + Activity$ymin) / 2
• Activity$percent <- round(100 * Activity$StudentCount /
sum(Activity$StudentCount), 1)
• Activity$label <- paste0(Activity$Categories, "\n", Activity$percent,
"%")

35
Donut Chart in R
• ggplot(Activity, aes(ymax=ymax, ymin=ymin, xmax=4, xmin=3,
fill=Categories)) +
geom_rect() +
geom_label(x=3.5, aes(y=labelPosition, label=label), size=6) +
coord_polar(theta="y") +
xlim(c(2, 4)) +
scale_fill_brewer(palette=4) +
theme_void() +
theme(legend.position = "none")
36
Other variations on bar and column charts
such as stacked bar and column charts (Slide 1 of 3)
• Figure 4-24 Example of a • Figure 4-26 Example of a
Stacked Bar Chart 100% Stacked Bar Chart

37
Stacked Bar Chart in Excel
• In Excel, choose the Insert tab, then the 2-D Bar stacked option.
• If the data is flipped (the branches should be the stacked segments
inside each bar and the four quarters should be the categories, but
Excel often reverses them), click Switch Row/Column to change
the categories.
• The chart shows four bars, one for each quarter. Within each
bar, the four branches are stacked and the size of each
branch’s stacked section is visually proportional to the amount
of sales for that branch.

38
Stacked Bar Chart in R
• In R, the stacked bar chart is similar to the bar or column chart
produced earlier, with one difference in the ggplot commands:
the dodge position, which placed the bars side by side in the
earlier charts, is left off so that the segments stack.

• ggplot(SalesOverview, aes(x=Quarter, y=Sales, fill=Branch)) +
geom_col() + coord_flip() + labs(x="Quarter", y="Sales")

39
100% Stacked Bar Chart in R
• library(ggplot2)
• ggplot(SalesOverview, aes(x=Quarter, y=Sales, fill=Branch)) +
geom_bar(position="fill", stat="identity") +
labs(x="Quarter", y="Sales")
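What position="fill" does can be sketched in base R by converting each quarter's sales to shares of that quarter's total. The sales figures and branch names below are hypothetical; only the SalesOverview, Quarter, Branch, and Sales names follow the lesson.

```r
# Hypothetical quarterly sales for two branches
SalesOverview <- data.frame(
  Quarter = rep(c("Q1", "Q2"), each = 2),
  Branch  = rep(c("North", "South"), times = 2),
  Sales   = c(30, 70, 60, 60)
)

# Cross-tabulate: rows are branches, columns are quarters
SalesMatrix <- xtabs(Sales ~ Branch + Quarter, data = SalesOverview)

# Divide each column by its total so every quarter sums to 1 (100%)
Shares <- prop.table(SalesMatrix, margin = 2)
print(Shares)

# Horizontal 100% stacked bars, one per quarter
barplot(Shares, horiz = TRUE, legend.text = TRUE, xlab = "Share of sales")
```

In this sketch Q1 totals 100 and Q2 totals 120, yet both bars are drawn at the same length, so the chart shows each branch's share but not the raw totals.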

40
Other variations on bar and column charts
such as stacked bar and column charts (Slide 2 of 3)
• Advantages of Stacked Bar/Column Charts
• These charts display data as segments, representing the relative
proportions of each category within the whole.
• Unlike pie charts, they allow the visualization of multiple
categories simultaneously, providing an efficient way to
compare data across them.
• Disadvantages of Stacked Bar/Column Charts
• It can be hard to track trends between the stacks for a single
factor, as its position depends on the other pieces of the
stacks prior to it.

41
Other variations on bar and column charts
such as stacked bar and column charts (Slide 3 of 3)
• Advantages of 100% Stacked Bar/Column Charts
• 100% stacked bar charts are helpful when wanting to compare one
factor’s contribution to the total of another factor.
• Disadvantages of 100% Stacked Bar/Column Charts
• They may lead to misinterpretation of the data as the horizontal
bars for all the quarterly sales are equal in size.
• Since all the bars are 100% and look the same size, interpreting
the total raw value instead of the percentage for each bar isn’t
obvious.

42
Skill 4.2c and 4.3c: Create and derive conclusions
from visualizations that analyze trends
• This skill covers how to:
• Use different types of visualization:
• Line chart and variants of the line chart
• Waterfall chart
• Sankey Diagram

43
Line chart and variants of the line chart
• Advantages of Line Charts
• It helps to check if the values are increasing or decreasing.
• It is a good option when data is continuous.
• Disadvantages of Line Charts
• Line charts are limited to only continuous data.

44
Line chart in Excel
• In Excel, highlight the data table and on the Insert tab
choose Line with Markers. Then click on the Select Data button
and check that the labels and data are assigned correctly.

45
Line chart in R
• Subject <- rep(c("English", "Math", "Science", "Social Studies"),
times=5)
• Year <- rep(2019:2023, each=4)
• Scores <- c(85, 78, 89, 75, 68, 82, 86, 68, 75, 80, 85, 65, 90,
75, 95, 75, 85, 88, 98, 70)
• YOYScores <- data.frame(Subject, Year, Scores)
46
Line chart and variants of the line chart in R
• library(ggplot2)
• ggplot(data=YOYScores, aes(x=Year, y=Scores)) +
geom_line(aes(color=Subject)) + geom_point(aes(color=Subject))

47
Smooth Line Chart in Excel
• Rainfall data is provided for the following example.
• In Excel, you once again highlight your data table and choose
the Line with Markers chart type. Then right-click the data series
on the chart and choose Format Data Series. Under the paint
bucket options, at the bottom, choose Smoothed Line.

48
Line chart and variants of the line chart
• Advantages of Smooth Line Charts
• Smoothed lines keep the data from seeming jumpy or erratic
to the eye, allowing for a more realistic interpretation of events.
• Smoothed lines can more closely suggest statistical tools for
future use: an equation representing a linear or non-linear
relation may be used to match the trend in the data.
• Disadvantages of Smooth Line Charts
• Smoothing makes the slopes between points less accurate,
making interpolation less exact.
• True extremes or outliers may be visually de-emphasized by
the smoothing of the connecting line.

49
Waterfall chart (Slide 1 of 2)
• Figure 4-36 Example of a Waterfall Chart

50
Waterfall chart
• To create the waterfall chart in Excel, first highlight the data
table and choose the Waterfall chart type. When the chart is made,
click on the Subtotal data point (which will not be displayed
correctly), right-click, and choose Format Data Point. Under
Series Options, check the Set as total box.
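The arithmetic behind a waterfall chart can be sketched in base R: each floating bar starts where the previous one ended, and the total bar shows the final running total. The steps and amounts below are hypothetical and are not the data behind Figure 4-36.

```r
# Hypothetical cash flow: an opening balance followed by credits and debits
Step   <- c("Opening", "Sales", "Refunds", "Rent", "Interest")
Change <- c(1000, 400, -150, -300, 50)

# Each floating bar ends at the running total and starts where the last ended
BarEnd   <- cumsum(Change)
BarStart <- c(0, head(BarEnd, n = -1))

Waterfall <- data.frame(Step, Change, BarStart, BarEnd)
print(Waterfall)

# The final running total is the value a total bar should display
Total <- tail(BarEnd, n = 1)
print(Total)
```

A total bar differs from the floating bars because it is drawn from zero up to the running total rather than from the previous bar's end.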

51
Waterfall chart (Slide 2 of 2)
• Advantages of waterfall charts
• Waterfall charts are good at showing how a result or process
develops over time.
• The sizes of the bars in each step are proportional to the size
of the credit or debit, allowing for a quick visual comparison of
events.
• Disadvantages of waterfall charts
• They are less familiar to users than standard bar charts, which
may result in interpretation of the data that may be
misleading at first glance.
• Floating bars may be misinterpreted as full-size bars with gaps.

52
Sankey Diagram (Slide 1 of 2)
• Figure 4-37 Example of a
Sankey Diagram

53
Sankey Diagram (Slide 2 of 2)
• Advantages of a Sankey Diagram
• Although similar information can be visualized using a pie chart or
bar chart, this visual illustrates the flow of materials, energy, or
money into different categories and emphasizes intermediary
steps that are involved in how these quantities flow.
• The use of steps provides a qualitative beginning, middle, and
end to the process being described without having to be
exact about time.
• Disadvantages of a Sankey Diagram
• It is only useful to describe changes in specific quantities such
as money, energy, or materials, in which its flow through a
process is tracked.
• Only one quantity, at a time, can be effectively illustrated and
visualized on one chart.
54
Skill 4.2d and 4.3d: Create and derive conclusions from
visualizations that determine the distribution of data
• This skill covers how to:
• Use different types of visualizations
Figure 4-41 Example of a Histogram Figure 4-43 Example of a Box and Whisker Plot

55
Histogram
• Advantages of a Histogram
• It can be used for a big range of data.
• It is very helpful to display the frequency of the data.
• It helps to analyze the pattern of the data.
• Disadvantages of a Histogram
• Intervals and range of data can be manipulated to make accurate
analysis difficult.
• Histograms are easily manipulated by changing bin size and by
changing the start and endpoints of bins. Smaller bin sizes can lead
to more spread-out data, with less insight into values that occur more
often. Shifting the placement of the bins can move some of the
edge data points between bars, also affording less insight into the
overall data.
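The effect of shifting bin boundaries can be sketched with base R's hist() function. The sixteen values below are hypothetical; both calls use bins of width 10 and differ only in where the bins start.

```r
# Hypothetical measurements
x <- c(52, 55, 58, 61, 63, 64, 66, 67, 68, 69, 71, 72, 74, 78, 85, 91)

# Bins starting at 50: (50,60], (60,70], ..., (90,100]
BinsA <- hist(x, breaks = seq(50, 100, by = 10), plot = FALSE)

# Same width, shifted by 5: (45,55], (55,65], ..., (95,105]
BinsB <- hist(x, breaks = seq(45, 105, by = 10), plot = FALSE)

print(BinsA$counts)  # 3 7 4 1 1
print(BinsB$counts)  # 2 4 7 2 1 0
```

The same data peaks in the second bin under one choice of boundaries and in the third under the other, which is why bin placement should be reported alongside a histogram.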

56
Box and Whisker plot
• Advantages of a Box and Whisker Plot
• It is easy to compare two or more data sets.
• It can be used for extremely large data sets.
• Most variation in the box and whisker plot can be found in the
sizes of the boxes, whiskers, and median lines.
• Disadvantages of a Box and Whisker Plot
• It is difficult to analyze the exact values in the dataset since
much is hidden as only the key numbers are represented in this
plot.
• The box and whisker plot does not indicate the other measures
of central tendency: mean and mode.
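The "key numbers" a box and whisker plot displays can be inspected with base R's boxplot.stats() function. The nine values below are hypothetical; the sketch also computes the mean, which the plot itself does not show.

```r
# Hypothetical data with one unusually large value
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
Stats <- boxplot.stats(x)
print(Stats$stats)  # 2 4 6 8 9

# Points beyond 1.5 times the box length are reported as outliers
print(Stats$out)    # 30

# The mean is not part of the plot and must be computed separately
print(mean(x))

boxplot(x, horizontal = TRUE)
```

The outlier pulls the mean above the median, a difference the plot alone would not reveal.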

57
Skill 4.2e and 4.3e: Create and derive conclusions from
visualizations that analyze the relationship between sets of values

• This skill covers how to:


• Use different types of visualizations
Figure 4-46 Example of a Scatter Plot Figure 4-49 Example of a Bubble Chart

58
Scatter plot
• Advantages of a Scatter Plot
• Scatter plots help to identify gaps in the data.
• Scatter plots help reveal relationships between two numeric
factors.
• Plotting points can help identify trends and direct further
statistical analysis.
• Disadvantages of a Scatter Plot
• Non-linear trends are harder to identify conclusively.
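How a scatter plot can direct further statistical analysis is sketched below in base R. The study hours and scores are hypothetical; the correlation and fitted line are one possible follow-up, not a method prescribed by the lesson.

```r
# Hypothetical paired measurements
Hours  <- c(1, 2, 3, 4, 5)
Scores <- c(52, 58, 65, 70, 79)

# Plot the points to look for a relationship
plot(Hours, Scores, xlab = "Hours studied", ylab = "Score")

# Quantify the linear relationship suggested by the plot
r <- cor(Hours, Scores)
print(r)

# Fit a straight line and overlay it on the scatter plot
Fit <- lm(Scores ~ Hours)
abline(Fit)
print(coef(Fit))
```

A correlation close to 1 supports a linear trend here; a curved pattern would need a different model, which is harder to judge from the plot alone.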

59
Bubble chart
• Advantages of a Bubble Chart
• It is useful to find whether there is a linear relationship between
two key variables.
• A third variable is compared visually by the size of each bubble.
• Disadvantages of a Bubble Chart
• Bubbles that are close in size are difficult to compare with one
another since they are not always next to each other.

60
Summary
• This lesson covered reporting data; creating and deriving
conclusions from visualizations that compare one or more
categories of data; creating and deriving conclusions from
visualizations that show how individual parts make up the
whole; creating and deriving conclusions from visualizations
that analyze trends; creating and deriving conclusions from
visualizations that determine the distribution of data; and
creating and deriving conclusions from visualizations that
analyze the relationship between sets of values.

61
