Boxplots in R-1


Boxplots in R

• In descriptive statistics, a box plot or boxplot (also known as a box-and-whisker plot)
is a type of chart often used in exploratory data analysis.
• A boxplot is a chart that displays the distribution of a numeric variable.
• A boxplot shows how the values in a data set are spread out. The three quartiles
divide the data set into four parts, each holding about 25% of the values.
• The graph shows the minimum, first quartile, median, third quartile,
and maximum of the data set.
• These five values (minimum, first quartile, median, third quartile, and maximum)
are known as the five-number summary.
• Boxplots are also useful for comparing the distributions of several groups in a data set
by drawing a boxplot for each of them.

R provides the boxplot() function to create a boxplot. Its basic syntax is:

boxplot(x, data, notch, varwidth, names, main)

S.No  Parameter  Description
1.    x          A vector or a formula.
2.    data       The data frame that the variables in a formula are taken from.
3.    notch      A logical value; set to TRUE to draw a notch on each side of the box.
4.    varwidth   A logical value; set to TRUE to draw box widths proportional to the
                 square roots of the group sample sizes.
5.    names      The group labels printed under each boxplot.
6.    main       The title of the graph.

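As a minimal sketch of how these parameters fit together, the example below uses the formula form of x. It assumes R's built-in mtcars data set, which is not part of these notes and is used here only for illustration; the group labels and title are likewise made up.

```r
# mtcars ships with R; mpg ~ cyl means "mpg grouped by cyl".
result <- boxplot(mpg ~ cyl, data = mtcars,
                  varwidth = TRUE,                      # wider boxes for larger groups
                  names = c("4 cyl", "6 cyl", "8 cyl"), # labels under each box
                  main = "Mileage by number of cylinders")

# boxplot() invisibly returns the statistics it plotted.
print(result$n)      # number of observations in each group
print(result$names)  # the labels supplied through names
```

With varwidth = TRUE the 8-cylinder box is drawn widest because that group has the most cars.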
Definitions
Minimum Score
The lowest score, excluding outliers (shown at the end of the left whisker).

Lower Quartile
Twenty-five percent of scores fall below the lower quartile value (also known as the first
quartile).

Median
The median marks the mid-point of the data and is shown by the line that divides the box
into two parts (sometimes known as the second quartile). Half the scores are greater than
or equal to this value, and half are less.

Upper Quartile
Seventy-five percent of the scores fall below the upper quartile value (also known as the
third quartile). Thus, 25% of data are above this value.

Maximum Score
The highest score, excluding outliers (shown at the end of the right whisker).

Whiskers
The upper and lower whiskers represent scores outside the middle 50% (i.e., the lower
25% of scores and the upper 25% of scores).

The Interquartile Range (or IQR)
The box shows the middle 50% of scores (i.e., the range between the 25th and 75th
percentile). The IQR is the width of this range: the upper quartile minus the lower quartile.
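
The definitions above can be checked numerically. The sketch below uses base R's quantile() and IQR() functions on a small sample vector (the same numbers used in the data-entry examples of these notes). Note that quantile() uses its default interpolation rule, which is one of several conventions for computing quartiles.

```r
x <- c(1, 2, 3, 3, 4, 5, 5, 7, 9, 9, 15, 25)

# Lower quartile, median and upper quartile (25th, 50th, 75th percentiles).
quartiles <- quantile(x, probs = c(0.25, 0.50, 0.75))
print(quartiles)   # 3, 5 and 9 for this data

# Interquartile range: upper quartile minus lower quartile.
iqr <- IQR(x)
print(iqr)         # 6
```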

Why Are Box Plots Useful?

Box plots divide the data into sections that each contain approximately 25% of the data in that set.

Box plots are useful as they provide a visual summary of the data, enabling researchers to
quickly identify the median, the dispersion of the data set, and signs of skewness.

• Box plots are useful as they show the central value of a data set.
• The median is the middle value of a set of data and is shown by the line that
divides the box into two parts. Half the scores are greater than or equal to this
value, and half are less.
Drawing A Box And Whisker Plot

Example:
Construct a box plot for the following data:
12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25

Solution:
Step 1: Arrange the data in ascending order.
5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53

Step 2: Find the median, lower quartile and upper quartile.

Median (middle value) = 22
Lower quartile (middle value of the lower half) = 12
Upper quartile (middle value of the upper half) = 36

(If there is an even number of data items, the median is the average of the two middle
numbers.)

Step 3: Draw a number line that will include the smallest and the largest data.

Step 4: Draw three vertical lines at the lower quartile (12), median (22) and the upper
quartile (36), just above the number line.
Step 5: Join the lines for the lower quartile and the upper quartile to form a box.

Step 6: Draw a line from the smallest value (5) to the left side of the box and draw a line
from the right side of the box to the biggest value (53).
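
The hand calculation in Steps 1 and 2 can be sketched in R as follows. This is only an illustration of the method described above (the median is left out of both halves); the variable names are made up for this example.

```r
scores <- c(12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25)

# Step 1: arrange the data in ascending order.
sorted <- sort(scores)
n <- length(sorted)

# Step 2: split into lower and upper halves, leaving out the middle
# value when n is odd, and take the median of each half.
half <- floor(n / 2)
lower_half <- sorted[seq_len(half)]
upper_half <- sorted[(n - half + 1):n]

five <- c(min    = min(sorted),
          q1     = median(lower_half),
          median = median(sorted),
          q3     = median(upper_half),
          max    = max(sorted))
print(five)   # min 5, q1 12, median 22, q3 36, max 53

# Steps 3 to 6: let R draw the plot along a horizontal number line.
boxplot(scores, horizontal = TRUE)
```

R's boxplot() uses a slightly different quartile rule (Tukey's hinges, which keep the median in both halves when the count is odd), so for this data it draws the box edges at 13 and 33 rather than 12 and 36.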

How To Draw A Box And Whiskers Plot For A Set Of Data?

Example:
Make a box-and-whisker plot for the following data set.
7, 3, 14, 9, 7, 8, 12.
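
One possible way to work this example in R is with the base function fivenum(), which returns the five values that boxplot() draws. This is a sketch, not the only approach; the variable names are illustrative.

```r
values <- c(7, 3, 14, 9, 7, 8, 12)

# Sorted data, for comparison with a hand calculation.
print(sort(values))    # 3 7 7 8 9 12 14

# Minimum, lower hinge, median, upper hinge, maximum.
summary5 <- fivenum(values)
print(summary5)        # 3 7 8 10.5 14

# Draw the box-and-whisker plot.
boxplot(values, horizontal = TRUE)
```

The upper hinge of 10.5 differs from the hand method shown earlier, which would give an upper quartile of 12 (the middle of 9, 12, 14), because fivenum() includes the median in both halves when the number of values is odd.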
Entering Your Own Data
R’s boxplot command has several levels of use, some quite easy, some a bit more difficult to learn.
Let’s start with an easy example. You can enter your own data manually and then create a boxplot.

x=c(1,2,3,3,4,5,5,7,9,9,15,25)
boxplot(x)

If you’d like to compare two sets of data, enter each set separately, then pass both sets as
arguments to the boxplot command.

x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
boxplot(x,y)

You can easily compare three sets of data. Just enter your three sets of data and then pass all
three as arguments to the boxplot command.

x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
z=c(15,15,15,16,19,25,29,30,55,79)
boxplot(x,y,z)

You can use the argument horizontal=TRUE to lay them out horizontally.

x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
z=c(15,15,15,16,19,25,29,30,55,79)
boxplot(x,y,z, horizontal=TRUE)

You can add names to each boxplot.

x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
z=c(15,15,15,16,19,25,29,30,55,79)
boxplot(x,y,z,
        horizontal=TRUE,
        names=c("Level 1","Level 2","Level 3"))

You can add different colors.

x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
z=c(15,15,15,16,19,25,29,30,55,79)
boxplot(x,y,z,
        horizontal=TRUE,
        names=c("Level 1","Level 2","Level 3"),
        col=c("red","yellow","blue"))
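
As an alternative sketch, the three data sets can be collected in a named list and passed to boxplot() in one argument; the list names then serve as the labels, so the names argument is not needed. The variable names groups and result are chosen here for illustration.

```r
# Same data as above, stored in a named list.
groups <- list("Level 1" = c(1, 2, 3, 3, 4, 5, 5, 7, 9, 9, 15, 25),
               "Level 2" = c(5, 6, 7, 7, 8, 10, 1, 1, 15, 23, 44, 76),
               "Level 3" = c(15, 15, 15, 16, 19, 25, 29, 30, 55, 79))

# One box per list element; labels are taken from the list names.
result <- boxplot(groups,
                  horizontal = TRUE,
                  col = c("red", "yellow", "blue"))

print(result$names)
```

Keeping the data in one list avoids the labels and the data drifting out of order as more groups are added.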

What is a box plot?


A box plot shows the distribution of data for a continuous variable.

How are box plots used?


Box plots help you see the center and spread of data. You can also use them as a visual tool
to check for normality or to identify points that may be outliers.

Is a box plot the same as a box-and-whisker plot?


Yes. Box plots may also be called outlier box plots or quantile box plots. Each is a variation
on how the box plot is drawn.

A box-and-whisker plot is also called a box plot. It consists of a rectangular
“box” and two “whiskers.” A box-and-whisker plot contains the following parts:
• Box: The box in the plot spans from the first quartile (Q1) to the third
quartile (Q3). This box contains the middle 50% of the data and represents
the interquartile range (IQR). The length of the box shows how spread out
the middle half of the data is.
• Whiskers: The whiskers extend from Q1 down to the smallest value and from
Q3 up to the largest value that are not flagged as outliers. They show the range
of the data, excluding potential outliers. Whiskers of unequal length suggest
skewness; whiskers of similar length suggest symmetry.
• Median Line: A line within the box represents the median (Q2). It divides the
data into two halves, revealing the central tendency.
• Outliers: Individual data points lying beyond the whiskers are considered
outliers and are often plotted as individual points.
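
To see which points R treats as outliers, the sketch below uses the base function boxplot.stats() on one of the sample vectors from the data-entry examples. By default, R flags points that lie more than 1.5 times the box length beyond either end of the box.

```r
y <- c(5, 6, 7, 7, 8, 10, 1, 1, 15, 23, 44, 76)

# coef = 1.5 is the default multiplier of the box length.
stats <- boxplot.stats(y, coef = 1.5)

# Lower whisker end, lower box edge, median, upper box edge, upper whisker end.
print(stats$stats)   # 1 5.5 7.5 19 23

# Points beyond the whiskers, drawn individually by boxplot().
print(stats$out)     # 44 76
```

Here the box runs from 5.5 to 19 (length 13.5), so anything above 19 + 1.5 * 13.5 = 39.25 is flagged, and the upper whisker stops at 23, the largest value inside that limit.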
Histograms in R language
A histogram is a type of bar chart that shows how many values fall into each of a set of consecutive value
ranges. A histogram shows the distribution of a single variable, whereas a bar chart compares
different entities. In a histogram, the height of each bar represents the number of values present in the given
range.

A histogram is made of rectangles whose widths span successive numerical intervals and whose
areas are proportional to the frequency of the variable in each interval.

We can create a histogram in the R programming language using the hist() function.

Syntax: hist(v, main, xlab, xlim, ylim, breaks, col, border)
Parameters:
• v: A vector of the numeric values used in the histogram.
• main: The title of the chart.
• col: The fill color of the bars.
• xlab: The label for the horizontal axis.
• border: The border color of each bar.
• xlim: The range of values shown on the x-axis.
• ylim: The range of values shown on the y-axis.
• breaks: The break points between bars. A single number is treated as a suggested number of
bars; a vector gives the exact break points.

Example
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram.png")

# Create the histogram.
hist(v, xlab = "Weight", col = "yellow", border = "blue")

# Save the file.
dev.off()

Example
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Create the histogram.
hist(v, xlab = "No. of Articles", col = "green", border = "black")

Example
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)

# Give the chart file a name.
png(file = "histogram_lim_breaks.png")

# Create the histogram. xlim runs to 50 so the bar holding the value 41 is not cut off.
hist(v, xlab = "Weight", col = "green", border = "red", xlim = c(0,50), ylim = c(0,5),
     breaks = 5)

# Save the file.
dev.off()
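
To make the effect of breaks concrete, the sketch below passes an explicit vector of break points instead of a single number and inspects the resulting counts without drawing anything (plot = FALSE). The choice of bins 10 units wide is an assumption made for this illustration.

```r
v <- c(9, 13, 21, 8, 36, 22, 12, 41, 31, 33, 19)

# Exact break points at 0, 10, ..., 50; plot = FALSE returns the
# histogram object instead of drawing it.
h <- hist(v, breaks = seq(0, 50, by = 10), plot = FALSE)

print(h$breaks)   # 0 10 20 30 40 50
print(h$counts)   # 2 3 2 3 1
```

Each count is the number of values in one interval, for example 8 and 9 in the first bin and 41 alone in the last, and the counts add up to the 11 values in v.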

Pyramid plot
Description
Displays a pyramid (opposed horizontal bar) plot on the current graphics device.
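
A pyramid plot can be approximated with base R alone by drawing two sets of horizontal bars in opposite directions. The sketch below is one possible approach under that assumption, not the method these notes refer to; the age groups and counts are invented purely for illustration.

```r
# Hypothetical counts per age group (illustrative values only).
age_groups   <- c("0-19", "20-39", "40-59", "60+")
left_counts  <- c(30, 35, 25, 10)
right_counts <- c(28, 33, 27, 12)

# Symmetric axis limit so both sides share the same scale.
limit <- max(left_counts, right_counts)

# Left-hand bars are drawn as negative values so they extend leftwards from zero.
barplot(-left_counts, horiz = TRUE, names.arg = age_groups,
        xlim = c(-limit, limit), col = "steelblue", axes = FALSE,
        main = "Pyramid plot (base R sketch)")

# Right-hand bars are added on top of the same plot.
barplot(right_counts, horiz = TRUE, add = TRUE, col = "tomato", axes = FALSE)

# Label the axis with absolute values on both sides.
ticks <- pretty(c(-limit, limit))
axis(1, at = ticks, labels = abs(ticks))
```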
Dot plot in R
• A dot plot or dot chart is similar to a scatter plot.
• The main difference is that the dot plot in R displays the index (each category) on the
vertical axis and the corresponding value on the horizontal axis, so you can read the
value of each observation by following a horizontal line from its label.
• This graph can also be used as an alternative to horizontal barplots.
• In addition, you can arrange the labels on the vertical axis by
group and even sort them based on some variable.

month <- month.name
expected <- c(15, 16, 20, 31, 11, 6,
              17, 22, 32, 12, 19, 20)
sold <- c(8, 18, 12, 10, 41, 2,
          19, 26, 14, 16, 9, 13)
quarter <- c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3))
data <- data.frame(month, expected, sold, quarter)
data

You can create a dot chart in R of the sold variable by passing it to the dotchart function.

You can also label each data point with the labels argument and specify additional
arguments, like the symbol, the symbol size or the fill color of the symbol, with
the pch, pt.cex and bg arguments, respectively.

dotchart(data$sold, labels = data$month, pch = 21, bg = "green", pt.cex = 1.5)
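
The grouping and sorting mentioned earlier can be sketched as follows, using the groups argument of dotchart() and the base function order(). The data frame is rebuilt here under the illustrative name sales so the example runs on its own.

```r
# Same sales figures as above.
sales <- data.frame(month   = month.name,
                    sold    = c(8, 18, 12, 10, 41, 2,
                                19, 26, 14, 16, 9, 13),
                    quarter = rep(1:4, each = 3),
                    stringsAsFactors = FALSE)

# Group the months by quarter; groups must be a factor.
dotchart(sales$sold, labels = sales$month,
         groups = factor(sales$quarter),
         pch = 21, bg = "green", pt.cex = 1.5)

# Sort the months by units sold before plotting.
by_sales <- sales[order(sales$sold), ]
dotchart(by_sales$sold, labels = by_sales$month,
         pch = 21, bg = "green", pt.cex = 1.5)
```

In the sorted chart the month with the fewest sales (June) is drawn at the bottom and the month with the most (May) at the top.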
