Boxplots in R-1
In descriptive statistics, a box plot or boxplot (also known as a box-and-whisker plot)
is a type of chart often used in exploratory data analysis.
A boxplot displays the distribution of a data set in compact graphical form.
It shows how the data are spread out by marking the quartiles, which divide
the ordered data into four equal parts.
The plot is based on the five-number summary: minimum, first quartile, median,
third quartile, and maximum.
Boxplots are also useful for comparing the distributions of several groups or data sets
by drawing a boxplot for each of them side by side.
In R, boxplots are drawn with the boxplot() function. Its most commonly used parameters are:
S.No  Parameter  Description
1.    x          A vector or a formula.
2.    data       The data frame from which the variables in a formula are taken.
3.    notch      A logical value; set to TRUE to draw a notch.
4.    varwidth   A logical value; set to TRUE to draw each box with a width proportional to the square root of its sample size.
5.    names      The group labels printed under each boxplot.
6.    main       The title of the graph.
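As a minimal sketch of how these parameters fit together, the call below uses R's built-in mtcars data set (an assumption for illustration; it is not part of these notes) and passes a formula, a data frame, and several of the options from the table. notch is left at FALSE here because the groups are small.

```r
# Sketch only: mtcars ships with R and is used here purely as sample data.
res <- boxplot(mpg ~ cyl,                                 # x: formula "response ~ group"
               data = mtcars,                             # data: data frame holding mpg and cyl
               notch = FALSE,                             # notch: TRUE would draw a notch at each median
               varwidth = TRUE,                           # varwidth: wider boxes for larger groups
               names = c("4 cyl", "6 cyl", "8 cyl"),      # names: label under each box
               main = "Mileage by number of cylinders")   # main: plot title

# boxplot() invisibly returns the statistics it used for the drawing.
print(res$n)      # number of observations in each group
print(res$stats)  # one column per group: whisker ends, box edges and median
```

Storing the return value in res is optional; it is done here only so the numbers behind the picture can be inspected.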
Definitions
Minimum Score
The lowest score, excluding outliers (shown at the end of the left whisker).
Lower Quartile
Twenty-five percent of scores fall below the lower quartile value (also known as the first
quartile).
Median
The median marks the mid-point of the data and is shown by the line that divides the box
into two parts (sometimes known as the second quartile). Half the scores are greater than
or equal to this value, and half are less.
Upper Quartile
Seventy-five percent of the scores fall below the upper quartile value (also known as the
third quartile). Thus, 25% of data are above this value.
Maximum Score
The highest score, excluding outliers (shown at the end of the right whisker).
Whiskers
The upper and lower whiskers represent scores outside the middle 50% (i.e., the lower
25% of scores and the upper 25% of scores).
Box plots are useful as they provide a visual summary of the data enabling researchers to
quickly identify median values, the dispersion of the data set, and signs of skewness.
Example:
Construct a box plot for the following data:
12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25
Solution:
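Step 1: Arrange the data in ascending order.
5, 7, 12, 14, 15, 22, 25, 30, 36, 42, 53
Step 2: Find the median (22), then the lower quartile (12) and the upper quartile (36).
The lower quartile is the median of the values below the median, and the upper quartile
is the median of the values above it.
(If there is an even number of data items, then we need to get the average of the middle
numbers.)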
Step 3: Draw a number line that will include the smallest and the largest data.
Step 4: Draw three vertical lines at the lower quartile (12), median (22) and the upper
quartile (36), just above the number line.
Step 5: Join the lines for the lower quartile and the upper quartile to form a box.
Step 6: Draw a line from the smallest value (5) to the left side of the box and draw a line
from the right side of the box to the biggest value (53).
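The hand calculation above can be sketched in R as follows. This is only an illustration of the median-of-each-half method used in the steps; the variable names are made up for the example.

```r
# Data from the worked example.
scores <- c(12, 5, 22, 30, 7, 36, 14, 42, 15, 53, 25)

# Step 1: arrange the data in ascending order.
sorted <- sort(scores)
n <- length(sorted)

# Step 2: split into the values below and above the median.
# When n is odd, the median itself belongs to neither half.
half <- floor(n / 2)
lower_half <- sorted[1:half]
upper_half <- sorted[(n - half + 1):n]

five <- c(minimum        = min(sorted),
          lower_quartile = median(lower_half),
          median         = median(sorted),
          upper_quartile = median(upper_half),
          maximum        = max(sorted))
print(five)  # 5, 12, 22, 36, 53

# Steps 3 to 6: let R draw the plot along a horizontal number line.
boxplot(scores, horizontal = TRUE)
```

R's boxplot() uses its own quartile rule (Tukey's hinges), so the box it draws may differ slightly from the hand-computed quartiles.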
Example:
Make a box-and-whisker plot for the following data set.
7, 3, 14, 9, 7, 8, 12.
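One way to check an answer to this exercise is to let R compute the quartiles. The sketch below uses quantile() with type = 6, an assumption chosen because, for this data set, it gives the same quartiles as taking the median of each half by hand; R's default settings can give slightly different values.

```r
# Data from the exercise.
scores <- c(7, 3, 14, 9, 7, 8, 12)

# Ordered data, as in Step 1 of the hand method.
print(sort(scores))

# Lower quartile, median and upper quartile.
q <- quantile(scores, probs = c(0.25, 0.5, 0.75), type = 6)
print(q)

# Draw the box-and-whisker plot along a horizontal axis.
boxplot(scores, horizontal = TRUE,
        main = "Box-and-whisker plot of the data set")
```

The box drawn by boxplot() is based on its own quartile rule, so its edges may not coincide exactly with the quartiles computed above.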
Entering Your Own Data
R’s boxplot command has several levels of use, some quite easy, some a bit more difficult to learn.
Let’s start with an easy example. You can enter your own data manually and then create a boxplot.
x=c(1,2,3,3,4,5,5,7,9,9,15,25)
boxplot(x)
If you’d like to compare two sets of data, enter each set separately, then pass both of them to
the boxplot command.
x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
boxplot(x,y)
You can just as easily compare three sets of data. Enter your three sets of data and then pass
all of them to the boxplot command.
x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
z=c(15,15,15,16,19,25,29,30,55,79)
boxplot(x,y,z)
You can use the argument horizontal=TRUE to lay them out horizontally.
x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
z=c(15,15,15,16,19,25,29,30,55,79)
boxplot(x,y,z, horizontal=TRUE)
Use the names argument to label each boxplot.
x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
z=c(15,15,15,16,19,25,29,30,55,79)
boxplot(x,y,z,
        horizontal=TRUE,
        names=c("Level 1","Level 2","Level 3"))
Use the col argument to give each box its own color.
x=c(1,2,3,3,4,5,5,7,9,9,15,25)
y=c(5,6,7,7,8,10,1,1,15,23,44,76)
z=c(15,15,15,16,19,25,29,30,55,79)
boxplot(x,y,z,
        horizontal=TRUE,
        names=c("Level 1","Level 2","Level 3"),
        col=c("red","yellow","blue"))
The box-and-whisker plot is also called a box plot. It consists of a rectangular
“box” and two “whiskers.” A box-and-whisker plot contains the following parts:
Box: The box in the plot spans from the first quartile (Q1) to the third
quartile (Q3). This box contains the middle 50% of the data and represents
the interquartile range (IQR). The length of the box provides insight into the
data’s spread.
Whiskers: The whiskers extend from Q1 down to the minimum value and from
Q3 up to the maximum value, where the minimum and maximum exclude
potential outliers. Whiskers of unequal length indicate skewness; whiskers
of similar length suggest symmetry.
Median Line: A line within the box represents the median (Q2). It divides the
data into two halves, revealing the central tendency.
Outliers: Individual data points lying beyond the whiskers are considered
outliers and are often plotted as individual points.
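These parts can also be read off numerically. The sketch below applies boxplot.stats() to the x vector from the earlier example; it assumes R's default rule, under which points more than 1.5 times the box length beyond the box are treated as outliers.

```r
# Same x as in the "Entering Your Own Data" example.
x <- c(1, 2, 3, 3, 4, 5, 5, 7, 9, 9, 15, 25)

# coef = 1.5 is R's default multiplier for the outlier rule.
s <- boxplot.stats(x, coef = 1.5)

print(s$stats)  # lower whisker end, Q1, median, Q3, upper whisker end: 1 3 5 9 15
print(s$out)    # points beyond the whiskers (outliers): 25

# The same numbers drive the picture.
boxplot(x, horizontal = TRUE)
```

Here the box runs from 3 to 9, so the box length is 6 and any value above 9 + 1.5 * 6 = 18 is plotted as an individual point.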
Histograms in R language
A histogram is a type of bar chart that shows how many values fall into each of a set of consecutive value
ranges (bins). A histogram is used to show the distribution of a variable, whereas a bar chart is used for comparing
different entities. In a histogram, the height of each bar represents the number of values present in the given
range.
Example
# Create data for the graph.
v <- c(9,13,21,8,36,22,12,41,31,33,19)
# Draw the histogram.
hist(v)
Example
# Create data for the graph.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)
# Draw the histogram.
hist(v)
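To see how the bar heights relate to the value ranges, the sketch below counts the values in each range without drawing anything and then draws the matching histogram. The bin edges (0, 10, 20, 30, 40) and the axis label are assumptions chosen for illustration.

```r
# Data from the second example.
v <- c(19, 23, 11, 5, 16, 21, 32, 14, 19, 27, 39)

# Assumed bin edges: each range is 10 units wide.
edges <- seq(0, 40, by = 10)

# plot = FALSE returns the counts instead of drawing the chart.
h <- hist(v, breaks = edges, plot = FALSE)
print(h$breaks)  # 0 10 20 30 40
print(h$counts)  # 1 5 3 2: values in (0,10], (10,20], (20,30], (30,40]

# Draw the histogram with the same bins.
hist(v, breaks = edges, main = "Histogram of v", xlab = "Value")
```

The counts add up to the 11 values in v, and each count is the height of the corresponding bar.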
Pyramid plot
Description
Displays a pyramid (opposed horizontal bar) plot on the current graphics device.
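The notes do not show which function draws the pyramid plot, so the following is only a base-R sketch of the idea of opposed horizontal bars, built with barplot(). The age groups and counts are invented for illustration.

```r
# Hypothetical counts per age group (made up for this sketch).
age_groups <- c("0-19", "20-39", "40-59", "60+")
left  <- c(30, 35, 25, 10)
right <- c(28, 36, 26, 14)

# Use the same limit on both sides so the two halves are comparable.
limit <- max(left, right)

# Left-hand bars: negate the values so they extend to the left of zero.
mids_left <- barplot(-left, horiz = TRUE, xlim = c(-limit, limit),
                     names.arg = age_groups, las = 1, axes = FALSE,
                     col = "steelblue", main = "Pyramid plot (sketch)")

# Right-hand bars are added to the same plot.
mids_right <- barplot(right, horiz = TRUE, add = TRUE, axes = FALSE,
                      col = "tomato")

# Label the horizontal axis with absolute values on both sides.
ticks <- pretty(c(-limit, limit))
axis(1, at = ticks, labels = abs(ticks))
```

Drawing both halves on one axis keeps each pair of bars on the same row, which is what gives the plot its pyramid shape.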
Dot plot in R
A dot plot or dot chart is similar to a scatter plot.
The main difference is that the dot plot in R displays the index (each category) in the
vertical axis and the corresponding value in the horizontal axis, so you can see the
value of each observation following a horizontal line from the label.
This graph can also be used as an alternative to horizontal barplots.
In addition, you can label the corresponding points in the vertical axis by different
groups and even sort them based on some variable.
quarter <- c(rep(1, 3), rep(2, 3), rep(3, 3), rep(4, 3))
data
You can create a dot chart in R of the sold variable by passing it to the dotchart function.
You can also label each data point with the labels argument and specify additional
arguments, like the symbol, the symbol size or the fill color of the symbol, with
the pch, pt.cex and bg arguments, respectively.
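A minimal sketch of these options is given below. The data set in these notes is not shown in full, so the sales data frame and its sold figures are hypothetical stand-ins; only the dotchart() arguments are the point of the example.

```r
# Hypothetical monthly sales (invented values, for illustration only).
sales <- data.frame(
  month   = month.name,
  sold    = c(8, 18, 12, 10, 41, 2, 19, 26, 14, 16, 9, 13),
  quarter = rep(1:4, each = 3)
)

# Basic dot chart: labels on the vertical axis, values on the horizontal axis.
# pch = 21 is a filled circle, bg sets its fill color, pt.cex its size.
dotchart(sales$sold, labels = sales$month,
         pch = 21, bg = "green", pt.cex = 1.5,
         main = "Units sold per month")

# Group the points by quarter.
dotchart(sales$sold, labels = sales$month,
         groups = factor(sales$quarter), pch = 19)

# Sort the points by the number of units sold.
sorted <- sales[order(sales$sold), ]
dotchart(sorted$sold, labels = sorted$month, pch = 19)
```

bg only has a visible effect with the fillable symbols (pch 21 to 25), which is why pch = 21 is used in the first call.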