R Complete

R Complete Note
Presenting Data in Charts and Tables

Bar Chart
# Function to create bar charts:
barplot()
Parameters for barplot() :
x — Vector or matrix containing numeric values used in bar chart
xlab — Label for x-axis
ylab — Label for y-axis
main — The title of the bar chart
names.arg — Vector of names appearing under each bar
col — Give colors to the bars
border — Give colors to the borders of the bars
density — A single number or a vector giving the density of the shading lines for the bars. The default is no shading
Example:
x = c(5, 7, 4, 8, 12, 16, 15)
barplot(
  x,
  main = "Number of customers visited the store",
  xlab = "Days",
  ylab = "Number of customers",
  names.arg = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  col = rainbow(length(x))
)
Exercise
Create a bar plot for the following data using R.
Performance Level    Frequency
Good                 13
Above Average        12
Average              15
Poor                 10
Total                50
x = c(13, 12, 15, 10)
barplot(
  x,
  main = "Performance Level",
  xlab = "Levels",
  ylab = "Frequency",
  names.arg = c("Good", "Above Average", "Average", "Poor"),
  col = rainbow(length(x))
)
Example with density :
x = c(5, 7, 4, 8, 12, 16, 15) # Daily customer counts again, since the exercise reassigned 'x'
barplot(
  x,
  main = "Number of customers visited the store",
  xlab = "Days",
  ylab = "Number of customers",
  names.arg = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"),
  col = rainbow(length(x)),
  density = seq(10, 70, 10)
)
Tabular Method
Example:
Rating           Frequency   Relative Frequency   Percent Frequency
Poor             2           2/20 = 0.10          10%
Below Average    3           3/20 = 0.15          15%
Average          5           5/20 = 0.25          25%
Above Average    9           9/20 = 0.45          45%
Excellent        1           1/20 = 0.05          5%
Total            20          1.00                 100%
set = c(1, 3, 8, 4, 2, 3, 6, 5, 5, 8, 4, 2, 4, 1, 5)
# Create a frequency table
ta <- table(set)
ta
# Relative frequency
prop.table(ta)
# Cumulative frequency
cumsum(ta)
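The rating table in the tabular method example can be rebuilt from its frequency column alone. The following is a minimal sketch, assuming the frequencies 2, 3, 5, 9 and 1 from that table; the names freq, rel_freq, pct_freq and rating_table are illustrative and do not come from the notes.

```r
# Frequencies taken from the rating table in the example
freq <- c(Poor = 2, "Below Average" = 3, Average = 5,
          "Above Average" = 9, Excellent = 1)

rel_freq <- freq / sum(freq)  # relative frequency = frequency / total
pct_freq <- 100 * rel_freq    # percent frequency = relative frequency * 100

# Hypothetical name: rating_table collects the three columns
rating_table <- data.frame(
  Frequency = freq,
  Relative  = rel_freq,
  Percent   = pct_freq
)
rating_table
```

The relative frequencies should add up to 1 and the percent frequencies to 100, which is a quick way to check a table built by hand.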
Pie Chart
# Function to create pie charts:
pie()
Parameters for pie() :
labels — Descriptions of the slices
radius — Radius of the circle (value between −1 and +1)
main — The title of the pie chart
col — The color of the slice
clockwise — If set to TRUE , slices are drawn clockwise
💡 Slices are drawn counter-clockwise by default
Example:
x <- c(35, 28, 47, 63, 50)

pie(x)
Pie Chart with slice percentages as labels:
x <- c(35, 28, 47, 63, 50)

piepercent <- round(100 * x / sum(x), 1)
lbls <- paste(piepercent, "%", sep = "")
pie(
x,
labels = lbls,
main = "Pie chart with slice percentage",
col = rainbow(length(x)),
radius = 1
)
Pie chart with slice percentage along with characters as labels:
x <- c(35, 28, 47, 63, 50)
districts <- c("Colombo", "Kandy", "Jaffna", "Anuradhapura", "Batticaloa")
piepercent <- round(100 * x / sum(x), 1)
# Put a space between the district name and its percentage
lbls <- paste0(districts, " ", piepercent, "%")
pie(
  x,
  labels = lbls,
  main = "Pie chart with slice percentage",
  col = heat.colors(length(x)),
  radius = 1
)
Measures of Central Tendency

Mean
datasets::CO2
d <- CO2 # Assign the CO2 dataset to 'd'
u <- d$uptake # Assign the 'uptake' column of 'd' to 'u'
mean(u) # Find the mean of 'u'
Find mean values for each category in a column (similar to group by in SQL):
# Find the mean value of the 'uptake' variable for each category in the 'Plant' column
tapply(d$uptake, d$Plant, mean)
# Find the maximum value of the 'uptake' variable for each category in the 'Treatment' column
tapply(d$uptake, d$Treatment, max)
Find the mean value for a specific category in a column:
# Find the mean value of the 'uptake' variable for the category 'chilled' in the 'Treatment' column
mean(d$uptake[d$Treatment == "chilled"])
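To see what the grouped and filtered means do on data small enough to check by hand, here is a sketch on a made-up data frame. The data frame sales and its columns are hypothetical and not part of the notes; aggregate() is shown only as one alternative to tapply() that returns a data frame.

```r
# Hypothetical toy data, small enough to verify by hand
sales <- data.frame(
  region = c("north", "south", "north", "south", "north"),
  amount = c(10, 20, 30, 40, 80)
)

# Mean of 'amount' for each category of 'region' (like GROUP BY in SQL)
group_means <- tapply(sales$amount, sales$region, mean)
group_means

# Mean of 'amount' for one category only, using a logical index
north_mean <- mean(sales$amount[sales$region == "north"])
north_mean

# Same group means, returned as a data frame
agg <- aggregate(amount ~ region, data = sales, FUN = mean)
agg
```

Here the north rows are 10, 30 and 80, so their mean is 40, and the south rows are 20 and 40, so their mean is 30.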
Median
median(u) # Find the median of 'u'
Mode
## Find the mode -- Method 1
# Define function to find the mode
getmode <- function(v) {
  uniqv <- unique(v) # Get unique values of 'v'
  uniqv[which.max(tabulate(match(v, uniqv)))] # Return the value that occurs most frequently
}
# Call the function to get the mode of 'u'
getmode(u)
💬 match(v, uniqv) returns a vector of the same length as v, where each element is the index of the corresponding element of v in uniqv.
tabulate() counts the number of times each integer occurs in the input vector, up to the maximum value in the input vector.
which.max() returns the index of the maximum value in the input vector.
So, uniqv[which.max(tabulate(match(v, uniqv)))] returns the value in uniqv that corresponds to the maximum count in v, which is the mode of v.
## Find the mode -- Method 2
# Create a frequency table of 'u' and assign it to 'y'

y <- table(u)
# Find the mode(s) of 'u'

names(y)[which(y==max(y))]
💬 The table() function counts the number of times each unique value occurs in u.
💬 max(y) finds the maximum frequency in y.
which(y == max(y)) returns the indices of y where the frequency is equal to the maximum frequency.
names(y)[which(y == max(y))] returns the names (or labels) of y at these indices. In other words, it finds the values of u that occur most frequently, which is the mode(s) of u.
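The two methods behave differently when several values tie for the highest frequency. The sketch below uses a made-up vector v (not from the notes) in which 3 and 7 both occur twice: which.max() keeps only the first maximum, so Method 1 returns one mode, while Method 2 returns every mode as character strings.

```r
# Method 1 function, as defined in the notes
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

v <- c(3, 7, 7, 3, 5)   # hypothetical data: 3 and 7 both occur twice

match(v, unique(v))      # positions of each element in unique(v): 1 2 2 1 3
m1 <- getmode(v)         # Method 1: only the first mode found
m1

y <- table(v)
m2 <- names(y)[which(y == max(y))]  # Method 2: all modes, as character strings
m2
as.numeric(m2)           # convert back to numbers if needed
```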
Measures of Dispersion
Range and Interquartile Range
# Find the range
Range = max(u) - min(u)
Range
# Find the Interquartile Range

IQR(u)
Quartiles
# Find Quartiles
quantile(u, 0.25) #First Quartile
quantile(u, 0.5) #Second Quartile
quantile(u, 0.75) #Third Quartile
Five number summary
# Find the five-number summary

summary(u)
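Note that summary() also reports the mean, so it prints six values rather than five. A minimal sketch comparing it with fivenum() and quantile() follows; the vector v is made up for illustration. fivenum() uses Tukey's hinges, which can differ slightly from the default quartiles that quantile() and summary() compute.

```r
v <- c(1, 2, 3, 4, 5, 6, 7, 8)  # hypothetical data

s <- summary(v)   # Min, 1st Qu., Median, Mean, 3rd Qu., Max
f <- fivenum(v)   # min, lower hinge, median, upper hinge, max
q <- quantile(v, c(0, 0.25, 0.5, 0.75, 1))  # default quantile definition

s
f   # 1.0 2.5 4.5 6.5 8.0
q   # 1.00 2.75 4.50 6.25 8.00
```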
Find the five-number summary for a specific category in a column:
# Find the five-number summary of the 'uptake' variable for the category 'chilled' in the 'Treatment' column
summary(d$uptake[d$Treatment == "chilled"])
Exercise
Find the summary statistics for uptake where the plant type is "Qn1" and the uptake value is more than 20.
# Find the summary statistics for uptake where plant type is "Qn1" and uptake value is more than 20
summary(d$uptake[d$Plant == "Qn1" & d$uptake > 20])
💡 The summary() function automatically ignores missing values, while other functions such as mean() do not.
To ignore missing values in other functions, you have to request it explicitly with na.rm = TRUE.
Example:
num = c(10, 20, 33, 44, NA, 88, 55)
mean(num) # Returns NA because the missing value is not removed
mean(num, na.rm = TRUE) # Removes 'NA' before computing the mean
Deciles
# Find Deciles
quantile(u, 0.4) #Fourth Decile
quantile(u, 0.7) #Seventh Decile
Percentiles
# Find Percentiles
quantile(u, 0.98) # 98th Percentile
quantile(u, 0.37) # 37th Percentile
Variance
# Find the Sample Variance
var(u)
# Find the Standard Deviation

sd(u)
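var() computes the sample variance, which divides by n - 1 rather than n, and sd() is its square root. The following sketch checks this by hand on a made-up vector; v, manual_var and manual_sd are illustrative names, not from the notes.

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)  # hypothetical data, mean is 5
n <- length(v)

# Sample variance: sum of squared deviations divided by (n - 1)
manual_var <- sum((v - mean(v))^2) / (n - 1)
# Standard deviation: square root of the sample variance
manual_sd <- sqrt(manual_var)

all.equal(manual_var, var(v))  # TRUE
all.equal(manual_sd, sd(v))    # TRUE
```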
Box-Plot (Box & Whisker Plot)

# Function to create box plots:
boxplot()
Parameters for boxplot() :
[y-axis]~[x-axis] — The axes of the graph
data — The dataset
main — The title of the box plot
xlab — Label for x-axis
ylab — Label for y-axis
col — Give colors to the boxes
border — Give colors to the borders of the boxes
notch — Add a notch to the box at the Median
varwidth — If set to FALSE , all boxes will have the same width regardless of the size of the group
horizontal — If set to TRUE , the boxes will be horizontal
Examples:
datasets::ToothGrowth
TG<-ToothGrowth
boxplot(
TG$len,
main="Box plot of tooth length",
ylab="Tooth length",
col="hotpink",
border="lightpink",
notch = FALSE,
varwidth = FALSE,
horizontal = TRUE
)
datasets::ToothGrowth
TG<-ToothGrowth
boxplot(
len~supp,
data = TG,
main = "Tooth growth with supplement types",
xlab = "Supplement type",
ylab = "Tooth length",
col = c("hotpink", "lightpink")
)
Tally Table
# Tally Table
datasets::iris
i <- iris
table(i$Species)
Output: a count of 50 for each of the three species (setosa, versicolor, virginica).
Contingency Table
# Contingency Table
table(d$Plant, d$Type)
table(d$Plant, d$Treatment)
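Output: each call prints a table with one row per plant and one column per level of the second variable (Type or Treatment), giving the number of observations in each cell.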
Binomial Distribution
dbinom
For binomial distributions, dbinom is used in R.
# dbinom Help
help(dbinom)
Example:
Find P(x = 1) when n = 5 and θ = 0.1.
x = 1, n = 5, θ = 0.1
$$P(x = 1) = \binom{5}{1} (0.1)^{1} (0.9)^{4} = 5 \times 0.1 \times 0.6561 = 0.32805$$
dbinom(x = 1, size = 5, prob = 0.1)
Find P(x ≤ 3) when n = 5 and θ = 0.1.
sum(dbinom(x = 0:3, size = 5, prob = 0.1))
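The hand calculation for P(x = 1) can be checked against dbinom() with choose(), which returns the binomial coefficient. This is a small sketch; the variable names are illustrative, and the complement check for P(x ≤ 3) is an added cross-check rather than something the notes do.

```r
n <- 5
theta <- 0.1

# P(x = 1) from the binomial formula: C(n, x) * theta^x * (1 - theta)^(n - x)
manual <- choose(n, 1) * theta^1 * (1 - theta)^(n - 1)
by_dbinom <- dbinom(x = 1, size = n, prob = theta)
manual      # 0.32805
by_dbinom   # 0.32805

# P(x <= 3) as a sum, and as the complement of P(x = 4) + P(x = 5)
cum_sum <- sum(dbinom(x = 0:3, size = n, prob = theta))
cum_complement <- 1 - sum(dbinom(x = 4:5, size = n, prob = theta))
cum_sum     # 0.99954
```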
Exercise
The service a customer receives from a customer care center can be classified as good service or bad service. The probability of getting good service is 0.4.
1. What is the probability of the customer getting at least 2 good services out of 10 tries?
n = 10
x ≥ 2
θ = 0.4
1 - sum(dbinom(x = 0:1, size = 10, prob = 0.4))
2. What is the probability of the customer getting bad service between 3 and 7 times out of 10 tries?
n = 10
3 < x < 7
θ = 0.6
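The notes give no R code for part 2 of this exercise. One possible sketch follows, under the assumption that "between 3 and 7" excludes the endpoints (x = 4, 5, 6); the inclusive reading (x = 3 to 7) is computed as well, since the wording allows both. The names p_bad, strict and inclusive are illustrative.

```r
n <- 10
p_bad <- 1 - 0.4  # probability of bad service

# Assumption: "between 3 and 7" read strictly, so x = 4, 5, 6
strict <- sum(dbinom(x = 4:6, size = n, prob = p_bad))
strict

# Alternative reading: endpoints included, so x = 3, ..., 7
inclusive <- sum(dbinom(x = 3:7, size = n, prob = p_bad))
inclusive
```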
pbinom
pbinom is the cumulative function: it returns P(x ≤ q).
# pbinom Help
help(pbinom)
Examples:
Find P(x ≤ 3) when n = 5 and θ = 0.1.
pbinom(3, size = 5, prob = 0.1)
The same can be done with dbinom as:
sum(dbinom(x = 0:3, size = 5, prob = 0.1))
Poisson Distribution
dpois
dpois is used for Poisson distributions in R
# dpois Help
help(dpois)
Examples:
Find P(x = 0) when λ = 0.03.
dpois(x = 0, lambda = 0.03)
Find P(x ≥ 1) when λ = 0.03.
1 - dpois(x = 0, lambda = 0.03)
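As a cross-check, dpois() can be compared with the Poisson formula P(x) = e^(-λ) λ^x / x!. The sketch below does this for the λ = 0.03 example; the variable names are illustrative only.

```r
lambda <- 0.03

# P(x = 0) from the Poisson formula
p0_manual <- exp(-lambda) * lambda^0 / factorial(0)
p0 <- dpois(x = 0, lambda = lambda)
all.equal(p0_manual, p0)  # TRUE

# P(x >= 1) is the complement of P(x = 0)
p_at_least_1 <- 1 - p0
p_at_least_1
```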
ppois
ppois is a cumulative function
# ppois Help
help(ppois)
Example:
Find the value of P(x = 0) + P(x = 1) + P(x = 2) when λ = 2.
ppois(2, lambda = 2)
The same can be done with dpois as:
# Method 1
p1 <- dpois(x = 0, lambda = 2)
p2 <- dpois(x = 1, lambda = 2)
p3 <- dpois(x = 2, lambda = 2)
p <- p1 + p2 + p3
p
# Method 2
sum(dpois(x = 0:2, lambda = 2))
Exercise
Suppose it has been observed that, on average, 180 cars per hour pass a specified point on a particular road in the morning rush hour. Due to impending road works, it is estimated that congestion will occur closer to the city center if more than 5 cars pass the point in any one minute. What is the probability of congestion occurring?
$$\lambda = \frac{180}{60} = 3$$
x > 5
1 - ppois(5, lambda = 3)
Exercise
A manufacturer of balloons produces 40% that are oval and 60% that are round. Packets of 20 balloons may be assumed to contain random samples of balloons. Determine the probability that such a packet contains:
1. an equal number of oval balloons and round balloons
P(oval) = 0.4
P(round) = 0.6
$$P(x = 10) = \binom{20}{10} (0.4)^{10} (0.6)^{10}$$
dbinom(x = 10, size = 20, prob = 0.4)
2. fewer oval balloons than round balloons
P(x ≤ 9)
pbinom(9, size = 20, prob = 0.4)
A customer selects packets of 20 balloons at random from a large consignment until she finds a packet with exactly 12 round balloons.
3. Give a reason why a binomial distribution is not an appropriate model for the number of packets selected.
The number of trials is not fixed, even though the trials are independent events.
Continuous Uniform Distribution

dunif
dunif(x, min, max) is used to find the PDF at x in R.
# dunif Help
?dunif
Example:
Find the PDF of a uniform distribution between 0 and 5 at the point x = 2.
dunif(2, min = 0, max = 5)
Cumulative Distribution Function (CDF)
💡 The cumulative distribution function (CDF) of a uniform distribution gives the probability that the random variable X is less than or equal to a certain value x.
punif
punif is a cumulative function in R.
punif(q, min, max) is used to find the CDF at q, that is, P(X ≤ q)

Example:
Find the probability that a random variable from a uniform distribution between
0 and 5 is less than or equal to 3.
punif(3, min = 0, max = 5)
qunif
qunif is a quantile function in R.
qunif(p, min, max) is used to find the quantile defined by p
Example:
What is the 90th percentile of a uniform distribution between 0 and 5?
qunif(0.90, min = 0, max = 5)
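For a uniform distribution on [a, b], all three functions have simple closed forms: the PDF is 1 / (b - a), the CDF at q is (q - a) / (b - a), and the quantile for p is a + p (b - a). The sketch below checks the three examples against these formulas; the variable names are illustrative.

```r
a <- 0
b <- 5

pdf_at_2 <- dunif(2, min = a, max = b)    # 1 / (b - a)         = 0.2
cdf_at_3 <- punif(3, min = a, max = b)    # (3 - a) / (b - a)   = 0.6
q90      <- qunif(0.90, min = a, max = b) # a + 0.90 * (b - a)  = 4.5

c(pdf_at_2, cdf_at_3, q90)
```

qunif() is the inverse of punif(), so punif(q90, min = a, max = b) gives back 0.90.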
The Normal Distribution / Gaussian Distribution
pnorm
pnorm is used for Normal Distribution calculations in R.
# pnorm Help
?pnorm
Example:
Find P(x < 18) when the mean is 15 and the standard deviation is 2.
$$P(x < 18) = P\left(\frac{x - \mu}{\sigma} < \frac{18 - 15}{2}\right) = P(z < 1.5) = 0.9332$$
pnorm(q = 18, mean = 15, sd = 2)
pnorm(q = 18, mean = 15, sd = 2, lower.tail = TRUE)
💬 The lower.tail argument specifies whether the probability is calculated for the lower tail (left-hand side, P(X ≤ q)) or the upper tail (right-hand side, P(X > q)) of the normal distribution.
Example:
Find P(x > 18) when the mean is 15 and the standard deviation is 2.
$$P(x > 18) = 1 - P(x < 18) = 1 - 0.9331928 = 0.0668072$$
pnorm(q = 18, mean = 15, sd = 2, lower.tail = FALSE)
Example:
Find P(970000 < x < 1060000) when the mean is 1000000 and the standard deviation is 30000.
# P(x < 1060000)
P1 <- pnorm(q = 1060000, mean = 1000000, sd = 30000, lower.tail = TRUE)
# P(x < 970000)
P2 <- pnorm(q = 970000, mean = 1000000, sd = 30000, lower.tail = TRUE)
# P(970000 < x < 1060000) = P(x < 1060000) - P(x < 970000)
P <- P1 - P2
P
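The same interval probability can be found by standardizing first, as in the z-score calculation for P(x < 18). This is a sketch with illustrative variable names: the bounds 970000 and 1060000 correspond to z = -1 and z = 2, and pnorm() with its default mean of 0 and sd of 1 gives the standard normal CDF.

```r
mu <- 1000000
sigma <- 30000

z_low  <- (970000 - mu) / sigma    # -1
z_high <- (1060000 - mu) / sigma   #  2

# Probability between the two z-scores on the standard normal
p_z <- pnorm(z_high) - pnorm(z_low)

# Same probability computed directly on the original scale
p_x <- pnorm(1060000, mean = mu, sd = sigma) - pnorm(970000, mean = mu, sd = sigma)

p_z   # approximately 0.8186
all.equal(p_z, p_x)  # TRUE
```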
Hypothesis Testing - Examples

H0 : μ ≤ 80
H1 : μ > 80
Sample Mean = 83
Standard Deviation = 8
Sample Size = 25
# Test Statistic Value
Z1 = (83 - 80) / (8 / sqrt(25))
Z1
# Table_value for 95% upper tail test
Table_value <- qnorm(0.95)
Table_value
if (Table_value < Z1) {
  print("Reject the H0")
}
H0 : μ = 170 (this specifies a single value for the parameter of interest)
H1 : μ > 170 (this is what we want to determine)
sd = 65
mu_0 = 170
n = 400
x_bar = 178
z1 <- (x_bar - mu_0) / (sd / sqrt(n))
z1
# Table_value for 95% upper tail test
Table_value <- qnorm(0.95)
Table_value
if (Table_value < z1) {
  print("Reject the H0")
}
The owner of a shop wants to examine the shop's annual income rate. He suspects that, compared to previous years, the annual income rate has declined to less than 5%, and he tests this at the 5% significance level. The standard deviation of the annual income rate over the last 16 years is 0.1%. The population mean is 5%, and the sample mean is 4.962%.
H0 : μ = 5 (this specifies a single value for the parameter of interest)
H1 : μ < 5 (this is what we want to determine)
sd = 0.1
mu_0 = 5
n = 16
x_bar = 4.962
z1 <- (x_bar - mu_0) / (sd / sqrt(n))
z1
# Table_value for 5% lower tail test
Table_value <- round(qnorm(0.05), 2)
Table_value
if (Table_value > z1) {
  print("Reject the H0")
} else {
  print("Failed to reject H0")
}
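The three examples follow the same steps: compute the z statistic, look up the critical value with qnorm(), and compare. A minimal sketch that wraps these steps in one function is given below. The function z_test and its arguments are hypothetical and not part of the notes, and unlike the third example it does not round the critical value.

```r
# Hypothetical helper: one-sample, one-tailed z-test
z_test <- function(x_bar, mu_0, sd, n, alpha = 0.05, tail = c("upper", "lower")) {
  tail <- match.arg(tail)
  z <- (x_bar - mu_0) / (sd / sqrt(n))  # test statistic
  # Critical value: upper tail uses the (1 - alpha) quantile, lower tail the alpha quantile
  critical <- if (tail == "upper") qnorm(1 - alpha) else qnorm(alpha)
  reject <- if (tail == "upper") z > critical else z < critical
  list(z = z, critical = critical, reject = reject)
}

r1 <- z_test(83, 80, sd = 8, n = 25, tail = "upper")      # z = 1.875
r2 <- z_test(178, 170, sd = 65, n = 400, tail = "upper")  # z is about 2.46
r3 <- z_test(4.962, 5, sd = 0.1, n = 16, tail = "lower")  # z = -1.52

r1$reject  # TRUE:  reject H0
r2$reject  # TRUE:  reject H0
r3$reject  # FALSE: fail to reject H0
```

In the third case z = -1.52 does not fall below the critical value of about -1.645, so H0 is not rejected at the 5% level.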