R Programming Language Workshop
Ahmed Elshahhat
Zagazig University
All content following this page was uploaded by Ahmed Elshahhat on 22 September 2023.
Online Workshop
By Dr. Ahmed Elshahhat
September 2023
DOI:10.13140/RG.2.2.24265.31841
To the late Prof. Samir K. Ashour
1943-2022
1 Overview
2 Data Structures
3 R Data Import
4 R Statistics
5 R Graphics:
Bar & Box
Histogram & Density
Heatmap
Pairs
QQ
3D
6 Inference:
Parameter Estimation
Monte Carlo of Parameter Estimation
Linear Regression Models
Monte Carlo of Linear Regression Models
Schedule:
Activity Time (in Minutes)
Overview 30
Data Structures 35
R Statistics 20
R Graphics 35
Inference 60
Dataset: All R scripts used in this workshop are included in these slides.
Installation Requirements: Download and install the latest version of R.
1 Overview
What is R?
R Advantages
1 R is open source.
2 R has a large, active community.
3 R produces outstanding graphical output.
4 R is easy to learn and understand.
5 More than 18,000 add-on packages are freely available.
6 R runs on macOS, Linux and Microsoft Windows.
7 R is cross-platform and runs on many other operating systems as well.
8 R is excellent for simulation, programming, computer-intensive analyses, etc.
9 In R, anyone is welcome to contribute bug fixes, code enhancements, and new packages.
10 Built-in help for the base (default) packages is available without an internet connection.
R Restrictions
Installing R System
R Installation
Installing add-on Packages
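Add-on packages are installed once from CRAN with install.packages() and then loaded in every new session with library(). A minimal sketch of that pattern follows; the helper name ensure_package() is hypothetical and not part of base R.

```r
# Hypothetical helper: install a CRAN package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)               # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)   # attach the package for this session
  invisible(TRUE)
}
```

For example, calling ensure_package("readxl") once per session makes read_excel(), used in the data import section, available.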
R Session
R Console
R Editor
Interactive R sessions
R Programming Tools
R Operators
Arithmetic Operators
Operator ‘/’ divides each element of the 1st vector by the corresponding element of the 2nd vector
x <- c(1, 1, 6)
y <- c(0, 4, 3)
print(x / y)
[1] Inf 0.25 2.00
Operator ‘%%’ gives the remainder (modulus) of dividing the 1st vector by the 2nd vector
x <- c(5, 4, 2)
y <- c(2, 1, 4)
print(x %% y)
[1] 1 0 2
Operator ‘%/%’ gives the integer (floor) division of the 1st vector by the 2nd vector
x <- c(4, 4, 2)
y <- c(2, 0, 4)
print(x %/% y)
[1] 2 Inf 0
Operator ‘^’ raises the 1st vector to the power of the 2nd vector
x <- c(4, 3, 2)
y <- c(3, 0, 5)
print(x ^ y)
[1] 64 1 32
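The remainder and integer-division operators are linked: for a non-zero divisor, R guarantees x == y * (x %/% y) + x %% y, and %/% rounds down (towards minus infinity), which matters for negative numbers. A small check, with a negative value added purely for illustration:

```r
x <- c(5, 4, 2, -7)
y <- c(2, 1, 4, 2)
q <- x %/% y     # integer division, rounded down: 2 4 0 -4
r <- x %% y      # remainder, same sign as y:     1 0 2 1
x == y * q + r   # identity holds element-wise:   TRUE TRUE TRUE TRUE
```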
Relational Operators
Operator ‘>’ compares the two vectors element by element and returns TRUE wherever the element of the 1st vector is greater than the corresponding element of the 2nd vector
x <- c(4, 3, 2)
y <- c(3, 0, 5)
print(x > y)
[1] TRUE TRUE FALSE
Operator ‘<’ returns TRUE wherever the element of the 1st vector is less than the corresponding element of the 2nd vector
x <- c(4, 3, 2)
y <- c(3, 0, 5)
print(x < y)
[1] FALSE FALSE TRUE
Relational Operators, Cont’d
Operator ‘<=’ returns TRUE wherever the element of the 1st vector is less than or equal to the corresponding element of the 2nd vector
x <- c(4, 3, 2)
y <- c(7, 0, 5)
print(x <= y)
[1] TRUE FALSE TRUE
Operator ‘>=’ returns TRUE wherever the element of the 1st vector is greater than or equal to the corresponding element of the 2nd vector
x <- c(4, 3, 2)
y <- c(7, 0, 5)
print(x >= y)
[1] FALSE TRUE FALSE
Relational Operators, Cont’d
Operator ‘==’ returns TRUE wherever the element of the 1st vector is equal to the corresponding element of the 2nd vector
x <- c(4, 3, 2)
y <- c(7, 3, 5)
print(x == y)
[1] FALSE TRUE FALSE
Operator ‘!=’ returns TRUE wherever the element of the 1st vector is not equal to the corresponding element of the 2nd vector
x <- c(4, 3, 2)
y <- c(7, 3, 5)
print(x != y)
[1] TRUE FALSE TRUE
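Relational operators return one logical value per element pair. To ask whether the comparison holds for every element, or for at least one, the logical vector is summarised with all() or any(); which() and sum() locate and count the TRUE values:

```r
x <- c(4, 3, 2)
y <- c(3, 0, 5)
cmp <- x > y   # element-wise: TRUE TRUE FALSE
all(cmp)       # TRUE only if every comparison holds: FALSE
any(cmp)       # TRUE if at least one comparison holds: TRUE
which(cmp)     # positions where it holds: 1 2
sum(cmp)       # number of TRUE values: 2
```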
Miscellaneous Operators
Operator ‘:’ creates a sequence of consecutive numbers
x <- 1:10
print(x)
[1] 1 2 3 4 5 6 7 8 9 10
R Programming Tools
Common Functions
table              frequency counts
c                  concatenate values into a vector
print              show a value
which              indices of TRUE elements
length             number of values
summary            generic summary statistics
dim                dimensions of a matrix or array
min, max           minimum, maximum
help(), ?          open the documentation
rbind, cbind       bind vectors as rows, as columns
class              class (type) of an object
apply              apply a function over rows or columns
sort, order, rank  sorted values, sorting permutation, ranks
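sort(), order() and rank() are easy to confuse. On a small vector with a tie, they return different things:

```r
v <- c(30, 10, 20, 10)
sort(v)         # sorted values: 10 10 20 30
order(v)        # positions that sort v: 2 4 3 1
rank(v)         # rank of each element (ties averaged): 4.0 1.5 3.0 1.5
which(v > 15)   # indices of TRUE elements: 1 3
table(v)        # counts: 10 appears twice, 20 and 30 once each
v[order(v)]     # indexing with order() reproduces sort(v)
```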
R Programming Tools
Common Functions, Cont’d
mean(x)                 average
var(x)                  variance
cor(x, y)               correlation
cov(x, y)               covariance
sqrt(x)                 square root
log10(x)                log base 10
sin(x), cos(x), tan(x)  trigonometric functions
log(x)                  natural logarithm
seq(x)                  sequence generation
median(x)               middle value of the sorted data
mad(x)                  median absolute deviation
d, p, q, r              prefixes of the density, cumulative probability, quantile and random-number generation functions (e.g. dnorm, pnorm, qnorm, rnorm)
R Programming Tools
Probability Distribution Functions
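Every built-in distribution follows the d/p/q/r naming scheme. For the standard normal distribution, and as a check that p and q are inverses for the Weibull distribution used later in the workshop:

```r
dnorm(0)       # density f(0) = 1/sqrt(2*pi), about 0.3989
pnorm(1.96)    # P(Z <= 1.96), about 0.975
qnorm(0.975)   # the z with P(Z <= z) = 0.975, about 1.96
set.seed(1)
rnorm(3)       # three random draws from N(0, 1)
# p and q undo each other
pweibull(qweibull(0.3, shape = 2, scale = 1), shape = 2, scale = 1)   # 0.3
```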
#1 R in Action
#8 R for Everyone
#9 The Book of R
The best online courses to learn R programming, the language used by data analysts and statisticians to structure, analyze, and visualize data:
https://www.coursera.org/learn/data-analysis-r
https://www.facebook.com/groups/2101100100212657/
Introduction to R (DataCamp)
Swirl: Learn R
https://swirlstats.com/
https://www.coursera.org/learn/business-analytics-r
https://www.coursera.org/learn/probability-intro
https://www.udemy.com/course/r-programming
In R, you might run into a situation or two that requires some expert help. The following websites can provide the assistance you need.
#1 R-bloggers
Note: The R-bloggers website collects the posts of more than 750 R bloggers.
https://www.r-bloggers.com/
Note: In 2015, Microsoft acquired Revolution Analytics, the parent company of Inside-R. One result of this acquisition was the Microsoft R Application Network (MRAN).
https://mran.microsoft.com/
#3 Quick-R
Note: Professor Rob Kabacoff at Wesleyan University created this website to introduce you to R and its applications.
https://www.statmethods.net/
#4 RStudio
Note: The RStudio website links to tutorials and examples that help you master R and related tools.
https://www.rstudio.com/
#5 Statistics Globe
Note: Statistics Globe is an education platform that provides free programming tutorials in R and Python, as well as theoretical explanations for the fields of statistics and data science.
https://statisticsglobe.com/
#6 Stack Overflow
Note: Stack Overflow is a multimillion-member community of programmers dedicated to helping each other. You can search its Q&A base for help with a problem, or ask a question.
https://stackoverflow.com/
#7 R Tutorial
Note: This R tutorial is designed for software programmers, statisticians and data miners who want to develop statistical software using R.
https://www.tutorialspoint.com/r/index.htm
#8 R Programming Tutorial
Note: This R Programming Tutorial is designed for both beginners and professionals and covers basic and advanced concepts of data analysis and visualization.
https://www.javatpoint.com/r-tutorial
#9 RDocumentation
Note: RDocumentation lets you search for R packages and functions that suit your needs.
https://www.rdocumentation.org/
#10 R Manuals
Note: If you want to go directly to the source, visit the R manuals page.
https://cran.r-project.org/manuals.html
2 Data Structures
R Data Types
R has six basic (atomic) data types: logical, numeric, integer, complex, character and raw.
print("abc") # Character
[1] "abc"
print(TRUE) # Logical
[1] TRUE
Note: charToRaw() converts a character string to its raw bytes, displayed as hexadecimal codes; for plain English text these are the American Standard Code for Information Interchange (ASCII) codes.
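class() reports the type of any value, and charToRaw() shows the byte codes of a string. A quick sketch covering all six basic types:

```r
values <- list(TRUE, 3.14, 5L, 2 + 3i, "abc", charToRaw("abc"))
sapply(values, class)
# "logical" "numeric" "integer" "complex" "character" "raw"
charToRaw("abc")               # 61 62 63: hexadecimal byte codes of "a", "b", "c"
as.integer(charToRaw("abc"))   # 97 98 99: the same codes in decimal
rawToChar(charToRaw("abc"))    # "abc": converts the bytes back to text
```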
R Data Structures
R has a wide variety of data structures, including vectors, factors, matrices, arrays, data frames, and lists.
R Data Structures
Vectors
A vector is the most basic data structure in R. All of its elements share one of the six basic data types: logical, integer, double, complex, character or raw.
Vectors, Cont’d
x; y
[1] 1 2 3 4 5
[1] 6 7 8 9 10
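The code that created x and y is not preserved on the slides. Assuming they were built as below (an assumption based only on the printed values), a few basic vector operations look like this:

```r
x <- 1:5                 # assumed: integer sequence 1, 2, 3, 4, 5
y <- c(6, 7, 8, 9, 10)   # assumed: numeric vector 6, ..., 10
x; y                     # print both vectors
x + y                    # element-wise sum: 7 9 11 13 15
z <- c(x, y)             # concatenate into one vector of length 10
length(z)                # 10
z[2]                     # indexing starts at 1: 2
z[z > 7]                 # logical indexing: 8 9 10
```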
data <- rep(c(2, 4, 6), times = 3)
print(data) # Repeat the vector 3 times
[1] 2 4 6 2 4 6 2 4 6
for (i in seq(1, 3, by = 0.5)) {
  print(i) # Print the sequence from 1 to 3 in steps of 0.5, one value at a time
}
[1] 1
[1] 1.5
[1] 2
[1] 2.5
[1] 3
data <- c(1, 2, 3, 4, 5, 6)
for (i in data) {
  if (i %% 2 == 0)
    print(i) # Print even integers
}
[1] 2
[1] 4
[1] 6
data <- c(1, 2, 3, 4, 5, 6)
for (i in data) {
  if (i %% 2 != 0)
    print(i) # Print odd integers
}
[1] 1
[1] 3
[1] 5
data <- c(1, 2, 3, 4, 5, 6)
for (i in data) {
  if (i %% 2 == 1)
    print(i) # Print odd integers (equivalent test)
}
[1] 1
[1] 3
[1] 5
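The loops above print one value per iteration. Because %% and == are vectorised, the same selections can also be made without a loop, by logical indexing:

```r
data <- c(1, 2, 3, 4, 5, 6)
evens <- data[data %% 2 == 0]   # keep the elements where the test is TRUE: 2 4 6
odds  <- data[data %% 2 == 1]   # 1 3 5
print(evens)
print(odds)
```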
R Data Structures
Matrices
A matrix is a two-dimensional data structure whose data are arranged in rows and columns. In R, a matrix is created with the matrix() function:
matrix(x, nrow, ncol, byrow = FALSE) # Create a matrix
x - data items of the same type
nrow - number of rows
ncol - number of columns
byrow (optional) - if TRUE, the matrix is filled row-wise; by default (FALSE), it is filled column-wise.
x = c(9, 1, 2, 3, 4, 5, 6, 7, 8)
# Create a 3x3 matrix (filled column-wise)
A = matrix(x, nrow = 3, ncol = 3); print(A)
     [,1] [,2] [,3]
[1,]    9    3    6
[2,]    1    4    7
[3,]    2    5    8
A = matrix(c(9, 1, 2, 3, 4, 5, 6, 7, 8), nrow = 3, ncol = 3); print(A)
     [,1] [,2] [,3]
[1,]    9    3    6
[2,]    1    4    7
[3,]    2    5    8
dim(A) # Dimension of A
[1] 3 3
det(A) # Determinant of A
[1] -27
sum(diag(A)) # Trace of A
[1] 21
qr(A)$rank # Rank of A
[1] 3
A[, 1]; A[1, ] # 1st column; 1st row of A
[1] 9 1 2
[1] 9 3 6
cbind(A[1, ]) # Turn the 1st row into a column
     [,1]
[1,]    9
[2,]    3
[3,]    6
colSums(A); rowSums(A) # Column sums; row sums of A
[1] 12 12 21
[1] 18 12 15
t(A) # Transpose of A
     [,1] [,2] [,3]
[1,]    9    1    2
[2,]    3    4    5
[3,]    6    7    8
x = c(9, 1, 2, 3, 4, 5, 6, 7, 8) # Data x
y = c(0, 2, 4, 6, 8, 10, 12, 14, 16) # Data y
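The slides' operations on x and y are not preserved. As an illustration only, assuming both are arranged as 3×3 matrices, the element-wise operators and the matrix product %*% give different results, and solve() inverts A because its determinant (-27) is non-zero:

```r
x <- c(9, 1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 2, 4, 6, 8, 10, 12, 14, 16)
A <- matrix(x, nrow = 3, ncol = 3)   # assumed arrangement, filled column-wise
B <- matrix(y, nrow = 3, ncol = 3)   # assumed arrangement, filled column-wise
A + B                    # element-wise sum
A * B                    # element-wise (Hadamard) product
A %*% B                  # matrix product: rows of A times columns of B
A_inv <- solve(A)        # inverse of A
round(A %*% A_inv, 10)   # recovers the 3 x 3 identity matrix
```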
R Data Structures
Arrays
An array is a data structure that stores data of the same type in more than two dimensions. In R, an array is created with the array() function:
array(x, dim = c(nrow, ncol, nmat)) # Create an array
100 /
196 Dr. Ahmed Elshahhat Data Analysis Using R Programming Language
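A1 <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE) # Matrix A1
A2 <- matrix(-1:-6, nrow = 2, ncol = 3, byrow = TRUE) # Matrix A2
col.names <- c("COL1", "COL2", "COL3") # Column names
row.names <- c("ROW1", "ROW2") # Row names
mat.names <- c("Matrix1", "Matrix2") # Matrix names
# Assumed step: stack A1 and A2 into a named 2 x 3 x 2 array
Array <- array(c(A1, A2), dim = c(2, 3, 2), dimnames = list(row.names, col.names, mat.names))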
, , Matrix1
, , Matrix2
Mat <- Array[, , 1] + Array[, , 2] # Add the two matrices of the array
print(Mat)
     COL1 COL2 COL3
ROW1    0    0    0
ROW2    0    0    0
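Array elements are indexed as [row, column, matrix], and apply() can summarise along any dimension. A small self-contained illustration with a 2 × 3 × 4 array:

```r
arr <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 matrices
dim(arr)              # 2 3 4
arr[2, 3, 4]          # row 2, column 3 of the 4th matrix: 24
arr[, , 1]            # the whole 1st matrix
apply(arr, 3, sum)    # sum of each matrix: 21 57 93 129
```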
3 R Data Import
R Data Import
In R, one can read data from files stored outside the R environment, and write data to files that are stored and accessed by the operating system. R can read and write various file formats, such as txt, Excel, csv and xml.
Read a csv file, Cont’d
Given the path of the staff.csv file on your computer, the read.csv() function reads it as:
read_data <- read.csv("C:\\Users\\king\\Desktop\\staff.csv")
print(read_data) # Print all data in staff.csv
The csv file is then displayed as:
id name salary age jobe
1 1 Ahmed 2850 32 Prof
2 2 Islam 2680 28 Eng
3 3 Adam 2540 25 Dr
4 4 Asmaa 2760 30 HR
5 5 Mona 2400 26 IT
cat("Total Columns:", ncol(read_data)) # No. of columns
Total Columns: 5
cat("Total Rows:", nrow(read_data)) # No. of rows
Total Rows: 5
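To try read.csv() without the workshop file, the sketch below rebuilds the staff table shown above, writes it to a temporary file and reads it back. Building the path with file.path() and tempdir() avoids hard-coding a Windows path; forward slashes also work on Windows.

```r
staff <- data.frame(
  id     = 1:5,
  name   = c("Ahmed", "Islam", "Adam", "Asmaa", "Mona"),
  salary = c(2850, 2680, 2540, 2760, 2400),
  age    = c(32, 28, 25, 30, 26),
  jobe   = c("Prof", "Eng", "Dr", "HR", "IT")
)
path <- file.path(tempdir(), "staff.csv")    # portable temporary path
write.csv(staff, path, row.names = FALSE)   # write without a row-name column
read_data <- read.csv(path, stringsAsFactors = FALSE)
cat("Total Columns:", ncol(read_data), "\n")
cat("Total Rows:", nrow(read_data), "\n")
subset(read_data, salary > 2700)             # rows for Ahmed and Asmaa
```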
Read a csv file, Cont’d
info_people <- subset(read_data, salary > 2700 & jobe == "Eng")
print(info_people) # People working as Eng with salary > 2700
[1] id name salary age jobe
<0 rows> (or 0-length row.names) # No row meets both conditions (the only Eng, Islam, earns 2680)
Read an xlsx file, Cont’d
Given the path of the students.xlsx file on your computer, the read_excel() function from the readxl package reads it as:
library(readxl) # Provides read_excel()
# Read students.xlsx file
read_data <- read_excel("students.xlsx", sheet = 1)
print(read_data) # Print all data in students.xlsx
id name level age college
1 1 Ahmed 4 21 Business
2 2 Islam 3 20 Engineering
3 3 Adam 2 18 Engineering
4 4 Asmaa 2 19 Arts
5 5 Mimi 1 18 Law
Read a txt file
A txt file is a plain-text file structured as a sequence of lines of text. To see how to read txt files in R, consider the following data stored in the file guest.txt:
id name star age country
1 1 Ahmed 4 21 UK
2 2 Islam 3 20 Germany
3 3 Adam 3 18 France
4 4 Asmaa 5 19 Canada
5 5 Mimi 2 18 USA
Read a txt file, Cont’d
Given the path of the guest.txt file on your computer, the read.table() function reads it as:
# Read guest.txt file
read_data <- read.table("guest.txt", header = TRUE)
print(read_data) # Print all data in guest.txt
id name star age country
1 1 Ahmed 4 21 UK
2 2 Islam 3 20 Germany
3 3 Adam 3 18 France
4 4 Asmaa 5 19 Canada
5 5 Mimi 2 18 USA
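read.table() splits fields on white space by default, and header = TRUE takes column names from the first line. The sketch below writes the guest data shown above to a temporary file so the call can be tried without the original file; write.table() is the matching writer.

```r
lines <- c("id name star age country",
           "1 Ahmed 4 21 UK",
           "2 Islam 3 20 Germany",
           "3 Adam 3 18 France",
           "4 Asmaa 5 19 Canada",
           "5 Mimi 2 18 USA")
path <- file.path(tempdir(), "guest.txt")
writeLines(lines, path)                                    # create the sample file
guests <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
str(guests)                                                # column types detected by read.table()
guests[guests$star >= 4, c("name", "country")]             # guests with 4 or more stars
write.table(guests, file.path(tempdir(), "guest_copy.txt"),
            row.names = FALSE, quote = FALSE)              # write the data back out
```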
4 R Statistics
Statistics
set.seed(1234) # Set seed for reproducibility
print(mean(x)) # Mean
[1] -0.0265972
print(var(x)) # Variance
[1] 0.9946825
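The slide's definition of x is not preserved, so the sketch below uses a small fixed vector whose results can be checked by hand. Note that var() and sd() divide by n - 1 (sample variance), not by n:

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
mean(x)       # 5
median(x)     # 4.5, the average of the two middle values 4 and 5
var(x)        # 32 / 7 = 4.571429, squared deviations summed and divided by n - 1
sd(x)         # square root of var(x)
mad(x)        # 1.4826 * median(|x - median(x)|) = 1.4826 * 0.5 = 0.7413
quantile(x)   # minimum, quartiles and maximum
summary(x)    # quartiles plus the mean
```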
5 R Graphics
R Plots
Simple Plot #1
set.seed(123)
x <- rnorm(500) # Generate sample x from N(0, 1)
y <- x + rnorm(500) # Generate sample y
plot(x, y) # Plot samples x and y
R Plots
x = rnorm(500), y = x + rnorm(500)
$$z_1 = x - 2y + 100, \qquad z_2 = \frac{(z_1 + 5)\, e^{z_1/100}}{\log(z_1)}$$
Simple Plot #2
set.seed(123)
x <- rnorm(500)
y <- x + rnorm(500) # One N(0, 1) noise value per element of x
z1 <- x - 2*y + 100
z2 <- (z1 + 5)*(exp(z1/100)/log(z1))
plot(z1, z2, lwd = 3, col = "coral",
     xlab = expression(z[1]), ylab = expression(z[2]),
     main = expression(
       frac((z[1] + 5)*e^frac(z[1], 100), log(z[1]))
     )
) # Plot samples z1 and z2
Simple Plot #3
x1 = 1:10
x2 = 1:10
Bar & Box
Barplot
x = rnorm(50), y = x + rnorm(50)
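The code for this barplot is not preserved. barplot() draws bars for counts or other summary values, so one hedged way to display a sample like x is to bin it first; the bin boundaries below are an arbitrary choice for illustration:

```r
set.seed(123)
x <- rnorm(50)
# Count how many values of x fall in each interval (illustrative boundaries)
counts <- table(cut(x, breaks = c(-Inf, -1, 0, 1, Inf)))
mids <- barplot(counts, col = "steelblue",
                xlab = "Interval of x", ylab = "Frequency",
                main = "Barplot of binned x")   # returns the bar midpoints
counts
```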
Boxplot
x = rnorm(50), y = x + rnorm(50)
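The boxplot code is not preserved either. A minimal sketch that draws side-by-side boxplots of the two samples; boxplot() also returns the five statistics it draws for each box:

```r
set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
b <- boxplot(list(x = x, y = y), col = c("coral", "lightblue"),
             ylab = "Value", main = "Boxplots of x and y")
b$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
```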
Histogram & Density
Histogram
set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
par(mfrow = c(1, 2), oma = c(0, 0, 0, 0))
hist(x)
hist(y) # Draw histograms of x and y side by side in one row
Density Plot
set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
plot(density(x))
polygon(density(x), col = 1) # Draw the kernel density estimate of x, filled in black
Histogram & Density Plot
set.seed(123)
x <- rnorm(50)
y <- x + rnorm(50)
hist(x, probability = TRUE) # Histogram on the density scale
lines(density(x), lwd = 3, col = "red") # Overlay the kernel density estimate
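Since x is drawn from N(0, 1), the true density can be overlaid for comparison with the kernel estimate. On the density scale (freq = FALSE) the bar areas sum to 1, which is what makes the two curves comparable:

```r
set.seed(123)
x <- rnorm(50)
h <- hist(x, freq = FALSE, col = "grey90",
          main = "Histogram, kernel density and N(0, 1) density")
lines(density(x), lwd = 3, col = "red")        # kernel density estimate
curve(dnorm(x), add = TRUE, lwd = 2, lty = 2)   # true N(0, 1) density
legend("topright", legend = c("KDE", "N(0, 1)"),
       col = c("red", "black"), lty = c(1, 2), lwd = c(3, 2))
sum(h$density * diff(h$breaks))                 # total bar area: 1
```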
General Plot
A general plot shows a scatter plot, a bar plot, a box plot, a time series, a date-based plot and a user-specified function together in a 2×3 window.
set.seed(123)
x <- rnorm(500)
y <- x + rnorm(500)
Data_1 <- ts(matrix(x, 500, 1), start = c(0, 1), frequency = 12)
Data_2 <- seq(as.Date("2005/1/1"), by = "month", length.out = 50)
Data_3 <- factor(mtcars$cyl)
Data_4 <- function(x) x^2
Data_5 <- rnorm(32)
Data_6 <- rnorm(50)
The ts() function converts a numeric vector (here a one-column matrix) into a monthly time series object (frequency = 12).
General Plot, Cont’d
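The plotting commands of this slide are not preserved. One plausible arrangement of the six objects in a 2×3 window is sketched below; the pairing of panels with objects is an assumption (for instance, Data_5 has 32 values, matching the 32 cars in mtcars, and Data_6 has 50 values, matching the 50 dates in Data_2).

```r
set.seed(123)
x <- rnorm(500)
y <- x + rnorm(500)
Data_1 <- ts(matrix(x, 500, 1), start = c(0, 1), frequency = 12)
Data_2 <- seq(as.Date("2005/1/1"), by = "month", length.out = 50)
Data_3 <- factor(mtcars$cyl)
Data_4 <- function(x) x^2
Data_5 <- rnorm(32)
Data_6 <- rnorm(50)

old_par <- par(mfrow = c(2, 3))                       # 2 x 3 grid of panels
plot(x, y, main = "Scatter")
plot(Data_3, main = "Bar (cylinder counts)")          # plot() of a factor draws a barplot
plot(Data_5 ~ Data_3, main = "Box (by cylinders)")    # numeric ~ factor draws boxplots
plot(Data_1, main = "Time series")
plot(Data_2, Data_6, type = "l", main = "Date-based")
plot(Data_4, from = -3, to = 3, main = "Function x^2")
par(old_par)                                          # restore the previous layout
```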
Heatmap
The data set in the Excel file is:
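The Excel data set used on these slides is not preserved. As a stand-in, the sketch below draws a heatmap of a few columns of the built-in mtcars data; scale = "column" standardises each variable so that columns with large values (such as hp) do not dominate the colours.

```r
M <- as.matrix(mtcars[1:10, c("mpg", "hp", "wt", "qsec")])   # 10 cars x 4 variables
hm <- heatmap(M, scale = "column", Colv = NA,                 # keep column order, cluster rows
              col = heat.colors(32), margins = c(5, 8),
              main = "mtcars (first 10 cars)")
rownames(M)[hm$rowInd]   # row order after hierarchical clustering
```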
Pairs
Pairs Plot
set.seed(123)
x1 <- rnorm(1000) # Create variable x1
x2 <- x1 + rnorm(1000, 0, 2) # Create variable x2
x3 <- 3 * x1 - x2 + rnorm(1000, 0, 4) # Create variable x3
PR <- data.frame(x1, x2, x3) # Combine all variables
pairs(PR) # Draw pairs
Pairs Plot, Cont’d
set.seed(123)
library("ggplot2")
library("GGally") # Provides ggpairs()
x1 <- rnorm(1000) # Create variable x1
x2 <- x1 + rnorm(1000, 0, 2) # Create variable x2
x3 <- 3*x1 - x2 + rnorm(1000, 0, 4) # Create variable x3
data <- data.frame(x1, x2, x3) # Combine all variables
ggpairs(data) # Apply ggpairs function
cor(x1, x2) # Correlation between x1 and x2
cor(x1, x3) # Correlation between x1 and x3
cor(x2, x3) # Correlation between x2 and x3
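Instead of three separate cor() calls, cor() applied to the whole data frame returns the full correlation matrix. For this simulation the population correlations can also be worked out by hand: x2 = x1 + e2 and x3 = 2 x1 - e2 + e3 (with sd(e2) = 2 and sd(e3) = 4), giving 1/sqrt(5) ≈ 0.447, 2/sqrt(24) ≈ 0.408 and -2/sqrt(120) ≈ -0.183.

```r
set.seed(123)
x1 <- rnorm(1000)
x2 <- x1 + rnorm(1000, 0, 2)
x3 <- 3 * x1 - x2 + rnorm(1000, 0, 4)
PR <- data.frame(x1, x2, x3)

R <- cor(PR)   # 3 x 3 sample correlation matrix
round(R, 3)
theory <- c(x1_x2 = 1 / sqrt(5), x1_x3 = 2 / sqrt(24), x2_x3 = -2 / sqrt(120))
round(rbind(sample = c(R[1, 2], R[1, 3], R[2, 3]), theory = theory), 3)
```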
QQ
QQ Plot
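The QQ plot code is not preserved. A minimal base-R sketch: qqnorm() plots sample quantiles against standard normal quantiles and qqline() adds a reference line through the quartiles. A normal sample should follow the line, while a skewed sample (here exponential, chosen for contrast) bends away from it.

```r
set.seed(123)
x <- rnorm(100)   # normal sample
z <- rexp(100)    # right-skewed sample, for contrast
old_par <- par(mfrow = c(1, 2))
qqnorm(x, main = "Normal sample");      qqline(x, col = "red", lwd = 2)
qqnorm(z, main = "Exponential sample"); qqline(z, col = "red", lwd = 2)
par(old_par)
# Quantile pairs without drawing: theoretical (q$x) and sample (q$y)
q <- qqnorm(x, plot.it = FALSE)
cor(q$x, q$y)     # close to 1 for a normal sample
```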
3D
3D Plot
In R, a 3D perspective plot of a surface is drawn with persp(), whose arguments let you add a title, change the viewing direction, and add color and shading to the plot.
$$G = \sqrt{x^2 + y^2}$$
3D Plot
For more details see: https://www.geeksforgeeks.org/creating-3d-plots-in-r-programming-persp-function/
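A minimal persp() sketch of the surface G = sqrt(x^2 + y^2); the grid range, resolution and viewing angles are illustrative choices. outer() evaluates G at every (x, y) pair of the grid.

```r
x <- seq(-3, 3, length.out = 40)
y <- seq(-3, 3, length.out = 40)
G <- outer(x, y, function(x, y) sqrt(x^2 + y^2))   # 40 x 40 matrix of heights
pm <- persp(x, y, G,
            theta = 30, phi = 25,             # viewing direction: azimuth and colatitude
            col = "lightblue", shade = 0.5,   # color and shading
            xlab = "x", ylab = "y", zlab = "G",
            main = "G = sqrt(x^2 + y^2)")     # returns the 4 x 4 viewing matrix
```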
Colors in R Plots
6 Inference
Hint: Bayesian MCMC methods would be interesting to investigate, but because of the time limit of this workshop, this part of statistical inference will be covered later.
Parameter Estimation
Maximum Likelihood
MLE: two-dimensional
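The MLE code for this slide is not preserved. A minimal sketch under the same set-up as the LSE example (a Weibull sample of size 20 with shape 2 and scale 1) maximises the log-likelihood numerically with optim(); using the scale parameterisation of R's dweibull() is an assumption.

```r
set.seed(1234)
x <- rweibull(20, shape = 2, scale = 1)   # simulated complete sample

# Negative log-likelihood of the Weibull(shape, scale) model
negloglik <- function(param, x) {
  -sum(dweibull(x, shape = param[1], scale = param[2], log = TRUE))
}

fit <- optim(c(1, 1), negloglik, x = x, method = "L-BFGS-B",
             lower = c(1e-6, 1e-6), hessian = TRUE)
fit$par                          # ML estimates of (shape, scale)
sqrt(diag(solve(fit$hessian)))   # approximate standard errors from the observed information
```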
Least-Squares
LSE: two-dimensional
Here, the least-squares estimation method is used to fit the parameter(s) of a target distribution (e.g., Weibull) to a data set by minimizing the sum of squared differences between the theoretical CDF and the empirical CDF (estimated at the sorted data by i/(n+1)).
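Written out, with the sorted sample $x_{(1)} \le \dots \le x_{(n)}$ and the Weibull CDF in the form coded in Dweibull(), the least-squares objective minimised over $(\alpha, \lambda)$ is:

$$F(x;\alpha,\lambda) = 1 - e^{-\lambda x^{\alpha}}, \qquad S(\alpha,\lambda) = \sum_{i=1}^{n} \left[ F\left(x_{(i)};\alpha,\lambda\right) - \frac{i}{n+1} \right]^2$$

In this form λ acts as a rate. It is linked to the scale argument of rweibull() and pweibull() by λ = scale^(-α), so the two parameterisations agree when the scale equals 1, as in the simulation.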
LSE
set.seed(1234) # Set seed for reproducibility
alpha <- 2 # Shape parameter value
lambda <- 1 # Scale parameter value
start <- c(alpha, lambda) # Starting values
x <- rweibull(20, alpha, lambda) # Simulate a random sample of size 20
n <- length(x) # No. of observations
lower <- c(0, 0); upper <- c(+Inf, +Inf) # Parameter bounds
Dweibull <- function(x, param) { # Weibull CDF: F(x) = 1 - exp(-lambda * x^alpha)
  alpha <- param[1]
  lambda <- param[2]
  1 - exp(-lambda * x^alpha)
}
LSE, Cont’d
LSE <- function(param, x, CDF) { # Objective: sum of squared CDF differences
  x <- sort(x) # Order statistics
  n <- length(x)
  sum((CDF(x, param) - (1:n) / (n + 1))^2) # Compare F(x_(i)) with i/(n+1)
}
LSE, Cont’d
# Minimize LSE() over (alpha, lambda) with optim(); L-BFGS-B respects the bounds
OLS_weibull <- optim(start, LSE, x = x, CDF = Dweibull,
                     method = "L-BFGS-B", lower = lower, upper = upper)
print(OLS_weibull)
Monte Carlo of Parameter Estimation
Now we discuss an R script for a Monte Carlo simulation of the Weibull parameters under complete sampling, using two classical estimation methods: maximum likelihood (MLE) and least squares (LSE).
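The simulation script itself is not preserved on the slides. The sketch below is one way to set it up, assuming the Weibull(shape = 2, scale = 1) model and n = 20 of the LSE example, N = 200 replications, and the summaries average estimate, MSE and relative absolute bias (MAB). The helper names are hypothetical, and both objectives use R's scale parameterisation.

```r
set.seed(1234)
alpha <- 2; lambda <- 1                 # true shape and scale (assumed)
theta <- c(alpha, lambda)
n <- 20; N <- 200                       # sample size and replications (assumed)

negloglik <- function(param, x)         # MLE objective
  -sum(dweibull(x, shape = param[1], scale = param[2], log = TRUE))
lse <- function(param, x) {             # LSE objective
  x <- sort(x); n <- length(x)
  sum((pweibull(x, shape = param[1], scale = param[2]) - (1:n) / (n + 1))^2)
}
fit <- function(obj, x)                 # minimise an objective over positive parameters
  optim(c(1, 1), obj, x = x, method = "L-BFGS-B", lower = c(1e-6, 1e-6))$par

Est_MLE <- Est_LSE <- matrix(NA_real_, N, 2)
for (i in 1:N) {
  x <- rweibull(n, shape = alpha, scale = lambda)   # new complete sample
  Est_MLE[i, ] <- fit(negloglik, x)
  Est_LSE[i, ] <- fit(lse, x)
}

# Average estimate, mean squared error and mean relative absolute bias per parameter
summarise_mc <- function(Est) {
  err <- sweep(Est, 2, theta)           # estimate minus true value, column by column
  rbind(Av.Est = colMeans(Est),
        MSE    = colMeans(err^2),
        MAB    = colMeans(sweep(abs(err), 2, theta, "/")))
}
round(summarise_mc(Est_MLE), 4)
round(summarise_mc(Est_LSE), 4)
```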
Linear Regression Models
Linear Regression
Simple
The simple linear regression model is y = a + b x, where
y - response variable.
x - predictor variable.
a & b - regression coefficients.
Residuals:
      Min        1Q    Median        3Q       Max
-0.063002 -0.016629  0.000412  0.018944  0.039775
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.38455    0.08049  -4.778  0.00139 **
x            0.67461    0.05191  12.997 1.16e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = y ~ x, data = mydata)
Coefficients:
(Intercept)            x
    -0.3846       0.6746
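mydata itself is not shown on the slides, so the sketch below simulates a hypothetical data set with coefficients chosen near the printed estimates and fits it with lm(). It also checks that lm() reproduces the closed-form least-squares estimates b = cov(x, y) / var(x) and a = mean(y) - b * mean(x).

```r
set.seed(1)
n <- 30
x <- runif(n, 0, 2)
y <- -0.4 + 0.7 * x + rnorm(n, sd = 0.03)   # assumed true a = -0.4, b = 0.7
mydata <- data.frame(x = x, y = y)          # hypothetical stand-in for the slides' mydata

fit <- lm(y ~ x, data = mydata)   # least-squares fit of y = a + b x
summary(fit)                      # estimates, standard errors, t values, p-values
coef(fit)

# The same estimates from the closed-form least-squares formulas
b_hat <- cov(x, y) / var(x)
a_hat <- mean(y) - b_hat * mean(x)
c(a_hat, b_hat)
```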
Linear Regression
Multiple
The multiple linear regression model is y = a + b1 x1 + b2 x2 + ... + bn xn, where
y - response variable.
x1, x2, ..., xn - predictor variables.
a, b1, b2, ..., bn - regression coefficients.
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.277006   0.101562  -2.727   0.0294 *
x1           0.670162   0.047952  13.976 2.27e-06 ***
x2          -0.004044   0.002607  -1.551   0.1647
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Call:
lm(formula = y ~ x1 + x2, data = mydata)
Coefficients:
(Intercept)           x1           x2
  -0.277006     0.670162    -0.004044
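Again, mydata is not shown, so the sketch below simulates a hypothetical data set with coefficients chosen near the printed ones. It checks that lm() solves the normal equations (X'X) beta = X'y, where X holds a column of 1s for the intercept followed by the predictors.

```r
set.seed(2)
n <- 40
x1 <- runif(n, 0, 2)
x2 <- runif(n, 0, 10)
y <- -0.3 + 0.67 * x1 - 0.004 * x2 + rnorm(n, sd = 0.03)   # assumed coefficients
mydata <- data.frame(y, x1, x2)   # hypothetical stand-in for the slides' mydata

fit <- lm(y ~ x1 + x2, data = mydata)
coef(fit)

# Normal equations: crossprod(X) = t(X) %*% X and crossprod(X, y) = t(X) %*% y
X <- cbind(1, x1, x2)
beta_hat <- solve(crossprod(X), crossprod(X, y))
drop(beta_hat)
```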
Monte Carlo of Linear Regression Models
Now we discuss an R script for a Monte Carlo simulation of both simple and multiple linear regression models.
for (i in 1:N) {
  Est[i, ] = c(Res[[i]]$coef) # Estimates from replication i
  MSE[i, ] = (theta - c(Res[[i]]$coef))^2 # Squared errors
  MAB[i, ] = abs(theta - c(Res[[i]]$coef))/theta # Relative absolute biases
}
Reg_1 = mean(Est[, 1]); Reg_2 = mean(Est[, 2]) # Average estimates (Av.Ests)
MSE_1 = mean(MSE[, 1]); MSE_2 = mean(MSE[, 2]) # MSEs
MAB_1 = mean(MAB[, 1]); MAB_2 = mean(MAB[, 2]) # MABs
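The set-up lines that define N, theta, Res and the result matrices are not preserved. The sketch below assumes Res is a list of N lm() fits of a simple regression with true coefficients theta = (a, b); the values of N, n, theta and the data-generating model are assumptions chosen for illustration (theta must be non-zero because MAB divides by it).

```r
set.seed(1234)
N <- 1000; n <- 30   # replications and sample size (assumed)
theta <- c(1, 2)     # true intercept a and slope b (assumed)

# Fit the model to N simulated data sets
Res <- lapply(1:N, function(i) {
  x <- runif(n)
  y <- theta[1] + theta[2] * x + rnorm(n)
  lm(y ~ x)
})

Est <- MSE <- MAB <- matrix(0, N, 2)
for (i in 1:N) {
  Est[i, ] <- c(Res[[i]]$coef)                        # estimates
  MSE[i, ] <- (theta - c(Res[[i]]$coef))^2            # squared errors
  MAB[i, ] <- abs(theta - c(Res[[i]]$coef)) / theta   # relative absolute biases
}
# colMeans() averages every column at once (Reg_1, Reg_2, MSE_1, ... in one step)
rbind(Av.Est = colMeans(Est), MSE = colMeans(MSE), MAB = colMeans(MAB))
```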
for (i in 1:N) {
  Est[i, ] = c(Res[[i]]$coef)
  MSE[i, ] = (theta - c(Res[[i]]$coef))^2
  MAB[i, ] = abs(theta - c(Res[[i]]$coef))/theta
}
# Calculate Av. Ests
Reg_1 = mean(Est[, 1]); Reg_2 = mean(Est[, 2]); Reg_3 = mean(Est[, 3])
# Calculate MSEs
MSE_1 = mean(MSE[, 1]); MSE_2 = mean(MSE[, 2]); MSE_3 = mean(MSE[, 3])
# Calculate MABs
MAB_1 = mean(MAB[, 1]); MAB_2 = mean(MAB[, 2]); MAB_3 = mean(MAB[, 3])