
R Script

The document is a comprehensive guide on R programming and RStudio, covering topics such as installation, data manipulation, visualization, and statistical analysis. It provides detailed explanations of R's capabilities, advantages, and essential language features, along with practical examples and code snippets. Additionally, it discusses the differences between R and RStudio, and outlines the use of various R packages for data analysis.

Uploaded by Bhumika Jain


Dr. Arun Julka

R

Table of Contents
1 R and RStudio
1.1 R vs RStudio
1.2 Introduction to R
1.2.1 How can you use R?
1.2.2 Getting Started with R
1.3 Advantages of R
2 Installation of R Packages
2.1 R Packages
2.2 File Import in R
3 Essentials of the R Language
3.1 File R1(calculation)
3.2 File R2(vectors)
3.3 File R3(matrices)
3.4 File R4(Arrays and Lists)
3.4.1 Arrays
3.4.2 Lists
3.5 File R5(Loops)
3.6 File R6(Factors and Data Frame)
3.6.1 Factors
3.6.2 Data Frame
3.7 File R7(Conditional and Control Flows)
3.8 File R8(importing from excel)
3.9 File R9(text)
3.10 File R10(Apply Function: A Versatile Tool for Data Manipulation)
3.10.1 apply()
3.10.2 lapply()
3.10.3 sapply()
3.10.4 vapply()
3.10.5 tapply()
4 Data Visualisation using R
4.1 R11(Histograms)
4.2 R12(Boxplot)
4.3 R13(Line plot)
4.3.1 Line Plot with single series of data
4.3.2 Line Plot with multiple series of data
4.4 R14(Scatter Plots)
4.5 R15(Bar Chart)
5 Descriptive Statistics Using R
5.1 Summary (Descriptive Statistics 1)
5.2 Measure of Central Tendency (Descriptive Statistics 2)
5.2.1 Arithmetic Mean, Median and Mode
5.2.2 Mode using function from the data frame
5.3 Measure of Dispersion (Descriptive Statistics 3)
5.3.1 Range
5.3.2 Variance
5.3.3 Standard Deviation
5.4 Datasheet (Descriptive Statistics 4)
5.4.1 Tribbles in R: A Concise Data Frame Creation
5.5 Student data case (Descriptive Statistics5)
6 Relationship between two variables
6.1 Covariance (Descriptive Statistics6)
6.2 Correlation (Descriptive Statistics7)
6.3 Coefficient of Determination (Descriptive Statistics8)
7 Citation
8 Regression using R
8.1 Simple Regression using R
8.2 Multiple Regression using R
8.2.1 Case1
8.2.2 Case2

1 R and RStudio
R's open-source nature, extensive statistical capabilities, powerful data visualisation tools, and a vibrant
community make it a compelling choice for data scientists, statisticians, and researchers.
• R is a programming language for statistical computing and graphics.
• R is an implementation of the 'S' language: much code written for S runs unaltered under R.
• The name 'R' comes from the first letter of the first names of its original developers, Ross Ihaka and Robert Gentleman.
• R provides many statistical and graphical tools, such as linear and nonlinear modelling, classical statistical tests, classification, and clustering.
• It runs on a wide variety of UNIX platforms and similar systems (including Linux), as well as Windows and macOS.
• R is an integrated suite of software facilities: operators for calculations on arrays and matrices, intermediate tools for data analysis, graphical facilities for data visualisation, and data handling and storage facilities.
• R is a well-developed programming language with loops, conditionals, user-defined functions, and input and output facilities. Because it is a planned and coherent system rather than a collection of separate tools, it is usually described as the 'R environment'.

RStudio
• RStudio is an integrated development environment (IDE).
• It is particularly designed to work with the R programming language.
• RStudio can be broadly divided into 4 panes: the Source Editor (top left), the Environment pane (top right), the Console (bottom left), and the Plots pane (bottom right).


1.1 R vs RStudio

| Basis | R | RStudio |
| --- | --- | --- |
| Meaning | A programming language. | An Integrated Development Environment (IDE). |
| Objective | Statistical computing and graphics. | Making it easier to develop statistical programs in R. |
| Ease of use | The core engine that performs the data analysis and computations, used through a basic console. | A more user-friendly environment for working with R, with an editor, console, environment and plot panes in one window. |
| Independence | Stand-alone: R can be installed and used on its own on any supported operating system. | Not stand-alone: RStudio needs an existing R installation and is designed primarily for the R language. |
| File extension | R scripts use the extension '.R'. | RStudio project files use the extension '.Rproj'. |

1.2 Introduction to R
Imagine R as a powerful Swiss Army knife for data analysis. It's a programming language and software
environment designed specifically for statistical computing and data visualisation. Think of it as a tool
that allows you to explore, manipulate, and extract insights from data, no matter how complex or messy
it might be.

1.2.1 How can you use R?

R can be used in various fields, including:
➢ Social Sciences: Analysing survey data, conducting opinion polls, and studying social
trends.
➢ Business Analytics: Making data-driven decisions, forecasting sales, and optimising
marketing strategies.
➢ Bioinformatics: Analysing genetic data, studying protein structures, and understanding
biological processes.
➢ Finance: Modelling financial markets, assessing risk, and optimising investment
portfolios.
➢ Environmental Science: Monitoring environmental changes, analysing climate data, and
predicting natural disasters.

1.2.2 Getting Started with R

To start your journey with R, you'll need to:
1. Install R: Download and install R from the official website (https://cran.r-project.org/).
2. Choose an IDE: Consider using an Integrated Development Environment (IDE) like
RStudio, which provides a user-friendly interface for coding and data analysis.


3. Learn the Basics: Start with basic R syntax, data structures (vectors, matrices, data
frames), and fundamental statistical functions.
4. Explore Packages: Discover and install packages that cater to your specific needs, such as
tidyverse for data manipulation and visualisation, caret for machine learning, and ggplot2
for advanced plotting.
1.3 Advantages of R
R has become a cornerstone for data analysis and statistical computing due to its numerous advantages:
1. Open-Source and Free:
➢ Cost-Effective: No licensing fees, making it accessible to everyone.
➢ Community-Driven: A large and active community contributes to its development and
provides extensive support.
2. Comprehensive Statistical Capabilities:
➢ Statistical Tests: R offers a wide range of statistical tests for hypothesis testing, regression
analysis, and more.
➢ Machine Learning: Powerful machine-learning algorithms for classification, regression,
clustering, and predictive modelling.
3. Data Visualisation:
➢ High-Quality Graphics: Create stunning visualisations with packages like ggplot2, lattice,
and plotly.
➢ Customisation: Tailor plots to specific needs, including interactive visualisations.
4. Flexibility and Extensibility:
➢ Package Ecosystem: A vast collection of packages (over 18,000) for various statistical and
data analysis tasks.
➢ Custom Function Creation: Develop custom functions to tailor the analysis to specific
requirements.
5. Reproducible Research:
➢ Version Control: RStudio integrates with Git, so changes to scripts can be tracked and analyses reproduced.
➢ R Markdown: Create dynamic documents combining code, output, and narrative text.
6. Platform Independence:
➢ Cross-Platform Compatibility: Runs on Windows, macOS, and Linux.
7. Strong Community Support:
➢ Active Forums and Communities: Online resources for help and collaboration.
➢ Tutorials and Documentation: Extensive documentation and tutorials available.
8. Integration with Other Tools:
➢ Interoperability: Integrates with other tools such as Python, SQL databases, and Hadoop through dedicated packages.
9. Data Wrangling and Manipulation:
➢ Powerful Data Manipulation: Efficiently clean, transform, and reshape data with
packages like dplyr and tidyr.
10. Big Data Analysis:
➢ Scalability: Handles large datasets with packages like sparklyr and bigr.


2 Installation of R Packages
2.1 R Packages
➢ base
➢ readxl
➢ readr
➢ dplyr
➢ tidyr
➢ tibble
➢ tidyverse
➢ ggplot2
➢ lmtest
➢ graphics
➢ stats
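Of the packages above, base, graphics and stats ship with every R installation and are attached automatically; the others (readxl, readr, dplyr, tidyr, tibble, tidyverse, ggplot2, lmtest) are installed once from CRAN with install.packages() and then loaded in each session with library(). A minimal sketch, using a hypothetical helper named install_if_missing() that is not part of any package, which installs only the packages that are not already available:

```r
# Hypothetical helper (not from any package): install only the packages
# whose namespace cannot be loaded yet, and return their names invisibly
install_if_missing <- function(pkgs) {
  present <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!present]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  invisible(missing)
}

# Packages that come with R are always present, so nothing is installed here
install_if_missing(c("stats", "graphics"))

# Typical use (needs an internet connection the first time):
# install_if_missing(c("readxl", "readr", "tidyverse", "lmtest"))
# library(readxl)
```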

2.2 File Import in R

| File type | Format | Package required |
| --- | --- | --- |
| Text | .txt | base R (utils) |
| CSV | .csv | readr |
| Excel | .xlsx / .xls | readxl |
| SPSS, SAS, Stata | .sav / .sas7bdat / .dta | haven |


3 Essentials of the R Language

3.1 File R1(calculation)
#Calculation with R
10+17
10*15
140/7
100-6
2^5
2+2-2/2
3*5/6
3/5*6
1:20
20:1
seq(1,100, by=5)
seq(1,100, by=3)
seq(1,100, 5)
seq(1,100, 3)
rep(7,10)

# arithmetic functions in R
abs(-15)
exp(1)
log(exp(1))
log(10)
log10(10)
log10(exp(1))
log(16,4)

# create a variable
x <- -100
x + 70
abs (x)
u <- 19
v <- 11
u+v
sum(u, v)

#create and change a variable

result <- 10 - 4
print(result)
result <- 6 * 7
print(result)
result <- 20 / 5
print(result)
result <- 2^3
print(result)

Remember:


Order of Operations: R follows the standard order of operations (PEMDAS/BODMAS):

Parentheses, Exponents, Multiplication and Division (from left to right), Addition and Subtraction
(from left to right).
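A few of the calculations from the R1 script, annotated with the order in which R evaluates them. Two further rules are worth knowing: the exponent operator ^ binds more tightly than a leading minus sign, and chained exponents are evaluated from right to left.

```r
# Division before subtraction: 2 + 2 - (2 / 2)
2 + 2 - 2 / 2        # 3
# * and / have equal precedence and are evaluated left to right
3 * 5 / 6            # (3 * 5) / 6 = 2.5
3 / 5 * 6            # (3 / 5) * 6 = 3.6
# Parentheses override the default order
(2 + 2 - 2) / 2      # 1
# ^ binds more tightly than unary minus: -(2^2)
-2^2                 # -4
# Chained ^ is evaluated right to left: 2^(3^2)
2^3^2                # 512
```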

# Calculating the area of a circle

radius <- 5
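# pi is built into R (3.141593...), so it does not need to be defined; assigning pi <- 3.14159 would mask the more precise built-in constant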
area <- pi * radius^2
print(area)

3.2 File R2(vectors)

#Vectors and subscripts
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)
# create a numeric vector
y <- c (10,12, -32,40,50,100)
3*y
abs (y)
class(y)
length(y)
mean(y)
max(y)
min(y)
quantile(y)
z<-rep(c(1,2,3,4,5),3)
z
#Extracting elements of a vector using subscripts
y
y[4]
y[3]
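Subscripts can do more than pick a single element. This short sketch reuses the vector y from the script to show ranges, negative subscripts (which drop elements) and logical subscripts (which keep the elements where a condition is TRUE):

```r
y <- c(10, 12, -32, 40, 50, 100)
y[2:4]          # elements 2 to 4: 12 -32 40
y[c(1, 6)]      # first and last elements: 10 100
y[-1]           # everything except the first: 12 -32 40 50 100
y[y > 20]       # only the elements greater than 20: 40 50 100
which(y > 20)   # positions of those elements: 4 5 6
```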

3.3 File R3(matrices)

#making a matrix
A <- matrix(c(1,4,7,2,5,8,3,6,9), nrow=3)
A
class(A)
attributes(A)
#making a matrix row-wise using vector
vector <- c(1,2,3,4,4,3,2,1)
B <- matrix(vector, byrow=TRUE, nrow=2)
B
class(B)
attributes(B)
#transpose of a matrix
C<-t(B)
C
#addition, subtraction, and multiplication of matrices


D <- matrix(c(10,40,70,20,50,80,30,60,90), nrow=3)

D
2*D-5
E <- matrix(c(5,6,7,4,8,9,10,6,11),nrow=3)
E
3*E+5
D+E
D-E
DE<-D%*%E
DE
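Note that %*% is true matrix multiplication, whereas the ordinary * operator multiplies two matrices element by element. The difference is easy to see with an illustrative 2 x 2 matrix M and the identity matrix Id built with diag(); solve() returns the inverse:

```r
M  <- matrix(c(1, 2, 3, 4), nrow = 2)  # illustrative: columns (1, 2) and (3, 4)
Id <- diag(2)                          # 2 x 2 identity matrix
M %*% Id   # matrix product: returns M unchanged
M * Id     # element-wise product: keeps only the diagonal, 1 and 4
M %*% M    # matrix product of M with itself: rows (7, 15) and (10, 22)
solve(M)   # inverse of M, so that M %*% solve(M) is the identity
```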

3.4 File R4(Arrays and Lists)

3.4.1 Arrays
# Create a 3D array (2 rows x 3 columns x 4 layers)
my_array <- array(1:24, dim = c(2, 3, 4))
print(my_array)
#make two vectors of varied lengths
v1<-c(5,9,3)
v2<-c(10,11,12,13,14,15)
#use the above vectors as inputs for the array; the 9 values are recycled to fill all 3 x 3 x 2 = 18 cells
new_array<- array(c(v1,v2),dim=c(3,3,2))
new_array
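Elements of an array are picked with one subscript per dimension, [row, column, layer]. A quick check with my_array from above, where the values 1 to 24 fill the array column by column and then layer by layer:

```r
my_array <- array(1:24, dim = c(2, 3, 4))
dim(my_array)          # 2 3 4
my_array[1, 2, 3]      # row 1, column 2, layer 3: 15
my_array[, , 1]        # the whole first layer, a 2 x 3 matrix
my_array[2, , 4]       # row 2 of the last layer: 20 22 24
```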
3.4.2 Lists
#create List
my_list <- list(name = "Arun Julka",
age = 53,
city = "New Delhi",
hobbies = c("reading", "coding", "painting")
)
my_list
View(my_list)
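List components are accessed by name with $ or [[ ]]; single brackets [ ] return a smaller list rather than the component itself. Using the same my_list:

```r
my_list <- list(name = "Arun Julka",
                age = 53,
                city = "New Delhi",
                hobbies = c("reading", "coding", "painting"))
my_list$name           # "Arun Julka"
my_list[["age"]]       # 53
my_list$hobbies[2]     # second hobby: "coding"
class(my_list["age"])  # "list": single brackets keep the list wrapper
names(my_list)         # "name" "age" "city" "hobbies"
```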

3.5 File R5(Loops)

# create a Loop
Loops are fundamental programming constructs that allow you to execute a code block repeatedly. R provides three kinds of loop: for, while and repeat.
#for
for (i in 1:5) {
print(i)
}
#while
i <- 6
while (i <= 10) {
print(i)
i <- i + 1
}
#repeat
i <- 11
repeat {
print(i)
i <- i + 1
if (i > 15) {
break
}
}
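Loops are often used to build up a result step by step. A minimal sketch with made-up values: the result vector is created first and filled inside a for loop over seq_along(), and the last line shows that R's vectorised arithmetic gives the same answer without an explicit loop.

```r
values <- c(2, 4, 6, 8)              # illustrative input
squares <- numeric(length(values))   # pre-allocate the result
for (i in seq_along(values)) {
  squares[i] <- values[i]^2
}
squares          # 4 16 36 64

# The vectorised equivalent, without an explicit loop
values^2         # 4 16 36 64
```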

3.6 File R6(Factors and Data Frame)

3.6.1 Factors
#Factor
data<-c("Male", "Female", "Male", "Child", "Child", "Male", "Female", "Female")
data
factor.data<-factor(data)
factor.data
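A factor stores each distinct value once, as a level. levels() lists the levels (sorted alphabetically by default), table() counts how often each occurs, and the levels argument of factor() fixes a different order:

```r
data <- c("Male", "Female", "Male", "Child", "Child", "Male", "Female", "Female")
factor.data <- factor(data)
levels(factor.data)    # "Child" "Female" "Male"
nlevels(factor.data)   # 3
table(factor.data)     # Child 2, Female 3, Male 3
# Choose the order of the levels explicitly
factor(data, levels = c("Male", "Female", "Child"))
```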
3.6.2 Data Frame
# create a data frame
name <- c("Arun Julka", "Arvinder", "Deepak Mehra", "Sanjay Garg", "Soma Jain")
age <- c(60, 55, 52, 56, 45)
gender <- c("M", "F", "M", "M", "F")
friends <- data.frame (name, age, gender)
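friends[,]        # all rows and all columns
friends[1,]       # first row
friends[2,]       # second row
friends[3,]       # third row
friends[,1]       # first column (name)
friends[,2]       # second column (age)
friends[,]        # all rows and all columns again
friends[1,1]      # row 1, column 1: "Arun Julka"
friends[1:3,1]    # rows 1 to 3 of column 1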
# create a data frame
df<-data.frame(x=c(1,4,4,5,6,10,12,13),y=c(2,2,3,3,4,5,11,11),z=c(8,9,9,9,10,13,15,17))
df
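Besides numeric subscripts, columns can be selected by name and rows filtered with a logical condition. The sketch below rebuilds the friends data frame from above so that it runs on its own:

```r
name <- c("Arun Julka", "Arvinder", "Deepak Mehra", "Sanjay Garg", "Soma Jain")
age <- c(60, 55, 52, 56, 45)
gender <- c("M", "F", "M", "M", "F")
friends <- data.frame(name, age, gender)

friends$age                              # the age column as a vector
mean(friends$age)                        # 53.6
friends[friends$age > 54, ]              # rows where age is above 54
friends[friends$gender == "F", "name"]   # names of the female friends
dim(friends)                             # 5 rows, 3 columns
str(friends)                             # column names and types
```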

3.7 File R7(Conditional and Control Flows)

# if statement
x <- 10
if (x > 5) {
print("x is greater than 5")
}
# if-else statement
y <- 15
if (y > 15) {
print("y is greater than 15")
} else {
print("y is less than or equal to 15")
}
# if-else if-else statement
z <- 3
if (z > 10) {
print("z is greater than 10")
} else if (z > 5) {


print("z is greater than 5")

} else {
print("z is less than or equal to 5")
}
#Break
for (i in 1:10) {
if (i == 5) {
break
}
print(i)
}
#next
for (j in 1:10) {
if (j %% 2 == 0) {
next
}
print(j)
}
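if () tests a single TRUE/FALSE value. To apply a condition to every element of a vector, the vectorised ifelse() function is the usual tool; the scores below are illustrative:

```r
scores <- c(3, 8, 12)   # illustrative values
# if (scores > 5) would fail (an error since R 4.2.0) because the condition has length 3
ifelse(scores > 5, "greater than 5", "5 or less")
# "5 or less" "greater than 5" "greater than 5"
```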


3.8 File R8(importing from excel)

#importing from excel
library(readxl)
CreditLimit <- read_excel("C:/Users/ADMIN/OneDrive/Desktop/R/R Data/CreditLimit.xlsx")
View(CreditLimit)

#check the first 6 rows of the data
head(CreditLimit)
#check the last 6 rows of the data
tail(CreditLimit)
#check the summary of every variable
summary(CreditLimit)
#check the summary of a single variable
summary(CreditLimit$Income)
#store a copy under a shorter name
cl<- CreditLimit
cl
#check the dimensions (number of rows and columns)
dim(CreditLimit)
#check the number of variables (columns)
length(CreditLimit)
#check the attributes, including the variable names
attributes(CreditLimit)
#check the class of the imported object (a tibble, i.e. a data frame)
class(CreditLimit)
#check the data type of each variable
sapply(CreditLimit, class)

3.9 File R9(text)

#importing a tab-delimited text file with a header row
TEXT <- read.delim("C:/Users/ADMIN/OneDrive/Desktop/R/TEXT.txt")
View(TEXT)
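For CSV files, base R's read.csv() works without any extra package, while readr::read_csv() is a faster alternative that returns a tibble. The sketch below writes a small CSV to a temporary file and reads it back, so it runs without the author's data files; the column names and values are made up for illustration:

```r
# Write a small illustrative CSV to a temporary location
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, income = c(25000, 40000, 31000)),
          path, row.names = FALSE)

# Read it back with base R
income_csv <- read.csv(path)
head(income_csv)
dim(income_csv)          # 3 rows, 2 columns

# With readr installed, the equivalent is:
# income_csv <- readr::read_csv(path)
```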

Remember:
File Import in R

| File type | Format | Package required |
| --- | --- | --- |
| Text | .txt | base R (utils) |
| CSV | .csv | readr |
| Excel | .xlsx / .xls | readxl |
| SPSS, SAS, Stata | .sav / .sas7bdat / .dta | haven |

3.10 File R10(Apply Function: A Versatile Tool for Data Manipulation)

The apply family of functions in R provides a concise way to apply a function to each element of a vector or list, or to each row or column of a matrix or array, without writing an explicit loop. These functions can significantly streamline your data analysis tasks.

Here are the primary functions in the apply family:

3.10.1 apply()
Purpose: Applies a function to the margins of an array.
Syntax: apply(X, MARGIN, FUN, ...)


• X: The array to which the function is applied.

• MARGIN: The dimension(s) over which to apply the function: 1 for rows, 2 for columns, c(1, 2) for both.
• FUN: The function to be applied.
• ...: Additional arguments to be passed to the function.
Example:
# Create a matrix
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
# Calculate the sum of each row
row_sums <- apply(my_matrix, 1, sum)
print(row_sums)

3.10.2 lapply()
Purpose: Applies a function to each element of a list.
Syntax: lapply(X, FUN, ...)
• X: The list to which the function is applied.
• FUN: The function to be applied.
• ...: Additional arguments to be passed to the function.
Example:
# Create a list of numbers
my_list <- list(1, 2, 3, 4, 5)
# Square each element
squared_list <- lapply(my_list, function(x) x^2)
print(squared_list)

3.10.3 sapply()
Purpose: Similar to lapply(), but simplifies the output to a vector or matrix.
Syntax: sapply(X, FUN, ..., simplify = TRUE)
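An example for sapply(): given the same kind of input as lapply(), it returns a plain vector instead of a list, and when each call returns several values it returns a matrix. The list scores is illustrative:

```r
# Squares of 1 to 5: a numeric vector rather than a list
sapply(1:5, function(x) x^2)          # 1 4 9 16 25

# Mean of each element of a named list: a named vector
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6))   # illustrative data
sapply(scores, mean)                  # a = 2, b = 5

# Each call returns two values (min and max), so the result is a 2 x 2 matrix
sapply(scores, range)
```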

3.10.4 vapply()
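Purpose: Like sapply(), but you declare the type and length of each result through FUN.VALUE, so the output is predictable and R raises an error if any result does not match.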
Syntax: vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
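With vapply(), FUN.VALUE is a template for each result: numeric(1) declares that every call must return exactly one number, and R stops with an error if a call returns anything else. Using the same illustrative scores list:

```r
scores <- list(a = c(1, 2, 3), b = c(4, 5, 6))   # illustrative data
vapply(scores, mean, FUN.VALUE = numeric(1))    # a = 2, b = 5

# A template of length 2 allows two values per call (a 2 x 2 matrix)
vapply(scores, range, FUN.VALUE = numeric(2))

# This would fail, because range() returns two numbers, not one:
# vapply(scores, range, FUN.VALUE = numeric(1))
```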
3.10.5 tapply()
Purpose: Applies a function to subsets of a vector, splitting the vector based on factors.
Syntax: tapply(X, INDEX, FUN, ..., simplify = TRUE)
Example:
# Create a vector and a factor
x <- 1:10
f <- factor(rep(c("A", "B"), 5))
# Calculate the mean of x for each level of f
means <- tapply(x, f, mean)
print(means)


4 Data Visualisation using R

4.1 R11(Histograms)
• Create a script file by going to File>New File>R Script
• Go to File > Import Dataset and choose “From Text (base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
hist(iris$Sepal.Length)
hist(iris$Sepal.Length, col="steelblue")
hist(iris$Sepal.Width, col="red")
hist(iris$Sepal.Width, col="yellow")
#Set the title with main, the x and y axis labels with xlab and ylab, and the bar colour with col
hist(iris$Petal.Length,
main='Histogram',
xlab='Length',
ylab='Frequency',
col='red')
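hist() also returns the bin information it plots. With plot = FALSE nothing is drawn, which makes it easy to inspect the bin edges and counts; the breaks argument is only a suggestion for the number of bins, which R adjusts to round values:

```r
data("iris")
h <- hist(iris$Sepal.Length, breaks = 10, plot = FALSE)
h$breaks       # edges of the bins
h$counts       # number of flowers in each bin
sum(h$counts)  # 150: every observation falls into exactly one bin
```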

4.2 R12(Boxplot)
• Create a script file by going to File>New File>R Script
• Go to File > Import Dataset and choose “From Text (base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
boxplot(iris$Sepal.Length)
#Plot Petal Length for each Species: main sets the heading, xlab and ylab the axis labels, col the fill colour and border the border colour of the boxes
boxplot(Petal.Length ~ Species, data = iris,
main='Petal Length',
xlab='Species',
ylab='Petal Length',
col='pink',
border = 'red')
OR
boxplot(Petal.Length ~ Species, data = iris, main='Petal Length', xlab='Species', ylab='Petal Length', col='pink', border = 'red')
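The numbers behind a boxplot can be obtained with boxplot.stats(). Its five statistics are the lower whisker, lower hinge, median, upper hinge and upper whisker, and any point more than 1.5 times the box length beyond the box is reported as an outlier. With a small made-up vector:

```r
x <- c(1, 2, 3, 4, 100)   # illustrative data with one extreme value
b <- boxplot.stats(x)
b$stats   # 1 2 3 4 4: whiskers stop at the most extreme non-outlying values
b$out     # 100 is flagged as an outlier
boxplot(x, main = "100 is drawn as a separate point")
```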

4.3 R13(Line plot)

4.3.1 Line Plot with single series of data
#Plot a vector
l1<- c(7,12,28,3,41)


plot(l1)
#Plot a vector using points
plot(l1,type = 'p')
#Plot a vector using lines
plot(l1,type = 'l')
#Plot a vector using both points and lines
plot(l1,type = 'o')
#Plot a vector using both points and lines with colour
l2<- c(1,2,8,13,40)
plot(l2,type = 'o', col='red')
#Plot a vector using both points and lines with colour, heading, label of the x & y axis
l3<- c(5,2,11,7,20,15,22,17,25)
plot(l3,type = 'o', col='green', main='Line Plot', xlab='points', ylab='Frequency')

4.3.2 Line Plot with multiple series of data

#variable 't' represents time
t<-0:10
#variable 'z' showing quantity that is decreasing in time
z<-exp(-t/2)
#variable 'w' that is increasing with time
w<-0.1*exp(t/3)
#plot t and z
plot(t,z,type ='l')
#plot t and z with colour red, line width 3, label of x & y axes time and concentration
plot(t,z,type ='l', col='red', lwd=3, xlab='Time', ylab='Concentration')
#plot t and w
plot(t,w,type ='o')
#plot t and w with colour green, line width 4, label of x & y axes time and concentration
#ylim = range(z, w) makes the y axis wide enough for both curves
plot(t,w,type ='o', col='green', lwd=4, xlab='Time', ylab='Concentration', ylim=range(z, w))
#add the z curve to the same plot
lines(t,z,col='red', lwd=3)
#add title 'Exponential Growth and decay'
title("Exponential Growth and decay")

4.4 R14(Scatter Plots)

• Create a script file by going to File>New File>R Script
• Go to File > Import Dataset and choose “From Text (base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
#plot scatter diagram
plot(iris$Sepal.Length, iris$Sepal.Width)
#plot scatter diagram in dot shape
plot(iris$Sepal.Length, iris$Sepal.Width, pch=20)
#plot scatter diagram taking pch 0 to 25
plot(iris$Sepal.Length, iris$Sepal.Width,pch=0)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=1)


plot(iris$Sepal.Length, iris$Sepal.Width,pch=2)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=3)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=4)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=5)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=6)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=7)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=8)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=9)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=10)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=11)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=12)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=13)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=14)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=15)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=16)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=17)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=18)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=19)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=20)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=21)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=22)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=23)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=24)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=25)
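Instead of 26 separate plots, all the point symbols can be compared in a single chart by passing a vector of pch codes. In this sketch, symbols 21 to 25 are the ones that also take a fill colour through bg; pch_codes is an illustrative name:

```r
pch_codes <- 0:25
plot(pch_codes, rep(1, 26), pch = pch_codes, bg = "yellow", cex = 2,
     ylim = c(0.5, 1.5), yaxt = "n", xlab = "pch code", ylab = "",
     main = "Point symbols 0 to 25")
text(pch_codes, 1.2, labels = pch_codes)   # print each code above its symbol
```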
#plot scatter diagram in dot shape, red colour, heading Scatter Plot, Label of x & y axes sepal_length & sepal_width
plot(iris$Sepal.Length, iris$Sepal.Width,col="red", main="Scatter Plot",xlab="sepal_length",
ylab="sepal_width", pch=20)

4.5 R15(Bar Chart)

• Create a script file by going to File>New File>R Script
• Go to File > Import Dataset and choose “From Text (base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
# Count the number of observations for each species
species_counts <- table(iris$Species)
species_counts
# Create a bar chart
barplot(species_counts,
main = "Distribution of Iris Species",
xlab = "Species",
ylab = "Count",
col = c("blue", "green", "pink"))
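To show shares rather than counts, prop.table() converts the frequency table into proportions, which can be passed to barplot() in the same way. In iris each species has 50 of the 150 flowers, so every bar is one third high:

```r
data("iris")
species_counts <- table(iris$Species)
species_props <- prop.table(species_counts)   # counts divided by their total
round(species_props, 3)                       # 0.333 for each species
barplot(species_props,
        main = "Share of Iris Species",
        ylab = "Proportion",
        col = c("blue", "green", "pink"))
```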


5 Descriptive Statistics Using R

5.1 Summary (Descriptive Statistics 1)
#Summary of a data sheet
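data("iris")
View(iris)
summary(iris)
For each numeric column, summary() reports:
• Minimum
• First quartile
• Median
• Mean
• Third quartile
• Maximum
For a factor column such as Species, it reports the number of observations at each level.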

5.2 Measure of Central Tendency (Descriptive Statistics 2)

5.2.1 Arithmetic Mean, Median and Mode
#create a vector
x<-c(3,7,5,13,20,23,39,23,40,23,14,12,56,23)
#calculate mean of x
mean(x)
#calculate median of x
median(x)
#create a frequency table
t1<-table(x)
t1
#calculate mode of x
mode_x<-names(t1)[which(t1==max(t1))]
mode_x
#create a vector
y<-c(3,7,5,13,20,20,39,23,40,23,14,12,56,23,20)
#calculate mean of y
mean(y)
#calculate median of y
median(y)
#create a frequency table
t2<-table(y)
t2
#calculate mode of y
mode_y<-names(t2)[which(t2==max(t2))]
mode_y
#create a vector
z<-c(3,7,5,13,20,20,39,23,25,25,40,23,14,12,56,23,20,25)
#calculate mean of z
mean(z)
#calculate median of z
median(z)
#create a frequency table
t3<-table(z)
t3
#calculate mode of z
mode_z<-names(t3)[which(t3==max(t3))]


mode_z
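Base R has no function for the statistical mode: mode() exists, but it reports how an object is stored (for example "numeric"). The table() approach above also returns the modes as character strings. A small hypothetical helper, stat_modes(), that returns every mode as a number, checked on z, which has three modes:

```r
# Base R's mode() gives the storage mode, not the statistical mode
mode(c(3, 7, 5))   # "numeric"

# Hypothetical helper: every value that occurs most often, returned as numbers
stat_modes <- function(v) {
  counts <- table(v)
  as.numeric(names(counts)[counts == max(counts)])
}

z <- c(3,7,5,13,20,20,39,23,25,25,40,23,14,12,56,23,20,25)
stat_modes(z)      # 20 23 25 (each occurs 3 times)
```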

5.2.2 Mode using function from the data frame

# create a data frame
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
y = c(2, 2, 3, 3, 4, 5, 11, 11),
z = c(8, 9, 9, 9, 10, 13, 15, 17))
# Define the mode function (if several values tie, which.max() returns the first of them)
find_mode <- function(x) {
unique_values <- unique(x)
counts <- tabulate(match(x, unique_values))
unique_values[which.max(counts)]
}
# Apply the find_mode function to each column of the data frame
modes <- apply(df, 2, find_mode)
print(modes)

5.3 Measure of Dispersion(Descriptive Statistics 3)

5.3.1 Range
#Range: The difference between the maximum and minimum values.
data <- c(10, 20, 30, 40, 50)
#range() returns the minimum and the maximum, not their difference
range(data)
# Calculate the range as maximum minus minimum
range_value <- range(data)[2] - range(data)[1]
range_value
5.3.2 Variance
#Variance: The sample variance, i.e. the sum of squared deviations from the mean divided by n - 1.
var(data)
5.3.3 Standard Deviation
#Standard Deviation: The square root of the variance, providing a measure of dispersion in the same units as the original data.
sd(data)
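var() and sd() use the sample formulas, dividing the sum of squared deviations by n - 1. For data = 10, 20, 30, 40, 50 the mean is 30 and the squared deviations sum to 1000, so var() gives 1000 / 4 = 250. If the population variance (dividing by n) is needed, it has to be computed by hand:

```r
data <- c(10, 20, 30, 40, 50)
n <- length(data)
ss <- sum((data - mean(data))^2)   # sum of squared deviations: 1000
ss / (n - 1)                       # sample variance, same as var(data): 250
sqrt(ss / (n - 1))                 # sample standard deviation, same as sd(data)
ss / n                             # population variance: 200
```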
5.4 Datasheet (Descriptive Statistics 4)
# Load the iris dataset
data(iris)
# Calculate the range of Sepal Length
range(iris$Sepal.Length)
# Calculate the variance of Petal Width
var(iris$Petal.Width)
# Calculate the standard deviation of Sepal Width
sd(iris$Sepal.Width)
# Calculate the IQR of the Petal Length
IQR(iris$Petal.Length)
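IQR() is the distance between the third and the first quartile. The same value can be obtained from quantile(), which also shows the two quartiles themselves:

```r
data(iris)
q <- quantile(iris$Petal.Length, probs = c(0.25, 0.75))
q                  # first and third quartiles
q[[2]] - q[[1]]    # same value as IQR(iris$Petal.Length)
```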

5.4.1 Tribbles in R: A Concise Data Frame Creation

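A tribble (a "transposed" tibble) is a concise way to create a data frame row by row, especially useful for small data sets. The tribble() function comes from the tibble package, which is part of the tidyverse; dplyr re-exports it.
STEP 1: Load the 'dplyr' package in R, e.g. with library(dplyr).
STEP 2: Alternatively, in RStudio, tick the 'dplyr' (or 'tibble') package in the Packages pane.

library(dplyr)
# Column names start with ~, then the values follow row by row
tribble(~X,~Y,"v",15,"w",5,"x",25,"y",20)
# Each value in column B is a whole vector, so tribble() stores B as a list-column
tribble(~A,~B,"m",11:14,"n",2:6,"o",21:25,"p",51:56)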

5.5 Student data case (Descriptive Statistics5)

• Create a script file by going to File>New File>R Script
• Go to File > Import Dataset and choose “From Excel”
• Select the data file Student_data
• Click open
#check the first 6 rows of the file
head(Student_data)
#check the last 6 rows of the file
tail(Student_data)
#check the dimensions
dim(Student_data)
#check the number of columns
length(Student_data)
#check the variables
attributes(Student_data)
#check the class of variables
class(Student_data$`Roll No.`)
class(Student_data$Gender)
class(Student_data$`Income Group`)
class(Student_data$Marks)
#check the summary of the file
summary(Student_data)
#check the summary of marks only
summary(Student_data$Marks)
BA<-Student_data$Marks
summary(BA)
#convert Income Group into categorical
IG<-as.factor(Student_data$`Income Group`)
summary(IG)
#convert Gender into categorical
gender<-as.factor(Student_data$Gender)
summary(gender)

6 Relationship between two variables

6.1 Covariance (Descriptive Statistics6)
x<-c(1,3,5,10)
y<-c(2,4,6,20)
cov(x,y)
cov(x,y, method = "pearson")
cov(x,y, method = "kendall")
cov(x,y, method = "spearman")
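The sample covariance is the sum of the products of deviations from the two means, divided by n - 1. Worked by hand for x = 1, 3, 5, 10 and y = 2, 4, 6, 20 (means 4.75 and 8), the products are 22.5, 7, -0.5 and 63, which sum to 92, giving 92 / 3 ≈ 30.67, the same value as cov(x, y):

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)
n <- length(x)
# Products of deviations from the means: 22.5, 7, -0.5, 63
products <- (x - mean(x)) * (y - mean(y))
manual_cov <- sum(products) / (n - 1)   # 92 / 3
manual_cov
cov(x, y)                               # the same value
```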

6.2 Correlation (Descriptive Statistics7)

x<-c(1,3,5,10)
y<-c(2,4,6,20)


cor(x,y)
cor(x,y, method = "pearson")
cor(x,y, method = "kendall")
cor(x,y, method = "spearman")
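Pearson's correlation is the covariance scaled by both standard deviations, r = cov(x, y) / (sd(x) * sd(y)), so it always lies between -1 and 1. Spearman's method applies the same formula to the ranks of the data, which can be checked directly:

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)
r_manual <- cov(x, y) / (sd(x) * sd(y))
r_manual                         # same as cor(x, y)
# Spearman correlation = Pearson correlation of the ranks
cor(rank(x), rank(y))            # 1: y increases whenever x does
cor(x, y, method = "spearman")   # the same value
```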

6.3 Coefficient of Determination (Descriptive Statistics8)

x<-c(1,3,5,10)
y<-c(2,4,6,20)
r_squared<-cor(x,y)^2
r_squared
OR
# Sample data
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)
# Create a linear model
model <- lm(y ~ x)
# Extract R-squared
r_squared <- summary(model)$r.squared
# Print R-squared
print(r_squared)
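R-squared can also be computed from its definition: one minus the residual sum of squares divided by the total sum of squares. For a regression with a single predictor this equals the squared correlation, which is why both methods above give the same number:

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)
model <- lm(y ~ x)
ss_res <- sum(residuals(model)^2)   # variation left unexplained by the line
ss_tot <- sum((y - mean(y))^2)      # total variation around the mean
r2_manual <- 1 - ss_res / ss_tot
r2_manual
cor(x, y)^2                         # same value for simple regression
summary(model)$r.squared            # same value again
```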

##find covariance and correlation

u<-c(1,2,3,4,5,6,7,8,9,10)
v<-c(10,9,8,7,6,5,4,3,2,1)
w<-c(2,4,6,8,10,12,14,16,18,20)
cov(u,v)
cor(u,v)
cov(u,w)
cor(u,w)


7 Citation
citation()
To cite R in publications, use:

R Core Team (2024). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing, Vienna,
Austria. <https://www.R-project.org/>.

A BibTeX entry for LaTeX users is

@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2024},
url = {https://www.R-project.org/},
}

8 Regression using R

8.1 Simple Regression using R

• Create a script file by going to File>New File>R Script
• Go to File > Import Dataset and choose “From Excel”
• Select the data file income_data
• Click open
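# lm() comes with base R (stats package); lmtest is only needed for extra diagnostic tests on the fitted model
install.packages("lmtest")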
income.happiness.lm<-lm(happiness~income,data=income_data)
summary(income.happiness.lm)
plot(income.happiness.lm)
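Because income_data is the author's own file, the sketch below uses R's built-in cars data set (stopping distance in feet against speed in mph) as a stand-in, to show how individual results can be pulled out of a fitted lm object instead of being read off the printed summary; cars.lm is an illustrative name:

```r
data(cars)
cars.lm <- lm(dist ~ speed, data = cars)
coef(cars.lm)                  # intercept and slope
summary(cars.lm)$r.squared     # share of the variation in dist explained by speed
confint(cars.lm)               # 95% confidence intervals for the coefficients
# Predicted stopping distance at a speed of 20 mph
predict(cars.lm, newdata = data.frame(speed = 20))
```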

8.2 Multiple Regression using R

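8.2.1 Case 1
# Load the built-in mtcars data set
data(mtcars)
View(mtcars)   # opens a spreadsheet-style viewer
# Multiple regression: mpg explained by weight (wt) and horsepower (hp)
model <- lm(mpg ~ wt + hp, data = mtcars)
# Summary of the model
summary(model)
plot(model)
# Plot the fitted values against the actual values;
# points close to the red 45-degree line are predicted well
plot(mtcars$mpg, fitted(model), xlab = "Actual mpg", ylab = "Fitted mpg")
abline(a = 0, b = 1, col = "red")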
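After fitting, the usual next steps are to read off the coefficients, check fit quality and predict for new cases. The sketch below does this for the mtcars model. The example car (wt = 3.0, i.e. 3,000 lb, and hp = 150) is hypothetical. RMSE is one common fit measure among several.

```r
data(mtcars)
model <- lm(mpg ~ wt + hp, data = mtcars)

coef(model)                    # intercept, wt and hp coefficients
summary(model)$adj.r.squared   # adjusted R-squared

# Predicted mean mpg for a hypothetical car: wt = 3.0 (3,000 lb), hp = 150
new_car <- data.frame(wt = 3.0, hp = 150)
pred <- predict(model, newdata = new_car, interval = "confidence")
pred   # columns: fit, lwr, upr

# Root mean squared error of the in-sample fit
rmse <- sqrt(mean(residuals(model)^2))
rmse
```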

8.2.2 Case 2
# Multiple regression using sample data
house_data <- data.frame(
  price = c(200, 250, 300, 350, 400, 410, 420, 450, 500),
  sqft  = c(1000, 1200, 1500, 1800, 2000, 2100, 2200, 2300, 2500),
  floor = c(12, 9, 8, 7, 6, 4, 3, 2, 1)
)
# Create a multiple regression model
model <- lm(price ~ sqft + floor, data = house_data)

# Summarise the model
summary(model)

# Making predictions
new_data <- data.frame(sqft = 1600, floor = 7)
predicted_price <- predict(model, newdata = new_data)
print(predicted_price)
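A point prediction says nothing about its uncertainty. The sketch below extends the house_data example with two standard lm() tools. confint() gives confidence intervals for the coefficients, and predict(..., interval = "prediction") gives a range for a single new house.

```r
house_data <- data.frame(
  price = c(200, 250, 300, 350, 400, 410, 420, 450, 500),
  sqft  = c(1000, 1200, 1500, 1800, 2000, 2100, 2200, 2300, 2500),
  floor = c(12, 9, 8, 7, 6, 4, 3, 2, 1)
)
model <- lm(price ~ sqft + floor, data = house_data)

# 95% confidence intervals for the intercept and both slopes
confint(model)

# Point prediction plus a 95% prediction interval for one new house
new_data <- data.frame(sqft = 1600, floor = 7)
pred <- predict(model, newdata = new_data, interval = "prediction")
pred   # columns: fit, lwr, upr
```

In this small sample, floor falls as sqft rises, so the two predictors carry overlapping information. The individual coefficient intervals may therefore be wide. cor(house_data$sqft, house_data$floor) is a quick way to check this.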

Getting started with R:

Introduction to R, Advantages of R, Installation of R Packages, Importing data from
spreadsheet files, Commands and Syntax, Packages and Libraries.
Data Structures in R:
Vectors, Matrices, Arrays, Lists, Factors, Data Frames, Conditionals and Control Flows,
Loops, Functions, and Apply family.
Descriptive Statistics Using R:
Importing data files; data visualisation using charts: histograms, bar charts, box plots, line
graphs, scatter plots, etc.
Data description: Measures of Central Tendency, Measures of Dispersion.
Relationship between variables: Covariance, Correlation and Coefficient of Determination.