R Script

The document is a comprehensive guide on R programming and RStudio, covering topics such as installation, data manipulation, visualization, and statistical analysis. It provides detailed explanations of R's capabilities, advantages, and essential language features, along with practical examples and code snippets. Additionally, it discusses the differences between R and RStudio, and outlines the use of various R packages for data analysis.

Uploaded by

Bhumika Jain

R
Dr. Arun Julka

Table of Contents
1 R and RStudio
1.1 R vs RStudio
1.2 Introduction to R
1.2.1 How can you use R?
1.2.2 Getting Started with R
1.3 Advantages of R
2 Installation of R Packages
2.1 R Packages
2.2 File Import in R
3 Essentials of the R Language
3.1 File R1(calculation)
3.2 File R2(vectors)
3.3 File R3(matrices)
3.4 File R4(Arrays and Lists)
3.4.1 Arrays
3.4.2 Lists
3.5 File R5(Loops)
3.6 File R6(Factors and Data Frame)
3.6.1 Factors
3.6.2 Data Frame
3.7 File R7(Conditional and Control Flows)
3.8 File R8(importing from excel)
3.9 File R9(text)
3.10 File R10(Apply Function: A Versatile Tool for Data Manipulation)
3.10.1 apply()
3.10.2 lapply()
3.10.3 sapply()
3.10.4 vapply()
3.10.5 tapply()
4 Data Visualisation using R
4.1 R11(Histograms)
4.2 R12(Boxplot)
4.3 R13(Line plot)
4.3.1 Line Plot with single series of data
4.3.2 Line Plot with multiple series of data
4.4 R14(Scatter Plots)
4.5 R15(Bar Chart)
5 Descriptive Statistics Using R
5.1 Summary (Descriptive Statistics 1)
5.2 Measure of Central Tendency (Descriptive Statistics 2)
5.2.1 Arithmetic Mean, Median and Mode
5.2.2 Mode using function from the data frame
5.3 Measure of Dispersion (Descriptive Statistics 3)
5.3.1 Range
5.3.2 Variance
5.3.3 Standard Deviation
5.4 Datasheet (Descriptive Statistics 4)
5.4.1 Tribbles in R: A Concise Data Frame Creation
5.5 Student data case (Descriptive Statistics 5)
6 Relationship between two variables
6.1 Covariance (Descriptive Statistics 6)
6.2 Correlation (Descriptive Statistics 7)
6.3 Coefficient of Determination (Descriptive Statistics 8)
7 Citation
8 Regression using R
8.1 Simple Regression using R
8.2 Multiple Regression using R
8.2.1 Case1
8.2.2 Case2


1 R and RStudio
R's open-source nature, extensive statistical capabilities, powerful data visualisation tools, and a vibrant
community make it a compelling choice for data scientists, statisticians, and researchers.
• R is a programming language for statistical computing and graphics.
• R is an implementation of, and successor to, the ‘S’ language.
• The name ‘R’ comes from the first letter of its developers’ first names,
Robert Gentleman and Ross Ihaka.
• R provides many graphical and statistical tools, such as linear and nonlinear modelling,
classification, classical statistical tests, and clustering.
• It runs on various UNIX platforms and on other systems, including Windows and macOS.
• R offers integrated software facilities, including operators for calculations and tools
for data manipulation, data visualisation, and data storage and handling.
• R is a fully developed language with loops, conditionals, and input/output
functions. For this reason, it is often described as the ‘R environment’.

RStudio
• RStudio is an integrated development environment (IDE).
• It is particularly designed to work with the R programming language.
• RStudio can be broadly divided into 4 panes: Source Editor (top left), Environment (top right),
Console (bottom left), and Plots (bottom right).


1.1 R vs RStudio
Basis: Meaning
R: It is a programming language.
RStudio: It is an integrated development environment (IDE).

Basis: Objective
R: It aims at statistical computing and graphics.
RStudio: It aims at the development of statistical programs.

Basis: Elaborative process
R: R is the core engine for performing data analysis and computations. However, it is less elaborate than RStudio.
RStudio: RStudio is more elaborate, as it provides a more user-friendly environment for working with R.

Basis: Independent platform
R: It is independent: it can be used on its own on any operating system for which R is available.
RStudio: It is not independent: it is designed specifically for the R language and needs an R installation to run R code.

Basis: Extension
R: An R script has the extension ‘.R’.
RStudio: An RStudio project file has the extension ‘.Rproj’.

1.2 Introduction to R
Imagine R as a powerful Swiss Army knife for data analysis. It's a programming language and software
environment designed specifically for statistical computing and data visualisation. Think of it as a tool
that allows you to explore, manipulate, and extract insights from data, no matter how complex or messy
it might be.

1.2.1 How can you use R?


R can be used in various fields, including:
➢ Social Sciences: Analysing survey data, conducting opinion polls, and studying social
trends.
➢ Business Analytics: Making data-driven decisions, forecasting sales, and optimising
marketing strategies.
➢ Bioinformatics: Analysing genetic data, studying protein structures, and understanding
biological processes.
➢ Finance: Modelling financial markets, assessing risk, and optimising investment
portfolios.
➢ Environmental Science: Monitoring environmental changes, analysing climate data, and
predicting natural disasters.

1.2.2 Getting Started with R


To start your journey with R, you'll need to:
1. Install R: Download and install R from the official website (https://cran.r-project.org/).
2. Choose an IDE: Consider using an Integrated Development Environment (IDE) like
RStudio, which provides a user-friendly interface for coding and data analysis.


3. Learn the Basics: Start with basic R syntax, data structures (vectors, matrices, data
frames), and fundamental statistical functions.
4. Explore Packages: Discover and install packages that cater to your specific needs, such as
tidyverse for data manipulation and visualisation, caret for machine learning, and ggplot2
for advanced plotting.
1.3 Advantages of R
R has become a cornerstone for data analysis and statistical computing due to its numerous advantages:
1. Open-Source and Free:
➢ Cost-Effective: No licensing fees, making it accessible to everyone.
➢ Community-Driven: A large and active community contributes to its development and
provides extensive support.
2. Comprehensive Statistical Capabilities:
➢ Statistical Tests: R offers a wide range of statistical tests for hypothesis testing, regression
analysis, and more.
➢ Machine Learning: Powerful machine-learning algorithms for classification, regression,
clustering, and predictive modelling.
3. Data Visualisation:
➢ High-Quality Graphics: Create stunning visualisations with packages like ggplot2, lattice,
and plotly.
➢ Customisation: Tailor plots to specific needs, including interactive visualisations.
4. Flexibility and Extensibility:
➢ Package Ecosystem: A vast collection of packages (over 18,000) for various statistical and
data analysis tasks.
➢ Custom Function Creation: Develop custom functions to tailor the analysis to specific
requirements.
5. Reproducible Research:
➢ Version Control: Track changes and ensure reproducibility.
➢ R Markdown: Create dynamic documents combining code, output, and narrative text.
6. Platform Independence:
➢ Cross-Platform Compatibility: Runs on Windows, macOS, and Linux.
7. Strong Community Support:
➢ Active Forums and Communities: Online resources for help and collaboration.
➢ Tutorials and Documentation: Extensive documentation and tutorials available.
8. Integration with Other Tools:
➢ Interoperability: Seamlessly integrates with other tools like Python, SQL, and Hadoop.
9. Data Wrangling and Manipulation:
➢ Powerful Data Manipulation: Efficiently clean, transform, and reshape data with
packages like dplyr and tidyr.
10. Big Data Analysis:
➢ Scalability: Handles large datasets with packages like sparklyr and bigr.


2 Installation of R Packages
2.1 R Packages
➢ base
➢ readxl
➢ readr
➢ dplyr
➢ tidyr
➢ tibble
➢ tidyverse
➢ ggplot2
➢ lmtest
➢ graphics
➢ stats
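
The list above names the packages but not how to obtain them. As a minimal sketch, the snippet below checks which of the listed add-on packages are missing on the current machine; the variable names `wanted` and `missing_pkgs` are illustrative only. The actual install call is left commented out because it needs an internet connection.

```r
# Add-on packages from the list above (base, graphics and stats ship with R)
wanted <- c("readxl", "readr", "dplyr", "tidyr", "tibble",
            "tidyverse", "ggplot2", "lmtest")

# Find which of them are not yet installed on this machine
installed <- rownames(installed.packages())
missing_pkgs <- setdiff(wanted, installed)
print(missing_pkgs)

# Install only what is missing (uncomment to run; needs internet access)
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)

# A package must also be loaded in each new session before use, e.g.
# library(readxl)
```

Installing is done once per machine, whereas library() is called in every session that uses the package.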

2.2 File Import in R


File               Format                      Package Required
As Text            .txt                        base R (utils)
As CSV             .csv                        readr
As Excel           .xlsx / .xls                readxl
As SPSS, SAS, Stata   .sav / .sas7bdat / .dta     haven
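
As a small, self-contained illustration of a file import, the sketch below writes a made-up CSV file to a temporary location and reads it back. It uses base R's read.csv() rather than readr so that it runs without extra packages; the data and the names `csv_path`, `sample_data` and `imported` are hypothetical.

```r
# Write a small CSV file to a temporary location (made-up sample data)
csv_path <- tempfile(fileext = ".csv")
sample_data <- data.frame(id = 1:3, income = c(25000, 40000, 32000))
write.csv(sample_data, csv_path, row.names = FALSE)

# Import it with base R; readr::read_csv(csv_path) is the readr equivalent
imported <- read.csv(csv_path)
head(imported)
dim(imported)
```

With a real file, only the path changes; the later sections on Excel and text import follow the same pattern with read_excel() and read.delim().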


3 Essentials of the R Language


3.1 File R1(calculation)
#Calculation with R
10+17
10*15
140/7
100-6
2^5
2+2-2/2
3*5/6
3/5*6
1:20
20:1
seq(1,100, by=5)
seq(1,100, by=3)
seq(1,100, 5)
seq(1,100, 3)
rep(7,10)

# arithmetic function in R
abs(-15)
exp(1)
log(exp(1))
log(10)
log10(10)
log10(exp(1))
log(16,4)

# create a variable
x <- -100
x + 70
abs (x)
u <- 19
v <- 11
u+v
sum(u, v)

#create and change a variable


result <- 10 - 4
print(result)
result <- 6 * 7
print(result)
result <- 20 / 5
print(result)
result <- 2^3
print(result)

Remember:


Order of Operations: R follows the standard order of operations (PEMDAS/BODMAS):


Parentheses, Exponents, Multiplication and Division (from left to right), Addition and Subtraction
(from left to right).
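
The order of operations can be checked directly in the console. The short sketch below collects a few expressions in a named vector (the name `results` is illustrative) and includes one R-specific point worth knowing: the exponent operator binds more tightly than a leading minus sign.

```r
results <- c(
  mixed       = 2 + 2 - 2 / 2,  # division first: 2 + 2 - 1
  left_right1 = 3 * 5 / 6,      # left to right: 15 / 6
  left_right2 = 3 / 5 * 6,      # left to right: 0.6 * 6
  brackets    = (2 + 3) * 4,    # parentheses first
  neg_power   = -2^2,           # read as -(2^2)
  paren_power = (-2)^2          # parentheses force the sign inside
)
print(results)
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing.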

# Calculating the area of a circle
radius <- 5
# pi is a built-in constant in R, so it does not need to be redefined
area <- pi * radius^2
print(area)

3.2 File R2(vectors)


#Vectors and subscripts
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE)
# create a numeric vector
y <- c (10,12, -32,40,50,100)
3*y
abs (y)
class(y)
length(y)
mean(y)
max(y)
min(y)
quantile(y)
z<-rep(c(1,2,3,4,5),3)
z
#Extracting elements of a vector using subscripts
y
y[4]
y[3]
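
Subscripts can do more than pick a single element. A brief sketch using the same vector `y` follows; the result names (`picked`, `dropped`, `large`, `neg_pos`) are illustrative only.

```r
y <- c(10, 12, -32, 40, 50, 100)

picked  <- y[c(1, 3)]   # elements 1 and 3
slice   <- y[2:4]       # elements 2 to 4
dropped <- y[-1]        # everything except the first element
large   <- y[y > 20]    # elements satisfying a condition
neg_pos <- which(y < 0) # positions of the negative values

print(picked)
print(large)
print(neg_pos)
```

Note that R indexes from 1, and that a negative subscript removes elements rather than counting from the end.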

3.3 File R3(matrices)


#making a matrix
A <- matrix(c(1,4,7,2,5,8,3,6,9), nrow=3)
A
class(A)
attributes(A)
#making a matrix row-wise using vector
vector <- c(1,2,3,4,4,3,2,1)
B <- matrix(vector, byrow=T, nrow=2)
B
class(B)
attributes(B)
#transpose of a matrix
C<-t(B)
C
#addition, subtraction, and multiplication of matrices


D <- matrix(c(10,40,70,20,50,80,30,60,90), nrow=3)


D
2*D-5
E <- matrix(c(5,6,7,4,8,9,10,6,11),nrow=3)
E
3*E+5
D+E
D-E
DE<-D%*%E
DE
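
The operator %*% used above performs matrix multiplication, which is different from the element-by-element product given by *. The sketch below contrasts the two on small 2 x 2 matrices; `P` and `Q` are hypothetical examples chosen so that the results are easy to verify by hand.

```r
P <- matrix(c(1, 2, 3, 4), nrow = 2)   # columns are (1, 2) and (3, 4)
Q <- matrix(c(5, 6, 7, 8), nrow = 2)   # columns are (5, 6) and (7, 8)

elementwise <- P * Q     # multiplies matching cells
matrix_prod <- P %*% Q   # rows of P times columns of Q

print(elementwise)
print(matrix_prod)
```

For matrix multiplication, the number of columns of the first matrix must equal the number of rows of the second.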

3.4 File R4(Arrays and Lists)


3.4.1 Arrays
# Create a 3D array (2 rows, 3 columns, 4 layers)
my_array <- array(1:24, dim = c(2, 3, 4))
print(my_array)
#make two vectors of varied lengths
v1<-c(5,9,3)
v2<-c(10,11,12,13,14,15)
#use the above vectors as inputs for the array (their 9 values are recycled to fill all 18 cells)
new_array<- array(c(v1,v2),dim=c(3,3,2))
new_array
3.4.2 Lists
#create List
my_list <- list(name = "Arun Julka",
age = 53,
city = "New Delhi",
hobbies = c("reading", "coding", "painting")
)
my_list
View(my_list)
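
Once a list exists, its elements are usually accessed by name. The sketch below repeats the list from above and shows the common access forms; the result names (`the_name`, `the_age`, `second_hobby`, `city_sublist`) are illustrative.

```r
my_list <- list(name = "Arun Julka",
                age = 53,
                city = "New Delhi",
                hobbies = c("reading", "coding", "painting"))

the_name     <- my_list$name        # element by name with $
the_age      <- my_list[["age"]]    # element by name with [[ ]]
second_hobby <- my_list$hobbies[2]  # element of a vector inside the list
city_sublist <- my_list["city"]     # single [ ] returns a list, not the value

print(names(my_list))
print(second_hobby)
```

The distinction between [[ ]] (the element itself) and [ ] (a smaller list) is a frequent source of confusion.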

3.5 File R5(Loops)


# create a Loop
Loops are fundamental programming constructs that allow you to execute a code block repeatedly. R
provides several ways to implement loops: for, while, and repeat.
#for
for (i in 1:5) {
print(i)
}
#while
i <- 6
while (i <= 10) {
print(i)
i <- i + 1
}
#repeat
i <- 11
repeat {
  print(i)
  i <- i + 1
  # stop once i has passed 15
  if (i > 15) {
    break
  }
}

3.6 File R6(Factors and Data Frame)


3.6.1 Factors
#Factor
data<-c("Male", "Female", "Male", "Child", "Child", "Male", "Female", "Female")
data
factor.data<-factor(data)
factor.data
3.6.2 Data Frame
# create a data frame
name <- c("Arun Julka", "Arvinder", "Deepak Mehra", "Sanjay Garg", "Soma Jain")
age <- c(60, 55, 52, 56, 45)
gender <- c("M", "F", "M", "M", "F")
friends <- data.frame (name, age, gender)
friends[,]
friends[1,]
friends[2,]
friends[3,]
friends[,1]
friends[,2]
friends[,]
friends[1,1]
friends[1:3,1]
# create a data frame
df<-data.frame(x=c(1,4,4,5,6,10,12,13),y=c(2,2,3,3,4,5,11,11),z=c(8,9,9,9,10,13,15,17))
df
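
Besides numeric positions, data frame columns can be selected by name and rows can be filtered by a condition. The sketch below rebuilds the `friends` data frame from above; the result names (`older`, `female_names`, `avg_age`) are illustrative.

```r
name <- c("Arun Julka", "Arvinder", "Deepak Mehra", "Sanjay Garg", "Soma Jain")
age <- c(60, 55, 52, 56, 45)
gender <- c("M", "F", "M", "M", "F")
friends <- data.frame(name, age, gender)

older        <- friends[friends$age > 52, ]             # rows meeting a condition
female_names <- friends[friends$gender == "F", "name"]  # filter rows, pick one column
avg_age      <- mean(friends$age)                       # a column by name with $

print(older)
print(female_names)
print(avg_age)
```

The comma inside the square brackets separates the row selection from the column selection, as in the positional examples above.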

3.7 File R7(Conditional and Control Flows)


# if statement
x <- 10
if (x > 5) {
print("x is greater than 5")
}
# if-else statement
y <- 15
if (y > 15) {
print("y is greater than 15")
} else {
print("y is less than or equal to 15")
}
# if-else if-else statement
z <- 3
if (z > 10) {
print("z is greater than 10")
} else if (z > 5) {


print("z is greater than 5")


} else {
print("z is less than or equal to 5")
}
#Break
for (i in 1:10) {
if (i == 5) {
break
}
print(i)
}
#next
for (j in 1:10) {
if (j %% 2 == 0) {
next
}
print(j)
}


3.8 File R8(importing from excel)


#importing from excel
library(readxl)
CreditLimit <- read_excel("C:/Users/ADMIN/OneDrive/Desktop/R/R Data/CreditLimit.xlsx")
View(CreditLimit)

#check the first 6 rows of Excel


head(CreditLimit)
#check the last 6 rows of Excel
tail(CreditLimit)
#check the summary of Excel
summary(CreditLimit)
#check the summary of a variable Excel
summary(CreditLimit$Income)
cl<- CreditLimit
cl
#check dimensions of Excel
dim(CreditLimit)
#check the number of variables in Excel
length(CreditLimit)
#check the list of variables in Excel
attributes(CreditLimit)
#check the class of the imported object (a data frame)
class(CreditLimit)
#check the data type of each variable
sapply(CreditLimit, class)

3.9 File R9(text)


#importing a text file
TEXT <- read.delim("C:/Users/ADMIN/OneDrive/Desktop/R/TEXT.txt")
View(TEXT)

Remember:
File Import in R
File               Format                      Package Required
As Text            .txt                        base R (utils)
As CSV             .csv                        readr
As Excel           .xlsx / .xls                readxl
As SPSS, SAS, Stata   .sav / .sas7bdat / .dta     haven

3.10 File R10(Apply Function: A Versatile Tool for Data Manipulation)

The apply family of functions in R provides efficient ways to apply a function to elements of an array
or list. These functions can significantly streamline your data analysis tasks.

Here are the primary functions in the apply family:

3.10.1 apply()
Purpose: Applies a function to the margins of an array.
Syntax: apply(X, MARGIN, FUN, ...)
• X: The array (or matrix) to which the function is applied.
• MARGIN: A vector specifying the margins to apply the function over (1 for rows,
2 for columns, c(1, 2) for both).
• FUN: The function to be applied.
• ...: Additional arguments to be passed to the function.
Example:
# Create a matrix
my_matrix <- matrix(1:9, nrow = 3, ncol = 3)
# Calculate the sum of each row
row_sums <- apply(my_matrix, 1, sum)
print(row_sums)

3.10.2 lapply()
Purpose: Applies a function to each element of a list.
Syntax: lapply(X, FUN, ...)
• X: The list to which the function is applied.
• FUN: The function to be applied.
• ...: Additional arguments to be passed to the function.
Example:
# Create a list of numbers
my_list <- list(1, 2, 3, 4, 5)
# Square each element
squared_list <- lapply(my_list, function(x) x^2)
print(squared_list)

3.10.3 sapply()
Purpose: Similar to lapply(), but simplifies the output to a vector or matrix.
Syntax: sapply(X, FUN, ..., simplify = TRUE)

3.10.4 vapply()
Purpose: Like sapply(), but allows you to specify the type of the output.
Syntax: vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
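
sapply() and vapply() have no example above, so here is a minimal sketch that mirrors the lapply() example. The names `nums`, `squares_s` and `squares_v` are illustrative. With vapply(), FUN.VALUE = numeric(1) states that each call must return exactly one number, and an error is raised otherwise.

```r
nums <- list(1, 2, 3, 4, 5)

# sapply() simplifies the result to a numeric vector where possible
squares_s <- sapply(nums, function(x) x^2)

# vapply() does the same but checks each result against FUN.VALUE
squares_v <- vapply(nums, function(x) x^2, FUN.VALUE = numeric(1))

print(squares_s)
print(squares_v)
```

vapply() is generally the safer choice inside functions and scripts, because the type of its output does not depend on the input.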
3.10.5 tapply()
Purpose: Applies a function to subsets of a vector, splitting the vector based on factors.
Syntax: tapply(X, INDEX, FUN, ..., simplify = TRUE)
Example:
# Create a vector and a factor
x <- 1:10
f <- factor(rep(c("A", "B"), 5))
# Calculate the mean of x for each level of f
means <- tapply(x, f, mean)
print(means)


4 Data Visualisation using R


4.1 R11(Histograms)
• Create a script file by going to File>New File>R Script
• Import Dataset in the File Menu and then choose “From Text(base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
hist(iris$Sepal.Length)
hist(iris$Sepal.Length, col="steelblue")
hist(iris$Sepal.Width, col="red")
hist(iris$Sepal.Width, col="yellow")
#Specify Heading as Main, Label of x and y axis as xlab & ylab, col as the colour
hist(iris$Petal.Length,
main='Histogram',
xlab='Length',
ylab='Frequency',
col='red')

4.2 R12(Boxplot)
• Create a script file by going to File>New File>R Script
• Import Dataset in the File Menu and then choose “From Text(base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
boxplot(iris$Sepal.Length)
#Specify Heading as Main, Label of x and y axis as xlab & ylab, col as the colour of boxplot and border as a border of boxplot
boxplot(iris$Petal.Length,
main='Petal Length',
xlab='Species',
ylab='Petal Length',
col='pink',
border = 'red')
OR
boxplot(iris$Petal.Length,main='Petal Length', xlab='Species', ylab='Petal Length', col='pink', border
= 'red')
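
The boxplots above draw a single box for all flowers, even though the x axis is labelled 'Species'. As a sketch of one way to get a separate box per species, the formula interface of boxplot() can be used; this is an alternative to the calls above, not a replacement taken from the original, and the name `bp` is illustrative.

```r
data("iris")

# One box per species: Petal.Length split by the Species factor
bp <- boxplot(Petal.Length ~ Species, data = iris,
              main = 'Petal Length by Species',
              xlab = 'Species',
              ylab = 'Petal Length',
              col = 'pink',
              border = 'red')

# boxplot() invisibly returns the statistics it used, including group names
print(bp$names)
```

The formula y ~ group reads as "y split by group" and works with any factor column in the data frame.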

4.3 R13(Line plot)


4.3.1 Line Plot with single series of data
#Plot a vector
l1<- c(7,12,28,3,41)


plot(l1)
#Plot a vector using points
plot(l1,type = 'p')
#Plot a vector using lines
plot(l1,type = 'l')
#Plot a vector using both points and lines
plot(l1,type = 'o')
#Plot a vector using both points and lines with colour
l2<- c(1,2,8,13,40)
plot(l2,type = 'o', col='red')
#Plot a vector using both points and lines with colour, heading, label of the x & y axis
l3<- c(5,2,11,7,20,15,22,17,25)
plot(l3,type = 'o', col='green', main='Line Plot', xlab='points', ylab='Frequency')

4.3.2 Line Plot with multiple series of data


#variable 't' represent time
t<-0:10
#variable 'z' showing quantity that is decreasing in time
z<-exp(-t/2)
#variable 'w' that is increasing with time
w<-0.1*exp(t/3)
#plot t and z
plot(t,z,type ='l')
#plot t and z with colour red, line width 3, label of x & y axes time and concentration
plot(t,z,type ='l', col='red', lwd=3, xlab='Time', ylab='Concentration')
#plot t and w
plot(t,w,type ='o')
#plot t and w with colour green, line width 4, label of x & y axes time and concentration
plot(t,w,type ='o', col='green', lwd=4, xlab='Time', ylab='Concentration')
#add the z series to the current plot so that both lines are shown
lines(t,z,col='red', lwd=3)
#add title 'Exponential Growth and decay'
title("Exponential Growth and decay")

4.4 R14(Scatter Plots)


• Create a script file by going to File>New File>R Script
• Import Dataset in the File Menu and then choose “From Text(base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
#plot scatter diagram
plot(iris$Sepal.Length, iris$Sepal.Width)
#plot scatter diagram in dot shape
plot(iris$Sepal.Length, iris$Sepal.Width, pch=20)
#plot scatter diagram taking pch 0 to 25 (one plot per symbol)
for (p in 0:25) {
  plot(iris$Sepal.Length, iris$Sepal.Width, pch = p)
}
#plot scatter diagram in dot shape, red colour, heading Scatter Plot, Label of x & y axes sepal_length & sepal_width
plot(iris$Sepal.Length, iris$Sepal.Width,col="red", main="Scatter Plot",xlab="sepal_length",
ylab="sepal_width", pch=20)

4.5 R15(Bar Chart)


• Create a script file by going to File>New File>R Script
• Import Dataset in the File Menu and then choose “From Text(base)”
• Select the data file iris.csv
• Click open
OR
data("iris")
View(iris)
# Count the number of observations for each species
species_counts <- table(iris$Species)
species_counts
# Create a bar chart
barplot(species_counts,
main = "Distribution of Iris Species",
xlab = "Species",
ylab = "Count",
col = c("blue", "green", "pink"))


5 Descriptive Statistics Using R


5.1 Summary (Descriptive Statistics 1)
#Summary of a data sheet
data("iris")
View(iris)
summary(iris)
For each numeric column, summary() reports:
• Mean
• Minimum
• Median
• Quartiles (1st and 3rd)
• Maximum

5.2 Measure of Central Tendency (Descriptive Statistics 2)


5.2.1 Arithmetic Mean, Median and Mode
#create a vector
x<-c(3,7,5,13,20,23,39,23,40,23,14,12,56,23)
#calculate mean of x
mean(x)
#calculate median of x
median(x)
#create a frequency table
t1<-table(x)
t1
#calculate mode of x
mode_x<-names(t1)[which(t1==max(t1))]
mode_x
#create a vector
y<-c(3,7,5,13,20,20,39,23,40,23,14,12,56,23,20)
#calculate mean of y
mean(y)
#calculate median of y
median(y)
#create a frequency table
t2<-table(y)
t2
#calculate mode of y
mode_y<-names(t2)[which(t2==max(t2))]
mode_y
#create a vector
z<-c(3,7,5,13,20,20,39,23,25,25,40,23,14,12,56,23,20,25)
#calculate mean of z
mean(z)
#calculate median of z
median(z)
#create a frequency table
t3<-table(z)
t3
#calculate mode of z
mode_z<-names(t3)[which(t3==max(t3))]


mode_z

5.2.2 Mode using function from the data frame


# create a data frame
df <- data.frame(x = c(1, 4, 4, 5, 6, 7, 10, 12),
y = c(2, 2, 3, 3, 4, 5, 11, 11),
z = c(8, 9, 9, 9, 10, 13, 15, 17))
# Define the mode function (if several values tie, the first one encountered is returned)
find_mode <- function(x) {
unique_values <- unique(x)
counts <- tabulate(match(x, unique_values))
unique_values[which.max(counts)]
}
# Apply the find_mode function to each column of the data frame
modes <- apply(df, 2, find_mode)
print(modes)

5.3 Measure of Dispersion(Descriptive Statistics 3)


5.3.1 Range
#Range: The difference between the maximum and minimum values.
data <- c(10, 20, 30, 40, 50)
# range() returns the minimum and the maximum as a vector of length 2
range(data)
# Calculate the range as maximum minus minimum
range_value <- range(data)[2] - range(data)[1]
range_value
5.3.2 Variance
#Variance: The average squared deviation from the mean (var() divides by n - 1, giving the sample variance).
var(data)
5.3.3 Standard Deviation
#Standard Deviation: The square root of the variance, providing a measure of dispersion in the same units as the original data.
sd(data)
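
To see what var() and sd() compute, the sketch below reproduces them by hand for the same data. The names `n`, `deviations`, `sample_var`, `pop_var` and `sample_sd` are illustrative. The point to notice is that R divides by n - 1 (sample variance), not by n.

```r
data <- c(10, 20, 30, 40, 50)
n <- length(data)

# Squared deviations from the mean
deviations <- (data - mean(data))^2

sample_var <- sum(deviations) / (n - 1)  # what var() returns
pop_var    <- sum(deviations) / n        # population variance, for comparison
sample_sd  <- sqrt(sample_var)           # what sd() returns

print(c(sample_var, var(data)))
print(c(sample_sd, sd(data)))
print(pop_var)
```

For these five values the sample variance is 250 and the population variance is 200, so the choice of divisor matters for small data sets.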
5.4 Datasheet (Descriptive Statistics 4)
# Load the iris dataset
data(iris)
# Calculate the range of Sepal Length
range(iris$Sepal.Length)
# Calculate the variance of Petal Width
var(iris$Petal.Width)
# Calculate the standard deviation of Sepal Width
sd(iris$Sepal.Width)
# Calculate the IQR of the Petal Length
IQR(iris$Petal.Length)

5.4.1 Tribbles in R: A Concise Data Frame Creation


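A tribble is a concise way to create a data frame row by row in R, and is especially useful for small
data sets. The tribble() function comes from the tibble package, which is part of the tidyverse.
STEP 1: Load the ‘dplyr’ package in R, for example with library(dplyr).
STEP 2: In RStudio, the package can instead be loaded by ticking it in the Packages pane.
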
tribble(~X,~Y,"v",15,"w",5,"x",25,"y",20)
tribble(~A,~B,"m",11:14,"n",2:6,"o",21:25,"p",51:56)

5.5 Student data case (Descriptive Statistics5)


• Create a script file by going to File>New File>R Script
• Import Dataset in the File Menu and then choose “From Excel”
• Select the data file Student_data
• Click open
#check the first 6 rows of the file
head(Student_data)
#check the last 6 rows of the file
tail(Student_data)
#check the dimensions
dim(Student_data)
#check the number of columns
length(Student_data)
#check the variables
attributes(Student_data)
#check the class of variables
class(Student_data$`Roll No.`)
class(Student_data$Gender)
class(Student_data$`Income Group`)
class(Student_data$Marks)
#check the summary of the file
summary(Student_data)
#check the summary of marks only
summary(Student_data$Marks)
BA<-Student_data$Marks
summary(BA)
#convert Income Group into categorical
IG<-as.factor(Student_data$`Income Group`)
summary(IG)
#convert Gender into categorical
gender<-as.factor(Student_data$Gender)
summary(gender)

6 Relationship between two variables


6.1 Covariance (Descriptive Statistics6)
x<-c(1,3,5,10)
y<-c(2,4,6,20)
cov(x,y)
cov(x,y, method = "pearson")
cov(x,y, method = "kendall")
cov(x,y, method = "spearman")

6.2 Correlation (Descriptive Statistics7)


x<-c(1,3,5,10)
y<-c(2,4,6,20)


cor(x,y)
cor(x,y, method = "pearson")
cor(x,y, method = "kendall")
cor(x,y, method = "spearman")
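
Covariance and the default (Pearson) correlation are directly related: the correlation is the covariance divided by the product of the two standard deviations. A short sketch with the same data follows; `r_manual` and `r_builtin` are illustrative names.

```r
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)

# Pearson correlation computed from the covariance
r_manual  <- cov(x, y) / (sd(x) * sd(y))
r_builtin <- cor(x, y)

print(c(r_manual, r_builtin))
```

Because of this scaling, the correlation always lies between -1 and 1, whereas the covariance depends on the units of x and y.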

6.3 Coefficient of Determination (Descriptive Statistics8)


x<-c(1,3,5,10)
y<-c(2,4,6,20)
r_squared<-cor(x,y)^2
r_squared
OR
# Sample data
x <- c(1, 3, 5, 10)
y <- c(2, 4, 6, 20)
# Create a linear model
model <- lm(y ~ x)
# Extract R-squared
r_squared <- summary(model)$r.squared
# Print R-squared
print(r_squared)

##find covariance and correlation


u<-c(1,2,3,4,5,6,7,8,9,10)
v<-c(10,9,8,7,6,5,4,3,2,1)
w<-c(2,4,6,8,10,12,14,16,18,20)
cov(u,v)
cor(u,v)
cov(u,w)
cor(u,w)


7 Citation
citation()
To cite R in publications, use:

R Core Team (2024). _R: A Language and Environment for Statistical
Computing_. R Foundation for Statistical Computing, Vienna,
Austria. <https://www.R-project.org/>.

A BibTeX entry for LaTeX users is

@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2024},
url = {https://www.R-project.org/},
}

See also ‘citation("pkgname")’ for citing R packages.


8 Regression using R
8.1 Simple Regression using R
#Perfect Regression Model_1
x1<-c(1,2,3,4,5)
x2<-c(3,5,7,9,11)
reg1<-lm(x2~x1)
summary(reg1)
plot(reg1)
#Perfect Regression Model_2
x3<-c(6,7,8,9,10)
x4<-c(-4,-5,-6,-7,-8)
reg2<-lm(x4~x3)
summary(reg2)
plot(reg2)
#imperfect Regression Model
x5<-c(11,12,13,14,15)
x6<-c(8,11,-5,22,0)
reg3<-lm(x6~x5)
summary(reg3)
plot(reg3)
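
The fitted intercept and slope can also be pulled out of a model object directly, instead of reading them from the summary() printout. A minimal sketch using the first model above follows; `coefs` and `pred` are illustrative names.

```r
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(3, 5, 7, 9, 11)
reg1 <- lm(x2 ~ x1)

# Named vector holding the intercept and the slope
coefs <- coef(reg1)
print(coefs)

# Predict x2 for a new value of x1
pred <- predict(reg1, newdata = data.frame(x1 = 6))
print(pred)
```

Since x2 is exactly 2 * x1 + 1 here, the intercept is 1, the slope is 2, and the prediction for x1 = 6 is 13.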

#Simple Regression using Excel in R


• Create a script file by going to File>New File>R Script
• Import Dataset in the File Menu and then choose “From Excel”
• Select the data file income_data
• Click open
install.packages("lmtest") # needed only once; lm() itself is in the built-in stats package
income.happiness.lm<-lm(happiness~income,data=income_data)
summary(income.happiness.lm)
plot(income.happiness.lm)

8.2 Multiple Regression using R


8.2.1 Case1
# Load the mtcars data set
data(mtcars)
View(mtcars)
# Multiple Regression Model
model <- lm(mpg ~ wt + hp, data = mtcars)
# Summary of the model
summary(model)
plot(model)
# Plot the fitted values against the actual values
plot(mtcars$mpg, fitted(model))
abline(a = 0, b = 1, col = "red")
8.2.2 Case2
# Multiple Regression using Sample data
house_data <- data.frame(
price = c(200, 250, 300, 350, 400,410,420,450,500),
sqft = c(1000, 1200, 1500, 1800, 2000,2100,2200,2300,2500),


floor = c(12, 9, 8,7,6,4,3,2,1)


)
# Create a multiple regression model
model <- lm(price ~ sqft + floor, data = house_data)

# Summarise the model


summary(model)

#Making Predictions
new_data <- data.frame(sqft = 1600, floor = 7)
predicted_price <- predict(model, newdata = new_data)
print(predicted_price)


Getting started with R:


Introduction to R, Advantages of R, Installation of R Packages, Importing data from
spreadsheet files, Commands and Syntax, Packages and Libraries.
Data Structures in R:
Vectors, Matrices, Arrays, Lists, Factors, Data Frames, Conditionals and Control Flows,
Loops, Functions, and Apply family.
Descriptive Statistics Using R:
Importing Data file; Data visualisation using charts: histograms, bar charts, box plots, line
graphs, scatter plots, etc.
Data description: Measure of Central Tendency, Measure of Dispersion.
Relationship between variables: Covariance, Correlation and Coefficient of Determination.
