R Script
Dr. Arun Julka
Table of Contents
1 R and RStudio
1.1 R vs RStudio
1.2 Introduction to R
1.2.1 How can you use R?
1.2.2 Getting Started with R
1.3 Advantages of R
2 Installation of R Packages
2.1 R Packages
2.2 File Import in R
3 Essentials of the R Language
3.1 File R1(calculation)
3.2 File R2(vectors)
3.3 File R3(matrices)
3.4 File R4(Arrays and Lists)
3.4.1 Arrays
3.4.2 Lists
3.5 File R5(Loops)
3.6 File R6(Factors and Data Frame)
3.6.1 Factors
3.6.2 Data Frame
3.7 File R7(Conditional and Control Flows)
3.8 File R8(importing from excel)
3.9 File R9(text)
3.10 File R10(Apply Function: A Versatile Tool for Data Manipulation)
3.10.1 apply()
3.10.2 lapply()
3.10.3 sapply()
3.10.4 vapply()
3.10.5 tapply()
4 Data Visualisation using R
4.1 R11(Histograms)
4.2 R12(Boxplot)
4.3 R13(Line plot)
4.3.1 Line Plot with single series of data
4.3.2 Line Plot with multiple series of data
1 R and RStudio
R's open-source nature, extensive statistical capabilities, powerful data visualisation tools, and a vibrant
community make it a compelling choice for data scientists, statisticians, and researchers.
• R is a programming language for statistical computing and graphics.
• R is the successor to the ‘S’ language.
• The name ‘R’ is derived from the first letter of its developers’ first names,
viz., Robert Gentleman and Ross Ihaka.
• R provides many graphical and statistical tools, such as linear and nonlinear modelling,
classification, classical statistical tests, and clustering.
• It runs on various UNIX platforms and other systems, including Windows and macOS.
• R offers integrated software facilities, including operators for calculations, intermediate tools
for data manipulation, data visualisation, and data storage and handling facilities.
• R is a fully developed language with loops, conditionals, and input/output
functions. For this reason, it is popularly known as the ‘R Environment’.
RStudio
• RStudio is an integrated development environment (IDE).
• It is particularly designed to work with the R programming language.
• RStudio can be broadly divided into 4 panes:
[Figure: the RStudio panes, including the Console and Plots panes]
1.1 R vs RStudio
Basis | R | RStudio
Elaborative process | R is the core engine for performing data analysis and computations. However, it is less elaborative than RStudio. | RStudio is more elaborative in nature, as it provides a more user-friendly environment for working with R.
1.2 Introduction to R
Imagine R as a powerful Swiss Army knife for data analysis. It's a programming language and software
environment designed specifically for statistical computing and data visualisation. Think of it as a tool
that allows you to explore, manipulate, and extract insights from data, no matter how complex or messy
it might be.
3. Learn the Basics: Start with basic R syntax, data structures (vectors, matrices, data
frames), and fundamental statistical functions.
4. Explore Packages: Discover and install packages that cater to your specific needs, such as
tidyverse for data manipulation and visualisation, caret for machine learning, and ggplot2
for advanced plotting.
1.3 Advantages of R
R has become a cornerstone for data analysis and statistical computing due to its numerous advantages:
1. Open-Source and Free:
➢ Cost-Effective: No licensing fees, making it accessible to everyone.
➢ Community-Driven: A large and active community contributes to its development and
provides extensive support.
2. Comprehensive Statistical Capabilities:
➢ Statistical Tests: R offers a wide range of statistical tests for hypothesis testing, regression
analysis, and more.
➢ Machine Learning: Powerful machine-learning algorithms for classification, regression,
clustering, and predictive modelling.
3. Data Visualisation:
➢ High-Quality Graphics: Create stunning visualisations with packages like ggplot2, lattice,
and plotly.
➢ Customisation: Tailor plots to specific needs, including interactive visualisations.
4. Flexibility and Extensibility:
➢ Package Ecosystem: A vast collection of packages (over 18,000) for various statistical and
data analysis tasks.
➢ Custom Function Creation: Develop custom functions to tailor the analysis to specific
requirements.
5. Reproducible Research:
➢ Version Control: Track changes and ensure reproducibility.
➢ R Markdown: Create dynamic documents combining code, output, and narrative text.
6. Platform Independence:
➢ Cross-Platform Compatibility: Runs on Windows, macOS, and Linux.
7. Strong Community Support:
➢ Active Forums and Communities: Online resources for help and collaboration.
➢ Tutorials and Documentation: Extensive documentation and tutorials available.
8. Integration with Other Tools:
➢ Interoperability: Seamlessly integrates with other tools like Python, SQL, and Hadoop.
9. Data Wrangling and Manipulation:
➢ Powerful Data Manipulation: Efficiently clean, transform, and reshape data with
packages like dplyr and tidyr.
10. Big Data Analysis:
➢ Scalability: Handles large datasets with packages like sparklyr and bigr.
2 Installation of R Packages
2.1 R Packages
➢ base
➢ readxl
➢ readr
➢ dplyr
➢ tidyr
➢ tibble
➢ tidyverse
➢ ggplot2
➢ lmtest
➢ graphics
➢ stats
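Of the packages listed above, base, graphics and stats ship with R, while the others are add-on packages. The following is a minimal sketch of how one might check which add-on packages are already installed before installing the rest. The variable names (wanted, is_installed, not_installed) are illustrative, and the install.packages() call is left commented out because it downloads from CRAN.

```r
# Add-on packages from the list above (base, graphics and stats ship with R)
wanted <- c("readxl", "readr", "dplyr", "tidyr", "tibble",
            "tidyverse", "ggplot2", "lmtest")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
not_installed <- wanted[!is_installed]

if (length(not_installed) > 0) {
  message("Not installed yet: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)   # uncomment to install from CRAN
}

# Packages that ship with R can be attached directly
library(stats)
```

Once a package is installed, library(package_name) attaches it for the current session.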
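# Arithmetic functions in R
abs(-15)        # absolute value: 15
exp(1)          # Euler's number e, about 2.718282
log(exp(1))     # natural logarithm of e: 1
log(10)         # natural logarithm of 10, about 2.302585
log10(10)       # base-10 logarithm of 10: 1
log10(exp(1))   # base-10 logarithm of e, about 0.4342945
log(16, 4)      # logarithm of 16 to base 4: 2
# Create a variable
x <- -100
x + 70          # -30
abs(x)          # 100
u <- 19
v <- 11
u + v           # 30
sum(u, v)       # 30, same result as u + v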
Remember:
  # Inside a loop body: increment the counter, then exit once it exceeds 15
  i <- i + 1
  if (i > 15) {
    break
  }
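The fragment above increments i and leaves the loop once i exceeds 15. A minimal sketch of a complete loop around it, assuming a repeat loop and a starting value of 1 (neither is given in the text), might look like this:

```r
# Assumed starting value; the text does not show how i is initialised
i <- 1

# repeat runs until break is reached
repeat {
  i <- i + 1          # increment the counter
  if (i > 15) {
    break             # leave the loop once i exceeds 15
  }
}

print(i)              # value of i after the loop has ended
```

Because the check comes after the increment, the loop stops the first time i becomes 16.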
Remember:
File Import in R
File | Format | Package Required
As Text | .txt | base
As CSV | .csv | readr
As Excel | .xlsx / .xls | readxl
As SPSS, SAS, Stata | .sav / .sas7bdat / .dta | haven
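As a rough illustration of the table above, the sketch below writes a small CSV to a temporary file and reads it back. The data values and the file paths in the comments are hypothetical. The readr call runs only if that package is installed; read.csv() is the reader that ships with R.

```r
# Write a small CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(7.5, 8, 6.5)),
          csv_path, row.names = FALSE)

# Reader that ships with R (no extra package needed)
scores_base <- read.csv(csv_path)
print(scores_base)

# readr reader, used only if the package is installed
if (requireNamespace("readr", quietly = TRUE)) {
  scores_readr <- readr::read_csv(csv_path)
  print(scores_readr)
}

# Readers for the other formats in the table (hypothetical paths):
# readxl::read_excel("data.xlsx")
# haven::read_sav("data.sav")
# haven::read_sas("data.sas7bdat")
# haven::read_dta("data.dta")
```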
The apply family of functions in R provides efficient ways to apply a function to elements of an array
or list. These functions can significantly streamline your data analysis tasks.
3.10.1 apply()
Purpose: Applies a function to the margins of an array.
Syntax: apply(X, MARGIN, FUN, ...)
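Example (a minimal sketch; the matrix m and the result names are illustrative): MARGIN = 1 applies the function to each row and MARGIN = 2 to each column.

```r
# Create a 2 x 3 matrix, filled column by column
m <- matrix(1:6, nrow = 2)
print(m)

# MARGIN = 1: apply sum() to each row
row_sums <- apply(m, 1, sum)
print(row_sums)    # 9 12

# MARGIN = 2: apply mean() to each column
col_means <- apply(m, 2, mean)
print(col_means)   # 1.5 3.5 5.5
```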
3.10.2 lapply()
Purpose: Applies a function to each element of a list.
Syntax: lapply(X, FUN, ...)
• X: The list to which the function is applied.
• FUN: The function to be applied.
• ...: Additional arguments to be passed to the function.
Example:
# Create a list of numbers
my_list <- list(1, 2, 3, 4, 5)
# Square each element
squared_list <- lapply(my_list, function(x) x^2)
print(squared_list)
3.10.3 sapply()
Purpose: Similar to lapply(), but simplifies the output to a vector or matrix.
Syntax: sapply(X, FUN, ..., simplify = TRUE)
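Example (a minimal sketch with illustrative names): when every result has length one, sapply() returns a vector; when every result has the same length greater than one, it returns a matrix with one column per input element.

```r
# Each result is a single number, so the output is a numeric vector
squares <- sapply(1:5, function(x) x^2)
print(squares)       # 1 4 9 16 25

# Each result has two named elements, so the output is a 2 x 3 matrix
power_table <- sapply(1:3, function(x) c(value = x, square = x^2))
print(power_table)
```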
3.10.4 vapply()
Purpose: Like sapply(), but requires you to specify, via FUN.VALUE, the type and length of each result, and raises an error if a result does not match.
Syntax: vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
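Example (a minimal sketch; number_sets and the other names are illustrative): FUN.VALUE = numeric(1) declares that each call must return exactly one number. The second call deliberately declares the wrong type to show that vapply() stops with an error rather than returning something unexpected.

```r
# A named list of numeric vectors
number_sets <- list(a = 1:3, b = 4:6, c = c(10, 20))

# Each mean is a single number, which matches numeric(1)
set_means <- vapply(number_sets, mean, FUN.VALUE = numeric(1))
print(set_means)

# Declaring character(1) does not match the numeric results, so vapply() errors;
# tryCatch() turns that error into the string "error" for illustration
mismatch <- tryCatch(
  vapply(number_sets, mean, FUN.VALUE = character(1)),
  error = function(e) "error"
)
print(mismatch)
```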
3.10.5 tapply()
Purpose: Applies a function to subsets of a vector, splitting the vector based on factors.
Syntax: tapply(X, INDEX, FUN, ..., simplify = TRUE)
Example:
# Create a vector and a factor
x <- 1:10
f <- factor(rep(c("A", "B"), 5))
# Calculate the mean of x for each level of f
means <- tapply(x, f, mean)
print(means)
4.2 R12(Boxplot)
• Create a script file via File > New File > R Script.
• Go to File > Import Dataset and choose “From Text (base)”.
• Select the data file iris.csv.
• Click Open.
OR
data("iris")
View(iris)
boxplot(iris$Sepal.Length)
# Specify the heading as main, the labels of the x and y axes as xlab and ylab,
# col as the colour of the boxplot and border as the border of the boxplot
boxplot(iris$Petal.Length,
        main = 'Petal Length',
        xlab = 'Species',
        ylab = 'Petal Length',
        col = 'pink',
        border = 'red')
OR
boxplot(iris$Petal.Length, main = 'Petal Length', xlab = 'Species', ylab = 'Petal Length', col = 'pink', border = 'red')
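The calls above draw a single box for all flowers. As a possible extension (a sketch, not part of the original script), the formula interface of boxplot() draws one box per species, so that the x-axis label 'Species' refers to actual groups. The name bp is illustrative.

```r
# Load the built-in iris data set
data("iris")

# One box of Petal.Length for each level of Species
bp <- boxplot(Petal.Length ~ Species, data = iris,
              main = 'Petal Length by Species',
              xlab = 'Species',
              ylab = 'Petal Length',
              col = 'pink',
              border = 'red')

# boxplot() invisibly returns the plotted statistics; names holds the group labels
print(bp$names)
```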
# Illustrative values for l1 (assumed)
l1 <- c(3, 6, 4, 9, 7)
plot(l1)
# Plot a vector using points
plot(l1, type = 'p')
# Plot a vector using lines
plot(l1, type = 'l')
# Plot a vector using both points and lines
plot(l1, type = 'o')
# Plot a vector using both points and lines with colour
l2 <- c(1, 2, 8, 13, 40)
plot(l2, type = 'o', col = 'red')
# Plot a vector using both points and lines with colour, heading, and labels of the x and y axes
l3 <- c(5, 2, 11, 7, 20, 15, 22, 17, 25)
plot(l3, type = 'o', col = 'green', main = 'Line Plot', xlab = 'points', ylab = 'Frequency')
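For a line plot with more than one series, one common approach is to draw the first series with plot() and add further series with lines(). The following is a minimal sketch under that assumption; the two series and their values are illustrative and do not come from the original script.

```r
# Two illustrative series of equal length (values are assumptions)
series_a <- c(5, 2, 11, 7, 20)
series_b <- c(3, 6, 9, 12, 15)

# Draw the first series, with a y-range wide enough for both series
plot(series_a, type = 'o', col = 'green',
     ylim = range(c(series_a, series_b)),
     main = 'Line Plot', xlab = 'points', ylab = 'Frequency')

# Add the second series to the existing plot
lines(series_b, type = 'o', col = 'red')

# Label the two series
legend('topleft', legend = c('series_a', 'series_b'),
       col = c('green', 'red'), lty = 1, pch = 1)
```

Setting ylim from both series keeps the second line from being cut off when its values fall outside the range of the first.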
plot(iris$Sepal.Length, iris$Sepal.Width,pch=2)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=3)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=4)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=5)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=6)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=7)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=8)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=9)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=10)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=11)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=12)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=13)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=14)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=15)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=16)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=17)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=18)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=19)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=20)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=21)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=22)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=23)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=24)
plot(iris$Sepal.Length, iris$Sepal.Width,pch=25)
# Plot a scatter diagram with dot-shaped points in red, heading "Scatter Plot",
# and x- and y-axis labels sepal_length and sepal_width
plot(iris$Sepal.Length, iris$Sepal.Width, col = "red", main = "Scatter Plot",
     xlab = "sepal_length", ylab = "sepal_width", pch = 20)
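The pch argument used above selects the plotting symbol. Rather than redrawing the scatter plot once per value, the symbols can be compared on a single chart. A minimal sketch follows; the grid layout and the names pch_values, grid_x and grid_y are illustrative choices.

```r
# Symbol codes 0 to 25 are the standard numbered plotting symbols in base R
pch_values <- 0:25

# Arrange the symbols on a grid with 7 columns
grid_x <- pch_values %% 7
grid_y <- pch_values %/% 7

# Draw each symbol at its grid position (reversed ylim puts 0 at the top)
plot(grid_x, grid_y, pch = pch_values, cex = 2,
     xlim = c(-0.5, 6.5), ylim = c(3.5, -0.5),
     axes = FALSE, xlab = "", ylab = "",
     main = "Plotting symbols: pch = 0 to 25")

# Write the pch number below each symbol
text(grid_x, grid_y, labels = pch_values, pos = 1)
```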
mode_z
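library(tibble)  # tribble() builds a tibble row by row
tribble(~X, ~Y, "v", 15, "w", 5, "x", 25, "y", 20)
# Vector-valued cells make B a list column
tribble(~A, ~B, "m", 11:14, "n", 2:6, "o", 21:25, "p", 51:56)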
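cor(x, y)                       # Pearson correlation (the default method)
cor(x, y, method = "pearson")   # same result as the default
cor(x, y, method = "kendall")   # Kendall's tau (rank-based)
cor(x, y, method = "spearman")  # Spearman's rho (rank-based)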
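To see how the three methods can differ, the sketch below uses illustrative data (assumed, not from the original script) in which y increases with x but not linearly. The rank-based measures equal 1 because the ordering is perfectly preserved, while the Pearson coefficient is slightly below 1.

```r
# Illustrative data: y rises with x, but along a curve rather than a line
x <- 1:5
y <- x^2

r_pearson  <- cor(x, y, method = "pearson")
r_kendall  <- cor(x, y, method = "kendall")
r_spearman <- cor(x, y, method = "spearman")

print(r_pearson)    # about 0.981: strong but not perfectly linear
print(r_kendall)    # 1: every pair of points is ordered the same way
print(r_spearman)   # 1: the ranks of x and y agree exactly
```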
7 Citation
citation()
To cite R in publications, use:
@Manual{,
title = {R: A Language and Environment for Statistical Computing},
author = {{R Core Team}},
organization = {R Foundation for Statistical Computing},
address = {Vienna, Austria},
year = {2024},
url = {https://www.R-project.org/},
}
8 Regression using R
8.1 Simple Regression using R
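# Perfect regression model 1: x2 = 1 + 2 * x1 exactly
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(3, 5, 7, 9, 11)
reg1 <- lm(x2 ~ x1)
summary(reg1)   # coefficients and fit statistics
plot(reg1)      # diagnostic plots
# Near-perfect regression model 2 (negative slope; the last value departs from the exact line)
x3 <- c(6, 7, 8, 9, 10)
x4 <- c(-4, -5, -6, -7, -9)
reg2 <- lm(x4 ~ x3)
summary(reg2)
plot(reg2)
# Imperfect regression model
x5 <- c(11, 12, 13, 14, 15)
x6 <- c(8, 11, -5, 22, 0)
reg3 <- lm(x6 ~ x5)
summary(reg3)
plot(reg3)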
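As a small follow-up sketch, the fitted intercept and slope can be read with coef(). For the first model the data satisfy x2 = 1 + 2 * x1 exactly, so the fitted line reproduces x2. The names coefs and fitted_line are illustrative.

```r
# Data from the first model: x2 is an exact linear function of x1
x1 <- c(1, 2, 3, 4, 5)
x2 <- c(3, 5, 7, 9, 11)
reg1 <- lm(x2 ~ x1)

# Extract the estimated intercept and slope
coefs <- coef(reg1)
print(coefs)   # intercept about 1, slope about 2

# Rebuild the fitted values from the coefficients
fitted_line <- coefs[["(Intercept)"]] + coefs[["x1"]] * x1
print(all.equal(fitted_line, x2))
```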
#Making Predictions
new_data <- data.frame(sqft = 1600, floor = 7)
predicted_price <- predict(model, newdata = new_data)
print(predicted_price)
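The prediction step above needs a fitted model with predictors named sqft and floor. A self-contained sketch is given below, assuming a hypothetical data frame houses whose price is constructed as an exact linear function of sqft and floor, so that the prediction can be checked by hand. The data, the column price and the coefficients 50, 0.1 and 2 are all assumptions made for illustration.

```r
# Hypothetical training data (values are assumptions)
houses <- data.frame(
  sqft  = c(1000, 1200, 1500, 1800, 2000, 2200),
  floor = c(1, 3, 2, 5, 4, 8)
)
# Price built as an exact linear function so the result is easy to verify
houses$price <- 50 + 0.1 * houses$sqft + 2 * houses$floor

# Fit a multiple regression of price on sqft and floor
model <- lm(price ~ sqft + floor, data = houses)

# Predict the price of a new house; column names must match the predictors
new_data <- data.frame(sqft = 1600, floor = 7)
predicted_price <- predict(model, newdata = new_data)
print(predicted_price)   # about 224 = 50 + 0.1 * 1600 + 2 * 7
```

With real data the fit is not exact, and predict() also accepts interval = "prediction" to return a range around the point estimate.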