R Programming Lab Manual
1. Implement data frames in R. Write a program to join columns and rows in a data frame
using cbind() and rbind() in R.
Aim:
To implement data frames in R and to join columns and rows using cbind() and rbind().
Algorithm:
Step1: Create the vector objects using c()
Step2: Use cbind() to combine the vectors column-wise (with plain vectors as input, cbind() returns a character matrix)
Step3: Print the result using print()
Step4: Create a second data frame with the same columns using data.frame()
Step5: Print a heading using cat()
Step6: Print the second data frame
Step7: Use rbind() to combine the rows of both objects into one data frame
Step8: Print the result
Program:
#Creating vector objects
Name<-c("Tejas","Gughan","Tharani","Sruthi")
Address<-c("Virginia","Singapore","Australia","India")
Marks<-c(550,560,565,570)
# Combining the vectors column-wise (cbind() on vectors returns a character matrix)
info<-cbind(Name,Address,Marks)
# Printing the matrix
print(info)
# Creating another data frame with similar columns
new.stuinfo<-data.frame(Name=c("Adhithya","Parnika"),Address=c("Tamilnadu","Chantily"),
Marks=c("578","580"),stringsAsFactors=FALSE)
#Printing a header.
cat("# # # The Second data frame\n")
#Printing the data frame.
print(new.stuinfo)
# Combining rows from both objects into one data frame.
all.info<-rbind(info,new.stuinfo)
# Printing a header.
cat("# # # The combined data frame\n")
# Printing the result.
print(all.info)
Output:
> #Creating vector objects
> Name<-c("Tejas","Gughan","Tharani","Sruthi")
> Address<-c("Virginia","Singapore","Australia","India")
> Marks<-c(550,560,565,570)
> # Combining the vectors column-wise (cbind() on vectors returns a character matrix)
> info<-cbind(Name,Address,Marks)
> # Printing the matrix
> print(info)
Name Address Marks
[1,] "Tejas" "Virginia" "550"
[2,] "Gughan" "Singapore" "560"
[3,] "Tharani" "Australia" "565"
[4,] "Sruthi" "India" "570"
> # Creating another data frame with similar columns
> new.stuinfo<-data.frame(Name=c("Adhithya","Parnika"),Address=c("Tamilnadu","Chantily"),
+ Marks=c("578","580"),stringsAsFactors=FALSE)
> #Printing a header.
> cat("# # # The Second data frame\n")
# # # The Second data frame
> #Printing the data frame.
> print(new.stuinfo)
Name Address Marks
1 Adhithya Tamilnadu 578
2 Parnika Chantily 580
> # Combining rows from both objects into one data frame.
> all.info<-rbind(info,new.stuinfo)
> # Printing a header.
> cat("# # # The combined data frame\n")
# # # The combined data frame
> # Printing the result.
> print(all.info)
Name Address Marks
1 Tejas Virginia 550
2 Gughan Singapore 560
3 Tharani Australia 565
4 Sruthi India 570
5 Adhithya Tamilnadu 578
6 Parnika Chantily 580
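With plain vectors as input, cbind() returns a character matrix rather than a data frame (note the [1,] row labels and the quoted "550" in the first printout); rbind() only produces a data frame here because one of its arguments is a data frame. The minimal sketch below, which reuses the vectors from the program, contrasts this with data.frame() and shows cbind() and rbind() applied to data frames. The Passed column is a hypothetical addition for illustration only.

```r
# Same vectors as in the program above
Name    <- c("Tejas", "Gughan", "Tharani", "Sruthi")
Address <- c("Virginia", "Singapore", "Australia", "India")
Marks   <- c(550, 560, 565, 570)

# cbind() on plain vectors builds a matrix, so every value becomes character
m <- cbind(Name, Address, Marks)
print(is.matrix(m))   # TRUE
print(typeof(m))      # "character"

# data.frame() keeps one type per column, so Marks stays numeric
info_df <- data.frame(Name, Address, Marks)
print(sapply(info_df, class))

# cbind() on a data frame appends a column (Passed is a hypothetical column)
info_df <- cbind(info_df, Passed = info_df$Marks >= 560)

# rbind() stacks rows; the column names of both data frames must match
extra <- data.frame(Name = c("Adhithya", "Parnika"),
                    Address = c("Tamilnadu", "Chantily"),
                    Marks = c(578, 580),
                    Passed = c(TRUE, TRUE))
all_df <- rbind(info_df, extra)
print(all_df)
str(all_df)
```

Building the first table with data.frame() keeps Marks numeric, so it can be summed, averaged or compared without converting it back from character.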
Result:
Thus, data frames have been successfully implemented in R.
2. Implement various String Manipulation functions in R.
Aim:
To implement various string manipulation functions in R.
Algorithm:
Step1: Apply R's built-in string manipulation functions, covering concatenation,
splitting, substring extraction, replacement and case conversion. The string
manipulation functions used are:
1. Use paste() to concatenate strings
2. Use strsplit() to split strings
3. Use sub() to replace the first match of a pattern and gsub() to replace every match
4. Use trimws() to trim leading and trailing whitespace
5. Use toupper() and tolower() to convert a string to upper and lower case
6. Use substr() to extract a substring
7. Use nchar() to retrieve the length of a string
8. Use grepl() to check if a pattern exists in a string
9. Use regexpr() to find the position of a substring
10. Use sprintf() to pad a number with leading zeros and to format strings with variables
11. Use paste() with collapse to combine a vector of strings into one
12. Use regexpr() and regmatches() to extract the text matching a regular expression pattern
Program:
# Concatenate two strings
str1 <- "Hello"
str2 <- "World"
result <- paste(str1, str2, sep = " ")
print(result) # "Hello World"
# Split a string by spaces
str <- "This is an example"
result <- strsplit(str, " ")
print(result) # List of words
# Replace a substring
str<-"Hello World"
result<-sub("World","R",str)
print(result) # "Hello R"
# Replace all occurrences of a substring
str <- "Hello World. Welcome to the World."
result <- gsub("World", "R", str)
print(result) # "Hello R. Welcome to the R."
# Trim leading and trailing whitespace
str <- " Hello World "
result <- trimws(str)
print(result) # "Hello World"
#Convert to uppercase
str<-"Hello World"
result<-toupper(str)
print(result) # "HELLO WORLD"
# Convert to lowercase
str <- "Hello World"
result <- tolower(str)
print(result) # "hello world"
# Extract a substring
str <- "Hello World"
result <- substr(str, 1, 5)
print(result) # "Hello"
# Get the length of a string
str <- "Hello World"
result <- nchar(str)
print(result) # 11
# Check if a pattern exists in a string
str <- "Hello World"
pattern <- "World"
result <- grepl(pattern, str)
print(result) # TRUE
# Find the position of a substring
str <- "Hello World"
pattern <- "World"
result <- regexpr(pattern, str)
print(result) # 7
# Pad a string with leading zeros
str <- "42"
result <- sprintf("%05d", as.numeric(str))
print(result) # "00042"
# Combine a list of strings into one
str_list <- c("Hello", "World", "in", "R")
result <- paste(str_list, collapse = " ")
print(result) # "Hello World in R"
# Format strings with variables
name <- "John"
age <- 30
result <- sprintf("My name is %s and I am %d years old.", name, age)
print(result) # "My name is John and I am 30 years old."
# Match a regular expression pattern
str <- "My email is example@domain.com"
pattern<-"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
result<-regexpr(pattern,str)
print(regmatches(str,result)) # "example@domain.com"
Output:
# Concatenate two strings
> str1 <- "Hello"
> str2 <- "World"
> result <- paste(str1, str2, sep = " ")
> print(result) # "Hello World"
[1] "Hello World"
>
> # Split a string by spaces
> str <- "This is an example"
> result <- strsplit(str, " ")
> print(result) # List of words
[[1]]
[1] "This" "is" "an" "example"
>
> # Replace a substring
> str<-"Hello World"
> result<-sub("World","R",str)
> print(result) # "Hello R"
[1] "Hello R"
>
> # Replace all occurrences of a substring
> str <- "Hello World. Welcome to the World."
> result <- gsub("World", "R", str)
> print(result) # "Hello R. Welcome to the R."
[1] "Hello R. Welcome to the R."
>
> # Trim leading and trailing whitespace
> str <- " Hello World "
> result <- trimws(str)
> print(result) # "Hello World"
[1] "Hello World"
>
> #Convert to uppercase
> str<-"Hello World"
> result<-toupper(str)
> print(result) # "HELLO WORLD"
[1] "HELLO WORLD"
>
> # Convert to lowercase
> str <- "Hello World"
> result <- tolower(str)
> print(result) # "hello world"
[1] "hello world"
>
> # Extract a substring
> str <- "Hello World"
> result <- substr(str, 1, 5)
> print(result) # "Hello"
[1] "Hello"
>
> # Get the length of a string
> str <- "Hello World"
> result <- nchar(str)
> print(result) # 11
[1] 11
>
> # Check if a pattern exists in a string
> str <- "Hello World"
> pattern <- "World"
> result <- grepl(pattern, str)
> print(result) # TRUE
[1] TRUE
>
> # Find the position of a substring
> str <- "Hello World"
> pattern <- "World"
> result <- regexpr(pattern, str)
> print(result) # 7
[1] 7
attr(,"match.length")
[1] 5
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
>
> # Pad a string with leading zeros
> str <- "42"
> result <- sprintf("%05d", as.numeric(str))
> print(result) # "00042"
[1] "00042"
>
> # Combine a list of strings into one
> str_list <- c("Hello", "World", "in", "R")
> result <- paste(str_list, collapse = " ")
> print(result) # "Hello World in R"
[1] "Hello World in R"
>
> # Format strings with variables
> name <- "John"
> age <- 30
> result <- sprintf("My name is %s and I am %d years old.", name, age)
> print(result) # "My name is John and I am 30 years old."
[1] "My name is John and I am 30 years old."
>
> # Match a regular expression pattern
> str <- "My email is example@domain.com"
> pattern<-"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
> result<-regexpr(pattern,str)
> print(regmatches(str,result)) # "example@domain.com"
[1] "example@domain.com"
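The patterns passed to sub(), gsub(), grepl() and regexpr() are regular expressions, and regexpr() reports only the first match. The short sketch below (the sample string s and the vector words are illustrative) shows two related details: fixed = TRUE for literal patterns such as ".", and gregexpr() for collecting every match. It also shows that these functions are vectorised.

```r
s <- "Price: 3.50 or 4.75"

# "." is a regex metacharacter (any character); fixed = TRUE treats it literally.
# Without fixed = TRUE, gsub(".", ",", s) would replace every character.
print(gsub(".", ",", s, fixed = TRUE))   # "Price: 3,50 or 4,75"

# regexpr() finds only the first match; gregexpr() finds all of them
first       <- regmatches(s, regexpr("[0-9]+\\.[0-9]+", s))
all_matches <- regmatches(s, gregexpr("[0-9]+\\.[0-9]+", s))[[1]]
print(first)        # "3.50"
print(all_matches)  # "3.50" "4.75"

# Most string functions are vectorised: one call handles a whole vector
words <- c("apple", "Banana", "cherry")
print(toupper(words))           # "APPLE" "BANANA" "CHERRY"
print(nchar(words))             # 5 6 6
print(sprintf("%-7s|", words))  # each word left-justified in a 7-character field
```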
Result:
Thus, string manipulation functions have been successfully implemented in R.
3. Implement the following data structures in R (Vectors, Lists, Data frames)
Aim:
To implement Vectors, Lists and Data frames in R.
Algorithm:
Step1: Open R or RStudio.
Step2: Enter the code into the R console or a script file
1. Create numeric vectors using c(), the : operator and seq()
2. Create character vectors
3. Create logical vectors
4. Create a list and perform list operations
5. Create a data frame
Step3: Carry out the necessary manipulations.
Step4: Execute the code to generate the results.
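Before running the program, it can help to see how R decides a vector's type. The minimal sketch below (variable names are illustrative) uses typeof() to confirm each type and shows R's implicit coercion order, logical, then integer, then double, then character. This order explains why c("TRUE", FALSE, TRUE) produces a character vector rather than a logical one.

```r
int_vec <- 1:5                     # ":" with whole numbers gives integers
dbl_vec <- c(1, 2, 3)              # plain numeric literals are doubles
seq_vec <- seq(3, 7, by = 0.5)     # evenly spaced doubles from 3 to 7
lgl_vec <- c(TRUE, FALSE, TRUE)
mixed   <- c("TRUE", FALSE, TRUE)  # one string forces the whole vector to character

print(typeof(int_vec))  # "integer"
print(typeof(dbl_vec))  # "double"
print(typeof(lgl_vec))  # "logical"
print(typeof(mixed))    # "character"

# Coercion order: logical -> integer -> double -> character
print(c(TRUE, 2L))      # 1 2 (integer)
print(c(1L, 2.5))       # 1.0 2.5 (double)

# as.logical() converts "TRUE"/"true"/"True" (and the FALSE spellings) back
print(as.logical(mixed))              # TRUE FALSE TRUE
print(as.logical(c("true", "True")))  # TRUE TRUE
print(sum(lgl_vec))                   # TRUE counts as 1, so the sum is 2
```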
Program:
numeric_vector<-c(1,2,3,4,5)
print(numeric_vector)
numeric_vector1<-c(1.2,2.3,3.4,4.5,5.6)
print(numeric_vector1)
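numeric_vector2<-c(1:10) # the : operator creates the sequence 1, 2, ..., 10
print(numeric_vector2)
numeric_vector3<-c(1.2:10.2) # steps of 1 starting at 1.2
print(numeric_vector3)
numeric_vector4<-c(1.2:11.0) # stops at 10.2, the last value not exceeding 11.0
print(numeric_vector4)
numeric_vector5<-c(1.2:6.5) # stops at 6.2
print(numeric_vector5)
numeric_vector6<-seq(3,7,by=0.3) # steps of 0.3 from 3 up to at most 7
print(numeric_vector6)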
character_vector<-c("apple","banana","cherry")
print(character_vector)
character_vector1<-c("a","b","c")
print(character_vector1)
character_vector2<-c('a','b','c')
print(character_vector2)
character_vector3<-c('apple','banana','cherry')
print(character_vector3)
logical_vector<-c(TRUE,FALSE,TRUE,TRUE)
print(logical_vector)
logical_vector1<-c(TRUE,FALSE,TRUE,TRUE)
print(logical_vector1)
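logical_vector2<-c("TRUE",FALSE,TRUE,TRUE) # one string coerces every element to character
print(logical_vector2)
logical_vector3<-c("true",FALSE,TRUE,TRUE) # also a character vector, not logical
print(logical_vector3)
logical_vector<-c("True",FALSE,TRUE,TRUE) # overwrites logical_vector; again character
print(logical_vector)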
my_list <-list(name="john",age=25,scores=c(90,85,88),result=TRUE)
print(my_list)
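print(my_list[0]) # zero-length selection: an empty named list
print(my_list[1:2]) # first two elements, still a list
my_list[1]<-"Ryan" # replace the first element (name)
print(my_list)
length(my_list) # number of elements
"john" %in% my_list # FALSE, because the name was replaced
"Ryan" %in% my_list
append(my_list,"joe") # returns a new list; my_list itself is unchanged
my_list1=my_list[-2] # drop the second element (age)
print(my_list1)
my_list2=c(my_list,my_list1) # concatenate two lists
print(my_list2)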
for(i in my_list)
print(i)
typeof(my_list)
print(typeof(my_list))
my_data_frame<-data.frame(
Name=c("john","jane","Doe"),
Age=c(25,30,22),
Scores=c(90,85,88)
)
print(my_data_frame)
summary(my_data_frame)
my_data_frame$Age
Output:
> numeric_vector<-c(1,2,3,4,5)
> print(numeric_vector)
[1] 1 2 3 4 5
>
> numeric_vector1<-c(1.2,2.3,3.4,4.5,5.6)
> print(numeric_vector1)
[1] 1.2 2.3 3.4 4.5 5.6
>
> numeric_vector2<-c(1:10)
> print(numeric_vector2)
[1] 1 2 3 4 5 6 7 8 9 10
>
> numeric_vector3<-c(1.2:10.2)
> print(numeric_vector3)
[1] 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2
>
> numeric_vector4<-c(1.2:11.0)
> print(numeric_vector4)
[1] 1.2 2.2 3.2 4.2 5.2 6.2 7.2 8.2 9.2 10.2
>
> numeric_vector5<-c(1.2:6.5)
> print(numeric_vector5)
[1] 1.2 2.2 3.2 4.2 5.2 6.2
>
> numeric_vector6<-seq(3,7,by=0.3)
> print(numeric_vector6)
[1] 3.0 3.3 3.6 3.9 4.2 4.5 4.8 5.1 5.4 5.7 6.0 6.3 6.6 6.9
>
> character_vector<-c("apple","banana","cherry")
> print(character_vector)
[1] "apple" "banana" "cherry"
>
> character_vector1<-c("a","b","c")
> print(character_vector1)
[1] "a" "b" "c"
>
> character_vector2<-c('a','b','c')
> print(character_vector2)
[1] "a" "b" "c"
>
> character_vector3<-c('apple','banana','cherry')
> print(character_vector3)
[1] "apple" "banana" "cherry"
>
> logical_vector<-c(TRUE,FALSE,TRUE,TRUE)
> print(logical_vector)
[1] TRUE FALSE TRUE TRUE
>
> logical_vector1<-c(TRUE,FALSE,TRUE,TRUE)
> print(logical_vector1)
[1] TRUE FALSE TRUE TRUE
>
> logical_vector2<-c("TRUE",FALSE,TRUE,TRUE)
> print(logical_vector2)
[1] "TRUE" "FALSE" "TRUE" "TRUE"
>
> logical_vector3<-c("true",FALSE,TRUE,TRUE)
> print(logical_vector3)
[1] "true" "FALSE" "TRUE" "TRUE"
>
> logical_vector<-c("True",FALSE,TRUE,TRUE)
> print(logical_vector)
[1] "True" "FALSE" "TRUE" "TRUE"
>
> my_list <-list(name="john",age=25,scores=c(90,85,88),result=TRUE)
> print(my_list)
$name
[1] "john"
$age
[1] 25
$scores
[1] 90 85 88
$result
[1] TRUE
> print(my_list[0])
named list()
> print(my_list[1:2])
$name
[1] "john"
$age
[1] 25
> my_list[1]<-"Ryan"
> print(my_list)
$name
[1] "Ryan"
$age
[1] 25
$scores
[1] 90 85 88
$result
[1] TRUE
> length(my_list)
[1] 4
> "john" %in% my_list
[1] FALSE
> "Ryan" %in% my_list
[1] TRUE
> append(my_list,"joe")
$name
[1] "Ryan"
$age
[1] 25
$scores
[1] 90 85 88
$result
[1] TRUE
[[5]]
[1] "joe"
> my_list1=my_list[-2]
> print(my_list1)
$name
[1] "Ryan"
$scores
[1] 90 85 88
$result
[1] TRUE
> my_list2=c(my_list,my_list1)
> print(my_list2)
$name
[1] "Ryan"
$age
[1] 25
$scores
[1] 90 85 88
$result
[1] TRUE
$name
[1] "Ryan"
$scores
[1] 90 85 88
$result
[1] TRUE
> for(i in my_list)
+ print(i)
[1] "Ryan"
[1] 25
[1] 90 85 88
[1] TRUE
> typeof(my_list)
[1] "list"
> print(typeof(my_list))
[1] "list"
>
>
> my_data_frame<-data.frame(
+ Name=c("john","jane","Doe"),
+ Age=c(25,30,22),
+ Scores=c(90,85,88)
+)
> print(my_data_frame)
Name Age Scores
1 john 25 90
2 jane 30 85
3 Doe 22 88
> summary(my_data_frame)
Name Age Scores
Length:3 Min. :22.00 Min. :85.00
Class :character 1st Qu.:23.50 1st Qu.:86.50
Mode :character Median :25.00 Median :88.00
Mean :25.67 Mean :87.67
3rd Qu.:27.50 3rd Qu.:89.00
Max. :30.00 Max. :90.00
> my_data_frame$Age
[1] 25 30 22
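The program indexes the list with single brackets, which always return a sub-list. The sketch below, built on the same list and data frame, contrasts [ ] with [[ ]] and $. It also shows that append() must be assigned back to keep its result, and it covers common data frame operations (row filtering, sorting, adding a column). The grade element and the Passed column are hypothetical additions for illustration.

```r
my_list <- list(name = "john", age = 25, scores = c(90, 85, 88), result = TRUE)

# [ ] returns a sub-list; [[ ]] and $ return the element itself
sub_list <- my_list["scores"]
scores   <- my_list[["scores"]]
print(is.list(sub_list))  # TRUE
print(mean(scores))       # works on the numeric vector, not on the sub-list

# append() returns a new list, so assign the result to keep it
my_list$name <- "Ryan"
my_list <- append(my_list, list(grade = "A"))  # "grade" is a hypothetical element
print(names(my_list))

# Data frames: columns by name, rows by a logical condition
df <- data.frame(Name = c("john", "jane", "Doe"),
                 Age = c(25, 30, 22),
                 Scores = c(90, 85, 88))
print(df[df$Age > 24, ])        # rows where Age exceeds 24
print(df[order(-df$Scores), ])  # rows sorted by Scores, highest first
df$Passed <- df$Scores >= 88    # hypothetical derived column
print(dim(df))                  # number of rows and columns
```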
Result:
Thus, Vectors, Lists and Data frames have been successfully implemented in R.
4. Program to read a csv file, analyze and create pie and bar chart for the data in the file using R.
Aim: To read a CSV file, analyze categorical data, and visualize the results using pie and bar
charts.
Algorithm:
Step1: Open R or RStudio.
Step2: Enter the code into the R console or a script file.
Loading Libraries:
• Import necessary libraries: ggplot2 for data visualization, dplyr for data manipulation,
and scales for formatting axis labels.
Reading Data:
• Read the CSV file named "data.csv" (it must contain a Category column and a numeric Value column) into a data frame named data.
• Display the first few rows of the data to understand its structure and content.
Creating a Bar Chart:
• Create a bar chart using ggplot() with Category on the x-axis and Value on the y-axis.
• Customize the chart's appearance by setting the fill color to "green" and removing
gridlines.
• Add labels to the bars using geom_text().
• Adjust the chart's theme to your preferences.
Creating a Pie Chart:
• Calculate the proportion of each Value relative to the total sum.
• Create a new column label to combine category names with their corresponding
percentages.
• Create a pie chart using ggplot() with Proportion as the y-axis and Category as the fill.
• Set the chart's coordinates to polar coordinates using coord_polar().
• Customize the chart's appearance by setting the fill palette and removing axes and
background.
• Add labels to the pie slices using geom_text().
Displaying Charts:
• Print the bar chart and pie chart to visualize the data.
Step3:
Execute the code to perform the analysis and generate the visualizations.
Program:
# Load necessary libraries
library(ggplot2)
library(dplyr)
library(scales)
# Read the CSV file
data<-read.csv("E:/SK/RLab-MCA/data.csv")
# View the first few rows of the data
print(head(data))
# Create a bar chart
bar_plot <- ggplot(data, aes(x = Category, y = Value)) +
geom_bar(stat = "identity", fill = "green") +
labs(title = "Bar Chart of Values by Category",
x = "Category",
y = "Value") +
theme_minimal()+
theme(panel.grid.major = element_blank(), # Remove major gridlines
panel.grid.minor = element_blank()) + # Remove minor gridlines
geom_text(aes(label = signif(Value)), nudge_y = 3)+
theme(axis.line = element_line(linewidth = 0.5, colour = "black")) # Customize axis lines (ggplot2 >= 3.4.0; older versions use size = 0.5)
# Display the bar chart
print(bar_plot)
# Create a pie chart
# Pie chart requires the 'Value' column to be in proportions of the total
data <- data %>%
mutate(Proportion = Value / sum(Value),
label = paste0(Category, ": ", round(Proportion*100, 1), "%"))
print(data)
# For pie chart, use geom_bar with coord_polar
pie_plot <- ggplot(data, aes(x = "", y = Proportion, fill = Category)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y",start=0)+
labs(title = "Pie Chart of Values by Category") +
scale_fill_brewer(palette="Set1")+
#scale_fill_grey()+
theme_void() + # Remove axes and background
theme(legend.title = element_blank())+
geom_text(aes(label = label), position = position_stack(vjust = 0.5), color = "white")
# Display the pie chart
print(pie_plot)
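If ggplot2, dplyr or the CSV file at the path above is not available, the same read, analyse and plot cycle can be tried with base R alone. The sketch below writes a small hypothetical data set (invented Category and Value figures, not the lab's data) to a temporary CSV file and reads it back. It then computes the same proportions and labels as the mutate() step and draws base-graphics equivalents of the two charts with barplot() and pie().

```r
# Hypothetical sample data with the two columns the program expects
sample_data <- data.frame(Category = c("A", "B", "C", "D"),
                          Value    = c(30, 45, 15, 10))
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, csv_path, row.names = FALSE)

# Read the file back and inspect it
data <- read.csv(csv_path)
str(data)

# Proportions and labels, as in the dplyr::mutate() step of the program
data$Proportion <- data$Value / sum(data$Value)
data$label <- paste0(data$Category, ": ", round(data$Proportion * 100, 1), "%")
print(data)

# Base-graphics versions of the bar and pie charts.
# pdf(NULL) draws to a null device; drop it and dev.off() to see the plots.
pdf(NULL)
barplot(data$Value, names.arg = data$Category, col = "green",
        main = "Bar Chart of Values by Category", xlab = "Category", ylab = "Value")
pie(data$Value, labels = data$label, main = "Pie Chart of Values by Category")
invisible(dev.off())
```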
Output:
Result:
Thus, reading categorical data from a CSV file, analysing it and visualizing the results using pie and bar charts have been successfully implemented in R.
5. Perform statistical analysis for a sample dataset using R
Aim: To perform statistical analysis for a sample dataset using R
Algorithm:
Step1: Open R or RStudio.
Step2: Enter the code into the R console or a script file.
(i) Loading Data: The dataset is loaded from a specified path. Ensure the path is correct on
your system.
(ii) Exploratory Data Analysis:
head(data): Displays the first six rows.
nrow(data): Counts the number of rows.
str(data): Displays the structure of the dataset.
summary(data): Provides summary statistics for each column.
(iii) Missing Values: colSums(is.na(data)) checks for missing values in each column.
(iv) Visualization:
(i) A scatter plot is created using ggplot2, showing the relationship between petal length
and petal width, colored by species.
(ii) A boxplot is generated to visualize petal length distribution across species.
(v) Statistical Analysis:
(i) Mean Calculation: The mean sepal length is calculated for each species.
(ii) Correlation Matrix: The correlation between numeric variables is calculated.
(iii) Linear Regression: A linear model is fitted with sepal length as the response
variable and all other variables as predictors.
Step3: Execute the code to perform the analysis and generate the visualizations.
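Because the built-in iris data has no missing values, the check in step (iii) simply returns zeros. The sketch below introduces a few hypothetical NAs into a copy of the first five rows to show what colSums(is.na()) reports. It also shows how missing values propagate through mean() unless na.rm = TRUE is set, and how na.omit() drops incomplete rows.

```r
df <- head(iris, 5)
df$Sepal.Length[2] <- NA          # hypothetical missing values for illustration
df$Petal.Width[c(1, 4)] <- NA

na_counts <- colSums(is.na(df))   # missing values per column
print(na_counts)

print(mean(df$Sepal.Length))                # NA: missing values propagate
print(mean(df$Sepal.Length, na.rm = TRUE))  # mean of the non-missing values
print(nrow(na.omit(df)))                    # rows with no NA in any column
```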
Program:
#install.packages("ggplot2") # run once if ggplot2 is not installed
library(ggplot2)
# Load the iris flower data from a CSV file (adjust the path for your system).
# The statements below use R's built-in iris data frame; replace iris with data to analyse the CSV copy.
data<-read.csv("E:/SK/RLab-MCA/iris_dataset.csv")
head(iris) # Print the first six rows of the iris data set
nrow(iris) # Display the total number of rows in the data set
str(iris) # Display the structure of the data set
summary(iris) # Return summary statistics for each column
colSums(is.na(iris)) # Count the missing values (NAs) in each column
# Plot size in inches; these options only take effect in Jupyter notebooks (IRkernel/repr)
options(repr.plot.width=7,repr.plot.height=7)
# Scatter plot of petal length against petal width, coloured by species
iris_plot<-ggplot(data=iris,aes(x=Petal.Width,y=Petal.Length,col=Species))+
  geom_point()+xlab("Petal Width")+ylab("Petal Length")
print(iris_plot)
# Mean of Sepal.Length for each level of the Species factor
aggregate(Sepal.Length ~ Species, iris, mean)
# Correlation matrix of the four numeric columns
# (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)
cor(iris[ , 1:4])
# Fit a linear model of Sepal.Length on all other variables and display the summary:
# residuals, coefficients, multiple and adjusted R-squared, F-statistic
summary(lm(Sepal.Length ~ ., iris))
# Boxplot of petal length for each species using ggplot2
iris_box<-ggplot(iris,aes(x=Species,y=Petal.Length,fill=Species))+geom_boxplot()+
  xlab("Species")+ylab("Petal Length")
print(iris_box) # Display the boxplot stored in iris_box
# The same boxplot using base R plotting functions
boxplot(Petal.Length~Species, data=iris, main='Petal Length by Species', xlab='Species',
  ylab='Petal Length', col='steelblue', border='black')
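The program prints the full regression summary and the group means. When individual numbers are needed for further work, they can be pulled out directly. The sketch below, a minimal example on the built-in iris data, extracts the coefficients and R-squared values from the fitted model. It computes the group means with both aggregate() and tapply(), and it checks one entry of the correlation matrix against the Pearson formula r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)).

```r
fit <- lm(Sepal.Length ~ ., data = iris)
s <- summary(fit)
print(coef(fit))        # intercept and slopes; Species enters as two dummy variables
print(s$r.squared)      # multiple R-squared
print(s$adj.r.squared)  # adjusted R-squared

# Group means two ways: aggregate() returns a data frame, tapply() a named array
means_df  <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
means_vec <- tapply(iris$Sepal.Length, iris$Species, mean)
print(means_vec)

# Pearson correlation from its definition, checked against cor()
x <- iris$Petal.Length
y <- iris$Petal.Width
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
print(all.equal(r_manual, cor(x, y)))  # TRUE
```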
Output:
Result:
Thus, statistical analysis of a sample dataset has been successfully implemented in R.