RSTUDIO

The document outlines a practical file for a subject called Basics of Data Management with "R" belonging to the first semester of the MBA program in Business Analytics at Dr. A.P.J. Abdul Kalam Technical University in Lucknow, Uttar Pradesh, India for the academic year 2022-2023. It includes an index of topics to be covered in the course ranging from learning basic R syntax to data visualization with ggplot2 and performing data analysis tasks with dplyr and tidyr packages. Assignments involving R scripts are to be submitted by the student to the subject teacher for evaluation and marking.

Uploaded by

samarth agarwal
Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Lucknow, Uttar Pradesh

PRACTICAL FILE

PROGRAM:- MBA(BUSINESS ANALYTICS)

SEMESTER-1

ACADEMIC YEAR:- 2022-2023

SUBJECT:- BASICS OF DATA MANAGEMENT WITH “R”

SUBJECT CODE:- KMBA152

SUBMITTED BY:- AKANKSHA KUMARI

SUBMITTED TO:- MS. NEETU SINGH

INDEX

Sr. No. TOPICS T SIGN COMMENTS

1 Learn the basics of R Syntax

1.1 R Script for Arithmetic Operators

1.2 R Script for Logical Operators

1.3 R Script for Relational Operators.

1.4 R Script for Conditional Statements.

1.5 R Scripts for Looping.

1.6 R Scripts for User-Defined Functions.

1.7 R Scripts for Data Frames

2 Learn how to organize and modify data in R using data frames and dplyr

2.1 R Script for Data Manipulation with the help of dplyr (filter, distinct, arrange, select, rename, mutate, transmute, summarize) functions

2.2 R Script for Data Manipulation with the help of tidyr package (gather, separate, unite, spread) functions

3 Learn how to prepare data for analysis in R using dplyr and tidyr

4 Learn the basics of how to create visualizations using the popular R package ggplot2

4.1 R Script for Summary of Data Set

4.2 R Script for Data Layers

4.3 R Script for Aesthetic Layer

4.4 R Script for Geometric Layer

4.5 R Script for Adding Size,Colour and Shape

4.6 R Script for Histogram Plot

4.7 R Script for Facet Layer

4.8 R Script for Statistics Layer

4.9 R Script for Coordinates Layer

4.10 R Script for Coord_cartesian()

4.11 R Script for Theme Layer

5 Learn the basics of aggregate functions in R with dplyr, which let us calculate quantities that describe groups of data

5.1 R Script to create a data frame with 4 columns, group by subjects, and get aggregates such as minimum, sum, and maximum

5.2 R Script to create a data frame with 4 columns, group by subjects, and get the average (mean)

6 Learn the basics of joining tables together in R with dplyr

6.1 R Script for Inner Join

6.2 R Script for Left Join

6.3 R Script for Right Join

6.4 R Script for Full Join

6.5 R Script for Semi Join

6.6 R Script for Anti Join

7 Learn to use R or manually calculate the mean, median, and mode of real-world datasets

7.1 R Script for importing data using read.csv and finding the mean, median, and mode values

8 Learn how to quantify the spread of the dataset by calculating the variance and standard deviation in R

# R Arithmetic Operators Example for numbers

INPUT:
a <- 7.5
b <- 2

print(a + b)   # addition
print(a - b)   # subtraction
print(a * b)   # multiplication
print(a / b)   # division
print(a %% b)  # remainder (modulo)
print(a %/% b) # integer quotient
print(a ^ b)   # exponentiation (a raised to the power b)

OUTPUT:
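As a quick check on the remainder and quotient operators above, the following minimal sketch verifies that the integer quotient and the remainder recombine into the original value. The names q, r and rebuilt are illustrative and do not come from the original file.

```r
a <- 7.5
b <- 2

q <- a %/% b   # integer quotient
r <- a %% b    # remainder

# the quotient and remainder recombine into a
rebuilt <- q * b + r
print(rebuilt == a)
```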

INPUT:

# R Operators - R Logical Operators Example for basic logical elements

a <- 0 # treated as logical FALSE
b <- 2 # any non-zero number is treated as logical TRUE

print(a & b)  # logical AND, element-wise
print(a | b)  # logical OR, element-wise
print(!a)     # logical NOT, element-wise
print(a && b) # logical AND on single values (short-circuit)
print(a || b) # logical OR on single values (short-circuit)

OUTPUT:
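The difference between the element-wise operators and a single TRUE/FALSE answer is easier to see on vectors longer than one element. The sketch below uses two hypothetical logical vectors, x and y, and reduces the element-wise results with any() and all() before using them in if().

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

both   <- x & y   # element-wise AND, one result per position
either <- x | y   # element-wise OR, one result per position

print(both)
print(either)

# reduce a logical vector to a single TRUE/FALSE before using it in if()
if (any(both)) {
  print("at least one position is TRUE in both vectors")
}
if (all(either)) {
  print("every position is TRUE in at least one vector")
}
```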

INPUT:

# R Operators - R Relational Operators Example for Numbers

a <- c(7.5, 3, 5)

b <- c(2, 7, 0)
print ( a<b ) # less than

print ( a>b ) # greater than

print ( a==b ) # equal to

print ( a<=b ) # less than or equal to

print ( a>=b ) # greater than or equal to

print ( a!=b ) # not equal to

OUTPUT:

INPUT:

# R scripts for conditional Statements


x <- -3
if (x < 0) {
print("x is a negative number")
}
OUTPUT:
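The single if statement above can be extended with else if and else branches. This is a minimal sketch; classify_sign is a hypothetical helper introduced only for illustration.

```r
# classify_sign is a hypothetical helper, not part of the original file
classify_sign <- function(x) {
  if (x < 0) {
    "negative"
  } else if (x == 0) {
    "zero"
  } else {
    "positive"
  }
}

print(classify_sign(-3))
print(classify_sign(0))
print(classify_sign(4))
```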
INPUT:

R Scripts For Looping:

# R program to demonstrate the use of for loop

# using for loop


for (val in 1:5)
{
# statement
print(val)
}

OUTPUT:

INPUT:

# R program to illustrate
# application of for loop

# assigning strings to the vector


week <- c('Sunday',
'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday')

# using for loop to iterate


# over each string in the vector
for (day in week)
{

# displaying each string in the vector


print(day)
}

OUTPUT:

INPUT:
# R program to demonstrate the use of while loop

val = 1

# using while loop


while (val<= 5)
{
# statements
print(val)
val = val + 1
}

OUTPUT:
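Loops are often used to accumulate a result. The sketch below, with illustrative names values and total, sums the numbers 1 to 5 in a for loop and compares the result with the built-in sum(), which does the same work without an explicit loop.

```r
values <- 1:5

# accumulate a running total with a for loop
total <- 0
for (val in values) {
  total <- total + val
}

# the vectorized built-in gives the same result without a loop
print(total)
print(sum(values))
```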

INPUT:
# R Scripts For User-Defined Functions

# user-defined function: min-max normalization, rescales a numeric vector to [0, 1]
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

vec1 <- c(28,64,63,43,56,46,87,34,73)
vec2 <- c(53,37,29,45,68,33,76,49,30)
vec3 <- c(12,44,36,75,36,93,34,64,18)

# apply the function to each vector
vec1 <- normalize(vec1)
vec2 <- normalize(vec2)
vec3 <- normalize(vec3)

vec1

OUTPUT:

# R Scripts for Data Frames

# R program to create dataframe

# creating a data frame


friend.data<- data.frame(
friend_id = c(1:5),
friend_name = c("Sachin", "Sourav",
"Dravid", "Sehwag",
"Dhoni"),
stringsAsFactors = FALSE
)
# print the data frame
print(friend.data)

OUTPUT:
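Once a data frame exists, its structure can be inspected and its columns read or extended. The following sketch rebuilds the friend.data frame from the script above; the name_length column is a hypothetical addition for illustration.

```r
friend.data <- data.frame(
  friend_id = c(1:5),
  friend_name = c("Sachin", "Sourav", "Dravid", "Sehwag", "Dhoni"),
  stringsAsFactors = FALSE
)

# inspect the structure and dimensions
str(friend.data)
print(nrow(friend.data))
print(ncol(friend.data))

# access one column with $ and select rows with a logical condition
print(friend.data$friend_name)
print(friend.data[friend.data$friend_id > 3, ])

# add a new column (name_length is a hypothetical column for illustration)
friend.data$name_length <- nchar(friend.data$friend_name)
print(friend.data)
```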
Learn how to organize and modify data in R using data frames and dplyr

R Script for Data Manipulation with the help of dplyr package (filter, distinct, arrange, select, rename, mutate, transmute, summarize) functions

# import dplyr package


library(dplyr)

# create a data frame


stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 19),
wickets=c(17, 20, NA, 5))

# fetch players who scored more


# than 100 runs
filter(stats, runs>100)

OUTPUT:

# import dplyr package

library(dplyr)
# create a data frame
stats <- data.frame(player=c('A', 'B', 'C', 'D', 'A', 'A'),
runs=c(100, 200, 408, 19, 56, 100),
wickets=c(17, 20, NA, 5, 2, 17))

# removes duplicate rows


distinct(stats)
#remove duplicates based on a column
distinct(stats, player, .keep_all = TRUE)

OUTPUT:

# import dplyr package


library(dplyr)

# create a data frame


stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 19),
wickets=c(17, 20, NA, 5))

# ordered data based on runs


arrange(stats, runs)

OUTPUT:

# import dplyr package


library(dplyr)

# create a data frame


stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 19),
wickets=c(17, 20, NA, 5))

# fetch required column data


select(stats, player,wickets)

OUTPUT:

# import dplyr package


library(dplyr)

# create a data frame


stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 19),
wickets=c(17, 20, NA, 5))

# renaming the column


rename(stats, runs_scored=runs)
OUTPUT:

# import dplyr package


library(dplyr)

# create a data frame


stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 19),
wickets=c(17, 20, 7, 5))

# add new column avg


mutate(stats, avg=runs/4)

# drop all and create a new column


transmute(stats, avg=runs/4)

OUTPUT:

INPUT:
# import dplyr package
library(dplyr)

# create a data frame


stats <- data.frame(player=c('A', 'B', 'C', 'D'),
runs=c(100, 200, 408, 19),
wickets=c(17, 20, 7, 5))

# summarize method
summarize(stats, sum(runs), mean(runs))

Output:
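The dplyr verbs shown one at a time above are usually chained with the pipe operator %>%. The sketch below reuses the stats data frame; the runs_per_wicket, total_runs and mean_runs column names are assumptions chosen for illustration.

```r
library(dplyr)

stats <- data.frame(player = c('A', 'B', 'C', 'D'),
                    runs = c(100, 200, 408, 19),
                    wickets = c(17, 20, 7, 5))

# chain the verbs: keep players with at least 100 runs,
# add a runs-per-wicket column, then sort from highest to lowest
result <- stats %>%
  filter(runs >= 100) %>%
  mutate(runs_per_wicket = runs / wickets) %>%
  arrange(desc(runs_per_wicket))

print(result)

# named summary columns are easier to read than sum(runs), mean(runs)
totals <- stats %>%
  summarize(total_runs = sum(runs), mean_runs = mean(runs))
print(totals)
```

Each verb takes a data frame as its first argument and returns a data frame, which is what makes the chaining possible.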

R Script for Data Manipulation with the help of tidyr package (gather, separate, unite, spread) functions

INPUT:

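# import tidyr package
library(tidyr)

# tidy_dataframe is assumed to already exist, with columns Group.1 to Group.3
# use gather() function to reshape the data from wide to long format
long <- tidy_dataframe %>%
  gather(Group, Frequency,
         Group.1:Group.3)

# use separate() function to split the Group column
# into the Allotment and Number columns
separate_data <- long %>%
  separate(Group, c("Allotment",
                    "Number"))

# print the separated data
separate_data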

OUTPUT:

INPUT:

# use gather() function to reshape the data from wide to long format
long <- tidy_dataframe %>%
  gather(Group, Frequency,
         Group.1:Group.3)

# use separate() function to split the Group column
separate_data <- long %>%
  separate(Group, c("Allotment",
                    "Number"))

# use unite() function to glue
# Allotment and Number columns
unite_data <- separate_data %>%
  unite(Group, Allotment,
        Number, sep = ".")

# print the new data frame
unite_data
OUTPUT:
INPUT:
# use spread() function to make data wider
back_to_wide <- unite_data %>%
  spread(Group, Frequency)

# print the new data frame
back_to_wide
OUTPUT:
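The tidyr scripts above rely on a tidy_dataframe object that is not defined in the file. The sketch below substitutes a small hypothetical data frame with the same column names (the S.No column and all values are invented for illustration) and reshapes it with pivot_longer() and pivot_wider(), which the tidyr documentation recommends over the older gather() and spread().

```r
library(tidyr)

# hypothetical stand-in for tidy_dataframe, which the original never defines
tidy_dataframe <- data.frame(
  S.No = 1:3,
  Group.1 = c(10, 20, 30),
  Group.2 = c(11, 21, 31),
  Group.3 = c(12, 22, 32)
)

# wide to long: the newer counterpart of gather()
long <- pivot_longer(tidy_dataframe,
                     cols = Group.1:Group.3,
                     names_to = "Group",
                     values_to = "Frequency")
print(long)

# long back to wide: the newer counterpart of spread()
wide <- pivot_wider(long,
                    names_from = Group,
                    values_from = Frequency)
print(wide)
```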
Learn the basics of how to create visualizations using the popular R package ggplot2

R Script for summary of data set

INPUT:

# Installing the package
install.packages("ggplot2")

# Loading package
library(ggplot2)

# Summary of the built-in mtcars dataset
summary(mtcars)

OUTPUT:
INPUT:
# Data Layer
ggplot(data = mtcars)
OUTPUT:
INPUT:
# Aesthetic Layer
ggplot(data = mtcars, aes(x = hp, y = mpg, col = disp))
OUTPUT:
INPUT:
# Geometric layer
ggplot(data = mtcars,
aes(x = hp, y = mpg, col = disp)) + geom_point()
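The index also lists scripts for adding size, colour and shape, and for a histogram plot. A minimal sketch of both on mtcars follows; the choice of variables mapped to each aesthetic, the object names p_points and p_hist, and the binwidth of 25 are assumptions made for illustration.

```r
library(ggplot2)

# map colour, size and shape to variables in the aesthetic layer
p_points <- ggplot(data = mtcars,
                   aes(x = hp, y = mpg,
                       colour = factor(cyl),   # colour by cylinder count
                       size = wt,              # point size by weight
                       shape = factor(am))) +  # shape by transmission type
  geom_point()

# histogram of horsepower; binwidth = 25 is an arbitrary illustrative choice
p_hist <- ggplot(data = mtcars, aes(x = hp)) +
  geom_histogram(binwidth = 25)

print(p_points)
print(p_hist)
```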
INPUT:
# Facet Layer
p <- ggplot(data = mtcars,
aes(x = hp, y = mpg,
shape = factor(cyl))) + geom_point()

# Separate rows according to transmission type


p + facet_grid(am ~ .)

# Separate columns according to cylinders


p + facet_grid(. ~ cyl)
OUTPUT:
INPUT:

# Statistics layer
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
geom_point() +
stat_smooth(method = lm, col = "red")

OUTPUT:
INPUT:
# Coordinates layer: Control plot dimensions
ggplot(data = mtcars, aes(x = wt, y = mpg)) +
geom_point() +
stat_smooth(method = lm, col = "red") +
scale_y_continuous("mpg", limits = c(2, 35),
expand = c(0, 0)) +
scale_x_continuous("wt", limits = c(0, 25),
expand = c(0, 0)) + coord_equal()
OUTPUT:
INPUT:
# Add coord_cartesian() to zoom in without dropping any data points
ggplot(data = mtcars, aes(x = wt, y = hp, col = am)) +
geom_point() + geom_smooth() +
coord_cartesian(xlim = c(3, 6))

OUTPUT:
INPUT:
# Theme layer
ggplot(data = mtcars, aes(x = hp, y = mpg)) +
geom_point() + facet_grid(. ~ cyl) +
theme(plot.background = element_rect(
fill = "black", colour = "gray"))
OUTPUT:
Learn the basics of aggregate functions in R with dplyr, which let us calculate quantities that describe groups of data

R Script using the aggregate() function to summarize one variable, grouped by one variable

INPUT:
# create the dataframe with 4 columns
data = data.frame(subjects=c("java", "python", "java",
"java", "php", "php"),
id=c(1, 2, 3, 4, 5, 6),
names=c("manoj", "sai", "mounika",
"durga", "deepika", "roshan"),
marks=c(89, 89, 76, 89, 90, 67))

# get sum of marks by grouping with subjects


aggregate(marks~ subjects, data, FUN=sum)

OUTPUT:
INPUT: One value, grouped by multiple variables
# create the dataframe with 4 columns
data = data.frame(subjects=c("java", "python", "java",
"java", "php", "php"),
id=c(1, 2, 3, 4, 5, 6),
names=c("manoj", "sai", "mounika",
"durga", "deepika", "roshan"),
marks=c(89, 89, 76, 89, 90, 67))

# get sum of marks by grouping with subjects and names


aggregate(marks~ subjects+names, data, FUN=sum)

OUTPUT:
INPUT: Multiple Value and Group by one variable
# create the dataframe with 4 columns
data = data.frame(subjects=c("java", "python", "java",
"java", "php", "php"),
id=c(1, 2, 3, 4, 5, 6),
names=c("manoj", "sai", "mounika",
"durga", "deepika", "roshan"),
marks=c(89, 89, 76, 89, 90, 67))

# get sum of marks and id by grouping with subjects


aggregate(cbind(marks, id)~ subjects, data, FUN=sum)

OUTPUT:
INPUT: Multiple value and Group by multiple variable
# create the dataframe with 4 columns
data = data.frame(subjects=c("java", "python", "java",
"java", "php", "php"),
id=c(1, 2, 3, 4, 5, 6),
names=c("manoj", "sai", "mounika",
"durga", "deepika", "roshan"),
marks=c(89, 89, 76, 89, 90, 67))

# get sum of marks and id by grouping


# with subjects and names
aggregate(cbind(marks, id)~ subjects+names, data, FUN=sum)

OUTPUT:
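The scripts above use base R's aggregate(). The same grouping can be sketched with dplyr's group_by() and summarise(), which also covers the minimum, sum, maximum and mean listed in the index. The summary column names (min_marks and so on) and the by_subject object are illustrative choices.

```r
library(dplyr)

data <- data.frame(subjects = c("java", "python", "java", "java", "php", "php"),
                   id = c(1, 2, 3, 4, 5, 6),
                   names = c("manoj", "sai", "mounika", "durga", "deepika", "roshan"),
                   marks = c(89, 89, 76, 89, 90, 67))

# one row per subject, with four aggregates of marks
by_subject <- data %>%
  group_by(subjects) %>%
  summarise(min_marks = min(marks),
            sum_marks = sum(marks),
            max_marks = max(marks),
            mean_marks = mean(marks))

print(by_subject)
```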
Learn the basics of joining tables together in R with dplyr

INPUT: R Script for Inner Join
# load the library
library("dplyr")

# create dataframe with 1 to 5 integers


gfg1 <- data.frame(ID = c(1:5))

# create dataframe with 4 to 8 integers
gfg2 <- data.frame(ID = c(4:8))

# perform inner join


inner_join(gfg1, gfg2, by="ID")

OUTPUT:
INPUT: R Script for Left Join
# load the library
library("dplyr")

# create the dataframes


gfg1<-data.frame(ID=c(1:5))

gfg2<-data.frame(ID=c(4:8))

# perform left join


left_join(gfg1,gfg2, by = "ID")

OUTPUT:
INPUT: Right Join
# load the library
library("dplyr")

# create dataframes
gfg1<-data.frame(ID=c(1:5))

gfg2<-data.frame(ID=c(4:8))

# perform right join


right_join(gfg1,gfg2, by = "ID")

OUTPUT:
INPUT: Full Join
# load library
library("dplyr")

# create dataframe
gfg1<-data.frame(ID=c(1:5))
gfg2<-data.frame(ID=c(4:8))

# perform full join


full_join(gfg1,gfg2, by = "ID")

OUTPUT:
INPUT: Semi Join
# load the library
library("dplyr")

# create the dataframes


gfg1<-data.frame(ID=c(1:5))
gfg2<-data.frame(ID=c(4:8))

# perform semijoin
semi_join(gfg1,gfg2, by = "ID")

OUTPUT:
INPUT: Anti Join
# load the library
library("dplyr")

# create the dataframes


gfg1<-data.frame(ID=c(1:5))
gfg2<-data.frame(ID=c(4:8))

# perform anti join


anti_join(gfg1,gfg2, by = "ID")

OUTPUT:
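With only an ID column, the six joins differ mainly in how many rows they return. The sketch below adds one hypothetical column to each table (name and score, invented for illustration) and counts the rows from each join, so the effect of matched and unmatched IDs is visible side by side.

```r
library(dplyr)

# hypothetical tables: the name and score columns are added for illustration
left_tbl  <- data.frame(ID = 1:5, name  = c("a", "b", "c", "d", "e"))
right_tbl <- data.frame(ID = 4:8, score = c(40, 50, 60, 70, 80))

# number of rows returned by each join type
join_rows <- c(
  inner = nrow(inner_join(left_tbl, right_tbl, by = "ID")),
  left  = nrow(left_join(left_tbl, right_tbl, by = "ID")),
  right = nrow(right_join(left_tbl, right_tbl, by = "ID")),
  full  = nrow(full_join(left_tbl, right_tbl, by = "ID")),
  semi  = nrow(semi_join(left_tbl, right_tbl, by = "ID")),
  anti  = nrow(anti_join(left_tbl, right_tbl, by = "ID"))
)
print(join_rows)

# unmatched rows are filled with NA in the columns from the other table
print(left_join(left_tbl, right_tbl, by = "ID"))
```

Only IDs 4 and 5 appear in both tables, which is why the inner and semi joins return two rows while the anti join returns the three unmatched rows of the left table.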
Learn to use R or manually calculate the mean, median, and mode of real-world datasets

R Script for importing data using read.csv

INPUT:

# import and store the dataset in data1

data1 <- read.csv(file.choose(), header = TRUE)

# display the data


data1

OUTPUT:
INPUT: R Script for importing data using read.csv and finding the mean, median, and mode values

# R program to import data into R

# Import the data using read.csv()


myData = read.csv("CardioGoodFitness.csv",
stringsAsFactors=F)
# Print the first 6 rows
print(head(myData))

OUTPUT:
INPUT:
# R program to illustrate

# Descriptive Analysis

# Import the data using read.csv()


myData = read.csv("CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

# Compute the mean value
# (stored as mean_age to avoid reusing the name of the built-in mean() function)
mean_age = mean(myData$Age)
print(mean_age)

OUTPUT:
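The section heading also names the median. The sketch below computes it with the built-in median() and by hand. Because CardioGoodFitness.csv is not included here, it uses a hypothetical ages vector in place of myData$Age.

```r
# hypothetical ages used in place of myData$Age, since the CSV is not included here
ages <- c(18, 22, 25, 25, 31, 40, 47)

mean_age <- mean(ages)
median_age <- median(ages)

print(mean_age)
print(median_age)

# manual median: sort, then take the middle element (odd number of values)
sorted_ages <- sort(ages)
manual_median <- sorted_ages[(length(sorted_ages) + 1) / 2]
print(manual_median)
```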
INPUT:
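# Import the data using read.csv()
myData = read.csv("CardioGoodFitness.csv",
                  stringsAsFactors = FALSE)

# user-defined function: returns the most frequent value as a character string
# (base R's mode() reports the storage type, not the statistical mode)
get_mode = function(x){
  return(names(sort(table(x), decreasing = TRUE))[1])
}

get_mode(myData$Age)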

OUTPUT:

INPUT:

# R program to illustrate
# Descriptive Analysis

# Import the library


library(modeest)

# Import the data using read.csv()


myData = read.csv("CardioGoodFitness.csv",
stringsAsFactors=F)

# Compute the mode value


mode = mfv(myData$Age)
print(mode)

OUTPUT:
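Topic 8 of the index covers the variance and standard deviation. A minimal sketch follows, using a small hypothetical vector x in place of a real column such as myData$Age; the manual calculation is included only to show what var() and sd() compute.

```r
# hypothetical sample values; with the real data, myData$Age would take their place
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# built-in sample variance and standard deviation (both divide by n - 1)
sample_var <- var(x)
sample_sd <- sd(x)

# manual calculation for comparison
n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)
manual_sd <- sqrt(manual_var)

print(sample_var)
print(sample_sd)
print(all.equal(sample_var, manual_var))
```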