R Concepts - 25092018 PDF
Sandip Mukhopadhyay
What is R?
S Version 1
S Version 2
S Version 3
S Version 4
Developed 30 years ago for research
applied to the high-tech industry
HISTORY AND EVOLUTION OF R
The regular development of R
1990’s: R developed
concurrently with S
1993: R made public
Acceleration of R development
R-help and R-devel mailing lists
Creation of the R Core Group
HISTORY AND EVOLUTION OF R
• Lack of scalability
• Less accepted in industrial applications than its peer, Python
• Use of R is limited to data science, while Python has wider usage
RStudio
RStudio is a widely used IDE for writing, testing, and executing R code. A typical RStudio screen has the following parts:
• Console: shows the output of executed code
• Syntax editor: where we write the code
• Workspace tab: where users can see the active objects created by the code written in the console
• History tab: shows a history of the commands used in the code
• Files tab: shows the folders and files in the default workspace
• Plots tab: shows graphs
• Packages tab: shows the add-ons and packages required for running specific processes
• Help tab: contains information on the IDE, commands, etc.
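packageDescription("ggplot2")  # show the package's DESCRIPTION metadata
help(package = "ggplot2")      # open the package's help index
find.package("ggplot2")        # return the path where the package is installed
install.packages("ggplot2")    # download and install the package from CRAN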
Some basics about R coding
• There are over 1,000 functions at the core of R, and new R functions
are created all the time.
• Each R function comes with its own help page. To access a function’s
help page, type a question mark followed by the function’s name in
the console.
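As a small illustration of the help lookup described above, the sketch below uses mean() purely as an example; any other function name can be substituted.

```r
# Open the help page for mean(); both forms are equivalent
?mean
help("mean")

# Show only the argument list of the function
mean_args <- args(mean)
mean_args
```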
Reference materials / e-books
1. R-bloggers: https://www.r-bloggers.com
2. R tutorials: https://www.programiz.com/r-programming/
3. R video book: https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/
4. Stack Overflow
5. RPubs
Reference materials / other analytics resources
1. www.analyticsmag.com
2. www.kdnuggets.com
3. www.analyticsbridge.com
4. www.datapine.com
5. www.datasciencecentral.com
Operators in R
Commonly used control statements in R
• if...else statement
• switch statement
• for loop
• while loop
• repeat loop
• next statement
• break statement
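The statements listed above can be sketched in one short script. The variable names and values here are illustrative assumptions, not taken from the slides.

```r
x <- 7

# if...else: choose a branch based on a condition
if (x %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}

# switch: select a value by matching a name
label <- switch(parity, even = "divisible by 2", odd = "not divisible by 2")

# for loop with next: skip even numbers while summing 1 to 10
odd_sum <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next
  odd_sum <- odd_sum + i
}

# while loop: count the halvings needed to bring 100 below 1
n <- 100
halvings <- 0
while (n >= 1) {
  n <- n / 2
  halvings <- halvings + 1
}

# repeat loop with break: stop once the counter reaches 3
counter <- 0
repeat {
  counter <- counter + 1
  if (counter == 3) break
}
```

Note that repeat has no condition of its own, so it needs a break to terminate.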
Summary : what we have learnt
Example:
f <- 3 # numeric
f
g <- "US" # text
g
h <- TRUE # logical
h
Types of Data Structure in R : Vector
The c function (c is short for combine) creates a new vector consisting of three
values: 4, 7, and 8.
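The call described above can be written as follows; the variable name x is an assumption made for this sketch.

```r
# Combine three values into one numeric vector
x <- c(4, 7, 8)
x          # prints the three values
length(x)  # number of elements in the vector
class(x)   # type of the elements
```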
Vectors
A vector cannot hold values of different data types.
If we try to place integer, string, and boolean values
together in a vector, R coerces them all to a single
common type (here, character).
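A minimal sketch of what happens when mixed types are placed in a vector; the values and variable names are illustrative assumptions.

```r
# Integer, string, and logical values are all coerced to character
mixed <- c(1L, "two", TRUE)
mixed
class(mixed)

# Numeric and logical values together are coerced to numeric
num_lgl <- c(1.5, TRUE, FALSE)
class(num_lgl)
```

R picks the most general type present, so nothing is dropped, but the original types are lost.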
Example:
vector <- c(1,2,3,4)
f <- matrix(vector, nrow=2, ncol=2)
f
     [,1] [,2]
[1,]    1    3
[2,]    2    4
Matrices
To access the 2nd column of the matrix, simply provide the column number and
omit the row number.
To access the 2nd and 3rd columns of the matrix, simply provide the column
numbers and omit the row number.
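The column access described above might look like this. The 2 x 3 matrix m is a hypothetical example chosen so that a third column exists.

```r
# A 2 x 3 matrix, filled column by column with 1 to 6
m <- matrix(1:6, nrow = 2, ncol = 3)

second_col <- m[, 2]    # 2nd column: leave the row position empty
cols_2_3   <- m[, 2:3]  # 2nd and 3rd columns, returned as a 2 x 2 matrix
first_row  <- m[1, ]    # by the same logic, omit the column to get a row
```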
Types of Data Structure in R : Arrays
Arrays - Similar to matrices; these can have more than two dimensions.
a <- matrix(c(1,1,1,1) , 2, 2)
b <- matrix(c(2,2,2,2) , 2, 2)
x <- array(c(a,b), c(2,2,2))
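To inspect the array built above, a sketch along these lines can help. It repeats the same construction so that it runs on its own, and the indices chosen are only examples.

```r
a <- matrix(c(1, 1, 1, 1), 2, 2)
b <- matrix(c(2, 2, 2, 2), 2, 2)
x <- array(c(a, b), c(2, 2, 2))

dims   <- dim(x)      # rows, columns, layers
layer2 <- x[, , 2]    # the second 2 x 2 layer (the values from b)
cell   <- x[1, 2, 1]  # row 1, column 2 of the first layer
```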
Types of Data Structure in R : Data frames
Data frames - These are the most commonly used data structures in R.
A data frame is similar to a general matrix, but its columns can hold
different modes of data, such as numeric and character.
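A minimal sketch of a data frame with columns of different modes. The column names and values are made up for illustration.

```r
# Each column has its own type; all columns have the same length
df <- data.frame(
  name     = c("Asha", "Ben", "Chen"),
  age      = c(28, 35, 41),
  employed = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep text as character on older R versions
)

col_types <- sapply(df, class)  # one type per column
col_types
```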
Lists - These are the most complex data structures. A list may contain a
combination of vectors, matrices, data frames, and even other lists.
Example:
vec <- c(1,2,3,4)
mat <- matrix(vec,2,2)
x <- list(vec, mat)
Data Frame Access
• dim()
The dim() function returns the dimensions of a data frame.
• nrow()
The nrow() function returns the number of rows in a data frame.
• ncol()
The ncol() function returns the number of columns in a data frame.
• str()
The str() function compactly displays the internal structure of an R object.
• summary()
The summary() function returns a result summary for each column of the dataset.
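The functions above can be tried on mtcars, a data set that ships with R. Using mtcars is an assumption of this sketch; the slides do not name a data set.

```r
dims   <- dim(mtcars)   # number of rows, then number of columns
n_rows <- nrow(mtcars)  # rows only
n_cols <- ncol(mtcars)  # columns only

str(mtcars)      # column names, types, and the first few values
summary(mtcars)  # minimum, quartiles, mean, and maximum for each column
```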
A few R functions for understanding data in data frames
• head()
The head() function returns the first n observations, where n is 6 by default.
• tail()
The tail() function returns the last n observations, where n is 6 by default.
• edit()
The edit() function invokes the text editor on the R object.
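A short sketch of head() and tail() on the built-in iris data set, chosen here only for illustration. edit() is left out because it opens an interactive editor.

```r
first_six  <- head(iris)     # first 6 rows (the default n)
first_two  <- head(iris, 2)  # first 2 rows
last_three <- tail(iris, 3)  # last 3 rows

first_six
last_three
```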
Text Data in R
getwd()
getwd() command returns the absolute filepath of the current working
directory.
setwd()
setwd() command resets the current working directory to another
location as per users’ preference.
dir()
This function returns a character vector of the names of files or
directories in the named directory.
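The three functions above can be combined as in the sketch below. It works in R's temporary directory, and the file name notes.txt is hypothetical, created only so that dir() has something to list.

```r
old_dir <- getwd()        # remember the current working directory
setwd(tempdir())          # switch to the session's temporary directory
file.create("notes.txt")  # hypothetical file for the demonstration
files <- dir()            # names of files in the working directory
files
setwd(old_dir)            # restore the original working directory
```

Saving and restoring the original directory avoids surprising side effects later in the session.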
grep(pattern, a)
pattern is the text pattern to match; a is a character vector. The function
searches a for the pattern and returns the indices of the matching elements
(or the matching strings themselves when value = TRUE).
toupper(a)
a is a character vector. The function converts a string to uppercase.
tolower(a)
a is a character vector. The function converts a string to lowercase.
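A small sketch of these text functions. The vector fruits is an invented example.

```r
fruits <- c("apple", "Banana", "pineapple", "cherry")

idx     <- grep("apple", fruits)                # positions of matching elements
matches <- grep("apple", fruits, value = TRUE)  # the matching strings themselves
upper   <- toupper(fruits)                      # every string in uppercase
lower   <- tolower(fruits)                      # every string in lowercase
```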
Copyright © 2018
List
To get the elements of the list “emp”, use the command below.
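The original command is not preserved in these notes, so the following is only a sketch. The contents of emp are assumptions, apart from the element EmpUnit with value "IT", which the slides mention.

```r
# Hypothetical list; EmpName and EmpSal are invented for illustration
emp <- list(EmpName = "Alex", EmpUnit = "IT", EmpSal = 55000)

emp                       # print every element of the list
unit <- emp$EmpUnit       # one element by name, using $
name <- emp[["EmpName"]]  # one element by name, using [[ ]]
sal  <- emp[[3]]          # one element by position
elem_names <- names(emp)  # the names of all elements
```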
Add an element with the name “EmpDesg” and value “Software Engineer” to the
list “emp”.
Delete the element with the name “EmpUnit” and value “IT” from the list
“emp”.
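Both operations can be sketched as follows. The starting contents of emp are assumed, since the slides do not show the full list; only EmpUnit = "IT" and the new EmpDesg element come from the text.

```r
# Hypothetical starting list; EmpName and EmpSal are invented
emp <- list(EmpName = "Alex", EmpUnit = "IT", EmpSal = 55000)

# Add a new named element by assigning to it
emp$EmpDesg <- "Software Engineer"

# Delete an element by assigning NULL to it
emp$EmpUnit <- NULL

remaining <- names(emp)
remaining
```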
Methods for Reading Data
Reading CSV Files
A CSV file uses the .csv extension and stores tabular data as plain
text. The following function reads data from a CSV file:
read.csv("filename")
where filename is the name of the CSV file to be imported.
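To make this runnable without an existing file, the sketch below first writes a tiny CSV to a temporary path and then reads it back. The column names and values are invented for illustration.

```r
# Create a small CSV file in a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,city,sales", "1,Delhi,250", "2,Pune,310"), csv_path)

# Read it into a data frame; the first line is used as the header
sales <- read.csv(csv_path)
str(sales)
```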
Reading Spreadsheets
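read.xlsx("filename", ...)
where the filename argument defines the path of the file to be read, and the
dots "..." stand for other optional arguments. Note that read.xlsx() is not
part of base R; it is provided by add-on packages such as xlsx or openxlsx.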