Unit 1
R Environment:
R is a popular programming language used for statistical computing and graphical
presentation.
It is an integrated suite of software facilities for data manipulation, calculation and
graphical display.
It is an interpreted programming language.
It is a software environment that is widely used for statistical computing and data
analysis.
It was developed by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand, in 1993.
R is an open-source implementation of the S programming language.
The R Core Team was formed in 1997 to develop the language further.
The current version (at the time of writing) is 4.4.1, released on 14 June 2024.
Why Use R?
It is a great resource for data analysis, data visualization, data science and machine
learning.
It provides many statistical techniques (such as statistical tests, classification, clustering
and data reduction).
It is easy to draw graphs in R, like pie charts, histograms, box plots, scatter plots, etc.
It works on different platforms (Windows, Mac, Linux), i.e. it is platform independent.
It is open-source and free.
It has large community support.
It has many packages (libraries of functions) that can be used to solve different
problems.
Features of R
R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
R has an effective data handling and storage facility.
R provides operators for calculations on arrays, lists, vectors and matrices.
R provides a large, coherent and integrated collection of tools for data analysis.
R provides graphical facilities for data analysis and display, either directly on the
computer screen or on paper.
Output:
[1] 101 102 103
[1] 11 22 33
[1] 0.01 0.02 0.03
[1] 1.1 2.2 3.3
[1] 101 102 103 104 105
[1] 1 4 9
[1] 0.0000000 0.6931472 1.0986123
[1] 0.8414710 0.9092974 0.1411200
[1] 0.5403023 -0.4161468 -0.9899925
[1] 2.718282 7.389056 20.085537
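Assuming the input vector is x <- c(1, 2, 3) (inferred from the printed values), the last five lines of output above correspond to R's elementwise arithmetic functions, each applied to every element of the vector. A minimal sketch:

```r
x <- c(1, 2, 3)   # assumed input vector
x^2               # [1] 1 4 9
log(x)            # natural logarithm of each element
sin(x)            # sine of each element (radians)
cos(x)            # cosine of each element (radians)
exp(x)            # e raised to the power of each element
```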
max() and min() return the largest and smallest values in their arguments, even if they are
given several vectors. The length() function returns the number of elements in a
vector.
Ex:
a <- c(2, 1, 5, 4, 5,6)
min(a) # [1] 1
max(a) # [1] 6
length(a) # [1] 6
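To illustrate that max() and min() look at all of their arguments together, here is a small sketch; the second vector b is a hypothetical example:

```r
a <- c(2, 1, 5, 4, 5, 6)
b <- c(10, -3)          # hypothetical second vector
# max() and min() scan every argument and return a single value
max(a, b)               # [1] 10
min(a, b)               # [1] -3
range(a, b)             # [1] -3 10  (minimum and maximum in one call)
length(c(a, b))         # [1] 8
```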
Statistical measures like mean and median are essential for summarizing and
understanding the central tendency of a dataset. In R, these measures can be calculated
easily using built-in functions.
Mean: It is calculated by taking the sum of the values and dividing by the number of values in
a data series. The function mean() is used to calculate this in R.
Ex:
a <- c(2, 1, 5, 4, 5,6)
mean(a) # [1] 3.833333
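The definition can be checked by hand: the six values sum to 23, so the mean is 23 / 6. A minimal sketch comparing the formula with mean():

```r
a <- c(2, 1, 5, 4, 5, 6)
manual_mean <- sum(a) / length(a)   # 23 / 6
manual_mean                         # [1] 3.833333
all.equal(manual_mean, mean(a))     # [1] TRUE
```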
Median : The median() function in R is used to compute the median (middle value) of a numeric
data set.
Ex:
a <- c(2, 1, 5, 4, 5,6)
median(a) # [1] 4.5
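When the number of values is even, the median is the average of the two middle values of the sorted data. A sketch of that rule for the vector above; it assumes an even length, so it is only an illustration and not a general replacement for median():

```r
a <- c(2, 1, 5, 4, 5, 6)
s <- sort(a)                                     # 1 2 4 5 5 6
n <- length(s)                                   # 6, an even count
manual_median <- (s[n / 2] + s[n / 2 + 1]) / 2   # (4 + 5) / 2
manual_median                                    # [1] 4.5
```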
The parallel maximum and minimum functions pmax and pmin return a vector (of length
equal to their longest argument) that contains in each element the largest (smallest)
element in that position in any of the input vectors.
Ex:
a<-c(1,2,3,4)
b<-c(5,6)
pmax(a,b) # [1] 5 6 5 6
pmin(a,b) # [1] 1 2 3 4
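A common use of pmax() and pmin() is clipping values to a bound: the single number is recycled across the whole vector. The vector x below is hypothetical data:

```r
x <- c(-2, 5, -1, 8)   # hypothetical data
pmax(x, 0)             # [1] 0 5 0 8   negative values replaced by 0
pmin(x, 3)             # [1] -2 3 -1 3   values capped at 3
```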
The sqrt() function in R is used to compute the square root of a number or each element
in a numeric vector. The sqrt() function will return NaN for negative numbers because
the square root of a negative number is not defined in the realm of real numbers.
Syntax:
sqrt(x)
Ex:1
result <- sqrt(16)
print(result)
Output: [1] 4
Ex:2
numbers <- c(4, 9, 16, 25)
sqrt_numbers <- sqrt(numbers)
print(sqrt_numbers)
Output: [1] 2 3 4 5
Ex:3
result <- sqrt("a")
print(result)
Output:
Error in sqrt("a") : non-numeric argument to mathematical function
Execution halted
Ex:4
result <- sqrt(-9)
print(result)
Output:
Warning message:
In sqrt(-9) : NaNs produced
[1] NaN
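When the square root of a negative number is actually needed, one option is to convert the input to complex first; sqrt() then returns an imaginary result instead of NaN. A minimal sketch:

```r
sqrt(as.complex(-9))                   # [1] 0+3i
# On a vector, each negative entry gives NaN; suppressWarnings() hides the warning
suppressWarnings(sqrt(c(4, -9, 16)))   # [1] 2 NaN 4
```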
sort(x) returns a vector of the same size as x with the elements arranged in increasing
order.
Ex:
a<-c(11,2,1,23)
sort(a)
Output:
[1] 1 2 11 23
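sort() also accepts decreasing = TRUE, and the related order() function returns the positions that would put the vector in increasing order. A short sketch using the same vector:

```r
a <- c(11, 2, 1, 23)
sort(a, decreasing = TRUE)   # [1] 23 11 2 1
order(a)                     # [1] 3 2 1 4  (index of the smallest, next smallest, ...)
a[order(a)]                  # [1] 1 2 11 23  same result as sort(a)
```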
The seq() function generates regular sequences of numbers.
seq(from=1,to=10)
Output:
[1] 1 2 3 4 5 6 7 8 9 10
seq(1:5)
Output:
[1] 1 2 3 4 5
seq(-1,-10)
Output:
[1] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
seq(1,5,by=2)
Output:
[1] 1 3 5
seq(from=5, to=10)
Output:
[1] 5 6 7 8 9 10
seq(from=-5, length.out=5, by=2)
Output:
[1] -5 -3 -1 1 3
seq(-3, 2, by=.5)
Output:
[1] -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
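A few related ways of generating sequences, sketched for comparison:

```r
1:5                           # [1] 1 2 3 4 5  (colon operator, step 1)
seq(0, 1, length.out = 5)     # [1] 0.00 0.25 0.50 0.75 1.00
seq_len(4)                    # [1] 1 2 3 4
seq_along(c("x", "y", "z"))   # [1] 1 2 3  (one index per element)
```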
A related function is rep(), which can be used for replicating an object in various ways:
times repeats the whole object, while each repeats every element in turn.
a<-c(2,3)
d<-rep(a, times=2)
d
d1<-rep(a, each=2)
d1
Output:
[1] 2 3 2 3
[1] 2 2 3 3
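rep() also accepts a vector for times (one count per element) and a length.out argument, and each and times can be combined. A sketch with the same vector a:

```r
a <- c(2, 3)
rep(a, times = c(3, 1))       # [1] 2 2 2 3  (2 three times, 3 once)
rep(a, length.out = 5)        # [1] 2 3 2 3 2
rep(a, each = 2, times = 2)   # [1] 2 2 3 3 2 2 3 3
```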
Logical Vectors:
Logical vectors in R are vectors that consist of the logical values TRUE, FALSE and NA (Not
Available, used for missing values).
They are commonly used in conditional statements, subsetting, and logical operations.
Logical vectors are fundamental in R for performing tasks that involve condition checking
and data filtering.
Ways to create Logical Vectors:
1. Direct Assignment
a <- c(TRUE, FALSE, TRUE, NA)
2. Using comparison operations
Ex:1
v <- c(10, 20, 30, 40, 50)
vec <- v > 25 # TRUE where the element is greater than 25
vec
Output:
[1] FALSE FALSE TRUE TRUE TRUE
Ex:2
a<-c(4,3,7,6,1)
res<-a%%2==0
print(res)
Output:
[1] TRUE FALSE FALSE TRUE FALSE
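Since logical vectors are used for subsetting and counting, the comparison result from Ex:1 can be used directly. A sketch:

```r
v <- c(10, 20, 30, 40, 50)
keep <- v > 25
v[keep]       # [1] 30 40 50  elements where keep is TRUE
sum(keep)     # [1] 3  TRUE counts as 1, FALSE as 0
which(keep)   # [1] 3 4 5  positions of the TRUE values
any(keep)     # [1] TRUE
all(keep)     # [1] FALSE
```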
Missing Values:
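Missing values are elements whose values are not known. NA and NaN are reserved words: NA
("Not Available") indicates a missing value, and NaN ("Not a Number") indicates the result of an
undefined numerical computation such as 0/0.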
Dealing with Missing Values in R:
Missing values in R are handled with the help of some predefined functions:
1. is.na() Function for Finding Missing Values:
This function returns a logical vector of the same length as its input, with TRUE at each
position that holds NA (or NaN) and FALSE elsewhere.
Ex:
x<- c(NA, 3, 4, NA, NA, NA)
is.na(x)
Output:
[1] TRUE FALSE FALSE TRUE TRUE TRUE
2. Using the is.nan() Function
NaN values are produced by undefined numerical computations such as 0/0. The is.nan()
function checks for NaN values: it returns a logical vector with TRUE at each position that
holds NaN and FALSE elsewhere (including positions that hold NA).
Ex:
myVector <- c(NA, 100, 241, NA, 0 / 0, 101, 0 / 0)
is.nan(myVector)
Output:
[1] FALSE FALSE FALSE FALSE TRUE FALSE TRUE
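Many summary functions return NA when their input contains a missing value; they take an na.rm argument that drops missing values first. A sketch reusing the vector x from the is.na() example:

```r
x <- c(NA, 3, 4, NA, NA, NA)
mean(x)                 # [1] NA  a single NA makes the result unknown
mean(x, na.rm = TRUE)   # [1] 3.5
sum(is.na(x))           # [1] 4  number of missing values
x[!is.na(x)]            # [1] 3 4  keep only the known values
is.na(NaN)              # [1] TRUE  NaN also counts as missing
is.nan(NA)              # [1] FALSE
```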
Character vectors:
A character vector in R is a vector of type character that is used to store strings (and NA values).
Character strings are entered using either matching double (") or single (') quotes, but are
printed using double quotes (or sometimes without quotes).
Any vector whose elements are strings is a character vector, whatever characters the strings
contain (letters, digits or symbols). For example, c('ABC','abc','AbC','12AB') is a character
vector of length 4.
Ex:
a<-c('A','45C',"abc")
print(a)
res<-is.character(a)
print(res)
Output:
[1] "A" "45C" "abc"
[1] TRUE
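Some basic functions for working with character vectors, sketched on the vector a from the example above:

```r
a <- c('A', '45C', "abc")
nchar(a)                   # [1] 1 3 3  number of characters in each string
toupper(a)                 # [1] "A" "45C" "ABC"
paste(a, collapse = "-")   # [1] "A-45C-abc"
length(c('ABC', 'abc', 'AbC', '12AB'))   # [1] 4
```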
Objects, their modes and attributes:
Intrinsic attributes: mode and length
The entities R operates on are technically known as objects.
Examples are vectors of numeric (real) or complex values, vectors of logical values and
vectors of character strings.
These are known as “atomic” structures since their components are all of the same type, or
mode, namely numeric, complex, logical, character and raw.
Vectors must have their values all of the same mode.
Thus any given vector must be unambiguously either logical, numeric, complex, character or
raw.
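Because every element of an atomic vector must share one mode, c() coerces mixed inputs to a common mode. A short sketch:

```r
x <- c(1, "a", TRUE)
x                  # [1] "1" "a" "TRUE"
mode(x)            # [1] "character"
mode(c(1, TRUE))   # [1] "numeric"  TRUE is stored as 1
mode(c(1+2i, 3))   # [1] "complex"
```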
R also operates on objects called lists, which are of mode list.
These are ordered sequences of objects which individually can be of any mode.
Lists are known as “recursive” rather than atomic structures since their components can
themselves be lists in their own right.
The other recursive structures are those of mode function and expression.
Functions, both those that form part of the R system and similar user-written
functions, are also objects.
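A list can hold components of different modes, including another list, and mode(), is.recursive() and is.atomic() show the difference between recursive and atomic structures. A sketch with a hypothetical list lst:

```r
lst <- list(num = 1:3, chr = "a", inner = list(flag = TRUE))   # hypothetical list
mode(lst)            # [1] "list"
mode(lst$inner)      # [1] "list"  a list nested inside a list
is.recursive(lst)    # [1] TRUE
is.atomic(lst$num)   # [1] TRUE
mode(mean)           # [1] "function"
```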
Properties of an object: mode and length
The functions mode(object) and length(object) can be used to find out the mode and length
of any defined structure.
For example, if z is a complex vector of length 100, then mode(z) is the character string
"complex" and length(z) is 100.
Changing the length of an object:
An “empty” object may still have a mode. For example e <- numeric() makes e an empty
vector structure of mode numeric. Similarly, character() is an empty character vector. Once an
object of any size has been created, new components may be added to it simply by giving it
an index value outside its previous range. Thus e[3] <- 17 now makes e a vector of length 3
(the first two components of which are both NA).
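A sketch of mode(), length() and growing or shrinking a vector. Here complex(length.out = 100) builds a length-100 complex vector like the z described earlier, and the replacement form length(e) <- 2 (not mentioned in the text) truncates e:

```r
z <- complex(length.out = 100)   # 100 elements, each 0+0i
mode(z)          # [1] "complex"
length(z)        # [1] 100

e <- numeric()   # empty vector of mode numeric
length(e)        # [1] 0
e[3] <- 17       # assigning past the end extends e
e                # [1] NA NA 17
length(e) <- 2   # setting a shorter length drops trailing elements
e                # [1] NA NA
```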
Getting and setting attributes:
The attr() function is used to get or set the value of a specific attribute of an object.
Attributes in R are metadata that can be attached to R objects to provide additional
information.
Common examples of attributes include names, dimensions, class, and others.
attr(x, which)
attr(x, which) <- value
x: The R object from which you want to get or to which you want to set an attribute.
which: A string specifying the name of the attribute that you want to access or modify.
value: The new value to assign to the attribute.
Getting an Attribute:
To get the value of an attribute, you use the attr() function with two arguments: the object and
the name of the attribute.
mat <- matrix(1:6, nrow = 2, ncol = 3)
attr(mat, "dim") # 2 3
Setting an Attribute:
To set the value of an attribute, use the attr() function with the assignment form.
vec <- 1:6
attr(vec, "dim") <- c(2, 3)
vec
Output:
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
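Besides built-in attributes such as dim, any name can be attached with attr(), attributes() lists all of an object's attributes, and assigning NULL removes one. A sketch; the attribute name "units" is a hypothetical choice:

```r
x <- c(12, 15, 9)
attr(x, "units") <- "cm"   # "units" is a user-chosen attribute name
attr(x, "units")           # [1] "cm"
attributes(x)              # a list containing $units
attr(x, "units") <- NULL   # assigning NULL removes the attribute
attributes(x)              # NULL
```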
Class of an Object:
In R, the class of an object is an attribute that defines how the object should behave and
interact with functions and methods.
The class of an object is fundamental in R's object-oriented programming, determining
the object's type and how it is treated by generic functions.
Ex:
x <- 42
class(x) # "numeric"
y <- "Hello"
class(y) # "character"
Setting the Class of an Object:
One can set or change the class of an object using the class() function with the assignment
form:
class(x) <- "integer"
class(x) # "integer"
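Because generic functions dispatch on the class attribute, giving an object a custom class and defining a method for it changes how the object is treated. A minimal S3 sketch; the class name student and the method print.student are hypothetical:

```r
p <- list(name = "Asha", marks = 91)
class(p) <- "student"   # hypothetical class name

# print() is generic: for objects of class "student" it calls print.student()
print.student <- function(x, ...) {
  cat("Student:", x$name, "- marks:", x$marks, "\n")
  invisible(x)
}

print(p)                 # Student: Asha - marks: 91
inherits(p, "student")   # [1] TRUE
unclass(p)$marks         # [1] 91
```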
unclass():
The unclass function in R is used to remove the class attribute from an object.
This can be useful when you want to strip an object of its class-specific behavior and
revert it to a more basic form, typically a vector or list.
By doing this, the object no longer behaves according to its class methods but rather as a
basic data type.
unclass(object)
Example:
# Create a factor
f <- factor(c("apple", "banana", "cherry", "apple"))
print(f)
# Unclass the factor
uf <- unclass(f)
print(uf)
Output:
[1] apple banana cherry apple
Levels: apple banana cherry
[1] 1 2 3 1
attr(,"levels")
[1] "apple" "banana" "cherry"
Ordered and unordered factors:
Factors:
The factor is a data structure used for fields that take only a predefined, finite
number of values.
These are variables that take a limited number of different values.
These are data objects used to categorize data and store it as levels.
They can store both integer and string values, and are useful for a column that has a
limited number of unique values.
A factor is a vector object used to specify a discrete classification (grouping) of the
components of other vectors of the same length.
R provides both ordered and unordered factors.
Arguments of the factor() function:
x: The vector that needs to be converted into a factor.
levels: The set of distinct values that x may take (by default, the sorted unique values of x).
labels: An optional character vector of labels for the levels.
exclude: The values to be excluded from the levels.
ordered: A logical value that decides whether the levels are ordered.
nmax: An upper bound on the number of levels.
Ex:
x <-c("female", "male", "male", "female")
print(x)
# Converting the vector x into a factor named gender
gender <- factor(x)
print(gender)
o/p:
[1] "female" "male" "male" "female"
[1] female male male female
Levels: female male
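A sketch of the levels and labels arguments used together; the numeric codes are hypothetical data, and each code is mapped to a readable label:

```r
codes <- c(1, 2, 2, 1, 3)   # hypothetical coded responses
f <- factor(codes, levels = c(1, 2, 3), labels = c("low", "mid", "high"))
f               # [1] low mid mid low high
                # Levels: low mid high
as.integer(f)   # [1] 1 2 2 1 3  internal integer codes
```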
To display levels:
gender <- factor(c("male", "female", "male", "female"))
levels(gender)
Output :
[1] "female" "male"
The levels argument of factor():
Ex:
gender <- factor(c("female", "male", "male", "female"), levels = c("female", "transgender",
"male"))
gender
Output:
[1] female male male female
Levels: female transgender male
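Unused levels such as transgender above stay in the factor: table() still counts them, and droplevels() removes them. A sketch:

```r
gender <- factor(c("female", "male", "male", "female"),
                 levels = c("female", "transgender", "male"))
table(gender)                # counts per level: female 2, transgender 0, male 2
nlevels(gender)              # [1] 3
levels(droplevels(gender))   # [1] "female" "male"
```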
Output:
A B C
3 12 40
A B C
1.5 4.0 8.0
Ordered factors:
An ordered factor is a special type of factor where the levels are ordered.
This ordering allows for comparisons between levels.
For example, in an ordered factor with levels "low", "medium" and "high", "medium"
is greater than "low".
The ordered() function, or factor() with ordered = TRUE, creates such ordered factors.
Ex:
size <- c("small", "large", "large", "small",
"medium", "large", "medium", "medium")
size_factor <- factor(size)
print(size_factor)
# ordering the levels
ordered.size <- factor(size, levels = c(
"small", "medium", "large"), ordered = TRUE)
print(ordered.size)
O/P
[1] small large large small medium large medium medium
Levels: large medium small
[1] small large large small medium large medium medium
Levels: small < medium < large
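Because the levels are ordered, comparison operators and functions such as max() work on the factor. A sketch that uses ordered(), which is equivalent to factor(..., ordered = TRUE):

```r
size <- c("small", "large", "large", "small",
          "medium", "large", "medium", "medium")
sz <- ordered(size, levels = c("small", "medium", "large"))
sz > "small"   # [1] FALSE TRUE TRUE FALSE TRUE TRUE TRUE TRUE
max(sz)        # [1] large  (Levels: small < medium < large)
table(sz)      # counts per level: small 2, medium 3, large 3
```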
*****