SC&RP - Unit 1
UNIT - 1
R Programming
R is a software environment used for statistical analysis and graphical representation of data.
R allows us to do modular programming using functions.
What is R Programming
"R is an interpreted computer programming language which was created by Ross Ihaka and
Robert Gentleman at the University of Auckland, New Zealand." The R Development Core
Team currently develops R. It is also a software environment used for statistical analysis,
graphical representation, reporting, and data modeling. R is an implementation
of the S programming language combined with lexical scoping semantics.
R not only allows us to do branching and looping but also allows us to do modular programming
using functions. R can integrate with procedures written in C, C++, .NET, Python, and
FORTRAN to improve efficiency.
Today, R is one of the most important tools used by researchers, data analysts,
statisticians, and marketers for retrieving, cleaning, analyzing, visualizing, and presenting data.
History of R Programming
The history of R goes back about 20 to 30 years. R was developed by Ross Ihaka and Robert
Gentleman at the University of Auckland, New Zealand, and the R Development Core Team
currently develops it. The name of the language is taken from the first letter of the first names
of both developers. The project was conceived in 1992. The initial version was released in 1995,
and the first stable version (1.0.0) was released in 2000.
The following table shows the version, release date, and description of some R releases:
Version  Release date  Description
0.49     1997-04-23    First time R's source was released, and CRAN (Comprehensive R
                       Archive Network) was started.
2.13     2011-04-14    Added a function that rapidly converts code to byte code.
3.5      2018-04-23    Added new features such as a compact internal representation of
                       integer sequences and a new serialization format.
Features of R programming
R is a domain-specific programming language which is aimed at data analysis. It has some
unique features which make it very powerful. The most important is arguably its notion of
vectors. Vectors allow us to perform a complex operation on a set of values in a single
command. R programming has the following features:
1. It is a simple and effective programming language which has been well developed.
2. It is data analysis software.
3. It is a well-designed, easy, and effective language which supports user-defined functions, loops,
conditionals, and various I/O facilities.
4. It has a consistent and incorporated set of tools which are used for data analysis.
5. For different types of calculation on arrays, lists and vectors, R contains a suite of operators.
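The point about vectors can be illustrated with a minimal sketch. The variable names and values below are hypothetical and chosen only for illustration; the sketch shows one command acting on every element of a vector without an explicit loop.

```r
# Hypothetical example values, not taken from the notes
prices <- c(10, 20, 30)

# One command multiplies every element by 1.1 (no loop required)
with_tax <- prices * 1.1

# sum() reduces the whole vector to a single number
total <- sum(with_tax)

print(with_tax)
print(total)
```

The same element-by-element behaviour applies to the other arithmetic operators in R.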
An important task in data science is the way we deal with the data: importing, cleaning, feature
engineering, and feature selection. It should be our primary focus. A data scientist's job is to
understand the data, manipulate it, and expose the best approach. Many machine learning algorithms
can be implemented with R. Packages for Keras and TensorFlow allow us to create high-end machine
learning models. R also has a package for XGBoost, an algorithm that is popular in Kaggle
competitions.
R communicates with other languages and can call Python, Java, and C++. The big data world
is also accessible to R. We can connect R with big data platforms such as Spark or Hadoop.
In brief, R is a great tool to investigate and explore data. Elaborate analyses such as
clustering, correlation, and data reduction can be done with R.
Comparison of R and Python

R: ... Development Core Team currently develops R. R is also a software environment which is used for statistical analysis, graphical representation, reporting, and data modeling.
Python: ... released in 1991. Python has a very simple and clean code syntax. It emphasizes code readability, and debugging is also simpler and easier in Python.

Specialties for data science
R: R packages have advanced techniques which are very useful for statistical work. The CRAN task views list many useful R packages. These packages cover everything from Psychometrics to Genetics to Finance.
Python: For finding outliers in a data set, both R and Python are equally good. But for developing a web service that allows people to upload datasets and find outliers, Python is better.

Functionalities
R: For data analysis, R has inbuilt functionalities.
Python: Most of the data analysis functionalities are not inbuilt. They are available through packages like NumPy and pandas.

Key domains of application
R: Data visualization is a key aspect of analysis. R packages such as ggplot2, ggvis, lattice, etc. make data visualization easier.
Python: Python is better for deep learning because Python packages such as Caffe, Keras, OpenNN, etc. allow the development of deep neural networks in a very simple way.

Availability of packages
R: There are hundreds of packages and ways to accomplish the necessary data science tasks.
Python: Python has a few main packages, such as viz, scikit-learn, and pandas, for data analysis and machine learning.
Applications of R
R is used by many organizations in real-world applications. Some of the popular users are as
follows:
o Facebook
o Google
o Twitter
o HRDAG
o Sunlight Foundation
o RealClimate
o NDAA
o XBOX ONE
o ANZ
o FDA
Advantages
1) Open Source
An open-source language is a language on which we can work without any need for a license or a
fee. R is an open-source language. We can contribute to the development of R by optimizing our
packages, developing new ones, and resolving issues.
2) Platform Independent
R is a platform-independent language or cross-platform programming language which means its
code can run on all operating systems. R enables programmers to develop software for several
competing platforms by writing a program only once. R can run quite easily on Windows, Linux,
and Mac.
7) Statistics
R is mainly known as the language of statistics. This is the main reason why R is more dominant than
other programming languages for the development of statistical tools.
8) Continuously Growing
R is a constantly evolving programming language: it changes and develops over time, and
updates are released whenever new features are added.
Disadvantages
1) Data Handling
In R, objects are stored in physical memory, in contrast with other programming languages
like Python. R utilizes more memory than Python. It requires the entire data set to be in one
place, in memory, so it is not an ideal option when we deal with Big Data.
2) Basic Security
R lacks basic security features, which are an essential part of most programming languages such as Python.
Because of this, there are many restrictions with R; for example, it is difficult to embed in a web application.
3) Complicated Language
R is a very complicated language, and it has a steep learning curve. The people who don't have
prior knowledge or programming experience may find it difficult to learn R.
4) Weak Origin
A major disadvantage of R is that it does not have built-in support for dynamic or 3D graphics. The reason
behind this is its origin: it shares its origin with a much older programming language, S.
5) Lesser Speed
The R programming language is much slower than other programming languages such as MATLAB
and Python. In comparison to other programming languages, R packages are much slower.
In R, algorithms are spread across different packages. Programmers who have no prior
knowledge of packages may find it difficult to implement algorithms.
Syntax of R Programming
R is a very popular programming language which is broadly used in data analysis.
The way in which we write its code is quite simple. "Hello World!" is the basic program for
all languages, and we will now understand the syntax of R programming with a "Hello World!"
program. We can write our code either at the command prompt or in an R script file.
R Command Prompt
To work on the R command prompt, the R environment must already be installed on our system.
After installation, we can start the R command prompt by typing R in our Windows command
prompt. When we press Enter after typing R, it launches the interpreter, and we get a prompt at
which we can write our program.
A "Hello World!" program needs two statements. The first statement defines a string variable named
string and assigns it the value "Hello World!". The next statement, print(), prints the value stored
in the variable string.
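The two statements described above can be sketched as follows. This is a minimal reconstruction from the description, typed at the R prompt.

```r
# Assign the text to a variable named string, then print it
string <- "Hello World!"
print(string)
```

At the prompt this prints [1] "Hello World!", where [1] is the index of the first element shown on that line.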
R Script File
The R script file is another way in which we can write our programs; we then execute those
scripts at our command prompt with the help of the R interpreter known as Rscript. We make a text
file, write our code in it, and save the file with the .R extension, for example:
Demo.R
To execute this file, we run Rscript Demo.R at the command prompt. The process is the same in
Windows and other operating systems.
Comments
In R programming, comments are programmer-readable explanations in the source code of an
R program. The purpose of adding comments is to make the source code easier to understand.
Comments are ignored by the interpreter.
R has only single-line comments, which start with the # character; it does not support multi-line
comments. If we want a multi-line comment, we can put the text inside a block that is never
executed, such as if (FALSE) { ... }.
Single-line comment
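A minimal sketch of both approaches described above. The variable x is hypothetical, and the text inside the if (FALSE) block is written as a string because the block must still be valid R even though it is never evaluated.

```r
# This is a single-line comment; R ignores everything after the # sign
x <- 5  # a comment can also follow code on the same line

# Workaround for a multi-line comment: a block that never runs
if (FALSE) {
  "These lines are parsed by R
   but they are never evaluated."
}

print(x)
```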
Data Types in R Programming
In R, there are several data types such as integer, character (string), etc. The operating system
allocates memory based on the data type of the variable and decides what can be stored in the
reserved memory.
There are the following data types which are used in R programming:
Data type  Example             Description
Logical    TRUE, FALSE         A special data type for data with only two possible values,
                               which can be construed as true/false.
Integer    3L, 66L, 2346L      Here, L tells R to store the value as an integer.
Complex    z = 1+2i, t = 7+3i  A complex value has a real part and an imaginary part; the
                               imaginary part is written with the suffix i.
variable_logical <- TRUE
cat(variable_logical, "\n")
cat("The data type of variable_logical is ", class(variable_logical), "\n\n")
When we execute the above program, it prints the value TRUE and then reports that the data type
of variable_logical is logical.
R provides the following data structures:
1. Atomic vector
2. List
3. Array
4. Matrices
5. Data Frame
6. Factors
Vectors
A vector is the basic data structure in R, or we can say vectors are the most basic R data objects.
There are six types of atomic vectors: logical, integer, double, complex, character, and raw. "A
vector is a collection of elements which is most commonly of mode character, integer, logical
or numeric". Vectors can be created using the c() function.
nv <- c(1, 2, 3, 4, 5)
Vectors are classified into two kinds:
1. Atomic vector
2. Lists
List
In R, the list is a container. Unlike an atomic vector, the list is not restricted to a single mode.
A list can contain a mixture of data types. The list is also known as a generic vector because an
element of the list can be any type of R object. "A list is a special type of vector in which each
element can be a different type."
We can create a list with the help of list() or as.list(). We can use vector() to create an empty
list of a required length.
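The three ways of creating a list mentioned above can be sketched as follows. The element names and values are hypothetical.

```r
# list() builds a list whose elements may have different types
person <- list(name = "Asha", age = 30L, scores = c(88.5, 92))

# as.list() converts a vector into a list, one element per entry
from_vector <- as.list(1:3)

# vector("list", n) creates an empty list of length n (each element is NULL)
empty <- vector("list", 2)

print(class(person$name))    # "character"
print(class(person$scores))  # "numeric"
print(length(from_vector))   # 3
print(length(empty))         # 2
```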
Arrays
Arrays are another type of data object; they can store data in more than two dimensions.
"An array is a collection of a similar data type with contiguous memory
allocation." For example, if we create an array of dimension (2, 3, 4), it creates four rectangular
matrices of two rows and three columns each.
In R, an array is created with the help of the array() function. This function takes a vector as input
and uses the values in the dim parameter to create an array.
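A minimal sketch of the (2, 3, 4) array described above, filled with the numbers 1 to 24 as an illustrative choice:

```r
# 24 values arranged as four 2 x 3 matrices
arr <- array(1:24, dim = c(2, 3, 4))

print(dim(arr))      # 2 3 4
print(arr[, , 1])    # the first 2 x 3 matrix (values 1 to 6, filled by column)
print(arr[2, 3, 4])  # row 2, column 3 of the fourth matrix: 24
```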
Matrices
A matrix is an R object in which the elements are arranged in a two-dimensional rectangular layout.
A matrix contains elements of the same atomic type. For mathematical calculations, we
use a matrix containing numeric elements. A matrix is created with the help of the matrix()
function in R.
Syntax
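The general form of the call is matrix(data, nrow, ncol, byrow, dimnames). The sketch below uses hypothetical values and row and column names to show each argument.

```r
# data: the values; nrow/ncol: the shape; byrow = TRUE fills row by row;
# dimnames: a list holding the row names and the column names
m <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))

print(m)
print(m["r2", "c1"])  # 4
print(dim(m))         # 2 3
```

With byrow = FALSE, which is the default, the same values would be filled column by column instead.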
Data Frames
A data frame is a two-dimensional array-like structure, or we can say it is a table in which each
column contains the values of one variable, and each row contains one set of values from each column.
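A minimal sketch of a data frame, using hypothetical student data. Each column is one variable and each row is one observation.

```r
students <- data.frame(
  name  = c("Asha", "Ravi", "Meena"),
  marks = c(82, 75, 91),
  stringsAsFactors = FALSE   # keep the names as character strings
)

print(students)
print(nrow(students))                   # 3
print(ncol(students))                   # 2
print(students[students$marks > 80, ])  # rows where marks exceed 80
```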
Factors
Factors are data objects that are used to categorize data and store it as levels. Factors can
store both strings and integers. They are especially useful for columns that have a limited number
of unique values, and they are widely used in data analysis for statistical modeling.
Factors are created with the help of the factor() function, which takes a vector as its input parameter.
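A minimal sketch of factor(), using a hypothetical vector of sizes. It shows the levels that are detected and the integer codes stored for each element.

```r
sizes <- c("small", "large", "medium", "small", "large")
f <- factor(sizes)

print(levels(f))      # "large" "medium" "small" (sorted alphabetically)
print(as.integer(f))  # 3 1 2 3 1 (position of each value among the levels)
print(table(f))       # count of each level
```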
Variables in R Programming
Variables are used to store the information to be manipulated and referenced in the R program.
An R variable can store an atomic vector, a group of atomic vectors, or a combination of many R
objects.
A language like C++ is statically typed, but R is dynamically typed, which means it checks the
data type when the statement is run. A valid variable name contains letters, numbers, dots, and
underscore characters. A variable name should start with a letter, or with a dot that is not
followed by a number.
Variable name       Validity  Reason
var_name, var.name  Valid     A variable name can start with a dot, but the dot should not be
                              followed by a number; in that case the name is invalid.
var_name%           Invalid   In R, we can't use any special character in the variable name
                              except dot and underscore.
.2var_name          Invalid   A variable name cannot start with a dot which is followed by a digit.
var_name2           Valid     The name contains letters, a number, and an underscore, and
                              starts with a letter.
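The rules in the table can be checked programmatically. The sketch below relies on the base function make.names(), which rewrites a string into a syntactically valid name; treating "unchanged by make.names()" as "valid" is a common check and is the assumption made here.

```r
candidates <- c("var_name", "var.name", "var_name%", ".2var_name", "var_name2")

# A name is taken to be valid when make.names() leaves it unchanged
is_valid <- make.names(candidates) == candidates

print(data.frame(name = candidates, valid = is_valid))
```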
Assignment of variable
In R programming, there are three operators which we can use to assign values to a variable:
the leftward (<-), rightward (->), and equals (=) operators.
There are two functions which are used to print the value of a variable, i.e., print() and cat(). The
cat() function combines multiple values into one continuous print output.
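variable.1 = 124                     # equals operator (example value)
variable.2 <- "Learn R Programming"  # leftward operator (example value)
133L -> variable.3                   # rightward operator (example value)
print(variable.1)
cat("variable.1 is ", variable.1, "\n")
cat("variable.2 is ", variable.2, "\n")
cat("variable.3 is ", variable.3, "\n")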
When we execute the above code in our R command prompt, it prints the value of each variable.
In R, a variable is not declared with any data type. It gets the data type from the R object which is
assigned to the variable.
We can check the data type of the variable with the help of the class() function. Let's see an
example:
variable_y <- 124
cat("The data type of variable_y is ", class(variable_y), "\n")
variable_y <- 133L
cat(" Next the data type of variable_y becomes ", class(variable_y), "\n")
When we execute the above code in our R command prompt, it reports the data type numeric after
the first assignment and integer after the second.
Keywords in R Programming
In programming, a keyword is a word which is reserved by the language because it has a special
meaning. A keyword can be a command or a parameter. Like C, C++, and Java, R has a set
of keywords. A keyword can't be used as a variable name. Keywords are also called
"reserved names."
if else repeat
NaN NA NA_integer_
1) if
The if statement consists of a Boolean expression which is followed by one or more statements. In
R, the if statement is the simplest conditional statement; it decides whether a block of
statements will be executed or not.
Example:
a <- 11
if (a < 15)
  print("I am lesser than 15")
Output:
[1] "I am lesser than 15"
2) else
The R else statement is associated with the if statement. The else block is executed only when
the if statement's condition is false. Let us see an example to make it clear:
Example:
a <- 22
if (a < 20) {
  cat("I am lesser than 20")
} else {
  cat("I am larger than 20")
}
Output:
I am larger than 20
3) repeat
The repeat keyword is used to iterate over a block of code multiple times. In R, repeat
is a loop, and this loop statement has no condition to exit from the loop. To exit the
loop, we use the break statement.
Example:
x <- 1
repeat {
  cat(x)
  x = x + 1
  if (x == 6) {
    break
  }
}
Output:
12345
4) while
The while keyword is used as a loop. The while loop is executed as long as the given condition is
true. It can also be used to make an infinite loop.
Example:
a <- 20
while (a != 0) {
  cat(a)
  a = a - 2
}
Output:
2018161412108642
5) function
A function is an object in R programming. The keyword function is used to create a user-defined
function in R. R also has some pre-defined functions, such as seq, mean, and sum.
Example:
new.function <- function(n) {
  for (i in 1:n) {
    a <- i^2
    print(a)
  }
}
new.function(6)
Output:
[1] 1
[1] 4
[1] 9
[1] 16
[1] 25
[1] 36
6) for
The for keyword is used for looping or iterating over a sequence such as a vector or a list.
With the help of a for loop, we can execute a set of statements once for each item in the
sequence.
Example:
v <- LETTERS[1:4]
for (i in v) {
  print(i)
}
Output:
[1] "A"
[1] "B"
[1] "C"
[1] "D"
7) next
The next keyword skips the current iteration of a loop without terminating it. When R encounters
next, it skips further evaluation and starts the next iteration of the loop.
Example:
v <- LETTERS[1:6]
for (i in v) {
  if (i == "D") {
    next
  }
  print(i)
}
Output:
[1] "A"
[1] "B"
[1] "C"
[1] "E"
[1] "F"
8) break
The break keyword is used to terminate the loop, usually when a condition becomes true. Control
of the program then passes to the first statement after the loop.
Example:
n <- 1
while (n < 10) {
  if (n == 3)
    break
  n = n + 1
  cat(n, "\n")
}
cat("End of the program")
Output:
2
3
End of the program
9) TRUE/FALSE
The TRUE and FALSE keywords are used to represent a Boolean true and a Boolean false. A
condition that holds evaluates to TRUE; otherwise it evaluates to FALSE.
10) NULL
In R, NULL represents the null object. NULL is used for an empty or undefined value, and it has
length zero. It is different from NA, which marks a missing value inside a vector.
Example:
as.null(list(a = 1, b = "c"))
Output:
NULL
11) Inf and NaN
Inf and -Inf are positive and negative infinity. NaN stands for 'Not a Number.' NaN applies to
numeric values and to the real and imaginary parts of complex values, but it does not apply to the
values of integer vectors.
Usage
is.finite(x)
is.infinite(x)
is.nan(x)

Inf
NaN
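A minimal sketch of the three test functions listed under Usage, applied to a hypothetical vector that mixes ordinary, infinite, and NaN values:

```r
x <- c(1, Inf, -Inf, NaN, 0 / 0, 1 / 0)   # 0/0 gives NaN, 1/0 gives Inf

print(is.finite(x))    # TRUE  FALSE FALSE FALSE FALSE FALSE
print(is.infinite(x))  # FALSE TRUE  TRUE  FALSE FALSE TRUE
print(is.nan(x))       # FALSE FALSE FALSE TRUE  TRUE  FALSE
```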
12) NA
NA is a logical constant of length 1 that contains a missing value indicator. It can be coerced to
any other vector type except raw. There are other constants of this kind as well, such as
NA_integer_, NA_real_, NA_complex_, and NA_character_. These constants belong to the other
atomic vector types which support missing values.
Usage
NA
is.na(x)
anyNA(x, recursive = FALSE)

## S3 method for class 'data.frame'
is.na(x)

is.na(x) <- value
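The calls listed under Usage can be sketched on a hypothetical vector with one missing value. The na.rm argument of mean() is not listed above and is included only to show a typical way of skipping NA.

```r
marks <- c(82, NA, 91)

print(is.na(marks))               # FALSE TRUE FALSE
print(anyNA(marks))               # TRUE
print(mean(marks))                # NA, because one value is missing
print(mean(marks, na.rm = TRUE))  # 86.5

# Replacement form: mark element 1 as missing
is.na(marks) <- 1
print(marks)                      # NA NA 91

print(class(NA))                  # "logical"
print(class(NA_integer_))         # "integer"
```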
Operators in R
In computer programming, an operator is a symbol which represents an action: it tells the
interpreter to perform specific logical or mathematical manipulations. R programming is very rich
in built-in operators.
In R programming, there are different types of operators, and each operator performs a different
task. For data manipulation, there are also some advanced operators such as model formula and list
indexing.
1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators
Arithmetic Operators
Arithmetic operators are the symbols which are used to represent arithmetic operations. The
operators act on each element of the vector. R supports various arithmetic operators, including
the following.
4. /    Divides the first vector by the second, element by element.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a / b)
It will give us the following output:
[1] 0.1818182 0.6600000 1.3333333
6. %/%  Gives the integer quotient of dividing the first vector by the second.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a %/% b)
7. ^    Raises the first vector to the exponent of the second vector.
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)
print(a ^ b)
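Only some of the arithmetic operators are listed above. As a sketch, the remaining standard ones (+, -, * and the remainder operator %%) can be tried on the same vectors a and b; %/% is repeated to show its result.

```r
a <- c(2, 3.3, 4)
b <- c(11, 5, 3)

print(a + b)    # element-wise sum
print(a - b)    # element-wise difference
print(a * b)    # element-wise product
print(a %% b)   # remainder of a divided by b
print(a %/% b)  # integer quotient: 0 0 1
```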
Relational Operators
A relational operator is a symbol which defines some kind of relation between two entities. These
include numerical equalities and inequalities. A relational operator compares each element of the
first vector with the corresponding element of the second vector. The result is a logical vector
with one Boolean value for each pair of elements. The following relational operators are
supported by R:
1. >   Returns TRUE for each element of the first vector that is greater than the
corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 4, 6)
print(a > b)
It will give us the following output:
[1] FALSE FALSE FALSE
2. <   Returns TRUE for each element of the first vector that is less than the
corresponding element of the second vector.
a <- c(1, 9, 5)
b <- c(2, 4, 6)
print(a < b)
It will give us the following output:
[1] TRUE FALSE TRUE
3. <=  Returns TRUE for each element of the first vector that is less than or equal to
the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a <= b)
It will give us the following output:
[1] TRUE TRUE TRUE
4. >=  Returns TRUE for each element of the first vector that is greater than or equal
to the corresponding element of the second vector.
a <- c(1, 3, 5)
b <- c(2, 3, 6)
print(a >= b)
It will give us the following output:
[1] FALSE TRUE FALSE
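The equality operators == and != are not shown in the list above. A minimal sketch of them on the same kind of vectors:

```r
a <- c(1, 3, 5)
b <- c(2, 3, 6)

print(a == b)  # FALSE TRUE FALSE (only the second pair is equal)
print(a != b)  # TRUE FALSE TRUE
```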
Logical Operators
The logical operators allow a program to make a decision on the basis of multiple conditions. In
the program, each operand is considered as a condition which can be evaluated to a false or true
value. The values of the conditions determine the overall value of op1 operator op2.
Logical operators are applicable to vectors whose type is logical, numeric, or complex. Zero is
treated as FALSE and every non-zero number as TRUE.
The operators &, | and ! work on each element of a vector; && and || work on single values.
1. &   The element-wise Logical AND operator. It compares each element of the first vector
with the corresponding element of the second and returns TRUE where both elements are TRUE.
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a & b)
It will give us the following output:
[1] TRUE FALSE TRUE TRUE
2. |   The element-wise Logical OR operator. It compares each element of the first vector
with the corresponding element of the second and returns TRUE where at least one of them is TRUE.
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a | b)
It will give us the following output:
[1] TRUE TRUE TRUE TRUE
3. !   The Logical NOT operator. It takes each element of the vector and gives the opposite
logical value as a result.
a <- c(3, 0, TRUE, 2+2i)
print(!a)
It will give us the following output:
[1] FALSE TRUE FALSE FALSE
4. &&  Takes a single value on each side and gives TRUE only if both are TRUE. Since R 4.3.0,
operands with more than one element are an error, so the first elements are selected explicitly.
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a[1] && b[1])
5. ||  Takes a single value on each side and gives TRUE if at least one of them is TRUE.
a <- c(3, 0, TRUE, 2+2i)
b <- c(2, 4, TRUE, 2+3i)
print(a[1] || b[1])
Assignment Operators
An assignment operator is used to assign a new value to a variable. In R, these operators are used
to assign values to vectors. There are the following types of assignment operators:
1. <- or = or <<-   These operators are known as left assignment operators. <<- assigns to a
variable in an enclosing environment rather than in the local one.
a <- c(3, 0, TRUE, 2+2i)
b <<- c(2, 4, TRUE, 2+3i)
d = c(1, 2, TRUE, 2+3i)
print(a)
print(b)
print(d)
2. -> or ->>   These operators are known as right assignment operators.
c(3, 0, TRUE, 2+2i) -> a
c(2, 4, TRUE, 2+3i) ->> b
print(a)
print(b)
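At the top level, <- and <<- behave the same, so the difference only shows inside a function. The sketch below uses hypothetical names (counter, bump_local, bump_global) to illustrate it.

```r
counter <- 0

bump_local <- function() {
  counter <- counter + 1    # creates a local copy; the outer counter is unchanged
  invisible(counter)
}

bump_global <- function() {
  counter <<- counter + 1   # modifies counter in the enclosing environment
  invisible(counter)
}

bump_local()
print(counter)   # 0
bump_global()
print(counter)   # 1
```

Because <<- changes state outside the function, it is usually used sparingly.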
Miscellaneous Operators
Miscellaneous operators are used for a special and specific purpose. These operators are not used
for general mathematical or logical computation. There are the following miscellaneous operators
which are supported in R
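The list of operators does not appear in the text above. The operators usually grouped under this heading are the colon operator (:), %in%, and %*%; that grouping is an assumption here, and the sketch below simply shows what each of the three does.

```r
v <- 2:6                    # colon operator: the sequence 2, 3, 4, 5, 6
print(v)

print(4 %in% v)             # TRUE: 4 is an element of v
print(10 %in% v)            # FALSE: 10 is not

m <- matrix(1:4, nrow = 2)  # columns are (1, 2) and (3, 4)
print(m %*% m)              # matrix product, not element-wise multiplication
```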
R Vector
A vector is a basic data structure which plays an important role in R programming.
In R, a sequence of elements which share the same data type is known as a vector. A vector supports
the logical, integer, double, character, complex, or raw data type. The elements contained
in a vector are known as the components of the vector. We can check the type of a vector with the
help of the typeof() function.
The length is an important property of a vector. A vector's length is the number of elements
in the vector, and it is obtained with the help of the length() function.
Vectors are classified into two kinds, i.e., atomic vectors and lists. They have three common
properties: type (typeof()), length (length()), and attributes (attributes()).
There is only one difference between atomic vectors and lists. In an atomic vector, all the elements
are of the same type, but in the list, the elements are of different data types. In this section, we will
discuss only the atomic vectors. We will discuss lists briefly in the next topic.
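The three common properties of a vector can be inspected with typeof(), length(), and attributes(). A minimal sketch on a hypothetical named numeric vector:

```r
v <- c(a = 1.5, b = 2.5, c = 3.5)

print(typeof(v))      # "double"
print(length(v))      # 3
print(attributes(v))  # a list with one entry, names: "a" "b" "c"
```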
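We can create a vector with the help of the colon operator. The syntax of the colon operator is
as follows:
z <- x:y
Example:
a <- 4:-10
a
Output
[1] 4 3 2 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9 -10
We can also create a vector with the seq() function, using by to set the step or length.out to set
the number of elements.
Example:
seq_vec <- seq(1, 4, by = 0.5)
seq_vec
class(seq_vec)
Output
[1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0
[1] "numeric"
Example:
seq_vec <- seq(1, 4, length.out = 6)
seq_vec
class(seq_vec)
Output
[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] "numeric"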
Atomic vectors in R
In R, there are four commonly used types of atomic vectors. Atomic vectors play an important role
in Data Science. Atomic vectors are created with the help of the c() function. These atomic vectors
are as follows:
Numeric vector
Decimal values are known as the numeric data type in R. If we assign a decimal value to a
variable d, then d will be of numeric type. A vector which contains numeric
elements is known as a numeric vector.
Example:
d <- 45.5
num_vec <- c(10.1, 10.2, 33.2)
d
num_vec
class(d)
class(num_vec)
Output
[1] 45.5
[1] 10.1 10.2 33.2
[1] "numeric"
[1] "numeric"
Integer vector
A numeric value without a fractional part can be stored as integer data. R stores integers as
32-bit (4-byte) values. There are two ways to assign an integer value to a variable, i.e., by using
the as.integer() function or by appending L to the value.
Example:
d <- as.integer(5)
e <- 5L
int_vec <- c(1, 2, 3, 4, 5)
int_vec <- as.integer(int_vec)
int_vec1 <- c(1L, 2L, 3L, 4L, 5L)
class(d)
class(e)
class(int_vec)
class(int_vec1)
Output
[1] "integer"
[1] "integer"
[1] "integer"
[1] "integer"
Character vector
Character data holds text strings. In R, there are two different ways to create a
character value, i.e., using the as.character() function or typing a string between double
quotes ("") or single quotes ('').
Example:
d <- 'shubham'
e <- "Arpita"
f <- 65
f <- as.character(f)
d
e
f
char_vec <- c(1, 2, 3, 4, 5)
char_vec <- as.character(char_vec)
char_vec1 <- c("shubham", "arpita", "nishka", "vaishali")
char_vec
char_vec1
class(d)
class(e)
class(f)
class(char_vec)
class(char_vec1)
Output
[1] "shubham"
[1] "Arpita"
[1] "65"
[1] "1" "2" "3" "4" "5"
[1] "shubham" "arpita" "nishka" "vaishali"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
[1] "character"
Logical vector
The logical data type has only two values, i.e., TRUE or FALSE. These values depend on whether
a condition is satisfied. A vector which contains Boolean values is known as a logical vector.
Example:
d <- as.integer(5)
e <- as.integer(6)
f <- as.integer(7)
g <- d > e
h <- e < f
g
h
log_vec <- c(d < e, d < f, e < d, e < f, f < d, f < e)
log_vec
class(g)
class(h)
class(log_vec)
Output
[1] FALSE
[1] TRUE
[1] TRUE TRUE FALSE TRUE FALSE FALSE
[1] "logical"
[1] "logical"
[1] "logical"
Repeating Values:
You can create a vector by repeating values using the rep() function.
Ex:
# Creates a vector with five 1s
r<-rep(1, times=5)
r
Output: [1] 1 1 1 1 1
Vector Function:
Vector Length
To find out how many items a vector has, use the length() function.
Ex:
fruits <-c("banana", "apple", "orange")
length(fruits)
Output: [1] 3
Sort a Vector:
To sort items in a vector alphabetically or numerically, use the sort() function:
Ex:
fruits <-c("banana", "apple", "orange")
n<-c(5, 6, 1, 8)
sort(fruits)
sort(n)
Output:
[1] "apple" "banana" "orange"
[1] 1 5 6 8
Change an item:
To change the value of a specific item, refer to its index number.
Ex:
fruits <-c("banana", "apple", "orange")
fruits[2]<-"Mango"
fruits
Output:
[1] "banana" "Mango" "orange"
Example:
seq_vec<-seq(1,4,length.out=6)
seq_vec
seq_vec[2]
Output
[1] 1.0 1.6 2.2 2.8 3.4 4.0
[1] 1.6
Example:
char_vec<-c("shubham"=22,"arpita"=23,"vaishali"=25)
char_vec
char_vec["arpita"]
Output
 shubham   arpita vaishali
      22       23       25
arpita
    23
Example:
a<-c(1,2,3,4,5,6)
a[c(TRUE,FALSE,TRUE,TRUE,FALSE,TRUE)]
Output
[1] 1 3 4 6
Vector Operation
In R, various operations can be performed on vectors. We can add, subtract, multiply, or divide two or more vectors. These operations are the basis of data manipulation in R. The following types of operation are performed on vectors.
1) Combining vectors
The c() function is used not only to create a vector but also to combine vectors. Combining two or more vectors forms a new vector which contains all the elements of each vector. Because a vector holds a single type, combining a numeric vector with a character vector coerces the numbers to character strings. Let us see an example of how the c() function combines vectors.
Example:
p<-c(1,2,4,5,7,8)
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
r<-c(p,q)
r
Output
[1] "1" "2" "4" "5" "7" "8" "shubham" "arpita" "nishka" "gunjan" "vaishali" "sumit"
2) Arithmetic operations
We can perform all the arithmetic operations on vectors. The arithmetic operations are performed member-by-member. We can add, subtract, multiply, or divide two vectors, or take the remainder with %%. Let us see an example of how arithmetic operations are performed on vectors.
Example:
a<-c(1,3,5,7)
b<-c(2,4,6,8)
a+b
a-b
a*b
a/b
a%%b
Output
[1] 3 7 11 15
[1] -1 -1 -1 -1
[1] 2 12 30 56
[1] 0.5000000 0.7500000 0.8333333 0.8750000
[1] 1 3 5 7
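3) Logical Index
A logical index vector selects the elements whose positions hold TRUE.
Example:
a<-c("Shubham","Arpita","Nishka","Vaishali","Sumit","Gunjan")
b<-c(TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)
a[b]
Output
[1] "Shubham" "Nishka" "Vaishali"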
4) Numeric Index
In R, we specify the index between square brackets [ ] to select an element by position; positions start at 1. If the index is negative, R returns all the values except the one at that position. For example, q[-3] drops the third element and returns the rest. An index beyond the end of the vector returns NA.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[2]
q[-4]
q[15]
Output
[1] "arpita"
[1] "shubham" "arpita" "nishka" "vaishali" "sumit"
[1] NA
5) Duplicate Index
An index vector allows duplicate values, which means we can access one element twice in one operation. Let us see an example of how a duplicate index works.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,4,4,3)]
Output
[1] "arpita" "gunjan" "gunjan" "nishka"
6) Range Indexes
A range index is used to slice a vector to form a new vector. For slicing, we use the colon (:) operator. Range indexes are very helpful when working with a large vector. Let us see an example of how slicing is done with the colon operator to form a new vector.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
b<-q[2:5]
b
Output
[1] "arpita" "nishka" "gunjan" "vaishali"
7) Out-of-order Indexes
In R, the index vector can be out of order. Below is an example of a vector slice in which the order of the first and second values is reversed.
Example:
q<-c("shubham","arpita","nishka","gunjan","vaishali","sumit")
q[c(2,1,3,4,5,6)]
Output
[1] "arpita" "shubham" "nishka" "gunjan" "vaishali" "sumit"
8) Named vector members
We first create a vector of characters:
z=c("TensorFlow","PyTorch")
z
Output
[1] "TensorFlow" "PyTorch"
Once our vector of characters is created, we name the first vector member "Start" and the second member "End":
names(z)=c("Start","End")
z
Output
Start End
"TensorFlow" "PyTorch"
We can then retrieve a member by its name:
z["Start"]
Output
Start
"TensorFlow"
We can reverse the order with the help of a character string index vector.
z[c("End","Start")]
Output
End Start
"PyTorch" "TensorFlow"
Applications of vectors
1. In machine learning, vectors are used for principal component analysis. They are extended to eigenvalues and eigenvectors and then used for decomposition in vector spaces.
2. The inputs provided to a deep learning model are in the form of vectors. These vectors consist of standardized data which is supplied to the input layer of the neural network.
3. Vectors are used in the development of support vector machine algorithms.
4. Vector operations are used in neural networks for tasks such as image recognition and text processing.
R Lists
In R, lists are the second type of vector. A list is an object which can contain elements of different types, such as numbers, strings, vectors, and other lists. It can also contain a function or a matrix as an element. A list is therefore a data structure with components of mixed data types. We can say that a list is a generic vector which contains other objects.
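The following is a minimal sketch of code that would build a list like the one printed in the output below. The variable names (vec, char_vec, logic_vec, out_list) are assumptions chosen for illustration; the values are taken from that output.

```r
# Three vectors of different types (values taken from the printed output)
vec <- c(3, 4, 5, 6)
char_vec <- c("shubham", "nishka", "gunjan", "sumit")
logic_vec <- c(TRUE, FALSE, FALSE, TRUE)

# list() keeps each vector as a separate element with its own type
out_list <- list(vec, char_vec, logic_vec)
print(out_list)
```

Unlike c(), which would coerce everything to character, list() leaves each element unchanged.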
Example
Output:
[[1]]
[1] 3 4 5 6
[[2]]
[1] "shubham" "nishka" "gunjan" "sumit"
[[3]]
[1] TRUE FALSE FALSE TRUE
List Functions:
R provides various functions for working with lists, including:
length(): Returns the number of elements in a list.
names(): Returns or sets the names of the elements in a list.
str(): Displays the structure of a list, showing its elements and data types.
unlist(): Converts a list to a vector by flattening it.
Lists creation
The process of creating a list is similar to creating a vector. A vector is created with the c() function; a list is created with the list() function. A list avoids the main restriction of a vector, which is that all elements must share one data type: the elements of a list can be of different data types.
Syntax
list()
Example
list_1<-list(1,2,3)
list_2<-list("Shubham","Arpita","Vaishali")
list_3<-list(c(1,2,3))
list_4<-list(TRUE,FALSE,TRUE)
list_1
list_2
list_3
list_4
Output:
[[1]]
[1] 1
[[2]]
[1] 2
[[3]]
[1] 3
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] "Vaishali"
[[1]]
[1] 1 2 3
[[1]]
[1] TRUE
[[2]]
[1] FALSE
[[3]]
[1] TRUE
list_data<-list("Shubham","Arpita",c(1,2,3,4,5),TRUE,FALSE,22.5,12L)
print(list_data)
In the above example, the list() function creates a list with character, vector, logical, numeric, and integer elements. It gives the following output.
Output:
[[1]]
[1] "Shubham"
[[2]]
[1] "Arpita"
[[3]]
[1] 1 2 3 4 5
[[4]]
[1] TRUE
[[5]]
[1] FALSE
[[6]]
[1] 22.5
[[7]]
[1] 12
Naming the elements of a list takes three steps:
1. Create a list.
2. Assign names to the list elements with the help of the names() function.
3. Print the list data.
Let us see an example of how we can give names to the list elements.
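The three steps can be sketched as follows. This is an illustrative reconstruction, assuming the element values and names that appear in the output below; the variable name list_data is also an assumption.

```r
# Step 1: create a list holding a vector, a matrix, and a nested list
list_data <- list(
  c("Shubham", "Nishka", "Gunjan"),
  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),  # filled column by column
  list("BCA", "MCA", "B. tech.")
)

# Step 2: assign a name to each element
names(list_data) <- c("Students", "Marks", "Course")

# Step 3: print the named list
print(list_data)
```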
Example
Output:
$Students
[1] "Shubham" "Nishka" "Gunjan"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B. tech."
Let us see an example of both methods to understand how they are used to access list elements.
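A minimal sketch of the two access methods, by position and by name, is given here. The list contents are assumed from the outputs below; single brackets return a sub-list, while double brackets or $ return the element itself.

```r
list_data <- list(
  c("Shubham", "Arpita", "Nishka"),
  matrix(c(40, 80, 60, 70, 90, 80), nrow = 2),
  list("BCA", "MCA", "B.tech")
)

# Method 1: access by index
print(list_data[1])   # sub-list holding the first element
print(list_data[3])   # sub-list holding the nested list

# Method 2: access by name, after naming the elements
names(list_data) <- c("Student", "Marks", "Course")
print(list_data["Student"])   # still a list
print(list_data$Marks)        # the matrix itself

# Double brackets extract the element, which can then be indexed further
first_student <- list_data[["Student"]][1]
print(first_student)
```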
Output:
[[1]]
[1] "Shubham" "Arpita" "Nishka"
[[1]]
[[1]][[1]]
[1] "BCA"
[[1]][[2]]
[1] "MCA"
[[1]][[3]]
[1] "B.tech"
Output:
$Student
[1] "Shubham" "Arpita" "Nishka"
$Student
[1] "Shubham" "Arpita" "Nishka"
$Marks
[,1] [,2] [,3]
[1,] 40 60 90
[2,] 80 70 80
$Course
$Course[[1]]
[1] "BCA"
$Course[[2]]
[1] "MCA"
$Course[[3]]
[1] "B. tech."
Example
# Updating the 3rd Element.
list_data[3] <- "Masters of computer applications"
print(list_data[3])
Output:
[[1]]
[1] "Moradabad"
$<NA>
NULL
$Course
[1] "Masters of computer applications"
Ex:
lst<-list("apple", "banana", "cherry")
nl<-lst[-1]
nl
Output:
[[1]]
[1] "banana"
[[2]]
[1] "cherry"
The unlist() function takes a list as a parameter and converts it into a vector. Let us see an example of how the unlist() function is used in R.
Example
# Creating lists.
list1 <- list(1:5)
print(list1)

list2 <- list(10:14)
print(list2)

# Converting the lists to vectors.
v1 <- unlist(list1)
v2 <- unlist(list2)

print(v1)
print(v2)

# Adding the vectors.
result <- v1+v2
print(result)
Output:
[[1]]
[1] 1 2 3 4 5
[[1]]
[1] 10 11 12 13 14
[1] 1 2 3 4 5
[1] 10 11 12 13 14
[1] 11 13 15 17 19
Merging Lists
R allows us to merge two or more lists into one list. Passing the lists to the list() function returns a new list whose elements are the original lists, so the result is nested, as the output below shows. To obtain a single flat list which contains all the elements, combine the lists with the c() function instead. Let us see an example of how merging is done.
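A short sketch contrasting the two ways of merging follows. The names even_list, odd_list, nested, and flat are assumptions for illustration; the values match the output below.

```r
even_list <- list(2, 4, 6, 8, 10)
odd_list <- list(1, 3, 5, 7, 9)

# list() nests: the result has two elements, each of them a list
nested <- list(even_list, odd_list)
print(nested)

# c() concatenates: the result is one flat list of ten elements
flat <- c(even_list, odd_list)
print(length(nested))
print(length(flat))
```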
Example
Output:
[[1]]
[[1]][[1]]
[1] 2
[[1]][[2]]
[1] 4
[[1]][[3]]
[1] 6
[[1]][[4]]
[1] 8
[[1]][[5]]
[1] 10
[[2]]
[[2]][[1]]
[1] 1
[[2]][[2]]
[1] 3
[[2]][[3]]
[1] 5
[[2]][[4]]
[1] 7
[[2]][[5]]
[1] 9
R Arrays
In R, arrays are data objects which allow us to store data in more than two dimensions. An array is created with the help of the array() function, which takes a vector as input and arranges its values according to the dimensions given in the dim parameter.
For example, if we create an array of dimension (2, 3, 4), it will contain 4 rectangular matrices of 2 rows and 3 columns each.
R Array Syntax
R arrays have the following syntax:
array_name <- array(data, dim = c(row_size, column_size, matrices), dimnames = dim_names)
data
The data is the first argument in the array() function. It is an input vector which is given to the array.
matrices
This parameter defines the number of matrices in the array, that is, the size of the third dimension.
row_size
This parameter defines the number of rows in each matrix.
column_size
This parameter defines the number of columns in each matrix.
dim_names
This parameter is a list of names used to change the default names of rows, columns, and matrices. It is passed through the dimnames argument.
How to create?
In R, array creation is quite simple. We can easily create an array using vectors and the array() function. In an array, data is stored in the form of matrices. There are only two steps to create an array: first create the input vectors, then pass them to the array() function together with the dimensions.
Let us see an example of how we can implement an array with the help of vectors and the array() function.
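The two steps can be sketched as below. The vector names are assumptions; the values are taken from the output that follows. The nine input values are recycled to fill the 18 cells, which is why both matrices are identical.

```r
# Step 1: create two vectors of different lengths
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)

# Step 2: pass them to array() with dimensions 3 rows x 3 columns x 2 matrices
res <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res)
```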
Example
Output
, , 1
     [,1] [,2] [,3]
[1,]    1   10   13
[2,]    3   11   14
[3,]    5   12   15
, , 2
     [,1] [,2] [,3]
[1,]    1   10   13
[2,]    3   11   14
[3,]    5   12   15
It is not necessary to give names to the rows and columns. Names only help to tell the rows and columns apart.
Below is an example in which we create an array of two matrices and give names to the rows, columns, and matrices.
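A minimal sketch of naming the dimensions and then accessing elements is shown here. The names Row1..Row3, Col1..Col3, Matrix1, and Matrix2 come from the printed output in this section; the variable names are assumptions.

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)

row_names <- c("Row1", "Row2", "Row3")
col_names <- c("Col1", "Col2", "Col3")
matrix_names <- c("Matrix1", "Matrix2")

# dimnames takes a list with one character vector per dimension
res <- array(c(vec1, vec2), dim = c(3, 3, 2),
             dimnames = list(row_names, col_names, matrix_names))
print(res)

print(res[3, , 2])   # third row of the second matrix
print(res[1, 3, 1])  # element in row 1, column 3 of the first matrix
print(res[, , 1])    # the whole first matrix
```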
Example
Output
, , Matrix1
, , Matrix2
Example
, , Matrix1
     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

, , Matrix2
     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15

Col1 Col2 Col3
   5   12   15

[1] 13

     Col1 Col2 Col3
Row1    1   10   13
Row2    3   11   14
Row3    5   12   15
Manipulation of elements
An array is made up of matrices in multiple dimensions, so operations on the elements of an array are carried out by accessing the elements of its matrices.
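As a rough sketch of this idea, the code below extracts one matrix from each of two arrays and adds them element by element. The second array's values are taken from the output below; the variable names and the choice of addition are assumptions.

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2))

vec3 <- c(8, 4, 7)
vec4 <- c(16, 73, 48, 46, 36, 73)
res2 <- array(c(vec3, vec4), dim = c(3, 3, 2))

# Pull the second matrix out of each array
mat1 <- res1[, , 2]
mat2 <- res2[, , 2]

# Ordinary matrix arithmetic then applies element by element
total <- mat1 + mat2
print(total)
```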
Example
Output
,,1
,,2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
,,1
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73
,,2
[,1] [,2] [,3]
[1,] 8 16 46
[2,] 4 73 36
[3,] 7 48 73
The apply() function takes the array on which we have to perform the calculations. The basic syntax of the apply() function is as follows:
apply(x, margin, fun)
Here, x is an array, margin specifies the dimension over which the function is applied (1 for rows, 2 for columns), and fun is the function which is to be applied to the elements of the array.
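A minimal sketch of apply() on the array used in this section is given below; the variable names are assumptions. With margin 1, sum() is applied to each row across all columns and both matrices.

```r
vec1 <- c(1, 3, 5)
vec2 <- c(10, 11, 12, 13, 14, 15)
res1 <- array(c(vec1, vec2), dim = c(3, 3, 2))
print(res1)

# Sum over each row (margin = 1), across columns and matrices
result <- apply(res1, 1, sum)
print(result)
```

Row 1 holds 1, 10, and 13 in each of the two matrices, so its total is 2 * 24 = 48, which matches the first value in the output below.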
Example
Output
,,1
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
,,2
[,1] [,2] [,3]
[1,] 1 10 13
[2,] 3 11 14
[3,] 5 12 15
[1] 48 56 64
R Matrix
In R, a two-dimensional rectangular data set is known as a matrix. A matrix is created by passing a vector to the matrix() function. On R matrices, we can perform addition, subtraction, multiplication, and division.
In a matrix, elements are arranged in a fixed number of rows and columns. The matrix() function lays the input vector out in those rows and columns. Matrices most often hold numbers, but the only requirement is that all the elements share a common basic type, such as numeric, character, or logical.
Example
matrix1<-matrix(c(11, 13, 15, 12, 14, 16),nrow =2, ncol =3, byrow = TRUE)
matrix1
Output
     [,1] [,2] [,3]
[1,]   11   13   15
[2,]   12   14   16
History of matrices in R
The word "matrix" is the Latin word for womb, meaning a place where something is formed or produced. Two authors of historical importance used the word "matrix" in unusual ways: Bertrand Russell and Alfred North Whitehead, in Principia Mathematica, used it in the context of their axiom of reducibility. They proposed this axiom as a means to reduce any function to one of lower type, successively, so that at the "bottom" (order 0) the function is identical to its extension.
In their account, any possible function other than a matrix is derived from a matrix by generalization, that is, by considering the proposition which asserts that the function in question is true for all values, or for some value, of one of the arguments while the other arguments remain undetermined.
The matrix() function has the following syntax:
matrix(data, nrow, ncol, byrow, dimnames)
data
The first argument in the matrix() function is data. It is the input vector which supplies the data elements of the matrix.
nrow
The second argument is the number of rows which we want to create in the matrix.
ncol
The third argument is the number of columns which we want to create in the matrix.
byrow
The byrow parameter is a logical value. If it is TRUE, the input vector elements are arranged by row; otherwise they are arranged by column, which is the default.
dimnames
The dimnames parameter is a list of the names assigned to the rows and columns.
Let's see an example to understand how matrix function is used to create a matrix and arrange the
elements sequentially by row or column.
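The sketch below illustrates the byrow and dimnames arguments. The variable names and the row and column names are assumptions; the value ranges 3:14 and 5:16 are inferred from the partial outputs in this section.

```r
# Fill row by row
P <- matrix(3:14, nrow = 4, byrow = TRUE)
print(P)

# Fill column by column (the default)
Q <- matrix(3:14, nrow = 4, byrow = FALSE)
print(Q)

# Supply row and column names through dimnames
row_names <- c("row1", "row2", "row3", "row4")
col_names <- c("col1", "col2", "col3")
named_mat <- matrix(5:16, nrow = 4, byrow = TRUE,
                    dimnames = list(row_names, col_names))
print(named_mat)
```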
Example
Output
[2,] 4 8 12
[3,] 5 9 13
[4,] 6 10 14
1. We can access the element which presents on nth row and mth column.
2. We can access all the elements of the matrix which are present on the nth row.
3. We can also access all the elements of the matrix which are present on the mth column.
Let us see an example of how to access a single element at the nth row and mth column, a whole nth row, or a whole mth column.
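The three kinds of access can be sketched as follows, assuming a 4 x 3 matrix filled row by row with 5:16 (the names named_mat, row1..row4, and col1..col3 are illustrative).

```r
named_mat <- matrix(5:16, nrow = 4, byrow = TRUE,
                    dimnames = list(c("row1", "row2", "row3", "row4"),
                                    c("col1", "col2", "col3")))

# 1. Element at the 3rd row and 2nd column
print(named_mat[3, 2])

# 2. All elements of the 3rd row
print(named_mat[3, ])

# 3. All elements of the 2nd column
print(named_mat[, 2])
```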
Example
Output
[1] 12
matrix[n, m]<-y
Here, n and m are the row and column of the element, respectively, and y is the value which we assign to modify the matrix.
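A short sketch of modifying a matrix is shown here, first by position and then with a logical condition. The matrix and the replacement values are assumptions chosen for illustration.

```r
m <- matrix(5:16, nrow = 4, byrow = TRUE)

# Assign a new value to the element at row 3, column 2
m[3, 2] <- 20

# Replace every element smaller than 7 with 0
m[m < 7] <- 0
print(m)
```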
Example
Output
Example 1
Output
Example 2
Output
row3 11 12 13
row4 14 15 16
Example 1
Output
row2 8 9 10
row3 11 12 13
row4 14 15 16
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 5 8 11 14 6 9 12 15 7 10 13 16
Matrix operations
In R, we can perform mathematical operations on matrices, such as addition, subtraction, and multiplication. For element-by-element operations, both matrices must have the same dimensions.
Let us see an example of how mathematical operations are performed on matrices.
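A minimal sketch of element-wise matrix arithmetic follows; the matrices A and B and their values are assumptions. Note that * multiplies element by element, whereas the matrix product uses the %*% operator.

```r
A <- matrix(5:10, nrow = 2)   # 2 x 3, filled column by column
B <- matrix(1:6, nrow = 2)    # same dimensions as A

sum_mat  <- A + B
diff_mat <- A - B
prod_mat <- A * B             # element-wise product
quot_mat <- A / B

print(sum_mat)
print(diff_mat)
print(prod_mat)
print(quot_mat)

# The matrix product needs conformable dimensions: (2 x 3) %*% (3 x 2)
mat_product <- A %*% t(B)
print(dim(mat_product))
```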
Example 1
Output
Matrix Transposition:
The t() function returns the transpose of a matrix.
result<-t(matrix1)
Matrix Inversion:
To find the inverse of a matrix, use the solve() function. The matrix must be square (the same number of rows and columns) and non-singular.
inverse<-solve(matrix1)
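The following sketch shows both operations. The 2 x 3 matrix repeats the earlier matrix1 example; the square matrix sq is an assumed example, since solve() fails on a non-square matrix.

```r
matrix1 <- matrix(c(11, 13, 15, 12, 14, 16), nrow = 2, ncol = 3, byrow = TRUE)

# Transposition swaps rows and columns: 2 x 3 becomes 3 x 2
result <- t(matrix1)
print(dim(result))

# Inversion needs a square, non-singular matrix
sq <- matrix(c(2, 1, 1, 3), nrow = 2)
inverse <- solve(sq)
print(inverse)

# Multiplying a matrix by its inverse gives the identity matrix
print(sq %*% inverse)
```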
Applications of matrix
1. In geology, matrices are used for surveys, for plotting graphs and statistics, and for studies in different fields.
2. A matrix is a representation method which helps in plotting common survey data.
3. In robotics and automation, matrices are the basic elements used to compute robot movements.
4. Matrices are used in economics to calculate gross domestic product, and they also help in calculating the efficiency of goods and products.
5. In computer-based applications, matrices play a crucial role in the creation of realistic-seeming motion.
R Data Frame
A data frame is a two-dimensional array-like structure or table in which each column contains the values of one variable, and each row contains one set of values from each column. A data frame is a special case of a list in which each component has equal length.
A data frame is used to store a data table; the vectors it holds as a list are of equal length.
Put simply, it is a list of equal-length vectors. A matrix can contain only one type of data, but a data frame can contain different data types such as numeric, character, factor, etc.
A data frame has the following characteristics:
o Rectangular structure: Data frames are two-dimensional structures with rows and columns forming a rectangular shape. All columns must have the same number of rows, making them suitable for structured datasets.
o Column names: The column names should be non-empty.
o Row names: The row names should be unique.
o The data stored in a data frame can be of factor, numeric, or character type.
o Each column contains the same number of data items.
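A minimal sketch of creating such a data frame is given below. The column names and values are taken from the output that follows (with the second salary read as 515.20, as in the later tables); the exact original call is not shown in the text, so the arguments are an assumption.

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.30, 515.20, 611.00, 729.00, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE   # keep names as character, not factor
)

# Print the data frame and then its structure
print(emp.data)
str(emp.data)
```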
Example
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
Example
stringsAsFactors = FALSE
)
# Printing the structure of data frame.
str(emp.data)
Output
1. We can extract the specific columns from a data frame using the column name.
2. We can extract the specific rows also from a data frame.
3. We can extract the specific rows corresponding to specific columns.
Let's see an example of each one to understand how data is extracted from the data frame in these ways.
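The three kinds of extraction can be sketched as follows. The data frame is rebuilt here so that the block runs on its own; the result names (final, first_row, result) are assumptions.

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.30, 515.20, 611.00, 729.00, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# 1. Extract specific columns by name
final <- data.frame(emp.data$employee_id, emp.data$sal)
print(final)

# 2. Extract specific rows (here the first row, all columns)
first_row <- emp.data[1, ]
print(first_row)

# 3. Extract rows 2 and 3 of columns 1 and 4
result <- emp.data[c(2, 3), c(1, 4)]
print(result)
```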
Output
  emp.data.employee_id emp.data.sal
1                    1       623.30
2                    2       515.20
3                    3       611.00
4                    4       729.00
5                    5       843.25
Output
Output
employee_id starting_date
2 2 2013-09-23
3 3 2014-11-15
We can modify a data frame in the following ways:
1. Add a column by binding a column vector under a new column name with the cbind() function.
2. Add rows by creating new rows with the same structure as the existing data frame and binding them with the rbind() function.
Let's see an example to understand how the rbind() function works and how the modification is done in our data frame.
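A hedged sketch of these modifications follows. The new row and the dept column contain hypothetical values invented purely for illustration; they do not come from the original text. The last two statements show removal of a row and a column, which is what the outputs below display.

```r
emp.data <- data.frame(
  employee_id = 1:5,
  employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
  sal = c(623.30, 515.20, 611.00, 729.00, 843.25),
  starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15",
                            "2014-05-11", "2015-03-27")),
  stringsAsFactors = FALSE
)

# Add a row: the new row must have the same columns (hypothetical values)
new_row <- data.frame(employee_id = 6L, employee_name = "Vaishali",
                      sal = 547.00, starting_date = as.Date("2015-09-01"),
                      stringsAsFactors = FALSE)
with_row <- rbind(emp.data, new_row)

# Add a column: one value per existing row (hypothetical values)
with_col <- cbind(emp.data, dept = c("IT", "HR", "IT", "Ops", "HR"))

# Remove the first row, and remove the starting_date column
without_row <- emp.data[-1, ]
without_col <- emp.data
without_col$starting_date <- NULL

print(with_row)
print(with_col)
```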
Output
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal starting_date
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
  employee_id employee_name    sal
1           1       Shubham 623.30
2           2        Arpita 515.20
3           3        Nishka 611.00
4           4        Gunjan 729.00
5           5         Sumit 843.25
Example
Output
  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27
 employee_id  employee_name           sal         starting_date
 Min.   :1    Length:5           Min.   :515.2   Min.   :2012-01-01
 1st Qu.:2    Class :character   1st Qu.:611.0   1st Qu.:2013-09-23
 Median :3    Mode  :character   Median :623.3   Median :2014-05-11
NaN (Not a Number): NaN represents an undefined or unrepresentable value in numerical calculations. It results when a mathematical operation has no valid numeric value.
Ex:
0/0 output: [1] NaN
Inf and -Inf (Positive and Negative Infinity): Inf represents positive infinity and -Inf represents negative infinity. They stand for values that are beyond the representable range, for example the result of dividing a non-zero number by zero.
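A few illustrative expressions, using only base R functions, show how these special values arise and how to test for them. The variable names are assumptions.

```r
not_a_number <- 0 / 0     # undefined result
pos_inf <- 1 / 0          # positive infinity
neg_inf <- -1 / 0         # negative infinity

print(not_a_number)
print(pos_inf)
print(neg_inf)

# Testing for the special values
print(is.nan(not_a_number))
print(is.infinite(pos_inf))
print(is.finite(neg_inf))
```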
R factors
A factor is a data structure used for fields which take only a predefined, finite number of values, that is, categorical variables. Factors are data objects which categorize the data and store it as levels. They can be built from both integer and string values, and are useful for a column that has a limited number of unique values.
A factor stores its values as integers, each associated with a label. The set of predefined values is known as the levels, and by default R sorts the levels in alphabetical order.
Attributes of a factor
The factor() function accepts the following arguments:
a. x
It is the input vector which is to be transformed into a factor.
b. levels
It is an input vector that represents the set of unique values which are taken by x.
c. labels
It is a character vector of labels for the levels, one label per level.
d. exclude
It is used to specify the values which we want to be excluded.
e. ordered
It is a logical attribute which determines whether the levels are ordered.
f. nmax
It is used to specify the upper bound for the maximum number of levels.
R provides the factor() function to convert a vector into a factor. The factor() function has the following syntax:
factor_data<- factor(vector)
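A minimal sketch of creating a factor and accessing its components is shown below. The input values are taken from the output that follows; the variable name data is an assumption.

```r
data <- c("Shubham", "Nishka", "Arpita", "Nishka", "Shubham", "Sumit",
          "Nishka", "Shubham", "Sumit", "Arpita", "Sumit")

# Convert the character vector into a factor
factor_data <- factor(data)
print(factor_data)
print(is.factor(factor_data))

# Access the 4th element, then everything except the 4th element
print(factor_data[4])
print(factor_data[-4])
```

The levels are sorted alphabetically regardless of the order in which the values first appear.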
Example
Output
Example
Output
[1] Shubham Nishka Arpita Nishka Shubham Sumit Nishka Shubham Sumit
[10] Arpita Sumit
Levels: Arpita Nishka Shubham Sumit
[1] Nishka
Levels: Arpita Nishka Shubham Sumit
[1] Shubham Nishka Arpita Shubham Sumit Nishka Shubham Sumit Arpita
[10] Sumit
Levels: Arpita Nishka Shubham Sumit
Modification of factor
As with data frames, R allows us to modify a factor. We can modify a value of a factor by simply re-assigning it. However, we cannot choose a value outside the predefined levels: assigning a value whose level is not present produces NA with a warning. To use a new value, we first have to add a level for it, and then we can assign it to our factor.
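The sketch below walks through this. The starting values are assumed so that the final result matches the last factor shown in the output below.

```r
data <- c("Shubham", "Nishka", "Arpita", "Nishka", "Shubham")
factor_data <- factor(data)

# Re-assigning to an existing level works directly
factor_data[4] <- "Arpita"
print(factor_data)

# "Gunjan" is not yet a level, so add the level first ...
levels(factor_data) <- c(levels(factor_data), "Gunjan")

# ... and then the assignment succeeds without a warning
factor_data[4] <- "Gunjan"
print(factor_data)
```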
Example
Output
Warning message:
In `[<-.factor`(`*tmp*`, 4, value = "Gunjan") :
invalid factor level, NA generated
[1] Shubham Nishka Arpita Shubham
Levels: Arpita Nishka Shubham
[1] Shubham Nishka Arpita Gunjan Shubham
Levels: Arpita Nishka Shubham Gunjan
Example
Output
Example
Output
The gl() function generates a factor from a pattern of levels. It has the following syntax:
gl(n, k, labels)
1. n indicates the number of levels.
2. k indicates the number of replications.
3. labels is a vector of labels for the resulting factor levels.
Example
gen_factor<- gl(3,5,labels=c("BCA","MCA","B.Tech"))
gen_factor
Output
[1] BCA BCA BCA BCA BCA MCA MCA MCA MCA MCA
[11] B.Tech B.Tech B.Tech B.Tech B.Tech
Levels: BCA MCA B.Tech
An object is simply a collection of data (variables) and methods (functions). Similarly, a class is a
blueprint for that object.
Class System in R
While most programming languages have a single class system, R has three class systems:
S3 Class
S4 Class
Reference Class
S3 Class in R
S3 class is the most popular class in the R programming language. Most of the classes that
come predefined in R are of this type.
First we create a list with various components, then we assign a class to it using the class() function. For example,
# create a list with required components
student1 <- list(name = "John", age = 21, GPA = 3.5)
# name the class appropriately
class(student1) <- "Student_Info"
# print the object
student1
Output
$name
[1] "John"
$age
[1] 21
$GPA
[1] 3.5
attr(,"class")
[1] "Student_Info"
In the above example, we have created a list named student1 with three components. Notice the creation of the class:
class(student1) <- "Student_Info"
Here, Student_Info is the name of the class. To create an object of this class, we assign the class name to the student1 list with class().
Finally, we have an object of the Student_Info class called student1.
S4 Class in R
S4 classes are an improvement over S3 classes. They have a formally defined structure, which helps keep objects of the same class consistent.
Here, we have created a class named Student_Info with three slots (member variables): name , age ,
and GPA .
Now to create an object, we use the new() function. For example,
Here, inside new() , we have provided the name of the class "Student_Info" and value for all three
slots.
We have successfully created the object named student1 .
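The class definition and object creation described above can be sketched as follows. The slot types (character, numeric, numeric) are an assumption inferred from the values used; setClass(), new(), and the @ slot accessor are standard functions of the methods package.

```r
library(methods)  # attached by default in interactive sessions

# Define an S4 class with three typed slots
setClass("Student_Info",
         representation(name = "character", age = "numeric", GPA = "numeric"))

# Create an object of the class with new()
student1 <- new("Student_Info", name = "John", age = 21, GPA = 3.5)

# Print the object, then read one slot with @
student1
print(student1@age)
```

Because the slots are typed, a call such as new("Student_Info", age = "old") is rejected with an error, which is the formal structure that S3 lacks.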
Example: S4 Class in R
Output
Slot "age":
[1] 21
Slot "GPA":
[1] 3.5
Here, we have created an S4 class named Student_Info using the setClass() function and an object
named student1 using the new() function.
Reference Class in R
Reference classes were introduced later than the other two. They are more similar to the object-oriented programming we are used to seeing in other major programming languages.
# Student_Info() is our generator function which can be used to create new objects
student1 <- Student_Info(name = "John", age = 21, GPA = 3.5)
Output
In the above example, we have created a reference class named Student_Info using
the setRefClass() function.
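A minimal sketch of the complete definition is given here. The field types are an assumption inferred from the values used; setRefClass() returns a generator object, which is stored under the same name as the class.

```r
library(methods)

# Define a reference class; the return value is a generator
Student_Info <- setRefClass("Student_Info",
  fields = list(name = "character", age = "numeric", GPA = "numeric"))

# Use the generator to create a new object
student1 <- Student_Info(name = "John", age = 21, GPA = 3.5)
student1

# Fields are read and modified in place with $
student1$age <- 22
print(student1$age)
```

Unlike S3 and S4 objects, reference class objects are mutable: the assignment to student1$age changes the object itself rather than a copy.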
And we have used our generator function Student_Info() to create a new object student1.
The three class systems create objects in different ways:
S3 Class: Objects are created by setting the class attribute
S4 Class: Objects are created using new()
Reference Class: Objects are created using generator functions
Explicit Coercion: You can explicitly coerce data types using functions like:
Function    Description
as.list     Converts its argument, such as a vector, into a list
This allows you to convert data from one type to another based on your requirements.
x<-5
y<-as.numeric(x)
y
Output: [1] 5
Example:
The program prints the values and data types of these variables to demonstrate coercion.
#Create a numeric variable
nvar<-42
#Coerce the numeric variable to a character
cvar<-as.character(nvar)
#Coerce the character variable back to a numeric
nv<-as.numeric(cvar)
#Print the results
cat("Original Numeric Variable:", nvar, "\n")
cat("Coerced to Character:", cvar, "\n")
cat("Coerced Back to Numeric:", nv, "\n")
#Check the data types
cat("Data type of numeric_var:", class(nvar), "\n")
cat("Data type of character_var:", class(cvar), "\n")
cat("Data type of numeric_var coerced:", class(nv), "\n")
Output:
Original Numeric Variable: 42
Coerced to Character: 42
Coerced Back to Numeric: 42
Data type of numeric_var: numeric
Data type of character_var: character
Data type of numeric_var coerced: numeric
Plotting
R has a number of built-in tools for basic graph types such as histograms, scatter plots, bar charts, boxplots and much more. Rather than going through all of the different types, we will focus on plot(), a generic function for plotting x-y data.
Ex:
plot(2, 4)
In the above example, we have used the plot() function to plot one point on a graph.
Here,
2 - specifies point on the x-axis
4 - specifies point on the y-axis
# create a vector x
x <- c(2, 4, 6, 8)
# create a vector y
y <- c(1, 3, 5, 7)
# plot the points
plot(x, y)
In the above example, we plot multiple points on a graph using the plot() function and R vectors.
Here, we have passed two vectors, x and y, to plot() to plot multiple points.
The first items of x and y, i.e. 2 and 1 respectively, plot the 1st point on the graph; the second items of x and y plot the 2nd point, and so on.
Note: Make sure both vectors contain the same number of points.
Scatter Plot:
Scatterplots are useful for visualizing the relationship and distribution of data points and for identifying patterns, clusters, or outliers.
x<-c(1, 2, 3, 4, 5)
x and y are numeric vectors representing the data to be plotted on the x-axis and y-axis respectively.
pch = 19 sets the type of point used in the plot (a filled circle).
main, xlab and ylab are used to set the plot's title and axis labels.
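A complete scatter plot call using the parameters described above might look like the sketch below. The y values, the colour, and the title text are hypothetical choices for illustration; only x, pch = 19, main, xlab, and ylab come from the text.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)   # hypothetical values, not from the original text

# pch = 19 draws filled circles; main, xlab, ylab set the title and axis labels
plot(x, y, pch = 19, col = "blue",
     main = "Scatter Plot", xlab = "X-Axis", ylab = "Y-Axis")
```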
Plot Labels:
The plot() function also accepts other parameters, such as main, xlab and ylab, if you want to customize the graph with a main title and different labels for the x-axis and y-axis.
plot(1:10, main="My Graph", xlab="The x-axis", ylab="The y-axis")
Line Plot:
A line plot in R is used to display data points connected by lines. It is a useful visualization for showing trends and changes in data over time or across a continuous variable.
plot(x, y, type="l", lwd=2, col="red", main="Line Plot", xlab="X-Axis", ylab="Y-Axis")
Bar Chart:
A bar chart, also known as a bar plot or bar graph, is a common type of data visualization in R used to represent categorical data. It displays data using rectangular bars, where the length of each bar is proportional to the value it represents.
barplot(y, names.arg=x, col="green", main="Bar Chart", xlab="X-axis", ylab="Y-axis")
Histogram:
A histogram in R is a graphical representation of the distribution of a continuous or discrete dataset. It is a valuable tool for visualizing the frequency or density of data within specified intervals or bins.
hist(y, col="purple", main="Histogram", xlab="Value", ylab="Frequency")
Pie Chart
A pie chart is a circular statistical graphic which is divided into slices to illustrate numerical proportion. Pie charts represent data visually as fractional parts of a whole, which can be an effective communication tool.
pie(expenditure,
main = "Monthly Expenditure Breakdown",
labels = c("Housing", "Food", "Clothes", "Entertainment", "Other")
)
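The call above needs an expenditure vector with one value per label, which the text does not define. The sketch below supplies hypothetical amounts purely so that the example runs; they are not data from the original.

```r
# Hypothetical monthly amounts, one per category (illustrative only)
expenditure <- c(600, 300, 150, 100, 200)
categories <- c("Housing", "Food", "Clothes", "Entertainment", "Other")

# Each slice's share of the whole, in percent
share <- round(100 * expenditure / sum(expenditure))
print(share)

pie(expenditure,
    main = "Monthly Expenditure Breakdown",
    labels = categories)
```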
Boxplot:
A boxplot, also known as a box-and-whisker plot, is a graphical representation of the distribution of a dataset. It displays the median, quartiles, and potential outliers.