DSC551 (PROGRAMMING FOR DATA SCIENCE)
CHAPTER 2
DATA STRUCTURES IN R PROGRAMMING
PREPARED BY: DR NIK NUR FATIN FATIHAH BINTI SAPRI
TYPES OF DATA STRUCTURES IN R
A data structure is a way of organizing data for use in the computer.
1 • VECTORS
2 • FACTORS
3 • MATRICES
4 • ARRAY
5 • DATA FRAME
6 • LIST
1 • VECTORS
A vector is simply a sequence of items that are all of the same type.
# Vector of strings
fruits <- c("banana", "apple", "orange")
# Vector of numerical values
numbers <- c(1, 2, 3)
# Vector of logical values
log_values <- c(TRUE, FALSE, TRUE, FALSE)
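Because every element of a vector must share one type, R converts (coerces) mixed values to a common type. A minimal sketch using the base functions class() and c(); the name `mixed` is hypothetical, for illustration only:

```r
# Vectors from the examples above
fruits <- c("banana", "apple", "orange")
numbers <- c(1, 2, 3)
log_values <- c(TRUE, FALSE, TRUE, FALSE)

# class() reports the single type shared by all elements
print(class(fruits))      # "character"
print(class(numbers))     # "numeric"
print(class(log_values))  # "logical"

# Mixing types forces coercion to the most general type (here, character)
mixed <- c(1, "a", TRUE)  # hypothetical example vector
print(mixed)              # "1" "a" "TRUE"
print(class(mixed))       # "character"

# Logical values become 1 and 0 when combined with numbers
print(c(1.5, TRUE, FALSE))  # 1.5 1.0 0.0
```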
How to create vectors?
Vectors | R function | Use of function | Example
Combine/concatenate | c() | c(values) | numbers <- c(1, 2, 3)
Sequence | seq() | seq(from, to) | seq(1,10)
Sequence | seq() | seq(from, to, by=) | seq(1,10,by=2), seq(1,3,by=0.5), seq(20,0,by=-5)
Sequence | seq() | seq(from, to, length=) | seq(0,20,length=4)
Sequence | seq() | seq(n) or seq(vector) | seq(5), seq(1:5)
Replication | rep() | rep(value, no. of replications) | rep(5,5)
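The table above can be checked by running each example. This sketch uses the full argument name `length.out` (the `length=` in the table is a partial match for it) and adds two extra rep() forms with the base arguments `times` and `each`; the names s1 to s6 and r1 to r3 are hypothetical:

```r
# Combine values with c()
numbers <- c(1, 2, 3)

# Sequences with seq()
s1 <- seq(1, 10)                  # 1 2 3 4 5 6 7 8 9 10
s2 <- seq(1, 10, by = 2)          # 1 3 5 7 9
s3 <- seq(1, 3, by = 0.5)         # 1.0 1.5 2.0 2.5 3.0
s4 <- seq(20, 0, by = -5)         # 20 15 10 5 0
s5 <- seq(0, 20, length.out = 4)  # 0.000000 6.666667 13.333333 20.000000
s6 <- seq(5)                      # 1 2 3 4 5

# Replication with rep()
r1 <- rep(5, 5)                   # 5 5 5 5 5
r2 <- rep(c(1, 2), times = 3)     # 1 2 1 2 1 2 (repeat the whole vector)
r3 <- rep(c(1, 2), each = 3)      # 1 1 1 2 2 2 (repeat each element)

print(list(s1, s2, s3, s4, s5, s6, r1, r2, r3))
```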
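Selecting elements from a vector:
x=c(1,4,5,3,9,10,12)
Function | Notes
x[5] | Select the element in the 5th place
x[1:3] | Select the 1st to 3rd elements
x[5]=100 | Replace the element in the 5th place with 100
y=x<8 | Logical vector: TRUE where the element is less than 8
x[-2] | Remove the element in the 2nd place
length(x) | Number of elements in x
edit(x) | Open x in an editor; assign the result (x=edit(x)) to keep the changes
x[-c(2,4)] | Remove the elements in the 2nd and 4th places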
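Running the selection examples in order shows their outputs. A sketch; edit(x) is left out because it opens an interactive editor:

```r
x <- c(1, 4, 5, 3, 9, 10, 12)

print(x[5])         # 9 : the element in the 5th place
print(x[1:3])       # 1 4 5 : the 1st to 3rd elements
print(x[-2])        # 1 5 3 9 10 12 : everything except the 2nd element
print(x[-c(2, 4)])  # 1 5 9 10 12 : everything except the 2nd and 4th elements
print(length(x))    # 7

y <- x < 8          # TRUE TRUE TRUE TRUE FALSE FALSE FALSE
print(x[y])         # 1 4 5 3 : keep only the elements where y is TRUE

x[5] <- 100         # replace the 5th element in place
print(x)            # 1 4 5 3 100 10 12
```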
Functions in vectors. Given x=c(1,3,5,7,9) and y=c(-1,-3,-5,7,-9):
Function | Description
mean(x) | To compute the mean
var(x) | To compute the (sample) variance
sd(x) | To compute the standard deviation
sum(x) | To compute the sum
min(x) | To find the minimum value
max(x) | To find the maximum value
diff(x) | To compute the differences between consecutive elements of a vector
identical(x,y) | To check whether two vectors are exactly identical
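Applying each function to the given x and y produces the values below. A sketch; the final line, x == y, is an extra element-wise comparison added for contrast with identical():

```r
x <- c(1, 3, 5, 7, 9)
y <- c(-1, -3, -5, 7, -9)

print(mean(x))          # 5
print(var(x))           # 10 (sample variance, divides by n - 1)
print(sd(x))            # 3.162278 (square root of the variance)
print(sum(x))           # 25
print(min(x))           # 1
print(max(x))           # 9
print(diff(x))          # 2 2 2 2 (each element minus the one before it)
print(identical(x, y))  # FALSE (the whole vectors differ)
print(x == y)           # FALSE FALSE FALSE TRUE FALSE (compared element by element)
```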
2 • FACTORS
A factor is used for categorical data. It can be created using the factor() function.
#categorize the music genre
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))
#categorize the opinion
Opinion=factor(c("yes","yes","no","yes","no"))
#levels are sorted alphabetically, so "no" is labelled 1 and "yes" is labelled 2
Opinion_1=factor(c("yes","yes","no","yes","no"),labels=c(1,2))
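The categories stored in a factor can be inspected with the base functions levels(), nlevels() and table(). A sketch using the factors defined above:

```r
music_genre <- factor(c("Jazz", "Rock", "Classic", "Classic", "Pop", "Jazz", "Rock", "Jazz"))

print(levels(music_genre))   # "Classic" "Jazz" "Pop" "Rock" (sorted alphabetically)
print(nlevels(music_genre))  # 4
print(table(music_genre))    # counts per category: Classic 2, Jazz 3, Pop 1, Rock 2

Opinion <- factor(c("yes", "yes", "no", "yes", "no"))
# Labels are matched to the sorted levels: "no" -> 1, "yes" -> 2
Opinion_1 <- factor(c("yes", "yes", "no", "yes", "no"), labels = c(1, 2))
print(levels(Opinion))       # "no" "yes"
print(Opinion_1)             # 2 2 1 2 1, Levels: 1 2
```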
3 • MATRICES
A matrix is a two-dimensional data set with rows and columns. It can be created using the matrix() function.
There are three ways to create a matrix:
1) matrix(data, nrow, ncol) function
2) dim() function
3) rbind() and cbind() functions
Matrix: matrix(data, nrow, ncol)
m = matrix(1:6, nrow = 2, ncol = 3) #By default, matrices are filled column by column
n = matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE) #byrow = TRUE fills row by row instead
Dimension: dim()
x = c(2,4,6,8,10,12) #a vector
dim(x) = c(2,3) #reshape x into a 2 x 3 matrix
Row bind and column bind: rbind() and cbind()
k = 1:3
l = 10:12
matrix1 = rbind(k,l) #k and l become the rows
matrix2 = cbind(k,l) #k and l become the columns
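Printing the results and their dimensions shows how the three ways of creating a matrix differ. A sketch using the same objects as the examples above:

```r
m <- matrix(1:6, nrow = 2, ncol = 3)                # filled column by column
n <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row
print(m)  # row 1: 1 3 5 ; row 2: 2 4 6
print(n)  # row 1: 1 2 3 ; row 2: 4 5 6

x <- c(2, 4, 6, 8, 10, 12)
dim(x) <- c(2, 3)       # same values, now arranged as a 2 x 3 matrix
print(is.matrix(x))     # TRUE

k <- 1:3
l <- 10:12
matrix1 <- rbind(k, l)  # 2 x 3, rows named "k" and "l"
matrix2 <- cbind(k, l)  # 3 x 2, columns named "k" and "l"
print(dim(matrix1))     # 2 3
print(dim(matrix2))     # 3 2
```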
Matrices can also be used for computation. Given x=matrix(4:7,nrow=2):
Function | Description
t(x) | To find the transpose of the matrix
det(x) | To compute the determinant
diag(x) | To extract the diagonal elements
solve(x) | To compute the inverse matrix
rowMeans(x) | To compute the row means
colMeans(x) | To compute the column means
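Because matrix(4:7, nrow=2) is filled by column, x has rows (4, 6) and (5, 7), so each result can be checked by hand, e.g. det(x) = 4*7 - 6*5 = -2. A sketch; the last line uses the base operator %*% (matrix multiplication) to confirm that solve(x) really is the inverse:

```r
x <- matrix(4:7, nrow = 2)  # row 1: 4 6 ; row 2: 5 7

print(t(x))         # row 1: 4 5 ; row 2: 6 7
print(det(x))       # -2 (4*7 - 6*5)
print(diag(x))      # 4 7
print(solve(x))     # row 1: -3.5 3 ; row 2: 2.5 -2
print(rowMeans(x))  # 5 6
print(colMeans(x))  # 4.5 6.5

# A matrix multiplied by its inverse gives the identity matrix
print(round(x %*% solve(x), 10))
```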
4 • ARRAY
An array is a multi-dimensional data set with rows, columns and groups (layers). It can be created using the array() function.
# A vector of values ranging from 1 to 24 (a plain vector, not yet an array)
thisarray <- 1:24
thisarray
# An array with three dimensions: 4 rows, 3 columns and 2 groups
multiarray <- array(thisarray, dim = c(4, 3, 2))
multiarray
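Elements of an array are selected with one index per dimension, [row, column, group], and leaving an index empty selects everything along that dimension. A sketch using the same 4 x 3 x 2 array:

```r
multiarray <- array(1:24, dim = c(4, 3, 2))

print(dim(multiarray))      # 4 3 2 : rows, columns, groups
print(multiarray[2, 3, 1])  # 10 : row 2, column 3 of the first group
print(multiarray[, , 2])    # the whole second group: a 4 x 3 matrix holding 13 to 24
print(multiarray[1, , ])    # row 1 across every column and group: a 3 x 2 matrix
print(is.array(1:24))       # FALSE : a plain vector has no dim attribute
```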
5 • DATA FRAME
Data frames are used to store tabular data (data sets) in R.
A data frame can be viewed as a data table with rows for cases and columns for variables (numeric, character, logical, etc.).
All of the columns in a data frame must be of the same length.
A data frame may contain columns of different data types.
Matrix multiplication cannot be performed directly on a data frame.
A data frame can be created using the data.frame() function.
#Example
names = c("ali", "abu", "siti", "sofea")
gender = c("male", "male", "female", "female")
age = c(25, 21, 30, 27)
occupation = c("doctor", "lawyer", "doctor", "lawyer")
Data=data.frame(names,gender,age,occupation)
Observe the above R code. Is there anything you would like to suggest?
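One possible answer, offered as a sketch rather than the only suggestion: gender and occupation are categorical, so they can be stored as factors. The sketch also uses the base functions str() and dim() and simple column and row selection:

```r
names <- c("ali", "abu", "siti", "sofea")
gender <- c("male", "male", "female", "female")
age <- c(25, 21, 30, 27)
occupation <- c("doctor", "lawyer", "doctor", "lawyer")
Data <- data.frame(names, gender, age, occupation)

str(Data)          # one line per column, showing each column's type
print(dim(Data))   # 4 4 : 4 rows (cases) and 4 columns (variables)

# Suggestion: store the categorical columns as factors
Data$gender <- factor(Data$gender)
Data$occupation <- factor(Data$occupation)
print(levels(Data$gender))    # "female" "male"

# Select a column by name, or rows by a condition
print(Data$age)               # 25 21 30 27
print(Data[Data$age > 24, ])  # the rows for ali, siti and sofea
```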
6 • LIST
A list in R can contain many different data types inside it. It can be created using the list() function.
# List of strings
thislist <- list("apple", "banana", "cherry")
# To change an item inside the list, assign a new value to its position
thislist <- list("apple", "banana", "cherry")
thislist[[1]] <- "blackcurrant"
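The example above only holds strings; this sketch shows a list mixing several types and the difference between [[ ]] (returns the element itself) and [ ] (returns a smaller list). The list `student` and its element names are hypothetical, for illustration only:

```r
# Hypothetical list mixing character, numeric, logical and a numeric vector
student <- list(name = "siti", age = 30, passed = TRUE, scores = c(78, 85, 90))

str(student)                     # each element with its type
print(student$name)              # "siti" : access an element by name
print(student[["scores"]])       # 78 85 90 : [[ ]] returns the element itself
print(class(student["scores"]))  # "list" : [ ] returns a smaller list

student$age <- 31                # change an existing element
student$grade <- "A"             # add a new (hypothetical) element
print(length(student))           # 5
```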
Exercises
Height (cm) | Weight (kg) | No. of hours sleeping | BMI
175 | 80 | 8 |
165 | 85 | 7 |
150 | 50 | 12 |
155 | 55 | 10 |
168 | 63 | 9 |
153 | 45 | 7 |
165 | 74 | 6 |
177 | 90 | 7 |
180 | 86 | 8 |
164 | 74 | 8 |
150 | 53 | 11 |
1) Create a data frame for the above raw data.
2) Calculate the BMI and add it to the data frame.
3) Find the summary statistics for the data.
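A possible solution sketch, assuming the standard formula BMI = weight (kg) / height (m)^2, so heights are converted from cm to m first. The data frame name `health` and the column names `height`, `weight`, `sleep` and `BMI` are hypothetical choices:

```r
# Raw data from the exercise table
height <- c(175, 165, 150, 155, 168, 153, 165, 177, 180, 164, 150)  # cm
weight <- c(80, 85, 50, 55, 63, 45, 74, 90, 86, 74, 53)             # kg
sleep  <- c(8, 7, 12, 10, 9, 7, 6, 7, 8, 8, 11)                     # hours

# 1) Create the data frame (hypothetical name: health)
health <- data.frame(height, weight, sleep)

# 2) BMI = weight (kg) / height (m)^2, so divide the height in cm by 100
health$BMI <- health$weight / (health$height / 100)^2

# 3) Summary statistics (min, quartiles, median, mean, max) for every column
print(summary(health))
```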