Session1 2 (PRS)
Evaluation Scheme
Project work & Presentation 40 %
Final End Term Exam 60 %
Software
(Business Analytics)
Types of Analytics
• Descriptive analytics uses data aggregation and
data mining to provide insight into the past and
answer: “What has happened?”
• Predictive analytics uses statistical models and
forecasting techniques to understand the future
and answer: “What could happen?”
• Prescriptive analytics uses optimization and
simulation algorithms to advise on possible
outcomes and answer: “What should we do?”
Business Analytics
Machine Learning
Supervised learning
Unsupervised learning
Unsupervised learning is a machine learning
technique in which you do not need to supervise the
model.
Instead, you allow the model to work on its own to
discover information. It mainly deals with
unlabelled data.
Unsupervised learning algorithms allow you to
perform more complex processing tasks than
supervised learning. However, unsupervised
learning can be more unpredictable than other
learning methods.
For example, take the case of a baby and her family dog.
What is Business Analytics?
Classification of machine learning algorithms
Traditional Programming
Traditional programming is a manual process: a
person (the programmer) creates the program.
Because nothing derives the logic automatically,
one has to manually formulate and code the rules.
Classification
[Figure labels: Training Data, Classification Algorithms, Classifier, Testing Data, Unseen Data (Jeff, Professor, 4), Tenured?]
NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes
Decision Tree Induction: Training Dataset
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Unseen record to classify (YES or NO?):
>40 low no fair ??
Decision Tree Induction: An Example
Training data set: Buys_computer (the same 14 records as in the training dataset table)
Resulting tree:
[Figure: decision tree with root node "age?" and branches <=30, 31..40, >40; leaf labels shown: no, yes, yes]
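The slides do not state which criterion selects "age?" as the root. A minimal sketch, assuming information gain (one common criterion for decision tree induction), computes the gain of each attribute over the 14 training records. The helper names entropy and info_gain are hypothetical, and the age range is written as "31..40".

```r
# The 14 training records from the buys_computer table.
buys <- data.frame(
  age = c("<=30", "<=30", "31..40", ">40", ">40", ">40", "31..40",
          "<=30", "<=30", ">40", "<=30", "31..40", "31..40", ">40"),
  income = c("high", "high", "high", "medium", "low", "low", "low",
             "medium", "low", "medium", "medium", "medium", "high", "medium"),
  student = c("no", "no", "no", "no", "yes", "yes", "yes",
              "no", "yes", "yes", "yes", "no", "yes", "no"),
  credit_rating = c("fair", "excellent", "fair", "fair", "fair", "excellent",
                    "excellent", "fair", "fair", "fair", "excellent",
                    "excellent", "fair", "excellent"),
  buys_computer = c("no", "no", "yes", "yes", "yes", "no", "yes",
                    "no", "yes", "yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Entropy (in bits) of a vector of class labels (hypothetical helper).
entropy <- function(labels) {
  p <- table(labels) / length(labels)
  -sum(p * log2(p))
}

# Information gain from splitting the data on one attribute (hypothetical helper).
info_gain <- function(data, attribute, target = "buys_computer") {
  groups <- split(data[[target]], data[[attribute]])
  weighted <- sum(sapply(groups, function(g) length(g) / nrow(data) * entropy(g)))
  entropy(data[[target]]) - weighted
}

attributes <- c("age", "income", "student", "credit_rating")
gains <- sapply(attributes, function(a) info_gain(buys, a))
print(round(gains, 3))

# The attribute with the largest gain would be chosen as the root node.
best <- names(which.max(gains))
print(best)
```

Under this assumption the attribute with the highest gain becomes the root, and the same calculation would be repeated on each branch's subset of records.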
Clustering
Types of Unsupervised Learning
Unsupervised learning problems are further grouped into clustering and association problems.
Clustering
Indian Institute of Management Rohtak
Introduction
• Why R?
– Open source and free: reviewed by many internationally
renowned industry experts and computational scientists
– Advanced tools
– Very competent community
– Outstanding graphical capabilities
– Cross-platform: runs on GNU/Linux, Macintosh, and Microsoft
Windows, on both 32-bit and 64-bit processors
– Compatibility: imports data from SAS, SPSS, or directly from
Microsoft Excel, Microsoft Access, Oracle, MySQL, and SQLite. It
can also produce graphics output in PDF, JPG, PNG, and SVG
formats, and table output for LaTeX and HTML.
• Limitations
– Poor documentation, memory consumption, and lack of a point
of contact
Getting Started
• To install R on your Mac or PC, first go to
http://www.r-project.org/
Choose a CRAN (Comprehensive R
Archive Network) Mirror
Download and Install R
> A=3
> A+1
[1] 4
> a=a*3
Error: object 'a' not found
> A=A*4
> A
[1] 12
> Print(A)
Error: could not find function "Print"
> print(A)
[1] 12
• Ctrl+L: clear the console
Advantages of RStudio
• RStudio is an Integrated Development Environment (IDE)
for R development
• Prerequisite: R must be installed first
• It provides better graphics
• It provides a better user interface
RStudio
• RStudio allows the user to run R in a more user-friendly
environment. http://www.rstudio.com/
Rstudio screenshot
Source Editor
• We can create new scripts or programs in the source editor (a new script opens
in an untitled tab, e.g. Untitled2) and save them for later use
• We can run commands line by line, or run several commands at once by selecting
them and clicking on Run
Assigning values
> x <- 2 # or x = 2
> print(x)
[1] 2
> x
[1] 2
Note: R is case sensitive
You can see the variables in the environment window
> x <- 5
> x
[1] 5
Note: R overwrites objects; a variable takes the value of the latest assignment
> xx <- "Rstudio"
> xx
[1] "Rstudio"
Performing operations
> 5*9
[1] 45
R Console
• This is the RStudio pane where you type your
commands or run them directly from scripts
• The R Console is an interactive window where you provide
inputs and get outputs
Workspace
The workspace is your current R working
environment and includes any user‐defined objects
(vectors, matrices, data frames, lists, functions)
At the end of an R session, the user can save an
image of the current workspace that is
automatically reloaded the next time R is started.
Workspace and Working Directory
> getwd(): shows the current working directory
> setwd("/the_path_of_the_directory"): sets the working directory
> dir(): lists the files in the working directory (ls() lists the objects stored in the workspace memory)
> y <- 10 # create a variable
> rm(variable_name): removes the variable from the workspace
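The working-directory and workspace commands can be tried together in one short session. This is a minimal sketch: tempdir() is only a stand-in for "/the_path_of_the_directory", and the variable names old_wd, wd_during, y_before and y_after are illustrative.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# tempdir() is used here only as a stand-in for a real directory path.
setwd(tempdir())
wd_during <- getwd()
print(wd_during)

# dir() lists files in the working directory; ls() lists workspace objects.
print(dir())
print(ls())

# Create a variable, check that it is in the workspace, then remove it.
y <- 10
y_before <- "y" %in% ls()
rm(y)
y_after <- "y" %in% ls()
print(c(y_before, y_after))

# Restore the original working directory.
setwd(old_wd)
```

Restoring the original directory at the end keeps later relative file paths (for example in read.csv) pointing where they did before.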
Workspace History
History: All the commands executed in the past appear here.
Double-clicking a command sends it to the R console, where it
can be executed with the Enter key.
Output/Packages
Files: The files on the system are shown here (default:
working directory)
Packages: Packages can be installed here. Installed
packages are listed in this window
Plots: Figures are plotted here
Help: The documentation is available here. You can
search for keywords
?read.table (type in the console)
Basic Useful things
• If you enter an incomplete command, R shows a ‘+’
prompt until you complete the command
• The UP arrow key recalls previously used commands
• ‘#’ starts a comment: the rest of the line is not
evaluated
• Ctrl+L: clears the console screen
• Ctrl+Q: quits RStudio
• Note: All the shortcuts are listed in the menu
Operation Symbols
Symbol Meaning
+ Addition
- Subtraction
* Multiplication
/ Division
^ Exponentiation
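Each symbol in the table can be tried directly in the console. A small sketch with two illustrative values (the variable names are not from the slides):

```r
a <- 7
b <- 2

sum_ab   <- a + b   # addition: 9
diff_ab  <- a - b   # subtraction: 5
prod_ab  <- a * b   # multiplication: 14
quot_ab  <- a / b   # division: 3.5
power_ab <- a ^ b   # exponentiation: 49

print(c(sum_ab, diff_ab, prod_ab, quot_ab, power_ab))
```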
Creating Vectors
A vector is a sequence of data elements of the same basic type.
We create a vector in R using c(), the combine (concatenate) function
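A short sketch of c() in use; the values and variable names are illustrative. Because all elements of a vector share one basic type, mixing numbers and text coerces everything to character.

```r
marks <- c(72, 85, 90)            # numeric vector
students <- c("jack", "jill")     # character vector
flags <- c(TRUE, FALSE, TRUE)     # logical vector

# Mixing types forces a common type: here everything becomes character.
mixed <- c(1, "two", 3)

print(class(marks))
print(class(students))
print(class(flags))
print(class(mixed))
```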
Creating sequences
Creating repeating vectors
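The slide content for sequences and repeating vectors did not survive as text. As a hedged sketch, base R provides the colon operator, seq() and rep() for this purpose; the values are illustrative.

```r
s1 <- 1:5                              # 1 2 3 4 5
s2 <- seq(from = 2, to = 10, by = 2)   # 2 4 6 8 10

r1 <- rep(3, times = 4)                # 3 3 3 3
r2 <- rep(c("a", "b"), times = 2)      # "a" "b" "a" "b"
r3 <- rep(c("a", "b"), each = 2)       # "a" "a" "b" "b"

print(s1); print(s2)
print(r1); print(r2); print(r3)
```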
Operations on vectors
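The original examples for this slide are not present in the text. A minimal sketch of common vector operations, with illustrative values, shows that arithmetic is applied element by element:

```r
x <- c(1, 3, 5, 7, 9)
y <- c(2, 4, 6, 8, 10)

added  <- x + y      # element-wise addition: 3 7 11 15 19
scaled <- x * 2      # each element doubled: 2 6 10 14 18
total  <- sum(x)     # 25
avg    <- mean(x)    # 5
n      <- length(x)  # 5

print(added); print(scaled)
print(c(total, avg, n))
```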
Extracting Vectors
x <- c(1,3,5,7,9)
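Using the vector x from this slide, elements can be extracted by position, by negative index, or by a logical condition. The particular extractions below are illustrative choices, not taken from the slides.

```r
x <- c(1, 3, 5, 7, 9)

second   <- x[2]         # single element: 3
picked   <- x[c(1, 4)]   # elements 1 and 4: 1 7
slice    <- x[2:3]       # a range of positions: 3 5
no_first <- x[-1]        # everything except the first: 3 5 7 9
large    <- x[x > 4]     # elements satisfying a condition: 5 7 9

print(second); print(picked); print(slice)
print(no_first); print(large)
```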
Accessing Vector Element
> t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")
> v <-t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
> print(v)
[1] "Sun" "Fri"
List
> list1 <- list(c(2,5,3), 21.3, "ramanjeet")
> print(list1)
[[1]]
[1] 2 5 3
[[2]]
[1] 21.3
[[3]]
[1] "ramanjeet"
Relational Operator
> v <- c(2,5.5,6,9)
> t <- c(8,2.5,14,9)
> print(v>t)
[1] FALSE  TRUE FALSE FALSE
> print(v>=t)
[1] FALSE  TRUE FALSE  TRUE
Matrices
Creating a Matrix
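The matrix-creation example on this slide was an image. A minimal sketch with the base matrix() function, using illustrative values, shows that R fills a matrix column by column unless byrow = TRUE is given.

```r
# Filled column by column (the default):
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2, ncol = 3)

# Filled row by row:
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)

print(m)
print(m_byrow)
print(dim(m))   # number of rows, then number of columns
```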
Extracting Matrix Values
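Matrix values are extracted with [row, column] indexing; leaving one index empty selects a whole row or column. A small sketch on an illustrative 2x3 matrix:

```r
m <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)

cell       <- m[2, 3]   # row 2, column 3: 6
first_row  <- m[1, ]    # entire first row: 1 3 5
second_col <- m[, 2]    # entire second column: 3 4

print(cell)
print(first_row)
print(second_col)
```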
Operations on Matrices
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow=2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow=2)
print(matrix2)
# Multiply the matrices element by element.
# Note: * is element-wise; matrix multiplication uses %*%.
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result)
     [,1] [,2] [,3]
[1,]   15    0    6
[2,]   18   36   24
Data Frames
A data frame is a two-dimensional data structure in R.
Its columns are vectors of equal length
Creating Data Frames
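The creation example for this slide was an image. A minimal sketch, assuming a data frame x built from id, age, and name vectors; the values themselves are hypothetical.

```r
# Three vectors of equal length (hypothetical values).
id   <- c(1, 2, 3)
age  <- c(20, 22, 30)
name <- c("jack", "jill", "jeeva")

# Each vector becomes one column of the data frame.
x <- data.frame(id, age, name)

print(x)
str(x)   # compact summary of column names and types
```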
Operations on Data Frames
# x is a data frame of id, age, and name vectors
# Shows the number of rows, columns, and the dimension of data frame x
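The two comments above describe code that was shown as an image. A hedged sketch of what such a session could look like, with hypothetical values in x:

```r
# x is a data frame of id, age, and name vectors (hypothetical values).
x <- data.frame(
  id   = c(1, 2, 3),
  age  = c(20, 22, 30),
  name = c("jack", "jill", "jeeva")
)

n_rows <- nrow(x)   # number of rows: 3
n_cols <- ncol(x)   # number of columns: 3
size   <- dim(x)    # rows and columns together: 3 3

print(n_rows); print(n_cols); print(size)
```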
Lists
A list is a generic vector containing other objects of varied lengths.
Objects can be numeric, characters, data frames, sequences, etc.
Creating a List
# x is the data frame created in the previous section
# bucketList is a list and contains objects such as numbers, characters,
# sequences, and data frames
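The code behind these comments was an image. A minimal sketch of a list named bucketList holding a number, a character string, a sequence, and a data frame; the element names and values are hypothetical.

```r
# x stands in for the data frame from the previous section (hypothetical values).
x <- data.frame(id = 1:3, age = c(20, 22, 30), name = c("jack", "jill", "jeeva"))

# A list can hold objects of different types and lengths.
bucketList <- list(number = 21.3, text = "analytics", sequence = 1:5, frame = x)

print(length(bucketList))          # number of objects in the list: 4
print(bucketList$sequence)         # access an element by name
print(bucketList[["frame"]]$age)   # access a column of the stored data frame
```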
Operation on data frame
x <- data.frame("roll"=1:5, "name"=c("jack","jill","jeeva","smith","bob"),
"age"=c(20,22,30,28,21))
# Sort by name
newdata <- x[order(x$name),]
# Sort by age, and within the same age by name
newdata <- x[order(x$age,x$name),]
Vector Combined Object
city <- c("Tampa","Seattle","Hartford","Denver")
Import from Files tab
• Save the provided “studentsCSV” file in the
Documents folder of your system (working directory).
The file then appears in the Files tab of the RStudio
Output/Packages pane.
Import data from Excel
• Prerequisite: save the Excel data file as one of
– Comma-separated values: .csv
– Tab-delimited text file: .txt
• The saved file can then be imported in three ways:
1. From the Files tab of the Output/Packages
window
2. By browsing for the file
3. From the File menu
Import from Files tab
Click on Import button
View the imported data
The imported data is shown in the Source Editor pane and can also be
viewed by typing studentsCSV in the console and pressing Enter
dim(studentsCSV) returns the dimensions of the data: 10 rows and 3 columns
Import CSV file from Browse: read.csv
data1 <- read.csv(file.choose(), header=TRUE)
• read.csv reads data from a CSV file, file.choose() opens a dialog for selecting
the file by browsing, and header=TRUE means the file contains column headers
such as Name and Age
• Select the “studentsCSV” file from Documents
• The imported data can be viewed by typing data1 in the console and pressing
Enter, or opened in the Source Editor pane with:
View(data1)
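Because file.choose() needs an interactive session, a non-interactive sketch of the same read.csv call can be useful for practice. The temporary file, its column names, and its values below are hypothetical stand-ins for studentsCSV.

```r
# Write a small CSV to a temporary file (hypothetical stand-in for studentsCSV).
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Name,Age,Marks",
             "jack,20,72",
             "jill,22,85",
             "jeeva,30,90"), csv_path)

# header = TRUE tells read.csv that the first line holds the column names.
data1 <- read.csv(csv_path, header = TRUE)

print(data1)
print(dim(data1))     # rows and columns
print(names(data1))   # column headers taken from the first line
```

Passing a path instead of file.choose() also makes a script repeatable, since no manual file selection is needed.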
Import CSV file from Browse: read.table
data2 <- read.table(file.choose(), header=TRUE, sep=",")
• read.table reads data from a file in table format, file.choose() opens a dialog for
selecting the file by browsing, header=TRUE means the file contains column headers
such as Name and Age, and sep="," means the file is in comma-separated format
• Select the “studentsCSV” file from Documents
• The imported data is shown in the Source Editor pane and can also be viewed
by typing data2 in the console and pressing Enter
Import txt file from Browse: read.delim
• Save the provided “studentsTXT” file in the Documents folder of your system (working directory)
data2 <- read.delim(file.choose(), header=TRUE)
• read.delim reads data from a tab-delimited file, file.choose() opens a dialog for
selecting the file by browsing, and header=TRUE means the file contains column headers
such as Name and Age
• Select the “studentsTXT” file from Documents
• The imported data is shown in the Source Editor pane and can also be viewed
by typing data2 in the console and pressing Enter
Import txt file from Browse: read.table
data2 <- read.table(file.choose(), header=TRUE, sep="\t")
• read.table reads data from a file in table format, file.choose() opens a dialog for
selecting the file by browsing, header=TRUE means the file contains column headers
such as Name and Age, and sep="\t" means the file is in tab-delimited format
• Select the “studentsTXT” file from Documents
• The imported data is shown in the Source Editor pane and can also be viewed
by typing data2 in the console and pressing Enter
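The two ways of reading a tab-delimited file can be compared side by side. This is a sketch on a temporary file whose contents are hypothetical stand-ins for studentsTXT; it uses a file path rather than file.choose() so that it runs without a dialog.

```r
# Write a small tab-delimited file (hypothetical stand-in for studentsTXT).
txt_path <- tempfile(fileext = ".txt")
writeLines(c("Name\tAge",
             "jack\t20",
             "jill\t22"), txt_path)

# read.delim assumes tab-separated input; read.table needs sep = "\t".
via_delim <- read.delim(txt_path, header = TRUE)
via_table <- read.table(txt_path, header = TRUE, sep = "\t")

print(via_delim)
print(via_table)
print(all.equal(via_delim, via_table))
```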
Import from File Menu
1. Go to File -> Import Dataset -> From CSV
2. Click on the Browse button
3. Select the studentsCSV file and click on the Import button
Import Data (input.csv)
data <- read.csv("C:/Users/admin/Desktop/input.csv")
data <- read.csv(file.choose(), header=TRUE)
Excel file
library("readxl")   # the readxl package must be installed first
my_data <- read_excel("my_file.xls")
my_data <- read_excel(file.choose())
# Specify sheet by its name
my_data <- read_excel("my_file.xlsx", sheet = "data")
# Specify sheet by its index
my_data <- read_excel("my_file.xlsx", sheet = 2)
Import from Desktop
The function read.table() can be used to read a text file directly into a data frame
> airqual <- read.table("C:/Desktop/airquality.txt")
Similarly, the read.csv() function can be used to read a .csv file directly into a data frame
> airqual <- read.csv("C:/Desktop/airquality.csv")
Excel:
library("readxl")
airqual <- read_excel("C:\\Users\\admin\\Desktop\\BA_Gradesheet.xlsx")
Practices
Given a CSV file (input.csv):
• Copy your input.csv file into the working directory
data <- read.csv("input.csv")
print(data)
# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)
[1] 843.25
Get the details of the person with max salary
# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)
#Get all the people working in IT department
retval <- subset( data, dept == "IT")
print(retval)
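To try these practice queries without input.csv, a small data frame can stand in for the file. The rows below are hypothetical; only the column names salary and dept and the maximum value 843.25 come from the slides. The last query, combining two conditions, is an added illustration.

```r
# Hypothetical stand-in for the contents of input.csv.
data <- data.frame(
  name   = c("A", "B", "C", "D"),
  salary = c(623.3, 515.2, 843.25, 729),
  dept   = c("IT", "Operations", "Finance", "IT"),
  stringsAsFactors = FALSE
)

# Maximum salary in the data frame.
sal <- max(data$salary)
print(sal)

# Details of the person with the maximum salary.
top_earner <- subset(data, salary == max(salary))
print(top_earner)

# Everyone working in the IT department.
it_staff <- subset(data, dept == "IT")
print(it_staff)

# Conditions can be combined with & (and) or | (or).
it_high <- subset(data, dept == "IT" & salary > 700)
print(it_high)
```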