Session1 2 (PRS)

The document discusses a course on business analytics that will introduce students to structured and unstructured data as well as the challenges and opportunities of big data. The course will provide a hands-on approach to applying analytical techniques and tools to business analytics using structured data. Students will gain practical experience through hands-on exercises in applying methods and tools over the business analytics lifecycle.

Business Analytics 

Prof Manas Tripathi & Prof PR Srivastava


Though the course and the discussed methods will
specifically deal with structured data, it will also
introduce the concepts of Big Data and their
importance in the business world. While making the
students aware of the challenges and opportunities
involved, it takes them through the process of data
analytics based on structured data, using a hands-on
approach. The hands-on
approach is aimed at providing practical
opportunities to apply the various available
methods and tools in the context of the Business
Analytics Lifecycle.
Business Analytics
Topic No. | Sessions | Topic | Sub-topics, Case / Exercise / Assignment | Instructor
1 | 1-2 | Understanding Analytics and its Role in the Organizations, Data Preparation | Course Brief; Introduction and Background of Data Analytics; Knowledge Discovery Process; Understanding the data; Data Preprocessing | Prof. Manas Tripathi / Prof. Praveen Ranjan Srivastava
2 | 3-4 | Data warehouse, OLAP | Extraction, Transformation and Loading of Data in Data Warehouse; Data Preparation; OLAP Operations; Supervised and Unsupervised Learning | Prof. Manas Tripathi / Prof. Praveen Ranjan Srivastava
3 | 5-6 | Analytical Techniques for Business Intelligence | Introduction to R Programming | Prof. Manas Tripathi / Prof. Praveen Ranjan Srivastava
4 | 7-8 | Analytical Techniques for Business Intelligence | Classification | Prof. Manas Tripathi / Prof. Praveen Ranjan Srivastava
5 | 9-10 | Analytical Techniques for Business Intelligence | Clustering | Prof. Manas Tripathi / Prof. Praveen Ranjan Srivastava
6 | 11-12 | Analytical Techniques for Business Intelligence | Association Rule Mining | Prof. Manas Tripathi / Prof. Praveen Ranjan Srivastava
7 | 13-14 | Analytical Techniques for Business Intelligence | Neural Network | Prof. Manas Tripathi / Prof. Praveen Ranjan Srivastava
8 | 15-16 | Project Presentations | Project Presentations | Prof. Manas Tripathi / Prof. Praveen Ranjan Srivastava


Evaluation Scheme
Project work & Presentation 40 %
Final End Term Exam 60 %
Software
(Business  Analytics) 

Types of Analytics
• Descriptive Analytics uses data
aggregation and data mining to provide insight
into the past and answer: “What has
happened?”
• Predictive Analytics uses statistical
models and forecasting techniques to understand
the future and answer: “What could happen?”
• Prescriptive Analytics uses
optimization and simulation algorithms to
advise on possible outcomes and answer:
“What should we do?”
Machine Learning
Supervised learning
Unsupervised learning
Unsupervised learning is a machine learning
technique in which you do not need to supervise the
model.
Instead, you allow the model to work on its
own to discover information. It mainly deals with
unlabelled data.
Unsupervised learning algorithms allow you to
perform more complex processing tasks than
supervised learning. However, unsupervised
learning can be more unpredictable than
other learning methods.
Let's take the case of a baby and her family dog.

She knows and identifies this dog.

A few weeks later a family friend
brings along a dog and tries to play
with the baby.

The baby has not seen this dog earlier, but she
recognizes that many of its features (2 ears, eyes,
walking on 4 legs) are like those of her pet dog. She
identifies the new animal as a dog. This is
unsupervised learning, where you are not
taught but you learn from the data (in this
case, data about a dog).
Machine Learning
Reinforcement learning

What is Business Analytics?
Classification of machine learning algorithms
Traditional Programming
Traditional programming is a manual process:
a person (the programmer) creates the
program and has to formulate and code the rules
by hand, because nothing derives the logic automatically.

In machine learning, on the other hand, the
algorithm automatically formulates the rules from
the data.
Association Rule Mining: a concept
Decision Tree 
Tree-based learning algorithms are considered to be among the best and most widely used
supervised learning methods. Tree-based methods give predictive models high
accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear
relationships quite well. They can be adapted to a wide range of problems.
Process (1): Model Construction

Training Data -> Classification Algorithms -> Classifier (Model)

Training Data:
NAME RANK YEARS TENURED
Mike Assistant Prof 3 no
Mary Assistant Prof 7 yes
Bill Professor 2 yes
Jim Associate Prof 7 yes
Dave Assistant Prof 6 no
Anne Associate Prof 3 no

Classifier (Model):
IF rank = 'professor'
OR years > 6
THEN tenured = 'yes'
Process (2): Using the Model in Prediction

Testing Data -> Classifier
Unseen Data (Jeff, Professor, 4) -> Classifier -> Tenured?

Testing Data:
NAME RANK YEARS TENURED
Tom Assistant Prof 2 no
Merlisa Associate Prof 7 no
George Professor 5 yes
Joseph Assistant Prof 7 yes
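
The two steps above can be sketched in R. This is a minimal illustration under stated assumptions, not the course's own code: the data frames reproduce the slide tables, and predict_tenure is a hypothetical helper that writes the slide's rule by hand instead of learning it with a classification algorithm.

```r
# Training data from the "Model Construction" slide.
train <- data.frame(
  name    = c("Mike", "Mary", "Bill", "Jim", "Dave", "Anne"),
  rank    = c("Assistant Prof", "Assistant Prof", "Professor",
              "Associate Prof", "Assistant Prof", "Associate Prof"),
  years   = c(3, 7, 2, 7, 6, 3),
  tenured = c("no", "yes", "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Testing data from the "Using the Model in Prediction" slide.
test <- data.frame(
  name    = c("Tom", "Merlisa", "George", "Joseph"),
  rank    = c("Assistant Prof", "Associate Prof", "Professor", "Assistant Prof"),
  years   = c(2, 7, 5, 7),
  tenured = c("no", "no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'.
predict_tenure <- function(rank, years) {
  ifelse(rank == "Professor" | years > 6, "yes", "no")
}

# Share of rows where the rule agrees with the recorded label.
train_accuracy <- mean(predict_tenure(train$rank, train$years) == train$tenured)
test_accuracy  <- mean(predict_tenure(test$rank, test$years) == test$tenured)

# Apply the rule to the unseen record (Jeff, Professor, 4).
jeff <- predict_tenure("Professor", 4)

print(train_accuracy)  # 1
print(test_accuracy)   # 0.75
print(jeff)            # "yes"
```

The rule fits all six training rows but misclassifies Merlisa in the testing data, which is why a model is checked on data it was not built from before it is used on unseen records.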
Decision Tree Induction: Training Dataset
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
New record to classify (is buys_computer YES or NO?):
>40 low no fair ??

Decision Tree Induction: An Example
• Training data set: Buys_computer (the 14 records listed in the training dataset above)
• Resulting tree:

age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
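
As a rough check, the tree on this slide can be written as nested rules and compared with the 14 training records. This is a sketch, not the induction algorithm itself: predict_buys is a hypothetical helper, and the leaf labels are read off the training table (for example, every ">40, excellent" record has buys_computer = no).

```r
# The 14 training records from the Buys_computer table.
buys <- data.frame(
  age = c("<=30", "<=30", "31..40", ">40", ">40", ">40", "31..40",
          "<=30", "<=30", ">40", "<=30", "31..40", "31..40", ">40"),
  income = c("high", "high", "high", "medium", "low", "low", "low",
             "medium", "low", "medium", "medium", "medium", "high", "medium"),
  student = c("no", "no", "no", "no", "yes", "yes", "yes",
              "no", "yes", "yes", "yes", "no", "yes", "no"),
  credit_rating = c("fair", "excellent", "fair", "fair", "fair", "excellent",
                    "excellent", "fair", "fair", "fair", "excellent",
                    "excellent", "fair", "excellent"),
  buys_computer = c("no", "no", "yes", "yes", "yes", "no", "yes",
                    "no", "yes", "yes", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the tree written as nested if/else rules.
predict_buys <- function(age, student, credit_rating) {
  ifelse(age == "31..40", "yes",
    ifelse(age == "<=30",
      ifelse(student == "yes", "yes", "no"),         # <=30: split on student
      ifelse(credit_rating == "fair", "yes", "no"))) # >40: split on credit rating
}

predicted <- predict_buys(buys$age, buys$student, buys$credit_rating)
n_correct <- sum(predicted == buys$buys_computer)
print(n_correct)  # 14

# The new record from the training-dataset slide: >40, low, no, fair.
new_record <- predict_buys(">40", "no", "fair")
print(new_record)  # "yes"
```

Income never appears in the rules: for this data set the splits on age, student and credit rating already separate the two classes.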
Clustering
Types of Unsupervised Learning
Unsupervised learning problems are further grouped into clustering and association problems.

Clustering is an important concept when it
comes to unsupervised learning. It mainly
deals with finding a structure or pattern in a
collection of uncategorized data. Clustering
algorithms will process your data and find
natural clusters (groups) if they exist in the
data. You can also modify how many clusters
your algorithms should identify. It allows you
to adjust the granularity of these groups.
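
A minimal sketch of this idea, assuming k-means as the clustering algorithm (the slides do not name one) and using synthetic points generated only for illustration. The centers argument of kmeans() is the number of clusters the algorithm is asked to find.

```r
set.seed(42)  # make the synthetic data reproducible

# Two well-separated groups of 20 two-dimensional points each.
pts <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2)
)

# Ask for 2 clusters; nstart = 10 tries several random starts and keeps the best.
km2 <- kmeans(pts, centers = 2, nstart = 10)
print(table(km2$cluster))

# Ask for 4 clusters instead: finer granularity on the same data.
km4 <- kmeans(pts, centers = 4, nstart = 10)
print(km4$size)
```

With two clearly separated groups, asking for 2 clusters recovers them; asking for 4 splits each natural group further.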
Introduction to R

Indian Institute of Management Rohtak
Introduction
• Why R?
– Open source and free: reviewed by many internationally renowned industry and computational scientists
– Advanced tools
– Very competent community
– The graphical capabilities are outstanding
– Cross-platform: used on GNU/Linux, Macintosh, and Microsoft Windows, running on both 32-bit and 64-bit processors
– Compatibility: imports data from SAS, SPSS, or directly from Microsoft Excel, Microsoft Access, Oracle, MySQL, and SQLite. It can also produce graphics output in PDF, JPG, PNG, and SVG formats, and table output for LaTeX and HTML.
• Limitations
– Poor documentation, memory consumption, and lack of a point of contact

Getting Started
• To install R on your Mac or PC you first need
to go to http://www.r-project.org/

Choose a CRAN (Comprehensive R
Archive Network) Mirror

Download and Install R
> A=3
> A+1
[1] 4
> a=a*3
Error: object 'a' not found
> A=A*4
> A
[1] 12
> Print(A)
Error: could not find function "Print"
> print(A)
[1] 12
• Ctrl+L: clear the console

Advantages of RStudio
• It is an Integrated Development 
Environment (IDE) for R 
development
• Pre‐requisite: R should be installed
• It provides better graphics
• It provides better user interface

RStudio
• RStudio allows the user to run R in a more user-friendly
environment. http://www.rstudio.com/

RStudio screenshot

Source Editor
• We can create new scripts or programs for later use (a new script opens in the Source Editor as an untitled tab, e.g. Untitled2)
• We can run commands line by line, or run a whole set of commands by selecting
them and clicking on Run

Assigning values
> x <- 2 # or x = 2
> print(x)
[1] 2
> x
[1] 2
Note: R is case sensitive
You can see the variables in the environment window
> x <- 5
> x
[1] 5
Note: R overwrites objects. It takes the value of the latest assignment
> xx <- "Rstudio"
> xx
[1] "Rstudio"
Performing operations
> 5*9
[1] 45

R Console
• This is the RStudio pane where you type your
commands or run them directly from scripts
• R Console is an interactive window where you provide
inputs and get outputs

Workspace
The workspace is your current R working
environment and includes any user-defined objects
(vectors, matrices, data frames, lists, functions).

At the end of an R session, the user can save an
image of the current workspace that is
automatically reloaded the next time R is started.

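
Saving and restoring workspace objects can also be done explicitly. The sketch below is an illustration only: the file name demo_workspace.RData and the object scores are hypothetical, and it uses save() for a single object, while save.image() would write every object in the workspace.

```r
# Hypothetical file in R's temporary directory.
ws_file <- file.path(tempdir(), "demo_workspace.RData")

scores <- c(72, 85, 91)          # a user-defined object in the workspace
save(scores, file = ws_file)     # save.image(file = ws_file) would save all objects

rm(scores)                       # remove it from the workspace
gone <- !exists("scores")
print(gone)                      # TRUE: the object is no longer there

load(ws_file)                    # restore the saved object
print(scores)                    # 72 85 91
```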
Workspace and Working Directory
> getwd(): shows the current working directory
> setwd("/the_path_of_the_directory"): sets the working directory
> dir(): lists the files in the working directory (use ls() to list the objects stored in the workspace memory)
> y <- 10     # Create a variable
> rm(variable_name): removes the variable

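
The commands above can be tried together as follows. This is a sketch that assumes R's temporary directory as the working directory and a hypothetical file name demo_file.csv, so that nothing on the real system is changed.

```r
old_dir <- getwd()               # remember the current working directory
setwd(tempdir())                 # switch to a temporary directory

# Create a small file so that dir() has something to show.
writeLines("id,name", "demo_file.csv")
files_here <- dir()              # files in the working directory
print("demo_file.csv" %in% files_here)   # TRUE

y <- 10                          # create a variable
objects_before <- ls()           # objects in the workspace
rm(y)                            # remove the variable
objects_after <- ls()

print("y" %in% objects_before)   # TRUE
print("y" %in% objects_after)    # FALSE

setwd(old_dir)                   # restore the original working directory
```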
Workspace History
History: All the commands executed in the past appear here.
Double-clicking a command pushes it to the R console, where it
can be executed with the Enter key.

Output/Packages
Files: All the files in the system are shown (default:
working directory)

Packages: Packages can be installed here. Installed
packages are shown in the window

Plots: Figures are plotted here

Help: The documentation is present here. You can
search for keywords
?read.table (write in console)

Basic Useful things
• If you enter an incomplete command, R will let you
know by showing a ‘+’ prompt until you complete the
command
• The UP arrow key can be used to recall the
previous commands
• ‘#’ is used for writing comments: the rest of the line is not
evaluated
• Ctrl + L: clears the console screen
• Ctrl + Q: quits RStudio
• Note: All the shortcuts are listed in the menu

Operation Symbols
Symbol Meaning

+ Addition

- Subtraction

* Multiplication

/ Division

%% Modulo (gives the remainder of a division)

^ Exponentiation

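
A short sketch of the operators in the table, using two illustrative numbers that are not from the slides:

```r
a <- 17
b <- 5

results <- c(
  addition       = a + b,    # 22
  subtraction    = a - b,    # 12
  multiplication = a * b,    # 85
  division       = a / b,    # 3.4
  modulo         = a %% b,   # 2 (17 = 3 * 5 + 2)
  exponent       = a ^ 2     # 289
)
print(results)
```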
Creating Vectors
A vector is a sequence of data elements of the same basic type.
We create a vector in R using the “c” (concatenate) command.

Creating sequences

Creating repeating vectors

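
A minimal sketch of creating vectors, sequences and repeating vectors in base R. The variable names and values are illustrative and not taken from the slides.

```r
# Creating a vector with c()
marks <- c(10, 20, 30)

# Because all elements share one basic type, mixed inputs are coerced.
mixed <- c(1, "a", TRUE)
print(class(mixed))        # "character"

# Creating sequences
one_to_five <- 1:5                 # 1 2 3 4 5
evens <- seq(2, 10, by = 2)        # 2 4 6 8 10

# Creating repeating vectors
repeated  <- rep(c(1, 2), times = 3)   # 1 2 1 2 1 2
each_twice <- rep(c(1, 2), each = 2)   # 1 1 2 2

print(evens)
print(repeated)
print(each_twice)
```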
Operations on vectors
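
Arithmetic on vectors works element by element. A small sketch with illustrative values (not from the slides):

```r
v <- c(2, 4, 6, 8)
w <- c(1, 2, 3, 4)

added    <- v + w     # 3 6 9 12
product  <- v * w     # 2 8 18 32
ratio    <- v / w     # 2 2 2 2
scaled   <- v * 10    # 20 40 60 80 (the single value is recycled)

print(added)
print(product)
print(sum(v))         # 20
print(mean(v))        # 5
print(length(v))      # 4
```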

Extracting Vectors
x <- c(1,3,5,7,9)

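
Some common ways of extracting elements from the vector x defined on this slide, sketched with base R indexing (the result names are illustrative):

```r
x <- c(1, 3, 5, 7, 9)

second     <- x[2]         # 3: a single element by position
middle     <- x[2:4]       # 3 5 7: a range of positions
not_first  <- x[-1]        # 3 5 7 9: a negative index drops that position
above_four <- x[x > 4]     # 5 7 9: elements that satisfy a condition
ends       <- x[c(1, 5)]   # 1 9: several positions at once

print(middle)
print(above_four)
```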
Accessing Vector Element
> t <- c("Sun","Mon","Tue","Wed","Thurs","Fri","Sat")

Can you print Mon, Tue and Fri?

> u <- t[c(2,3,6)]
> print(u)
[1] "Mon" "Tue" "Fri"

> v <- t[c(TRUE,FALSE,FALSE,FALSE,FALSE,TRUE,FALSE)]
> print(v)
[1] "Sun" "Fri"

List

> list1 <- list(c(2,5,3),21.3,"ramanjeet")
> print(list1)
[[1]]
[1] 2 5 3

[[2]]
[1] 21.3

[[3]]
[1] "ramanjeet"

Relational Operator
> v <- c(2,5.5,6,9)

> t <- c(8,2.5,14,9)

> print(v>t)

[1] FALSE  TRUE FALSE FALSE

> print(v>=t)

[1] FALSE  TRUE FALSE  TRUE

Matrices
Creating a Matrix
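
A minimal sketch of creating a matrix with matrix(), using illustrative values that are not from the slides. By default the values fill the matrix column by column; byrow = TRUE fills it row by row.

```r
# 2 rows x 3 columns, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# The same values filled row by row:
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)

print(m)
print(m_byrow)
print(dim(m))   # 2 3
```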

Extracting Matrix Values
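
Matrix values are extracted with [row, column] indexing. A short sketch on an illustrative matrix (not from the slides):

```r
m <- matrix(1:6, nrow = 2, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

one_value  <- m[2, 3]        # 6: row 2, column 3
first_row  <- m[1, ]         # 1 2 3: the whole first row
second_col <- m[, 2]         # 2 5: the whole second column
sub_matrix <- m[, c(1, 3)]   # columns 1 and 3 as a 2 x 2 matrix

print(one_value)
print(first_row)
print(sub_matrix)
```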

Operations on Matrices
# Create two 2x3 matrices.
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow=2)
print(matrix1)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow=2)
print(matrix2)
# Multiply the matrices element by element (* is not the matrix product; that is %*%).
result <- matrix1 * matrix2
cat("Result of multiplication","\n")
print(result)

     [,1] [,2] [,3]
[1,]   15    0    6
[2,]   18   36   24

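
For comparison, the sketch below contrasts the element-wise product from this slide with the matrix product operator %*%. Because both matrices are 2x3, the second one is transposed with t() so that the dimensions match; that choice is made here for illustration and is not part of the slide.

```r
matrix1 <- matrix(c(3, 9, -1, 4, 2, 6), nrow = 2)
matrix2 <- matrix(c(5, 2, 0, 9, 3, 4), nrow = 2)

# Element-wise: each cell is multiplied by the cell in the same position.
elementwise <- matrix1 * matrix2

# Matrix product: (2 x 3) %*% (3 x 2) gives a 2 x 2 matrix.
matprod <- matrix1 %*% t(matrix2)

print(elementwise)
print(matprod)
#      [,1] [,2]
# [1,]   21    5
# [2,]   63   78
```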
Data Frames
A data frame is a two-dimensional data structure in R.
It contains vectors of equal length.
Creating Data Frames

Operations on Data Frames

# x is a data frame of id, age, and name vectors

# Shows the number of rows, columns, and the dimension of data frame x

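
A minimal sketch of the two steps described in the comments above: building a data frame x from id, age and name vectors, then inspecting its size. The values are made up for illustration.

```r
# Three vectors of equal length (hypothetical values).
id   <- 1:4
age  <- c(21, 25, 19, 30)
name <- c("Asha", "Ravi", "Meena", "John")

# x is a data frame of id, age, and name vectors.
x <- data.frame(id, age, name, stringsAsFactors = FALSE)
print(x)

# Number of rows, number of columns, and the dimension of x.
print(nrow(x))   # 4
print(ncol(x))   # 3
print(dim(x))    # 4 3
```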
Lists
A list is a generic vector containing other objects of varied lengths.
Objects can be numeric, characters, data frames, sequences, etc.
Creating a List

# x is the data frame created in the previous section

# bucketList is a list and contains objects such as numbers, characters, sequences, and data frames

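
A sketch of such a list, under the assumption that x is a small data frame like the one in the previous section. The element names and values are illustrative.

```r
# A small stand-in for the data frame x from the previous section.
x <- data.frame(id = 1:3, age = c(21, 25, 19))

# bucketList holds a number, a character string, a sequence and a data frame.
bucketList <- list(
  number   = 42,
  text     = "analytics",
  sequence = seq(1, 9, by = 2),
  frame    = x
)

print(length(bucketList))          # 4
print(bucketList$sequence)         # 1 3 5 7 9
print(bucketList[["frame"]]$age)   # 21 25 19
```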
Operation on data frame
x <- data.frame("roll"=1:5, "name"=c("jack","jill","jeeva","smith","bob"), "age"=c(20,22,30,28,21))
Sort by name

newdata <- x[order(x$name),]
Sort by age and, within age, by name

newdata <- x[order(x$age,x$name),]

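
The same order() call can also sort in descending order. A short sketch on the data frame from this slide (the result names by_name and oldest_first are illustrative):

```r
x <- data.frame(
  roll = 1:5,
  name = c("jack", "jill", "jeeva", "smith", "bob"),
  age  = c(20, 22, 30, 28, 21),
  stringsAsFactors = FALSE
)

# Ascending by name, as on the slide.
by_name <- x[order(x$name), ]
print(by_name$name)        # "bob" "jack" "jeeva" "jill" "smith"

# Descending by age.
oldest_first <- x[order(x$age, decreasing = TRUE), ]
print(oldest_first$name)   # "jeeva" "smith" "jill" "bob" "jack"
```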
Vector Combined Object
city <- c("Tampa","Seattle","Hartford","Denver")
state <- c("FL","WA","CT","CO")
# The numeric literal 06161 is stored as 6161, so the leading zero is lost.
zipcode <- c(33602,98104,06161,80294)

# Combine the three vectors column-wise.
# Note: cbind() on vectors returns a character matrix, not a data frame.
addresses <- cbind(city,state,zipcode)
# Print a header.
cat("# # # # The First data frame\n")
# Print the combined object.
print(addresses)

# # # # The First data frame
     city       state zipcode
[1,] "Tampa"    "FL"  "33602"
[2,] "Seattle"  "WA"  "98104"
[3,] "Hartford" "CT"  "6161"
[4,] "Denver"   "CO"  "80294"

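
If a real data frame is wanted and the leading zero of the Hartford zip code should be kept, one option is sketched below. It is an alternative to the slide's cbind() call, not the slide's own code: data.frame() is used instead, and the zip codes are stored as character strings.

```r
city  <- c("Tampa", "Seattle", "Hartford", "Denver")
state <- c("FL", "WA", "CT", "CO")
# Zip codes as text, so "06161" keeps its leading zero.
zipcode <- c("33602", "98104", "06161", "80294")

# data.frame() keeps each column's own type; cbind() would give a character matrix.
addresses <- data.frame(city, state, zipcode, stringsAsFactors = FALSE)
print(addresses)

print(is.data.frame(addresses))           # TRUE
print(is.matrix(cbind(city, state)))      # TRUE: cbind() on vectors gives a matrix
print(addresses$zipcode[3])               # "06161"
```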
Import from Files tab
• Save the “studentsCSV” file provided in
Documents of your system (working directory).
The file will appear in the Files tab of the RStudio
Output/Packages pane.

Import data from Excel
• Pre-requisites: Save the data file as
– Comma-separated values: .csv
– Tab-delimited text file: .txt
Ways to import:
1. From the Files tab of the Output/Packages window
2. Import from browsing
3. From File menu

Import from Files tab
Click on Import button

View the imported data
The imported data is visible in the Source Editor pane and can also be
viewed by typing studentsCSV in the console and pressing Enter

dim(studentsCSV)
returns 10 3 (10 rows, 3 columns)

Import CSV file from Browse: read.csv 
data1 <- read.csv(file.choose(), header=T)
• Here read.csv reads from a CSV file, file.choose() lets you select the file by
browsing, and header=T means that headers such as Name, Age are present
in the file
• Select the “studentsCSV” file from Documents

The imported data is visible in the Source Editor pane
and can also be viewed by typing data1 in the console
and pressing Enter

View(data1)

Import CSV file from Browse: read.table
data2 <- read.table(file.choose(), header=T, sep=",")
• Here read.table reads from a file in table format, file.choose() lets you
select the file by browsing, header=T means that headers such as Name, Age are
present in the file, and sep="," means the file is in comma-separated format
• Select the “studentsCSV” file from Documents

The imported data is visible in the Source Editor
pane and can also be viewed by typing data2 in the console
and pressing Enter

Import txt file from Browse: read.delim
• Save the “studentsTXT” file provided in the Documents of your system (working directory)
data2 <- read.delim(file.choose(), header=T)
• Here read.delim reads from a tab-delimited file, file.choose() lets you
select the file by browsing, and header=T means that headers such as Name, Age
are present in the file.
• Select the “studentsTXT” file from Documents

The imported data is visible in the Source
Editor pane and can also be viewed by typing
data2 in the console and pressing Enter

Import txt file from Browse: read.table
data2 <- read.table(file.choose(), header=T, sep="\t")
• Here read.table reads from a file in table format, file.choose() lets you
select the file by browsing, header=T means that headers such as Name, Age
are present in the file, and sep="\t" means the file is in tab-delimited
format
• Select the “studentsTXT” file from Documents

The imported data is visible in the Source Editor
pane and can also be viewed by typing data2 in the console
and pressing Enter

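
The four import calls above can be compared side by side without the interactive file.choose() dialog. The sketch below is an illustration under stated assumptions: it writes two small files with made-up student records to R's temporary directory (the real studentsCSV and studentsTXT files are not reproduced here) and reads them back with explicit paths.

```r
# Hypothetical sample records standing in for the course files.
students <- data.frame(
  Name = c("Asha", "Ravi", "Meena"),
  Age  = c(21, 25, 19),
  stringsAsFactors = FALSE
)

csv_path <- file.path(tempdir(), "studentsCSV.csv")
txt_path <- file.path(tempdir(), "studentsTXT.txt")

# Write a comma-separated file and a tab-delimited file.
write.csv(students, csv_path, row.names = FALSE)
write.table(students, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

# Two ways to read the CSV file.
csv_a <- read.csv(csv_path, header = TRUE)
csv_b <- read.table(csv_path, header = TRUE, sep = ",")

# Two ways to read the tab-delimited file.
txt_a <- read.delim(txt_path, header = TRUE)
txt_b <- read.table(txt_path, header = TRUE, sep = "\t")

print(dim(csv_a))   # 3 2
print(txt_b)
```

read.csv and read.delim are convenience wrappers around read.table with the separator and header defaults already set for their file type.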
Import from File Menu
1. Go to File -> Import Dataset -> From CSV
2. Click on the Browse button
3. Select the studentsCSV file and click on the Import button

Import Data (input.csv)
# Full path to the file
data <- read.csv("C:/Users/admin/Desktop/input.csv")
# File in the working directory
data <- read.csv("input.csv")
# Choose the file by browsing
data <- read.csv(file.choose(), header=T)
Excel file
library("readxl")
my_data <- read_excel("my_file.xls")
my_data <- read_excel(file.choose())
# Specify sheet by its name
my_data <- read_excel("my_file.xlsx", sheet = "data")
# Specify sheet by its index
my_data <- read_excel("my_file.xlsx", sheet = 2)
Import from Desktop
The function read.table() can be used to read the data frame directly

> airqual <- read.table("C:/Desktop/airquality.txt")

Similarly, to read .csv files the read.csv() function can be used to read in the data frame
directly

> airqual <- read.csv("C:/Desktop/airquality.csv")
Excel:
library("readxl")
airqual <- read_excel("C:\\Users\\admin\\Desktop\\BA_Gradesheet.xlsx")
Practices
In a given csv file (input.csv)
# Get the max salary from data frame.

# Get the person detail having max salary.

# Get all the people working in IT department

# Get the persons in IT department whose salary is greater than 600

# Write filtered data into a new file.

Practices
Place your input.csv file in the working directory
data <- read.csv("input.csv")
print(data)
# Get the max salary from data frame.
sal <- max(data$salary)
print(sal)
[1] 843.25
# Get the person detail having max salary.
retval <- subset(data, salary == max(salary))
print(retval)

Practices
# Get all the people working in IT department
retval <- subset(data, dept == "IT")
print(retval)

# Get the persons in IT department whose salary is greater than 600
info <- subset(data, salary > 600 & dept == "IT")
print(info)

# Write filtered data into a new file.
write.csv(retval,"output.csv")
newdata <- read.csv("output.csv")
print(newdata)
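
The practice steps can be run end to end without the course's input.csv. The sketch below is only an illustration: it creates a small stand-in file with made-up rows (the column names salary and dept follow the slide's code, everything else is hypothetical) in R's temporary directory and repeats the same steps on it.

```r
# Hypothetical stand-in for input.csv, written to a temporary directory.
input_path  <- file.path(tempdir(), "input.csv")
output_path <- file.path(tempdir(), "output.csv")
sample_rows <- data.frame(
  id     = 1:5,
  name   = c("A", "B", "C", "D", "E"),
  salary = c(650, 480, 900, 720, 510),
  dept   = c("IT", "HR", "IT", "Finance", "IT"),
  stringsAsFactors = FALSE
)
write.csv(sample_rows, input_path, row.names = FALSE)

data <- read.csv(input_path)

# Max salary and the person who earns it.
sal <- max(data$salary)
top_earner <- subset(data, salary == max(salary))

# Everyone in IT, and IT staff earning more than 600.
it_staff <- subset(data, dept == "IT")
it_high  <- subset(data, salary > 600 & dept == "IT")

# Write the filtered rows to a new file and read them back.
write.csv(it_high, output_path, row.names = FALSE)
newdata <- read.csv(output_path)

print(sal)        # 900
print(it_high)
print(newdata)
```

Passing row.names = FALSE to write.csv keeps the row numbers out of the output file, so reading it back does not add an extra column.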
Thank you !!!