Business Analytics Data Mining Lecture 7

This document discusses predictive modeling and various machine learning techniques. It introduces neural networks and their basic elements like processing elements, network architecture, and information processing. Common neural network types like feedforward and recurrent networks are described. The document also discusses supervised learning using backpropagation and evaluating neural network models. Finally, it introduces other techniques like support vector machines and their use of kernels and hyperplanes for classification.

Uploaded by

utkarsh bhargava
Copyright
© All Rights Reserved

Business analytics

Lecture 7- Predictive modelling


Why is it important to study
medical procedures
▪Clinical decision support systems that
use the outcome of data mining
studies can support healthcare
managers and/or medical
professionals in making accurate and
timely decisions to optimally allocate
resources in order to increase the
quantity and quality of medical
services
▪In healthcare systems, effectiveness is
probably more of a clinical concern,
while efficiency is more of a
managerial concern.
▪Clinical decision support systems that
use the outcome of data mining
studies are shown to be useful and
reasonably accurate predictors,
especially if used in combination
Neural Network Concepts
▪Neural networks (NN): a brain metaphor for
information processing
▪Neural computing
▪Artificial neural network (ANN)
▪Many uses for ANN:
▪ pattern recognition, forecasting, prediction, and
classification
▪Many application areas
▪ finance, marketing, manufacturing, operations,
information systems, and so on
Biological Neural Networks

▪Two interconnected brain cells (neurons)


Processing Information in ANN
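[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn feed a neuron (or PE), which applies a summation and then a transfer function to produce outputs Y1, Y2, ..., Yn]

Summation: $S = \sum_{i=1}^{n} X_i W_i$
Transfer function: $Y = f(S)$

▪A single neuron (processing element – PE) with inputs and outputs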
Biology Analogy
Elements of ANN
▪Processing element (PE)
▪Network architecture
▪Hidden layers
▪Parallel processing
▪Network information processing
▪Inputs
▪Outputs
▪Connection weights
▪Summation function
Elements of ANN

Neural Network with


One Hidden Layer
Elements of ANN
Summation Function for a Single Neuron (a), and Several Neurons (b)
PE: Processing Element (or neuron)

(a) Single neuron: inputs x1, x2 with weights w1, w2 feed one PE
$Y = X_1 W_1 + X_2 W_2$

(b) Multiple neurons: inputs x1, x2 feed three PEs through weights w11, w21, w12, w22, w23
$Y_1 = X_1 W_{11} + X_2 W_{21}$
$Y_2 = X_1 W_{12} + X_2 W_{22}$
$Y_3 = X_2 W_{23}$
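The summation for several neurons is one weighted sum per neuron. A minimal sketch follows, assuming illustrative input and weight values that are not from the lecture; the helper name `weighted_sums` is hypothetical, and a weight of 0.0 stands for a missing connection (x1 does not feed the third neuron).

```python
def weighted_sums(inputs, weights):
    """Return one weighted sum per neuron.

    weights[j][i] is the weight on the connection from input i to neuron j.
    """
    return [sum(x * w for x, w in zip(inputs, row)) for row in weights]


# Illustrative values only (assumed, not from the lecture).
X = [1.0, 2.0]        # X1, X2
W = [
    [0.5, 0.25],      # neuron 1: W11, W21
    [0.1, 0.3],       # neuron 2: W12, W22
    [0.0, 0.4],       # neuron 3: no link from X1, W23
]

Y = weighted_sums(X, W)   # [Y1, Y2, Y3]
print(Y)
```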
Elements of ANN
▪Transformation (Transfer) Function
▪ Linear function
▪ Sigmoid (logistic activation) function [0, 1]
▪ Hyperbolic tangent function [-1, 1]

Example: a processing element (PE) with inputs X1 = 3, X2 = 1, X3 = 2 and weights W1 = 0.2, W2 = 0.4, W3 = 0.1

Summation function: $Y = 3(0.2) + 1(0.4) + 2(0.1) = 1.2$
Transfer function: $Y_T = 1/(1 + e^{-1.2}) = 0.77$
❖ Threshold value?
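The worked example on this slide can be checked in a few lines. This is a minimal sketch using the slide's inputs and weights; the function name `neuron_output` is an assumption made for illustration.

```python
import math


def neuron_output(inputs, weights):
    """Weighted sum followed by a sigmoid transfer function."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return s, 1.0 / (1.0 + math.exp(-s))


# Inputs X1..X3 and weights W1..W3 from the slide.
s, y_t = neuron_output([3, 1, 2], [0.2, 0.4, 0.1])
print(round(s, 2), round(y_t, 2))  # 1.2 0.77
```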
Neural Network Architectures
▪Architecture of a neural network is driven by
the task it is intended to address
▪ Classification, regression, clustering, general
optimization, association, ….
▪Most popular architecture: Feedforward, multi-
layered perceptron with backpropagation
learning algorithm
▪ Used for both classification and regression type
problems
▪Others – Recurrent, self-organizing feature
maps, Hopfield networks, …
Neural Network Architectures
Feed-Forward Neural Networks
Feed-forward MLP with 1 Hidden Layer
Neural Network Architectures
Recurrent Neural Networks
Other Popular ANN Paradigms
Self-Organizing Maps (SOM)

▪ First introduced by the Finnish Professor Teuvo Kohonen
▪ Applies to clustering type problems
[Figure: a self-organizing map fed by Input 1, Input 2, and Input 3]
Development Process of an ANN
An MLP ANN Structure for
the Box-Office Prediction Problem
Testing a Trained ANN Model
▪Data is split into three parts
▪Training (~60%)
▪Validation (~20%)
▪Testing (~20%)

▪k-fold cross validation


▪Less bias
▪Time consuming
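Both evaluation schemes can be sketched with the standard library. The split fractions follow the slide; the function names, the fixed seed, and the use of row numbers as stand-in data are assumptions for illustration.

```python
import random


def three_way_split(rows, train=0.6, validation=0.2, seed=42):
    """Shuffle, then cut into training, validation, and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs; each row is tested exactly once."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


rows = list(range(100))  # stand-in for 100 data records
train_set, val_set, test_set = three_way_split(rows)
folds = list(k_fold_indices(len(rows), k=5))
```

With k folds the model is trained k times, which is why the slide lists k-fold cross validation as time consuming.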
ANN Learning Process
A Supervised Learning Process
Three-step process:
1. Compute temporary outputs.
2. Compare outputs with desired targets.
3. Adjust the weights and repeat the process.

[Flowchart: ANN model → compute output → is desired output achieved? No: adjust weights and compute output again. Yes: stop learning.]
Backpropagation Learning
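[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn feed a neuron (or PE); the error a(Zi – Yi) is fed back to the weights]

Summation: $S = \sum_{i=1}^{n} X_i W_i$
Transfer function: $Y = f(S)$
Error: $a(Z_i - Y_i)$

▪Backpropagation of Error for a Single Neuron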


Backpropagation Learning
▪The learning algorithm procedure
1. Initialize weights with random values and set other
network parameters
2. Read in the inputs and the desired outputs
3. Compute the actual output (by working forward
through the layers)
4. Compute the error (difference between the actual and
desired output)
5. Change the weights by working backward through the
hidden layers
6. Repeat steps 2-5 until weights stabilize
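The six steps can be sketched for the simplest case, a single sigmoid neuron, where "working backward" reduces to one gradient step on the neuron's own weights. This is not a full multi-layer implementation. The toy data (logical OR), the learning rate, the bias term, and the fixed number of passes (used here instead of testing whether the weights have stabilized) are all assumptions.

```python
import math
import random


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def train_neuron(samples, learning_rate=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Step 1: initialize weights (and a bias) with small random values.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):                  # Step 6: repeat (fixed passes here)
        for inputs, target in samples:       # Step 2: read inputs and desired output
            s = bias + sum(x * w for x, w in zip(inputs, weights))
            actual = sigmoid(s)              # Step 3: compute the actual output
            error = target - actual          # Step 4: compute the error
            # Step 5: change the weights along the squared-error gradient.
            delta = learning_rate * error * actual * (1.0 - actual)
            weights = [w + delta * x for w, x in zip(weights, inputs)]
            bias += delta
    return weights, bias


def predict(weights, bias, inputs):
    return sigmoid(bias + sum(x * w for x, w in zip(inputs, weights)))


# Toy training set (assumed): logical OR of two binary inputs.
samples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train_neuron(samples)
predictions = [round(predict(weights, bias, x)) for x, _ in samples]
```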
Illuminating The Black Box
Sensitivity Analysis on ANN
▪A common criticism for ANN: The lack of
transparency/explainability
▪The black-box syndrome!
▪Answer: sensitivity analysis
▪Conducted on a trained ANN
▪The inputs are perturbed while the relative
change on the output is measured /
recorded
▪Results illustrate the relative importance of
input variables
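The perturb-and-measure idea can be sketched as follows. The "trained ANN" is a stand-in: one sigmoid neuron with hypothetical fixed weights. The ±10% perturbation and the normalization of the scores to sum to 1 are likewise assumptions, one of several reasonable ways to report relative importance.

```python
import math


def trained_model(inputs):
    """Stand-in for a trained ANN (hypothetical fixed weights)."""
    weights = [0.8, 0.1, -0.4]
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-s))


def sensitivity(model, baseline, step=0.1):
    """Perturb one input at a time by +/- step and record the output swing."""
    swings = []
    for i, value in enumerate(baseline):
        high = baseline[:]
        low = baseline[:]
        high[i] = value * (1 + step)
        low[i] = value * (1 - step)
        swings.append(abs(model(high) - model(low)))
    total = sum(swings)
    return [s / total for s in swings]  # relative importance, sums to 1


baseline = [1.0, 1.0, 1.0]
importance = sensitivity(trained_model, baseline)
# Input positions ordered from most to least influential.
ranking = sorted(range(len(importance)), key=lambda i: importance[i], reverse=True)
```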
Sensitivity Analysis on ANN Models
[Figure: systematically perturbed inputs → trained ANN ("the black-box") → observed change in outputs]

▪For a good example, see Application Case 6.3


▪ Sensitivity analysis reveals the most important injury
severity factors in traffic accidents
Support Vector Machines (SVM)
▪SVM are among the most popular machine-
learning techniques.
▪SVM belong to the family of generalized
linear models… (capable of representing
non-linear relationships in a linear fashion).
▪SVM achieve a classification or regression
decision based on the value of the linear
combination of input features.
▪Because of their architectural similarities,
SVM are also closely associated with ANN.
Support Vector Machines (SVM)
▪Goal of SVM: to generate mathematical
functions that map input variables to desired
outputs for classification or regression type
prediction problems.
▪ First, SVM uses nonlinear kernel functions to
transform non-linear relationships among the
variables into linearly separable feature spaces.
▪ Then, the maximum-margin hyperplanes are
constructed to optimally separate different classes
from each other based on the training dataset.
▪SVM has a solid mathematical foundation!
Support Vector Machines (SVM)
▪A hyperplane is a geometric concept used to
describe the separation surface between
different classes of things.
▪ In SVM, two parallel hyperplanes are constructed on
each side of the separation space with the aim of
maximizing the distance between them.
▪A kernel function in SVM uses the kernel trick
(a method for using a linear classifier algorithm
to solve a nonlinear problem)
▪ The most commonly used kernel function is the radial
basis function (RBF).
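For reference, the RBF kernel is commonly written as $K(x, y) = \exp(-\gamma \lVert x - y \rVert^2)$. A minimal sketch follows; the parameter name `gamma`, its value, and the sample points are assumptions for illustration.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Similarity in (0, 1]: 1 for identical points, near 0 for distant ones."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical points
near = rbf_kernel([1.0, 2.0], [1.5, 2.0])   # close neighbors
far = rbf_kernel([1.0, 2.0], [4.0, 6.0])    # distant points
```

The kernel value falls as the points move apart, and `gamma` controls how quickly.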
Support Vector Machines (SVM)
[Figure: two plots on axes X1 and X2 showing candidate separating lines L1, L2, and L3, and the maximum-margin hyperplane with its margin]

➢ Many linear classifiers (hyperplanes) may separate the data


Application Case 6.4
Managing Student Retention with
Predictive Modeling
Questions for Discussion
1. Why is attrition one of the most important issues in
higher education?
2. How can predictive analytics (ANN, SVM, and so
forth) be used to better manage student retention?
3. What are the main challenges and potential
solutions to the use of analytics in retention
management?
How Does an SVM Work?
▪Following a machine-learning process, an
SVM learns from the historic cases.
▪The Process of Building SVM
1. Preprocess the data
▪ Scrub and transform the data.
2. Develop the model.
▪ Select the kernel type (the radial basis function, RBF, is often a natural choice)
▪ Determine the kernel parameters for the selected kernel type.
▪ If the results are satisfactory, finalize the model; otherwise change
the kernel type and/or kernel parameters to achieve the desired
accuracy level.
3. Extract and deploy the model.
The Process of Building an SVM
Pre-Process the Data (input: training data; output: pre-processed data)
✓ Scrub the data
“Identify and handle missing, incorrect, and noisy”
✓ Transform the data
“Numerisize, normalize and standardize the data”

Develop the Model (experimentation, “Training/Testing”; output: validated SVM model)
✓ Select the kernel type
“Choose from RBF, Sigmoid or Polynomial kernel types”
✓ Determine the kernel values
“Use v-fold cross validation or employ ‘grid-search’”

Deploy the Model (output: prediction model)
✓ Extract the model coefficients
✓ Code the trained model into the decision support system
✓ Monitor and maintain the model
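The "normalize and standardize" part of the transformation step can be sketched with the standard library. Min-max scaling and z-scores are assumed here as the usual readings of those two terms; the `ages` column is hypothetical.

```python
import statistics


def min_max_normalize(values):
    """Rescale to the range [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Convert to z-scores (mean 0, standard deviation 1)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / sd for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical numeric column
normalized = min_max_normalize(ages)
standardized = standardize(ages)
```

Putting features on a comparable scale matters for distance-based kernels such as RBF, where a feature with a large range would otherwise dominate.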
SVM Applications
▪SVMs are the most widely used kernel-learning
algorithms for a wide range of classification and
regression problems
▪SVMs represent the state-of-the-art by virtue of
their excellent generalization performance, superior
prediction power, ease of use, and rigorous
theoretical foundation
▪Most comparative studies show their superiority in
both regression and classification type prediction
problems.
▪SVM versus ANN?
k-Nearest Neighbor Method (k-NN)
▪ANNs and SVMs → time-demanding,
computationally intensive iterative derivations
▪k-NN is a simple and logical prediction
method that produces very competitive results
▪k-NN is a prediction method for classification
as well as regression types (similar to ANN &
SVM)
▪k-NN is a type of instance-based learning (or
lazy learning) – most of the work takes place at
the time of prediction (not at modeling)
▪k : the number of neighbors used
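A minimal k-NN classifier makes the "lazy learning" point concrete: there is no training step, and all the work happens inside the prediction call. The data points, labels, and the choice of Euclidean distance are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(training, query, k):
    """Return the majority label among the k training cases nearest to query."""
    neighbors = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical historic cases: (features, class label).
training = [
    ((1.0, 1.0), "A"), ((1.2, 0.8), "A"),
    ((3.0, 3.0), "B"), ((3.2, 2.9), "B"), ((2.9, 3.3), "B"),
]
query = (1.5, 1.4)

label_k3 = knn_classify(training, query, k=3)  # two "A" votes, one "B"
label_k5 = knn_classify(training, query, k=5)  # two "A" votes, three "B"
print(label_k3, label_k5)  # A B
```

The same new case is classified differently for k = 3 and k = 5, so the prediction depends on the value of k.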
k-Nearest Neighbor Method (k-NN)
[Figure: scatter plot on axes X and Y with a new case at (Xi, Yi) and neighborhoods drawn for k = 3 and k = 5]

The answer depends on the value of k
The Process of k-NN Method

[Flowchart: historic data is divided into a training set and a validation set, which feed parameter setting; new data then goes to predicting]

Parameter Setting
✓ Distance measure
✓ Value of “k”

Predicting
Classify (or forecast) new cases using k number of most similar cases
k-NN Model Parameter
1. Similarity Measure: The Distance Metric

▪Numeric versus nominal values?
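The numeric-versus-nominal question comes down to which distance is meaningful for the data. A hedged sketch of three common choices follows: Euclidean and Manhattan distance for numeric values, and a simple mismatch count for nominal values. The function names and sample values are assumptions.

```python
import math


def euclidean(a, b):
    """Straight-line distance for numeric attributes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute differences for numeric attributes."""
    return sum(abs(x - y) for x, y in zip(a, b))


def mismatch_count(a, b):
    """For nominal attributes: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


d_euclidean = euclidean([0, 0], [3, 4])
d_manhattan = manhattan([0, 0], [3, 4])
d_nominal = mismatch_count(["red", "suv"], ["red", "sedan"])
print(d_euclidean, d_manhattan, d_nominal)  # 5.0 7 1
```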


k-NN Model Parameter
2. Number of Neighbors (the value of k)
▪The best value depends on the data
▪Larger values reduce the effect of noise but
also make boundaries between classes less
distinct
▪An “optimal” value can be found heuristically
▪Cross Validation is often used to determine
the best value for k and the distance measure
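Choosing k by cross validation can be sketched with leave-one-out validation, a special case in which each case is predicted from all the others. The two-cluster data set and the candidate values of k are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(training, query, k):
    neighbors = sorted(training, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]


def leave_one_out_accuracy(data, k):
    """Predict each case from all the others and report the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_classify(rest, features, k) == label
    return hits / len(data)


# Hypothetical data: two well-separated clusters of four cases each.
data = [
    ((1.0, 1.0), "A"), ((1.2, 0.9), "A"), ((0.8, 1.1), "A"), ((1.1, 1.3), "A"),
    ((3.0, 3.0), "B"), ((3.1, 2.8), "B"), ((2.9, 3.2), "B"), ((3.3, 3.1), "B"),
]

scores = {k: leave_one_out_accuracy(data, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)  # ties go to the smallest k tried
```

In this toy set, k = 7 forces every vote to include four cases from the other cluster, so accuracy collapses. This is the "less distinct boundaries" effect of large k taken to the extreme.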
