PAM All Files
Dr. P.K.Viswanathan
Professor (Analytics)
Present Competitive Environment
The present competitive environment has been witnessing a cornucopia of data that is increasing at an astonishing rate, beyond human imagination.
AI is the New Electricity
[Diagram: concentric circles — DL inside ML inside AI]
▪ Artificial Intelligence (AI) is the major field
▪ Machine Learning (ML) is a subfield of AI
▪ Deep Learning (DL) is a subfield of ML
Pillars of Analytics
▪ Can you predict when customer churn will take place so that my company can take appropriate action and save a lot of money?
▪ What is the chance that a customer will default on a loan if I choose to grant it?
▪ What is the market demand for my new product that I would like to launch?
Why the term “Predictive Analytics”?
[Diagram: a training data set is fed to an algorithm, which learns a predictive model.]
Algorithms:
• CART
• Neural Nets
• Random Forest
• Naive Bayes
Unsupervised Learning
Logistic Regression-Examples
Probability = Odds / (1 + Odds)
Odds = Probability / (1 − Probability)
Why Odds Anyway?
P = e^Z / (1 + e^Z)
Z = b0 + b1X1 + b2X2 + ... + bkXk
where X1, X2, ..., Xk are the predictor variables
LogL = Σ Y·log(P) + Σ (1 − Y)·log(1 − P)
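A minimal sketch of these formulas in Python (NumPy only); the coefficients b and the small data set below are hypothetical placeholders:

import numpy as np

def logistic_probability(X, b):
    # Z = b0 + b1*X1 + ... + bk*Xk; P = e^Z / (1 + e^Z)
    Z = b[0] + X @ b[1:]
    return 1.0 / (1.0 + np.exp(-Z))  # algebraically identical to e^Z / (1 + e^Z)

def log_likelihood(Y, P):
    # LogL = sum Y*log(P) + sum (1 - Y)*log(1 - P)
    return np.sum(Y * np.log(P) + (1 - Y) * np.log(1 - P))

X = np.array([[1.0, 2.0], [3.0, 0.5], [0.2, 1.5]])  # two predictors, three cases
Y = np.array([1, 0, 1])
b = np.array([-0.5, 0.8, 0.3])                      # b0, b1, b2 (made up)
P = logistic_probability(X, b)
print(P, log_likelihood(Y, P))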
Walk the Talk
Simmons Catalogue¹
¹Adapted from Anderson, Sweeney, and Williams, purely for classroom discussion
Simmons Catalogue-Continued
Simmons conducted a study by sending out 100
catalogs, 50 to customers who have a Simmons credit
card and 50 to customers who do not have the card.
At the end of the test period, Simmons noted for each of
the 100 customers:
1) the amount the customer spent last year at Simmons,
2) whether the customer had a Simmons credit card, and
3) whether the customer made a $200 purchase.
The data file that contains the information is in
Logit-Simmons.csv
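A sketch of fitting this model with statsmodels; the column names Spending, Card, and Purchase are assumptions and should be checked against the actual file:

import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("Logit-Simmons.csv")
# Assumed columns: Spending = last year's spend, Card = 1 if the customer
# holds a Simmons card, Purchase = 1 if a $200 purchase was made.
X = sm.add_constant(df[["Spending", "Card"]])
model = sm.Logit(df["Purchase"], X).fit()
print(model.summary())
print(model.predict(X)[:5])  # estimated purchase probabilities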
• The Books By Mail company is interested in offering a new title called The Art History of Florence. It sent a test mailing to 1,000 existing customers; of these, 83 actually purchased the book, a response rate of 8.3 percent. The company also sent an identical mailing to another 1,000 customers to serve as a holdout sample. The scope of the study is primarily confined to predicting whether a customer will buy the new book based on two input variables, namely months since last purchase and number of art books purchased. The data files for the existing customers and the holdout sample are given in "PaulBooks1.csv" and "Paulbooks2.csv" respectively.
Any Practical Value for Books By Mail?
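One hedged sketch of where the practical value comes from: fit a logistic model on the existing customers, score the holdout sample, and mail only the customers with the highest predicted probabilities (lift over the 8.3% base rate). The column names Months, ArtBooks, and Purchase are assumptions about the files:

import pandas as pd
from sklearn.linear_model import LogisticRegression

train = pd.read_csv("PaulBooks1.csv")
holdout = pd.read_csv("Paulbooks2.csv")
# Assumed columns: Months = months since last purchase,
# ArtBooks = number of art books purchased, Purchase = 1 if the title was bought.
features = ["Months", "ArtBooks"]
model = LogisticRegression().fit(train[features], train["Purchase"])
holdout["P_buy"] = model.predict_proba(holdout[features])[:, 1]
# Mail only the top deciles instead of everyone:
print(holdout.sort_values("P_buy", ascending=False).head(10))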
What is Discriminant Analysis?
1. Profiling
2. Differentiation
3. Classification
Applications of LDA
▪ In a textile mill, cotton quality depends on its chemical characteristics. LDA can create the required score: if the score is more than a threshold value (cut-off point), the cotton quality is classified as Good, else Bad.
▪ The discussion here will be confined to two groups only as most of the
applications involve a dichotomous situation. However, LDA can easily
handle multiple classes.
Math Behind LDA
Z = a1·x1 + a2·x2 + ... + ak·xk
Z1 = a1·x1(I) + a2·x2(I) + ... + ak·xk(I)      (mean score in group I)
Z2 = a1·x1(II) + a2·x2(II) + ... + ak·xk(II)   (mean score in group II)
Z1 − Z2 = |a1D1 + a2D2 + a3D3 + ... + akDk| = |aD|
where D1, D2, ..., Dk are the differences in means between the two groups for predictor variables x1, x2, ..., xk respectively. The values of a1, a2, ..., ak are chosen so as to
Maximize Z1 − Z2
subject to the constraint Var(Z) = 1
Data Set
The seminal paper of Altman [1] classified and predicted corporate bankruptcy based on a set of financial ratios. The Z score from Fisher's linear discriminant analysis was employed to classify each firm as either "Bankrupt" or "Solvent". The data used in the study were from manufacturing corporations. The data set has 33 bankrupt firms and 33 solvent firms. The central goal was to determine whether the bankrupt firms and solvent firms could be sharply differentiated (separated) in terms of five financial ratios: Working Capital/Total Assets (WCTA), Retained Earnings/Total Assets (RETA), Earnings Before Interest and Taxes/Total Assets (EBITTA), Market Value of Equity/Book Value of Total Debt (MVEBVTD), and Sales/Total Assets (SATA). The original data set was obtained from Morrison's book on "Multivariate Statistical Analysis".
[The abbreviations within brackets are made for ease of identifying the ratios].
Brief Description of the Ratios
▪ EBITTA: This ratio is calculated by dividing a firm's earnings before interest and taxes by its total assets. Since a firm's ultimate existence is based on the earning power of its assets, this ratio appears particularly appropriate for studies dealing with corporate failure.
▪ MVEBVTD: This measure shows how much the firm's assets can decline in value (measured by market value of equity plus debt) before the liabilities exceed the assets and the firm becomes insolvent. It also appears to be a more effective predictor of bankruptcy than the more commonly used ratio Net Worth/Total Debt.
▪ SATA: The capital-turnover ratio is a standard financial ratio illustrating the sales-generating ability of the firm's assets. It is one measure of management's capability in dealing with competitive conditions. This final ratio is quite important because of its unique relationship to other variables in the model; statistically speaking, it would perhaps appear to be the least significant in discriminating power.
Profiling-Descriptive
Group Means
Group       WCTA     RETA     EBITTA   MVEBVTD   SATA
Bankrupt   -6.05   -62.51    -31.78     40.05    1.50
Solvent    41.38    35.25     15.32    254.67    1.94
Differentiation-Visual
[Figures: visual differentiation of the bankrupt and solvent groups on each ratio: WCTA, RETA, EBITTA, MVEBVTD, SATA]
LDA Z-score Equation
Z = 0.0153·WCTA + 0.0183·RETA + 0.0418·EBITTA + 0.0077·MVEBVTD + 1.2543·SATA
Accuracy: 95.45%
Correlation between DA (Z scores) and input variables (in absolute terms)
Input Variable   |Correlation|   Rank
WCTA                0.7304        3
RETA                0.8702        1
EBITTA              0.6809        4
MVEBVTD             0.7352        2
SATA                0.2589        5
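A sketch of reproducing these outputs with scikit-learn; the file name Altman.csv and the Status column are assumptions (the deck names no file for this data set):

import pandas as pd
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

df = pd.read_csv("Altman.csv")            # hypothetical file name
ratios = ["WCTA", "RETA", "EBITTA", "MVEBVTD", "SATA"]
lda = LinearDiscriminantAnalysis().fit(df[ratios], df["Status"])
print(lda.coef_)                            # weights, cf. the Z-score equation
print(lda.score(df[ratios], df["Status"]))  # classification accuracy

# Correlation of each input with the discriminant scores (cf. the rank table)
Z = pd.Series(lda.transform(df[ratios]).ravel(), index=df.index)
print(df[ratios].corrwith(Z).abs().sort_values(ascending=False))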
ROC Curve
Discriminant Analysis
[Figure: two groups plotted against predictors X1 and X2, with the preferred separating line marked ✓]
SVM Scenario
▪ Find lines that correctly classify the training data.
▪ Among all such lines, pick the one that has the greatest distance to the points closest to it.
▪ The closest points that identify this line are known as support vectors.
▪ The region they define around the line is known as the margin.
• Hyperplane: wT x + b = 0
• The margin boundaries pass through the closest points xa and xb:
wT xa + b = +1
wT xb + b = −1
• Subtracting the two boundary equations: wT (xa − xb) = 2
• Hence the margin width is d = ||xa − xb|| = 2/||w||
Math Behind SVM
• We can formulate the quadratic optimization problem: find w and b that minimize ½ wTw subject to one constraint per training observation:
Y1(W1X11 + W2X12 + W3X13 + ... + WkX1k + b) >= 1
Y2(W1X21 + W2X22 + W3X23 + ... + WkX2k + b) >= 1
...
Yn(W1Xn1 + W2Xn2 + W3Xn3 + ... + WkXnk + b) >= 1
where Yi is +1 or −1 according to the class of observation i, and Xij is the j-th predictor for observation i.
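A minimal sketch of this optimization via scikit-learn's linear SVC on toy data (a very large C approximates the hard-margin constraints above):

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [2.0, 0.5], [3.0, 1.0], [4.0, 1.5]])
y = np.array([1, 1, 1, -1, -1, -1])

svm = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ≈ hard margin
w, b = svm.coef_[0], svm.intercept_[0]
print("w =", w, ", b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", svm.support_vectors_)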
Learning SVM–Classroom Exercise-Walk the Talk
*The file DiscriWinstonFR.csv contains information on the following items for 24 companies: EBITASS (Earnings Before Interest and Taxes, divided by Total Assets), ROTC (Return on Total Capital), and Group (1 for "Most Admired" and 2 for "Least Admired" companies).
[Figure: scatter plot of ROTC against EBITASS for the 24 companies, with the separating line Wx + b = 0]
Hyperplane Separating the Data Points with Decision Boundaries
Hyperplane equation: 24.97·X1 + 91.98·X2 − 14.49 = 0, with X1 = EBITASS and X2 = ROTC
[Figure: the data points with the hyperplane Wx + b = 0 and the decision boundaries Wx + b = −1 and Wx + b = +1]
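A sketch of how this hyperplane could be reproduced (column names are taken from the exercise description; whether the fitted coefficients match the quoted equation exactly depends on the solver and scaling):

import pandas as pd
from sklearn.svm import SVC

df = pd.read_csv("DiscriWinstonFR.csv")
X = df[["EBITASS", "ROTC"]]
y = df["Group"]                                # 1 = Most Admired, 2 = Least Admired

svm = SVC(kernel="linear", C=1e6).fit(X, y)    # large C ≈ hard margin
w, b = svm.coef_[0], svm.intercept_[0]
print(f"hyperplane: {w[0]:.2f}*X1 + {w[1]:.2f}*X2 + ({b:.2f}) = 0")
# Expected to be close to 24.97*X1 + 91.98*X2 - 14.49 = 0 (up to sign).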
Confusion Matrix
                     Predicted
Actual             Most Admired   Least Admired
Most Admired            12               0
Least Admired            0              12
Accuracy: 100.00%
Dataset with noise
With noisy data points, a boundary bent to classify every training point perfectly is OVERFITTING!
Hard Margin vs. Soft Margin
◼ The old (hard margin) formulation:
Find w and b such that Φ(w) = ½ wTw is minimized, and for all {(xi, yi)}: yi(wTxi + b) ≥ 1
◼ The new (soft margin) formulation tolerates noisy points by introducing slack variables ξi ≥ 0:
Find w and b such that Φ(w) = ½ wTw + C Σ ξi is minimized, and for all {(xi, yi)}: yi(wTxi + b) ≥ 1 − ξi
The “Kernel Trick”
◼ The linear classifier relies on dot product between vectors K(xi,xj)=xiTxj
◼ If every data point is mapped into high-dimensional space via some
transformation Φ: x → φ(x), the dot product becomes:
K(xi,xj)= φ(xi) Tφ(xj)
◼ A kernel function is some function that corresponds to an inner product in
some expanded feature space.
◼ Example:
2-dimensional vectors x = [x1 x2]; let K(xi, xj) = (1 + xiTxj)²
Need to show that K(xi, xj) = φ(xi)Tφ(xj):
K(xi, xj) = (1 + xiTxj)²
= 1 + xi1²xj1² + 2·xi1xj1·xi2xj2 + xi2²xj2² + 2·xi1xj1 + 2·xi2xj2
= [1, xi1², √2·xi1xi2, xi2², √2·xi1, √2·xi2]T [1, xj1², √2·xj1xj2, xj2², √2·xj1, √2·xj2]
= φ(xi)Tφ(xj), where φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2]
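The identity is easy to verify numerically in Python (arbitrary test vectors):

import numpy as np

def phi(x):
    # φ(x) = [1, x1², √2·x1x2, x2², √2·x1, √2·x2]
    x1, x2 = x
    s = np.sqrt(2)
    return np.array([1, x1**2, s * x1 * x2, x2**2, s * x1, s * x2])

xi, xj = np.array([0.7, -1.2]), np.array([2.0, 0.5])
K = (1 + xi @ xj) ** 2
print(K, phi(xi) @ phi(xj))   # identical up to floating-point error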
Examples of Kernel Functions
◼ Linear: K(xi, xj) = xiTxj
◼ Polynomial of degree p: K(xi, xj) = (1 + xiTxj)^p
◼ Gaussian (RBF): K(xi, xj) = exp(−||xi − xj||² / 2σ²)
Components of a decision tree
[Figure: tree diagram showing the root node and the dataset it splits]
How a decision tree works – step 1
Measures of impurity
• Gini Index
• Entropy measure
1. Calculate GINI for the overall rectangle
1 − (12/24)² − (12/24)² = 0.5
2. Calculation of the GINI index for the left and right rectangles
Left:  1 − (7/8)² − (1/8)² = 0.219
Right: 1 − (11/16)² − (5/16)² = 0.43
3. Weighted average of the impurity measures
(8/24) × 0.219 + (16/24) × 0.43 = 0.359
GINI index before & after the split
Before: 0.5    After: 0.359
[Flow: calculate GINI for the overall rectangle → for the left rectangle → for the right rectangle → combined impurity of left + right = weighted average of the impurity measures]
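The whole GINI calculation fits in a few lines of Python (class counts taken from the rectangles above):

def gini(counts):
    # GINI impurity: 1 - sum of squared class proportions
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

g_all   = gini([12, 12])                           # 0.5
g_left  = gini([7, 1])                             # 0.219
g_right = gini([11, 5])                            # 0.43
g_after = (8 / 24) * g_left + (16 / 24) * g_right  # 0.359
print(g_all, g_left, g_right, g_after)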
1. Calculate entropy for the overall rectangle
−(12/24)·log₂(12/24) − (12/24)·log₂(12/24) = 1
2. Calculation of entropy for the left and right rectangles
Left:  −(7/8)·log₂(7/8) − (1/8)·log₂(1/8) = 0.54
Right: −(11/16)·log₂(11/16) − (5/16)·log₂(5/16) = 0.89
3. Weighted average of entropy
(8/24) × 0.54 + (16/24) × 0.89 = 0.779
Entropy before & after the split
Before: 1    After: 0.779
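The same check for entropy (class counts as above):

from math import log2

def entropy(counts):
    # Entropy: -sum p*log2(p) over the class proportions
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c)

e_all   = entropy([12, 12])                        # 1.0
e_left  = entropy([7, 1])                          # ≈ 0.54
e_right = entropy([11, 5])                         # ≈ 0.89
e_after = (8 / 24) * e_left + (16 / 24) * e_right  # ≈ 0.779
print(e_all, e_left, e_right, e_after)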
Random Forests
Ensemble methods
We need to make sure the individual trees do not all just learn the same thing: each tree is grown on a bootstrap sample of the rows, and only a random subset of the features is considered at each split.
Random Forest
• This is a widely used ensemble technique in view of its superior performance
and scalability.
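A minimal scikit-learn sketch (synthetic data stands in for any of the data sets above):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Each tree sees a bootstrap sample of rows and a random subset of features
# at every split, so the trees do not all learn the same boundaries.
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)
rf.fit(X_tr, y_tr)
print("holdout accuracy:", rf.score(X_te, y_te))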