Data Mining Reference Guide

The document is the Analytic Solver Data Mining Reference Guide, detailing features and functionalities of the software developed by Frontline Systems, Inc. It covers various topics including data sampling, data exploration, transformations, clustering, text mining, and time series analysis. The guide also includes acknowledgments, copyright information, and ordering details for the software.


Version 2020

Analytic Solver
Data Mining Reference Guide
Copyright
Software copyright 1991-2020 by Frontline Systems, Inc.
User Guide copyright 2020 by Frontline Systems, Inc.
GRG/LSGRG Solver: Portions copyright 1989 by Optimal Methods, Inc. SOCP Barrier Solver: Portions
copyright 2002 by Masakazu Muramatsu. LP/QP Solver: Portions copyright 2000-2010 by International
Business Machines Corp. and others. Neither the Software nor this User Guide may be copied, photocopied,
reproduced, translated, or reduced to any electronic medium or machine-readable form without the express
written consent of Frontline Systems, Inc., except as permitted by the Software License agreement below.

Trademarks
Frontline Solvers®, XLMiner®, Analytic Solver®, Risk Solver®, Premium Solver®, Solver SDK®, and RASON®
are trademarks of Frontline Systems, Inc. Windows and Excel are trademarks of Microsoft Corp. Gurobi is a
trademark of Gurobi Optimization, Inc. Knitro is a trademark of Artelys. MOSEK is a trademark of MOSEK
ApS. OptQuest is a trademark of OptTek Systems, Inc. XpressMP is a trademark of FICO, Inc.

Acknowledgements
Thanks to Dan Fylstra and the Frontline Systems development team for a 25-year cumulative effort to build the
best possible optimization and simulation software for Microsoft Excel. Thanks to Frontline’s customers who
have built many thousands of successful applications, and have given us many suggestions for improvements.
Risk Solver Pro and Risk Solver Platform have benefited from reviews, critiques, and suggestions from several
risk analysis experts:
• Sam Savage (Stanford Univ. and AnalyCorp Inc.) for Probability Management concepts including SIPs,
SLURPs, DISTs, and Certified Distributions.
• Sam Sugiyama (EC Risk USA & Europe LLC) for evaluation of advanced distributions, correlations, and
alternate parameters for continuous distributions.
• Savvakis C. Savvides for global bounds, censor bounds, base case values, the Normal Skewed distribution
and new risk measures.

How to Order
Contact Frontline Systems, Inc., P.O. Box 4288, Incline Village, NV 89450.
Tel (775) 831-0300 Fax (775) 831-0314 Email info@solver.com Web http://www.solver.com
Table of Contents
Table of Contents 3

Introduction to Analytic Solver Data Mining 16


Introduction................................................................................................................. 16
Ribbon Overview ........................................................................................................ 16
Data Mining Ribbon ...................................................................................... 16
Differences Between Desktop and Cloud Versions ....................................................... 18
AnalyticSolver.com ..................................................................................................... 19
Uploading a Dataset ....................................................................................... 19
Common Dialog Options ............................................................................................. 22
Worksheet ..................................................................................................... 22
Workbook...................................................................................................... 22
Data Range .................................................................................................... 22
# Rows # Cols............................................................................................... 22
First row contains headers .............................................................................. 22
Variables in the Input Data ............................................................................. 22
Selected Variables ......................................................................................... 23
Help .............................................................................................................. 23
Next .............................................................................................................. 23
Reset ............................................................................................................. 23
OK/Finish...................................................................................................... 23
Cancel ........................................................................................................... 23
References................................................................................................................... 23

Big Data Options 25


Introduction................................................................................................................. 25
Big Data Options ......................................................................................................... 25

Sampling or Importing from a Database, Worksheet or File Folder 29


Introduction................................................................................................................. 29
Sampling from a Worksheet......................................................................................... 30
Example: Sampling from a Worksheet using Simple Random Sampling ........ 30
Example: Sampling from a Worksheet using Sampling with Replacement...... 32
Example: Sampling from a Worksheet using Stratified Random Sampling ..... 33
Sample from Worksheet Options ................................................................................. 38
Data Range .................................................................................................... 38
First Row Contains Headers ........................................................................... 38
Variables ....................................................................................................... 38
Sample With replacement .............................................................................. 39
Set Seed......................................................................................................... 39
Desired sample size ....................................................................................... 39
Simple random sampling................................................................................ 39
Stratified random sampling ............................................................................ 39
Stratum Variable ............................................................................................ 39

Frontline Solvers Analytic Solver Data Mining Reference Guide 3


Proportionate to stratum size .......................................................................... 39
Equal from each stratum, please specify #records ........................................... 39
Equal from each stratum, #records = smallest stratum size .............................. 40
Sampling from a Database ........................................................................................... 40
Importing from a File Folder........................................................................................ 42
Importing from File Folder Options ............................................................................. 45
Directory ....................................................................................................... 46
Files .............................................................................................................. 46
Selected Files................................................................................................. 46
Import selected files ....................................................................................... 46
Sample from selected files ............................................................................. 46
Sample With replacement .............................................................................. 47
Desired sample size ....................................................................................... 47
Simple random sampling................................................................................ 47
Set Seed......................................................................................................... 47
Output ........................................................................................................... 47

Exploring Data 48
Introduction................................................................................................................. 48
Feature Selection ......................................................................................................... 48
Feature Selection Example ............................................................................. 49
Feature Selection Options .............................................................................. 66
Variables listbox ............................................................................................ 67
Continuous Variables listbox.......................................................................... 67
Categorical Variables listbox.......................................................................... 67
Output Variable ............................................................................................. 67
Output Variable Type .................................................................................... 68
Discretize predictors ...................................................................................... 69
Discretize output variable............................................................................... 70
Pearson correlation ........................................................................................ 70
Spearman rank correlation.............................................................................. 70
Kendall concordance...................................................................................... 70
Welch’s Test.................................................................................................. 71
F-Statistic ...................................................................................................... 71
Fisher score ................................................................................................... 71
Chi-Squared................................................................................................... 71
Cramer's V..................................................................................................... 71
Mutual information ........................................................................................ 71
Gain ratio ...................................................................................................... 72
Gini index...................................................................................................... 72
Table of all produced measures ...................................................................... 72
Top features table .......................................................................................... 73
Feature importance plot.................................................................................. 73
Number of features ........................................................................................ 73
Rank By ........................................................................................................ 73
Chart Wizard ............................................................................................................... 73
Bar Chart ....................................................................................................... 79
Box Whisker Plot Example ............................................................................ 82
Histogram Example ..................................................................................................... 85
Line Chart Example..................................................................................................... 89
Parallel Coordinates Chart Example ............................................................................. 92
ScatterPlot Example .................................................................................................... 95
Scatterplot Matrix Plot Example .................................................................................. 98
Variable Plot Example ............................................................................................... 101
Export to Power BI .................................................................................................... 103

Export to Tableau ...................................................................................................... 107
Common Chart Options ............................................................................................. 110
Common Chart Options ............................................................................... 110
Data Mining Cloud Chart Options ................................................................ 113

Transforming Datasets with Missing or Invalid Data 115


Introduction............................................................................................................... 115
Missing Data Handling Examples .............................................................................. 115
Options for Missing Data Handling............................................................................ 128
Missing Values are represented by this value ................................................ 128
Overwrite existing worksheet ....................................................................... 128
Variable names in the first Row ................................................................... 129
Variables ..................................................................................................... 129
How do you want to handle missing values for the selected variable(s)? ....... 129
Apply to selected variable(s) ........................................................................ 129
Reset ........................................................................................................... 129
OK .............................................................................................................. 129

Transform Continuous Data 130


Introduction............................................................................................................... 130
Bin Continuous Data .................................................................................... 130
Rescale Continuous Data ............................................................................. 130
Examples for Binning Continuous Data ..................................................................... 131
Options for Binning Continuous Data ........................................................................ 140
Variable names in the first row ..................................................................... 141
Binned Variable Name ................................................................................. 141
Show binning intervals in the output ............................................................ 141
Name of binned variable .............................................................................. 141
#bins for variable ......................................................................................... 141
Equal count.................................................................................................. 142
Equal interval .............................................................................................. 142
Rank of the bin ............................................................................................ 142
Mean of the bin............................................................................................ 142
Median of the bin ......................................................................................... 142
Mid Value ................................................................................................... 142
Apply to Selected Variable........................................................................... 142
Examples for Rescaling Continuous Data................................................................... 142
Rescaling Options ..................................................................................................... 146
Partition Data............................................................................................... 148
Rescaling: Fitting ........................................................................................ 149
Show Fitted Statistics................................................................................... 150
Partitioned Data ........................................................................................... 150
New Data .................................................................................................... 151

Transforming Categorical Data 152


Introduction............................................................................................................... 152
Transforming Categorical Data Examples .................................................................. 153
Options for Transforming Categorical Data ................................................................ 160
Data Range .................................................................................................. 160
First row contains headers ............................................................................ 161
Variables ..................................................................................................... 161
Variables to be factored ............................................................................... 161
Assign Numbers Options ............................................................................. 161
Category variable ......................................................................................... 162

Assign Category .......................................................................................... 162
Limit number of categories to....................................................................... 162
Assign Category ID ..................................................................................... 162
Apply .......................................................................................................... 163
Reset ........................................................................................................... 163

Principal Components Analysis 164


Introduction............................................................................................................... 164
Examples for Principal Components .......................................................................... 166
Options for Principal Components Analysis ............................................................... 171
Principal Components .................................................................................. 172
Smallest #components explaining................................................................. 172
Method ........................................................................................................ 172
Show principal components score................................................................. 173
Show Q - Statistics....................................................................................... 173
Show Hotelling’s T-Squared Statistics.......................................................... 173

k-Means Clustering 174


Introduction............................................................................................................... 174
Example for k-Means Clustering................................................................................ 174
k-Means Clustering Options ...................................................................................... 179
Normalize input data .................................................................................... 179
# Clusters .................................................................................................... 179
# Iterations................................................................................................... 180
Options........................................................................................................ 180
Set Seed....................................................................................................... 180
Show data summary ..................................................................................... 181
Show distances from each cluster center ....................................................... 181

Hierarchical Clustering 182


Introduction............................................................................................................... 182
Agglomerative methods ............................................................................... 182
Single linkage clustering .............................................................................. 183
Complete linkage clustering ......................................................................... 183
Average linkage clustering ........................................................................... 184
Centroid Method.......................................................................................... 185
Ward's hierarchical clustering method .......................................................... 185
McQuitty's Method ...................................................................................... 186
Median Method ........................................................................................... 186
Examples of Hierarchical Clustering .......................................................................... 186
Options for Hierarchical Clustering............................................................................ 195
Data Type .................................................................................................... 196
Normalize input data .................................................................................... 197
Similarity Measures ..................................................................................... 197
Clustering Method ....................................................................................... 197
Draw Dendrogram ....................................................................................... 198
Maximum Number of Leaves ....................................................................... 198
Show cluster membership ............................................................................ 198
Number of Clusters ...................................................................................... 198

Text Mining 199


Introduction............................................................................................................... 199
Text Mining Example ................................................................................................ 200

Processing New Documents Based on an Existing Text Mining Model ....................... 217
Text Mining Options ................................................................................................. 221
Variables ..................................................................................................... 221
First Row Contains Headers ......................................................................... 221
Selected Text Variables................................................................................ 222
Text variables contain file paths ................................................................... 222
Selected Non-Text Variables ........................................................................ 222
Map text variables to an existing model ........................................................ 222
Select Model Worksheet .............................................................................. 223
Select Model Workbook .............................................................................. 223
Selected Text Variables................................................................................ 223
Model Text Variables .................................................................................. 223
Match Selected ............................................................................................ 223
Unmatch Selected ........................................................................................ 223
Unmatch All ................................................................................................ 223
Match By Name ........................................................................................... 223
Match Sequentially ...................................................................................... 223
Analyze All Terms....................................................................................... 224
Analyze specified terms only ....................................................................... 224
Start term/phrase .......................................................................................... 224
End term/phrase ........................................................................................... 225
Stopword removal........................................................................................ 225
Exclusion list ............................................................................................... 226
Vocabulary Reduction Advanced… ............................................................. 226
Maximum vocabulary size ........................................................................... 228
Perform stemming ....................................................................................... 228
Normalize case ............................................................................................ 228
Term Normalization Advanced…................................................................. 228
Remove terms occurring in less than __% of documents ............................... 229
Remove terms occurring in more than __% of documents ............................. 229
Maximum term length.................................................................................. 229
Term-Document Matrix Scheme .................................................................. 230
Perform latent semantic indexing ................................................................. 231
Concept Extraction – Latent Semantic Indexing ........................................... 232
Term-Document Matrix ............................................................................... 233
Concept-Document Matrix ........................................................................... 233
Term-Concept Matrix .................................................................................. 234
Term frequency table ................................................................................... 234
Most frequent terms ..................................................................................... 234
Full vocabulary ............................................................................................ 234
Maximum corresponding documents ............................................................ 234
Zipf’s plot.................................................................................................... 234
Show documents summary ........................................................................... 235
Keep a short excerpt. Number of characters .................................................. 235
Scree Plot .................................................................................................... 235
Maximum number of concepts ..................................................................... 235
Terms scatter plot ........................................................................................ 235
Document scatter plot .................................................................................. 235
Concept importance ..................................................................................... 236
Term Importance ......................................................................................... 236
Write Text Mining Model ............................................................................ 236

Exploring a Time Series Dataset 237


Introduction............................................................................................................... 237
Autocorrelation (ACF) ................................................................................. 237

Partial Autocorrelation Function (PACF)...................................................... 238
Autocovariance of Data (ACVF) .................................................................. 238
ARIMA ....................................................................................................... 239
Partitioning .................................................................................................. 239
Examples for Time Series Analysis ............................................................................ 239
Options for Exploring Time Series Datasets ............................................................... 246
Time variable............................................................................................... 246
Variables in the Partition Data...................................................................... 246
Specify Partitioning Options ........................................................................ 247
Specify Percentages for Partitioning ............................................................. 247
Variables in the input data ............................................................................ 247
Selected variable .......................................................................................... 247
Parameters: Training .................................................................................... 248
Parameters: Validation ................................................................................. 248
Plot ACF Chart ............................................................................................ 248
Plot PACF Chart .......................................................................................... 248
Plot ACVF Chart ......................................................................................... 248
Time Variable .............................................................................................. 249
Selected Variable ......................................................................................... 249
Fit seasonal model ....................................................................................... 249
Period .......................................................................................................... 249
Nonseasonal Parameters............................................................................... 249
Seasonal Parameters .................................................................................... 250
Maximum number of iterations .................................................................... 250
Fitted Values and residuals........................................................................... 250
Variance-covariance matrix.......................................................................... 250
Produce forecasts ......................................................................................... 250
Number of forecasts ..................................................................................... 250
Confidence level for forecast confidence intervals ........................................ 250

Smoothing Techniques 251


Introduction............................................................................................................... 251
Exponential smoothing ................................................................................ 251
Moving Average Smoothing ........................................................................ 251
Double exponential smoothing ..................................................................... 252
Holt Winters Smoothing .............................................................................. 252
Exponential Smoothing Example ............................................................................... 253
Moving Average Smoothing Example........................................................................ 258
Double Exponential Smoothing Example ................................................................... 263
Holt Winters Smoothing Example .............................................................................. 266
Common Smoothing Options..................................................................................... 274
Common Options......................................................................................... 274
First row contains headers ............................................................................ 274
Variables in Input Data ................................................................................ 275
Time Variable .............................................................................................. 275
Selected Variable ......................................................................................... 275
Output Options ............................................................................................ 275
Exponential Smoothing Options................................................................................. 275
Optimize...................................................................................................... 275
Level (Alpha) .............................................................................................. 275
Moving Average Smoothing Options ......................................................................... 276
Interval ........................................................................................................ 276
Double Exponential Smoothing Options .................................................................... 276
Optimize...................................................................................................... 276
Level (Alpha) .............................................................................................. 276

Trend (Beta) ................................................................................................ 276
Holt Winters Smoothing Options ................................................................................ 277
Period .......................................................................................................... 277
Optimize...................................................................................................... 277
Level (Alpha) .............................................................................................. 277
Trend (Beta) ................................................................................................ 277
Seasonal (Gamma) ....................................................................................... 278
Produce Forecast.......................................................................................... 278
# Forecasts................................................................................................... 278

Data Mining Partitioning 279


Introduction............................................................................................................... 279
Training Set ................................................................................................. 279
Validation Set .............................................................................................. 279
Test Set ....................................................................................................... 279
Partition with Oversampling......................................................................... 280
Partition Options.......................................................................................... 281
Standard Data Partition Example ............................................................................... 281
Partition with Oversampling Example ........................................................................ 285
Standard Partitioning Options .................................................................................... 287
Use partition variable ................................................................................... 288
Set Seed....................................................................................................... 288
Pick up rows randomly ................................................................................ 289
Automatic percentages ................................................................................. 289
Specify percentages ..................................................................................... 289
Equal percentages ........................................................................................ 289
Partitioning with Oversampling Options .................................................................... 289
Set Seed ....................................................................................................... 290
Output Variable ............................................................................................ 290
#Classes ...................................................................................................... 290
Specify Success class ................................................................................... 291
% of success in data set ................................................................................ 291
Specify % success in training set .................................................................. 291
Specify % validation data to be taken away as test data ................................. 291

Discriminant Analysis Classification Method 292


Introduction............................................................................................................... 292
Discriminant Analysis Example ................................................................................. 292
Metrics ........................................................................................................ 299
Discriminant Analysis Options .................................................................................. 304
Variables in Input Data ................................................................................ 305
Selected Variables ....................................................................................... 305
Output Variable ........................................................................................... 305
Number of Classes ....................................................................................... 305
Success Class............................................................................................... 305
Success Probability Cutoff ........................................................................... 306
Partition Data............................................................................................... 306
Rescale Data ................................................................................................ 307
Prior Probability .......................................................................................... 307
Canonical Variate Analysis .......................................................................... 308
Show CVA Model ....................................................................................... 308
Show LDA Model........................................................................................ 308
Score Training Data ..................................................................................... 308
Score Validation Data .................................................................................. 308

Score Test Data............................................................................................ 309
Score New Data ........................................................................................... 309

Logistic Regression 310


Introduction............................................................................................................... 310
Logistic Regression Example ..................................................................................... 310
Logistic Regression Options ...................................................................................... 324
Variables In Input Data ................................................................................ 325
Selected Variables ....................................................................................... 325
Weight Variable........................................................................................... 325
Output Variable ........................................................................................... 325
Number of Classes ....................................................................................... 325
Success Class............................................................................................... 325
Success Probability Cutoff ........................................................................... 325
Partition Data............................................................................................... 326
Rescale Data ................................................................................................ 326
Prior Probability .......................................................................................... 327
Partition Data............................................................................................... 327
Fit Intercept ................................................................................................. 327
Iterations (Max) ........................................................................................... 327
Variance – Covariance Matrix ...................................................................... 328
Multicollinearity Diagnostics ....................................................................... 328
Analysis Of Coefficients .............................................................................. 328
Feature Selection ......................................................................................... 328
Score Training Data ..................................................................................... 329
Score Validation Data .................................................................................. 330
Score Test Data............................................................................................ 330
Score New Data ........................................................................................... 330

k – Nearest Neighbors Classification Method 331


Introduction............................................................................................................... 331
k-Nearest Neighbors Classification Example .............................................................. 331
k-Nearest Neighbors Classification Options ............................................................... 337
Variables In Input Data ................................................................................. 338
Selected Variables ........................................................................................ 338
Output Variable ............................................................................................ 338
Number of Classes ....................................................................................... 338
Success Class............................................................................................... 338
Success Probability Cutoff ........................................................................... 338
# Neighbors (k)............................................................................................ 339
Nearest Neighbors Search ............................................................................ 339
Prior Probabilities ........................................................................................ 340
Rescale Data ................................................................................................ 340
Partition Data............................................................................................... 341
Score Training Data ..................................................................................... 341
Score Validation Data .................................................................................. 341
Score Test Data............................................................................................ 341
Score New Data ........................................................................................... 342

Classification Tree Classification Method 343


Introduction............................................................................................................... 343
Pruning the tree............................................................................................ 344
Single Tree Classification Tree Example .................................................................... 344
Classification Tree Options........................................................................................ 355

Variables In Input Data ................................................................................ 356
Selected Variables ....................................................................................... 356
Output Variable ........................................................................................... 356
Categorical Variables ................................................................................... 356
Number of Classes ....................................................................................... 356
Success Class............................................................................................... 357
Success Probability Cutoff ........................................................................... 357
Partition Data............................................................................................... 357
Rescale Data ................................................................................................ 357
Tree Growth ................................................................................................ 358
Prior Probability .......................................................................................... 358
Prune (Using Validation Set)........................................................................ 359
Show Feature Importance............................................................................. 359
Maximum Number of Levels ....................................................................... 359
Trees to Display ........................................................................................... 359
Score Training Data ..................................................................................... 359
Score Validation Data ................................................................................... 360
Score Test Data............................................................................................. 360
Score New Data ............................................................................................ 360

Naïve Bayes Classification Method 361


Introduction............................................................................................................... 361
Bayes Theorem ............................................................................................ 361
Naïve Bayes Classification Example .......................................................................... 362
Naïve Bayes Classification Method Options ............................................................... 371
Variables In Input Data ................................................................................. 372
Selected Variables ....................................................................................... 372
Output Variable ........................................................................................... 372
Number of Classes ....................................................................................... 372
Success Class............................................................................................... 372
Success Probability Cutoff ........................................................................... 373
Partition Data............................................................................................... 373
Prior Probability .......................................................................................... 373
Laplace Smoothing ...................................................................................... 374
Show Prior Conditional Probability .............................................................. 374
Show Log-Density ....................................................................................... 375
Score Training Data ..................................................................................... 375
Score Validation Data .................................................................................. 375
Score Test Data............................................................................................ 375
Score New Data ........................................................................................... 375

Neural Network Classification Method 376


Introduction............................................................................................................... 376
Training an Artificial Neural Network .......................................................... 377
The Iterative Learning Process ..................................................................... 377
Feedforward, Back-Propagation ................................................................... 378
Structuring the Network ............................................................................... 378
Automated Neural Network Classification Example ................................................... 379
Manual Neural Network Classification Example ........................................................ 385
NNC with Output Variable Containing 2 Classes ....................................................... 391
Neural Network Classification Method Options ......................................................... 394
Variables In Input Data ................................................................................ 394
Selected Variables ....................................................................................... 395
Categorical Variables ................................................................................... 395

Output Variable ........................................................................................... 395
Number of Classes ....................................................................................... 395
Success Class............................................................................................... 395
Success Probability Cutoff ........................................................................... 395
Partition Data............................................................................................... 396
Rescale Data ................................................................................................ 396
Hidden Layers/Neurons ............................................................................... 397
Hidden Layer ............................................................................................... 397
Output Layer ............................................................................................... 397
Prior Probability .......................................................................................... 397
Neuron Weight Initialization Seed................................................................ 398
Training Parameters ..................................................................................... 398
Stopping Rules ............................................................................................ 400
Score Training Data ..................................................................................... 401
Score Validation Data .................................................................................. 401
Score Test Data............................................................................................ 402
Score New Data ........................................................................................... 402

Ensemble Methods for Classification 403


Boosting Ensemble Method Example......................................................................... 405
Bagging Ensemble Method for Classification ............................................................. 412
Random Trees Ensemble Method Example ................................................................ 419
Classification Ensemble Methods Options.................................................................. 426
Variables In Input Data ................................................................................ 426
Selected Variables ....................................................................................... 427
Categorical Variables ................................................................................... 427
Output Variable ........................................................................................... 427
Number of Classes ....................................................................................... 427
Success Class............................................................................................... 427
Success Probability Cutoff ........................................................................... 427
Partition Data............................................................................................... 428
Rescale Data ................................................................................................ 428
Number of Weak Learners ........................................................................... 429
Weak Learner .............................................................................................. 429
AdaBoost Variant ........................................................................................ 429
Random Seed for Resampling ...................................................................... 429
Show Weak Learner..................................................................................... 429
Random Seed for Bootstrapping .................................................................... 430
Number of Randomly Selected Features ....................................................... 431
Random Seed for Feature Selection ............................................................... 431
Score Training Data ..................................................................................... 432
Score Validation Data .................................................................................. 432
Score Test Data............................................................................................ 432
Score New Data ........................................................................................... 432

Linear Regression Method 433


Introduction............................................................................................................... 433
Linear Regression Options ......................................................................................... 433
Variables In Input Data ................................................................................. 434
Selected Variables ....................................................................................... 434
Weight Variable........................................................................................... 434
Output Variable ........................................................................................... 434
Partition Data............................................................................................... 435
Rescale Data ................................................................................................ 435

Fit Intercept ................................................................................................. 436
Feature Selection ......................................................................................... 436
Regression Display ...................................................................................... 437
Score Training Data ..................................................................................... 438
Score Validation Data .................................................................................. 438
Score Test Data............................................................................................ 438
Score New Data ........................................................................................... 438

k-Nearest Neighbors Regression Method 439


Introduction............................................................................................................... 439
k-Nearest Neighbors Regression Method Example ..................................................... 439
k-Nearest Neighbors Regression Method Options ...................................................... 447
Variables In Input Data ................................................................................ 447
Selected Variables ....................................................................................... 447
Output Variable ........................................................................................... 448
Partition Data............................................................................................... 448
Rescale Data ................................................................................................ 448
# Neighbors (k)............................................................................................ 448
Nearest Neighbors Search ............................................................................ 449
Score Training Data ..................................................................................... 449
Score Validation Data .................................................................................. 449
Score Test Data............................................................................................ 449
Score New Data ........................................................................................... 449

Regression Tree Method 451


Introduction............................................................................................................... 451
Methodology ............................................................................................... 451
Pruning the tree............................................................................................ 451
Single Tree Regression Tree Example ........................................................................ 452
Regression Tree Options............................................................................................ 460
Variables In Input Data ................................................................................. 460
Selected Variables ........................................................................................ 461
Output Variable ........................................................................................... 461
Partition Data............................................................................................... 461
Rescale Data ................................................................................................ 462
Tree Growth ................................................................................................ 462
Prune (Using Validation Set)........................................................................ 462
Show Feature Importance............................................................................. 462
Maximum Number of Levels ....................................................................... 462
Score Training Data ..................................................................................... 463
Score Validation Data .................................................................................. 463
Score Test Data............................................................................................ 463
Score New Data ........................................................................................... 463

Neural Network Regression Method 464


Introduction............................................................................................................... 464
Training an Artificial Neural Network .......................................................... 465
The Iterative Learning Process ..................................................................... 465
Feedforward, Back-Propagation ................................................................... 466
Structuring the Network ............................................................................... 466
Automated Neural Network Regression Method Example .......................................... 467
Manual Neural Network Regression Method Example ............................................... 472
Neural Network Regression Method Options ............................................................. 479
Variables In Input Data ................................................................................ 480

Frontline Solvers Analytic Solver Data Mining Reference Guide 13


Selected Variables ....................................................................................... 480
Categorical Variables ................................................................................... 480
Output Variable ........................................................................................... 480
Partition Data............................................................................................... 481
Rescale Data ................................................................................................ 481
Hidden Layers/Neurons ............................................................................... 482
Hidden Layer ............................................................................................... 482
Output Layer ............................................................................................... 482
Training Parameters ..................................................................................... 483
Stopping Rules ............................................................................................ 484
Score Training Data ..................................................................................... 486
Score Validation Data .................................................................................. 486
Score Test Data............................................................................................ 486
Score New Data ........................................................................................... 486

Ensemble Methods 487


Boosting Regression Example.................................................................................... 488
Bagging Regression Example .................................................................................... 495
Random Trees Ensemble Method Example ................................................................ 500
Ensemble Methods for Regression Options ................................................................ 503
Variables In Input Data ................................................................................ 504
Selected Variables ....................................................................................... 504
Categorical Variables ................................................................................... 504
Output Variable ........................................................................................... 504
Partition Data............................................................................................... 505
Rescale Data ................................................................................................ 505
Number of Weak Learners ........................................................................... 506
Weak Learner .............................................................................................. 506
Step Size...................................................................................................... 506
Show Weak Learner..................................................................................... 506
Random Seed for Bootstrapping ................................................................... 507
Number of Randomly Selected Features ....................................................... 508
Random Seed for Feature Selection .............................................................. 508
Score Training Data ..................................................................................... 509
Score Validation Data .................................................................................. 509
Score Test Data............................................................................................ 509
Score New Data ........................................................................................... 509

Association Rules 510


Introduction............................................................................................................... 510
Association Rules Example........................................................................................ 511
Association Rules Options ......................................................................................... 513
Input data format ......................................................................................... 513
Minimum support (# transactions) ................................................................ 513
Minimum confidence (%) ............................................................................ 513



Introduction to Analytic Solver
Data Mining

Introduction
Analytic Solver Data Mining V2020 comes in two versions: Analytic Solver
Desktop – a traditional “COM add-in” that works only in Microsoft Excel for
Windows PCs (desktops and laptops), and Analytic Solver Cloud – a modern
“JavaScript add-in” that works in Excel for Windows and Excel for Macintosh
(desktops and laptops), and also in Excel for the Web (formerly Excel Online)
using Web browsers such as Chrome, FireFox and Safari. Your license gives
you access to both versions, and your Excel workbooks and optimization,
simulation and data mining models work in both versions, no matter where you
save them (though OneDrive is most convenient).
This Reference Guide gives step-by-step instructions on how to utilize the data
mining and predictive methods and algorithms included in both the Desktop and
Cloud applications. The overwhelming majority of features in Analytic Solver
Desktop are also included in the Cloud app. However, there are a few variations
between the two products. This guide documents any key differences between
the two products.

Ribbon Overview
Analytic Solver Data Mining, previously referred to as XLMinerTM, is a
comprehensive data mining software package for use in the Cloud or as an add-
in to Excel. Data mining is a discovery-driven data analysis technology used for
identifying patterns and relationships in data sets. With overwhelming amounts
of data now available from transaction systems and external data sources,
organizations are presented with increasing opportunities to understand their
data and gain insights into it. Data mining is still an emerging field, and is a
convergence of fields like statistics, machine learning, and artificial intelligence.
Often, there is more than one approach to a problem. Analytic Solver Data
Mining is a tool belt that helps you get started quickly, offering a variety of
methods to analyze your data. It has extensive coverage of statistical and
machine learning techniques for classification, prediction, affinity analysis and
data exploration and reduction.

Data Mining Ribbon


The Data Mining ribbon is divided into 5 sections: Get Data, Data Analysis,
Time Series, Data Mining and Tools. A ribbon from each application (Desktop
and Cloud) is shown below and on the next page. Notice that these Ribbons are
almost identical. This is "by design" to ensure an almost seamless integration
for our users.



Desktop Analytic Solver Data Mining

Data Mining Cloud

• Click the Model button to display the Solver Task Pane. This new feature
(added in V2016) allows you to quickly navigate through datasets and
worksheets containing Analytic Solver Data Mining results.
• Click the Get Data button to draw a random sample of data, or summarize
data from (i) an Excel worksheet, (ii) the PowerPivot “spreadsheet data
model” which can hold 10 to 100 million rows of data in Excel, (iii) an
external SQL database such as Oracle, DB2 or SQL Server, or (iv) a dataset
with up to billions of rows, stored across many hard disks in an external Big
Data compute cluster running Apache Spark (https://spark.apache.org/).
• You can use the Data Analysis group of buttons to explore your data, both
visually and through methods like cluster analysis, transform your data with
methods like Principal Components, Missing Value imputation, Binning
continuous data, and Transforming categorical data, or use the Text Mining
feature to extract information from text documents.
• Use the Time Series group of buttons for time series forecasting, using both
Exponential Smoothing (including Holt-Winters) and ARIMA (Auto-
Regressive Integrated Moving Average) models, the two most popular time
series forecasting methods from classical statistics. These methods forecast
a single data series forward in time.
• The Data Mining group of buttons give you access to a broad range of
methods for prediction, classification and affinity analysis, from both
classical statistics and data mining. These methods use multiple input
variables to predict an outcome variable or classify the outcome into one of
several categories. Ensemble Methods, introduced in V2015 and available in
both Analytic Solver Data Mining and the Data Mining Cloud app, can be used
with Classification Trees, Regression Trees, and Neural Networks.
• Use the Predict button to build prediction models using Multiple Linear
Regression (with variable subset selection and diagnostics), k-Nearest
Neighbors, Regression Trees, and Neural Networks. Use Ensemble
Methods with Regression Trees and Neural Networks to create more
accurate prediction models.
• Use the Classify button to build classification models with Discriminant
Analysis, Logistic Regression, k-Nearest Neighbors, Classification Trees,
Naïve Bayes, and Neural Networks. Use Ensemble Methods with
Classification Trees and Neural Networks to create more accurate
classification models.
• Use the Associate button to perform affinity analysis (“what goes with
what” or market basket analysis) using Association Rules.
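The Time Series group above mentions exponential smoothing. For intuition, the simplest member of that family can be sketched in a few lines of Python (an illustrative sketch, not Analytic Solver's implementation; Holt-Winters adds trend and seasonal terms on top of this same recurrence):

```python
def simple_exponential_smoothing(series, alpha=0.3):
    # s_t = alpha * x_t + (1 - alpha) * s_{t-1}: each smoothed value blends
    # the newest observation with the previous smoothed value.
    s = series[0]
    for x in series[1:]:
        s = alpha * x + (1 - alpha) * s
    return s  # the final smoothed value serves as the one-step-ahead forecast

demand = [100, 102, 101, 105, 107, 106, 110]
print(round(simple_exponential_smoothing(demand, alpha=0.3), 2))  # → 105.98
```

Larger values of alpha weight recent observations more heavily, making the forecast respond faster to changes at the cost of more noise.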
If forecasting and data mining are new for you, don’t worry – you can learn a lot
about them by consulting our extensive in-product Help. Click Help – Help



Text on the Data Mining tab, or click Help – Help Text – Forecasting/Data
Mining on the Analytic Solver tab (these open the same Help file).
If you’d like to learn more and get started as a ‘data scientist,’ consult the
excellent book Data Mining for Business Intelligence, written by the original
designers of Data Mining (formerly known as XLMiner) and its early academic
users. You’ll be able to run all of the book’s Data Mining examples and
exercises in Analytic Solver.
Analytic Solver Data Mining, along with the Data Mining Cloud app, can be
purchased as a stand-alone product. A stand-alone license for Analytic Solver
Data Mining includes all of the data analysis, time series, classification and
prediction features available in Analytic Solver Comprehensive, but does not
support optimization or simulation. See the Analytic Solver Data Mining User
Guide for the data specifications of each product.

Differences Between Desktop and Cloud Versions


Analytic Solver Data Mining Cloud was constructed from the ground up to be
as similar to Analytic Solver Data Mining Desktop as possible. A few
differences remain, however. They are noted here and also throughout this
guide and the Data Mining User Guide where applicable.

Differences
• Minor Ribbon Differences:
o The Text Mining icon is located in the Text section of the Ribbon
o Standard Partitioning is located in the Partition section of the
Ribbon
o The Tools section of the Ribbon includes Score
• A new icon, License, has been added. Click this icon to manage your
Analytic Solver licenses. See the section "License in the Cloud Apps"
within this guide for more information.
• The Options button has been removed. All menu items previously
appearing on this menu, now appear on the License or Help menus.
• Workflows created in Data Mining Cloud are not supported in Analytic
Solver Desktop.
• A Help Center has been added to the Help menu. Click Help – Help Center
to find example models, start a live chat, open user guides, listen to
recorded and live webinars, etc. In the Cloud app, items such as Welcome
Screen, Help Text and About Data Mining are not applicable and do not
appear on the menu, while User Guide and Reference Guide have been
combined into a single User Guides item. See the section "Help in the Cloud
Apps" within this guide for more information.
• Sampling from a File Folder or Database is not supported.
• Scoring to a database is also not supported.
• To view output charts, click the Charts icon on the ribbon, select the desired
output sheet for Worksheet and the desired chart for Chart. Viewing output
charts is documented in each of the chapters included in this guide.



• The Model tab on the Solver Task Pane does not exist in the Cloud app,
only the Workflow tab. See the "Model" section within this guide for more
information.
• Dynamic Arrays for the Psi Data Mining Functions (PsiForecast, PsiPredict,
PsiPosteriors, and PsiTransform) are supported in the Data Mining Cloud
app. To use a Dynamic Array in place of an Excel Control array, simply
enter the Psi Data Mining Function into one cell. The Dynamic Array will
"spill" down. See the section Using Data Mining Psi Functions in Excel
within the Data Mining User Guide for more information.

AnalyticSolver.com
With your free trial or paid license, you can use Analytic Solver in desktop
Excel, and its cloud-based counterparts: AnalyticSolver.com and Analytic
Solver Cloud. AnalyticSolver.com is a comprehensive, SaaS (Software as a
Service) toolkit for predictive and prescriptive analytics that shares its
technology with the Analytic Solver desktop version.
Using AnalyticSolver.com, you can sample data from spreadsheets, SQL
databases and Power Pivot, explore your data visually, clean and transform your
data, and create, evaluate and apply a wide range of time series forecasting and
data mining models – from linear regression and logistic regression to
classification and regression trees, neural networks, and association
rules. Essentially, everything you can do in Excel using Analytic Solver Data
Mining, you can do using AnalyticSolver.com.
All examples within the Analytic Solver Data Mining User Guide or Analytic
Solver Data Mining Reference Guide can be completed either using
AnalyticSolver.com or desktop Analytic Solver Data Mining. However, if using
AnalyticSolver.com, you must first upload your dataset to Azure Cloud Storage
using the Upload File tool on the AnalyticSolver.com ribbon. Afterwards, all
tools and features available in desktop Analytic Solver Data Mining will be
available to you in AnalyticSolver.com. The example below illustrates how to
upload a dataset.

Uploading a Dataset
To upload a dataset in AnalyticSolver.com, click Solver Home – Upload to open
the Upload File dialog. Browse to C:\Program Files\Frontline Systems\Analytic
Solver Platform\Datasets to open an example dataset. (If using 32 bit Excel on a
64 bit Operating System, browse to: C:\Program Files (x86)\Frontline
Systems\Analytic Solver Platform\Datasets.) Click the Open the dataset,
SandlerFilms.xlsx, then click Upload.



Note: If loading a file containing delimiter-separated values, select Delimited
File. All options under File Delimiter will be enabled. Select the appropriate
delimiters for your particular file and then click Upload. For example, if loading
a CSV file where each record resides on its own line and each field is separated
by a comma, you would select Comma under Column Delimiter and Newline
under Row Delimiter.
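For intuition, the column/row delimiter choice corresponds to how a parser splits the file into fields and records, as in this Python sketch (the movie data shown here is invented for illustration, not taken from SandlerFilms.xlsx):

```python
import csv
import io

# A delimited file where each record is on its own line (Newline row
# delimiter) and each field is separated by a comma (Comma column delimiter).
raw = "Title,Year,Gross\nBig Daddy,1999,163.5\nAnger Management,2003,135.6\n"

reader = csv.reader(io.StringIO(raw), delimiter=",")
header, *rows = list(reader)
print(header)   # → ['Title', 'Year', 'Gross']
print(rows[0])  # → ['Big Daddy', '1999', '163.5']
```

Choosing the wrong delimiter would leave each record as a single unsplit field, which is why the Upload File dialog asks you to confirm both delimiters before uploading.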
The file is uploaded to the Files in My User Account drop down menu.

To view all records, click Open – From AnalyticSolver.com –
SandlerFilms.xlsx. (See the screenshot below.)



Click the Save button to save any changes made to the spreadsheet. To freeze
the row headers, select cell A2 and click Freeze Panes. The icon will
automatically toggle to "Unfreeze Panes". Click to unfreeze the top row.

From here, you can choose to explore the sampled data by creating
visualizations using the Chart Wizard, complete a data analysis with Analytic
Solver Data Mining 's feature selection tool, transform the data using the data
transformation utilities, build classification or prediction models to forecast
expected gross or opening night revenue or perform any other desired analytic
task(s) by clicking the down arrow under the appropriate icon. Analytic Solver
for the Web gives you the full data mining capability of desktop Analytic Solver
Data Mining, on any device, at any location.

The following chapters document each tool and feature included in desktop
Analytic Solver Data Mining and Analytic Solver for the Web.



Common Dialog Options

These options, fields and command buttons appear on most Analytic Solver
Data Mining dialogs.

Worksheet
The active worksheet appears in this field.

Workbook
The active workbook appears in this field.

Data Range
The range of the dataset appears in this field.

# Rows # Cols
The number of rows and columns in the dataset appear in these two fields,
respectively.

First row contains headers


If this option is selected, variables will be listed according to the first row in
the dataset.

Variables in the Input Data


All variables contained in the dataset will be listed in this field.



Selected Variables
Variables listed in this field will be included in the output. Select the
desired variables listed in the Variables In Input Data listbox, then click the
> button to shift variables to the Selected Variables field.

Help
Click this command button to open the Analytic Solver Data Mining Help
text file.

Next
Click this command button to progress to the next dialog.

Reset
Click this command button to reset the options for the selected method.

OK/Finish
Click this command button to initiate the desired method and produce the
output report.

Cancel
Click this command button to close the open dialog without saving any
options or creating an output report.

References
See below for a list of references cited when compiling this guide.
Websites
1. The Data & Analysis Center for Software. <https://www.thecsiac.com>
2. NEC Research Institute Research Index: The NECI Scientific Literature
Digital Library.
<http://www.iicm.tugraz.at/thesis/cguetl_diss/literatur/Kapitel02/URL/NEC
/cs.html>.
3. Thearling, Kurt. Data Mining and Analytic Technologies.
<http://www.thearling.com>
Books
1. Anderberg, Michael R. Cluster Analysis for Applications. Academic Press
(1973).
2. Berry, Michael J. A., Gordon S. Linoff. Mastering Data Mining. Wiley
(2000).



3. Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, Charles J. Stone.
Classification and Regression Trees. Chapman & Hall/CRC (1998).
4. Han, Jiawei, Micheline Kamber. Data Mining: Concepts and Techniques.
Morgan Kaufmann Publishers (2000).
5. Hand, David, Heikki Mannila, Padhraic Smyth. Principles of Data Mining.
MIT Press, Cambridge (2001).
6. Hastie, Trevor, Robert Tibshirani, Jerome Friedman. The Elements of
Statistical Learning: Data Mining, Inference, and Prediction. Springer, New
York (2001).
7. Shmueli, Galit, Nitin R. Patel, Peter C. Bruce. Data Mining for Business
Intelligence. Wiley, New Jersey (2010).



Big Data Options

Introduction
Large amounts of data are being generated and collected continuously from a multitude of sources every minute of
every day. From your toothbrush to your vehicle GPS to Twitter/Facebook/Google/Yahoo, data is everywhere.
Being able to make decisions based on this information requires the ability to extract trends and patterns that can be
buried deeply within the numbers.
Generally these large datasets contain millions of records (rows) requiring multiple gigabytes or terabytes of storage
space across multiple hard drives in an external compute cluster. Analytic Solver Platform V2016-R3 enables users,
for the first time, to ‘pull’ sampled and summarized data into Excel from compute clusters running Apache Spark,
the open-source software widely embraced by Big Data vendors and users.
See the Analytic Solver Data Mining User Guide for a complete step-by-step example illustrating how to sample and
summarize big data using Analytic Solver. See the Big Data Options section below for a comprehensive
explanation of each option that appears on the Big Data dialogs.

Big Data Options


The following options are included in each of the five Big Data dialogs.

Sample Big Data, Data dialog



File Location Enter the location of the file here.

Credentials If your dataset is located on Amazon S3, click Credentials to enter your Access and
Secret Keys.

Schema
When All Variables is selected for this option, all columns (features) in the
dataset are selected for the analysis without the need for the user to select
variables.
When Select Variables is selected, the command button Infer Schema is
enabled. Once Infer Schema is clicked, the schema (variables) will be inferred
from the dataset on the cluster and listed in the Variables grid. Users may use
the > and < buttons to select variables for inclusion in the sample.

Variables Variables available for inclusion in the sample will appear here. Use the > button
to select variables to be included in the sample.

Select Variables Variables transferred here will be included in the sample. Use the < button to
remove variables from the sample.

Submit
Clicking Submit sends a request for sampling to the compute cluster but does
not wait for completion. The result is output containing the Job ID and basic
information about the submitted job so that different submissions may be
identified. This information can be used at any later time to query the status of
the job and generate reports based on the results of the completed job.

Run
Sends a request for sampling to the Apache Spark compute cluster where the
Frontline Systems access server is installed, and waits for the results. Once the
job is completed and results are returned to the Analytic Solver Data Mining
client, a report is inserted into the Model tab of the Analytic Solver Task Pane
under Data Mining – Results – Sampling.

Cancel Click this command button to close the open dialog without saving any options or
creating an output report.



Sample Big Data, Options dialog

Data Format
If your data is in Apache Parquet format, select Parquet for this option. If your
data is in Delimited Text format, select Delimited Text. When Delimited Text
is selected, the Format button is enabled. Click it to open the Delimited Text
Format dialog, where you can specify whether your first row contains headers,
along with the delimiter used in your data.
Analytic Solver Data Mining can process data from the Hadoop Distributed
File System (HDFS), local file systems that are visible to the Spark cluster, and
Amazon S3. Performance is best with HDFS, and it is recommended that you
load data from a local file system or Amazon S3 into HDFS. If a local file
system is used, the data must be accessible at the same path on all Spark
workers, either via a network path, or because it was copied to the same
location on all workers.
At present, Analytic Solver Data Mining can process data in Apache Parquet
and CSV (delimited text) formats. Performance is far better with Parquet,
which stores data in a compressed, columnar representation; it is highly
recommended that you convert CSV data to Parquet before you seek to sample
or summarize the data.
Track Record IDs
If this option is selected, data records in the resulting sample will carry the
correct ordinal IDs that correspond to the original data records, so that records
can be matched. Note: Selecting this option may significantly increase running
time, so it should be applied only when necessary.
Sample with Replacement
When selected, records in the dataset may be chosen for inclusion in the sample
multiple times.
Random Seed
If an integer value appears for Random Seed, Analytic Solver Data Mining will
use this value to set the sampling random number seed. Setting the random
number seed to a nonzero value ensures that the same sequence of random
numbers is used each time the dataset is sampled. The default value is 12345.
If left blank, the random number generator is initialized from the system clock,
so the random sample will contain different records from run to run. If you
need the results from successive samples to be strictly comparable, set the seed
by typing the desired number into the box. This option accepts positive
integers with up to 7 digits.
Exact Sampling
When this option is selected, Analytic Solver Data Mining will return a fixed-
size sampled subset of the data according to the setting for Desired Sample
Size.
Desired Sample Size
Enter the number of records to be included in the sample.
Approximate Sampling
When this option is selected, the size of the resulting sample is determined by
the value entered for Desired Sample Fraction. Approximate sampling is much
faster than Exact Sampling, and the resulting fraction is usually very close to
the Desired Sample Fraction, so this option should be preferred over exact
sampling whenever possible. Even if the resulting sample deviates slightly
from the desired size, this is easy to correct in Excel.
Desired Sample Fraction
This is the expected size of the sample as a fraction of the dataset's size. If
Sample with Replacement is selected, the value for Desired Sample Fraction
must be greater than 0. If sampling without replacement (i.e., Sample with
Replacement is not selected), the Desired Sample Fraction becomes the
probability that each element is chosen and, as a result, must be between
0 and 1.
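The exact/approximate distinction is easy to see in a short Python sketch (a generic illustration of the two sampling schemes, with made-up function names, not Analytic Solver's actual implementation):

```python
import random

def exact_sample(records, desired_size, seed=12345, with_replacement=False):
    # Exact sampling: always returns exactly desired_size records.
    rng = random.Random(seed)
    if with_replacement:
        return [rng.choice(records) for _ in range(desired_size)]
    return rng.sample(records, desired_size)

def approximate_sample(records, desired_fraction, seed=12345):
    # Approximate (Bernoulli) sampling without replacement: each record is
    # kept independently with probability desired_fraction, so the sample
    # size is only close to desired_fraction * len(records).
    rng = random.Random(seed)
    return [r for r in records if rng.random() < desired_fraction]

data = list(range(100_000))
print(len(exact_sample(data, 1000)))        # exactly 1000
print(len(approximate_sample(data, 0.01)))  # roughly 1000; varies with seed
```

Because the approximate scheme makes one independent decision per record, it parallelizes trivially across a cluster, which is why it is so much faster than exact sampling on big data.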



Big Data Get Results dialog
Job Identifier
Click the down arrow to the right of this option to obtain results from a
previously submitted (using Get Data – Big Data – Sample/Summarize) job.
Get Info
Click this command button to check the status of the previously submitted job.
The following information will be returned:
Application is the type of the submitted job.
Start Time displays the date and time when the job was submitted. Start Time
will always be displayed in the user's local time.
Duration shows the elapsed time since job submission if the job is still
RUNNING, and the total compute time if the job is FINISHED.
Status is the current state of the job: FINISHED, FAILED, ERRORED or
RUNNING. FINISHED indicates that the job has been completed and results
are available for retrieval. FAILED or ERRORED indicates that the job has
not completed due to an internal cluster failure. When this occurs, Details will
contain a message indicating the reason.
Get Results
If Status is FINISHED, you may click the Get Results button to obtain the
results from the cluster and populate the report. Note: It is not required to click
Get Info before Get Results. If Get Results is clicked, the status of the job will
be checked and, if the status is FINISHED, the results will be pulled from the
cluster and the report will be created. Otherwise, Status will be updated with
the appropriate message to reflect the state of the submitted job: FAILED,
ERRORED, or RUNNING.
Cancel
Click this command button to close the open dialog without saving any options
or creating an output report.
Help Click this command button to open the Analytic Solver Data Mining Help Text.

Summarize Big Data, Data dialog
See the Sample Big Data, Data dialog above for all option explanations except
Group Variables.
Group Variables
Group Variables are variables from the dataset that are treated as key variables
for aggregation. In the screenshot above, two variables have been selected as
Group Variables: Year and UniqueCarrier. Records with the same Year and
UniqueCarrier are placed in the same group, and all aggregate functions are
then calculated for each group.



Summarize Big Data, Options dialog
See the Sample Big Data, Options dialog above for all option explanations
except Data Format, Aggregation Type and Compute Group Counts.
Data Format
See the option description above.
Aggregation Type
Aggregation Type provides 5 statistics that can be computed from the dataset:
sum, average, standard deviation, minimum and maximum.
Compute Group Counts
This option is enabled when 1 or more Group Variables are selected. When
this option is selected, the number of records belonging to each group is
computed and reported.



Sampling or Importing from a
Database, Worksheet or File
Folder

Introduction
Sampling
A statistician often comes across huge volumes of information from which he or
she wants to draw inferences. Since time and cost limitations make it impossible
to go through every entry in these enormous datasets, statisticians must resort to
sampling techniques. These sampling techniques choose a reduced sample or
subset from the complete dataset. The statistician can then perform his or her
statistical procedures on this reduced dataset saving much time and money.
Let’s review a few statistical terms. The entire dataset is called the population.
A sample is the portion of the population that is actually examined. A good
sample should be a true representation of the population to avoid forming
misleading conclusions. Various methods and techniques have been developed
to ensure a representative sample is chosen from the population. A few are
discussed here.
• Simple Random Sampling This is probably the simplest method for
obtaining a good sample. A simple random sample of, say, size n, is
chosen from the population in such a way that every possible set of n
items from the population has an equal chance of being chosen for
inclusion in the sample. Thus simple random sampling not only avoids
bias in the choice of individual items but also gives every possible
sample an equal chance.
The Data Sampling utility in Analytic Solver Data Mining offers the
user the freedom to choose sample size, seed for randomization, and
sampling with or without replacement.
• Stratified Random Sampling In this technique, the population is first
divided into groups of similar items. These groups are called strata.
Each stratum, in turn, is sampled using simple random sampling. These
samples are then combined to form a stratified random sample.
The Data Sampling utility in Analytic Solver Data Mining offers the
user the freedom to choose a seed for randomization and
sampling with or without replacement. The desired sample size can be
prefixed by the user depending on which method is being chosen for
stratified random sampling.
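To make the two schemes concrete, here is a rough sketch in Python using pandas; the data, column names and sample sizes are hypothetical, and this is an illustration, not Analytic Solver's implementation:

```python
import pandas as pd

# Hypothetical population of 100 records with a grouping column.
population = pd.DataFrame({
    "ID": range(1, 101),
    "group": ["A"] * 60 + ["B"] * 40,
})

# Simple random sampling: every record (and every size-10 subset)
# has an equal chance of selection.
simple = population.sample(n=10, replace=False, random_state=12345)

# Stratified random sampling, proportional to stratum size: draw the
# same 10% fraction within each stratum, then combine the samples.
stratified = population.groupby("group", group_keys=False).sample(
    frac=0.10, random_state=12345
)

assert len(simple) == 10 and simple["ID"].is_unique
assert len(stratified) == 10                      # 6 from A, 4 from B
```

Passing `random_state` plays the role of the sampling seed, so the same sample is drawn on every run.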
Analytic Solver Data Mining Desktop allows sampling either from a worksheet,
database or file folder. Analytic Solver Data Mining Cloud, does not support
sampling from a database or file folder.



Importing from a File System in the Desktop Add-in
In order to run desktop Analytic Solver Data Mining’s Text Miner tool, we must
first import the text. Text may be present in a worksheet, as a column
(variable) in which the cell in each row contains a comment, paragraph or other
free-form text, or in a text field in a database. In both cases, each text
‘document’ is naturally associated, row by row, with other structured input
fields and, for supervised learning, with an outcome variable.
Note: If using Analytic Solver Data Mining within desktop Excel, we
recommend the use of Microsoft’s free Power Query add-in, or the facilities
built into the free Power Pivot add-in, for importing data from a wide range of
online and on-premises databases into Excel’s “spreadsheet data model”, which
has no limit other than memory on the number of rows. Analytic Solver Data
Mining may be used to draw a random sample from this data, to be brought onto
a worksheet for analysis and model-building.
Text may also be present in a series of document files in a disk or network
folder (where each document represents an observation). In this case, the menu
option Get Data – File Folder may be used to read either all of the documents
in a folder, or a representative sample of these documents.
If the documents are relatively small (each one no more than 32,767 characters,
the limit on the length of a string in a single worksheet cell), the document
contents may be brought into the output. Otherwise, Analytic Solver Data
Mining can import only the document filenames/paths into the output; the
document contents are read only during the text mining operation, and document
size is not limited except by memory and time.
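As a rough sketch of what that import produces, the following Python reads each document in a folder and keeps the path plus at most 32,767 characters of contents; the folder and file names here are hypothetical stand-ins:

```python
import os
import tempfile

EXCEL_CELL_LIMIT = 32_767  # max length of a string in one worksheet cell

# Build a small hypothetical folder of text documents for illustration.
folder = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(folder, f"doc{i}.txt"), "w", encoding="utf-8") as f:
        f.write(f"message body {i}")

# One row per document: always the path, plus (optionally) the contents
# truncated to the worksheet cell limit.
rows = []
for name in sorted(os.listdir(folder)):
    path = os.path.join(folder, name)
    with open(path, encoding="utf-8") as f:
        text = f.read()
    rows.append({"path": path, "contents": text[:EXCEL_CELL_LIMIT]})

assert len(rows) == 3
```

Writing only the `path` field corresponds to the filenames-only option, which leaves document size unbounded until the text mining step.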
The output produced by Get Data – File Folder (see the example below) may
be used directly as input to the text mining operation, described later on in this
guide. But if there are other structured fields, or an outcome variable in the
dataset, it is up to the user to assemble a worksheet that associates each
document with the correct observation for the other fields. Excel’s many tools
for editing rows, columns and sheets can aid considerably in this process, but it
is not automatic. Note: AnalyticSolver.com does not support data manipulation
of this type. If you are using AnalyticSolver.com or the Data Mining Cloud app,
you can perform these edits in Excel, then upload the revised worksheet to
AnalyticSolver.com.

Sampling from a Worksheet


Below are three examples that illustrate how to perform Simple Random
Sampling with and without replacement and Stratified Random Sampling from a
worksheet. Each example uses the sample dataset in the Sampling.xlsx example
workbook.
To open this workbook, click Help – Example Models on the Data Mining
Desktop Ribbon, then Forecasting/Data Mining Examples – Sampling.

Example: Sampling from a Worksheet using Simple Random Sampling
The Sampling.xlsx dataset contains a variable ID for the record identification
and seven variables, v1, v2, v7, v8, v9, v10, v11.



To start, click Get Data – Worksheet. In this example, the default option,
Simple Random Sampling, is used. Select all variables under Variables, click >
to include them in the sample data, then click OK.

The result, Sampling, will be inserted into the Model tab of the Analytic Solver task
pane under Transformations – Sample From Worksheet. A portion of the output
is shown below.



The output is a simple random sample without replacement, with a default
random seed setting of 12345. The desired sample size is 86 records as shown in
the Sample Size field.

Example: Sampling from a Worksheet using Sampling with Replacement
This second example illustrates how to sample from a worksheet using sampling
with replacement.
Click Get Data – Worksheet to bring up the Sample From Worksheet dialog.
Again, select all variables in the Variables section and click > to include each in
the sample data. Check Sample with replacement and enter 300 for Desired
sample size. Since we are choosing sampling with replacement, Analytic Solver
Data Mining can generate a sample containing more records than the
dataset. Click OK.



The result, Sampling1, will be inserted into the Analytic Solver task pane under
Transformations - Sample From Worksheet. A portion of the output is shown
below.

The output indicates "True" for Sampling with Replacement. As a result, the
desired sample size may exceed the number of records in the input data (a
sample size of 300 vs. 289 input records). Looking closely at the ID column, you’ll see
that multiple records have been sampled more than once.
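The same effect is easy to see outside Analytic Solver: drawing more IDs than there are records with replacement guarantees, by the pigeonhole principle, that at least one record repeats. A sketch with hypothetical data:

```python
import random

random.seed(12345)                 # fixed seed, as in the guide's examples

ids = list(range(1, 290))          # 289 hypothetical record IDs

# Sampling WITH replacement: the sample may be larger than the dataset,
# and the same record can be drawn more than once.
sample = random.choices(ids, k=300)

assert len(sample) == 300
assert len(set(sample)) < 300      # 300 draws from 289 IDs must repeat
```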

Example: Sampling from a Worksheet using Stratified Random Sampling
This example illustrates how to sample from a worksheet using stratified
random sampling.

Click Get Data – Worksheet and select all variables under Variables, click > to
include them in the sample data. Select Stratified random sampling.

Click the down arrow next to Stratum Variable and select v8. The strata number
is automatically displayed once you select v8. Keep the default setting selected,
Proportional to stratum size. Then click OK.



The result, Sampling2, is inserted into Transformations – Sample From
Worksheet. A portion of the output is shown below.

Analytic Solver Data Mining calculated the percentage representation of V8 in
the dataset and maintained that percentage in the sample.
Let’s see what happens to our output when we select a different option for
Stratified Sampling.
Click Get Data -- Worksheet. Select all variables under Variables, click > to
include them in the sample data. Select Stratified random sampling. Choose
v8 as the Stratum variable. The #strata is displayed automatically. Select Equal
from each stratum, please specify #records.



Enter the #records. Remember, this number should not be greater than the
smallest stratum size. In this case the smallest stratum size is 8. (Note: The
smallest stratum size appears automatically in a box next to the option, Equal
from each stratum, please specify #records.) Enter 7, which is less than the limit
of 8, and then click OK.

Sampling3 will be inserted into the Analytic Solver task pane under
Transformations -- Sample From Worksheet. A portion of the output is shown
below.
As you can see in the output, the number of records in the sampled data is 56 or
7 records per stratum for 8 strata (7 * 8 = 56).
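The equal-allocation arithmetic can be sketched with pandas; the strata sizes below are hypothetical (all at least 7), and `GroupBy.sample` draws the same count from each stratum:

```python
import pandas as pd

# Hypothetical data with 8 strata in column "v8" (sizes 10..17, all >= 7).
df = pd.DataFrame({"v8": [s for s in range(8) for _ in range(10 + s)]})

# Equal from each stratum: draw exactly 7 records per stratum.
sample = df.groupby("v8", group_keys=False).sample(n=7, random_state=12345)

assert len(sample) == 7 * 8                      # 56 records in total
assert (sample["v8"].value_counts() == 7).all()  # 7 from each of 8 strata
```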

If a sample with an equal number of records from each stratum, but of a larger
size, is desired, use the same options as above with Sampling with Replacement
selected.



Click Get Data -- Worksheet once again. Select all variables under Variables,
click > to include them in the sample data. Select Sample with replacement
and Stratified random sampling. Select V8 for the Stratum variable. Select
Equal from each stratum, please specify #records and enter 20. Though the
smallest stratum size is 8 in this dataset, we can acquire more records for our
sample since we are Sampling with replacement. Keeping all other options the
same, click OK. The output is as follows.

Click Transformations -- Sample From Worksheet in the Model tab of the
Analytic Solver Task Pane for the result, Sampling4. A portion of the output is
shown below.

Since the output sample has 20 records per stratum, the #records in sampled
data is 160 (20 records per stratum for 8 strata).



Click Get Data -- Worksheet one last time. Select all variables under
Variables, click > to include them in the sample data. Select Stratified random
sampling. Select V8 for the Stratum variable and select Equal from each
stratum, # records = smallest stratum size. The edit box to the right of the
option is prefilled with the number 8. This is the smallest stratum size in the
dataset.

Keeping all other options the same, click OK. The output, found under
Transformations -- Sample From Worksheet in the Analytic Solver task pane, is
below.

Since the output sample has 8 records per stratum, the Sample Size is 64 (8
records per stratum for 8 strata).



Sample from Worksheet Options
Please see below for a complete list of each option contained on the Sample
from Worksheet and Sample from Database dialogs.

Data Range
Either type the address directly into this field or use the reference button to
enter the data range from the worksheet or dataset. If the cell pointer (active
cell) is already somewhere in the data range, Analytic Solver Data Mining
automatically picks up the contiguous data range surrounding the active cell.
After the data range is selected, Analytic Solver Data Mining displays the
number of records in the selected range.

First Row Contains Headers


When this box is checked, Analytic Solver Data Mining picks up the headings
from the first row of the selected data range. When the box is unchecked,
Analytic Solver Data Mining follows the default naming convention, i.e., the
variable in the first column of the selected range will be called "Var1", the
second column "Var2," etc.

Variables
This list box contains the names of the variables in the selected data range. If the
first row of the range contains the variable names, then these names appear in
this list box. If the first row of the dataset does not contain the headers, then
Analytic Solver Data Mining lists the variable names using its default naming
convention. In this case the first column is named Var1; the second column is
named Var2 and so on. To select a variable for sampling, select the variable,
then click the ">" button. Use the CTRL key to select multiple variables.



Sample With replacement
If this option is checked the data will be sampled with replacement. The default
is Sampling without Replacement.

Set Seed
Enter the desired seed for randomization here. The default seed is 12345.

Desired sample size


Enter the desired sample size here. (Note that the actual sample size in the
output may vary a little, depending on additional options selected.)

Simple random sampling


The data is sampled using the simple random sampling technique, taking into
account the additional parameter settings.

Stratified random sampling


If selected, Analytic Solver Data Mining enables the Stratum Variable options.

Stratum Variable
Select the variable to be used for stratified random sampling by clicking the
down arrow and selecting the desired variable. As the user selects the variable
name, Analytic Solver Data Mining displays the #Strata that variable contains
in a box to the left and the smallest stratum size in a field beside the option
Equal from each stratum, #records = smallest stratum size. (Note: Analytic
Solver Comprehensive and Data Mining support an unlimited number of
variables each having an unlimited number of distinct values. Analytic Solver
Basic supports variables with 2 to 30 distinct values.)

Proportionate to stratum size


Analytic Solver Data Mining detects the proportion of each stratum in the
dataset and maintains the same proportions in the sample. Because of this, Analytic Solver Data
Mining sometimes must increase the sample size in order to maintain the
proportionate stratum size.

Equal from each stratum, please specify #records
On specifying the number of records, Analytic Solver Data Mining generates a
sample which has the same number of records from each stratum. In this case
the number chosen automatically determines the desired sample size. As a result,
the option to enter the desired sample size is disabled.



Equal from each stratum, #records = smallest stratum size
Analytic Solver Data Mining detects the smallest stratum size and generates a
sample wherein every stratum has a representation of that size. If this option is
selected, Sample with replacement and Desired sample size are both disabled.
Analytic Solver Data Mining performs the stratified random sampling with or
without replacement. If Sample with replacement is not selected, the desired
sample size must be less than the number of records in the dataset.
If Sample with Replacement is selected, Analytic Solver Data Mining is limited
to 1,000,000 records in the sample output.

Sampling from a Database


Click Get Data – Database on the Data Mining Desktop ribbon to display the
following dialog. Note: Sampling from a Database is currently not supported in
the Data Mining Cloud app.

Click the down arrow next to Data Source and select MS-Access, then click
Connect to database.

An Open file dialog opens. Browse to C:\Program Files\Frontline
Systems\Analytic Solver Platform\Datasets, select the Demo.mdb Microsoft
Access database, and then click Open.
The Table/View field is automatically populated.



The Fields in table listbox is populated as shown in the screenshot below.

Select all fields from Fields in table and click > to move all fields to Selected
fields. Select ID as the Primary key. A Primary key must contain non-null and
unique values across all rows in the table.



Click OK. A portion of the output is below.

For more information on the Sampling Options, refer to the examples above for
Sampling from a Worksheet. You can sample from a database using all the
methods described in this section.

Importing from a File Folder


The following example is used to illustrate how to import 1,000 text files saved
in the same file folder. Click Help – Example Models on the Data Mining
Desktop ribbon, then click Forecasting/Data Mining. Note: This functionality is
not supported in Data Mining Cloud app.
Browse to C:\Program Files\Frontline Systems\Analytic Solver
Platform\Datasets and open the Text Mining Example Documents.zip archive
file. Unzip the contents of this file to C:\Program Files\Frontline
Systems\Analytic Solver Platform\Datasets\Text Mining Example Documents\
(or a location of your choice). Four folders will be created beneath Text Mining
Example Documents: Autos, Electronics, Additional Autos and Additional
Electronics. One thousand two hundred short text files will be extracted to the
location chosen. This example is based on the text dataset at
http://www.cs.cmu.edu/afs/cs/project/theo-20/www/data/news20.html, which



consists of 20,000 messages, collected from 20 different netnews newsgroups.
We selected about 1,200 of these messages that were posted to two interest
groups, for Autos and Electronics (about 50% in each).
Select Get Data – File Folder to open the Import From File System dialog. At
the top of the dialog, click Browse… to navigate to the Autos subfolder
(C:\Program Files\Frontline Systems\Analytic Solver Platform\Datasets\Text
Mining Example Documents\Autos). Set the File Type to All Files (*.*), then
select all files in the folder and click the Open button. The files will appear in
the left list box under Files. Click the >> button to move the files from the Files
list box to the Selected Files list box. Now repeat these steps for the Electronics
subfolder. When these steps are completed, 985 files will appear under Selected
Files.
Select Sample from selected files to enable the Sampling Options. Analytic
Solver Data Mining will perform sampling from the files in the Selected Files
field. Enter 300 for Desired sample size while leaving the default settings for
Simple random sampling and Set Seed.
Note: If you are using the educational version of Analytic Solver Data Mining,
enter "100" for Desired Sample Size. This is the upper limit for the number of
files supported when sampling from a file system when using Analytic Solver
Data Mining. For a complete list of the capabilities of Analytic Solver Data
Mining and Analytic Solver Data Mining for Education, click here.
Analytic Solver Data Mining will select 300 files using Simple random
sampling with a seed value of 12345. Under Output, leave the default setting of
Write file paths. Rather than writing out the file contents into the report,
Analytic Solver Data Mining will include the file paths.
Note: Currently, Analytic Solver Data Mining only supports the import of
delimited text files. A delimited text file is one in which data values are
separated by a character such as quotation marks, commas or tabs. These
characters define a beginning and end of a string of text.



Click OK. The FileSampling worksheet will be inserted into the Analytic
Solver task pane under Data Mining – Transformations – Sample From File
Folder with contents similar to that shown on the next page.
The Data portion of the report displays the selections we made on the Import
From File System dialog. Here we see the path of the directories, the number of
files written, our choice to write the paths or contents (File Paths), the sampling
method, the desired sample size and the seed value (12345).
Underneath the Data portion are paths to the 300 text files in random order that
were sampled by Analytic Solver Data Mining. If Write file contents had been
selected, rather than Write file paths, the report would contain the RowID, File
Path, and the first 32,767 characters present in the document.
From here, one could use Excel’s sort features to categorize the paths by
“Autos” and “Electronics” for use with the Text Mining tool. See the subsequent
Text Mining chapter for an example of how to use this feature.



Importing from File Folder Options
See below for an explanation of each option as displayed on the Import from
File System dialog in Data Mining Desktop.
Note: Analytic Solver Data Mining only supports the import of delimited text
files. A delimited text file is one in which data values are separated by a
character such as quotation marks, commas or tabs. These characters define a
beginning and end of a string of text. This functionality is not supported in Data
Mining Cloud.



Directory
Click Browse to navigate to the directory that contains the collection of text
documents.

Files
The files contained within the folder selected for Directory will appear
here. Click the > command button to move individual files or the >> button to
move the entire collection to the Selected Files listbox.

Selected Files
The text files listed here have been selected for import or sampling.

Import selected files


Select this option to import the selected text files.

Sample from selected files


Select this option to choose a randomly selected sample from the collection of
text documents according to the options selected within the Sampling Options
section.



Sample With replacement
If this option is checked, the text files will be sampled with replacement. The
default is Sampling without Replacement. When Sampling with replacement,
text documents chosen during sampling will not be removed from the collection.

Desired sample size


Enter a value for the desired sample size. This value determines the number of
text documents to be included in the sample. The default value is half of the
number of documents listed in the Selected Files list box.

Simple random sampling


If this option is selected, a simple random sample of say, size n, is chosen from
the documents in the Selected Files list box in such a way that every random set
of n items from the population has an equal chance of being chosen to be
included in the sample. Thus simple random sampling not only avoids bias in
the choice of individual items but also gives every possible document an equal
chance of being selected. This option is selected by default when Sample from
selected files is enabled.

Set Seed
This option initializes the random number generator. Setting the random
number seed to a nonzero value (any number of your choice is OK) ensures that
the same sequence of random numbers is used each time the sample of
documents is selected. The default value is “12345”. When the seed is zero, the
random number generator is initialized from the system clock, so the sequence
of documents selected will be different each time a sample is taken. If you need
the results from successive runs to be strictly comparable, you should set the
seed. To do this, select the checkbox next to the Set Seed edit box, or type the
number you want into the box. This option is selected by default when Sample
from selected files is enabled. This option accepts both positive and negative
integers with up to 9 digits.
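The reproducibility behavior described above can be illustrated with Python's standard random module; the document IDs below are hypothetical:

```python
import random

def sample_docs(doc_ids, n, seed):
    """Draw n document IDs; a fixed nonzero seed makes runs repeatable."""
    rng = random.Random(seed)       # private generator, seeded explicitly
    return rng.sample(doc_ids, n)

docs = list(range(1, 101))          # 100 hypothetical document IDs

# The same seed yields the same sample on every run, so successive
# analyses are strictly comparable.
assert sample_docs(docs, 10, seed=12345) == sample_docs(docs, 10, seed=12345)

# Seeding from the system clock (random.Random() with no argument, standing
# in for "seed = 0") generally yields a different sample on each run.
clock_seeded = random.Random().sample(docs, 10)
assert len(clock_seeded) == 10
```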

Output
If Write file paths is selected, pointers to the file locations are stored on the
FileSampling output sheet. If Write file contents is selected, the content of each
text document will be written to a cell on the FileSampling output, up to a
maximum of 32,767 characters.



Exploring Data

Introduction
The Explore menu gives you access to Dimensionality Reduction via Feature
Selection and the ability to explore your data using charts such as Bar charts,
Line Charts, Scatterplots, Boxplots, Histograms, Parallel Coordinates, Scatter
Plot Matrices and Variable Plots.

Feature Selection
Dimensionality Reduction is the process of deriving a lower-dimensional
representation of the original data, one that still captures the most significant
relationships, to be used to represent the original data in a model. This domain
can be divided into two branches, feature selection and feature extraction.
Feature selection attempts to discover a subset of the original variables while
Feature Extraction attempts to map a high-dimensional model to a
lower-dimensional space. In past versions, Analytic Solver Data Mining
contained only one feature extraction tool that could be used outside of a
classification or regression method: Principal Components Analysis (Transform
– Principal Components on the Data Mining ribbon); for more information, see
the chapter of the same name. In V2015, a new feature selection tool, Feature
Selection, was introduced. Feature Selection attempts to identify the best
subset of variables (or
features) out of the available variables (or features) to be used as input to a
classification or regression method. The main goal of Feature Selection is
threefold – to “clean” the data, to eliminate redundancies, and to quickly identify
the most relevant and useful information hidden within the data thereby
reducing the scale or dimensionality of the data. Feature Selection results in an
enhanced ability to explore the data, visualize the data and in some cases to
make some previously infeasible analytic models feasible.
One important issue in Feature Selection is how to define the “best” subset. If
using a supervised learning technique (classification/regression model), the
“best” subset would result in a model with the lowest misclassification rate or
residual error. This presents a different question – which classification method
should we use? A given subset (of variables) may be optimal for one method
but not for another. One might answer, “try all possible subsets”.
Unfortunately, the number of all possible combinations of variables can quickly
grow to an exponential number making the problem of finding the best subset
(of variables) infeasible for even a moderate number of variables. Even trying
to find the best subset of 10 variables out of a total of 50 would lead to
10,272,278,170 combinations!
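That figure is just the binomial coefficient C(50, 10), which can be checked directly:

```python
import math

# Number of distinct 10-variable subsets of 50 variables.
n_subsets = math.comb(50, 10)

assert n_subsets == 10_272_278_170
```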
Feature Selection methods are divided into 3 major categories: filters, wrappers,
and embedded approaches. Analytic Solver Data Mining’s new Feature
Selection tool uses Filter Methods, which provide the mechanisms to rank
variables according to one or more univariate measures and to select the
top-ranked variables to represent the data in the model. In Analytic Solver Data



Mining, Feature Selection is only supported in supervised learning methods; the
importance of a variable is based on its relation, or ability to predict the value of,
the output variable. The measures used to rank the variables can be divided into
three main categories: correlation – based, statistical tests, and information –
theoretic measures. The definitive characteristic of Filter methods is their
independence of any particular model, therefore making them widely applicable
as a preprocessing step for supervised learning algorithms. Usually, filter
methods are much less computationally expensive than other Feature Selection
approaches. This means that when faced with a big data problem, these methods
are sometimes the only methods that are computationally feasible. The major
drawback is that filters do not examine subsets containing multiple variables;
they only rank them individually. Sometimes, individual features, not important
by themselves, could become relevant when combined with other feature(s).
Feature Selection is a very important topic that becomes more relevant as the
number of variables in a model increases. See the example below for a walk
through of this significant new feature.

Feature Selection Example


Analytic Solver Data Mining’s Feature Selection tool gives users the ability to
rank and select the most relevant variables for inclusion in a classification or
regression model. In many cases the most accurate models, or the models with
the lowest misclassification or residual errors, have benefited from better feature
selection, using a combination of human insights and automated methods.
Analytic Solver Data Mining provides a facility to compute all of the following
metrics, described in the literature, to give users information on what features
should be included in, or excluded from, their models.

• Correlation-based
o Pearson product-moment correlation
o Spearman rank correlation
o Kendall concordance
• Statistical/probabilistic independence metrics
o Welch’s statistic
o F statistic
o Chi-square statistic
• Information-theoretic metrics
o Mutual Information (Information Gain)
o Gain Ratio
• Other
o Cramer’s V
o Fisher score
o Gini index

Only some of these metrics can be used in any given application, depending on
the characteristics of the input variables (features) and the type of problem. In a
supervised setting, if we classify data mining problems as follows:

• R-R: real-valued features, regression problem
• R-{0,1}: real-valued features, binary classification problem
• R-{1..C}: real-valued features, multi-class classification problem
• {1..C}-R: nominal categorical features, regression problem
• {1..C}-{0,1}: nominal categorical features, binary classification problem
• {1..C}-{1..C}: nominal categorical features, multi-class classification problem

then we can describe the applicability of the Feature Selection metrics by the
following table:

              R-R   R-{0,1}   R-{1..C}   {1..C}-R   {1..C}-{0,1}   {1..C}-{1..C}
Pearson        N
Spearman       N
Kendall        N
Welch's        D     N
F-Test         D     N         N
Chi-squared    D     D         D          D          N              N
Mutual Info    D     D         D          D          N              N
Gain Ratio     D     D         D          D          N              N
Fisher         D     N         N
Gini           D     N         N

"N" means that metrics can be applied naturally, and “D” means that features
and/or the outcome variable must be discretized before applying the particular
filter.
As a result, depending on the variables (features) selected and the type of
problem chosen in the first dialog, various metrics will be available or disabled
in the second dialog.
The goal of this example is threefold: (1) to use Feature Selection as a tool for
exploring relationships between features and the outcome variable, (2) to reduce
the dimensionality based on the Feature Selection results, and (3) to evaluate the
performance of a supervised learning algorithm (a classification algorithm) for
different feature subsets.
This example uses the Boston_Housing.xlsx example dataset, which contains 14
variables each describing a census tract within the city of Boston. A description
of each variable is given in the table below. In addition to these variables, the
data set also contains an additional variable, which has been created by
categorizing median value (MEDV) into two categories – high (MEDV > 30)
and low (MEDV < 30).

CRIM Per capita crime rate by town


ZN Proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS Proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX Nitric oxides concentration (parts per 10 million)
RM Average number of rooms per dwelling



AGE Proportion of owner-occupied units built prior to 1940
DIS Weighted distances to five Boston employment centers
RAD Index of accessibility to radial highways
TAX Full-value property-tax rate per $10,000
PTRATIO Pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of African-Americans
by town
LSTAT % Lower status of the population
MEDV Median value of owner-occupied homes in $1000's

To open the Boston Housing example dataset, click Help – Example Models –
Forecasting\Data Mining Examples -- Boston Housing.
Select a cell within the data (say A2), then click Explore – Feature Selection to
bring up the first dialog.
Select all variables except MEDV and CAT.MEDV as Continuous Variables
and CAT.MEDV as the Output Variable. Leave the default setting of
Categorical selected. This setting denotes that the Output Variable is a
categorical variable. If the number of unique values in the Output variable is
greater than 10, then Continuous will be selected by default. However, at any
time the User may override the default choice based on his or her own
knowledge of the variable. Note: You can also perform this analysis with
variables CHAS (nominal) and RAD (ordinal) selected as Categorical Variables
– for this particular example, the most/least relevant or important variables
found by Feature Selection would be similar.



Click the Measures tab or click Next.
Since we have continuous variables, Discretize predictors is enabled. When this
option is selected, Analytic Solver Data Mining will transform continuous
variables into discrete, categorical data in order to be able to calculate statistics
as shown in the table in the Introduction. For example, all of our variables (or
features) are continuous or real-valued, as labeled in the chart. As a result, if we are interested in
evaluating the relevance of features according to the Chi-Squared Test or
measures available in the Information Theory group (Mutual Information and
Gain ratio), we must discretize these variables. If we do not select Discretize
predictors, then we have the option to compute Welch’s test or the F-test
statistics (F-Statistic or Fisher score), only. Let’s select Discretize predictors,
then click Advanced. Leave the defaults of 10 for Maximum # bins and Equal
Interval for Bins to be made with. Analytic Solver Data Mining will create 10
bins and will assign records to the bins based on whether the variable's value falls in the interval of the bin. This will be performed for each of the Continuous
Variables.
Note: Discretize output variable is disabled because our output variable,
CAT.MEDV, is already a categorical nominal variable. If we had no
Continuous Variables and all Categorical Variables, Discretize predictors would
be disabled.
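The binning step described above can be sketched with pandas; the ten sample values below are invented for illustration. `cut` performs Equal Interval binning, and `qcut` shows the Equal Count alternative described later in this chapter:

```python
import pandas as pd

# Ten invented values for one continuous variable (e.g. RM).
rm = pd.Series([4.9, 5.6, 6.1, 6.3, 6.5, 6.9, 7.2, 7.7, 8.4, 8.6])

# Equal Interval: 10 bins of equal width spanning the variable's range;
# each record gets the index of the bin its value falls into.
equal_interval = pd.cut(rm, bins=10, labels=False)

# Equal Count: bins holding (roughly) the same number of records.
equal_count = pd.qcut(rm, q=5, labels=False)
```

With ten distinct values, `qcut` places exactly two records in each of the five bins, while `cut` leaves some of the ten equal-width bins empty.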

Select Chi-squared and Cramer's V under Chi-Squared Test. The Chi-squared test statistic is used to assess the statistical independence of two events. When applied to Feature Selection, it is used as a test of independence to assess whether the assigned class is independent of a particular variable. The minimum value for this statistic is 0. The higher the Chi-Squared statistic, the stronger the evidence against independence, and thus the more relevant the variable.
Cramer’s V is a variation of the Chi-Squared statistic that also measures the
association between two discrete nominal variables. This statistic ranges from 0
to 1 with 0 indicating no association between the two variables and 1 indicating
complete association (the two variables are equal).
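Both measures can be sketched for a small, invented contingency table with scipy; Cramer's V is not built into older scipy releases, so it is derived from the chi-squared statistic here:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented contingency table: rows are bins of a discretized feature,
# columns are the two CAT.MEDV classes.
table = np.array([[30, 5],
                  [25, 10],
                  [10, 20]])

chi2, p_value, dof, expected = chi2_contingency(table)

# Cramer's V rescales chi-squared into the [0, 1] range.
n = table.sum()
k = min(table.shape) - 1
cramers_v = np.sqrt(chi2 / (n * k))
```

A large chi-squared statistic with a small p-value is evidence that the feature and the class are dependent, which is what makes the feature relevant.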
Select Mutual information and Gain ratio within the Information Theory frame. Mutual information measures the degree of mutual dependence between two variables, or the amount of uncertainty in variable 1 that can be reduced by incorporating knowledge about variable 2. Mutual Information is non-negative and is equal to zero if the two variables are statistically independent. It is never greater than the entropy (amount of information contained) in either individual variable.
The Gain Ratio, ranging from 0 to 1, is defined as the mutual information (or information gain) normalized by the feature entropy. This normalization helps address the problem of overemphasizing features with many values, but it can overestimate the relevance of features with low entropy. It is a good practice to consider both mutual information and gain ratio
for deciding on feature rankings. The larger the gain ratio, the larger the
evidence for the feature to be relevant in a classification model.
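Both quantities reduce to simple entropy arithmetic. Here is a sketch on an invented 8-record example, chosen so that the discretized feature completely determines the class (in which case mutual information equals the class entropy):

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (base 2) of a discrete variable.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# Invented discretized feature (3 bins) and binary class labels.
feature = np.array([0, 0, 0, 1, 1, 1, 2, 2])
target  = np.array([0, 0, 0, 1, 1, 1, 1, 1])

h_feature = entropy(feature)
h_target = entropy(target)
h_joint = entropy(feature * 10 + target)  # encode each (feature, target) pair

mutual_info = h_feature + h_target - h_joint
gain_ratio = mutual_info / h_feature      # normalize by the feature entropy
```

Here mutual_info equals h_target (about 0.954 bits), because knowing the bin removes all uncertainty about the class, while the gain ratio (about 0.61) discounts that by the feature's own 1.56-bit entropy.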

Click the Output Options tab or click Next. Table of all produced measures is
selected by default. When this option is selected, Analytic Solver Data Mining
will produce a report containing all measures selected on the Measures tab.
Top Features table is selected by default. This option produces a report
containing only top variables as indicated by the Number of features edit box.
Select Feature importance plot. This option produces a graphical
representation of variable importance based on the measure selected in the Rank
By drop down menu.
Enter 5 for Number of features. Analytic Solver Data Mining will display the
top 5 most important or most relevant features (variables) as ranked by the
statistic displayed in the Rank By drop down menu.
Keep Chi-squared statistic selected for the Rank By option. Analytic Solver Data Mining will display all measures and rank them by the statistic chosen in this drop down menu.

Click Finish.
• The Feature Importance Plot opens automatically in Analytic Solver Data Mining Desktop.
• If using AnalyticSolver.com, you can open this plot under Results –
Feature Selection – Run 1 – Feature_Importance_Plot.
• In the Data Mining Cloud app, click the Charts icon on the Ribbon to
open the Charts dialog, then select FS_Top_Features for Worksheet
and Feature Importance Chart for Chart.
This chart ranks the variables by most important or relevant according to the
selected measure. In this example, we see that the RM (Average number of
rooms per dwelling), LSTAT (% lower status of the population), PTRatio
(pupil-teacher ratio by town), ZN (proportion of residential land zoned for lots
over 25,000 sq. ft.), and INDUS (proportion of non-retail business acres per
town) variables are the top five most important or relevant variables according
to the Chi-Squared statistic. It’s beneficial to examine the Feature Selection
Importance Plot in order to quickly identify the largest drops or “elbows” in
feature relevancy (importance) and select the optimal number of variables for a
given classification or regression model.
Note: We could have limited the number of variables displayed on the plot to a
specified number of variables (or features) by selecting Number of features and
then specifying the number of desired variables. This is useful when the number of input variables is large or we are particularly interested in a specified number of highly-ranked features.

Hover your mouse over each bar in the graph to see the Variable name and Importance factor, in this case the Chi-Squared statistic, at the top of the dialog.
Click the X in the upper right hand corner to close the dialog, then click
FS_Output tab to open the Feature Selection report.

The Detailed Feature Selection Report displays each computed metric selected
on the Measures tab: Chi-squared statistic, Chi-squared P-Value, Cramer’s V,
Mutual Information, and Gain Ratio. If using the Cloud or Desktop Analytic
Solver Data Mining, click the down arrow next to each statistic to sort the
table. For example, if we click the down arrow next to Chi-squared Statistic and
select Sort Largest to Smallest from the menu,

the table will be sorted on the Chi-squared test statistic from largest to smallest.
Note: This sort option is not supported in AnalyticSolver.com.

Click Top Features Info on the Output Navigator to view the Feature Importance
Plot and Top Features Info table. The RM, LSTAT, PTRATIO, ZN, and
INDUS variables are the 5 most important or relevant variables as ranked by the
Chi-square test. According to the Chi-squared test, RM is the most relevant variable for discriminating the price of the house. This variable is highly dependent on the outcome variable, CAT.MEDV.

Keep in mind that when determining what features to include in our classification model, it is advantageous to examine at least several metrics to see which ones agree/disagree on the level of each variable's importance.
Let’s re-run Feature Selection but this time we’ll ask for different statistics. On
the first dialog, we will again select all variables except CAT.MEDV and
MEDV as Selected Variables and select CAT.MEDV as the output
variable. (For an example screenshot, see above). Then click the Measures
tab. Select Welch’s test, F-Statistic, Fisher score, and Gini index.
Welch’s Test is a two-sample test (i.e. applicable for binary classification
problems) that is used to check the hypothesis that two populations with
possibly unequal variances have equal means. When used with the Feature
Selection tool, a large T-statistic value (in conjunction with a small p-value)
would provide sufficient evidence that the Distribution of values for each of the
two classes are distinct and the variable may have enough discriminative power
to be included in the classification model.
The F-Test tests the hypothesis that at least one sample mean differs from the other sample means, assuming equal variances among all samples. If the between-class variance is large with respect to the within-class variance, the F statistic will be large. Specifically for Feature Selection purposes, it is used to test if a particular feature is able to separate the records from different target classes by examining between-class and within-class variances.

Fisher Score is a variation of the F-Statistic. It assigns higher values to variables that take similar values on samples from the same class and different values on samples from different classes. The larger the Fisher Score value, the more relevant or important the variable (or feature).
The Gini index measures a variable’s ability to distinguish between classes. The
maximum value of the index for binary classification is 0.5. The smaller the
Gini index, the more relevant the variable.
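The two hypothesis tests just described can be sketched with scipy on invented class-conditional samples; the group sizes, means, and variances below are made up for illustration, and scipy's `ttest_ind` with `equal_var=False` performs Welch's test:

```python
import numpy as np
from scipy.stats import ttest_ind, f_oneway

rng = np.random.default_rng(0)
# Invented LSTAT-like values split by the two CAT.MEDV classes,
# deliberately given unequal variances.
low_value_homes  = rng.normal(loc=12.0, scale=4.0, size=60)
high_value_homes = rng.normal(loc=5.0, scale=2.0, size=40)

# Welch's t-test: does not assume equal variances.
t_stat, t_pvalue = ttest_ind(low_value_homes, high_value_homes,
                             equal_var=False)

# One-way F-test on the same two groups (assumes equal variances).
f_stat, f_pvalue = f_oneway(low_value_homes, high_value_homes)
```

With the group means this far apart, both p-values come out essentially zero, mirroring the near-zero Welch p-values reported for LSTAT and RM in the walkthrough.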

The Top features table option is selected by default. Increase the Number of
features to 5 under Feature Selection and select Feature Importance Plot, then
click Finish to accept the remaining defaults on the Output Options tab and run
Feature Selection.

According to the Welch Test, the top five most relevant variables are LSTAT,
RM, INDUS, PTRatio, and Tax.

Click the X in the upper right hand corner to close the plot then click the
Feature Selection: Statistics link on the Output Navigator. Observe that the p-values corresponding to the computed Welch's Test statistic for the above 5
variables are almost zero (e.g. LSTAT 4.73E-70, RM – 2.83E-31, INDUS –
5.111E-19, PTRATIO – 1.95E-17, etc…), which means that we have very
strong evidence (threshold for p-value is 0.05) for rejecting the hypothesis that
two samples contained in these variables – in our case stratified by binary
CATMEDV – have equal means. This provides evidence that these variables
would not be redundant and would have some nontrivial discriminative ability
for house prices.

If using either the Desktop or Cloud apps, we can sort by each Statistic and see
that the top four variables for each statistic include LSTAT, INDUS, PTRATIO, and RM (taking into account Welch statistic magnitude). (Sort
functionality is not currently supported in AnalyticSolver.com.) As you can see
in the screenshot below, the F-Test statistic ranks RM as the top variable (with
strong evidence i.e. very low P-Value) and LSTAT as the second most important
variable,

while Welch’s test ranks the LSTAT variable as the most important variable and
the RM variable (in magnitude) as the 2nd most important variable.
Interestingly, the Gini Index, which is not a statistical hypothesis test, also agrees with the above rankings. The fact that this index and our hypothesis tests agree provides even stronger evidence of the aforementioned variables' relevancy. As mentioned above, the Gini index is a widely used measure for quantifying a variable's ability to distinguish between classes, which is related to how the
hierarchy of splits in the Classification Tree and Regression Tree algorithms is found.

At this point we already have a lot of useful information about our variables and
their relationship to the categorical CAT.MEDV variable. Now it’s a good time
to come back to the data description and try to understand the feature selection
results logically. For example, the LSTAT variable is the % lower status of the
population of the census tract, the RM variable is the average number of rooms
per house in the census tract, INDUS corresponds to the amount of non-retail
business in the census tract, and PTRatio is the pupil to teacher ratio in schools
in the census tract. The Boston Housing dataset is a small but very well-known
and widely used dataset. Feature selection confirms the intuitive observations
that these features are dependent on the output variable’s value. However, in
most cases, these relationships are typically hard to detect due to the large scale
of most datasets which involve complex interrelationships between variables.
Armed with the knowledge that we have obtained through the Feature Selection tool, you could quickly create a classification model and compare the
performance of a classification model created using only the two “best”
variables, out of the original 13, to a classification model created using all 13
variables. (See the chapters related to all six different classification methods that appear later in this guide, or see the chapter, Fitting a Model Using Feature Selection, in the Analytic Solver Data Mining User Guide.)
Let’s go back to the Data worksheet (or Data Set if using AnalyticSolver.com)
and re-run Feature Selection, but this time we will use Feature Selection for
evaluating a variable’s importance or relevance for predicting the median house
prices instead of classifying them into two categories, low or high, as done in the
analysis above. Again, select all variables except CAT.MEDV and MEDV and
move them to Selected Variables. Then select MEDV as the Output
Variable. Continuous is selected by default.

Click the Measures tab or Next. Leave Discretize predictors and Discretize
output variable unchecked. If Discretize predictors is selected, no statistics will
be enabled. If Discretize output variable is selected, F-Statistic, Fisher score,
and Gini index are enabled.
Select Pearson correlation, Spearman rank correlation, and Kendall
concordance.
The Pearson product-moment correlation coefficient is a widely used statistic
that measures the closeness of the linear relationship between two variables,
with a value between +1 and −1 inclusive, where 1 indicates complete positive
correlation, 0 indicates no correlation, and −1 indicates complete negative
correlation.
The Spearman rank correlation coefficient is a nonparametric measure that assesses the relationship between two variables. This measure calculates the correlation coefficient between the ranked values of the two variables. If there are no repeated data values, the Spearman rank correlation coefficient will be +1 or -1 when each of the variables is a perfect monotone function of the other.
Kendall concordance, also known as Kendall’s tau coefficient, is also used to
measure the level of association between two variables. A tau value of +1
signifies perfect agreement and a -1 indicates complete disagreement. If a
variable and the outcome variable are independent, then one could expect the
Kendall tau to be approximately zero.
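All three coefficients are one call each in scipy. The eight (RM, MEDV) pairs below are invented to show the difference in behavior: the relationship is strictly increasing but not perfectly linear, so Spearman and Kendall come out exactly 1 while Pearson stays slightly below 1:

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

# Invented (rooms, median value) pairs with a monotone, curved trend.
rm   = [4.9, 5.6, 6.1, 6.4, 6.9, 7.3, 7.9, 8.4]
medv = [11.2, 15.0, 19.5, 21.0, 26.4, 31.1, 38.6, 44.0]

r, r_pvalue = pearsonr(rm, medv)          # linear association
rho, rho_pvalue = spearmanr(rm, medv)     # rank (monotone) association
tau, tau_pvalue = kendalltau(rm, medv)    # concordance of pairs
```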

Click the Output Options tab (or click Next) and select Feature importance
plot, entering a value of 5 for Number of features. Leave all remaining options
at their defaults.

Click Finish to run Feature Selection for regression and open the Feature
Importance Plot.

This plot ranks the variables in order of importance according to the Pearson correlation coefficient. Click the X in the upper right hand corner and double click
FS_Output3 to open. Scroll down to view the Detailed Feature Selection Report. Click the down arrows next to each measure to investigate how
different ranking methods arrange the input features based on their importance
or relevance.

Double click FS_Top_Feature2 to display the top 5 Selected Predictors ranked by the Pearson Correlation.

If we sort by each Statistic (Pearson Correlation: Rho and P-Value, Spearman Correlation: Rho and P-Value, and Kendall Correlation: Tau and P-Value), we
can see that the top four variables common amongst all statistics are again
LSTAT, INDUS, RM and TAX. The low p-values indicate that with extremely
large statistical evidence, the observed correlation is real, i.e. not due to random
sampling. Although observed correlations are not extremely large, they still
show significant relationships, and, more importantly, they show relative
importance/relevance of variables (or features). Note that the correlation is a signed measure, and correlations with large magnitude are considered to be important/relevant for regression models. If we order the variables according to
the Spearman Correlation coefficient from largest to smallest by magnitude, we
see that the variables with the smallest magnitudes are the CHAS and B
variables with magnitudes of 0.1857 and 0.1406, respectively. Since these
values are close to zero (and other measures have also agreed with such
ranking), we can conclude that these variables will not be of much relevance in
our regression model. The top 5 variables with the largest magnitude of
Spearman correlation are LSTAT, RM, INDUS, NOX and TAX. If we rank the
variables according to the magnitude of Pearson Correlation, we see that our top
five variables are (in order) LSTAT (-0.7377), RM (0.6954), PTRATIO (-
0.5078), INDUS (-0.4837) and TAX (-0.4685) while the variable with the least
amount of relevance/importance is the CHAS variable (census tract proximity to
the Charles River). If we rank the variables by the Kendall Correlation, the 5
most relevant/important variables are again (in order) LSTAT, RM, INDUS,
TAX, and CRIM (per capita crime rate). Variables that might be worth a 2nd
look include the CRIM and NOX variables, as all three statistics rank these variables in the middle. It's worth noting that the top two variables ranked by all 3 correlations (LSTAT and RM) are at polar extremes with respect to their signs - meaning that the median house price tends to increase with increased RM (# rooms) and tends to decrease with an increased % lower status of the population.
The Feature Selection tool has allowed us to quickly explore and learn about our
data. We now have a pretty good idea of which variables are the most relevant
or most important to our classification or regression model, how our variables
relate to each other and to housing prices, and which data attributes would be
worth extra time and money in future data collection. Interestingly, for this
example, most of our ranking statistics have agreed (mostly) on the most
important or relevant features with strong evidence. We computed and
examined various metrics and statistics and for some (where p-values can be
computed) we’ve seen a statistical evidence that the test of interest succeeded
with definitive conclusion. In this example, we’ve observed that several
variable (or features) were consistently ranked in the top 5 most important
variables by most of the measures produced by Analytic Solver Data Mining’s
Feature Selection tool. However, this will not always be the case. On some
datasets you will find that the ranking statistics and metrics compete on
rankings. In cases such as these, further analysis may be required.

Feature Selection Options


This section explains each of the options located on the three Feature Selection tabs.

Variables listbox
Variables (or features) included in the dataset are listed here.

Continuous Variables listbox


Move continuous variables from the Variables listbox into this listbox by clicking the > command button to include them in Feature Selection. Feature Selection will accept all numeric values for continuous variables; non-numeric values are not accepted.

Categorical Variables listbox


Move categorical variables from the Variables listbox into this listbox by clicking the > command button to include them in Feature Selection. Feature Selection will accept non-numeric categorical variables.

Output Variable
Click the > command button to select the Output Variable. This variable may
be continuous or categorical. If the variable contains more than 10 unique
values, the output variable will be considered “continuous”. If the variable contains 10 or fewer unique values, the output variable will be considered “categorical”.

Output Variable Type


If the Output Variable contains more than 10 unique values, Continuous will be
automatically selected and options relevant to this type of variable will be
offered on the Measures tab. If the Output Variable contains 10 or fewer unique values, Categorical will be automatically selected and options relevant to this
type of variable will be offered on the Measures tab. The default selection can
always be overridden by the user based on his/her knowledge of the output
variable.

In a supervised setting, if we classify data mining problems as follows:

• R-R: real-valued features, regression problem
• R-{0,1}: real-valued features, binary classification problem
• R-{1..C}: real-valued features, multi-class classification problem
• {1..C}-R: nominal categorical features, regression problem
• {1..C}-{0,1}: nominal categorical features, binary classification problem
• {1..C}-{1..C}: nominal categorical features, multi-class classification problem

then we can describe the applicability of the Feature Selection metrics by the
following table:

              R-R   R-{0,1}   R-{1..C}   {1..C}-R   {1..C}-{0,1}   {1..C}-{1..C}
Pearson        N
Spearman       N
Kendall        N
Welch's        D       N
F-Test         D       N         N
Chi-squared    D       D         D          D            N              N
Mutual Info    D       D         D          D            N              N
Gain Ratio     D       D         D          D            N              N
Fisher         D       N         N
Gini           D       N         N

"N" means that metrics can be applied naturally, and “D” means that features
and/or the outcome variable must be discretized before applying the particular
filter. As a result, depending on the variables (features) selected and the type of
problem chosen in the first dialog, various metrics will be available or disabled
in this dialog.

Discretize predictors
When this option is selected, Analytic Solver Data Mining will transform
continuous variables listed under Continuous Variables on the Data Source tab
into categorical variables.
Click the Advanced command button to open the Predictor Discretization -
Advanced dialog. Here the Maximum number of bins can be selected. Analytic
Solver Data Mining will assign records to the bins based on whether the variable's value falls within the interval of the bin (if Equal interval is selected for Bins to
be made with) or on an equal number of records in each bin (if Equal Count is
selected for Bins to be made with). These settings will be applied to each of the
variables listed under Continuous Variables on the Data Source tab.

Discretize output variable
When this option is selected, Analytic Solver Data Mining will transform the
continuous output variable, listed under Output Variable on the Data Source tab,
into a categorical variable.
Click the Advanced command button to open the Output Discretization -
Advanced dialog. Here the Maximum number of bins can be selected. Analytic
Solver Data Mining will assign records to the bins based on whether the output variable's value falls within the interval of the bin (if Equal interval is selected
for Bins to be made with) or on an equal number of records in each bin (if Equal
Count is selected for Bins to be made with). These settings will be applied to the
variable selected for Output Variable in the Data Source tab.

Pearson correlation
The Pearson product-moment correlation coefficient is a widely used statistic
that measures the closeness of the linear relationship between two variables,
with a value between +1 and −1 inclusive, where 1 indicates complete positive
correlation, 0 indicates no correlation, and −1 indicates complete negative
correlation.

Spearman rank correlation


The Spearman rank correlation coefficient is a nonparametric measure that assesses the relationship between two variables. This measure calculates the correlation coefficient between the ranked values of the two variables. If there are no repeated data values, the Spearman rank correlation coefficient will be +1 or -1 when each of the variables is a perfect monotone function of the other.

Kendall concordance
Kendall concordance, also known as Kendall’s tau coefficient, is also used to
measure the level of association between two variables. A tau value of +1 signifies perfect agreement and a -1 indicates complete disagreement. If a
variable and the outcome variable are independent, then one could expect the
Kendall tau to be approximately zero.

Welch’s Test
Welch’s Test is a two-sample test (i.e. applicable for binary classification
problems) that is used to check the hypothesis that two populations with
possibly unequal variances have equal means. When used with the Feature
Selection tool, a large T-statistic value (in conjunction with a small p-value)
would provide sufficient evidence that the Distribution of values for each of the
two classes are distinct and the variable may have enough discriminative power
to be included in the classification model.

F-Statistic
The F-Test tests the hypothesis that at least one sample mean differs from the other sample means, assuming equal variances among all samples. If the between-class variance is large with respect to the within-class variance, the F statistic will be large. Specifically for Feature Selection purposes, it is used to test if a particular feature is able to separate the records from different target classes by examining between-class and within-class variances.

Fisher score
Fisher Score is a variation of the F-Statistic. It assigns higher values to variables that take similar values on samples from the same class and different values on samples from different classes. The larger the Fisher Score value, the more relevant or important the variable (or feature).

Chi-Squared
The Chi-squared test statistic is used to assess the statistical independence of two events. When applied to Feature Selection, it is used as a test of independence to assess whether the assigned class is independent of a particular variable. The minimum value for this statistic is 0. The higher the Chi-Squared statistic, the stronger the evidence against independence, and thus the more relevant the variable.

Cramer's V
Cramer’s V is a variation of the Chi-Squared statistic that also measures the
association between two discrete nominal variables. This statistic ranges from 0
to 1 with 0 indicating no association between the two variables and 1 indicating
complete association (the two variables are equal).

Mutual information
Mutual information measures the degree of mutual dependence between two variables, or the amount of uncertainty in variable 1 that can be reduced by incorporating knowledge about variable 2. Mutual Information is non-negative and is equal to zero if the two variables are statistically independent. It is never greater than the entropy (amount of information contained) in either individual variable.

Gain ratio
This ratio, ranging from 0 to 1, is defined as the mutual information (or information gain) normalized by the feature entropy. This normalization helps address the problem of overemphasizing features with many values, but it can overestimate the relevance of features with low entropy. It is a good practice to consider both mutual information and gain ratio for deciding on feature rankings. The larger the gain ratio, the larger the evidence for the feature to be relevant in a classification model.

Gini index
The Gini index measures a variable’s ability to distinguish between classes. The
maximum value of the index for binary classification is 0.5. The smaller the
Gini index, the more relevant the variable.

Table of all produced measures


If this option is selected, Analytic Solver Data Mining will produce a table
containing all of the selected measures from the Measures tab. This option is
selected by default.

Top features table
If this option is selected, Analytic Solver Data Mining will produce a table
containing the top number of features as determined by the Number of features
edit box and the Rank By option. This option is not selected by default.

Feature importance plot


If this option is selected, Analytic Solver Data Mining will plot the top most
important or relevant features as determined by the value entered for the Number
of features option and the Rank By option. This feature is selected by default.
To open this plot in the Cloud app, click Charts on the Ribbon.

Number of features
Enter a value here ranging from 1 to the number of features selected in the
Continuous and Categorical Variables listboxes on the Data Source tab. This
value, along with the Rank By option setting, will be used to determine the
variables included in the Top Features Table and Feature Importance Plot. This
option has a default setting of “2”.

Rank By
Select Measure or P-Value, then select the measure from the Rank By drop
down menu to rank the variables by most important or relevant to least
important or relevant in the Top Features Table and Feature Importance Plot. If
Measure is selected, then the variables will be ranked by the actual value of the
measure or statistic selected, depending on the interpretation (either largest to
smallest or smallest to largest). If P-Value is selected, then the variables will be
ranked from smallest to largest using the P-value of the measure or statistic
selected.
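The two Rank By modes amount to sorting the results table on a different column. A sketch with invented (feature, statistic, p-value) triples:

```python
# Invented feature-selection results: (name, chi-squared statistic, p-value).
results = [
    ("RM",      88.1, 2e-20),
    ("LSTAT",   75.3, 4e-18),
    ("PTRATIO", 40.2, 1e-09),
    ("CHAS",     1.3, 0.25),
]

# Rank By "Measure": largest statistic first (for a largest-is-best measure).
by_measure = [name for name, stat, p in sorted(results, key=lambda r: -r[1])]

# Rank By "P-Value": smallest p-value first.
by_pvalue = [name for name, stat, p in sorted(results, key=lambda r: r[2])]

top_features = by_measure[:2]   # "Number of features" defaults to 2
```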

Chart Wizard
To create a chart, you can invoke the Chart Wizard by clicking Explore on the
Data Mining ribbon. A description of each chart type follows.

Bar Chart
The bar chart is one of the easiest plots to create and understand, and one of the most effective. The best application for this type of chart is comparing an individual statistic (i.e. mean, count, etc.) across a group of variables. The bar height represents the statistic, while each bar represents a different variable or group. An example of a bar chart is shown below.

Box Whisker Plot
A box plot graph summarizes a dataset and is often used in exploratory data
analysis. This type of graph illustrates the shape of the distribution, its central
value, and the range of the data. The plot consists of the most extreme values in
the data set (maximum and minimum values), the lower and upper quartiles, and
the median.
Box plots are very useful when large numbers of observations are involved or
when two or more data sets are being compared. In addition, they are also
helpful for indicating whether a distribution is skewed and whether there are any
unusual observations (outliers) in the data set. The most important trait of the box plot is that it is not strongly influenced by extreme values, or outliers.
A box plot includes the following statistical features.
Median: The median value in a dataset is the value that appears in the middle of
a sorted dataset. If the dataset has an even number of values then the median is
the average of the two middle values in the dataset.
Quartiles: Quartiles, by definition, separate a quarter of data points from the
rest. This roughly means that the first quartile is the value under which 25% of
the data lie and the third quartile is the value over which 25% of the data are
found. (Note: This indicates that the second quartile is the median itself.)
First Quartile, Q1: Concluding from the definitions above, the first quartile is
the median of the lower half of the data. If the number of data points is odd, the
lower half includes the median.
Third Quartile, Q3: Third quartile is the median of the upper half of the data. If
the number of data points is odd, the upper half of the data includes the median.
See the following example.
Consider the following dataset --
52, 57, 60, 63, 71, 72, 73, 76, 98, 110, 120
The dataset has 11 values sorted in ascending order. The median is the middle
value, (i.e. 6th value in this case.)
Median = 72
Q1 is the median of the first 6 values, (i.e. the mean of 3rd and 4th values)
25th Percentile = 61.5
Q3 is the median of the last 6 values. (i.e. the mean of the 8th and 9th values).

75th Percentile = 87
The mean is the average of all the data values ((52 + 57 + 60 + 63 + 71 + 72 +
73 + 76 + 98 + 110 + 120) / 11).
Mean = 77.45
The interquartile range is a useful measure of the amount of variation in a set
of data and is simply the 75th Percentile – 25th Percentile.
Interquartile Range = 87 – 61.5 = 25.5
The box extends from Q1 to Q3 and includes Q2, so the box covers the middle
one-half of the data; the most extreme points are indicated by the "whiskers".
In Analytic Solver Data Mining, the mean is denoted with a dotted line
and the median with a solid line. Analytic Solver completes the box plot by
extending its "whiskers" to the most extreme points, 52 and 120.
Max: 120
Min: 52
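The statistics above can be reproduced with a short script. The sketch below (not Analytic Solver code) uses the inclusive quartile method described in this section, where the lower and upper halves of an odd-length dataset each include the median position:

```python
def median(values):
    """Median of a sorted list: middle value, or mean of the two middle values."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

data = sorted([52, 57, 60, 63, 71, 72, 73, 76, 98, 110, 120])
n = len(data)
half = (n + 1) // 2  # inclusive method: an odd-length dataset keeps the median in both halves

q2 = median(data)         # median of all 11 values -> 72
q1 = median(data[:half])  # median of the first 6 values -> 61.5
q3 = median(data[-half:])  # median of the last 6 values -> 87
mean = sum(data) / n       # 852 / 11 -> 77.45 (rounded)
iqr = q3 - q1              # 87 - 61.5 -> 25.5
```

Note that a different quartile convention (for example, excluding the median from each half) would give slightly different Q1 and Q3 values for the same data.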

Histogram
A Histogram, or Frequency Histogram, is a bar graph that depicts the range
and scale of the observations on the x-axis and the number of data points (or
frequency) in each interval on the y-axis. These types of graphs are
popular among statisticians. Although these types of graphs do not show the
exact values of the data points, they give a very good idea about the spread and
shape of the data.
Consider the percentages below from a college final exam.
82.5, 78.3, 76.2, 81.2, 72.3, 73.2, 76.3, 77.3, 78.2, 78.5, 75.6, 79.2, 78.3, 80.2,
76.4, 77.9, 75.8, 76.5, 77.3, 78.2
One can immediately see the value of a histogram by taking a quick glance at
the graph below. This plot quickly and efficiently illustrates the shape and size
of the dataset above. Note: Analytic Solver Data Mining determines the
number and size of the intervals when drawing the histogram.
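To make the binning concrete, here is a minimal sketch (not Analytic Solver code) that tallies the exam scores above into frequency bins. The bin width of 2 is an arbitrary choice for illustration; as noted above, Analytic Solver determines the intervals automatically.

```python
import math
from collections import Counter

scores = [82.5, 78.3, 76.2, 81.2, 72.3, 73.2, 76.3, 77.3, 78.2, 78.5,
          75.6, 79.2, 78.3, 80.2, 76.4, 77.9, 75.8, 76.5, 77.3, 78.2]

width = 2  # illustrative bin width; Analytic Solver chooses the intervals itself
counts = Counter(math.floor(s / width) * width for s in scores)

# Print a text histogram: one row per interval, one '#' per observation.
for lo in sorted(counts):
    print(f"[{lo}, {lo + width}) {'#' * counts[lo]}")
```

Most scores fall in the 76–80 range, which is exactly the spread-and-shape information a histogram conveys at a glance.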

Line Chart
A line chart is best suited for time series datasets. In the example below, the line
chart plots the number of airline passengers from January 1949 to December
1960. (The X-axis is the number of months, starting with January 1949 as “1”.)

Parallel Coordinates
A Parallel Coordinates plot consists of N vertical axes, where N is the
number of variables selected to be included in the plot. A line is drawn
connecting each observation’s values across the different variables (one per
axis), creating a “multivariate profile”. These types of graphs can be useful for
prediction and possible data binning. In addition, these graphs can expose
clusters, outliers and variable “overlap”. Axes can be reordered by simply
dragging an axis to the desired location. An example of
a Parallel Coordinates plot is shown below.
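Because each axis typically has its own units and range, plotting tools commonly rescale every variable before drawing the profiles. The sketch below is an illustration of that preprocessing step, not Analytic Solver's internal logic; the data values are made up. Each variable is min-max scaled to [0, 1] so that all parallel axes share one vertical scale:

```python
def normalize_columns(records):
    """Min-max scale each variable to [0, 1] so all parallel axes share one scale.

    records: list of equal-length tuples, one tuple per observation.
    """
    cols = list(zip(*records))
    scaled_cols = []
    for col in cols:
        lo, hi = min(col), max(col)
        span = hi - lo or 1  # avoid division by zero for a constant variable
        scaled_cols.append([(v - lo) / span for v in col])
    return list(zip(*scaled_cols))  # back to one "multivariate profile" per observation

# Each tuple is one observation's values across three variables (three axes).
profiles = normalize_columns([(2.3, 11.3, 5.0), (10.9, 4.4, 7.5), (6.6, 7.85, 10.0)])
```

After scaling, each profile can be drawn as a polyline whose vertex on axis k is the observation's scaled value for variable k.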

Scatterplot
One of the most common, effective and easy to create plots is the scatterplot.
These graphs are used to compare the relationships between two variables and
are useful in identifying clusters and variable “overlap”.

Scatterplot Matrix
A Matrix plot combines several scatterplots into one panel enabling the user to
see pairwise relationships between variables. Given a set of variables Var1,
Var2, ..., VarN, the matrix plot contains all the pairwise scatter plots of
the variables on a single page in a matrix format. The names of the variables are
on the diagonals. In other words, if there are k variables, there will be k rows
and k columns in the matrix, and the plot in the ith row and jth column will be
Vari versus Varj.
The axis titles and the values of the variables appear at the edge of the
respective row or column. The variables and their interactions
with one another can be compared at a glance, which is why
matrix plots are becoming increasingly common in general-purpose statistical
software programs. An example is shown below.
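The row-and-column layout just described can be sketched in a few lines. This is an illustration of the grid structure only (the variable names are taken from the Boston Housing example used later in this chapter, and the diagonal is represented by the variable name):

```python
variables = ["INDUS", "AGE", "DIS", "RAD"]  # illustrative variable names
k = len(variables)

# Cell (i, j) of the matrix plots variables[i] (y-axis) against variables[j]
# (x-axis); the diagonal cells (i == j) carry the variable name itself.
grid = [[variables[i] if i == j else f"{variables[i]} vs {variables[j]}"
         for j in range(k)]
        for i in range(k)]
```

With k = 4 variables this yields a 4x4 grid: 4 diagonal name cells and 12 pairwise scatter plots.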

Variable Plot
Analytic Solver Data Mining’s Variables graph simply plots each selected
variable’s distribution. See below for an example.

Export to PowerBI/Tableau
Use the Chart Wizard to export your data to Microsoft's Power BI or Tableau.
Both Power BI and Tableau allow you to visualize and explore your data using
an extensive menu of features. This functionality is currently only supported in
Analytic Solver Desktop.

Bar Chart
This example describes the use of the Bar Chart in Analytic Solver to illustrate
the details of the Sports TV Ratings dataset. Steps for creating a bar chart in
Analytic Solver Cloud may be found below.

Using Analytic Solver Desktop to Create a Bar Chart


Click Help – Example Models on the Data Mining ribbon, then click
Forecasting / Data Mining Models and open the example file,
SportsTVRatings.xlsx.
Select a cell within the data (say A2), then click Explore – Chart Wizard to
bring up the first dialog of the Chart Wizard.

Click Next.
On the Y Axis Selection Dialog, select World Series, and then click Next.

Note that all variables will be plotted on the X axis.

Click Finish, or click Next to set the Panel and Color options. These options can
always be set later in the upper right hand corner of the plot.

(Screenshot callouts: change the graph to use horizontal or vertical bars here;
change Color By & Panel By here.)

To change the variable on the X-axis (or the Y-axis if Bars are set to "Vertical"),
simply click the down arrow and select the desired variable from the menu.

To add a 2nd chart to the window, simply click the desired chart icon at the top
of the Chart Wizard.

Please see the Common Chart Options section (below) for a complete
description of each icon on the chart title bar.

To exit the graph, click the red X in the upper right hand corner of the Chart
Wizard window.

To save the chart for later viewing, click Save. To delete the chart, click
Discard; to cancel the save and return to the chart, click Cancel. For this
example, enter BarChart for the chart name, then click Save. The chart will
close. To reopen the chart, click Explore – Existing Charts – BarChart.

Using Analytic Solver Cloud to Create a Bar Chart


Click Help – Example Models – Forecasting / Data Mining Models in Analytic
Solver Cloud to open the Boston Housing dataset. Then click Explore – Chart
Wizard and select Bar Charts at the very top.

• Change the plotted statistic by clicking the down arrow beneath Statistic.

• Use Color By to select a color for a specific set of observations.
• Use Filter to filter the results according to a specific criterion.

Box Whisker Plot Example


This example describes the use of the Boxplot chart to illustrate the
characteristics of the dataset.
Click Help – Examples on the Data Mining ribbon to open the BoxPlot.xlsx
example file.
Select a cell within the data (say A2), then click Explore – Chart Wizard to
bring up the first dialog of the Chart Wizard. Select BoxPlot, and then click
Next.

On the Y Axis Selection dialog, select Y1, and then click Next.

Select X-Var on the X-Axis Selection dialog, then click Finish, or click Next to
set Panel and Color options. These options can always be set later in the upper
right hand corner of the plot.

Uncheck class 4 under the X-Var filter to remove this class from the plot.

Hover next to the plot to display the following Intellisense window.

The dotted line denotes the Mean of 22.49, the solid line denotes the Median of
23.22. The box reaches from the 25th Percentile of 9.07 to the 75th Percentile of
37.87. The lower “whisker” (or lower bound) reaches to -47.343 and the upper
“whisker” (or upper bound) reaches to 61.454.
To select a different variable on the y-axis, click the right pointing arrow and
select the desired variable from the menu.

To change the variable on the X-axis, select the down arrow next to X-Var and
select the desired variable.

To add a 2nd boxplot, click the BoxPlot icon on the top of the Chart Wizard
dialog.

A second chart is added to the Chart Wizard dialog. Click the X in the upper
right corner of each plot to remove it from the window. Color By and Panel By
options are always available in the upper right hand corner of each plot.

To exit the graph, click the red X in the upper right hand corner of the Chart
Wizard window.
Please see the Common Chart Options section (below) for a complete
description of each icon on the chart title bar.

To save the chart for later viewing, click Save. To delete the chart, click
Discard; to cancel the save and return to the chart, click Cancel. For this
example, enter BoxPlot for the chart name, then click Save. The chart will
close. To reopen the chart, click Explore – Existing Charts – BoxPlot. To
delete the chart, click Delete.

Using Analytic Solver Cloud to Create a Box Plot


Click Help – Example Models – Forecasting / Data Mining Models in Analytic
Solver Cloud to open the Boxplot dataset. Then click Explore – Chart Wizard
and select Boxplot.

• Change the plotted variable(s) by clicking the down arrow beneath Variable.
• Use Filter to filter the results according to a specific criterion.

Histogram Example
The example below illustrates the use of Analytic Solver Data Mining’s chart
wizard in drawing a histogram of the Boston_Housing.xlsx dataset. Click Help
– Examples on the Data Mining ribbon to open the example dataset,
Boston_Housing.xlsx. Select a cell within the dataset, say A2, and then click
Explore – Chart Wizard on the Data Mining ribbon. The following dialog
appears.

Select Histogram, and then click Next.

Select Frequency, then click Next.

Select INDUS, then click Finish.

The data has been divided into 14 different bins or intervals. Unselect the
variables CRIM and ZN under Filters. Notice that the graph does not change. This
is because removing these variables is, in effect, removing a column from the
dataset. Since we are currently not interested in these columns, the plot is not
affected. However, now move the rightmost INDUS slider to the left until the
upper bound is approximately 17. Notice that the x-axis now ranges from 0 to 16,
as values above 17.37 have been removed.

To change the number of bins in the histogram, move the Bins slider to the left
or right (at the bottom of the dialog).
To change the variables included in the plot, simply click the Histogram icon on
the title bar of the Chart Wizard,

to bring up the X-Y Axis Selection dialog.

Select DIS for the X-Axis, then click Next to choose color and panel options.
At this point, you could also click Finish to draw the histogram. Color and
panel options can be chosen at any time.

Select CAT. MEDV for Color By, then click Finish to draw the histogram.

The two histograms are drawn in the same window. Click the X in the upper
right corner of each plot to remove it from the window. Color By and Panel By
options are always available in the upper right hand corner of each plot.

Please see the section Common Chart Options (below) for a complete
description of each icon on the chart title bar.

To exit the graph, click the red X in the upper right hand corner of the Chart
Wizard window.

To save the chart for later viewing, click Save. To delete the chart, click
Discard; to cancel the save and return to the chart, click Cancel. For this
example, type Histogram for the chart name, then click Save. The chart will
close. To reopen the chart, click Explore – Existing Charts – Histogram.

Using Analytic Solver Cloud to Create a Histogram


Click Help – Example Models – Forecasting / Data Mining Models in Analytic
Solver Cloud to open the Boston Housing dataset. Then click Explore – Chart
Wizard and select Histogram.

• Change the plotted variable by clicking the down arrow beneath Selected Variable.

Line Chart Example


The example below illustrates the use of Analytic Solver Data Mining’s chart
wizard in drawing a Line Chart using the Airpass.xlsx dataset. Click Help –
Examples on the Data Mining ribbon to open the example dataset, Airpass.xlsx.
Select a cell within the dataset, say A2, and then click Explore – Chart Wizard
on the Data Mining ribbon. The following dialog appears.

Select Line Chart, then click Next.

Select Passengers, then click Next.

Select Month, then click Finish, or click Next to choose Panel and Color
options. Both can be selected or changed in the upper right hand corner of the
plot.

The y-axis plots the number of passengers and the x-axis plots the month. This
plot shows that as the months progress, the number of airline passengers
increases.
Please see the section Common Chart Options (below) for a complete
description of each icon on the chart title bar.

To exit the graph, click the red X in the upper right hand corner of the Chart
Wizard window.

To save the chart for later viewing, click Save. To delete the chart, click
Discard; to cancel the save and return to the chart, click Cancel. For this
example, type LineChart for the chart name, then click Save. The chart will
close. To reopen the chart, click Explore – Existing Charts – LineChart.

Using Analytic Solver Cloud to Create a Line Chart


Click Help – Example Models – Forecasting / Data Mining Models in Analytic
Solver Cloud to open the Airpass dataset. Then click Explore – Chart Wizard
and select Line Chart.

• Change the x-axis variable by clicking the down arrow next to X-axis.
• Filter your results by clicking the down arrow next to Filter.

Parallel Coordinates Chart Example


The example below illustrates the use of Analytic Solver Data Mining’s chart
wizard in drawing a Parallel Coordinates Plot using the SportsTVRatings.xlsx dataset.
A parallel coordinates plot allows the exploration of high-dimensional datasets,
or datasets with a large number of features (variables). This type of graph starts
with a set of vertically drawn, equally spaced parallel lines, which correspond
to the features included in the graph. Observations for each feature are recorded
as dots on the vertical lines, and observations that are contained within the same
record are connected by a line.
Click Help – Examples on the Data Mining ribbon to open the example dataset,
SportsTVRatings.xlsx. Select a cell within the dataset, say A2, then click
Explore – Chart Wizard on the Data Mining ribbon. The following dialog
appears.

Select Parallel Coordinates, then click Next.

Select two variables, Indy500 and Daytona 500. Click Finish to draw the plot.

The first thing that we notice is the range of each of the races, indicated
at the top and bottom of each vertical line. The ratings for the Indy 500
range from a high of 10.9 to a low of 2.3, whereas the range for the Daytona 500 is
11.3 to 4.4. As a result, this chart already conveys that the viewership for the
Daytona 500 is larger than the viewership for the Indy 500.
When looking at the observations for each feature, this chart shows that in most
years the viewership of the Indy 500 was low whereas the viewership for the
Daytona 500 was high. There are just four years where high ratings for the
Indy 500 were recorded. In these same years, the ratings for the Daytona 500
were correspondingly lower.
To remove a variable from the matrix, unselect the desired variable under
Filters. To add a variable to the matrix, select the desired variable under Filters.
Please see the section Common Chart Options (below) for a complete
description of each icon on the chart title bar.

To exit the graph, click the red X in the upper right hand corner of the Chart
Wizard window.

To save the chart for later viewing, click Save. To delete the chart, click
Discard, to cancel the save and return to the chart, click Cancel. For this
example, type Parallel for the chart name, then click Save. The chart will close.
To reopen the chart, click Explore – Existing Charts – Parallel.

Using Analytic Solver Cloud to Create a Parallel Coordinates Chart
Click Help – Example Models – Forecasting / Data Mining Models in Analytic
Solver Cloud to open the Sports TV Ratings dataset. Then click Explore – Chart
Wizard and select Parallel Coordinates.

• Click the down arrow beneath Color By to add a color to a specified
variable. In this example, each year is given a different color.
• Filter your data by clicking the down arrow next to Filter. Select the
variables to be included in the chart.

ScatterPlot Example
The example below illustrates the use of Analytic Solver Data Mining’s chart
wizard in drawing a Scatterplot using the Boston_Housing.xlsx dataset. Click
Help – Examples on the Data Mining ribbon to open the example dataset,
Boston_Housing.xlsx. Select a cell within the dataset, say A2, and then click
Explore – Chart Wizard on the Data Mining ribbon. The following dialog
appears.

Select ScatterPlot, then click Next.

Select LSTAT on the Y Axis Selection Dialog and click Next.

Select MEDV from the X-Axis Selection Dialog. Then click Finish.

Select Color by: CHAS (Charles River dummy variable = 1 if tract bounds
river; 0 otherwise) and Panel by: CAT.MEDV (Median value of owner-
occupied homes in $1000's > 30).
This new graph illustrates that most houses that border the river are higher
priced homes.

To remove a variable from the matrix, unselect the desired variable under
Filters. To add a variable to the matrix, select the desired variable under Filters.
Please see the section Common Chart Options (below) for a complete
description of each icon on the chart title bar.

To exit the graph, click the red X in the upper right hand corner of the Chart
Wizard window.

To save the chart for later viewing, click Save. To delete the chart, click
Discard; to cancel the save and return to the chart, click Cancel. For this
example, type Scatterplot for the chart name, then click Save. The chart will
close. To reopen the chart, click Explore – Existing Charts – Scatterplot.

Using Analytic Solver Cloud to Create a Scatterplot Chart
Click Help – Example Models – Forecasting / Data Mining Models in Analytic
Solver Cloud to open the Boston Housing dataset. Then click Explore – Chart
Wizard and select Scatterplot.

• Click the down arrow beneath Variable to select the variable plotted on
the Y-axis.
• Click the down arrow beneath Versus to select the variable plotted on
the X-axis.
• Use Size By to increase or decrease the magnitude of the plotted points
according to the variable selected. In this example, the magnitude of
the points has been increased or decreased according to the INDUS
variable, or how close a house is to an industrial complex.
• Use Color By to change the color of the plotted points according to the
variable selected. In this example the points are categorized by color
using the RAD variable.

Scatterplot Matrix Plot Example


The example below illustrates the use of Analytic Solver Data Mining’s chart
wizard in drawing a Scatterplot Matrix using the Boston_Housing.xlsx dataset.
Click Help – Examples on the Data Mining ribbon to open the example dataset,
Boston_Housing.xlsx. Select a cell within the dataset, say A2, then click
Explore – Chart Wizard on the Data Mining ribbon. The following dialog
appears.

Select Scatterplot Matrix, then click Next.

Select INDUS, AGE, DIS and RAD variables, then click Finish.

Histograms of the selected variables appear on the diagonal. Find the plot in the
second row (from the top) and third column (from the left) of the matrix.

This plot indicates a pairwise relationship between the variables AGE and DIS.
The Y-axis for this plot can be found at the 2nd row, 1st column.

The X-axis for this plot can be found at the last row, 3rd column.

To remove a variable from the matrix, unselect the desired variable under
Filters. To add a variable to the matrix, select the desired variable under Filters.
Please see the section Common Chart Options (below) for a complete
description of each icon on the chart title bar.

To exit the graph, click the red X in the upper right hand corner of the Chart
Wizard window.

To save the chart for later viewing, click Save. To delete the chart, click
Discard; to cancel the save and return to the chart, click Cancel. For this
example, type ScatterplotMatrix for the chart name, then click Save. The chart
will close. To reopen the chart, click Explore – Existing Charts –
ScatterplotMatrix.

Variable Plot Example


The example below illustrates the use of Analytic Solver Data Mining’s chart
wizard in drawing a Variable plot using the Boston_Housing.xlsx dataset. Click
Help – Examples on the Data Mining ribbon to open the example dataset,
Boston_Housing.xlsx. Select a cell within the dataset, say A2, and then click
Explore – Chart Wizard on the Data Mining ribbon. The following dialog
appears.

Select Variable, then click Next.

All variables are selected by default. Click Finish to draw the chart.
The distributions of each variable are shown in bar chart form. To remove a
variable from the matrix, unselect the desired variable under Filters. To add a
variable to the matrix, select the desired variable under Filters.

Please see the section Common Chart Options (below) for a complete
description of each icon on the chart title bar.

To exit the graph, click the red X in the upper right hand corner of the Chart
Wizard window.

To save the chart for later viewing, click Save. To delete the chart, click
Discard; to cancel the save and return to the chart, click Cancel. For this
example, type Variables for the chart name, then click Save. The chart will
close. To reopen the chart, click Explore – Existing Charts – Variables.

Export to Power BI
The example below illustrates how to export your data to Microsoft's Power BI.
Microsoft's Power BI, for use with Office 365, is a cloud-based service that
works with Excel to help you visualize your data using various charts and
reports. Note: This functionality is currently only supported in Analytic Solver
Desktop.

Click Help – Examples on the Data Mining ribbon to open the example dataset,
Boston_Housing.xlsx. This dataset includes fourteen variables pertaining to
housing prices from census tracts in the Boston area collected by the US Census
Bureau.
Select a cell within the dataset, say A2, and then click Explore – Chart Wizard
on the Data Mining ribbon. The following dialog appears.

Select Export to Power BI, then click Next.


If you have not yet logged on to Power BI in the current instance of Excel, you
will be asked to log in. Enter your credentials and then click Sign in.
After you have successfully logged in, you will be asked to either update an
existing dataset or create a new one. In this example, we will create a new
dataset in Power BI named Boston Housing.

Click OK. Once the upload is complete, you will receive the following message.

Log on to Power BI (http://powerbi.microsoft.com/). The newly created dataset
will be listed under Datasets.

Select Boston Housing, then determine the components to be included in the
graph. In the screenshot below, a bar chart displays the frequency of the CAT.
MEDV (equal to 1 if the median value of owner-occupied homes in the census
tract is greater than or equal to $50K) variable for each value of the INDUS
variable (proportion of non-retail business acres per town). The components of
the graph were chosen from Fields on the left and the bar type was selected by
clicking the icon to the right of the graph.

If you hover over one of the bars, for example the bar corresponding to value
1.52, you'll see that for this particular value, four housing tracts were assigned a
value of "1" for the CAT. MEDV variable.

Use the icon to pin this graph to the Dashboard.

Now, each time the dataset is updated, the results may be uploaded to your
Power BI dashboard. Click back to Excel and change the INDUS variable value
to 1.52 for 2 more records that are assigned "1" in the CAT.MEDV column. Click
Explore – Chart Wizard and upload the dataset a 2nd time by selecting Export
to Power BI. Note: We are not asked to log in to the Power BI site a second
time since we are using the same instance of Excel, but we are asked if we would

like to update an existing dataset. Select the existing Boston Housing dataset,
then click OK.

Click back to Power BI in your browser and refresh; the chart will update
automatically to reflect the change to the dataset.

Export to Tableau
Tableau is a popular interactive software package that allows you to visually
explore and analyze your data. Tableau can import data from a wide range of
sources, including Excel workbooks, and it is often used in conjunction with
Excel. Note: This functionality is currently only supported in Analytic Solver
Desktop.
With a single click, you can convert the results of your data mining model into a
Tableau Data Extract (.tde) file or HTML file using the Tableau Web Connector,
open them directly in Tableau, and visualize them with a few clicks.
Click Help – Examples on the Data Mining ribbon to open the example dataset,
Boston_Housing.xlsx. This dataset includes fourteen variables pertaining to
housing prices from census tracts in the Boston area collected by the US Census
Bureau.
Select a cell within the dataset, say A2, and then click Explore – Chart Wizard
on the Data Mining ribbon. The following dialog appears.

Select Export to Tableau, then click Next to save the dataset formatted as a
*.tde file. If this extension is selected, Analytic Solver will extract static data.
You will be prompted to enter a name for the Tableau file. In this example, we
will name the file, BostonHousing.tde.

Once you click Save, the .tde file will be created, and the following message
will appear.

The *.tde file will be saved in the location you selected with the name you typed
in the Save As dialog (as shown above). The rows and columns in the Tableau
file will match the rows and columns in the original dataset.
To open the files in Tableau, either double click each file (if using Desktop
Tableau) or click Other Files under Connect and open the desired file(s).

The Tableau Web Connector offers much more flexibility over Tableau Data
Extract by allowing you to refresh your data dynamically inside of Tableau. To
export data using this format, select the .html extension, then click Save.

If Tableau Web Connector is selected, you will be prompted to select a folder in
which to save the HTML file. After you click Save, the following message will
appear.

To open the files in Tableau, open a new workbook in Tableau and click
Connect to Data.

On the Connect menu, select More Servers – Web Data Connector.

On the Web Data Connector dialog, enter the location displayed on the dialog
shown above (i.e., http://localhost:8080/) and press Enter.

When the following dialog appears, select your file. To add more data, click
Data – New Data Source on the Tableau ribbon, then repeat the actions
described above.

If your data has changed, you'll need to refresh the table within the Tableau Web
Connector HTML files. To do so:
1. Open the Chart Wizard and select Export to Tableau.
2. Select the HTML extension and then save the file to the desired
location.
3. Click OK.
4. In Tableau, click Data – Refresh All Extracts to update your data.
For more information on using Tableau, please refer to the Tableau
documentation found at http://www.tableau.com/.

Common Chart Options


The following options are common to all charts created by the chart wizard.

Filters
Use this section to filter the data in the chart by selecting/unselecting the desired
variable(s) and moving the sliders to the left and right.

Observations
All variables in the dataset are reported under this section.

Chart Title Bar


Use the Chart Title bar to switch to a different chart type, print the chart or copy
the chart to the clipboard.

Chart Option Bar


Use the Chart Option bar to print/copy the chart or change the chart display
options.
• The first icon (starting from the left) in Analytic Solver Desktop is the
Print icon. Click this icon to see a preview of the chart before it is
printed and to change printer and page settings.
• Click the 2nd icon, the Copy icon, to copy the chart to the clipboard for
pasting into a new or existing document.
• Click the 3rd option, the Chart Options icon, to change chart settings
such as Legend and Axis titles, to add labels, or to change chart colors
or borders. (Several charts do not support all tabs and options.)

Click the Legend tab to display the chart legend, legend position, and to add a
chart title.

Click the Labels tab to change or add either a header or footer to the chart. Use
this tab to select the position of the header/footer (center, left, or right), the font,
and the backplane style and color.

Click the Colors tab to change the colors used in the chart.

Click the Borders tab to change the border of the chart.

Click the Axes tab to change the X and Y Axis titles, placement and font. (The
Formatting menu is enabled only for Variable Plot, Histogram, and Scatterplot
Charts.)

Click OK to accept the changes or Cancel to disregard the changes and return to
the chart window.

Data Mining Cloud Chart Options

The first icon (starting from the left) in Data Mining Cloud or
AnalyticSolver.com is the Back button. Click this button to go back to the chart
selection screen. Click the Frontline Solvers icon (the lightbulb icon) to go to
our website at www.solver.com.

The Data field determines the data points to be included in the chart. Click the
down arrow next to Statistic to select the statistic (mean, median, count, etc.)
used to aggregate the data in the chart. Click the down arrow for Color By to
reveal specific data points in different colors, and click the down arrow for Filter
to select the specific features (variables) to include in the chart.

The second icon is the options icon. Click this icon to determine the data points
to include in your model. To exclude/include data points, simply move the
sliders to the left and right.

Transforming Datasets with
Missing or Invalid Data

Introduction
Analytic Solver Data Mining’s Missing Data Handling utility allows you to
detect missing values in a dataset and handle them in a way you specify.
Analytic Solver Data Mining considers an observation to be missing if the cell is
empty or contains an invalid formula. Analytic Solver Data Mining also allows
you to designate specific data values as "missing" or "corrupt".

Analytic Solver Data Mining offers several different methods for dealing with
missing values, and each variable can be assigned a different “treatment”. For
example, when a missing value is found, the entire record can be deleted, or the
missing value can be replaced by the mean, median, or mode of the remaining
values in that column, or by a value that you specify. The available options
depend on the variable type.

In the following examples, we will explore the various ways in which Analytic
Solver Data Mining can treat missing or invalid values in a dataset.

Missing Data Handling Examples


To open the Examples.xlsx workbook, click Help – Examples on the Desktop
or AnalyticSolver.com ribbon, click Forecasting/Data Mining Examples, and
open the dataset, Examples.

This workbook contains six worksheets, each holding a small sample dataset.
The Example 1 dataset contains empty cells (cells B6 and D10), cells containing
invalid formulas (B13, C6, & C8), cells containing non-numeric characters
(D13), etc. Analytic Solver Data Mining will treat each of these as missing
values.

Open the Missing Data Handling dialog by clicking Data Mining – Transform –
Missing Data Handling. Confirm that Example 1 is displayed for Worksheet.

Click OK. The results of the data transformation are inserted into the Imputation
worksheet. Since no treatment was specified for any of the variables, none of
the missing or invalid values were replaced.

If “Overwrite Existing Worksheet” is selected in the Missing Data Handling
dialog, Analytic Solver will overwrite the existing data with the treatment option
specified. Note: You must save the workbook in order for these changes to be
saved.

The Example 2 dataset is similar to the Example 1 dataset in that this dataset
contains empty cells (cells B6 and D10), cells containing invalid formulas (B13,
C8 & D4), cells containing non-numeric characters (column C), etc. In this
example we will see how the missing values can be replaced by the variable
(column) Mean and Median.

To start, open the Missing Data Handling dialog. Confirm that Example 2 is
displayed for Worksheet.

Select Variable_1 in the Variables field then click the down arrow next to
Select Treatment in the section under How do you want to handle missing values
for the selected variable(s) and select Mean.

Click Apply to selected variable(s).

The next dialog shows “Mean” appearing under “Treatment” for Variable_1.

Now select Variable_3 in the Variables field and click the down arrow next to
Mean under How do you want to handle missing values for the selected
variable(s). Select Median, then click Apply to selected variable(s) and click
OK to transform the data.

See the newly inserted Imputation1 worksheet for the results, shown below.

In the Variable_1 column, invalid or missing values have been replaced with the
mean calculated from the remaining values in the column (12.34, 34, 44, -433,
43, 34, 6743, 3, 4 & 3). The cells containing missing or invalid values in the
Variable_3 column have been replaced by the median of the remaining values in
that column (12, 33, 44, 66, 33, 66, 22, 88, 55 & 79). The invalid data for
Variable_2 remains since no treatment was selected for this variable.

In the Example 3 dataset, Variable_3 has been replaced with date values.

Open the Missing Data Handling dialog. Confirm that Example 3 is displayed
for Worksheet. In this example, we will replace the missing / invalid values for
Variable_2 and Variable_3 with the mode of each column.

On the Missing Data Handling dialog, select Variable_2, click the down arrow
next to Select treatment under How do you want to handle missing values for the
selected variable(s), then select Mode. (The options Mean and Median do not
appear in the list since Variable_2 contains non-numeric values.) Click
Apply to selected variable(s). Repeat these steps for Variable_3. Then click
OK.

Results within the Imputation2 worksheet are shown below.

The missing values in the Variable_2 column have been replaced by the mode of
the valid values (dd) even though, in this instance, the data is non-numeric.
(Remember, the mode is the most frequently occurring value in the Variable_2
column.)

In the Variable_3 column, the third and ninth records contained missing values.
As you can see, they have been replaced by the mode for that column, 2-Feb-01.

The Example 4 dataset again contains missing and invalid data for all three
variables: missing data in cells B6 and D10 and Excel errors in cells B13, C6,
and C8. In this example, we will demonstrate Analytic Solver Data Mining’s
ability to replace missing values with User Specified Values.

Open the Missing Data Handling dialog. Confirm that Example 4 is displayed
for Worksheet.

Select Variable_1, then click the down arrow next to Select treatment under
How do you want to handle missing values for the selected variable(s), then
select User specified value. In the field that appears directly to the right of User
specified value, enter 100, then click Apply to selected variable(s). Repeat
these steps for Variable_2. Then click OK.

The results are shown below.

The missing values for Variable_1 (records 5 and 12) and for Variable_2
(records 5 and 7) have been replaced by 100, while the empty cells for
Variable_3 remain untouched.

In the Example 5 dataset, the value -999 appears in all three columns. This
example will illustrate Analytic Solver Data Mining’s ability to detect a given
value and replace that value with a user specified value.

Open the Missing Data Handling dialog. Confirm that Example 5 is displayed
for Worksheet or Data Source within the Data Source group.

Select Missing values are represented by this value and enter -999 in the field
that appears directly to the right of the option.

Select Variable_1 in the Variables field, click the down arrow next to Select
treatment and choose Mean from the menu, then click Apply to selected
variable(s).

Select Variable_2 in the Variables field, click the down arrow next to Mean and
choose User specified value from the menu. Enter “zzz” for the value then
click Apply to selected variable(s).

Finally, select Variable_3 in the Variables field, click the down arrow next to
User specified value and choose Mode from the menu. Click Apply to selected
variable(s).

Click OK to transform the data.

The results are shown below.

Note that in the Variable_1 column, the specified missing code (-999) was
replaced by the mean of the column (in record 12). In the Variable_2 column,
the missing values have been replaced by the user specified value of “zzz” in
records 5 and 7, and for Variable_3, by the mode of the column in record 9.
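The behavior in this example can be sketched in a few lines of Python (an illustration of the logic only, not Analytic Solver code; the column values are abbreviated):

```python
# Sketch of the "missing values are represented by this value" option:
# cells equal to the sentinel (-999 here) are first marked as missing,
# then the column's chosen treatment (Mean, in this case) is applied.
from statistics import mean

SENTINEL = -999

def mark_missing(values, sentinel=SENTINEL):
    return [None if v == sentinel else v for v in values]

col1 = mark_missing([12, 34, -999, 50])          # abbreviated Variable_1
valid = [v for v in col1 if v is not None]
filled = [mean(valid) if v is None else v for v in col1]
# the sentinel -999 has been replaced by the mean of the remaining values (32)
```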

Let’s take a look at one more dataset, Example 6, of Examples.xlsx.

Open the Missing Data Handling dialog, confirm that Example 6 is displayed
for Worksheet or Data Source within the Data Source group, then apply the
following procedures to the indicated columns.

A. Select Missing values are represented by this value and enter 33 in
the field that appears directly to the right of the option.
B. Select Variable_1, select Delete record for How do you want to
handle missing values for the selected variable(s)?, then click Apply to
selected variable(s).
C. Select Variable_2, select Mode for How do you want to handle
missing values for the selected variable(s)?, then click Apply to
selected variable(s).
D. Select Variable_3, select User specified value for How do you want to
handle missing values for the selected variable(s)?, enter 9999, then
click Apply to selected variable(s).

Click OK to transform the data.

See the output in the Imputation5 worksheet.

Records 7 and 12 have been deleted since Delete Record was chosen for the
treatment of missing values for Variable_1. In the Variable_2 column, the
missing values in records 2 and 11 have been replaced by the mode of the
column, "dd". (Remember, record 7 (which included #NAME for Variable_2)
was deleted.) It is important to note that "Delete record" holds priority over
any other instruction in the Missing Data Handling feature.

In the Variable_3 column, we instructed Analytic Solver Data Mining to treat 33
as a missing value. As a result, the value of “33” in records 1, 2, 3, 6 and 9 was
replaced by the user specified value of “9999”.

Note: The value for Variable_3 for record 12 was 33 which should have been
replaced by 9999. However, since Variable_1 contained a missing value for this
record, the instruction "Delete record" was executed first.
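The precedence described in this note can be sketched as follows (a hypothetical Python helper, not Analytic Solver code): records failing a Delete record column are dropped before any per-column replacement runs.

```python
# "Delete record" is applied first: a record with a missing value in any
# column treated with Delete record is dropped entirely, so its other
# cells are never imputed.
def apply_treatments(rows, delete_cols, fill):
    """rows: list of dicts (None marks a missing cell);
    delete_cols: columns whose treatment is Delete record;
    fill: {column: replacement value} for the remaining columns."""
    kept = [r for r in rows if all(r[c] is not None for c in delete_cols)]
    return [{c: (fill.get(c) if v is None else v) for c, v in r.items()}
            for r in kept]

# A record with Variable_1 missing is deleted even though its Variable_3
# cell would otherwise have been replaced by 9999.
rows = [{"v1": 1, "v3": None}, {"v1": None, "v3": 33}]
result = apply_treatments(rows, ["v1"], {"v3": 9999})
```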

Options for Missing Data Handling


The following options appear on the Missing Data Handling dialog.

Missing Values are represented by this value


If this option is selected, a value (either non-numeric or numeric) must be
provided in the field that appears directly to the right of the option. Analytic
Solver Data Mining will treat this value as “missing” and handle it per the
instructions applied in the Missing Data Handling dialog.
Note: Analytic Solver Data Mining treats empty and invalid cells as missing
values automatically.

Overwrite existing worksheet


If checked, Analytic Solver Data Mining overwrites the data set with the new
dataset in which all the missing values are appropriately treated.

Variable names in the first Row
When this option is selected, Analytic Solver Data Mining will list each variable
according to the first row in the selected data range. When the box is unchecked,
Analytic Solver Data Mining follows the default naming convention, i.e., the
variable in the first column of the selected range will be called "Var1", the
second column "Var2," etc.

Variables
Each variable and its selected treatment option are listed here.

How do you want to handle missing values for the selected variable(s)?
When a variable in the Variables field is selected, this option is enabled. Click
the down arrow to display the following options.
Delete record - If this option is selected, Analytic Solver Data Mining will
delete the entire record if a missing or invalid value is found for that variable.
Mode - All missing values in the column for the variable specified will be
replaced by the mode - the value occurring most frequently in the remainder of
the column.
Mean - All missing values in the column for the variable specified will be
replaced by the mean - the average of the values in the remainder of the column.
Median - All missing values in the column for the variable specified will be
replaced by the median - the number that would appear in the middle of the
remaining column values if all values were written in ascending order.
User specified value – If selected, a value must be entered in the field that
appears directly to the right of this menu. Analytic Solver Data Mining will
replace all missing / invalid values with this specified value.
No treatment - If this option is selected, no treatment will be applied to the
missing / invalid values for the selected variable.
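The per-variable treatments above can be sketched in Python (a hypothetical helper for illustration, not Analytic Solver code; Delete record is omitted because it operates on whole records rather than single columns):

```python
from statistics import mean, median, mode

def impute_column(values, treatment, user_value=None):
    """Return a copy of `values` with missing entries (None) handled
    according to the selected treatment."""
    valid = [v for v in values if v is not None]
    if treatment == "No treatment":
        return list(values)
    if treatment == "Mean":
        fill = mean(valid)
    elif treatment == "Median":
        fill = median(valid)
    elif treatment == "Mode":
        fill = mode(valid)            # most frequent remaining value
    elif treatment == "User specified value":
        fill = user_value
    else:
        raise ValueError(treatment)
    return [fill if v is None else v for v in values]
```

For example, impute_column(["dd", None, "aa", "dd"], "Mode") fills the gap with "dd", mirroring the Example 3 result.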

Apply to selected variable(s)


Clicking this command button will apply the treatment option to the selected
variable.

Reset
Resets treatment for all variables listed in the Variables field. Also, deselects
the Overwrite Existing Worksheet option if selected.

OK
Click to run the Missing Data Handling feature of Analytic Solver Data Mining.

Transform Continuous Data

Introduction
Analytic Solver Data Mining contains two techniques for transforming
continuous data: Binning and Rescaling.

Bin Continuous Data


Binning is the process of grouping measured data into classes, which can reduce
the effect of minor errors in the dataset and lead to better understanding and
visualization. For example, consider exact ages versus the categories "child",
"adult", and "elderly". In most analyses, the three categories suffice and are
easier to visualize than exact ages. In Analytic Solver Data Mining, the user
decides what values the binned variable should take.
A variable can be binned in the following ways.
Equal count: When using this option, the data is binned in such a way that each
bin contains the same number of records. When this option is selected, the
options Rank of the bin, Mean of the bin, and Median of the bin are enabled.
Rank of the bin: In this option each value in the variable is assigned a rank
according to the start and interval values as specified by the user.
Mean of the bin: The mean is calculated as the average of the values lying in
the bin interval. This mean value is assigned to each record that lies in that
interval.
Median of the bin: Records with the same binning value are counted and the
median is calculated on the input value. The median value is then assigned to the
binned variable.
Equal Interval: Equal interval is based on bin size. When this method is
selected, the whole range is divided into bins with bin sizes specified by the
user. The options of Rank and Mid value are available with this method.
Rank of the bin: In this option each value in the variable is assigned a rank
according to the start and increment value. Users can specify the starting and
increment value.
Mid value: The midpoint of the bin interval is assigned to each value of the
variable that lies in that interval.
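The two binning strategies can be sketched in Python (an illustrative approximation of the behavior described above, not Analytic Solver's implementation; border-value and tie handling may differ):

```python
def equal_count_bins(values, n_bins, start, interval):
    """Equal count, Rank of the bin: sort the values, split them into
    n_bins groups of (roughly) equal size, and label each group with
    start, start + interval, start + 2*interval, ..."""
    order = sorted(values)
    size = len(values) / n_bins
    rank_of = {}
    for pos, v in enumerate(order):
        rank_of.setdefault(v, start + interval * min(int(pos // size), n_bins - 1))
    return [rank_of[v] for v in values]

def equal_interval_bins(values, n_bins):
    """Equal interval, Mid value: split the full range into n_bins
    equal-width intervals and report each interval's midpoint."""
    lo, width = min(values), (max(values) - min(values)) / n_bins
    mids = []
    for v in values:
        i = min(int((v - lo) // width), n_bins - 1)   # clamp the maximum value
        mids.append(lo + width * i + width / 2)
    return mids
```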

Rescale Continuous Data


The Rescaling utility was introduced in Analytic Solver Data Mining V2017.
Use this utility to normalize one or more features in your data. Many data
mining workflows include feature scaling/normalization during the data
preprocessing stage. Along with this general-purpose facility, you can access
rescaling functionality directly from the dialogs for the Supervised Algorithms
available in the Analytic Solver Data Mining application.

Analytic Solver Data Mining provides the following methods for feature
scaling: Standardization, Normalization, Adjusted Normalization and Unit
Norm.
• Standardization makes the feature values have zero mean and unit
variance. (x−mean)/std.dev.
• Normalization scales the data values to the [0,1]
range. (x−min)/(max−min)
The Correction option specifies a small positive number ε that is
applied as a correction to the formula. The corrected formula is widely
used in Neural Networks when Logistic Sigmoid function is used to
activate the neurons in hidden layers – it ensures that the data values
never reach the asymptotic limits of the activation function. The
corrected formula is [x−(min−ε)]/[(max+ε)−(min−ε)].
• Adjusted Normalization scales the data values to the [-1,1] range.
[2(x−min)/(max−min)]−1
The Correction option specifies a small positive number ε that is
applied as a correction to the formula. The corrected formula is widely
used in Neural Networks when Hyperbolic Tangent function is used to
activate the neurons in hidden layers – it ensures that the data values
never reach the asymptotic limits of the activation function. The
corrected formula is {2[(x−(min−ε))/((max+ε)−(min−ε))]}−1.
• Unit Normalization is another frequently used method to scale the
data such that the feature vector has a unit length. This usually means
dividing each value by the Euclidean length (L2-norm) of the vector. In
some applications, it can be more practical to use the Manhattan
Distance (L1-norm).
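The four formulas can be written out directly (an illustrative Python sketch; eps corresponds to the Correction option and defaults to 0):

```python
def standardize(x, mean, sd):
    """(x - mean) / std.dev -> zero mean, unit variance."""
    return (x - mean) / sd

def normalize(x, lo, hi, eps=0.0):
    """[x - (min - eps)] / [(max + eps) - (min - eps)] -> [0, 1] range."""
    return (x - (lo - eps)) / ((hi + eps) - (lo - eps))

def adjusted_normalize(x, lo, hi, eps=0.0):
    """2 * normalize(x) - 1 -> [-1, 1] range."""
    return 2 * normalize(x, lo, hi, eps) - 1

def unit_norm(vector, p=2):
    """Scale a feature vector to unit length; p=2 uses the Euclidean
    (L2) norm, p=1 the Manhattan (L1) norm."""
    length = sum(abs(v) ** p for v in vector) ** (1 / p)
    return [v / length for v in vector]
```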

Examples for Binning Continuous Data


The next four examples illustrate usage of the binning utility included within
Analytic Solver Data Mining. These examples all use the dataset within
Binning_Example.xlsx. Open Binning_Example.xlsx by clicking Help –
Examples, then Forecasting/Data Mining Examples –
Binning_Example.xlsx. A portion of the dataset is shown below.

Click Transform -- Transform Continuous Data – Bin on the desktop Data
Mining or AnalyticSolver.com ribbon or Transform – Bin Continuous Data
on the Cloud app Ribbon, to open the Bin Continuous Data dialog.
Select x3 in the Variables field. The options are immediately activated. Enter 5
for #bins for variable. Under Value in the binned variable is, enter 10 for Start
and 3 for Interval, then click Apply to selected variable. The variable, x3, will
appear in the field labeled, Binned Variable Name

Now click Finish. Analytic Solver Data Mining reports the binning intervals on
the Bin_Output worksheet. This report (pictured below) displays the number of
intervals, or bins, created, along with the lower value, upper value, and number
of records assigned to each bin. As specified, 5 bins, or intervals, were created.

Click Bin_Transform to see the records assigned to each bin.

As specified, 5 bins were created for the Binned_x3 variable starting with a rank
of 10 and an interval of 3: 10, 13 (10 + 3), 16 (13 + 3), 19 (16 + 3), and 22 (19
+ 3). The first four smallest values (96, 104, 111, 113 in records 14, 19, 6, and
3, respectively) have been assigned to Bin 10. The next four values in ascending
order (136, 148, 150, 151 in records 17, 20, 18, and 1, respectively) have been
assigned to Bin 13. The next four values in ascending order (164, 168, 168, 173
in records 15, 9, 4, and 11, respectively) have been assigned to Bin 16. The next
five values in ascending order (174, 175, 178, 192, 197 in records 22, 7, 12, 5,
and 10, respectively) have been assigned to Bin 19 and the last five values (199,
202, 204, 245, 252 in records 13, 2, 21, 8, and 16, respectively) have been
assigned to Bin 22.
Though Binning Type is set to Equal Count, the number of records in each
interval may not be exactly the same. Factors such as border values, the total
number of records, etc. influence the number of records assigned to each bin.
The next example bins the value of the variable to the mean of the bin rather
than the rank of the bin.
Click back to Sheet1 and open the Bin Continuous Data dialog. Select variable
“x3”, then select Mean of the bin, rather than Rank of the bin, for Value in the
binned variable is. Again enter 5 for #bins for variable. Click Apply to
selected variable then click Finish.

The Bin_Output1 worksheet displays the number of bins created, the minimum
and maximum values and the number of records assigned to each bin.

Click the Bin_Transform1 output sheet.

In the output, the Binned_x3 variable is equal to the mean of all the x3 variables
assigned to that bin. Let’s take the first record for an example. Recall, from the
previous example, the values from Bin 13: 136, 148, 150, 151. The mean of
these values is 146.25 ((136 + 148 + 150 + 151) / 4) which is the value for the
Binned_x3 variable for the first record.
Similarly, if we were to select the Median of the bin option, the Binned_x3
variable would equal the median of all x3 variables assigned to each bin.
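The mean-of-the-bin calculation quoted above can be checked directly:

```python
from statistics import mean

bin_13 = [136, 148, 150, 151]       # x3 values assigned to Bin 13
binned_x3 = mean(bin_13)            # (136 + 148 + 150 + 151) / 4 = 146.25
```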
The next example explores the Equal interval option.
Click back to Sheet1 and open the Bin Continuous Data dialog. Select x3 in the
Variables field, enter 4 for #bins for variable, select Equal interval under Bins
to be made with, enter 12 for Start and 3 for Interval under Value in the binned
variable is, then click Apply to selected variable.

Click Finish. The Bin_Output2 output sheet displays the number of bins
created and the number of records assigned to each bin.

Analytic Solver Data Mining calculates the interval roughly as the (Maximum
value for the x3 variable - Minimum value for the x3 variable) / #bins specified
by the user - or in this instance (252 – 96) / 4 which equals 39. This means that
the bins will be assigned x3 variables in accordance to the following rules.
Bin 12: Values 96 - < 135
Bin 15: Values 135 - < 174
Bin 18: Values 174 - < 213
Bin 21: Values 213 - < 252
In the first record, x3 has a value of 151. As a result, this record has been
assigned to Bin 15, since 151 lies in the interval of Bin 15.
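The bin boundaries and rank labels quoted above follow directly from the interval calculation:

```python
# Equal interval: width = (max - min) / #bins = (252 - 96) / 4 = 39
width = (252 - 96) / 4
bounds = [(96 + width * i, 96 + width * (i + 1)) for i in range(4)]
ranks = [12 + 3 * i for i in range(4)]     # Start 12, Interval 3
# bounds -> (96, 135), (135, 174), (174, 213), (213, 252)
# ranks  -> 12, 15, 18, 21
```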
Click back to Sheet1 and open the Bin Continuous Data dialog. Select x3 in the
Variables field, enter 4 for #bins for variable, select Equal interval under Bins
to be made with, select Mid Value for Value in the binned variable is, then click
Apply to selected variable.

Then click Finish.

The Bin_Output3 worksheet displays the 4 intervals and the number of records,
along with the range of values assigned to each bin: Bin 1 (96 to 135), Bin 2
(135 – 174), Bin 3 (174 – 213), and Bin 4 (213 – 252).

The output sheet, Bin_Transform3, shows us which records have been assigned
to each of the 4 bins. The value of the binned variable is the midpoint of each
interval: 115.5 for Bin 1, 154.5 for Bin 2, 193.5 for Bin 3 and 232.5 for Bin 4.
In the first record, x3’s value is 151. Since this value lies in the interval for
Bin 2 (135 – 174), the mid value of this interval is reported for the Binned_x3
variable, 154.5. In the last record, x3’s value is 174. Since this value lies in
the interval for Bin 3 (174 – 213), the mid value of this interval is reported for
the Binned_x3 variable, 193.5.

Options for Binning Continuous Data


The following options appear on the Bin Continuous Data dialog.

Variable names in the first row
If this option is selected, the list of variables in the Variables field will be listed
according to titles appearing in the first row of the dataset.

Binned Variable Name


The variable appearing here will be binned.

Show binning intervals in the output


Select this option to include the binning intervals in the output report.

Name of binned variable


The name displayed here will appear for the binned variable in the output report.

#bins for variable


Enter the number of desired bins here.

Equal count
When this option is selected, the data is binned in such a way that each bin
contains the same number of records. Note: There is a possibility that the
number of records in a bin may not be equal due to factors such as border
values, the number of records being divisible by the number of bins, etc. The
options for Value of the binned variable for this process are Rank, Mean, and
Median. See below for explanations of each.

Equal interval
When this option is selected, the binning procedure will assign records to bins if
the record’s value falls in the interval of the bin. Bin intervals are calculated by
roughly subtracting the Minimum variable value from the Maximum variable
value and dividing by the number of bins ((Max Value – Min Value) / # bins).
The options for Value of the binned variable for this process are Rank and Mid
value. See below for explanations of each.

Rank of the bin


When either the Equal count or the Equal interval option is selected, Rank of the
bin is enabled. When selected, the user has the option to specify the Start value
of the first bin and the Interval of each bin. Subsequent bin values will be
calculated as the previous bin value plus the interval value.

Mean of the bin


When the Equal count option is selected, Mean of the bin is enabled. Analytic
Solver Data Mining calculates the mean of all values in the bin and assigns that
value to the binned variable.

Median of the bin


When the Equal count option is selected, Median of the bin is enabled. Analytic
Solver Data Mining finds the median of all values in the bin and assigns that
value to the binned variable.

Mid Value
When the Equal Interval option is selected, this option is enabled. The mid
value of the interval will be displayed on the output report for the assigned bin.

Apply to Selected Variable


Click this command button to apply the selected options to the selected variable.

Examples for Rescaling Continuous Data


The next example illustrates how to use the rescaling utility included within
Analytic Solver Data Mining. This example uses the Utilities.xlsx example
dataset. Open Utilities.xlsx by clicking, Help – Examples, then
Forecasting/Data Mining Examples.

Click Transform -- Transform Continuous Data – Rescale on the desktop
Data Mining or AnalyticSolver.com ribbon or Transform - Rescale
Continuous Data on the Cloud app Ribbon, to open the Rescaler dialog.
Select x1, x2, x3, x4, x5, x6, x7 and x8 in the Variables field, then click > to add
them as Selected Variables.

Click Next to advance to the Parameters dialog.


Click Partition Data to open the Partition Data dialog, then select the Partition
Data option to enable the partition options.

Click Done to accept the random partition defaults. For more information on
partition, see the Random Data Partitioning chapter that occurs later in this
guide.
Under Rescaling: Fitting, select Adjusted Normalization. Leave the Correction
option set to the default of 0.01. Select Show Fitted Statistics to include the
fitted statistics in the output.

Click Next to advance to the Transformation dialog. Leave Training and
Validation selected under Partition Data (the defaults) to rescale both partitions.

Click Finish. Four output sheets are inserted to the right of the Data tab:
Rescaling, Rescaling_TrainingTransform, Rescaling_ValidationTransform and
Rescaling_Stored. The Output Navigator appears at the top of each of these
sheets. (See the Scoring New Data chapter for information on how to score
new data using Rescaling_Stored.)

Click the Fitted Statistics link to navigate to the Fitted Statistics table located on
the Rescaling output sheet. Shift and Scale values are inferred from the training
data; each formula below can be rearranged into the form (x − shift)/scale.
Other partitions and new data are then rescaled using the statistics of the data
features in the training set.
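The (x − shift)/scale decomposition can be illustrated with a small sketch (the algebra behind the table, not Analytic Solver's exact Fitted Statistics output; Unit Norm is omitted since its scale is a per-vector length rather than a per-feature statistic):

```python
def fitted_stats(method, *, mean=None, sd=None, lo=None, hi=None):
    """Return the (shift, scale) pair such that the chosen rescaling
    formula equals (x - shift) / scale."""
    if method == "Standardization":          # (x - mean) / sd
        return mean, sd
    if method == "Normalization":            # (x - min) / (max - min)
        return lo, hi - lo
    if method == "Adjusted Normalization":
        # [2(x - min)/(max - min)] - 1 == (x - (min + max)/2) / ((max - min)/2)
        return (lo + hi) / 2, (hi - lo) / 2
    raise ValueError(method)
```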

Click the Transformed: Training link on the Output Navigator to display the
rescaled variable values for the Training partition.

Note: Unselected variables are appended to the rescaled variables in the
Transformed: Training and Transformed: Validation data tables to maintain the
complete input data.

Click the Transformed: Validation link on the Output Navigator to display the
rescaled variable values for the Validation partition.

Rescaling Options
See below for explanations of options included on each of the three Rescaler
dialogs.

See the section, Common Dialog Options, for information pertaining to each
option included on the Rescaler – Data dialog.

See below for explanations of options included on the Rescaler – Parameters
dialog.

Partition Data
If you haven't already partitioned your dataset, you can do so from within the
Rescaler method by selecting Partition Data on the Parameters tab. If this option
is selected, Analytic Solver Data Mining will partition your dataset (according to
the partition options you set) immediately before running the Rescaler. If
partitioning has already occurred on the dataset, this option will be disabled.
For more information on partitioning, please see the Data Mining Partitioning
chapter.
To specify the partitioning options, click the Partition Data command button to
open the Partitioning dialog, then select Partition Data to enable the Partitioning
options. See the chapter, Data Mining Partitioning, for more information on the
options included in this dialog.

Rescaling: Fitting
Use Rescaling to normalize one or more features in your data. Many data
mining workflows include feature scaling/normalization during the data
preprocessing stage. Along with this general-purpose facility, you can access
rescaling functionality directly from the dialogs for the Supervised Algorithms
available in the Analytic Solver Data Mining application.
Analytic Solver Data Mining provides the following methods for feature
scaling: Standardization, Normalization, Adjusted Normalization and Unit
Norm.
• Standardization makes the feature values have zero mean and unit
variance. (x−mean)/std.dev.
• Normalization scales the data values to the [0,1]
range. (x−min)/(max−min)
The Correction option specifies a small positive number ε that is
applied as a correction to the formula. The corrected formula is widely
used in Neural Networks when Logistic Sigmoid function is used to
activate the neurons in hidden layers – it ensures that the data values
never reach the asymptotic limits of the activation function. The
corrected formula is [x−(min−ε)]/[(max+ε)−(min−ε)].
• Adjusted Normalization scales the data values to the [-1,1] range.
[2(x−min)/(max−min)]−1
The Correction option specifies a small positive number ε that is
applied as a correction to the formula. The corrected formula is widely
used in Neural Networks when Hyperbolic Tangent function is used to
activate the neurons in hidden layers – it ensures that the data values
never reach the asymptotic limits of the activation function. The
corrected formula is {2[(x−(min−ε))/((max+ε)−(min−ε))]}−1.
• Unit Normalization is another frequently used method to scale the
data such that the feature vector has a unit length. This usually means
dividing each value by the Euclidean length (L2-norm) of the vector. In
some applications, it can be more practical to use the Manhattan
Distance (L1-norm).

Show Fitted Statistics
Select Fitted Statistics to include them in the Rescaler output. Shift and Scale
values are inferred from the training data; each rescaling formula can be
rearranged into the form (x - shift)/scale. Other partitions and new data are then
rescaled using the statistics of the data features in the training set.
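As a sketch of the fitted shift/scale idea: for standardization, shift is the training mean and scale is the training standard deviation, and those same fitted values are reused for any other partition (sample data invented for illustration):

```python
import numpy as np

train = np.array([1.0, 3.0, 5.0, 7.0])       # training partition
validation = np.array([2.0, 6.0])            # another partition

# fit shift and scale on the training partition only
shift, scale = train.mean(), train.std()

# every rescaling method reduces to the same (x - shift) / scale form
train_rescaled = (train - shift) / scale
validation_rescaled = (validation - shift) / scale  # reuses training statistics
```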

See below for explanations of options located on the Rescaler – Transformation
dialog.

Partitioned Data
Select Training to apply the Rescaler method to the Training Partition.
Select Validation to apply the Rescaler method to the Validation Partition, if one
exists.
Select Testing to apply the Rescaler method to the Test Partition, if one exists.

New Data
See the Scoring New Data chapter in the Data Mining User Guide for more
information on scoring new data within a worksheet or database.

Transforming Categorical Data

Introduction
Analysts often deal with data that is not numeric. Non-numeric data values can
be alphanumeric (a mix of text and numbers) or numeric values with no numerical
significance (such as a postal code). Such variables are called 'categorical'
variables, where every unique value of the variable is a separate 'category'.
Categorical variables can be nominal or ordinal. Nominal variable values have
no order, for example, True/False or Male/Female. Values for an ordinal
variable have a clear order but no fixed unit of measurement, e.g., Kindergarten,
First, Second, Third, Fourth, and Fifth, or a size chart of 1, 2, 3, 4, 5.
Dealing with categorical data poses some limitations. For example, if your data
contains a multitude of categories, you might want to combine several categories
into one or perhaps you may want to use a data mining technique that does not
directly handle untransformed categorical variables.
Analytic Solver Data Mining provides options to transform data in the following
ways:
1. By Creating Dummy Variables: When this feature is used, a non-numeric
variable (column) is transformed into several new binary
variables (columns).
Imagine a variable called Language which has data values English, French,
German and Spanish. Running this transformation will result in the creation
of four new variables: Language_English, Language_French,
Language_German, and Language_Spanish. Each of these variables will
take on values of either 0 or 1 depending on the value of the Language
variable in the record. For instance, if in a particular record Language =
German, then among the dummy variables created, Language_German
will be 1 while the other Language_XXX variables will be set to zero.
2. Create Category Scores: In this feature, a string variable is converted into
a new numeric, categorical variable.
3. Reduce Categories: This utility helps you create a new categorical
variable that reduces the number of categories. You can reduce the number
of categories “by frequency” or “manually”.
There are two different options to choose from.
A. Option 1 (By Frequency) assigns categories 1 through n - 1 to
the n - 1 most frequently occurring categories, and assigns
category n to all remaining categories.
B. Option 2 (Manually) maps one or more distinct category values
in the original column to a new category number between 1 and
n, where n is the number of distinct values in the variable.
Note: See the Analytic Solver User Guide for data limitations in Analytic
Solver Comprehensive/Data Mining and Basic.
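The three transformations can be approximated in pandas as follows (a generic sketch with an invented Language column, not Analytic Solver's own code):

```python
import pandas as pd

df = pd.DataFrame({"Language": ["English", "French", "German", "French", "Spanish"]})

# 1. Create Dummy Variables: one 0/1 column per category
dummies = pd.get_dummies(df["Language"], prefix="Language")

# 2. Create Category Scores: map each category (sorted alphabetically)
#    to an integer code, starting from 1
df["Factorized_Language"] = df["Language"].astype("category").cat.codes + 1

# 3. Reduce Categories "by frequency": assign 1..n-1 to the n-1 most
#    frequent categories and lump everything else into category n
n = 3
top = df["Language"].value_counts().index[: n - 1]
rank = {cat: i + 1 for i, cat in enumerate(top)}
df["Reduced_Language"] = df["Language"].map(rank).fillna(n).astype(int)
```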

Transforming Categorical Data Examples
Four examples are presented in this section. The first example replaces one
categorical variable with three binary variables, the second example replaces the
same categorical variable with one ordinal variable, and the last two examples
illustrate how to use the Reduce Categories tool. The first two examples use the
dataset contained within IrisFacto.xlsx; the last two use the Iris.xlsx dataset.
(IrisFacto.xlsx is derived from the well-known dataset, Iris.xlsx.) Both of these
datasets may be found in Help – Examples – Forecasting/Data Mining.

Example 1
Click the Data worksheet and then Transform -- Transform Categorical Data
-- Create Dummies in Desktop or AnalyticSolver.com or Transform –
Categorical Data – Create Dummies in the Cloud app to bring up the Create
Dummies dialog.
Select Species_name in the Variables field and then click > to move the variable
to the Variables to be factored field. Note that Species_name is a string variable.

Click OK and view the output, Encoding, which is inserted on the Model tab of
the Analytic Solver Task Pane under Data Mining – Transformations – Create
Dummies.

As shown in the output above, the variable, Species_name, is expressed as three
binary dummy variables: Species_name_Setosa, Species_name_Verginica and
Species_name_Versicolor. These new dummy variables are assigned values of
either 1, to indicate that the record belongs, or 0, to indicate that the record does
not belong. For example, Species_name_Setosa is assigned a value of 1 only
when the value of Species_name="Setosa" is in the dataset. Otherwise,
Species_name_Setosa = 0. The same is true for the two remaining dummy
variables i.e. Species_name_Verginica and Species_name_Versicolor.
Analytic Solver Data Mining converted the string variable into three categorical
variables which resulted in a completely numeric dataset.

Example 2
Click back to the Data worksheet and then Transform -- Transform
Categorical Data -- Create Category Scores in Desktop or
AnalyticSolver.com or Transform – Categorical Data – Create Category
Scores in the Cloud app to bring up the Create Category Scores dialog.
Select Species_name in the Variables field and click > to move the variable to
the Variables to be factored field. Keep the default option of Assign numbers
1,2,3....

Click OK. Expand Data Mining – Transformations – Create Category Scores
to view the results contained within Factorization.

Analytic Solver Data Mining has sorted the values of the Species_name variable
alphabetically and then assigned values of 1, 2 or 3 to each record depending on
the species type. (Starting from 1 because we selected Assign numbers 1,2,3....

To have Analytic Solver Data Mining start from 0, select the option Assign
numbers 0, 1, 2,… on the Create Category Scores dialog.) A variable,
Factorized_Species_name is created to store these assigned numbers. Analytic
Solver Data Mining has converted this dataset to an entirely numeric dataset.

Example 3
Click back to the Data worksheet and then Transform -- Transform
Categorical Data – Reduce Categories in Desktop or AnalyticSolver.com or
Transform – Categorical Data – Reduce Categories in the Cloud app to bring
up the Reduce Categories dialog.
Select Petal_length as the variable…

…then select the Manually radio button under Assign Category heading.

All unique values of the Petal_length variable are now listed. Select all
categories with values less than 2; click the down arrow next to Category and
select 1, then click Apply.
Repeat these steps for categories with values from 3 to 3.9 and apply a Category
number of 2. Continue repeating these steps until values ranging from 4 thru 4.9
are assigned a category number = 3, values ranging from 5 thru 5.9 are assigned
a category number = 4, and values ranging from 6 thru 6.9 are assigned a
category = 5.
If using Analytic Solver Comprehensive or Analytic Solver Data Mining, the
maximum number of categories will be equal to the number of unique values for
the selected variable. In this instance the Petal_length variable contains 43
unique values.
If By Frequency is selected, Analytic Solver Data Mining assigns category
numbers 1 through 29 to the 29 most frequent unique values, and category
number 30 to all other unique values.

Click OK. The output, Category_Reduction, is inserted into the Analytic Solver
task pane under Data Mining – Transformations – Reduce Categories.

In the output, Analytic Solver Data Mining has assigned new categories as
shown in the column, Reduced-Petal_Length, based on the choices made in the
Reduce Categories dialog.

Example 4
Click back to the Data worksheet and once more open the Reduce Categories
dialog. This time, select Petal_width in the Category variable field, leave the
default setting of By Frequency, enter 12 for Limit number of categories to, click
Apply, then click OK.

The output, Category_Reduction1, will be inserted into the task pane under Data
Mining – Transformations – Reduce Categories.

There are 22 unique values for Petal_width and Analytic Solver Data Mining
has classified the Petal_width variable using 12 different categories. The most
frequently appearing value is 0.2 (with 29 instances) which has been assigned to
category 1. The second most frequently appearing value is 1.3 (with 13
instances) which has been assigned to category 2. See the chart below for all
category assignments.
Value                   Number of Instances   Assigned Category
0.2                     29                    1
1.3                     13                    2
1.8                     12                    3
1.5                     12                    4
2.3                     8                     5
1.4                     8                     6
0.4                     7                     7
0.3                     7                     8
1.0                     7                     9
2.1                     6                     10
2.0                     6                     11
All Remaining Values    35                    12
Incrementally increasing category numbers are assigned to each value as the
number of instances decreases, until the 11th category is assigned. All remaining
values are then lumped into the last category, 12.

Options for Transforming Categorical Data


Explanations for options that appear on one of the three Transform Categorical
Data dialogs appear below.

Data Range
Either type the cell address directly into this field or, using the reference button,
select the required data range from the worksheet or data set. If the cell pointer
(active cell) is already somewhere in the data range, Analytic Solver Data
Mining automatically picks up the contiguous data range surrounding the active
cell. When the data range is selected, Analytic Solver Data Mining displays the
number of records in the selected range.

First row contains headers


When this box is checked, Analytic Solver Data Mining lists the variables
according to the first row of the selected data range. When the box is unchecked,
Analytic Solver Data Mining follows the default naming convention, i.e., the
variable in the first column of the selected range will be called "Var1", the
second column "Var2," etc.

Variables
This list box contains the names of the variables in the selected data range. To
select a variable, simply click to highlight, then click the > button. Use the
CTRL or SHIFT keys to select multiple variables.

Variables to be factored
This list box contains the names of the input variables or the variables that will
be replaced with dummy variables. To remove a variable, simply click to
highlight, then click the < button. Use the CTRL or SHIFT keys to select
multiple variables.

Assign Numbers Options


The user can specify whether categorization numbering starts at 0 or 1. Select
the appropriate option.

Category variable
Click the down arrow to select the desired variable for category reduction.

Assign Category
If By frequency is selected, incrementally increasing category numbers will be
assigned to each category as the number of instances decreases, until category
n - 1 is assigned. All remaining values will then be lumped into the last
category, n. If this option is selected, the Limit number of categories to option
will be enabled.
If Manually is selected, Analytic Solver Data Mining allows you to assign a
specific category number to single or multiple categories using the Assign
Category ID dropdown menu. If this option is selected, the Category option
will be enabled.

Limit number of categories to


If By frequency is selected, Limit number of categories to is enabled. Enter a
value from 1 to n-1 where n is the maximum number of unique values contained
in the variable. Click Apply to apply this mapping, or Reset to start over.

Assign Category ID
If Manually is selected, Assign Category ID is enabled. Click the down arrow to
select the Category number to assign to each unique value for the variable. This
list will contain values from 1 to n where n is the maximum number of distinct

values contained in the variable. Click Apply to apply this mapping, or Reset to
start over.

Apply
Click the Apply command button to assign the specified category to the selected
variable.

Reset
Click the Reset command button to reset all categories in the variable to
unassigned.

Principal Components Analysis

Introduction
In the data mining field, databases with large numbers of variables are routinely
encountered. In most cases, the size of the database can be reduced by removing
highly correlated or superfluous variables. The accuracy and reliability of a
classification or regression model produced from the resulting database will be
improved by the removal of these redundant and unnecessary variables. In
addition, superfluous variables increase the data-collection and data-processing
costs of deploying a model on a large database. As a result, one of the first steps
in data mining should be finding ways to reduce the number of independent or
input variables used in the model (otherwise known as its dimensionality) without
sacrificing accuracy.
Dimensionality Reduction is the process of reducing the number of variables to
be used as input in a regression or classification model. This domain can be
divided into two branches, feature selection and feature extraction. Feature
selection attempts to discover a subset of the original variables, while feature
extraction attempts to map a high-dimensional model to a lower-dimensional
space. In the past, Analytic Solver (previously referred to as XLMiner) only
contained a feature extraction tool, Principal Components Analysis (Transform –
Principal Components). However, in V2015, a new feature selection tool was
introduced, Feature Selection. This chapter explains Analytic Solver Data
Mining’s Principal Components Analysis functionality. For more information
on Analytic Solver Data Mining’s Feature Selection tool, please see the previous
chapter, “Feature Selection”.
Principal component analysis (PCA) is a mathematical procedure that
transforms a number of (possibly) correlated variables into a smaller number of
uncorrelated variables called principal components. The objective of principal
component analysis is to reduce the dimensionality (number of variables) of the
dataset but retain as much of the original variability in the data as possible. The
first principal component accounts for as much of the variability in the data as
possible, the second principal component accounts for as much of the remaining
variability as possible, and so on.
A principal component analysis is concerned with explaining the variance-
covariance structure of a high-dimensional random vector through a few linear
combinations of the original component variables. Consider a data matrix X with
m rows and n columns; for illustration, take m = 4 and n = 3 (X is 4x3):
X11 X12 X13
X21 X22 X23
X31 X32 X33
X41 X42 X43
1. The first step in reducing the number of columns (variables) in the X matrix
using the Principal Components Analysis algorithm is to find the mean of
each column.
(X11 + X21 + X31 + X41)/4 = Mu1

(X12 + X22 + X32 + X42)/4 = Mu2
(X13 + X23 + X33 + X43)/4 = Mu3
2. Next, the algorithm subtracts the corresponding column mean (Mu) from
each element, thereby obtaining a new matrix, Ẍ, which also contains 4 rows
and 3 columns.
X11 – Mu1 = Ẍ11 X12 – Mu2 = Ẍ12 X13 – Mu3 = Ẍ13
X21 – Mu1 = Ẍ21 X22 – Mu2 = Ẍ22 X23 – Mu3 = Ẍ23
X31 – Mu1 = Ẍ31 X32 – Mu2 = Ẍ32 X33 – Mu3 = Ẍ33
X41 – Mu1 = Ẍ41 X42 – Mu2 = Ẍ42 X43 – Mu3 = Ẍ43
3. Next, the PCA algorithm calculates the covariance or correlation matrix
(depending on the user’s preference) of the new Ẍ matrix.
4. Afterwards, the algorithm calculates the eigenvalues and eigenvectors of
the covariance (or correlation) matrix and lists these eigenvalues in order
from largest to smallest.
Larger eigenvalues denote components that capture more of the variance and
should remain in the model; components with smaller eigenvalues will be
removed according to the user's preference.
5. Analytic Solver Data Mining allows users to choose between selecting a
fixed number of components to be included in the “reduced” matrix (we will
refer to this new matrix as the Y matrix) or the smallest subset of
components that “explains” or accounts for a certain percentage of the
variance in the database. Components with eigenvalues below the chosen
threshold will not be included in the Y matrix. Assume that the user has
chosen a fixed number of components (2) to be included in the Y matrix.
6. A new matrix V (containing eigenvectors based on the selected
eigenvalues) is formed.
7. The mean-centered matrix Ẍ, which has 4 rows and 3 columns, is multiplied
by the V matrix, containing 3 rows and 2 columns. This matrix multiplication
results in the new reduced Y matrix, containing 4 rows and 2 columns.
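The seven steps above can be sketched with NumPy's eigendecomposition (an illustrative sketch under the fixed-components choice, not Analytic Solver's implementation; the sample matrix is invented):

```python
import numpy as np

X = np.array([[2.0, 0.0, 1.0],
              [4.0, 2.0, 3.0],
              [6.0, 4.0, 5.0],
              [8.0, 6.0, 7.0]])                # m = 4 rows, n = 3 columns

Xc = X - X.mean(axis=0)                        # steps 1-2: subtract column means
C = np.cov(Xc, rowvar=False)                   # step 3: covariance matrix (n x n)
eigvals, eigvecs = np.linalg.eigh(C)           # step 4: eigen-decomposition
order = np.argsort(eigvals)[::-1]              # sort largest to smallest
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2                                          # step 5: fixed # components
V = eigvecs[:, :k]                             # step 6: n x k eigenvector matrix
Y = Xc @ V                                     # step 7: reduced m x k score matrix

explained = eigvals.cumsum() / eigvals.sum()   # cumulative fraction of variance
```

The `explained` vector corresponds to the Smallest # components explaining option: the number of components kept is the first index where the cumulative fraction crosses the chosen threshold.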
In algebraic form, consider a p-dimensional random vector X = (X1, X2, ..., Xp).
The k principal components of X are the univariate random variables Y1, Y2, ...,
Yk, which are defined by the following formulae:

Y1 = l1'X, Y2 = l2'X, ..., Yk = lk'X

where the coefficient vectors l1, l2, ..., lk are chosen such that they satisfy the
following conditions:
First Principal Component = Linear combination l1'X that maximizes Var(l1'X)
and || l1 || =1
Second Principal Component = Linear combination l2'X that maximizes
Var(l2'X) and || l2 || =1
and Cov(l1'X , l2'X) =0

jth Principal Component = Linear combination lj'X that maximizes Var(lj'X) and
|| lj || =1
and Cov(lk'X , lj'X) =0 for all k < j
These functions indicate that the principal components are those linear
combinations of the original variables which maximize the variance of the linear
combination and which have zero covariance (and hence zero correlation) with
the previous principal components.
It can be proved that there are exactly p such linear combinations. However,
typically, the first few principal components explain most of the variance in the
original data. As a result, instead of working with all the original variables X1,
X2, ..., Xp, you would typically first perform PCA and then use only the first two
or three principal components, say Y1 and Y2, in a subsequent analysis.

Examples for Principal Components


Two examples appear in this section to illustrate the Principal Components
Analysis Tool in Analytic Solver Data Mining. Each example uses the example
file, Utilities.xlsx. This example dataset gives data on 22 public utilities
within the US.
Open this dataset by clicking Help - Examples on the Data Mining ribbon, then
Forecasting/Data Mining Examples -- Utilities.xlsx.

Click Transform – Principal Components on the Data Mining ribbon to open
the Principal Components Analysis dialog. Select variables x1 to x8, then click
the > command button to move them to the Selected Variables field.

Click Next.
Analytic Solver Data Mining provides two routines for specifying the number of
principal components: Fixed #components and Smallest # components
explaining. The Fixed # components method allows the user to specify a fixed
number of components, or variables, to be included in the “reduced” model.
The Smallest #components explaining method allows the user to specify a
percentage of the variance. When this method is selected Analytic Solver Data
Mining will calculate the minimum number of principal components required to
account for that percentage of the variance.
In addition, Analytic Solver Data Mining provides two methods for calculating
the principal components: using the covariance or the correlation matrix. When
using the correlation matrix method, the data will be normalized first before the
method is applied. (The dataset is normalized by dividing each variable by its
standard deviation.) Normalizing gives all variables equal importance in terms
of variability. If the covariance method is selected, the dataset should first be
normalized.
Select Use Correlation Matrix (Use Standardized Variables). Then click
Next.

On the Step 3 of 3 dialog, confirm Show principal components score is
selected, and then click Finish. This option displays an output matrix where the
columns are the principal components, the rows are the individual data records
and the value in each cell is the calculated score for that record on the relevant
principal component.
For a description of the Show Q-Statistics and Show Hotelling's T-Squared
Statistics options, please see the Principal Components Options section below.

Expand Reports – Principal Component Analysis – Run 1 to view the output of
the analysis: PCA_Output and PCA_Scores. The output from PCA_Output is
shown below.

The top section of PCA_Output simply displays the number of principal
components created (8 as selected in the Step 2 of 3 dialog above), the number
of records in the dataset, the method chosen, (Matrix Method: Correlation also
selected in the Step 2 of 3 dialog) and the Transformation method chosen

(Transformation method: Fixed # components as selected in the Step 2 of 3
dialog).
PCA_Output also holds the principal component table. The maximum
magnitude element for Component1 corresponds to x2 (-0.5712). This signifies
that the first principal component is measuring the effect of x2 on the utility
companies. Likewise, the second component appears to be measuring the effect
of x6 on the utility companies (maximum magnitude = |-0.6031|). Component1
accounts for 27.16% of the variance while the second component accounts for
23.75%. Together, these two components account for more than 50% of the
total variation. You can alternatively say the maximum magnitude element for
component 1 corresponds to x2.

Double click PCA_Scores to view the Principal Components table. This table
holds the weighted averages of the normalized variables (after each variable’s
mean is subtracted). (This matrix is described in the 2nd step of the PCA
algorithm - see Introduction above.) Again, we are looking for the magnitude or
absolute value of each figure in the table.

Click back to the Data sheet, then reopen the Principal Components Analysis
dialog. Select variables x1 through x8, then click Next on the dialog to advance
to the Step 2 of 3 dialog.

This time, select Smallest # components explaining, enter 50 for % of
variance, select Use Correlation Matrix (Use Standardized Variables), and
then click Finish.

The results of this analysis are inserted into Results – Reports -- Principal
Components Analysis. Open PCA_Output1. Only the first two components are
included in the output file since these two components account for over 50% of
the variation.

The output from PCA_Scores1 is shown below. This table holds the weighted
averages of the normalized variables (after each variable’s mean is subtracted).
(This matrix is described in the 2nd step of the PCA algorithm - see Introduction
above.) Again, we are looking for the magnitude or absolute value of each figure
in the table.

After applying the Principal Components Analysis algorithm, users may proceed
to analyze their dataset by applying additional data mining algorithms featured
in Analytic Solver Data Mining.

Options for Principal Components Analysis


Options for Principal Components Analysis appear on the Step 2 of 3 and Step 3
of 3 dialogs. For more information on the Step 1 of 3 dialog, please see the
Common Dialog Options section in the Introduction to Analytic Solver Data
Mining chapter.

Principal Components
Select the number of principal components displayed in the output.
Fixed # components
Specify a fixed number of components by selecting this option and entering an
integer value from 1 to n, where n is the number of Input variables selected in
the Step 1 of 3 dialog. This option is selected by default; the default value of n
is equal to the number of input variables.

Smallest #components explaining


Select this option to specify a percentage. Analytic Solver Data Mining will
calculate the minimum number of principal components required to account for
that percentage of variance.

Method
To compute principal components, the data is matrix-multiplied by a
transformation matrix. This option lets you specify how this transformation
matrix is calculated.
Use Covariance matrix
The covariance matrix is a square, symmetric matrix of size n x n (number of
variables by number of variables). The diagonal elements are variances and the
off-diagonals are covariances. The eigenvalues and eigenvectors of the
covariance matrix are computed and the transformation matrix is defined as the
transpose of this eigenvector matrix. If the covariance method is selected, the
dataset should first be normalized. One way to normalize the data is to divide
each variable by its standard deviation. Normalizing gives all variables equal
importance in terms of variability.1
Use Correlation matrix (Use Standardized Variables)
An alternative method is to derive the transformation matrix from the
eigenvectors of the correlation matrix instead of the covariance matrix. The
correlation matrix is equivalent to a covariance matrix for the data where each
variable has been standardized to zero mean and unit variance. This method
tends to equalize the influence of each variable, inflating the influence of
variables with relatively small variance and reducing the influence of variables
with high variance. This option is selected by default.

1 Shmueli, Galit, Nitin R. Patel, and Peter C. Bruce. Data Mining for Business
Intelligence. 2nd ed. New Jersey: Wiley, 2010.

Show principal components score


This option results in the display of a matrix in which the columns are the
principal components, the rows are the individual data records, and the value in
each cell is the calculated score for that record on the relevant principal
component. This option is selected by default.

Q Statistics and Hotelling’s T-Squared Statistics


Q Statistics, or residuals, and Hotelling’s T-Squared statistics are summary
statistics which help explain how well a model fits the sample data and can also
be used to detect any outliers in the data. A detailed explanation of each is
beyond the scope of this guide. Please see the literature for more information on
each of these statistics.

Show Q - Statistics
If this option is selected, Analytic Solver Data Mining will include Q-Statistics
in the output. Q statistics (or residuals) measure the difference between the
sample data and the projection of the model onto the sample data. These
statistics can also be used to determine if any outliers exist in the data. Low
values for Q statistics indicate a well-fit model.

Show Hotelling’s T-Squared Statistics


If this option is selected, Analytic Solver Data Mining will include Hotelling’s
T-Squared statistics in the output. T-Squared statistics measure the variation in
the sample data within the model and indicate how far the sample data is from
the center of the model. These statistics can also be used to detect outliers in the
sample data. Low T-Squared statistics indicate a well-fit model.

k-Means Clustering

Introduction
Cluster Analysis, also called data segmentation, has a variety of goals which all
relate to grouping or segmenting a collection of objects (also called
observations, individuals, cases, or data rows) into subsets or "clusters". These
“clusters” are grouped in such a way that the observations included in each
cluster are more closely related to one another than to objects assigned to
different clusters. Central to cluster analysis is the notion of the degree of
similarity (or dissimilarity) between the individual objects being clustered.
There are two major methods of clustering -- hierarchical clustering and k-
means clustering.
This chapter explains the k-Means Clustering algorithm. (See the Hierarchical
Clustering chapter for information on this type of clustering analysis.) The goal
of this process is to divide the data into a set number of clusters (k) and to assign
each record to a cluster while minimizing the distribution within each cluster. A
non-hierarchical approach to forming good clusters is to specify a desired
number of clusters, say, k, then assign each case (object) to one of k clusters so
as to minimize a measure of dispersion within the clusters. A very common
measure is the sum of distances or sum of squared Euclidean distances from the
mean of each cluster. The problem can be set up as an integer programming
problem but because solving integer programs with a large number of variables
is time consuming, clusters are often computed using a fast, heuristic method
that generally produces good (but not necessarily optimal) solutions. The k-
means algorithm is one such method.
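The alternating heuristic described above can be sketched as follows (a generic illustration with invented data; Analytic Solver's implementation adds options such as random starts and normalization):

```python
import numpy as np

def kmeans(X, k, n_iter=10, seed=12345):
    rng = np.random.default_rng(seed)
    # initialize centroids by picking k distinct records at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assignment step: each record goes to the nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: recompute each centroid as the mean of its records
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):
            break                              # converged before the iteration limit
        centroids = new_centroids
    return labels, centroids

# two well-separated blobs should be recovered as two clusters
X = np.vstack([np.random.default_rng(1).normal(0.0, 0.1, (20, 2)),
               np.random.default_rng(2).normal(5.0, 0.1, (20, 2))])
labels, centroids = kmeans(X, k=2)
```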

Example for k-Means Clustering


The example contained in this section uses the Wine.xlsx example file to
demonstrate how to create a model using the k-Means Clustering algorithm.
Open this example file by clicking Help – Examples, then Forecasting/Data
Mining Examples.
As shown in the figure below, each row in this dataset represents a sample of
wine taken from one of three wineries (A, B or C). In this example, the Type
variable representing the winery is ignored and the clustering is performed
simply on the basis of the properties of the wine samples (the remaining
variables).

Click Data Mining – Cluster – k-Means Clustering to open the k – Means
Clustering dialog.
Select all variables under Variables in input data except Type, then click the >
button to shift the selected variables to the Selected Variables field.

Afterwards, click Next to advance to the next dialog.


Select Normalize Input Data. If this option is selected, Analytic Solver Data
Mining will normalize the input data before applying the k-Means Clustering
algorithm. Normalizing the data is important for adjusting the values measured
on different scales to a common scale. Note: When this option is selected, the
related outputs are reported in normalized coordinates.

Enter 8 for # Clusters to instruct the k-Means Clustering algorithm to form 8
cohesive groups of observations in the Wine data. One can use the results of
Hierarchical Clustering, or several values of k, to determine the best level of
data partitioning.
Leave # Iterations at the default setting of 10. This option limits the number of
iterations for the k-Means Clustering algorithm. Even if the convergence
criterion has not yet been met, the cluster adjustment will stop once the limit on
# Iterations has been reached.
Select Random Starts and click the top arrow to increment the option value to
5. The final result of the K-Means Clustering algorithm depends on the initial
choice of cluster centroids. The “random starts” option allows a better
choice by trying several random assignments. The best assignment (based on
Sum of Squared Distances) is chosen as an initialization for further K-Means
iterations.
Set Seed is selected by default. This option initializes the random number
generator that is used to assign the initial cluster centroids. Setting the random
number seed to a positive value ensures the reproducibility of the analysis. The
default value is “12345”.

Click Next.
On the Step 3 of 3 dialog, Show data summary and Show distances from each
cluster center are both selected by default. Click Finish.



The K-Means Clustering method starts with k initial clusters. The algorithm
proceeds by alternating between two steps: "assignment" – where each record is
assigned to the cluster with the nearest centroid, and "update" – where new
cluster centroids are recomputed based on the partitioning found in the
"assignment" step.
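The two alternating steps can be sketched in a few lines of Python. This is a toy illustration under stated assumptions (made-up 2-D data, Euclidean distance), not Analytic Solver's implementation:

```python
import math
import random

def kmeans(points, k, iterations=10, seed=12345):
    """Toy k-means: alternate "assignment" and "update" steps."""
    rng = random.Random(seed)            # reproducible start (cf. Set Seed)
    centroids = rng.sample(points, k)    # initial centroids
    for _ in range(iterations):          # cf. the # Iterations limit
        # "assignment": each record goes to the cluster with the nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # "update": recompute each centroid as the mean of its members
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return centroids, clusters

centroids, clusters = kmeans([(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)], k=2)
```

On this toy data the two tight groups are recovered as the two clusters, with each final centroid at the mean of its members.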
The results of the clustering method, KMC_Output and KMC_Clusters, are
inserted to the right of the Data worksheet. The top section of the output lists
the options that were selected.

The table “Random Starts Summary” displays the information about the initial
search for the best centroid assignment. The assignment marked by “Best Start”
is used as the initial assignment of the centroids.



The "Cluster Centers" tables (shown below – click the Cluster Centers link in
the Output Navigator to view) display detailed information about the clusters
formed by the k-Means Clustering algorithm: the final centroids and inter-
cluster distances. If the input data was normalized, Analytic Solver Data Mining
displays these tables in original and normalized coordinates.

Based on these distances, we can make inferences about the degree of
(dis)similarity between the formed clusters.

The "Cluster Summary" table (shown below – click the Cluster Summary link in
the Output Navigator to view) displays the number of records (observations)
included in each cluster and the within-cluster average distance. This
information can be used to better understand the data partitioning: how large and
how sparse the resulting clusters are.

Click the Cluster Labels link in the Output Navigator to view the "Cluster
Labels" table. This table displays the final cluster assignment for each
observation in the input data – the point is assigned to the "closest" cluster, i.e.
the one with the nearest centroid. For example, in the first record, the final
cluster assignment is 4, since the distance to that cluster's centroid is the smallest
(1.7952).



k-Means Clustering Options
See the Common Dialog Options section in the Introduction to Analytic
Solver Data Mining chapter for an explanation of options present on the k-
Means Clustering – Step 1 of 3 dialog. See below for an explanation of options
on the k-Means Clustering Step 2 of 3 and Step 3 of 3 dialogs.

Normalize input data


If this option is selected, Analytic Solver Data Mining will normalize the input
data before applying the k-Means Clustering algorithm. Normalizing the data is
important to ensure that the distance measure accords equal weight to each
variable. Without normalization, the variable with the largest scale will
dominate the measure.
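A minimal sketch of what this normalization does (standardization: subtract the mean, divide by the standard deviation; the sample values below are made up, not taken from the Wine dataset):

```python
import statistics

def normalize(column):
    """Standardize one variable so all variables share a common scale."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mean) / sd for x in column]

# Hypothetical values on very different scales; without normalization,
# the second variable would dominate any Euclidean distance measure.
alcohol = [12.1, 13.5, 14.0, 12.8]
proline = [520.0, 1100.0, 1450.0, 880.0]
alcohol_z, proline_z = normalize(alcohol), normalize(proline)
```

After standardization, both variables have mean 0 and standard deviation 1, so each contributes comparably to the distance measure.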

# Clusters
Enter the number of final cohesive groups of observations (k) to be formed here.
The number of clusters should be at least 1 and at most one less than the number
of observations in the data range. This value should be based on your knowledge
of the data and the number of projected clusters. One can use the results of
Hierarchical Clustering or several values of K to understand the best data
partitioning level. The default value for this option is 2.

# Iterations
This option limits the number of iterations for the K-Means Clustering
algorithm. Even if the convergence criterion has not yet been met, the cluster
adjustment will stop once the limit on # Iterations has been reached. The
default value for this option is 10.

Options
If Fixed start is selected, Analytic Solver Data Mining will start building the
model with a single fixed starting point. This option is selected by default.
If Random starts is selected, the algorithm will start at any random point. When
this option is selected, a field to the right of the option is enabled. Enter the
number of desired starting points here. The final result of the K-Means
Clustering algorithm depends on the initial choice of cluster centroids. The
“random starts” option allows a better choice by trying several random
assignments. The best assignment (based on Sum of Squared Distances) is
chosen as an initialization for further K-Means iterations.
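The selection logic behind random starts can be sketched as follows (a simplified illustration; scoring each start by its total Sum of Squared Distances is the key idea):

```python
import math
import random

def total_ssd(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

def best_random_start(points, k, starts=5, seed=12345):
    """Draw several random centroid sets; keep the one with the lowest SSD."""
    rng = random.Random(seed)
    candidates = [rng.sample(points, k) for _ in range(starts)]
    return min(candidates, key=lambda c: total_ssd(points, c))
```

The winning start is then used as the initialization for the usual k-means iterations.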

Set Seed
This option initializes the random number generator that is used to assign the
initial cluster centroids. Setting the random number seed to a nonzero value
(any number of your choice is OK) ensures that the same sequence of random
numbers is used each time the initial cluster centroids are calculated. The
default value is “12345”. When the seed is zero, the random number generator
is initialized from the system clock, so the sequence of random numbers will be
different each time the centroids are initialized. If you need the results from
successive runs of the clustering method to be strictly comparable, you should
set the seed. To do this, select the checkbox next to the Set Seed edit box, or
type the number you want into the box. This option accepts positive integers
with up to 9 digits.



Show data summary
Select this option to display the data summary in the k-Means Clustering output.
This option is selected by default.

Show distances from each cluster center


Select this option to display the distances from each cluster center in the k-
Means Clustering output. This option is selected by default.



Hierarchical Clustering

Introduction
Cluster Analysis, also called data segmentation, has a variety of goals. All
relate to grouping or segmenting a collection of objects (also called
observations, individuals, cases, or data rows) into subsets or "clusters", such
that those within each cluster are more closely related to one another than
objects assigned to different clusters. Central to all of these goals is the notion
of degree of similarity (or dissimilarity) between the individual objects being
clustered. There are two major methods of clustering --
hierarchical clustering and k-means clustering. (See the k-means clustering
chapter for information on this type of clustering analysis.)
In hierarchical clustering the data are not partitioned into a particular cluster in
a single step. Instead, a series of partitions takes place, which may run from a
single cluster containing all objects to n clusters each containing a single object.
Hierarchical Clustering is subdivided into agglomerative methods, which
proceed by a series of fusions of the n objects into groups, and divisive
methods, which separate n objects successively into finer groupings. The
hierarchical clustering technique employed by Analytic Solver Data Mining is
an Agglomerative technique. Hierarchical clustering may be represented by a
two dimensional diagram known as a dendrogram which illustrates the fusions
or divisions made at each successive stage of analysis. An example of such a
dendrogram is given below:

Agglomerative methods
An agglomerative hierarchical clustering procedure produces a series of
partitions of the data, Pn, Pn-1, ..., P1. The first, Pn, consists of n single-object
'clusters'; the last, P1, consists of a single group containing all n cases.
At each particular stage the method joins the two clusters which are closest
together (most similar). (At the first stage, this amounts to joining together the
two objects that are closest together, since at the initial stage each cluster has
one object.)
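The generic agglomerative loop can be sketched as follows. This is a simplified illustration (single linkage is used as the inter-cluster distance, and small 2-D points stand in for real records):

```python
import math

def single_linkage(cr, cs):
    """Inter-cluster distance = distance between the closest pair of members."""
    return min(math.dist(i, j) for i in cr for j in cs)

def agglomerate(points, k, linkage=single_linkage):
    """Start from n singleton clusters (Pn) and repeatedly merge the two
    closest clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(r, s) for r in range(len(clusters))
                 for s in range(r + 1, len(clusters))]
        r, s = min(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[r].extend(clusters.pop(s))   # fuse the closest pair
    return clusters
```

Stopping at k clusters corresponds to cutting the dendrogram at the level that leaves k branches.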



Differences between methods arise because of the different methods of defining
distance (or similarity) between clusters. Several agglomerative techniques will
now be described in detail.

Single linkage clustering


One of the simplest agglomerative hierarchical clustering methods is single
linkage, also known as the nearest neighbor technique. The defining feature of
this method is that distance between groups is defined as the distance between
the closest pair of objects, where only pairs consisting of one object from each
group are considered.
In the single linkage method, D(r,s) is computed as
D(r,s) = Min { d(i,j) : Where object i is in cluster r and object j is in cluster s }
Here the distance between every possible object pair (i,j) is computed, where
object i is in cluster r and object j is in cluster s. The minimum value of these
distances is said to be the distance between clusters r and s. In other words, the
distance between two clusters is given by the value of the shortest link between
the clusters.
At each stage of hierarchical clustering, the clusters r and s, for which D(r,s) is
minimum, are merged.

This measure of inter-group distance is illustrated in the figure below:

Complete linkage clustering


The complete linkage, also called farthest neighbor, clustering method is the
opposite of single linkage. In this clustering method, the distance between
groups is defined as the distance between the most distant pair of objects, one
from each group.
In the complete linkage method, D(r,s) is computed as
D(r,s) = Max { d(i,j) : Where object i is in cluster r and object j is in cluster s }
Here the distance between every possible object pair (i,j) is computed, where
object i is in cluster r and object j is in cluster s and the maximum value of these
distances is said to be the distance between clusters r and s. In other words, the
distance between two clusters is given by the value of the longest link between
the clusters.
At each stage of hierarchical clustering, the clusters r and s, for which D(r,s) is
minimum, are merged.
The measure is illustrated in the figure below:

Average linkage clustering


Here the distance between two clusters is defined as the average of distances
between all pairs of objects, where each pair is made up of one object from each
group.
In the average linkage method, D(r,s) is computed as
D(r,s) = Trs / ( Nr * Ns)
Where Trs is the sum of all pairwise distances between cluster r and cluster s. Nr
and Ns are the sizes of the clusters r and s respectively.
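The formula can be checked directly (a toy sketch; the two clusters below are made-up 2-D points):

```python
import math

def average_linkage(cluster_r, cluster_s):
    """D(r, s) = Trs / (Nr * Ns): Trs sums all pairwise distances between
    the two clusters; Nr and Ns are the cluster sizes."""
    t_rs = sum(math.dist(i, j) for i in cluster_r for j in cluster_s)
    return t_rs / (len(cluster_r) * len(cluster_s))
```

For clusters {(0,0)} and {(3,0), (0,4)} the pairwise distances are 3 and 4, so D(r,s) = 7 / (1 × 2) = 3.5. (Complete linkage is the analogous computation with the maximum in place of the average.)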
At each stage of hierarchical clustering, the clusters r and s, for which D(r,s) is
the minimum, are merged. The figure below illustrates average linkage
clustering:



Centroid Method
With this method, groups once formed are represented by their mean values for
each variable, that is, their mean vector, and inter-group distance is now defined
in terms of distance between two such mean vectors.
In the group average linkage method, the two clusters r and s are merged such
that, after merging, the average pairwise distance within the newly formed
cluster, is minimized. Suppose we label the new cluster formed by merging
clusters r and s, as t. Then D(r,s) , the distance between clusters r and s is
computed as
D(r,s) = Average { d(i,j) : Where observations i and j are in cluster t, the cluster
formed by merging clusters r and s }
At each stage of hierarchical clustering, the clusters r and s, for which D(r,s) is
minimized, are merged. In this case, those two clusters are merged such that the
newly formed cluster, on average, will have minimum pairwise distances
between the points.

Ward's hierarchical clustering method


Ward (1963) proposed a clustering procedure seeking to form the partitions Pn,
Pn-1, ..., P1 in a manner that minimizes the loss associated with each grouping, and
to quantify that loss in a form that is readily interpretable. At each step in the
analysis, the union of every possible cluster pair is considered and the two
clusters whose fusion results in the minimum increase in 'information loss' are
combined. Information loss is defined by Ward in terms of an error sum-of-
squares criterion, ESS.
The rationale behind Ward's proposal can be illustrated most simply by
considering univariate data. Suppose, for example, 10 objects have scores (2, 6,
5, 6, 2, 2, 2, 0, 0, 0) on some particular variable. The loss of information that
would result from treating the ten scores as one group with a mean of 2.5 is
represented by the ESS given by
ESSOne group = (2 − 2.5)² + (6 − 2.5)² + ... + (0 − 2.5)² = 50.5
On the other hand, if the 10 objects are classified according to their scores into
four sets,
{0,0,0}, {2,2,2,2}, {5}, {6,6}
the ESS can be evaluated as the sum of four separate within-group error sums of
squares:
ESSFour groups = ESSgroup1 + ESSgroup2 + ESSgroup3 + ESSgroup4 = 0.0
Clustering the 10 scores into 4 clusters results in no loss of information.
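This arithmetic is easy to verify directly, using the ten scores above:

```python
def ess(group):
    """Error sum of squares: squared deviations from the group mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)

scores = [2, 6, 5, 6, 2, 2, 2, 0, 0, 0]
one_group = ess(scores)                    # 50.5 (mean of all ten is 2.5)
four_groups = ess([0, 0, 0]) + ess([2, 2, 2, 2]) + ess([5]) + ess([6, 6])
```

Since every object in each of the four groups equals its group mean, each group's ESS is zero, and the four-group partition loses no information.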

McQuitty's Method
When this procedure is selected, at each step, when two clusters are joined, the
distance from the new cluster to an existing cluster is computed as the average of
the distances from each of the two joining clusters to that existing cluster.

Median Method
The Median Method also uses averaging when calculating the distance between
two records or observations. However, this method uses the median instead of
the mean.
One of the reasons why Hierarchical Clustering is so attractive to statisticians is
that all possible clusters can be examined visually, or in any desired way, by
examining the full dendrogram. However, there are a few limitations.
1. Hierarchical clustering can be computationally expensive as this method
requires computing and storing an n x n distance matrix. If using a large
dataset, this requirement can be very slow and require large amounts of
memory.
2. Clusters created through Hierarchical clustering are not very stable. If
records are eliminated, the results can be very different.
3. Outliers in the data can impact the results negatively.

Examples of Hierarchical Clustering


The utilities.xlsx example dataset (shown below) holds corporate data on 22 US
public utilities. This example will illustrate how a user could use Analytic
Solver Data Mining to perform a cluster analysis using hierarchical clustering.
Open this example by clicking Help -- Examples -- Forecasting/Data Mining
Examples.
Then click Cluster -- Hierarchical Clustering to open the Step 1 of 3 dialog.
Each record includes 8 variables. Before we can use a clustering technique,
the data must be “normalized” or “standardized”. A popular method for
normalizing continuous variables is to divide each variable by its standard
deviation. After the variables are standardized, the distance can be computed
between clusters using the Euclidean metric.



An explanation of the variables is below.

In this example, we will use Hierarchical clustering to predict the cost impact of
deregulation. To perform the requisite analysis, economists would be required
to build a detailed cost model of the various utilities. However, to save a
considerable amount of time and effort, we could cluster similar types of
utilities, build a detailed cost model for just one "typical" utility in each cluster,
then scale up from these models to estimate results for all utilities.
Click Cluster -- Hierarchical Clustering to bring up the Hierarchical
Clustering dialog.
Select variables x1 through x8 in the Variables in Input Data field, then click >
to move the selected variables to the Selected Variables field.



Then click Next to advance to the Hierarchical Clustering - Step 2 of 3 dialog.
At the top of the dialog, select Normalize input data. When this option is
selected, Analytic Solver Data Mining will normalize the data by subtracting the
variable’s mean from each observation and dividing by the standard deviation.
Normalizing the data is important to ensure that the distance measure accords
equal weight to each variable -- without normalization, the variable with the
largest scale will dominate the measure.
Under Similarity measure, Euclidean distance is selected by default. The
Hierarchical clustering method uses the Euclidean Distance as the similarity
measure for raw numeric data. When the data is binary, the remaining two
options, Jaccard's coefficients and Matching coefficients, are enabled.
Under Clustering method, select Group average linkage. Recall from the
Introduction to this chapter that the group average linkage method calculates the
average of all possible pairwise distances between records in the two clusters.
Click Next.



On the Step 3 of 3 dialog, Draw dendrogram and Show cluster membership
checkboxes are selected by default. Enter 10 for Maximum Number of Leaves
and 4 for the # Clusters. Then click Finish.

Analytic Solver Data Mining will create four clusters using the group average
linkage method. The outputs HC_Output, HC_Clusters and HC_Dendrogram
are inserted to the right.



The top portion of the output simply displays the choices made during the
algorithm setup.

Further down, the output details the history of the cluster formation. Initially,
each individual case is considered its own cluster (single member in each
cluster). Analytic Solver Data Mining begins the method with # clusters = #
cases. At stage 1, above, clusters (i.e. cases) 12 and 21 were found to be closer
together than any other two clusters, so they are joined together into cluster 12.
At this point there is one cluster with two cases (cases 12 and 21), and 20
additional clusters that still have just one case each. At stage 2,
clusters 10 and 13 are found to be closer together than any other two clusters, so
they are joined together into cluster 10.
This process continues until there is just one cluster. At various stages of the
clustering process, there are different numbers of clusters. A graph called a
dendrogram illustrates these steps.



In the above dendrogram, the Sub Cluster IDs are listed along the x-axis (in an
order convenient for showing the cluster structure). The y-axis measures inter-
cluster distance. Consider cases 12 and 21 -- they have an inter-cluster distance
of 1.38. (Hover over the horizontal connecting line to see the Between-Cluster
Distance.) No other cases have a smaller inter-cluster distance, so 12 and 21 are
joined into one cluster, indicated by the horizontal line linking them. Next, we
see that cases 10 and 13 have the next smallest inter-cluster distance, so they are
joined into one cluster. The next smallest inter-cluster distance is between
clusters 4 and 20 and so on.



If we draw a horizontal line through the diagram at any level on the y-axis (the
distance measure), the vertical cluster lines that intersect the horizontal line
indicate clusters whose members are at least that close to each other. If we draw
a horizontal line at distance = 2.4, for example, we see that there are 13 clusters.
In addition, we can see that a case can belong to different clusters, depending on
where we draw the line.
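The relationship between the cutoff height and the number of clusters follows directly from the merge history: starting from n singleton clusters, every merge completed below the cutoff reduces the count by one. A small sketch (the merge distances here are invented, not taken from the utilities output):

```python
def clusters_at_cutoff(n_cases, merge_distances, cutoff):
    """Number of clusters remaining after performing every merge whose
    inter-cluster distance falls below the cutoff."""
    merges_done = sum(1 for d in merge_distances if d < cutoff)
    return n_cases - merges_done

# Five cases whose successive merges happened at these distances:
history = [1.0, 1.2, 3.0, 4.0]
```

Cutting at distance 2.0 performs the first two merges and leaves 5 − 2 = 3 clusters; a cutoff above every merge distance leaves a single cluster.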
For purposes of assigning cases to clusters, we must specify the number of
clusters in advance. In this example, we specified a limit of 4.
Output from HC_Clusters is shown below. This table displays the assignment
of each record to the four clusters.

This next example illustrates Hierarchical Clustering when the data represents
the distance between the ith and jth records. (When applied to raw data,
Hierarchical clustering converts the data into the distance matrix format before
proceeding with the clustering algorithm. Providing the distance measures in
the data requires one less step for the Hierarchical clustering algorithm.)
Open the DistMatrix example dataset, then open the Hierarchical Clustering dialog.



All variables will be added as Input Variables. Click Next.
Notice Normalize input data, Jaccard’s coefficients and Matching coefficients
are disabled when Distance matrix is used. Again, select Group average
linkage as the Clustering method. Then click Next.

Select Draw dendrogram (default), Show cluster membership (default) and
enter 4 for # Clusters and 10 for Maximum Number of Leaves.



Then click Finish.
Output contained within HC_Output, HC_Clusters, and HC_Dendrogram will
be inserted to the right.
The Clustering Stages output (included on HC_Output) is shown below.

The Dendrogram output (included on HC_Dendrogram) is shown below.


Note: To view these charts in the Cloud app, click the Charts icon on the Ribbon,
select HC_Dendrogram for Worksheet and Dendrogram for Hierarchical
Clustering for Chart.



One of the reasons why Hierarchical Clustering is so attractive to statisticians is
that it is not necessary to specify the number of clusters desired. In addition, the
clustering process can be easily illustrated with a dendrogram. However, there
are a few limitations.
i. Hierarchical clustering requires computing and storing an n x n distance
matrix. If using a large dataset, this requirement can be very slow and
require large amounts of memory.
ii. Clusters created through Hierarchical clustering are not very stable. If
records are eliminated, the results can be very different.
iii. Outliers in the data can impact the results negatively.

Options for Hierarchical Clustering


The following options appear on the Hierarchical Clustering Step 1 of 3, Step 2
of 3 and Step 3 of 3 dialogs. See the Common options section of the chapter
"Introduction to Analytic Solver Data Mining" for options appearing on the Step
1 of 3 dialog.



Data Type
The Hierarchical clustering method can be used on raw data as well as the data
in Distance Matrix format. Choose the appropriate option to fit your dataset. If
Raw Data is chosen, Analytic Solver Data Mining computes the similarity
matrix before clustering.



Normalize input data
Normalizing the data is important to ensure that the distance measure accords
equal weight to each variable -- without normalization, the variable with the
largest scale will dominate the measure.

Similarity Measures
Hierarchical clustering uses the Euclidean distance as the similarity measure
when working on raw numeric data. When the data is binary, the remaining two
options, Jaccard's coefficients and Matching coefficients, are enabled.
Suppose we have binary values for all the xij’s. For a pair of individuals i and j,
let d be the number of variables on which both score 1, a the number on which
both score 0, and b and c the numbers of mismatches, so that p = a + b + c + d
is the total number of variables.
The most useful similarity measures in this situation are:


Jaccard’s coefficient = d/(b+c+d). This coefficient ignores zero matches.
The matching coefficient = (a + d)/p.
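Using this convention (d = both score 1, a = both score 0, b and c = mismatches, p = total number of variables), the two coefficients can be computed as follows. This is a toy sketch with made-up binary profiles:

```python
def binary_counts(x, y):
    """2x2 table counts for two binary profiles x and y."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # 0-0 matches
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # mismatches
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # mismatches
    d = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # 1-1 matches
    return a, b, c, d, len(x)

def jaccard(x, y):
    a, b, c, d, p = binary_counts(x, y)
    return d / (b + c + d)        # zero matches (a) are ignored

def matching(x, y):
    a, b, c, d, p = binary_counts(x, y)
    return (a + d) / p            # all matches count
```

For the profiles x = (1, 1, 0, 0, 1) and y = (1, 0, 0, 1, 1): d = 2, a = 1, b = c = 1, p = 5, so Jaccard = 2/4 = 0.5 and matching = 3/5 = 0.6.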

Clustering Method
See the introduction to this chapter for descriptions of each method.



Draw Dendrogram
Select this option to have Analytic Solver Data Mining create a dendrogram to
illustrate the clustering process.

Maximum Number of Leaves


If Draw Dendrogram is selected, this option is enabled. Use this option to
define the maximum number of leaves in the dendrogram tree. The default
setting is 10; if the number of records in the dataset is less than 10, the default
is the minimum of the number of rows in the dataset and your current licensed
limit.

Show cluster membership


Select this option to display the cluster number (ID) to which each record is
assigned by the routine.

Number of Clusters
Recall that the agglomerative method of hierarchical clustering continues to
form clusters until only one cluster is left. This option lets you stop the process
at a given number of clusters.



Text Mining

Introduction
Text mining is the practice of automatically analyzing one document or a
collection of documents (a corpus) and extracting non-trivial information from it.
Also, Text Mining usually involves the process of transforming unstructured
textual data into structured representation by analyzing the patterns derived from
text. The results can be analyzed to discover interesting knowledge, some of
which would only be found by a human carefully reading and analyzing the text.
Typical widely-used tasks of Text Mining include but are not limited to
Automatic Text Classification/Categorization, Topic Extraction, Concept
Extraction, Documents/Terms Clustering, Sentiment Analysis, Frequency-based
Analysis and many more. Some of these tasks could not be completed by a
human, which makes Text Mining a particularly useful and applicable tool in
modern Data Science. Analytic Solver Data Mining takes an integrated
approach to text mining as it does not totally separate analysis of unstructured
data from traditional data mining techniques applicable for structured
information. While Analytic Solver Data Mining is a very powerful tool for
analyzing text only, it also offers automated treatment of mixed data, i.e.
combination of multiple unstructured and structured fields. This is a particularly
useful feature that has many real-world applications, such as analyzing
maintenance reports, evaluation forms, insurance claims, etc. Analytic Solver
Data Mining uses the “bag of words” model – the simplified representation of
text, where the precise grammatical structure of text and exact word order is
disregarded. Instead, syntactic, frequency-based information is preserved and is
used for text representation. Although such assumptions might be harmful for
some specific applications of Natural Language Processing (NLP), it has been
proven to work very well for applications such as Text Categorization, Concept
Extraction and others, which are the particular areas addressed by Analytic
Solver Data Mining’s Text Mining capabilities. It has been shown in many
theoretical/empirical studies that syntactic similarity often implies semantic
similarity. One way to capture syntactic relationships is to represent text in terms
of the Generalized Vector Space Model (GVSM). The advantage of such a
representation is a meaningful mapping of text to numeric space; the
disadvantage is that some semantic elements, e.g. the order of words, are lost
(recall the bag-of-words assumption).
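A minimal sketch of the bag-of-words representation (the tokenization rule used here is a simplification of what Text Miner actually does):

```python
import re
from collections import Counter

def bag_of_words(document):
    """Bag of words: keep term frequencies, discard grammatical
    structure and word order."""
    terms = re.findall(r"[a-z0-9']+", document.lower())
    return Counter(terms)
```

Note that "the car dealer sold the car" and "sold the car the car dealer" map to the identical frequency vector: exactly the information the model preserves, and the word order it gives up.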
Input to Text Miner (the Text Mining tool within Analytic Solver Data Mining)
can be of two main types – a few relatively large documents (e.g. several books)
or relatively large number of smaller documents (e.g. collection of emails, news
articles, product reviews, comments, tweets, Facebook posts, etc.). While
Analytic Solver Data Mining is capable of analyzing large text documents, it is
particularly effective for large corpuses of relatively small documents.
Obviously, this functionality has a limitless number of applications – for instance,
email spam detection, topic extraction in articles, automatic rerouting of
correspondence, sentiment analysis of product reviews and many more.
The input for text mining is a dataset on a worksheet, with at least one column
that contains free-form text (or file paths to documents in a file system
containing free-form text), and, optionally, other columns that contain traditional

Frontline Solvers Analytic Solver Data Mining Reference Guide 199


structured data. In the first tab of the Text Mining dialog, the user selects the
text variable(s), and the other variable(s) to be processed.
The output of text mining is a set of reports that contain general explorative
information about the collection of documents, and structured representations of
the text (free-form text columns are expanded to a set of new columns with
numeric representations). The new columns will each correspond to either (i) a single term
(word) found in the “corpus” of documents, or, if requested, (ii) a concept
extracted from the corpus through Latent Semantic Indexing (LSI, also called
LSA or Latent Semantic Analysis). Each concept represents an automatically
derived complex combination of terms/words that have been identified to be
related to a particular topic in the corpus of documents. The structural
representation of text can serve as an input to any traditional Data Mining
techniques available in Analytic Solver Data Mining – unsupervised/supervised,
affinity, visualization techniques, etc. In addition, Analytic Solver Data Mining
also presents a visual representation of Text Mining results to allow the user to
interactively explore the information, which otherwise would be extremely hard
to analyze manually. Typical visualizations that aid in understanding of Text
Mining outputs and that are produced by Analytic Solver Data Mining are:
• Zipf plot – for visual/interactive exploration of frequency-based information
extracted by Text Miner
• Scree Plot, Term-Concept and Document-Concept 2D scatter plots – for
visual/interactive exploration of Concept Extraction results
If you are interested in visualizing specific parts of Text Mining analysis
outputs, Analytic Solver Data Mining provides rich capabilities for charting –
the functionality that can be used to explore Text Mining results and supplement
standard charts discussed above.
In the example below, you will learn how to use Text Miner in Analytic Solver
Data Mining to process/analyze approximately 1000 text files and use the results
for automatic topic categorization. This will be achieved by using structured
representation of text presented to Logistic Regression for building the model
for classification.

Text Mining Example


This example uses the text files within the Text Mining Example Documents.zip
archive file to illustrate how to use Analytic Solver Data Mining’s Text Mining
tool. These documents were selected from the well-known text dataset
(downloadable from http://www.cs.cmu.edu/afs/cs/project/theo-
20/www/data/news20.html) which consists of 20,000 messages, collected from
20 different internet newsgroups. We selected about 1,000 of these messages
that were posted to two interest groups, Autos and Electronics (about 500
documents from each).
Note: The Data Mining Cloud app does not currently support importing from a
File Folder.
The Text Mining Example Documents.zip archive file is located at C:\Program
Files\Frontline Systems\Analytic Solver Platform\Datasets. See the section
Importing from a File Folder within the Sampling or Importing from a
Database, Worksheet or File Folder chapter for directions on extracting and
importing the text files into Analytic Solver Data Mining. We will pick up
where this example leaves off, after the files have been imported and a sample
created.



Here is an example of a document that appeared in the Electronics newsgroup.
Note the appearance of email addresses, “From” and “Subject” lines. All three
appear in each document.

The selected file paths are now in random order, but we will need to categorize
the “Autos” and “Electronics” files in order to be able to identify them later. To
do this, we’ll use Excel to sort the rows by file path: select columns C
through D and rows 23 through 323, then choose Sort from the Data tab. In the
Sort dialog, select column D, where the file paths are located, and click OK.



The file paths should now be sorted between Electronics and Autos files.

On the Data Mining Platform Ribbon tab, click the Text icon to bring up the
Text Miner dialog. Select TextVar in the Variables list box, and click the upper
> button to move it to the Selected Text Variables list box. By doing so, we are
selecting the text in the documents as input to the Text Miner model. Ensure that
“Text variables contain file paths” is checked.

Click the Next button, or click the Pre-Processing tab at the top.

Leave the default setting, Analyze all terms, selected under Mode. When this
option is selected, Analytic Solver Data Mining will examine all terms in the
document. A “term” is defined as an individual entity in the text, which may or
may not be an English word. A term can be a word, number, email address, URL, etc.
Terms are separated by all possible delimiting characters (i.e. \, ?, ', `, ~, |, \r,
\n, \t, :, !, @, #, $, %, ^, &, *, (, ), [, ], {, }, <>, _, ;, =, -, +, \) with some
exceptions related to stopwords, synonyms, exclusion terms and boilerplate
normalization (URLs, emails, monetary amounts, etc.); Text Miner will not
tokenize on the delimiters inside these entities.

Note: The exceptions concern not how terms are separated, but whether an entity
is split at a delimiter at all. For example, URLs contain many characters such
as "/" and ";". Text Miner will not tokenize on these characters within a URL,
but will treat the URL as a whole, and will remove the whole URL if it is
selected for removal. (See below for more information.)
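Analytic Solver's internal tokenizer is not public, but the behavior described above can be illustrated with a minimal Python sketch. The function name and the exact regular expression are our own assumptions, not the product's implementation; the point is that URLs and emails are matched as whole entities before the delimiter set is applied, then normalized to "urltoken" and "emailtoken".

```python
import re

# URLs and emails are matched first, so the delimiters they contain
# (/, @, ., :) do not split them into fragments; everything else is
# split on the delimiter character class.
TOKEN_RE = re.compile(
    r"(?P<url>https?://\S+)"            # whole URL kept as one entity
    r"|(?P<email>\S+@\S+\.\S+)"         # whole email kept as one entity
    r"|(?P<term>[^\\\s?'`~|:!@#$%^&*()\[\]{}<>_;=\-+,./]+)"
)

def tokenize(text):
    """Split text on the delimiter set, normalizing URLs and emails."""
    tokens = []
    for m in TOKEN_RE.finditer(text):
        if m.group("url"):
            tokens.append("urltoken")    # boilerplate normalization
        elif m.group("email"):
            tokens.append("emailtoken")
        else:
            tokens.append(m.group("term").lower())
    return tokens
```

For example, `tokenize("From: bob@example.com see http://example.com/x today")` keeps the email and URL intact as single normalized tokens instead of splitting them at ":", "@", "." and "/".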

If Analyze specified terms only is selected, the Edit Terms button will be
enabled. If you click this button, the Edit Exclusive Terms dialog opens. Here
you can add and remove terms to be considered for text mining. All other terms
will be disregarded. For example, if we wanted to mine each document for a
specific part name such as “alternator” we would click Add Term on the Edit
Exclusive Terms dialog, then replace “New term” with “alternator” and click
Done to return to the Pre-Processing dialog. During the text mining process,
Analytic Solver Data Mining would analyze each document for the term
“alternator”, excluding all other terms.

Leave both Start term/phrase and End term/phrase empty under Text
Location. If this option is used, text appearing before the first occurrence of the
Start Phrase will be disregarded and similarly, text appearing after End Phrase
(if used) will be disregarded. For example, if text mining the transcripts from a
Live Chat service, you would not be particularly interested in any text appearing
before the heading “Chat Transcript” or after the heading “End of Chat
Transcript”. Thus you would enter “Chat Transcript” into the Start Phrase field
and “End of Chat Transcript” into the End Phrase field.

Leave the default setting for Stopword removal. Click Edit to view a list of
commonly used words that will be removed from the documents during pre-
processing. To remove a word from the Stopword list, simply highlight the
desired word, then click Remove Stopword. To add a new word to the list,
click Add Stopword; a new term, “stopword”, will be added. Double click the
term to edit it.
Analytic Solver Data Mining also allows additional stopwords to be added, or
existing stopwords to be removed, via a text document (*.txt) by using the
Browse button to navigate to the file. Terms in the text document can be
separated by a space, a comma, or both. If we were supplying three terms in a
text document, rather than in the Edit Stopwords dialog, the terms could be
listed as: subject emailtoken from, or subject,emailtoken,from, or subject,
emailtoken, from. If we had a large list of additional stopwords, this would be
the preferred way to enter the terms.
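Parsing such a stopword file is straightforward; here is a minimal sketch (the helper name is our own, not part of the product):

```python
import re

def load_stopwords(text):
    """Parse a stopword file whose terms may be separated by spaces,
    commas, or both (e.g. "subject, emailtoken, from")."""
    return [t for t in re.split(r"[,\s]+", text.strip()) if t]
```

All three separator styles mentioned above parse to the same list.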



Click Advanced in the Term Normalization group to open the Term
Normalization – Advanced dialog. Select all options as shown below. Then
click Done. This dialog allows us to indicate to Analytic Solver Data Mining,
that
• If stemming reduces a term’s length to 2 or fewer characters, disregard the
term (Minimum stemmed term length).
• HTML tags, and the text enclosed, will be removed entirely. HTML
tags and text contained inside these tags often contain technical,
computer-generated information that is not typically relevant to the
goal of the text mining application.
• URLs will be replaced with the term, “urltoken”. The specific form of a
URL does not normally add any meaning, but it is sometimes interesting
to know how many URLs are included in a document.
• Email addresses will be replaced with the term, “emailtoken”. Since
the documents in our collection all contain a great many email
addresses (and the distinction between the different emails often has
little use in Text Mining), these email addresses will be replaced with
the term “emailtoken”.
• Numbers will be replaced with the term, “numbertoken”.
• Monetary amounts will be substituted with the term, “moneytoken”.



Recall that when we inspected an email from the document collection we saw
several terms such as “subject”, “from” and email addresses. Since all of our
documents contain these terms, including them in the analysis will not provide
any benefit and could bias the analysis. As a result, we will exclude these terms
from all documents by selecting Exclusion list then clicking Edit. The Edit
Exclusion List dialog opens. Click Add Exclusion Term. The label
“exclusionterm” is added. Click to edit and change to “subject”. Then repeat
these same steps to add the term “from”.

We can take the email issue one step further and completely remove the term
“emailtoken” from the collection. Click Add Exclusion Term and edit
“exclusionterm” to “emailtoken”.

To remove a term from the exclusion list, highlight the term and click Remove
Exclusion Term.

We could have also entered these terms into a text document (*.txt) and added
the terms all at once by using the Browse button to navigate to the file and
import the list. Terms in the text document can be separated by a space, a
comma, or both. If, for example we were supplying excluded terms in a
document rather than in the Edit Exclusion List dialog, we would enter the terms
as: subject emailtoken from, or subject,emailtoken,from, or subject, emailtoken,
from. If we had a large list of terms to be excluded, this would be the preferred
way to enter the terms.

Click Done to close the dialog and return to Pre-Processing.

Analytic Solver Data Mining also allows the combining of synonyms and full
phrases by clicking Advanced within Vocabulary Reduction. Select Synonym
reduction at the top of the dialog to replace synonyms such as
“car”, “automobile”, “convertible”, “vehicle”, “sedan”, “coupe”,
“subcompact”, and “jeep” with “auto”. Click Add Synonym, replace
“rootterm” with “auto”, then replace “synonym list” with “car, automobile,
convertible, vehicle, sedan, coupe, subcompact, jeep” (without the quotes).
During pre-processing, Analytic Solver Data Mining will replace the terms
“car”, “automobile”, “convertible”, “vehicle”, “sedan”, “coupe”, “subcompact”
and “jeep” with the term “auto”. To remove a synonym from the list, highlight
the term and click Remove Synonym.



If adding synonyms from a text file, each line must be of the form
rootterm:synonymlist or, using our example: auto:car automobile convertible
vehicle sedan coupe, or auto:car,automobile,convertible,vehicle,sedan,coupe.
Note that the terms in the synonym list may be separated by a space, a comma,
or both. If we had a large list of synonyms, this would be the preferred way to
enter the terms.
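A sketch of how a synonym file in this rootterm:synonymlist format could be parsed and applied to a token stream (helper names are our own assumptions, not the product's API):

```python
import re

def load_synonyms(lines):
    """Parse lines of the form rootterm:synonymlist, where the synonym
    list may be separated by spaces, commas, or both."""
    mapping = {}
    for line in lines:
        root, _, syns = line.partition(":")
        for syn in re.split(r"[,\s]+", syns.strip()):
            if syn:
                mapping[syn] = root.strip()
    return mapping

def apply_synonyms(tokens, mapping):
    """Replace every synonym with its root term."""
    return [mapping.get(t, t) for t in tokens]
```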

Analytic Solver Data Mining also allows the combining of words into phrases
that indicate a singular meaning such as “station wagon” which refers to a
specific type of car rather than two distinct tokens – station and wagon. To add
a phrase in the Vocabulary Reduction – Advanced dialog, select Phrase
reduction and click Add Phrase. The term “phrasetoken” will appear; click it
to edit and enter “wagon”. Click “phrase” to edit and enter “station wagon”. If
supplying phrases through a text file (*.txt), each line of the file must be of the
form phrasetoken:phrase or using our example, wagon:station wagon. If we had
a large list of phrases, this would be the preferred way to enter the terms.

Enter 200 for Maximum Vocabulary Size. Analytic Solver Data Mining will
reduce the number of terms in the final vocabulary to the top 200 most
frequently occurring in the collection.

Leave Perform stemming at the selected default. Stemming is the practice of
stripping words down to their “stems” or “roots”; for example, stemming terms
such as “argue”, “argued”, “argues”, “arguing”, and “argus” would result in the
stem “argu”. However, “argument” and “arguments” would stem to “argument”.

The stemming algorithm utilized in Analytic Solver Data Mining is “smart” in
the sense that while “running” would be stemmed to “run”, “runner” would
not. Analytic Solver Data Mining uses the Porter Stemmer 2 algorithm for the
English language. For more information on this algorithm, please see the
Webpage: http://tartarus.org/martin/PorterStemmer/

Leave the default selection for Normalize case. When this option is checked,
Analytic Solver Data Mining converts all text to a consistent (lower) case, so
that Term, term, TERM, etc. are all normalized to the single token “term” before
any processing, rather than being treated as three independent tokens with
different case. This simple step can dramatically affect the frequency
distributions of the corpus; skipping it can lead to biased results.

Enter 3 for Remove terms occurring in less than _% of documents and 97 for
Remove terms occurring in more than _% of documents. For many text mining
applications, the goal is to identify terms that are useful for discriminating
between documents. If a particular term occurs in all or almost all documents, it
may not be possible to highlight the differences. If a term occurs in very few
documents, it will often indicate great specificity of this term, which is not very
useful for some Text Mining purposes.
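The document-frequency filter described above can be sketched as follows (a hypothetical helper, not Analytic Solver's implementation):

```python
def filter_by_document_frequency(term_docs, n_docs, low_pct=3, high_pct=97):
    """Keep only terms occurring in at least low_pct% and at most
    high_pct% of documents. term_docs maps term -> set of doc ids."""
    lo = n_docs * low_pct / 100.0
    hi = n_docs * high_pct / 100.0
    return {t for t, docs in term_docs.items() if lo <= len(docs) <= hi}
```

With the 3%/97% settings used in this example, a term in every document and a term in a single document are both dropped, while a moderately common term survives.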

Enter 20 for Maximum term length. Terms that contain more than 20 characters
will be excluded from the text mining analysis and will not be present in the
final reports. This option can be extremely useful for removing some parts of
text which are not actual English words, for example URLs or computer-
generated tokens, or to exclude very rare terms such as Latin species or disease
names, e.g. Pneumonoultramicroscopicsilicovolcanoconiosis.

Click Next to advance to the Representation tab, or simply click Representation
at the top.

Keep the default selection of TF-IDF (Term Frequency – Inverse Document
Frequency) for Term-Document Matrix Scheme. A term-document matrix is a
matrix that displays the frequency-based information of terms occurring in a
document or collection of documents. Each column is assigned a term and each
row a document. If a term appears in a document, a weight is placed in the
corresponding column indicating the term’s importance or contribution.
Analytic Solver Data Mining offers four commonly used weighting schemes for
representing each value in the matrix: Presence/Absence, Term Frequency,
TF-IDF (the default) and Scaled term frequency. If Presence/Absence is
selected, Analytic Solver Data Mining will enter a 1 in the corresponding
row/column if the term appears in the document and 0 otherwise. This matrix
scheme does not take into account the number of times the term occurs in each
document. If Term Frequency is selected, Analytic Solver Data Mining will
count the number of times the term appears in the document and enter this
value into the corresponding row/column in the matrix. The default setting,
Term Frequency – Inverse Document Frequency (TF-IDF), is the product of
scaled term frequency and inverse document frequency. Inverse document
frequency is calculated by taking the logarithm of the total number of
documents divided by the number of documents that contain the term. A high
TF-IDF value indicates a term that does not occur frequently in the collection
of documents taken as a whole, but appears quite frequently in the specified
document. A TF-IDF value close to 0 indicates a term that appears frequently
in the collection, or rarely in the specific document. If Scaled term frequency
is selected, Analytic Solver Data Mining will normalize (bring to the same
scale) the number of occurrences of a term in the documents (see the table
below).
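To make the default scheme concrete, here is a minimal Python sketch of TF-IDF as described above: raw term frequency times log(N / df). Analytic Solver's exact scaling may differ, and the helper name is our own.

```python
import math
from collections import Counter

def tf_idf_matrix(docs):
    """docs: list of token lists. Returns a sparse matrix as a dict
    {(doc_index, term): weight}, using tf * log(N / df)."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for tokens in docs:
        df.update(set(tokens))
    weights = {}
    for i, tokens in enumerate(docs):
        tf = Counter(tokens)             # raw term frequency in doc i
        for term, count in tf.items():
            weights[(i, term)] = count * math.log(n / df[term])
    return weights
```

Note how a term that occurs in every document gets weight 0 (log of 1), matching the intuition above that such a term cannot discriminate between documents.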

It’s also possible to create your own scheme by clicking the Advanced command
button to open the Term Document Matrix – Advanced dialog. Here users can
select their own choices for local weighting, global weighting, and
normalization. Please see the table below for definitions regarding options for
Term Frequency, Document Frequency and Normalization.

Local Weighting:
• Binary: lw_td = 1 if tf_td > 0, 0 if tf_td = 0
• Raw Frequency: lw_td = tf_td
• Logarithmic: lw_td = log(1 + tf_td)
• Augnorm: lw_td = ((tf_td / max_t tf_td) + 1) / 2

Global Weighting:
• None: gw_t = 1
• Inverse: gw_t = log2(N / (1 + df_t))
• Normal: gw_t = 1 / sqrt(Σ_d tf_td²)
• GF-IDF: gw_t = cf_t / df_t
• Entropy: gw_t = 1 + Σ_d (p_td · log p_td) / log N
• IDF probability: gw_t = log2(N / (1 + df_t))

Normalization:
• None: n_d = 1
• Cosine: n_d = 1 / ||g_d||₂

Notations:
• tf_td – frequency of term t in document d;
• df_t – document frequency of term t;
• lw_td – local weighting of term t in document d;
• gw_t – global weighting of term t;
• n_d – normalization of the vector of terms representing document d;
• N – total number of documents in the collection;
• cf_t – collection frequency of term t;
• p_td – estimated probability of term t appearing in document d
(p_td = tf_td / cf_t);
• g_d – vector of terms representing document d.

Finally, the element T_td of the Term-Document Matrix is computed as
T_td = lw_td * gw_t * n_d, for all t and d.
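The composition T_td = lw_td * gw_t * n_d can be sketched as follows. This is our own helper, not Analytic Solver's code: it implements only a few of the schemes above, and cosine normalization is applied to the already-weighted document vector.

```python
import math

def weighted_tdm(tf, local="log", glob="idf", norm="cosine"):
    """tf: list of documents, each a dict term -> raw frequency.
    Returns one dict per document: term -> T_td = lw_td * gw_t * n_d."""
    n = len(tf)
    df = {}                              # document frequency per term
    for d in tf:
        for t in d:
            df[t] = df.get(t, 0) + 1
    def lw(x):                           # local weighting
        return math.log(1 + x) if local == "log" else float(x)
    def gw(t):                           # global weighting
        return math.log2(n / (1 + df[t])) if glob == "idf" else 1.0
    out = []
    for d in tf:
        row = {t: lw(x) * gw(t) for t, x in d.items()}
        if norm == "cosine":             # n_d = 1 / ||g_d||_2
            length = math.sqrt(sum(v * v for v in row.values())) or 1.0
            row = {t: v / length for t, v in row.items()}
        out.append(row)
    return out
```

With cosine normalization, every document vector ends up on the unit sphere, which is why the Concept-Document scatter plot later in this chapter shows coordinates in [-1, 1].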

Leave Perform latent semantic indexing selected (the default). When this option
is selected, Analytic Solver Data Mining will use Latent Semantic Indexing
(LSI) to detect patterns in the associations between terms and concepts to
discover the meaning of the document.

The statistics produced and displayed in the Term-Document Matrix contain
basic information on the frequency of terms appearing in the document
collection. With this information we can “rank” the significance or importance
of these terms relative to the collection and to a particular document. Latent
Semantic Indexing, in comparison, uses singular value decomposition (SVD) to
map the terms and documents into a common space to find patterns and
relationships. For example: if we inspected our document collection, we might
find that each time the term “alternator” appeared in an automobile document,
the document also included the terms “battery” and “headlights”. Or each time
the term “brake” appeared in an automobile document, the terms “pads” and
“squeaky” also appeared. However there is no detectable pattern regarding the
use of the terms “alternator” and “brake”. Documents including “alternator”
might not include “brake” and documents including “brake” might not include
“alternator”. Our four terms (battery, headlights, pads, and squeaky) describe
two different automobile repair issues: failing brakes and a bad alternator.
Latent Semantic Indexing will attempt to (1) distinguish between these two
topics, (2) identify the documents that deal with faulty brakes, alternator
problems, or both, and (3) map the terms into a common semantic space using
singular value decomposition. SVD is a tool used by Text Miner to extract
concepts that explain the main dimensions of meaning of the documents in the
collection. The results of LSA are usually hard to examine because the
construction of the concept representations is not fully explained; interpreting
these results is more of an art than a science. However, Analytic Solver Data
Mining provides several visualizations that simplify this process greatly.

Select Maximum number of concepts and increment the counter to 20. Doing so
will tell Analytic Solver Data Mining to retain the 20 most significant
concepts. If Automatic is selected, Analytic Solver Data Mining will calculate
the importance of each concept, take the difference between each consecutive
pair, and report the concepts above the largest difference. For example, if
three concepts were identified (Concept1, Concept2, and Concept3) and given
importance factors of 10, 8, and 2, respectively, Analytic Solver Data Mining
would keep Concept1 and Concept2, since the difference between Concept2 and
Concept3 (8-2=6) is larger than the difference between Concept1 and Concept2
(10-8=2). If Minimum percentage explained is selected, Analytic Solver Data
Mining will identify the concepts whose singular values, taken together, sum to
the minimum percentage explained; 90% is the default.
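The two concept-selection rules described above can be sketched as follows (helper names are our own):

```python
def select_concepts(values):
    """'Automatic' rule: keep concepts up to the largest gap between
    consecutive importance values."""
    if len(values) < 2:
        return list(values)
    gaps = [values[i] - values[i + 1] for i in range(len(values) - 1)]
    cut = gaps.index(max(gaps))          # position of the largest drop
    return values[:cut + 1]

def select_by_explained(values, min_pct=90.0):
    """'Minimum percentage explained' rule: keep the smallest set of
    top concepts whose singular values reach min_pct% of the total."""
    total = sum(values)
    kept, running = [], 0.0
    for s in values:
        kept.append(s)
        running += s
        if 100.0 * running / total >= min_pct:
            break
    return kept
```

Applied to the example in the text, importance factors 10, 8, 2 keep the first two concepts under either rule.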



Click Next or the Output Options tab.

Keep Term-Document and Concept-Document selected under Matrices (the
default), and select Term-Concept to print each matrix in the output. The Term-
Document matrix displays the terms across the top of the matrix and the
documents down the left side of the matrix. The Concept – Document and Term
– Concept matrices are output from the Perform latent semantic indexing option
that we selected on the Representation tab. In the first matrix, Concept –
Document, 20 concepts will be listed across the top of the matrix and the
documents will be listed down the left side of the matrix. The values in this
matrix represent concept coordinates in the identified semantic space. In the
Term-Concept matrix, the terms will be listed across the top of the matrix and
the concepts will be listed down the left side of the matrix. The values in this
matrix represent terms in the extracted semantic space.

Keep Term frequency table selected (the default) under Preprocessing Summary
and select Zipf’s plot. Increase the Most frequent terms to 20 and select
Maximum corresponding documents. The Term frequency table will include
the top 20 most frequently occurring terms. The first column, Collection
Frequency, displays the number of times the term appears in the collection. The
2nd column, Document Frequency, displays the number of documents that
include the term. The third column, Top Documents, displays the top 5
documents where the corresponding term appears the most frequently. The Zipf
Plot graphs the document frequency against the term ranks, in descending order
of frequency. Zipf’s law states that the frequency of terms used in free-form
text drops exponentially, i.e., people tend to use a relatively small number of
words extremely frequently, and a large number of words very rarely.
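The rank-frequency data underlying a Zipf plot can be computed directly from token counts; a minimal sketch (our own helper, not the product's output format):

```python
from collections import Counter

def zipf_table(docs, top=20):
    """Rank terms by collection frequency, the quantity plotted on a
    Zipf plot. docs: list of token lists. Returns (rank, term, freq)."""
    cf = Counter()
    for tokens in docs:
        cf.update(tokens)                # collection frequency
    return [(rank, term, freq)
            for rank, (term, freq) in enumerate(cf.most_common(top), start=1)]
```

Plotting frequency against rank (often on log-log axes) makes the steep drop predicted by Zipf's law visible.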

Keep Show documents summary selected and check Keep a short excerpt under
Documents. Analytic Solver Data Mining will produce a table displaying the
document ID, the length of the document, the number of terms and the first 20
characters of the text of each document.



Select all plots under Concept Extraction to produce various plots in the output.
Select Write text mining model under Text Miner Model to write the model to the
output sheets.

Click the Finish button to run the Text Mining analysis. Result worksheets are
inserted to the right.

Select the TM_Output tab. The Term Count table shows that the original term
count in the documents was reduced by 16.02% through the removal of
stopwords and excluded terms, synonym and phrase substitution, and the other
specified preprocessing procedures.

Scroll down to the Documents table. This table lists each document with its
length and number of terms; if Keep a short excerpt is selected on the Output
Options tab and a value is present for Number of characters, an excerpt from
each document is also displayed.

Double click TM_TDM to display the Term – Document Matrix. As discussed
above, this matrix lists the 200 most frequently appearing terms across the top
and the document IDs down the left. A portion of this table is shown below. If a
term appears in a document, a weight is placed in the corresponding column
indicating the importance of the term, using our selection of TF-IDF on the
Representation dialog.

Click the TM_Vocabulary tab to view the Final List of Terms table. This table
contains the top 20 terms occurring in the document collection, the number of
documents that include the term and the top 5 document IDs where the
corresponding term appears most frequently. In this list we see terms such as
“car”, “power”, “engine”, “drive”, and “dealer” which suggests that many of the
documents, even the documents from the electronic newsgroup, were related to
autos.

When you click on the TM_Vocabulary tab, the Zipf Plot opens. We see that
our collection of documents obeys the power law stated by Zipf (see above). As
we move from left to right on the graph, the frequencies of the terms (ranked
from most frequent to least frequent) drop quite steeply. Hover over each data
point to see detailed information about the term corresponding to that data
point.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select the desired worksheet, in this case TM_Vocabulary, for
Worksheet and the desired chart for Chart.



The term “numbertoken” is the most frequently occurring term in the document
collection appearing in 223 documents (out of 300), 1,083 times total. Compare
this to a less frequently occurring term such as "think" which appears in only 64
documents and only 82 times total.

Click the TM_LSASummary tab to view the Concept Importance and Term
Importance tables. The first table, the Concept Importance table, lists each
concept, its singular value, the cumulative singular value and the % singular
value explained. (The number of concepts extracted is the minimum of the
number of documents (300) and the number of terms (limited to 200).) These
values are used to determine which concepts should be used in the Concept –
Document Matrix, Concept – Term Matrix and the Scree Plot, according to the
user’s selection on the Representation tab. In this example, we entered “20” for
Maximum number of concepts.



The Term Importance table lists the 200 most important terms. (To increase the
number of terms from 200, enter a larger value for Maximum Vocabulary on the
Pre-processing tab of Text Miner.)

When you click the TM_LSASummary tab, the Scree Plot opens. This plot
gives a graphical representation of the contribution or importance of each
concept. The largest “drop” or “elbow” in the plot appears between the 1st and
2nd concepts. This suggests that the top concept explains the leading topic in
our collection of documents; any remaining concepts have significantly
reduced importance. However, we can always select more than one concept to
increase the accuracy of the analysis. It is advisable to examine the Concept
Importance table, and the “Cumulative Singular Value” column in particular, to
identify how many top concepts capture enough information for your
application.

Click TM_LSA_CDM to display the Concept – Document Matrix. This matrix
displays the top concepts (as selected on the Representation tab) along the top of
the matrix and the documents down the left side of the matrix.



When you click on the TM_LSA_CDM tab, the Concept-Document Scatter Plot
opens. This graph is a visual representation of the Concept – Document
matrix. Note that Analytic Solver Data Mining normalizes each document
representation so it lies on a unit hypersphere. Documents that appear in the
middle of the plot, with concept coordinates near 0, are not explained well by
either of the shown concepts. The further a coordinate’s magnitude is from
zero, the more effect that particular concept has on the corresponding document.
In fact, two documents placed at the extremes of a concept (one close to -1 and
the other to +1) indicate strong differentiation between these documents in
terms of the extracted concept. This provides a means of understanding the
actual meaning of a concept, and of investigating which concepts have the
largest discriminative power when used to represent the documents from the
text collection.

You can examine all extracted concepts by changing the axes of the scatter plot:
click the down-pointing arrow next to Concept 1 to change the concept on the X
axis, or the right-pointing arrow next to Concept 2 to change the concept on the
Y axis. Use your touchscreen or your mouse scroll wheel to zoom in and out.

Double click TM_LSA_CTM to display the Concept – Term Matrix which lists
the top 20 most important concepts along the top of the matrix and the top 200
most frequently appearing terms down the side of the matrix.

When you click on the TM_LSA_CTM tab, the Term-Concept Scatter Plot
opens. This graph is a visual representation of the Concept – Term Matrix. It
displays all terms from the final vocabulary in terms of two concepts. Like the
Concept-Document scatter plot, the Concept-Term scatter plot visualizes the
distribution of vocabulary terms in the semantic space of meaning extracted
with LSA. The coordinates are also normalized, so the range of the axes is
always [-1,1], where extreme values (close to +/-1) highlight the importance or
“load” of each term on a particular concept. Terms appearing in a
zero-neighborhood of the concept range do not contribute much to the concept’s
definition. In our example, if we identify a concept whose terms can be divided
into two groups, one related to “Autos” and the other to “Electronics”, and these
groups are distant from each other on the axis corresponding to this concept,
this would provide strong evidence that this particular concept “caught” some
pattern in the text collection that is capable of discriminating the topic of an
article. Therefore, the Term-Concept scatter plot is an extremely valuable tool
for examining and understanding the main topics in a collection of documents,
finding similar words that indicate a similar concept, or finding terms that
explain a concept from “opposite sides” (e.g., term1 can be related to cheap,
affordable electronics and term2 to expensive luxury electronics).

Recall that if you want to examine a different pair of concepts, click the down-
pointing arrow next to Concept 1 or the right-pointing arrow next to Concept 2
to change the concepts on either axis. Use your touchscreen or mouse wheel to
zoom in or out.

The TFIDF_Stored and LSA_Stored output sheets are used to process new
documents using an existing text mining model. See the section below,
Processing New Documents Based on an Existing Text Mining Model, to find
out how to score new text documents using an existing Text Mining model.
Note: When adding additional documents to an existing text mining model,
Analytic Solver Data Mining will not extract new terms or phrases from these
new documents. Rather, Analytic Solver Data Mining will first use the
vocabulary from the model to build a Term-Document Matrix and then, if
requested, will use transformation matrices to map documents in the new data
onto the existing semantic space extracted from the “base” model. Please see
below for an example explaining how to add additional documents to an existing
Text Mining model.
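In LSA terms, mapping new documents onto an existing semantic space is a projection through the stored SVD factors. Here is a minimal sketch under the usual rank-k truncated SVD convention; the matrix names (U_k, Sigma_k) are the standard LSA ones, not identifiers from Analytic Solver's stored model sheets.

```python
def map_to_concepts(doc_vector, u_k, sigma_k):
    """Project a new document's term vector into the existing semantic
    space: d_hat = Sigma_k^-1 * U_k^T * d.
    u_k: one row per vocabulary term, each with k concept loadings;
    sigma_k: the k retained singular values;
    doc_vector: weighted term frequencies over the SAME base vocabulary
    (no new terms are extracted from the new documents)."""
    k = len(sigma_k)
    return [
        sum(u_k[t][j] * doc_vector[t] for t in range(len(u_k))) / sigma_k[j]
        for j in range(k)
    ]
```

This mirrors the note above: the new document is first vectorized against the base vocabulary (the TF-IDF model), then projected into concept space (the LSA model).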

From here, we can use any of the six classification algorithms to classify our
documents according to some term or concept, using the Term – Document
matrix, Concept – Document matrix or Concept – Term matrix, where each
document becomes a “record” and each concept becomes a “variable”. If we
want to classify documents based on a binary variable such as Auto
email/non-Auto email, we would use either the Term – Document or
Concept – Document matrix. If we want to cluster or classify terms, we would
use the Term-Concept matrix. We could even use the transpose of the
Term – Document matrix, where each term would become a “record” and each
column a “feature”. See the Analytic Solver Data Mining User Guide for an
example model that uses the Logistic Regression classification method to create
a classification model using the Concept – Document matrix within
TM_LSA_CDM.
This concludes our example on how to use Analytic Solver Data Mining's new
Text Miner feature. This example has illustrated how Analytic Solver Data
Mining provides powerful tools for importing a collection of documents for
comprehensive text preprocessing, quantitation, and concept extraction, in order
to create a model that can be used to process new documents - all performed
without any manual intervention. When using Text Miner in conjunction with
our classification algorithms, Analytic Solver Data Mining can be used to
classify customer reviews as satisfied/not satisfied, determine which products
garnered the fewest negative reviews, extract the topics of articles, cluster the
documents/terms, etc. The applications for Text Miner are boundless.

Processing New Documents Based on an Existing
Text Mining Model
In Analytic Solver Data Mining, it’s possible to process new text documents
based on the existing Text Mining model(s) if the option Write Text Mining
Model is selected on the Output Options tab. Two models (TFIDF_Stored,
LSA_Stored) are created: for Term Frequency – Inverse Document Frequency
Vectorization (TF-IDF) and, if Concept Extraction was performed, for Latent
Semantic Analysis (LSA).
The TF-IDF model contains the information needed for processing the new text
documents based on the vocabulary inferred from the training corpus. All
preprocessing settings and stages will be applied to the text in new documents to
ensure the proper mapping to the baseline vocabulary. The LSA model contains
the information needed for “mapping” the Term-Document Matrices (TDM)
representing the vectorized collection of new documents onto existing latent
semantic space defined by the training documents. The below example
illustrates how to vectorize the set of new documents and extract the concepts
from them using the text models created in the previous section. Two hundred
additional text documents (100 each for electronics and autos) have been
extracted from the same Newsgroups dataset as used in the example above
(complete dataset downloadable from
http://www.cs.cmu.edu/afs/cs/project/theo-20/www/data/news20.html)
Note: The scoring of new data based on the TF-IDF model produces a Term-
Document Matrix given a collection of new documents. The scoring based on
the LSA model produces a Concept-Document Matrix (CDM) given a Term-
Document Matrix.
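The note above can be sketched in code. The following minimal illustration (not Analytic Solver's actual implementation) scores a new document against a stored TF-IDF model; the `stored_idf` dictionary, its term names, and its weights are all hypothetical.

```python
# Hypothetical stored TF-IDF model: the vocabulary learned on the training
# corpus, with one IDF weight per term (names and values are invented for
# this sketch; Analytic Solver stores its model on the TFIDF_Stored sheet).
stored_idf = {"alternator": 1.2, "battery": 0.8, "brake": 1.5}

def score_tfidf(tokens, idf):
    """Vectorize one new document against the stored vocabulary: terms
    outside the training vocabulary are ignored, mirroring the mapping
    onto the baseline vocabulary described above."""
    counts = {}
    for t in tokens:
        if t in idf:
            counts[t] = counts.get(t, 0) + 1
    return {t: counts.get(t, 0) * idf[t] for t in idf}

score_tfidf(["alternator", "battery", "battery", "spark"], stored_idf)
# "spark" is out of vocabulary and dropped; "brake" is absent, so it gets 0.
```

Each new document becomes one column of the resulting Term-Document Matrix; stacking those columns yields the TDM that the LSA model can then map to a Concept-Document Matrix.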
Click Help – Examples on the Data Mining ribbon, then select Examples from
the menu to open the Text Mining Example Documents.zip archive and extract
these files to a desired location. See the section Importing from a File Folder

Frontline Solvers Analytic Solver Data Mining Reference Guide 217


within the Sampling or Importing from a Database, Worksheet or File
Folder chapter for directions on extracting and importing the files into Analytic
Solver Data Mining.
Click Sample – Import from File Folder to open the Import From File System
dialog. Click Browse and navigate to the location of the additional electronics
text files. Set the file type to All Files (lower-right corner of the Browse dialog), then
select all 100 files in the folder. Click the >> button on the Import From File
System dialog to move all files to Selected Files. Repeat these steps to load the
additional auto documents in the Additional autos folder. You should now have
200 documents listed under Selected Files.
Select Sample from selected files, then enter 100 for Desired Sample
Size. Keep Write file paths selected for Output, then click OK. Recall that when
Write file paths is selected, pointers to the file locations are stored in
FileSampling. If Write file contents is selected, the content of each text
document will be written to a cell in FileSampling, up to a maximum of 32,767
characters.

Click OK. FileSampling1 is inserted into the Solver Task Pane. We will again
sort the documents by type (electronic or auto) by using Microsoft Excel’s Sort
functionality (on the Data menu).


Click Score on the Data Mining ribbon to bring up the Select New Data Sheet &
Stored Model Sheet dialog. Select Match By Name to match TextVar under
Variables In New Data with TextVar under Model Variables.


Notice that FileSampling1 has been selected for Worksheet under Data to be
Scored and TFIDF_Stored has been selected for Worksheet under Stored
Model. This example will vectorize new data according to the TF-IDF model
(i.e., produce a TDM – term-document matrix).
Alternatively, variables could be mapped by selecting TextVar under both
Selected Text Variables and Model Text Variables, then click Match Selected. If
Match Sequentially is used, Analytic Solver Data Mining will match variables in
the order that they appear. To unmatch a single pair of variables, highlight the
desired variables in the Model Text Variables list box and select Unmatch
Selected. To unmatch all variables, click Unmatch All.
Click OK to score the new documents using the existing model created in the
above example.

To extract concepts for new data based on the LSA model (i.e., produce a CDM –
Concept-Document matrix), we will score the term-document matrix. Click
Score on the Data Mining ribbon to bring up the Select New Data Sheet &
Stored Model Sheet dialog.

Select LSA_Stored for Worksheet under Stored Model. Select Match By Name
to match the terms from the Stored Model sheet (LSA_Stored) with the terms
from the term document matrix.

Click OK to score the term document matrix. The output is the Concept-
Document matrix.


Text Mining Options
The following options appear on each of the five different Text Miner dialogs:
Data Source, Models, Pre-Processing, Representation, and Output Options.

Variables
Variables contained in this listbox are text variables included within a dataset
with at least one column that contains free-form text (or file paths to documents
containing free-form text), and optionally other columns that contain traditional
structured data.

First Row Contains Headers


Select this option if the first row of your dataset contains headers for your data.
This option is selected by default.


Selected Text Variables
Variables contained in this listbox have been selected from the Variables listbox
as inputs to Text Miner.

Text variables contain file paths


Select this option if the text variables within your dataset contain “pointers” or
paths to a text document or collection of text documents. If your dataset
contains such pointers, this option will be selected by default.

Selected Non-Text Variables


Variables contained in this listbox have been selected from the Variables listbox
as non-text inputs to Text Miner, i.e., numeric variables.

Map text variables to an existing model


Select this option if processing new text documents based on an existing text
model. Once this option is selected, options on the model dialog will be
enabled, and options on both the Pre-Processing and Representation tabs will be
disabled. Options used and defined in the existing “base” Text Miner model
(created in the previous section) will be prefilled. Vocabulary for the new
collection is defined in the existing base model and will be used to map the new
documents to the existing space of terms. Some preprocessing options that affect
vocabulary reduction and term normalization, are not applicable in this mode
and are not prefilled. To change any of these options, you must create a new
“baseline” Text Miner model.


Select Model Worksheet
The TM_Model output sheet should be automatically selected. If there are
multiple TM_Model output sheets within the same Workbook, click the down
arrow and select the desired one.

Select Model Workbook


The current workbook will be automatically prefilled. If multiple workbooks
are open, click the down arrow to select the workbook containing the desired
model.

Selected Text Variables


Variables contained in this listbox are text variables included within a dataset,
with at least one column that contains free-form text (or file paths to documents
containing free-form text), and optionally other columns that contain traditional
structured data.

Model Text Variables


Text variables included in this listbox are existing text variables already
included in the existing or “base” Text Miner model.

Match Selected
Select one text variable from the Selected Text Variables and Model Text
Variables listbox, then select Match Selected to manually map variables from
the dataset to the existing model. The match will appear under Model Text
Variables.

Unmatch Selected
Select a set of matched variables under Model Text Variables and click
Unmatch Selected to unmatch the pair.

Unmatch All
Click Unmatch All, to unmatch all previously matched variables under Model
Text Variables.

Match By Name
Click Match By Name to match all variables in the Selected Text Variables
listbox with variables of the same name in the Model Text Variables listbox.

Match Sequentially
Click Match Sequentially to match all variables, in order as listed, in the
Selected Text Variables listbox with variables, in order as listed, in the Model
Text Variables listbox.


Analyze All Terms
When this option is selected, Analytic Solver Data Mining will examine all
terms in the document. A “term” is defined as an individual entity in the text,
which may or may not be an English word. A term can be a word, number,
email, url, etc. This option is selected by default.

Analyze specified terms only


When this option is selected, Analytic Solver Data Mining will examine only
the terms you specify and disregard all others. When Analyze specified terms
only is selected, the Edit Terms button
will be enabled. If you click this button, the Edit Exclusive Terms dialog
opens. Here you can add and remove terms to be considered for text mining. All
other terms will be disregarded. For example, if we wanted to mine each
document for a specific part name such as “alternator” we would click Add Term
on the Edit Exclusive Terms dialog, then replace “New term” with “alternator”
and click Done to return to the Pre-Processing dialog. During the text mining
process, Analytic Solver Data Mining would analyze each document for the
term “alternator”, excluding all other terms.
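The behavior described above amounts to filtering tokens against an allow-list. A minimal sketch (illustrative only, not the product's implementation):

```python
def keep_exclusive_terms(tokens, exclusive):
    """Analyze specified terms only: discard every token that is not in
    the user-supplied term list."""
    allowed = set(exclusive)
    return [t for t in tokens if t in allowed]

tokens = ["the", "alternator", "failed", "alternator"]
keep_exclusive_terms(tokens, ["alternator"])  # → ["alternator", "alternator"]
```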

Start term/phrase
If this option is used, text appearing before the first occurrence of the Start
Phrase will be disregarded and similarly, text appearing after End Phrase (if
used) will be disregarded. For example, if text mining the transcripts from a
Live Chat service, you would not be particularly interested in any text appearing
before the heading “Chat Transcript” or after the heading “End of Chat
Transcript”. Thus you would enter “Chat Transcript” into the Start Phrase field
and “End of Chat Transcript” into the End Phrase field.


End term/phrase
If this option is used, text appearing after the first occurrence of the End
Phrase will be disregarded, just as text appearing before the Start Phrase (if
used) is disregarded. For example, if text mining the transcripts from a
Live Chat service, you would not be particularly interested in any text appearing
before the heading “Chat Transcript” or after the heading “End of Chat
Transcript”. Thus you would enter “Chat Transcript” into the Start Phrase field
and “End of Chat Transcript” into the End Phrase field.

Stopword removal
If selected (the default), over 300 commonly used words/terms (such as a, to,
the, and, etc.) will be removed from the document collection during
preprocessing. Click the Edit command button to view the list of terms. To
remove a word from the Stopword list, simply highlight the desired word, then
click Remove Stopword. To add a new word to the list, click Add Stopword; a
new term, “stopword”, will be added. Double-click it to edit.
Analytic Solver Data Mining also allows additional stopwords to be added or
existing to be removed via a text document (*.txt) by using the Browse button to
navigate to the file. Terms in the text document can be separated by a space, a
comma, or both. If we were supplying our three terms in a text document, rather
than in the Edit Stopwords dialog, the terms could be listed as: subject
emailterm from or subject,emailterm,from or subject, emailterm, from. If we
had a large list of additional stopwords, this would be the preferred way to enter
the terms.
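A stopword file in the format described above can be parsed with a single split on commas and/or whitespace. A sketch (illustrative, not the product's parser):

```python
import re

def load_stopwords(text):
    """Parse extra stopwords from a *.txt file whose terms may be
    separated by spaces, commas, or both."""
    return {t for t in re.split(r"[,\s]+", text) if t}

# All three layouts from the text yield the same stopword set:
a = load_stopwords("subject emailterm from")
b = load_stopwords("subject,emailterm,from")
c = load_stopwords("subject, emailterm, from")
```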

Click Done to close the Edit Stopwords dialog and return to the Pre-Processing
tab.


Exclusion list
If selected, terms entered into the Exclusion list will be removed from the
document collection. This is beneficial if all or a large number of documents in
the collection contain the same terms, for example, “from”, “to”, “subject” in a
collection of emails. If all documents contain the same terms, including them in
the analysis will not provide any benefit and could bias the analysis. Click Edit
to enter the terms to be excluded. The Edit Exclusion List dialog opens. Click
Add Exclusion Term. The label “exclusionterm” is added. Click to edit and
enter the desired term. Analytic Solver Data Mining will remove the terms
listed in this dialog from the document collection during pre-processing. To
remove a term from the exclusion list, highlight the term and click Remove
Exclusion Term.
We could have also entered these terms into a text document (*.txt) and added
the terms all at once by using the Browse button to navigate to the file and
import the list. Terms in the text document can be separated by a space, a
comma, or both. If, for example we were supplying excluded terms in a
document rather than in the Edit Exclusion List dialog, we would enter the terms
as: subject emailtoken from, or subject,emailtoken,from, or subject, emailtoken,
from. If we had a large list of terms to be excluded, this would be the preferred
way to enter the terms.

Click Done to close the dialog and return to Pre-Processing.

Vocabulary Reduction Advanced…


Analytic Solver Data Mining also allows the combining of synonyms and full
phrases by clicking Advanced within Vocabulary Reduction.

Synonym Reduction
Select Synonym reduction at the top of the dialog to replace synonyms such as
“car”, “automobile”, “convertible”, “vehicle”, “sedan”, “coupe”, “subcompact”,
and “jeep” with “auto”. Click Add Synonym and replace “rootterm” with the
term to be substituted, then replace “synonym list” with the list of synonyms,
i.e., car, automobile, convertible, vehicle, sedan, coupe. During pre-processing,
Analytic Solver Data Mining will replace the terms “car”, “automobile”,
“convertible”, “vehicle”, “sedan”, “coupe”, “subcompact” and “jeep” with the
term “auto”. To remove a synonym from the list, highlight the term and click
Remove Synonym.
If adding synonyms from a text file, each line must be of the form
rootterm:synonymlist or, using our example: auto:car automobile convertible
vehicle sedan coupe or auto:car,automobile,convertible,vehicle,sedan,coupe.
Note that the terms in the synonym list may be separated by a space, a comma,
or both. If we had a large list of synonyms, this would be the preferred way to
enter the terms.
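The rootterm:synonymlist format and the substitution it drives can be sketched as follows (an illustration of the behavior, not Analytic Solver's code):

```python
import re

def parse_synonyms(line):
    """Parse one line of a synonym file of the form rootterm:synonymlist,
    where synonyms may be separated by spaces, commas, or both."""
    root, _, rest = line.partition(":")
    return root.strip(), [t for t in re.split(r"[,\s]+", rest) if t]

def apply_synonyms(tokens, root, synonyms):
    """Replace every synonym token with its root term."""
    syn = set(synonyms)
    return [root if t in syn else t for t in tokens]

root, syns = parse_synonyms("auto:car,automobile,convertible")
apply_synonyms(["my", "car", "is", "a", "convertible"], root, syns)
# → ["my", "auto", "is", "a", "auto"]
```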

Phrase Reduction
Analytic Solver Data Mining also allows the combining of words into phrases
that indicate a singular meaning such as “station wagon” which refers to a
specific type of car rather than two distinct tokens – station and wagon. To add
a phrase in the Vocabulary Reduction – Advanced dialog, select Phrase
reduction and click Add Phrase. The term “phrasetoken” will appear. Click
to edit and enter the term that will replace the phrase, i.e., wagon. Click “phrase”
to edit and enter the phrase that will be substituted, i.e., “station wagon”. If
supplying phrases through a text file (*.txt), each line of the file must be of the
form phrasetoken:phrase or using our example, wagon:station wagon. If we had
a large list of phrases, this would be the preferred way to enter the terms.
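Phrase reduction is simply a text substitution applied before tokenization, so the multi-word phrase survives as one token. A sketch under that assumption:

```python
def apply_phrase(text, token, phrase):
    """Replace each occurrence of a multi-word phrase with a single
    token before tokenization, e.g. "station wagon" -> "wagon"."""
    return text.replace(phrase, token)

apply_phrase("a red station wagon", "wagon", "station wagon")
# → "a red wagon"
```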

Click Done to return to the Pre-processing tab.


Maximum vocabulary size
Analytic Solver Data Mining will reduce the final vocabulary to at most this
many of the most frequently occurring terms in the collection. The default is
1000.

Perform stemming
Stemming is the practice of stripping words down to their “stems” or “roots”, for
example, stemming terms such as “argue”, “argued”, “argues”, “arguing”, and
“argus” would result in the stem “argu”. However, “argument” and “arguments”
would stem to “argument”. The stemming algorithm utilized in Analytic Solver
Data Mining is “smart” in the sense that while “running” would be stemmed to
“run”, “runner” would not be. Analytic Solver Data Mining uses the Porter
Stemmer 2 algorithm for the English Language. For more information on this
algorithm, please see the Webpage: http://tartarus.org/martin/PorterStemmer/

Normalize case
When this option is checked, Analytic Solver Data Mining converts all text to a
consistent (lower) case, so that Term, term, TERM, etc. are all normalized to a
single token “term” before any processing, rather than creating three
independent tokens with different case. Without this normalization, case
variants are counted separately, which can dramatically distort the frequency
distributions of the corpus and bias the results.

Term Normalization Advanced…


Click Advanced in the Term Normalization group to open the Term
Normalization – Advanced dialog. This dialog allows us to replace or remove
nonsensical terms such as HTML tags, URLs, Email addresses, etc. from the
document collection. It’s possible to remove normalized terms completely by
including the normalized term (for example, “emailtoken”) in the Exclusion list.

Minimum stemmed term length


If stemming reduces a term’s length to two characters or fewer, Text Miner will
disregard the term. This option is selected by default.

Remove HTML tags


If selected, HTML tags will be removed from the document collection. HTML
tags and text contained inside these tags contain technical, computer-generated
information that is not typically relevant to the goal of the text mining
application. This option is not selected by default.

Normalize URL’s
If selected, URLs appearing in the document collection will be replaced with the
term, “urltoken”. URLs do not normally add any meaning, but it is sometimes
interesting to know how many URLs are included in a document. This option is
not selected by default.

Normalize email addresses


If selected, email addresses appearing in the document collection will be
replaced with the term, “emailtoken”. This option is not selected by default.


Normalize numbers
If selected, numbers appearing in the document collection will be replaced with
the term, “numbertoken”. This option is not selected by default.

Normalize monetary amounts


If selected, monetary amounts will be substituted with the term, “moneytoken”.
This option is not selected by default.
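The normalization options above are typically implemented as pattern substitutions. The regular expressions below are deliberately simple illustrations of the idea, not the patterns Analytic Solver actually uses:

```python
import re

def normalize(text):
    """Sketch of term normalization: replace URLs, email addresses,
    monetary amounts, and numbers with their normalized tokens."""
    text = re.sub(r"https?://\S+", "urltoken", text)          # Normalize URLs
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "emailtoken", text)
    text = re.sub(r"\$\d+(?:\.\d+)?", "moneytoken", text)     # money first,
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "numbertoken", text)  # then bare numbers
    return text

normalize("Email me@a.com about the $5 part, see http://x.y")
# → "Email emailtoken about the moneytoken part, see urltoken"
```

Ordering matters: URLs are replaced before emails and money before bare numbers so the broader pattern does not break apart the more specific one.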

Remove terms occurring in less than __% of documents
If selected, Text Miner will remove terms that appear in less than the percentage
of documents specified. For most text mining applications, rarely occurring
terms do not typically offer any added information or meaning to the document
in relation to the collection. The default percentage is 2%.

Remove terms occurring in more than __% of documents
If selected, Text Miner will remove terms that appear in more than the
percentage of documents specified. For many text mining applications, the goal
is identifying terms that have discriminative power or terms that will
differentiate between a number of documents. The default percentage is 98%.

Maximum term length


If selected, Text Miner will remove terms longer than the specified number of
characters. This option can be extremely useful for removing some parts of text
which are not actual English words, for example, URLs or computer-generated
tokens, or to exclude very rare terms such as Latin species or disease names, e.g.,
Pneumonoultramicroscopicsilicovolcanoconiosis.


Term-Document Matrix Scheme
A term-document matrix is a matrix that displays the frequency-based
information of terms occurring in a document or collection of documents. Each
column is assigned a term and each row a document. If a term appears in a
document, a weight is placed in the corresponding column indicating the term’s
importance or contribution. Analytic Solver Data Mining offers four
commonly used weighting schemes to represent each value in
the matrix: Presence/Absence, Term Frequency, TF-IDF (the default) and
Scaled term frequency.
• If Presence/Absence is selected, Analytic Solver Data Mining will enter
a 1 in the corresponding row/column if the term appears in the
document and 0 otherwise. This matrix scheme does not take into
account the number of times the term occurs in each document.
• If Term Frequency is selected, Analytic Solver Data Mining will count
the number of times the term appears in the document and enter this
value into the corresponding row/column in the matrix.
• The default setting – Term Frequency – Inverse Document Frequency
(TF-IDF) is the product of scaled term frequency and inverse
document frequency. Inverse document frequency is calculated by
taking the logarithm of the total number of documents divided by the
number of documents that contain the term. A high value for TF-IDF
indicates that a term that does not occur frequently in the collection of
documents taken as a whole, appears quite frequently in the specified
document. A TF-IDF value close to 0 indicates that the term appears
frequently in the collection or rarely in a specific document.


• If Scaled term frequency is selected, Analytic Solver Data Mining will
normalize (bring to the same scale) the number of occurrences of a term in
the documents (see the table below).
It’s also possible to create your own scheme by clicking the Advanced command
button to open the Term Document Matrix – Advanced dialog. Here users can
select their own choices for local weighting, global weighting, and
normalization. Please see the table below for definitions regarding options for
Term Frequency, Document Frequency and Normalization.

Local Weighting:
• Binary: $lw_{td} = 1$ if $tf_{td} > 0$, $0$ if $tf_{td} = 0$
• Raw frequency: $lw_{td} = tf_{td}$
• Logarithmic: $lw_{td} = \log(1 + tf_{td})$
• Augnorm: $lw_{td} = \left(\frac{tf_{td}}{\max_t tf_{td}} + 1\right)/2$

Global Weighting:
• None: $gw_t = 1$
• Inverse: $gw_t = \log_2 \frac{N}{1 + df_t}$
• Normal: $gw_t = \frac{1}{\sqrt{\sum_d tf_{td}^2}}$
• GF-IDF: $gw_t = \frac{cf_t}{df_t}$
• Entropy: $gw_t = 1 + \sum_d \frac{p_{td} \log p_{td}}{\log N}$
• IDF probability: $gw_t = \log_2 \frac{N - df_t}{df_t}$

Normalization:
• None: $n_d = 1$
• Cosine: $n_d = \frac{1}{\|\bar{g}_d\|_2}$

Notations:
• $tf_{td}$ – frequency of term $t$ in a document $d$;
• $df_t$ – document frequency of term $t$;
• $lw_{td}$ – local weighting of term $t$ in a document $d$;
• $gw_t$ – global weighting of term $t$;
• $n_d$ – normalization of the vector of terms representing the document $d$;
• $N$ – total number of documents in the collection;
• $cf_t$ – collection frequency of term $t$;
• $p_{td}$ – estimated probability of term $t$ to appear in a document $d$ ($p_{td} = tf_{td}/cf_t$);
• $\bar{g}_d$ – vector of terms representing the document $d$.

Finally, the element $T_{td}$ of the Term-Document Matrix is computed as $T_{td} = lw_{td} \cdot gw_t \cdot n_d$, $\forall t, d$.
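As a sketch, the composite weighting can be implemented for one common combination from the table (logarithmic local weight, inverse global weight, cosine normalization). This is illustrative code, not Analytic Solver's implementation:

```python
import math

def weighted_tdm(docs):
    """Build a Term-Document Matrix using logarithmic local weighting,
    inverse (IDF-style) global weighting, and cosine normalization,
    i.e. T_td = lw_td * gw_t * n_d for that choice of scheme."""
    vocab = sorted({t for d in docs for t in d})
    N = len(docs)
    df = {t: sum(t in d for d in docs) for t in vocab}        # document frequency
    gw = {t: math.log2(N / (1 + df[t])) for t in vocab}       # inverse global weight
    cols = []
    for d in docs:
        raw = [math.log(1 + d.count(t)) * gw[t] for t in vocab]
        length = math.sqrt(sum(x * x for x in raw)) or 1.0    # cosine normalization
        cols.append([x / length for x in raw])
    return vocab, cols  # cols[i][j] = weight of vocab[j] in document i
```

After cosine normalization, each nonzero document vector has unit length, which is what places documents on a unit hypersphere in the scatter plots described later.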

Perform latent semantic indexing


When this option is selected, Analytic Solver Data Mining will use Latent
Semantic Indexing (LSI) to detect patterns in the associations between terms and
concepts to discover the meaning of the document.
The statistics produced and displayed in the Term-Document Matrix contain
basic information on the frequency of terms appearing in the document
collection. With this information we can “rank” the significance or importance
of these terms relative to the collection and particular document. Latent
Semantic Indexing, in comparison, uses singular value decomposition (SVD) to
map the terms and documents into a common space to find patterns and
relationships. For example: if we inspected our document collection, we might
find that each time the term “alternator” appeared in an automobile document,
the document also included the terms “battery” and “headlights”. Or each time
the term “brake” appeared in an automobile document, the terms “pads” and
“squeaky” also appeared. However there is no detectable pattern regarding the
use of the terms “alternator” and “brake”. Documents including “alternator”
might not include “brake” and documents including “brake” might not include
“alternator”. Our four terms, battery, headlights, pads, and squeaky describe
two different automobile repair issues: failing brakes and a bad
alternator. Latent Semantic Indexing will attempt to 1. Distinguish between
these two different topics, 2. Identify the documents that deal with faulty
brakes, alternator problems or both and 3. Map the terms into a common
semantic space using singular value decomposition. SVD is a tool used by Text
Miner to extract concepts that explain the main dimensions of meaning of the
documents in the collection. The results of LSA are usually hard to examine
because the method cannot fully explain how the concept representation was
constructed. Making sense of LSA results is more an art than a science;
Analytic Solver Data Mining provides several visualizations that greatly
simplify this process.
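Conceptually, the SVD step can be sketched with NumPy (assuming NumPy is available; the matrix values and term labels below are invented for illustration):

```python
import numpy as np

# A tiny illustrative term-document matrix (rows = terms, cols = documents).
T = np.array([
    [2.0, 0.0, 1.0],   # "battery"
    [1.0, 0.0, 2.0],   # "alternator"
    [0.0, 3.0, 0.0],   # "brake"
])

# Truncated SVD: keep the k leading concepts (singular triplets).
U, s, Vt = np.linalg.svd(T, full_matrices=False)
k = 2
term_concept = U[:, :k]                      # Term-Concept matrix
concept_doc = np.diag(s[:k]) @ Vt[:k, :]     # Concept-Document matrix
```

The singular values in `s` are the concept importances; truncating to the leading `k` projects both terms and documents into the shared k-dimensional semantic space.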

Concept Extraction – Latent Semantic Indexing


Select Automatic, Maximum number of concepts or Minimum percentage
explained.
• If Automatic is selected, Analytic Solver Data Mining will calculate the
importance of each concept, take the difference between each pair of
consecutive concepts, and keep the concepts above the largest difference.
For example, if three concepts were identified (Concept1, Concept2, and
Concept3) and given importance factors of 10, 8, and 2, respectively,
Analytic Solver Data Mining would keep Concept1 and Concept2, since the
difference between Concept2 and Concept3 (8-2=6) is larger than the
difference between Concept1 and Concept2 (10-8=2).
• If Maximum number of concepts is selected, Analytic Solver Data Mining
will retain the top number of concepts according to the value entered here.
The default is 2 concepts.
• If Minimum percentage explained is selected, Analytic Solver Data Mining
will identify the concepts with singular values that, when taken together,
sum to the minimum percentage explained; 90% is the default.
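The Automatic rule (keep everything above the largest consecutive drop in importance) can be sketched as follows, an illustration rather than the product's exact logic:

```python
def automatic_concept_count(importances):
    """Return how many leading concepts to keep: everything before the
    largest consecutive drop. Importances must be in decreasing order."""
    gaps = [importances[i] - importances[i + 1]
            for i in range(len(importances) - 1)]
    if not gaps:
        return len(importances)
    return gaps.index(max(gaps)) + 1

automatic_concept_count([10, 8, 2])  # → 2, matching the example above
```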


Term-Document Matrix
The term-document matrix is a matrix that displays the most frequently
occurring terms across the top of the matrix and the document IDs down the
left. If a term appears in a document, a weight is placed in the corresponding
column indicating the importance of the term using our selection of TF-IDF on
the Representation dialog. The number of terms contained in the matrix is
controlled by the Maximum vocabulary size option on the Pre-Processing tab.
The number of documents is equal to the number of documents in the sample.
Analytic Solver Data Mining offers four different commonly used methods for
ranking the number of times a term appears in a document on the Pre-Processing
tab: Presence/Absence, Term Frequency, TF-IDF (the default) and Scaled term
frequency. This matrix is selected by default.

Concept-Document Matrix
The Concept – Document Matrix is enabled when Perform latent semantic
indexing is selected on the Representation tab. The most important concepts
will be listed across the top of the matrix and the documents will be listed down
the left side of the matrix. The number of concepts is controlled by the setting
for Concept Extraction – Latent Semantic indexing on the Representation tab:
Automatic, Maximum number of concepts, or Minimum percentage explained. If
a concept appears in a document, the singular value decomposition weight is
placed in the corresponding column indicating the importance of the concept in
the document. If Perform latent semantic indexing is selected, this option will
also be selected by default.


Term-Concept Matrix
The Term – Concept matrix is enabled when Perform latent semantic indexing is
selected on the Representation tab. The most important concepts will be listed
across the top of the matrix and the most frequently occurring terms will be
listed down the left. The number of most important concepts is controlled by
the setting for Concept Extraction – Latent Semantic indexing option on the
Representation tab: Automatic, Maximum number of concepts, or Minimum
percentage explained. The number of terms in the matrix is controlled by the
Maximum vocabulary size on the Pre-Processing tab. If a term appears in a
concept, the singular value decomposition weight is placed in the corresponding
column indicating the importance of the term in the concept. This option is not
selected by default.

Term frequency table


The Term frequency table displays the most frequently occurring terms in the
document collection according to the value entered for Most frequent terms.
The first column of the table, Collection Frequency, displays the number of
times the term appears in the collection. The second column of the table, Document
Frequency, displays the number of documents that include the term. The third
column in the table, Top Documents, displays the top documents where the
corresponding term appears the most frequently according to the Maximum
corresponding documents setting (see below). This option is selected by default.

Most frequent terms


This option is enabled only when Term frequency table is selected. This option
controls the number of terms displayed in the Term frequency table and Zipf’s
plot. This option is selected by default with a value of “10” terms.

Full vocabulary
This option is enabled only when Term frequency table is selected. If selected,
the full vocabulary list will be displayed in the term frequency table.

Maximum corresponding documents


This option is enabled only when Most frequent terms is selected. This option
controls the number of documents displayed in the third column of the Term
frequency table. This option is not selected by default.

Zipf’s plot
The Zipf Plot graphs the document frequency against the term ranks (or terms
ranked in order of importance). Typically, term frequencies in a document
collection follow Zipf’s law, which states that a term’s frequency in free-form
text drops off sharply with its rank (roughly in inverse proportion). In other
“words” (pun intended), we tend to use a few words a lot and most words very
rarely. Hover over each
point in the plot to see the most frequently occurring terms in the document
collection. This option is not selected by default.
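The rank-frequency data behind a Zipf plot is easy to compute; a sketch on a toy token list (the tokens are invented for illustration):

```python
from collections import Counter

def rank_frequencies(tokens):
    """Term frequencies sorted by rank, the data behind a Zipf plot."""
    return [c for _, c in Counter(tokens).most_common()]

rank_frequencies("the car the brake the car pads".split())
# Frequencies fall off quickly with rank, as Zipf's law predicts.
```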


Show documents summary
If selected, Analytic Solver Data Mining will produce a Documents table
displaying the document ID, length of the document, and number of terms
included in each document. This option is selected by default.

Keep a short excerpt. Number of characters


If selected, Analytic Solver Data Mining will produce a fourth column in the
Documents table displaying the first N number of characters in the document.
This option is not selected by default, but if selected, the default number of
characters is “20”.

Scree Plot
This plot gives a graphical representation of the contribution or importance of
each concept according to the setting for Maximum number of concepts. Find
the largest “drop” or “elbow” in the plot to discover the leading topics in the
document collection. When moving from left to right on the x-axis, the
importance of each concept will diminish. This information may be used to
limit the number of concepts (as variables) used as inputs into a classification
model. This option is not selected by default.

Maximum number of concepts


If Scree Plot is enabled, Maximum number of concepts is enabled. Enter the
number of concepts to be displayed in the Scree Plot here.

Terms scatter plot


This graph is a visual representation of the Concept – Term Matrix. It displays
all terms from the final vocabulary in terms of two concepts. Similarly to
Concept-Document scatter plot, Concept-Term scatter plot visualizes the
distribution of vocabulary terms in the semantic space of meaning extracted with
LSA. The coordinates are also normalized, so the range of axes is always [-1,1],
where extreme values (close to +/-1) highlight the importance or “load” of each
term to a particular concept. Terms appearing in a near-zero neighborhood of a
concept axis do not contribute much to that concept’s definition. In our example,
suppose we identified a concept whose terms divide into two groups, one related
to “Autos” and the other to “Electronics”, and these groups were distant from
each other on the axis corresponding to this concept. This would provide
evidence that the concept “caught” a pattern in the text collection capable of
discriminating the topic of an article. Therefore, the Term-Concept scatter plot
is an extremely valuable tool for examining and understanding the main topics
in a collection of documents, finding similar words that indicate a similar
concept, or finding terms that explain a concept from “opposite sides” (e.g.,
term1 related to cheap, affordable electronics and term2 related to expensive
luxury electronics). This
option is not selected by default.

Document scatter plot


This graph is a visual representation of the Document – Concept matrix. Note that
Analytic Solver Data Mining normalizes each document representation so it lies
on a unit hypersphere. Documents that appear in the middle of the plot, with

Frontline Solvers Analytic Solver Data Mining Reference Guide 235


concept coordinates near 0, are not explained well by either displayed concept.
The further a coordinate’s magnitude is from zero, the greater the effect of
that concept on the corresponding document. Two documents placed at the extremes
of a concept (one close to -1 and the other to +1) indicate strong
differentiation between those documents in terms of the extracted concept. This
provides a means for understanding the actual meaning of a concept and for
investigating which concepts have the greatest discriminative power when used to
represent the documents in a text collection. This option is not selected by
default.

Concept importance
This table displays the total number of concepts extracted, the Singular Value
for each, the Cumulative Singular Value, and the % of Singular Value explained,
which is used when Minimum percentage explained is selected for Concept
Extraction – Latent Semantic Indexing on the Representation tab. This option is
not selected by default.

Term Importance
This table displays each term along with its Importance as calculated by
singular value decomposition. This option is not selected by default.

Write Text Mining Model


Select this option under Text Miner Model to write the baseline or “base
corpus” model to an output sheet. The base corpus model can be used to process
new documents based on the existing text mining model. This option is not
selected by default.



Exploring a Time Series Dataset

Introduction
Time series datasets contain a set of observations generated sequentially in time.
Organizations of all types and sizes utilize time series datasets for analysis
and forecasting, for example to predict next year’s sales figures, raw material
demand, or monthly airline bookings.

Example of a time series dataset: Monthly airline bookings.

A time series model is used first to obtain an understanding of the underlying
forces and structure that produced the data, and second to fit a model that will
predict future behavior. In the first step, the analysis of the data, a model is
created to uncover seasonal patterns or trends in the data, for example, bathing
suit sales in June. In the second step, forecasting, the model is used to
predict the value of the data in the future, for example, next year’s bathing
suit sales. Separate modeling methods are required to create each type of model.
Analytic Solver Data Mining features three techniques for exploring trends in a
dataset, ACF (Autocorrelation function), ACVF (Autocovariance of data) and
PACF (Partial autocorrelation function). These techniques help the user to
explore various patterns in the data which can be used in the creation of the
model. After the data is analyzed, a model can be fit to the data using Analytic
Solver Data Mining’s ARIMA method.

Autocorrelation (ACF)
Autocorrelation (ACF) is the correlation between neighboring observations in a
time series. When determining if an autocorrelation exists, the original time
series is compared to the “lagged” series. This lagged series is simply the
original series moved one time period forward (x_n vs. x_{n+1}). Suppose there
are 5 time-based observations: 10, 20, 30, 40, and 50. When lag = 1, the original
series is moved forward one time period. When lag = 2, the original series is
moved forward two time periods.



Day   Observed Value   Lag-1   Lag-2
 1          10
 2          20            10
 3          30            20       10
 4          40            30       20
 5          50            40       30
The autocorrelation is computed according to the formula:

r_k = Σ_{t=k+1..n} (Y_t − Ȳ)(Y_{t−k} − Ȳ) / Σ_{t=1..n} (Y_t − Ȳ)²,  where k = 0, 1, 2, …, n

where Y_t is the Observed Value at time t, Ȳ is the mean of the Observed Values,
and Y_{t−k} is the value for Lag-k.
For example, using the values above, the autocorrelation for Lag-1 and Lag-2 can
be calculated as follows.

Ȳ = (10 + 20 + 30 + 40 + 50) / 5 = 30

r_1 = ((20 − 30)(10 − 30) + (30 − 30)(20 − 30) + (40 − 30)(30 − 30) + (50 − 30)(40 − 30)) /
((10 − 30)² + (20 − 30)² + (30 − 30)² + (40 − 30)² + (50 − 30)²) = 0.4

r_2 = ((30 − 30)(10 − 30) + (40 − 30)(20 − 30) + (50 − 30)(30 − 30)) /
((10 − 30)² + (20 − 30)² + (30 − 30)² + (40 − 30)² + (50 − 30)²) = −0.1
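As a quick check of these hand calculations, the lag-k autocorrelation formula can be sketched in a few lines of Python (illustrative only; this is not part of Analytic Solver, and the function name is our own):

```python
def acf(y, k):
    """Lag-k autocorrelation r_k of series y, per the formula above."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t - k] - mean) for t in range(k, n))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

series = [10, 20, 30, 40, 50]
print(acf(series, 1))  # 0.4
print(acf(series, 2))  # -0.1
```

The results match the worked values above: r_1 = 0.4 and r_2 = −0.1.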
The two red horizontal lines on an ACF plot delineate the Upper Confidence Level
(UCL) and the Lower Confidence Level (LCL). If the data is random, the plot
should stay within the UCL and LCL. If the plot exceeds either of these two
levels, it can be presumed that some correlation exists in the data.

Partial Autocorrelation Function (PACF)


This technique is used to compute and plot the partial autocorrelations between
the original series and its lags. The partial autocorrelation at a given lag
measures the correlation remaining after the linear dependence on all shorter
lags has been removed.
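One standard way to obtain partial autocorrelations from the autocorrelations is the Durbin-Levinson recursion. The sketch below is illustrative only and is not necessarily the algorithm Analytic Solver uses:

```python
def pacf(r, m):
    """Partial autocorrelations phi_kk for k = 1..m, computed from the
    autocorrelations r (with r[0] = 1) via the Durbin-Levinson recursion.
    Assumes m >= 1 and len(r) > m."""
    phi = [[0.0] * (m + 1) for _ in range(m + 1)]
    phi[1][1] = r[1]
    pac = [phi[1][1]]
    for k in range(2, m + 1):
        num = r[k] - sum(phi[k - 1][j] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1][j] * r[j] for j in range(1, k))
        phi[k][k] = num / den
        for j in range(1, k):
            phi[k][j] = phi[k - 1][j] - phi[k][k] * phi[k - 1][k - j]
        pac.append(phi[k][k])
    return pac

# autocorrelations of the 10, 20, ..., 50 example: r0 = 1, r1 = 0.4, r2 = -0.1
print(pacf([1.0, 0.4, -0.1], 2))  # [0.4, -0.3095...]
```

Note that the lag-1 partial autocorrelation always equals the lag-1 autocorrelation; higher lags are adjusted for the shorter lags.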

Autocovariance of Data (ACVF)


Autocovariance is the covariance of the time series with itself at pairs of time
points.



ARIMA
An ARIMA (autoregressive integrated moving average) model is a regression-type
model that includes autocorrelation. The basic assumption in estimating the
ARIMA coefficients is that the data are stationary, that is, that the mean and
variance are constant over time, with no trend or seasonality. This is generally
not true of raw data, so to achieve stationarity, Analytic Solver Data Mining
first applies “differencing”: ordinary, seasonal or both.

After Analytic Solver Data Mining fits the model, various results will be
available. The quality of the model can be evaluated by comparing the time plot
of the actual values with the forecasted values: if the two curves are close,
the model can be assumed to be a good fit. The model should capture any trend
and seasonality present in the data. If the residuals are random, the model fits
well; however, if the residuals exhibit a trend, the model should be refined.
Fitting an ARIMA model with parameters (0,1,1) will
give the same results as exponential smoothing. Fitting an ARIMA model with
parameters (0,2,2) will give the same results as double exponential smoothing.
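To make the mechanics concrete, here is a rough Python sketch of fitting an ARIMA(1,1,0) model, the form used in the example below, by ordinary least squares on the first differences. This is illustrative only: the function names are our own, and Analytic Solver's own estimation procedure is more sophisticated, so its coefficients will differ.

```python
def fit_arima_110(y):
    """Rough ARIMA(1,1,0) fit: difference the series once (d = 1), then
    estimate a constant and the AR(1) coefficient on the differences by
    ordinary least squares.  A sketch only, not the product's estimator."""
    d = [y[i] - y[i - 1] for i in range(1, len(y))]
    x, z = d[:-1], d[1:]                      # (lagged diff, current diff) pairs
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    phi = (sum((a - mx) * (b - mz) for a, b in zip(x, z))
           / sum((a - mx) ** 2 for a in x))   # AR(1) slope
    const = mz - phi * mx
    return const, phi

def one_step_forecast(y, const, phi):
    """Next value = last value + predicted next difference."""
    return y[-1] + const + phi * (y[-1] - y[-2])
```

Because the model works on differences, the forecast is built by adding the predicted next difference back onto the last observed level.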

Partitioning
To avoid overfitting the data and to be able to evaluate the predictive
performance of the model on new data, we must first partition the data into
training and validation sets using Analytic Solver Data Mining’s time series
partitioning utility. After the data is partitioned, ACF, PACF, and ARIMA can
be applied to the dataset.

Examples for Time Series Analysis


The examples below illustrate how Analytic Solver Data Mining can be used to
explore the Income.xlsx dataset to uncover trends and seasonalities in a dataset.
Click Help – Examples on the Data Mining ribbon, then Forecasting/Data
Mining Examples.
This dataset contains the average income of tax payers by state.
Typically, the following steps are performed in a time series analysis.
1. The data is first partitioned into two sets, with 60% of the data assigned to
the training set and 40% assigned to the validation set.
2. Exploratory techniques are applied to both the training and validation sets.
If the ACF and PACF plots are similar for both sets, then the same model can
be fit to each.
3. The model is fit using the ARIMA method.
4. When a model is fit using the ARIMA method, Analytic Solver displays the ACF
and PACF plots for the residuals. If these plots lie within the UCL and LCL
bands, the residuals are random and the model is adequate. If the residuals
are not within the bands, some correlation exists and the model should be
improved.



First we must perform a partition on the data. Click Partition within the Time
Series group on the Data Mining ribbon to open the following dialog.
Select Year under Variables and click > to define the variable as the Time
Variable. Select the remaining variables under Variables and click > to include
them in the partitioned data.
Select Specify #Records under Specify Partitioning Options to specify the
number of records assigned to the training and validation sets. Then select
Specify #Records under Specify #Records for Partitioning. Enter 50 for the
number of Training Set records and 21 for the number of Validation Set records.
If Specify Percentages is selected under Specify Partitioning Options, Analytic
Solver Data Mining will assign a percentage of records to each set according to
the values entered by the user or automatically entered under Specify
Percentages for Partitioning.

Click OK. TSPartition is inserted to the right of the Income worksheet.

Note in the output above, the partitioning method is sequential (rather than
random). The first 50 observations have been assigned to the training set and
the remaining 21 observations have been assigned to the validation set.



Open the Lag Analysis dialog by clicking ARIMA – Lag Analysis. Select CA
under Variables in input data, then click > to move the variable to Selected
variable. Enter 1 for Minimum Lag and 40 for Maximum Lag under
Parameters: Training and 1 for Minimum Lag and 15 for Maximum Lag under
Parameters: Validation.
Under Charting, select ACF chart, ACVF chart, and PACF chart to include each
chart in the output.

Click OK. TS_Lags is inserted right of the TSPartition worksheet.

First, let's take a look at the ACF charts. Note on each chart, the autocorrelation
decreases as the number of lags increases. This suggests that a definite pattern
does exist in each partition. However, since the pattern does not repeat, it can be
assumed that no seasonality is included in the data. In addition, both charts
appear to exhibit a similar pattern.
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select TS_Lags for Worksheet and ACF/ACVF/PACF
Training/Validation Data for Chart.



The PACF functions show a definite pattern which means there is a trend in the
data. However, since the pattern does not repeat, we can conclude that the data
does not show any seasonality.
The screenshots below display the autocovariance values.

All three charts suggest that a definite pattern exists in the data, but no
seasonality. In addition, both datasets exhibit the same behavior in both the
training and validation sets which suggests that the same model could be
appropriate for each. Now we are ready to fit the model.
The ARIMA model accepts three parameters: p – the number of autoregressive
terms, d – the number of non-seasonal differences, and q – the number of lagged
errors (moving averages).
Recall that the ACF plot showed no seasonality in the data, with the
autocorrelation decreasing steadily as the number of lags increases. This
suggests setting q = 0, since there appear to be no lagged errors. The PACF plot
displayed a large value for the first lag but minimal values for successive
lags, which suggests setting p = 1. With most datasets, setting d = 1 is
sufficient, or is at least a starting point.
Click back to the TSPartition tab and then click ARIMA – ARIMA Model to
bring up the Time Series – ARIMA dialog.
Select CA under Variables in input data then click > to move the variable to the
Selected Variable field. Under Nonseasonal Parameters set Autoregressive (p)
to 1, Difference (d) to 1 and Moving Average (q) to 0.



Click Advanced to open the ARIMA – Advanced Options dialog. Select Fitted
Values and residuals, Produce forecasts, and Report Forecast Confidence
Intervals. The default Confidence Level setting of 95 is automatically entered.
The option Variance-covariance matrix is selected by default.

Click OK on the ARIMA-Advanced Options dialog and again on the Time Series
– ARIMA dialog. Analytic Solver Data Mining calculates and displays various
parameters and charts in four output sheets, Arima_Output, Arima_Fitted,
Arima_Forecast and Arima_Stored. Click the Arima_Output tab to view the
Output Navigator.



Click the ARIMA Model link on the Output Navigator to display the ARIMA Model
and Ljung-Box Test Results on Residuals.

Analytic Solver has calculated the constant term and the AR1 term for our
model, as seen above. These are the constant and φ1 terms of our forecasting
equation. See the following output of the Chi-square test.
The very small p-values for the constant term (1.119E-7) and the AR1 term
(1.19E-89) suggest that the model is a good fit to our data.
Click the Fitted link on the Output Navigator. This output shows the actual and
fitted values and the resulting residuals for the training partition. As shown in
the graph below, the Actual and Forecasted values match up fairly well. The
usefulness of the model in forecasting will depend upon how close the actual
and forecasted values are in the Forecast, which we will inspect later.

Use your mouse to select a point on the graph to compare the Actual value to the
Forecasted value.
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select Arima_Fitted for Worksheet and ACF/ACVF/PACF
Training/Validation Data for Chart.



Take a look at the ACF and PACF plots for the errors, found at the bottom of
ARIMA_Output. Analytic Solver includes one additional chart, the ACVF Plot for
the Residuals.

With the exception of Lag1, the majority of the lags in the PACF and ACF
charts are either clearly within the UCL and LCL bands or just outside of these
bands. This suggests that the residuals are random and are not correlated.
Click the Forecast link on the Output Navigator to display the Forecast Data
table and charts.



The table shows the actual and forecasted values along with LCI (Lower
Confidence Interval), UCI (Upper Confidence Interval) and Residual values.
The "Lower" and "Upper" values represent the lower and upper bounds of the
confidence interval. There is a 95% chance that the forecasted value will fall
into this range. The graph to the right plots the Actual values for CA against
the Forecasted values. Again, click any point on either curve to compare the
Actual against the Forecasted values.

Options for Exploring Time Series Datasets


The options described below appear on one of the three Time Series dialogs.

The options below appear on the Time Series Partition Data dialog.

Time variable
Select a time variable from the available variables and click the > button. If a
Time Variable is not selected, Analytic Solver will assign one to the partitioned
data.

Variables in the Partition Data


Select one or more variables from the Variables field by clicking on the
corresponding selection button.



Specify Partitioning Options
Select Specify Percentages to specify the percentage of the total number of
records desired in the Validation and Training sets. Select Specify # Records to
enter the desired number of records in the Validation and Training sets.

Specify Percentages for Partitioning


Select Automatic to have Analytic Solver automatically assign 60% of the records
to the Training set and 40% of the records to the Validation set. Select Specify
# Records under Specify Partitioning Options to manually select the number of
records to include in the Validation and Training sets. If Specify Percentages
is selected under Specify Partitioning Options, then select Specify Percentages
to specify the percentage of the total number of records to be included in the
Validation and Training sets.
The options below appear on the Lag Analysis dialog.

Variables in the input data


Select one or more variables from the Variables field by clicking on the
corresponding selection button.

Selected variable
The selected variable appears here.



Parameters: Training
Enter the minimum and maximum lags for the Training Data here. The # lags
for the Training set should be >= 1 and < N where N is the number of records in
the Training dataset.

Parameters: Validation
Enter the minimum and maximum lags for the Validation Data here. The # lags
for the Validation Data set should be >= 1 and < N where N is the number of
records in the Validation dataset.

Plot ACF Chart


If this option is selected, Analytic Solver plots the autocorrelations for the
selected variable.

Plot PACF Chart


If this option is selected, Analytic Solver plots the partial autocorrelations for
the selected variable.

Plot ACVF Chart


If this option is selected, Analytic Solver plots the Autocovariance of Data for
the selected variable.

The options below appear on the Time Series – ARIMA dialog.



Time Variable
The Time variable is automatically selected when using a partitioned dataset.
When using an unpartitioned dataset, select the desired Time variable by
clicking the > button.

Selected Variable
Select the desired variable to be included in the ARIMA model by clicking the >
button.

Fit seasonal model


Select this option to specify a seasonal model. The seasonal parameters are
enabled when this option is selected.

Period
If Fit seasonal model is selected, this option is enabled. Seasonality in a dataset
appears as patterns at specific periods in the time series.

Nonseasonal Parameters
Enter the nonseasonal parameters here for Autoregressive (p), Difference (d),
and Moving Average (q).



Seasonal Parameters
Enter the Seasonal parameters here for Autoregressive (P), Difference (D), and
Moving Average (Q).

The options below appear on the ARIMA – Advanced Options dialog.

Maximum number of iterations


Enter the maximum number of iterations here. The default is 200 iterations.

Fitted Values and residuals


Analytic Solver Data Mining will include the fitted values and residuals in the
output if this option is selected.

Variance-covariance matrix
Analytic Solver Data Mining will include the variance-covariance matrix in the
output if this option is selected. This option is selected by default.

Produce forecasts
If this option is selected, Analytic Solver Data Mining will display the desired
number of forecasts. If the data has been partitioned, Analytic Solver will
display the forecasts on the validation data.

Number of forecasts
If Produce forecasts is selected and a non-partitioned dataset is being used, this
option is enabled. The maximum number of forecasts is 100.

Confidence level for forecast confidence intervals

If this option is selected, enter the desired confidence level here. (The
default level is 95%.) The Lower and Upper values of the computed confidence
intervals will be included in the output. The forecasted value is expected to
fall within this range with the specified level of confidence.



Smoothing Techniques

Introduction
Data collected over time is likely to show some form of random variation.
"Smoothing techniques" can be used to reduce or cancel the effect of these
variations. These techniques, when properly applied, will “smooth” out the
random variation in the time series data to reveal any underlying trends that may
exist.
Analytic Solver Data Mining features four different smoothing techniques:
Exponential, Moving Average, Double Exponential, and Holt Winters. The first
two techniques, Exponential and Moving Average, are relatively simple
smoothing techniques and should not be performed on datasets involving
seasonality. The last two techniques are more advanced techniques which can
be used on datasets involving seasonality.

Exponential smoothing
Exponential smoothing is one of the more popular smoothing techniques due to
its flexibility, ease in calculation and good performance. As in Moving Average
Smoothing, a simple average calculation is used. Exponential Smoothing,
however, assigns exponentially decreasing weights starting with the most recent
observations. In other words, new observations are given relatively more weight
in the average calculation than older observations. Analytic Solver Data
Mining utilizes the formulas below in the Exponential Smoothing tool.

S_0 = x_0
S_t = α·x_{t-1} + (1 − α)·S_{t-1},  t > 0

where
• the original observations are denoted by {x_t}, starting at t = 0
• α is the smoothing factor, which lies between 0 and 1

As with Moving Average Smoothing, Exponential Smoothing should only be used when
the dataset contains no seasonality. The forecast will be a constant value,
which is the smoothed value of the last observation.
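The recurrence above can be sketched in a few lines of Python (illustrative only; the function name is our own):

```python
def exp_smooth(x, alpha):
    """Exponential smoothing per the formulas above:
    S_0 = x_0 and S_t = alpha*x_{t-1} + (1 - alpha)*S_{t-1}."""
    s = [x[0]]
    for t in range(1, len(x)):
        s.append(alpha * x[t - 1] + (1 - alpha) * s[-1])
    return s

smoothed = exp_smooth([10, 20, 30], 0.5)
print(smoothed)      # [10, 10.0, 15.0]
print(smoothed[-1])  # the constant-value forecast for all future periods
```

Note how each smoothed value blends the previous observation with the previous smoothed value, so the weight on older observations decays geometrically.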

Moving Average Smoothing


In this simple technique, each observation is assigned an equal weight, and
additional observations are forecasted using the average of the previous
observations. Given the time series X_1, X_2, X_3, …, X_t, this technique
predicts X_{t+k} using:

S_t = Average(X_{t-k+1}, X_{t-k+2}, …, X_t),  t = k, k+1, k+2, … N

where k is the smoothing parameter. Analytic Solver Data Mining allows a
parameter value between 2 and t − 1, where t is the number of observations in
the dataset. Care should be taken when choosing this parameter: a large value
will oversmooth the data, while a small value will undersmooth it. Using the
past three observations is often enough to predict the next observation. As
with Exponential Smoothing, this technique should not be applied when
seasonality is present in the dataset.
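A minimal sketch of the calculation in Python (illustrative only; not part of the product):

```python
def moving_average(x, k):
    """Smoothed value S_t = average of the k most recent observations.
    The forecast for the next period is the last smoothed value."""
    if not 2 <= k <= len(x) - 1:
        raise ValueError("k must lie between 2 and t - 1")
    return [sum(x[t - k + 1:t + 1]) / k for t in range(k - 1, len(x))]

print(moving_average([10, 20, 30, 40, 50], 3))  # [20.0, 30.0, 40.0]
```

The guard mirrors the parameter range described above: k must lie between 2 and t − 1.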



Double exponential smoothing
Double exponential smoothing can be defined as “exponential smoothing of
exponential smoothing”. As stated above, exponential smoothing should not be
used when the data includes seasonality. Double exponential smoothing, however,
introduces a second equation that includes a trend parameter, so this technique
can and should be used when a trend is inherent in the dataset, but not when
seasonality is present. Double exponential smoothing is defined as follows:

S_t = A_t + B_t,  t = 1, 2, 3, …, N
A_t = α·X_t + (1 − α)·S_{t-1},  0 < α <= 1
B_t = β·(A_t − A_{t-1}) + (1 − β)·B_{t-1},  0 < β <= 1

The forecast equation is:

X_{t+k} = A_t + k·B_t,  k = 1, 2, 3, …

where α denotes the Alpha parameter and β denotes the Trend parameter. Analytic
Solver Data Mining allows these two parameters to be entered manually. In
addition, Analytic Solver includes an Optimize feature which will choose the
best values for Alpha and Trend based on the forecasting mean squared error. If
the Trend parameter is 0, this technique is equivalent to the Exponential
Smoothing technique. (However, results may not be identical due to different
initialization methods for these two techniques.)
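The equations above can be sketched in Python. The initialization shown (level = first value, trend = first difference) is one common choice and may differ from Analytic Solver's; the function names are our own:

```python
def double_exp_smooth(x, a, b):
    """Double exponential smoothing as defined above:
    A_t = a*X_t + (1 - a)*S_{t-1},  B_t = b*(A_t - A_{t-1}) + (1 - b)*B_{t-1},
    with S_{t-1} = A_{t-1} + B_{t-1}.  Initialization A = x[0],
    B = x[1] - x[0] is one common choice."""
    A, B = x[0], x[1] - x[0]
    for t in range(1, len(x)):
        A_prev = A
        A = a * x[t] + (1 - a) * (A_prev + B)
        B = b * (A - A_prev) + (1 - b) * B
    return A, B

def forecast(A, B, k):
    """k-step-ahead forecast X_{t+k} = A_t + k*B_t."""
    return A + k * B

A, B = double_exp_smooth([10, 20, 30, 40, 50], 0.5, 0.5)
print(forecast(A, B, 1))  # 60.0 -- the linear trend is tracked exactly
```

On a perfectly linear series such as this one, the level and trend components are recovered exactly, so the one-step forecast simply extends the line.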

Holt Winters Smoothing


What happens if the data exhibits trend as well as seasonality? A third
parameter, γ, is introduced to account for seasonality (sometimes called
periodicity) in a dataset. The resulting set of equations is called the
Holt-Winters method, after the names of its inventors. The Holt-Winters method
can be used on datasets involving trend and seasonality (α, β, γ). Values for
all three parameters can range between 0 and 1.

There are three models associated with this method:

Multiplicative: X_t = (A_t + B_t)·S_t + e_t, where A_t and B_t are previously
calculated initial estimates and S_t is the average seasonal factor for the
t-th season.
Additive: X_t = (A_t + B_t) + SN_t + e_t
No Trend: β = 0, so X_t = A_t·SN_t + e_t

Holt-Winters smoothing is similar to exponential smoothing if β = γ = 0, and is
similar to double exponential smoothing if γ = 0.
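For illustration, the additive Holt-Winters recursions can be sketched as follows. The initialization shown is one simple choice, and Analytic Solver's defaults and initialization may differ; the function names are our own:

```python
def holt_winters_additive(x, s, alpha, beta, gamma):
    """Additive Holt-Winters: level A, trend B, seasonal offsets S with
    period s.  Requires len(x) >= 2*s.  Initialization (first-season mean,
    average season-over-season slope, first-season offsets) is one simple
    choice; the product may initialize differently."""
    A = sum(x[:s]) / s                           # initial level
    B = (sum(x[s:2 * s]) - sum(x[:s])) / s ** 2  # initial trend
    S = [x[i] - A for i in range(s)]             # initial seasonal offsets
    for t in range(s, len(x)):
        A_prev = A
        A = alpha * (x[t] - S[t % s]) + (1 - alpha) * (A + B)
        B = beta * (A - A_prev) + (1 - beta) * B
        S[t % s] = gamma * (x[t] - A) + (1 - gamma) * S[t % s]
    return A, B, S

def hw_forecast(A, B, S, s, n, k):
    """k-step-ahead forecast from the end of an n-point series:
    (A + k*B) plus the seasonal offset for period n + k - 1."""
    return A + k * B + S[(n + k - 1) % s]
```

Each pass through the loop de-seasonalizes the new observation to update the level, then updates the trend and the seasonal offset for that period.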



Exponential Smoothing Example
This example illustrates how to use Analytic Solver’s Exponential Smoothing
technique to uncover trends in the example time series datasets Airpass.xlsx
and Income.xlsx. To open both files, click Help – Examples – Forecasting/Data
Mining Examples.
Airpass.xlsx contains the monthly totals of international airline passengers from
1949 - 1960. Income.xlsx contains the average income of tax payers by state.
Click Partition in the Time Series group on the Data Mining ribbon to open the
Time Series Partition Data dialog, as shown below.
Select Month as the Time Variable. Select Passengers as the Variables in the
Partition Data.

Then click OK to partition the data into training and validation sets.
(Partitioning is optional. Smoothing techniques may be run on full unpartitioned
datasets.) The result of the partition, TSPartition, is inserted right of the Airpass
worksheet.
Click Smoothing – Exponential to open the Exponential Smoothing dialog.
Select Month as the Time Variable, unless already selected. Select Passengers
as the Selected variable and also Produce Forecast on validation.



Click OK to apply the smoothing technique. Expo and Expo_Stored will be
inserted right of the Data worksheet. See the “Scoring New Data” chapter for
information on the Expo_Stored sheet.
The Actual vs. Fitted: Training chart shows that the exponential smoothing
technique does not result in a good fit, as the model does not effectively
capture the seasonality in the dataset. As a result, the summer months, when
airline passenger numbers are typically high, appear to be under-forecasted
(i.e., too low), while the forecasts for months with low passenger numbers are
too high. Consequently, an exponential smoothing forecast should never be used
when the dataset includes seasonality. An alternative would be to fit a
regression model and then apply this smoothing technique to the residuals.
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select Expo for Worksheet and Time Series Training Data or Time
Series Validation Data for Chart.



Now let’s take a look at an example that does not include seasonality. Click
Partition within the Time Series group on the Data Mining ribbon to open the
Time Series Partition dialog. First partition the dataset into training and
validation sets using Year as the Time Variable and CA as the Variables in the
partition data.



Click OK to accept the partitioning defaults and create the two sets (Training
and Validation). TSPartition is inserted right of the Income worksheet. Click
Smoothing – Exponential from the Data Mining ribbon to open the
Exponential Smoothing dialog.
Select Year for Time Variable if it has not already been selected. Select CA as
the Selected Variable and Produce forecast on validation.
The smoothing parameter (Alpha) determines the magnitude of the weights assigned
to the observations. A value close to 1 results in the most recent observations
being assigned the largest weights and the earliest observations the smallest.
A value close to 0 spreads the weights more evenly across the observations, so
that older observations retain substantial influence on the smoothed value. As a
result, the value of Alpha depends on how much influence the most recent
observations should have on the model.
Analytic Solver includes the Optimize feature that will choose the Alpha
parameter value that results in the minimum residual mean squared error. It is
recommended that this feature be used carefully as it can often lead to a model
that is overfit to the training set. An overfit model rarely exhibits high
predictive accuracy in the validation set.

Click OK to accept the default Alpha value of 0.2. Two output sheets, Expo and
Expo_Stored, will be inserted right of the Data worksheet. For more
information on the Expo_Stored worksheet, see the chapter “Scoring New Data”
in the Analytic Solver Data Mining User Guide.
The Training and Validation Error Measures tables show a fitted model with a
MSE of 258,202.3 for the Training set and a MSE of 2.16E08 for the Validation
set. These are fairly large numbers and indicate that the model is not well-fit.
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select Expo for Worksheet and Time Series Training Data or Time



Series Validation Data for Chart.

Click Smoothing – Exponential Smoothing to run the technique a second time.


Again select CA as the Selected Variable and Produce forecast on validation.
However, this time, select Optimize, then click OK.

Expo1 is inserted right of the Expo worksheet. Analytic Solver used an Alpha =
0.9976…



which results in a MSE of 22,110.2 for the Training Set and a MSE of 1.93E08
for the Validation Set. Although an alpha of .9976 did result in lower values,
the MSE in both the training and validation sets indicates the model is still not a
good fit.

Note: Click the Charts icon on the Data Mining Cloud Ribbon to view the
charts shown above.

Moving Average Smoothing Example


This example illustrates how to use Analytic Solver's Moving Average
Smoothing technique to uncover trends in the Airpass.xlsx time series dataset.
Click Help – Examples – Forecasting/Data Mining Examples to open the
dataset. Airpass.xlsx contains monthly totals of international airline passengers
from 1949 - 1960.
Click Partition in the Time Series group on the Data Mining ribbon to open the
Time Series Partition Data dialog. Select Month as the Time Variable. Select



Passengers as the Variables in the partition data. Then click OK to partition
the data into training and validation sets. (Partitioning is optional. Smoothing
techniques may be run on full unpartitioned datasets.)

The output sheet, TSPartition, will be inserted directly right of the Airpass sheet.
Click Smoothing – Moving Average to open the Moving Average Smoothing
dialog.
Select Month for Time Variable if not already selected. Select Passengers as
the Selected variable. Since this dataset is expected to include some seasonality
(i.e. airline passenger numbers increase during the holidays and summer
months), the value for the Interval parameter should be the length of one
seasonal cycle, i.e. 12 months. As a result, enter 12 for Interval. Select
Produce forecast on validation.



Afterwards, click OK to apply the smoothing technique to the partitioned
dataset.
The report, MovingAvg, will be inserted directly right of TSPartition.
The Actual Vs. Fitted: Training and the Actual Vs. Forecast: Validation charts
show that the moving average smoothing technique does not result in a good fit,
as the model does not effectively capture the seasonality in the dataset. The
summer months, when the number of airline passengers is typically high, are
under-forecast, while the months when the number of passengers is low are
over-forecast. A moving average forecast should never be used when the dataset
includes seasonality. An alternative would be to fit a regression model to the
data and then apply moving average smoothing to the residuals.
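One way to implement that alternative can be sketched as follows. This uses a synthetic series and a simple linear-trend regression as stand-ins; the series, window, and trend model are illustrative assumptions, not the Airpass data or Analytic Solver's implementation.

```python
import numpy as np

# Synthetic monthly series with a linear trend plus seasonality
# (a stand-in for a Passengers-like column, not the actual Airpass data).
rng = np.random.default_rng(0)
t = np.arange(48)
y = 100 + 2.0 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, 48)

# Step 1: regress the series on a linear trend.
slope, intercept = np.polyfit(t, y, 1)
residuals = y - (intercept + slope * t)

# Step 2: smooth the residuals with a 12-month moving average.
window = 12
smoothed = np.convolve(residuals, np.ones(window) / window, mode="valid")

# A fitted value recombines the trend component with the smoothed residual.
fitted = intercept + slope * t[window - 1:] + smoothed
```

Because the trend is removed before smoothing, the moving average no longer has to chase the upward drift, which is the point of the detrending step.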
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select MovingAvg for Worksheet and Time Series Training Data or
Time Series Validation Data for Chart.

Now let’s take a look at an example that does not include seasonality. Open the
example dataset Income.xlsx. This dataset contains the average income of
taxpayers by state. First partition the dataset into training and validation sets
using Year as the Time Variable and CA as the Variables in the partition data.

Then click OK to accept the partitioning defaults and create the two partitions
(Training and Validation). The output, TSPartition, will be inserted right of the
Income sheet.
Click Smoothing – Moving Average from the Data Mining ribbon to open the
Moving Average Smoothing dialog. Year is already selected for Time Variable.
Select CA as the Selected variable and check the Produce forecast on
validation checkbox.

Click OK to run the Moving Average Smoothing technique.
Click MovingAvg, inserted right of the TSPartition worksheet, to view the
Actual vs Fitted: Training and Actual vs. Forecast: Validation charts and Error
Measures.

Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select MovingAvg for Worksheet and Time Series Training Data or
Time Series Validation Data for Chart.

MovingAvg_Stored is available for scoring new data. Please see the "Scoring
New Data" chapter within the Analytic Solver Data Mining User Guide for more
information on scoring new data using a stored model sheet.

Double Exponential Smoothing Example


This example illustrates how to use Analytic Solver’s Double Exponential
Smoothing technique to uncover trends in the Airpass.xlsx time series dataset.
(See the examples above for instructions on how to open this example.)
Click Partition in the Time Series group on the Data Mining ribbon to open the
Time Series Partition Data dialog. Select Month as the Time Variable. Select
Passengers as the Variables in the Partition Data.

Then click OK to partition the data into training and validation sets. TSPartition
will be inserted right of the Airpass worksheet.
Click Smoothing – Double Exponential to open the Double Exponential
Smoothing dialog.
Select Month as the Time Variable, if not already selected. Select Passengers
as the Selected variable, then check Produce Forecast on validation to test the
forecast on the validation set.
This example uses the defaults for both the Alpha and Trend parameters.
However, Analytic Solver Data Mining includes a feature that will choose the
Alpha and Trend parameter values that result in the minimum residual mean
squared error. Use this feature carefully, as it most often leads to a model that
is overfit to the training set. An overfit model rarely exhibits high predictive
accuracy in the validation set.

Click OK to run the Double Exponential Smoothing algorithm. The output,
DoubleExpo and DoubleExpo_Stored, will be inserted right of the TSPartition
worksheet.
Click on the DoubleExpo tab to view the results of the smoothing. Click on any
point on either graph to see the Actual vs. Forecast results at the top of the chart.
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select DoubleExpo for Worksheet and Time Series Training Data or
Time Series Validation Data for Chart.

If instead the Optimize feature is used, an Alpha of 0.9568 is chosen along with
a Trend of 0.009.

These parameters result in an MSE of 450.7 for the Training set and an MSE of
8477.64 for the Validation set. Again, the model created with the parameters
from the Optimize algorithm appears to have a better fit than the model created
with the default parameters.
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select DoubleExpo1 for Worksheet and Time Series Training Data or
Time Series Validation Data for Chart.

Holt Winters Smoothing Example
This example illustrates how to use Analytic Solver's Holt Winters Smoothing
technique to uncover trends in the time series dataset Airpass.xlsx. This
example will create three different forecasts, one for each Holt Winters model
type, beginning with Multiplicative.
Click back to the TSPartition worksheet and click Smoothing – Holt Winters –
Multiplicative to open the Holt Winters Smoothing (Multiplicative Model)
dialog.
Select Month for the Time variable, if not already selected, and Passengers for
Selected variable.
Because our dataset contains airline passenger counts, we can assume some
seasonality exists in the data: most passengers fly during the summer and
holiday months (e.g. December).
It takes a full 12 months to complete the seasonality cycle, so enter 12 for
Period; # Complete seasons is automatically filled in with 7. This example will
use the defaults for the three parameters: Alpha, Beta, and Gamma.
Values between 0 and 1 can be entered for each parameter. As with Exponential
Smoothing, values close to 1 will result in the most recent observations being
weighted more than earlier observations.
In the Multiplicative model, it is assumed that the values for the different
seasons differ by percentage amounts.
Produce Forecast on validation is selected by default.

Click OK to run the smoothing technique. The results of the smoothing
technique, MulHoltWinters and MulHoltWinters_Stored, are inserted to the right
of the TSPartition worksheet. For more information on MulHoltWinters_Stored,
see the chapter “Scoring New Data” within the Analytic Solver User Guide.

Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select MulHoltWinters for Worksheet and Time Series Training Data
or Time Series Validation Data for Chart.

If you inspect the MSE (Mean Squared Error) term in the Error Measures
(Validation) table, you’ll see that this value is fairly high. In addition, the peaks
for the Forecast data appear to lag behind the peaks in the Validation data. This
suggests that our Trend (Beta) parameter is too large.
Let’s go back and try the Multiplicative method one more time using the
Optimize parameter. This parameter will choose the best values for the Alpha,
Trend, and Seasonal parameters based on the Forecasting Mean Squared Error.
It is recommended that this feature be used carefully as this option can lead to
overfitting. An overfit model rarely exhibits high predictive accuracy in the
validation set.
Click Smoothing – Holt-Winters – Multiplicative on the Data Mining ribbon.
We will again select Passengers for Selected Variable (if not already selected)
and 12 for Period. Produce Forecast on Validation is selected by default.
This time we’ll also select the Optimize parameter.

Afterwards, click OK to proceed with the smoothing technique.


MulHoltWinters1 is inserted to the right of TSPartition.

The Parameters/Options table gives us the parameter settings chosen by the
Optimize feature: Alpha (0.858), Beta (0.003) and Gamma (0.917). (Recall
that the default settings are 0.20 (Alpha), 0.15 (Beta) and 0.05 (Seasonal).)
Scroll down to find the Training and Validation Error Measures.

Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select MulHoltWinters1 for Worksheet and Time Series Training Data
or Time Series Validation Data for Chart.

Now let’s create a new model using the Additive model. This technique
assumes the values for the different seasons differ by a constant amount. Click
back to the TSPartition sheet, then click Smoothing – Holt Winters – Additive
to open the Holt Winters Smoothing (Additive Model) dialog.
Select Month for Time Variable, if not already selected, and Passengers for
Selected variable. Again, enter 12 for Period. Produce Forecast on validation
is selected by default.

Click OK to run the smoothing technique. AddHoltWinters and
AddHoltWinters_Stored will be inserted right of the TSPartition worksheet. For
more information on AddHoltWinters_Stored, see the chapter, “Scoring New
Data” within the Analytic Solver User Guide.
Click a point anywhere on the function to compare Actual vs. Forecasted at the
top of the graph.
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select AddHoltWinters for Worksheet and Time Series Training Data
or Time Series Validation Data for Chart.

Let’s try the Additive model again using the Optimize feature. Click back to
TSPartition and then click Smoothing – Holt-Winters – Additive on the Data
Mining ribbon.
Select Month for Time Variable, Passengers for Selected Variable, and 12 for Period.
Produce Forecast on Validation is selected by default. Select Optimize to run
the Optimize algorithm which will pick the best values for the three parameters,
Alpha, Beta, and Gamma.

Click OK, then click the AddHoltWinters1 tab.

Notice the parameter values chosen by the Optimize algorithm were 0.858 for
Alpha, 0.00351 for Beta, and 0.917 for Gamma. Scroll down to view the results
of the model fitting.
Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select AddHoltWinters1 for Worksheet and Time Series Training Data
or Time Series Validation Data for Chart.

The last Holt Winters model should be used with time series that contain
seasonality, but no trends. Click back to TSPartition, then click Smoothing –
Holt Winters – No Trend to open the Holt Winters Smoothing (No trend
Model) dialog.
Select Month for Time Variable unless already selected. Select Passengers as
the Selected variable. Enter 12 for Period. Produce Forecast on validation is
selected by default. Notice that the trend parameter is missing. Values for
Alpha and Gamma can range from 0 to 1. A value of 1 for each parameter will
assign higher weights to the most recent observations and lower weights to the
earlier observations. This example will accept the default values.

Click OK to run the smoothing technique. NoTrendHoltWinters and
NoTrendHoltWinters_Stored are right of the TSPartition worksheet.

Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select NoTrendHoltWinters for Worksheet and Time Series Training
Data or Time Series Validation Data for Chart.
Let’s try the No Trend model again using the Optimize feature. Click back to
TSPartition, then click Smoothing – Holt-Winters – No Trend on the Data
Mining ribbon.
Select Month for Time variable, unless already selected, Passengers for
Selected Variable, and 12 for Period. Produce Forecast on Validation is selected
by default. Select Optimize to run the Optimize algorithm which will pick the
best values for the two parameters, Alpha and Gamma.

Click OK. NoTrendHoltWinters1 is inserted right.

Notice the parameter values chosen by the Optimize algorithm were 0.984 for
Alpha and 0.233 for Gamma. Scroll down to view the results of the model
fitting.

Note: To view these two charts in the Cloud app, click the Charts icon on the
Ribbon, select NoTrendHoltWinters1 for Worksheet and Time Series Training
Data or Time Series Validation Data for Chart.

Common Smoothing Options


Common Options
The following options are common to each of the Smoothing techniques.

First row contains headers


When this option is selected, variables will be listed in the Variables in input
data list box according to the first row in the dataset. If this option is not
checked, variables will appear as VarX where X = 1,2,3,4, etc.

Variables in Input Data
All variables in the dataset will be listed here.

Time variable
Select a variable associated with time from the Variables in input data list box.

Selected variable
Select a variable to apply the smoothing technique.

Output Options
If applying this smoothing technique to partitioned data, the option Produce
forecast on validation will appear. Otherwise, the option Produce forecast will
appear. If selected, Analytic Solver Data Mining will include a forecast on the
output results.

Exponential Smoothing Options


This section explains the options included in the Weights section on the
Exponential Smoothing dialog.

Optimize
Select this option if you want to select the Alpha Level that minimizes the
residual mean squared errors in the training and validation sets. Take care when
using this feature, as it can result in an overfitted model. This option is
not selected by default.

Level (Alpha)
Enter the smoothing parameter here. This parameter is used in the weighted
average calculation and can be from 0 to 1. A value of 1 or close to 1 will result
in the most recent observations being assigned the largest weights and the
earliest observations being assigned the smallest weights. A value of 0 or close
to 0 will result in the most recent observations being assigned the smallest
weights and the earliest observations being assigned the largest weights. The
default value is 0.2.
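The weighting behavior follows from the standard recursion F[t+1] = Alpha × Y[t] + (1 − Alpha) × F[t], sketched below; initializing with the first observation is a common convention, not necessarily Analytic Solver's exact implementation.

```python
def exponential_smooth(y, alpha=0.2):
    """Simple exponential smoothing, returning one-step forecasts.

    F[t+1] = alpha * Y[t] + (1 - alpha) * F[t].
    Alpha near 1 weights the most recent observations heavily; alpha near
    0 lets the earliest observations dominate.  Initialized with the first
    observation (a common textbook convention)."""
    forecasts = [y[0]]
    for obs in y[:-1]:
        forecasts.append(alpha * obs + (1 - alpha) * forecasts[-1])
    return forecasts
```

With the default Alpha of 0.2 the forecast adapts slowly to new observations; raising Alpha toward 1 makes the forecast track the latest observation almost exactly.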

Moving Average Smoothing Options
This section describes the options included in the Weights section of the Moving
Average Smoothing dialog.

Interval
Enter the window width of the moving average here. This parameter accepts a
value from 1 up to N - 1 (where N is the number of observations in the dataset).
If a value of 5 is entered for the Interval, then Analytic Solver Data Mining will
use the average of the last five observations for the last smoothed point, or
Ft = (Yt + Yt-1 + Yt-2 + Yt-3 + Yt-4) / 5. The default value is 2.
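The calculation can be sketched in a few lines; this is an illustration of the formula above, not Solver's internal code.

```python
def moving_average(y, interval):
    """Moving average smoothing: the smoothed value at time t is the mean
    of the `interval` most recent observations, Y[t-interval+1] .. Y[t]."""
    return [sum(y[t - interval + 1 : t + 1]) / interval
            for t in range(interval - 1, len(y))]
```

For example, `moving_average([1, 2, 3, 4, 5], 2)` returns `[1.5, 2.5, 3.5, 4.5]`: each smoothed point is the average of an observation and its predecessor.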

Double Exponential Smoothing Options


This section describes the options appearing in the Weights section on the
Double Exponential Smoothing dialog.

Optimize
Select this option to select the Alpha and Beta values that minimize the residual
mean squared errors in the training and validation sets. Take care when using
this feature, as it can result in an overfitted model. This option is not
selected by default.

Level (Alpha)
Enter the smoothing parameter here. This parameter is used in the weighted
average calculation and can be from 0 to 1. A value of 1 or close to 1 will result
in the most recent observations being assigned the largest weights and the
earliest observations being assigned the smallest weights in the weighted
average calculation. A value of 0 or close to 0 will result in the most recent
observations being assigned the smallest weights and the earliest observations
being assigned the largest weights in the weighted average calculation. The
default is 0.2.

Trend (Beta)
The Double Exponential Smoothing technique includes an additional parameter,
Beta, to contend with trends in the data. This parameter is also used in the
weighted average calculation and can be from 0 to 1. A value of 1 or close to 1
will result in the most recent observations being assigned the largest weights and
the earliest observations being assigned the smallest weights in the weighted
average calculation. A value of 0 or close to 0 will result in the most recent
observations being assigned the smallest weights and the earliest observations
being assigned the largest weights in the weighted average calculation. The
default is 0.15.
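Together, Alpha and Beta drive Holt's standard level-and-trend recursions, sketched below; the initialization (level = first observation, trend = first difference) is an assumption of this sketch, not necessarily what Analytic Solver uses.

```python
def double_exponential(y, alpha=0.2, beta=0.15):
    """Double (Holt) exponential smoothing, returning one-step forecasts.

    level[t] = alpha * Y[t] + (1 - alpha) * (level[t-1] + trend[t-1])
    trend[t] = beta * (level[t] - level[t-1]) + (1 - beta) * trend[t-1]
    forecast for t+1 = level[t] + trend[t]
    """
    level, trend = y[0], y[1] - y[0]          # simple initialization choice
    forecasts = []
    for obs in y[1:]:
        forecasts.append(level + trend)       # forecast made before seeing obs
        prev = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
    forecasts.append(level + trend)           # forecast for the next period
    return forecasts
```

On a perfectly linear series such as `[10, 12, 14, 16]` with alpha = beta = 0.5, the recursions track the line exactly and forecast 18 for the next period, which is why Beta helps where simple exponential smoothing lags behind a trend.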

Holt Winters Smoothing Options


The options in this section appear in the Parameters and Weights section of the
Holt Winters Smoothing dialogs.

Period
Enter the number of periods that make up one season. The value for # Complete
seasons will be automatically filled.

Optimize
Select this option to select the Alpha, Beta, and Gamma values that minimize
the residual mean squared errors in the training and validation sets. Take care
when using this feature, as it can result in an overfitted model. This
option is not selected by default.

Level (Alpha)
Enter the smoothing parameter here. This parameter is used in the weighted
average calculation and can be from 0 to 1. A value of 1 or close to 1 will result
in the most recent observations being assigned the largest weights and the
earliest observations being assigned the smallest weights in the weighted
average calculation. A value of 0 or close to 0 will result in the most recent
observations being assigned the smallest weights and the earliest observations
being assigned the largest weights in the weighted average calculation. The
default is 0.2.

Trend (Beta)
The Holt Winters Smoothing technique also utilizes the Trend parameter, Beta, to contend
with trends in the data. This parameter is also used in the weighted average
calculation and can be from 0 to 1. A value of 1 or close to 1 will result in the
most recent observations being assigned the largest weights and the earliest
observations being assigned the smallest weights in the weighted average
calculation. A value of 0 or close to 0 will result in the most recent observations
being assigned the smallest weights and the earliest observations being assigned
the largest weights in the weighted average calculation. The default is 0.15.
This option is not included on the No Trend Model dialog.

Seasonal (Gamma)
The Holt Winters Smoothing technique utilizes an additional seasonal
parameter, Gamma, to manage the presence of seasonality in the data. This
parameter is also used in the weighted average calculation and can be from 0 to
1. A value of 1 or close to 1 will result in the most recent observations being
assigned the largest weights and the earliest observations being assigned the
smallest weights in the weighted average calculation. A value of 0 or close to 0
will result in the most recent observations being assigned the smallest weights
and the earliest observations being assigned the largest weights in the weighted
average calculation. The default is 0.05.
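The three parameters combine in the textbook additive Holt-Winters recursions, sketched below; the initialization of the level, trend, and seasonal indices is a common convention and may differ from Analytic Solver's internal choices.

```python
def holt_winters_additive(y, period, alpha=0.2, beta=0.15, gamma=0.05):
    """Additive Holt-Winters: level, trend, and seasonal updates.

    Requires at least two complete seasons (len(y) >= 2 * period).
    Returns one-step-ahead forecasts for observations after season 1."""
    m = period
    level = sum(y[:m]) / m                               # mean of season 1
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)     # per-step trend
    seasonal = [y[i] - level for i in range(m)]          # additive indices
    forecasts = []
    for t in range(m, len(y)):
        forecasts.append(level + trend + seasonal[t % m])
        prev = level
        level = alpha * (y[t] - seasonal[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev) + (1 - beta) * trend
        seasonal[t % m] = gamma * (y[t] - level) + (1 - gamma) * seasonal[t % m]
    return forecasts
```

With Gamma near 1 the seasonal indices chase the most recent season; the small default of 0.05 keeps them stable across cycles.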

Produce Forecast
If this option is selected, Analytic Solver Data Mining will include a forecast on
the output results.

If running the Holt Winters Smoothing technique on an unpartitioned dataset,
the following two options are enabled.

Update estimate each time


If applying this smoothing technique to an unpartitioned dataset, this option is
enabled. Select this option to create an updated forecast each time that a
forecast is generated.

# Forecasts
If applying this smoothing technique to an unpartitioned dataset, this option is
enabled. Enter the desired number of forecasts here.

Data Mining Partitioning

Introduction
One very important issue when fitting a model is how well the newly created
model will behave when applied to new data. To address this issue, the dataset
can be divided into multiple partitions: a training partition used to create the
model, a validation partition to test the performance of the model and, if desired,
a third test partition. Partitioning is performed randomly, to protect against a
biased partition, according to proportions specified by the user or according to
rules concerning the dataset type. For example, when creating a time series
forecast, data is partitioned by chronological order.

Training Set
The training dataset is used to train or build a model. For example, in a linear
regression, the training dataset is used to fit the linear regression model, i.e. to
compute the regression coefficients. In a neural network model, the training
dataset is used to obtain the network weights. After fitting the model on the
training dataset, the performance of the model should be tested on the validation
dataset.

Validation Set
Once a model is built using the training dataset, the performance of the model
must be validated using new data. If the training data itself was utilized to
compute the accuracy of the model fit, the result would be an overly optimistic
estimate of the accuracy of the model. This is because the training or model
fitting process ensures that the accuracy of the model for the training data is as
high as possible -- the model is specifically suited to the training data. To obtain
a more realistic estimate of how the model would perform with unseen data, we
must set aside a part of the original data and not include this set in the training
process. This dataset is known as the validation dataset.
To validate the performance of the model, Analytic Solver Data Mining
measures the discrepancy between the actual observed values and the predicted
value of the observation. This discrepancy is known as the error in prediction
and is used to measure the overall accuracy of the model.

Test Set
The validation dataset is often used to fine-tune models. For example, you might
try out neural network models with various architectures and test the accuracy of
each on the validation dataset to choose the best performer among the competing
architectures. In such a case, when a model is finally chosen, its accuracy with
the validation dataset is still an optimistic estimate of how it would perform with
unseen data. This is because the final model has come out as the winner among
the competing models based on the fact that its accuracy with the validation
dataset is highest. As a result, it is a good idea to set aside yet another portion of
data which is used neither in training nor in validation. This set is known as the
test dataset. The accuracy of the model on the test data gives a realistic estimate
of the performance of the model on completely unseen data.
Analytic Solver Data Mining provides two methods of partitioning: Standard
Partitioning and Partitioning with Oversampling. Analytic Solver Data Mining
provides two approaches to standard partitioning: random partitioning and
user-defined partitioning.

Random Partitioning
In simple random sampling, every observation in the main dataset has equal
probability of being selected for the partition dataset. For example, if you
specify 60% for the training dataset, then 60% of the total observations are
randomly selected for the training dataset. In other words, each observation has
a 60% chance of being selected.
Random partitioning uses the system clock as a default to initialize the random
number seed. Alternatively, the random seed can be manually set which will
result in the same observations being chosen for the training/validation/test sets
each time a standard partition is created.
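The seeded sampling behavior can be sketched as follows; this is an illustration of the idea, not Solver's internal sampler.

```python
import random

def standard_partition(n_rows, train_pct=0.6, seed=12345):
    """Randomly split row indices into training and validation sets.

    A fixed seed makes the same rows land in the same partition on every
    run; pass seed=None to draw a fresh split each time (akin to seeding
    from the system clock)."""
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    cut = round(n_rows * train_pct)
    return sorted(rows[:cut]), sorted(rows[cut:])
```

For example, 178 rows at the default 60/40 split yield 107 training and 71 validation rows, matching the Wine.xlsx example later in this chapter.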

User – defined Partitioning


In user-defined partitioning, the partition variable specified is used to partition
the dataset. This is useful when you have already predetermined the
observations to be used in the training, validation, and/or test sets. This partition
variable takes the value: "t" for training, "v" for validation and "s" for test. Rows
with any other values in the Partition Variable column are ignored. The partition
variable serves as a flag for writing each observation to the appropriate
partition(s).
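The flag-based routing can be sketched as follows (illustrative only):

```python
def partition_by_flag(records, flags):
    """Route each record by its partition-variable value:
    't' -> training, 'v' -> validation, 's' -> test.
    Rows with any other flag value are ignored, matching the behavior
    described above."""
    parts = {"t": [], "v": [], "s": []}
    for record, flag in zip(records, flags):
        if flag in parts:
            parts[flag].append(record)
    return parts["t"], parts["v"], parts["s"]
```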

Partition with Oversampling


This method of partitioning is used when the percentage of successes in the
output variable is very low, e.g. callers who “opt in” to a short survey at the end
of a customer service call. Typically, the number of successes, in this case, the
number of people who finish the survey, is very low so information connected
with these callers is minimal. As a result, it would be almost impossible to
formulate a model based on these callers. In these types of cases, we must use
Oversampling (also called weighted sampling). Oversampling can be used
when there are only two classes, one of much greater importance than the other,
i.e. callers who finish the survey as compared to callers who simply hang up.
Analytic Solver Data Mining takes the following steps when partitioning with
oversampling.
1. The data is partitioned by randomly allocating 50% of the success values
for the output variable to the training set. The output variable must be
limited to two classes which can either be numbers or strings.
2. Analytic Solver Data Mining maintains the % success in training set
specified by the user in the training set by randomly selecting the required
records with failures.
3. The remaining 50% of successes are randomly allocated to the validation
set.

4. If % validation data to be taken away as test data is selected, then
Analytic Solver Data Mining will create an appropriate test set from the
validation set.
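The steps above can be sketched as follows (steps 1-3; carving a test set out of the validation pool, step 4, would be one further split). The selection and rounding rules here are simplified assumptions, not Analytic Solver's exact code.

```python
import random

def oversample_partition(labels, pct_success_train=0.5, seed=12345):
    """Oversampled split for a rare binary outcome (1 = success).

    Half of the successes go to training; failures are added until the
    training set reaches pct_success_train successes.  The remaining
    successes, plus enough failures to restore the original success
    rate, form the validation pool."""
    rng = random.Random(seed)
    successes = [i for i, v in enumerate(labels) if v == 1]
    failures = [i for i, v in enumerate(labels) if v == 0]
    rng.shuffle(successes)
    rng.shuffle(failures)

    half = len(successes) // 2
    train_s = successes[:half]
    n_train_f = round(len(train_s) * (1 - pct_success_train) / pct_success_train)
    training = train_s + failures[:n_train_f]

    valid_s = successes[half:]
    rate = len(successes) / len(labels)          # original success rate
    n_valid_f = round(len(valid_s) * (1 - rate) / rate)
    validation = valid_s + failures[n_train_f:n_train_f + n_valid_f]
    return training, validation
```

With 10 successes among 1,000 rows, the training set comes out as 5 successes plus 5 failures (a 50% success rate), while the validation pool keeps the original 1% success rate.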

Partition Options
It is no longer always necessary to partition a dataset before running a
classification or regression algorithm. Rather, you can now perform partitioning
on the Parameters dialog for each classification or regression method.

If the active data set is unpartitioned, the Partition Data command button will
be enabled. If the active data set has already been partitioned, this button will be
disabled. Clicking the Partition Data button opens the following dialog. Select
Partition Data on the dialog to enable the partitioning options.

If a data partition will be used to train and validate several different
classification or regression algorithms that will be compared for predictive
power, it may be better to use the Ribbon Partition choices to create a
partitioned dataset. But if the data partition will be used with a single algorithm,
or if it isn’t crucial to compare algorithms on exactly the same partitioned data,
“Partition-on-the-Fly” offers several advantages:
• User interface steps are saved, and the Analytic Solver task pane is not
cluttered with partition output.
• Partition-on-the-fly is much faster than creating a standard partition and
then running an algorithm.
• Partition-on-the-fly can handle larger datasets without exhausting
memory, since the intermediate partition results for the partitioned data
is never created.

Standard Data Partition Example


The example in this section illustrates how to use Analytic Solver Data Mining’s
partition utility to partition the example dataset, Wine.xlsx. Click Help –
Examples – Forecasting/Data Mining Examples to open it.

Click Partition – Standard Partition on the Data Mining Ribbon. The
Standard Data Partition dialog opens.
Highlight all variables in the Variables In Input Data list box, then click > to
include them in the partitioned data. Then click OK to accept the remainder of
the default settings. Sixty percent of the observations will be assigned to the
Training set and forty percent of the observations will be assigned to the
Validation set.

STDPartition is inserted right of the Data worksheet.


107 observations were assigned to the training set and 71 observations were
assigned to the validation set, or roughly 60% and 40% of the observations,
respectively.

It is also possible for the user to specify the set to which each observation
should be assigned. In column O, enter a “t”, “v” or “s” to indicate the
assignment of each record to the training dataset (t), the validation dataset (v),
or the test dataset (s), as shown in the screenshot below.

Click Partition – Standard Partition on the Data Mining ribbon to open the
Standard Data Partition dialog.
Select Use Partition Variable in the Partitioning options section, select
Partition Variable in the Variables list box, then click > next to Use Partition
Variable. Analytic Solver Data Mining will use the values in the Partition
Variable column to create the training, validation, and test sets. Records with a
“t” in the O column will be designated as training records. Records with a “v”
in the O column will be designated as validating records and records with an “s”
in this column will be designated as testing records. Now highlight all
remaining variables in the list box and click > to include them in the partitioned
data.

Click OK to create the partitions. STDPartition1 is inserted right. If you
inspect the results, you will find that all records assigned a “t” now belong to the
training set, all records assigned a “v” now belong to the validation set, and all
records assigned an “s” now belong to the test set.

Partition with Oversampling Example
This example illustrates the use of partitioning with oversampling using
Analytic Solver Data Mining. Click Help – Examples on the Data Mining
ribbon, then Forecasting/Data Mining Examples to open the example model,
Catalog_multi.xlsx.
This sample dataset contains information associated with the response of a direct
mail offer, published by DMEF, the Direct Marketing Educational Foundation.
The output variable is Target dependent variable:buyer(yes=1). Since the
success rate for the target variable (Target dependent variable:buyer(yes=1)) is
less than 1%, the data will be “trained” with a 50% success rate using Analytic
Solver Data Mining’s oversampling utility.
Click Partition – Partition with Oversampling (in the Data Mining section of
the Data Mining ribbon) to open the Partition with Oversampling dialog.
First confirm that Data Range at the top of the dialog is displayed as
$A$1:$V$58206. If not, simply click in the Data Range field and type the
correct range.
Select all variables in the Variables list box then click > to move all variables to
the Variables in the Partition Data list box. Afterwards, highlight Target
dependent variable: buyer(yes = 1) in the Variables in the Partition Data list
box then click the > immediately right of Output variable to designate this
variable as the output variable. Reminder: this output variable is limited to two
classes, e.g. 0/1 or “yes”/”no”.
Enter 50 for Specify % validation data to be taken away as test data.

Click OK to partition the data. OSPartition is inserted right.

The percentage of success records in the original data set is 0.9896%
(576/58,204, the number of successes divided by the number of rows in the
original dataset). 50% was
specified for both Specify % success in training set and Specify % validation
data to be taken away as test data in the Partition with Oversampling dialog. As
a result, Analytic Solver Data Mining has randomly allocated 50% of the
successes (the 1’s) to the training set and the remaining 50% to the validation
set. This means that there are 288 successes in the training set and 288
successes in the validation set. To complete the training set, Analytic Solver
Data Mining randomly selected 288 non successes (0’s). The training set has
576 rows (288 1’s + 288 0’s).
The output above shows that the % Success in original data set is 0.9896%.
Analytic Solver Data Mining maintains this percentage in the validation data by
allocating as many 0’s as needed. Since 288 successes (1’s) have already been
allocated, 28,814 non-successes (0’s) must be added to maintain the 0.9896%
success rate, for a validation pool of 29,102 records.
Since we specified that 50% of the validation data should be taken as test data,
Analytic Solver Data Mining has allocated half of these records to the test set.
Each set contains 14,551 rows.
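The allocation arithmetic above can be sketched in a few lines of Python (a hypothetical helper for illustration, not Analytic Solver's implementation):

```python
def oversample_counts(n_rows, n_successes, pct_success_train=0.5,
                      pct_valid_as_test=0.5):
    """Return (training, validation, test) record counts for an
    oversampled partition, mirroring the steps described above."""
    # Half of the successes go to the training set...
    train_success = int(n_successes * pct_success_train)    # 288
    # ...and the training set is balanced 50/50 with non-successes.
    train_total = 2 * train_success                         # 576
    # The remaining successes seed the validation pool.
    valid_success = n_successes - train_success             # 288
    # Non-successes are added so the pool keeps the original
    # success rate (576 / 58,204, about 0.9896%).
    success_rate = n_successes / n_rows
    valid_total = round(valid_success / success_rate)       # 29,102
    # Finally the pool is split between validation and test sets.
    test_total = round(valid_total * pct_valid_as_test)     # 14,551
    return train_total, valid_total - test_total, test_total
```

Running this with the example's 58,204 rows and 576 successes reproduces the counts reported above: a 576-row training set and 14,551 rows each in the validation and test sets.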

Standard Partitioning Options


The options below appear on the Standard Data Partition dialog, shown below.



Use partition variable
Select this option when assigning each record to a specific set using an added
variable in the dataset. Each observation should be assigned a “t”, “v” or “s” to
delineate “training”, “validation” or “test”, respectively.
Select this variable from the Variables in Input Data list box then click >, to the
right of the Use partition variable radio button, to add the appropriate variable
as the partition variable.
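As a sketch, partitioning by such a variable amounts to routing each record by its flag. The field name "part" below is an illustrative assumption, not part of Analytic Solver's interface:

```python
def split_by_partition_variable(records, key="part"):
    """Route each record to training ("t"), validation ("v"),
    or test ("s") based on its partition-variable value."""
    sets = {"t": [], "v": [], "s": []}
    for rec in records:
        sets[rec[key]].append(rec)
    return sets["t"], sets["v"], sets["s"]
```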

Set Seed
Random partitioning uses the system clock as a default to initialize the random
number seed. By default this option is selected to specify a seed for random
number generation for the partitioning. Setting this option will result in the same
records being assigned to the same set on successive runs. The default seed
entry is 12345.



Pick up rows randomly
When this option is selected, Analytic Solver Data Mining will randomly select
observations to be included in the training, validation, and test sets.

Automatic percentages
If Pick up rows randomly is selected under Partitioning options, this option will
be enabled. Select this option to accept the defaults of 60% and 40% for the
percentages of records to be included in the training and validation sets. This is
the default selection.

Specify percentages
If Pick up rows randomly is selected under Partitioning options, this option will
be enabled. Select this option to manually enter percentages for training set,
validation set and test sets. Records will be randomly allocated to each set
according to these percentages.

Equal percentages
If Pick up rows randomly is selected under Partitioning options, this option will
be enabled. If this option is selected, Analytic Solver Data Mining will allocate
33.33% of the records in the database to each set: training, validation, and test.
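The seeded random partitioning behind these options can be sketched as follows (an illustrative helper, not Analytic Solver's code). Note how reusing the same seed reproduces the same assignment on successive runs, which is what the Set Seed option guarantees:

```python
import random

def random_partition(n_rows, pcts=(0.6, 0.4, 0.0), seed=12345):
    """Randomly assign row indices to training/validation/test sets
    according to the given percentages; the default seed is 12345."""
    rng = random.Random(seed)        # fixed seed -> reproducible split
    rows = list(range(n_rows))
    rng.shuffle(rows)
    n_train = int(n_rows * pcts[0])
    n_valid = int(n_rows * pcts[1])
    return (rows[:n_train],
            rows[n_train:n_train + n_valid],
            rows[n_train + n_valid:])
```

With the Automatic percentages defaults (60%/40%), a 100-row dataset yields a 60-row training set and a 40-row validation set; the same call made twice returns the identical split.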

Partitioning with Oversampling Options


The following options appear on the Partitioning with Oversampling dialog, as
shown below.



Set seed
Random partitioning uses the system clock as a default to initialize the random
number seed. This option is not selected by default. Setting this option will
result in the same records being assigned to the same set on successive runs.
The default seed entry is 12345.

Output variable
Select the output variable from the variables listed in the Variables in the
partition data list box.

#Classes
After the output variable is chosen, the number of classes (distinct values) for
the output variable will be displayed here. Analytic Solver Data Mining
supports a class size of 2.



Specify Success class
After the output variable is chosen, you can select the success value for the
output variable here (i.e. 0 or 1 or “yes” or “no”).

% of success in data set


After the output variable is selected, the percentage of the number of successes
in the dataset is listed here.

Specify % success in training set


Enter the percentage of successes to be assigned to the training set here. The
default is 50%. With this setting, 50% of the successes will be assigned to the
training set and 50% will be assigned to the validation set.

Specify % validation data to be taken away as test data


If a test set is desired, specify the percentage of the validation set that
should be allocated to the test set here.



Discriminant Analysis
Classification Method

Introduction
Discriminant analysis is a technique for classifying a set of observations into
predefined classes in order to determine the class of an observation based on a
set of variables. These variables are known as predictors or input variables. The
model is built based on a set of observations for which the classes are known.
This set of observations is sometimes referred to as the training set. Based on
the training set, the technique constructs a set of linear functions of the
predictors, known as discriminant functions, such that L = b1x1 + b2x2 + … +
bnxn + c, where the b's are discriminant coefficients, the x's are the input
variables or predictors and c is a constant.
These discriminant functions are used to predict the class of a new observation
with an unknown class. For a k class problem k discriminant functions are
constructed. Given a new observation, all the k discriminant functions are
evaluated and the observation is assigned to class i if the ith discriminant
function has the highest value.
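This decision rule can be sketched as follows; the coefficient values used in the test are made-up illustrations, not fitted values:

```python
def classify(x, discriminants):
    """Evaluate one linear discriminant function per class and return
    the index of the class whose function value is highest.

    discriminants: list of (b, c) pairs, one per class, where b is the
    coefficient vector and c the constant in L = b1*x1 + ... + bn*xn + c.
    """
    scores = [sum(bi * xi for bi, xi in zip(b, x)) + c
              for b, c in discriminants]
    return scores.index(max(scores))
```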

Discriminant Analysis Example


The example below illustrates how to use the Discriminant Analysis
classification algorithm using the Boston_Housing.xlsx example dataset. Click
Help – Examples -- Forecasting/Data Mining Examples to open the dataset.
A portion of the dataset is shown in the screenshot below.

This dataset includes fourteen variables pertaining to housing prices from census
tracts in the Boston area, collected by the US Census Bureau.

CRIM Per capita crime rate by town


ZN Proportion of residential land zoned for lots over 25,000 sq.ft.



INDUS Proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX Nitric oxides concentration (parts per 10 million)
RM Average number of rooms per dwelling
AGE Proportion of owner-occupied units built prior to 1940
DIS Weighted distances to five Boston employment centers
RAD Index of accessibility to radial highways
TAX Full-value property-tax rate per $10,000
PTRATIO Pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of African-Americans by
town
LSTAT % Lower status of the population
MEDV Median value of owner-occupied homes in $1000's
CAT.MEDV Binary variable derived from the MEDV variable: CAT.MEDV = 1 if
MEDV > 30, and 0 otherwise
First, we’ll need to perform a standard partition, as explained in the previous
chapter, using percentages of 80% training and 20% validation. STDPartition
will be inserted to the right of the Data worksheet. (For more information on
how to partition a dataset, please see the previous Data Mining Partitioning
chapter.)



Click Classify – Discriminant Analysis to open the Discriminant Analysis –
Data dialog. Select the CAT. MEDV variable in the Variables in Input Data
list box then click > to select as the Output Variable. Immediately, the options
for Classes in the Output Variable are enabled. #Classes is prefilled as “2”
since the CAT. MEDV variable contains two classes, 0 and 1.
Success Class is selected by default and Class 1 is to be considered a “success”
or the significant class in the Lift Chart. (Note: This option is enabled when the
number of classes in the output variable is equal to 2.)
Enter a value between 0 and 1 for Success Probability Cutoff. If the calculated
probability of success for an observation is greater than or equal to this value,
then a “success” (or a 1) will be predicted for that observation. If the calculated
probability of success for an observation is less than this value, then a “non-
success” (or a 0) will be predicted for that observation. The default value is 0.5.
(Note: This option is only enabled when the # of classes is equal to 2.)
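The cutoff rule can be expressed in a line of code (a sketch, not Analytic Solver's implementation):

```python
def apply_cutoff(probs, cutoff=0.5):
    """Map each success probability to a predicted class: probabilities
    at or above the cutoff become 1 (success), others become 0."""
    return [1 if p >= cutoff else 0 for p in probs]
```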
Select CRIM, ZN, INDUS, NOX, RM, AGE, DIS, RAD, TAX, PTRATIO, &
B in the Variables in Input Data list box then click > to move to the Selected
Variables list box. (Record ID, CHAS, LSTAT, & MEDV should remain in the
Variables in Input Data list box as shown below.)



Click Next to advance to the Parameters dialog.
If you haven't already partitioned your dataset, you can do so from within the
Discriminant Analysis method by selecting Partition Data on the Parameters tab.
If this option is selected, Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
prediction method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.
Click Rescale Data to open the Rescaling dialog.

Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted



Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.
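As an illustration of the Standardization method, the sketch below rescales one feature to zero mean and unit standard deviation (z-scores). It assumes the population standard deviation; Analytic Solver's exact convention may differ:

```python
import statistics

def standardize(values):
    """Rescale a feature to zero mean and unit standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)   # population std dev (an assumption)
    return [(v - mean) / sd for v in values]
```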
Select the Rescale Data option, then click Done to accept the default,
Standardization, and close the dialog.
Click Prior Probability. Three options appear in the Prior Probability Dialog:
Empirical, Uniform and Manual.

If the first option is selected, Empirical, Analytic Solver Data Mining will
assume that the probability of encountering a particular class in the dataset is the
same as the frequency with which it occurs in the training data.
If the second option is selected, Uniform, Analytic Solver Data Mining will
assume that all classes occur with equal probability.
Select the third option, Manual, to manually enter the desired class and
probability value.
Click Done to close the dialog.
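The Empirical option can be sketched as a frequency count over the training labels; the Uniform option would instead assign 1/k to each of the k classes:

```python
from collections import Counter

def empirical_priors(labels):
    """Estimate class prior probabilities from training-data
    frequencies, as the Empirical option does."""
    counts = Counter(labels)
    n = len(labels)
    return {cls: c / n for cls, c in counts.items()}
```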
Select Canonical Variate. When this option is selected, Analytic Solver Data
Mining produces the canonical variates for the data based on an orthogonal
representation of the original variates. This has the effect of choosing a
representation which maximizes the distance between the different groups. For a
k class problem there are k-1 Canonical variates. Typically, only a subset of the
canonical variates is sufficient to discriminate between the classes. In this
two-class example there is a single canonical variate, which means that if we
replace the original predictors with just one predictor, X1 (actually a linear
combination of the original predictors), the discrimination based on this single
predictor will perform just as well as the discrimination based on the original
predictors.
When Canonical Variate Analysis is selected, Show CVA Model is enabled.
Select this option to produce the Canonical Variates in the output.
Select Show LDA Model to print the Linear Discriminant Functions in the
output.



Click Next to advance to the Scoring dialog.
Select all three options for Score Training/Validation data.
When Detailed report is selected, Analytic Solver Data Mining will create a
detailed report of the Discriminant Analysis output.
When Summary report is selected, Analytic Solver Data Mining will create a
report summarizing the Discriminant Analysis output.
When Lift Charts is selected, Analytic Solver Data Mining will include Lift
Chart and ROC Curve plots in the output.
Since we did not create a test partition, the options for Score test data are
disabled. See the chapter “Data Mining Partitioning” for information on how to
create a test partition.
See the Scoring New Data chapter within the Analytic Solver Data Mining User
Guide for more information on Score New Data in options.

Click Finish. Output sheets will be inserted into your active workbook.

Scroll down till you come to the Linear Discriminant Functions table. In this
example, there are 2 functions -- one for each class. Each variable is assigned to
the class that contains the higher value.



Immediately beneath the Linear Discriminant Functions table is the Canonical
Variates table. These functions give a representation of the data that maximizes
the separation between the classes. The number of functions is one less than the
number of classes (so in this case there is just one function). If we were to plot
the cases in this example on a line where xi is the ith case's value for variate1,
you would see a clear separation of the data. This output is useful in illustrating
the inner workings of the discriminant analysis procedure, but is not typically
needed by the end-user analyst.
Click the DA_TrainingScore tab to view the Training: Classification Summary.
A Confusion Matrix is used to evaluate the performance of a classification
method. This matrix summarizes the records that were classified correctly and
those that were not.

Confusion Matrix

                    Predicted Class
Actual Class        1          0
    1               TP         FN
    0               FP         TN
TP stands for True Positive. These are the number of cases classified as
belonging to the Success class that actually were members of the Success class.
FN stands for False Negative. These are the number of cases that were
classified as belonging to the Failure class when they were actually members of
the Success class (i.e. patients with cancerous tumors who were told their tumors
were benign). FP stands for False Positive. These cases were assigned to the
Success class but were actually members of the Failure group (i.e. patients who



were told they tested positive for cancer when, in fact, their tumors were
benign). TN stands for True Negative. These cases were correctly assigned to
the Failure group.

In the Training Dataset, we see that 56 records belonging to the Success class
were correctly assigned to that class, while 17 records belonging to the Success
class were incorrectly assigned to the Failure class. 320 records belonging to
the Failure class were correctly assigned to this same class, while 12 records
belonging to the Failure class were incorrectly assigned to the Success class.
The total number of misclassified records was 29 (12 + 17), which results in an
error equal to 7.16%.

Metrics
Precision is the probability of correctly identifying a randomly selected record
as one belonging to the Success class (i.e. the probability of correctly identifying
a random patient as having cancer). Recall (or Sensitivity) measures the
percentage of actual positives which are correctly identified as positive (i.e. the
proportion of people with cancer who are correctly identified as having cancer).
Specificity (also called the true negative rate) measures the percentage of
failures correctly identified as failures (i.e. the proportion of people with no
cancer being categorized as not having cancer.) The F-1 score, which ranges
from 0 to 1 (with 1 indicating a perfect classification), is a measure that
balances precision and recall.
Precision = TP/(TP+FP)
Sensitivity or True Positive Rate (TPR) = TP/(TP + FN)
Specificity (SPC) or True Negative Rate =TN / (FP + TN)
F1 = 2 * TP / (2 * TP + FP + FN)
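These formulas can be checked in code against the training confusion matrix reported above (TP = 56, FN = 17, FP = 12, TN = 320):

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute the standard metrics from confusion-matrix counts."""
    return {
        "error": (fp + fn) / (tp + fn + fp + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),          # sensitivity / TPR
        "specificity": tn / (fp + tn),     # true negative rate
        "f1": 2 * tp / (2 * tp + fp + fn),
    }
```

With this example's training counts, the error works out to 29/405, or 7.16%, matching the Classification Summary.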
Scroll down to the Training: Classification Details table to see how each
observation in the training data was classified. The probability values for
success in each record are shown after the predicted class and actual class



columns. Records assigned to a class other than what was predicted are
highlighted in red.

Click the DA_ValidationScore tab to open the Validation: Classification
Summary.

In the Validation Dataset, 16 records were correctly classified as belonging to
the Success class while 80 cases were correctly classified as belonging to the
Failure class. Five (5) records were incorrectly classified as belonging to the
Success class when they were, in fact, members of the Failure class. No (0)
records were incorrectly assigned to the Failure class. This resulted in a total
classification error of 4.95%.
Scroll down to the Validation: Classification Details table to see how each
observation in the validation data was classified. The probability values for
success in each record are shown after the predicted class and actual class
columns. Records assigned to a class other than what was predicted are
highlighted in red.

Click the DA_TrainingLiftChart and DA_ValidationLiftChart tabs to navigate
to the Training and Validation Data Lift, Decile, ROC Curve and Cumulative
Gain Charts.
Lift Charts and ROC Curves are visual aids that help users evaluate the
performance of their fitted models. Charts found on the DA_TrainingLiftChart



tab were calculated using the Training Data Partition. Charts found on the
DA_ValidationLiftChart tab were calculated using the Validation Data Partition.
It is good practice to look at both sets of charts to assess model performance on
both datasets.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select DA_TrainingLiftChart or DA_ValidationLiftChart for Worksheet
and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, ROC Curve, and Lift Charts for Training Partition

Decile-wise Lift Chart, ROC Curve, and Lift Charts for Valid. Partition

After the model is built using the training data set, the model is used to score on
the training data set and the validation data set (if one exists). Then the data
set(s) are sorted in decreasing order using the predicted output variable value.
After sorting, the actual outcome values of the output variable are cumulated
and the lift curve is drawn as the cumulative number of cases in decreasing
probability (on the x-axis) vs the cumulative number of true positives on the y-
axis. The baseline (red line connecting the origin to the end point of the blue
line) is a reference line. For a given number of cases on the x-axis, this line
represents the expected number of successes if no model existed, and instead
cases were selected at random. This line can be used as a benchmark to measure
the performance of the fitted model. The greater the area between the lift curve
and the baseline, the better the model. In the Training Lift chart, if we selected
100 cases as belonging to the success class and used the fitted model to pick the
members most likely to be successes, the lift curve tells us that we would be
right on about 63 of them. Conversely, if we selected 100 random cases, we
could expect to be right on about 15 of them. In the Training Lift chart, if we
selected 50 cases as belonging to the success class and used the fitted model to
pick the members most likely to be successes, the lift curve tells us that we
would be right on about 17 of them. Conversely, if we selected 50 random
cases, we could expect to be right on about 8 of them.
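The lift-curve construction described above can be sketched as follows: sort the cases by predicted probability in decreasing order, then cumulate the actual outcomes:

```python
def lift_curve(probs, actuals):
    """Return the cumulative count of true positives after each case,
    with cases taken in decreasing order of predicted probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    cum, curve = 0, []
    for i in order:
        cum += actuals[i]
        curve.append(cum)
    return curve
```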



The decile-wise lift chart plots, for each decile of cases (sorted by predicted
probability), the mean actual output variable value in that decile divided by
the overall mean output variable value. The bars in this chart indicate the
factor by which the model outperforms a random assignment, one decile at a time.
Refer to the validation graph above. In the first decile, taking the most
expensive predicted housing prices in the dataset, the predictive performance of
the model is about 5 times better than simply assigning a random predicted
value.
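A sketch of the decile-wise computation (an illustration, not Analytic Solver's code):

```python
def decile_lift(probs, actuals, n_deciles=10):
    """For each decile of cases sorted by predicted probability
    (descending), return the decile's mean actual outcome divided
    by the overall mean outcome."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    overall = sum(actuals) / len(actuals)
    size = len(order) // n_deciles
    lifts = []
    for d in range(n_deciles):
        chunk = order[d * size:(d + 1) * size]
        lifts.append(sum(actuals[i] for i in chunk) / size / overall)
    return lifts
```

For instance, if the 10 highest-scored of 20 cases are all successes and the rest are failures, the first of two "deciles" has lift 2.0 and the second 0.0.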
The ROC curve was updated in V2017. This chart compares the performance of the
fitted model (Fitted Predictor) with an Optimum Predictor curve and a Random
Classifier curve. The Optimum Predictor curve plots a hypothetical model that
would provide perfect classification results. The best possible classification
performance is denoted by a point at the top left of the graph. This point is
sometimes referred to as the “perfect classification”. The closer the AUC is to
1, the better the performance of the model. In the Validation Partition, AUC =
0.95, which suggests that this fitted model is a good fit to the data.
In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original), in the Original Lift Chart, then select the desired chart.

Select Lift Chart (Alternative) to display Analytic Solver Data Mining's new Lift
Chart. Each of these charts consists of an Optimum Predictor curve, a Fitted
Predictor curve, and a Random Predictor curve. The Optimum Predictor curve
plots a hypothetical model that would provide perfect classification for our data.
The Fitted Predictor curve plots the fitted model and the Random Predictor
curve plots the results from using no model or by using a random guess (i.e. for
x% of selected observations, x% of the total number of positive observations are
expected to be correctly classified).
The Alternative Lift Chart plots Lift against the Predictive Positive Rate or
Support.
Lift Chart (Alternative) and Gain Chart for Training Partition



Lift Chart (Alternative) and Gain Chart for Validation Partition

Click the down arrow and select Gain Chart from the menu. In this chart, the
True Positive Rate or Sensitivity is plotted against the Predictive Positive Rate
or Support.

Click the Canonical Scores - Validation link to navigate to
DA_ValidationCanScore. Canonical Scores are the values of each case for the
function. These are intermediate values useful for illustration but are not usually
required by the end-user analyst. Canonical Scores are also available for the
training dataset on the DA_TrainingCanScore sheet.



For information on Stored Model Sheets, in this example DA_Stored, please
refer to the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide.

Discriminant Analysis Options


The options below appear on the Discriminant Analysis dialogs.



Variables in Input Data
The variables present in the dataset are listed here.

Selected Variables
The variables to be included in the Discriminant Analysis algorithm are listed
here.

Output Variable
The selected output variable is displayed here.

Number of Classes
This value is the number of classes in the output variable.

Success Class
This option is selected by default. Select the class to be considered a “success”
or the significant class in the Lift Chart. This option is enabled when the
number of classes in the output variable is equal to 2.



Success Probability Cutoff
Enter a value between 0 and 1 here to denote the cutoff probability for success.
If the calculated probability of success for an observation is greater than or
equal to this value, then a “success” (or a 1) will be predicted for that
observation. If the calculated probability of success for an observation is less
than this value, then a “non-success” (or a 0) will be predicted for that
observation. The default value is 0.5. This option is only enabled when the # of
classes is equal to 2.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
classification method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.



Rescale Data

Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.

Prior Probability
Click Prior Probability to open the dialog below. Three options appear in the
Prior Probability Dialog: Empirical, Uniform and Manual.

• If the first option is selected, Empirical, Analytic Solver Data Mining
will assume that the probability of encountering a particular class in the
dataset is the same as the frequency with which it occurs in the training
data.
• If the second option is selected, Uniform, Analytic Solver Data Mining
will assume that all classes occur with equal probability.
• Select the third option, Manual, to manually enter the desired class and
probability.



Canonical Variate Analysis
When this option is selected, Analytic Solver Data Mining produces the
canonical variates for the data based on an orthogonal representation of the
original variates. This has the effect of choosing a representation which
maximizes the distance between the different groups. For a k class problem
there are k-1 Canonical variates. Typically, only a subset of the canonical
variates is sufficient to discriminate between the classes. For example, if two
canonical variates were sufficient, the original predictors could be replaced by
just two predictors, X1 and X2 (linear combinations of the original predictors),
and discrimination based on these two predictors would perform just as well as
discrimination based on the original predictors.

Show CVA Model


When Canonical Variate Analysis is selected, Show CVA Model is enabled.
Select this option to produce the Canonical Scores in the output.

Show LDA Model


Select this option to display the functions that define each class in the output.

Score Training Data


Select these options to show an assessment of the performance of the
Discriminant Analysis algorithm in classifying the training data. The report is
displayed according to your specifications - Detailed, Summary, and Lift charts.
Canonical Scores is available only if Canonical Variate is selected in the Step 2
of 3 dialog. Lift charts are only available when the Output Variable contains 2
categories.

Score Validation Data


These options are enabled when a validation data set is present. Select these
options to show an assessment of the performance of the Discriminant Analysis
algorithm in classifying the validation data. The report is displayed according to
your specifications - Detailed, Summary, and Lift charts. Canonical Scores is
available only if Canonical Variate is selected in the Step 2 of 3 dialog. Lift
charts are only available when the Output Variable contains 2 categories.



Score Test Data
These options are enabled when a test set is present. Select these options to
show an assessment of the performance of the Discriminant Analysis algorithm
in classifying the test data. The report is displayed according to your
specifications - Detailed, Summary, and Lift charts. Canonical Scores is
available only if Canonical Variate is selected in the Step 2 of 3 dialog. Lift
charts are only available when the Output Variable contains 2 categories.

Score New Data


See the Scoring chapter within the Analytic Solver Data Mining User Guide for
more information on the options located in the Score Test Data and Score New
Data groups.



Logistic Regression

Introduction
Logistic Regression is a regression model where the dependent (target) variable
is categorical. Analytic Solver Data Mining provides the functionality to fit a
Logistic Model for binary classification problems, i.e. where the dependent
variable contains exactly two classes. The fitted model can be used to estimate
the posterior probability of the binary outcome based on one or more predictors
(features or independent variables). Examples of such binary outcomes could be
a college acceptance or rejection, loan application approval or rejection, or
classification of a tumor being benign or cancerous.
Logistic Regression is a popular and powerful classification method widely used
in various fields due to the model’s simplicity and high interpretability. Analytic
Solver Data Mining implements highly efficient algorithms for Logistic
Regression fitting and scoring procedures, which makes this method applicable
for large datasets. It’s important to note that Logistic Regression is a linear
model and cannot capture the non-linear relationships in the data.
Technically, the Logistic Regression fitting procedure aims to fit the coefficients
(b_i) of a linear combination of predictor variables (X_i) to estimate the log
odds of the binary outcome, i.e. a logit transformation of probability of a
particular outcome (p).
Note the similarity between the formulations of Linear and Logistic Regression.
Both define the response as a linear combination of predictor variables.
However, the linear model predicts a continuous response, which can take any
real value, while Logistic Regression requires a response (probability) to be
bounded in the [0,1] range. This is achieved through the logit transformation.
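The relationship can be sketched in code: the linear combination of predictors gives the log odds, and the inverse logit (sigmoid) maps it back to a probability in [0,1]. The coefficients used in the test are illustrative, not fitted values:

```python
import math

def predict_probability(x, coefs, intercept):
    """Estimate the probability of the success class: the linear
    combination b0 + b1*x1 + ... + bn*xn is the log odds, and the
    sigmoid maps it to a probability in (0, 1)."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-log_odds))
```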

Logistic Regression Example


This example illustrates how to fit a model using Analytic Solver Data Mining’s
Logistic Regression algorithm using the Boston_Housing dataset.
Click Help – Examples on the Data Mining ribbon, then Forecasting/Data
Mining Examples and open the example file, Boston_Housing.xlsx.
This dataset includes fourteen variables pertaining to housing prices from census
tracts in the Boston area collected by the US Census Bureau.
CRIM Per capita crime rate by town
ZN Proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS Proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX Nitric oxides concentration (parts per 10 million)



RM Average number of rooms per dwelling
AGE Proportion of owner-occupied units built prior to 1940
DIS Weighted distances to five Boston employment centers
RAD Index of accessibility to radial highways
TAX Full-value property-tax rate per $10,000
PTRATIO Pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of African-Americans by town
LSTAT % Lower status of the population
MEDV Median value of owner-occupied homes in $1000's
The figure below displays a portion of the data; observe the last column (CAT.
MEDV). This variable has been derived from the MEDV variable by assigning
a 1 for MEDV levels at or above 30 (>= 30) and a 0 for levels below 30 (< 30),
and will be used as the output variable in this example.

First, we partition the data into training and validation sets using the Standard
Data Partition defaults of 60% of the data randomly allocated to the Training Set
and 40% of the data randomly allocated to the Validation Set. For more
information on partitioning a dataset, see the Data Mining Partitioning chapter.



This example develops a model for predicting the median price of a house in a
census tract in the Boston area.
Click Classify – Logistic Regression on the Data Mining ribbon. The Logistic
Regression dialog appears.
The categorical variable CAT.MEDV has been derived from the MEDV
variable (Median value of owner-occupied homes in $1000's) by assigning a 1
for MEDV levels of 30 or above (>= 30) and a 0 for levels below 30 (< 30).
This will be our Output Variable.
Select the nominal categorical variable, CHAS, as a Categorical Variable. This
variable is a 1 if the housing tract is located adjacent to the Charles River.
Select the remaining variables as Selected Variables.
One major assumption of Logistic Regression is that each observation provides
equal information. Analytic Solver Data Mining offers an opportunity to
provide a Weight Variable. Using a Weight Variable allows the user to allocate
a weight to each record. A record with a large weight will influence the model
more than a record with a smaller weight. For the purposes of this example, a
Weight Variable will not be used.
Choose the value that will be the indicator of “Success” by clicking the down
arrow next to Success Class. In this example, we will use the default of 1.

Enter a value between 0 and 1 for Success Probability Cutoff. If the calculated
probability of success for a record is less than this value, then a 0 will be
entered for the class value; otherwise, a 1 will be entered for the class value. In
this example, we will keep the default of 0.5.
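The cutoff rule amounts to a one-line comparison; the probabilities below are invented for illustration:

```python
def classify(prob_success, cutoff=0.5):
    """Return class 1 ("success") when the predicted probability of
    success meets or exceeds the cutoff, and class 0 otherwise."""
    return 1 if prob_success >= cutoff else 0

print([classify(p) for p in (0.12, 0.49, 0.50, 0.97)])  # [0, 0, 1, 1]
```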

Click Next to advance to the Logistic Regression - Parameters dialog.


Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by selecting Partition Data on the
Parameters dialog. Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
classification method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.
Keep Fit Intercept selected, the default setting, to fit the Logistic Regression
intercept. If this option is not selected, Analytic Solver Data Mining will force
the intercept term to 0.
Keep the default of 50 for the Iterations. Estimating the coefficients in the
Logistic Regression algorithm requires an iterative non-linear maximization
procedure. You can specify a maximum number of iterations to prevent the
program from getting lost in very lengthy iterative loops. This value must be an
integer greater than 0 and less than or equal to 100 (1 <= value <= 100).
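The iteration cap can be illustrated with a bare-bones Newton-Raphson fitting loop. This is a simplified sketch, not Analytic Solver's actual implementation, and the toy data are invented:

```python
import numpy as np

def fit_logistic(X, y, max_iter=50, tol=1e-8):
    """Fit logistic regression coefficients by Newton-Raphson, stopping
    at max_iter to avoid lengthy iterative loops (no safeguards such as
    step halving are included in this sketch)."""
    X = np.column_stack([np.ones(len(X)), X])   # intercept column
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # predicted probabilities
        grad = X.T @ (y - p)                    # gradient of log-likelihood
        hess = (X * (p * (1 - p))[:, None]).T @ X
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:          # converged before the cap
            break
    return beta

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0, 0, 1, 0, 1, 1])               # not perfectly separable
beta = fit_logistic(x.reshape(-1, 1), y)
print(beta)  # [intercept, slope]; the slope is positive for this data
```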
Click Prior Probability to open the Prior Probability dialog.

Analytic Solver Data Mining will incorporate prior assumptions about how
frequently the different classes occur in each of the partitions.
• If Empirical is selected, Analytic Solver Data Mining will assume that
the probability of encountering a particular class in the dataset is the
same as the frequency with which it occurs in the training data.
• If Uniform is selected, Analytic Solver Data Mining will assume that
all classes occur with equal probability.
• If Manual is selected, the user can enter the desired class and
probability value.
For this example, click Done to select the default of Empirical and close the
dialog.
Select Variance – Covariance Matrix. When this option is selected, Analytic
Solver Data Mining will display the coefficient covariance matrix in the output.
Entries in the matrix are the covariances between the indicated coefficients. The
“on-diagonal” values are the estimated variances of the corresponding
coefficients.
Select Multicollinearity Diagnostics. At times, variables can be highly
correlated with one another which can result in large standard errors for the
affected coefficients. Analytic Solver Data Mining will display information
useful in dealing with this problem if Multicollinearity Diagnostics is selected.
Select Analysis of Coefficients. When this option is selected, Analytic Solver
Data Mining will produce a table with all coefficient information such as the
Estimate, Odds, Standard Error, etc. When this option is not selected, Analytic
Solver Data Mining will only print the Estimates.
When you have a large number of predictors and you would like to limit the
model to only the significant variables, click Feature Selection to open the
Feature Selection dialog and select Perform Feature Selection at the top of the
dialog. Keep the default selection of 12 for Maximum Subset Size. This option
can take on values of 1 up to N where N is the number of Selected Variables.
The default setting is N.

Analytic Solver Data Mining offers five different selection procedures for
selecting the best subset of variables.
• Backward Elimination in which variables are eliminated one at a time,
starting with the least significant. If this procedure is selected, FOUT
is enabled. A statistic is calculated when variables are eliminated. For
a variable to leave the regression, the statistic’s value must be less than
the value of FOUT (default = 2.71).
• Forward Selection in which variables are added one at a time, starting
with the most significant. If this procedure is selected, FIN is enabled.
On each iteration of the Forward Selection procedure, each variable is
examined for the eligibility to enter the model. The significance of
variables is measured as a partial F-statistic. Given a model at a current
iteration, we perform an F Test, testing the null hypothesis stating that
the regression coefficient would be zero if added to the existing set of
variables, and an alternative hypothesis stating otherwise. Each variable
is examined to find the one with the largest partial F-Statistic. The
decision rule for adding this variable into a model is: Reject the null
hypothesis if the F-Statistic for this variable exceeds the critical value
chosen as a threshold for the F Test (FIN value), or Accept the null
hypothesis if the F-Statistic for this variable is less than a threshold. If
the null hypothesis is rejected, the variable is added to the model and
selection continues in the same fashion, otherwise the procedure is
terminated.
• Sequential Replacement in which variables are sequentially replaced
and replacements that improve performance are retained.
• Stepwise selection is similar to Forward selection except that at each
stage, Analytic Solver Data Mining considers dropping variables that
are not statistically significant. When this procedure is selected, the
Stepwise selection options FIN and FOUT are enabled. In the stepwise
selection procedure a statistic is calculated when variables are added or
eliminated. For a variable to come into the regression, the statistic’s
value must be greater than the value for FIN (default = 3.84). For a
variable to leave the regression, the statistic’s value must be less than
the value of FOUT (default = 2.71). The value for FIN must be greater
than the value for FOUT.

• Best Subsets where searches of all combinations of variables are
performed to observe which combination has the best fit. (This option
can become quite time consuming depending on the number of input
variables.) If this procedure is selected, Number of best subsets is
enabled.
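To make the procedure concrete, below is a simplified numeric sketch of Backward Elimination for an ordinary least-squares model, using the partial F-statistic and the FOUT default of 2.71 (Analytic Solver applies the analogous logic to the logistic model; the data and variable names here are invented):

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def backward_eliminate(X, y, names, f_out=2.71):
    """Repeatedly drop the variable with the smallest partial F-statistic,
    as long as that statistic is below f_out; column 0 (the intercept)
    is always retained."""
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        full = rss(X[:, keep], y)
        dof = len(y) - len(keep)              # residual degrees of freedom
        f = {j: (rss(X[:, [c for c in keep if c != j]], y) - full)
                / (full / dof)
             for j in keep[1:]}
        worst = min(f, key=f.get)             # least significant variable
        if f[worst] >= f_out:
            break                             # all remaining are significant
        keep.remove(worst)
    return [names[c] for c in keep]

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = (rng.normal(size=n) for _ in range(3))   # x3 is pure noise
y = 1.0 + 2.0 * x1 - 1.5 * x2 + 0.5 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])
print(backward_eliminate(X, y, ["const", "x1", "x2", "x3"]))
```

With these data, x1 and x2 carry strong signal and survive elimination; the noise variable x3 usually has a partial F below 2.71 and is dropped.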
Click Done to accept the default choice, Backward Elimination with an
F-out setting of 2.71, and return to the Parameters dialog, then click Next to
advance to the Scoring dialog.

Select Detailed report, Summary report, and Lift charts under both Score
Training Data and Score Validation Data. Analytic Solver Data Mining will
create a detailed report, complete with the Output Navigator for ease in routing
to specific areas in the output, a report that summarizes the regression output for
both datasets, and lift charts, ROC curves, and Decile charts for both partitions.
Since we did not create a test partition when we partitioned our dataset, Score
Test Data options are disabled. See the chapter “Data Mining Partitioning” for
details on how to create a test set.
For information on scoring in a worksheet or database, please see the “Scoring
New Data” chapter in the Analytic Solver Data Mining User Guide.

Click Finish. The logistic regression output is inserted to the right of the
STDPartition worksheet. Use the Output Navigator on LR_Output to navigate
through the output.

Click the Training: Classification Details link to open the Training:
Classification Summary.
A Confusion Matrix is used to evaluate the performance of a classification
method. This matrix summarizes the records that were classified correctly and
those that were not.

True Positive cases (TP) are the number of cases classified as belonging to the
Success class that actually were members of the Success class. False Negative
cases (FN) are the number of cases that were classified as belonging to the
Failure class when they were actually members of the Success class (i.e. if a
cancerous tumor is considered a "success", then imagine patients with cancerous
tumors who were told their tumors were benign). False Positive (FP) cases were
assigned to the Success class but were actually members of the Failure group
(i.e. patients who were told they tested positive for cancer when, in fact, their
tumors were benign). True Negative (TN) cases were correctly assigned to the
Failure group.

In the Training Dataset, we see 40 records belonging to the Success class were
correctly assigned to that class while 7 records belonging to the Success class
were incorrectly assigned to the Failure class. In addition, 250 records
belonging to the Failure class were correctly assigned to this same class while 7
records belonging to the Failure class were incorrectly assigned to the Success
class. The total number of misclassified records was 14 (7+7) which results in
an error equal to 4.61%.

Precision is the probability of correctly identifying a randomly selected record
as one belonging to the Success class (i.e. the probability of correctly identifying
a random patient with cancer as having cancer). Recall (or Sensitivity) measures
the percentage of actual positives which are correctly identified as positive (i.e.
the proportion of people with cancer who are correctly identified as having
cancer). Specificity (also called the true negative rate) measures the percentage
of failures correctly identified as failures (i.e. the proportion of people with no
cancer being categorized as not having cancer.) The F-1 score, which fluctuates
between 1 (a perfect classification) and 0, defines a measure that balances
precision and recall.
Precision = TP / (TP + FP)
Sensitivity or True Positive Rate (TPR) = TP / (TP + FN)
Specificity (SPC) or True Negative Rate = TN / (FP + TN)
F1 = 2 * TP / (2 * TP + FP + FN)
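Applying these formulas to the training-partition counts above (TP = 40, FN = 7, FP = 7, TN = 250) reproduces the reported error rate:

```python
def classification_metrics(tp, fn, fp, tn):
    """Precision, recall (sensitivity), specificity, F1 score and overall
    error rate from a two-class confusion matrix."""
    return {
        "precision":   tp / (tp + fp),
        "recall":      tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "f1":          2 * tp / (2 * tp + fp + fn),
        "error":       (fp + fn) / (tp + fn + fp + tn),
    }

m = classification_metrics(tp=40, fn=7, fp=7, tn=250)
print(round(m["error"] * 100, 2))  # 4.61, matching the report
```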
Scroll down to view the Training: Classification Details table. Note:
Misclassified records will appear in red.

Click the Validation: Classification Summary link to open the Validation
Classification Summary.

In the Validation Dataset, 34 records were correctly classified as belonging to
the Success class while 3 cases were incorrectly assigned to the Failure class.
156 cases were correctly classified as belonging to the Failure class. Nine (9)
records were incorrectly classified as belonging to the Success class when they
were, in fact, members of the Failure class. This resulted in a total classification
error of 5.94%.

Scroll down to view the Validation: Classification Details table. Again,
misclassified records appear in red.

Click the Predictors hyperlink in the Output Navigator to display the Model
Predictors table. In Analytic Solver Data Mining, a preprocessing feature
selection step is included to take advantage of automatic variable screening and
elimination using Rank-Revealing QR Decomposition. This allows Analytic
Solver Data Mining to identify the variables causing multicollinearity, rank
deficiencies and other problems that would otherwise cause the algorithm to fail.
Information about “bad” variables is used in Variable Selection and
Multicollinearity Diagnostics and in computing other reported statistics.
Included and excluded predictors are shown in the table below. In this model
there were no excluded predictors. All predictors were eligible to enter the
model passing the tolerance threshold of 5.23374E-10. This denotes a tolerance
beyond which a variance – covariance matrix is not exactly singular to within
machine precision. The test is based on the diagonal elements of the triangular
factor R resulting from Rank-Revealing QR Decomposition. Predictors that do
not pass the test are excluded.
Note: If a predictor is excluded, the corresponding coefficient estimates will be
0 in the regression model and the variance – covariance matrix would contain all
zeros in the rows and columns that correspond to the excluded predictor.
Multicollinearity diagnostics, variable selection and other remaining output will
be calculated for the reduced model.
The design matrix may be rank-deficient for several reasons. The most common
cause of an ill-conditioned regression problem is the presence of feature(s) that
can be exactly or approximately represented by a linear combination of other
feature(s). For example, assume that among predictors you have 3 input
variables X, Y, and Z where Z = a * X + b * Y where a and b are constants.
This will cause the design matrix to not have a full rank. Therefore, one of these
3 variables will not pass the threshold for entrance and will be excluded from the
final regression model.
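This situation can be illustrated with NumPy's QR factorization. Plain Householder QR (no column pivoting) is a simplified stand-in for the product's Rank-Revealing QR, and the tolerance below is an arbitrary stand-in for the machine-precision threshold it computes:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=100)
Y = rng.normal(size=100)
Z = 2.0 * X + 3.0 * Y                 # exact linear combination of X and Y

A = np.column_stack([X, Y, Z])
_, R = np.linalg.qr(A)                # inspect the triangular factor R
diag = np.abs(np.diag(R))
tol = 1e-10 * diag.max()              # illustrative tolerance only
print(diag > tol)                     # [ True  True False]: Z is excluded
```

The near-zero diagonal entry of R corresponding to Z flags the dependent column, which would then be dropped from the model.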

Since we selected Perform Feature Selection on the Feature Selection dialog,
Analytic Solver Data Mining has produced the following output on the
LogReg_FS tab which displays the variables that are included in the subsets.
This table contains the two subsets with the highest Residual Sum of Squares
values.

In this table, every model includes a constant term (since Fit Intercept was
selected) and one or more variables as the additional coefficients. We can use
any of these models for further analysis simply by clicking the hyperlink under
Subset ID in the far left column. The Logistic Regression dialog will open. Click
Finish to run Logistic Regression using the variable subset as listed in the table.
The choice of model depends on the calculated values of various error measures
and the probability. RSS is the residual sum of squares, or the sum of squared
deviations between the predicted probability of success and the actual value (1
or 0). "Mallows Cp" is a measure of the error in the best subset model, relative
to the error incorporating all variables. Adequate models are those for which Cp
is roughly equal to the number of parameters in the model (including the
constant), and/or Cp is at a minimum. "Probability" is a quasi hypothesis test of
the proposition that a given subset is acceptable; if Probability < .05 we can rule
out that subset.
The considerations about RSS, Cp and Probability in this example would lead us
to believe that the subset with 14 coefficients is the best model in this example.
Model terms are shown in the Coefficients output on the LogReg_Output sheet.

This table contains the coefficient estimate, the standard error of the coefficient,
the p-value, the odds ratio for each variable (which is simply e^x, where x is the
value of the coefficient) and the confidence interval for the odds. (Note: for the
Intercept term, the Odds Ratio is calculated as exp(0).)
Summary statistics, found directly above in the Regression Summary, show the
residual degrees of freedom (#observations - #predictors), a standard deviation
type measure for the model (which typically has a chi-square distribution), the
number of iterations required to fit the model, and the Multiple R-squared value.

The multiple R-squared value shown here is the R-squared value for a logistic
regression model, defined as
R2 = (D0 - D)/D0,
where D is the deviance based on the fitted model and D0 is the deviance based
on the null model. The null model is defined as the model containing no
predictor variables apart from the constant.
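Given predicted success probabilities, this R-squared can be computed directly; the outcomes and fitted probabilities below are a small invented example:

```python
import math

def deviance(y, p):
    """-2 times the Bernoulli log-likelihood of outcomes y under
    predicted success probabilities p."""
    return -2.0 * sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                      for yi, pi in zip(y, p))

y      = [1, 0, 1, 1, 0]
fitted = [0.9, 0.2, 0.8, 0.7, 0.1]       # from some fitted model
null_p = [sum(y) / len(y)] * len(y)      # constant-only (null) model

D, D0 = deviance(y, fitted), deviance(y, null_p)
r2 = (D0 - D) / D0
print(round(r2, 3))  # 0.699
```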
Note: If a variable has been eliminated by Rank-Revealing QR Decomposition,
the variable will appear in red in the Coefficients table with a 0 Coefficient, Std.
Error, CI Lower, CI Upper, and RSS Reduction and N/A for the t-Statistic and
P-Values.
Collinearity Diagnostics help assess whether two or more variables so closely
track one another as to provide essentially the same information.

The columns represent the variance components (related to principal
components in multivariate analysis), while the rows represent the variance
proportion decomposition explained by each variable in the model. The
eigenvalues are those associated with the singular value decomposition of the
variance-covariance matrix of the coefficients, while the condition numbers are
the square roots of the ratios of the largest eigenvalue to each of the rest. In general,
multicollinearity is likely to be a problem with a high condition number (more
than 20 or 30), and high variance decomposition proportions (say more than 0.5)
for two or more variables.
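Condition numbers of this kind can be sketched as below. A common variant computes them from the eigenvalues of the cross-product matrix of the unit-scaled predictors, and this sketch follows that variant; the data are invented, with two nearly collinear columns driving the largest index well past the rule-of-thumb threshold:

```python
import numpy as np

def condition_indices(X):
    """Square roots of the ratios of the largest eigenvalue of X'X
    (columns scaled to unit length) to each eigenvalue, sorted ascending."""
    Xs = X / np.linalg.norm(X, axis=0)
    eig = np.linalg.eigvalsh(Xs.T @ Xs)      # ascending, non-negative
    return np.sort(np.sqrt(eig.max() / eig))

rng = np.random.default_rng(2)
a = rng.normal(size=200)
b = a + 1e-3 * rng.normal(size=200)          # nearly a copy of column a
c = rng.normal(size=200)
ci = condition_indices(np.column_stack([a, b, c]))
print(ci)  # the last index is far above the 20-30 warning range
```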
Click the LogReg_TrainingLiftChart and LogReg_ValidationLiftChart tabs to
navigate to the Training and Validation Data Lift, Decile, and ROC Curve
charts.
Lift Charts and ROC Curves are visual aids that help users evaluate the
performance of their fitted models. Charts found on the
LogReg_TrainingLiftChart tab were calculated using the Training Data
Partition. Charts found on the
LogReg_ValidationLiftChart tab were calculated using the Validation Data
Partition. It is good practice to look at both sets of charts to assess model
performance on both datasets.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select LogReg_TrainingLiftChart or LogReg_ValidationLiftChart for
Worksheet and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, ROC Curve, and Lift Charts for Training Partition

Decile-wise Lift Chart, ROC Curve, and Lift Charts for Valid. Partition

After the model is built using the training data set, the model is used to score on
the training data set and the validation data set (if one exists). Then the data
set(s) are sorted in decreasing order using the predicted output variable value.
After sorting, the actual outcome values of the output variable are cumulated
and the lift curve is drawn as the cumulative number of cases in decreasing
probability (on the x-axis) vs the cumulative number of true positives on the y-
axis. The baseline (red line connecting the origin to the end point of the blue
line) is a reference line. For a given number of cases on the x-axis, this line
represents the expected number of successes if no model existed, and instead
cases were selected at random. This line can be used as a benchmark to measure
the performance of the fitted model. The greater the area between the lift curve
and the baseline, the better the model. In the Training Lift chart, if we selected
100 cases as belonging to the success class and used the fitted model to pick the
members most likely to be successes, the lift curve tells us that we would be
right on about 45 of them. Conversely, if we selected 100 random cases, we
could expect to be right on about 15 of them. The Validation Lift chart tells us
that if we selected 100 cases as belonging to the success class and used the fitted
model to pick the members most likely to be successes, the lift curve tells us that
we would be right on about 37 of them. If we selected 100 random cases, we
could expect to be right on about 15 of them.
The decile-wise lift chart is drawn as the decile number versus the mean actual
output variable value in that decile divided by the overall mean output variable
value. The bars in this chart indicate the factor by which the model outperforms a
random assignment, one decile at a time. Refer to the validation graph above.
In the first decile, taking the most expensive predicted housing prices in the
dataset, the predictive performance of the model is about 4.5 times better than
simply assigning a random predicted value.
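The decile-wise computation can be sketched as follows, with invented probabilities and outcomes:

```python
def decile_lift(probs, actuals, n_deciles=10):
    """Mean actual outcome within each decile of records (sorted by
    predicted probability, highest first), divided by the overall mean."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    overall = sum(actuals) / len(actuals)
    size = len(order) // n_deciles
    return [sum(actuals[i] for i in order[d * size:(d + 1) * size]) / size
            / overall
            for d in range(n_deciles)]

# 20 invented records, 5 successes, most ranked near the top by the model
probs   = [0.95 - 0.04 * i for i in range(20)]
actuals = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0] + [0] * 10
print(decile_lift(probs, actuals))  # [4.0, 2.0, 2.0, 0.0, 2.0, 0.0, ...]
```

A first-decile value of 4.0 here means the top 10% of ranked records contains four times the successes a random 10% sample would.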
The Regression ROC curve was updated in V2017. This new chart compares
the performance of the regressor (Fitted Predictor) with an Optimum Predictor
Curve and a Random Classifier curve. The Optimum Predictor Curve plots a
hypothetical model that would provide perfect classification results. The best
possible classification performance is denoted by a point at the top left of the
graph at the intersection of the x and y axis. This point is sometimes referred to
as the “perfect classification”. The closer the AUC is to 1, the better the
performance of the model. In the Validation Partition, AUC = .97 which
suggests that this fitted model is a good fit to the data.
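The AUC itself can be computed with the rank-sum identity: it equals the probability that a randomly chosen positive record is scored above a randomly chosen negative one. The scores below are invented:

```python
def auc(probs, actuals):
    """Area under the ROC curve via the Mann-Whitney identity; ties in
    the predicted scores count as half a win."""
    pos = [p for p, a in zip(probs, actuals) if a == 1]
    neg = [p for p, a in zip(probs, actuals) if a == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

print(auc([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0]))  # 0.8333...
```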
In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original), in the Original Lift Chart, then select the desired chart.

Select Lift Chart (Alternative) to display Analytic Solver Data Mining's new Lift
Chart. Each of these charts consists of an Optimum Predictor curve, a Fitted
Predictor curve, and a Random Predictor curve. The Optimum Predictor curve
plots a hypothetical model that would provide perfect classification for our data.
The Fitted Predictor curve plots the fitted model and the Random Predictor
curve plots the results from using no model or by using a random guess (i.e. for
x% of selected observations, x% of the total number of positive observations are
expected to be correctly classified).
The Alternative Lift Chart plots Lift against the Predictive Positive Rate or
Support.
Lift Chart (Alternative) and Gain Chart for Training Partition

Lift Chart (Alternative) and Gain Chart for Validation Partition

Click the down arrow and select Gain Chart from the menu. In this chart, the
True Positive Rate or Sensitivity is plotted against the Predictive Positive Rate
or Support.
See the chapter “Score New Data” in the Analytic Solver Data Mining User
Guide for more information on LR_Stored.

Logistic Regression Options


The following options appear on one of the five Logistic Regression dialogs.

Variables In Input Data
All variables in the dataset are listed here.

Selected Variables
Variables listed here will be utilized in the Logistic Regression algorithm.

Weight Variable
One major assumption of Logistic Regression is that each observation provides
equal information. Analytic Solver Data Mining offers an opportunity to
provide a Weight variable. Using a Weight variable allows the user to allocate a
weight to each record. A record with a large weight will influence the model
more than a record with a smaller weight.

Output Variable
Select the variable whose outcome is to be predicted. The number of classes in
the output variable must be equal to 2.

Number of Classes
Displays the number of classes in the Output variable.

Success Class
This option is selected by default. Select the class to be considered a “success”
or the significant class in the Lift Chart. This option is enabled when the
number of classes in the output variable is equal to 2.

Success Probability Cutoff


Enter a value between 0 and 1 here to denote the cutoff probability for success.
If the calculated probability for success for an observation is greater than or
equal to this value, then a “success” (or a 1) will be predicted for that
observation. If the calculated probability for success for an observation is less
than this value, then a “non-success” (or a 0) will be predicted for that
observation. The default value is 0.5. This option is only enabled when the # of
classes is equal to 2.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
classification method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.

Rescale Data
Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.
Note: Rescaling has no substantial effect in Logistic Regression other than proportional
scaling.

Prior Probability
Click Prior Probability to open the dialog below. Three options appear in the
Prior Probability Dialog: Empirical, Uniform and Manual.

• If the first option is selected, Empirical, Analytic Solver Data Mining
will assume that the probability of encountering a particular class in the
dataset is the same as the frequency with which it occurs in the training
data.
• If the second option is selected, Uniform, Analytic Solver Data Mining
will assume that all classes occur with equal probability.
• Select the third option, Manual, to manually enter the desired class and
probability.


Fit Intercept
When this option is selected, the default setting, Analytic Solver Data Mining
will fit the Logistic Regression intercept. If this option is not selected, Analytic
Solver Data Mining will force the intercept term to 0.

Iterations (Max)
Estimating the coefficients in the Logistic Regression algorithm requires an
iterative non-linear maximization procedure. You can specify a maximum
number of iterations to prevent the program from getting lost in very lengthy
iterative loops. This value must be an integer greater than 0 and less than or equal
to 100 (1 <= value <= 100).

Variance – Covariance Matrix
When this option is selected, Analytic Solver Data Mining will display the
coefficient covariance matrix in the output. Entries in the matrix are the
covariances between the indicated coefficients. The “on-diagonal” values are
the estimated variances of the corresponding coefficients.

Multicollinearity Diagnostics
At times, variables can be highly correlated with one another which can result in
large standard errors for the affected coefficients. Analytic Solver Data Mining
will display information useful in dealing with this problem if Multicollinearity
Diagnostics is selected.

Analysis Of Coefficients
When this option is selected, Analytic Solver Data Mining will produce a table
with all coefficient information such as the Estimate, Odds, Standard Error, etc.
When this option is not selected, Analytic Solver Data Mining will only print the
Estimates.

Feature Selection
When you have a large number of predictors and you would like to limit the
model to only significant variables, click Feature Selection to open the Feature
Selection dialog and select Perform Feature Selection at the top of the dialog.
Maximum Subset Size can take on values of 1 up to N where N is the number of
Selected Variables. If no Categorical Variables exist, the default for this option
is N. If one or more Categorical Variables exist, the default is "15".

Analytic Solver Data Mining offers five different selection procedures for
selecting the best subset of variables.

• Backward Elimination in which variables are eliminated one at a time,
starting with the least significant. If this procedure is selected, FOUT
is enabled. A statistic is calculated when variables are eliminated. For
a variable to leave the regression, the statistic’s value must be less than
the value of FOUT (default = 2.71).
• Forward Selection in which variables are added one at a time, starting
with the most significant. If this procedure is selected, FIN is enabled.
On each iteration of the Forward Selection procedure, each variable is
examined for the eligibility to enter the model. The significance of
variables is measured as a partial F-statistic. Given a model at a current
iteration, we perform an F Test, testing the null hypothesis stating that
the regression coefficient would be zero if added to the existing set of
variables, and an alternative hypothesis stating otherwise. Each variable
is examined to find the one with the largest partial F-Statistic. The
decision rule for adding this variable into a model is: Reject the null
hypothesis if the F-Statistic for this variable exceeds the critical value
chosen as a threshold for the F Test (FIN value), or Accept the null
hypothesis if the F-Statistic for this variable is less than a threshold. If
the null hypothesis is rejected, the variable is added to the model and
selection continues in the same fashion, otherwise the procedure is
terminated.
• Sequential Replacement in which variables are sequentially replaced
and replacements that improve performance are retained.
• Stepwise selection is similar to Forward selection except that at each
stage, Analytic Solver Data Mining considers dropping variables that
are not statistically significant. When this procedure is selected, the
Stepwise selection options FIN and FOUT are enabled. In the stepwise
selection procedure a statistic is calculated when variables are added or
eliminated. For a variable to come into the regression, the statistic’s
value must be greater than the value for FIN (default = 3.84). For a
variable to leave the regression, the statistic’s value must be less than
the value of FOUT (default = 2.71). The value for FIN must be greater
than the value for FOUT.
• Best Subsets where searches of all combinations of variables are
performed to observe which combination has the best fit. (This option
can become quite time consuming depending on the number of input
variables.) If this procedure is selected, Number of best subsets is
enabled.

Score Training Data


Select these options to show an assessment of the performance of the algorithm
in classifying the training data. The report is displayed according to your
specifications - Detailed, Summary and Lift charts. Lift charts are only
available when the Output Variable contains 2 categories.

Score Validation Data


These options are enabled when a validation dataset is present. Select these
options to show an assessment of the performance of the algorithm in classifying
the validation data. The report is displayed according to your specifications -
Detailed, Summary and Lift charts. Lift charts are only available when the
Output Variable contains 2 categories.

Score Test Data


These options are enabled when a test dataset is present. Select these options to
show an assessment of the performance of the algorithm in classifying the test
data. The report is displayed according to your specifications - Detailed,
Summary and Lift charts. Lift charts are only available when the Output
Variable contains 2 categories.

Score New Data


For information on scoring in a worksheet or database, please see the chapters
“Scoring New Data” and “Scoring Test Data” in the Analytic Solver Data
Mining User Guide.



k – Nearest Neighbors
Classification Method

Introduction
K-nearest neighbors is a simple but powerful classifier. This method classifies a
given record based on the predominant classification of its "k" nearest neighbor
records.
The k-Nearest Neighbors Classifier performs the following steps for each record
in the dataset.
1. The Euclidean Distance between the given record and all remaining records
is calculated. In order for this distance measure to be accurate, all variables
must be scaled appropriately.
2. The classification of the "k" nearest neighbors is examined. The
predominant classification is assigned to the given row.
3. This procedure is repeated for all remaining rows.
Analytic Solver Data Mining allows the user to select a maximum value for k
and builds models in parallel on all values of k up to the maximum specified
value. Additional scoring can be performed on the best of these models.
As k increases, the computing time will also increase. If a high value of k is
selected, such as 18 or 20, the risk of underfitting the data is high. Conversely, a
low value of k, such as 1 or 2, runs the risk of overfitting the data. In most
applications, k is on the order of tens rather than hundreds or thousands.
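The three steps above can be sketched as a minimal classifier (illustrative only; the names and the simple majority vote are assumptions, not Analytic Solver's implementation):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, y_train, x_new, k=3):
    # Step 1: Euclidean distance from the new record to every training record.
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    # Step 2: examine the classes of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[i] for i in nearest)
    # Step 3: assign the predominant class.
    return votes.most_common(1)[0][0]
```

For example, with training points clustered near the origin labeled "a" and points near (5, 5) labeled "b", a new record at (0.2, 0.2) would be classified as "a".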

k-Nearest Neighbors Classification Example


The example below illustrates the use of Analytic Solver Data Mining’s k-
Nearest Neighbors classification method using the well-known Iris dataset. This
dataset was introduced by R. A. Fisher and reports four characteristics of three
species of the Iris flower. A portion of the dataset is shown below.



First, we partition the data using a standard partition with percentages of 60%
training and 40% validation (the default settings for the Automatic choice).
For more information on how to partition a dataset, please see the previous Data
Mining Partitioning chapter.



Click Classify – k-Nearest Neighbors to open the k-Nearest Neighbors
Classification dialog.
Select Petal_width, Petal_length, Sepal_width, and Sepal_length under
Variables in Input Data then click > to select as Selected Variables. Select
Species_name as the Output Variable.
Note: Since the variable Species_No is perfectly predictive of the output
variable, Species_name, it will not be included in the model.
Once the Output Variable is selected, Number of Classes (3) will be filled
automatically. Since our output variable contains more than 2 classes, Success
Class and Success Probability Cutoff are disabled.

Click Next to advance to the Parameters dialog.


Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by selecting Partition Data on the
Parameters dialog. If this option is selected, Analytic Solver Data Mining will
partition your dataset (according to the partition options you set) immediately
before running the classification method. If partitioning has already occurred on
the dataset, this option will be disabled. For more information on partitioning,
please see the Data Mining Partitioning chapter.
Click Rescale Data to open the Rescaling Dialog. Recall that the Euclidean
distance measurement performs best when each variable is rescaled. Use
Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.
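The four scaling methods named above are commonly defined as follows. These are the standard textbook formulas; Analytic Solver's exact implementation is not shown here:

```python
import numpy as np

def standardize(x):
    # Standardization: (x - mean) / standard deviation.
    return (x - x.mean()) / x.std()

def normalize(x):
    # Normalization: min-max scaling to the range [0, 1].
    return (x - x.min()) / (x.max() - x.min())

def adjusted_normalize(x):
    # Adjusted Normalization: min-max scaling to the range [-1, 1].
    return 2 * (x - x.min()) / (x.max() - x.min()) - 1

def unit_norm(x):
    # Unit Norm: scale the feature vector to Euclidean length 1.
    return x / np.linalg.norm(x)
```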



Click Done to accept the default of Standardization and close the dialog.
Enter 10 for # Neighbors (k). (This number is based on standard practice from the
literature.) This is the parameter k in the k-Nearest Neighbor algorithm. If the
number of observations (rows) is less than 50 then the value of k should be
between 1 and the total number of observations (rows). If the number of rows is
greater than 50, then the value of k should be between 1 and 50. Note that if k is
chosen as the total number of observations in the training set, then for any new
observation, all the observations in the training set become nearest neighbors.
The default value for this option is 1.
Select Search 1..K under Nearest Neighbors Search. When this option is
selected, Analytic Solver Data Mining will display the output for the best k
between 1 and the value entered for # Neighbors. If Fixed K is selected, the
output will be displayed for the specified value of k.
Click Prior Probability to open the Prior Probability dialog. Analytic Solver
Data Mining will incorporate prior assumptions about how frequently the
different classes occur.
• If Empirical is selected, Analytic Solver Data Mining will assume that
the probability of encountering a particular class in the dataset is the
same as the frequency with which it occurs in the training data.
• If Uniform is selected, Analytic Solver Data Mining will assume that
all classes occur with equal probability.
• If Manual is selected, the user can enter the desired class and
probability value.

For this example, click Done to select the default of Empirical and close the
dialog.



Click Next to advance to the Scoring dialog.
Summary Report under both Score Training Data and Score Validation Data is
selected by default. Select Detailed Report under both Score Training Data
and Score Validation Data. Analytic Solver Data Mining will create detailed
and summary reports for both the training and validation sets.
Lift charts are disabled since there are more than 2 categories in our Output
Variable, Species_name. Since we did not create a test partition, the options for
Score test data are disabled. See the chapter “Data Mining Partitioning” for
information on how to create a test partition.
For more information on the Score new data options, please see the “Scoring
New Data” chapter in the Analytic Solver Data Mining User Guide.



Click Finish to run the k-Nearest Neighbors Classification method. Results are
inserted to the right. Double click the KNNC_Output sheet. The top part of
this sheet contains all of our inputs. At the top of this sheet is the Output
Navigator.

Click the Search Log link to view the Search log. (This output is produced
because we selected Search 1..K on the Parameters tab. If this option had not
been selected, this output would not be produced.)

The Validation error log for the different k's lists the % Errors for all values of k
for both the training and validation data sets. The k with the smallest % Error in
the Validation partition is selected as the “Best k”. Scoring is performed later
using this best value of k.
Click the KNNC_TrainingScore tab to view the Training Data scoring tables.



This Summary report tallies the actual and predicted classifications. (Predicted
classifications were generated by applying the model to the training data.)
Correct classification counts are along the diagonal from the upper left to the
lower right.
There were 4 records mislabeled in the Training partition: 3 records assigned to
the Versicolor class when they should have been assigned to the Virginica class
and 1 record assigned to the Virginica class that should have been assigned to
the Versicolor class. The total misclassification error is 4.44%.
Any misclassified records would appear under Training: Classification Details
in red.
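The 4.44% figure is simply the off-diagonal count divided by the total number of scored records (4 of 90 training records). A quick sketch, using an illustrative confusion matrix consistent with the counts described above (the per-class cell values are hypothetical):

```python
import numpy as np

def misclassification_rate(confusion):
    # Correct classifications lie on the diagonal; everything off the
    # diagonal is an error.
    confusion = np.asarray(confusion)
    return 1 - np.trace(confusion) / confusion.sum()

# Rows = actual class, columns = predicted class
# (Setosa, Versicolor, Virginica); cell values are illustrative.
cm = [[30, 0, 0],
      [0, 29, 1],
      [0, 3, 27]]
```

Here `misclassification_rate(cm)` is 4/90, or about 4.44%.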
Click Validation: Classification Summary link on the Output Navigator to be
routed to the Validation Data Classification Summary.

When the model was applied to the Validation partition, no records were
misclassified.

k-Nearest Neighbors Classification Options


The following options appear on the k-Nearest Neighbors Classification dialogs.



Variables in input data
The variables in the dataset appear here.

Selected variables
The variables selected as input variables appear here.

Output variable
The variable to be classified is entered here.

Number of Classes
The number of classes in the output variable appears here.

Success Class
This option is selected by default. Select the class to be considered a “success”
or the significant class in the Lift Chart. This option is enabled when the
number of classes in the output variable is equal to 2.

Success Probability Cutoff


Enter a value between 0 and 1 here to denote the cutoff probability for success.
If the calculated probability of success for an observation is greater than or
equal to this value, then a “success” (or a 1) will be predicted for that
observation. If the calculated probability of success for an observation is less
than this value, then a “non-success” (or a 0) will be predicted for that
observation. The default value is 0.5. This option is only enabled when the
number of classes is equal to 2.
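The cutoff rule reduces to a single comparison; a sketch (the function name is illustrative):

```python
def classify_by_cutoff(p_success, cutoff=0.5):
    # Predict 1 ("success") when the estimated probability of success meets
    # or exceeds the cutoff, otherwise predict 0 ("non-success").
    return 1 if p_success >= cutoff else 0
```

Lowering the cutoff below 0.5 classifies more observations as successes, which can be useful when the success class is rare.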

# Neighbors (k)
This is the parameter k in the k-Nearest Neighbor algorithm.

Nearest Neighbors Search


If Search 1..K is selected, Analytic Solver Data Mining will display the output
for the best k between 1 and the value entered for # Neighbors (k).
If Fixed K is selected, the output will be displayed for the specified value of k.



Prior Probabilities
Analytic Solver Data Mining will incorporate prior assumptions about how
frequently the different classes occur.
• If Empirical is selected, Analytic Solver Data Mining will assume that
the probability of encountering a particular class in the dataset is the
same as the frequency with which it occurs in the training data.
• If Uniform is selected, Analytic Solver Data Mining will assume that
all classes occur with equal probability.
• If Manual is selected, the user can enter the desired class and
probability value.

Rescale Data
Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.



Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by selecting Partition Data on the
Parameters dialog. If this option is selected, Analytic Solver Data Mining will
partition your dataset (according to the partition options you set) immediately
before running the classification method. If partitioning has already occurred on
the dataset, this option will be disabled. For more information on partitioning,
please see the Data Mining Partitioning chapter.

Score Training Data


Select these options to show an assessment of the performance of the algorithm
in classifying the training data. The report is displayed according to your
specifications - Detailed, Summary, and Lift charts. Lift charts are only
available when the Output Variable contains 2 categories.

Score Validation Data


These options are enabled when a validation dataset exists. Select to show an
assessment of the performance of the algorithm in classifying the validation
data. The report is displayed according to your specifications - Detailed,
Summary, and Lift charts. Lift charts are only available when the Output
Variable contains 2 categories.

Score Test Data


These options are enabled when a test dataset exists. Select to show an
assessment of the performance of the algorithm in classifying the test data. The
report is displayed according to your specifications - Detailed, Summary, and
Lift charts. Lift charts are only available when the Output Variable contains 2
categories.

Score New Data


For more information on the Score new data options, please see the “Scoring
New Data” chapter within the Analytic Solver Data Mining User Guide.



Classification Tree
Classification Method

Introduction
Classification tree methods (also known as decision tree methods) are a good
choice when the data mining task is classification or prediction of outcomes.
The goal of this algorithm is to generate rules that can be easily understood,
explained, and translated into SQL or a natural query language.
A classification tree labels records, assigning them to discrete classes, and
can also provide a measure of confidence that the classification is correct. The
tree is built through a process known as binary recursive partitioning. This is an
iterative process of splitting the data into partitions, and then splitting it up
further on each of the branches.
Initially, a training set is created where the classification label (say, "purchaser"
or "non-purchaser") is known (pre-classified) for each record. In the next step,
the algorithm systematically assigns each record to one of two subsets on some
basis (for example, income >= $75,000 or income < $75,000). The objective is
to attain as homogeneous a set of labels (say, "purchaser" or "non-purchaser") as
possible in each partition. This splitting (or partitioning) is then applied to each
of the new partitions. The process continues until no more useful splits can be
found. The heart of the algorithm is the rule that determines the initial split
(see figure below).

As explained above, the process starts with a training set consisting of pre-
classified records (target field or dependent variable with a known class or label
such as "purchaser" or "non-purchaser"). The goal is to build a tree that
distinguishes among the classes. For simplicity, assume that there are only two
target classes and that each split is a binary partition. The splitting criterion
easily generalizes to multiple classes, and any multi-way partitioning can be
achieved through repeated binary splits. To choose the best splitter at a node, the
algorithm considers each input field in turn. In essence, each field is sorted.
Then, every possible split is tried and considered, and the best split is the one
which produces the largest decrease in diversity of the classification label within
each partition (this is just another way of saying "the increase in homogeneity").
This is repeated for all fields, and the winner is chosen as the best splitter for
that node. The process is continued at subsequent nodes until a full tree is
generated.
Analytic Solver Data Mining uses the Gini index as the splitting criterion, which
is a commonly used measure of inequality. The index ranges between 0 and 1.
A Gini index of 0 indicates that all records in the node belong to the same
category; a Gini index close to 1 indicates that each record in the node belongs
to a different category. For a complete discussion of this index, please see
Classification and Regression Trees by Leo Breiman, Jerome Friedman, Richard
Olshen, and Charles Stone (3).
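The Gini measure, and the way a candidate split is scored, can be sketched as follows (a standard formulation, assuming split quality is the size-weighted impurity of the two child partitions):

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions.
    # 0 when every record shares one class; approaches 1 as the records
    # spread across many classes.
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left, right):
    # Impurity of a candidate split: child impurities weighted by size.
    # The best split is the one with the largest decrease versus the parent.
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n
```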

Pruning the tree


Pruning is the process of removing leaves and branches to improve the
performance of the decision tree when moving from the training data (where the
classification is known) to real-world applications (where the classification is
unknown). The tree-building algorithm makes the best split at the root node
where there are the largest number of records and, hence, considerable
information. Each subsequent split has a smaller and less representative
population with which to work. Towards the end, idiosyncrasies of training
records at a particular node display patterns that are peculiar only to those
records. These patterns can become meaningless and sometimes harmful for
prediction if you try to extend rules based on them to larger populations.
For example, say the classification tree is trying to predict height and it comes to
a node containing one tall person X and several other shorter people. The
algorithm can decrease diversity at that node by a new rule imposing "people
named X are tall" and thus classify the training data. In the real world this rule is
obviously inappropriate. Pruning methods solve this problem -- they let the tree
grow to maximum size, then remove smaller branches that fail to generalize.
(Note: In practice, we would not include irrelevant fields such as "name"; this is
simply used as an illustration.)
Since the tree is “grown” from the training data set, when it has reached full
structure it usually suffers from over-fitting (i.e. it is "explaining" random
elements of the training data that are not likely to be features of the larger
population of data). This results in poor performance on real life data.
Therefore, trees must be pruned using the validation data set.

Single Tree Classification Tree Example


This example illustrates how to create a classification tree using the single
classification tree method. We will use the Boston_Housing.xlsx dataset to
illustrate this method.
Click Help – Examples, then Forecasting/Data Mining Examples to open the
Boston_Housing.xlsx dataset. This dataset includes fourteen variables
pertaining to housing prices from census tracts in the Boston area collected by
the US Census Bureau.
CRIM Per capita crime rate by town
ZN Proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS Proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX Nitric oxides concentration (parts per 10 million)
RM Average number of rooms per dwelling
AGE Proportion of owner-occupied units built prior to 1940
DIS Weighted distances to five Boston employment centers
RAD Index of accessibility to radial highways
TAX Full-value property-tax rate per $10,000
PTRATIO Pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of African-Americans by town
LSTAT % Lower status of the population
MEDV Median value of owner-occupied homes in $1000's
The figure below displays a portion of the data; observe the last column (CAT.
MEDV). This variable has been derived from the MEDV variable by assigning
a 1 for MEDV levels of 30 and above (>= 30) and a 0 for levels below 30 (< 30).
The MEDV variable itself will not be used in this example.

First, we partition the data into training and validation sets using the Standard
Data Partition defaults of 60% of the data randomly allocated to the Training Set
and 40% of the data randomly allocated to the Validation Set. For more
information on partitioning a dataset, see the Data Mining Partitioning chapter.



Click Classify – Classification Tree to open the Classification Tree dialog.
Select CAT. MEDV as the Output variable. Then select CHAS for Categorical
Variables and all remaining variables except MEDV and Record ID as
Selected Variables. The MEDV variable is not included in the Input since the
CAT. MEDV variable is derived from the MEDV variable.
Choose the value that will be the indicator of “Success” by clicking the down
arrow next to Success Class. In this example, we will use the default of 1.
Enter a value between 0 and 1 for Success Probability Cutoff. If the Probability
of success (probability of the output variable = 1) is less than this value, then a 0
will be entered for the class value, otherwise a 1 will be entered for the class
value. In this example, we will keep the default of 0.5.



Click Next to advance to the Classification Tree – Parameters dialog.
As discussed in previous sections, Analytic Solver Data Mining includes the
ability to partition a dataset from within a classification or prediction method by
clicking Partition Data on the Parameters dialog. Analytic Solver Data Mining
will partition your dataset (according to the partition options you set)
immediately before running the classification method. If partitioning has
already occurred on the dataset, this option will be disabled. For more
information on partitioning, please see the Data Mining Partitioning chapter.
In the Tree Growth section, select Levels, Nodes, Splits, and Records in
Terminal Nodes. Leave all selections at their default settings. Values entered
for these options limit tree growth, i.e. if 10 is entered for Levels, the tree will be
limited to 10 levels.
Click Prior Probability. Three options appear in the Prior Probability Dialog:
Empirical, Uniform and Manual.



• If the first option is selected, Empirical, Analytic Solver Data Mining
will assume that the probability of encountering a particular class in the
dataset is the same as the frequency with which it occurs in the training
data.
• If the second option is selected, Uniform, Analytic Solver Data Mining
will assume that all classes occur with equal probability.
• Select the third option, Manual, to manually enter the desired class and
probability value.
Click Done to accept the default section, Empirical, and close the dialog.
Select Prune (Using Validation Set). (This option is enabled when a Validation
Dataset exists.) Analytic Solver Data Mining will prune the tree using the
validation set when this option is selected. (Pruning the tree using the validation
set reduces the error from over-fitting the tree to the training data.)
Click Tree for Scoring and select Fully Grown.
Select Show Feature Importance to include Feature Importance Data Table in
the output. This table shows the relative importance of the feature measured as
the reduction of the error criterion during the tree growth.
Leave Maximum Number of Levels at the default setting of 7. This option
specifies the maximum number of levels in the tree to be displayed in the output.
Under Trees to Display, select the types of trees to display: Fully Grown, Best
Pruned, Minimum Error or User Specified.
• Select Fully Grown to “grow” a complete tree using the training data.
• Select Best Pruned to create a tree with the fewest number of nodes,
subject to the constraint that the error be kept below a specified level
(minimum error rate plus the standard error of that error rate).
• Select Minimum error to produce a tree that yields the minimum
classification error rate when tested on the validation data.
• To create a tree with a specified number of decision nodes select User
Specified and enter the desired number of nodes.
Select Fully Grown, Best Pruned, and Minimum Error.



Click Next to advance to the Classification Tree - Scoring dialog.
Summary Report is selected by default under both Score training data and Score
validation data. Select Detailed Report and Lift Charts under both Score
Training Data and Score Validation Data to produce a detailed assessment of
the performance of the tree in both sets. Since we did not create a test partition,
the options for Score test data are disabled. See the chapter “Data Mining
Partitioning” for information on how to create a test partition.
Please see the “Scoring New Data” chapter for information on the Score new
data options within the Analytic Solver Data Mining User Guide.

Click Finish. Output from the Classification Tree algorithm will be inserted
into the Model tab of the task pane under Reports – Classification Tree. Open
CT_Output to view the Output Navigator. Click any link in this section to
navigate to various sections of the output.

Click CT_FullTree to view the full tree.



Recall that the objective of this example is to classify each case as a 0 (low
median value) or a 1 (high median value). Consider the top decision node
(denoted by a circle). The label above this node indicates the variable
represented at this node (i.e. the variable selected for the first split) in this case,
RM (Average # of Rooms). The value inside the node indicates the split
threshold. (Hover over the decision node to read the decision rule.) If the RM
value for a specific record is greater than or equal to 6.78 (RM >= 6.78), the
record will be assigned to the right node. If the RM value for the record is less
than 6.78, the value will be assigned to the left node. There are 51 records with
values for the RM variable greater than or equal to 6.78 while 253 records
contained RM values less than 6.78. We can think of records with an RM value
less than 6.78 (RM < 6.78) as tentatively classified as "0" (low median value).
Any record where RM >= 6.78 can be tentatively classified as a "1" (high
median value).
Let’s follow the tree as it descends to the left for a couple levels. The 253
records with RM values less than 6.78 are further split as we move down the
tree. The second split occurs with the LSTAT variable (percent of the
population that is of lower socioeconomic status). The LSTAT values for 4
records (out of 253) fell below the split value of 4.07. These records are
tentatively classified as a “1” meaning these records have low percentages of the
population with lower socioeconomic status. The LSTAT values for the
remaining 249 records are greater than or equal to 4.07, and are tentatively
classified as “0".
A square node indicates a terminal node, which means there are no further splits.
The 4 records split to the left from the LSTAT node are classified as “1” as
indicated by the “1” in the middle of the square node. There are no further splits
for this group. The path of their classification is: If few rooms and a low
percentage of the population with lower socioeconomic status, then classify as a
1.
The 249 records assigned to the right are split again on the DIS variable
(weighted distances to the five Boston employment centers). Records with a DIS
variable value greater than 1.23 (246 records) are assigned to the node to the
right (RM) and records with a DIS variable value less than or equal to 1.23 (3
records) are assigned to the terminal node to the left.



The structure of the full tree will be clear by reading the Full – Grown Tree
Rules also on the CT_FullTree tab.

The first entry in this table shows a split on the RM variable with a split value of
6.78. The 304 total cases in the training partition and 202 cases in the validation
partition were split between nodes 2 (LeftChild column) and 3 (Rightchild
column).
Moving to NodeID 2 we find that, in the training partition, 253 cases were
assigned to this node (from node 1), which has a “0” value (Node Label column).
These cases were split on the LSTAT variable using a value of 4.07: 249 cases
were assigned to node 5 and 4 cases were assigned to node 4. In the Validation
Partition, 155 cases were assigned to this node (from node 1). These cases were
split on the same variable (LSTAT) and value (4.07): 154 cases were assigned
to node 5 and 1 case was assigned to node 4.
Moving to NodeID 4 we find that, in the training partition, 4 cases were
assigned to this node and in the validation partition, 1 case. No further splits
were made on this node.
Moving to NodeID 5, we find that, in the training partition, 249 cases were
assigned to this node and in the validation partition, 154 cases were assigned to
this node, both from node 2. From here, these cases were split on the DIS
variable using a value of 1.23 between nodes 8 and 9.
You can continue following the "rules" down to the last level of the tree, level
17.
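The rules table can be read as a lookup structure and traversed per record. In this sketch the split rows follow the walkthrough above (RM 6.78, LSTAT 4.07, DIS 1.23), but the leaf set and its labels are purely illustrative — the real tree continues for many more levels:

```python
# Each internal node maps to (split variable, split value, left child, right child).
rules = {
    1: ("RM", 6.78, 2, 3),
    2: ("LSTAT", 4.07, 4, 5),
    5: ("DIS", 1.23, 8, 9),
}
# Hypothetical terminal nodes and class labels, for illustration only.
leaves = {3: 1, 4: 1, 8: 0, 9: 0}

def classify(record, rules, leaves, node=1):
    # Descend from the root: values below the split go left,
    # values at or above it go right.
    while node not in leaves:
        var, split, left, right = rules[node]
        node = left if record[var] < split else right
    return leaves[node]
```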
Click the Min-Error Tree Rules (Using Validation Data) link on the Output
Navigator to view the Minimum Error Tree on CT_MinErrorTree.



The "minimum error tree" is the tree that yields a minimum classification error
rate when tested on the validation data. The misclassification (error) rate is
measured as the tree is pruned. The tree that produces the lowest error rate is
selected. The Min Error Tree Rules can also be found on the CT_MinErrorTree
sheet.
Click the CT_BestTree tab to view the Best Pruned Tree and the Rules for the
Best Pruned Tree.

Note: The Best Pruned Tree is based on the validation data set, and is the
smallest tree whose misclassification rate is within one standard error of the
misclassification rate of the Minimum Error Tree.
Click the CT_Output tab to navigate to the Training Log.

The training log, above, shows the misclassification (error) rate as each
additional node is added to the tree. Starting off at 0 nodes with the full data set,
all records would be classified as "low median value" (0).
Analytic Solver Data Mining chooses the number of decision nodes for the
pruned tree and the minimum error tree from the values of Validation MSE. In
the Prune log shown above, the smallest Validation MSE error belongs to the
trees with 3 and 4 decision nodes. Where there is a tie, meaning when two trees
have the exact same Error Rate, the tree with the smaller number of nodes is
selected. Therefore, the tree with three decision nodes is the Minimum Error
Tree – the tree with the smallest misclassification error in the validation dataset.
Click the Validation: Classification Summary link to navigate to the
Classification Confusion Matrix.



The confusion matrix, above, displays counts for cases that were correctly and
incorrectly classified in the validation data set. There were 13 cases
misclassified in the validation dataset resulting in a % error of 6.44%.
Click the CT_ValidationLiftChart tab to find the Lift Chart, ROC Curve, and
Decile Chart for the Validation partition. Click the CT_TrainingLiftChart tab to
display these same charts created using the training partition.
Lift Charts and ROC Curves are visual aids that help users evaluate the
performance of their fitted models. Charts found on the CT_Training LiftChart
tab were calculated using the Training Data Partition. Charts found on the
CT_ValidationLiftChart tab were calculated using the Validation Data Partition.
It is good practice to look at both sets of charts to assess model performance on
both datasets.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select CT_TrainingLiftChart or CT_ValidationLiftChart for Worksheet
and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, ROC Curve, and Lift Charts for Training Partition

Decile-wise Lift Chart, ROC Curve, and Lift Charts for Validation Partition



After the model is built using the training data set, the model is used to score on
the training data set and the validation data set (if one exists). Then the data
set(s) are sorted in decreasing order using the predicted output variable value.
After sorting, the actual outcome values of the output variable are cumulated
and the lift curve is drawn as the cumulative number of cases in decreasing
probability (on the x-axis) vs the cumulative number of true positives on the y-
axis. The baseline (red line connecting the origin to the end point of the blue
line) is a reference line. For a given number of cases on the x-axis, this line
represents the expected number of successes if no model existed, and instead
cases were selected at random. This line can be used as a benchmark to measure
the performance of the fitted model. The greater the area between the lift curve
and the baseline, the better the model. In the Training Lift chart, if we selected
100 cases as belonging to the success class and used the fitted model to pick the
members most likely to be successes, the lift curve tells us that we would be
right on about 46 of them. Conversely, if we selected 100 random cases, we
could expect to be right on about 15 of them.
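The construction described above can be sketched as follows. This is an illustrative helper rather than the software's own code; `probs` stands for the predicted success probabilities and `actuals` for the 0/1 outcomes:

```python
def lift_curve_points(probs, actuals):
    """Cumulative lift curve: sort records by predicted probability of
    success (descending), then cumulate the actual successes."""
    ranked = sorted(zip(probs, actuals), key=lambda t: t[0], reverse=True)
    points, cum = [], 0
    for i, (_, actual) in enumerate(ranked, start=1):
        cum += actual
        points.append((i, cum))  # (cases selected, cumulative true positives)
    return points
```

The baseline for comparison is simply the straight line from the origin to the final point (total cases, total successes).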
The decile-wise lift curve is drawn as the decile number versus the cumulative
actual output variable value divided by the decile's mean output variable value.
The bars in this chart indicate the factor by which the model outperforms a
random assignment, one decile at a time. Refer to the validation graph above.
In the first decile, taking the most expensive predicted housing prices in the
dataset, the predictive performance of the model is about 5 times better than
simply assigning a random predicted value.
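A decile-wise lift can be computed in the same spirit. The sketch below is illustrative only and assumes the record count divides evenly into the number of bins:

```python
def decile_lift(probs, actuals, n_deciles=10):
    """Decile-wise lift: mean actual response within each bin of the
    ranked records, divided by the overall mean response."""
    ranked = [a for _, a in sorted(zip(probs, actuals),
                                   key=lambda t: t[0], reverse=True)]
    overall = sum(actuals) / len(actuals)
    size = len(ranked) // n_deciles
    lifts = []
    for d in range(n_deciles):
        decile = ranked[d * size:(d + 1) * size]
        lifts.append((sum(decile) / len(decile)) / overall)
    return lifts
```

A lift of 3 in the first bin means the model finds successes three times as often there as a random assignment would.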
The Regression ROC curve was updated in V2017. This new chart compares
the performance of the regressor (Fitted Predictor) with an Optimum Predictor
Curve and a Random Classifier curve. The Optimum Predictor Curve plots a
hypothetical model that would provide perfect classification results. The best
possible classification performance is denoted by a point at the top left of the
graph at the intersection of the x and y axis. This point is sometimes referred to
as the “perfect classification”. The closer the AUC is to 1, the better the
performance of the model. In the Validation Partition, AUC = .94 which
suggests that this fitted model is a good fit to the data.
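The AUC itself can be estimated directly from the scores without drawing the curve, using the rank (Mann-Whitney) formulation: it equals the probability that a randomly chosen positive record is scored above a randomly chosen negative one. This is a generic sketch, not Analytic Solver's computation:

```python
def roc_auc(probs, actuals):
    """AUC via the rank-sum formulation: fraction of (positive, negative)
    pairs where the positive record gets the higher score (ties count 0.5)."""
    pos = [p for p, a in zip(probs, actuals) if a == 1]
    neg = [p for p, a in zip(probs, actuals) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```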
In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original) on the Original Lift Chart, then select the desired chart.



Select Lift Chart (Alternative) to display Analytic Solver Data Mining's new Lift
Chart. Each of these charts consists of an Optimum Predictor curve, a Fitted
Predictor curve, and a Random Predictor curve. The Optimum Predictor curve
plots a hypothetical model that would provide perfect classification for our data.
The Fitted Predictor curve plots the fitted model and the Random Predictor
curve plots the results from using no model or by using a random guess (i.e. for
x% of selected observations, x% of the total number of positive observations are
expected to be correctly classified).
The Alternative Lift Chart plots Lift against the Predictive Positive Rate or
Support.
Lift Chart (Alternative) and Gain Chart for Training Partition

Lift Chart (Alternative) and Gain Chart for Validation Partition

Click the down arrow and select Gain Chart from the menu. In this chart, the
True Positive Rate or Sensitivity is plotted against the Predictive Positive Rate
or Support.
Analytic Solver Data Mining generates CT_Stored along with the other output.
Please refer to the “Scoring New Data” chapter within the Analytic Solver Data
Mining User Guide for details.

Classification Tree Options


The following options appear on one of the three Classification Tree dialogs.



Variables In Input Data
The variables included in the dataset appear here.

Selected Variables
Variables selected to be included in the output appear here.

Output Variable
The dependent variable or the variable to be classified appears here.

Categorical Variables
Move categorical variables to be included in the model from the Variables
listbox by clicking the > command button. This classification algorithm
accepts non-numeric categorical variables.

Number of Classes
Displays the number of classes in the Output variable.



Success Class
This option is selected by default. Select the class to be considered a “success”
or the significant class in the Lift Chart. This option is enabled when the
number of classes in the output variable is equal to 2.

Success Probability Cutoff


Enter a value between 0 and 1 here to denote the cutoff probability for success.
If the calculated probability of success for an observation is greater than or
equal to this value, then a "success" (or a 1) will be predicted for that
observation. If the calculated probability of success for an observation is less
than this value, then a "non-success" (or a 0) will be predicted for that
observation. The default value is 0.5. This option is only enabled when the
number of classes is equal to 2.
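The cutoff acts as a simple threshold on the estimated success probability; schematically:

```python
def classify(prob_success, cutoff=0.5):
    """Predict 1 ("success") when the estimated probability of success
    meets or exceeds the cutoff, otherwise predict 0."""
    return 1 if prob_success >= cutoff else 0
```

Lowering the cutoff below 0.5 trades more false positives for fewer false negatives, and vice versa.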

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
classification method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.

Rescale Data
Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted



Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.

Tree Growth
In the Tree Growth section, select Levels, Nodes, Splits, and Records in
Terminal Nodes. Values entered for these options limit tree growth, e.g. if 10 is
entered for Levels, the tree will be limited to 10 levels.

Prior Probability
Three options appear in the Prior Probability Dialog: Empirical, Uniform and
Manual.

• If the first option is selected, Empirical, Analytic Solver Data Mining
will assume that the probability of encountering a particular class in the
dataset is the same as the frequency with which it occurs in the training
data.
• If the second option is selected, Uniform, Analytic Solver Data Mining
will assume that all classes occur with equal probability.
• Select the third option, Manual, to manually enter the desired class and
probability value.
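The three options correspond to the following estimates. This is an illustrative sketch, not the software's code; the `manual` argument stands in for the user-entered class/probability values:

```python
from collections import Counter

def prior_probabilities(labels, method="empirical", manual=None):
    """Empirical: class frequencies in the training data.
    Uniform: equal probability per class.
    Manual: user-supplied {class: probability} mapping."""
    classes = sorted(set(labels))
    if method == "empirical":
        counts = Counter(labels)
        return {c: counts[c] / len(labels) for c in classes}
    if method == "uniform":
        return {c: 1 / len(classes) for c in classes}
    return dict(manual)
```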



Prune (Using Validation Set)
If a validation partition exists, this option is enabled. When this option is
selected, Analytic Solver Data Mining will prune the tree using the validation
set. Pruning the tree using the validation set reduces the error from over-fitting
the tree to the training data.

Show Feature Importance


Select Feature Importance to include the Features Importance table in the
output. This table displays the variables that are included in the model along
with their Importance value.

Maximum Number of Levels


This option specifies the maximum number of levels in the tree to be displayed
in the output.
Note: If a tree is limited to X levels in the output (intentionally or due to
Analytic Solver Basic's limit of 7 levels), Analytic Solver will draw the first X
levels of the diagram.

Trees to Display
Select Trees to Display to select the types of trees to display: Fully Grown, Best
Pruned, Minimum Error or User Specified.
• Select Fully Grown to “grow” a complete tree using the training data.
• Select Best Pruned to create a tree with the fewest number of nodes,
subject to the constraint that the error be kept below a specified level
(minimum error rate plus the standard error of that error rate).
• Select Minimum error to produce a tree that yields the minimum
classification error rate when tested on the validation data.
• To create a tree with a specified number of decision nodes select User
Specified and enter the desired number of nodes.

Score Training Data


Select these options to show an assessment of the performance of the tree in
classifying the training data. The report is displayed according to your
specifications - Detailed, Summary, and Lift Charts. Lift charts are only
available when the Output Variable contains 2 categories.



Score validation data
These options are enabled when a validation dataset is present. Select these
options to show an assessment of the performance of the tree in classifying the
validation data. The report is displayed according to your specifications -
Detailed, Summary, and Lift Charts. Lift charts are only available when the
Output Variable contains 2 categories.

Score test data


These options are enabled when a test set is present. Select these options to
show an assessment of the performance of the tree in classifying the test data.
The report is displayed according to your specifications - Detailed, Summary,
and Lift Charts. Lift charts are only available when the Output Variable
contains 2 categories.

Score new data


Please see the “Scoring New Data” chapter within the Analytic Solver Data
Mining User Guide for information on the Score new data options.



Naïve Bayes Classification
Method

Introduction
Suppose your data consists of fruits, described by their color and shape.
Bayesian classifiers operate by saying "If you see a fruit that is red and round,
which type of fruit is it most likely to be? In the future, classify red and round
fruit as that type of fruit."
A difficulty arises when you have more than a few variables and classes – an
enormous number of observations (records) would be required to estimate these
probabilities.
The Naive Bayes classification method avoids this problem by not requiring a
large number of observations for each possible combination of the variables.
Rather, the variables are assumed to be independent of one another and,
therefore the probability that a fruit that is red, round, firm, 3" in diameter, etc.
will be an apple can be calculated from the independent probabilities that a fruit
is red, that it is round, that it is firm, that it is 3" in diameter, etc.
In other words, Naïve Bayes classifiers assume that the effect of a variable value
on a given class is independent of the values of other variables. This assumption
is called class conditional independence and is made to simplify the
computation. In this sense, it is considered to be “Naïve”.
This is a fairly strong assumption and is often not applicable.
However, bias in estimating probabilities often may not make a difference in
practice -- it is the order of the probabilities, not their exact values, which
determine the classifications.
Studies comparing classification algorithms have found the Naïve Bayesian
classifier to be comparable in performance with classification trees and neural
network classifiers. It has also been found that these classifiers exhibit high
accuracy and speed when applied to large databases.
A more technical description of the Naïve Bayesian classification method
follows.

Bayes Theorem
Let X be the data record (case) whose class label is unknown. Let H be some
hypothesis, such as "data record X belongs to a specified class C." For
classification, we want to determine P (H|X) -- the probability that the
hypothesis H holds, given the observed data record X.
P (H|X) is the posterior probability of H conditioned on X. For example, the
probability that a fruit is an apple, given the condition that it is red and round. In
contrast, P(H) is the prior probability, or apriori probability, of H. In this
example P(H) is the probability that any given data record is an apple, regardless
of how the data record looks. The posterior probability, P (H|X), is based on



more information (such as background knowledge) than the prior probability,
P(H), which is independent of X.
Similarly, P (X|H) is the posterior probability of X conditioned on H. That is, it is
the probability that X is red and round given that we know that it is true that X is
an apple. P(X) is the prior probability of X, i.e., it is the probability that a data
record from our set of fruits is red and round. Bayes theorem is useful in that it
provides a way of calculating the posterior probability, P(H|X), from P(H), P(X),
and P(X|H). Bayes theorem can be written as: P (H|X) = P(X|H) P(H) / P(X).
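A small numeric illustration of the theorem, using made-up fruit proportions: suppose 30% of fruits are apples, 80% of apples are red and round, and 40% of all fruits are red and round.

```python
def posterior(p_x_given_h, p_h, p_x):
    """Bayes theorem: P(H|X) = P(X|H) * P(H) / P(X)."""
    return p_x_given_h * p_h / p_x

# Hypothetical numbers: P(red & round | apple) = 0.8, P(apple) = 0.3,
# P(red & round) = 0.4, so P(apple | red & round) = 0.8 * 0.3 / 0.4 = 0.6
p_apple = posterior(0.8, 0.3, 0.4)
```

With these numbers, seeing a red, round fruit raises the probability that it is an apple from the prior 0.3 to a posterior of 0.6.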

Naïve Bayes Classification Example


The following example illustrates Analytic Solver Data Mining’s Naïve Bayes
classification method. Click Help – Examples on the Data Mining ribbon, then
Forecasting/Data Mining Examples to open the Flying_Fitness.xlsx example
dataset. A portion of the dataset appears below.

First, we partition the data into training and validation sets using the Standard
Data Partition defaults of 60% of the data randomly allocated to the Training Set
and 40% of the data randomly allocated to the Validation Set. For more
information on partitioning a dataset, see the Data Mining Partitioning chapter.



Click Classify – Naïve Bayes. The following Naïve Bayes dialog appears.
Select Var2, Var3, Var4, Var5, and Var6 as Selected Variables and
TestRes/Var1 as the Output Variable. The Number of Classes statistic will be
automatically updated with a value of 2 when the Output Variable is selected.
This indicates that the Output variable, TestRes/Var1, contains two classes, 0
and 1.
Choose the value that will be the indicator of “Success” by clicking the down
arrow next to Success Class. In this example, we will use the default of 1
indicating that a value of “1” will be specified as a “success”.
Enter a value between 0 and 1 for Success Probability Cutoff. If the Probability
of success (probability of the output variable = 1) is less than this value, then a 0
will be entered for the class value, otherwise a 1 will be entered for the class
value. In this example, we will keep the default of 0.5.



Click Next to advance to the Naïve Bayes – Parameters dialog.
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. If this option is selected, Analytic Solver Data Mining will
partition your dataset (according to the partition options you set) immediately
before running the classification method. If partitioning has already occurred on
the dataset, this option will be disabled. For more information on partitioning,
please see the Data Mining Partitioning chapter.
On the Parameters dialog, click Prior Probability to set how the prior class
probabilities are calculated. When the default option is selected, Analytic
Solver Data Mining will calculate the class probabilities from the training data.
For the first class, Analytic Solver Data Mining will calculate the probability
using the number of "0" records / total number of records. For the second class,
Analytic Solver Data Mining will calculate the probability using the number of
"1" records / total number of records.
• If the first option is selected, Empirical, Analytic Solver Data Mining
will assume that the probability of encountering a particular class in the
dataset is the same as the frequency with which it occurs in the training
data.
• If the second option is selected, Uniform, Analytic Solver Data Mining
will assume that all classes occur with equal probability.
• Select the third option, Manual, to manually enter the desired class and
probability value.



Click Done to accept the default setting, Empirical, and close the dialog.
In Analytic Solver, users have the ability to use Laplace smoothing, or not,
during model creation. In this example, leave Laplace Smoothing and
Pseudocount at their defaults.
If a particular realization of some feature never occurs in a given class in the
training partition, then the corresponding frequency-based prior conditional
probability estimate will be zero. For example, assume that you have trained a
model to classify emails using the Naïve Bayes Classifier with 2 classes: work
and personal. Assume that the model rates one email as having a high
probability of belonging to the "personal" class. Now assume that there is a 2nd
email that is the same as the previous email, but this email includes one word
that is different. Now, if this one word was not present in any of the “personal”
emails in the training partition, the estimated probability would be
zero. Consequently, the resulting product of all probabilities will be zero,
leading to a loss of all the strong evidence of this email to belong to a “personal”
class. To mitigate this problem, Analytic Solver Data Mining allows you to
specify a small correction value, known as a pseudocount, so that no probability
estimate is ever set to 0. Normalizing the Naïve Bayes classifier in this way is
called Laplace smoothing. Pseudocount set to zero is equivalent to no
smoothing. There are arguments in the literature which support a pseudocount
value of 1, although in practice fractional values are often used. When Laplace
Smoothing is selected, Analytic Solver Data Mining will accept any positive
value for pseudocount.
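The effect of the pseudocount on a single conditional probability estimate can be sketched as follows. This is an illustrative formula, not the software's code; `n_feature_values` is the number of distinct values the feature can take:

```python
def smoothed_conditional(count_feature_and_class, count_class,
                         n_feature_values, pseudocount=1.0):
    """Laplace-smoothed estimate of P(feature value | class): the
    pseudocount is added to every cell, so no estimate is ever zero."""
    return ((count_feature_and_class + pseudocount) /
            (count_class + pseudocount * n_feature_values))
```

With a pseudocount of 1, a word never seen in 20 "personal" emails over a 10-value feature gets probability 1/30 instead of 0, so one unseen word no longer zeroes out the whole product of probabilities.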
Under Naïve Bayes: Display, select Show Prior Conditional Probability and
Show Log-Density to include both in the output.

Click Next to advance to the Scoring dialog.


Select Detailed report and Lift Charts under both Score training data and
Score validation data. Summary report is selected by default under both Score
Training Data and Score Validation Data. These settings will allow us to
obtain the complete output results for this classification method. Since we did
not create a test partition, the options for Score test data are disabled. See the
chapter “Data Mining Partitioning” for information on how to create a test
partition.
For more information on the options for Score new data, please see the chapters
“Scoring New Data” and “Scoring Test Data" within the Analytic Solver Data
Mining User Guide.



Click Finish to generate the output. Results are inserted to the right.
Click NB_Output to display the Output Navigator. Click any link to navigate to
the selected topic.

In this example, we are classifying pilots on whether they are fit to fly based on
various physical and psychological tests. Our output variable, TestRes/Var1 is 1
if the pilot is fit and 0 if not.
A Confusion Matrix is used to evaluate the performance of a classification
method. This matrix summarizes the records that were classified correctly and
those that were not.

TP stands for True Positive. These are the number of cases classified as
belonging to the Success class that actually were members of the Success class.
FN stands for False Negative. These are the number of cases that were
classified as belonging to the Failure class when they were actually members of
the Success class (i.e. patients with cancerous tumors who were told their tumors
were benign). FP stands for False Positive. These cases were assigned to the
Success class but were actually members of the Failure group (i.e. patients who
were told they tested positive for cancer when, in fact, their tumors were benign).
TN stands for True Negative. These cases were correctly assigned to the Failure
group.
Precision is the probability of correctly identifying a randomly selected record
as one belonging to the Success class (i.e. the probability of correctly identifying
a random patient with cancer as having cancer). Recall (or Sensitivity)
measures the percentage of actual positives which are correctly identified as
positive (i.e. the proportion of people with cancer who are correctly identified as
having cancer). Specificity (also called the true negative rate) measures the
percentage of failures correctly identified as failures (i.e. the proportion of
people with no cancer being categorized as not having cancer.) The F-1 score,
which ranges from 0 to 1 (a perfect classification), is a measure that balances
precision and recall.
Precision = TP/(TP+FP)
Sensitivity or True Positive Rate (TPR) = TP/(TP + FN)
Specificity (SPC) or True Negative Rate =TN / (FP + TN)



F1 = (2 * TP) / (2TP + FP + FN)
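These four definitions can be checked with a few lines of code (a generic sketch using the confusion-matrix counts defined above):

```python
def classification_metrics(tp, fn, fp, tn):
    """Precision, recall (sensitivity), specificity, and F1 score
    computed from the confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    specificity = tn / (fp + tn)   # true negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, specificity, f1
```

For example, with TP = 7, FN = 1, FP = 2, TN = 6 (the validation counts discussed below), precision is 7/9 and recall is 7/8.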
Click the Training: Classification Summary and Validation: Classification
Summary links to view the Classification Summaries for both partitions.

In the Training Dataset, we see that 4 records were misclassified, giving a
misclassification error of 16.67%. Four (4) records were misclassified as
successes.

However, in the Validation Dataset, 7 records were correctly classified as
belonging to the Success class while 1 case was incorrectly assigned to the
Failure class. Six (6) cases were correctly classified as belonging to the Failure
class while 2 records were incorrectly classified as belonging to the Success
class. This resulted in a total classification error of 18.75%.
While predicting the class for the output variable, Analytic Solver Data Mining
calculates the conditional probability that the variable may be classified to a
particular class. In this example, the classes are 0 and 1. For every record in each
partition, the conditional probabilities for class - 0 and for class - 1 are
calculated. Analytic Solver Data Mining assigns the class to the output variable



for which the conditional probability is the largest. Misclassified records will be
highlighted in red.
It's possible that an N/A "error" may be displayed in the Classification table.
These appear when the Naïve Bayes classifier is unable to classify specific
patterns because they have not been seen in the training dataset. Records in a
partition containing such unseen values are considered to be outliers. When
N/A's are present, Lift charts will not be available for that dataset.
Click the NB_LogDensity tab to view the Log Densities for each partition.
Log PDF, or Logarithm of Unconditional Probability Density, is the distribution
of the predictors marginalized over the classes and is computed using:

log P(X₁, …, Xₙ) = log [ Σ_{c=1}^{C} P(X₁, …, Xₙ, Y = c) ] = log [ Σ_{c=1}^{C} π(Y = c) · P(X₁, …, Xₙ | Y = c) ]

where π(Y = c) is the prior class probability.
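In practice this quantity is usually evaluated in log space with the log-sum-exp trick, so that very small likelihoods do not underflow. A generic sketch, taking log priors and per-class log likelihoods as inputs (illustrative, not Analytic Solver's code):

```python
import math

def log_density(log_priors, log_likelihoods):
    """log P(x) = log sum_c pi_c * P(x | c), computed stably by
    factoring out the largest term before exponentiating."""
    terms = [lp + ll for lp, ll in zip(log_priors, log_likelihoods)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))
```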

Click the Prior Conditional Probability: Training link to display the table
below. This table shows the probabilities for each case by variable. For
example, for Var2, 21% of the records where Var2 = 0 were assigned to Class 0,
57% of the records where Var2 = 1 were assigned to Class 0 and 21% of the
records where Var2 = 2 were assigned to Class 0.
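The table's entries are simple class-conditional frequencies; schematically (an illustrative helper, not the software's code):

```python
from collections import Counter

def conditional_table(feature, labels):
    """Estimate P(feature value | class) from frequencies, as in the
    Prior Conditional Probability table: count each (value, class) pair
    and divide by the class count."""
    class_counts = Counter(labels)
    pair_counts = Counter(zip(feature, labels))
    return {(v, c): pair_counts[(v, c)] / class_counts[c]
            for (v, c) in pair_counts}
```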



Click the NB_TrainingLiftChart and NB_ValidationLiftChart tabs to find the
Lift Chart, ROC Curve, and Decile Chart for both the Training and Validation
partitions.
Lift Charts and ROC Curves are visual aids that help users evaluate the
performance of their fitted models. Charts found on the NB_TrainingLiftChart
tab were calculated using the Training Data Partition. Charts found on the
NB_ValidationLiftChart tab were calculated using the Validation Data Partition.
It is good practice to look at both sets of charts to assess model performance on
both datasets.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select NB_TrainingLiftChart or NB_ValidationLiftChart for Worksheet
and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, ROC Curve, and Lift Charts for Training Partition

Decile-wise Lift Chart, ROC Curve, and Lift Charts for Validation Partition



After the model is built using the training data set, the model is used to score on
the training data set and the validation data set (if one exists). Then the data
set(s) are sorted in decreasing order using the predicted output variable value.
After sorting, the actual outcome values of the output variable are cumulated
and the lift curve is drawn as the cumulative number of cases in decreasing
probability (on the x-axis) vs the cumulative number of true positives on the y-
axis. The baseline (red line connecting the origin to the end point of the blue
line) is a reference line. For a given number of cases on the x-axis, this line
represents the expected number of successes if no model existed, and instead
cases were selected at random. This line can be used as a benchmark to measure
the performance of the fitted model. The greater the area between the lift curve
and the baseline, the better the model. In the Training Lift chart, if we selected
10 cases as belonging to the success class and used the fitted model to pick the
members most likely to be successes, the lift curve tells us that we would be
right on about 9 of them. Conversely, if we selected 10 random cases, we could
expect to be right on about 4 of them. The Validation Lift chart tells us that we
could expect to see the Random model perform the same or better on the
validation partition than our fitted model.
The decile-wise lift curve is drawn as the decile number versus the cumulative
actual output variable value divided by the decile's mean output variable value.
The bars in this chart indicate the factor by which the model outperforms a
random assignment, one decile at a time. Refer to the validation graph above.
In the first decile, the predictive performance of the model is about 1.8 times
better than simply assigning a random predicted value.
The Regression ROC curve was updated in V2017. This new chart compares
the performance of the regressor (Fitted Predictor) with an Optimum Predictor
Curve and a Random Classifier curve. The Optimum Predictor Curve plots a
hypothetical model that would provide perfect classification results. The best
possible classification performance is denoted by a point at the top left of the
graph at the intersection of the x and y axis. This point is sometimes referred to
as the “perfect classification”. The closer the AUC is to 1, the better the
performance of the model. In the Validation Partition, AUC = .43 which
suggests that this fitted model is not a good fit to the data.
In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original) on the Original Lift Chart, then select the desired chart.



Select Lift Chart (Alternative) to display Analytic Solver Data Mining's new Lift
Chart. Each of these charts consists of an Optimum Predictor curve, a Fitted
Predictor curve, and a Random Predictor curve. The Optimum Predictor curve
plots a hypothetical model that would provide perfect classification for our data.
The Fitted Predictor curve plots the fitted model and the Random Predictor
curve plots the results from using no model or by using a random guess (i.e. for
x% of selected observations, x% of the total number of positive observations are
expected to be correctly classified).
The Alternative Lift Chart plots Lift against the Predictive Positive Rate or
Support.
Lift Chart (Alternative) and Gain Chart for Training Partition

Lift Chart (Alternative) and Gain Chart for Validation Partition

Click the down arrow and select Gain Chart from the menu. In this chart, the
True Positive Rate or Sensitivity is plotted against the Predictive Positive Rate
or Support.
Please see the “Scoring New Data” chapter within the Analytic Solver Data
Mining User Guide for information on NB_Stored.

Naïve Bayes Classification Method Options


The options below appear on one of the three Naïve Bayes classification
methods’ dialogs.



Variables in input data
The variables included in the dataset appear here.

Selected Variables
Variables selected to be included in the output appear here.

Output Variable
The dependent variable or the variable to be classified appears here.

Number of Classes
Displays the number of classes in the Output variable.

Success Class
This option is selected by default. Select the class to be considered a “success”
or the significant class in the Lift Chart. This option is enabled when the
number of classes in the output variable is equal to 2.



Success Probability Cutoff
Enter a value between 0 and 1 here to denote the cutoff probability for success.
If the calculated probability of success for an observation is greater than or
equal to this value, then a "success" (or a 1) will be predicted for that
observation. If the calculated probability of success for an observation is less
than this value, then a "non-success" (or a 0) will be predicted for that
observation. The default value is 0.5. This option is only enabled when the
number of classes is equal to 2.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. If this option is selected, Analytic Solver Data Mining will
partition your dataset (according to the partition options you set) immediately
before running the classification method. If partitioning has already occurred on
the dataset, this option will be disabled. For more information on partitioning,
please see the Data Mining Partitioning chapter.

Prior Probability
Click Prior Probability. Three options appear in the Prior Probability Dialog:
Empirical, Uniform and Manual.



If the first option is selected, Empirical, Analytic Solver Data Mining will
assume that the probability of encountering a particular class in the dataset is the
same as the frequency with which it occurs in the training data.
If the second option is selected, Uniform, Analytic Solver Data Mining will
assume that all classes occur with equal probability.
Select the third option, Manual, to manually enter the desired class and
probability value.

Laplace Smoothing
If a particular realization of some feature never occurs in a given class in the
training partition, then the corresponding frequency-based prior conditional
probability estimate will be zero. For example, assume that you have trained a
model to classify emails using the Naïve Bayes Classifier with 2 classes: work
and personal. Assume that the model rates one email as having a high
probability of belonging to the "personal" class. Now assume that there is a 2nd
email that is the same as the previous email, but this email includes one word
that is different. Now, if this one word was not present in any of the “personal”
emails in the training partition, the estimated probability would be
zero. Consequently, the resulting product of all probabilities will be zero,
leading to a loss of all the strong evidence of this email to belong to a “personal”
class. To mitigate this problem, Analytic Solver Data Mining allows you to
specify a small correction value, known as a pseudocount, so that no probability
estimate is ever set to 0. Normalizing the Naïve Bayes classifier in this way is
called Laplace smoothing. Pseudocount set to zero is equivalent to no
smoothing. There are arguments in the literature which support a pseudocount
value of 1, although in practice, fractional values are often used. When Laplace
Smoothing is selected, Analytic Solver Data Mining will accept any positive
value for pseudocount.
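
A minimal sketch of a Laplace-smoothed estimate of a conditional probability follows. This is an illustration of the general technique under the usual add-pseudocount formula; the exact form Analytic Solver uses internally is an assumption:

```python
def smoothed_conditional(word_count, class_total, vocab_size, pseudocount=1.0):
    """Laplace-smoothed estimate of P(word | class).

    Adding the pseudocount to every count keeps the estimate away from
    zero, even for a word never seen in the class during training."""
    return (word_count + pseudocount) / (class_total + pseudocount * vocab_size)

# A word that never appeared in the "personal" training emails:
print(smoothed_conditional(0, 100, 50, pseudocount=0))  # 0.0 -> whole product collapses
print(smoothed_conditional(0, 100, 50, pseudocount=1))  # 1/150: small but non-zero
```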

Show Prior Conditional Probability


Select this option to print Prior Conditional Probability for the training partition
in the output.

Show Log-Density
Select this option to print the Log-Density values for each partition in the output.
Log PDF, or Logarithm of Unconditional Probability Density, is the distribution
of the predictors marginalized over the classes and is computed using:

log P{X1, …, Xn} = log [ Σ(c=1..C) P{X1, …, Xn, Y = c} ]
                 = log [ Σ(c=1..C) π{Y = c} P{X1, …, Xn | Y = c} ]

where π{Y = c} is the prior probability of class c.
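
Summing probabilities and then taking the logarithm is typically done with the log-sum-exp trick for numerical stability. The sketch below illustrates the computation; it is not Analytic Solver's implementation:

```python
import math

def log_density(log_priors, log_conditionals):
    """log P{X1..Xn} = log sum_c exp(log pi_c + log P{X1..Xn | Y=c}),
    evaluated with the log-sum-exp trick so very small probabilities
    do not underflow to zero."""
    terms = [lp + lc for lp, lc in zip(log_priors, log_conditionals)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

# Two equally likely classes with equal conditional densities of 0.2:
lp = [math.log(0.5), math.log(0.5)]
lc = [math.log(0.2), math.log(0.2)]
print(log_density(lp, lc))  # equals log(0.2), since the mixture collapses
```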

Score Training Data


Select these options to show an assessment of the performance of the tree in
classifying the training data. The report is displayed according to your
specifications - Detailed, Summary and Lift Charts. Lift charts are only
available when the output variable has 2 classes.

Score Validation Data


These options are enabled when a validation dataset is present. Select these
options to show an assessment of the performance of the tree in classifying the
validation data. The report is displayed according to your specifications -
Detailed, Summary and Lift Charts. Lift charts are only available when the
output variable has 2 classes.

Score Test Data


These options are enabled when a test dataset is present. Select these options to
show an assessment of the performance of the tree in classifying the test data.
The report is displayed according to your specifications - Detailed, Summary
and Lift Charts. Lift charts are only available when the output variable has 2
classes.

Score New Data


Please see the “Scoring New Data” chapter within the Analytic Solver Data
Mining User Guide for information on the Score New Data options.

Neural Network Classification
Method

Introduction
Artificial neural networks are relatively crude electronic networks of "neurons"
based on the neural structure of the brain. They process records one at a time,
and "learn" by comparing their classification of the record (which, at the outset,
is largely arbitrary) with the known actual classification of the record. The errors
from the initial classification of the first record are fed back into the network and
used to modify the network's algorithm the second time around, and so on for
many iterations.
Roughly speaking, a neuron in an artificial neural network is
1. A set of input values (xi) and associated weights (wi)
2. A function (g) that sums the weighted inputs and maps the result to an
output (y).

Neurons are organized into layers: input, hidden and output. The input layer is
composed not of full neurons, but rather consists simply of the record’s values
that are inputs to the next layer of neurons. The next layer is the hidden layer.
Several hidden layers can exist in one neural network. The final layer is the
output layer, where there is one node for each class. A single sweep forward
through the network results in the assignment of a value to each output node,
and the record is assigned to the class node with the highest value.
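
A single neuron and the class-assignment rule just described can be sketched as follows. The helper functions are illustrative only, not part of the software:

```python
def neuron_output(inputs, weights, g):
    """A single artificial neuron: a weighted sum of the input values,
    mapped through the function g to produce the output."""
    return g(sum(x * w for x, w in zip(inputs, weights)))

def assign_class(output_values, classes):
    """A record is assigned to the class whose output node has the highest value."""
    best = max(range(len(output_values)), key=output_values.__getitem__)
    return classes[best]

identity = lambda s: s
print(neuron_output([1.0, 2.0], [0.5, -0.25], identity))  # 0.5 - 0.5 = 0.0
print(assign_class([0.1, 0.7, 0.2], ["A", "B", "C"]))     # "B" has the highest value
```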

Training an Artificial Neural Network
In the training phase, the correct class for each record is known (this is termed
supervised training), and the output nodes can therefore be assigned "correct"
values -- "1" for the node corresponding to the correct class, and "0" for the
others. (In practice, better results have been found using values of “0.9” and
“0.1”, respectively.) It is thus possible to compare the network's calculated
values for the output nodes to these "correct" values, and calculate an error term
for each node (the "Delta" rule). These error terms are then used to adjust the
weights in the hidden layers so that, hopefully, during the next iteration the
output values will be closer to the "correct" values.

The Iterative Learning Process


A key feature of neural networks is an iterative learning process in which
records (rows) are presented to the network one at a time, and the weights
associated with the input values are adjusted each time. After all cases are
presented, the process is often repeated. During this learning phase, the network
“trains” by adjusting the weights to predict the correct class label of input
samples. Advantages of neural networks include their high tolerance to noisy
data, as well as their ability to classify patterns on which they have not been
trained. The most popular neural network algorithm is the back-propagation
algorithm proposed in the 1980's.
Once a network has been structured for a particular application, that network is
ready to be trained. To start this process, the initial weights (described in the
next section) are chosen randomly. Then the training, or learning, begins.
The network processes the records in the training data one at a time, using the
weights and functions in the hidden layers, then compares the resulting outputs
against the desired outputs. Errors are then propagated back through the system,
causing the system to adjust the weights for application to the next record. This
process occurs over and over as the weights are continually tweaked. During the
training of a network the same set of data is processed many times as the
connection weights are continually refined.
Note that some networks never learn. This could be because the input data does
not contain the specific information from which the desired output is derived.
Networks also will not converge if there is not enough data to enable complete
learning. Ideally, there should be enough data available to create a validation set.

Feedforward, Back-Propagation
The feedforward, back-propagation architecture was developed in the early
1970's by several independent sources (Werbos; Parker; Rumelhart, Hinton and
Williams). This independent co-development was the result of a proliferation of
articles and talks at various conferences which stimulated the entire industry.
Currently, this synergistically developed back-propagation architecture is the
most popular, effective, and easy-to-learn model for complex, multi-layered
networks. Its greatest strength is in non-linear solutions to ill-defined problems.
The typical back-propagation network has an input layer, an output layer, and at
least one hidden layer. There is no theoretical limit on the number of hidden
layers but typically there are just one or two. Some studies have shown that the
total number of layers needed to solve problems of any complexity is 5 (one
input layer, three hidden layers and an output layer). Each layer is fully
connected to the succeeding layer.
As noted above, the training process normally uses some variant of the Delta
Rule, which starts with the calculated difference between the actual outputs and
the desired outputs. Using this error, connection weights are increased in
proportion to the error times a scaling factor for global accuracy. This
means that the inputs, the output, and the desired output all must be present at
the same processing element. The most complex part of this algorithm is
determining which input contributed the most to an incorrect output and how
the input must be modified to correct the error. (An inactive node would not
contribute to the error and would have no need to change its weights.) To solve
this problem, training inputs are applied to the input layer of the network, and
desired outputs are compared at the output layer. During the learning process, a
forward sweep is made through the network, and the output of each element is
computed layer by layer. The difference between the output of the final layer
and the desired output is back-propagated to the previous layer(s), usually
modified by the derivative of the transfer function. The connection weights are
normally adjusted using the Delta Rule. This process proceeds for the previous
layer(s) until the input layer is reached.
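
The forward sweep and Delta Rule weight adjustment can be sketched for a network with one hidden layer. This is a simplified illustration (sigmoid activation, no bias terms, names chosen for clarity), not Analytic Solver's implementation:

```python
import math, random

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def train_step(x, target, w_hidden, w_output, rate=0.5):
    """One forward sweep plus one back-propagation weight adjustment."""
    # Forward sweep: compute the output of each layer in turn.
    h = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in w_hidden]
    o = [sigmoid(sum(hj * w for hj, w in zip(h, row))) for row in w_output]
    # Output-layer error terms (Delta Rule), modified by the sigmoid's derivative.
    d_out = [(t - ok) * ok * (1 - ok) for t, ok in zip(target, o)]
    # Back-propagate the error to the hidden layer.
    d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * w_output[k][j] for k in range(len(o)))
             for j in range(len(h))]
    # Adjust connection weights in proportion to the errors.
    for k in range(len(o)):
        for j in range(len(h)):
            w_output[k][j] += rate * d_out[k] * h[j]
    for j in range(len(h)):
        for i in range(len(x)):
            w_hidden[j][i] += rate * d_hid[j] * x[i]
    return o

random.seed(0)
w_h = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 2 inputs -> 3 hidden
w_o = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 3 hidden -> 2 outputs
x, target = [0.2, 0.9], [0.9, 0.1]   # "correct" node targeted at 0.9, the other at 0.1
first = train_step(x, target, w_h, w_o)
for _ in range(200):
    last = train_step(x, target, w_h, w_o)
print([round(v, 3) for v in last])   # outputs move toward the 0.9 / 0.1 targets
```

Repeated passes over the same record tweak the weights so the output values drift toward the "correct" 0.9 and 0.1 targets, mirroring the iterative learning process described above.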

Structuring the Network


The number of layers and the number of processing elements per layer are
important decisions. These parameters, to a feedforward, back-propagation
topology, are also the most ethereal - they are the "art" of the network designer.
There is no quantifiable, best answer to the layout of the network for any
particular application. There are only general rules picked up over time and
followed by most researchers and engineers applying this architecture to their
problems.
Rule One: As the complexity in the relationship between the input data and the
desired output increases, the number of the processing elements in the hidden
layer should also increase.
Rule Two: If the process being modeled is separable into multiple stages, then
additional hidden layer(s) may be required. If the process is not separable into
stages, then additional layers may simply enable memorization of the training
set, and not a true general solution.
Rule Three: The amount of training data available sets an upper bound for the
number of processing elements in the hidden layer(s). To calculate this upper
bound, use the number of cases in the training data set and divide that number
by the sum of the number of nodes in the input and output layers in the network.
Then divide that result again by a scaling factor between five and ten. Larger
scaling factors are used for relatively less noisy data. If too many artificial
neurons are used the training set will be memorized, not generalized, and the
network will be useless on new data sets.
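
Rule Three amounts to a small calculation; the helper below is an illustrative rule-of-thumb, not a function provided by the software:

```python
def max_hidden_neurons(n_train_cases, n_input_nodes, n_output_nodes, scale=5):
    """Rule-of-thumb upper bound on hidden neurons: the number of training
    cases divided by (input nodes + output nodes), divided again by a
    scaling factor between five and ten (larger for less noisy data)."""
    return n_train_cases // ((n_input_nodes + n_output_nodes) * scale)

# 1,000 training cases, 13 input nodes, 3 output nodes, scaling factor 5:
print(max_hidden_neurons(1000, 13, 3, scale=5))   # 1000 / (16 * 5) -> 12
print(max_hidden_neurons(1000, 13, 3, scale=10))  # 1000 / (16 * 10) -> 6
```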

Automated Neural Network Classification Example


This example focuses on creating a Neural Network using an Automated
network architecture. See the section below for an example on creating a Neural
Network using a Manual Architecture.
Click Help – Examples on the Data Mining ribbon, then click
Forecasting/Data Mining Examples to open the file Wine.xlsx.
This file contains 13 quantitative variables measuring the chemical attributes of
wine samples from 3 different wineries (Type variable). The objective is to
assign a wine classification to each record. A portion of this dataset is shown
below.

First, we partition the data into training and validation sets using a Standard
Data Partition with percentages of 60% of the data randomly allocated to the
Training Set and 40% of the data randomly allocated to the Validation Set. For
more information on partitioning a dataset, see the Data Mining Partitioning
chapter.

Select a cell on the StdPartition worksheet, then click Classify – Neural
Network – Automatic Network on the Data Mining ribbon.
Select Type as the Output variable and the remaining variables as Selected
Variables.
Since the Output variable contains three classes (A, B, and C) to denote the three
different wineries, the options for Binary Classification are disabled. (The
options under Binary Classification are only enabled when the number of
classes is equal to 2.)

Click Next to advance to the next dialog.
When an automated network is created, several networks are run with increasing
complexity in the architecture. The networks are limited to 2 hidden layers and
the number of hidden neurons in each layer is bounded by UB1 = (#features +
#classes) * 2/3 on the 1st layer and UB2 = (UB1 + #classes) * 2/3 on the 2nd
layer.
First, all networks are trained with 1 hidden layer with the number of nodes not
exceeding the UB1 and UB2 bounds, then a second layer is added and a 2 –
layer architecture is tried until the UB2 limit is satisfied.
The limit on the total number of trained networks is the minimum of 100 and
(UB1 * (1+UB2)). In this dataset, there are 13 features in the model and 3
classes in the Type output variable giving the following bounds:
UB1 = FLOOR((13 + 3) * 2/3) = FLOOR(10.67) = 10
UB2 = FLOOR((10 + 3) * 2/3) = FLOOR(8.67) = 8
(where FLOOR rounds a number down to the nearest integer)
# Networks Trained = MIN{100, 10 * (1 + 8)} = 90
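
The bounds and the network count can be reproduced with a short calculation (an illustrative sketch of the formulas above, not Analytic Solver's code):

```python
import math

def network_search_size(n_features, n_classes):
    """Bounds used by the automated architecture search, per the formulas above:
    UB1 caps neurons on the 1st hidden layer, UB2 on the 2nd, and the total
    number of trained networks is min(100, UB1 * (1 + UB2))."""
    ub1 = math.floor((n_features + n_classes) * 2 / 3)
    ub2 = math.floor((ub1 + n_classes) * 2 / 3)
    n_networks = min(100, ub1 * (1 + ub2))
    return ub1, ub2, n_networks

# Wine example: 13 features, 3 classes in the Type output variable.
print(network_search_size(13, 3))  # (10, 8, 90)
```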
As discussed in previous sections, Analytic Solver Data Mining includes the
ability to partition a dataset from within a classification or prediction method by
clicking Partition Data on the Parameters dialog. If this option is selected,
Analytic Solver Data Mining will partition your dataset (according to the
partition options you set) immediately before running the classification method.
If partitioning has already occurred on the dataset, this option will be disabled.
For more information on partitioning, please see the Data Mining Partitioning
chapter.
Click Rescale Data to open the Rescaling dialog. Use Rescaling to normalize
one or more features in your data during the data preprocessing stage. Analytic
Solver Data Mining provides the following methods for feature scaling:
Standardization, Normalization, Adjusted Normalization and Unit Norm. For
more information on this new feature, see the Rescale Continuous Data section
within the Transform Continuous Data chapter that occurs earlier in this guide.
For this example, select Rescale Data and then select Normalization. Click
Done to close the dialog.

Note: When selecting a rescaling technique, it's recommended that you apply
Normalization ([0, 1]) if Sigmoid is selected for Hidden Layer Activation and
Adjusted Normalization ([-1, 1]) if Hyperbolic Tangent is selected for Hidden
Layer Activation. This applies to both classification and regression. Since we
will be using Logistic Sigmoid for Hidden Layer Activation, Normalization
was selected.
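
The two rescaling methods recommended in the note can be sketched as follows. These are illustrative helpers; in Analytic Solver, rescaling is configured through the dialog, not through code:

```python
def normalize(xs):
    """Normalization: rescale values to [0, 1] (recommended with Sigmoid)."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def adjusted_normalize(xs):
    """Adjusted Normalization: rescale values to [-1, 1] (recommended with
    Hyperbolic Tangent)."""
    return [2 * x - 1 for x in normalize(xs)]

data = [10.0, 12.0, 14.0, 20.0]
print(normalize(data))           # [0.0, 0.2, 0.4, 1.0]
print(adjusted_normalize(data))  # values rescaled into [-1, 1]
```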
Click Prior Probability. Three options appear in the Prior Probability Dialog:
Empirical, Uniform and Manual.

If the first option is selected, Empirical, Analytic Solver Data Mining will
assume that the probability of encountering a particular class in the dataset is the
same as the frequency with which it occurs in the training data.
If the second option is selected, Uniform, Analytic Solver Data Mining will
assume that all classes occur with equal probability.
Select the third option, Manual, to manually enter the desired class and
probability value.

Click Done to close the dialog and accept the default setting, Empirical.
Users can change both the Training Parameters and Stopping Rules for the
Neural Network. Click Training Parameters to open the Training Parameters
dialog. For more information on these options, please see the Neural Network
Classification Options section below. For now, simply click Done to accept the
option defaults and close the dialog.

Click Stopping Rules to open the Stopping Rules dialog. Here users can specify
a comprehensive set of rules for stopping the algorithm early plus cross-
validation on the training error. For more information on these options, please
see the Neural Network Classification Options section below. For now, simply
click Done to accept the option defaults and close the dialog.

Keep the default selections for the Hidden Layer and Output Layer options. See
the Neural Network Classification Options section below for more information
on these options.

Click Finish. Output sheets are inserted to the right of the STDPartition
worksheet.
Double click NNC_Output to open.
The top section of the output includes the Output Navigator which can be used
to quickly navigate to various sections of the output. The Data, Variables, and
Parameters/Options sections of the output all reflect inputs chosen by the user.
A little further down is the Architecture Search Error Log, a portion is shown
below.

Notice the number of networks trained and reported in the Error Report was 90
(# Networks Trained = MIN{100, 10 * (1 + 8)} = 90).
This report may be sorted by each column by clicking the arrow next to each
column heading. Click the arrow next to Validation % Error and select Sort
Smallest to Largest from the menu. Then click the arrow next to Training %
Error and do the same to display all networks with 0% Error in both the Training and
Validation sets.

Click a Net ID hyperlink, say Net 2, to bring up the Neural Network
Classification dialog. Click Finish to run the Neural Net Classification method
with Manual Architecture using the input and option settings specified for Net 2.
The layout of this report changes when the number of classes is reduced to two.
Please see the section NNC with Output Variable Containing 2 Classes below
for an example with a dataset that includes just two classes.

Scroll down on the NNC_Output sheet to see the confusion matrices for each
Neural Network listed in the table above. Here are the confusion matrices for Net
1 and Net 2. These matrices expand upon the information shown in the Error Report
for each network ID.

Manual Neural Network Classification Example


This example uses the same partitioned dataset to illustrate the use of the
Manual Network Architecture selection.
Select a cell on the StdPartition worksheet, then click Classify – Neural
Network – Manual Network on the Data Mining ribbon. The Neural Network
Classification dialog appears.
Select Type as the Output variable and the remaining variables as Selected
Variables. Since the Output variable contains three classes (A, B, and C) to
denote the three different wineries, the options for Classes in the Output
Variable are disabled. (The options under Classes in the Output Variable are
only enabled when the number of classes is equal to 2.)

Click Next to advance to the next dialog.
As discussed in the previous sections, Analytic Solver Data Mining includes the
ability to partition a dataset from within a classification or prediction method by
clicking Partition Data on the Parameters dialog. If this option is selected, Analytic Solver Data Mining will
partition your dataset (according to the partition options you set) immediately
before running the classification method. If partitioning has already occurred on
the dataset, this option will be disabled. For more information on partitioning,
please see the Data Mining Partitioning chapter.
Click Rescale Data to open the Rescaling dialog.

Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.
Note: When selecting a rescaling technique, it's recommended that you apply
Normalization ([0, 1]) if Sigmoid is selected for Hidden Layer Activation and
Adjusted Normalization ([-1, 1]) if Hyperbolic Tangent is selected for Hidden
Layer Activation. However, in this particular example dataset, the Neural
Network algorithm performed best when Standardization was selected, rather than
Normalization.
For this example, select the option, Rescale Data, and then click Done to accept
the default selection, Standardization, and close the dialog.
Click Add Layer to add a hidden layer to the Neural Network. To remove a
layer, select the layer to be removed, then click Remove Layer. Enter 12 for
Neurons.
Keep the default selections for the Hidden Layer and Output Layer options. See
the Neural Network Classification Options section below for more information
on these options.
Click Prior Probability. Three options appear in the Prior Probability Dialog:
Empirical, Uniform and Manual.

If the first option is selected, Empirical, Analytic Solver Data Mining will
assume that the probability of encountering a particular class in the dataset is the
same as the frequency with which it occurs in the training data.
If the second option is selected, Uniform, Analytic Solver Data Mining will
assume that all classes occur with equal probability.
Select the third option, Manual, to manually enter the desired class and
probability value.
Click Done to close the dialog and accept the default setting, Empirical.
Click Training Parameters to open the Training Parameters dialog. See the
Neural Network Options section below for more information on these options.
For now click Done to accept the default settings and close the dialog.

Click Stopping Rules to open the Stopping Rules dialog. Here users can specify
a comprehensive set of rules for stopping the algorithm early plus cross-
validation on the training error. Again, see the example above or the Neural
Network Options section below for more information on these parameters. For
now, click Done to accept the default settings and close the dialog.

Select Show Neural Network Weights to include this information in the output.
Keep the default selections for the Hidden Layer and Output Layer options. See
the Neural Network Classification Options section below for more information
on these options.

Click Next to advance to the Scoring dialog.


Select Detailed Report and Summary report under both Score Training Data
and Score Validation Data. Lift Charts are disabled since the number of classes
is greater than 2.
Since a Test Data partition was not created, the options under Score Test Data
are disabled. For information on how to create a test partition, see the "Data
Mining Partition" chapter.
For more information on the Score New Data options, see the “Scoring New
Data” chapter.

Click Finish to produce the output, which is inserted to the right of our
Automatic Neural Network output sheets.
Click NNC_Output1 to view the Output Navigator.

Click the Train. Score – Summary link to display the Training: Classification
Summary. Here we can see how well the Neural Network Classification
algorithm performed. The algorithm finished with a classification error of
40.19%.

Click the Valid. Score – Summary link to display the Validation:
Classification Summary. The algorithm had a misclassification error of 46.48%
in this partition.

Click the Training Log link in the Output Navigator to display the Neural
Network Training Log. This log displays the Sum of Squared errors and
Misclassification errors for each epoch or iteration of the Neural Network.
Thirty epochs, or iterations, were performed.
During an epoch, each training record is fed forward in the network and
classified. The error is calculated and is back propagated for the weights
correction. Weights are continuously adjusted during the epoch. The
misclassification error is computed as the records pass through the network. This
table does not report the misclassification error after the final weight adjustment.
Scoring of the training data is performed using the final weights, so the training
classification error may not exactly match the last epoch error in the Epoch
log.

Analytic Solver Data Mining also provides intermediate information produced


during the last pass through the network. Click the Neuron Weights link in the
Output Navigator to view the Interlayer connections' weights table.

Recall that a key element in a neural network is the weights for the connections
between nodes. In this example, we chose to have one hidden layer, and we also
chose to have 12 nodes in that layer. Analytic Solver Data Mining's output
contains a section that contains the final values for the weights between the
input layer and the hidden layer, between hidden layers, and between the last
hidden layer and the output layer. This information is useful for viewing the
“insides” of the neural network; however, it is unlikely to be of use to the data
analyst end-user. Displayed above are the final connection weights between the
input layer and the hidden layer for our example.
Open NNC_TrainingScore to view the classifications assigned to each record in
the training dataset. The class with the largest assigned probability becomes the
assigned class. Here you can compare the Predicted Class with the Actual
Class. Highlighted records indicate the record was misclassified.

Open NNC_ValidationScore to view the classifications assigned to each record


in the validation dataset. Again, the class with the largest assigned probability
becomes the assigned class.

See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for information on the Stored Model Sheet, NNC_Stored1.

NNC with Output Variable Containing 2 Classes


The Error Report for an automated neural network for a dataset with 2 classes in
the output variable will look slightly different. Open the file
Boston_Housing.xlsx by clicking Help – Examples on the Data Mining ribbon,
then Forecasting/Data Mining Examples. If using AnalyticSolver.com,
upload the dataset using the Upload icon on the Solver Home tab. This dataset
includes fourteen variables pertaining to housing prices from census tracts in the
Boston area collected by the US Census Bureau. The categorical variable
CAT.MEDV which has been derived from the MEDV variable (Median value of
owner-occupied homes in $1000's) by assigning a 1 for MEDV levels at or above
30 (>= 30) and a 0 for levels below 30 (< 30).
First, we partition the data into training and validation sets using the Standard
Data Partition defaults of 60% of the data randomly allocated to the Training Set
and 40% of the data randomly allocated to the Validation Set. For more
information on partitioning a dataset, see the Data Mining Partitioning chapter.

Select a cell on the STDPartition worksheet, then click Classify – Neural


Network – Automatic Network on the Data Mining ribbon. Select
CAT.MEDV (has 2 classes) for the Output variable, the nominal categorical
variable CHAS as a Categorical Variable and the remaining variables (except
MEDV and Record ID) as Selected variables.
The Number of Classes statistic will be automatically updated with a value of 2
when the Output Variable is selected. This indicates that the Output variable,
CAT.MEDV, contains two classes, 0 and 1.
Choose the value that will be the indicator of “Success” by clicking the down
arrow next to Success Class. In this example, we will use the default of 1
indicating that a value of “1” will be specified as a “success”.
Enter a value between 0 and 1 for Success Probability Cutoff. If the Probability
of success (probability of the output variable = 1) is less than this value, then a 0
will be entered for the class value, otherwise a 1 will be entered for the class
value. In this example, we will keep the default of 0.5.

Click Finish to accept the default settings for all parameters. Open the sheet,
NNC_Output, which will be inserted to the right of the STDPartition sheet.
Scroll down to the Architecture Search Error Log.

The above error report gives the total number of errors, % Error, % Sensitivity
(also known as true positive rate) and % Specificity (also known as true negative
rate) in the classification produced by each network ID for the training and
validation datasets separately. As shown in the Automatic Neural Network
Classification section above, this report may be sorted by column by clicking the
arrow next to each column heading. In addition, click the Net ID hyperlinks to
re-run the Neural Network Classification method with Manual Architecture with
the input and option settings as specified in the specific Net ID.
Let’s take a look at Net ID 5. This network has one hidden layer containing 5
nodes. For this neural network, the percentage of errors in the training data is
15.46% and the percentage of errors in the validation data is 18.32%.
Sensitivity and Specificity measures can vary in importance depending on the
application and goals of the application. Declaring a tumor cancerous when it is
in fact benign could result in many unnecessary expensive and invasive tests and
treatments. However, in a model where a “success” does not indicate a
potentially fatal disease, this measure might not be viewed as important.
The percentage specificity is 96.89% for the training dataset and 93.9% in the
validation dataset. This means that 96.89% of the actual negative records in the
training dataset, and 93.9% of those in the validation dataset, were correctly
identified as negative. In the case of a cancer diagnosis, we would prefer that this
percentage be higher, or much closer to 100% as it could be potentially fatal if a
person with cancer was diagnosed as not having cancer. However, as mentioned
in the paragraph above, this measure can vary in importance depending on the
application.
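
Both measures come directly from the confusion-matrix counts. In the sketch below the counts are hypothetical, chosen only to roughly reproduce the 96.89% training specificity mentioned above:

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity (true positive rate) = TP / (TP + FN);
    Specificity (true negative rate) = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical confusion-matrix counts for a training partition:
sens, spec = sensitivity_specificity(tp=40, fn=8, tn=249, fp=8)
print(round(sens, 4), round(spec, 4))  # specificity ~0.9689
```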

Neural Network Classification Method Options


The options below appear on one of the Neural Network Classification dialogs.
See below for option descriptions on the Neural Network Classification - Data
dialog.

Variables In Input Data


The variables included in the dataset appear here.

Selected Variables
Variables selected to be included in the output appear here.

Categorical Variables
Place categorical variables from the Variables listbox to be included in the
model by clicking the > command button. The Neural Network Classification
algorithm will accept non-numeric categorical variables.

Output Variable
The dependent variable or the variable to be classified appears here.

Number of Classes
Displays the number of classes in the Output variable.

Success Class
This option is selected by default. Click the drop down arrow to select the value
to specify a “success”. This option is only enabled when the # of classes is
equal to 2.

Success Probability Cutoff


Enter a value between 0 and 1 here to denote the cutoff probability for success.
If the calculated probability of success for an observation is greater than or
equal to this value, then a “success” (or a 1) will be predicted for that
observation. If the calculated probability of success for an observation is less
than this value, then a “non-success” (or a 0) will be predicted for that
observation. The default value is 0.5. This option is only enabled when the # of
classes is equal to 2.
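The cutoff rule described above is simple thresholding. As an illustration only (not Analytic Solver's internal code), a minimal Python sketch might look like this; the function name is our own:

```python
def classify(prob_success, cutoff=0.5):
    """Return 1 ("success") when the predicted probability meets the cutoff, else 0."""
    return 1 if prob_success >= cutoff else 0
```

For example, with the default cutoff of 0.5, a calculated probability of 0.5 or higher yields a success; raising the cutoff to 0.8 makes the model more conservative about predicting the success class.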



See below for option descriptions on the Neural Network Classification -
Parameters dialog. Note: The Neural Network Automatic Classification –
Parameters dialog does not include Architecture, but is otherwise the same.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
classification method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.

Rescale Data
Click Rescale Data to open the Rescaling dialog.



Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.
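The four rescaling methods correspond to standard feature-scaling formulas. The sketch below is an illustrative Python version using the textbook definitions; Analytic Solver's exact formulas (for example, population versus sample standard deviation) may differ:

```python
import math

def standardize(xs):
    # z-score: (x - mean) / standard deviation (population stdev here)
    mu = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - mu) ** 2 for x in xs) / len(xs))
    return [(x - mu) / sd for x in xs]

def normalize(xs):
    # scale values into the range [0, 1]
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def adjusted_normalize(xs):
    # scale values into the range [-1, 1]
    lo, hi = min(xs), max(xs)
    return [2 * (x - lo) / (hi - lo) - 1 for x in xs]

def unit_norm(xs):
    # divide by the Euclidean norm so the feature vector has length 1
    n = math.sqrt(sum(x * x for x in xs))
    return [x / n for x in xs]
```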

Hidden Layers/Neurons
Click Add Layer to add a hidden layer. To delete a layer, click Remove Layer.
Once the layer is added, enter the desired Neurons.

Hidden Layer
Nodes in the hidden layer receive input from the input layer. The output of the
hidden nodes is a weighted sum of the input values. This weighted sum is
computed with weights that are initially set at random values. As the network
“learns”, these weights are adjusted. This weighted sum is used to compute the
hidden node’s output using a transfer function. The default selection is Sigmoid.
Select Sigmoid (the default setting) to use a logistic function for the transfer
function with a range of 0 to 1. This function has a “squashing effect” on very
small or very large values but is almost linear in the range where the value of the
function is between 0.1 and 0.9.
Select Hyperbolic Tangent to use the tanh function for the transfer function, the
range being -1 to 1. If more than one hidden layer exists, this function is used
for all layers.
ReLU (Rectified Linear Unit) is a widely used choice for hidden layers. This
function applies max(0, x) to the neuron values. When used instead of
logistic sigmoid or hyperbolic tangent activations, some adjustments to the
Neural Network settings are typically required to achieve good performance,
such as significantly decreasing the learning rate or increasing the number of
learning epochs and network parameters.
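The three transfer functions described above can be written out directly. This is an illustrative Python sketch, not the product's implementation:

```python
import math

def sigmoid(x):
    # logistic function: squashes any input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # hyperbolic tangent: squashes any input into (-1, 1)
    return math.tanh(x)

def relu(x):
    # rectified linear unit: max(0, x)
    return max(0.0, x)
```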

Output Layer
As in the hidden layer output calculation (explained in the above paragraph), the
output layer is also computed using the same transfer function as described for
Activation: Hidden Layer. The default selection is Sigmoid.
Select Sigmoid (the default setting) to use a logistic function for the transfer
function with a range of 0 to 1.
Select Hyperbolic Tangent to use the tanh function for the transfer function, the
range being -1 to 1.
In neural networks, the Softmax function is often implemented at the final layer
of a classification neural network to impose the constraints that the posterior
probabilities for the output variable must be >= 0 and <= 1 and sum to 1. Select
Softmax to utilize this function.
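As an illustration, the Softmax constraints (outputs between 0 and 1 that sum to 1) can be sketched in Python; the max-shift below is a standard numerical-stability trick and is our own assumption, not something the guide specifies:

```python
import math

def softmax(zs):
    # shift by the maximum for numerical stability, exponentiate,
    # then normalize so the outputs sum to 1
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]
```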

Prior Probability
Click Prior Probability. Three options appear in the Prior Probability Dialog:
Empirical, Uniform and Manual.



If the first option is selected, Empirical, Analytic Solver Data Mining will
assume that the probability of encountering a particular class in the dataset is the
same as the frequency with which it occurs in the training data.
If the second option is selected, Uniform, Analytic Solver Data Mining will
assume that all classes occur with equal probability.
Select the third option, Manual, to manually enter the desired class and
probability value.
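The Empirical and Uniform options correspond to simple frequency calculations. A hypothetical Python sketch (function names are ours):

```python
from collections import Counter

def empirical_priors(labels):
    # prior for each class = its frequency in the training data
    counts = Counter(labels)
    n = len(labels)
    return {cls: c / n for cls, c in counts.items()}

def uniform_priors(classes):
    # every class is assumed equally likely
    k = len(classes)
    return {cls: 1.0 / k for cls in classes}
```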

Neuron Weight Initialization Seed


If an integer value appears for Neuron weight initialization seed, Analytic Solver
Data Mining will use this value to set the neuron weight random number seed.
Setting the random number seed to a nonzero value (any number of your choice
is OK) ensures that the same sequence of random numbers is used each time the
neuron weights are calculated. The default value is “12345”. If left blank, the
random number generator is initialized from the system clock, so the sequence
of random numbers will be different in each calculation. If you need the results
from successive runs of the algorithm to be strictly comparable, you should set
the seed by typing the desired number into the box. This option accepts both
positive and negative integers with up to 9 digits.
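The effect of a fixed seed can be illustrated with Python's random module; the weight range [-0.5, 0.5] below is an arbitrary assumption for the sketch:

```python
import random

def init_weights(n, seed=12345):
    # with a fixed seed, the same sequence of weights comes back on every run;
    # a different seed (or a clock-based one) yields a different sequence
    rng = random.Random(seed)
    return [rng.uniform(-0.5, 0.5) for _ in range(n)]
```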

Training Parameters
Click Training Parameters to open the Training Parameters dialog to specify
parameters related to the training of the Neural Network algorithm.



Learning Order [Original or Random]
This option specifies the order in which the records in the training
dataset are processed. It is recommended to shuffle the training
data to avoid processing correlated records in order; shuffling also
helps the neural network algorithm converge faster. If
Random is selected, Random Seed is enabled. If Original is selected,
the algorithm will use the original order of records.

Learning Order [Random Seed]


This option specifies the seed for shuffling the training records. Note
that different random shuffling may lead to different results, but as long
as the training data is shuffled, different ordering typically does not
result in drastic changes in performance.

Random Seed for Weights Initialization


If an integer value appears for Random Seed for Weights Initialization,
Analytic Solver Data Mining will use this value to set the seed for the
initial assignment of the neuron values. Setting the random number
seed to a nonzero value (any number of your choice is OK) ensures that
the same sequence of random numbers is used each time the neuron
values are calculated. The default value is “12345”. If left blank, the
random number generator is initialized from the system clock, so the
sequence of random numbers will be different in each calculation. If
you need the results from successive runs of the algorithm to be
strictly comparable, you should set the seed by typing the desired
number into the box.

Learning Rate
This is the multiplying factor for the error correction during
backpropagation; it is roughly equivalent to the learning rate for the
neural network. A low value produces slow but steady learning, a high
value produces rapid but erratic learning. Values for the step size
typically range from 0.1 to 0.9.



Weight Decay
To prevent over-fitting of the network on the training data, set a weight
decay to penalize the weight in each iteration. Each calculated weight
will be multiplied by (1-decay).

Weight Change Momentum


In each new round of error correction, some memory of the prior
correction is retained so that an outlier that crops up does not spoil
accumulated learning.

Error Tolerance
The error in a particular iteration is backpropagated only if it is greater
than the error tolerance. Typically error tolerance is a small value in the
range from 0 to 1.
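Taken together, Learning Rate, Weight Decay, Weight Change Momentum and Error Tolerance describe a single weight-update step: each weight is multiplied by (1 - decay), the step adds a fraction of the previous step, and the update is skipped when the error is within tolerance. The following Python sketch is a generic backpropagation update under those definitions, not Analytic Solver's code:

```python
def update_weight(w, gradient, prev_delta, error,
                  learning_rate=0.1, decay=0.0, momentum=0.0, tolerance=0.0):
    # skip the update entirely if the error is within tolerance
    if abs(error) <= tolerance:
        return w, 0.0
    # new step = learning-rate-scaled correction plus a fraction (momentum)
    # of the previous step, so one outlier does not spoil accumulated learning
    delta = -learning_rate * gradient + momentum * prev_delta
    # weight decay shrinks the weight before the step is applied
    w_new = w * (1.0 - decay) + delta
    return w_new, delta
```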

Response Rescaling Correction


This option specifies a small number that is applied to the
Normalization rescaling formula if the output layer activation is
Sigmoid (or Softmax in classification), or to the Adjusted
Normalization formula if the output layer activation is Hyperbolic
Tangent. The rescaling correction ensures that all response values
stay within the range of the activation function.

Stopping Rules
Click Stopping Rules to open the Stopping Rules dialog. Here users can specify
a comprehensive set of rules for stopping the algorithm early plus cross-
validation on the training error.

Partition for Error Computation


Specifies which data partition is used to estimate the error after each
training epoch.



Number of Epochs
An epoch is one sweep through all records in the training set. Use this
option to set the number of epochs to be performed by the algorithm.

Maximum Number of Epochs Without Improvement
The algorithm will stop after this number of epochs has been
completed and no improvement has been realized.

Maximum Training Time


The algorithm will stop once this time (in seconds) has been exceeded.

Keep Minimum Relative Change in Error


If the relative change in error is less than this value, the algorithm will
stop.

Keep Minimum Relative Change in Error Compared to Null Model
If the relative change in error compared to the Null Model is less than
this value, the algorithm will stop. The Null Model is the baseline model
used for comparing the performance of the neural network model.
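The stopping rules above (ignoring Maximum Training Time, which needs a clock) amount to checks on the per-epoch error history. A hypothetical sketch:

```python
def should_stop(errors, max_epochs=1000, max_stall=50, min_rel_change=1e-6):
    """Decide whether training should halt, given the per-epoch error history."""
    if len(errors) >= max_epochs:
        return True                      # epoch budget exhausted
    # epochs elapsed since the best (lowest) error was seen
    best = min(errors)
    stall = len(errors) - 1 - errors.index(best)
    if stall >= max_stall:
        return True                      # no improvement for too long
    if len(errors) >= 2:
        prev, cur = errors[-2], errors[-1]
        if prev > 0 and abs(prev - cur) / prev < min_rel_change:
            return True                  # error has effectively flattened out
    return False
```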

See below for option descriptions on the Neural Network Classification -
Scoring dialog.

Score Training Data


Select these options to show an assessment of the performance of the algorithm
in classifying the training data. The report is displayed according to your
specifications – Detailed, Summary and Lift Charts. Lift Charts are disabled
when the number of classes is greater than 2.

Score Validation Data


These options are enabled when a validation set is present. Select to show an
assessment of the performance of the algorithm in classifying the validation
data. The report is displayed according to your specifications – Detailed,
Summary and Lift Charts. Lift Charts are disabled when the number of classes
is greater than 2.

Score Test Data


These options are enabled when a test set is present. Select these options to
show an assessment of the performance of the algorithm in classifying the test
data. The report is displayed according to your specifications – Detailed,
Summary and Lift Charts. Lift Charts are disabled when the number of classes
is greater than 2.

Score New Data


See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for more details on the Score New Data options on the Step 3 of 3
dialog. Lift Charts are disabled when the number of classes is greater than 2.



Ensemble Methods for
Classification
Analytic Solver Data Mining offers two powerful ensemble methods for use
with all classification methods: bagging (bootstrap aggregating) and boosting.
A third method, random trees, may only be applied to classification trees. Each
classification method on its own can be used to find one model that results in
good classifications of the new data. We can view the statistics and confusion
matrices of the current classifier to see if our model is a good fit to the data, but
how would we know if there is a better classifier just waiting to be found? The
answer is – we don't. However, ensemble methods allow us to combine multiple
“weak” classification models which, when taken together, form a new, more
accurate “strong” classification model. These methods work by creating multiple
diverse classification models, by taking different samples of the original dataset,
and then combining their outputs. (Outputs may be combined by several
techniques, for example, majority vote for classification and averaging for
regression.) This combination of models effectively reduces the variance in the
“strong” model. The three ensemble methods offered in Analytic Solver Data
Mining (bagging, boosting, and random trees) differ on three items: 1. the
selection of training data for each classifier or “weak” model, 2. how the
“weak” models are generated, and 3. how the outputs are combined. In all three
methods, each “weak” model is trained on the entire training dataset to become
proficient in some portion of the dataset.

Bagging, or bootstrap aggregating, was one of the first ensemble algorithms ever
to be written. It is a simple algorithm, yet very effective. Bagging generates
several training data sets by using random sampling with replacement (bootstrap
sampling), applies the classification algorithm to each dataset, then takes the
majority vote amongst the models to determine the classification of the new
data. The biggest advantage of bagging is the relative ease that the algorithm
can be parallelized which makes it a better selection for very large datasets.
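The two core pieces of bagging, bootstrap sampling and majority voting, can be sketched as follows (illustrative Python, with our own function names):

```python
import random
from collections import Counter

def bootstrap_sample(records, rng):
    # sample with replacement, same size as the original training set;
    # each weak model is trained on a different bootstrap sample
    return [rng.choice(records) for _ in records]

def majority_vote(predictions):
    # predictions: one predicted class per weak model, for a single record;
    # the most common class wins
    return Counter(predictions).most_common(1)[0][0]
```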

Boosting, in comparison, builds a “strong” model by successively training
models to concentrate on the misclassified records in previous models. Once
completed, all classifiers are combined by a weighted majority vote. Analytic
Solver Data Mining offers three different variations of boosting as implemented
by the AdaBoost algorithm (one of the most popular ensemble algorithms in use
today): M1 (Freund), M1 (Breiman), and SAMME (Stagewise Additive
Modeling using a Multi-class Exponential).

AdaBoost.M1 first assigns a weight wb(i) to each record or observation. This
weight is originally set to 1/n and will be updated on each iteration of the
algorithm. An original classification model is created using this first training
set (Tb) and an error is calculated as:

eb = Σ(i=1 to n) wb(i) · I(Cb(xi) ≠ yi)

where the I() function returns 1 if true and 0 if not.



The error of the classification model in the bth iteration is used to calculate the
constant αb. This constant is used to update the weight wb(i). In AdaBoost.M1
(Freund), the constant is calculated as:

αb= ln((1-eb)/eb)
In AdaBoost.M1 (Breiman), the constant is calculated as:

αb= 1/2ln((1-eb)/eb)

In SAMME, the constant is calculated as:

αb = 1/2 ln((1-eb)/eb) + ln(k-1), where k is the number of classes

(When the number of categories is equal to 2, SAMME behaves the same as
AdaBoost.M1 (Breiman).)

In any of the three implementations (Freund, Breiman, or SAMME), the new
weight for the (b + 1)th iteration will be

wb+1(i) = wb(i) · exp(αb · I(Cb(xi) ≠ yi))

Afterwards, the weights are all readjusted to sum to 1. As a result, the weights
assigned to the observations that were classified incorrectly are increased and
the weights assigned to the observations that were classified correctly are
decreased. This adjustment forces the next classification model to put more
emphasis on the records that were misclassified. (This α constant is also used in
the final calculation which will give the classification model with the lowest
error more influence.) This process repeats until b = Number of weak learners
(controlled by the User). The algorithm then computes the weighted sum of
votes for each class and assigns the “winning” classification to the record.
Boosting generally yields better models than bagging, however, it does have a
disadvantage as it is not parallelizable. As a result, if the number of weak
learners is large, boosting would not be suitable.
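The three α formulas and the weight update above can be written directly in code. The following Python sketch mirrors the formulas in this section; the renormalization step matches the "readjusted to sum to 1" description:

```python
import math

def alpha_freund(e):
    # AdaBoost.M1 (Freund)
    return math.log((1 - e) / e)

def alpha_breiman(e):
    # AdaBoost.M1 (Breiman)
    return 0.5 * math.log((1 - e) / e)

def alpha_samme(e, k):
    # SAMME, where k is the number of classes; reduces to Breiman when k = 2
    return 0.5 * math.log((1 - e) / e) + math.log(k - 1)

def reweight(weights, misclassified, alpha):
    # raise the weight of misclassified records, then renormalize to sum to 1
    raw = [w * math.exp(alpha if miss else 0.0)
           for w, miss in zip(weights, misclassified)]
    total = sum(raw)
    return [w / total for w in raw]
```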

Random trees, also known as random forests, is a variation of bagging. This
method works by training multiple “weak” classification trees using a fixed
number of randomly selected features (sqrt[number of features] for classification
and number of features/3 for prediction) then takes the mode of each class to
create a “strong” classifier. Typically, in this method the number of “weak”
trees generated could range from several hundred to several thousand depending
on the size and difficulty of the training set. Random Trees are parallelizable
since they are a variant of bagging. However, since Random Trees selects a
limited number of features in each iteration, random trees runs
faster than bagging.
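The feature-subset sizes quoted above (square root of the number of features for classification, number of features divided by 3 for prediction) can be sketched as follows; the helper names are ours:

```python
import math
import random

def features_per_split(n_features, task="classification"):
    # sqrt(n) features for classification, n/3 for prediction (regression)
    if task == "classification":
        return max(1, int(math.sqrt(n_features)))
    return max(1, n_features // 3)

def pick_features(feature_names, k, rng):
    # each tree sees only a random subset of k features
    return rng.sample(feature_names, k)
```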

Classification Ensemble methods are very powerful methods and typically result
in better performance than a single tree. This feature addition in Analytic Solver
Data Mining (introduced in V2015) will provide users with more accurate
classification models and should be considered.



Boosting Ensemble Method Example
Analytic Solver Data Mining includes two different methods for use with all six
Analytic Solver Data Mining classifiers, boosting and bagging, and one method
for use with classification trees, random trees. All three methods are ensemble
methods that are used to generate one powerful model by combining several
“weaker” tree models. This example illustrates how to create a classification
tree using the boosting ensemble method. The boosting method starts by first
training a single classifier, then examining the misclassified records from that
model to train a successive model. This process repeats until a high level of
accuracy is obtained. We will use the Boston_Housing.xlsx dataset to illustrate
this method.
Click Help – Examples, then Forecasting/Data Mining Examples to open the
Boston_Housing.xlsx dataset.
The figure below displays a portion of the data; observe the last column (CAT.
MEDV). This variable has been derived from the MEDV variable by assigning
a 1 for MEDV levels at or above 30 (>= 30) and a 0 for levels below 30 (< 30).
The MEDV variable itself will not be used in this example.

Click Classify – Ensemble – Boosting to open the Boosting: Classification


dialog.
Select CAT. MEDV as the Output variable and select CHAS as a Categorical
Variable. Then select all remaining variables except MEDV as Selected
variables. The MEDV variable is not included in the input because the output
variable, CAT. MEDV, is derived directly from it.
Choose the value that will be the indicator of “Success” by clicking the down
arrow next to Success Class. In this example, we will use the default of 1.
Enter a value between 0 and 1 for Success Probability Cutoff. If the Probability
of success is less than this value, then a 0 will be entered for the class value,
otherwise a 1 will be entered for the class value. In this example, we will keep
the default of 0.5.



Click Next to advance to the Boosting: Classification Parameters dialog.
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or regression method by clicking Partition Data on the
Parameters dialog. Click Partition Data to open the Partitioning dialog.
Analytic Solver Data Mining will partition your dataset (according to the
partition options you set) immediately before running the classification method.
If partitioning has already occurred on the dataset, this option will be disabled.
For more information on partitioning, please see the Data Mining Partitioning
chapter.
Select Partition Data at the top of the dialog to enable the partitioning options.
Use the default settings to partition the data into training and validation sets.
For more information on partitioning a dataset, see the Data Mining Partitioning
chapter. Click Done to close the dialog.

Enter "10" for the Number of weak learners. This option controls the number of
“weak” classification models that will be created. The ensemble method will
stop when the number of classification models created reaches the value set for
this option. The algorithm will then compute the weighted sum of votes for
each class and assign the “winning” classification to each record.
Under Ensemble: Classification, click the down arrow beneath Weak Learner to
select one of the six featured classifiers. In this example, we will select Neural
Networks.
Click Neural Network to specify options for this weak learner. In this example,
we will add two layers. Click Add Layer twice to add the two layers. Enter "2"
neurons for the 1st Layer. Then click Done to accept the default settings. For
more information on each of these options, please see the Neural Network
chapter that appears previously in this guide.

Since we selected Neural Network as the weak learner, click Rescale Data to
rescale the data using Normalization ([0, 1]).

Leave the default selection for AdaBoost Variant, AdaBoost.M1 (Breiman).


The difference in the algorithms is the way in which the weights assigned to
each observation or record are updated. (Please refer to the section Ensemble
Methods in the Introduction to the chapter.)
In AdaBoost.M1 (Freund), the constant is calculated as:

αb= ln((1-eb)/eb)
In AdaBoost.M1 (Breiman), the constant is calculated as:

αb= 1/2ln((1-eb)/eb)

In SAMME, the constant is calculated as:

αb = 1/2 ln((1-eb)/eb) + ln(k-1), where k is the number of classes

(When the number of categories is equal to 2, SAMME behaves the same as
AdaBoost.M1 (Breiman).)

Leave Random Seed for Resampling at the default setting of 12345. This option
specifies the seed for random resampling of the training data for the weak
learner.

To display the weak learner models in the output, select Show Weak Learner
Models.



Click Next to advance to the Boosting: Classification - Scoring dialog.
Summary Report is selected by default under both Score Training Data and
Score Validation Data. Select Detailed Report under both Score Training Data
and Score Validation Data to produce a detailed assessment of the performance
of the tree in both sets. Select Lift Charts to include Lift Charts, ROC Curves
and Decile charts for both the Training and Validation datasets. Since we did
not create a test partition, the options for Score test data are disabled. See the
chapter “Data Mining Partitioning” for information on how to create a test
partition.
Please see the “Scoring New Data” chapter within the Analytic Solver Data
Mining User Guide for information on the Score new data options.



Click Finish. Output from the Boosting algorithm will be inserted to the right
of the Data worksheet. Click CBoosting_Output to view the Output Navigator.
Click any link in this section to navigate to various sections of the output.

Scroll down the output sheet CBoosting_Output to display the Boosting Models.
The number of Weak Learners in the output is equal to 10, which matches our
input on the Parameters dialog for the Number of Weak Learners option.
Analytic Solver Data Mining assigns a weight to each weak learner.

Click the CBoosting_TrainingScore tab to view the Training: Classification
Summary and the Classification Confusion Matrix. For more information on
confusion matrices, please see the Neural Network chapter that appears
previously in this guide.

Scroll down to view how each record was classified in the training partition.
Click the CBoosting_ValidationScore tab to view the Validation: Classification
Summary.



Scroll down to view how each record was classified in the validation partition.
Click the CBoosting_TrainingLiftChart to navigate to the Lift Charts, shown
below. For more information on lift charts, ROC curves, and Decile charts,
please see the Neural Network chapter that appears previously in this guide.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select CBoosting_TrainingLiftChart or CBoosting_ValidationLiftChart
for Worksheet and Decile Chart, ROC Chart or Gain Chart for Chart.



Click the CBoosting_ValidationLiftChart to navigate to the charts, shown
below.

Analytic Solver Data Mining generates CBoosting_Stored along with the other
output. Please refer to the “Scoring New Data” chapter in the Analytic Solver
Data Mining User Guide for details.

Bagging Ensemble Method for Classification


Now let’s use the 2nd ensemble method, bagging. We’ll re-use the same dataset,
Boston_Housing.xlsx, then we will compare the results.
Click Classify – Ensemble Methods – Bagging to open the Bagging – Data
dialog. Again, select CAT. MEDV as the Output variable. Then select all
remaining variables except MEDV and CHAS as Selected Variables.
Keep the default selections for Binary Classification. (For more information on
these options, see the Boosting Ensemble Method Example above.)



Click Next to advance to the Bagging - Parameters dialog.
Select Partition Data at the top of the dialog to enable the partitioning options.
Use the default settings to partition the data into training and validation sets.
Click Done to close the dialog.

Leave the Number of weak learners as the default of 50. This option controls
the number of “weak” classification models that will be created. The ensemble
method will stop when the number of classification models created reaches the
value set for this option. The algorithm will then compute the weighted sum of
votes for each class and assign the “winning” classification to each record.



Under Ensemble: Classification, click the down arrow beneath Weak Learner to
select one of the six featured classifiers. In this example, we will select Logistic
Regression.

Click Logistic Regression to specify options for this weak learner. In this
example, we will click Done to accept the default settings. For more
information on each of these options, please see the Logistic Regression chapter
that appears previously in this guide.
Under Bagging: Common, leave Random Seed for Bootstrapping at the default
of 12345. Analytic Solver Data Mining will use this value to set the
bootstrapping random number seed. Setting the random number seed to a
nonzero value (any number of your choice is OK) ensures that the same
sequence of random numbers is used each time the dataset is chosen for the
classifier.

To display the weak learner models in the output, select Show Weak Learner
Models.

Click Next to advance to the Bagging - Scoring dialog.


Summary Report is selected by default under both Score Training Data and
Score Validation Data. Select Detailed Report and Lift Charts under both Score
Training Data and Score Validation Data to produce a detailed assessment of
the performance of the tree in both sets. Since we did not create a test partition,
the options for Score test data are disabled. See the chapter “Data Mining
Partitioning” for information on how to create a test partition.
Please see the “Scoring New Data” chapter in the Analytic Solver Data Mining
User Guide for information on the Score new data options.

Click Finish to run the ensemble method. Output from the Ensemble Methods
algorithm will be inserted to the right. Double click CBagging_Output to view
the Output Navigator. Click any link in this section to navigate to various
sections of the output.

Scroll down CBagging_Output to Details of the bagging ensemble. The
Importance percentage for each Variable is listed here and measures the
variable’s contribution in reducing the total misclassification error.

Click the CBagging_TrainingScore tab to view the Training: Classification
Summary.



In the training dataset, nine (9) records were misclassified, 3 in the success class
and 6 in the failure class, resulting in a classification error of 2.96%.
Scroll down to view how each record was classified in the training partition.
Misclassified records will appear in red.
Click the CBagging_ValidationScore tab to view the Validation: Classification
Summary.

In the validation dataset, 15 records were misclassified, 12 in the failure class
and 3 in the success class, resulting in an overall error of 7.43%. For more
information on confusion matrices, please see the Neural Network chapter that
appears previously in this guide.
Scroll down to view how each record was classified in the validation partition.
Click the CBagging_TrainingLiftChart and CBagging_ValidationLiftChart
tabs to find the Lift Chart, ROC Curve, and Decile Chart for both the Training
and Validation partitions.



Lift Charts and ROC Curves are visual aids that help users evaluate the
performance of their fitted models. Charts found on the
CBagging_TrainingLiftChart tab were calculated using the Training Data Partition. Charts found on
the CBagging_ValidationLiftChart tab were calculated using the Validation
Data Partition. It is good practice to look at both sets of charts to assess model
performance on both datasets.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select CBagging_TrainingLiftChart or CBagging_ValidationLiftChart
for Worksheet and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, ROC Curve, and Lift Charts for Training Partition

Decile-wise Lift Chart, ROC Curve, and Lift Charts for Valid. Partition

After the model is built using the training data set, the model is used to score on
the training data set and the validation data set (if one exists). Then the data
set(s) are sorted in decreasing order using the predicted output variable value.
After sorting, the actual outcome values of the output variable are cumulated
and the lift curve is drawn as the cumulative number of cases in decreasing
probability (on the x-axis) vs the cumulative number of true positives on the y-
axis. The baseline (red line connecting the origin to the end point of the blue
line) is a reference line. For a given number of cases on the x-axis, this line
represents the expected number of successes if no model existed, and instead
cases were selected at random. This line can be used as a benchmark to measure
the performance of the fitted model. The greater the area between the lift curve
and the baseline, the better the model. In the Training Lift chart, if we selected
100 cases as belonging to the success class and used the fitted model to pick the
members most likely to be successes, the lift curve tells us that we would be
correct on about 40 of them. Conversely, if we selected 100 random cases, we
could expect to be right on about 15 of them.
The decile-wise lift chart plots, for each decile, the mean actual output variable
value in that decile divided by the overall mean output variable value. The bars
in this chart indicate the factor by which the model outperforms a random
assignment, one decile at a time. Refer to the validation graph above.
In the first decile, taking the most expensive predicted housing prices in the
dataset, the predictive performance of the model is about 5 times better than
simply assigning a random predicted value.
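The decile-wise calculation can be sketched as follows: records are ranked by predicted probability, split into ten equal groups, and each group's success rate is divided by the overall success rate. This illustrative Python version assumes at least ten records:

```python
def decile_lift(probs, actuals):
    """Lift per decile: each decile's success rate divided by the overall rate."""
    # rank records by predicted probability, highest first
    ranked = [a for _, a in sorted(zip(probs, actuals), reverse=True)]
    n = len(ranked)
    overall = sum(ranked) / n
    lifts = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        lifts.append((sum(chunk) / len(chunk)) / overall)
    return lifts
```

A lift of 5 in the first decile means the top 10% of ranked records contain five times as many successes as a random 10% sample would.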
The Regression ROC curve was updated in V2017. This new chart compares
the performance of the regressor (Fitted Predictor) with an Optimum Predictor
Curve and a Random Classifier curve. The Optimum Predictor Curve plots a
hypothetical model that would provide perfect classification results. The best
possible classification performance is denoted by a point at the top left of the
graph at the intersection of the x and y axis. This point is sometimes referred to
as the “perfect classification”. The closer the AUC is to 1, the better the
performance of the model. In the Validation Partition, AUC = .973 which
suggests that this fitted model could be a good fit to this dataset.
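AUC itself has a simple probabilistic reading: it is the chance that a randomly chosen positive case is scored above a randomly chosen negative case. The sketch below (illustrative, not Analytic Solver's code) computes it directly from that definition:

```python
def roc_auc(actual, scores):
    # Probability that a random positive outscores a random negative;
    # ties count as half a win.
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # 1.0  (perfect separation)
print(roc_auc([1, 0], [0.5, 0.5]))                  # 0.5  (no better than random)
```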
In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original) on the Original Lift Chart, then select the desired chart.

Select Lift Chart (Alternative) to display Analytic Solver Data Mining's new Lift
Chart. Each of these charts consists of an Optimum Predictor curve, a Fitted
Predictor curve, and a Random Predictor curve. The Optimum Predictor curve
plots a hypothetical model that would provide perfect classification for our data.
The Fitted Predictor curve plots the fitted model and the Random Predictor
curve plots the results from using no model or by using a random guess (i.e. for
x% of selected observations, x% of the total number of positive observations are
expected to be correctly classified).
The Alternative Lift Chart plots Lift against the Predictive Positive Rate or
Support.
Lift Chart (Alternative) and Gain Chart for Training Partition

Lift Chart (Alternative) and Gain Chart for Validation Partition

Click the down arrow and select Gain Chart from the menu. In this chart, the
True Positive Rate or Sensitivity is plotted against the Predictive Positive Rate
or Support.
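A gain chart can be sketched the same way as the lift curve, but with both axes normalized to rates (an illustrative re-creation with hypothetical data):

```python
def gain_curve(actual, prob):
    # After examining the top x% of records (ranked by predicted probability),
    # what fraction of all positives has been captured?
    order = sorted(range(len(actual)), key=lambda i: -prob[i])
    total_pos = sum(actual)
    n = len(actual)
    cum, points = 0, []
    for rank, i in enumerate(order, start=1):
        cum += actual[i]
        points.append((rank / n, cum / total_pos))  # (support, sensitivity)
    return points

points = gain_curve([1, 0, 1, 0, 1, 0], [0.9, 0.8, 0.7, 0.4, 0.3, 0.1])
```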
Analytic Solver Data Mining generates CBagging_Stored along with the other
output. Please refer to the “Scoring New Data” chapter in the Analytic Solver
Data Mining User Guide for details.

Random Trees Ensemble Method Example


Now let’s use the 3rd ensemble method, random trees. We’ll again re-use the
Boston_Housing.xlsx dataset.
Click Classify – Ensemble – Random Trees to open the Random Trees
Classification – Step 1 of 3 dialog.
Select CAT. MEDV as the Output variable. Then select all remaining
variables except MEDV and CHAS as Selected Variables.

Click Next to advance to the Random Trees – Parameters dialog.
Select Partition Data at the top of the dialog to enable the partitioning options.
Use the default settings to partition the data into training and validation sets.
Click Done to close the dialog.

Enter 50 for the Number of Weak Learners. This option controls the number of
“weak” classification models that will be created. The ensemble method will
stop when the number of classification models created reaches the value set for
this option. The algorithm will then compute the weighted sum of votes for
each class and assign the “winning” classification to each record.
Under Bagging: Common, leave Random Seed for Bootstrapping at the default
of 12345.

Since Random Trees supports only Decision Trees as its weak learner, no drop-
down menu is present. Rather, you can click Decision Tree under Weak Learner
to specify options for the Classification Tree algorithm. Select Levels, Nodes,
Splits, and Records in Terminal Nodes. Click Done to accept all at their default
settings. For more information on the classification tree options, see the
Classification Tree chapter that occurs previously in this guide.

Leave Number of Randomly Selected Features and Random Seed for Feature
Selection at their defaults.
To display the weak learner models in the output, select Show Weak Learner
Models. To print Feature Importance output, select Show Feature
Importance.

Click Next to advance to the Random Trees Classification – Scoring dialog.
Summary Reports are selected by default under both Score Training Data and
Score Validation Data. Select Detailed Report and Lift Charts under both Score
Training Data and Score Validation Data to produce a detailed assessment of
the performance of the tree in both sets. Since we did not create a test partition,
the options for Score test data are disabled. See the chapter “Data Mining
Partitioning” for information on how to create a test partition.
Please see the “Scoring New Data” chapter within the Analytic Solver Data
Mining User Guide for information on the Score new data options.

Click Finish to run the ensemble method. Output from the Ensemble Methods
algorithm will be inserted to the right. Open CRandTrees_Output to view the
Output Navigator. Click any link in this section to navigate to various sections
of the output.

Click the Feature Importance link to navigate to the Feature Importance table.
This table displays the variables that are included in the model along with their
Importance value.

Scroll down CRandTrees_Output to view the model for each of the 50 weak
learners.

Click the CRandTrees_TrainingScore sheet to view the Classification Confusion
Matrix.

The confusion matrix, above, displays counts for cases that were correctly and
incorrectly classified in the training data set. Eighteen (18) records were
misclassified in the training dataset, 13 in the Success class and 5 in the Failure
class, resulting in an error of 5.92%.
Scroll down to view the details of how each record in the dataset was classified
in the Training: Classification Details table. If the PostProb value for PostProb
1 is greater than 0.5, the record is assigned a classification of 1. Otherwise, the
record is assigned a classification of 0.
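That cutoff rule, together with the counts in the confusion matrix, can be sketched as follows (an illustrative re-creation; the names mirror the report columns, not an API):

```python
def classify_and_score(post_prob_1, actual, cutoff=0.5):
    # Assign class 1 when the posterior probability exceeds the cutoff,
    # then tally the confusion matrix and the overall error rate.
    predicted = [1 if p > cutoff else 0 for p in post_prob_1]
    matrix = {(a, p): 0 for a in (0, 1) for p in (0, 1)}  # (actual, predicted)
    for a, p in zip(actual, predicted):
        matrix[(a, p)] += 1
    errors = matrix[(0, 1)] + matrix[(1, 0)]
    return matrix, errors / len(actual)

matrix, error_rate = classify_and_score([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
print(error_rate)  # 0.5: one record misclassified in each class
```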
Click the CRandTrees_ValidationScore sheet to view the Classification
Confusion Matrix.

In this partition, nine records were misclassified: 4 in the Success class and 5 in
the Failure class, resulting in an error of 4.46%.
Scroll down to view the details of how each record in the dataset was classified
in the Validation: Classification Details table. If the PostProb value for PostProb
1 is greater than 0.5, the record is assigned a classification of 1. Otherwise, the
record is assigned a classification of 0. Misclassified records appear in red.
Click the CRandTrees_TrainingLiftChart and
CRandTrees_ValidationLiftChart tabs to navigate to the Lift Charts, shown
below.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select CRandTrees_TrainingLiftChart or
CRandTrees_ValidationLiftChart for Worksheet and Decile Chart, ROC Chart
or Gain Chart for Chart.
Decile-wise Lift Chart, ROC Curve, and Lift Charts for Training Partition

Decile-wise Lift Chart, ROC Curve, and Lift Charts for Valid. Partition

The Training Lift Chart tells us that if we selected 100 cases as belonging to the
success class and used the fitted model to pick the members most likely to be
successes, the lift curve tells us that we would be right on about 45 of them.
Conversely, if we selected 100 random cases, we could expect to be right on
about 15 of them. In the Validation Lift chart, we see that if we selected 100
cases as belonging to the success class and used the fitted model to pick the
members most likely to be successes, the lift curve tells us that we would be
right on about 37 of them. Conversely, if we selected 100 random cases, we
could expect to be right on about 15 of them.
The decile-wise lift chart plots the decile number against the ratio of the
decile's mean actual output value to the overall mean output value. The bars
in this chart indicate the factor by which the model outperforms a random
assignment, one decile at a time. Refer to the validation graph above.
In the first decile, taking the most expensive predicted housing prices in the
dataset, the predictive performance of the model is about 5 times better than
simply assigning a random predicted value.
In an ROC curve, we can compare the performance of a classifier with that of a
random guess which would lie at a point along a diagonal line (red line) running
from the origin (0, 0) to the point (1, 1). The closer the value AUC is to 1, the
better the performance of the classification model. In this example, the AUC for
the validation partition equals 0.96 which suggests that the fitted model could be
a good fit to the data.

In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original) on the Original Lift Chart, then select the desired chart. Each of these
charts consists of an Optimum Predictor curve, a Fitted Predictor curve, and a
Random Predictor curve. The Optimum Predictor curve plots a hypothetical
model that would provide perfect classification for our data. The Fitted
Predictor curve plots the fitted model and the Random Predictor curve plots the
results from using no model or by using a random guess (i.e. for x% of selected
observations, x% of the total number of positive observations are expected to be
correctly classified).
Analytic Solver Data Mining generates CRandTrees_Stored along with the other
output sheets. Please refer to the “Scoring New Data” chapter within the
Analytic Solver Data Mining User Guide for details.

Classification Ensemble Methods Options


The following options appear on the Bagging, Boosting, and Random Trees
Data dialogs.
Please see below for options appearing on the Ensemble Methods – Data dialog.

Variables In Input Data


The variables included in the dataset appear here.

Selected Variables
Variables selected to be included in the output appear here.

Categorical Variables
Move categorical variables to be included in the model from the Variables list
box by clicking the > command button. Ensemble Methods will accept non-
numeric categorical variables.

Output Variable
The dependent variable or the variable to be classified appears here.

Number of Classes
Displays the number of classes in the Output variable.

Success Class
This option is selected by default. Click the drop down arrow to select the value
to specify a “success”. This option is only enabled when the # of classes is
equal to 2.

Success Probability Cutoff


Enter a value between 0 and 1 here to denote the cutoff probability for success.
If the calculated probability for success for an observation is greater than or
equal to this value, then a “success” (or a 1) will be predicted for that
observation. If the calculated probability for success for an observation is less
than this value, then a “non-success” (or a 0) will be predicted for that
observation. The default value is 0.5. This option is only enabled when the # of
classes is equal to 2.
Please see below for options appearing on the Boosting – Parameters dialog.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Click Partition Data to open the Partitioning dialog.
Analytic Solver Data Mining will partition your dataset (according to the
partition options you set) immediately before running the classification method.
If partitioning has already occurred on the dataset, this option will be disabled.
For more information on partitioning, please see the Data Mining Partitioning
chapter.

Rescale Data
Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.

Number of Weak Learners
This option controls the number of “weak” classification models that will be
created. The ensemble method will stop when the number of classification
models created reaches the value set for this option. The algorithm will then
compute the weighted sum of votes for each class and assign the “winning”
classification to each record.

Weak Learner
Under Ensemble: Classification, click the down arrow beneath Weak Learner to
select one of the six featured classifiers: Discriminant Analysis, Logistic
Regression, k-NN, Naïve Bayes, Neural Networks, or Decision Trees. After a
weak learner is chosen, the command button to the right will be enabled. Click
this command button to control various option settings for the weak learner.

AdaBoost Variant
The difference in the algorithms is the way in which the weights assigned to
each observation or record are updated. (Please refer to the section Ensemble
Methods in the Introduction to the chapter.)
In AdaBoost.M1 (Freund), the constant is calculated as:

α_b = ln((1 - e_b)/e_b)

In AdaBoost.M1 (Breiman), the constant is calculated as:

α_b = (1/2) ln((1 - e_b)/e_b)

In SAMME, the constant is calculated as:

α_b = (1/2) ln((1 - e_b)/e_b) + ln(k - 1), where k is the number of classes

(When the number of classes is equal to 2, SAMME behaves the same as
AdaBoost.M1 (Breiman).)
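The three constants, as given in this guide, can be written directly (a sketch for illustration; `eb` is the weak learner's weighted error and `k` the number of classes):

```python
import math

def alpha(eb, variant, k=2):
    # Weak-learner weight for each AdaBoost variant, following the
    # constants stated above.
    if variant == "M1.Freund":
        return math.log((1 - eb) / eb)
    if variant == "M1.Breiman":
        return 0.5 * math.log((1 - eb) / eb)
    if variant == "SAMME":
        return 0.5 * math.log((1 - eb) / eb) + math.log(k - 1)
    raise ValueError("unknown variant: " + variant)

# With two classes, ln(k - 1) = 0, so SAMME reduces to the Breiman variant.
print(alpha(0.2, "SAMME", k=2) == alpha(0.2, "M1.Breiman"))  # True
```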

Random Seed for Resampling


If an integer value appears for Random Seed for Resampling, Analytic Solver
Data Mining will use this value to specify the seed for random resampling of the
training data for each weak learner. Setting the random number seed to a
nonzero value (any number of your choice is OK) ensures that the same
sequence of random numbers is used each time the dataset is chosen for the
classifier. The default value is “12345”. If left blank, the random number
generator is initialized from the system clock, so the sequence of random
numbers will be different in each calculation. If you need the results from
one run of the algorithm to the next to be strictly comparable, you should
set the seed. To do this, type the desired number into the box. This
option accepts both positive and negative integers with up to 9 digits.

Show Weak Learner

To display the weak learner models in the output, select Show Weak Learner
Models.
Please see below for options unique to the Bagging – Parameters dialog.

Random Seed for Bootstrapping


If an integer value appears for Random Seed for Bootstrapping, Analytic Solver
Data Mining will use this value to set the bootstrapping random number seed.
Setting the random number seed to a nonzero value (any number of your choice
is OK) ensures that the same sequence of random numbers is used each time the
dataset is chosen for the classifier. The default value is “12345”. If left blank,
the random number generator is initialized from the system clock, so the
sequence of random numbers will be different in each calculation. If you need
the results from one run of the algorithm to the next to be strictly
comparable, you should set the seed. To do this, type the desired number
into the box. This option accepts both positive and negative integers with
up to 9 digits.

Please see below for options unique to the Random Trees – Parameters dialog.
For remaining option explanations, please see above.

Number of Randomly Selected Features


The Random Trees ensemble method works by training multiple “weak”
classification trees using a fixed number of randomly selected features then
taking the mode of each class to create a “strong” classifier. The option Number
of randomly selected features controls the fixed number of randomly selected
features in the algorithm. The default setting is 3.
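The per-tree feature draw can be sketched as below (illustrative; the feature names come from the Boston Housing example, and the seed mirrors the dialog default):

```python
import random

def sample_features(all_features, n_selected, seed=12345):
    # Draw the fixed-size random subset of features used to train one weak
    # tree; a fixed seed makes the draw reproducible from run to run.
    rng = random.Random(seed)
    return rng.sample(all_features, n_selected)

subset = sample_features(["CRIM", "ZN", "INDUS", "NOX", "RM", "AGE", "DIS"], 3)
print(len(subset))  # 3
```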

Random Seed for Feature Selection


If an integer value appears for Random Seed for Feature Selection, Analytic Solver
Data Mining will use this value to set the feature selection random number seed.
Setting the random number seed to a nonzero value (any number of your choice
is OK) ensures that the same sequence of random numbers is used each time the
dataset is chosen for the classifier. The default value is “12345”. If left blank,
the random number generator is initialized from the system clock, so the
sequence of random numbers will be different in each calculation. If you need
the results from one run of the algorithm to the next to be strictly
comparable, you should set the seed. To do this, type the desired number
into the box. This option accepts both positive and negative integers with
up to 9 digits.
Please see below for options that are unique to the Ensemble Methods Scoring
dialog.

Score Training Data
Select these options to show an assessment of the performance of the algorithm
in classifying the training data. The report is displayed according to your
specifications – Detailed, Summary and Lift Charts. Lift Charts are disabled
when the number of classes is greater than 2.

Score Validation Data


These options are enabled when a validation set is present. Select to show an
assessment of the performance of the algorithm in classifying the validation
data. The report is displayed according to your specifications – Detailed,
Summary and Lift Charts. Lift Charts are disabled when the number of classes
is greater than 2.

Score Test Data


These options are enabled when a test set is present. Select these options to
show an assessment of the performance of the algorithm in classifying the test
data. The report is displayed according to your specifications – Detailed,
Summary and Lift Charts. Lift Charts are disabled when the number of classes
is greater than 2.

Score New Data


See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for more details on the Score New Data options on the Scoring
dialog.

Linear Regression Method

Introduction
Linear regression is performed on a dataset either to predict the response
variable based on the predictor variable, or to study the relationship between the
response variable and predictor variables. For example, using linear regression,
the crime rate of a state can be explained as a function of demographic factors
such as population, education, male to female ratio etc.
This procedure performs linear regression on a selected dataset that fits a linear
model of the form
Y= b0 + b1X1 + b2X2+ .... + bkXk+ e
where Y is the dependent variable (response), X1, X2,.. .,Xk are the independent
variables (predictors) and e is the random error. b0 , b1, b2, .... bk are known as
the regression coefficients, which are estimated from the data. The multiple
linear regression algorithm in Analytic Solver Data Mining chooses regression
coefficients to minimize the difference between the predicted and actual values.
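For the single-predictor case, the least-squares coefficients have a closed form; the sketch below (with illustrative data, not the software's solver) shows the criterion being minimized:

```python
def ols_simple(x, y):
    # Closed-form least-squares estimates b0, b1 for y = b0 + b1*x + e:
    # b1 = Sxy / Sxx, b0 = mean(y) - b1 * mean(x).
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

b0, b1 = ols_simple([1, 2, 3, 4], [3, 5, 7, 9])  # data lie exactly on y = 1 + 2x
print(b0, b1)  # 1.0 2.0
```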
See the Analytic Solver Data Mining User Guide for a step-by-step example on
how to use Linear Regression to predict housing prices using the example
dataset, Boston_Housing.xlsx.

Linear Regression Options


The following options appear on the four Linear Regression dialogs.

See below, for option explanations included on the Linear Regression Data
dialog.

Variables In Input Data


All variables in the dataset are listed here.

Selected Variables
Variables listed here will be utilized in the Analytic Solver Data Mining output.

Weight Variable
One major assumption of Linear Regression is that each observation provides
equal information. Analytic Solver Data Mining offers an opportunity to
provide a Weight variable. Using a Weight variable allows the user to allocate a
weight to each record. A record with a large weight will influence the model
more than a record with a smaller weight.

Output Variable
Select the variable whose outcome is to be predicted here.

See below, for option explanations included on the Linear Regression
Parameters dialog.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
regression method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.

Rescale Data
Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.

Note: Rescaling has no effect in Regression methods. The coefficient estimates will be scaled
proportionally with the data, yielding the same results with or without rescaling. This feature is
included on this dialog for consistency.

Fit Intercept
If this option is selected, a constant term will be included in the model.
Otherwise, a constant term will not be included in the equation. This option is
selected by default.

Feature Selection
When you have a large number of predictors and you would like to limit the
model to only significant variables, click Feature Selection to open the Feature
Selection dialog and select Perform Feature Selection at the top of the dialog.
Maximum Subset Size can take on values of 1 up to N where N is the number of
Selected Variables. If no Categorical Variables exist, the default for this option
is N. If one or more Categorical Variables exist, the default is "15".

Analytic Solver Data Mining offers five different procedures for selecting the
best subset of variables.
• Backward Elimination in which variables are eliminated one at a time,
starting with the least significant. If this procedure is selected, FOUT
is enabled. A statistic is calculated when variables are eliminated. For
a variable to leave the regression, the statistic’s value must be less than
the value of FOUT (default = 2.71).
• Forward Selection in which variables are added one at a time, starting
with the most significant. If this procedure is selected, FIN is enabled.
On each iteration of the Forward Selection procedure, each variable is
examined for eligibility to enter the model. The significance of
variables is measured as a partial F-statistic. Given a model at a current
iteration, we perform an F Test, testing the null hypothesis stating that
the regression coefficient would be zero if added to the existing set of
variables and an alternative hypothesis stating otherwise. Each variable
is examined to find the one with the largest partial F-Statistic. The
decision rule for adding this variable into a model is: Reject the null
hypothesis if the F-Statistic for this variable exceeds the critical value

chosen as a threshold for the F Test (FIN value), or Accept the null
hypothesis if the F-Statistic for this variable is less than a threshold. If
the null hypothesis is rejected, the variable is added to the model and
selection continues in the same fashion, otherwise the procedure is
terminated.
• Sequential Replacement in which variables are sequentially replaced
and replacements that improve performance are retained.
• Stepwise selection is similar to Forward selection except that at each
stage, Analytic Solver Data Mining considers dropping variables that
are not statistically significant. When this procedure is selected, the
Stepwise selection options FIN and FOUT are enabled. In the stepwise
selection procedure a statistic is calculated when variables are added or
eliminated. For a variable to come into the regression, the statistic’s
value must be greater than the value for FIN (default = 3.84). For a
variable to leave the regression, the statistic’s value must be less than
the value of FOUT (default = 2.71). The value for FIN must be greater
than the value for FOUT.
• Best Subsets where searches of all combinations of variables are
performed to observe which combination has the best fit. (This option
can become quite time consuming depending on the number of input
variables.) If this procedure is selected, Number of best subsets is
enabled.
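The forward-selection loop described above can be sketched generically. The `partial_f_stat` callback and the toy F-values are assumptions made for illustration; Analytic Solver computes the partial F-statistics internally:

```python
def forward_selection(candidates, partial_f_stat, f_in=3.84):
    # Add the candidate with the largest partial F-statistic at each pass,
    # stopping when no remaining candidate exceeds the F-in threshold.
    model, remaining = [], list(candidates)
    while remaining:
        best = max(remaining, key=lambda v: partial_f_stat(model, v))
        if partial_f_stat(model, best) <= f_in:
            break
        model.append(best)
        remaining.remove(best)
    return model

# Toy partial F-values (hypothetical): only CRIM and RM clear the threshold.
f_values = {"CRIM": 10.0, "RM": 5.0, "AGE": 1.0}
chosen = forward_selection(f_values, lambda model, v: f_values[v])
print(chosen)  # ['CRIM', 'RM']
```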

Regression Display
Under Regression: Display, select all desired display options to include each in
the output.
Under Statistics, the following display options are present.
• ANOVA
• Variance-Covariance Matrix
• Multicollinearity Diagnostics
Under Advanced, the following display options are present.
• Analysis of Coefficients
• Analysis of Residuals
• Influence Diagnostics
• Confidence/Prediction Intervals

See below, for option explanations included on the Linear Regression Scoring
dialog.

Score Training Data
Select these options to show an assessment of the performance of the tree in
classifying the training data. The report is displayed according to your
specifications - Detailed, Summary, and Lift charts.

Score Validation Data


These options are enabled when a Validation dataset is present. Select these
options to show an assessment of the performance of the tree in classifying the
validation data. The report is displayed according to your specifications -
Detailed, Summary, and Lift charts.

Score Test Data


These options are enabled when a test set is present. Select these options to
show an assessment of the performance of the tree in classifying the test data.
The report is displayed according to your specifications - Detailed, Summary,
and Lift charts.

Score New Data


See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for more details on the In worksheet or In database options under
Score New Data.
When this option is selected, the Studentized Residuals are displayed in the
output. Studentized residuals are computed by dividing the unstandardized
residuals by quantities related to the diagonal elements of the hat matrix, using a
common scale estimate computed without the i-th case in the model. These
residuals have t-distributions with (n − k − 1) degrees of freedom. As a result, any
residual with absolute value exceeding 3 usually requires attention. This option
is not selected by default.
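For the one-predictor case, those quantities can be sketched explicitly (illustrative code; the hat-matrix diagonal and leave-one-out scale follow the standard textbook formulas rather than the software's internals):

```python
import math

def studentized_residuals(x, y):
    # Fit y = b0 + b1*x by least squares, then scale each residual by a
    # variance estimate computed with that observation left out.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in resid)
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx            # hat-matrix diagonal
        s2_i = (sse - e ** 2 / (1 - h)) / (n - 3)   # leave-one-out variance
        out.append(e / math.sqrt(s2_i * (1 - h)))
    return out

t = studentized_residuals([1, 2, 3, 4, 5], [2, 1, 4, 3, 6])
```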

k-Nearest Neighbors Regression
Method

Introduction
In the k-nearest-neighbor regression method, the training data set is used to
predict the value of a variable of interest for each member of a "target" data set.
The structure of the data generally consists of a variable of interest ("amount
purchased," for example), and a number of additional predictor variables (age,
income, location, etc.).
1. For each row (case) in the target data set (the set to be predicted), locate the
k closest members (the k nearest neighbors) of the training data set. A
Euclidean Distance measure is used to calculate how close each member of
the training set is to the target row that is being examined.
2. Find the weighted sum of the variable of interest for the k nearest neighbors
(the weights are the inverse of the distances).
3. Repeat this procedure for the remaining rows (cases) in the target set.
4. Additionally, Analytic Solver Data Mining allows the user to select a
maximum value for k, builds models in parallel on all values of k (up to the
maximum specified value) and performs scoring on the best of these
models.
Computing time increases as k increases, but the advantage is that higher values
of k provide “smoothing” that reduces vulnerability to noise in the training data.
Typically, k is on the order of tens rather than hundreds or thousands.
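The steps above can be sketched directly (an illustrative re-creation with toy one-dimensional data, using Euclidean distance and inverse-distance weights as described):

```python
import math

def knn_predict(train_rows, train_y, target, k):
    # Find the k nearest training rows by Euclidean distance, then return
    # the inverse-distance-weighted average of their output values.
    dists = sorted((math.dist(row, target), y) for row, y in zip(train_rows, train_y))
    nearest = dists[:k]
    if nearest[0][0] == 0:            # exact match: zero distance
        return nearest[0][1]
    weights = [1 / d for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)

pred = knn_predict([[0], [1], [2], [10]], [0, 1, 2, 10], [1.5], k=2)
print(pred)  # 1.5  (the two neighbors at distance 0.5 average to 1.5)
```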

k-Nearest Neighbors Regression Method Example


The example below illustrates the use of Analytic Solver Data Mining’s k-
Nearest Neighbors Regression method using the Boston_Housing.xlsx dataset.
Click Help – Examples on the Data Mining ribbon, then click
Forecasting/Data Mining Examples to open Boston_Housing.xlsx. If using
AnalyticSolver.com, upload the dataset using the Upload icon on the Solve
Home tab. This dataset contains 14 variables, the description of each is given in
the table below. The dependent variable MEDV is the median value of a
dwelling. The objective of this example is to predict the value of this variable.

CRIM Per capita crime rate by town
ZN Proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS Proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0
otherwise)
NOX Nitric oxides concentration (parts per 10 million)
RM Average number of rooms per dwelling
AGE Proportion of owner-occupied units built prior to 1940
DIS Weighted distances to five Boston employment centers
RAD Index of accessibility to radial highways
TAX Full-value property-tax rate per $10,000
PTRATIO Pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of African-
Americans by town
LSTAT % Lower status of the population
MEDV Median value of owner-occupied homes in $1000's
A portion of the dataset is shown below. The last variable, CAT. MEDV, is a
discrete classification of the MEDV variable and will not be used in this
example.

First, we partition the data into training and validation sets using the Standard
Data Partition defaults with percentages of 60% of the data randomly allocated
to the Training Set and 40% of the data randomly allocated to the Validation
Set. For more information on partitioning a dataset, see the Data Mining
Partitioning chapter.
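The random allocation can be sketched as follows (illustrative; 506 is the row count of the Boston Housing dataset, and the seed value is an assumption matching the default used elsewhere in this guide):

```python
import random

def standard_partition(n_rows, train_pct=0.6, seed=12345):
    # Shuffle row indices reproducibly, then split at the training percentage.
    rng = random.Random(seed)
    idx = list(range(n_rows))
    rng.shuffle(idx)
    cut = round(n_rows * train_pct)
    return idx[:cut], idx[cut:]

train, valid = standard_partition(506)
print(len(train), len(valid))  # 304 202
```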

STDPartition is inserted into the Model tab of the Analytic Solver task pane
under Transformations -- Data Partition. Click Predict – k-Nearest Neighbors
to open the following dialog.
Select MEDV as the Output Variable, and the remaining variables (except CAT.
MEDV, CHAS, and Record ID) as Selected Variables.

Click Next to advance to the Parameters dialog.
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or regression method by selecting Partition Data on the
Parameters dialog. If this option is selected, Analytic Solver Data Mining will
partition your dataset (according to the partition options you set) immediately
before running the regression method. If partitioning has already occurred on
the dataset, this option will be disabled. For more information on partitioning,
please see the Data Mining Partitioning chapter.
Enter 5 for # Neighbors (K). (This number is based on standard practice from
the literature.) This is the parameter k in the k-Nearest Neighbor algorithm. If
the number of observations (rows) is less than 50 then the value of k should be
between 1 and the total number of observations (rows). If the number of rows is
greater than 50, then the value of k should be between 1 and 50. Note that if k is
chosen as the total number of observations in the training set, then for any new
observation, all the observations in the training set become nearest neighbors.
The default value for this option is 1.
Select Search 1..K under Nearest Neighbors Search. When this option is
selected, Analytic Solver Data Mining will display the output for the best k
between 1 and the value entered for # Neighbors. If Fixed K is selected, the
output will be displayed for the specified value of k.
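The two search modes can be sketched in a few lines. This is an illustrative implementation, not Analytic Solver's actual code; the function names, the use of Euclidean distance, and the unweighted mean of neighbor values are all assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    # Euclidean distance from every query record to every training record
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    # Indices of the k nearest training records for each query record
    idx = np.argsort(d, axis=1)[:, :k]
    # Predicted value = unweighted mean of the k neighbors' output values
    return y_train[idx].mean(axis=1)

def search_best_k(X_train, y_train, X_valid, y_valid, k_max):
    # "Search 1..K": score the validation set for each k in 1..k_max and
    # keep the k with the smallest RMS error ("Fixed K" would simply call
    # knn_predict once with the chosen k).
    best_k, best_rmse = 1, np.inf
    for k in range(1, k_max + 1):
        pred = knn_predict(X_train, y_train, X_valid, k)
        rmse = float(np.sqrt(np.mean((pred - y_valid) ** 2)))
        if rmse < best_rmse:
            best_k, best_rmse = k, rmse
    return best_k, best_rmse
```

Note that this sketch re-scores the validation partition for each candidate k, which mirrors the validation error log produced in the output.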



Click Next to advance to the Scoring dialog.
Summary Reports are selected by default under both Score Training Data and
Score Validation Data. Select Detailed Report and Lift Charts under both Score
Training Data and Score Validation Data to produce a detailed assessment of
the performance of the model in both sets. Since we did not create a test partition,
the options for Score test data are disabled. See the chapter “Data Mining
Partitioning” for information on how to create a test partition.
Please see the “Scoring New Data” chapter within the Analytic Solver Data
Mining User Guide for information on the Score new data options.

Click Finish. Results from the regression method are inserted to the right.
KNNP_Output contains the Output Navigator which allows easy access to all
portions of the output.

Scroll down KNNP_Output to the Validation error log (shown below). As per
our specifications, Analytic Solver Data Mining has calculated the RMS error
for all values of k and denoted the value of k with the smallest RMS Error. The
validation partition will be scored using this value of k.



Click the Training: Prediction Summary link on the Output Navigator to
open the Training: Prediction Summary and Training: Prediction Details tables,
shown below. The Training: Prediction Summary report summarizes the
prediction error. The first number, the total sum of squared errors, is the sum of
the squared deviations (residuals) between the predicted and actual values. The
second is the average of the squared residuals, the third is the square root of the
average of the squared residuals and the fourth is the average deviation. All
these values are calculated for the best k, i.e. k=5.
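The four summary numbers are straightforward to reproduce. The sketch below assumes "average deviation" means the signed mean of the residuals; the function name is illustrative.

```python
import numpy as np

def prediction_summary(actual, predicted):
    # The four numbers in the Prediction Summary report.
    residuals = np.asarray(actual, float) - np.asarray(predicted, float)
    sse = float(np.sum(residuals ** 2))   # total sum of squared errors
    mse = sse / residuals.size            # mean squared error
    rmse = mse ** 0.5                     # root mean square error
    avg_dev = float(np.mean(residuals))   # average deviation (signed mean)
    return sse, mse, rmse, avg_dev
```

Because positive and negative residuals offset each other in the signed mean, the average deviation is typically much smaller in magnitude than the RMS error.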
Training: Prediction Details displays the predicted value, the actual value and
the difference between them (the residuals), for each record. Note that the
algorithm perfectly predicted the selling price in the training partition.

Click the Validation: Prediction Summary link on the Output Navigator to


open the Validation: Prediction Summary and Validation: Prediction Details
tables, shown below.



Lift charts and RROC Curves (on the KNNP_TrainingLiftChart and
KNNP_ValidationLiftChart tabs, respectively) are visual aids for measuring
model performance. Lift Charts consist of a lift curve and a baseline. The greater
the area between the lift curve and the baseline, the better the model. RROC
(regression receiver operating characteristic) curves plot the performance of
regressors by graphing overestimations (predicted values that are too high)
against underestimations (predicted values that are too low). The closer the
curve is to the top left corner of the graph (in other words, the smaller the area
above the curve), the better the performance of the model.
After the model is built using the training data set, the model is used to score on
the training data set and the validation data set (if one exists). Then the data
set(s) are sorted in descending order using the predicted output variable value.
After sorting, the actual outcome values of the output variable are cumulated
and the lift curve is drawn as the number of cases versus the cumulated value.
The baseline (red line connecting the origin to the end point of the blue line) is
drawn as the number of cases versus the average of actual output variable values
multiplied by the number of cases.
The decilewise lift curve is drawn as the decile number versus the cumulative
actual output variable value divided by the decile's mean output variable value.
The bars in this chart indicate the factor by which the kNNP model outperforms
a random assignment, one decile at a time. Refer to the validation graph below.
In the first decile in both the training and validation datasets, taking the most
expensive predicted housing prices in the dataset, the predictive performance of
the model is about 1.8 times better than simply assigning a random predicted
value.
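A decile-wise lift value can be approximated as the ratio of each decile's mean actual value to the overall mean, after sorting records by predicted value. This is one common construction and may differ in detail from Analytic Solver's calculation; the function name is hypothetical.

```python
import numpy as np

def decile_lift(actual, predicted, n_bins=10):
    # Sort records by predicted value, descending, then split into bins
    # and compare each bin's mean actual value with the overall mean.
    actual = np.asarray(actual, float)
    order = np.argsort(-np.asarray(predicted, float))
    bins = np.array_split(actual[order], n_bins)
    overall_mean = actual.mean()
    return [float(b.mean() / overall_mean) for b in bins]
```

A bar above 1 in an early decile indicates that, for the records predicted to have the highest values, the model outperforms a random assignment by that factor.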
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select KNNP_TrainingLiftChart or KNNP_ValidationLiftChart for
Worksheet and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-Wise Lift Chart, ROC Curve and Lift Chart from Training Partition

Decile-Wise Lift Chart, ROC Curve and Lift Chart from Validation
Partition

In an RROC curve, we can compare the performance of a regressor with that of


a random guess (red line), for which underestimations equal overestimations, shifted to the minimum underestimate. Anything to the left of this



line signifies a better prediction and anything to the right signifies a worse
prediction. The best possible prediction performance would be denoted by a
point at the top left of the graph at the intersection of the x and y axis. Area
Over the Curve (AOC) is the space in the graph that appears above the ROC
curve and is calculated using the formula σ² · n²/2, where σ² is the variance
of the errors and n is the number of records. The smaller the AOC, the better the performance of the model. The
ROC Curve for the Training Partition is blank. This is because the KNN
algorithm perfectly predicted the selling price in the training partition.
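The RROC curve and its AOC can be sketched as follows, assuming σ² is the population variance of the residuals. This is an illustrative rendering of the general RROC construction (shift all predictions by a constant and track total over- versus under-estimation), not Analytic Solver's code; the function names are hypothetical.

```python
import numpy as np

def rroc_points(actual, predicted, shifts):
    # For each shift b, record total over-estimation (x-axis) and total
    # under-estimation (y-axis, a negative quantity) of predicted + b.
    e = np.asarray(predicted, float) - np.asarray(actual, float)
    pts = []
    for b in shifts:
        r = e + b
        over = float(r[r > 0].sum())    # predictions that are too high
        under = float(r[r < 0].sum())   # predictions that are too low
        pts.append((over, under))
    return pts

def area_over_curve(actual, predicted):
    # AOC = sigma^2 * n^2 / 2, with sigma^2 the variance of the errors.
    e = np.asarray(predicted, float) - np.asarray(actual, float)
    n = e.size
    return float(np.var(e) * n * n / 2)
```

A perfect predictor has zero error variance and hence AOC = 0, which is why the training-partition curve is blank when the algorithm predicts the training data exactly.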
In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original) in the original Lift Chart output, then select the desired chart.

Select Lift Chart (Alternative) to display Analytic Solver Data Mining's new Lift
Chart. Each of these charts consists of an Optimum Predictor curve, a Fitted
Predictor curve, and a Random Predictor curve. The Optimum Predictor curve
plots a hypothetical model that would provide perfect classification for our data.
The Fitted Predictor curve plots the fitted model and the Random Predictor
curve plots the results from using no model or by using a random guess (i.e. for
x% of selected observations, x% of the total number of positive observations are
expected to be correctly classified).
The Alternative Lift Chart plots Lift against % Cases. The Gain Chart plots the
Gain Ratio against % Cases.
Lift Chart (Alternative) and Gain Chart for Training Partition

Lift Chart (Alternative) and Gain Chart for Validation Partition



See the “Scoring New Data” chapter on Stored Model Sheets for more
information on KNNP_Stored.

k-Nearest Neighbors Regression Method Options


The following options appear on the three k-Nearest Neighbors dialogs.

Variables In Input Data


All variables in the dataset are listed here.

Selected Variables
Select variables to be included in the model here.



Output Variable
Select the continuous variable whose outcome is to be predicted here.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by selecting Partition Data on the
Parameters dialog. If this option is selected, Analytic Solver Data Mining will
partition your dataset (according to the partition options you set) immediately
before running the prediction method. If partitioning has already occurred on
the dataset, this option will be disabled. For more information on partitioning,
please see the Data Mining Partitioning chapter.

Rescale Data
Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.
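The four rescaling methods correspond to the usual feature-scaling formulas. The sketch below uses the textbook definitions, which may differ in detail from Analytic Solver's implementation; the function name and method labels are illustrative.

```python
import numpy as np

def rescale(col, method="standardization"):
    # Illustrative versions of the four rescaling choices.
    x = np.asarray(col, float)
    if method == "standardization":          # (x - mean) / std
        return (x - x.mean()) / x.std()
    if method == "normalization":            # map to [0, 1]
        return (x - x.min()) / (x.max() - x.min())
    if method == "adjusted_normalization":   # map to [-1, 1]
        return 2 * (x - x.min()) / (x.max() - x.min()) - 1
    if method == "unit_norm":                # divide by Euclidean norm
        return x / np.linalg.norm(x)
    raise ValueError(method)
```

Rescaling matters particularly for k-Nearest Neighbors, since distance calculations would otherwise be dominated by features with large numeric ranges.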

# Neighbors (k)
This is the parameter k in the k-nearest neighbor algorithm. If the number of
observations (rows) is less than 50 then the value of k should be between 1 and
the total number of observations (rows). If the number of rows is greater than
50, then the value of k should be between 1 and 50. The default value is 1.



Nearest Neighbors Search
If Search 1..K is selected, Analytic Solver Data Mining will display the output
for the best k between 1 and the value entered for # Neighbors (k).
If Fixed K is selected, the output will be displayed for the specified value of k.
This is the default setting.

Score Training Data


Select these options to show an assessment of the performance of the algorithm
in classifying the training data. The report is displayed according to your
specifications - Detailed, Summary and Lift Charts.

Score Validation Data


These options are enabled when a validation dataset exists. Select these options
to show an assessment of the performance of the algorithm in classifying the
validation data. The report is displayed according to your specifications -
Detailed, Summary, and Lift Charts.

Score Test Data


These options are enabled when a test set is present. Select these options to
show an assessment of the performance of the algorithm in classifying the test
data. The report is displayed according to your specifications - Detailed,
Summary, and Lift Charts.

Score New Data


See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for information on KNNP_Stored.



Regression Tree Method

Introduction
As with all regression techniques, Analytic Solver Data Mining assumes the
existence of a single output (response) variable and one or more input
(predictor) variables. The output variable is numerical. The general regression
tree building methodology allows input variables to be a mixture of continuous
and categorical variables. A decision tree is generated where each decision node
in the tree contains a test on some input variable's value. The terminal nodes of
the tree contain the predicted output variable values.
A regression tree may be considered a variant of the decision tree, designed to
approximate real-valued functions rather than to perform classification.

Methodology
A Regression tree is built through a process known as binary recursive
partitioning. This is an iterative process that splits the data into partitions or
“branches”, and then continues splitting each partition into smaller groups as the
method moves down each branch.
Initially, all records in the training set (the pre-classified records that are used to
determine the structure of the tree) are grouped into the same partition. The
algorithm then begins allocating the data into the first two partitions or
“branches”, using every possible binary split on every field. The algorithm
selects the split that minimizes the sum of the squared deviations from the mean
in the two separate partitions. This splitting “rule” is then applied to each of the
new branches. This process continues until each node reaches a user-specified
minimum node size and becomes a terminal node. (If the sum of squared
deviations from the mean in a node is zero, then that node is considered a
terminal node even if it has not reached the minimum size.)
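The split search on a single numeric field can be sketched as an exhaustive scan that minimizes the within-partition sum of squared deviations. This is a simplified illustration (one field, no minimum node size), not the product's implementation; the function name is hypothetical.

```python
import numpy as np

def best_split(x, y):
    # Try every possible binary split on one numeric field and pick the
    # threshold minimizing the total sum of squared deviations from the
    # mean within the two resulting partitions.
    order = np.argsort(x)
    xs, ys = np.asarray(x, float)[order], np.asarray(y, float)[order]
    best = (np.inf, None)
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold lies between equal values
        left, right = ys[:i], ys[i:]
        ssd = (((left - left.mean()) ** 2).sum()
               + ((right - right.mean()) ** 2).sum())
        if ssd < best[0]:
            # Threshold placed midway between adjacent distinct values
            best = (float(ssd), (xs[i - 1] + xs[i]) / 2)
    return best  # (total within-partition SSD, split threshold)
```

In the full algorithm this search runs over every field, and the winning (field, threshold) pair becomes the decision node; the same rule is then applied recursively to each new branch.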

Pruning the tree


Since the tree is grown from the training data set, a fully developed tree
typically suffers from over-fitting (i.e. it is "explaining" random elements of the
training data that are not likely to be features of the larger population). This
over-fitting results in poor performance on “real life” data. Therefore, the tree
must be “pruned” using the validation data set. Analytic Solver Data Mining
calculates the cost complexity factor at each step during the growth of the tree
and decides the number of decision nodes in the pruned tree. The cost
complexity factor is the multiplicative factor that is applied to the size of the tree
(which is measured by the number of terminal nodes).
The tree is pruned to minimize the sum of (1) the output variable variance in the
validation data, taken one terminal node at a time, and (2) the product of the cost
complexity factor and the number of terminal nodes. If the cost complexity
factor is specified as zero then pruning is simply finding the tree that performs
best on validation data in terms of total terminal node variance. Larger values of
the cost complexity factor result in smaller trees. Pruning is performed on a “last



in first out” basis meaning the last grown node is the first to be subject to
elimination.
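The quantity minimized during pruning can be written as a one-line score, assuming `node_variances` holds the validation-data output variance of each terminal node; the function name is hypothetical.

```python
def pruning_score(node_variances, cc_factor):
    # Sum of terminal-node variances on the validation data, plus the
    # cost complexity factor times the number of terminal nodes (tree size).
    return sum(node_variances) + cc_factor * len(node_variances)
```

With `cc_factor = 0` this reduces to total terminal-node variance, matching the statement above that pruning then simply finds the tree that performs best on the validation data; larger factors penalize size and so favor smaller trees.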

Single Tree Regression Tree Example


Analytic Solver Data Mining includes four different methods for creating
regression trees: boosting, bagging, random trees, and single tree. The first
three (boosting, bagging, and random trees) are ensemble methods that are used
to generate one powerful model by combining several “weaker” tree models.
For information on these methods, please see the Ensemble Methods chapter
that occurs later in this guide.
This example illustrates how to use the Regression Tree algorithm using a single
tree. We will use the Boston_Housing.xlsx dataset to illustrate this method.
See the Ensemble Methods chapter for examples using boosting, bagging and
random trees.
Click Help – Examples, then Forecasting/Data Mining Examples to open the
Boston_Housing.xlsx dataset. If using AnalyticSolver.com, upload the dataset
using the Upload icon on the Solver Home tab. This dataset includes fourteen
variables pertaining to housing prices from census tracts in the Boston area.
This dataset was collected by the US Census Bureau.
CRIM Per capita crime rate by town
ZN Proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS Proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX Nitric oxides concentration (parts per 10 million)
RM Average number of rooms per dwelling
AGE Proportion of owner-occupied units built prior to 1940
DIS Weighted distances to five Boston employment centers
RAD Index of accessibility to radial highways
TAX Full-value property-tax rate per $10,000
PTRATIO Pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of African-Americans by town
LSTAT % Lower status of the population
MEDV Median value of owner-occupied homes in $1000's
The figure below displays a portion of the data; observe the last column (CAT.
MEDV). This variable has been derived from the MEDV variable by assigning
a 1 for MEDV levels at or above 30 (>= 30) and a 0 for levels below 30 (< 30).
This variable will not be used in this example.



First, we partition the data into training and validation sets using the Standard
Data Partition defaults of 60% of the data randomly allocated to the Training Set
and 40% of the data randomly allocated to the Validation Set. For more
information on partitioning a dataset, see the Data Mining Partitioning chapter.
Click Predict – Regression Tree - Single Tree to open the Regression Tree –
Data dialog.
Select MEDV as the Output Variable, CHAS as a Categorical Variable, then
select the remaining variables (except CAT.MEDV) as the Selected Variables.
Recall that both CAT.MEDV and CHAS are nominal categorical variables.

Click Next to advance to the Regression Tree – Parameters dialog.


As discussed in previous sections, Analytic Solver Data Mining includes the
ability to partition a dataset from within a classification or prediction method by
clicking Partition Data on the Parameters dialog. Analytic Solver Data Mining
will partition your dataset (according to the partition options you set)
immediately before running the regression method. If partitioning has already
occurred on the dataset, this option will be disabled. For more information on
partitioning, please see the Data Mining Partitioning chapter.



In the Tree Growth section, select Levels, Nodes, Splits, and Records in
Terminal Nodes. Leave all selections at their default settings. Values entered
for these options limit tree growth, i.e. if 10 is entered for Levels, the tree will be
limited to 10 levels.
Select Prune (Using Validation Set). (This option is enabled when a Validation
Dataset exists.) Analytic Solver Data Mining will prune the tree using the
validation set when this option is selected. (Pruning the tree using the validation
set reduces the error from over-fitting the tree to the training data.)
Click Tree for Scoring and select Fully Grown.
Select Show Feature Importance. This table shows the relative importance of
each feature, measured as the reduction in the error criterion during tree growth.
Leave Maximum Number of Levels at the default setting of 7. This option
specifies the maximum number of levels in the tree to be displayed in the output.
Select Trees to Display to select the types of trees to display: Fully Grown, Best
Pruned, Minimum Error or User Specified.
• Select Fully Grown to “grow” a complete tree using the training data.
• Select Best Pruned to create a tree with the fewest number of nodes,
subject to the constraint that the error be kept below a specified level
(minimum error rate plus the standard error of that error rate).
• Select Minimum Error to produce a tree that yields the minimum
prediction error when tested on the validation data.
• To create a tree with a specified number of decision nodes select User
Specified and enter the desired number of nodes.
Select Fully Grown, Best Pruned, and Minimum Error.



Select Next to advance to the Scoring dialog.
Summary Report under Score Training Data and Score Validation Data has
been automatically selected. Select Detailed Report and Lift Chart under both
Score Training Data and Score Validation Data to produce a detailed
assessment of the performance of the tree in both sets. Since we did not create a
test partition, the options for Score test data are disabled. See the chapter “Data
Mining Partitioning” for information on how to create a test partition.
See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for details on scoring to a worksheet or database.

Click Finish. Output from the method will be inserted to the right of the
workbook. RT_Output contains the Output Navigator which allows easy access
to all portions of the output.

Click the Training: Prediction Summary link to navigate to the Training:


Prediction Summary and Prediction Details tables. The Prediction Summary
tables contain summary reports using the Fully Grown Tree on both the
validation data and the test data (if present). These reports contain the total sum
of squared errors, the mean squared error, the root mean square error (RMS
error, or the square root of the mean squared error), and also the average error
(which is much smaller, since negative and positive errors tend to cancel each
other out unless squared first). These small error values in both datasets
suggest that the Single Tree method has created a very accurate predictor.
However, these aggregate errors alone are not reliable measures of performance.
RROC curves (discussed below) are much more sophisticated and provide more
precise information about the accuracy of the predictor.



The Prediction Details table displays the predicted value for each record along
with the actual value and the residuals for each record.
To view the Validation: Prediction Summary and Validation: Prediction
Details data tables, click the Validation: Prediction Summary link on the
Output Navigator.

Click the RT_Output tab to navigate to the Training Log. The Training log
(shown below) shows the mean-square error (MSE) at each stage of the tree for
both the training and validation data sets. The MSE value is the average of the
squares of the errors between the predicted and observed values in the sample.
The training log shows that the training MSE continues reducing as the tree
continues to split.
Analytic Solver Data Mining chooses the number of decision nodes for the
pruned tree and the minimum error tree from the values of Validation MSE. In
the Prune log shown below, the smallest Validation MSE error belongs to the
tree with 10 decision nodes. This is the Minimum Error Tree – the tree with the
smallest prediction error in the validation dataset. Best Pruned Tree is the
tree with the largest error that is still less than the sum of the Standard Error and
the Validation MSE. In this example, the tree with 5 nodes is the Best Pruned
Tree.
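The selection rule described above can be sketched as follows, assuming the prune log is available as a mapping from decision-node count to validation MSE; names are illustrative.

```python
def choose_trees(valid_mse, std_error):
    # valid_mse: dict mapping number of decision nodes -> validation MSE.
    # Minimum-error tree: the size with the smallest validation MSE.
    min_nodes = min(valid_mse, key=valid_mse.get)
    # Best pruned tree: the smallest tree whose MSE is still within one
    # standard error of that minimum (i.e. the largest error below the
    # threshold, which corresponds to the fewest nodes).
    threshold = valid_mse[min_nodes] + std_error
    pruned_nodes = min(n for n, m in valid_mse.items() if m <= threshold)
    return min_nodes, pruned_nodes
```

Applied to the example above, a log in which the 10-node tree has the smallest validation MSE and the 5-node tree is the smallest size within the standard-error band reproduces the stated result.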

To view the Min-Error Tree, click the Min-Error Rules (Using Validation
Data) link in the Output Navigator.



LSTAT (% Lower Status of the Population) is chosen as the first splitting
variable; if this value is < 9.73 (96 cases), then RM (# of Rooms) is chosen for
splitting; if RM <7.01 (66 cases) then DIS (weighted distances to 5 employment
centers) is chosen for splitting. If DIS is >=1.48 (66 cases) then the RM
variable is (again) chosen as the next divider. If RM <= 6.54 (35 cases), the
predicted value is 22.19, otherwise, the predicted value is 28.79.
Click the Min-Error Tree Rules link to navigate to Tree rules for the Min-
Error tree.

The first entry in this table shows a split on the LSTAT variable with a split
value of 9.725. The 304 cases in the training partition and the 202 cases in the
validation partition were split between nodes 2 (LeftChild column) and 3
(Rightchild column).
Moving to NodeID 2 we find that 116 cases in the training partition and 96 cases
in the validation partition were assigned to this node (from node 1) which has a
predicted value of 29.68 (Response column). These cases were split on the RM
variable using a value of 7.0115. In the validation partition 66 cases were
assigned to node 4 and 30 cases were assigned to node 5.
Moving to NodeID 3 we find that 106 cases in the validation partition were
assigned to this node (from node 1) which has a predicted value = 17.21. From
here, these cases were split on the CRIM variable using a value of 5.85 between
nodes 6 (73 cases) and 7 (33 cases).
Moving to NodeID 4 we find that 116 cases in the validation partition were
assigned to this node (from node 2) which has a predicted value = 22.81. From
here, these cases were split on the DIS variable using a value of 1.42 between
nodes 8 (3 cases) and 9 (113 cases).



Moving to NodeID 5 we find that 30 cases were assigned to this node (from
node 2) which has a predicted value = 39.96. Node 5 is a Terminal node so no
further splits are made.
Moving to NodeID 6; 73 cases were assigned to this node (from node 3) which
has a predicted value = 19.26. These 73 cases were split on the DIS variable
with a splitting value = 1.9832 on nodes 10 and 11.
One can continue down through the tree until all terminal nodes are reached.
Click the RT_TrainingLiftChart and RT_ValidationLiftChart tabs to display
the lift charts and regression ROC curves. These are visual aids for measuring
model performance. Lift Charts consist of a lift curve and a baseline. The greater
the area between the lift curve and the baseline, the better the model. RROC
(regression receiver operating characteristic) curves plot the performance of
regressors by graphing overestimations (predicted values that are too high)
against underestimations (predicted values that are too low). The closer the
curve is to the top left corner of the graph (in other words, the smaller the area
above the curve), the better the performance of the model.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select RT_TrainingLiftChart or RT_ValidationLiftChart for Worksheet
and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, RROC Curve and Lift Chart for Training Partition

After the model is built using the training data set, the model is used to score on
the training data set and the validation data set (if one exists). Then the data
set(s) are sorted in descending order using the predicted output variable value.
After sorting, the actual outcome values of the output variable are cumulated
and the lift curve is drawn as the number of cases versus the cumulated value.
The baseline (red line connecting the origin to the end point of the blue line) is
drawn as the number of cases versus the average of actual output variable values
multiplied by the number of cases. The decilewise lift curve is drawn as the
decile number versus the cumulative actual output variable value divided by the
decile's mean output variable value. The bars in this chart indicate the factor
by which the regression tree model outperforms a random assignment, one decile at a time.
Refer to the validation graph below. In the first decile, taking the most
expensive predicted housing prices in the dataset, the predictive performance of
the model is almost 2 times better than simply assigning a random predicted value.
Decile-wise Lift Chart, RROC Curve and Lift Chart for Validation
Partition



In a regression ROC (RROC) curve, we can compare the performance of a
regressor with that of a random guess (red line), for which underestimations
equal overestimations, shifted to the minimum underestimate. Anything to the left of
this line signifies a better prediction and anything to the right signifies a worse
prediction. The best possible prediction performance would be denoted by a
point at the top left of the graph at the intersection of the x and y axis. This
point is sometimes referred to as the “perfect prediction”. Area Over the Curve
(AOC) is the space in the graph that appears above the ROC curve and is
calculated using the formula σ² · n²/2, where σ² is the variance of the errors
and n is the number of records. The smaller the AOC, the better the
performance of the model.
In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original), in the Original Lift Chart, then select the desired chart.

Select Lift Chart (Alternative) to display Analytic Solver Data Mining's new Lift
Chart. Each of these charts consists of an Optimum Predictor curve, a Fitted
Predictor curve, and a Random Predictor curve. The Optimum Predictor curve
plots a hypothetical model that would provide perfect classification for our data.
The Fitted Predictor curve plots the fitted model and the Random Predictor
curve plots the results from using no model or by using a random guess (i.e. for
x% of selected observations, x% of the total number of positive observations are
expected to be correctly classified).
The Alternative Lift Chart plots Lift against % Cases.
Lift Chart (Alternative) and Gain Chart for Training Partition

Lift Chart (Alternative) and Gain Chart for Validation Partition



Click the down arrow and select Gain Chart from the menu. In this chart, the
Gain Ratio is plotted against the % Cases.
For information on RT_Stored, please see the “Scoring New Data” chapter
within the Analytic Solver Data Mining User Guide.

Regression Tree Options


The options below appear on one of the three Regression Tree dialogs.

Variables in input data


All variables in the dataset are listed here.



Selected variables
Variables listed here will be utilized in the Analytic Solver Data Mining output.

Output Variable
Select the variable whose outcome is to be predicted here.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
regression method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.



Rescale Data

Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.

Tree Growth
In the Tree Growth section, select Levels, Nodes, Splits, and Records in
Terminal Nodes. Values entered for these options limit tree growth, i.e. if 10 is
entered for Levels, the tree will be limited to 10 levels.

Prune (Using Validation Set)


If a validation partition exists, this option is enabled. When this option is
selected, Analytic Solver Data Mining will prune the tree. Pruning the tree using
the validation set reduces the error from over-fitting the tree to the training data.

Show Feature Importance


Select Feature Importance to include the Features Importance table in the
output. This table displays the variables that are included in the model along
with their Importance value.

Maximum Number of Levels


This option specifies the maximum number of levels in the tree to be displayed
in the output. Select Trees to Display to select the types of trees to display:
Fully Grown, Best Pruned, Minimum Error or User Specified.
• Select Fully Grown to “grow” a complete tree using the training data.
• Select Best Pruned to create a tree with the fewest number of nodes,
subject to the constraint that the error be kept below a specified level
(minimum error rate plus the standard error of that error rate).
• Select Minimum Error to produce a tree that yields the minimum
prediction error when tested on the validation data.
• To create a tree with a specified number of decision nodes select User
Specified and enter the desired number of nodes.



Score Training Data
Select these options to show an assessment of the performance of the tree in
classifying the training data. The report is displayed according to your
specifications - Detailed, Summary, and Lift charts.

Score Validation Data


These options are enabled when a Validation dataset is present. Select to show
an assessment of the performance of the tree in classifying the validation data.
The report is displayed according to your specifications - Detailed, Summary,
and Lift charts.

Score Test Data


These options are enabled when a Test dataset is present. Select these options to
show an assessment of the performance of the tree in classifying the test data.
The report is displayed according to your specifications - Detailed, Summary,
and Lift charts.

Score New Data


The options in this group allow you to apply the model to score entirely new
data. See the "Scoring New Data" chapter within the Analytic
Solver Data Mining User Guide for details on these options.



Neural Network Regression
Method

Introduction
Artificial neural networks are relatively crude electronic networks of "neurons"
based on the neural structure of the brain. They process records one at a time,
and "learn" by comparing their prediction of the record (which, at the outset, is
largely arbitrary) with the known actual record. The error from the initial
prediction of the first record is fed back to the network and used to modify the
network’s algorithm for the second iteration. These steps are repeated multiple
times.
Roughly speaking, a neuron in an artificial neural network is
1. A set of input values (xi) with associated weights (wi)
2. An input function (g) that sums the weighted inputs and maps the result
to an output function (y).
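Such a neuron can be sketched in a few lines of Python. This is an illustrative sketch only, not Analytic Solver's implementation; the logistic (sigmoid) transfer function used here is one common choice for the output function:

```python
import math

def neuron(inputs, weights):
    # Input function g: sum of each input value times its associated weight
    g = sum(x * w for x, w in zip(inputs, weights))
    # Output function y: logistic (sigmoid) transfer, squashing g into (0, 1)
    return 1.0 / (1.0 + math.exp(-g))

print(neuron([1.0, 0.5, -0.2], [0.4, -0.1, 0.3]))
```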

Neurons are organized into layers: input, hidden and output. The input layer is
composed not of full neurons, but simply of the values in a record that are inputs
to the next layer of neurons. The next layer is the hidden layer of which there
could be several. The final layer is the output layer, where there is one node for
each class. A single sweep forward through the network results in the
assignment of a value to each output node. The record is assigned to the class
node with the highest value.

Training an Artificial Neural Network
In the training phase, the correct class for each record is known (this is termed
supervised training), and the output nodes can therefore be assigned "correct"
values -- "1" for the node corresponding to the correct class, and "0" for the
others. (In practice, better results have been found using values of “0.9” and
“0.1”, respectively.) As a result, it is possible to compare the network's
calculated values for the output nodes to these "correct" values, and calculate an
error term for each node. These error terms are then used to adjust the weights in
the hidden layers so that, hopefully, the next time around the output values will
be closer to the "correct" values.

The Iterative Learning Process


A key feature of neural networks is an iterative learning process in which
records (rows) are presented to the network one at a time, and the weights
associated with the input values are adjusted each time. After all cases are
presented, the process often starts over again. During this learning phase, the
network “trains” by adjusting the weights to predict the correct class label of
input samples. Advantages of neural networks include their high tolerance to
noisy data, as well as their ability to classify patterns on which they have not
been trained. The most popular neural network algorithm is the
back-propagation algorithm, proposed in the 1980s.
Once a network has been structured for a particular application, that network is
ready to be trained. To start this process, the initial weights (described in the
next section) are chosen randomly. Then the training, or learning, begins.
The network processes the records in the training data one at a time, using the
weights and functions in the hidden layers, then compares the resulting outputs
against the desired outputs. Errors are then propagated back through the system,
causing the system to adjust the weights for the next record. This process occurs
over and over as the weights are continually tweaked. During the training of a
network the same set of data is processed many times as the connection weights
are continually refined.

Note that some networks never learn. This could be because the input data do
not contain the specific information from which the desired output is derived.
Networks also will not converge if there is not enough data to enable complete
learning. Ideally, there should be enough data available to create a validation set.

Feedforward, Back-Propagation
The feedforward, back-propagation architecture was developed in the early
1970s by several independent sources (Werbos; Parker; Rumelhart, Hinton and
Williams). This independent co-development was the result of a proliferation of
articles and talks at various conferences which stimulated the entire industry.
Currently, this synergistically developed back-propagation architecture is the
most popular, effective, and easy-to-learn model for complex, multi-layered
networks. Its greatest strength is in non-linear solutions to ill-defined problems.
The typical back-propagation network has an input layer, an output layer, and at
least one hidden layer. Theoretically, there is no limit on the number of hidden
layers but typically there are just one or two. Some studies have shown that the
total number of layers needed to solve problems of any complexity is 5 (one
input layer, three hidden layers and an output layer). Each layer is fully
connected to the succeeding layer.
As noted above, the training process normally uses some variant of the Delta
Rule, which starts with the calculated difference between the actual outputs and
the desired outputs. Using this error, connection weights are increased in
proportion to the error times a scaling factor for global accuracy. This
means that the inputs, the output, and the desired output all must be present at
the same processing element. The most complex part of this algorithm is
determining which input contributed the most to an incorrect output and how to
modify the input to correct the error. (An inactive node would not contribute to
the error and would have no need to change its weights.) To solve this problem,
training inputs are applied to the input layer of the network, and desired outputs
are compared at the output layer. During the learning process, a forward sweep
is made through the network, and the output of each element is computed layer
by layer. The difference between the output of the final layer and the desired
output is back-propagated to the previous layer(s), usually modified by the
derivative of the transfer function. The connection weights are normally
adjusted using the Delta Rule. This process proceeds for the previous layer(s)
until the input layer is reached.
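The forward sweep and back-propagated correction described above can be sketched as follows. This is a minimal, hypothetical Python illustration with one hidden layer and sigmoid transfer functions, not Analytic Solver's implementation:

```python
import math, random

def sig(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict(x, w_ih, w_ho):
    """Forward sweep: compute each layer's output, layer by layer."""
    h = [sig(sum(w * xi for w, xi in zip(ws, x))) for ws in w_ih]
    return sig(sum(w * hi for w, hi in zip(w_ho, h)))

def train(records, targets, n_hidden=3, lr=0.5, epochs=1000, seed=1):
    """Minimal feedforward / back-propagation trainer (Delta Rule sketch)."""
    rng = random.Random(seed)
    n_in = len(records[0])
    # Initial weights chosen randomly: input->hidden and hidden->output
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_ho = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    for _ in range(epochs):
        for x, t in zip(records, targets):
            # Forward sweep
            h = [sig(sum(w * xi for w, xi in zip(ws, x))) for ws in w_ih]
            y = sig(sum(w * hi for w, hi in zip(w_ho, h)))
            # Output error, modified by the derivative of the sigmoid
            d_out = (t - y) * y * (1 - y)
            # Back-propagate the error to the hidden layer
            d_hid = [d_out * w_ho[j] * h[j] * (1 - h[j]) for j in range(n_hidden)]
            # Delta Rule: adjust each weight in proportion to error * input
            for j in range(n_hidden):
                w_ho[j] += lr * d_out * h[j]
                for i in range(n_in):
                    w_ih[j][i] += lr * d_hid[j] * x[i]
    return w_ih, w_ho
```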

Structuring the Network


The number of layers and the number of processing elements per layer are
important decisions. These parameters, for a feedforward, back-propagation
topology, are also the most ethereal - they are the "art" of the network designer.
There is no quantifiable, best answer to the layout of the network for any
particular application. There are only general rules picked up over time and
followed by most researchers and engineers applying this architecture to their
problems.
Rule One: As the complexity in the relationship between the input data and the
desired output increases, the number of the processing elements in the hidden
layer should also increase.
Rule Two: If the process being modeled is separable into multiple stages, then
additional hidden layer(s) may be required. If the process is not separable into
stages, then additional layers may simply enable memorization of the training
set, and not a true general solution.

Rule Three: The amount of training data available sets an upper bound for the
number of processing elements in the hidden layer(s). To calculate this upper
bound, use the number of cases in the training data set and divide that number
by the sum of the number of nodes in the input and output layers in the network.
Then divide that result again by a scaling factor between five and ten. Larger
scaling factors are used for relatively less noisy data. If too many artificial
neurons are used the training set will be memorized, not generalized, and the
network will be useless on new data sets.
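As a worked example of Rule Three, with hypothetical counts (say 506 training cases, 13 input nodes, one output node, and a scaling factor of 5; integer division is one reasonable reading of the rule):

```python
def hidden_neurons_upper_bound(n_cases, n_input, n_output, scaling_factor=5):
    """Rule Three: cases / (input nodes + output nodes), then divide again
    by a scaling factor between five and ten."""
    return n_cases // (n_input + n_output) // scaling_factor

print(hidden_neurons_upper_bound(506, 13, 1, 5))  # -> 7
```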

Automated Neural Network Regression Method


Example
This example focuses on creating a Neural Network using the Automatic
Architecture. See the Ensemble Methods chapter that appears later on in this
guide to see an example on creating a Neural Network using the boosting and
bagging ensemble methods. See the section below for an example of how to
create a single neural network.
We will use the Boston_Housing.xlsx example dataset. This dataset contains 14
variables, the description of each is given in the table below. The dependent
variable MEDV is the median value of a dwelling. The objective of this
example is to predict the value of this variable.

CRIM Per capita crime rate by town


ZN Proportion of residential land zoned for lots over
25,000 sq.ft.
INDUS Proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds
river; 0 otherwise)
NOX Nitric oxides concentration (parts per 10 million)
RM Average number of rooms per dwelling
AGE Proportion of owner-occupied units built prior to
1940
DIS Weighted distances to five Boston employment
centers
RAD Index of accessibility to radial highways
TAX Full-value property-tax rate per $10,000
PTRATIO Pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of
African-Americans by town
LSTAT % Lower status of the population
MEDV Median value of owner-occupied homes in $1000's

A portion of the dataset is shown below. The last variable, CAT.MEDV, is a


discrete classification of the MEDV variable and will not be used in this
example.

First, we partition the data into training and validation sets using the Standard
Data Partition defaults with percentages of 60% of the data randomly allocated
to the Training Set and 40% of the data randomly allocated to the Validation
Set. For more information on partitioning a dataset, see the Data Mining
Partitioning chapter.

Click Predict – Neural Network – Automatic on the Data Mining ribbon. The
Neural Network Regression Data dialog appears.

Select MEDV as the Output variable, CHAS as the Categorical Variable, and
the remaining variables as Selected Variables (except the CAT.MEDV and
Record ID variables). The last variable, CAT.MEDV, is a discrete classification
of the MEDV variable and will not be used in this example.

Click Next to advance to the Parameters dialog.


When a neural network with automatic architecture is created, several networks
are run with increasing complexity in the architecture. The networks are limited
to 2 hidden layers and the number of hidden neurons in each layer is bounded by
UB1 = (#features + 1) * 2/3 on the 1st layer and UB2 = (UB1 + 1) * 2/3 on the
2nd layer. For this example, select Automatic Architecture. See below for an
example on specifying the number of layers when manually defining the
network architecture.
First, all networks are trained with 1 hidden layer with the number of nodes not
exceeding the UB1 bound, then a second layer is added and a 2-layer
architecture is tried until the UB2 limit is satisfied.
The limit on the total number of trained networks is the minimum of 100 and
(UB1 * (1+UB2)). In this dataset, there are 13 features in the model giving the
following bounds:
UB1 = Floor((13 + 1) * 2/3) = Floor(9.33) = 9
UB2 = Floor((9 + 1) * 2/3) = Floor(6.67) = 6
(Floor: rounds a number down to the nearest integer.)
# Networks Trained = MIN{100, 9 * (1 + 6)} = 63
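These bounds can be checked directly (Python's math.floor matching the Floor function described):

```python
import math

n_features = 13
ub1 = math.floor((n_features + 1) * 2 / 3)  # floor(9.33) = 9
ub2 = math.floor((ub1 + 1) * 2 / 3)         # floor(6.67) = 6
n_networks = min(100, ub1 * (1 + ub2))      # min(100, 63) = 63
print(ub1, ub2, n_networks)  # -> 9 6 63
```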
Users can now change both the Training Parameters and Stopping Rules for the
Neural Network. Click Training Parameters to open the Training Parameters
dialog. For information on these parameters, please see the Options section that
occurs later in this chapter. For now, click Done to accept the default settings
and close the dialog.

Click Stopping Rules to open the Stopping Rules dialog. Here users can specify
a comprehensive set of rules for stopping the algorithm early plus cross-
validation on the training error. For more information on these options, please
see the Options section that appears later on in this chapter. For now, click
Done to accept the default settings and close the dialog.

Keep the defaults for both Hidden Layer and Output Layer. See the Neural
Network Regression Options section below for more information on these
options.

Click Finish.
Click the NNP_Output tab.

The top section includes the Output Navigator which can be used to quickly
navigate to various sections of the output. The Data, Variables, and
Parameters/Options sections of the output all reflect inputs chosen by the user.
Scroll down to the Error Report, a portion is shown below. This report displays
each network created by the Automatic Architecture algorithm and can be sorted
by each column by clicking the down arrow next to each column heading.

Click the down arrow next to the last column heading, Validation:MSE (Mean
Squared Error for the Validation dataset), and select Sort Smallest to Largest
from the menu. Note: Sorting is not supported in AnalyticSolver.com.

Immediately, the records in the table are sorted by smallest value to largest value
according to the Validation: MSE values.

Since this column is the average residual error, the error closest to 0 identifies
the record with the "best", or lowest, average residual error. Take a look at
Net ID 61, which has the lowest average error in the validation dataset. This
network contains 2 hidden layers, with 9 neurons in the first hidden layer
and 4 neurons in the 2nd hidden layer. The Sum of Squared Error for this
network is 25,941.36 in the Training set and 16,924.00 in the Validation set.
Note that the number of networks trained is 63, or MIN{100, 9 * (1 + 6)} = 63
(as discussed above). Click any of the 63 hyperlinks to open the Neural
Network Regression Data dialog. Click Finish to run the Neural Network
Regression method using the input and option settings for the ID selected.
Read on below for the last choice in Neural Network Regression methods:
creating a Neural Network manually.

Manual Neural Network Regression Method Example


This example will use the same partitioned dataset to illustrate the use of the
Manual Network Architecture selection. Click back to the STDPartition sheet
and then click Predict – Neural Network – Manual Network on the Data
Mining ribbon.
Select MEDV as the Output variable, CHAS as a Categorical Variable and the
remaining variables as Selected Variables (except the CAT.MEDV and Record
ID variables). The last variable, CAT.MEDV, is a discrete classification of the
MEDV variable and will not be used in this example.

Click Next to advance to the next dialog.
As discussed in the previous sections, Analytic Solver Data Mining includes the
ability to partition a dataset from within a classification or prediction method by
clicking Partition Data on the Parameters dialog. Analytic Solver Data Mining will
partition your dataset (according to the partition options you set) immediately
before running the regression method. If partitioning has already occurred on
the dataset, this option will be disabled. For more information on partitioning,
please see the Data Mining Partitioning chapter.
Click Add Layer to add a hidden layer to the Neural Network. Enter 25 for
Neurons for this layer. To remove a layer, select the layer to be removed, then
click Remove Layer.
To change the Training Parameters and Stopping Rules for the Neural Network,
click Training Parameters and Stopping Rules, respectively. For this example,
we will use the defaults. See the Neural Network Regression Options below for
more information on these parameters.
Leave Sigmoid selected for Hidden Layer and Output Layer. See the Neural
Network Regression Options section below for more information on these
options.
Select Show Neural Network Weights to display this information in the output.

Click Next to advance to the Scoring dialog.
Summary Report is selected by default under Score Training Data and Score
Validation Data. Select Detailed Report and Lift Charts under Score Training
Data and Score Validation Data to show an assessment of the performance of
the algorithm in predicting the output variable. The output is displayed
according to the user’s specifications.
If a test dataset exists, the options under Score Test Data will be enabled and
Summary Report will be selected by default. Select Detailed report and Lift
charts under Score test data to show an assessment of the performance of the
algorithm on the test dataset in predicting the output variable.
Since we did not create a test dataset when we partitioned the data, these options
are not enabled. See the “Data Mining Partitioning” chapter for more
information on creating a test dataset.
See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for information on options under Score new data.

Click Finish to initiate the output. Output from the Manual Neural Network
Regression algorithm is inserted at the end of the workbook. The Output
Navigator appears on NNP_Output. Click any link to easily view the results.

Click the Training: Prediction Summary link on the Output Navigator to open
the Training Summary. This data table displays various statistics to measure the
performance of the trained network: Sum of Squared Error (SSE), Mean
Squared Error (MSE), Root Mean Squared Error (RMSE), the Median Absolute
Deviation (MAD) and the Coefficient of Determination (R²). However, RROC
charts, shown below, are better indicators of fit. Read on to view how these
more sophisticated tools can tell us about the fit of the neural network to our
data.

Scroll down to view the Prediction Details data table. This table displays the
Actual versus Predicted values, along with the Residuals, for the training
dataset.
Click the Validation Prediction Summary link on the Output Navigator to
open the Validation: Prediction Summary and Validation: Prediction Details
data tables.

Analytic Solver Data Mining also provides intermediate information produced
during the last pass through the network. Scroll down NNP_Output to the
Interlayer connections' weights table.

Recall that a key element in a neural network is the weights for the connections
between nodes. In this example, we chose to have one hidden layer with 25
neurons. The Inter-Layer Connections Weights table contains the final values
for the weights between the input layer and the hidden layer, between hidden
layers, and between the last hidden layer and the output layer. This information
is useful for viewing the “insides” of the neural network; however, it is unlikely
to be of utility to the data analyst end-user. Displayed above are the final
connection weights between the input layer and the hidden layer for our example
and also the final weights between the hidden layers and the output layer.
Click the Training Log link on the Output Navigator to display the following
log.

During an epoch, each training record is fed forward in the network and
classified. The error is calculated and is back propagated for the weights
correction. Weights are continuously adjusted during the epoch. The sum of
squares error is computed as the records pass through the network, but does not
reflect the error after the final weight adjustment. Scoring of the
training data is performed using the final weights, so the training prediction
error may not exactly match the last epoch error in the Epoch log.
Click the NNP_TrainingDataLiftChart and NNP_ValidationDataLiftChart
tabs to view the lift charts and Regression ROC charts for both the training and
validation datasets.
Lift charts and Regression ROC Curves are visual aids for measuring model
performance. Lift Charts consist of a lift curve and a baseline. The greater the
area between the lift curve and the baseline, the better the model. RROC
(regression receiver operating characteristic) curves plot the performance of
regressors by graphing over-estimations (or predicted values that are too high)
versus underestimations (or predicted values that are too low.) The closer the
curve is to the top left corner of the graph (in other words, the smaller the area
above the curve), the better the performance of the model.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select NNP_TrainingLiftChart or NNP_ValidationLiftChart for
Worksheet and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, RROC Curve and Lift Chart for Training Partition

Decile-wise Lift Chart, RROC Curve and Lift Chart for Valid. Partition

After the model is built using the training data set, the model is used to score on
the training data set and the validation data set (if one exists). Then the data
set(s) are sorted using the predicted output variable value. After sorting, the
actual outcome values of the output variable are cumulated and the lift curve is
drawn as the number of cases versus the cumulated value. The baseline (red line
connecting the origin to the end point of the blue line) is drawn as the number of
cases versus the average of actual output variable values multiplied by the
number of cases.
The decile-wise lift curve is drawn as the decile number versus the mean actual
output variable value within the decile divided by the overall mean output
variable value. The bars in this chart indicate the factor by which the NNP
model outperforms a
random assignment, one decile at a time. Typically, this graph will have a
"stairstep" appearance - the bars will descend in order from left to right. This
means that the model is "binning" the records correctly, from highest priced to
lowest. However, in this example, the leftmost bars are shorter than bars
appearing to the right. This type of graph indicates that the model might not be
a good fit to the data. Additional analysis is required.
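A decile-wise lift computation typically looks like the following sketch, which uses the common definition (decile mean of actual values over the overall mean); the details may differ from Analytic Solver's exact calculation:

```python
def decile_lift(actual, predicted):
    # Sort actual values by predicted value, largest predictions first
    ranked = [a for _, a in sorted(zip(predicted, actual), reverse=True)]
    overall_mean = sum(ranked) / len(ranked)
    size = len(ranked) // 10  # records per decile (assumes len divisible by 10)
    return [sum(ranked[i * size:(i + 1) * size]) / size / overall_mean
            for i in range(10)]

# A well-fitted model yields a "stairstep": bars descending from left to right
lifts = decile_lift(actual=list(range(20, 0, -1)), predicted=list(range(20, 0, -1)))
```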
The Regression ROC curve (RROC) was updated in V2017. This new chart
compares the performance of the regressor (Fitted Predictor) with an Optimum
Predictor Curve. The Optimum Predictor Curve plots a hypothetical model that
would provide perfect prediction results. The best possible prediction
performance is denoted by a point at the top left of the graph at the intersection
of the x and y axis. This point is sometimes referred to as the “perfect
classification”. Area Over the Curve (AOC) is the space in the graph that
appears above the RROC curve and is calculated as σ² * n²/2, where σ² is the
variance of the prediction errors and n is the number of records. The smaller
the AOC, the better the performance of the model.
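A direct computation of the AOC formula might look like this (a sketch; treating σ² as the population variance of the prediction errors is an assumption, since the guide does not spell out σ):

```python
def rroc_aoc(actual, predicted):
    """Area Over the RROC Curve: sigma^2 * n^2 / 2, with sigma^2 taken here
    as the population variance of the prediction errors (an assumption).
    Smaller AOC indicates a better-performing model."""
    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]
    mean_e = sum(errors) / n
    var_e = sum((e - mean_e) ** 2 for e in errors) / n
    return var_e * n * n / 2

print(rroc_aoc([10.0, 20.0], [11.0, 19.0]))  # errors +1, -1 -> variance 1 -> AOC 2.0
```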
In V2017, two new charts were introduced: a new Lift Chart and the Gain
Chart. To display these new charts, click the down arrow next to Lift Chart
(Original), in the Original Lift Chart, then select the desired chart.

Select Lift Chart (Alternative) to display Analytic Solver Data Mining's new Lift
Chart. Each of these charts consists of an Optimum Predictor curve, a Fitted
Predictor curve, and a Random Predictor curve. The Optimum Predictor curve
plots a hypothetical model that would provide perfect classification for our data.
The Fitted Predictor curve plots the fitted model and the Random Predictor
curve plots the results from using no model or by using a random guess (i.e. for
x% of selected observations, x% of the total number of positive observations are
expected to be correctly classified).
Lift Chart (Alternative) and Gain Chart for Training Partition

Lift Chart (Alternative) and Gain Chart for Validation Partition

Click the down arrow and select Gain Chart from the menu. In this chart, the
Gain Ratio is plotted against the % Cases.
See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for information on the Stored Model Sheet, NNP_Stored.

Neural Network Regression Method Options


The options below appear on one of the Neural Network Regression method
dialogs.

Variables In Input Data
All variables in the dataset are listed here.

Selected Variables
Variables listed here will be utilized in the Analytic Solver Data Mining output.

Categorical Variables
Place categorical variables from the Variables listbox to be included in the
model by clicking the > command button. The Neural Network Regression
algorithm will accept non-numeric categorical variables.

Output Variable
Select the variable whose outcome is to be predicted here.

See below for options appearing on the Neural Network Regression –
Parameters dialog. Note: The Neural Network Automatic Regression –
Parameters dialog does not include Architecture, but is otherwise the same.

Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Analytic Solver Data Mining will partition your dataset
(according to the partition options you set) immediately before running the
regression method. If partitioning has already occurred on the dataset, this
option will be disabled. For more information on partitioning, please see the
Data Mining Partitioning chapter.

Rescale Data
Click Rescale Data to open the Rescaling dialog.

Use Rescaling to normalize one or more features in your data during the data
preprocessing stage. Analytic Solver Data Mining provides the following
methods for feature scaling: Standardization, Normalization, Adjusted
Normalization and Unit Norm. For more information on this new feature, see
the Rescale Continuous Data section within the Transform Continuous Data
chapter that occurs earlier in this guide.
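The four methods correspond to the following common formulas (a sketch of the usual definitions; see the Transform Continuous Data chapter for the exact formulas Analytic Solver uses):

```python
import math

def standardize(xs):
    """Standardization: subtract the mean, divide by the standard deviation."""
    m = sum(xs) / len(xs)
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))
    return [(x - m) / sd for x in xs]

def normalize(xs):
    """Normalization: rescale values to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def adjusted_normalize(xs):
    """Adjusted Normalization: rescale values to the range [-1, 1]."""
    lo, hi = min(xs), max(xs)
    return [2 * (x - lo) / (hi - lo) - 1 for x in xs]

def unit_norm(xs):
    """Unit Norm: divide each value by the vector's Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in xs))
    return [x / norm for x in xs]
```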

Hidden Layers/Neurons
Click Add Layer to add a hidden layer. To delete a layer, click Remove Layer.
Once the layer is added, enter the desired Neurons.

Hidden Layer
Nodes in the hidden layer receive input from the input layer. The output of the
hidden nodes is a weighted sum of the input values. This weighted sum is
computed with weights that are initially set at random values. As the network
“learns”, these weights are adjusted. This weighted sum is used to compute the
hidden node’s output using a transfer function.
Select Sigmoid (the default setting) to use a logistic function for the transfer
function with a range of 0 and 1. This function has a “squashing effect” on very
small or very large values but is almost linear in the range where the value of the
function is between 0.1 and 0.9.
Select Hyperbolic Tangent to use the tanh function for the transfer function, the
range being -1 to 1. If more than one hidden layer exists, this function is used
for all layers.
ReLU (Rectified Linear Unit) is a widely used choice for hidden layers. This
activation function applies max(0,x) function to the neuron values. When used
instead of logistic sigmoid or hyperbolic tangent activations, some adjustments
to the Neural Network settings are typically required to achieve a good
performance, such as: significantly decreasing the learning rate, increasing the
number of learning epochs and network parameters.
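The three hidden-layer activation choices can be written out directly:

```python
import math

def sigmoid(x):
    """Logistic transfer function; output in (0, 1), nearly linear mid-range."""
    return 1.0 / (1.0 + math.exp(-x))

def hyperbolic_tangent(x):
    """tanh transfer function; output in (-1, 1)."""
    return math.tanh(x)

def relu(x):
    """Rectified Linear Unit: max(0, x)."""
    return max(0.0, x)
```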

Output Layer
As in the hidden layer output calculation (explained in the above paragraph), the
output layer is also computed using the same transfer function as described for
Activation: Hidden Layer.
Select Sigmoid (the default setting) to use a logistic function for the transfer
function with a range of 0 and 1.

Select Hyperbolic Tangent to use the tanh function for the transfer function, the
range being -1 to 1.
Linear activation takes the form of const*x to create an output signal
proportional to the neuron input. It is applicable and most commonly used for
the output layer of regression problems to handle the continuous response that is
unbounded in nature.

Training Parameters
Click Training Parameters to open the Training Parameters dialog to specify
parameters related to the training of the Neural Network algorithm.

Learning Order [Original or Random]


This option specifies the order in which the records in the training
dataset are being processed. It is recommended to shuffle the training
data to avoid the possibility of processing correlated records in order.
It also helps the neural network algorithm to converge faster. If
Random is selected, Random Seed is enabled.

Learning Order [Random Seed]


This option specifies the seed for shuffling the training records. Note
that different random shuffling may lead to different results, but as long
as the training data is shuffled, different ordering typically does not
result in drastic changes in performance.

Random Seed for Weights Initialization


If an integer value appears for Random Seed for Weights Initialization,
Analytic Solver Data Mining will use this value to set the seed for the
initial assignment of the neuron values. Setting the random number
seed to a nonzero value (any number of your choice is OK) ensures that
the same sequence of random numbers is used each time the neuron
values are calculated. The default value is “12345”. If left blank, the
random number generator is initialized from the system clock, so the
sequence of random numbers will be different in each calculation. If
you need the results from successive runs of the algorithm to be strictly
comparable, you should set the seed. To do this, type the desired number
into the box.

Learning Rate
This is the multiplying factor for the error correction during
backpropagation; it is roughly equivalent to the learning rate for the
neural network. A low value produces slow but steady learning, a high
value produces rapid but erratic learning. Values for the step size
typically range from 0.1 to 0.9.

Weight Decay
To prevent over-fitting of the network on the training data, set a weight
decay to penalize the weight in each iteration. Each calculated weight
will be multiplied by (1-decay).

Weight Change Momentum


In each new round of error correction, some memory of the prior
correction is retained so that an outlier that crops up does not spoil
accumulated learning.
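Taken together, Learning Rate, Weight Decay, and Weight Change Momentum shape each weight update. A hypothetical sketch of a single update (the parameter names and exact combination are illustrative, not Analytic Solver's internal formula):

```python
def update_weight(w, gradient, prev_delta, lr=0.5, decay=0.01, momentum=0.9):
    """One weight update combining the three training parameters:
    - lr scales the error correction (Learning Rate)
    - momentum retains part of the prior correction (Weight Change Momentum)
    - each calculated weight is multiplied by (1 - decay) (Weight Decay)"""
    delta = lr * gradient + momentum * prev_delta
    w = (w + delta) * (1 - decay)
    return w, delta
```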

Error Tolerance
The error in a particular iteration is backpropagated only if it is greater
than the error tolerance. Typically error tolerance is a small value in the
range from 0 to 1.

Response Rescaling Correction


This option specifies a small number, which is applied to the
Normalization rescaling formula, if the output layer activation is
Sigmoid (or Softmax in Classification), and Adjusted Normalization, if
the output layer activation is Hyperbolic Tangent. The rescaling
correction ensures that all response values stay within the range of
activation function.

Stopping Rules
Click Stopping Rules to open the Stopping Rules dialog. Here users can specify
a comprehensive set of rules for stopping the algorithm early plus cross-
validation on the training error.

Partition for Error Computation
Specifies which data partition is used to estimate the error after each
training epoch.

Number of Epochs
An epoch is one sweep through all records in the training set. Use this
option to set the number of epochs to be performed by the algorithm.

Maximum Number of Epochs Without Improvement
The algorithm will stop after this number of epochs has been
completed, and no improvement has ben realized.

Maximum Training Time


The algorithm will stop once this time (in seconds) has been exceeded.

Keep Minimum Relative Change in Error


If the relative change in error is less than this value, the algorithm will
stop.

Keep Minimum Relative Change in Error Compared to Null Model
If the relative change in error compared to the Null Model is less than
this value, the algorithm will stop. The Null Model is the baseline model
used for comparing the performance of the neural network model.
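The stopping rules above can be pictured as checks inside the training loop: training ends as soon as any rule fires. The Python sketch below is illustrative; the function and parameter names are hypothetical and do not correspond to Analytic Solver's internals.

```python
# Illustrative early-stopping loop: epoch limit, no-improvement limit,
# and minimum relative change in error. Not Analytic Solver's code.

def train(errors, max_epochs=30, max_no_improve=5, min_rel_change=1e-3):
    """errors: per-epoch error sequence; returns (epochs_run, reason)."""
    best = float("inf")
    since_improve = 0
    prev = None
    for epoch, err in enumerate(errors[:max_epochs], start=1):
        if err < best:
            best, since_improve = err, 0
        else:
            since_improve += 1
        # Stop when this many epochs pass with no improvement.
        if since_improve >= max_no_improve:
            return epoch, "no improvement"
        # Stop when the relative change in error becomes too small.
        if prev is not None and abs(prev - err) / prev < min_rel_change:
            return epoch, "small relative change"
        prev = err
    return min(len(errors), max_epochs), "epoch limit"
```

The maximum-training-time rule would be one more check of the same shape, comparing elapsed seconds against a limit.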



See below for option descriptions on the Neural Network Regression - Scoring
dialog.

Score Training Data


Select these options to show an assessment of the performance of the algorithm
in predicting the output variable using the training data. The report is displayed
according to your specifications - Detailed, Summary, and Lift Charts.

Score Validation Data


These options are enabled when a validation dataset exists. Select these options
to show an assessment of the performance of the algorithm in predicting the
value of the output variable using the validation data. The report is displayed
according to your specifications - Detailed, Summary and Lift Charts.

Score Test Data


These options are enabled when a test dataset exists. Select these options to
show an assessment of the performance of the algorithm in predicting the value
of the output variable using the test data. The report is displayed according to
your specifications - Detailed, Summary, and Lift Charts.

Score New Data


The options in this group apply when the model is used to score an altogether
new dataset. See the “Scoring New Data” chapter within the Analytic Solver
Data Mining User Guide for details on these options.



Ensemble Methods
Analytic Solver Data Mining offers three powerful ensemble methods for use
with Regression: bagging (bootstrap aggregating), boosting, and random trees.
Analytic Solver Data Mining Regression Algorithms on their own can be used to
find one model that results in good predictions for the new data. We can view
the statistics and confusion matrices of the current predictor to see if our model
is a good fit to the data, but how would we know if there is a better predictor just
waiting to be found? The answer is that we do not know if a better predictor
exists. However, ensemble methods allow us to combine multiple “weak”
regression models which, when taken together, form a new, more accurate
“strong” regression model. These methods work by creating multiple diverse
regression models, by taking different samples of the original dataset, and then
combining their outputs. (Outputs may be combined by several techniques, for
example, majority vote for classification and averaging for regression.) This
combination of models effectively reduces the variance in the “strong” model.
The three ensemble methods offered in Analytic Solver Data Mining (bagging,
boosting, and random trees) differ on three items: 1. the selection of training
data for each predictor or “weak” model, 2. how the “weak” models are
generated, and 3. how the outputs are combined. In all three methods, each
“weak” model is trained to become proficient in some portion of the dataset.

Bagging, or bootstrap aggregating, was one of the first ensemble algorithms ever
to be written. It is a simple algorithm, yet very effective. Bagging generates
several training data sets by using random sampling with replacement (bootstrap
sampling), applies the regression model to each dataset, then takes the average
amongst the models to calculate the predictions for the new data. The biggest
advantage of bagging is the relative ease with which the algorithm can be
parallelized, which makes it a better selection for very large datasets.
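The bagging procedure just described can be sketched in a few lines. In this illustrative Python sketch the "weak" model is a trivial mean predictor standing in for any regression learner; the function name, seed, and defaults are assumptions, not Analytic Solver's implementation.

```python
# Minimal bagging sketch: bootstrap-sample the training records, fit a
# stand-in "weak" model per sample, and average the predictions.
import random

def bagging_predict(y_train, n_learners=50, seed=12345):
    """Average the predictions of n_learners bootstrap-trained models."""
    rng = random.Random(seed)
    n = len(y_train)
    preds = []
    for _ in range(n_learners):
        # Bootstrap sample: draw n records with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        sample_y = [y_train[i] for i in idx]
        # Stand-in "weak" model: predict the sample mean everywhere.
        preds.append(sum(sample_y) / n)
    # Regression outputs are combined by averaging across the ensemble.
    return sum(preds) / n_learners
```

Because each bootstrap sample and its model are independent, the loop body could run in parallel, which is the parallelizability advantage noted above.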

Boosting, in comparison, builds a “strong” model by successively training


models to concentrate on records receiving inaccurate predicted values in
previous models. Once completed, all predictors are combined by a weighted
majority vote. Analytic Solver Data Mining offers three different variations of
boosting as implemented by the AdaBoost algorithm (one of the most popular
ensemble algorithms in use today): M1 (Freund), M1 (Breiman), and SAMME
(Stagewise Additive Modeling using a Multi-class Exponential).

AdaBoost.M1 first assigns a weight (wb(i)) to each record or observation. This
weight is originally set to 1/n and will be updated on each iteration of the
algorithm. An original regression model is created using this first training set
(Tb) and an error is calculated as:

eb = Σi=1..n wb(i) I(Cb(xi) ≠ yi)

where the I() function returns 1 if true and 0 if not.

The error of the regression model in the bth iteration is used to calculate the
constant αb. This constant is used to update the weight (wb(i)). In AdaBoost.M1
(Freund), the constant is calculated as:



αb= ln((1-eb)/eb)
In AdaBoost.M1 (Breiman), the constant is calculated as:

αb= 1/2ln((1-eb)/eb)

In SAMME, the constant is calculated as:

αb = 1/2 ln((1-eb)/eb) + ln(k-1), where k is the number of classes

(When the number of categories is equal to 2, SAMME behaves the same as
AdaBoost.M1 (Breiman).)

In any of the three implementations (Freund, Breiman, or SAMME), the new
weight for the (b + 1)th iteration will be

wb+1(i) = wb(i) exp(αb I(Cb(xi) ≠ yi))

Afterwards, the weights are all readjusted to sum to 1. As a result, the weights
assigned to the observations that were assigned inaccurate predicted values are
increased and the weights assigned to the observations that were assigned
accurate predicted values are decreased. This adjustment forces the next
regression model to put more emphasis on the records that were assigned
inaccurate predictions. (This α constant is also used in the final calculation,
which gives the regression model with the lowest error more influence.)
This process repeats until b = Number of weak learners (controlled by the User).
The algorithm then computes the weighted average among all weak learners and
assigns that value to the record. Boosting generally yields better models than
bagging, however, it does have a disadvantage as it is not parallelizable. As a
result, if the number of weak learners is large, boosting would not be suitable.
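One boosting round, as described above, can be written out directly. The Python sketch below implements only the AdaBoost.M1 (Freund) error, constant, and weight-update formulas from this section; the function name and list-based inputs are assumptions for illustration, not Analytic Solver's code.

```python
# One AdaBoost.M1 (Freund) round: weighted error e_b, constant alpha_b,
# and renormalized weights. predicted/actual are the weak model's outputs.
import math

def boosting_round(weights, predicted, actual):
    # I() returns 1 if the prediction is inaccurate, 0 otherwise.
    miss = [1.0 if p != a else 0.0 for p, a in zip(predicted, actual)]
    e_b = sum(w * m for w, m in zip(weights, miss))   # weighted error
    alpha_b = math.log((1.0 - e_b) / e_b)             # Freund variant
    # w_{b+1}(i) = w_b(i) * exp(alpha_b * I(...)): raise the weight of
    # records that received inaccurate predicted values.
    new_w = [w * math.exp(alpha_b * m) for w, m in zip(weights, miss)]
    total = sum(new_w)
    return [w / total for w in new_w], alpha_b        # readjust to sum to 1
```

Starting from uniform weights of 1/n, each round shifts weight toward the records the previous model got wrong, forcing the next weak learner to emphasize them.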

Random trees, also known as random forests, is a variation of bagging. This
method works by training multiple “weak” regression trees using a fixed number
of randomly selected features (sqrt[number of features] for classification and
number of features/3 for prediction), then takes the average value across the
weak learners and assigns that value to the “strong” predictor. (This ensemble
method only accepts Regression Trees as a weak learner.) Typically, in this
method the number of “weak” trees generated can range from several hundred
to several thousand depending on the size and difficulty of the training set.
Random Trees are parallelizable since they are a variant of bagging. However,
since Random Trees selects a limited number of features in each iteration, it
can also be faster than bagging.
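The random feature selection that distinguishes random trees from plain bagging can be sketched as below. The Python code is illustrative; the function names are hypothetical, and the subset sizes follow the sqrt(p) and p/3 defaults mentioned above.

```python
# Illustrative random-feature-subset step used by random trees.
import math
import random

def features_per_split(n_features, task="prediction"):
    """Default subset size: sqrt(p) for classification, p/3 for prediction."""
    if task == "classification":
        return max(1, int(math.sqrt(n_features)))
    return max(1, n_features // 3)

def draw_feature_subset(features, task="prediction", seed=12345):
    """Pick the fixed number of randomly selected features for one tree."""
    rng = random.Random(seed)
    k = features_per_split(len(features), task)
    return rng.sample(features, k)
```

Each weak tree is then trained using only its drawn subset, which is why random trees can be faster than bagging on wide datasets.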

Ensemble Methods are very powerful methods and typically result in better
performance than a single tree. This feature addition in Analytic Solver Data
Mining (introduced in V2015) provides users with more accurate prediction
models and should be considered over the single tree method.

Boosting Regression Example


This example focuses on the boosting ensemble method using linear regression
as the weak learner. We will use the Boston_Housing.xlsx example dataset.
This dataset contains 14 variables; the description of each is given in the table



below. The dependent variable MEDV is the median value of a dwelling. The
objective of this example is to predict the value of this variable.

CRIM Per capita crime rate by town


ZN Proportion of residential land zoned for lots over
25,000 sq.ft.
INDUS Proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds
river; 0 otherwise)
NOX Nitric oxides concentration (parts per 10 million)
RM Average number of rooms per dwelling
AGE Proportion of owner-occupied units built prior to
1940
DIS Weighted distances to five Boston employment
centers
RAD Index of accessibility to radial highways
TAX Full-value property-tax rate per $10,000
PTRATIO Pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of
African-Americans by town
LSTAT % Lower status of the population
MEDV Median value of owner-occupied homes in $1000's

A portion of the dataset is shown below. The last variable, CAT.MEDV, is a


discrete classification of the MEDV variable and will not be used in this
example.

First, we partition the data into training and validation sets using the Standard
Data Partition defaults with percentages of 60% of the data randomly allocated
to the Training Set and 40% of the data randomly allocated to the Validation



Set. For more information on partitioning a dataset, see the Data Mining
Partitioning chapter.

Click Predict – Ensemble – Boosting on the Data Mining ribbon. The


Boosting – Data dialog appears. Confirm that STDPartition is selected for
Worksheet under Data Source.
Select MEDV as the Output variable, CHAS as a Categorical Variable and the
remaining variables as Selected Variables (except the CAT.MEDV and Record
ID variable). The last variable, CAT.MEDV, is a discrete classification of the
MEDV variable and will not be used in this example.



Click Next to advance to the next dialog.
Select the down arrow beneath Weak Learner and select Linear Regression from
the menu. A command button will appear to the right of the Weak Learner
menu labeled Linear Regression. Click this button to change any options related
to this weak learner. For more information on any of these options, see the
Linear Regression chapter that appears earlier in this Guide.
Select Show Weak Learner Models to include this information in the output.



Click Next to advance to the Boosting – Scoring dialog.
Summary report is selected by default under Score Training Data and Score
Validation Data. Select Detailed Report, and Lift Charts under both Score
Training Data and Score Validation Data. Since a Test Data partition was not
created, the options under Score Test Data are disabled.
For more information on the Score New Data options, see the “Scoring New
Data” chapter within the Analytic Solver Data Mining User Guide.

Click Finish to view the output. Multiple output sheets are inserted at the end of
the workbook. Double click RBoosting_Output to view the Output Navigator.

Click the Training: Prediction Summary link on the Output Navigator to display
the Training: Prediction Summary and Training: Prediction Details data tables.
Analytic Solver Data Mining displays various statistics in the Prediction
Summary that can be used to determine if the model is a good fit to the data.
See the Prediction Details table to see the actual value, predicted value and
residual for each record.



Click the Validation: Prediction Summary link on the Output Navigator to
display the Validation: Prediction Summary and Validation: Prediction Details
data tables.

Click the Boosting Model link on the Output Navigator to view the Boosting
model for each weak learner. Recall that the default number of weak learners is
50 on the Parameters dialog.



Click the RBoosting_TrainLiftChart and RBoosting_ValidLiftChart tabs to
navigate to the Lift Charts and Regression RROC curves for both the training
and validation datasets. For more information on how to interpret these charts,
see the Multiple Linear Regression chapter that appears in the Analytic Solver
Data Mining User Guide.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select RBoosting_TrainingLiftChart or RBoosting_ValidationLiftChart
for Worksheet and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, RROC Curve and Lift Chart for Training Partition



Decile-wise Lift Chart, RROC Curve and Lift Chart for Valid. Partition

See the “Scoring New Data” chapter in the Analytic Solver Data Mining User
Guide for information on the Stored Model Sheet, RBoosting_Stored.
Continue on with the Bagging Neural Network Regression Example in the next
section to compare the results between the two ensemble methods.

Bagging Regression Example


This example focuses on creating a Neural Network using the bagging ensemble
method. See the section above for an example using the boosting ensemble
method with linear regression as the weak learner. This example reuses the
standard data partition created in the Boosting example above.
Click Predict – Ensemble– Bagging on the Data Mining ribbon. The Bagging
– Data dialog appears.
Select MEDV as the Output variable, CHAS as a Categorical Variable and the
remaining variables as Selected Variables (except the CAT.MEDV and Record
ID variables). The last variable, CAT.MEDV, is a discrete classification of the
MEDV variable and will not be used in this example.



Click Next to advance to the next dialog.
Select the down arrow beneath Weak Learner and select Neural Network from
the menu. A command button will appear to the right of the Weak Learner
menu labeled Neural Network. Click this button and then Add Layer twice to
add two layers with 5 and 3 neurons, respectively. For more information on any
of these options, see the Neural Network chapter that appears earlier in this
Guide.

Select Show Weak Learner Models to include this information in the output.



Click Next to advance to the Bagging – Scoring dialog.
Summary report is selected by default under Score Training Data and Score
Validation Data. Select Detailed Report, and Lift Charts under both Score
Training Data and Score Validation Data. Since a Test Data partition was not
created, the options under Score Test Data are disabled.
For more information on the Score New Data options, see the “Scoring New
Data” chapter within the Analytic Solver Data Mining User Guide.

Click Finish to view the output.


Click RBagging_Output to view the Output Navigator.

Click the Training: Prediction Summary link on the Output Navigator to display
the Training: Prediction Summary and Training: Prediction Details data tables.
Analytic Solver Data Mining displays various statistics in the Prediction
Summary that can be used to determine if the model is a good fit to the data.
See the Prediction Details table to see the actual value, predicted value and
residual for each record.



Click the Validation: Prediction Summary link on the Output Navigator to
display the Validation: Prediction Summary and Validation: Prediction Details
data tables.

Click the Bagging Model link on the Output Navigator to view the Bagging
model for each weak learner. Recall that the default number of weak learners is
50 on the Parameters dialog.
In this example, we chose to have two hidden layers, the first with 5 neurons and
the second with 3 neurons. The Inter-Layer Connections Weights tables contain the
final values for the weights between the input layer and the hidden layer,
between hidden layers, and between the last hidden layer and the output layer.

Click the RBagging_TrainLiftChart and RBagging_ValidLiftChart tabs to


navigate to the Lift Charts and Regression RROC curves for both the training
and validation datasets. For more information on how to interpret these charts,
see the Neural Network chapter that appears earlier in this guide.



Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select RBagging_TrainingLiftChart or RBagging_ValidationLiftChart
for Worksheet and Decile Chart, ROC Chart or Gain Chart for Chart.
Decile-wise Lift Chart, RROC Curve and Lift Chart for Training Partition

Decile-wise Lift Chart, RROC Curve and Lift Chart for Validation
Partition

See the “Scoring New Data” chapter within the Analytic Solver Data Mining
User Guide for information on the Stored Model Sheet, RBagging_Stored.
Continue on with the Random Trees Ensemble Method Example below.



Random Trees Ensemble Method Example
This example illustrates how to use the third ensemble method, random trees, to
create a regression model. We’ll again reuse the same partition of the
Boston_Housing.xlsx dataset.
Click Predict – Ensemble – Random Trees on the Data Mining ribbon.
Select MEDV as the Output variable and CHAS as a Categorical Variable.
Then select all remaining variables except CAT.MEDV and Record ID as
Selected Variables.

Click Next to advance to the Random Trees – Parameters dialog.


Recall that Random Trees only supports Decision Trees as a Weak Learner.
Click Decision Tree to change any options associated with this algorithm. This
example uses the default settings for the Decision Tree algorithm. For more
information on these options, see the Regression Trees chapter that occurs
earlier in this guide.
Select Show Weak Learner Models and Show Feature Importance to include
this information in the output.



Click Next to advance to the Random Trees Scoring dialog.
Summary Report is selected by default under both Score Training Data and
Score Validation Data. Select Detailed Report and Lift Charts under both
Score Training Data and Score Validation Data to produce a detailed
assessment of the performance of the tree in both sets. Since we did not create a
test partition, the options for Score test data are disabled. See the chapter “Data
Mining Partitioning” for information on how to create a test partition.
Please see the “Scoring New Data” chapter within the Analytic Solver User
Guide for information on the Score new data options.

Click Finish. The output of the Ensemble Methods algorithm are inserted at the
end of the workbook. Double click RRandTrees_Output to view the Output
Navigator. Click any link in this section to navigate to various sections of the
output.

Click the Training: Prediction Summary link on the Output Navigator to display
the Training: Prediction Summary and Training: Prediction Details data tables.



Analytic Solver Data Mining displays various statistics in the Prediction
Summary that can be used to determine if the model is a good fit to the data.
See the Prediction Details table to see the actual value, predicted value and
residual for each record.

Click the Validation: Prediction Summary link on the Output Navigator to


display the Validation: Prediction Summary and Validation: Prediction Details
data tables.

Click the Random Trees Model link on the Output Navigator to view the
Random Trees model for each weak learner. Recall that the default number of
weak learners is 50 on the Parameters dialog, so there is output for each of the
50 trees. For more information on how to read these tree rules, see the
Regression Tree chapter that occurs earlier in this guide.
Click the RRandTrees_TrainLiftChart and RRandTrees_ValidLiftChart tabs to
navigate to the Lift Charts and Regression RROC curves for both the training
and validation datasets. For information on how to interpret these charts, see the
Regression Tree chapter that appears earlier in this guide.
Note: To view these charts in the Cloud app, click the Charts icon on the
Ribbon, select RRandTrees_TrainingLiftChart or
RRandTrees_ValidationLiftChart for Worksheet and Decile Chart, ROC Chart
or Gain Chart for Chart.
Decile-wise Lift Chart, RROC Curve and Lift Chart for Training Partition



Decile-wise Lift Chart, RROC Curve and Lift Chart for Valid. Partition

Analytic Solver Data Mining generates RRandTrees_Stored along with the other
output. Please refer to the “Scoring New Data” chapter within the Analytic
Solver Data Mining User Guide for details.

Ensemble Methods for Regression Options


The following options appear on the Bagging, Boosting, and Random Trees
dialogs.



Please see below for options appearing on the Ensemble Methods- Data dialog.

Variables In Input Data


The variables included in the dataset appear here.

Selected Variables
Variables selected to be included in the output appear here.

Categorical Variables
Move categorical variables to be included in the model from the Variables
listbox by clicking the > command button. Ensemble Methods will accept
non-numeric categorical variables.

Output Variable
The dependent variable, or the variable to be predicted, appears here.
Please see below for options appearing on the Boosting – Parameters dialog.



Partition Data
Analytic Solver Data Mining includes the ability to partition a dataset from
within a classification or prediction method by clicking Partition Data on the
Parameters dialog. Click Partition Data to open the Partitioning dialog.
Analytic Solver Data Mining will partition your dataset (according to the
partition options you set) immediately before running the regression method. If
partitioning has already occurred on the dataset, this option will be disabled.
For more information on partitioning, please see the Data Mining Partitioning
chapter.

Rescale Data
Recall that the Euclidean distance measurement performs best when each
variable is rescaled. Here you can select how you want to rescale your
variables: Standardization, Normalization, Adjusted Normalization, or
Unit Norm.



• Standardization, sometimes referred to as Z-Scores, subtracts the mean
from each record's variable value and divides it by the standard
deviation. (x−mean)/stdev
• Normalization subtracts the minimum value from each record's variable
value and divides by the range. (x−min)/(max−min)
The Correction option specifies a number ε that is applied as a
correction to the rescaling formula. The corrected formula is
[x−(min−ε)]/[(max+ε)−(min−ε)].
• Adjusted Normalization subtracts the minimum value from each
record's variable value and divides by the range.
[2(x−min)/(max−min)]−1
The Correction option specifies a number ε that is applied as a
correction to the rescaling formula. The corrected formula is
{2[(x−(min−ε))/((max+ε)−(min−ε))]}−1.
• Unit Normalization is another option that is widely used in machine-
learning to scale the components of a feature vector such that the
complete vector has a length of one. This usually means dividing each
component by the Euclidean length of the vector. In some applications
it can be more practical to use the L1-norm.
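The four rescaling formulas above can be written out directly. The Python sketch below is illustrative rather than Analytic Solver's implementation; `eps` plays the role of the Correction option, and the population (not sample) standard deviation is assumed for Standardization.

```python
# Illustrative implementations of the four rescaling options.

def standardize(xs):
    """Z-Scores: (x - mean) / stdev (population standard deviation)."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

def normalize(xs, eps=0.0):
    """[x - (min - eps)] / [(max + eps) - (min - eps)]."""
    lo, hi = min(xs) - eps, max(xs) + eps
    return [(x - lo) / (hi - lo) for x in xs]

def adjusted_normalize(xs, eps=0.0):
    """{2[(x - (min - eps)) / ((max + eps) - (min - eps))]} - 1."""
    return [2.0 * v - 1.0 for v in normalize(xs, eps)]

def unit_norm(xs):
    """Scale the vector so its Euclidean (L2) length is one."""
    length = sum(x * x for x in xs) ** 0.5
    return [x / length for x in xs]
```

With `eps > 0`, normalized values stay strictly inside (0, 1), and adjusted-normalized values strictly inside (−1, 1), which is the purpose of the Correction option.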

Number of Weak Learners


This option controls the number of “weak” prediction models that will be
created. The ensemble method will stop when the number of prediction models
created reaches the value set for this option.

Weak Learner
Under Ensemble: Regression click the down arrow beneath Weak Learner to
select one of the four featured learners: Linear Regression, k-NN, Neural
Networks, or Decision Trees. After a weak learner is chosen, the command
button to the right will be enabled. Click this command button to control
various option settings for the weak learner.

Step Size
The Adaboost algorithm minimizes a loss function using the gradient descent
method. The Step size option is used to ensure that the algorithm does not
descend too far when moving to the next step. It is recommended to leave this
option at the default of 0.3, but any number between 0 and 1 is acceptable. A
Step size setting closer to 0 results in the algorithm taking smaller steps to the
next point, while a setting closer to 1 results in the algorithm taking larger steps
towards the next point.

Show Weak Learner


To display the weak learner models in the output, select Show Weak Learner
Models.
Please see below for options unique to the Bagging – Parameters dialog.



Random Seed for Bootstrapping
If an integer value appears for Bootstrapping Random seed, Analytic Solver
Data Mining will use this value to set the bootstrapping random number seed.
Setting the random number seed to a nonzero value (any number of your choice
is OK) ensures that the same sequence of random numbers is used each time the
dataset is chosen for the classifier. The default value is “12345”. If left blank,
the random number generator is initialized from the system clock, so the
sequence of random numbers will be different in each calculation. If you need
the results from one run of the algorithm to the next to be strictly comparable,
you should set the seed. To do this, type the desired number into the box. This
option accepts both positive and negative integers with up to 9 digits.
Please see below for options unique to the Random Trees – Parameters dialog.



Number of Randomly Selected Features
The Random Trees ensemble method works by training multiple “weak”
regression trees using a fixed number of randomly selected features, then
averaging the weak learners’ predictions to create a “strong” predictor. The
option Number of randomly selected features controls the fixed number of
randomly selected features in the algorithm. The default setting is 3.

Random Seed for Feature Selection


If an integer value appears for Feature Selection Random seed, Analytic Solver
Data Mining will use this value to set the feature selection random number seed.
Setting the random number seed to a nonzero value (any number of your choice
is OK) ensures that the same sequence of random numbers is used each time the
dataset is chosen for the classifier. The default value is “12345”. If left blank,
the random number generator is initialized from the system clock, so the
sequence of random numbers will be different in each calculation. If you need
the results from one run of the algorithm to the next to be strictly comparable,
you should set the seed. To do this, type the desired number into the box. This
option accepts both positive and negative integers with up to 9 digits.
Please see below for options that are unique to the Ensemble Methods Scoring
dialog.



Score Training Data
Select these options to show an assessment of the performance of the algorithm
in predicting the output variable using the training data. The report is displayed
according to your specifications - Detailed, Summary, and Lift Charts.

Score Validation Data


These options are enabled when a validation dataset exists. Select these options
to show an assessment of the performance of the algorithm in predicting the
value of the output variable using the validation data. The report is displayed
according to your specifications - Detailed, Summary and Lift Charts.

Score Test Data


These options are enabled when a test dataset exists. Select these options to
show an assessment of the performance of the algorithm in predicting the value
of the output variable using the test data. The report is displayed according to
your specifications - Detailed, Summary, and Lift Charts.

Score New Data


The options in this group apply when the model is used to score an altogether
new dataset. See the “Scoring New Data” chapter within the Analytic Solver
Data Mining User Guide for details on these options.



Association Rules

Introduction
The goal of association rules mining is to recognize associations and/or
correlations among large sets of data items. A typical and widely-used example
of association rules mining is the Market Basket Analysis. Most ‘market basket’
databases consist of a large number of transaction records where each record
lists all items purchased by a customer during a trip through the check-out line.
Data is easily and accurately collected through bar-code scanners.
Supermarket managers are interested in determining which foods customers
purchase together, for instance, bread and milk, bacon and eggs, or wine and
cheese. This information is useful in planning store layouts (placing items
optimally with respect to each other), cross-selling promotions, coupon offers,
etc.
Association rules provide results in the form of "if-then" statements. These rules
are computed from the data and, unlike the if-then rules of logic, are
probabilistic in nature. The “if” portion of the statement is referred to as the
antecedent and the “then” portion of the statement is referred to as the
consequent.
In addition to the antecedent (the "if" part) and the consequent (the "then" part),
an association rule contains two numbers that express the degree of uncertainty
about the rule. In association analysis the antecedent and consequent are sets of
items (called itemsets) that are disjoint, meaning they do not have any items in
common. The first number is called the support, which is simply the number of
transactions that include all items in the antecedent and consequent. (The
support is sometimes expressed as a percentage of the total number of records in
the database.) The second number is known as the confidence, which is the ratio
of the number of transactions that include all items in the consequent as well as
the antecedent (namely, the support) to the number of transactions that include
all items in the antecedent. For example, assume a supermarket database has
100,000 point-of-sale transactions, out of which 2,000 include both items A and
B and 800 of these include item C. The association rule "If A and B are
purchased then C is purchased on the same trip" has a support of 800
transactions (alternatively 0.8% = 800/100,000) and a confidence of 40%
(=800/2,000). In other words, support is the probability that a randomly selected
transaction from the database will contain all items in the antecedent and the
consequent. Confidence is the conditional probability that a randomly selected
transaction will include all the items in the consequent given that the transaction
includes all the items in the antecedent.
Lift is one more parameter of interest in association analysis. Lift is the ratio
of Confidence to Expected Confidence. Expected Confidence is the confidence
that would be expected if buying A and B did not enhance the probability of
buying C; that is, the number of transactions that include the consequent divided
by the total number of transactions. Suppose the total number of transactions for
C is 5,000. Expected Confidence is computed as 5% (5,000/100,000), and Lift,
the ratio of Confidence to Expected Confidence, is 8 (40%/5%). Hence, Lift is a
value that provides information about the increase in probability of the "then"
(consequent) given the "if" (antecedent).



A lift ratio larger than 1.0 implies that the relationship between the antecedent
and the consequent is more significant than would be expected if the two sets
were independent. The larger the lift ratio, the more significant the association.
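Continuing the example, the lift calculation can be sketched as follows. This is an illustration using the figures quoted above, not Analytic Solver code.

```python
# Lift for the rule "A and B => C": Expected Confidence is the consequent's
# share of all transactions, and Lift is Confidence / Expected Confidence.

total_transactions = 100_000
n_C = 5_000            # transactions that include the consequent C
confidence = 0.40      # confidence of the rule, computed earlier

expected_confidence = n_C / total_transactions    # 0.05, i.e. 5%
lift = confidence / expected_confidence           # 8.0

print(expected_confidence, lift)
```

A lift of 8 means a transaction containing A and B is eight times more likely to contain C than a transaction drawn at random, which is well above the 1.0 threshold for a significant association.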

Association Rules Example


The example below illustrates how to use Analytic Solver Data Mining’s
Association Rules method using the example dataset contained in the file,
Associations.xlsx. Click Help – Examples on the Data Mining ribbon, then
Forecasting/Data Mining Examples to open this dataset.

Click Associate – Association Rules to open the Association Rules dialog.


Since the data contained in the Associations.xlsx dataset are all 0’s and 1’s,
Data in binary matrix format is selected by default for the option, Input data
format. Analytic Solver Data Mining will treat the data as a matrix of two
entities -- zeros and non-zeros. A 0 signifies that the item is absent in that
transaction and a 1 signifies the item is present.
Note: If a value other than 0 or 1 were present in the dataset, Data in Item List
Format would have been selected by default for the option, Input Data Format.
Keep the default of 200 for the Minimum Support (# transactions). This option
specifies the minimum number of transactions in which a particular item-set
must appear to qualify for inclusion in an association rule.
Keep the default of 50 for Minimum confidence %. This option specifies the
minimum confidence threshold for rule generation. If A is the set of Antecedents
and C the set of Consequents, then only those A =>C ("Antecedent implies
Consequent") rules will qualify, for which the ratio (support of A U C) /
(support of A) at least equals this percentage.
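The qualification test described above can be sketched as a small predicate. The function name and counts below are illustrative, not part of Analytic Solver's interface; the example counts reuse the supermarket rule from earlier.

```python
# A candidate rule A => C qualifies only when its confidence,
# support(A U C) / support(A), meets the Minimum Confidence threshold.

def rule_qualifies(support_A_and_C, support_A, min_confidence_pct=50):
    confidence_pct = 100 * support_A_and_C / support_A
    return confidence_pct >= min_confidence_pct

# The supermarket rule from earlier: support 800, antecedent support 2,000.
print(rule_qualifies(800, 2_000))        # 40% < 50% default -> False
print(rule_qualifies(800, 2_000, 30))    # passes a 30% threshold -> True
```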



Click OK. AR_Output is inserted to the right of the Assoc_binary worksheet.

Rule 27 indicates that if a Cook book and a Reference book are purchased, then
with 80% confidence a Child book will also be purchased. The A - Support
column indicates that the rule has the support of 305 transactions, meaning that
305 people bought both a Cook book and a Reference book. The C - Support
column indicates the number of transactions involving the purchase of Child books. The



Support column indicates the number of transactions where all three types were
purchased.
The Lift Ratio indicates how much more likely a transaction will be found
where all three book types (Cook, Reference, and Child) are purchased, as
compared to the entire population of transactions. In other words, the Lift Ratio
is the Confidence divided by the percentage of C-Support transactions in the
entire dataset. The percentage of C-Support transactions in the entire dataset for
Rule 27 is 0.423 (846/2000). Confidence is then divided by this value to find
the Lift Ratio: 0.803/0.423 = 1.899. Given a confidence of 80.3% and a lift ratio of
1.899 (lift ratio > 1), this rule can be considered “useful”.
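The Rule 27 arithmetic can be replayed directly. This sketch uses the counts quoted in the report (846 C-Support transactions out of 2,000, confidence 0.803); the reported 1.899 reflects the unrounded confidence, so recomputing from the rounded 0.803 lands a hair lower.

```python
# Replaying the Lift Ratio arithmetic for Rule 27.

n_transactions = 2_000
c_support = 846
confidence = 0.803

c_support_share = c_support / n_transactions    # 0.423
lift_ratio = confidence / c_support_share       # ~1.899 (1.898 with the
                                                # rounded confidence used here)
print(round(c_support_share, 3), round(lift_ratio, 3))
```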

Association Rules Options


The following options appear on the Association Rules dialog.

Input data format


Select Data in binary matrix format if each column in the data represents a
distinct item. If this option is selected, Analytic Solver Data Mining treats the
data as a matrix of two entities -- zeros and non-zeros. All non-zeros are treated
as 1's. So, effectively the data set is converted to a binary matrix which contains
0's and 1's. A 0 indicates that the item is absent in the transaction and a 1
indicates it is present.
Select Data in item list format if each row of data consists of item codes or
names that are present in that transaction.
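The difference between the two formats can be shown with a short sketch. The transactions below are made up for illustration; this mimics, rather than reproduces, how the conversion to a binary matrix works.

```python
# Item-list format: each row lists the items present in one transaction.
transactions = [
    ["Cook", "Reference", "Child"],
    ["Cook", "Art"],
    ["Reference", "Child"],
]

# Binary matrix format: one column per distinct item, 1 = present, 0 = absent.
items = sorted({item for row in transactions for item in row})
binary_matrix = [[1 if item in row else 0 for item in items]
                 for row in transactions]

print(items)
for row in binary_matrix:
    print(row)
```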

Minimum support (# transactions)


Specify here the minimum number of transactions in which a particular item-set
must appear to qualify for inclusion in an association rule. The default value is
10% of the total number of rows.

Minimum confidence (%)


A value entered for this option specifies the minimum confidence threshold for
rule generation. If A is the set of Antecedents and C the set of Consequents, then
only those A =>C ("Antecedent implies Consequent") rules will qualify, for



which the ratio (support of A U C) / (support of A) is greater than or equal to
this percentage.
The default setting is 50.

