
L4 SAS Viya - Decision Tree


Decision Trees

Copyright © SAS Institute Inc. All rights reserved.


Decision Trees in SAS Visual Statistics
• There is only one response variable. It can be either a category or a measure.
(A category response gives a classification tree; a measure response gives a regression tree.)

• There can be multiple predictors, and both category and measure predictors are accommodated

=> an extremely versatile technique, as it can be used with any data type

• Using Interactive mode, you can manually train and prune a decision tree

• You can derive a leaf ID. This ID can be used in other models that are featured in
the SAS Visual Statistics functionality
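
As a rough illustration of what a derived leaf ID is, the sketch below routes each observation through a small tree and records the leaf it lands in. The tree, the variable names (`age`, `income`), the thresholds, and the "left if value < threshold" convention are all hypothetical; this is not how SAS Visual Statistics stores or scores a tree.

```python
# Hypothetical fitted tree: internal nodes test one measure, leaves carry an ID.
TREE = {
    "var": "age", "threshold": 30,
    "left": {"leaf_id": 1},
    "right": {
        "var": "income", "threshold": 50000,
        "left": {"leaf_id": 2},
        "right": {"leaf_id": 3},
    },
}

def leaf_id(tree, obs):
    """Follow the splits (left if value < threshold) until a leaf is reached."""
    node = tree
    while "leaf_id" not in node:
        go_left = obs[node["var"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf_id"]

rows = [
    {"age": 25, "income": 40000},
    {"age": 45, "income": 40000},
    {"age": 45, "income": 90000},
]
ids = [leaf_id(TREE, r) for r in rows]
print(ids)
```

The resulting ID is an ordinary category column, which is why it can be passed on as an input to another model.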

Decision Tree Roles
• Response – only one measure or category variable

• Predictors – assign any number of measure and category variables

• Partition ID – only one partition variable (optional)
> to build a decision tree (DT) for a cluster

Decision Tree Options
• Decision Tree
• Event level
• Autotune
• Missing assignment
• Minimum value
• Growth strategy
• Maximum branches
• Maximum levels
• Leaf size
• Bin response variable
• Predictor bins
• Bin method
• Rapid growth
• Prune with validation data
• Pruning
• Reuse predictors
• Number of bins
• Prediction cutoff
• Statistic percentile
• Tolerance
See lecture notes for more details.
Decision Tree Options
• Decision Tree
• Maximum branches
- Maximum number of splits at one node
• Maximum levels
- Maximum depth of the tree
• Leaf size
- Minimum number of observations in a leaf node
• Pruning
- Specifies the aggressiveness of the tree pruning
algorithm. A more aggressive algorithm creates a
smaller decision tree. Larger values are more
aggressive
• Reuse predictors
- Allows more than one split in the same branch
based on a predictor.
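
To make the effect of Maximum levels and Leaf size concrete, here is a minimal sketch of a tree grower that honors both limits. It rests on several assumptions that do not come from the slides: splits are binary only (so Maximum branches is fixed at 2), the split criterion is Gini impurity, the root counts as level 1, and pruning is left out. The function names are illustrative and do not correspond to any SAS API.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, leaf_size):
    """Return (predictor index, threshold) of the best binary split, or None."""
    best, best_score = None, gini(labels)
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            # Leaf size: each child must keep at least leaf_size observations.
            if len(left) < leaf_size or len(right) < leaf_size:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:  # only accept splits that reduce impurity
                best, best_score = (j, t), score
    return best

def grow(rows, labels, max_levels, leaf_size, level=1):
    """Grow a tree as nested dicts, stopping at max_levels (root = level 1)."""
    split = best_split(rows, labels, leaf_size) if level < max_levels else None
    if split is None:
        majority = Counter(labels).most_common(1)[0][0]
        return {"leaf": True, "n": len(rows), "predict": majority}
    j, t = split
    lo = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "leaf": False, "predictor": j, "threshold": t,
        "left": grow([r for r, _ in lo], [y for _, y in lo],
                     max_levels, leaf_size, level + 1),
        "right": grow([r for r, _ in hi], [y for _, y in hi],
                      max_levels, leaf_size, level + 1),
    }

def depth(tree):
    """Number of levels in the tree, counting the root as 1."""
    if tree["leaf"]:
        return 1
    return 1 + max(depth(tree["left"]), depth(tree["right"]))

rows = [[1], [2], [3], [4], [5], [6]]
labels = ["a", "a", "a", "b", "b", "b"]
tree = grow(rows, labels, max_levels=3, leaf_size=1)
stump = grow(rows, labels, max_levels=3, leaf_size=4)
```

With `leaf_size=4` no split of six observations can leave four in each child, so the second call returns a single leaf. Raising the leaf size or lowering the maximum levels always yields a tree that is the same size or smaller.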
Decision Tree Options
• Model Display
• Plot layout
• (General) Statistic to show
• (Decision Tree / Icicle Plot) Statistic to show
• Legend visibility
• Plot type
• Plot to show
• Confusion matrix legend visibility

Decision Tree Results
[Figure: results window with labeled panes: Summary Bar, Decision Tree, Icicle, Variable Importance, Assessment]
Analyzing Decision Tree Results
Four panes appear under a summary bar. They can help you analyze
the results of the decision tree model.
• Tree with Treemap – displays an interactive and navigational decision tree
with node statistics and node rules
• Icicle Plot – displays a hierarchical breakdown of the tree data
• Variable Importance Plot – provides the variable importance information
for effects in the tree
• Assessment Plots – provide information on the performance of the tree
- Confusion matrix summarizes classifications.
- ROC (receiver operating characteristic) measures classification accuracy.
- Misclassification measures predictive accuracy.

Analyzing Decision Tree Results
Summary bar

• Target
• Baseline category for computing the different performance indicators
• Performance of the decision tree

Decision Tree: Assessment Statistics
The Assessment Statistics tab displays the value of any assessment statistics
that are computed for the model.

Decision Tree Results: Tree and Treemap

The color of the node in the treemap indicates the predicted level for that node.

[Figure: tree and treemap, zoomed in with a node selected]
Analyzing Decision Tree Results: Icicle
Icicle Plot: displays a hierarchical breakdown of the tree data

• Each tile = one node
• Size proportional to the number of observations
• Colour = prediction
Decision Tree Details Table: Node Statistics
The Node Statistics tab provides summary statistics for each node
in the decision tree.

Details Table: Node Rules
The Node Rules tab provides the sorting rule that is used for each node
in the decision tree.
• Node ID
• Parent ID
• Type (Class or Leaf)
• Column for each predictor and rule applied
> Just another way to represent your decision tree
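
The sketch below shows why a table of node IDs and parent IDs is enough to represent the whole tree: following the parent links from any node back to the root recovers the full rule for that node. The table contents are hypothetical, and for brevity each row holds a single rule string rather than one column per predictor.

```python
# Hypothetical node-rules table: one row per node, keyed by node ID.
NODES = {
    0: {"parent": None, "type": "Class", "rule": None},
    1: {"parent": 0, "type": "Leaf", "rule": "age < 30"},
    2: {"parent": 0, "type": "Class", "rule": "age >= 30"},
    3: {"parent": 2, "type": "Leaf", "rule": "income < 50000"},
    4: {"parent": 2, "type": "Leaf", "rule": "income >= 50000"},
}

def rules_to_node(nodes, node_id):
    """Collect the rules from the root down to node_id via the parent IDs."""
    path = []
    while node_id is not None:
        rule = nodes[node_id]["rule"]
        if rule is not None:
            path.append(rule)
        node_id = nodes[node_id]["parent"]
    return list(reversed(path))

def children(nodes, node_id):
    """Node IDs whose parent ID is node_id."""
    return sorted(k for k, v in nodes.items() if v["parent"] == node_id)

print(" AND ".join(rules_to_node(NODES, 4)))
```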

Decision Tree Results: Variable Importance Plot
Provides the variable importance information for effects in the tree.

Details Table: Variable Importance
The Variable Importance tab provides variable importance information
for the variables that are used in the tree.
• Variable name
• Importance value
• Standard Deviation

Decision Tree Results: Leaf Statistics
> What does the distribution inside a leaf look like?

[Figure: leaf statistics, shown as Count and as Percent]

Decision Tree Results: Assessment

[Figure: the three assessment plots: Confusion Matrix, ROC, Misclassification]

Assessment: Confusion Matrix

Details Table: Confusion Matrix
The Confusion Matrix tab provides a summary of the correct and incorrect
classifications for the model that is used to generate the confusion matrix.
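
For a two-level response, the four cells of a confusion matrix can be tallied directly from the actual and predicted values. The sketch below is a plain-Python illustration with made-up data; the function name and the choice of "yes" as the event level are assumptions.

```python
def confusion_counts(actual, predicted, event):
    """Tally TP/FP/FN/TN, treating `event` as the positive level."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == event:
            counts["TP" if a == event else "FP"] += 1
        else:
            counts["FN" if a == event else "TN"] += 1
    return counts

actual    = ["yes", "yes", "no", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "no", "yes"]
counts = confusion_counts(actual, predicted, event="yes")
print(counts)
```

The diagonal cells (TP and TN) are the correct classifications; FP and FN are the two kinds of error.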

Assessment: Misclassification Plot

Misclassification measures predictive accuracy:

A misclassification plot displays how many observations were correctly and incorrectly classified for each value of the response variable (TP/FP/FN/TN).

> We want to minimize the orange part (the misclassified observations)
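
The quantities behind such a plot can be sketched in a few lines: for each value of the response, count how many observations were classified correctly and incorrectly, then summarize with an overall misclassification rate. The data and function names are illustrative only.

```python
from collections import defaultdict

def misclassification_by_level(actual, predicted):
    """For each actual response value, count correct and incorrect predictions."""
    counts = defaultdict(lambda: {"correct": 0, "incorrect": 0})
    for a, p in zip(actual, predicted):
        counts[a]["correct" if a == p else "incorrect"] += 1
    return dict(counts)

def misclassification_rate(actual, predicted):
    """Share of observations whose predicted value differs from the actual one."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)

actual = ["event"] * 4 + ["non-event"] * 6
predicted = (["event"] * 3 + ["non-event"]      # one missed event
             + ["non-event"] * 5 + ["event"])   # one false alarm
by_level = misclassification_by_level(actual, predicted)
rate = misclassification_rate(actual, predicted)
```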

Decision Tree: Misclassification
The Misclassification tab displays a summary of correct and incorrect
classifications for the model

Assessment: ROC Chart
ROC (receiver operating characteristic) measures classification accuracy:
- The ROC chart displays the ability of a model to avoid false positive and false negative classifications.
- The classification accuracy of a model is demonstrated by the degree to which the ROC curve pushes upward and to the left.
- This degree can be quantified by the area under the curve (AUC).
- Expressed as a percentage, the area ranges from 50, for a worthless model (no better than random guessing), to 100, for a perfect classifier.
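
One way to see what the area under the curve measures: it equals the share of (event, non-event) pairs in which the event received the higher predicted probability, with ties counting half. The sketch below computes it that way on made-up scores; it returns a value on the 0 to 1 scale, so multiply by 100 for the percentage scale used above. This is an illustration of the definition, not the procedure SAS uses internally.

```python
def roc_auc(labels, scores):
    """AUC as the share of (event, non-event) pairs ranked correctly.

    labels: 1 for an event, 0 for a non-event.
    scores: predicted probability of the event.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

perfect = roc_auc([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1])
useless = roc_auc([1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5])
mixed = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
print(100 * perfect, 100 * useless, 100 * mixed)
```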
Details Table: ROC
The ROC tab displays the results that are used to generate the ROC plot
