SLIQ Algorithm with Example
What is SLIQ Algorithm?
SLIQ stands for Supervised Learning In Quest. It is a decision tree induction algorithm
for classification, developed to handle large training sets efficiently. SLIQ improves on
earlier methods in two ways. First, each numeric attribute is sorted only once, before tree
growth starts, so no re-sorting is needed at each node. Second, a class list records, for
every training record, its class label and the tree node it currently belongs to.
Steps in SLIQ:
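Pre-sort each numeric attribute once, keeping every value paired with its record ID.
Initialize a 'class list' that stores, for each record ID, its class label and current node number.
For each node, scan the sorted attribute lists and choose the split with the lowest weighted Gini index.
Apply the split by updating the node number of each affected record in the class list.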
Repeat recursively for each child node.
Stop when nodes are pure or meet stopping conditions.
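The steps above can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions, not the original SLIQ implementation:

- Only numeric attributes are handled.
- Candidate thresholds are midpoints between consecutive distinct values.
- Ties keep the first attribute scored.
- All leaves of one level are scored in the same pass over each sorted list.
- Pruning is omitted.

The names `RECORDS`, `build_tree`, `predict` and the `max_depth` stopping limit are hypothetical choices for this example. The data is the toy dataset used in the example that follows.

```python
from collections import Counter

# Toy training set: record ID -> attribute values and class label.
RECORDS = {
    1: {"Age": 22, "Income": 25, "Class": "No"},
    2: {"Age": 45, "Income": 50, "Class": "Yes"},
    3: {"Age": 27, "Income": 30, "Class": "No"},
    4: {"Age": 35, "Income": 45, "Class": "Yes"},
    5: {"Age": 50, "Income": 60, "Class": "Yes"},
}


def gini(counts):
    """Gini index 1 - sum(p_j^2) of a Counter mapping class -> count."""
    n = sum(counts.values())
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts.values())


def build_tree(records, attributes, max_depth=5):
    # Step 1: sort each attribute once into a list of (value, record ID) pairs.
    attr_lists = {a: sorted((r[a], rid) for rid, r in records.items()) for a in attributes}
    # Step 2: class list, record ID -> [class label, current leaf]; node 0 is the root.
    class_list = {rid: [r["Class"], 0] for rid, r in records.items()}
    tree = {0: {"depth": 0}}
    frontier, next_id = [0], 1
    while frontier:
        # Class histogram of every leaf on the current level.
        totals = {n: Counter() for n in frontier}
        for label, node in class_list.values():
            if node in totals:
                totals[node][label] += 1
        best = {n: None for n in frontier}  # leaf -> (weighted Gini, attribute, threshold)
        # Step 3: one scan per sorted list scores candidate splits for all leaves at once.
        for a, alist in attr_lists.items():
            below = {n: Counter() for n in frontier}  # class counts seen so far per leaf
            prev = {}  # last value seen per leaf
            for value, rid in alist:
                label, node = class_list[rid]
                if node not in below:
                    continue
                if node in prev and prev[node] != value:
                    n_all = sum(totals[node].values())
                    n_left = sum(below[node].values())
                    right = totals[node] - below[node]
                    score = (n_left * gini(below[node]) + (n_all - n_left) * gini(right)) / n_all
                    if best[node] is None or score < best[node][0]:
                        best[node] = (score, a, (prev[node] + value) / 2)
                below[node][label] += 1
                prev[node] = value
        # Steps 4-6: split impure leaves, update the class list, move to the next level.
        next_frontier = []
        for n in frontier:
            depth = tree[n]["depth"]
            if gini(totals[n]) == 0 or best[n] is None or depth >= max_depth:
                tree[n]["leaf"] = totals[n].most_common(1)[0][0]
                continue
            _, a, thr = best[n]
            left_id, right_id = next_id, next_id + 1
            next_id += 2
            tree[n].update(attr=a, threshold=thr, left=left_id, right=right_id)
            tree[left_id] = {"depth": depth + 1}
            tree[right_id] = {"depth": depth + 1}
            # Only the node column of the class list changes; records are not moved.
            for value, rid in attr_lists[a]:
                if class_list[rid][1] == n:
                    class_list[rid][1] = left_id if value <= thr else right_id
            next_frontier += [left_id, right_id]
        frontier = next_frontier
    return tree


def predict(tree, record):
    # Walk from the root until a leaf is reached.
    node = 0
    while "leaf" not in tree[node]:
        info = tree[node]
        node = info["left"] if record[info["attr"]] <= info["threshold"] else info["right"]
    return tree[node]["leaf"]


tree = build_tree(RECORDS, ["Age", "Income"])
print(tree[0]["attr"], tree[0]["threshold"])  # Age 31.0
```

On this data the root split comes out as Age <= 31.0, the midpoint between 27 and 35. It sends exactly the same records left and right as the Age <= 30 split used below.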
Example Dataset:
ID Age Income (K) Class
1 22 25 No
2 45 50 Yes
3 27 30 No
4 35 45 Yes
5 50 60 Yes
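Attributes Pre-Sorted (each value kept with its record ID):
Age: 22 (ID 1), 27 (ID 3), 35 (ID 4), 45 (ID 2), 50 (ID 5)
Income: 25 (ID 1), 30 (ID 3), 45 (ID 4), 50 (ID 2), 60 (ID 5)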
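In code, each pre-sorted attribute list can be represented as (value, record ID) pairs, sorted once up front. The following is a minimal Python sketch; the names `DATA`, `age_list` and `income_list` are illustrative only.

```python
# Example records: (ID, Age, Income in K, Class).
DATA = [
    (1, 22, 25, "No"),
    (2, 45, 50, "Yes"),
    (3, 27, 30, "No"),
    (4, 35, 45, "Yes"),
    (5, 50, 60, "Yes"),
]

# Each attribute is sorted exactly once; the record ID travels with each value
# so the class label can later be looked up in the class list.
age_list = sorted((age, rid) for rid, age, _income, _label in DATA)
income_list = sorted((income, rid) for rid, _age, income, _label in DATA)

print(age_list)     # [(22, 1), (27, 3), (35, 4), (45, 2), (50, 5)]
print(income_list)  # [(25, 1), (30, 3), (45, 4), (50, 2), (60, 5)]
```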
Class List Created (node 0 is the root):
ID Class Node
1 No 0
2 Yes 0
3 No 0
4 Yes 0
5 Yes 0
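The class list can be held as a Python dict from record ID to [class, node]. This minimal sketch also shows how applying the root split Age <= 30 rewrites only the node column, found by scanning the sorted Age list. The child node numbers 1 (left) and 2 (right) are a hypothetical choice.

```python
# Example records: (ID, Age, Income in K, Class).
DATA = [
    (1, 22, 25, "No"),
    (2, 45, 50, "Yes"),
    (3, 27, 30, "No"),
    (4, 35, 45, "Yes"),
    (5, 50, 60, "Yes"),
]

# Class list: record ID -> [class label, node]; every record starts at the root, node 0.
class_list = {rid: [label, 0] for rid, _age, _income, label in DATA}

# Applying the root split Age <= 30: scan the pre-sorted Age list and rewrite only
# the node column. Child numbers 1 (left) and 2 (right) are an arbitrary choice.
age_list = sorted((age, rid) for rid, age, _income, _label in DATA)
for age, rid in age_list:
    if class_list[rid][1] == 0:
        class_list[rid][1] = 1 if age <= 30 else 2

print(class_list)
# {1: ['No', 1], 2: ['Yes', 2], 3: ['No', 1], 4: ['Yes', 2], 5: ['Yes', 2]}
```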
Finding the Best Split:
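Splitting on Age <= 30 gives the lowest possible weighted Gini index, 0, because it separates the two classes perfectly. (A split on Income <= 37.5 would separate them equally well; this example keeps the Age split.)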
Gini Index Calculation:
Split at Age <= 30:
Left Group: IDs 1, 3 (Class No)
Right Group: IDs 2, 4, 5 (Class Yes)
Gini(Left) = 1 - (2/2)² = 0
Gini(Right) = 1 - (3/3)² = 0
Weighted Gini = (2/5)(0) + (3/5)(0) = 0
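For reference, these numbers come from the standard Gini definitions. Here p_j is the fraction of records in a set S that belong to class j, and n_L, n_R and n are the sizes of the left child, right child and parent. The root itself (2 No, 3 Yes) starts with an impurity of 0.48, so this split removes all of it.

```latex
\begin{aligned}
\mathrm{Gini}(S) &= 1 - \sum_{j} p_j^{2} \\
\mathrm{Gini}_{\mathrm{split}} &= \frac{n_L}{n}\,\mathrm{Gini}(S_L) + \frac{n_R}{n}\,\mathrm{Gini}(S_R) \\
\mathrm{Gini}(\text{root}) &= 1 - \left(\tfrac{2}{5}\right)^{2} - \left(\tfrac{3}{5}\right)^{2} = 1 - 0.16 - 0.36 = 0.48
\end{aligned}
```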
Final Decision Tree:
[Root: Age <= 30?]
├── Yes (<=30) → Class: No
└── No (>30) → Class: Yes
Summary Table:
Split Condition  Left Group                Right Group               Weighted Gini  Remarks
Age <= 22.5      1 record (No)             4 records (3 Yes, 1 No)   0.300          Not best
Age <= 30.0      2 records (No)            3 records (Yes)           0.000          Best split
Age <= 40.0      3 records (2 No, 1 Yes)   2 records (Yes)           0.267          Not better
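The weighted Gini values in the table can be reproduced with a single pass over a pre-sorted attribute list. This is the kind of scan SLIQ relies on instead of re-sorting at each node.

A few assumptions in this sketch:

- The names `AGE_LIST`, `INCOME_LIST` and `scan_splits` are illustrative.
- For brevity each list stores the class next to the value, where SLIQ would look it up in the class list.
- Thresholds are midpoints between consecutive distinct values, so 24.5 and 31.0 appear in place of 22.5 and 30.0. They produce identical partitions.
- The extra candidate 47.5 is not in the table.

Running the same scan on Income shows the tie with Age at 0.

```python
from collections import Counter

# Pre-sorted attribute lists as (value, class) pairs, taken from the example dataset.
AGE_LIST = [(22, "No"), (27, "No"), (35, "Yes"), (45, "Yes"), (50, "Yes")]
INCOME_LIST = [(25, "No"), (30, "No"), (45, "Yes"), (50, "Yes"), (60, "Yes")]


def gini(counts):
    """Gini index 1 - sum(p_j^2) of a Counter mapping class -> count."""
    n = sum(counts.values())
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts.values())


def scan_splits(sorted_pairs):
    """Score every binary split 'value <= threshold' in one pass over a sorted list."""
    total = Counter(label for _, label in sorted_pairs)
    n = len(sorted_pairs)
    left = Counter()
    results = []
    for i, (value, label) in enumerate(sorted_pairs):
        left[label] += 1
        # Only place a threshold between two different consecutive values.
        if i + 1 == n or sorted_pairs[i + 1][0] == value:
            continue
        threshold = (value + sorted_pairs[i + 1][0]) / 2
        n_left = i + 1
        score = (n_left * gini(left) + (n - n_left) * gini(total - left)) / n
        results.append((threshold, round(score, 3)))
    return results


print(scan_splits(AGE_LIST))     # [(24.5, 0.3), (31.0, 0.0), (40.0, 0.267), (47.5, 0.4)]
print(scan_splits(INCOME_LIST))  # [(27.5, 0.3), (37.5, 0.0), (47.5, 0.267), (55.0, 0.4)]
```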