|
1 | 1 | """
|
2 |
| -============================ |
3 |
| -Classifier Chain |
4 |
| -============================ |
5 |
| -Example of using classifier chain on a multilabel dataset. |
6 |
| -
|
7 |
| -For this example we will use the `yeast |
8 |
| -<https://www.openml.org/d/40597>`_ dataset which contains |
9 |
| -2417 datapoints each with 103 features and 14 possible labels. Each |
10 |
| -data point has at least one label. As a baseline we first train a logistic |
11 |
| -regression classifier for each of the 14 labels. To evaluate the performance of |
12 |
| -these classifiers we predict on a held-out test set and calculate the |
13 |
| -:ref:`jaccard score <jaccard_similarity_score>` for each sample. |
14 |
| -
|
15 |
| -Next we create 10 classifier chains. Each classifier chain contains a |
16 |
| -logistic regression model for each of the 14 labels. The models in each |
17 |
| -chain are ordered randomly. In addition to the 103 features in the dataset, |
18 |
| -each model gets the predictions of the preceding models in the chain as |
19 |
| -features (note that by default at training time each model gets the true |
20 |
| -labels as features). These additional features allow each chain to exploit |
21 |
| -correlations among the classes. The Jaccard similarity score for each chain |
22 |
| -tends to be greater than that of the set independent logistic models. |
23 |
| -
|
24 |
| -Because the models in each chain are arranged randomly there is significant |
25 |
| -variation in performance among the chains. Presumably there is an optimal |
26 |
| -ordering of the classes in a chain that will yield the best performance. |
27 |
| -However we do not know that ordering a priori. Instead we can construct an |
28 |
| -voting ensemble of classifier chains by averaging the binary predictions of |
29 |
| -the chains and apply a threshold of 0.5. The Jaccard similarity score of the |
30 |
| -ensemble is greater than that of the independent models and tends to exceed |
31 |
| -the score of each chain in the ensemble (although this is not guaranteed |
32 |
| -with randomly ordered chains). |
33 |
| -
|
| 2 | +================================================== |
| 3 | +Multilabel classification using a classifier chain |
| 4 | +================================================== |
| 5 | +This example shows how to use :class:`~sklearn.multioutput.ClassifierChain` to solve |
| 6 | +a multilabel classification problem. |
| 7 | +
|
| 8 | +The most naive strategy to solve such a task is to independently train a binary |
| 9 | +classifier on each label (i.e. each column of the target variable). At prediction |
| 10 | +time, the ensemble of binary classifiers is used to assemble a multitask prediction. |
| 11 | +
|
| 12 | +This strategy cannot model the relationships between different tasks. The |
| 13 | +:class:`~sklearn.multioutput.ClassifierChain` is a meta-estimator (i.e. an estimator |
| 14 | +taking an inner estimator) that implements a more advanced strategy. The binary |
| 15 | +classifiers are arranged in a chain where the prediction of a classifier in the |
| 16 | +chain is used as a feature for training the next classifier on a new label. These |
| 17 | +additional features allow each chain to exploit correlations among labels. |
| 18 | +
|
| 19 | +The :ref:`Jaccard similarity <jaccard_similarity_score>` score of a chain tends to be |
| 20 | +greater than that of the set of independent base models. |
34 | 21 | """
|
35 | 22 |
|
36 | 23 | # Author: Adam Kleczewski
|
37 | 24 | # License: BSD 3 clause
|
38 | 25 |
|
| 26 | +# %% |
| 27 | +# Loading a dataset |
| 28 | +# ----------------- |
| 29 | +# For this example, we use the `yeast |
| 30 | +# <https://www.openml.org/d/40597>`_ dataset which contains |
| 31 | +# 2,417 data points, each with 103 features and 14 possible labels. Each |
| 32 | +# data point has at least one label. As a baseline we first train a logistic |
| 33 | +# regression classifier for each of the 14 labels. To evaluate the performance of |
| 34 | +# these classifiers we predict on a held-out test set and calculate the |
| 35 | +# Jaccard similarity for each sample. |
| 36 | + |
39 | 37 | import matplotlib.pyplot as plt
|
40 | 38 | import numpy as np
|
41 | 39 |
|
42 | 40 | from sklearn.datasets import fetch_openml
|
43 |
| -from sklearn.linear_model import LogisticRegression |
44 |
| -from sklearn.metrics import jaccard_score |
45 | 41 | from sklearn.model_selection import train_test_split
|
46 |
| -from sklearn.multiclass import OneVsRestClassifier |
47 |
| -from sklearn.multioutput import ClassifierChain |
48 | 42 |
|
49 | 43 | # Load a multi-label dataset from https://www.openml.org/d/40597
|
50 | 44 | X, Y = fetch_openml("yeast", version=4, return_X_y=True, parser="pandas")
|
51 | 45 | Y = Y == "TRUE"
|
52 | 46 | X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)
|
53 | 47 |
|
54 |
| -# Fit an independent logistic regression model for each class using the |
55 |
| -# OneVsRestClassifier wrapper. |
| 48 | +# %% |
| 49 | +# Fit models |
| 50 | +# ---------- |
| 51 | +# We fit :class:`~sklearn.linear_model.LogisticRegression` wrapped by |
| 52 | +# :class:`~sklearn.multiclass.OneVsRestClassifier` and an ensemble of multiple |
| 53 | +# :class:`~sklearn.multioutput.ClassifierChain` instances. |
| 54 | +# |
| 55 | +# LogisticRegression wrapped by OneVsRestClassifier |
| 56 | +# ************************************************** |
| 57 | +# Since by default :class:`~sklearn.linear_model.LogisticRegression` can't |
| 58 | +# handle data with multiple targets, we need to use |
| 59 | +# :class:`~sklearn.multiclass.OneVsRestClassifier`. |
| 60 | +# After fitting the model, we calculate the Jaccard similarity score. |
| 61 | + |
| 62 | +from sklearn.linear_model import LogisticRegression |
| 63 | +from sklearn.metrics import jaccard_score |
| 64 | +from sklearn.multiclass import OneVsRestClassifier |
| 65 | + |
56 | 66 | base_lr = LogisticRegression()
|
57 | 67 | ovr = OneVsRestClassifier(base_lr)
|
58 | 68 | ovr.fit(X_train, Y_train)
|
59 | 69 | Y_pred_ovr = ovr.predict(X_test)
|
60 | 70 | ovr_jaccard_score = jaccard_score(Y_test, Y_pred_ovr, average="samples")
|
61 | 71 |
|
62 |
| -# Fit an ensemble of logistic regression classifier chains and take the |
63 |
| -# take the average prediction of all the chains. |
| 72 | +# %% |
| 73 | +# Chain of binary classifiers |
| 74 | +# *************************** |
| 75 | +# Because the models in each chain are arranged randomly there is significant |
| 76 | +# variation in performance among the chains. Presumably there is an optimal |
| 77 | +# ordering of the classes in a chain that will yield the best performance. |
| 78 | +# However, we do not know that ordering a priori. Instead, we can build a |
| 79 | +# voting ensemble of classifier chains by averaging the predicted probabilities of |
| 80 | +# the chains and applying a threshold of 0.5. The Jaccard similarity score of the |
| 81 | +# ensemble is greater than that of the independent models and tends to exceed |
| 82 | +# the score of each chain in the ensemble (although this is not guaranteed |
| 83 | +# with randomly ordered chains). |
| 84 | + |
| 85 | +from sklearn.multioutput import ClassifierChain |
| 86 | + |
64 | 87 | chains = [ClassifierChain(base_lr, order="random", random_state=i) for i in range(10)]
|
65 | 88 | for chain in chains:
|
66 | 89 | chain.fit(X_train, Y_train)
|
67 | 90 |
|
68 |
| -Y_pred_chains = np.array([chain.predict(X_test) for chain in chains]) |
| 91 | +Y_pred_chains = np.array([chain.predict_proba(X_test) for chain in chains]) |
69 | 92 | chain_jaccard_scores = [
|
70 | 93 | jaccard_score(Y_test, Y_pred_chain >= 0.5, average="samples")
|
71 | 94 | for Y_pred_chain in Y_pred_chains
|
|
76 | 99 | Y_test, Y_pred_ensemble >= 0.5, average="samples"
|
77 | 100 | )
|
78 | 101 |
|
79 |
| -model_scores = [ovr_jaccard_score] + chain_jaccard_scores |
80 |
| -model_scores.append(ensemble_jaccard_score) |
| 102 | +# %% |
| 103 | +# Plot results |
| 104 | +# ------------ |
| 105 | +# Plot the Jaccard similarity scores for the independent model, each of the |
| 106 | +# chains, and the ensemble (note that the vertical axis on this plot does |
| 107 | +# not begin at 0). |
| 108 | + |
| 109 | +model_scores = [ovr_jaccard_score] + chain_jaccard_scores + [ensemble_jaccard_score] |
81 | 110 |
|
82 | 111 | model_names = (
|
83 | 112 | "Independent",
|
|
96 | 125 |
|
97 | 126 | x_pos = np.arange(len(model_names))
|
98 | 127 |
|
99 |
| -# Plot the Jaccard similarity scores for the independent model, each of the |
100 |
| -# chains, and the ensemble (note that the vertical axis on this plot does |
101 |
| -# not begin at 0). |
102 |
| - |
103 | 128 | fig, ax = plt.subplots(figsize=(7, 4))
|
104 | 129 | ax.grid(True)
|
105 | 130 | ax.set_title("Classifier Chain Ensemble Performance Comparison")
|
|
111 | 136 | ax.bar(x_pos, model_scores, alpha=0.5, color=colors)
|
112 | 137 | plt.tight_layout()
|
113 | 138 | plt.show()
|
| 139 | + |
| 140 | +# %% |
| 141 | +# Results interpretation |
| 142 | +# ---------------------- |
| 143 | +# There are three main takeaways from this plot: |
| 144 | +# |
| 145 | +# - Independent model wrapped by :class:`~sklearn.multiclass.OneVsRestClassifier` |
| 146 | +# performs worse than the ensemble of classifier chains and some of individual chains. |
| 147 | +# This is caused by the fact that the logistic regression doesn't model relationship |
| 148 | +# between the labels. |
| 149 | +# - :class:`~sklearn.multioutput.ClassifierChain` takes advantage of correlation |
| 150 | +# among labels but due to random nature of labels ordering, it could yield worse |
| 151 | +# result than an independent model. |
| 152 | +# - An ensemble of chains performs better because it not only captures relationship |
| 153 | +# between labels but also does not make strong assumptions about their correct order. |
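The voting step and the sample-averaged Jaccard score used in this example can be illustrated without scikit-learn. The following is a minimal pure-Python sketch, not the code from the commit: the helper names `vote` and `jaccard_samples` and the toy chain outputs are hypothetical, and scoring a sample with no true and no predicted labels as 0.0 is an assumption of this sketch.

```python
def jaccard_samples(y_true, y_pred):
    """Average, over samples, of |true AND pred| / |true OR pred|."""
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        intersection = sum(1 for t, p in zip(true_row, pred_row) if t and p)
        union = sum(1 for t, p in zip(true_row, pred_row) if t or p)
        # Assumption of this sketch: an empty union counts as a score of 0.0.
        scores.append(intersection / union if union else 0.0)
    return sum(scores) / len(scores)


def vote(chain_outputs, threshold=0.5):
    """Average the per-label outputs of all chains, then apply a threshold."""
    n_chains = len(chain_outputs)
    n_samples = len(chain_outputs[0])
    n_labels = len(chain_outputs[0][0])
    return [
        [
            sum(chain[i][j] for chain in chain_outputs) / n_chains >= threshold
            for j in range(n_labels)
        ]
        for i in range(n_samples)
    ]


# Hypothetical outputs of three chains for two samples and three labels.
chain_outputs = [
    [[0.9, 0.2, 0.6], [0.1, 0.8, 0.4]],
    [[0.7, 0.4, 0.3], [0.3, 0.9, 0.7]],
    [[0.8, 0.1, 0.9], [0.2, 0.6, 0.5]],
]
y_true = [[True, False, True], [False, True, False]]

y_ensemble = vote(chain_outputs)
ensemble_score = jaccard_samples(y_true, y_ensemble)
print(y_ensemble, ensemble_score)
```

In this toy data the second chain alone would miss the third label of the first sample (0.3 is below the threshold), while the averaged output recovers it, which is the effect the ensemble relies on.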