Commit c7b6cb4

DOC: Rework and factorize quickstart examples (#700)
1 parent e1936d9 commit c7b6cb4

9 files changed, +199 -172 lines changed

doc/Makefile

Lines changed: 0 additions & 1 deletion
@@ -64,7 +64,6 @@ html:
 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
 	@echo
 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
-	cp _build/html/_images/sphx_glr_plot_toy_model_001.png images/quickstart_1.png
 
 dirhtml:
 	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml

doc/images/quickstart_1.png

-66.1 KB
Binary file not shown.

doc/images/quickstart_2.png

-190 KB
Binary file not shown.

doc/index_classification.rst

Lines changed: 2 additions & 1 deletion
@@ -2,9 +2,10 @@ Prediction sets (classification)
 ================================
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
 
    choosing_the_right_algorithm_classification
+   examples_classification/1-quickstart/plot_quickstart_classification
    examples_classification/index
    theoretical_description_classification
    index_binary_classification

doc/index_regression.rst

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Prediction intervals (regression)
 =================================
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
 
    choosing_the_right_algorithm_regression
    examples_regression/1-quickstart/plot_toy_model

doc/notebooks_classification.rst

Lines changed: 0 additions & 10 deletions
This file was deleted.

doc/quick_start.rst

Lines changed: 3 additions & 134 deletions
@@ -34,141 +34,10 @@ To install directly from the github repository :
 =====================
 
 Let us start with a basic regression problem.
-Here, we generate one-dimensional noisy data that we fit with a linear model.
+Here, we generate one-dimensional noisy data that we fit with a MLPRegressor: `Use MAPIE to plot prediction intervals <https://mapie.readthedocs.io/en/stable/examples_regression/1-quickstart/plot_toy_model.html>`_
 
-..
-   Comment to developers: the following piece of code is heavily inspired by `examples/regression/1-quickstart/plot_toy_model.py`.
-   When updating it, please replicate the changes to this other file.
 
-.. testcode::
-
-    import numpy as np
-    from sklearn.datasets import make_regression
-    from sklearn.model_selection import train_test_split
-
-    X, y = make_regression(n_samples=500, n_features=1, noise=20)
-
-    X_train, X_temp, y_train, y_temp = train_test_split(X, y)
-    X_test, X_conformalize, y_test, y_conformalize = train_test_split(X_temp, y_temp)
-
-    # We follow a sequential ``fit``, ``conformalize``, and ``predict`` process.
-    # We set the confidence level to estimate prediction intervals at approximately one and two
-    # standard deviation from the mean.
-
-    from mapie.regression import SplitConformalRegressor
-
-    mapie_regressor = SplitConformalRegressor(confidence_level=[0.95, 0.68], prefit=False)
-    mapie_regressor.fit(X_train, y_train)
-    mapie_regressor.conformalize(X_conformalize, y_conformalize)
-
-    y_pred, y_pred_intervals = mapie_regressor.predict_interval(X_test)
-
-    # MAPIE's ``predict`` method returns point predictions as a ``np.ndarray`` of shape ``(n_samples)``.
-    # The ``predict_set`` method returns prediction intervals as a ``np.ndarray`` of shape ``(n_samples, 2, 2)``
-    # giving the lower and upper bounds of the intervals for each confidence level.
-
-    # You can compute the coverage of your prediction intervals.
-
-    from mapie.metrics.regression import regression_coverage_score
-
-    coverage_scores = regression_coverage_score(y_test, y_pred_intervals)
-
-    # The estimated prediction intervals can then be plotted as follows.
-
-    from matplotlib import pyplot as plt
-
-    confidence_level = [0.95, 0.68]
-
-    plt.xlabel("x")
-    plt.ylabel("y")
-    plt.scatter(X, y, alpha=0.3)
-    plt.plot(X_test, y_pred, color="C1")
-    order = np.argsort(X_test[:, 0])
-    plt.plot(X_test[order], y_pred_intervals[order, 0], color="C1", ls="--")
-    plt.plot(X_test[order], y_pred_intervals[order, 1], color="C1", ls="--")
-    plt.fill_between(
-        X_test[order].ravel(),
-        y_pred_intervals[order][:, 0, 0].ravel(),
-        y_pred_intervals[order][:, 1, 0].ravel(),
-        alpha=0.2
-    )
-    plt.title(
-        f"Effective coverage for "
-        f"confidence_level={confidence_level[0]:.2f}: {coverage_scores[0]:.3f}\n"
-        f"Effective coverage for "
-        f"confidence_level={confidence_level[1]:.2f}: {coverage_scores[1]:.3f}"
-    )
-    plt.show()
-
-.. image:: images/quickstart_1.png
-   :width: 400
-   :align: center
-
-The title of the plot compares the target coverages with the effective coverages.
-The target coverage, or the confidence level, is the fraction of true labels lying in the
-prediction intervals that we aim to obtain for a given dataset.
-It is given by the ``confidence_level`` parameter defined in ``SplitConformalRegressor``, here equal to ``0.95`` and ``0.68``.
-The effective coverage is the actual fraction of true labels lying in the prediction intervals.
-
-3. Run _MapieClassifier
+3. Classification
 =======================
 
-Similarly, it's possible to do the same for a basic classification problem.
-
-.. code:: python
-
-    import numpy as np
-    from sklearn.linear_model import LogisticRegression
-    from sklearn.datasets import make_blobs
-    from sklearn.model_selection import train_test_split
-
-    classifier = LogisticRegression()
-    X, y = make_blobs(n_samples=500, n_features=2, centers=3)
-    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)
-
-.. code:: python
-
-    from mapie.classification import _MapieClassifier
-
-    mapie_classifier = _MapieClassifier(estimator=classifier, method='score', cv=5)
-    mapie_classifier = mapie_classifier.fit(X_train, y_train)
-
-    alpha = [0.05, 0.32]
-    y_pred, y_pis = mapie_classifier.predict(X_test, alpha=alpha)
-
-.. code:: python
-
-    from mapie.metrics import classification_coverage_score
-
-    coverage_scores = classification_coverage_score(y_test, y_pis)
-
-.. code:: python
-
-    from matplotlib import pyplot as plt
-
-    x_min, x_max = np.min(X[:, 0]), np.max(X[:, 0])
-    y_min, y_max = np.min(X[:, 1]), np.max(X[:, 1])
-    step = 0.1
-
-    xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
-    X_test_mesh = np.stack([xx.ravel(), yy.ravel()], axis=1)
-
-    y_pis = mapie_classifier.predict(X_test_mesh, alpha=alpha)[1][:,:,0]
-
-    plt.scatter(
-        X_test_mesh[:, 0], X_test_mesh[:, 1],
-        c=np.ravel_multi_index(y_pis.T, (2,2,2)),
-        marker='.', s=10, alpha=0.2
-    )
-    plt.scatter(X[:, 0], X[:, 1], c=y, cmap='tab20c')
-    plt.xlabel("x1")
-    plt.ylabel("x2")
-    plt.title(
-        f"Target and effective coverages for "
-        f"alpha={alpha[0]:.2f}: ({1-alpha[0]:.3f}, {coverage_scores[0]:.3f})"
-    )
-    plt.show()
-
-.. image:: images/quickstart_2.png
-   :width: 400
-   :align: center
+Similarly, it's possible to do the same for a basic classification problem: `Use MAPIE to plot prediction sets <https://mapie.readthedocs.io/en/stable/examples_classification/1-quickstart/plot_quickstart_classification.html>`_
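The regression snippet removed from `quick_start.rst` follows a `fit`, `conformalize`, `predict_interval` sequence with `SplitConformalRegressor` and checks the effective coverage for confidence levels 0.95 and 0.68. As a rough illustration of what that sequence computes, here is a from-scratch split conformal sketch in NumPy and scikit-learn. It is not MAPIE's API or necessarily its exact implementation: it assumes absolute residuals as the nonconformity score, a `LinearRegression` base model, fixed `random_state` values, and NumPy 1.22 or later (for the `method=` argument of `np.quantile`); `split_conformal_interval` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def split_conformal_interval(model, X_conf, y_conf, X_new, confidence_level):
    """Return point predictions and (n_samples, 2) [lower, upper] bounds.

    Hypothetical helper: absolute residuals on the conformalization set
    serve as nonconformity scores (a textbook split conformal recipe).
    """
    scores = np.abs(y_conf - model.predict(X_conf))
    n = len(scores)
    # Finite-sample corrected quantile level, capped at 1.
    q_level = min(1.0, np.ceil((n + 1) * confidence_level) / n)
    q_hat = np.quantile(scores, q_level, method="higher")
    y_pred = model.predict(X_new)
    # Symmetric interval of half-width q_hat around each point prediction.
    bounds = np.column_stack([y_pred - q_hat, y_pred + q_hat])
    return y_pred, bounds


X, y = make_regression(n_samples=500, n_features=1, noise=20, random_state=0)
# 250 training, 125 conformalization and 125 test samples.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.5, random_state=0)
X_conf, X_test, y_conf, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=0)

model = LinearRegression().fit(X_train, y_train)  # "fit" step on training data only

coverages = {}
for confidence_level in (0.95, 0.68):
    # "conformalize" + "predict_interval" steps.
    y_pred, bounds = split_conformal_interval(model, X_conf, y_conf, X_test, confidence_level)
    covered = (y_test >= bounds[:, 0]) & (y_test <= bounds[:, 1])
    coverages[confidence_level] = covered.mean()
    print(f"confidence_level={confidence_level:.2f}: "
          f"effective coverage={coverages[confidence_level]:.3f}")
```

Because both confidence levels reuse the same scores, the 0.95 interval always contains the 0.68 one, so its effective coverage is never lower. The printed values depend on the random split and are only expected to be close to the targets, not equal to them.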
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+"""
+======================================================
+Use MAPIE to plot prediction sets
+======================================================
+
+In this example, we explain how to use MAPIE on a basic classification setting.
+"""
+
+##################################################################################
+# We will use MAPIE to estimate prediction sets on a two-dimensional dataset with
+# three labels.
+
+import numpy as np
+from sklearn.neighbors import KNeighborsClassifier
+from sklearn.datasets import make_blobs
+from matplotlib import pyplot as plt
+from matplotlib.colors import ListedColormap
+from mapie.utils import train_conformalize_test_split
+from mapie.classification import SplitConformalClassifier
+from mapie.metrics.classification import classification_coverage_score
+
+np.random.seed(42)
+
+##############################################################################
+# Firstly, let us create our dataset:
+
+X, y = make_blobs(n_samples=500, n_features=2, centers=3, cluster_std=3.4)
+
+(X_train, X_conformalize, X_test,
+ y_train, y_conformalize, y_test) = train_conformalize_test_split(
+    X, y, train_size=0.4, conformalize_size=0.4, test_size=0.2
+)
+
+##############################################################################
+# We fit our training data with a KNN estimator.
+# Then, we initialize a :class:`~mapie.classification.SplitConformalClassifier`
+# using our estimator, indicating that it has already been fitted with
+# `prefit=True`.
+# Lastly, we compute the prediction sets with the desired confidence level using the
+# ``conformalize`` and ``predict_set`` methods.
+
+classifier = KNeighborsClassifier(n_neighbors=10)
+classifier.fit(X_train, y_train)
+
+confidence_level = 0.95
+mapie_classifier = SplitConformalClassifier(
+    estimator=classifier, confidence_level=confidence_level, prefit=True
+)
+mapie_classifier.conformalize(X_conformalize, y_conformalize)
+y_pred, y_pred_set = mapie_classifier.predict_set(X_test)
+
+##############################################################################
+# ``y_pred`` represents the point predictions as a ``np.ndarray`` of shape
+# ``(n_samples)``.
+# ``y_pred_set`` corresponds to the prediction sets as a ``np.ndarray`` of shape
+# ``(n_samples, 3, 1)``. This array contains only boolean values: ``True`` if the label
+# is included in the prediction set, and ``False`` if not.
+
+##############################################################################
+# Finally, we can easily compute the coverage score (i.e., the proportion of times the
+# true labels fall within the predicted sets).
+
+coverage_score = classification_coverage_score(y_test, y_pred_set)
+print(f"For a confidence level of {confidence_level:.2f}, "
+      f"the target coverage is {confidence_level:.3f}, "
+      f"and the effective coverage is {coverage_score[0]:.3f}.")
+
+##############################################################################
+# In this example, the effective coverage is slightly above the target coverage
+# (i.e., 0.95), indicating that the confidence level we set has been reached.
+# Therefore, we can confirm that the prediction sets effectively contain the
+# true label more than 95% of the time.
+
+##############################################################################
+# Now, let us plot the confidence regions across the plane.
+# This plot will give us insights about what the prediction set looks like for each
+# point.
+
+x_min, x_max = np.min(X[:, 0]), np.max(X[:, 0])
+y_min, y_max = np.min(X[:, 1]), np.max(X[:, 1])
+step = 0.1
+
+xx, yy = np.meshgrid(np.arange(x_min, x_max, step), np.arange(y_min, y_max, step))
+X_test_mesh = np.stack([xx.ravel(), yy.ravel()], axis=1)
+
+y_pred_set = mapie_classifier.predict_set(X_test_mesh)[1][:, :, 0]
+
+cmap_back = ListedColormap(
+    [(0.7803921568627451, 0.9137254901960784, 0.7529411764705882),
+     (0.9921568627450981, 0.8156862745098039, 0.6352941176470588),
+     (0.6196078431372549, 0.6039215686274509, 0.7843137254901961),
+     (0.7764705882352941, 0.8588235294117647, 0.9372549019607843),
+     (0.6196078431372549, 0.6039215686274509, 0.7843137254901961),
+     (0.6196078431372549, 0.6039215686274509, 0.7843137254901961)]
+)
+cmap_dots = ListedColormap(
+    [(0.19215686274509805, 0.5098039215686274, 0.7411764705882353),
+     (0.9019607843137255, 0.3333333333333333, 0.050980392156862744),
+     (0.19215686274509805, 0.6392156862745098, 0.32941176470588235)]
+)
+
+plt.scatter(
+    X_test_mesh[:, 0], X_test_mesh[:, 1],
+    c=np.ravel_multi_index(y_pred_set.T, (2, 2, 2)),
+    cmap=cmap_back, marker='.', s=10
+)
+plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cmap_dots)
+plt.xlabel("x1")
+plt.ylabel("x2")
+plt.title("Confidence regions with KNN")
+plt.show()
+
+##############################################################################
+# On the plot above, the dots represent the samples from our dataset, with their
+# color indicating their respective label.
+# The blue, orange and green zones correspond to prediction sets
+# containing only the blue label, orange label and green label respectively.
+# The purple zone represents areas where the prediction sets contain more than one
+# label, indicating that the model is uncertain.
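The new example relies on `conformalize` and `predict_set` to build boolean prediction sets, and on `np.ravel_multi_index(..., (2, 2, 2))` to turn each three-label set into a single color code. The sketch below illustrates both ideas from scratch; it is not MAPIE's implementation. It assumes a "LAC"-style score (one minus the predicted probability of the true label), labels `0..n_classes-1` that match the columns of `predict_proba`, a plain scikit-learn 40/40/20 split instead of `train_conformalize_test_split`, and NumPy 1.22 or later; `lac_prediction_sets` is a hypothetical helper.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def lac_prediction_sets(clf, X_conf, y_conf, X_new, confidence_level):
    """Return boolean prediction sets of shape (n_samples, n_classes).

    Hypothetical helper: score = 1 - probability of the true label.
    Assumes labels are 0..n_classes-1, matching predict_proba columns.
    """
    proba_conf = clf.predict_proba(X_conf)
    scores = 1.0 - proba_conf[np.arange(len(y_conf)), y_conf]
    n = len(scores)
    # Finite-sample corrected quantile level, capped at 1.
    q_level = min(1.0, np.ceil((n + 1) * confidence_level) / n)
    q_hat = np.quantile(scores, q_level, method="higher")
    # A label enters the set when its score does not exceed q_hat.
    return (1.0 - clf.predict_proba(X_new)) <= q_hat


X, y = make_blobs(n_samples=500, n_features=2, centers=3, cluster_std=3.4, random_state=42)
# 200 training, 200 conformalization and 100 test samples (40% / 40% / 20%).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, train_size=200, random_state=42)
X_conf, X_test, y_conf, y_test = train_test_split(X_rest, y_rest, test_size=100, random_state=42)

clf = KNeighborsClassifier(n_neighbors=10).fit(X_train, y_train)
sets = lac_prediction_sets(clf, X_conf, y_conf, X_test, confidence_level=0.95)

# Coverage: fraction of test points whose true label is inside their set.
coverage = sets[np.arange(len(y_test)), y_test].mean()
print(f"effective coverage: {coverage:.3f}, mean set size: {sets.sum(axis=1).mean():.2f}")

# Pack each 3-label boolean set into one integer: 4*in_0 + 2*in_1 + 1*in_2.
codes = np.ravel_multi_index(sets.T.astype(int), (2, 2, 2))

# Worked example of the encoding on hand-written sets.
demo = np.array([[True, False, False],
                 [False, True, False],
                 [False, False, True],
                 [True, False, True],
                 [True, True, True]])
demo_codes = np.ravel_multi_index(demo.T.astype(int), (2, 2, 2))
print(demo_codes)  # [4 2 1 5 7]
```

With this encoding, 4, 2 and 1 are singleton sets (label 0, 1 or 2), 0 is an empty set, and any other code is a set with more than one label. `plt.scatter` normalizes the codes onto the colormap using their observed minimum and maximum, so the color assigned to each code depends on which codes actually appear.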
