
Commit 00032b0

ArturoAmorQ, jeremiedbb, and glemaitre authored
DOC Use notebook style in plot_lof_outlier_detection.py (#26017)
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
1 parent 500636b commit 00032b0

File tree

1 file changed: +48 −29 lines

examples/neighbors/plot_lof_outlier_detection.py

Lines changed: 48 additions & 29 deletions
@@ -6,56 +6,74 @@
 The Local Outlier Factor (LOF) algorithm is an unsupervised anomaly detection
 method which computes the local density deviation of a given data point with
 respect to its neighbors. It considers as outliers the samples that have a
-substantially lower density than their neighbors. This example shows how to
-use LOF for outlier detection which is the default use case of this estimator
-in scikit-learn. Note that when LOF is used for outlier detection it has no
-predict, decision_function and score_samples methods. See
-:ref:`User Guide <outlier_detection>`: for details on the difference between
-outlier detection and novelty detection and how to use LOF for novelty
-detection.
-
-The number of neighbors considered (parameter n_neighbors) is typically
-set 1) greater than the minimum number of samples a cluster has to contain,
-so that other samples can be local outliers relative to this cluster, and 2)
-smaller than the maximum number of close by samples that can potentially be
-local outliers.
-In practice, such information is generally not available, and taking
-n_neighbors=20 appears to work well in general.
+substantially lower density than their neighbors. This example shows how to use
+LOF for outlier detection which is the default use case of this estimator in
+scikit-learn. Note that when LOF is used for outlier detection it has no
+`predict`, `decision_function` and `score_samples` methods. See the :ref:`User
+Guide <outlier_detection>` for details on the difference between outlier
+detection and novelty detection and how to use LOF for novelty detection.
+
+The number of neighbors considered (parameter `n_neighbors`) is typically set 1)
+greater than the minimum number of samples a cluster has to contain, so that
+other samples can be local outliers relative to this cluster, and 2) smaller
+than the maximum number of close by samples that can potentially be local
+outliers. In practice, such information is generally not available, and taking
+`n_neighbors=20` appears to work well in general.
 
 """
 
+# %%
+# Generate data with outliers
+# ---------------------------
+
+# %%
 import numpy as np
-import matplotlib.pyplot as plt
-from sklearn.neighbors import LocalOutlierFactor
 
 np.random.seed(42)
 
-# Generate train data
 X_inliers = 0.3 * np.random.randn(100, 2)
 X_inliers = np.r_[X_inliers + 2, X_inliers - 2]
-
-# Generate some outliers
 X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
 X = np.r_[X_inliers, X_outliers]
 
 n_outliers = len(X_outliers)
 ground_truth = np.ones(len(X), dtype=int)
 ground_truth[-n_outliers:] = -1
 
-# fit the model for outlier detection (default)
+# %%
+# Fit the model for outlier detection (default)
+# ---------------------------------------------
+#
+# Use `fit_predict` to compute the predicted labels of the training samples
+# (when LOF is used for outlier detection, the estimator has no `predict`,
+# `decision_function` and `score_samples` methods).
+
+from sklearn.neighbors import LocalOutlierFactor
+
 clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
-# use fit_predict to compute the predicted labels of the training samples
-# (when LOF is used for outlier detection, the estimator has no predict,
-# decision_function and score_samples methods).
 y_pred = clf.fit_predict(X)
 n_errors = (y_pred != ground_truth).sum()
 X_scores = clf.negative_outlier_factor_
 
-plt.title("Local Outlier Factor (LOF)")
+# %%
+# Plot results
+# ------------
+
+# %%
+import matplotlib.pyplot as plt
+from matplotlib.legend_handler import HandlerPathCollection
+
+
+def update_legend_marker_size(handle, orig):
+    "Customize size of the legend marker"
+    handle.update_from(orig)
+    handle.set_sizes([20])
+
+
 plt.scatter(X[:, 0], X[:, 1], color="k", s=3.0, label="Data points")
 # plot circles with radius proportional to the outlier scores
 radius = (X_scores.max() - X_scores) / (X_scores.max() - X_scores.min())
-plt.scatter(
+scatter = plt.scatter(
     X[:, 0],
     X[:, 1],
     s=1000 * radius,
@@ -67,7 +85,8 @@
 plt.xlim((-5, 5))
 plt.ylim((-5, 5))
 plt.xlabel("prediction errors: %d" % (n_errors))
-legend = plt.legend(loc="upper left")
-legend.legendHandles[0]._sizes = [10]
-legend.legendHandles[1]._sizes = [20]
+plt.legend(
+    handler_map={scatter: HandlerPathCollection(update_func=update_legend_marker_size)}
+)
+plt.title("Local Outlier Factor (LOF)")
 plt.show()
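For reference, the detection workflow in the changed file can be run on its own without the plotting code. This is a minimal sketch that mirrors the example's data generation and `fit_predict` call (same seed, cluster offsets, and `contamination` value as in the diff above); it is an illustration, not part of the commit:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

np.random.seed(42)

# Two inlier clusters around (2, 2) and (-2, -2), plus 20 uniform outliers.
X_inliers = 0.3 * np.random.randn(100, 2)
X_inliers = np.r_[X_inliers + 2, X_inliers - 2]
X_outliers = np.random.uniform(low=-4, high=4, size=(20, 2))
X = np.r_[X_inliers, X_outliers]

# Label inliers as 1 and the trailing outliers as -1.
ground_truth = np.ones(len(X), dtype=int)
ground_truth[-len(X_outliers):] = -1

# fit_predict returns 1 for predicted inliers and -1 for predicted outliers;
# negative_outlier_factor_ holds the (negated) LOF score of each training sample.
clf = LocalOutlierFactor(n_neighbors=20, contamination=0.1)
y_pred = clf.fit_predict(X)
n_errors = (y_pred != ground_truth).sum()
X_scores = clf.negative_outlier_factor_
```

Note that calling `predict` or `score_samples` on `clf` here would raise an error, since `novelty` is left at its default of `False`.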
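The legend change in this commit replaces direct mutation of the private `legend.legendHandles[i]._sizes` attribute with the public `handler_map` mechanism. A self-contained sketch of that technique, using made-up scatter data rather than the example's LOF scores:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerPathCollection


def update_legend_marker_size(handle, orig):
    """Copy the original artist's style onto the legend handle, then pin its size."""
    handle.update_from(orig)
    handle.set_sizes([20])


fig, ax = plt.subplots()
# Markers of wildly different sizes; the legend entry should not inherit them.
scatter = ax.scatter([0, 1, 2], [0, 1, 0], s=[500, 900, 300], label="scores")
legend = ax.legend(
    handler_map={scatter: HandlerPathCollection(update_func=update_legend_marker_size)}
)
```

The `update_func` is invoked while the legend is built, so the legend marker keeps the scatter's color and shape but gets a fixed, readable size regardless of the per-point `s` values.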

0 commit comments

Comments
 (0)