DOC Rework outlier detection estimators example #25878


Merged
merged 57 commits into scikit-learn:main from ArturoAmorQ:outlier_benchmarks
Sep 11, 2023

Conversation

ArturoAmorQ
Member

Reference Issues/PRs

What does this implement/fix? Explain your changes.

In preparation for addressing this comment, the current state of the Evaluation of outlier detection estimators example can benefit from a "tutorialization" that explains the steps and results.

To that end, this PR:

  • replaces the use of LabelBinarizer with a proper OneHotEncoder where needed (see the sketch after this list);
  • reduces the number of evaluated datasets to keep the message clear and use fewer resources;
  • splits the pre-processing of the different datasets into individually tuned pipelines (the current state only shows the performance of default parameters);
  • adds an ablation section showing the impact of pre-processing on a local outlier factor model.
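
For reference, a minimal sketch of the encoder swap described in the first bullet, assuming categorical columns such as those of the KDD Cup 99 data used in the example:

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder is the appropriate tool for categorical *feature* columns;
# LabelBinarizer is meant for target labels and was being misused for this.
preprocessor = ColumnTransformer(
    [("categorical", OneHotEncoder(), ["protocol_type", "service"])],
    remainder="passthrough",
)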

Any other comments?

The ablation section is a nice-to-have but makes the example take longer to run. Hopefully the upcoming speed-ups in neighbors computations will solve this issue.

@ArturoAmorQ
Member Author

This PR is ready for review now.

Member

@betatim betatim left a comment


I looked at the prose and left some small comments on it; I haven't thought about the code.

Overall, the prose reads nicely!

@glemaitre glemaitre self-requested a review May 25, 2023 08:41
@ArturoAmorQ
Member Author

For info, this PR is failing on the CI with the following traceback:

File "/home/circleci/project/examples/miscellaneous/plot_outlier_detection_bench.py", line 218, in <module>
    y_pred[model_name]["ames_housing"] = fit_predict(
  File "/home/circleci/project/examples/miscellaneous/plot_outlier_detection_bench.py", line 84, in fit_predict
    y_pred = model.fit(X).decision_function(X)
  File "/home/circleci/project/sklearn/pipeline.py", line 420, in fit
    self._final_estimator.fit(Xt, y, **fit_params_last_step)
  File "/home/circleci/project/sklearn/ensemble/_iforest.py", line 291, in fit
    X = self._validate_data(X, accept_sparse=["csc"], dtype=tree_dtype)
  File "/home/circleci/project/sklearn/base.py", line 594, in _validate_data
    out = check_array(X, input_name="X", **check_params)
  File "/home/circleci/project/sklearn/utils/validation.py", line 964, in check_array
    _assert_all_finite(
  File "/home/circleci/project/sklearn/utils/validation.py", line 129, in _assert_all_finite
    _assert_all_finite_element_wise(
  File "/home/circleci/project/sklearn/utils/validation.py", line 178, in _assert_all_finite_element_wise
    raise ValueError(msg_err)
ValueError: Input X contains NaN.
IsolationForest does not accept missing values encoded as NaN natively. For supervised learning, you might want to consider sklearn.ensemble.HistGradientBoostingClassifier and Regressor which accept missing values encoded as NaNs natively. Alternatively, it is possible to preprocess the data, for instance by using an imputer transformer in a pipeline or drop samples with missing values. See https://scikit-learn.org/stable/modules/impute.html You can find a list of all estimators that handle NaN values at the following page: https://scikit-learn.org/stable/modules/impute.html#estimators-that-handle-nan-values

The ames_housing dataset does not contain any missing values, and I don't get the traceback when running locally, even after merging main. Could there be an issue with the datasets.fetch_openml function on the CI only?

from time import perf_counter


def fit_predict(X, model_name, expected_anomaly_frac, categorical_columns=()):
Member


Not a big fan of the empty tuple. I would find it more natural with:

def fit_predict(..., categorical_columns=None):
    categorical_columns = [] if categorical_columns is None else categorical_columns

Member


Since expected_anomaly_frac is only used for LOF, I would expect it to have a default of None.
Also, I would use expected_anomaly_fraction since this is only three letters more.
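
A minimal sketch of the revised signature combining both suggestions (the body is elided):

def fit_predict(
    X, model_name, expected_anomaly_fraction=None, categorical_columns=None
):
    categorical_columns = [] if categorical_columns is None else categorical_columns
    ...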

model.fit(X)
y_pred = model[-1].negative_outlier_factor_

if model_name == "IForest":
Member


It should be an elif here.

Comment on lines 75 to 83
ordinal_encoder = OrdinalEncoder(
handle_unknown="use_encoded_value", unknown_value=-1
)
iforest = IsolationForest(random_state=rng)
preprocessor = ColumnTransformer(
[("categorical", ordinal_encoder, categorical_columns)],
remainder="passthrough",
)
model = make_pipeline(preprocessor, iforest)
Member


Suggested change
ordinal_encoder = OrdinalEncoder(
handle_unknown="use_encoded_value", unknown_value=-1
)
iforest = IsolationForest(random_state=rng)
preprocessor = ColumnTransformer(
[("categorical", ordinal_encoder, categorical_columns)],
remainder="passthrough",
)
model = make_pipeline(preprocessor, iforest)
ordinal_encoder = OrdinalEncoder(
handle_unknown="use_encoded_value", unknown_value=-1
)
preprocessor = ColumnTransformer(
[("categorical", ordinal_encoder, categorical_columns)],
remainder="passthrough",
)
model = make_pipeline(preprocessor, IsolationForest(random_state=rng))

ordinal_encoder = OrdinalEncoder(
handle_unknown="use_encoded_value", unknown_value=-1
)
iforest = IsolationForest(random_state=rng)
Member


Using the global rng is not a good practice here. We should pass the random_state as an argument.
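
A minimal sketch of what that could look like, with a hypothetical helper name:

from sklearn.compose import ColumnTransformer
from sklearn.ensemble import IsolationForest
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OrdinalEncoder


def make_iforest_pipeline(categorical_columns, random_state=None):
    # Passing the seed explicitly keeps results independent of unrelated
    # draws from a shared global rng.
    ordinal_encoder = OrdinalEncoder(
        handle_unknown="use_encoded_value", unknown_value=-1
    )
    preprocessor = ColumnTransformer(
        [("categorical", ordinal_encoder, categorical_columns)],
        remainder="passthrough",
    )
    return make_pipeline(preprocessor, IsolationForest(random_state=random_state))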

from time import perf_counter


def fit_predict(X, model_name, expected_anomaly_frac, categorical_columns=()):
Member


Reading a bit more of the example, I would decouple the creation of the estimator from the actual fit_predict:

# %%
# ... text about the desired preprocessing and modelling
from sklearn.neighbors import LocalOutlierFactor
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import (
    OneHotEncoder,
    OrdinalEncoder,
    RobustScaler,
)
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline


def make_estimator(name, categorical_columns=None, **kwargs):
    """Create an outlier detection estimator based on its name."""
    if name == "LOF":
        outlier_detector = LocalOutlierFactor(**kwargs)
        if categorical_columns is None:
            preprocessor = RobustScaler()
        else:
            preprocessor = ColumnTransformer(
                transformers=[("categorical", OneHotEncoder(), categorical_columns)],
                remainder=RobustScaler(),
            )
    else:  # name == "IForest"
        outlier_detector = IsolationForest(**kwargs)
        if categorical_columns is None:
            preprocessor = None
        else:
            ordinal_encoder = OrdinalEncoder(
                handle_unknown="use_encoded_value", unknown_value=-1
            )
            preprocessor = ColumnTransformer(
                transformers=[
                    ("categorical", ordinal_encoder, categorical_columns),
                ],
                remainder="passthrough",
            )

    return make_pipeline(preprocessor, outlier_detector)


# %%
# ... text about the `fit_predict`
# %%
from time import perf_counter


def fit_predict(estimator, X):
    tic = perf_counter()
    if estimator[-1].__class__.__name__ == "LocalOutlierFactor":
        estimator.fit(X)
        y_pred = estimator[-1].negative_outlier_factor_
    else:  # "IsolationForest"
        y_pred = estimator.fit(X).decision_function(X)
    toc = perf_counter()
    print(f"Duration for {estimator[-1].__class__.__name__}: {toc - tic:.2f} s")
    return y_pred
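
For completeness, a hypothetical usage of the two helpers above on synthetic data:

from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, random_state=0)
model = make_estimator("LOF", n_neighbors=20)
y_pred = fit_predict(model, X)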

Member


With the code above, we show how to define n_neighbors each time, but we can also pass random_state when requesting an isolation forest.

Member Author


As LocalOutlierFactor and IsolationForest don't share kwargs, the suggested make_estimator function would only work inside if-statements, which adds more boilerplate code. The idea was to have a single function that handles both cases to avoid this.

Member Author


I could still pass kwargs to LocalOutlierFactor and set the random_state directly in the construction of the IsolationForest, though I'm aware this is not good practice. WDYT?
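
A sketch of that alternative, purely illustrative:

from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor


def make_estimator(name, **lof_kwargs):
    # Forward kwargs to LOF only; the IsolationForest seed is hard-coded,
    # which is the practice acknowledged above as not ideal.
    if name == "LOF":
        return LocalOutlierFactor(**lof_kwargs)
    return IsolationForest(random_state=42)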

@glemaitre
Member

glemaitre commented May 25, 2023

Could there be an issue with the datasets.fetch_openml function on the CI only?

Since doc_min_dependencies does not fail, I would think this is a change of behavior induced by a newer version of pandas.

Edit: I can indeed reproduce the error with pandas 2.0.1 while the code was working with pandas 1.5.1.

@glemaitre
Member

Here is the difference:

pandas 2.0.1:

<class 'pandas.core.frame.DataFrame'>
Index: 2714 entries, 0 to 2929
Data columns (total 79 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   MS_SubClass         2714 non-null   category
 1   MS_Zoning           2714 non-null   category
 2   Lot_Frontage        2714 non-null   int64   
 3   Street              2714 non-null   category
 4   Alley               2714 non-null   category
 5   Lot_Shape           2714 non-null   category
 6   Land_Contour        2714 non-null   category
 7   Utilities           2714 non-null   category
 8   Lot_Config          2714 non-null   category
 9   Land_Slope          2714 non-null   category
 10  Neighborhood        2714 non-null   category
 11  Condition_1         2714 non-null   category
 12  Condition_2         2714 non-null   category
 13  Bldg_Type           2714 non-null   category
 14  House_Style         2714 non-null   category
 15  Overall_Qual        2714 non-null   category
 16  Overall_Cond        2714 non-null   category
 17  Year_Built          2714 non-null   int64   
 18  Year_Remod_Add      2714 non-null   int64   
 19  Roof_Style          2714 non-null   category
 20  Roof_Matl           2714 non-null   category
 21  Exterior_1st        2714 non-null   category
 22  Exterior_2nd        2714 non-null   category
 23  Mas_Vnr_Type        1038 non-null   category
 24  Mas_Vnr_Area        2714 non-null   int64   
 25  Exter_Qual          2714 non-null   category
 26  Exter_Cond          2714 non-null   category
 27  Foundation          2714 non-null   category
 28  Bsmt_Qual           2714 non-null   category
 29  Bsmt_Cond           2714 non-null   category
 30  Bsmt_Exposure       2714 non-null   category
 31  BsmtFin_Type_1      2714 non-null   category
 32  BsmtFin_SF_1        2714 non-null   int64   
 33  BsmtFin_Type_2      2714 non-null   category
 34  BsmtFin_SF_2        2714 non-null   int64   
 35  Bsmt_Unf_SF         2714 non-null   int64   
 36  Total_Bsmt_SF       2714 non-null   int64   
 37  Heating             2714 non-null   category
 38  Heating_QC          2714 non-null   category
 39  Central_Air         2714 non-null   category
 40  Electrical          2714 non-null   category
 41  First_Flr_SF        2714 non-null   int64   
 42  Second_Flr_SF       2714 non-null   int64   
 43  Low_Qual_Fin_SF     2714 non-null   int64   
 44  Gr_Liv_Area         2714 non-null   int64   
 45  Bsmt_Full_Bath      2714 non-null   int64   
 46  Bsmt_Half_Bath      2714 non-null   int64   
 47  Full_Bath           2714 non-null   int64   
 48  Half_Bath           2714 non-null   int64   
 49  Bedroom_AbvGr       2714 non-null   int64   
 50  Kitchen_AbvGr       2714 non-null   int64   
 51  Kitchen_Qual        2714 non-null   category
 52  TotRms_AbvGrd       2714 non-null   int64   
 53  Functional          2714 non-null   category
 54  Fireplaces          2714 non-null   int64   
 55  Fireplace_Qu        2714 non-null   category
 56  Garage_Type         2714 non-null   category
 57  Garage_Finish       2714 non-null   category
 58  Garage_Cars         2714 non-null   int64   
 59  Garage_Area         2714 non-null   int64   
 60  Garage_Qual         2714 non-null   category
 61  Garage_Cond         2714 non-null   category
 62  Paved_Drive         2714 non-null   category
 63  Wood_Deck_SF        2714 non-null   int64   
 64  Open_Porch_SF       2714 non-null   int64   
 65  Enclosed_Porch      2714 non-null   int64   
 66  Three_season_porch  2714 non-null   int64   
 67  Screen_Porch        2714 non-null   int64   
 68  Pool_Area           2714 non-null   int64   
 69  Pool_QC             2714 non-null   category
 70  Fence               2714 non-null   category
 71  Misc_Feature        106 non-null    category
 72  Misc_Val            2714 non-null   int64   
 73  Mo_Sold             2714 non-null   int64   
 74  Year_Sold           2714 non-null   int64   
 75  Sale_Type           2714 non-null   category
 76  Sale_Condition      2714 non-null   category
 77  Longitude           2714 non-null   float64 
 78  Latitude            2714 non-null   float64 
dtypes: category(46), float64(2), int64(31)
memory usage: 856.1 KB

pandas 1.5.3

<class 'pandas.core.frame.DataFrame'>
Int64Index: 2714 entries, 0 to 2929
Data columns (total 79 columns):
 #   Column              Non-Null Count  Dtype   
---  ------              --------------  -----   
 0   MS_SubClass         2714 non-null   category
 1   MS_Zoning           2714 non-null   category
 2   Lot_Frontage        2714 non-null   int64   
 3   Street              2714 non-null   category
 4   Alley               2714 non-null   category
 5   Lot_Shape           2714 non-null   category
 6   Land_Contour        2714 non-null   category
 7   Utilities           2714 non-null   category
 8   Lot_Config          2714 non-null   category
 9   Land_Slope          2714 non-null   category
 10  Neighborhood        2714 non-null   category
 11  Condition_1         2714 non-null   category
 12  Condition_2         2714 non-null   category
 13  Bldg_Type           2714 non-null   category
 14  House_Style         2714 non-null   category
 15  Overall_Qual        2714 non-null   category
 16  Overall_Cond        2714 non-null   category
 17  Year_Built          2714 non-null   int64   
 18  Year_Remod_Add      2714 non-null   int64   
 19  Roof_Style          2714 non-null   category
 20  Roof_Matl           2714 non-null   category
 21  Exterior_1st        2714 non-null   category
 22  Exterior_2nd        2714 non-null   category
 23  Mas_Vnr_Type        2714 non-null   category
 24  Mas_Vnr_Area        2714 non-null   int64   
 25  Exter_Qual          2714 non-null   category
 26  Exter_Cond          2714 non-null   category
 27  Foundation          2714 non-null   category
 28  Bsmt_Qual           2714 non-null   category
 29  Bsmt_Cond           2714 non-null   category
 30  Bsmt_Exposure       2714 non-null   category
 31  BsmtFin_Type_1      2714 non-null   category
 32  BsmtFin_SF_1        2714 non-null   int64   
 33  BsmtFin_Type_2      2714 non-null   category
 34  BsmtFin_SF_2        2714 non-null   int64   
 35  Bsmt_Unf_SF         2714 non-null   int64   
 36  Total_Bsmt_SF       2714 non-null   int64   
 37  Heating             2714 non-null   category
 38  Heating_QC          2714 non-null   category
 39  Central_Air         2714 non-null   category
 40  Electrical          2714 non-null   category
 41  First_Flr_SF        2714 non-null   int64   
 42  Second_Flr_SF       2714 non-null   int64   
 43  Low_Qual_Fin_SF     2714 non-null   int64   
 44  Gr_Liv_Area         2714 non-null   int64   
 45  Bsmt_Full_Bath      2714 non-null   int64   
 46  Bsmt_Half_Bath      2714 non-null   int64   
 47  Full_Bath           2714 non-null   int64   
 48  Half_Bath           2714 non-null   int64   
 49  Bedroom_AbvGr       2714 non-null   int64   
 50  Kitchen_AbvGr       2714 non-null   int64   
 51  Kitchen_Qual        2714 non-null   category
 52  TotRms_AbvGrd       2714 non-null   int64   
 53  Functional          2714 non-null   category
 54  Fireplaces          2714 non-null   int64   
 55  Fireplace_Qu        2714 non-null   category
 56  Garage_Type         2714 non-null   category
 57  Garage_Finish       2714 non-null   category
 58  Garage_Cars         2714 non-null   int64   
 59  Garage_Area         2714 non-null   int64   
 60  Garage_Qual         2714 non-null   category
 61  Garage_Cond         2714 non-null   category
 62  Paved_Drive         2714 non-null   category
 63  Wood_Deck_SF        2714 non-null   int64   
 64  Open_Porch_SF       2714 non-null   int64   
 65  Enclosed_Porch      2714 non-null   int64   
 66  Three_season_porch  2714 non-null   int64   
 67  Screen_Porch        2714 non-null   int64   
 68  Pool_Area           2714 non-null   int64   
 69  Pool_QC             2714 non-null   category
 70  Fence               2714 non-null   category
 71  Misc_Feature        2714 non-null   category
 72  Misc_Val            2714 non-null   int64   
 73  Mo_Sold             2714 non-null   int64   
 74  Year_Sold           2714 non-null   int64   
 75  Sale_Type           2714 non-null   category
 76  Sale_Condition      2714 non-null   category
 77  Longitude           2714 non-null   float64 
 78  Latitude            2714 non-null   float64 
dtypes: category(46), float64(2), int64(31)
memory usage: 856.2 KB

So the difference is in Misc_Feature (and similarly Mas_Vnr_Type), where values read as the string "None" in pandas 1.5.x are mapped to np.nan in pandas 2.0.1.

@glemaitre
Member

glemaitre commented May 25, 2023

Basically, the culprit is pandas-dev/pandas#50286, where "None" has been added to the default na_values in read_csv. So this is a breaking change, but it might be considered part of the expected breaking changes between 1.X and 2.X.

I think we can tweak it on our side by omitting None from the na_values and announcing a FutureWarning.
It looks like a good case to revive #25488 to be able to handle this case.
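
A minimal sketch of the behavior change, reusing the column name from the dataset above:

import io

import pandas as pd

data = "Misc_Feature\nNone\nShed\n"
df = pd.read_csv(io.StringIO(data))
# Under pandas >= 2.0, the literal string "None" is part of the default
# na_values, so this prints 1; under pandas 1.5.x it prints 0.
print(df["Misc_Feature"].isna().sum())

# Opting out of the new default would look like:
df_kept = pd.read_csv(io.StringIO(data), keep_default_na=False, na_values=[""])
print(df_kept["Misc_Feature"].isna().sum())  # 0 on both versions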

@github-actions

github-actions bot commented Aug 24, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: add9833. Link to the linter CI: here

@glemaitre glemaitre self-requested a review August 29, 2023 16:45
Member

@glemaitre glemaitre left a comment


For the rest, LGTM.

from sklearn.preprocessing import LabelBinarizer
import pandas as pd
from sklearn.datasets import fetch_kddcup99
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(42)
Member


The rationale is that each time rng is used, its state is modified, so changing the code ordering could lead to different results. It is therefore safer to just pass integers around.
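
A minimal illustration of the point:

import numpy as np

rng = np.random.RandomState(0)
first_draw = rng.uniform()   # consumes the shared state...
second_draw = rng.uniform()  # ...so this value depends on the call above
assert first_draw != second_draw

# Passing an integer seed instead gives each consumer its own stream,
# insensitive to the order of unrelated calls:
assert np.random.RandomState(0).uniform() == np.random.RandomState(0).uniform()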

lof = LocalOutlierFactor(n_neighbors=int(n_samples * expected_anomaly_fraction))

fig, ax = plt.subplots()
for model_idx, preprocessor in enumerate(preprocessor_list):
Member


I think the green line will be difficult to distinguish from the red one for colorblind readers. Would you mind also giving each line a different linestyle so they can be told apart that way as well?
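
A sketch of cycling linestyles on dummy curves (the example's actual data is not reproduced here):

import matplotlib.pyplot as plt
import numpy as np

linestyles = ["solid", "dashed", "dashdot", "dotted"]
fpr = np.linspace(0, 1, 100)
fig, ax = plt.subplots()
for idx, exponent in enumerate([0.2, 0.4, 0.6, 0.8]):
    # Varying the linestyle in addition to the color keeps the curves
    # distinguishable for colorblind readers.
    ax.plot(fpr, fpr**exponent, linestyle=linestyles[idx], label=f"model {idx}")
ax.legend()
plt.show()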

@glemaitre glemaitre merged commit 3f11069 into scikit-learn:main Sep 11, 2023
@glemaitre
Member

LGTM. Enabling auto-merge. Thanks @ArturoAmorQ

@ArturoAmorQ ArturoAmorQ deleted the outlier_benchmarks branch September 11, 2023 18:06
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Sep 18, 2023
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Tim Head <betatim@gmail.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
jeremiedbb pushed a commit that referenced this pull request Sep 20, 2023
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Tim Head <betatim@gmail.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu>
Co-authored-by: Tim Head <betatim@gmail.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Member

@ogrisel ogrisel left a comment


I realized I had a pending review on this PR. Here were the comments:

This example benchmarks two outlier detection algorithms, namely
:ref:`local_outlier_factor` (LOF) and :ref:`isolation_forest` (IForest), using
ROC curves on classical anomaly detection datasets. The goal is to show that
different algorithms perform well on different datasets.
Member


Suggested change
different algorithms perform well on different datasets.
different algorithms perform well on different datasets and highlight
differences in training speed and sensitivity to hyper-parameters.

Comment on lines 13 to 14
1. The algorithms are trained on the whole dataset which is assumed to
contain outliers.
Member


Suggested change
1. The algorithms are trained on the whole dataset which is assumed to
contain outliers.
1. The algorithms are trained (without labels) on the whole dataset which is assumed
to contain outliers.

# similarly in terms of ROC AUC for the forestcover and cardiotocography
# datasets. The score for IForest is slightly better for the SA dataset and LOF
# performs considerably better on WDBC than IForest.
#
Member


Let's add something about the tradeoff between ROC performance and computational performance:

Recall, however, that Isolation Forest tends to train much faster than LOF on datasets with a large number of samples. Indeed, LOF needs to compute pairwise distances to find nearest neighbors, which has quadratic complexity in the number of samples when the dimensionality is large. This can make the method prohibitive on large datasets.
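
A rough illustration of that speed gap on synthetic data (timings vary by machine; the parameters are arbitrary):

from time import perf_counter

from sklearn.datasets import make_blobs
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

X, _ = make_blobs(n_samples=20_000, n_features=10, random_state=0)
for estimator in (IsolationForest(random_state=0), LocalOutlierFactor()):
    tic = perf_counter()
    estimator.fit(X)  # LOF's neighbor search can approach quadratic cost
    print(f"{estimator.__class__.__name__}: {perf_counter() - tic:.2f} s")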
