[MRG] Combining LOF and Isolation benchmarks #16606

MaiRajborirug · 2020-03-02T04:50:06Z

What does this implement/fix? Explain your changes.
Create one example that combines the LOF and IF ROC curves, following the last discussion in the merged PR #9798 (comment).

Other Comments
Also fixed some data input typos in plot_anomaly_comparison.py.

MaiRajborirug · 2020-03-02T06:41:11Z

jnothman

Should this PR be removing bench_lof.py and/or bench_isolation_forest.py?

examples/plot_anomaly_comparison.py

jnothman · 2020-03-02T19:54:03Z

Should this PR be removing bench_lof.py and/or bench_isolation_forest.py?

Please let us know when you've addressed this, or if you need help.

Currently this appears to have some linting errors.

MaiRajborirug · 2020-03-02T20:09:52Z

Should this PR be removing bench_lof.py and/or bench_isolation_forest.py?

Please let us know when you've addressed this, or if you need help.

Currently this appears to have some linting errors.

Sorry about that. I have just deleted them.

albertcthomas · 2020-03-03T09:17:32Z

The goal is to merge these two benchmarks into an example. Quoting @ogrisel's comment

I had a quick IRL conversation with @albertcthomas and I think we should convert those 2 benchmark scripts into a single example script with a plt.subplot grid, one subplot per dataset with both LOF and IF ROC curves.

@MaiRajborirug can you please move the merged benchmark to examples/ and rename the script to match the usual names of the examples?
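
For illustration, a minimal sketch of the layout described in the quoted comment: a plt.subplots grid with one subplot per dataset, each showing both LOF and IF ROC curves. This is not the script from this PR. The `make_toy_dataset` helper, the dataset names, and the synthetic data are assumptions made only so that the sketch runs on its own.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import auc, roc_curve
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.RandomState(0)


def make_toy_dataset(n_inliers, n_outliers, scale):
    """Hypothetical stand-in for a real benchmark dataset."""
    inliers = scale * rng.randn(n_inliers, 2)
    outliers = rng.uniform(low=-6, high=6, size=(n_outliers, 2))
    X = np.vstack([inliers, outliers])
    # Label 1 marks an outlier, label 0 an inlier.
    y = np.r_[np.zeros(n_inliers, dtype=int), np.ones(n_outliers, dtype=int)]
    return X, y


# Assumed dataset names; the real example would load actual datasets.
datasets = {
    "toy_tight": make_toy_dataset(300, 30, scale=0.5),
    "toy_wide": make_toy_dataset(300, 30, scale=1.5),
}

fig, axes = plt.subplots(1, len(datasets), figsize=(10, 4), squeeze=False)
results = {}

for ax, (name, (X, y)) in zip(axes.ravel(), datasets.items()):
    # Both scores are negated so that larger values mean "more abnormal".
    lof = LocalOutlierFactor(n_neighbors=20).fit(X)
    iforest = IsolationForest(random_state=0).fit(X)
    scores = {
        "LOF": -lof.negative_outlier_factor_,
        "IF": -iforest.decision_function(X),
    }

    for label, score in scores.items():
        fpr, tpr, _ = roc_curve(y, score)
        results[(name, label)] = auc(fpr, tpr)
        ax.plot(fpr, tpr, label=f"{label} (AUC = {results[(name, label)]:.2f})")

    ax.plot([0, 1], [0, 1], linestyle="--", color="gray")  # chance level
    ax.set_title(name)
    ax.set_xlabel("False Positive Rate")
    ax.set_ylabel("True Positive Rate")
    ax.legend(loc="lower right")

fig.tight_layout()
```

Adding an entry to `datasets` adds one subplot, so the grid grows with the number of datasets compared.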

MaiRajborirug · 2020-03-03T18:21:18Z

@albertcthomas, sure. Is the name plot_anomaly_bench.py good?

albertcthomas · 2020-03-04T10:12:14Z

@albertcthomas, sure. Is the name plot_anomaly_bench.py good?

Looks good to me. I will try to review your PR soon.

MaiRajborirug · 2020-03-15T14:32:43Z

@albertcthomas, just a follow-up comment. How is the review going?

albertcthomas

A first pass. This example will need to be referred to in the documentation on Novelty and Outlier detection.

examples/plot_anomaly_bench.py

albertcthomas · 2020-03-17T17:38:26Z

You also need to make the tests pass: you have some flake8 errors that you need to fix.

MaiRajborirug · 2020-03-18T17:32:23Z

You also need to make the tests pass: you have some flake8 errors that you need to fix.
Fixed it

A first pass. This example will need to be referred to in the documentation on Novelty and Outlier detection.
Could you tell me where the Novelty and Outlier detection document is located on GitHub? I only found it on the scikit-learn website.

MaiRajborirug · 2020-03-18T17:34:14Z

You also need to make the tests pass: you have some flake8 errors that you need to fix.

Fixed it. Thank you.

A first pass. This example will need to be referred to in the documentation on Novelty and Outlier detection.

Do you know where the Novelty and Outlier detection document is located on GitHub? I only found it on the scikit-learn website. And should I adjust the document file as well?

albertcthomas

The documentation is located here: https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/outlier_detection.rst

examples/plot_anomaly_bench.py

MaiRajborirug · 2020-03-20T04:59:08Z

The documentation is located here: https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/outlier_detection.rst

I might need some help with inserting a new .png file into the document. I can't find the folder ../auto_examples/ensemble/images/

albertcthomas

Thanks for your work @MaiRajborirug. There are a few things to fix/do in order for this to be good on my side :).

doc/modules/.ipynb_checkpoints/outlier_detection-checkpoint.rst

doc/modules/outlier_detection.rst

examples/plot_anomaly_bench.py

MaiRajborirug · 2020-03-23T13:18:06Z

@albertcthomas do the changes look good? Please let me know if you want to adjust the document any further.

examples/plot_anomaly_bench.py

MaiRajborirug · 2020-03-25T01:34:52Z

@albertcthomas, should we discuss the algorithm's performance in the example or not?

albertcthomas · 2020-03-25T14:57:33Z

@albertcthomas, should we discuss the algorithm's performance in the example or not?

Yes. I would discuss how to read the curves, that a higher ROC curve means a better performance and that we are usually mostly interested in low values of the FPR. I am not sure that we really want to emphasize any difference in performance between Isolation Forest and LOF. IMO this example is to show how to compare two outlier detection estimators.
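
As a hedged illustration of reading the curves at low FPR values, the sketch below computes the full ROC AUC, a partial AUC restricted to FPR <= 0.1, and the TPR reached at 5% FPR. The synthetic data and the 0.1 and 0.05 thresholds are assumptions chosen for the sketch; none of them come from this PR.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.RandomState(42)

# Assumed toy data: a Gaussian blob of inliers plus uniform outliers.
X_inliers = rng.randn(500, 2)
X_outliers = rng.uniform(low=-6, high=6, size=(25, 2))
X = np.vstack([X_inliers, X_outliers])
y = np.r_[np.zeros(500, dtype=int), np.ones(25, dtype=int)]  # 1 = outlier

# score_samples is lower for abnormal points, so negate it to get a score
# that increases with abnormality, as roc_curve expects for the positive class.
score = -IsolationForest(random_state=42).fit(X).score_samples(X)

fpr, tpr, _ = roc_curve(y, score)

# Area under the whole curve: higher means better ranking overall.
full_auc = roc_auc_score(y, score)

# Standardized partial AUC over the FPR range [0, 0.1], which focuses on
# the low false positive rate region.
partial_auc = roc_auc_score(y, score, max_fpr=0.1)

# True positive rate reached when 5% of the inliers are flagged.
tpr_at_5pct = np.interp(0.05, fpr, tpr)

print(f"AUC: {full_auc:.3f}")
print(f"Partial AUC (FPR <= 0.1): {partial_auc:.3f}")
print(f"TPR at 5% FPR: {tpr_at_5pct:.3f}")
```

Two estimators with similar full AUC can differ noticeably in the last two numbers, which is why the low-FPR region is worth reporting separately.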

MaiRajborirug

I added the section about interpreting the ROC curve. @albertcthomas, is it short and clear enough?

examples/plot_anomaly_bench.py

MaiRajborirug · 2020-03-28T23:39:08Z

Thank you, I just updated the code as you suggested @albertcthomas .

albertcthomas · 2020-03-29T10:37:32Z

You can see the rendered example here. Could you improve the rendering of the legends in the plot so that they can fit in the figure and do not have ' in their texts?

MaiRajborirug · 2020-03-29T15:02:29Z

You can see the rendered example here. Could you improve the rendering of the legends in the plot so that they can fit in the figure and do not have ' in their texts?

I fixed the ' and shortened the plot label description

examples/plot_anomaly_bench.py

MaiRajborirug · 2020-03-30T01:57:36Z

@albertcthomas do you think this example is ready?

albertcthomas · 2020-04-02T13:38:07Z

It is also a bit sad that LOF is not performing well on these datasets

MaiRajborirug · 2020-04-04T19:08:42Z

I agree that LOF doesn't perform well here. @albertcthomas do you have any further suggestions for improving this PR?

cmarmo · 2022-03-30T03:37:16Z

The pull request says 'Merging is blocked' even though all checks have passed. Could someone tell me what happened?

The main branch is protected now. Only core-devs can merge... this is not really different from before for contributors but it is explicitly stated.
Thanks for synchronizing the pull request.
@ogrisel, @jeremiedbb, this is a very old PR with one approval needing a decision. Could it possibly be included in 1.1? Thanks!

cmarmo · 2022-04-02T22:04:39Z

@MaiRajborirug this pull request has been added to the list for the review sprint. Feel free to join if you think it will speed up the process. Thanks.

jeremiedbb

Thanks for the PR @MaiRajborirug. We are currently moving our examples to use a notebook style (see #22406). Could you update your PR to follow these guidelines? In particular, split the content into different cells with titles and move the imports into the cells where they are needed.
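
A rough sketch of what such a notebook-style layout could look like, assuming the `# %%` cell separator followed by a reST title written as comments. The title, section names, and toy data are hypothetical and only illustrate the structure: one cell per step, with each import placed in the first cell that needs it.

```python
"""
===========================================
Comparing outlier detectors with ROC curves
===========================================

Hypothetical title and introduction, used only to illustrate the cell layout.
"""

# %%
# Generate toy data
# -----------------
# Each cell starts with ``# %%``, followed by a title and prose written as
# comments. Imports live in the first cell that needs them.
import numpy as np

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(200, 2), rng.uniform(-6, 6, size=(20, 2))])
y = np.r_[np.zeros(200, dtype=int), np.ones(20, dtype=int)]  # 1 = outlier

# %%
# Score the samples
# -----------------
# The LOF score is negated so that larger values mean "more abnormal".
from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=20).fit(X)
lof_score = -lof.negative_outlier_factor_

# %%
# Evaluate
# --------
# The metric import sits in the cell that uses it.
from sklearn.metrics import roc_auc_score

lof_auc = roc_auc_score(y, lof_score)
print(f"LOF ROC AUC: {lof_auc:.2f}")
```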