-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Combining LOF and Isolation benchmarks #16606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this PR be removing bench_lof.py and/or bench_isolation_forest.py?
Please let us know when you've addressed this, or if you need help. Currently this appears to have some lining errors |
Sorry about that. I just delete them. |
The goal is to merge these two benchmarks into an example. Quoting @ogrisel's comment
@MaiRajborirug can you please move the merged benchmark to examples/ and change the name of script to match the usual names of the examples? |
@albertcthomas, sure. Is the name |
Looks good to me. I will try to review your PR soon. |
@albertcthomas, just a followup comment. How was the review going? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A first pass. This example will need to be referred to in the documentation on Novelty and Outlier detection.
You also need to make the tests pass: you have some flake8 errors that you need to fix. |
|
Fixed it. Thank you.
Do you know where is the document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation is located here: https://github.com/scikit-learn/scikit-learn/blob/master/doc/modules/outlier_detection.rst
I might need some help with inserting new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for you work @MaiRajborirug. There is a few things to fix/do in order for this to be good on my side :).
doc/modules/.ipynb_checkpoints/outlier_detection-checkpoint.rst
Outdated
Show resolved
Hide resolved
@albertcthomas does the changes look good? Please let me know if you want to adjust the document any further. |
@albertcthomas, should we discuss the algorithm's performance in the example or not? |
Yes. I would discuss how to read the curves, that a higher ROC curve means a better performance and that we are usually mostly interested in low values of the FPR. I am not sure that we really want to emphasize any difference in performance between Isolation Forest and LOF. IMO this example is to show how to compare two outlier detection estimators. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the section about interpreting the ROC curve. @albertcthomas, is it short and clear enough?
Thank you, I just updated the code as you suggested @albertcthomas . |
You can see the rendered example here. Could you improve the rendering of the legends in the plot so that they can fit in the figure and do not have |
I fixed the |
@albertcthomas do you think this example is ready? |
It is also a bit sad that LOF is not performing well on these datasets |
I agree that LOF doesn't perform well here. @albertcthomas do you have any further suggestions on this PR improvement? |
The main branch is protected now. Only core-devs can merge... this is not really different from before for contributors but it is explicitly stated. |
@MaiRajborirug this pull request has been added to the list for the review sprint. Feel free to join if you think it will speed up the process. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @MaiRajborirug. We are currently moving our examples to use a notebook style (see #22406). Could you update your PR to follow these guidelines ? In particular, split the content in different cells with titles and move the imports in the cells where they are needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's another pass @MaiRajborirug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a couple more nitpicks on my side but looks good otherwise once the comments from @ArturoAmorQ.
It could also be interesting to check if we can give some insights about what makes the performance different between the 2 algorithms for different datasets. @ogrisel had the intuition that IForest could be more performant of datasets with heterogeneous, while LOF might be better with numeric features and rotational invariance for instance.
@jeremiedbb that is a very interesting topic to dive into, but let's work it in another PR. It might be better to use generated data for that experiment where we have full control of the datasets. |
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
@jeremiedbb @ArturoAmorQ how were the changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @MaiRajborirug!
Thanks @MaiRajborirug and @albertcthomas and @ArturoAmorQ ! |
…#16606) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
What does this implement/fix? Explain your changes.
Create one example that combines both LOF and IF ROC curves, according to the last discussion in a merged PR #9798 (comment).
Other Comments
Also fixed some data input typos in the
plot_annomaly_comparison.py