[WIP] Performance comparison (ROC) plots for anomaly detection methods #16378

Closed
wants to merge 15 commits into from

Conversation

MaiRajborirug
Contributor

@MaiRajborirug MaiRajborirug commented Feb 3, 2020

Reference Issues/PRs
PRs : [MRG] Comparison plot for anomaly detection methods. #10004

What does this implement/fix? Explain your changes.

  • Add a plot comparing anomaly detection methods on a multi-dimensional dataset (3 informative dimensions plus more than 1 noise dimension)
  • Include algorithm performance comparisons: accuracy_score, roc_auc_score, and roc_curve
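The three metrics listed above can be computed for a single detector roughly as follows. This is a minimal sketch and not the PR's actual code: the toy dataset, the 15% outlier fraction and the choice of IsolationForest are all illustrative assumptions. Accuracy is computed on the hard `predict` labels, while the ROC curve and AUC use the continuous `decision_function` scores.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import accuracy_score, roc_auc_score, roc_curve

rng = np.random.RandomState(42)

# Hypothetical toy data: a 2D Gaussian inlier cloud plus uniformly scattered outliers.
n_inliers, n_outliers = 170, 30
X_in = 0.5 * rng.randn(n_inliers, 2)
X_out = rng.uniform(low=-6, high=6, size=(n_outliers, 2))
X = np.vstack([X_in, X_out])
# Ground truth follows scikit-learn's convention: +1 = inlier, -1 = outlier.
y_true = np.concatenate([np.ones(n_inliers, dtype=int), -np.ones(n_outliers, dtype=int)])

clf = IsolationForest(contamination=n_outliers / (n_inliers + n_outliers), random_state=42)
y_pred = clf.fit(X).predict(X)      # hard labels in {+1, -1}
scores = clf.decision_function(X)   # higher score = more normal

acc = accuracy_score(y_true, y_pred)
# The inlier label (+1) is the positive class, so higher scores should rank it first.
auc_value = roc_auc_score(y_true, scores)
fpr, tpr, thresholds = roc_curve(y_true, scores, pos_label=1)

print(f"accuracy={acc:.3f}  AUC={auc_value:.3f}")
```

The same loop body can be repeated for every estimator that exposes `decision_function`, and `fpr`/`tpr` plotted in an extra column.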

The plot:
plot_anomaly_comparison-3D

@ogrisel
Member

ogrisel commented Feb 4, 2020

I like the new column with the ROC curve plots but for the other columns, I preferred the 2D plots instead of the 3D plots.

@albertcthomas
Contributor

Thanks @MaiRajborirug, this is a nice visualization but I agree with @ogrisel: with the 3D it's harder to see the specificity of each of the estimators.

@ogrisel
Member

ogrisel commented Feb 4, 2020

In particular, the 2D plots made it easier to see the shape of the decision boundary from the black contour line.
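For context, the black contour line in such 2D plots is typically the zero level set of the estimator's `decision_function` evaluated on a grid, since a score of 0 separates predicted inliers from predicted outliers. The sketch below assumes a hypothetical Gaussian inlier cloud and a OneClassSVM with illustrative `nu`/`gamma` values; it draws off-screen with the Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)
X = 0.5 * rng.randn(200, 2)  # hypothetical inlier cloud

clf = OneClassSVM(nu=0.1, gamma=0.5).fit(X)

# Evaluate the decision function on a regular grid covering the plot area.
xx, yy = np.meshgrid(np.linspace(-3, 3, 150), np.linspace(-3, 3, 150))
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
# decision_function(x) == 0 is the boundary between predicted inliers and outliers.
ax.contour(xx, yy, Z, levels=[0], linewidths=2, colors="black")
ax.scatter(X[:, 0], X[:, 1], s=5)
plt.close(fig)
```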

@MaiRajborirug
Contributor Author

Thank you for your reviews! I will add an ROC curve and an accuracy score to the 2D plots so that we keep a quantitative performance comparison.

@MaiRajborirug
Contributor Author

MaiRajborirug commented Feb 5, 2020

Following your advice, I added ROC curves (last column), AUC, and prediction accuracy to the 2D plots.

The plot:
plots_with_ROC

@MaiRajborirug MaiRajborirug requested review from ogrisel and removed request for jnothman and amueller February 5, 2020 06:40
@MaiRajborirug MaiRajborirug changed the title [WIP] Performance comparison 3-D plot for anomaly detection methods [WIP] Performance comparison plots for anomaly detection methods Feb 5, 2020
@MaiRajborirug
Contributor Author

MaiRajborirug commented Feb 5, 2020

This latest update makes the PR a bit shorter.

The plot:
plots_with_ROC2

@MaiRajborirug MaiRajborirug changed the title [WIP] Performance comparison plots for anomaly detection methods [WIP] Performance comparison (ROC) plots for anomaly detection methods Feb 7, 2020
@albertcthomas
Contributor

albertcthomas commented Feb 13, 2020

This is a nice plot, but I am a bit ambivalent about the usefulness of ROC curves for such toy examples. If we want an example with ROC curves, @ogrisel suggested (more than 2 years ago) converting 2 of the benchmarks into an example. That would maybe be a better thing to do.

@glemaitre
Member

TBH, I had exactly the same reaction as @albertcthomas. I don't think a quantitative analysis on such toy datasets is a must-have (except maybe the accuracy, because it does not clutter the example much). The main point of the example is a qualitative analysis: it provides highlights and intuition about the implemented algorithms, linked to the assumptions each method makes.

However, I agree that we are missing an example showing an end-to-end pipeline where anomaly detection is beneficial for classification, and such an example should be rigorously analyzed with these classification metrics and plots.

Another limitation of the ROC curves is that they only cover 3 of the 4 methods.
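The thread does not say which method is missing from the ROC column. If it is LocalOutlierFactor (an assumption), a likely cause is that with the default `novelty=False` the estimator only offers `fit_predict` and does not provide `decision_function` for scoring. Its outlier scores for the training samples are still exposed through `negative_outlier_factor_`, so a ROC curve could be computed as in this sketch (hypothetical toy data and parameters).

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.RandomState(42)
# Hypothetical toy data: Gaussian inliers plus uniformly scattered outliers.
X = np.vstack([0.5 * rng.randn(170, 2), rng.uniform(-6, 6, size=(30, 2))])
y_true = np.concatenate([np.ones(170, dtype=int), -np.ones(30, dtype=int)])  # +1 inlier, -1 outlier

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.15)  # novelty=False (default)
y_pred = lof.fit_predict(X)  # labels for the training samples only

# negative_outlier_factor_: values closer to -1 mean more normal, so higher = more inlier-like.
scores = lof.negative_outlier_factor_
lof_auc = roc_auc_score(y_true, scores)
fpr, tpr, _ = roc_curve(y_true, scores, pos_label=1)
```

Note that these scores are only defined for the samples LOF was fitted on, which fits the setting of a toy plot where the same data are shown and scored.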

@MaiRajborirug
Contributor Author

MaiRajborirug commented Mar 2, 2020

@albertcthomas, I created a new PR #16606 following @ogrisel's and your suggestion. Could you take a look at it?


4 participants