BIRCH mentions outlier removal, DBSCAN doesn't. Is that right? #20413

Closed
ThomasOfferman opened this issue Jun 28, 2021 · 3 comments · Fixed by #21343
Labels: Documentation, good first issue (Easy with clear instructions to resolve)

Comments

@ThomasOfferman

Describe the issue linked to the documentation

I was reading the user guide about clustering (https://scikit-learn.org/stable/modules/clustering.html). There is a table listing features of every clustering algorithm (how it handles different geometries, different pros/cons etc). BIRCH lists outlier removal, DBSCAN doesn't.

image

I have a limited understanding of the two algorithms so I hope I'm not wasting everyone's time, but shouldn't this be switched? I believe DBSCAN naturally supports outlier removal and I couldn't find anything in the rest of the documentation to suggest BIRCH supports outlier removal.

Suggest a potential alternative/fix

Add outlier removal as a feature to DBSCAN in the table and remove it in BIRCH's entry.

@TomDLT
Member

TomDLT commented Jul 10, 2021

The original Birch paper does mention outliers, but I am not sure about our implementation, and it is not obvious from the documentation.

I agree DBSCAN explicitly handles outliers (through self.core_sample_indices_).
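
For reference, here is a minimal sketch of what that looks like in practice. The toy data and the `eps` / `min_samples` values are made up for illustration and are not taken from the user guide. In scikit-learn's `DBSCAN`, samples that end up in no cluster get the label `-1` in `labels_`, and `core_sample_indices_` only lists core samples, so a noise point never shows up there.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two tight groups of four points plus one far-away point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
    [20.0, 20.0],  # isolated point, expected to be flagged as noise
])

# eps and min_samples are illustrative choices for this toy data only.
db = DBSCAN(eps=0.5, min_samples=3).fit(X)

labels = db.labels_            # cluster id per sample, -1 marks noise
outlier_mask = labels == -1    # boolean mask of samples DBSCAN treats as noise
X_clean = X[~outlier_mask]     # data with the noise samples dropped

print(labels)                   # [ 0  0  0  0  1  1  1  1 -1]
print(db.core_sample_indices_)  # [0 1 2 3 4 5 6 7]
```

Filtering on `labels_ == -1` is the more direct way to drop outliers. Samples missing from `core_sample_indices_` are not necessarily noise, since border points belong to a cluster without being core samples.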

@adrinjalali
Member

Both DBSCAN and OPTICS can be used to remove outliers. I'd be happy with a PR adding the terms to that table.

@adrinjalali added the "good first issue" (Easy with clear instructions to resolve) and "help wanted" labels on Oct 13, 2021
@christopherlim98
Contributor

take
