
No label or value to be assigned to outliers in Radius Neighbors Classifier and Regressor #9629


Closed
webber26232 opened this issue Aug 26, 2017 · 3 comments

Comments

@webber26232
Contributor

webber26232 commented Aug 26, 2017

Description

In the radius neighbors models, some outlier samples may have no neighbors within the given radius, so the model cannot predict their label or value. Currently, RadiusNeighborsRegressor returns NaN for outliers when predict(X) is called. RadiusNeighborsClassifier has a parameter, outlier_label, which accepts an integer or None. If None is given, a ValueError is raised when any outlier is detected. If an integer is given, that integer is used as the label for outliers.

Can we assign random labels or random values to those outliers? Currently, the only ways to handle outliers are removing them or setting a large radius. I don't think removing samples from the test set is an ideal solution. However, if some outliers are extremely far from all other samples, covering all of them requires dramatically increasing the radius. The number of noisy neighbors then becomes very large, which badly affects the predictions.

Therefore, by assigning random values or labels to outliers, we can choose a radius that keeps inlier predictions accurate while limiting the influence of the randomly labeled outliers.
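To make the proposal concrete, here is a minimal pure-Python sketch of what I have in mind. It is not scikit-learn code: the function name `radius_predict`, the `rng` argument, and the uniform draw over training classes are all assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def radius_predict(X_train, y_train, X_test, radius, rng=None):
    """Majority vote among training points within `radius`.

    Test points with no neighbor in range (outliers) get a label drawn
    uniformly at random from the training classes. This fallback is the
    idea proposed in this issue, not existing scikit-learn behavior.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    classes = sorted(set(y_train))
    predictions = []
    for x in X_test:
        # Count the labels of all training points inside the radius.
        votes = Counter(
            label
            for point, label in zip(X_train, y_train)
            if math.dist(x, point) <= radius
        )
        if votes:
            predictions.append(votes.most_common(1)[0][0])
        else:
            # Outlier: no neighbor within the radius, so pick a random class.
            predictions.append(rng.choice(classes))
    return predictions


X_train = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.1, 1.0)]
y_train = [0, 0, 1, 1]
X_test = [(0.05, 0.0), (1.05, 1.0), (50.0, 50.0)]  # last point is an outlier

pred = radius_predict(X_train, y_train, X_test, radius=0.5)
```

With a small radius the two inliers are still classified from their close neighbors, and only the far-away point falls back to a random class, so the radius does not have to grow to reach it.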

I will run some experiments on this.

Steps/Code to Reproduce

Expected Results

Actual Results

Versions

@jnothman
Member

jnothman commented Aug 27, 2017 via email

@webber26232
Contributor Author

@jnothman Oh yes, you are right.
I just implemented predict_proba() for Radius Neighbors Classifier and I think I am able to implement this function as well.

@TomDLT
Member

TomDLT commented Aug 7, 2019

Fixed in #9597

@TomDLT TomDLT closed this as completed Aug 7, 2019