In the radius-neighbors models, some outlier samples may have no neighbors within the given radius, so the model cannot predict their label or value. Currently, RadiusNeighborsRegressor returns NaN for such outliers when predict(X) is called. RadiusNeighborsClassifier has an outlier_label parameter, which accepts an integer or None. If it is None, a ValueError is raised whenever an outlier is detected. If it is an integer, that integer is used as the label for outliers.
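The behavior described above can be reproduced with a small script. This is only a sketch on made-up 1-D data; the radius of 0.5, the sentinel label -1, and all variable names are assumptions chosen for illustration.

```python
import warnings

import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier, RadiusNeighborsRegressor

# Toy 1-D data (illustrative only): two tight clusters around 0 and 1.
X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_class = np.array([0, 0, 0, 1, 1, 1])
y_value = X_train.ravel()

# The last test point is far from every training sample.
X_test = np.array([[0.05], [1.05], [50.0]])

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence any "no neighbors" warnings

    # Regressor: the outlier gets NaN.
    reg = RadiusNeighborsRegressor(radius=0.5).fit(X_train, y_value)
    reg_pred = reg.predict(X_test)

    # Classifier with an integer outlier_label: the outlier gets that label.
    clf = RadiusNeighborsClassifier(radius=0.5, outlier_label=-1)
    clf_pred = clf.fit(X_train, y_class).predict(X_test)

    # Classifier with outlier_label=None: predicting the outlier raises.
    strict = RadiusNeighborsClassifier(radius=0.5, outlier_label=None)
    strict.fit(X_train, y_class)
    try:
        strict.predict(X_test)
        raised = False
    except ValueError:
        raised = True

print(reg_pred, clf_pred, raised)
```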
Can we assign random labels or random values to those outliers? Currently, the only ways to handle outliers are to remove them or to set a large radius. I don't think removing samples from the test set is an ideal solution. However, if some outliers lie extremely far from all other samples, covering all of them requires a dramatically larger radius. The number of noise neighbors then becomes very large, which badly affects the prediction results.
Therefore, by assigning random values or labels to outliers, we can choose a radius that keeps the inlier predictions accurate enough while limiting the influence of the randomly labeled outliers.
I will run some experiments on this.
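For the regression case, the idea could be prototyped outside the estimator by replacing the NaN predictions afterwards. The sketch below is not an existing scikit-learn feature: fill_outlier_values is a hypothetical helper, and drawing the replacement values uniformly from the training targets is just one assumed way of choosing "random values".

```python
import warnings

import numpy as np
from sklearn.neighbors import RadiusNeighborsRegressor


def fill_outlier_values(y_pred, y_train, seed=None):
    """Replace NaN predictions with values drawn at random from y_train.

    Hypothetical helper; the sampling scheme is an assumption.
    """
    rng = np.random.default_rng(seed)
    y_out = np.array(y_pred, dtype=float)  # copy, so the input is untouched
    mask = np.isnan(y_out)
    y_out[mask] = rng.choice(y_train, size=int(mask.sum()))
    return y_out


# Toy data (illustrative only).
X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_train = np.array([0.0, 0.1, 0.2, 1.0, 1.1, 1.2])
X_test = np.array([[0.05], [50.0]])  # the second point has no neighbors

reg = RadiusNeighborsRegressor(radius=0.5).fit(X_train, y_train)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence any "no neighbors" warnings
    raw = reg.predict(X_test)

filled = fill_outlier_values(raw, y_train, seed=0)
print(raw, filled)
```

Inlier predictions pass through unchanged; only the entries that came back as NaN are replaced.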
I might note that setting outlier_label to some int not known from the
dataset (e.g. -1 for true classes 0, 1, ...) allows a post-processor to
then modify that label to something randomised. So it's not like the
current implementation disallows that usage pattern; it merely does not
provide it.
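A minimal sketch of that post-processing pattern follows. The helper randomize_outlier_labels is hypothetical, and the sentinel -1, the radius, and the uniform draw over the training classes are assumptions made for illustration.

```python
import warnings

import numpy as np
from sklearn.neighbors import RadiusNeighborsClassifier


def randomize_outlier_labels(y_pred, classes, sentinel=-1, seed=None):
    """Replace sentinel predictions with labels drawn uniformly from classes.

    Hypothetical post-processor, not part of scikit-learn.
    """
    rng = np.random.default_rng(seed)
    y_out = np.array(y_pred)  # copy, so the input is untouched
    mask = y_out == sentinel
    y_out[mask] = rng.choice(classes, size=int(mask.sum()))
    return y_out


# Toy data (illustrative only): true classes are 0 and 1.
X_train = np.array([[0.0], [0.1], [0.2], [1.0], [1.1], [1.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
X_test = np.array([[0.05], [1.05], [50.0]])  # the last point is an outlier

clf = RadiusNeighborsClassifier(radius=0.5, outlier_label=-1)
clf.fit(X_train, y_train)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # the sentinel is not a training class
    raw = clf.predict(X_test)

final = randomize_outlier_labels(raw, clf.classes_, sentinel=-1, seed=0)
print(raw, final)
```

The sentinel has to be a value that never occurs among the true classes, otherwise genuine predictions would be overwritten as well.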
@jnothman Oh yes, you are right.
I just implemented predict_proba() for RadiusNeighborsClassifier, and I think I can implement this feature as well.