-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
Open
Labels
ModerateAnything that requires some knowledge of conventions and best practicesAnything that requires some knowledge of conventions and best practicesmodule:neighbors
Description
As reported by @Witiko in #20097 (reply in thread), the centroids' initialization in BallTree
is tight to the case of the euclidean distance regardless of the DistanceMetric
set indirectly by the user provided metric
argument.
scikit-learn/sklearn/neighbors/_ball_tree.pyx
Lines 71 to 104 in 7b71511
# determine Node centroid | |
for j in range(n_features): | |
centroid[j] = 0 | |
if with_sample_weight: | |
sum_weight_node = 0 | |
for i in range(idx_start, idx_end): | |
sum_weight_node += sample_weight[idx_array[i]] | |
this_pt = data + n_features * idx_array[i] | |
for j from 0 <= j < n_features: | |
centroid[j] += this_pt[j] * sample_weight[idx_array[i]] | |
for j in range(n_features): | |
centroid[j] /= sum_weight_node | |
else: | |
for i in range(idx_start, idx_end): | |
this_pt = data + n_features * idx_array[i] | |
for j from 0 <= j < n_features: | |
centroid[j] += this_pt[j] | |
for j in range(n_features): | |
centroid[j] /= n_points | |
# determine Node radius | |
radius = 0 | |
for i in range(idx_start, idx_end): | |
radius = fmax(radius, | |
tree.rdist(centroid, | |
data + n_features * idx_array[i], | |
n_features)) | |
tree.node_data[i_node].radius = tree.dist_metric._rdist_to_dist(radius) | |
tree.node_data[i_node].idx_start = idx_start | |
tree.node_data[i_node].idx_end = idx_end |
The initialisation of those centroids should depends on the effective DistanceMetric
. Apart from the euclidean case, are there closed form solutions for computing them?
Metadata
Metadata
Assignees
Labels
ModerateAnything that requires some knowledge of conventions and best practicesAnything that requires some knowledge of conventions and best practicesmodule:neighbors