Skip to content

Adapt centroids' initialisation in BallTree #20582

@jjerphan

Description

@jjerphan

As reported by @Witiko in #20097 (reply in thread), the centroids' initialization in BallTree is tight to the case of the euclidean distance regardless of the DistanceMetric set indirectly by the user provided metric argument.

# determine Node centroid
for j in range(n_features):
centroid[j] = 0
if with_sample_weight:
sum_weight_node = 0
for i in range(idx_start, idx_end):
sum_weight_node += sample_weight[idx_array[i]]
this_pt = data + n_features * idx_array[i]
for j from 0 <= j < n_features:
centroid[j] += this_pt[j] * sample_weight[idx_array[i]]
for j in range(n_features):
centroid[j] /= sum_weight_node
else:
for i in range(idx_start, idx_end):
this_pt = data + n_features * idx_array[i]
for j from 0 <= j < n_features:
centroid[j] += this_pt[j]
for j in range(n_features):
centroid[j] /= n_points
# determine Node radius
radius = 0
for i in range(idx_start, idx_end):
radius = fmax(radius,
tree.rdist(centroid,
data + n_features * idx_array[i],
n_features))
tree.node_data[i_node].radius = tree.dist_metric._rdist_to_dist(radius)
tree.node_data[i_node].idx_start = idx_start
tree.node_data[i_node].idx_end = idx_end

The initialisation of those centroids should depends on the effective DistanceMetric. Apart from the euclidean case, are there closed form solutions for computing them?

Metadata

Metadata

Assignees

No one assigned

    Labels

    ModerateAnything that requires some knowledge of conventions and best practicesmodule:neighbors

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions