Docs: nearest neighbour choice of algorithm section is unclear #13880

@rlms

The documentation for choice of nearest neighbours algorithm is unclear about what happens when algorithm="auto".

Currently, algorithm = 'auto' selects 'kd_tree' if K < N/2 and the 'effective_metric_' is in the 'VALID_METRICS' list of 'kd_tree'. It selects 'ball_tree' if K < N/2 and the 'effective_metric_' is in the 'VALID_METRICS' list of 'ball_tree'. It selects 'brute' if K < N/2 and the 'effective_metric_' is not in the 'VALID_METRICS' list of 'kd_tree' or 'ball_tree'. It selects 'brute' if K >= N/2. This choice is based on the assumption that the number of query points is at least the same order as the number of training points, and that leaf_size is close to its default value of 30.

Presumably 'ball_tree' is selected only if the effective metric is not found in the VALID_METRICS list for 'kd_tree', but that isn't specified. The wording is also unclear in general (e.g. the repetition of K < N/2). I'd like to rewrite it to something like:

Currently, if K < N/2 algorithm = 'auto' selects the first out of 'kd_tree', 'ball_tree' and 'brute' that has 'effective_metric_' in its 'VALID_METRICS' list. If K >= N/2 it always selects 'brute'. This choice is based on the assumption that the number of query points is at least the same order as the number of training points, and that leaf_size is close to its default value of 30.
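
To make the proposed wording concrete, here is a minimal sketch of the selection rule as I read it. This is not scikit-learn's actual code: `choose_auto_algorithm` and `TOY_VALID_METRICS` are hypothetical names, and the metric sets are a small illustrative subset rather than the library's real 'VALID_METRICS' lists.

```python
# Illustrative subset only (an assumption for this sketch); the real
# lists are scikit-learn's VALID_METRICS and are considerably longer.
TOY_VALID_METRICS = {
    "kd_tree": {"euclidean", "manhattan", "chebyshev", "minkowski"},
    "ball_tree": {"euclidean", "manhattan", "chebyshev", "minkowski", "haversine"},
}


def choose_auto_algorithm(k, n_samples, effective_metric,
                          valid_metrics=TOY_VALID_METRICS):
    """Return the algorithm that the documented 'auto' rule would pick.

    k                -- number of neighbours requested (K in the docs)
    n_samples        -- number of training points (N in the docs)
    effective_metric -- name of the metric, as in 'effective_metric_'
    """
    # K >= N/2: brute force is always selected, whatever the metric is.
    if k >= n_samples / 2:
        return "brute"

    # K < N/2: take the first tree whose valid-metric list has the metric.
    for name in ("kd_tree", "ball_tree"):
        if effective_metric in valid_metrics[name]:
            return name

    # Neither tree supports the metric, so fall back to brute force.
    return "brute"


print(choose_auto_algorithm(5, 100, "euclidean"))   # kd_tree is checked first
print(choose_auto_algorithm(5, 100, "haversine"))   # only ball_tree lists it here
print(choose_auto_algorithm(50, 100, "euclidean"))  # K >= N/2
```

The ordering of the loop is exactly the point the current documentation leaves implicit: 'ball_tree' is only reached when 'kd_tree' does not list the metric.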
