Description
The documentation for choice of nearest neighbours algorithm is unclear about what happens when algorithm="auto".
Currently, algorithm = 'auto' selects 'kd_tree' if K < N/2 and the 'effective_metric_' is in the 'VALID_METRICS' list of 'kd_tree'. It selects 'ball_tree' if K < N/2 and the 'effective_metric_' is in the 'VALID_METRICS' list of 'ball_tree'. It selects 'brute' if K < N/2 and the 'effective_metric_' is not in the 'VALID_METRICS' list of 'kd_tree' or 'ball_tree'. It selects 'brute' if K >= N/2. This choice is based on the assumption that the number of query points is at least the same order as the number of training points, and that leaf_size is close to its default value of 30.
Presumably 'ball_tree' is selected only if the effective metric is not first found in the VALID_METRICS list for 'kd_tree', but that isn't specified. The wording is also hard to follow in general (e.g. the repetition of K < N/2). I'd like to rewrite it to something like:
Currently, if K < N/2, algorithm = 'auto' selects the first of 'kd_tree', 'ball_tree' and 'brute' that has 'effective_metric_' in its 'VALID_METRICS' list. If K >= N/2, it always selects 'brute'. This choice is based on the assumption that the number of query points is at least the same order as the number of training points, and that leaf_size is close to its default value of 30.
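
For reference, here is a minimal sketch of the selection rule as the wording above describes it. It is not the actual scikit-learn source: `choose_algorithm` is a hypothetical helper, and `ILLUSTRATIVE_VALID_METRICS` is a small made-up subset standing in for the real 'VALID_METRICS' lists, which are much longer.

```python
# Illustrative subset only (an assumption for this sketch); the real
# VALID_METRICS lists in scikit-learn contain many more metrics.
ILLUSTRATIVE_VALID_METRICS = {
    "kd_tree": {"euclidean", "manhattan", "chebyshev", "minkowski"},
    "ball_tree": {"euclidean", "manhattan", "chebyshev", "minkowski", "haversine"},
    "brute": {"euclidean", "manhattan", "chebyshev", "minkowski", "haversine", "cosine"},
}


def choose_algorithm(n_neighbors, n_samples, effective_metric,
                     valid_metrics=ILLUSTRATIVE_VALID_METRICS):
    """Hypothetical helper mirroring the documented 'auto' rule."""
    # K >= N/2: always fall back to brute force.
    if n_neighbors >= n_samples / 2:
        return "brute"
    # K < N/2: try the tree-based algorithms in order, kd_tree first.
    for name in ("kd_tree", "ball_tree"):
        if effective_metric in valid_metrics[name]:
            return name
    # Neither tree supports the metric.
    return "brute"


if __name__ == "__main__":
    for k, n, metric in [(5, 100, "euclidean"), (5, 100, "haversine"),
                         (5, 100, "cosine"), (50, 100, "euclidean")]:
        print(k, n, metric, "->", choose_algorithm(k, n, metric))
```

Written this way, the order of the checks makes it explicit that 'ball_tree' is only considered when 'kd_tree' does not support the metric, which is the point the current wording leaves open.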