LLE utilizing kNN for sample points #29752
Open
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
This pull request addresses the issue #29715 by
Definitions
I refer to
NearestNeighbors
object viann = NearestNeighbors().fit(X)
, as the sample, and an element of that set as a sample point.nn.kneighbors(X=[x])
as a query point.The distinction is important because
From a math perspective there are two kNN functions: one for sample points, and one for query points. From a code perspective
nn.nn.kneighbors(X=None)
nn.nn.kneighbors(X=[x])
Problem
The original code
ValueError
whenn_neighbors
was greater than or equal ton_samples
.n_neighbors
ofn_samples-1
.n_neighbors
isn_samples
.The kind of kNN being computed for LLE is the former; kNN for points from the sample points.
However, the message given with the value error was written for the latter case; kNN for query points.
Specifically, the code
raised an error when
n_neighbors=5
andn_samples=5
, and gave the confusing messageFurther, the discussion in PR #29716 showed that the surrounding code was confusing to developers because of the
n_neighbors + 1
in calls of the formNearestNeighbors(n_neighbors=n_neighbors + 1, n_jobs=n_jobs)
NearestNeighbors
object for LLEThese calls tried to implement the kNN function for sample points (which should be done with
nn.kneighbors(X=None)
) by using the kNN function for query points (nn.kneighbors(X=[x])
) and then removing the first neighborThis treatment neglected the edge cases addressed by the kNN function for sample points
nn.kneighbors(X=None)
.Solution
The code
nn.kneighbors(X=None)
is now used to calculate kNNs of sample points.The error message now describes the condition for that function to work.
Tests
The change was tested locally, and all existing tests passed successfully.
The following shows that the method of dropping the first element in the list of nearest neighbors of a sample point sometimes gives neighbors that are the sample point itself.
Additional comments
I'm a new contributor, eager to learn.
Possible future work: It appears that
NearestNeighbors
objectnbrs
is fitnbrs
is passed to a functionNearestNeighbors
objectnbrs
to fit anotherNearestNeighbors
object,knn
.The first fit seems unnecessary, and might be doubling the computation time for the algorithm. Advice on how to asses that situation will be appreciated.