Skip to content

Fix #29715: Removed >= in line 225 of manifold/_locally_linear #29716

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

AdrianoCLeao
Copy link

Description

This pull request addresses the issue #29715 by fixing the condition in the LocallyLinearEmbedding class that checks the relationship between n_neighbors and n_samples.

Problem

The original code raised a ValueError when n_neighbors was greater than or equal to n_samples, which caused confusion and incorrect error messages. Specifically, users encountered an error even when n_neighbors was equal to n_samples, which should be a valid scenario based on the error message:

if n_neighbors >= N:
    raise ValueError(
        "Expected n_neighbors <= n_samples, but n_samples = %d, n_neighbors = %d"
        % (N, n_neighbors)
    )

Solution

The condition was updated to only raise a ValueError when n_neighbors is greater than n_samples, allowing n_neighbors to be equal to n_samples without raising an error:

if n_neighbors > N:
    raise ValueError(
        "Expected n_neighbors <= n_samples, but n_samples = %d, n_neighbors = %d"
        % (N, n_neighbors)
    )

This change aligns the behavior of the code with the expected usage, preventing unnecessary errors when n_neighbors == n_samples.

Tests

  • The change was tested locally, and all existing tests passed successfully.
  • Additional edge cases were considered to ensure the condition behaves as expected.

No documentation changes were necessary, as this is a minor fix to existing functionality.

Copy link

github-actions bot commented Aug 25, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: bfe8952. Link to the linter CI: here

@@ -222,7 +222,7 @@ def _locally_linear_embedding(
raise ValueError(
"output dimension must be less than or equal to input dimension"
)
if n_neighbors >= N:
if n_neighbors > N:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think since we're passing n_neighbors + 1 to NearestNeighbors here a few lines above,

https://github.com/scikit-learn/scikit-learn/pull/29716/files#diff-fda62beef1d843550673f77d3e2c36344a4a9ea5fb1d1dc2b9f1f0d23405522bR215

the condition is correct, and the error message is wrong, i.e. here n_neighbors does NOT include the data point itself. (I find it quite confusing though)

cc @adam2392 maybe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the condition is incorrect, it seems like an in-line comment should be added, as I agree it's confusing.

Unfortunately, we can't just do n_neighbors += 1, as it's used below.

Why are we passing in n_neighbors + 1 into NearestNeighbors anyways? Def something worth clarifying in the code + then fix the error message.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that since n_neighbors is increased by 1 only in the NearestNeighbors function, it does not affect the original n_neighbors variable itself. Therefore, when the validation checks the value of n_neighbors, it will only check the original variable, not the incremented one used internally by NearestNeighbors.

The reason for incrementing n_neighbors by 1 in NearestNeighbors is to ensure that the point itself is included in the list of neighbors. This is because most nearest neighbors algorithms will include the point being queried as the first neighbor. By requesting n_neighbors + 1 neighbors, the algorithm guarantees that the actual n_neighbors closest points, excluding the query point itself, are returned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that tells me the condition is correct, but the message is wrong.

@adrinjalali
Copy link
Member

@AdrianoCLeao would you be able to apply the changes here? You need to fix the message and revert the condition to what it was.

@dkobak
Copy link
Contributor

dkobak commented Jun 26, 2025

@AdrianoCLeao would you be able to fix this PR? As described above, you need to fix the message and revert the condition to what it was.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants