DOC Examples (imputation): add scaling when using k-neighbours imputation #31200

Open
5nizza opened this issue Apr 14, 2025 · 1 comment · May be fixed by #31201
Comments

5nizza commented Apr 14, 2025

Describe the issue linked to the documentation

Two missing-value imputation examples use k-neighbors imputation without scaling the data first.
As a result, the k-neighbors approaches under-perform.
The examples are:

  1. https://scikit-learn.org/stable/auto_examples/impute/plot_missing_values.html#sphx-glr-auto-examples-impute-plot-missing-values-py
  2. https://scikit-learn.org/stable/auto_examples/impute/plot_iterative_imputer_variants_comparison.html

In the first example, the effect is quite small: adding scaling before calling the k-neighbors imputer changes the k-NN MSE for the California dataset from 0.2987 ± 0.1469 to 0.2912 ± 0.1410, and for the diabetes dataset from 3314 ± 114 to 3323 ± 90.

In the second example (comparing iterative imputations), the change is more significant: before the change, iterative imputation with k-neighbors performed worse than mean imputation; after scaling, it performs better than mean imputation.

In both cases, it is better practice to scale the data before using a k-neighbors approach, since it is based on distances between points.
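For illustration, here is a minimal sketch of the idea, not the actual patch: it puts a `StandardScaler` in front of `KNNImputer` in a pipeline. The toy array `X` and the variable names are made up for this sketch and are not taken from the examples above.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data (hypothetical): two features on very different scales, with gaps.
X = np.array(
    [
        [1.0, 1000.0],
        [2.0, 5000.0],
        [np.nan, 1100.0],
        [9.0, 5100.0],
        [1.5, np.nan],
    ]
)

# StandardScaler disregards NaNs when fitting and keeps them on transform,
# so it can be placed before the imputer.
scaled_knn = make_pipeline(StandardScaler(), KNNImputer(n_neighbors=2))
X_imputed_scaled = scaled_knn.fit_transform(X)  # values are in standardized units

# Baseline without scaling: distances are dominated by the large-scale feature.
X_imputed_raw = KNNImputer(n_neighbors=2).fit_transform(X)
```

Note that the pipeline output is in standardized units; if the original units are needed, the fitted scaler's `inverse_transform` can be applied afterwards.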

Suggest a potential alternative/fix

I will submit a patch to fix this issue.

5nizza added the Documentation and Needs Triage labels Apr 14, 2025
5nizza changed the title from "Add scaling when using k-neighbours imputation" to "Examples (imputation): add scaling when using k-neighbours imputation" Apr 14, 2025
5nizza added a commit to 5nizza/scikit-learn that referenced this issue Apr 14, 2025
ogrisel removed the Needs Triage label Apr 24, 2025
Member

ogrisel commented Apr 24, 2025

I agree that k-NN requires feature scaling in general.

ArturoAmorQ changed the title from "Examples (imputation): add scaling when using k-neighbours imputation" to "DOC Examples (imputation): add scaling when using k-neighbours imputation" Apr 24, 2025