Describe the issue linked to the documentation
Two examples for missing-values imputation use k-neighbors imputation without scaling the data first.
As a result, the approaches underperform.
The examples are:
In the first example, the effect is quite small: adding scaling before calling the k-neighbors imputer changes the MSE for the California dataset (k-NN) from 0.2987 ± 0.1469 to 0.2912 ± 0.1410, and for the Diabetes dataset from 3314 ± 114 to 3323 ± 90.
In the second example (comparing iterative imputation strategies), the change is more significant: without scaling, iterative imputation with k-neighbors performs worse than mean imputation; with scaling, it performs better than mean imputation.
In both cases, it is better practice to scale the data before using a k-neighbors approach, which is based on distances between points.
Suggest a potential alternative/fix
I will submit a patch to fix the issue.
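A minimal sketch of the kind of change such a patch could make, assuming the examples build their models with make_pipeline; the downstream estimator and parameter values below (RandomForestRegressor, the n_neighbors settings) are illustrative stand-ins, not the exact code in the examples:

```python
from sklearn.experimental import enable_iterative_imputer  # noqa: F401, enables IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import IterativeImputer, KNNImputer
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# First example: scale before KNNImputer so that the nearest-neighbour search
# is not dominated by features with large ranges. StandardScaler ignores NaNs
# when computing its statistics, so it can safely run before imputation.
knn_impute_then_predict = make_pipeline(
    StandardScaler(),
    KNNImputer(n_neighbors=5),  # n_neighbors is illustrative
    RandomForestRegressor(random_state=0),
)

# Second example: the same idea for iterative imputation driven by a
# k-neighbours regressor. Here the scaler is placed in front of the imputer;
# nesting it inside the estimator pipeline would be an alternative.
iterative_knn_impute_then_predict = make_pipeline(
    StandardScaler(),
    IterativeImputer(estimator=KNeighborsRegressor(n_neighbors=15)),
    RandomForestRegressor(random_state=0),
)
```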
5nizza changed the title from "Add scaling when using k-neighbours imputation" to "Examples (imputation): add scaling when using k-neighbours imputation" on Apr 14, 2025.
5nizza added a commit to 5nizza/scikit-learn that referenced this issue on Apr 14, 2025.
I agree that k-NN requires feature scaling in general.
ArturoAmorQ changed the title from "Examples (imputation): add scaling when using k-neighbours imputation" to "DOC Examples (imputation): add scaling when using k-neighbours imputation" on Apr 24, 2025.