Fix KernelDensity for non-whitened data (issue #25623) #31496

edschofield · 2025-06-06T07:28:28Z

Reference Issues/PRs

Fixes #25623.

Summary

This PR updates KernelDensity to correctly handle datasets whose covariance matrix is not the identity matrix.

Problem

Currently, KernelDensity applies a scalar bandwidth as if the input data were already whitened (i.e., had identity covariance). As described in issue #25623, this produces incorrect density estimates for non-whitened data.
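To make the problem concrete, here is a minimal NumPy sketch (not scikit-learn code) of a Gaussian KDE whose scalar bandwidth ignores the spread of the data. `scale_unaware_scott` is a hypothetical rule standing in for any bandwidth that depends only on the number of samples n and features d. Rescaling every feature by a constant c should only lower the log-density by d·log c; with such a bandwidth, it does not.

```python
import numpy as np


def isotropic_gaussian_kde_logpdf(X_train, X_query, h):
    """Log-density of a Gaussian KDE whose kernel covariance is h**2 * I."""
    n, d = X_train.shape
    # Squared Euclidean distances, shape (n_query, n_train).
    sq = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    log_k = -0.5 * sq / h**2 - d * np.log(h) - 0.5 * d * np.log(2 * np.pi)
    # Stable log-mean-exp over the training points.
    m = log_k.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n)


def scale_unaware_scott(X):
    """Hypothetical rule: Scott's factor used directly as the bandwidth."""
    n, d = X.shape
    return n ** (-1.0 / (d + 4))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Q = rng.normal(size=(50, 2))
c = 100.0  # rescale every feature, e.g. metres -> centimetres
d = X.shape[1]

log_f = isotropic_gaussian_kde_logpdf(X, Q, scale_unaware_scott(X))
log_f_c = isotropic_gaussian_kde_logpdf(c * X, c * Q, scale_unaware_scott(c * X))

# Change of variables: a consistent estimator gives
# log f_{cX}(c q) + d * log(c) == log f_X(q). Here the identity fails badly.
mismatch = np.abs(log_f_c + d * np.log(c) - log_f)
print(mismatch.max())
```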

Solution

This PR fixes the issue by:

Computing the empirical covariance matrix of the training data.
Computing a whitening transformation $W = \Sigma^{-1/2}$ from the eigendecomposition of the empirical covariance matrix $\Sigma$.
Applying this transformation to both training and query data internally.
Correcting the final log-density estimates by adding $\log\lvert\det W\rvert$, the log-determinant of the whitening transform. This is the Jacobian term of the change of variables, so the densities are expressed in the original feature space.
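The four steps above can be sketched in NumPy as follows. This is an illustrative stand-in, not the PR's actual code: the function names are hypothetical, the scalar bandwidth `h` is interpreted in the whitened space, and `full_cov_kde_logpdf` is a direct reference implementation included only as a cross-check. Whitening with W = Σ^(-1/2) (Σ the empirical covariance) and then using an isotropic kernel of width `h` is equivalent to using kernel covariance h²Σ in the original space.

```python
import numpy as np


def whitened_gaussian_kde_logpdf(X_train, X_query, h):
    """Gaussian KDE with kernel covariance h**2 * Sigma, computed by whitening."""
    n, d = X_train.shape
    # 1. Empirical covariance of the training data (features in columns).
    sigma = np.atleast_2d(np.cov(X_train, rowvar=False))
    # 2. W = Sigma^{-1/2} via eigendecomposition (Sigma is symmetric).
    evals, evecs = np.linalg.eigh(sigma)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    # 3. Whiten training and query data with the same transform.
    Z_train = X_train @ W.T
    Z_query = X_query @ W.T
    # Isotropic Gaussian KDE in the whitened space (stable log-mean-exp).
    sq = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    log_k = -0.5 * sq / h**2 - d * np.log(h) - 0.5 * d * np.log(2 * np.pi)
    m = log_k.max(axis=1, keepdims=True)
    log_f_z = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n)
    # 4. Back to the original space: add log|det W| = -0.5 * log det Sigma.
    return log_f_z - 0.5 * np.sum(np.log(evals))


def full_cov_kde_logpdf(X_train, X_query, H):
    """Reference: Gaussian KDE with full kernel covariance H, no whitening."""
    n, d = X_train.shape
    H_inv = np.linalg.inv(H)
    _, logdet = np.linalg.slogdet(H)
    diff = X_query[:, None, :] - X_train[None, :, :]
    maha = np.einsum("qnd,de,qne->qn", diff, H_inv, diff)
    log_k = -0.5 * maha - 0.5 * logdet - 0.5 * d * np.log(2 * np.pi)
    m = log_k.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n)


rng = np.random.default_rng(1)
A = np.array([[3.0, 0.0], [2.0, 0.5]])
X = rng.normal(size=(300, 2)) @ A.T  # correlated, non-whitened data
Q = rng.normal(size=(5, 2)) @ A.T
h = X.shape[0] ** (-1.0 / (X.shape[1] + 4))  # Scott's factor

log_f = whitened_gaussian_kde_logpdf(X, Q, h)
log_f_ref = full_cov_kde_logpdf(X, Q, h**2 * np.cov(X, rowvar=False))
print(np.max(np.abs(log_f - log_f_ref)))
```

Because W is rebuilt from the data, rescaling the features now shifts the log-density by exactly d·log c, which is the identity a scale-unaware scalar bandwidth breaks.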

Special care is taken to:

Support both 1D and nD data correctly.
Preserve expected behavior when a fixed scalar bandwidth is manually specified.
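Both points can be illustrated with a hypothetical wrapper, `kde_logpdf`. The PR text does not spell out how a manually specified bandwidth is treated, so this sketch assumes one plausible reading: whitening is applied only for the rule-based string options (`"scott"`, `"silverman"`, which KernelDensity accepts), while a numeric bandwidth keeps its current meaning as an isotropic kernel width in the original feature space. The 1D handling relies on `np.cov` returning a 0-d array for a single feature, which `np.atleast_2d` turns into a 1×1 matrix.

```python
import numpy as np


def _isotropic_logpdf(Z_train, Z_query, h):
    """Log-density of a Gaussian KDE with kernel covariance h**2 * I."""
    n, d = Z_train.shape
    sq = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    log_k = -0.5 * sq / h**2 - d * np.log(h) - 0.5 * d * np.log(2 * np.pi)
    m = log_k.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n)


def kde_logpdf(X_train, X_query, bandwidth="scott"):
    """Hypothetical wrapper: whiten only when the bandwidth is a rule name."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    # Treat 1D input of shape (n,) as n samples of a single feature.
    if X_train.ndim == 1:
        X_train = X_train[:, None]
    if X_query.ndim == 1:
        X_query = X_query[:, None]
    n, d = X_train.shape

    if isinstance(bandwidth, str):
        if bandwidth == "scott":
            h = n ** (-1.0 / (d + 4))
        elif bandwidth == "silverman":
            h = (n * (d + 2) / 4.0) ** (-1.0 / (d + 4))
        else:
            raise ValueError(f"unknown bandwidth rule: {bandwidth!r}")
        # np.cov returns a 0-d array for one feature; atleast_2d makes it 1x1.
        sigma = np.atleast_2d(np.cov(X_train, rowvar=False))
        evals, evecs = np.linalg.eigh(sigma)
        W = evecs @ np.diag(evals ** -0.5) @ evecs.T
        log_det_W = -0.5 * np.sum(np.log(evals))
    else:
        # Assumed behaviour for a fixed bandwidth: isotropic kernel of that
        # width in the original feature space, as before (no whitening).
        h = float(bandwidth)
        W = np.eye(d)
        log_det_W = 0.0

    return _isotropic_logpdf(X_train @ W.T, X_query @ W.T, h) + log_det_W


rng = np.random.default_rng(0)
x = rng.normal(scale=10.0, size=200)  # 1D data, far from unit variance
q = np.linspace(-20.0, 20.0, 7)
print(kde_logpdf(x, q))                 # rule-based: width follows the data
print(kde_logpdf(x, q, bandwidth=1.0))  # fixed width, original units
```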

I have tested this in 1D and 2D, and the results match those of scipy.stats.gaussian_kde(). I haven't added any new unit tests yet, though.
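A unit test along these lines could compare against SciPy directly. The sketch below uses a NumPy stand-in, `whitened_kde_logpdf` (hypothetical), in place of the patched `KernelDensity`, with Scott's factor, which is `gaussian_kde`'s default `bw_method`. `gaussian_kde` scales the unbiased (ddof=1) sample covariance by the squared factor, so `np.cov` with its default normalisation matches it. With the patch applied, the stand-in would presumably be replaced by `KernelDensity(kernel="gaussian", bandwidth="scott").fit(X).score_samples(Q)`.

```python
import numpy as np
from scipy.stats import gaussian_kde


def whitened_kde_logpdf(X_train, X_query):
    """Stand-in for the patched estimator: Scott-factor Gaussian KDE via whitening."""
    n, d = X_train.shape
    h = n ** (-1.0 / (d + 4))  # Scott's factor, gaussian_kde's default
    sigma = np.atleast_2d(np.cov(X_train, rowvar=False))  # ddof=1
    evals, evecs = np.linalg.eigh(sigma)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z_train, Z_query = X_train @ W.T, X_query @ W.T
    sq = ((Z_query[:, None, :] - Z_train[None, :, :]) ** 2).sum(axis=-1)
    log_k = -0.5 * sq / h**2 - d * np.log(h) - 0.5 * d * np.log(2 * np.pi)
    m = log_k.max(axis=1, keepdims=True)
    log_f_z = m[:, 0] + np.log(np.exp(log_k - m).sum(axis=1)) - np.log(n)
    return log_f_z - 0.5 * np.sum(np.log(evals))  # + log|det W|


def test_matches_scipy_gaussian_kde():
    rng = np.random.default_rng(42)
    # One 1D case and one correlated 2D case, both far from whitened.
    for A in (np.array([[5.0]]), np.array([[3.0, 0.0], [2.0, 0.5]])):
        d = A.shape[0]
        X = rng.normal(size=(250, d)) @ A.T
        Q = rng.normal(size=(20, d)) @ A.T
        expected = gaussian_kde(X.T).logpdf(Q.T)  # SciPy expects (d, n)
        np.testing.assert_allclose(whitened_kde_logpdf(X, Q), expected, rtol=1e-6)
```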

Any other comments?

It seems there's also a separate pull request for this issue by @Charlie-XIAO, which I didn't notice before working on this patch. That PR is here: #27971.

It would be nice to get this fixed! KernelDensity() currently gives wildly incorrect results on non-whitened data.

github-actions · 2025-06-06T07:29:33Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_Generated for commit: 2d4899a. Link to the linter CI: here_

Fix KernelDensity for unscaled data (issue scikit-learn#25623) · 7fc0dc9

github-actions bot added the module:neighbors label Jun 6, 2025

Reformat with ruff · 2d4899a
