Fix KernelDensity for non-whitened data (issue #25623) #31496

Open. Wants to merge 2 commits into base: main

@edschofield edschofield commented Jun 6, 2025

Reference Issues/PRs

Fixes #25623.

Summary

This PR updates KernelDensity to correctly handle datasets whose covariance matrix is not the identity matrix.

Problem

Currently, KernelDensity assumes that the input data is already whitened (i.e., has identity covariance) when applying scalar bandwidths. As issue #25623 describes, density estimates are incorrect with non-whitened data.

Solution

This PR fixes the issue by:

  1. Computing the empirical covariance matrix of the training data.
  2. Computing a whitening transformation $W = \Sigma^{-1/2}$ from the eigendecomposition of that covariance matrix $\Sigma$.
  3. Applying this transformation to both training and query data internally.
  4. Correcting the final log-density estimates by adding the log-determinant of the whitening transform, restoring the correct units in the original feature space.
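The four steps above can be sketched in plain NumPy. This is a minimal illustration, not the code in this PR: the helper names `whitening_matrix` and `whitened_gaussian_log_density` are hypothetical, a brute-force Gaussian kernel stands in for KernelDensity's tree-based evaluation, and the covariance is assumed to be positive definite.

```python
import numpy as np


def _as_2d(X):
    """Treat 1D input as n samples of a single feature."""
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X


def whitening_matrix(X):
    """Return W = Sigma^{-1/2} for the empirical covariance Sigma of X.

    Hypothetical helper; assumes Sigma is positive definite.
    """
    # np.cov collapses the single-feature case to a scalar; restore 1x1.
    sigma = np.atleast_2d(np.cov(_as_2d(X), rowvar=False))
    # Symmetric eigendecomposition: Sigma = V diag(lam) V^T.
    lam, V = np.linalg.eigh(sigma)
    return V @ np.diag(lam ** -0.5) @ V.T


def whitened_gaussian_log_density(X_train, X_query, bandwidth):
    """Gaussian KDE log-density with a scalar bandwidth in whitened space."""
    X_train, X_query = _as_2d(X_train), _as_2d(X_query)
    n, d = X_train.shape
    W = whitening_matrix(X_train)
    # W is symmetric, so right-multiplying rows by W applies the transform.
    Z, Zq = X_train @ W, X_query @ W
    sq_dist = ((Zq[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    # Log-sum-exp over training points for numerical stability.
    a = -sq_dist / (2.0 * bandwidth**2)
    a_max = a.max(axis=1, keepdims=True)
    log_sum = a_max[:, 0] + np.log(np.exp(a - a_max).sum(axis=1))
    log_norm = -0.5 * d * np.log(2.0 * np.pi * bandwidth**2) - np.log(n)
    # Change of variables: p_x(x) = p_z(W x) * |det W|.
    _, log_det_W = np.linalg.slogdet(W)
    return log_sum + log_norm + log_det_W
```

Without the final `log_det_W` term the result would be a density over the whitened coordinates rather than over the original feature space.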

Special care is taken to:

  • Support both 1D and nD data correctly.
  • Preserve expected behavior when a fixed scalar bandwidth is manually specified.

I have tested this in 1D and 2D, and the results match those of scipy.stats.gaussian_kde(). I haven't added unit tests yet, though.
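A check along these lines can be reproduced outside the estimator. The sketch below is an assumption about how such a comparison might look, not the test used for this PR: it whitens the data by hand, fits the existing KernelDensity on the whitened points, adds the log-determinant, and compares against `scipy.stats.gaussian_kde` with the same scalar factor. The sample, bandwidth, and variable names are illustrative.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Correlated 2D sample, so the covariance is far from the identity.
cov = [[4.0, 1.5], [1.5, 1.0]]
X = rng.multivariate_normal([0.0, 0.0], cov, size=200)
Q = rng.multivariate_normal([0.0, 0.0], cov, size=5)
h = 0.5  # scalar bandwidth, used as gaussian_kde's factor

# Whitening transform W = Sigma^{-1/2} from the eigendecomposition.
sigma = np.cov(X, rowvar=False)
lam, V = np.linalg.eigh(sigma)
W = V @ np.diag(lam ** -0.5) @ V.T

# KDE in whitened space, corrected back to the original feature space.
kde = KernelDensity(kernel="gaussian", bandwidth=h).fit(X @ W)
log_dens_whitened = kde.score_samples(Q @ W) + np.linalg.slogdet(W)[1]

# Reference: gaussian_kde scales the data covariance by h**2.
log_dens_scipy = gaussian_kde(X.T, bw_method=h).logpdf(Q.T)

# Unwhitened KernelDensity with the same scalar bandwidth, for contrast.
log_dens_plain = (
    KernelDensity(kernel="gaussian", bandwidth=h).fit(X).score_samples(Q)
)

print(np.allclose(log_dens_whitened, log_dens_scipy))
print(np.abs(log_dens_plain - log_dens_scipy).max())
```

The two estimators agree here because `gaussian_kde` uses the kernel covariance `h**2 * Sigma`, which is exactly an isotropic kernel of width `h` in the whitened coordinates.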

Any other comments?

It seems there's also a separate pull request for this issue by @Charlie-XIAO, which I didn't notice before working on this patch. That PR is here: #27971.

It would be nice to get this fixed! KernelDensity() gives wildly incorrect results right now ...

github-actions bot commented Jun 6, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2d4899a.

Successfully merging this pull request may close these issues.

KernelDensity incorrect handling of bandwidth