
sklearn PCA with n_components = 'mle' and svd_solver = 'full' results in math domain error #10217

Closed
Yannic92 opened this issue Nov 29, 2017 · 7 comments

@Yannic92

Yannic92 commented Nov 29, 2017

Description

sklearn PCA with n_components = 'mle' and svd_solver = 'full' results in math domain error

The problem is in this line of code. The result of (spectrum[i] - spectrum[j]) is 0, so the code evaluates log(0), which raises this exception.
Is this a sign of bad data, or should the implementation handle this case?
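
The failure described above can be reproduced without scikit-learn. The following is a minimal sketch, assuming the failing expression sits inside a double loop over pairs of spectrum values; `log_pair_terms` is a hypothetical stand-in written for illustration, not the library's function, and the two small spectra are made-up inputs.

```python
import math


def log_pair_terms(spectrum, rank, n_samples):
    """Sum log((s_i - s_j) * (1/s_j - 1/s_i)) + log(n_samples) over pairs.

    Hypothetical helper mirroring the expression shown in the traceback;
    the loop bounds are an assumption made for this sketch.
    """
    total = 0.0
    for i in range(rank):
        for j in range(i + 1, len(spectrum)):
            gap = (spectrum[i] - spectrum[j]) * (1.0 / spectrum[j] - 1.0 / spectrum[i])
            total += math.log(gap) + math.log(n_samples)
    return total


distinct = [4.0, 2.0, 1.0]  # all values differ: every log argument is positive
tied = [4.0, 2.0, 2.0]      # two equal values: one log argument is exactly 0

print(log_pair_terms(distinct, rank=2, n_samples=26))  # a finite float

try:
    log_pair_terms(tied, rank=2, n_samples=26)
except ValueError as exc:
    # math.log(0.0) raises ValueError, matching the exception type in the report
    print("tied spectrum raised:", exc)
```

If this sketch matches what happens inside the library, any pair of exactly equal values among the inputs would be enough to trigger the error, regardless of the rest of the data.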

Steps/Code to Reproduce

Store this in a file, e.g. foo.csv:

0.00000, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99960, 0.99903, 0.28285, 0.28246, 1.00000, 0.00000, 0.00000, 0.92666, 0.82283, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99991, 0.99934, 0.31101, 0.31064, 1.00000, 0.00000, 0.00000, 0.99241, 0.51799, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99991, 0.99934, 0.31101, 0.31064, 1.00000, 0.00000, 0.00000, 0.99241, 0.51799, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99991, 0.99934, 0.31101, 0.31064, 1.00000, 0.00000, 0.00000, 0.99241, 0.51799, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00006, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99920, 0.99863, 0.35419, 0.35384, 1.00000, 0.00000, 0.00000, 0.99421, 0.47923, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000
1.00000, 0.11108, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99144, 0.99088, 0.31786, 0.31748, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000
0.00027, 0.35941, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.87935, 0.87885, 0.32529, 0.32492, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00011, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00966, 0.99986, 0.99929, 0.35232, 0.35196, 1.00000, 0.00000, 0.00000, 0.98768, 0.53487, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000
0.00000, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99882, 0.99825, 0.21965, 0.21922, 1.00000, 0.00000, 0.00000, 1.00000, 0.60636, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00027, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99883, 0.99826, 0.33354, 0.33318, 1.00000, 0.00000, 0.00000, 0.99208, 0.51767, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000
0.00064, 0.38476, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.63538, 0.99263, 0.99207, 0.35795, 0.35759, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000
0.00064, 0.38476, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.74959, 0.96602, 1.00000, 0.35795, 0.35759, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.11961, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99962, 0.99906, 0.28849, 0.28810, 1.00000, 0.00000, 0.00000, 0.96794, 0.61479, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00018, 0.17423, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.99251, 0.99195, 0.39428, 0.39395, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000
0.00041, 0.24312, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.98138, 0.99330, 0.99273, 0.40907, 0.40988, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00007, 0.16128, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.25979, 0.99251, 0.99195, 0.41658, 0.41626, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.09204, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99967, 0.99911, 0.34629, 0.34593, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00069, 0.10755, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 1.00000, 0.00000, 0.99592, 0.99536, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00091, 0.18421, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 1.00000, 1.00000, 0.00000, 0.00000, 0.92623, 0.82295, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00091, 0.18421, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 1.00000, 1.00000, 0.00000, 0.00000, 0.92623, 0.82295, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99998, 0.99941, 0.31381, 0.31343, 1.00000, 0.00000, 0.00000, 0.96963, 0.69883, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.94950, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.99943, 0.31559, 0.31521, 1.00000, 0.00000, 0.00000, 0.92511, 0.68432, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00224, 0.89557, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.99999, 0.99942, 0.31477, 0.31439, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.79127, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.99943, 0.31602, 0.31580, 1.00000, 0.00000, 0.00000, 0.96963, 0.69883, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00000, 0.61136, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.99943, 0.31618, 0.31580, 1.00000, 0.00000, 0.00000, 0.96963, 0.69883, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000
0.00022, 0.19567, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 1.00000, 0.99943, 0.31477, 0.31439, 1.00000, 0.00000, 0.00000, 0.98541, 0.98441, 1.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000, 0.00000

import numpy
from sklearn.decomposition import PCA

test = numpy.genfromtxt("path/to/foo.csv", delimiter=',')
pca = PCA(n_components='mle', svd_solver='full')
data = pca.fit_transform(test) # This will produce the error

Expected Results

I expect data to have shape (X, Y) with X <= 26 and Y <= 26.

Actual Results

Traceback (most recent call last):
  File "path/to/playground.py", line 13, in <module>
    data = pca.fit_transform(test) # This will produce the error
  File "$path$\Python\Python36\lib\site-packages\sklearn\decomposition\pca.py", line 344, in fit_transform
    U, S, V = self._fit(X)
  File "$path$\Python\Python36\lib\site-packages\sklearn\decomposition\pca.py", line 388, in _fit
    return self._fit_full(X, n_components)
  File "$path$\Python\Python36\lib\site-packages\sklearn\decomposition\pca.py", line 427, in _fit_full
    _infer_dimension_(explained_variance_, n_samples, n_features)
  File "$path$\Python\Python36\lib\site-packages\sklearn\decomposition\pca.py", line 103, in _infer_dimension_
    ll[rank] = _assess_dimension_(spectrum, rank, n_samples, n_features)
  File "$path$\Python\Python36\lib\site-packages\sklearn\decomposition\pca.py", line 88, in _assess_dimension_
    (1. / spectrum_[j] - 1. / spectrum_[i])) + log(n_samples)
ValueError: math domain error
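
To see why a tie is fatal here, the last frame's expression can be combined with the (spectrum[i] - spectrum[j]) factor mentioned in the description. Writing \lambda_i for spectrum[i], and assuming all values are strictly positive and that spectrum_ holds the same values at the indices involved (both are assumptions of this sketch), the argument of the logarithm simplifies to:

```latex
(\lambda_i - \lambda_j)\left(\frac{1}{\lambda_j} - \frac{1}{\lambda_i}\right)
  = (\lambda_i - \lambda_j)\,\frac{\lambda_i - \lambda_j}{\lambda_i \lambda_j}
  = \frac{(\lambda_i - \lambda_j)^2}{\lambda_i \lambda_j}
```

Under those assumptions the argument is never negative and is zero exactly when \lambda_i = \lambda_j, which is consistent with the reported log(0). In floating point, two values that are merely very close could also produce a zero here.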

Versions

Windows-10-10.0.14393-SP0
Python 3.6.2 (v3.6.2:5fd33b5, Jul 8 2017, 04:57:36) [MSC v.1900 64 bit (AMD64)]
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0

Side References

I also asked a question about this problem on Stack Overflow.

@rth
Member

rth commented Nov 30, 2017

There are known issues with PCA when using n_components='mle'; see #4441. A fix was proposed in #4827, but it is still a work in progress. You could try rebasing that pull request on top of scikit-learn master to see if it solves your issue.

If you can provide the dataset (as a link or attachment) needed to reproduce this issue that would help.

@rth
Member

rth commented Nov 30, 2017

I can reproduce this issue running your code snippet on the test_dataset.csv you provided on the master branch on Linux. Thanks for reporting it.

rth added the Bug label Nov 30, 2017
@thechargedneutron
Contributor

I am willing to take up this issue if @rth and @Yannic92 are not working on it.

@rth
Member

rth commented Dec 2, 2017

Sure, fine with me, @thechargedneutron. Thanks!

It might be worth checking whether #4827 addresses this, and possibly continuing that (stalled?) PR if the original author doesn't respond.

@thechargedneutron
Contributor

@rth What should the ideal behaviour be? How should equal spectrum values be treated?

@rth
Member

rth commented Dec 5, 2017

@thechargedneutron I have not read the reference paper, and I'm not certain what would be the ideal behavior in this case.
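
For discussion only, one conceivable convention is sketched below: treat any candidate rank whose log argument is not strictly positive as having a log-likelihood of negative infinity, so that it is never selected. This is an assumption made for illustration, not something taken from the reference paper, #4441, or #4827, and `guarded_log_pair_terms` is a hypothetical name.

```python
import math


def guarded_log_pair_terms(spectrum, rank, n_samples):
    """Like the unguarded pairwise sum, but return -inf on a degenerate pair.

    Hypothetical sketch: assumes strictly positive spectrum values (a zero
    value would still raise ZeroDivisionError on the 1.0 / value terms).
    """
    total = 0.0
    for i in range(rank):
        for j in range(i + 1, len(spectrum)):
            gap = (spectrum[i] - spectrum[j]) * (1.0 / spectrum[j] - 1.0 / spectrum[i])
            if not gap > 0.0:
                # Covers exact ties (gap == 0), rounding to a negative value, and NaN.
                return -math.inf
            total += math.log(gap) + math.log(n_samples)
    return total


print(guarded_log_pair_terms([4.0, 2.0, 2.0], rank=2, n_samples=26))  # -inf
```

Whether ruling out such ranks is statistically appropriate, as opposed to merging tied values or adding a small tolerance, is exactly the open question in this thread.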

@jnothman
Member

Closing as duplicate (part) of #4441
