Skip to content

[MRG+1] Made PCA expose the singular values #7685

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Oct 30, 2016

Conversation

tomlof
Copy link
Contributor

@tomlof tomlof commented Oct 17, 2016

Reference Issue

Fixes #6955.

What does this implement/fix? Explain your changes.

I've made it so that the singular values from the underlying SVD are stored in the PCA classes under the attribute "singular_values_".

This was a trivial fix, and all I did was to save the singular values in an instance variable. I also added the attribute to the list of attributes in the documentations and added doc tests for them. I also added unit tests (test_singular_values) that cover the estimators PCA, IncrementalPCA and TruncatedSVD to the corresponding tests in sklearn/decomposition/tests.

Any other comments?

It may appear like there are loads of commits in this fix, but the commits listed below are some old stuff I was working on years ago, that are no longer in my fork. This fix only add changes to the modules: "decomposition/incremental_pca.py", "decomposition/pca.py" and "decomposition/truncated_pca.py" and to their corresponding unit tests, and the changes are made on the latest commit to the main scikit-learn repository.

Copy link
Member

@amueller amueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good apart from minor comments. I'm not sure whether we want a test or not.

to 1.0
to 1.0.

singular_values_ : array, [n_components]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please say array, shape (n_components,)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -166,6 +171,7 @@ def fit(self, X, y=None):
self.singular_values_ = None
self.explained_variance_ = None
self.explained_variance_ratio_ = None
self.singular_values_ = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to add this here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, but the other attributes were set there, so I thought it would be good for consistency to have the singular values be set there as well.

@@ -202,6 +202,11 @@ class PCA(_BasePCA):
If ``n_components`` is not set then all components are stored and the
sum of explained variances is equal to 1.0.

singular_values_ : array, [n_components]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm here all the docstrings seem to be using a different convention. I like shape + tuple better, that's the standard numpy way. But I don't mind that much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated them all to follow the numpy standard.

@@ -663,6 +686,7 @@ def _fit(self, X):
self.explained_variance_ = exp_var = (S ** 2) / n_samples
full_var = np.var(X, axis=0).sum()
self.explained_variance_ratio_ = exp_var / full_var
self.singular_values_ = S.copy() # Store the singular values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to copy?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to copy here. I've removed it.


singular_values_ : array, [n_components]
The singular values corresponsing to each of the selected components.
The singular values corresponds to the 2-norms of the ``n_components``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe say are the 2-norms or are equal to. You just used "corresponding". Also, there's a typo in "corresponding".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -385,6 +396,7 @@ def _fit_full(self, X, n_components):
explained_variance_ = (S ** 2) / n_samples
total_var = explained_variance_.sum()
explained_variance_ratio_ = explained_variance_ / total_var
singular_values_ = S.copy() # Store the singular values.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be calculated by the user as np.sqrt(explained_variance_ * n_samples)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. Stupid question. I've read the issue and figure this is all about making something comparable available in TruncatedSVD.

@jnothman
Copy link
Member

Please add tests.

@tomlof
Copy link
Contributor Author

tomlof commented Oct 25, 2016

Do you mean to add more doc tests, or to add a unit test?

@amueller
Copy link
Member

unit test

@tomlof
Copy link
Contributor Author

tomlof commented Oct 25, 2016

Ok, I've added unit tests for the singular values to PCA, IncrementalPCA and TruncatedSVD.

@amueller
Copy link
Member

Thanks. The test are failing though.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and thanks for the great tests! I only wonder if we should be refactoring the tests.

@jnothman jnothman changed the title Made PCA expose the singular values [MRG+1] Made PCA expose the singular values Oct 26, 2016
@amueller
Copy link
Member

yeah it would be nice not to duplicate the code as much, bot otherwise LGTM.

@jnothman
Copy link
Member

@tomlof please add an entry in what's new and let us know so we can merge.

@tomlof
Copy link
Contributor Author

tomlof commented Oct 27, 2016

Ok. I've updated the pull request description so that it mentions the unit tests.

@jnothman
Copy link
Member

What's new is doc/whats_new.rst, our changelog.

@@ -93,11 +93,11 @@ class TruncatedSVD(BaseEstimator, TransformerMixin):
TruncatedSVD(algorithm='randomized', n_components=5, n_iter=7,
random_state=42, tol=0.0)
>>> print(svd.explained_variance_ratio_) # doctest: +ELLIPSIS
[ 0.0782... 0.0552... 0.0544... 0.0499... 0.0413...]
[ 0.0606... 0.0584... 0.0497... 0.0434... 0.0372...]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did these change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Someone else had changed those doctests. You see this in my merge commit 279fd60 above. I reverted the change, but when I ran the tests, it failed and I had to change back to what it was I pulled from the main repo. This change back was included in ae86e2f.

I didn't check what had changed to make the unit test change, but just assumed that since it was included in the main repo it was validated. Perhaps someone updated sparse_random_matrix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears to be the product of a change in sample_without_replacement in commit edc9e7. Let me know if there is anything else you want me to change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you're saying it was a merge error on your part and it's since been fixed to reflect master. All good, then.

- :class:`decomposition.PCA`, :class:`decomposition.IncrementalPCA` and
:class:`decomposition.TruncatedSVD` now expose the singular values
from the underlying SVD. They are stored in the attribute
`singular_values_`, like in :class:`decomposition.IncrementalPCA`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed-width font requires double-backticks in RST

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I've updated.

@jnothman jnothman merged commit 4ae1013 into scikit-learn:master Oct 30, 2016
@jnothman
Copy link
Member

Thanks @tomlof

@tomlof tomlof deleted the pca_expose_singular_values branch November 2, 2016 11:15
@amueller
Copy link
Member

This seems to have cause a test failure in master: #7893

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Store singular values in PCA instances
3 participants