-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+1] Made PCA expose the singular values #7685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Made PCA expose the singular values #7685
Conversation
…variable as per Issue scikit-learn#6955.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good apart from minor comments. I'm not sure whether we want a test or not.
to 1.0 | ||
to 1.0. | ||
|
||
singular_values_ : array, [n_components] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please say array, shape (n_components,)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -166,6 +171,7 @@ def fit(self, X, y=None): | |||
self.singular_values_ = None | |||
self.explained_variance_ = None | |||
self.explained_variance_ratio_ = None | |||
self.singular_values_ = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to add this here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, but the other attributes were set there, so I thought it would be good for consistency to have the singular values be set there as well.
@@ -202,6 +202,11 @@ class PCA(_BasePCA): | |||
If ``n_components`` is not set then all components are stored and the | |||
sum of explained variances is equal to 1.0. | |||
|
|||
singular_values_ : array, [n_components] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm here all the docstrings seem to be using a different convention. I like shape + tuple
better, that's the standard numpy way. But I don't mind that much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've updated them all to follow the numpy standard.
@@ -663,6 +686,7 @@ def _fit(self, X): | |||
self.explained_variance_ = exp_var = (S ** 2) / n_samples | |||
full_var = np.var(X, axis=0).sum() | |||
self.explained_variance_ratio_ = exp_var / full_var | |||
self.singular_values_ = S.copy() # Store the singular values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to copy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to copy here. I've removed it.
|
||
singular_values_ : array, [n_components] | ||
The singular values corresponsing to each of the selected components. | ||
The singular values corresponds to the 2-norms of the ``n_components`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe say are the 2-norms
or are equal to
. You just used "corresponding". Also, there's a typo in "corresponding".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -385,6 +396,7 @@ def _fit_full(self, X, n_components): | |||
explained_variance_ = (S ** 2) / n_samples | |||
total_var = explained_variance_.sum() | |||
explained_variance_ratio_ = explained_variance_ / total_var | |||
singular_values_ = S.copy() # Store the singular values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be calculated by the user as np.sqrt(explained_variance_ * n_samples)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. Stupid question. I've read the issue and figure this is all about making something comparable available in TruncatedSVD.
Please add tests. |
Do you mean to add more doc tests, or to add a unit test? |
unit test |
…into pca_expose_singular_values
Ok, I've added unit tests for the singular values to PCA, IncrementalPCA and TruncatedSVD. |
…into pca_expose_singular_values
Thanks. The test are failing though. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and thanks for the great tests! I only wonder if we should be refactoring the tests.
yeah it would be nice not to duplicate the code as much, bot otherwise LGTM. |
@tomlof please add an entry in what's new and let us know so we can merge. |
Ok. I've updated the pull request description so that it mentions the unit tests. |
What's new is |
@@ -93,11 +93,11 @@ class TruncatedSVD(BaseEstimator, TransformerMixin): | |||
TruncatedSVD(algorithm='randomized', n_components=5, n_iter=7, | |||
random_state=42, tol=0.0) | |||
>>> print(svd.explained_variance_ratio_) # doctest: +ELLIPSIS | |||
[ 0.0782... 0.0552... 0.0544... 0.0499... 0.0413...] | |||
[ 0.0606... 0.0584... 0.0497... 0.0434... 0.0372...] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did these change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Someone else had changed those doctests. You see this in my merge commit 279fd60 above. I reverted the change, but when I ran the tests, it failed and I had to change back to what it was I pulled from the main repo. This change back was included in ae86e2f.
I didn't check what had changed to make the unit test change, but just assumed that since it was included in the main repo it was validated. Perhaps someone updated sparse_random_matrix
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears to be the product of a change in sample_without_replacement
in commit edc9e7. Let me know if there is anything else you want me to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh you're saying it was a merge error on your part and it's since been fixed to reflect master. All good, then.
- :class:`decomposition.PCA`, :class:`decomposition.IncrementalPCA` and | ||
:class:`decomposition.TruncatedSVD` now expose the singular values | ||
from the underlying SVD. They are stored in the attribute | ||
`singular_values_`, like in :class:`decomposition.IncrementalPCA`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed-width font requires double-backticks in RST
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I've updated.
…into pca_expose_singular_values
Thanks @tomlof |
This seems to have cause a test failure in master: #7893 |
Reference Issue
Fixes #6955.
What does this implement/fix? Explain your changes.
I've made it so that the singular values from the underlying SVD are stored in the PCA classes under the attribute "singular_values_".
This was a trivial fix, and all I did was to save the singular values in an instance variable. I also added the attribute to the list of attributes in the documentations and added doc tests for them. I also added unit tests (
test_singular_values
) that cover the estimators PCA, IncrementalPCA and TruncatedSVD to the corresponding tests in sklearn/decomposition/tests.Any other comments?
It may appear like there are loads of commits in this fix, but the commits listed below are some old stuff I was working on years ago, that are no longer in my fork. This fix only add changes to the modules: "decomposition/incremental_pca.py", "decomposition/pca.py" and "decomposition/truncated_pca.py" and to their corresponding unit tests, and the changes are made on the latest commit to the main scikit-learn repository.