-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
FIX Fixes test_scale_and_stability #18746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
ogrisel
merged 43 commits into
scikit-learn:master
from
thomasjpfan:test_scale_and_stability_round_2
Nov 13, 2020
Merged
Changes from all commits
Commits
Show all changes
43 commits
Select commit
Hold shift + click to select a range
812f41d
FIX Fixes test_scale_and_stability
thomasjpfan 6637c3b
ENH Uses Nics suggestion
thomasjpfan 665a6c8
STY Linting
thomasjpfan 0f2aef8
REV Revert change
thomasjpfan 679151c
REV Adjust weights before
thomasjpfan 5e5cdaf
BUG Fix
thomasjpfan 538798a
ENH Sets condition on pinv2
thomasjpfan 235565e
DOC Adds whats new
thomasjpfan f96e865
DOC Only rank deficient X
thomasjpfan a6d4d99
DOC Rank-deficient in both
thomasjpfan 43e5cba
REV Less diffs
thomasjpfan 0fc771b
TST Splits test_scale_and_stability into 3 tests
thomasjpfan 59d6d85
TST Only rank decifient y
thomasjpfan 76876f5
DOC Update
thomasjpfan 6bdd21b
DOC Update docstring for tests
thomasjpfan 0aebfd4
TST Adds test that fails on master
thomasjpfan a4386c0
TST Back to one test
thomasjpfan b680691
DOC Update docstring
thomasjpfan 9158c96
TST Updated with randomly generated X and y
thomasjpfan a22f454
TST Places back old tests
thomasjpfan 7881351
Merge remote-tracking branch 'upstream/master' into test_scale_and_st…
thomasjpfan 70cd30f
STY Linting
thomasjpfan ece83c5
TST Adds back original dataset
thomasjpfan 9bfec22
REV Less diffs
thomasjpfan e9485b8
TST Do not think old dataset is needed
thomasjpfan 2a4733e
ENH Removes unneeded code
thomasjpfan f2318b0
TST Increase tolerance
thomasjpfan ed30019
Merge remote-tracking branch 'upstream/master' into test_scale_and_st…
thomasjpfan 5dbaaf9
REV Less diffs
thomasjpfan dd56c1b
TST Use atol
thomasjpfan bf588bb
REV Less diffs
thomasjpfan c39e682
TST Update tests
thomasjpfan c8aa37c
REV Places back all estimators
thomasjpfan 8287d8c
ENH Adds original dataset
thomasjpfan 3dbdd9e
Merge remote-tracking branch 'upstream/master' into test_scale_and_st…
thomasjpfan 8a8860f
ENH Update
thomasjpfan 08cb652
MNT Change eps
thomasjpfan 8abba5a
REV Less diffs
thomasjpfan a02c2d1
Try to remove _UnstableArchMixin from CCA
ogrisel e76209f
Merge branch 'master' of github.com:scikit-learn/scikit-learn into te…
ogrisel e79ba18
[cd build]
ogrisel 9d6296a
Remove noqa flag
ogrisel 0d6d52d
[arm64]
ogrisel File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -132,6 +132,9 @@ Changelog | |
predictions for `est.transform(Y)` when the training data is single-target. | ||
:pr:`17095` by `Nicolas Hug`_. | ||
|
||
- |Fix| Increases the stability of :class:`cross_decomposition.CCA` :pr:`18746` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
|
||
by `Thomas Fan`_. | ||
|
||
- |API| For :class:`cross_decomposition.NMF`, | ||
the `init` value, when 'init=None' and | ||
n_components <= min(n_samples, n_features) will be changed from | ||
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
specify that it's only for poorly conditioned matrices? So that not all users expect to have changes in the results
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue comes from:
https://github.com/scipy/scipy/blob/8e30f7797bd1ee442f4f1a25172e4402521c1e16/scipy/linalg/basic.py#L1385
when
pinv2
is called. On windows and the pypi version of numpy the singular values forXk
are:which results in a rank of 3.
On other platforms, where the original test passes, the singular values are
which results in a rank of 2. This is by designed as stated reference paper on page 12, where the rank should decrease when
Xk
gets updated.Xk
is updated here:scikit-learn/sklearn/cross_decomposition/_pls.py
Line 263 in b5d63e3
Given, this I do not think its for poorly conditioned matrices. It is our usage of
pinv2
that was flaky.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After which iteration? If this happens at the first iter, then this is indeed a problem of condition number I believe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(by first iteration I mean the first component, or equivalently the first call to
_get_first_singular_vectors_power_method
)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This happened on the second call to
_get_first_singular_vectors_power_method
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would still be valuable to identify in which cases there's a stability improvement. Clearly, not all users will be impacted by this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this happens randomly. I updated
test_scale_and_stability
to generate randomy
s and found that there were seeds that fail on my machine and on windows.From a user point of view, for some datasets they may have gotten an incorrect model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both random
y
andX
? Because otherwise the problem might just be coming fromX
.I still believe this is related to poorly conditioned matrices because what you pass to
_get_first_singular_vectors_power_method
at iteration i are the deflated matrices from iteration i - 1 (deflation = subtracting with a rank-1 matrix to obtain a (r - 1) rank matrix)I'm quite certain that the name of the parameter to
pinv
(cond or rcond) is related to the condition numberThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated with randomly generated X and y. The condition number will alway gets much bigger when the matrix becomes a r - 1 rank matrix.
When the rank becomes r - 1,
pinv2
by default is having trouble determining the rank of the matrix:https://github.com/scipy/scipy/blob/8e30f7797bd1ee442f4f1a25172e4402521c1e16/scipy/linalg/basic.py#L1457-L1466
Setting
cond= 10 * eps
helps in this case.Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specificly when
pinv2
thinks ar-1
matrix is rank r, it divides by a really small singular value when computing the inverse.