
[MRG] Fix ARDRegression accuracy issue with scipy 1.3.0 (Issue #14055) #14067


Conversation

timstaley
Contributor

@timstaley timstaley commented Jun 11, 2019

Reference Issues/PRs

Fixes #14055

What does this implement/fix? Explain your changes.

This simply implements the strategy for choosing a pinvh singular-value threshold that was the default in Scipy prior to 1.3.0. (The new default strategy gives a lower threshold, producing less stable convergence of ARDRegression.)
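For reference, here's a minimal sketch of that pre-1.3 default (the function name is illustrative; the dtype-keyed factors match those quoted later in this thread):

    import numpy as np

    def legacy_pinvh_cond(a):
        # Default relative threshold used by scipy.linalg.pinvh before 1.3.0:
        # a dtype-dependent multiple of machine epsilon
        # ('f' -> float32, 'd' -> float64).
        t = a.dtype.char.lower()
        factor = {'f': 1E3, 'd': 1E6}
        return factor[t] * np.finfo(t).eps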

Any other comments?

Unfortunately, the way the cond parameter to scipy.linalg.pinvh is treated changed between 1.2 and 1.3: in 1.2 the parameter was always multiplied by the largest absolute singular value, whereas in 1.3 it is used as-is. This means that applying this fix on 1.2.1 results in a larger threshold than before!
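To illustrate the difference, a sketch of the two semantics as I understand them (not scipy's actual code):

    import numpy as np
    from scipy import linalg

    rng = np.random.RandomState(0)
    a = rng.normal(size=(5, 5))
    a = a @ a.T                        # symmetric matrix, as pinvh expects
    s, u = linalg.eigh(a)              # pinvh works from the eigendecomposition
    cond = 1E6 * np.finfo(a.dtype).eps

    # scipy 1.2.x: cond is relative, scaled by the largest |eigenvalue|
    keep_old = np.abs(s) > cond * np.max(np.abs(s))
    # scipy 1.3.0: cond is used as an absolute cutoff
    keep_new = np.abs(s) > cond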

Of course, we could pin the behaviour more precisely by checking which version of Scipy is present; I'm curious to get the scikit-learn dev team's opinion on this.

@timstaley timstaley force-pushed the fix-ardregression-accuracy-with-scipy-1.3.0 branch from 701aac5 to e54ec68 on June 11, 2019 14:16
Member

@jnothman jnothman left a comment


I haven't confirmed that this is the same logic as scipy < 1.3, but it otherwise looks good.

I think we should include this in 0.21.3 as a compatibility fix.
Please add an entry to the change log under 0.21.3 at doc/whats_new/v0.21.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.
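Something along these lines (wording is just a placeholder; the :pr: and :user: roles are the ones mentioned above):

    - Fixed an accuracy regression in :class:`linear_model.ARDRegression`
      when used with scipy >= 1.3. :pr:`14067` by
      :user:`Tim Staley <timstaley>`.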

@jnothman
Member

But it seems the test fails on some CI:


    def test_ard_accuracy_on_easy_problem():
        # Check that ARD converges with reasonable accuracy on an easy problem
        # (Github issue #14055)
        # This particular seed seems to converge poorly in the failure-case
        # (scipy==1.3.0, sklearn==0.21.2)
        seed = 45
        X = np.random.RandomState(seed=seed).normal(size=(250, 3))
        y = X[:, 1]
    
        regressor = ARDRegression()
        regressor.fit(X, y)
    
        abs_coef_error = np.abs(1 - regressor.coef_[1])
        # Expect an accuracy of better than 1E-4 in most cases -
        # Failure-case produces 0.16!
>       assert abs_coef_error < 0.0001
E       assert 0.0036970085191541102 < 0.0001

X          = array([[  2.63747728e-02,   2.60321701e-01,  -3.95145542e-01],
       [ -2.04300905e-01,  -1.27163265e+00,  -2.5968786...[ -7.66169154e-01,   4.76602920e-01,   7.09365480e-01],
       [  1.41698340e+00,  -1.23040030e+00,   1.19125349e+00]])
abs_coef_error = 0.0036970085191541102
regressor  = ARDRegression(alpha_1=1e-06, alpha_2=1e-06, compute_score=False, copy_X=True,
              fit_intercept=True, lambda...a_2=1e-06, n_iter=300,
              normalize=False, threshold_lambda=10000.0, tol=0.001,
              verbose=False)
seed       = 45
y          = array([ 0.2603217 , -1.27163265, -0.87330464, -0.01568471,  0.8019266 ,
       -1.00860081, -1.54687318, -0.62769549, ...142468, -0.25631022, -0.21171933,  0.60143522,
       -1.05992542,  0.75961278, -0.28630983,  0.47660292, -1.2304003 ])

@rth rth added this to the 0.21.3 milestone Jun 13, 2019
Member

@jnothman jnothman left a comment


Please add the what's new entry?

@NicolasHug
Member

It makes me uncomfortable that the cond was changed in scipy as a bugfix, and that we are reverting it here.

@jnothman
Copy link
Member

jnothman commented Jun 20, 2019 via email

@timstaley
Contributor Author

timstaley commented Jun 20, 2019

Hey folks, yeah, I was hoping that relaxing the test condition a little would get a pass from the various CI builds. Weirdly, the latest CI failure seems to be on an unrelated unit test (RANSACRegressor?), so I'm not sure what's going on there.

Certainly in agreement that a more principled fix would be desirable; this is something of a temporary kludge. Unfortunately, I think a proper rework will require a decent understanding of both ARDRegression and scipy's pinvh, and I have neither (nor the time to grok them) right now. This is something of a benchmark routine for the day job, so it's in our interest for it to work nicely, but it's not a priority.

If you still want to get this merged in I'll add the various what's new entries as requested and see if it passes on CI next time around, just haven't gotten around to it yet.

@timstaley timstaley force-pushed the fix-ardregression-accuracy-with-scipy-1.3.0 branch from 7178528 to ab5aa91 on June 25, 2019 09:26
@timstaley
Contributor Author

timstaley commented Jun 25, 2019

I've added the what's new entry and hopefully got all the formatting correct. Happily, the tests seem to be passing now!

@jnothman
Member

jnothman commented Jul 7, 2019

Do we want this in 0.21.3? @NicolasHug? @ogrisel

@NicolasHug
Member

I'm OK with the idea.

We would still need to open an issue to find out how to make it work with the new cond default.

Regarding the proposed PR: we should probably port the whole pinvh instead of just the condition; eigh is currently called twice, and that's sub-optimal.

    dtype_code = sigma_inverse_.dtype.char.lower()
    factor = {'f': 1E3, 'd': 1E6}
    pinvh_cond_factor = factor[dtype_code] * np.finfo(dtype_code).eps
    s, u = linalg.eigh(sigma_inverse_, lower=True, check_finite=False)
Member


eigh is going to be called again in pinvh; we might just as well port the whole pinvh.

Contributor Author


Sounds sensible; a good way to pin behaviour across Scipy versions. Where should such 'vendored' code go in the scikit-learn codebase: utils.pinvh, or maybe utils.linalg?

Member

@rth rth Jul 8, 2019


If it's a lot of code, you can put it under sklearn/externals/_scipy_linalg.py and then import it (or not, depending on the scipy version) from within sklearn/utils/fixes.py.

If it's not that much code, you can put it in utils.fixes directly.
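For illustration, a stripped-down version of what such a vendored helper could look like (simplified from the scipy < 1.3 logic; not the exact code that ends up being merged):

    import numpy as np
    from scipy import linalg

    def pinvh(a, cond=None, lower=True):
        # Pseudo-inverse of a symmetric matrix via its eigendecomposition,
        # using the scipy < 1.3 default threshold (sketch only; the real
        # scipy function handles more options and does input validation).
        s, u = linalg.eigh(a, lower=lower, check_finite=False)
        if cond is None:
            t = u.dtype.char.lower()
            factor = {'f': 1E3, 'd': 1E6}
            cond = factor[t] * np.finfo(t).eps
        # For a symmetric matrix, the singular values are abs(eigenvalues);
        # keep only the components above the relative cutoff.
        above_cutoff = np.abs(s) > cond * np.max(np.abs(s))
        psigma_diag = 1.0 / s[above_cutoff]
        u = u[:, above_cutoff]
        return np.dot(u * psigma_diag, u.conj().T)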

@thomasjpfan
Member

The original issue should still be open so we can work on the underlying issue.

@timstaley timstaley force-pushed the fix-ardregression-accuracy-with-scipy-1.3.0 branch 2 times, most recently from d057f2a to 942adb4 on July 10, 2019 14:26
@timstaley
Contributor Author

Ok, the compatibility-fix copy of scipy.linalg.pinvh is in place, which in hindsight makes this PR much simpler to parse. Fingers crossed the CI doesn't flake out.

@timstaley timstaley force-pushed the fix-ardregression-accuracy-with-scipy-1.3.0 branch from 942adb4 to b7f2829 on July 10, 2019 14:29
@jnothman
Member

Please append commits rather than squashing and force-pushing. We will squash upon merge.

@jnothman jnothman mentioned this pull request Jul 24, 2019
@jnothman
Member

@NicolasHug / @thomasjpfan are you happy with this?

@thomasjpfan
Member

thomasjpfan commented Jul 25, 2019

Calculating the cond and passing it into pinvh at

    def update_sigma(X, alpha_, lambda_, keep_lambda, n_samples):

may work:

        # calculate the condition for pinvh
        pinvh_cond = 1E6 * np.finfo(X.dtype).eps

        # Compute sigma and mu (using Woodbury matrix identity)
        def update_sigma(X, alpha_, lambda_, keep_lambda, n_samples):
            sigma_ = pinvh(np.eye(n_samples) / alpha_ +
                           np.dot(X[:, keep_lambda] *
                                  np.reshape(1. / lambda_[keep_lambda], [1, -1]),
                                  X[:, keep_lambda].T), cond=pinvh_cond)

In this case, the dtype of X is always np.float64, hence the 1E6 factor from the previous version of pinvh.
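For concreteness, that absolute cutoff works out to roughly:

    import numpy as np

    cutoff = 1E6 * np.finfo(np.float64).eps   # ~2.22e-10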

Edit: This is not the right approach. Let's merge this PR after https://github.com/scikit-learn/scikit-learn/pull/14067/files#r306688549 is resolved.

@jnothman
Member

If we want to include this in 0.21.3, it's got to be merged in the next day or two.

@NicolasHug
Member

Made some very minor cleanups in a1f8909; will merge when this is green.

Thanks @timstaley!

Successfully merging this pull request may close these issues.

ARD Regressor accuracy degrades when upgrading Scipy 1.2.1 -> 1.3.0