Skip to content

[MRG] Adding min. req. for numpy and scipy to circleci #10557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 1, 2018

Conversation

jotasi
Copy link

@jotasi jotasi commented Jan 31, 2018

Reference Issues/PRs

See discussion in #10527

What does this implement/fix? Explain your changes.

As suggested by @lesteve in his comment in the discussion about #10527, this PR specifies the numpy and scipy versions installed in the python 2 build of circleCI. They are set to the minimum requirements (numpy>=1.8.2, scipy>=0.13.3) to verify that the examples all run with those minimum requirements as well.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment on the python2 line saying "Test examples run with minimal dependencies."

@jotasi jotasi changed the title Adding min. req. for numpy and scipy to circleci [MRG] Adding min. req. for numpy and scipy to circleci Feb 1, 2018
@lesteve lesteve merged commit e7ecdb0 into scikit-learn:master Feb 1, 2018
@lesteve
Copy link
Member

lesteve commented Feb 1, 2018

Merging, thanks a lot!

@qinhanmin2014
Copy link
Member

ping @jnothman @lesteve @jotasi
Seems that this one has made python2 build fail in master (maybe because we don't build the examples in PR, right?)
See https://circleci.com/gh/scikit-learn/scikit-learn/17795

Unexpected failing examples:
/home/circleci/project/examples/gaussian_process/plot_gpc_xor.py failed leaving traceback:
Traceback (most recent call last):
  File "/home/circleci/project/examples/gaussian_process/plot_gpc_xor.py", line 35, in <module>
    clf = GaussianProcessClassifier(kernel=kernel, warm_start=True).fit(X, Y)
  File "/home/circleci/project/sklearn/gaussian_process/gpc.py", line 615, in fit
    self.base_estimator_.fit(X, y)
  File "/home/circleci/project/sklearn/gaussian_process/gpc.py", line 210, in fit
    self.kernel_.bounds)]
  File "/home/circleci/project/sklearn/gaussian_process/gpc.py", line 428, in _constrained_optimization
    fmin_l_bfgs_b(obj_func, initial_theta, bounds=bounds)
  File "/home/circleci/miniconda/envs/testenv/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 185, in fmin_l_bfgs_b
    **opts)
  File "/home/circleci/miniconda/envs/testenv/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 314, in _minimize_lbfgsb
    f, g = func_and_grad(x)
  File "/home/circleci/miniconda/envs/testenv/lib/python2.7/site-packages/scipy/optimize/lbfgsb.py", line 263, in func_and_grad
    f = fun(x, *args)
  File "/home/circleci/miniconda/envs/testenv/lib/python2.7/site-packages/scipy/optimize/optimize.py", line 59, in __call__
    fg = self.fun(x, *args)
  File "/home/circleci/project/sklearn/gaussian_process/gpc.py", line 202, in obj_func
    theta, eval_gradient=True)
  File "/home/circleci/project/sklearn/gaussian_process/gpc.py", line 346, in log_marginal_likelihood
    self._posterior_mode(K, return_temporaries=True)
  File "/home/circleci/project/sklearn/gaussian_process/gpc.py", line 399, in _posterior_mode
    L = cholesky(B, lower=True)
  File "/home/circleci/miniconda/envs/testenv/lib/python2.7/site-packages/scipy/linalg/decomp_cholesky.py", line 81, in cholesky
    check_finite=check_finite)
  File "/home/circleci/miniconda/envs/testenv/lib/python2.7/site-packages/scipy/linalg/decomp_cholesky.py", line 20, in _cholesky
    a1 = asarray_chkfinite(a)
  File "/home/circleci/miniconda/envs/testenv/lib/python2.7/site-packages/numpy/lib/function_base.py", line 595, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

@jnothman
Copy link
Member

jnothman commented Feb 1, 2018 via email

@jotasi
Copy link
Author

jotasi commented Feb 2, 2018

I'm very sorry for breaking. I wasn't aware that the examples aren't build in PR-tests and thus did not test whether all examples still worked.

I traced the reason for the error and it turns out to be due to a bug in scipy.special.expit (expit(x)=1/(1-exp(-x)) which used to eval to nan for large numbers (approx. greater than 700) instead of 1. It was fixed in February of 2014 and is thus still present in SciPy 0.13.3 but fixed in SciPy 0.14.0.

The problem arises here:

for _ in range(self.max_iter_predict):
# Line 4
pi = expit(f)
W = pi * (1 - pi)
# Line 5
W_sr = np.sqrt(W)
W_sr_K = W_sr[:, np.newaxis] * K
B = np.eye(W.shape[0]) + W_sr_K * W_sr
L = cholesky(B, lower=True)
# Line 6
b = W * f + (self.y_train_ - pi)
# Line 7
a = b - W_sr * cho_solve((L, True), W_sr_K.dot(b))
# Line 8
f = K.dot(a)
# Line 10: Compute log marginal likelihood in loop and use as
# convergence criterion
lml = -0.5 * a.T.dot(f) \
- np.log(1 + np.exp(-(self.y_train_ * 2 - 1) * f)).sum() \
- np.log(np.diag(L)).sum()
# Check if we have converged (log marginal likelihood does
# not decrease)
# XXX: more complex convergence criterion
if lml - log_marginal_likelihood < 1e-10:
break
log_marginal_likelihood = lml

when one element of f exceeds 717 and thus expit returns an array containing a nan. Then pi->W->W_sr->W_sr_K->B all contain nan's and thus the call to choleskyfails.

So I guess for this example to properly run, the minimum SciPy version is 0.14.0.

lesteve added a commit that referenced this pull request Feb 2, 2018
@lesteve
Copy link
Member

lesteve commented Feb 2, 2018

Sorry I should have checked the generated documentation. The doc only gets build in PRs if you modify one of the examples or if you have [doc build] in your commit message.

Unless there is a straightfoward fix, I would be in favour of reverting the commit until we figure out what to do about it.

@jotasi
Copy link
Author

jotasi commented Feb 2, 2018

Oh, good to know. I'll remember that next time!

I don't really see a straightforward fix, except for bumping the SciPy version up to 0.14.0 as otherwise one would either need to tweak the example to not trigger the bug or even change scikit-learn/sklearn/gaussian_process/gpc.py to take care of it. On the other hand, bumping SciPy up to 0.14.0 would lead to the minimum SciPy version needed for the examples differing from the minimum version required for scikit-learn per documentation (and the reasoning to support the package versions in ubuntu-trusty).

@qinhanmin2014
Copy link
Member

ping @lesteve Seems that you commit in a separate branch, not in master?

@lesteve
Copy link
Member

lesteve commented Feb 2, 2018

Should be reverted in 84488b5.

lesteve added a commit that referenced this pull request Feb 2, 2018
… build (#10557)"

This reverts commit e7ecdb0. The examples are
failing with Scipy 0.13.3 and we need to investigate.
@jotasi jotasi deleted the specify_ci_versions branch February 15, 2018 13:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants