[MRG] Release 0.20.1 #12383

Merged
merged 140 commits into scikit-learn:0.20.X
Nov 21, 2018

jnothman
Member

@jnothman jnothman commented Oct 15, 2018

This is gathering commits for 0.20.1. See https://github.com/scikit-learn/scikit-learn/milestone/27 for remaining work. From master == f5ef674, I have performed the following interactive rebase:

(Devs take note: it's really nice if the first line of a commit message makes it clear what component it relates to, as well as whether it's DOC/FIX/TST, etc. Feel free to rename PRs or edit the commit message upon squash-and-merge, although making sure not to remove the PR number.)

drop 27972d4 MNT bump to version 0.21.dev0 after branching 0.20.X (#11941)
drop 5a10991 CI Handle new branch in scikit-learn.github.io (#11945)
drop a47e779 EXA Fix plot in plot_optics.py
drop 8cbb131 MNT joblib 0.12.3 (#11949)
drop 0908617 typo and formatting fixes in 0.20 doc (#11963)
drop f575384 DOC whats new boilerplate for 0.21 (#11974)
drop 721ebae MNT Change max_bound -> max_eps in OPTICS (#11984)
drop 84c4e54 COSMIT remove unnecessary _TreeNode methods (#11983)
drop 07051bc DOC OPTICS: improve docstring and add default values. (#11987)
drop c268230 MNT Remove n_clusters_ in OPTICS (#11981)
drop 5e101a2 Joblib 0.12.4 (#12007)
drop e5333f5 OPTICS remove redundant recursion (#11985)
drop 251e58b FIX ordering_ type and cosmetic changes to structure for OPTICS main loop (#11986)
drop a86709f [MRG] MNT rename min_cluster_size_ratio to min_cluster_size (#11913)
drop a79d44e FIX OPTICS Change quick_scan floating point comparison to isclose (#11929)
drop efe7b8c DOC Fix optics metric issues (DOC and precomputed) (#12028)
drop e616ee3 DOC move OPTICS to 0.21
drop daaa2a5 DOC Reword to avoid that people draw wrong conclusions (#12095)
drop 5d30b2d ENH (0.21) Make OPTICS more memory efficient when calling kneighbors (#12103)
drop e81bcd5 Added the changes to remove the documentation support statements for Python 2 (#12083)
drop 755c7dd DOC (0.21) OPTICS Note the order of reachability_ and core_distances_ (#12132)
pick 09851ac TST update make_column_transformer test + add comment (#12156)
pick 9427c36 coef0 is a float, not an int (#12161)
pick e63feeb DOC More specific about the limitation of make_column_transformer (#12163)
pick b886da5 MAINT update comment
pick b915ca6 MNT Avoid using "is" when comparing strings (#12168)
pick 6463406 MNT Unused import in plot_gpr_co2.py
drop 4207225 DOC start section for the 0.20.1 bugfix notes (#12170)
pick 88b49e5 Fix parallel backend neighbors (#12172)
drop a11154e [MRG] improve check_non_negative for sparse matrices (#12106)
pick da0cb32 FIX Use take instead of choose in compute_sample_weight (#12165)
pick d88bef1 DOC Add sections to whats new 0.20.1 (#12183)
pick 2e2e69d DOC KDE normalisation clarified (#11275)
pick 819d8ef [MRG] Fix diagonal in DBSCAN with precomputed sparse neighbors graph (#12105)
squash ea52161 MNT Move what's new entry
pick a358d7d DOC Add Versionadded tag to sklearn/_config.py (#12187)
pick 239482f BaseSearchCV._run_search raises NotImplementedError instead of being an abstractmethod (#12182)
pick 2cf145d DOC Add versionadded to set_config (#12196)
pick 0b58bc3 DOC Improve ColumnTransformer docstrings (#12206)
pick 9d58ca5 MNT Remove duplicate import of warnings & unused variables (#12203)
pick 663e024 DOC Fix typo in neighbors/nearest_centroid.py (#12223)
pick e1c3c22 DOC Fixing summary table in the linear model documentation. (#12220)
pick a3616f6 MNT Use name instead of float to specify colors (#12199)
pick 94c70ff [MRG] More informative error message in OneHotEncoder(categories=None) with negative integer values (#12180)
pick 59b15c5 add explicit mention of scaing for saga in logisticregression docs. (#12236)
pick 11612fc MNT Raise error for duplicate classes when constructing a MultiLabelBinarizer (#12195)
pick 5313325 DOC Encourage contributors to use sklearn.show_versions() (#12225)
pick acf3bab MNT Add versionadded to set_config parameters
pick dd3b705 MNT Unused imports in examples
pick 60cf1d6 Fix numpy.int overflow in make_classification (#10811)
squash da85815 DOC what's new entry for "Fix numpy.int overflow in make_classification #10811"
squash 877e3f3 MNT Move what's new entry
drop 7166cd5 MNT Remove duplicate entry in whats new
pick dfd009d Remove test_import_sklearn_no_warnings (#12244)
pick cbbe489 DOC fix cross-entropy typo in tree docs (#12242)
pick bfab306 [MRG] Added Tips in SVM user guide for tuning C parameter in LinearSVC and LinearSVR (#12185)
drop 3e5777a [MRG] Fast PolynomialFeatures on dense arrays (#12251)
drop fb7be87 [MRG + 1] return_train_score deprecation (#12241)
drop e0e7387 Remove unused private functions (#12253)
pick 24aa6b8 DOC Remove mentions of removed AUTHORS.rst file (#12262)
pick f456a40 MNT Change show_versions format to suit markdown (#12255)
pick 5e24762 DOC add note on discretization creating non-linearity (#12269)
pick e9cdb55 MNT Updated PyPI URLs (#12274)
drop 2eca77b MNT complete VotingClassifier flatten_transform deprecation (#12252)
pick 8e08028 DOC Move 'for instance' to front
pick 58228cb [MRG+1] ColumnTransformer fix having mixed types in a single passthrough (#12200)
pick b725921 [MRG] Created 'cross-validation estimator' entry in glossary (#11661)
pick 3f5bf97 DOC removed ambiguity in pipeline gridsearch example (#12272)
pick 74b56db MNT Converting http to https (#12277)
pick 1e052e9 Converting http to https (2)... (#12292)
pick 0bbb7d0 DOC Add class example for LedoitWolf (#12214)
pick afa0694 FIX cache of OpenML fetcher (#12246)
pick 08924c3 EXA Fix bad data visualisation in "Importance of Feature Scaling" (#12280)
pick a1be325 MNT Make check_X_y raise a better error when y is None (#12283)
pick 8afe43e DOC Added "mars" testimonial to testimonials page (#12298)
pick 9e1d48f DOC What's new typo
pick e6359e2 DOC Fix broken link to joblib documentation (#12301)
pick b020f62 MNT Apply pep8 to docs code (#12275)
drop f4e7d2b Converting http to https (3)... (#12302)
pick 3804ccd MNT Refactors doc test into seperate script (#12248)
drop 4280308 ENH (0.21) Remember predecessor in OPTICS (#12135)
drop 2020867 FIX (0.21) OPTICS correctly handle multiple infs in reachability array. (#12029)
pick 63e5ae6 TST Use same random seeds for both GaussianMixture.fit (#12307)
drop 4e2e1fa ENH Cache class mapping in MultiLabelBinarizer() (#12116)
pick 5fd9e03 ENH Raise an error when pos_label is not in binary y_true (#12313)
pick c5b020f TST Use v_measure_score to compare label arrays up to permutation (#12265)
pick 205ff38 DOC What's new typo
pick 5df8cd3 EXA Fix title, overlapping plots and axis labels in plot_ols_ridge_variance.py (#12296)
pick 03c3af5 [MRG] Fix fetch_openml when ignore attributes are numeric (#12330)
pick bbb0d93 [MRG] FIX Update power_transform docstring and add FutureWarning (#12317)
pick c8a4132 DOC check_array() and check_X_y() documentation update (#12340)
drop a80bbd9 ENH add get_n_leaves() and get_max_depth() to DesicionTrees (#12300)
pick 00c2f41 DOC fix logistic regression.fit docstring on y (#12343)
drop 39bd736 [MRG] Move RandomTreesEmbedding criterion & max_features to be class attributes (#12324)
drop 831c760 ENH (0.21) Add max_error to the existing set of metrics for regression (#12232)
pick 8985a63 DOC Update v0.20.rst with power_transform API change (#12351)
drop 0f94f29 MNT simple deprecations and removals for 0.21 (#12238)
drop c13ba26 [MRG] Matplotlib tree plotting (#9251)
pick c76dc05 EXA Fix labels overlap in Out-of-core classification of text documents example (#12359)
pick 1e7cd7d BUG Fix OrdinalEncoder with manually specified categories (#12367)
pick 76b1078 TST Fix missing assert and parametrize k-means tests  (#12368)
drop 0296916 FIX (0.21) OPTICS processing order (#12357)
pick 0ad8736 [MRG] ENH apply sparse_threshold even if all columns are sparse (#12304)
drop 98357ec FIX (0.21) make count_nonzero dtype invariant wrt axis (#12341)
drop 274ce3a DOC use default_role='any' (#12355)
pick f5ef674 DOC improve OneHotEncoder documentation (#12314)
pick 906f078 DOC Fix the link to computing docs (#12380)
drop afe0a9b MNT skip test falling on master in legacy platforms (#12382)
pick a1d0e96 FIX Increase mean precision for large float32 arrays (#12338)
pick bea1eb5 [MRG] Clarified indempotence of fit (#12305)
pick d4c9e84 DOC minor clarifications in ensemble.rst (#11810)
pick d9b56b6 DOC DecisionTreeClassifier does not support categorical data (#12402)
pick 53069c2 DOC Change i.e. to e.g. in MinMaxScaler (#12415)
drop a5fa7d3 [MRG] Fast PolynomialFeatures on CSR matrices (#12197)
drop 5bcd84b [MRG+1] Add check_is_fitted to non standard functions (#12279)
pick 99bea79 DOC: Fix typo where FMI was referred to as AMI (#12414)
drop 3c76b9c FIX 'MultiTaskLasso' object has no attribute 'coef_' when warm_start = True (#12361)
drop 5af272a MNT Remove unused variables (#12230)
pick 2ab1961 Resurrected PR #5224 from @andylamb. (#12427)
pick 81d2178 EXA calculate number of noise points (#12428)
pick b498ac7 ENH Raise descriptive ValueError if number of samples equals number of classes in Linear Discriminant Analysis (#12391)
drop be742c3 MNT Change default metric in OPTICS (#12439)
pick 013d295 FIX olivetti_faces DESCR to point to the good location (#12441)
drop 88cdeb8 FIX (0.21) Correct RD/CD/order/predecessor in OPTICS (#12421)
drop dc883c7 ENH Corrected spelling of Harabasz score. (#12211)
pick ebe77d6 modify kbins test using kmeans due to unstable local minimum (#12450)
pick 4e2da4a [MRG] Added FutureWarning in sgd models for tol parameter (#12399)
pick e50fc2a TST Fix test gaussian mixture warm start (#12452)
drop 9ec5a15 ALL Add resample and shuffle to __all__ (#12456)
pick 5cef1df FIX ensure max_features > 0 in ensemble.bagging (#12388)
pick 2912e3a TST Parametrize, refactor and add new kmeans tests (#12432)
pick 338f763 TST Throw correct error for pytest version (#12475)
pick e67e30c Fix numpy vstack on generator expressions (#12467)
pick d4802ae Fix mean shift equation as per issue 12420 (#12455)
drop f6b0c67 TST Added estimator check for idempotence of fit() (#12328)
pick a1fabce DOC Mention of pairwise_distances in Guide on Metrics (#12416)
drop 6555631 FEA (0.21) multilabel confusion matrix (#11179)
pick c2f17d0 FIX pairwise_distances_argmin_min wrong with metric="euclidean" (#12481)
pick 478267e DOC Small fix in compose.rst (#12487)
pick 83ac5dc MNT what's new corrections
pick c6455fa Fix include .pxd definitions for sklearn.tree in wheels (#12381)
pick 1c88b3c Fix IncrementalPCA when final batch is smaller than dimensions required for SVD (#12379)
pick d42d0d7 DOC fix broken link in doc/developers/contributing.rst (#12508)
pick 6226cf5 TST skip test_backend_respected if joblib is forced into serial mode (#12496)
pick 036dfdd Fix inconsistent labels returned by BayesianGaussianMixture.fit_predict (#12451)
pick a028416 DOC Add details to StandardScaler calculation (#12446)
drop 9f842e3 API Adds passthrough option to Pipeline (#11674)
pick f24e300 Impose shared memory when fitting a SGDClassifier (#12498)
edit 6d9acd7 MNT what's new corrections
pick 6b4e00d MNT KBinsDiscretizer.transform should not mutate _encoder (#12514)
pick 5d8dfc9 FIX SkLearn `.score()` method generating error with Dask DataFrames (#12462)
drop c676981 MNT (0.21) OPTiCS change the default `algorithm` to `auto` (#12529)
pick 5196657 [MRG] Fix segfault in AgglomerativeClustering with read-only mmaps (#12485)
pick 6f5fec9 Fix dead link to numpydoc (#12532)
drop 8e3a191 fix typo in whatsnew
pick 1128094 BLD we should ensure continued support for joblib 0.11 (#12350)
drop 8408b46 ALL Add HashingVectorizer to __all__ (#12534)
drop 54de189 DOC (0.21) Make sure plot_tree docs are generated and fix link in whatsnew (#12533)
pick c68f306 DOC tweak KMeans regarding cluster_centers_ convergence (#12537)
pick c81e255 joblib 0.13.0 (#12531)
drop 664ff34 ENH Prefer threads for IsolationForest (#12543)
pick 3282d43 [MRG] Additional Warnings in case OpenML auto-detected a problem with dataset  (#12541)
pick 042843a FIX YeoJohnson transform lambda bounds (#12522)
pick ea115c2 DOC: add a testimonial from JP Morgan (#12555)
pick 362cb3b TST autoreplace assert_true(...==...) with plain assert (#12547)
pick 43e3a02 MNT Remove unused assert_true imports (#12560)
pick eb36e49 MNT Don't change self.n_values in OneHotEncoder.fit (#12286)
pick 8622885 DOC Add skorch to related projects (#12561)
pick eb36c28 FIX stop words validation in text vectorizers with custom preprocessors / tokenizers (#12393)
pick 6c5f285 EXA Fix comment in plot-iris-logistic example (#12564)
pick 7c2b47a DOC Add 's' to "correspond" in docs for Hamming Loss. (#12565)
pick 4e81949 FIX remove FutureWarning in _object_dtype_isnan and add test (#12567)
pick 4b78d7a DOC: Clarify `cv` parameter description in `GridSearchCV` (#12495)
pick 2afee93 FIX incorrect error when OneHotEncoder.transform called prior to fit (#12443)
pick 1f2dd75 MNT bare asserts (#12571)
pick 32e5fd4 FIX Workaround limitation of cloudpickle under PyPy (#12566)
pick 914a915 DOC Fix typo (#12563)
pick d0e99db TST don't test utils.fixes docstrings (#12576)
drop 02dc9ed Fix max_depth overshoot in BFS expansion of trees (#12344)
drop 4b0a70c TST Use NotFittedError instead of ValueError in test_dummy.py (#12579)
pick 14c816e ENH/FIX openml, Adds retrying if reading from cache fails (#12526)
pick 94db3d9 ENH Improved error message for bad predict_proba shape in ThresholdScorer (#12486)
pick bfc4a56 MNT Duplicate import
pick c47c8a9 DOC improved documentation of MissingIndicator (#12424)
drop 24e4641 FIX: clone behavior for estimator types (#12585)
pick 0b85b0a FIX use ellipsis in PowerTransformer doctest (#12595)
pick feed6c7 DOC fix typos in gaussian_process.rst (#12602)
pick 1d57955 DOV consistency of parameters for GroupKFold and LeaveOneGroupOut (#12581)
pick 82e095c DOC generative model description etc for LatentDirichletAllocation (#12216)
pick 62117f4 Bug with string dtype in OneHotEncoder with handle_unknown='ignore' (#12471)
pick 66507a2 MAINT Use CircleCI for linting (#12606)
pick 9ecada8 DOC fix TeX error in Davies Bouldin formula
pick 4ab6055 MNT Change deprecation for min_impurity_split from removal to changing the default (#12400)
pick d25da1b FIX make joblib utils private, and remove mentions of externals.joblib (#12345)
pick ac327c5 [MRG] DOC Windows build dependencies (#12615)
pick 0b8650a [MRG] set blockwise diagonals to zero for euclidean distance (#12612)
pick 104f684 FIX check_array dtype check for pandas series (#12625)
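A todo list this long is easier to sanity-check with a quick tally of its actions. The following is a minimal sketch, not part of the release tooling: `tally_rebase_todo` is a hypothetical helper, and the sample is a short excerpt of the list above. It assumes each todo line starts with the action word, as in git's interactive rebase format.

```python
from collections import Counter


def tally_rebase_todo(todo_text):
    """Count the actions (pick, drop, squash, edit, ...) in a rebase todo list."""
    counts = Counter()
    for line in todo_text.splitlines():
        line = line.strip()
        # Skip blank lines and git's '#' comment lines.
        if not line or line.startswith("#"):
            continue
        # The action is the first whitespace-separated word on the line.
        action = line.split(maxsplit=1)[0]
        counts[action] += 1
    return counts


# Excerpt of the todo list above, used only as sample input.
sample = """\
drop 27972d4 MNT bump to version 0.21.dev0 after branching 0.20.X (#11941)
pick 09851ac TST update make_column_transformer test + add comment (#12156)
pick 819d8ef [MRG] Fix diagonal in DBSCAN with precomputed sparse neighbors graph (#12105)
squash ea52161 MNT Move what's new entry
edit 6d9acd7 MNT what's new corrections
"""

for action, count in sorted(tally_rebase_todo(sample).items()):
    print(action, count)
```

Running the same function over the full todo file (for example the one git opens in the editor during `git rebase -i`) would show at a glance how many commits are backported versus left for 0.21.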

jorisvandenbossche and others added 30 commits October 15, 2018 10:11
…2156)

Follow-up on scikit-learn#12152
And added comment why transformer_weights is not passed through, see
scikit-learn#11183 (review)
for more discussion
…an abstractmethod (scikit-learn#12182)

* _run_search raises NotImplementedError instead of being and abstractmethod

* add error message

* test for a BaseSearchCV child w/o a _run_search

* make the test python2 compatible, still in 0.20 zone.

* specify cv in tests not to trigger the related FutureWarning

* PEP8
…) with negative integer values (scikit-learn#12180)

* Fix Issue scikit-learn#12179

OneHotEncoder "only non-negative integers" message should suggest using
categories='auto'

* Fix Issue scikit-learn#12179

    OneHotEncoder "only non-negative integers" message should suggest using
    categories='auto'

* Fix Issue scikit-learn#12179

OneHotEncoder "only non-negative integers" message should suggest using
categories='auto'

* Fixes scikit-learn#12180 Modify the error message

* Fix the spacing
…C and LinearSVR (scikit-learn#12185)

* Added Tip for tuning C parameter in LinearSVC/SVR

* Added reference about performance improvement

* Clearer explanation about large C values

* Update svm.rst
@qinhanmin2014
Member

Things on my mind, feel free to pick or close:
#12633 MNT Remove LGTM warnings (apart from joblib) in 0.20.1
#12634 DOC Note that we have API changes in 0.20.1

@lesteve
Member

lesteve commented Nov 21, 2018

I looked into it and here is what I have so far. In the CircleCI build, calling np.linalg.slogdet with an array full of NaN yields a segmentation fault. I cannot reproduce that locally:

import numpy as np

n = 56
array = np.repeat(np.nan, n*n).reshape(n, n)

np.linalg.slogdet(array) # Segmentation fault on CircleCI machine

Not sure what the reason behind this is ... suggestions welcome, otherwise I'll keep debugging.
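Because the crash only shows up on some machines, it can help to run the reproducer in a child interpreter so that a segmentation fault does not take down the calling session. The following is a minimal sketch under stated assumptions: `crashed_with_segfault` is a hypothetical helper name, the check relies on POSIX semantics (a negative return code is the negated signal number), and the reproducer string simply repeats the snippet above.

```python
import signal
import subprocess
import sys

# Same reproducer as above, kept as a string so it runs in a separate process.
REPRODUCER = """
import numpy as np
n = 56
array = np.repeat(np.nan, n * n).reshape(n, n)
np.linalg.slogdet(array)
"""


def crashed_with_segfault(code):
    """Run `code` in a fresh interpreter; report whether it died from SIGSEGV."""
    result = subprocess.run([sys.executable, "-c", code])
    # On POSIX, a negative return code is the negated number of the fatal signal.
    return result.returncode == -signal.SIGSEGV


if __name__ == "__main__":
    print("segfault:", crashed_with_segfault(REPRODUCER))
```

On a machine where the reproducer is fine (or where numpy is missing and the child exits with an ordinary error), this reports `False`; it only reports `True` when the child process is killed by SIGSEGV.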

The problematic script is examples/applications/plot_stock_market.py. FWIW I started with gdb and ended up using print statements once I figured out that the problem was in slogdet. There was also a Python warning that helped:

RuntimeWarning: invalid value encountered in slogdet
  sign, logdet = _umath_linalg.slogdet(a, signature=signature)
gdb details
gdb --args python examples/applications/plot_stock_market.py
/home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/linalg/linalg.py:1965: RuntimeWarning: invalid value encountered in slogdet
  sign, logdet = _umath_linalg.slogdet(a, signature=signature)


Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff7bb9620 <main_arena>, bytes=257) at malloc.c:3489
3489    malloc.c: No such file or directory.
(gdb) bt
#0  _int_malloc (av=0x7ffff7bb9620 <main_arena>, bytes=257) at malloc.c:3489
#1  0x00007ffff7890020 in __GI___libc_malloc (bytes=257) at malloc.c:2891
#2  0x00007ffff7848136 in __GI___open_catalog (cat_name=cat_name@entry=0x7ffff67689b0 "mkl_msg.cat", 
    nlspath=nlspath@entry=0x7ffff797af70 "/usr/share/locale/%L/%N:/usr/share/locale/%L/LC_MESSAGES/%N:/usr/share/locale/%l/%N:/usr/share/locale/%l/LC_MESSAGES/%N:", env_var=env_var@entry=0x7fffffffe7b7 "C.UTF-8", catalog=catalog@entry=0x555556f40fe0)
    at open_catalog.c:172
#3  0x00007ffff7847918 in catopen (cat_name=0x7ffff67689b0 "mkl_msg.cat", flag=<optimized out>) at catgets.c:75
#4  0x00007ffff653a0b2 in mkl_serv_print ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/../../../../libmkl_rt.so
#5  0x00007fffee9e0a37 in mkl_serv_default_xerbla ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#6  0x00007ffff10fa23e in mkl_lapack_xdlaswp ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/../../../../libmkl_core.so
#7  0x00007fffef4ba19c in mkl_lapack_dlaswp ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#8  0x00007ffff1300892 in mkl_lapack_dgetrf_local ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/../../../../libmkl_core.so
#9  0x00007ffff178c9f7 in mkl_lapack_xdgetrf ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/../../../../libmkl_core.so
#10 0x00007fffef572ac9 in mkl_lapack_dgetrf ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#11 0x00007fffedfdf980 in mkl_lapack.dgetrf_ ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
#12 0x00007ffff52f2855 in DOUBLE_slogdet ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/linalg/_umath_linalg.cpython-36m-x86_64-linux-gnu.so
#13 0x00007ffff6236d07 in PyUFunc_GeneralizedFunction ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#14 0x00007ffff623a115 in PyUFunc_GenericFunction ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#15 0x00007ffff623ba41 in ufunc_generic_call ()
   from /home/circleci/miniconda/envs/testenv/lib/python3.6/site-packages/numpy/core/umath.cpython-36m-x86_64-linux-gnu.so
#16 0x000055555566738b in _PyObject_FastCallDict ()
#17 0x00005555556e899a in _PyObject_FastCallKeywords ()
#18 0x00005555556ee58e in call_function ()
#19 0x0000555555713734 in _PyEval_EvalFrameDefault ()
#20 0x00005555556e847b in fast_function ()
#21 0x00005555556ee515 in call_function ()
#22 0x0000555555712a1a in _PyEval_EvalFrameDefault ()
#23 0x00005555556e847b in fast_function ()
#24 0x00005555556ee515 in call_function ()
#25 0x0000555555712a1a in _PyEval_EvalFrameDefault ()
#26 0x00005555556e847b in fast_function ()
#27 0x00005555556ee515 in call_function ()
#28 0x0000555555712a1a in _PyEval_EvalFrameDefault ()
#29 0x00005555556e847b in fast_function ()
#30 0x00005555556ee515 in call_function ()
#31 0x0000555555712a1a in _PyEval_EvalFrameDefault ()
#32 0x00005555556e7814 in _PyEval_EvalCodeWithName ()
#33 0x00005555556e86b1 in fast_function ()
#34 0x00005555556ee515 in call_function ()
#35 0x0000555555713734 in _PyEval_EvalFrameDefault ()
#36 0x00005555556e91c9 in PyEval_EvalCodeEx ()
#37 0x00005555556ea0e6 in function_call ()
#38 0x0000555555666fae in PyObject_Call ()
#39 0x00005555557142da in _PyEval_EvalFrameDefault ()

@jnothman
Member Author

jnothman commented Nov 21, 2018 via email

@rth
Member

rth commented Nov 21, 2018

Interesting problem. I suppose adding nomkl to CircleCI conda requirements might be a lazy fix, but it's more avoiding the problem than solving it.

Also FWIW all tests pass on conda forge using this PR conda-forge/scikit-learn-feedstock#80

@lesteve
Member

lesteve commented Nov 21, 2018

It's very strange for this to crop up all of a sudden... Have you tried clearing the conda instance from the cache?

Not sure how I would do that, to be honest. From a quick look at .circleci/config.yml, it does not look like we are caching the conda install, but maybe I am missing something. What I tried:

conda clean -a -y # to be on the safe side
conda remove -n my-test --all -y
conda create -n my-test numpy ipython python=3.6 -y

SafetyError: The package for numpy-base located at /home/circleci/miniconda/pkgs/numpy-base-1.15.4-py36h81de0dd_0
appears to be corrupted. The path 'lib/python3.6/site-packages/numpy/linalg/linalg.py'
has a sha256 mismatch.
  reported sha256: e410474c4534f3b226ba7f16faab47331406e8841ed35e9827fbd61638785d91
  actual sha256: 84bf78e8b2ff62c946cd9e4934ffca1e5795f5bce93a4aaac60986cefcef0243
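The SafetyError above boils down to comparing a file's SHA-256 digest with the value recorded in the package metadata. The following is a minimal sketch of that kind of check done by hand; it is not conda's own implementation, and `sha256_of_file` and `matches_reported` are hypothetical helper names.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reported(path, reported_sha256):
    """Compare the file's actual digest with a reported hex digest."""
    return sha256_of_file(path) == reported_sha256.strip().lower()
```

Calling `matches_reported` on the installed `numpy/linalg/linalg.py` with the reported digest from the message would confirm whether the file on disk really differs from what the package metadata records.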

It's hilarious that it's plot_stock_market again causing headaches...

Depends on your sense of humour I guess, but yeah I totally get your point ;-)

@jnothman
Member Author

You hadn't mentioned the corrupted numpy-base before had you? It certainly looks suspicious!

@lesteve
Member

lesteve commented Nov 21, 2018

You hadn't mentioned the corrupted numpy-base before had you? It certainly looks suspicious!

Looking at it a bit more, it may be a red herring, not sure ... I was able to delete the cached packages (conda clean -a did not seem to work for some reason) and re-download them, but that did not get rid of the segmentation fault.

I have tested that adding nomkl fixes the numpy problem above, so it may be a reasonable workaround indeed.

@rth
Member

rth commented Nov 21, 2018

Overall it feels like a packaging issue upstream with numpy and MKL. Maybe we can open an issue at numpy with your minimal example from #12383 (comment) and use nomkl (which would also make builds faster)?

@ogrisel
Member

ogrisel commented Nov 21, 2018

Maybe we can open an issue at numpy with your minimal example from #12383 (comment)

Has anyone tried and managed to reproduce the crash locally, or is this just an issue on the CircleCI host? The stock market example works with my numpy with openblas installed from PyPI.

@rth
Member

rth commented Nov 21, 2018

I can't reproduce the small example #12383 (comment) using the same docker image as in CircleCI. I have not tried to run the full build though, so versions might differ slightly.

@lesteve
Member

lesteve commented Nov 21, 2018

Overall it feels like a packaging issue upstream with numpy and MKL. Maybe open an issue at numpy with your minimal example from #12383 (comment)

The thing is I only managed to reproduce the error on the CircleCI machine ... I haven't tried reusing the same docker image to reproduce locally. I'll try first using nomkl in the CircleCI build.

@qinhanmin2014
Member

I'll try first using nomkl in the CircleCI build.

A PR with [doc build]?

@lesteve
Member

lesteve commented Nov 21, 2018

I have a PR with nomkl and [doc build] here:
#12636

@amueller
Member

amueller commented Nov 21, 2018

Such a great thing to wake up to lol. Why are there NaN in that array anyway?
@rth can you run the full build?
Looks like a caching issue indeed. I'll only be able to do it in an hour, but I would try to ssh into circle again and see if we can fix the numpy-core mismatch thing and/or clear the cache there.

@amueller
Member

For the record, the installed versions of numpy that are failing are
numpy: 1.15.4-py36h1d66e8a_0
numpy-base: 1.15.4-py36h81de0dd_0

@rth
Member

rth commented Nov 21, 2018

I won't be able to now. Possibly some issue with the default conda channel, cf. #12636 (comment). It also fails on master.

@lesteve
Member

lesteve commented Nov 21, 2018

Such a great thing to wake up to lol. Why are there NaN in that array anyway?

Good morning! Good question about the NaNs, not sure.

Looks like a caching issue indeed. I'll only be able to do it in an hour, but I would try to ssh into circle again and see if we can fix the numpy-core mismatch thing and/or clear the cache there.

Which cache do you have in mind and how would you clear it? I would have thought that there was no cache for the miniconda install, but maybe I am missing something.

@amueller
Member

Looks like there is indeed no miniconda cache. Your ssh was down, so I restarted it and I'll try to have a look, but it'll be a bit slow.

@amueller
Member

Seems unrelated to the numpy version btw. Even crashes with numpy 1.13

@lesteve
Member

lesteve commented Nov 21, 2018

Seems unrelated to the numpy version btw.

I saw the same thing: I tried with a few different numpy versions and the segmentation fault was still happening.

@amueller
Member

amueller commented Nov 21, 2018

but doing nomkl does seem to fix it - using conda's nomkl, not conda-forge btw.

@ogrisel
Member

ogrisel commented Nov 21, 2018

but doing nomkl does seem to fix it.

It did not fix it for the python 2 circle ci build in #12636 (first commit of that PR).

@amueller
Member

Sorry, edited: I didn't use conda-forge, I did conda install nomkl.

@amueller
Member

you said

Also, if I do another rebase of master onto this branch, it is able to recognise which commits I've already transferred to the new branch

but that's not happening for me using your branch. Is that expected?

@amueller
Member

honestly I have no idea and will just cherry-pick.

@amueller amueller merged commit fd13f47 into scikit-learn:0.20.X Nov 21, 2018
@jnothman
Member Author

jnothman commented Nov 21, 2018 via email

@amueller
Member

It included some that you picked before, though. That was confusing to me.

@qinhanmin2014
Member

Hmm, so Circle CI fails in the PR and passes in 0.20.X? So many mysterious things.
