Merged
32 commits
d2b405c
FIX Shuffle each class's samples with different random_state in Strat…
qinhanmin2014 Feb 27, 2019
319f27a
FIX Explicitly ignore SparseEfficiencyWarning in DBSCAN (#13539)
peay Apr 1, 2019
936a9fa
[MRG] Save sample_weight_arr instead of sample_weight in KernelDensit…
aditya1702 May 14, 2019
3e14f61
DOC Make explicit that groups required *Group* splitter (#14235)
glemaitre Jul 2, 2019
4f9e557
FIX IndexError due to imprecision in KMeans++ (#11756)
jnothman Jul 18, 2019
e423647
What's new for #11756
jnothman Jul 23, 2019
9474386
FIX ColumnTransformer: raise error on reordered columns with remainde…
schuderer Jul 22, 2019
a2d1b07
MNT Set release version to 0.20.4
jnothman Jul 23, 2019
1569889
Upgrade joblib to version 0.13.2
jnothman Jul 23, 2019
ac2efa2
DOC what's new for joblib upgrade
jnothman Jul 23, 2019
92af13d
Require sphinx 1.6.* rather than 1.6.2
jnothman Jul 23, 2019
6dd45f0
Upgrade doc build requirements since packages are gone from conda
jnothman Jul 23, 2019
21d3033
Use arbitrary versions in Python 2 Circle
jnothman Jul 23, 2019
9498fa6
remove matplotlib version req
jnothman Jul 23, 2019
f9687ca
TST Manually scramble the indices in svm tests (#12890)
qinhanmin2014 Jan 2, 2019
c4d6722
Fixes for travis on latest deps
jnothman Jul 23, 2019
686cde8
Add conda-forge to test old deps no longer on main channels
jnothman Jul 23, 2019
92a6ef0
Docstring indent
jnothman Jul 23, 2019
09b3b52
TST Sets decimal
thomasjpfan May 23, 2019
22ca031
Debug conda install
jnothman Jul 23, 2019
4b143a4
Add tol to silence warning
jnothman Jul 23, 2019
87f67ca
Not getting anywhere with scipy 0.16. Try 0.17.
jnothman Jul 23, 2019
824f7aa
Fix up use of ignore_warnings
jnothman Jul 23, 2019
34ec4ea
Scipy.sparse may only accept arrays, not lists
jnothman Jul 23, 2019
7b0b97b
ignore SparseSeries warning
jnothman Jul 23, 2019
6df9737
Disable coverage in py3.4 due to deps
jnothman Jul 23, 2019
2dc1aff
Avoid installing pytest-cov if no COVERAGE
jnothman Jul 23, 2019
bfaefaf
Make sure coverage is there even if conda removes it
jnothman Jul 23, 2019
201ef7a
DOC update release date
jnothman Jul 29, 2019
989fbf4
DOC update news from 0.20 perspective
jnothman Jul 29, 2019
91bd0fe
DOC fix typo in what's new version numbers
jnothman Jul 29, 2019
c0bd85f
DOC update news
jnothman Jul 29, 2019
10 changes: 5 additions & 5 deletions .circleci/config.yml
@@ -42,11 +42,11 @@ jobs:
- MINICONDA_PATH: ~/miniconda
- CONDA_ENV_NAME: testenv
- PYTHON_VERSION: "2"
- - NUMPY_VERSION: "1.10"
- - SCIPY_VERSION: "0.16"
- - MATPLOTLIB_VERSION: "1.4"
- - SCIKIT_IMAGE_VERSION: "0.11"
- - PANDAS_VERSION: "0.17.1"
+ - NUMPY_VERSION: "1.*"
+ - SCIPY_VERSION: "0.*"
+ - MATPLOTLIB_VERSION: "*"
+ - SCIKIT_IMAGE_VERSION: "0.*"
+ - PANDAS_VERSION: "0.*"
steps:
- checkout
- run: ./build_tools/circle/checkout_merge_commit.sh
6 changes: 3 additions & 3 deletions .travis.yml
@@ -35,12 +35,12 @@ matrix:
- libatlas-dev
# Python 3.4 build
- env: DISTRIB="conda" PYTHON_VERSION="3.4" INSTALL_MKL="false"
- NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.16.1" CYTHON_VERSION="0.25.2"
- PILLOW_VERSION="4.0.0" COVERAGE=true
+ NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.17" CYTHON_VERSION="0.25.2"
+ PILLOW_VERSION="4.0.0" COVERAGE=
if: type != cron
# Python 3.5 build
- env: DISTRIB="conda" PYTHON_VERSION="3.5" INSTALL_MKL="false"
- NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.16.1" CYTHON_VERSION="0.25.2"
+ NUMPY_VERSION="1.10.4" SCIPY_VERSION="0.17" CYTHON_VERSION="0.25.2"
PILLOW_VERSION="4.0.0" COVERAGE=true
SKLEARN_SITE_JOBLIB=1 JOBLIB_VERSION="0.11"
if: type != cron
2 changes: 1 addition & 1 deletion build_tools/circle/build_doc.sh
@@ -119,7 +119,7 @@ conda update --yes --quiet conda
# provided versions
conda create -n $CONDA_ENV_NAME --yes --quiet python="${PYTHON_VERSION:-*}" \
numpy="${NUMPY_VERSION:-*}" scipy="${SCIPY_VERSION:-*}" cython \
- pytest coverage matplotlib="${MATPLOTLIB_VERSION:-*}" sphinx=1.6.2 pillow \
+ pytest coverage matplotlib="${MATPLOTLIB_VERSION:-*}" sphinx=1.6.* pillow \
scikit-image="${SCIKIT_IMAGE_VERSION:-*}" pandas="${PANDAS_VERSION:-*}" \
joblib

16 changes: 11 additions & 5 deletions build_tools/travis/install.sh
@@ -11,7 +11,7 @@
# matrix entry) from which we pull from local Travis repository. This allows
# us to keep build artefact for gcc + cython, and gain time

- set -e
+ set -ex

echo 'List files from cached directories'
echo 'pip:'
@@ -38,12 +38,18 @@ make_conda() {
export PATH=$MINICONDA_PATH/bin:$PATH
conda update --yes conda

- conda create -n testenv --yes $TO_INSTALL
+ conda create -c conda-forge -n testenv --yes $TO_INSTALL
source activate testenv
}

+ if [[ "$COVERAGE" == "true" ]]; then
+     TEST_DEPS="pytest pytest-cov"
+ else
+     TEST_DEPS="pytest"
+ fi
+
  if [[ "$DISTRIB" == "conda" ]]; then
-     TO_INSTALL="python=$PYTHON_VERSION pip pytest pytest-cov \
+     TO_INSTALL="python=$PYTHON_VERSION pip $TEST_DEPS \
numpy=$NUMPY_VERSION scipy=$SCIPY_VERSION \
cython=$CYTHON_VERSION"

@@ -84,7 +90,7 @@ elif [[ "$DISTRIB" == "ubuntu" ]]; then
# and scipy
virtualenv --system-site-packages testvenv
source testvenv/bin/activate
- pip install pytest pytest-cov cython==$CYTHON_VERSION
+ pip install $TEST_DEPS cython==$CYTHON_VERSION

elif [[ "$DISTRIB" == "scipy-dev" ]]; then
make_conda python=3.7
@@ -96,7 +102,7 @@ elif [[ "$DISTRIB" == "scipy-dev" ]]; then
echo "Installing joblib master"
pip install https://github.com/joblib/joblib/archive/master.zip
export SKLEARN_SITE_JOBLIB=1
- pip install pytest pytest-cov
+ pip install $TEST_DEPS
fi

if [[ "$COVERAGE" == "true" ]]; then
1 change: 1 addition & 0 deletions build_tools/travis/test_pytest_soft_dependency.sh
@@ -7,6 +7,7 @@ if [[ "$CHECK_PYTEST_SOFT_DEPENDENCY" == "true" ]]; then
if [[ "$COVERAGE" == "true" ]]; then
# Need to append the coverage to the existing .coverage generated by
# running the tests
+ pip install coverage
CMD="coverage run --append"
else
CMD="python"
10 changes: 5 additions & 5 deletions doc/index.rst
@@ -207,20 +207,20 @@
<li><em>On-going development:</em>
<a href="/dev/whats_new.html"><em>What's new</em> (Changelog)</a>
</li>
- <li><strong>Scikit-learn 0.21 will drop support for Python 2.7 and Python 3.4.</strong>
+ <li><strong>Scikit-learn from 0.21 requires Python 3.5 or greater.</strong>
  </li>
- <li><em>March 2019.</em> scikit-learn 0.20.3 is available for download (<a href="whats_new.html#version-0-20-3">Changelog</a>).
+ <li><em>July 2019.</em> scikit-learn 0.21.3 (<a href="whats_new.html#version-0-21-3">Changelog</a>) and 0.20.4 (<a href="whats_new.html#version-0-20-4">Changelog</a>) are available for download.
  </li>
+ <li><em>May 2019.</em> scikit-learn 0.21.0 to 0.21.2 are available for download (<a href="whats_new.html#version-0-21">Changelog</a>).
+ </li>
- <li><em>December 2018.</em> scikit-learn 0.20.2 is available for download (<a href="whats_new.html#version-0-20-2">Changelog</a>)
+ <li><em>March 2019.</em> scikit-learn 0.20.3 is available for download (<a href="whats_new.html#version-0-20-3">Changelog</a>).
</li>
<li><em>September 2018.</em> scikit-learn 0.20.0 is available for download (<a href="whats_new.html#version-0-20-0">Changelog</a>).
</li>
<li><em>July 2018.</em> scikit-learn 0.19.2 is available for download (<a href="whats_new.html#version-0-19">Changelog</a>).
</li>
<li><em>July 2017.</em> scikit-learn 0.19.0 is available for download (<a href="whats_new/v0.19.html#version-0-19">Changelog</a>).
</li>
- <li><em>June 2017.</em> scikit-learn 0.18.2 is available for download (<a href="whats_new/v0.18.html#version-0-18-2">Changelog</a>).
- </li>
</ul>
</div>

46 changes: 45 additions & 1 deletion doc/whats_new/v0.20.rst
@@ -2,6 +2,50 @@

.. currentmodule:: sklearn

.. _changes_0_20_4:

Version 0.20.4
==============

**July 30, 2019**

This is a bug-fix release with some bug fixes applied to version 0.20.3.

Changelog
---------

The bundled version of joblib was upgraded from 0.13.0 to 0.13.2.

:mod:`sklearn.cluster`
..............................

- |Fix| Fixed a bug in :class:`cluster.KMeans` where KMeans++ initialisation
could rarely result in an IndexError. :issue:`11756` by `Joel Nothman`_.
Review comment (Member): this entry is not included in master (and 0.21.3)?

Reply (Member Author): Ahh, true. It's now included in the 0.20.4 section in master at least...

Reply (Member Author): Sorry for missing it.

:mod:`sklearn.compose`
.....................

- |Fix| Fixed an issue in :class:`compose.ColumnTransformer` where using
DataFrames whose column order differs between :func:``fit`` and
:func:``transform`` could lead to silently passing incorrect columns to the
``remainder`` transformer.
:pr:`14237` by `Andreas Schuderer <schuderer>`.

:mod:`sklearn.model_selection`
..............................

- |Fix| Fixed a bug where :class:`model_selection.StratifiedKFold`
shuffles each class's samples with the same ``random_state``,
making ``shuffle=True`` ineffective.
:issue:`13124` by :user:`Hanmin Qin <qinhanmin2014>`.

:mod:`sklearn.neighbors`
......................

- |Fix| Fixed a bug in :class:`neighbors.KernelDensity` which could not be
restored from a pickle if ``sample_weight`` had been used.
:issue:`13772` by :user:`Aditya Vyas <aditya1702>`.

.. _changes_0_20_3:

Version 0.20.3
@@ -30,7 +74,7 @@ Changelog
:issue:`12946` by :user:`Pierre Tallotte <pierretallotte>`.

:mod:`sklearn.covariance`
- ......................
+ .........................

- |Fix| Fixed a regression in :func:`covariance.graphical_lasso` so that
the case `n_features=2` is handled correctly. :issue:`13276` by
2 changes: 1 addition & 1 deletion sklearn/__init__.py
@@ -44,7 +44,7 @@
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
#
- __version__ = '0.20.3'
+ __version__ = '0.20.4'


try:
5 changes: 3 additions & 2 deletions sklearn/cluster/dbscan_.py
@@ -10,11 +10,11 @@
# License: BSD 3 clause

import numpy as np
+ import warnings
from scipy import sparse

from ..base import BaseEstimator, ClusterMixin
from ..utils import check_array, check_consistent_length
- from ..utils.testing import ignore_warnings
from ..neighbors import NearestNeighbors

from ._dbscan_inner import dbscan_inner
@@ -139,7 +139,8 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski', metric_params=None,
X.sum_duplicates() # XXX: modifies X's internals in-place

# set the diagonal to explicit values, as a point is its own neighbor
- with ignore_warnings():
+ with warnings.catch_warnings():
+     warnings.simplefilter('ignore', sparse.SparseEfficiencyWarning)
      X.setdiag(X.diagonal()) # XXX: modifies X's internals in-place

X_mask = X.data <= eps
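
The change above narrows the suppression from every warning to a single category. A minimal standard-library sketch of that difference follows. `ExpensiveOperationWarning`, `update_in_place` and `quiet_update` are hypothetical stand-ins (for `sparse.SparseEfficiencyWarning` and the `X.setdiag(...)` call), not part of scikit-learn or SciPy.

```python
import warnings


class ExpensiveOperationWarning(Warning):
    """Hypothetical stand-in for scipy.sparse.SparseEfficiencyWarning."""


def update_in_place():
    # Hypothetical operation: emits the expected warning plus an unrelated one.
    warnings.warn("changing structure is expensive", ExpensiveOperationWarning)
    warnings.warn("something else went wrong", UserWarning)
    return "done"


def quiet_update():
    # Ignore only the expected category; any other warning still propagates.
    with warnings.catch_warnings():
        warnings.simplefilter('ignore', ExpensiveOperationWarning)
        return update_in_place()
```

With a blanket ignore, the unrelated `UserWarning` would be hidden as well; with the category filter it still reaches the caller.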
3 changes: 3 additions & 0 deletions sklearn/cluster/k_means_.py
@@ -111,6 +111,9 @@ def _k_init(X, n_clusters, x_squared_norms, random_state, n_local_trials=None):
rand_vals = random_state.random_sample(n_local_trials) * current_pot
candidate_ids = np.searchsorted(stable_cumsum(closest_dist_sq),
rand_vals)
+ # XXX: numerical imprecision can result in a candidate_id out of range
+ np.clip(candidate_ids, None, closest_dist_sq.size - 1,
+         out=candidate_ids)

# Compute distances to center candidates
distance_to_candidates = euclidean_distances(
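
Why the clip is needed can be illustrated in isolation. In the sketch below the arrays are made-up values, not taken from `_k_init`. `np.searchsorted` returns an index equal to the array length when a probe exceeds the last cumulative sum, which is presumably what happens when the total and the cumulative sum are accumulated with slightly different rounding.

```python
import numpy as np

# Made-up cumulative distances; the last entry plays the role of the total.
cumulative = np.array([0.1, 0.3, 0.6])

# The second probe is marginally larger than the final cumulative value,
# mimicking a rounding mismatch between two ways of summing the same data.
rand_vals = np.array([0.05, 0.6 + 1e-12])

candidate_ids = np.searchsorted(cumulative, rand_vals)
out_of_range = int(candidate_ids.max()) >= cumulative.size

# Same guard as in the patch: cap indices at the last valid position.
np.clip(candidate_ids, None, cumulative.size - 1, out=candidate_ids)
```

Before the clip, the second index equals `cumulative.size` and would raise an `IndexError` if used to index the data; afterwards it points at the last sample.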
35 changes: 31 additions & 4 deletions sklearn/compose/_column_transformer.py
@@ -83,7 +83,9 @@ class ColumnTransformer(_BaseComposition, TransformerMixin):
the transformers.
By setting ``remainder`` to be an estimator, the remaining
non-specified columns will use the ``remainder`` estimator. The
- estimator must support `fit` and `transform`.
+ estimator must support :term:`fit` and :term:`transform`.
+ Note that using this feature requires that the DataFrame columns
+ input at :term:`fit` and :term:`transform` have identical order.

sparse_threshold : float, default = 0.3
If the output of the different transfromers contains sparse matrices,
@@ -295,11 +297,17 @@ def _validate_remainder(self, X):
"'passthrough', or estimator. '%s' was passed instead" %
self.remainder)

- n_columns = X.shape[1]
+ # Make it possible to check for reordered named columns on transform
+ if (hasattr(X, 'columns') and
+         any(_check_key_type(cols, str) for cols in self._columns)):
+     self._df_columns = X.columns
+
+ self._n_features = X.shape[1]
  cols = []
  for columns in self._columns:
      cols.extend(_get_column_indices(X, columns))
- remaining_idx = sorted(list(set(range(n_columns)) - set(cols))) or None
+ remaining_idx = list(set(range(self._n_features)) - set(cols))
+ remaining_idx = sorted(remaining_idx) or None

self._remainder = ('remainder', self.remainder, remaining_idx)

@@ -488,8 +496,27 @@ def transform(self, X):

"""
check_is_fitted(self, 'transformers_')

X = _check_X(X)

+ if self._n_features > X.shape[1]:
+     raise ValueError('Number of features of the input must be equal '
+                      'to or greater than that of the fitted '
+                      'transformer. Transformer n_features is {0} '
+                      'and input n_features is {1}.'
+                      .format(self._n_features, X.shape[1]))
+
+ # No column reordering allowed for named cols combined with remainder
+ if (self._remainder[2] is not None and
+         hasattr(self, '_df_columns') and
+         hasattr(X, 'columns')):
+     n_cols_fit = len(self._df_columns)
+     n_cols_transform = len(X.columns)
+     if (n_cols_transform >= n_cols_fit and
+             any(X.columns[:n_cols_fit] != self._df_columns)):
+         raise ValueError('Column ordering must be equal for fit '
+                          'and for transform when using the '
+                          'remainder keyword')

Xs = self._fit_transform(X, None, _transform_one, fitted=True)
self._validate_output(Xs)
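
The ordering rule enforced in `transform` can be restated as a standalone check. This is a sketch on plain lists of column names. `check_column_order` is a hypothetical helper, not a scikit-learn function, and it leaves out the precondition that a `remainder` is actually in use, which the patch tests first.

```python
def check_column_order(fit_columns, transform_columns):
    """Raise if the leading columns at transform differ from those seen at fit.

    Extra trailing columns are tolerated, mirroring the patched behaviour.
    """
    n_cols_fit = len(fit_columns)
    n_cols_transform = len(transform_columns)
    if (n_cols_transform >= n_cols_fit and
            list(transform_columns[:n_cols_fit]) != list(fit_columns)):
        raise ValueError('Column ordering must be equal for fit '
                         'and for transform when using the '
                         'remainder keyword')


# Added trailing column: accepted because the leading columns match.
check_column_order(['first', 'second'], ['first', 'second', 'third'])

# Swapped columns: rejected.
try:
    check_column_order(['first', 'second'], ['second', 'first'])
    reordered_rejected = False
except ValueError:
    reordered_rejected = True
```

An input with fewer columns than at fit time passes this check by design; in the patch that case is caught earlier by the `Number of features` error.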

48 changes: 48 additions & 0 deletions sklearn/compose/tests/test_column_transformer.py
@@ -498,6 +498,17 @@ def test_column_transformer_invalid_columns(remainder):
assert_raise_message(ValueError, "Specifying the columns",
ct.fit, X_array)

+ # transformed n_features does not match fitted n_features
+ col = [0, 1]
+ ct = ColumnTransformer([('trans', Trans(), col)], remainder=remainder)
+ ct.fit(X_array)
+ X_array_more = np.array([[0, 1, 2], [2, 4, 6], [3, 6, 9]]).T
+ ct.transform(X_array_more) # Should accept added columns
+ X_array_fewer = np.array([[0, 1, 2], ]).T
+ err_msg = 'Number of features'
+ with pytest.raises(ValueError, match=err_msg):
+     ct.transform(X_array_fewer)


def test_column_transformer_invalid_transformer():

@@ -1033,3 +1044,40 @@ def test_column_transformer_negative_column_indexes():
tf_1 = ColumnTransformer([('ohe', ohe, [-1])], remainder='passthrough')
tf_2 = ColumnTransformer([('ohe', ohe, [2])], remainder='passthrough')
assert_array_equal(tf_1.fit_transform(X), tf_2.fit_transform(X))


+ @pytest.mark.parametrize("explicit_colname", ['first', 'second'])
+ def test_column_transformer_reordered_column_names_remainder(explicit_colname):
+     """Regression test for issue #14223: 'Named col indexing fails with
+     ColumnTransformer remainder on changing DataFrame column ordering'
+
+     Should raise error on changed order combined with remainder.
+     Should allow for added columns in `transform` input DataFrame
+     as long as all preceding columns match.
+     """
+     pd = pytest.importorskip('pandas')
+
+     X_fit_array = np.array([[0, 1, 2], [2, 4, 6]]).T
+     X_fit_df = pd.DataFrame(X_fit_array, columns=['first', 'second'])
+
+     X_trans_array = np.array([[2, 4, 6], [0, 1, 2]]).T
+     X_trans_df = pd.DataFrame(X_trans_array, columns=['second', 'first'])
+
+     tf = ColumnTransformer([('bycol', Trans(), explicit_colname)],
+                            remainder=Trans())
+
+     tf.fit(X_fit_df)
+     err_msg = 'Column ordering must be equal'
+     with pytest.raises(ValueError, match=err_msg):
+         tf.transform(X_trans_df)
+
+     # No error for added columns if ordering is identical
+     X_extended_df = X_fit_df.copy()
+     X_extended_df['third'] = [3, 6, 9]
+     tf.transform(X_extended_df) # No error should be raised
+
+     # No 'columns' AttributeError when transform input is a numpy array
+     X_array = X_fit_array.copy()
+     err_msg = 'Specifying the columns'
+     with pytest.raises(ValueError, match=err_msg):
+         tf.transform(X_array)
8 changes: 4 additions & 4 deletions sklearn/cross_decomposition/tests/test_pls.py
@@ -357,13 +357,13 @@ def test_scale_and_stability():
X_score, Y_score = clf.fit_transform(X, Y)
clf.set_params(scale=False)
X_s_score, Y_s_score = clf.fit_transform(X_s, Y_s)
- assert_array_almost_equal(X_s_score, X_score)
- assert_array_almost_equal(Y_s_score, Y_score)
+ assert_array_almost_equal(X_s_score, X_score, decimal=4)
+ assert_array_almost_equal(Y_s_score, Y_score, decimal=4)
# Scaling should be idempotent
clf.set_params(scale=True)
X_score, Y_score = clf.fit_transform(X_s, Y_s)
- assert_array_almost_equal(X_s_score, X_score)
- assert_array_almost_equal(Y_s_score, Y_score)
+ assert_array_almost_equal(X_s_score, X_score, decimal=4)
+ assert_array_almost_equal(Y_s_score, Y_score, decimal=4)


def test_pls_errors():
4 changes: 2 additions & 2 deletions sklearn/datasets/svmlight_format.py
@@ -134,8 +134,8 @@ def load_svmlight_file(f, n_features=None, dtype=np.float64,

See also
--------
- load_svmlight_files: similar function for loading multiple files in this
- format, enforcing the same number of features/columns on all of them.
+ load_svmlight_files : similar function for loading multiple files in this
+ format, enforcing the same number of features/columns on all of them.

Examples
--------
4 changes: 2 additions & 2 deletions sklearn/externals/joblib/__init__.py
@@ -14,7 +14,7 @@
==================== ===============================================
**Documentation:** https://joblib.readthedocs.io

- **Download:** http://pypi.python.org/pypi/joblib#downloads
+ **Download:** https://pypi.python.org/pypi/joblib#downloads

**Source code:** https://github.com/joblib/joblib

@@ -106,7 +106,7 @@
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
#
- __version__ = '0.13.0'
+ __version__ = '0.13.2'


from .memory import Memory, MemorizedResult, register_store_backend