Skip to content

[MRG + 1] Remove redundancy in #9552 #9573

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 26, 2017
Merged

[MRG + 1] Remove redundancy in #9552 #9573

merged 4 commits into from
Oct 26, 2017

Conversation

liamge
Copy link
Contributor

@liamge liamge commented Aug 17, 2017

Reference Issue

Fixes #9552

What does this implement/fix? Explain your changes.

Fixes the redundancy in #9552 by removing the second array = np.atleast_2d(array) check.

Any other comments?

All check_array tests pass.

@jnothman
Copy link
Member

======================================================================
ERROR: sklearn.preprocessing.tests.test_data.test_quantile_transform_bounds
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/testenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/preprocessing/tests/test_data.py", line 1160, in test_quantile_transform_bounds
    assert_equal(transformer.transform(-10), transformer.transform(np.min(X)))
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/preprocessing/data.py", line 2423, in transform
    X = self._check_inputs(X)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/preprocessing/data.py", line 2348, in _check_inputs
    dtype=[np.float64, np.float32])
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/utils/validation.py", line 425, in check_array
    n_samples = _num_samples(array)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/utils/validation.py", line 118, in _num_samples
    " a valid collection." % x)
TypeError: Singleton array array(-10.0) cannot be considered a valid collection.

Perhaps we need to improve the tests for check_array because clearly the behaviour does change with this line.

Though I think the QuantileTrsansformer tests should not be using scalar inputs to transform, @glemaitre!

@liamge
Copy link
Contributor Author

liamge commented Aug 17, 2017

Any advice for how to build new tests for this? It seems as though the current tests for check_array only raise a ValueError when given a 1d array and not a scalar. Should check_array return a 2d array when given a scalar input?

@jnothman
Copy link
Member

jnothman commented Aug 17, 2017 via email

@liamge
Copy link
Contributor Author

liamge commented Aug 17, 2017

Ok I'll take a look at the QuantileTransformer tests as well, as far as they go how would you suggest I test for the upper/lower bounds without the use of a scalar? Thanks for all the help by the way!

@glemaitre
Copy link
Member

Though I think the QuantileTrsansformer tests should not be using scalar inputs to transform, @glemaitre!

I take the blame. It should be a 2d array. @jnothman.
Would it make sense to have a common test checking that an error is raised when passing a scalar?

@jnothman
Copy link
Member

jnothman commented Aug 17, 2017 via email

@jnothman
Copy link
Member

jnothman commented Aug 17, 2017 via email

@liamge
Copy link
Contributor Author

liamge commented Aug 17, 2017

Ok so I'll change the input to 2D arrays in the quantile tests and write an additional test in test_quantile_transform_check_error to make sure it raises an error on 1D input. Does that work?

if array.ndim == 0:
raise ValueError(
"Expected 2D array, got scalar array instead:\narray={}.\n"
"Reshape your data either using array.reshape(-1, 1) if "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this apply to a scalar array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say the advice does not apply to scalar arrays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@liamge do you intend to address this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can reshape(1,1)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.array(10).reshape(-1, 1) actually works (np 1.13.1).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can rewrite the code to avoid the copy and paste look and feel:

if array.ndim < 2:
    array_type = 'scalar' if array.ndim == 0 else '1D'
    raise ValueError('...got {} array instead ... array={} ...'.format(array_type, array))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very sorry about the radio silence, would you want me to add an additional commit with the avoidance of the copy/paste look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well @amueller reckons this works, do let's just key it.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from @lesteve’s comment this LGTM

if array.ndim == 0:
raise ValueError(
"Expected 2D array, got scalar array instead:\narray={}.\n"
"Reshape your data either using array.reshape(-1, 1) if "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can reshape(1,1)

"Reshape your data either using array.reshape(-1, 1) if "
"your data has a single feature or array.reshape(1, -1) "
"if it contains a single sample.".format(array))
# If input is 1D raise error
if array.ndim == 1:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well use elif

Copy link
Member

@amueller amueller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

if array.ndim == 0:
raise ValueError(
"Expected 2D array, got scalar array instead:\narray={}.\n"
"Reshape your data either using array.reshape(-1, 1) if "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.array(10).reshape(-1, 1) actually works (np 1.13.1).

@amueller amueller changed the title Remove redundancy in #9552 [MRG + 1] Remove redundancy in #9552 Oct 26, 2017
@jnothman jnothman merged commit 202b532 into scikit-learn:master Oct 26, 2017
@jnothman
Copy link
Member

Thanks @liamge!

@liamge
Copy link
Contributor Author

liamge commented Oct 26, 2017

No problem! Glad to help where I can

@liamge liamge deleted the bug/9552 branch October 26, 2017 21:45
@@ -402,13 +402,20 @@ def check_array(array, accept_sparse=False, dtype="numeric", order=None,
array = np.array(array, dtype=dtype, order=order, copy=copy)

if ensure_2d:
# If input is scalar raise error
if array.ndim == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think the error for scalar array should be the same irrespective of ensure_2d, i.e. it should be taken out of the if ensure_2d clause.

The error message when ensure_2d==False is not so great:

from sklearn.utils import check_array
check_array(10, ensure_2d=False)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-ce2f2639811f> in <module>()
----> 1 check_array(10, ensure_2d=False)

~/dev/scikit-learn/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    488     shape_repr = _shape_repr(array.shape)
    489     if ensure_min_samples > 0:
--> 490         n_samples = _num_samples(array)
    491         if n_samples < ensure_min_samples:
    492             raise ValueError("Found array with %d sample(s) (shape=%s) while a"

~/dev/scikit-learn/sklearn/utils/validation.py in _num_samples(x)
    118         if len(x.shape) == 0:
    119             raise TypeError("Singleton array %r cannot be considered"
--> 120                             " a valid collection." % x)
    121         return x.shape[0]
    122     else:

TypeError: Singleton array array(10) cannot be considered a valid collection.

donigian added a commit to donigian/scikit-learn that referenced this pull request Oct 28, 2017
…cs/donigian-update-contribution-guidelines

* 'master' of github.com:scikit-learn/scikit-learn: (23 commits)
  fixes scikit-learn#10031: fix attribute name and shape in documentation (scikit-learn#10033)
  [MRG+1] add changelog entry for fixed and merged PR scikit-learn#10005 issue scikit-learn#9633 (scikit-learn#10025)
  [MRG] Fix LogisticRegression see also should include LogisticRegressionCV(scikit-learn#9995) (scikit-learn#10022)
  [MRG + 1] Labels of clustering should start at 0 or -1 if noise (scikit-learn#10015)
  MAINT Remove redundancy in scikit-learn#9552 (scikit-learn#9573)
  [MRG+1] correct comparison in GaussianNB for 'priors' (scikit-learn#10005)
  [MRG + 1] ENH add check_inverse in FunctionTransformer (scikit-learn#9399)
  [MRG] FIX bug in nested set_params usage (scikit-learn#9999)
  [MRG+1] Fix LOF and Isolation benchmarks (scikit-learn#9798)
  [MRG + 1] Fix negative inputs checking in mean_squared_log_error (scikit-learn#9968)
  DOC Fix typo (scikit-learn#9996)
  DOC Fix typo: x axis -> y axis (scikit-learn#9985)
  improve example plot_forest_iris.py (scikit-learn#9989)
  [MRG+1] Deprecate pooling_func unused parameter in AgglomerativeClustering (scikit-learn#9875)
  DOC update news
  DOC Fix three typos in manifold documentation (scikit-learn#9990)
  DOC add missing dot in docstring
  DOC Add what's new for 0.19.1 (scikit-learn#9983)
  Improve readability of outlier detection example. (scikit-learn#9973)
  DOC: Fixed typo (scikit-learn#9977)
  ...
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

redundant code in check_array
5 participants