Skip to content

[MRG+2]: Copy pilutils functions that are deprecated in scipy #10427

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 18, 2018

Conversation

jotasi
Copy link

@jotasi jotasi commented Jan 8, 2018

Reference Issues/PRs

Fixes #10147
See also #10149 which is an incomplete and inactive pull request that attempts to fix the same issue.

What does this implement/fix? Explain your changes.

scipy.misc.pilutil's functions are deprecated since version 1.0.0 and will be removed in 1.2.0. As decided in the discussion of #10147 this PR copies the functionality needed in scikit-learn (imread and imresize) and their respective test cases. Following the discussion in #10149 they are put into a utility module in sklearn.util.pilutil. I only included functionality that is already used in scikit-learn, dropping unneeded functions and parameters.

Other comments

I'm still unsure about the following points:

  1. Licensing: As described in SciPy's docs I copied their license's text into the files containing the implementation as well as the tests. Is this sufficient? I'm especially unsure about the images used in the tests which cannot simply include the license text.

  2. Testing: The added tests are not run on Travis as none of the runners installs pillow. Also all other tests that use this functionality are not run. The tests run fine on my machine but it would be nice to add testing on Travis as well. I could add installation of pillow to one/all Travis runners to also execute the tests here. Should this be done or is this unnecessary?

Generally, there were good reasons for SciPy to drop these functions as discussed in the corresponding PR so one might think about following their advice and moving to imageio and maybe skimage in the long run.

@jnothman
Copy link
Member

jnothman commented Jan 8, 2018

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Potentially uninitialized local variable

Comment posted by lgtm.com

@jotasi jotasi force-pushed the copy_deprecated_scikit_pilutils branch 2 times, most recently from bbb3d24 to 3368e30 Compare January 8, 2018 20:51
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this belongs under externals? Otherwise LGTM

@jotasi jotasi force-pushed the copy_deprecated_scikit_pilutils branch 4 times, most recently from edea9cf to 71b270b Compare January 9, 2018 06:01
@jotasi
Copy link
Author

jotasi commented Jan 9, 2018

Thanks @jnothman for the quick reply. I moved it to externals. I also added Pillow to Travis runners 1 & 2 to verify that the tests are passing there as well. Should I keep this or should I kick it out again?

@jotasi jotasi changed the title [WIP]: Copy pilutils functions that deprecated in scipy [WIP]: Copy pilutils functions that are deprecated in scipy Jan 9, 2018
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want a directory sklearn/externals/tests. Would it hurt to have a sklearn/externals/pilutil/ directory?

@jotasi jotasi force-pushed the copy_deprecated_scikit_pilutils branch 7 times, most recently from a930721 to 069e8c9 Compare January 10, 2018 20:42
@jotasi
Copy link
Author

jotasi commented Jan 11, 2018

Ok. I moved it into its own folder and subpackage. Also I added _imsave, which is used in test_lfw.py. I think now all calls to deprecated scipy.misc.pilutil functions are updated. Does anything else need to be changed?

@@ -31,8 +31,8 @@

from .base import get_data_home, _fetch_remote, RemoteFileMetadata
from ..utils import Bunch
from ..externals.pilutil import _imread, _imresize
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How fast is importing pillow/pil on your machine? I would keep the import local if it will slow accessing datasets significantly just because the user happens to have the library installed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my machine the import from PIL import Image takes approx. 0.04 seconds. The import from sklearn.externals.pilutil import _imread, _imresize takes 0.8 seconds though. Should I keep the global import or should I change it back to a local one?

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise lgtm

@@ -82,6 +82,10 @@ if [[ "$COVERAGE" == "true" ]]; then
pip install coverage codecov
fi

if [[ "$PILLOW" == "true" ]]; then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you merge master because your install.sh is outdated: https://github.com/scikit-learn/scikit-learn/blob/master/build_tools/travis/install.sh#L40

I would probably add it to the TO_INSTALL environment variable if possible

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased on the current master and added the environment variable to TO_INSTALL for "$DISTRIB" == "conda". The disadvantage is that I had to separately add it for the "$DISTRIB" == "ubuntu" case. Should I keep it like this?

@jotasi jotasi force-pushed the copy_deprecated_scikit_pilutils branch from 069e8c9 to 4e93951 Compare January 11, 2018 16:01
@glemaitre
Copy link
Member

glemaitre commented Jan 11, 2018 via email

@jotasi
Copy link
Author

jotasi commented Jan 11, 2018

Atm. the tests are just skipped if pillow is not installed and thus no error would be raised. But I definitely can just add the test to one or both conda builds and skip them on the ubuntu one. My original reasoning for adding it to ubuntu was to check it on both python 2 and 3 (both conda builds are python 3, aren't they?).

@glemaitre
Copy link
Member

glemaitre commented Jan 11, 2018 via email

@jotasi jotasi force-pushed the copy_deprecated_scikit_pilutils branch from 4e93951 to 276efe6 Compare January 12, 2018 05:52
@jotasi
Copy link
Author

jotasi commented Jan 12, 2018

Ok. I changed the last commit accordingly. Should I add pillow to appveyor then?

@glemaitre
Copy link
Member

yes you can. We were testing this case before (with scipy). I would also do it in circle CI.

@jotasi jotasi force-pushed the copy_deprecated_scikit_pilutils branch from 276efe6 to 0f08bf0 Compare January 12, 2018 16:05
@jotasi
Copy link
Author

jotasi commented Jan 17, 2018

As option 3 seems to be the consensus, I've updated the PR: sklearn.externals._pilutil now contains the original versions of all functions used from scipy.misc.pilutil. The only things that were changed are:

  • When pillow is not installed, the module does not fail at import but when a function is called, that requires pillow.
  • The documentation was changed slightly and the license was added.
  • The example for imsave was removed as it would fail on machines without pillow and did not seem to be overly important.

PS: The Travis build is canceled before running not just here, but for all PRs that were updated before. There seems to be something wrong with it. (On my fork, Travis is running fine...)

@sklearn-lgtm
Copy link

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Potentially uninitialized local variable

Comment posted by lgtm.com

@lesteve
Copy link
Member

lesteve commented Jan 17, 2018

I pushed some tweaks, mostly to reduce differences with respect to the original scipy file. I think I am fine with this PR as it is. If someone wants to have another quick look, be my guest.

Probably best to leave it for another PR, but there seems to be some occurences of imread/imresize in doc and examples:

doc/datasets/index.rst:* `scipy.misc.imread <https://docs.scipy.org/doc/scipy/reference/generated/scipy.
doc/datasets/index.rst:  misc.imread.html#scipy.misc.imread>`_ (requires the `Pillow
examples/classification/plot_digits_classification.py:# matplotlib.pyplot.imread.  Note that each image must have the same size. For 
examples/cluster/plot_face_segmentation.py:face = sp.misc.imresize(face, 0.10) / 255.
examples/cluster/plot_face_ward_segmentation.py:face = sp.misc.imresize(face, 0.10) / 255.

@sklearn-lgtm
Copy link

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Potentially uninitialized local variable

Comment posted by lgtm.com

@lesteve
Copy link
Member

lesteve commented Jan 17, 2018

I pulled master to get #10486 which should (hopefully) make Travis happy.

@sklearn-lgtm
Copy link

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Potentially uninitialized local variable

Comment posted by lgtm.com

@jnothman
Copy link
Member

jnothman commented Jan 17, 2018 via email

@jotasi
Copy link
Author

jotasi commented Jan 18, 2018

First of all: Thanks @lesteve for cleaning up after me!

Concerning the complaint of lgtm.com:
As far as I can see this cannot happen, although, if there were no restriction to keep the diff small I would add some input validation.
At the beginning of toimage shape is checked to be of either length 2 or 3:
https://github.com/jotasi/scikit-learn/blob/2f78ea7653515be83b837399310bc9621a3feb31/sklearn/externals/_pilutil.py#L357-L361

Therefore, this line, taking the cath element of shape would fail for any ca not in [0,1,2]:
https://github.com/jotasi/scikit-learn/blob/2f78ea7653515be83b837399310bc9621a3feb31/sklearn/externals/_pilutil.py#L410

And thus this block will never be called with ca not in [0,1,2]:
https://github.com/jotasi/scikit-learn/blob/2f78ea7653515be83b837399310bc9621a3feb31/sklearn/externals/_pilutil.py#L415-L423

Although, I do agree that the error message, that the list index of shape is out of range, is not a very informative error message.

Concerning the errors on Travis:
Sadly, the pull of the master did not fix the problem. I think for the ignore option to work one needs to specify the full (absolute or relative) path to the folder/file one wants to ignore, rather than the path from the modules root. At least that is the case on my local machine.

@lesteve
Copy link
Member

lesteve commented Jan 18, 2018

I would ignore the lgtm warning, because that's in externals, for the same reason that we do not test code in externals.

About the error in Travis, looks like the --ignore does not work because we are using the installed sklearn package through --pyargs (and not the local folder). I'll try to figure out a way of doing it. If there is no easy way, removing the doctests in _pilutil.py is reasonable.

@lesteve
Copy link
Member

lesteve commented Jan 18, 2018

I just pushed an alternative solution for not running tests in externals, fingers crossed.

@sklearn-lgtm
Copy link

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Potentially uninitialized local variable

Comment posted by lgtm.com

@lesteve lesteve force-pushed the copy_deprecated_scikit_pilutils branch from a8d05f9 to b36abe1 Compare January 18, 2018 09:51
@sklearn-lgtm
Copy link

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Potentially uninitialized local variable

Comment posted by lgtm.com

@lesteve
Copy link
Member

lesteve commented Jan 18, 2018

LGTM, merging, thanks a lot @jotasi!

@amueller
Copy link
Member

Sorry I'm out of the loop, but wouldn't it be better to add a dependency to one of the libraries? That will point people more directly towards the right solution. And we only need this in some rare cases.

@jnothman
Copy link
Member

jnothman commented Jan 22, 2018 via email

@glemaitre
Copy link
Member

Sorry I'm out of the loop, but wouldn't it be better to add a dependency to one of the libraries? That will point people more directly towards the right solution. And we only need this in some rare cases.

To be able to read image, we will need at least pillow + numpy and make the conversion ourself. This is what imageio is doing in fact. Resizing is a more complex thing since that we would need to use scikit-image which would required dependencies as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

load_sample_images uses deprecated imread
6 participants