Skip to content

MNT move an orphaned vendored file around #21166

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

DimitriPapadopoulos
Copy link
Contributor

@DimitriPapadopoulos DimitriPapadopoulos commented Sep 27, 2021

Reference Issues/PRs

See also #21161 and #21165.

What does this implement/fix? Explain your changes.

File scipy/misc/pilutil.py has been removed from Scipy 1.3.0, see scipy/scipy#9325.

Move _pilutil.py from sklearn/externals, which is the location for vendored files, to sklearn/utils, which is a location for
locally maintained files, including orphan vendored files like this one.

Any other comments?

This is a copy of #21165. Something went wrong while rebasing.

@DimitriPapadopoulos
Copy link
Contributor Author

Since this is a file for internal use, does this need to be in the changelog?

@DimitriPapadopoulos
Copy link
Contributor Author

Not sure about the Azure CI black job failure. Is sklearn/externals exempted from black, but not sklearn/utils?

@thomasjpfan
Copy link
Member

Not sure about the Azure CI black job failure. Is sklearn/externals exempted from black, but not sklearn/utils?

Yea. sklearn/externals is for vendored projects/files so we try not to change the source. sklearn/utils is maintained by us, which follows black.

@DimitriPapadopoulos
Copy link
Contributor Author

@thomasjpfan Makes sense. I ran flake8 and black on the file, but some Azure CI jobs still fail.

@thomasjpfan
Copy link
Member

thomasjpfan commented Sep 27, 2021

The tests are running doctests, which are failing. For example bytescale uses from scipy.misc import bytescale which does not exist.

For future reference, you can explore these errors by clicking on details in the failed check:

Screen Shot 2021-09-27 at 5 46 10 PM

and navigating to the error, which is here

Thoughts on this PR

For this PR, I would consider wrapping Pillow lightly, since we only use imread and imresize. imsave is only used for testing in sklearn/datasets/tests/test_lfw.py, so we can use pillow directly in the testing code.

There are two public functions that use functions in _pilutil.py:

  1. sklearn.datasets.load_sample_images: Optionally import pillow and error if its not installed. If Pillow is installed use np.asarray(Image.open(...)) instead of imread.

  2. sklearn.datasets._lfw._load_imgs method: Optionally import pillow and error if its not installed. If Pillow is installed load images as a Pillow image object using Image.open, use the Image.resize api to resize and then convert to a numpy array.

(This way, we do not need to maintain a big sklearn.utils._pilutil file.)

@DimitriPapadopoulos
Copy link
Contributor Author

I had seen that page, but it doesn't make sense:

  • It's too verbose.
  • It looks like the problem is with the XFAIL messages.

However, now that you've pointed me to these doctest messages, I'll try fixing them:

__________________ [doctest] sklearn.utils._pilutil.bytescale __________________
[gw0] linux -- Python 3.9.7 /usr/share/miniconda/envs/testvenv/bin/python
108 
109     Returns
110     -------
111     img_array : uint8 ndarray
112         The byte-scaled array.
113 
114     Examples
115     --------
116     >>> import numpy as np
117     >>> from scipy.misc import bytescale
UNEXPECTED EXCEPTION: ImportError("cannot import name 'bytescale' from 'scipy.misc' (/usr/share/miniconda/envs/testvenv/lib/python3.9/site-packages/scipy/misc/__init__.py)")
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/testvenv/lib/python3.9/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest sklearn.utils._pilutil.bytescale[1]>", line 1, in <module>
ImportError: cannot import name 'bytescale' from 'scipy.misc' (/usr/share/miniconda/envs/testvenv/lib/python3.9/site-packages/scipy/misc/__init__.py)
/home/vsts/work/1/s/sklearn/utils/_pilutil.py:117: UnexpectedException
___________________ [doctest] sklearn.utils._pilutil.imsave ____________________
[gw0] linux -- Python 3.9.7 /usr/share/miniconda/envs/testvenv/bin/python
243         Image format. If omitted, the format to use is determined from the
244         file name extension. If a file object was used instead of a file name,
245         this parameter should always be used.
246 
247     Examples
248     --------
249     Construct an array of gradient intensity values and save to file:
250 
251     >>> import numpy as np
252     >>> from scipy.misc import imsave
UNEXPECTED EXCEPTION: ImportError("cannot import name 'imsave' from 'scipy.misc' (/usr/share/miniconda/envs/testvenv/lib/python3.9/site-packages/scipy/misc/__init__.py)")
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/testvenv/lib/python3.9/doctest.py", line 1336, in __run
    exec(compile(example.source, filename, "single",
  File "<doctest sklearn.utils._pilutil.imsave[1]>", line 1, in <module>
ImportError: cannot import name 'imsave' from 'scipy.misc' (/usr/share/miniconda/envs/testvenv/lib/python3.9/site-packages/scipy/misc/__init__.py)
/home/vsts/work/1/s/sklearn/utils/_pilutil.py:252: UnexpectedException

File scipy/misc/pilutil.py has been removed from Scipy 1.3.0,
see scipy/scipy#9325.

Move _pilutil.py from sklearn/externals, which is the location
for vendored files, to sklearn/utils, which is a location for
locally maintained files, including orphan vendored files like
this one.
Either import the private module documented in these examples, or we
remove the examples altogether. I chose to keep the examples.
@DimitriPapadopoulos
Copy link
Contributor Author

DimitriPapadopoulos commented Feb 11, 2022

It looks like doctests are OK after my last commit. Please let me know if anything else is needed.

About wrapping Pillow lightly: nothing against it, but this change feels more involved than the current PR. I would rather defer such a change to another PR – provided I can find the time.

thomasjpfan
thomasjpfan previously approved these changes Feb 11, 2022
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at this again, I'm okay with the moving the vendored file into utils for now.

LGTM

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @DimitriPapadopoulos.

I am mostly OK to move this file under utils for maintenance semantic reasons since it has been deprecated in Scipy. (I think having as little vendored code as possible is good for maintenance.)

IMO, it looks like the implementations can be simplified, but that's out of the scope of this PR (and it is also probably low priority as a change).


Edit: I would wait for a third reviewer before merging this one.

@thomasjpfan thomasjpfan dismissed their stale review March 9, 2022 22:57

I think we can move to removing pilutils altogether

@thomasjpfan
Copy link
Member

After looking at this again, I prefer removing the file all together. I opened #22743 as an alternative, which removes _pilutil.

@ogrisel
Copy link
Member

ogrisel commented Mar 17, 2022

I would also lean towards the alternative implemented in #22743.

@lesteve
Copy link
Member

lesteve commented Mar 18, 2022

Closing since #22743 was merged.

@lesteve lesteve closed this Mar 18, 2022
@DimitriPapadopoulos DimitriPapadopoulos deleted the externals_utils branch March 20, 2022 10:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants