-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Allow URL strings to be passed to imread #4256
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
These modifications allow valid URL strings to be passed directly into imread. URL recognition is done using the standard library function urlparse. If the string contains a valid url scheme, it will be used to download and process the data using a combination of urlopen and BytesIO (StringIO in python 2).
I should have mentioned, the following code is now possible in Python 2.7 and 3.4 with this PR:
|
While it would be a very neat feature to make it very easy to read images On Sat, Mar 21, 2015 at 10:48 AM, Ryan Nelson notifications@github.com
|
I suppose it is possible... But then you could download a malicious image from the internet by hand and try to load it with this function. In that case, the security issue is not with this addition, but with Pillow and MPLs internal PNG reader. Yes? |
I suppose I also should have commented on the motivation here. In Python 2, you could do something like the following to download an image from the internet: |
Closing this due to the discussion in #4252 where this feature was added upstream to pillow (python-pillow/Pillow#1151 and python-pillow/Pillow#1157). @rnelsonchem Thank you and please do not be discouraged! If I am confused, ping me to have this re-addressed. |
No problem @tacaswell |
I thought that the png handler did do the right thing with urls... On Sun, Apr 12, 2015, 08:21 Ryan Nelson notifications@github.com wrote:
|
That might be the case; I didn't try that. But the current code checks to see if |
Sorry, I should have referenced this line: |
Ah, I understand, sorry about that. Can you remove the PIL related changes? |
The Pillow docs state that Image.open can handle either a filename string or a file object. That simplifies the pilread function substantially.
This is now failing on both py2k flavors. |
|
@tacaswell This is very confusing... I do not see those errors on my Py2 installation. Also, the traceback for the first two errors says they occur in line 1269 of the image module. However, the line that is printed in the traceback and the actual line in the file are not equivalent. Also, if I run the following in Py2:
the I'll look a bit more at the third failure, but I don't see that on my local install either... |
This may be a pil vs pillow or a version issue. You can look at the .Travis.yml file to see exactly what we install. |
when lines in a traceback do not match up with the actual file, that is On Thu, Apr 16, 2015 at 8:16 AM, Thomas A Caswell notifications@github.com
|
It seems that the identity test for the parsed URL scheme was not sufficient. I changed this to an equality test, and things are now working properly.
Okay. So it turns out that the identity testing |
yeah, that would be the correct change, even in python 3. There is no guarantee that an empty string in one variable is the same empty string elsewhere. |
@WeatherGod since strings are immutable I think they should be. I think the issue in Python2 is that:
is not true. And we are typically using However using != is the right solution IMHO |
If you are using In [104]: x = 1544656783524
In [105]: y = 1544656783524
In [106]: x is y
Out[106]: False
In [107]: x == y
Out[107]: True in this case |
Shoot. I misunderstood the Pillow PRs for which @tacaswell provided links earlier. It seems that the changes that have been made to that library allow you to do something like the following:
In this case, the current version of that If we are going to implement this functionality, then we need something that is a combination of my first and last commit. Right now the PNG version works fine. Is this something that has a chance to get merged? If so, I can redo my initial commit, which seems like the best course of action. I should also add some tests as well. What would be a good PNG and JPG (for example) URLs that we could ping every time someone runs the test suite? |
We can use files from the matplotlib.org website, just make sure to wrap it I would prefer to merge just the png version and leave the everything else On Thu, Apr 16, 2015, 16:17 Ryan Nelson notifications@github.com wrote:
|
Ok. That's fine. As things currently stand, though, I have made two separate changes essentially in a single PR.
These two changes are not exactly related, and I just want to make sure that it will be okay to leave them as is in a single PR. One potential problem. Pillow |
parsed = urlparse(fname) | ||
# If fname is a URL, download the data | ||
if parsed.scheme != '': | ||
print('Scheme: %s' % parsed.scheme) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove debugging print please.
I am 👍 on this going in modulo removing that print statement and adding an entry in https://github.com/matplotlib/matplotlib/tree/master/doc/users/whats_new |
@tacaswell Thanks for the reminder email... Sorry for the delay. I removed that print statement and added a note to the What's New docs. In addition, I updated the docstring and added a single test case, which downloads the MPL logo from the website. Let me know if you see any other necessary changes. |
Uhm, that test case might be problematic. I remember having to do some
|
Good point attn @sandrotosi , does this run afoul of debian rules? |
@WeatherGod @tacaswell I'm a little confused. Could you provide a little more detail into the problem with my test. I'm happy to change it if there is an issue. |
As part of the debian packaging process the test suite gets run and they Given the importance of debian (and it's derivatives) making sure we stay On Fri, May 29, 2015 at 9:43 AM Ryan Nelson notifications@github.com
|
@tacaswell Gotcha. It looks like one alternative would be to create a separate http server (only listening on localhost) to serve local files, and then just request the local picture file through the local web server. I could create that in a new process or using threads. Might be a bit of extra work for one test. I could also just remove the test entirely. |
Lets go with just removing the test from this PR (and putting in a new one The other place we have this issue is in the tests for styles which maybe On Fri, May 29, 2015 at 12:33 PM Ryan Nelson notifications@github.com
|
I removed the unit test and added a PR #4485 for the test in question. |
ENH: Allow URL strings to be passed to imread
@rnelsonchem Thanks! |
These modifications allow valid URL strings to be passed directly into imread. URL recognition is done using the standard library function urlparse. If the string contains a valid url scheme, it will be used to download and process the data using a combination of urlopen and BytesIO (StringIO in python 2). (I followed methodology in the
pandas.read_*
functions, which also allow valid URL strings.)Things that I haven't done:
I'll do these things if this code addition seems acceptable.
One other thing that I've noticed, the
matplotlib.pyplot.imread
function signature does not match thematplotlib.image.imread
signature. It seems like the doc string would be a little easier to follow if the function signatures matched. Maybe that is not something to address in this PR.