Skip to content

GH-125866: Support complete "file:" URLs in urllib #132378

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 14, 2025

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Apr 10, 2025

Add optional add_scheme argument to urllib.request.pathname2url(); when set to true, a complete URL is returned. Likewise add optional require_scheme argument to url2pathname(); when set to true, a complete URL is accepted.


📚 Documentation preview 📚: https://cpython-previews--132378.org.readthedocs.build/

Add optional *add_scheme* argument to `urllib.request.pathname2url()`; when
set to true, a complete URL is returned. Likewise add optional *has_scheme*
argument to `urllib.request.url2pathname()`; when set to true, a complete
URL is accepted.
@barneygale barneygale marked this pull request as ready for review April 10, 2025 21:44
@barneygale
Copy link
Contributor Author

barneygale commented Apr 10, 2025

Some Qs for reviewers:

  1. Do we like the "has_scheme" and "add_scheme" names? Perhaps the latter should be "include_scheme"?
  2. Should this wait until 3.15 given we've already made a few changes to url2pathname() and pathname2url() in 3.14?

Thanks in advance :]

@picnixz
Copy link
Member

picnixz commented Apr 12, 2025

Do we like the "has_scheme" and "add_scheme" names? Perhaps the latter should be "include_scheme"?

If I were to chose one, I would use with_scheme for both because it works in both cases ("should we include the scheme in the output?" and "is the scheme present in the input"?). I would however make it a keyword-only argument so that it's easier to change in 5 years for instance if something went wrong with our naming...

Should this wait until 3.15 given we've already made a few changes to url2pathname() and pathname2url() in 3.14?

I don't think we need to wait if we can ship them. Waiting means that users also wait for at least 2y until they can consider 3.15 as the stable release (and I guess many like having the latest stable release as their main version).

barneygale and others added 3 commits April 12, 2025 14:36
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@barneygale
Copy link
Contributor Author

Thank you for the review + feedback - much appreciated :)

I'm not sure how I feel about with_scheme -- is it a little imprecise? I wonder what Adam and Serhiy think about the arguments naming

@picnixz
Copy link
Member

picnixz commented Apr 12, 2025

is it a little imprecise

A bit, but at least users don't need to remember different names for the path->url and url->path functions.

Copy link
Member

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 👍

@barneygale
Copy link
Contributor Author

A bit, but at least users don't need to remember different names for the path->url and url->path functions.

Well, they didn't have to until we made the argument keyword-only 🙈

@picnixz
Copy link
Member

picnixz commented Apr 13, 2025

Well, they didn't have to until we made the argument keyword-only

True! but on the other hand, if I see func(x, True) I would anyway need to find the docs to see what it's about I think. I'm suggesting a keyword-only to ease future changes (like adding other arguments afterwards). But I honestly don't mind having the names you suggested!

@barneygale
Copy link
Contributor Author

I'm going to ask on discord, as I like with_scheme about equally to has_scheme / add_scheme and I'm struggling to decide.

@ncoghlan
Copy link
Contributor

+1 for keyword-only (non-keyword boolean flags read poorly when called with boolean literals, and reducing the API surface by restricting the calling style generally makes future maintenance easier)

For the parameter naming though, is the flag on url2pathname actually necessary? Couldn't a file:// prefix (if present) be stripped unconditionally? Then if a flag were to be added, it would be require_scheme=True (to make it clear that the purpose of the flag is ensure that the scheme prefix being missing is considered an error rather than being accepted as a scheme-free partial URL).

I also think the ambiguity concern with a unified flag name is valid, so I'd be inclined to keep add_scheme on pathname2url either way (the scheme is added to the output, it isn't expected as part of the input)

@barneygale
Copy link
Contributor Author

For the parameter naming though, is the flag on url2pathname actually necessary? Couldn't a file:// prefix (if present) be stripped unconditionally? Then if a flag were to be added, it would be require_scheme=True (to make it clear that the purpose of the flag is ensure that the scheme prefix being missing is considered an error rather than being accepted as a scheme-free partial URL).

The prefix to remove would be file: rather than file://, because urllib supports expressing relative paths as URLs without authorities:

>>> pathname2url('hello')
'hello'
>>> url2pathname('foo:bar')
'foo:bar'
>>> url2pathname('file:bar')
'file:bar'

Though it's still very unlikely, I worry that file: could begin a legitimate relative path, and that we'd change behaviour here.

@serhiy-storchaka
Copy link
Member

It was a pain to handle URLs like "file:123" where "file" was interpreted as a host name and "123" as a port number..

@ncoghlan
Copy link
Contributor

That makes sense - the input ambiguity would be much worse than I initially thought it would be.

I'd suggest add_scheme/require_scheme as the bikeshed colours then. has_scheme and with_scheme don't really indicate "will throw an exception if the file: prefix is missing" for the URL input case to me.

@picnixz
Copy link
Member

picnixz commented Apr 13, 2025

has_scheme and with_scheme don't really indicate "will throw an exception if the file: prefix is missing" for the URL input case to me

I hadn't thought about the expected contract and the parameter name so require_scheme is a good one. In my projects, I usually use "strict" as the keyword argument meant to indicate that I want a strict parsing but I think strict is not explicit enough.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making sure that we have kw-only at runtime as well and LGTM.

@barneygale
Copy link
Contributor Author

I like require_scheme too! I've adjusted the PR to use that.

One last thought: we could maybe accept a str | None argument called scheme:

>>> url = 'file:///C:/Program%20Files'
>>> url2pathname(url, scheme='file')
'C:\\Program Files'
>>> path = 'C:\\Program Files'
>>> pathname2url(path, scheme='file')
'file:///C:/Program%20Files'

... but I can't really think of a use case for values other than None and "file", plus it may conflict with the meaning of the scheme argument to urlparse() and friends.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@picnixz
Copy link
Member

picnixz commented Apr 13, 2025

... but I can't really think of a use case for values other than None and "file"

I also don't know of another scheme for local paths and for which url2pathname (the "pathname" part in the name is important) would make sense as well. If we need to support another URL scheme, then we should probably have another function (well maybe, the ftp scheme could make sense because it's related to files but it'd be weird to use url2pathname in this case IMO)

@barneygale barneygale requested a review from picnixz April 13, 2025 17:28
Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nits but you can go ahead whenever you want.

Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
@barneygale barneygale merged commit ccad61e into python:main Apr 14, 2025
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants