Skip to content

URLs validation should use urlparse #189

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
waynew opened this issue Mar 23, 2021 · 5 comments · Fixed by #245
Closed

URLs validation should use urlparse #189

waynew opened this issue Mar 23, 2021 · 5 comments · Fixed by #245
Labels
enhancement Issue/PR: A new feature outdated Issue/PR: Open for more than 3 months

Comments

@waynew
Copy link

waynew commented Mar 23, 2021

Python already parses URLs, and does it correctly:

>>> from urllib.parse import urlparse
>>> urlparse('https://google.com')
ParseResult(scheme='https', netloc='google.com', path='', params='', query='', fragment='')
>>> urlparse('gopher://gopher.waynewerner.com')
ParseResult(scheme='gopher', netloc='gopher.waynewerner.com', path='', params='', query='', fragment='')
>>> urlparse('tel://555-555-5555')
ParseResult(scheme='tel', netloc='555-555-5555', path='', params='', query='', fragment='')
>>> urlparse('file:///path-to-some-file')
ParseResult(scheme='file', netloc='', path='/path-to-some-file', params='', query='', fragment='')
>>> urlparse('missing-scheme.com')
ParseResult(scheme='', netloc='', path='missing-scheme.com', params='', query='', fragment='')

I had to chase down this library because click-params uses validators to validate URLs, but totally valid URLs aren't parsed correctly because the scheme wasn't expected by this library 😞

@rcirca
Copy link

rcirca commented Aug 11, 2021

It doesn't validate though, parses sure, but validating with it is not good. http:////.google.com would be considered valid based on urllib.

@waynew
Copy link
Author

waynew commented Aug 11, 2021

You're right, it would, and is a valid URL:

>>> p.urlparse('http:////.google.com')
ParseResult(scheme='http', netloc='', path='//.google.com', params='', query='', fragment='')

path might not be a valid domain name, but that's an entirely different problem. Interestingly enough, http:////.google.com works fine if you type it into your address bar in Chrome, though it fails if you click that link. http:////google.com works, though.

@rcirca
Copy link

rcirca commented Aug 12, 2021

well, technically it's wrong to place that for the 'path', google.com should be the 'netloc'?
Yeah, without the period it works fine in chrome, but not in safari 😅

@waynew
Copy link
Author

waynew commented Aug 13, 2021

Well, it is the path, strictly speaking, and should be rejected because there is no netloc.

Just because it's a real URL doesn't mean you can get there from here 🤣

@yozachar yozachar added enhancement Issue/PR: A new feature outdated Issue/PR: Open for more than 3 months labels Mar 14, 2023
@waynew
Copy link
Author

waynew commented Mar 19, 2023

Awesome! Thanks for your efforts 🎉🚀👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Issue/PR: A new feature outdated Issue/PR: Open for more than 3 months
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants