URLs validation should use urlparse #189

waynew · 2021-03-23T17:31:59Z

Python already parses URLs, and does it correctly:

>>> from urllib.parse import urlparse
>>> urlparse('https://google.com')
ParseResult(scheme='https', netloc='google.com', path='', params='', query='', fragment='')
>>> urlparse('gopher://gopher.waynewerner.com')
ParseResult(scheme='gopher', netloc='gopher.waynewerner.com', path='', params='', query='', fragment='')
>>> urlparse('tel://555-555-5555')
ParseResult(scheme='tel', netloc='555-555-5555', path='', params='', query='', fragment='')
>>> urlparse('file:///path-to-some-file')
ParseResult(scheme='file', netloc='', path='/path-to-some-file', params='', query='', fragment='')
>>> urlparse('missing-scheme.com')
ParseResult(scheme='', netloc='', path='missing-scheme.com', params='', query='', fragment='')

I had to chase down this library because click-params uses validators to validate URLs, but totally valid URLs are rejected because this library doesn't expect their scheme 😞

rcirca · 2021-08-11T22:16:37Z

It doesn't validate, though. It parses, sure, but using it for validation is not good: http:////.google.com would be considered valid by urllib.
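To illustrate the parse-versus-validate distinction, here is a minimal sketch using only the standard library. The input string is made up for illustration; the point is that `urlparse` splits whatever it is given into components and does not report that the result is not a usable URL.

```python
from urllib.parse import urlparse

# urlparse only splits a string into components; it does not judge them.
# The input below is an arbitrary, made-up string, not a real URL.
parts = urlparse('not a url at all')

# No exception is raised: the whole string simply lands in `path`,
# while scheme and netloc stay empty.
print(repr(parts.scheme))
print(repr(parts.netloc))
print(repr(parts.path))
```

So a successful parse on its own says nothing about validity; any check has to inspect the returned components.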

waynew · 2021-08-11T23:34:01Z

You're right, it would, and it is a valid URL:

>>> urlparse('http:////.google.com')
ParseResult(scheme='http', netloc='', path='//.google.com', params='', query='', fragment='')

The path might not be a valid domain name, but that's an entirely different problem. Interestingly enough, http:////.google.com works fine if you type it into your address bar in Chrome, though it fails if you click that link. http:////google.com works, though.

rcirca · 2021-08-12T00:05:52Z

Well, technically it's wrong to put that in the 'path'; shouldn't google.com be the 'netloc'?
Yeah, without the period it works fine in Chrome, but not in Safari 😅

waynew · 2021-08-13T14:15:46Z

Well, it is the path, strictly speaking, and should be rejected because there is no netloc.

Just because it's a real URL doesn't mean you can get there from here 🤣
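The rule discussed here (parse first, then reject when there is no netloc) could be sketched roughly as follows. This is not the code of this library or of the fix that closed the issue; `looks_like_url` and `SCHEMES_WITHOUT_NETLOC` are hypothetical names, and the exception for `file` is an assumption made so that the `file:///` example from the first comment still passes.

```python
from urllib.parse import urlparse

# Assumption: schemes that legitimately carry an empty netloc.
SCHEMES_WITHOUT_NETLOC = {"file"}


def looks_like_url(value: str) -> bool:
    """Hypothetical check: require a scheme and, where expected, a netloc."""
    try:
        parts = urlparse(value)
    except ValueError:
        # urlparse can raise on some inputs, e.g. a malformed IPv6 host.
        return False
    if not parts.scheme:
        return False
    if parts.scheme in SCHEMES_WITHOUT_NETLOC:
        # file:///some/path has no netloc, so fall back to requiring a path.
        return bool(parts.path)
    return bool(parts.netloc)


for candidate in (
    "https://google.com",
    "gopher://gopher.waynewerner.com",
    "file:///path-to-some-file",
    "missing-scheme.com",
    "http:////.google.com",
):
    print(candidate, looks_like_url(candidate))
```

This accepts any scheme rather than a fixed allowlist, which is the behaviour the issue asks for, but it does not check that the netloc is a well-formed host name; that would be a separate step.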

waynew · 2023-03-19T14:32:13Z

Awesome! Thanks for your efforts 🎉🚀👍

y0urself mentioned this issue Mar 4, 2022

Add: script_calls_and_tags greenbone/troubadix#86

Merged

3 tasks

yozachar added the enhancement (Issue/PR: A new feature) and outdated (Issue/PR: Open for more than 3 months) labels Mar 14, 2023

yozachar mentioned this issue Mar 17, 2023

maint: improves url module #245

Merged

yozachar closed this as completed in c43826c Mar 18, 2023

yozachar closed this as completed in #245 Mar 18, 2023
