Skip to content

Add extract_urls-helper #854

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 40 commits into from
Closed

Add extract_urls-helper #854

wants to merge 40 commits into from

Conversation

JosXa
Copy link
Contributor

@JosXa JosXa commented Oct 2, 2017

Added a new helper that extracts all URLs contained in a message and returned from Telegram

@JosXa JosXa requested a review from jsmnbom October 2, 2017 01:49
@91DarioDev
Copy link

What about to include also captions in the helper using regex?

@JosXa
Copy link
Contributor Author

JosXa commented Oct 2, 2017

@91DarioDev Good point.

@JosXa
Copy link
Contributor Author

JosXa commented Oct 2, 2017

@91DarioDev

  • Extracting URLs from Captions

@python-telegram-bot python-telegram-bot deleted a comment from codecov bot Oct 2, 2017
@Eldinnie
Copy link
Member

Eldinnie commented Oct 2, 2017

Fails on all except py3.6 due to urllib.parse

@JosXa JosXa self-assigned this Oct 7, 2017
@Eldinnie
Copy link
Member

Eldinnie commented Oct 7, 2017

@JosXa Can you please make sure the PR passes pre-commit and tests?

@python-telegram-bot python-telegram-bot deleted a comment from codecov bot Feb 1, 2018
@JosXa
Copy link
Contributor Author

JosXa commented Feb 1, 2018

test_helpers succeeds in all test cases. However, unrelated tests fails. So lgtm, but @Reviewer please have another look at the tests.

@Eldinnie
Copy link
Member

Eldinnie commented Feb 1, 2018

@JosXa Can you remind me why we would need/want this? And anyway I think it's more logical to sort in order of appearance rather then alphabetical? Also to parse them from captions please use the parse_caption_entites

@Bibo-Joshi
Copy link
Member

Bibo-Joshi commented Oct 17, 2019

Turns out the issue was hidden in the call of message.parse_entities. This builds a dict which is not sorted in python <=3.5. So now after parsing, the entities are just sorted.
One thing I want to mention: The method now can only handle single trailing slashes, i.e. [github.com/, google.com, google.com/ github.com//] will produce [github.com/, google.com/ github.com//]. However I guess trailing double slashes are very rare (if even existent?), so that should be ok IMO …

PS: AppVeyor fail is unrelated ;)

@Bibo-Joshi Bibo-Joshi requested a review from tsnoam October 17, 2019 19:30
Copy link
Member

jh0ker commented Oct 17, 2019

@Bibo-Joshi I'm not quite sure why you would want to filter this at all. The difference between having a trailing slash or not in a URL is not necessarily insignificant.

Unrelated, the comment in line 185 says "dublicates" instead of "duplicates".

Copy link
Member

@jh0ker I guess that 9-9-19 me interpreted @nmlorg s comment such that in doubt it's better to have a trailing slash than not. But neither 9-9-19 me nor now-me have much expertise on that area (or any area at all 😅) …

@Bibo-Joshi Bibo-Joshi added 📋 pending-review work status: pending-review and removed 📋 pending-reply work status: pending-reply labels Oct 20, 2019
@Poolitzer Poolitzer added this to the 12.3 milestone Nov 11, 2019
@tsnoam tsnoam modified the milestones: 12.3, 12.4 Nov 15, 2019
@JosXa
Copy link
Contributor Author

JosXa commented Dec 16, 2019

@SpEcHiDe brought up an interesting point. He says that he's using something like this in his bot:

image

Maybe we could also provide some abstractions over join links, message references, and t.me URLs 🤔

@Bibo-Joshi
Copy link
Member

Test fail is unrelated

@tsnoam
Copy link
Member

tsnoam commented Feb 2, 2020

After long considerations and real intent to see this PR merged, we, the maintainers, decided that this PR is too specific to be included in a general purpose library.
Therefor we've decided to close the PR without merging.

@tsnoam tsnoam closed this Feb 2, 2020
@tsnoam tsnoam deleted the extract-urls branch February 2, 2020 21:34
@github-actions github-actions bot locked and limited conversation to collaborators Aug 19, 2020
@Bibo-Joshi
Copy link
Member

For future reference: this will be made available through the new ptbcontrib library

@Bibo-Joshi Bibo-Joshi added 🔌 enhancement pr description: enhancement and removed enhancement labels Nov 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🔌 enhancement pr description: enhancement 📋 pending-review work status: pending-review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants