Skip to content

clean pending update based on Timedelta or datetime #1987

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 62 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
62 commits
Select commit Hold shift + click to select a range
d9988e2
add new fuction to process the backlog using a timedelta spec
Jun 8, 2020
deb2515
add selector to bootstrap, use type of var 'clean' to select the corr…
Jun 8, 2020
d7254b1
pass clean var down to handler
Jun 8, 2020
158419d
ensure we compare UTC vs UTC times
Jun 8, 2020
bc5d964
clean up debugging logging
Jun 8, 2020
b5db0c6
include full 'timedelta' span as valide msg age
Jun 8, 2020
62015f4
updated test 'TODO' comment
Jun 8, 2020
07e22b1
verify input 'clean' as timedelta, needs to be >=1. Throw error other…
Jun 9, 2020
01cf389
update documentation
Jun 9, 2020
766cbe6
clean up, fix isinstance calls
Jun 9, 2020
663aece
clean up no.2
Jun 9, 2020
684280f
fix comment, change varname
Jun 10, 2020
f766b01
msg.date already utc
Jun 10, 2020
1541d64
remove 'kwargs' and use partial to pass var clean
Jun 10, 2020
4804b8b
add debugging info
Jun 10, 2020
21e0226
forgot to rename var in function def...
Jun 10, 2020
cafb188
if timedelta is passed, convert to datetime. If datetime is passed, v…
Jun 10, 2020
531b484
add comments this if for debugging
Jun 11, 2020
ce35971
clean up, remove all extra logger.debug's
Jun 11, 2020
33410d3
check if timedelta is negative, if it is convert to positive
Jun 11, 2020
bd21699
update docstring
Jun 11, 2020
99dd377
remove warning when no tz is set, doc string has warning
Jun 11, 2020
cb7bb3b
change .message to .effective_message to catch all types
Jun 11, 2020
6c44b67
check if msg.date is None, if yes stop cleaning and let all updates pass
Jun 11, 2020
51953a6
update start_polling doc string
Jun 11, 2020
e3f46b8
apply code to start_webhook
Jun 11, 2020
73adbdd
typo
Jun 11, 2020
907e092
remove stale 'sleep' that was just in there to wait for the cleaning …
Jun 13, 2020
c887ac3
fix bug - mismatched ident
Jun 16, 2020
eae79ec
remove slow test functions
Jun 16, 2020
30e854f
test function for clean attr
Jun 16, 2020
091b7a9
create fake class to set attribute, deepcopy to ensure we have unique…
Jun 16, 2020
72098b7
remove all existing testing fucntions to reduce test time
Jun 16, 2020
5f60741
fix bug
Jun 16, 2020
2378587
fix bugg
Jun 16, 2020
233ba3c
remove old func
Jun 16, 2020
cd7afca
test higher arg value
Jun 16, 2020
2c8990e
add catch for increased attempts var
Jun 16, 2020
641e1e7
change selector
Jun 16, 2020
5b6f522
be specific on expected argument
Jun 16, 2020
60dad8e
add safety against inf loop
Jun 16, 2020
5bf813c
change var name
Jun 16, 2020
cac4bf4
fix bug
Jun 16, 2020
030b9ad
test with clean = False
Jun 16, 2020
d3479c0
test
Jun 16, 2020
afe7b24
test
Jun 16, 2020
08a4717
Revert "test"
Jun 17, 2020
0040c88
Revert "test with clean = False"
Jun 17, 2020
97f3197
Revert "test"
Jun 17, 2020
82f8cee
remove unused args
Jun 17, 2020
92346a5
add second function handler
Jun 17, 2020
9f5a8ce
test
Jun 17, 2020
1787202
Revert "test"
Jun 17, 2020
0c4208c
Revert "add second function handler"
Jun 17, 2020
2043335
add comment, create inf loop protection
Jun 17, 2020
e4140d9
improve comment
Jun 17, 2020
0e96ed5
test inf loop protection
Jun 17, 2020
d06ecf9
Revert "test inf loop protection"
Jun 17, 2020
534ec26
improve comment
Jun 17, 2020
fb082a7
set var to 0, just in case
Jun 17, 2020
7dce85d
improve comment
Jun 17, 2020
e1144a9
remove deep copy
Jun 17, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
123 changes: 117 additions & 6 deletions telegram/ext/updater.py
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,10 @@
import ssl
from threading import Thread, Lock, current_thread, Event
from time import sleep
from datetime import datetime, timedelta, timezone
from signal import signal, SIGINT, SIGTERM, SIGABRT
from queue import Queue
from functools import partial

from telegram import Bot, TelegramError
from telegram.ext import Dispatcher, JobQueue
Expand Down Expand Up @@ -233,8 +235,25 @@ def start_polling(self,
poll_interval (:obj:`float`, optional): Time to wait between polling updates from
Telegram in seconds. Default is 0.0.
timeout (:obj:`float`, optional): Passed to :attr:`telegram.Bot.get_updates`.
clean (:obj:`bool`, optional): Whether to clean any pending updates on Telegram servers
before actually starting to poll. Default is False.
clean (:obj:`bool` | :obj:`datetime.timedelta` | :obj:`datetime.timedelta`, optional):
Whether to clean any pending updates on Telegram servers before actually starting
to poll. This parameter will be interpreted depending on its type.

* :obj:`bool` ``True`` cleans all updates. Default is ``False``.
* :obj:`datetime.timedelta` will be interpreted as "time before now" cut off.
Pending updates older than the cut off will be cleaned up.
:obj:`datetime.timedelta` is sign independent, both positive and negative deltas
are interpreted as "in the past".
* :obj:`datetime.datetime` will be interpreted as a specific date and time as
cut off. Pending updates older than the cut off will be cleaned up.
If the timezone (``datetime.tzinfo``) is ``None``, UTC will be assumed.

Note:
If :attr:`clean` is :obj:`datetime.timedelta` or :obj:`datetime.datetime` and
if a :class:`telegram.Update.effective_message` is found with
:attr:`telegram.Message.date` is ``None``, before the :obj:`datetime.timedelta`
or :obj:`datetime.datetime` condition is met, all updates will pass through.

bootstrap_retries (:obj:`int`, optional): Whether the bootstrapping phase of the
`Updater` will retry on failures on the Telegram server.

Expand All @@ -251,11 +270,38 @@ def start_polling(self,
Returns:
:obj:`Queue`: The update queue that can be filled from the main thread.

Raises:
ValueError: if :attr:`clean` is :obj:`datetime.timedelta` and is < 1 second.
ValueError: if :attr:`clean` is :obj:`datetime.datetime` is not a least 1 second older
than `now()`.

"""
with self.__lock:
if not self.running:
self.running = True

if isinstance(clean, timedelta):
if clean.total_seconds() < 0:
clean = clean * -1

if clean.total_seconds() < 1:
raise ValueError('Clean as timedelta needs to be >= 1 second')
else:
# convert to datetime
clean = datetime.now(tz=timezone.utc) - clean
elif isinstance(clean, datetime):

if (
clean.tzinfo is None or
(clean.tzinfo is not None and clean.tzinfo.utcoffset(clean) is None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the second check good for? If the datetime is aware it shouldn't matter what the UTC offset is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per PY 3.6.10 documentation (scroll down just a little to the datetime.timezone section the requirement to be 'aware' is to have both conditions. If UTC offset is None datetime is naive.

An object of type time or datetime may be naive or aware. A datetime object d is aware if d.tzinfo is not None and d.tzinfo.utcoffset(d) does not return None. If d.tzinfo is None, or if d.tzinfo is not None but d.tzinfo.utcoffset(d) returns None, d is naive. A time object t is aware if t.tzinfo is not None and t.tzinfo.utcoffset(None) does not return None. Otherwise, t is naive.

Copy link
Contributor Author

@ikkemaniac ikkemaniac Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PY 3.8.3 documentation even has dedicated a paragraph on it

A datetime object d is aware if both of the following hold:
d.tzinfo is not None
d.tzinfo.utcoffset(d) does not return None
Otherwise, d is naive.

Copy link
Contributor Author

@ikkemaniac ikkemaniac Jun 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about changing it to
if clean.tzinfo.utcoffset(clean) is None:
but since the documentation is so specific on both requirements I don't think it's a good idea.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't check utcoffset in other places, too… If people set non-working stuff as tzinfo, they must expect to get errors. Also the refactoring of JobQueue will make people use pytz with PTB anyway, so they might as well use it here

):
clean=clean.replace(tzinfo=timezone.utc)

if clean > (datetime.now(tz=timezone.utc) - timedelta(seconds=1)):
raise ValueError('Clean as datetime ("%s") needs to be at least 1 second'
' older than "now"("%s")' % (clean,
datetime.now(tz=timezone.utc)))

# Create & start threads
self.job_queue.start()
dispatcher_ready = Event()
Expand Down Expand Up @@ -291,8 +337,25 @@ def start_webhook(self,
url_path (:obj:`str`, optional): Path inside url.
cert (:obj:`str`, optional): Path to the SSL certificate file.
key (:obj:`str`, optional): Path to the SSL key file.
clean (:obj:`bool`, optional): Whether to clean any pending updates on Telegram servers
before actually starting the webhook. Default is ``False``.
clean (:obj:`bool` | :obj:`datetime.timedelta` | :obj:`datetime.timedelta`, optional):
Whether to clean any pending updates on Telegram servers before actually starting
to poll. This parameter will be interpreted depending on its type.

* :obj:`bool` ``True`` cleans all updates. Default is ``False``.
* :obj:`datetime.timedelta` will be interpreted as "time before now" cut off.
Pending updates older than the cut off will be cleaned up.
:obj:`datetime.timedelta` is sign independent, both positive and negative deltas
are interpreted as "in the past".
* :obj:`datetime.datetime` will be interpreted as a specific date and time as
cut off. Pending updates older than the cut off will be cleaned up.
If the timezone (``datetime.tzinfo``) is ``None``, UTC will be assumed.

Note:
If :attr:`clean` is :obj:`datetime.timedelta` or :obj:`datetime.datetime` and
if a :class:`telegram.Update.effective_message` is found with
:attr:`telegram.Message.date` is ``None``, before the :obj:`datetime.timedelta`
or :obj:`datetime.datetime` condition is met, all updates will pass through.

bootstrap_retries (:obj:`int`, optional): Whether the bootstrapping phase of the
`Updater` will retry on failures on the Telegram server.

Expand All @@ -308,11 +371,37 @@ def start_webhook(self,
Returns:
:obj:`Queue`: The update queue that can be filled from the main thread.

Raises:
ValueError: if :attr:`clean` is :obj:`datetime.timedelta` and is < 1 second.
ValueError: if :attr:`clean` is :obj:`datetime.datetime` is not a least 1 second older
than `now()`.

"""
with self.__lock:
if not self.running:
self.running = True

if isinstance(clean, timedelta):
if clean.total_seconds() < 0:
clean = clean * -1

if clean.total_seconds() < 1:
raise ValueError('Clean as timedelta needs to be >= 1 second')
else:
# convert to datetime
clean = datetime.now(tz=timezone.utc) - clean
elif isinstance(clean, datetime):
if (
clean.tzinfo is None or
(clean.tzinfo is not None and clean.tzinfo.utcoffset(clean) is None)
):
clean=clean.replace(tzinfo=timezone.utc)

if clean > (datetime.now(tz=timezone.utc) - timedelta(seconds=1)):
raise ValueError('Clean as datetime ("%s") needs to be at least 1 second'
' older than "now"("%s")' % (clean,
datetime.now(tz=timezone.utc)))

# Create & start threads
self.job_queue.start()
self._init_thread(self.dispatcher.start, "dispatcher"),
Expand Down Expand Up @@ -475,6 +564,23 @@ def bootstrap_clean_updates():
updates = self.bot.get_updates(updates[-1].update_id + 1)
return False

def bootstrap_clean_updates_datetime(datetime_cutoff):
self.logger.debug('Cleaning updates from Telegram server with datetime "%s"',
datetime_cutoff)
updates = self.bot.get_updates()

# reversed as we just need to find the first msg that's too old
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we be sure that the updates are always ordered? I just don't know tbh.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 assumption is the mother of f..... i got a sequential list every time i ran my tests (and a ran a lot of m... mumbling something about dates and time)
I'll have look.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • telegram bot API does not mention it is ordered
  • i've looked at get_updates() but don't see ordering
  • i've looked at de_json but don't see ordering

Note that the clean as bool part also assumes an ordered by update_id list.

we could do
updates.sort(key=lambda x: x.update_id, reverse=True)

maybe even better to put this in get_updates() before it returns the list

Copy link
Member

@Bibo-Joshi Bibo-Joshi Jun 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mh … Trusting TG to send updates with incremental update_id is okay. They implicity guarantee that by

In order to avoid getting duplicate updates, recalculate offset after each server response.

Trusting that the date necessarily increasing with the update_id is another thing. I don't see, why it shouldn't, but still … for the moment, just assume it is

for up in reversed(updates):
if up.effective_message.date is None:
# break out and leave all updates as is
return False
elif up.effective_message and (up.effective_message.date < datetime_cutoff):
# break out, we want to process the 'next' and all following msg's
updates = self.bot.get_updates(up.update_id + 1)
return False

return False

def bootstrap_set_webhook():
self.bot.set_webhook(url=webhook_url,
certificate=cert,
Expand All @@ -500,11 +606,16 @@ def bootstrap_onerr_cb(exc):
retries[0] = 0

# Clean pending messages, if requested.
if clean:
if isinstance(clean, bool) and clean:
self._network_loop_retry(bootstrap_clean_updates, bootstrap_onerr_cb,
'bootstrap clean updates', bootstrap_interval)
retries[0] = 0
sleep(1)
elif isinstance(clean, datetime):
bootstrap_clean_updates_datetime_p = partial(bootstrap_clean_updates_datetime,
datetime_cutoff=clean)
self._network_loop_retry(bootstrap_clean_updates_datetime_p, bootstrap_onerr_cb,
'bootstrap clean updates datetime', bootstrap_interval)
retries[0] = 0

# Restore/set webhook settings, if needed. Again, we don't know ahead if a webhook is set,
# so we set it anyhow.
Expand Down
Loading