Skip to content

Customize context #2262

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 40 commits into from
Jun 6, 2021
Merged

Customize context #2262

merged 40 commits into from
Jun 6, 2021

Conversation

Bibo-Joshi
Copy link
Member

@Bibo-Joshi Bibo-Joshi commented Dec 22, 2020

⚠️ contains new example, so should be merged right before a release

TODO:

  • Add/Update .. versionadded:: directives
  • Add a note to Deprecation: v14 #2346 to make the new persistence methods abstract classes
  • check that the "edit on GH" thingies redirect to the correct file

POC for #2068, so we have some code base to discuss about

The first three commits only introduce a custom_context parameter to specify a subclass of CallbackContext. That works pretty fine
The last commit expands to customizing chat/user/bot_data and is more in a POC state. Some comments:

  • Minus some issues with annotations, the code runs. Just not tested enough, yet.
  • introduced a class ContextCustomizer - the idea being similarly to Defaults. including that stuff directly into Defaults doesn't seem like a good idea to me. The naming is only temporary, though, I'm not really happy with ContextCustomizer
  • Didn't touch the built-in persistence classes yet. Ultimately we should make both DictPersistence and PicklePersistence generic, i.e. allow using it with custom classes. But probably that doesn't have to be in the same PR, as that will mean writing a lot of boring tests …
  • annotations is tricky (understatement!). Dispatcher now being a generic class depending on the classes passed via the context_customizer object, I somehow need to extract those in a way mypy understands. Will probably make a SO question for that at some point … Also I need to figure out how to annotate something like user_data: UDMapping[int, UDClass], where both UDMapping and UDClass need to be TypeVars (I think)

Edit:
First SO question coming from this PR

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there. Relax, I am just a little warning for the maintainers to release directly after merging your PR, otherwise we have broken examples and people might get confused :)

@Bibo-Joshi
Copy link
Member Author

Bibo-Joshi commented Jan 16, 2021

Alright, worked a bit on this. Type hinting is now on a reasonably good level. some comments:

  • the persistence argument to Updater/Dispatcher is only annotated as BasePersistence, without specifying the generic types, because I didn't manage to tell mypy that persistence and callback_customizer must have the same generic types. I think that's acceptable.
  • ContextCustomizer got a sh*tload of overload defs for __init__. It's as ugly as it can get, but it's the only way I could achieve that type checking add_handler works correctly. IMHO that makes it worth it, especiall as ContextCustomizer is a helper class that we hopefully don't have to change often.
  • A wiki page on how to type-check code using PTB is probably good anyway, this PR gives even more reason

Tests for Dispatcher and JobQueue are there. Todo: pass context_customizer to Pickle/DictPersistence and write tests to a reasonable extend.

@Bibo-Joshi Bibo-Joshi changed the title POC: Customize context WIP: Customize context Jan 18, 2021
@Bibo-Joshi
Copy link
Member Author

Bibo-Joshi commented Jan 21, 2021

I skipped DictPersistence, since custom classes are not json-dumpable by default, so making custom context logic work with it will take some action by the user anyway and DictPersistence as more of a starting point in the first place. But I'm ready to be convinced otherwise :D

Probably needs some reworking when #2324 is merged, i.e. moving some types from utils.types to ext.utils.types.

@Bibo-Joshi Bibo-Joshi marked this pull request as ready for review January 21, 2021 21:00
@Bibo-Joshi Bibo-Joshi changed the title WIP: Customize context Customize context Jan 21, 2021
@noxxious
Copy link

noxxious commented Jan 27, 2021

Hey @Bibo-Joshi, I'm just wondering what is the decision to have the dispatcher fetch the all users and chats data upon initialization and then later only use single items (per user_id and chat_id) when constructing the context and when updating persistence?
Given this there are several issues:

  • on cloud function based webhook implementations when load balanced this actually could create a data concurrency issue
  • it unnecessarily uses a lot more RAM (by loading unused chat/user data)

@Bibo-Joshi
Copy link
Member Author

Hey @Bibo-Joshi, I'm just wondering what is the decision to have the dispatcher fetch the all users and chats data upon initialization and then later only use single items (per user_id and chat_id) when constructing the context and when updating persistence?

Sorry, I don't understand the question. Are you asking, why the default is to load everything on init? Or where in the code you can see that decision?

@noxxious
Copy link

Hey @Bibo-Joshi, I'm just wondering what is the decision to have the dispatcher fetch the all users and chats data upon initialization and then later only use single items (per user_id and chat_id) when constructing the context and when updating persistence?

Sorry, I don't understand the question. Are you asking, why the default is to load everything on init? Or where in the code you can see that decision?

The actual decision, I've updated my comment above for the reason why that might be an issue.

@Bibo-Joshi
Copy link
Member Author

Given this there are several issues:

  • on cloud function based webhook implementations when load balanced this actually could create a data concurrency issue
  • it unnecessarily uses a lot more RAM (by loading unused chat/user data)

I'm not sure if I fully understand the first bullet point, but if I do, then both arguments are exactly what this PR tries to improve. See #2068 for a more detailed description.

@noxxious
Copy link

noxxious commented Jan 27, 2021

Just clarify with the example:
The webhook is served by two function instances load balanced. Imagine the case where two users contact the bot (with private chats, for simpler case). Their requests are handled concurrently with the following order:

  1. User1 request hits instance 1, the data (including user 2 is loaded by instance 1) and instance 1 proceeds to handling
  2. User2 request hits instance 2, the data (including user 1 is loaded by instance 2) and instance 2 proceeds to handling
  3. Instance 2 completes and writes the data to persitence, i.e. a database like Postgress (i.e. using PostgressPersistence from ptb-contrib, which overwrites all user's data)
  4. Instance 1 completes and hence overwrites changes to user's 2 data.

@noxxious
Copy link

Wouldn't it be easier to have the persistence actually provide get_user_data(user_id) and just use that when constructing the context. It should simplify the codebase and not need the Dict/Mapping piece.

Though might not be backwards compatible with previous version.

@Bibo-Joshi
Copy link
Member Author

Bibo-Joshi commented Jan 27, 2021

Just clarify with the example:

No, this is not, what will be happening, or is currently happening. the dispatcher calls get_user_data on init, which returns a mapping, say M (currently a defaultdict). now when an update comes in, dispatcher will access M[user_id]. It's up to M to decide whether the corresponding data for this (and only this!) user has already been loaded or not and load it if necessary. After processing the update, dispatcher will call update_user_data(data, user_id). It's up to the persistence to do the updating of the data for this (and only this!) user. Analogously for chat_data.

So if something's overriden anywhere, it's the fault of either the mapping or the persistence, but never of the dispatcher ;)

Wouldn't it be easier to have the persistence actually provide get_user_data(user_id) and just use that when constructing the context. It should simplify the codebase and not need the Dict/Mapping piece.

Though might not be backwards compatible with previous version.

This one got me thinking :D It might actually be the better approach in two senses:

  1. Only need to implement BasePersistence
  2. Keep any db-access logic separated from Dispatcher.

To make this work with and without persistence, we'd need to refactor how dispatcher is storing user/chat_data internally. More precisely, we would need methods Dispatcher.get_user/chat_data(user/chat_data) that

  • when using no persistence, return the available data/initialize it or
  • when using persistence, call persistence.refresh_user/chat_data(user/chat_id, current_user/chat_data), no matter what. I named the methods refresh instead of get on purpose, b/c they should only update the existing object and not return a new one, as the user/chat_data is shared accross callbacks and it will definetely lead to problems when you start using different objects. One can argue if for users without data, dispatcher should initialize the data and pass it to persistence for refreshing or if persistence should initialize. I tend for dispatcher, as that's closer to the current setup and it emphasizes that dp is in charge of acutally managing the data.

And I guess this can be made in a non-breaking way:

  • Dispatcher.chat/user_data can stay. One might argue that they should be made private and only be accessible via DP.get_*, but that could be done with a deprecation phase
  • BasePersistence.get_user/chat_data can also stay. If you want to load data only on demand, they can return an empty dict. Otherwise they can still be used to load everything at once. In fact IMO they not even need be deprecated as for small to medium sized bots, loading everything on init will still be the reasonable way to go.

My thoughts about how we should proceed:

  • think about this some more
  • see what @noxxious responds
  • Asks for feedback from the other devs
  • Try to get an opinion from Pieter, the author of persistence
  • split the PRs. make this one only about CallbackContext and the type of user/bot/chat_data. make the "dynamic persistence" logic it's own thread.

Other things that came to my mind while thinking about this:

  • maybe make UD, CD & BD bound by a protocal that says "initializable without args", at least when using them without persistence. Scrap that. Protcols are new only in py3.8 …

@noxxious
Copy link

noxxious commented Jan 28, 2021

Thanks, I stand corrected on the override stuff.

And I guess the return of (user/chat)_data should be a Mapping (which is the case in this PR 💯 ) as in actually returning the protocol on how to fetch the data and not necessarily the data itself, but that should maybe be addressed in the docs.

The refresh makes sense, given not all persistence implementations commit on each update and only on flush, dp has to maintain the user data in memory, but refresh would allow the persistence to make the decision on change merges.

But why not really move the whole data storing and persistence into one component out of dp altogether (like Datastore), that could make a decision on what to do with the online data and storing it whenever it's underlying implementation needs. Dispatcher only would need to have hook calls to it.

I guess the same user/chat_data object issue across callbacks could be addressed by:

  1. upon start calling store to init itself (so in could do hot/warm/cold caching if needed)
  2. fetching the data from store upon receiving update and constructing the context
  3. run all the callbacks that need to run for this update cycle using that context and data instance
  4. notify the data store of the update so it can make an appropriate decision (default implementation is just to keep it in the in dictionary)
  5. Upon exit call store.flush

So persistence would be on all the time, just the default implementation would be similar to the current DictPersistence.

# Conflicts:
#	telegram/ext/callbackcontext.py
#	telegram/ext/commandhandler.py
#	telegram/ext/picklepersistence.py
Copy link
Member

@harshil21 harshil21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

example review

Copy link
Member

@harshil21 harshil21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

found this in another quick glance

@Bibo-Joshi
Copy link
Member Author

Bibo-Joshi commented May 28, 2021

Realized that having BasePersistence.refresh_*_data non-abstract but raising NotImplementedError is still breaking, b/c Dispatcher will still call them. So making them pass for now.

Copy link
Member

@starry-shivam starry-shivam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@Bibo-Joshi Bibo-Joshi merged commit fce7cc9 into master Jun 6, 2021
@Bibo-Joshi Bibo-Joshi deleted the customize-context branch June 6, 2021 08:37
@github-actions github-actions bot locked and limited conversation to collaborators Jun 7, 2021
@Bibo-Joshi Bibo-Joshi added 🔌 enhancement pr description: enhancement and removed enhancement labels Nov 3, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
🔌 enhancement pr description: enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants