-
-
Notifications
You must be signed in to change notification settings - Fork 245
Changed dict key-matching regex to capture any valid dict key #920
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I tried this out and found that this more general pattern meant the key matching too much of the line, so tab completion of the second dictionary on a line no longer works. I think It would be nice to add some tests, both of the new functionality you're adding (backslashes in strings work!) and this regression (second dictionary on a line should still work) like bpython/bpython/test/test_line_properties.py Lines 171 to 184 in b66a29f
These tests use a DSL to describe the expected match when calling one of these functions: bpython/bpython/test/test_line_properties.py Lines 77 to 86 in b66a29f
so a test for this regression might look like
Thanks for looking into this! |
@arian-deimling could you add tests to keep track of the different cases we've talked about? This pattern doesn't match
but that didn't work before either, so I think your code is a winner! You can run the tests by running |
@thomasballinger If you don't mind, I'd like like to work on this a little bit more. It's more nuanced than I originally thought, but I think with a few more minor modifications, we could have the tab completion working for most different types of potential dict keys (a.k.a. hashable types - https://stackoverflow.com/questions/14535730/what-does-hashable-mean-in-python/44880799#44880799). I agree that there will probably have to be a few edge cases that we cannot handle. And to answer your first question, yes, I will add some tests! |
@thomasballinger Ok - ready for you to have another look with these new commits. This newest version should be able to tab-complete all Also, that testing stuff is pretty neat and straightforward. The changes pass all tests as well as 9 new ones. This may be off topic, but I think this test case is irrelevant: bpython/bpython/test/test_line_properties.py Line 176 in b66a29f
because in the case where you have typed some_dict[abc , the program is not trying to match abc as a key, but rather as the beginning of a variable or function name.
|
I agree that that code path isn't useful when running bpython, and I am ambivalent about inverting the test and changing the behavior. I wrote out the following to think through this. If we implemented fuzzy completion for dict keys (which we do have implemented in The only reason to leave this test case is that always matching the item in the dict key position makes it easier to describe what bpython/bpython/autocomplete.py Lines 225 to 226 in b66a29f
A reason to change the behavior and explicitly not match anything other than hashable literals is that it would be a more efficient solution in some cases than what I implemented for deciding which completer to use: if bpython/bpython/autocomplete.py Lines 653 to 668 in b66a29f
For now, starting to type an identifier in the dict key position means we will not be able to complete a dictionary key. So returning None for something that could not possibly be a prefix of a valid key literal does make sense: there is a dict key, but it's not a completable one and this code is all about completion. So I'd be ok with this change, but I think it's ok as-is too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for taking the time to get this right, this looks much more precise than we had it. I learned a bunch from reading it.
Does matching more types like this expand what works for substitutions on tab autocomplete? I'd like to understand the behavior change to figure out what the more complex code is buying us.
bpython/line.py
Outdated
# pieces of regex to match repr() of several hashable built-in types | ||
_match_identifiers_and_functions = r"""[a-zA-Z_][a-zA-Z0-9_.(),\[\] ]*""" | ||
_match_single_quote_str_bytes = \ | ||
r"""b?'(?:\\(?:\\\\)*['"nabfrtvxuU]|[^'\\])*(?:\\\\)*\\?'?""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a while to figure out what was going on here, maybe it'd be helpful to link to the grammar for strings as justification for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I will take care of that.
bpython/line.py
Outdated
@@ -34,7 +34,33 @@ def current_word(cursor_offset: int, line: str) -> Optional[LinePart]: | |||
return LinePart(start, end, word) | |||
|
|||
|
|||
_current_dict_key_re = LazyReCompile(r"""[\w_][\w0-9._]*\[([\w0-9._(), '"]*)""") | |||
# pieces of regex to match repr() of several hashable built-in types | |||
_match_identifiers_and_functions = r"""[a-zA-Z_][a-zA-Z0-9_.(),\[\] ]*""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this matching; identifiers and property lookup and indexing and function calls?
bpython/line.py
Outdated
_match_identifiers_and_functions = r"""[a-zA-Z_][a-zA-Z0-9_.(),\[\] ]*""" | ||
_match_single_quote_str_bytes = \ | ||
r"""b?'(?:\\(?:\\\\)*['"nabfrtvxuU]|[^'\\])*(?:\\\\)*\\?'?""" | ||
_match_double_quote_str_bytes = r"""b?"[^"]*"?""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that double quotes simpler because repr() never surrounds a string with double quotes if it contains one? Playing around seems to be, cool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, based on some research and just testing different strings, repr() will only surround a string in double-quotes if it contains at least one single quote char ('
) and exactly zero double quote chars ("
).
bpython/line.py
Outdated
|
||
_current_dict_key_re_base = r"""[\w_][\w0-9._]*\[""" | ||
_current_dict_key_re = LazyReCompile( | ||
f'{_current_dict_key_re_base}(' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the "_base" part of _current_dict_key_re_base confuses me a little, _base because it's the beginning of the pattern? because the dict is the base of the expression?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I could use some direction on the name if you don't like that, but basically, base
is the part that I did not change which is everything before the [
character after which 'key matching mode' is entered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe _current_dict_key_dict_re
, or maybe since all of these substrings are not indented to be used separately, a different name like _current_dict_key_dict_part
— OH or to match your current convention of _match_*
maybe just _dict
or _dict_before_match
or something. 🤞 there'll be a beautiful was to do this with re.VERBOSE, but if not just changing to something shorter is fine. I just don't want to force readers to know the history of the code in order to understand the variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to _match_dict_before_key
.
Yes, now you can tab-complete any of the types that I added a regex for, and the way I tested for this was to make a Tuples are a special case because you cannot tab-complete them unless you type |
@thomasballinger Okay.. so I think I lost sight of what issue I was trying to solve but I think this final commit is the one! This comment is what I'm basing this last change on:
So the original issue was that we weren't matching everything that can possibly be in the key position, and the primary issue is strings because they can contain the There is one regression here, but I think it is worth the additional functionality. Current in Let me know what you think, but I think the solution to the regression is by making a change in the |
@thomasballinger Can you re-run the test? I fixed the code to pass the black linter test. |
Codecov Report
@@ Coverage Diff @@
## main #920 +/- ##
==========================================
+ Coverage 67.93% 67.98% +0.05%
==========================================
Files 61 61
Lines 9150 9165 +15
==========================================
+ Hits 6216 6231 +15
Misses 2934 2934
Continue to review full report at Codecov.
|
I didn’t mean to do that. Just clicked on the button to see what is does. Sorry |
No problem! I want Sebastian to take a look, and he's busy today so it'll be another day at least FYI. |
@sebastinas I think it's good to go now. |
Turned off |
Thanks for the summary! Looks good to me.
That is unfortunate, I bet it's more common to autocomplete inside [] than to want to complete keys with ] in them; but the way you have done it is more correct and if we want to fix this we can do it explicitly with a workaround instead of an accident of the way the code works. There are other similar edge cases here, it would be neat if e.g. |
Fixes #917