Changed dict key-matching regex to capture any valid dict key #920

arian-deimling · 2021-09-27T23:05:44Z

Fixes #917

thomasballinger · 2021-09-28T01:49:54Z

I tried this out and found that this more general pattern meant the key matching too much of the line, so tab completion of the second dictionary on a line no longer works. I think .* is too broad a pattern, could we restrict it further? Maybe we could allow anything if it started with a ' or " and once that quote is matched, not allow more?

It would be nice to add some tests, both of the new functionality you're adding (backslashes in strings work!) and this regression (second dictionary on a line should still work) like

bpython/bpython/test/test_line_properties.py

Lines 171 to 184 in b66a29f

    
           def test_simple(self): 
        
               self.assertAccess("asdf|") 
        
               self.assertAccess("asdf|") 
        
               self.assertAccess("asdf[<>|") 
        
               self.assertAccess("asdf[<>|]") 
        
               self.assertAccess("object.dict[<abc|>") 
        
               self.assertAccess("asdf|") 
        
               self.assertAccess("asdf[<(>|]") 
        
               self.assertAccess("asdf[<(1>|]") 
        
               self.assertAccess("asdf[<(1,>|]") 
        
               self.assertAccess("asdf[<(1, >|]") 
        
               self.assertAccess("asdf[<(1, 2)>|]") 
        
               # TODO self.assertAccess('d[d[<12|>') 
        
               self.assertAccess("d[<'a>|")

These tests use a DSL to describe the expected match when calling one of these functions:

bpython/bpython/test/test_line_properties.py

Lines 77 to 86 in b66a29f

    
               def assertAccess(self, s): 
        
                   r"""Asserts that self.func matches as described 
        
                   by s, which uses a little language to describe matches: 
        
                   abcd<efg>hijklmnopqrstuvwx|yz 
        
                      /|\ /|\               /|\ 
        
                       |   |                 | 
        
                    the function should   the current cursor position 
        
                    match this "efg"      is between the x and y 
        
                   """

so a test for this regression might look like

    self.assertAccess("(asdf['abc'], asdf[<'ab>|")

Thanks for looking into this!

thomasballinger · 2021-09-28T16:26:04Z

@arian-deimling could you add tests to keep track of the different cases we've talked about?

This pattern doesn't match

>>> d = {'asdf': 123, 'a\'b"\\c': 234}
>>> d['a\'

but that didn't work before either, so I think your code is a winner!

You can run the tests by running pytest in the repo.

arian-deimling · 2021-09-28T17:50:58Z

@thomasballinger If you don't mind, I'd like like to work on this a little bit more. It's more nuanced than I originally thought, but I think with a few more minor modifications, we could have the tab completion working for most different types of potential dict keys (a.k.a. hashable types - https://stackoverflow.com/questions/14535730/what-does-hashable-mean-in-python/44880799#44880799). I agree that there will probably have to be a few edge cases that we cannot handle.

And to answer your first question, yes, I will add some tests!

arian-deimling · 2021-09-29T04:10:54Z

@thomasballinger Ok - ready for you to have another look with these new commits. This newest version should be able to tab-complete all str, bytes, and int keys, float keys besides special values (i.e. nan, inf, -inf), and tuple keys for tuples containing any combination of the aforementioned key possibilities.

Also, that testing stuff is pretty neat and straightforward. The changes pass all tests as well as 9 new ones.

This may be off topic, but I think this test case is irrelevant:

bpython/bpython/test/test_line_properties.py

Line 176 in b66a29f

self.assertAccess("object.dict[<abc|>")

because in the case where you have typed some_dict[abc, the program is not trying to match abc as a key, but rather as the beginning of a variable or function name.

thomasballinger · 2021-09-29T06:10:05Z

This may be off topic, but I think this test case is irrelevant because in the case where you have typed some_dict[abc, the program is not trying to match abc as a key, but rather as the beginning of a variable or function name.

I agree that that code path isn't useful when running bpython, and I am ambivalent about inverting the test and changing the behavior. I wrote out the following to think through this.

If we implemented fuzzy completion for dict keys (which we do have implemented in AttrCompletion and GlobalCompletion) then it would useful to match this, assuming we still want DictKeyCompletion to take precedence over expression completion in that situation. I don't plan to do that though, and I'm not sure we'd want to make it so fuzzy that quotes will be added for you. If the completer were every changed to return matches that included global variables that referred to dictionary keys we would need it... but no plans for that. If we decided to show all matches instead of filtering on prefix we'd want this, but that doesn't sound very useful.

The only reason to leave this test case is that always matching the item in the dict key position makes it easier to describe what DictKeyCompletion.locate() does since its more in the spirit of what locate() is supposed to do according to the BaseCompletion class:

bpython/bpython/autocomplete.py

Lines 225 to 226 in b66a29f

    
                       * `locate(cur, line)` their initial target word to replace given a 
        
                         line and cursor

A reason to change the behavior and explicitly not match anything other than hashable literals is that it would be a more efficient solution in some cases than what I implemented for deciding which completer to use: if DictKeyCompletion.locate() returned None, we could bail out of DictKeyCompletion.matches() before needing to evaluate the dictionary expression, get keys, and see if the expression is a prefix of any keys (and it never is, since the only candidate keys are literals). However if we want to bail out early we could just check manually in DictKeyCompletion.matches instead of changing what locate() means.

bpython/bpython/autocomplete.py

Lines 653 to 668 in b66a29f

    
           for completer in completers: 
        
               try: 
        
                   matches = completer.matches(cursor_offset, line, **kwargs) 
        
               except Exception as e: 
        
                   # Instead of crashing the UI, log exceptions from autocompleters. 
        
                   logger = logging.getLogger(__name__) 
        
                   logger.debug( 
        
                       "Completer {} failed with unhandled exception: {}".format( 
        
                           completer, e 
        
                       ) 
        
                   ) 
        
                   continue 
        
               if matches is not None: 
        
                   return sorted(matches), (completer if matches else None) 
        
           return [], None

For now, starting to type an identifier in the dict key position means we will not be able to complete a dictionary key. So returning None for something that could not possibly be a prefix of a valid key literal does make sense: there is a dict key, but it's not a completable one and this code is all about completion. So I'd be ok with this change, but I think it's ok as-is too.

thomasballinger

Thanks for taking the time to get this right, this looks much more precise than we had it. I learned a bunch from reading it.

Does matching more types like this expand what works for substitutions on tab autocomplete? I'd like to understand the behavior change to figure out what the more complex code is buying us.

bpython/line.py

thomasballinger · 2021-09-29T04:01:13Z

bpython/line.py

+# pieces of regex to match repr() of several hashable built-in types
+_match_identifiers_and_functions = r"""[a-zA-Z_][a-zA-Z0-9_.(),\[\] ]*"""
+_match_single_quote_str_bytes = \
+    r"""b?'(?:\\(?:\\\\)*['"nabfrtvxuU]|[^'\\])*(?:\\\\)*\\?'?"""


It took me a while to figure out what was going on here, maybe it'd be helpful to link to the grammar for strings as justification for this.

Agreed. I will take care of that.

thomasballinger · 2021-09-29T04:06:35Z

bpython/line.py

@@ -34,7 +34,33 @@ def current_word(cursor_offset: int, line: str) -> Optional[LinePart]:
    return LinePart(start, end, word)


-_current_dict_key_re = LazyReCompile(r"""[\w_][\w0-9._]*\[([\w0-9._(), '"]*)""")
+# pieces of regex to match repr() of several hashable built-in types
+_match_identifiers_and_functions = r"""[a-zA-Z_][a-zA-Z0-9_.(),\[\] ]*"""


What is this matching; identifiers and property lookup and indexing and function calls?

thomasballinger · 2021-09-29T04:17:01Z

bpython/line.py

+_match_identifiers_and_functions = r"""[a-zA-Z_][a-zA-Z0-9_.(),\[\] ]*"""
+_match_single_quote_str_bytes = \
+    r"""b?'(?:\\(?:\\\\)*['"nabfrtvxuU]|[^'\\])*(?:\\\\)*\\?'?"""
+_match_double_quote_str_bytes = r"""b?"[^"]*"?"""


Is it the case that double quotes simpler because repr() never surrounds a string with double quotes if it contains one? Playing around seems to be, cool.

Yea, based on some research and just testing different strings, repr() will only surround a string in double-quotes if it contains at least one single quote char (') and exactly zero double quote chars (").

thomasballinger · 2021-09-29T06:13:48Z

bpython/line.py

+
+_current_dict_key_re_base = r"""[\w_][\w0-9._]*\["""
+_current_dict_key_re = LazyReCompile(
+    f'{_current_dict_key_re_base}('


the "_base" part of _current_dict_key_re_base confuses me a little, _base because it's the beginning of the pattern? because the dict is the base of the expression?

Yea, I could use some direction on the name if you don't like that, but basically, base is the part that I did not change which is everything before the [ character after which 'key matching mode' is entered.

maybe _current_dict_key_dict_re, or maybe since all of these substrings are not indented to be used separately, a different name like _current_dict_key_dict_part — OH or to match your current convention of _match_* maybe just _dict or _dict_before_match or something. 🤞 there'll be a beautiful was to do this with re.VERBOSE, but if not just changing to something shorter is fine. I just don't want to force readers to know the history of the code in order to understand the variable name.

Renamed to _match_dict_before_key.

arian-deimling · 2021-09-29T17:05:44Z

Thanks for taking the time to get this right, this looks much more precise than we had it. I learned a bunch from reading it.

Does matching more types like this expand what works for substitutions on tab autocomplete? I'd like to understand the behavior change to figure out what the more complex code is buying us.

Yes, now you can tab-complete any of the types that I added a regex for, and the way I tested for this was to make a dict with those kinds of keys and then type my_dict[ and then make sure that for every new character I typed in, the completion options were still displayed to the user.

Tuples are a special case because you cannot tab-complete them unless you type my_dict[|] with the pipe character representing the cursor location. I'm not sure exactly what the cause is, but it matches the behavior prior to my changes, so I think it's okay. One thing I noticed in these cases where you type the closing brace [ is that when you use the tab completion to finish your key, bpython adds a closing brace ], so you are left with two ]. We may want to open another issue for that; however, this behavior is also consistent with the main branch on the bpython repo.

arian-deimling · 2021-09-29T22:53:36Z

@thomasballinger Okay.. so I think I lost sight of what issue I was trying to solve but I think this final commit is the one!

This comment is what I'm basing this last change on:

The only reason to leave this test case is that always matching the item in the dict key position makes it easier to describe what DictKeyCompletion.locate() does since its more in the spirit of what locate() is supposed to do according to the BaseCompletion class

So the original issue was that we weren't matching everything that can possibly be in the key position, and the primary issue is strings because they can contain the ] character which is the natural way to detect the end of the dict key. To fix this I realized, we should first capture any string in the key position, and outside of strings, we should capture anything but the ] character. If we capture any number or combination of these using something like:
( (?: regex_to_match_any_string | regex_to_match_anything_but_close_square_bracket )* )
then we will capture the entire key. This has the added benefit of allowing text completion of for a multitude of other possible dict keys, such as ranges, complex numbers among others that my older version did not capture.

There is one regression here, but I think it is worth the additional functionality. Current in bpython, if you test
abcd[<'>|] the test will pass, but with my new version the version of the test that would pass is abcd[<'|]>. This is because the ' initiates string pattern matching and therefore the ] is considered part of the string that is being typed in. This means that you cannot tab-complete a string if you've typed my_dict[|] and then start typing the string at the cursor position; however, this regression is offset by the fact that you can tab-complete a string that contains the ] character.

Let me know what you think, but I think the solution to the regression is by making a change in the locate or match function down the line.

arian-deimling · 2021-09-30T00:15:00Z

@thomasballinger Can you re-run the test? I fixed the code to pass the black linter test.

codecov-commenter · 2021-09-30T00:58:23Z

Codecov Report

Merging #920 (a0f7fae) into main (b66a29f) will increase coverage by 0.05%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #920      +/-   ##
==========================================
+ Coverage   67.93%   67.98%   +0.05%     
==========================================
  Files          61       61              
  Lines        9150     9165      +15     
==========================================
+ Hits         6216     6231      +15     
  Misses       2934     2934

Impacted Files	Coverage Δ
bpython/line.py	`96.18% <100.00%> (+0.18%)`	⬆️
bpython/test/test_line_properties.py	`98.89% <100.00%> (+0.03%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b66a29f...a0f7fae. Read the comment docs.

arian-deimling · 2021-09-30T15:28:24Z

I didn’t mean to do that. Just clicked on the button to see what is does. Sorry

thomasballinger · 2021-09-30T15:44:43Z

No problem! I want Sebastian to take a look, and he's busy today so it'll be another day at least FYI.

bpython/line.py

bpython/test/test_line_properties.py

arian-deimling · 2021-10-01T15:45:57Z

@sebastinas I think it's good to go now.

arian-deimling · 2021-10-01T23:27:54Z

Turned off black globally because I don't like double quotes and forgot to reenable it for this @sebastinas. Good to go now. Sorry

thomasballinger · 2021-10-06T16:13:02Z

Thanks for the summary! Looks good to me.

This means that you cannot tab-complete a string if you've typed my_dict[|] and then start typing the string at the cursor position; however, this regression is offset by the fact that you can tab-complete a string that contains the ] character.

That is unfortunate, I bet it's more common to autocomplete inside [] than to want to complete keys with ] in them; but the way you have done it is more correct and if we want to fix this we can do it explicitly with a workaround instead of an accident of the way the code works. There are other similar edge cases here, it would be neat if e.g. my_dict["abc|"] completed the key "abcdef" but prefix matching isn't going to do that. We can fix those sometime separately.

Changed dict key-matching regex to capture any valid dict key

eb321aa

arian-deimling marked this pull request as ready for review September 27, 2021 23:06

arian-deimling mentioned this pull request Sep 27, 2021

Tab completion for dict key causes crash #917

Closed

dict key matching regex no longer matches beyond the end of a key

3f20d8e

arian-deimling added 3 commits September 28, 2021 22:12

Updated regex to handle str, bytes, int, float, tuple dict keys

97d6b19

Added tests for regex changes in prior commit

5bfd09c

removed erroneous print()

e470176

thomasballinger reviewed Sep 29, 2021

View reviewed changes

arian-deimling added 3 commits September 29, 2021 17:51

Added comments to regex using re.VERBOSE flag

5299f2b

Removed unnecessary regular expressions

12ff50d

Removed old commented-out code

d57f008

minor formatting changes to pass black linter test

a0f7fae

arian-deimling requested a review from thomasballinger September 30, 2021 15:26

sebastinas reviewed Oct 1, 2021

View reviewed changes

bpython/line.py Outdated Show resolved Hide resolved

sebastinas reviewed Oct 1, 2021

View reviewed changes

bpython/test/test_line_properties.py Show resolved Hide resolved

arian-deimling added 3 commits October 1, 2021 11:12

formatting change; adjusted single-quote string regex; added tests

236dc0b

modified test to test strings containing '[' and ']'

286d71c

added test case for string dict key being typed inside []

01ed93b

ran black formatter on file

0a08f41

thomasballinger merged commit 18f6901 into bpython:main Oct 6, 2021

sebastinas added this to the release-0.22 milestone Oct 12, 2021

Uh oh!

Changed dict key-matching regex to capture any valid dict key #920

Changed dict key-matching regex to capture any valid dict key #920

Uh oh!

Conversation

arian-deimling commented Sep 27, 2021

Uh oh!

thomasballinger commented Sep 28, 2021

Uh oh!

thomasballinger commented Sep 28, 2021

Uh oh!

arian-deimling commented Sep 28, 2021

Uh oh!

arian-deimling commented Sep 29, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thomasballinger commented Sep 29, 2021

Uh oh!

thomasballinger left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

arian-deimling commented Sep 29, 2021

Uh oh!

arian-deimling commented Sep 29, 2021

Uh oh!

arian-deimling commented Sep 30, 2021

Uh oh!

codecov-commenter commented Sep 30, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

arian-deimling commented Sep 30, 2021

Uh oh!

thomasballinger commented Sep 30, 2021

Uh oh!

Uh oh!

Uh oh!

arian-deimling commented Oct 1, 2021

Uh oh!

arian-deimling commented Oct 1, 2021

Uh oh!

thomasballinger commented Oct 6, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

arian-deimling commented Sep 29, 2021 •

edited

Loading

codecov-commenter commented Sep 30, 2021 •

edited

Loading

thomasballinger commented Oct 6, 2021 •

edited

Loading