-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
Readline word sequences #5420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Readline word sequences #5420
Conversation
That sounds a bit dangerous from a licensing point of view. What do you mean by this, did you copy code, or the algorithm? |
Just based on the algorithm. Code can't really be copied from readline since it does other things as well (handles multibyte characeters, ...) |
Even copying the algorithm may lead to licensing issues. I want to be careful here. |
Okay. I think that anyway it's better to use a more optimized version of this function. I'll just rewrite it. |
lib/mp-readline/readline.c
Outdated
return pos; | ||
} | ||
|
||
pos--; // this also ensures pos <= line length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this also ensures pos <= line length
I assume you added this comment to explain why you don't use an assert
like above. Things is, the comment is only correct under the assumption that cursor pos never crosses line length. Which it probably should never do. But that's actually another reason to use the assert: make sure the calling code behaves, makes sure this code doesn't access out of bounds data, and a correct way of commenting expected behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, without this decrement you'd have to check it in runtime. So by this comment I wanted to remind the reader why we don't need to if
it here.
But you are absolutely right about the use of asserts. I'll add one here as well.
lib/mp-readline/readline.c
Outdated
#if MICROPY_REPL_EMACS_WORDS_MOVE | ||
STATIC size_t cursor_count_forward_word(size_t cursor_pos, vstr_t *line) { | ||
size_t pos = rl.cursor_pos; | ||
const char *line_buf = vstr_str(line); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to where it actually gets used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean? To move it to the first line where it's used in unichar_isalnum
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's clearer (to me at least)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, moved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now after optimizing those function, it's organized differently.
lib/mp-readline/readline.c
Outdated
} | ||
} | ||
|
||
if (pos == rl.line->len) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It';s confusing to sometimes use ->len
and other times vstr_len
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it's my miss. Will update to vstr_len
.
lib/mp-readline/readline.c
Outdated
@@ -221,9 +291,43 @@ int readline_process_char(int c) { | |||
case 'O': | |||
rl.escape_seq = ESEQ_ESC_O; | |||
break; | |||
#if MICROPY_REPL_EMACS_WORDS_MOVE | |||
// on stm32, it compiles better when passing parameters to cursr_count_*_word from here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Compiling either succeeds or doesn't, so i'm not sure what you mean with 'better' here? Does it produce more optimal code or so? Does that make a runtime difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By better I meant smaller code size. I didn't actually check what has changed - I don't think it matters since this piece of code runs on user interaction, not in the core of MP, and it's not in the hot path of anything.
I'll clarify the comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed since it's incorrect now :/
@@ -0,0 +1,9 @@ | |||
# word movement |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't cover all cases I think, i.e the cases where you return early because of reaching the end of the line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it doesn't, still WIP :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now it covers, I think
Doesn't work correctly for me, but I didn't check if the bug is in your code or elsewhere. On unix the following reproduces one of the prioblems for me:
This is just one of the combinations, there's a bunch of problems where if you mix enough commands everything goes berserk. |
Yup, reproduced. Found the problem pretty quickly - |
Added a specific test-case for this bug |
15df4f8
to
71d7471
Compare
Okay, updated with fixes + full tests + rewriting the functions for better optimization, and to remove any risk of copyright infringement as @dpgeorge has been concerned. I think this test file is one of the most complex test files I've written. readline devs, I salute you, this is one tough product to compose tests for. Size increase with both options for stm32 is 284 bytes. With the emacs options only, it's 256 bytes. I've enabled both options by default for unix coverage build, and also for unix, because I really think the extra convenience is worth it... |
71d7471
to
5289f41
Compare
lib/mp-readline/readline.c
Outdated
@@ -74,6 +74,7 @@ STATIC void mp_hal_move_cursor_back(uint pos) { | |||
// snprintf needs space for the terminating null character | |||
int n = snprintf(&vt100_command[0], sizeof(vt100_command), "\x1b[%u", pos); | |||
if (n > 0) { | |||
assert(n < sizeof(vt100_command)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might have to go in a separate commit, plus it's not going to compile with all warnings enabled because of comparision between signed/unsigned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move to a separate commit.
It compiles with asserts enabled, I've used it to discover one of the bugs (where pos
would underflow and this function will actually receive 2 ** 32 - 1
). The assert was raised in that case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It compiles with asserts enabled
Well it indeed doesn't compile on the coverage build! I've added a cast to unsigned
.
It's funny we get this warning because it's in an n > 0
block, so I don't see any signed/unsigned conversion problem that might occur.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it's in an n > 0 block
hehe yes, but compilers do not follow that reasoning, luckily
lib/mp-readline/readline.c
Outdated
break; | ||
} | ||
|
||
pos += forward ?: forward - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't look right? Also it looks like you chose to make forward
an int
just to be able to use forward - 1
, but it's real meaning is it's used as a boolean. But then you add that int
to a size_t
which isn't ideal anyway. In fact, if forward
is 0
I'm nout sure what is going to happen here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, forward
is indeed used as kind of a boolean here, but the 1/0 representation allows producing smaller code, -8 bytes on stm32 this way (which is disappointing, I was hoping it'd be optimized correctly).
It's legit to add -1
to a size_t
. As long as it doesn't underflow... (Which can't happen here due to the loop checks). On ARM, it boils down to add r4, r8
with r8=0
or r8=0xffffffff
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I was also talking about the ?:
. I didn't know it, turns out it's a GNU extension and so won't compile with MSVC and possibly other compilers so you should just write it out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh, you're right, I keep forgetting MP compiles w/ MSVC as well. It's indeed a GNU extension. I'll change.
lib/mp-readline/readline.c
Outdated
size_t pos = rl.cursor_pos; | ||
bool in_word = false; | ||
|
||
while ((forward || 0 < pos) && (!forward || pos < vstr_len(rl.line))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For me personally, this is just overly succinct and not super readable. Not sure what others think, but both for mainainance and clarity I'd rather have code which speaks for itself instead of having to manually parse 'oh, this actually just means if(forward) then X else Y
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this expression more nicely with comments. But it does have to remain somewhat the same to keep the code small..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you rewrote the expression you don't really need the comments anymore, since the code now is very clear :)
During readline development, this function may receive bad `pos` values. It's easier to understand the assert() failing error than to have a "stack smashing detected" message.
…d-kill-word sequences.
5289f41
to
c1a0fd7
Compare
The failing check is "code size increased for ports/minimal CROSS=1" but I can't reproduce it locally :( Perhaps it's checking against a stale file? Anyway, I think it's ready. |
Don't worry about that, it sometimes increases size spuriously. |
Thanks @Jongy for a well-written PR with tests. I've rebased and merged it in 853aaa0. I made some minor changes during the rebase:
|
Cool, Thanks :) IIRC I named |
I see. But since the "extra" refers to additional keys on top of the standard EMACS ones, I think it's best to have the word EMACS in the config name. Otherwise it could be something like |
A matter of taste in the little bits. However, since the comments on both options are extensive, it doesn't matter that much, let it be :) |
@Jongy Thank you for the commit. It is very useful. Several times a day if used to push Ctrl-Left or Ctrl-Right for moving, which did not work. Now it does. |
I'm glad you liked it @robert-hh :) It's indeed very useful. I can't recall how I ever used the REPL without having Ctrl+Right/Ctrl+Left. The new escape codes I added were the first codes handled in this state machine that have multiple parameters separated by If we want it to accept So Ctrl+Delete wasn't all that important IMO to require this change 🤷♂️ I'm not a huge fan of it anyway since it's not supported out-of-the-box by all terminal/shell configurations. |
|
Adds an option to build MicroPython's readline with emacs-style word move/kill sequences.
Also, an extra option to build with support for ctrl+left, ctrl+right and ctrl+w sequences is given.
When both options are enabled on stm32, increases code size by 324 bytes.
Redo of #5024.
TODO: Complete the tests. Also, perhaps optimize the function further? I've based my implementation on GNU readline so I'm feeling quite comfortable it really works in all specific edge cases.