Readline word sequences #5420

Jongy · 2019-12-15T23:54:15Z

Adds an option to build MicroPython's readline with emacs-style word move/kill sequences.
A second option adds support for the ctrl+left, ctrl+right and ctrl+w sequences.

With both options enabled on stm32, code size increases by 324 bytes.

Redo of #5024.

TODO: Complete the tests. Also, perhaps optimize the function further? I've based my implementation on GNU readline so I'm feeling quite comfortable it really works in all specific edge cases.

lib/mp-readline/readline.c

dpgeorge · 2019-12-16T00:10:58Z

> I've based my implementation on GNU readline

That sounds a bit dangerous from a licensing point of view. What do you mean by this, did you copy code, or the algorithm?

Jongy · 2019-12-16T00:31:31Z

> That sounds a bit dangerous from a licensing point of view. What do you mean by this, did you copy code, or the algorithm?

Just based on the algorithm. Code can't really be copied from readline since it does other things as well (handles multibyte characters, ...)

dpgeorge · 2019-12-16T01:02:58Z

> Just based on the algorithm.

Even copying the algorithm may lead to licensing issues. I want to be careful here.

Jongy · 2019-12-16T08:49:56Z

> Even copying the algorithm may lead to licensing issues. I want to be careful here.

Okay. I think that anyway it's better to use a more optimized version of this function. I'll just rewrite it.

stinos · 2019-12-16T08:59:14Z

lib/mp-readline/readline.c

+        return pos;
+    }
+
+    pos--; // this also ensures pos <= line length


> this also ensures pos <= line length

I assume you added this comment to explain why you don't use an assert like above. Thing is, the comment is only correct under the assumption that cursor pos never crosses line length, which it probably never should. But that's actually another reason to use the assert: it makes sure the calling code behaves, makes sure this code doesn't access out-of-bounds data, and is a correct way of documenting expected behaviour.

Actually, without this decrement you'd have to check it at runtime. So with this comment I wanted to remind the reader why we don't need an if for it here.

But you are absolutely right about the use of asserts. I'll add one here as well.

stinos · 2019-12-16T09:02:10Z

lib/mp-readline/readline.c

+#if MICROPY_REPL_EMACS_WORDS_MOVE
+STATIC size_t cursor_count_forward_word(size_t cursor_pos, vstr_t *line) {
+    size_t pos = rl.cursor_pos;
+    const char *line_buf = vstr_str(line);


Move this to where it actually gets used?

What do you mean? To move it to the first line where it's used in unichar_isalnum?

Yes, that's clearer (to me at least)

Okay, moved

Now, after optimizing those functions, it's organized differently.

stinos · 2019-12-16T09:03:20Z

lib/mp-readline/readline.c

+        }
+    }
+
+    if (pos == rl.line->len) {


It's confusing to sometimes use ->len and other times vstr_len

Yeah it's my miss. Will update to vstr_len.

stinos · 2019-12-16T09:07:07Z

lib/mp-readline/readline.c

@@ -221,9 +291,43 @@ int readline_process_char(int c) {
            case 'O':
                rl.escape_seq = ESEQ_ESC_O;
                break;
+            #if MICROPY_REPL_EMACS_WORDS_MOVE
+                // on stm32, it compiles better when passing parameters to cursr_count_*_word from here.


Compiling either succeeds or doesn't, so I'm not sure what you mean by 'better' here. Does it produce more optimal code? Does that make a runtime difference?

By better I meant smaller code size. I didn't actually check what has changed - I don't think it matters since this piece of code runs on user interaction, not in the core of MP, and it's not in the hot path of anything.

I'll clarify the comment

Removed since it's incorrect now :/

stinos · 2019-12-16T09:09:03Z

tests/cmdline/repl_emacs_words_move.py

@@ -0,0 +1,9 @@
+# word movement


This doesn't cover all cases I think, i.e. the cases where you return early because of reaching the end of the line?

Yeah, it doesn't, still WIP :)

Now it covers, I think

stinos · 2019-12-16T09:42:47Z

Doesn't work correctly for me, but I didn't check if the bug is in your code or elsewhere. On unix the following reproduces one of the problems for me:

1. start MicroPython
2. type abc = def followed by enter
3. up arrow to get that line back
4. Alt-F, does nothing, as expected
5. Alt-B incorrectly jumps past beginning of line to the second of the >>> characters

This is just one of the combinations, there's a bunch of problems where if you mix enough commands everything goes berserk.

Jongy · 2019-12-16T18:31:52Z

> Doesn't work correctly for me, but I didn't check if the bug is in your code or elsewhere. On unix the following reproduces one of the problems for me:
>
> 1. start MicroPython
> 2. type abc = def followed by enter
> 3. up arrow to get that line back
> 4. Alt-F, does nothing, as expected
> 5. Alt-B incorrectly jumps past beginning of line to the second of the >>> characters
>
> This is just one of the combinations, there's a bunch of problems where if you mix enough commands everything goes berserk.

Yup, reproduced. Found the problem pretty quickly - cursor_count_forward_word returned pos instead of 0 when the cursor was already at the end of the line. One of the leftovers from my debugging and different structuring for this function.
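The bug described here comes from mixing up an absolute position with a relative count. A minimal sketch of a count-returning forward-word helper follows. It is not the code from this PR: the name `count_forward_word`, its signature, and the use of `isalnum` from `<ctype.h>` (the PR uses `unichar_isalnum`) are assumptions made for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Hypothetical helper: returns how many characters the cursor should move
// right to reach the end of the next word. It returns a count, not a position.
static size_t count_forward_word(const char *line, size_t len, size_t cursor) {
    assert(cursor <= len); // callers must never pass a cursor beyond the line
    size_t pos = cursor;
    // skip any non-word characters in front of the cursor
    while (pos < len && !isalnum((unsigned char)line[pos])) {
        pos++;
    }
    // then skip the word itself
    while (pos < len && isalnum((unsigned char)line[pos])) {
        pos++;
    }
    // at the end of the line both loops are skipped, so this is 0;
    // returning pos here instead would reproduce the reported bug
    return pos - cursor;
}
```

With the line from the reproduction steps, `count_forward_word("abc = def", 9, 9)` yields 0, so Alt-F at the end of the line moves nothing and a following Alt-B starts from a consistent cursor position.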

Jongy · 2019-12-16T18:46:15Z

> Yup, reproduced. Found the problem pretty quickly - cursor_count_forward_word returned pos instead of 0 when the cursor was already at the end of the line. One of the leftovers from my debugging and different structuring for this function.

Added a specific test-case for this bug

Jongy · 2019-12-17T00:17:06Z

Okay, updated with fixes + full tests + rewriting the functions for better optimization, and to remove any risk of copyright infringement, which @dpgeorge was concerned about.

I think this test file is one of the most complex test files I've written. readline devs, I salute you, this is one tough product to compose tests for.

Size increase with both options for stm32 is 284 bytes. With the emacs options only, it's 256 bytes.
This can be optimized even further by 36 bytes if unichar_isalnum is inlined. But that's another change (I think it's weird to inline unichar_isalnum but keep the rest as they are)

I've enabled both options by default for unix coverage build, and also for unix, because I really think the extra convenience is worth it...

stinos · 2019-12-17T08:51:26Z

lib/mp-readline/readline.c

@@ -74,6 +74,7 @@ STATIC void mp_hal_move_cursor_back(uint pos) {
        // snprintf needs space for the terminating null character
        int n = snprintf(&vt100_command[0], sizeof(vt100_command), "\x1b[%u", pos);
        if (n > 0) {
+            assert(n < sizeof(vt100_command));


This might have to go in a separate commit, plus it's not going to compile with all warnings enabled because of comparison between signed/unsigned.

I'll move to a separate commit.

It compiles with asserts enabled, I've used it to discover one of the bugs (where pos would underflow and this function will actually receive 2 ** 32 - 1). The assert was raised in that case.

> It compiles with asserts enabled

Well it indeed doesn't compile on the coverage build! I've added a cast to unsigned.
It's funny we get this warning because it's in an n > 0 block, so I don't see any signed/unsigned conversion problem that might occur.

> because it's in an n > 0 block

hehe yes, but compilers do not follow that reasoning, luckily
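For reference, the signed/unsigned point can be sketched as below. This is a hedged stand-in, not the code in the PR: the function name `cursor_back_len`, the buffer size of 6 and the trailing `'D'` byte are assumptions, and the cast uses `size_t` where the thread mentions a cast to unsigned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

// Hypothetical stand-in for the function in the quoted diff. It formats a
// VT100 "cursor back" command and returns the number of bytes that would be
// sent to the terminal, or 0 if formatting failed.
static size_t cursor_back_len(unsigned pos) {
    char vt100_command[6]; // assumed size: ESC, '[', up to 3 digits, terminator
    // snprintf returns the length it wanted to write, which may exceed the buffer
    int n = snprintf(vt100_command, sizeof(vt100_command), "\x1b[%u", pos);
    if (n > 0) {
        // n is known to be positive here, but the comparison is still int
        // against size_t, so the cast keeps sign-compare warnings quiet
        assert((size_t)n < sizeof(vt100_command));
        vt100_command[n] = 'D'; // assumed: overwrite the terminator with the final byte
        return (size_t)n + 1;
    }
    return 0;
}
```

A `pos` that has underflowed to 2 ** 32 - 1 needs ten digits, so `n` comes back as 12 and the assert fires before the out-of-bounds write.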

stinos · 2019-12-17T08:51:44Z

lib/mp-readline/readline.c

+            break;
+        }
+
+        pos += forward ?: forward - 1;


This doesn't look right? Also it looks like you chose to make forward an int just to be able to use forward - 1, but its real meaning is that it's used as a boolean. But then you add that int to a size_t, which isn't ideal anyway. In fact, if forward is 0 I'm not sure what is going to happen here?

Yes, forward is indeed used as kind of a boolean here, but the 1/0 representation allows producing smaller code, -8 bytes on stm32 this way (which is disappointing, I was hoping it'd be optimized correctly).

It's legit to add -1 to a size_t. As long as it doesn't underflow... (Which can't happen here due to the loop checks). On ARM, it boils down to add r4, r8 with r8=0 or r8=0xffffffff.

Actually I was also talking about the ?:. I didn't know it, turns out it's a GNU extension and so won't compile with MSVC and possibly other compilers so you should just write it out.

Uh, you're right, I keep forgetting MP compiles w/ MSVC as well. It's indeed a GNU extension. I'll change.
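A portable way to write the step out, as a sketch: `a ?: b` is the GNU shorthand for `a ? a : b`, so the quoted line adds 1 when `forward` is 1 and -1 when it is 0. The helper name `step_pos` is hypothetical and not part of the PR.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical helper: move a cursor position one character right
// (forward == 1) or one character left (forward == 0).
static size_t step_pos(size_t pos, int forward) {
    // GNU-only form from the diff: pos += forward ?: forward - 1;
    // stepping left from 0 would wrap around, so the caller must rule it out
    assert(forward || pos > 0);
    // standard C: adding (size_t)-1 wraps modulo 2^N, i.e. subtracts one
    pos += forward ? (size_t)1 : (size_t)-1;
    return pos;
}
```

Unsigned arithmetic is defined to wrap, so the addition itself is well defined; the assert only guards against the result being a nonsensical position.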

stinos · 2019-12-17T08:54:15Z

lib/mp-readline/readline.c

+    size_t pos = rl.cursor_pos;
+    bool in_word = false;
+
+    while ((forward || 0 < pos) && (!forward || pos < vstr_len(rl.line))) {


For me personally, this is just overly succinct and not super readable. Not sure what others think, but both for maintenance and clarity I'd rather have code which speaks for itself instead of having to manually parse 'oh, this actually just means if (forward) then X else Y'.

I rewrote this expression more nicely with comments. But it does have to remain somewhat the same to keep the code small.

Since you rewrote the expression you don't really need the comments anymore, since the code now is very clear :)
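The rewritten expression is not quoted in the thread. As a sketch of the idea only, the condensed condition from the diff is equivalent to a per-direction bound check. Both function names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Spelled-out form: a forward scan may continue until the end of the line,
// a backward scan until the start of the line.
static bool can_step(size_t pos, size_t len, bool forward) {
    assert(pos <= len);
    return forward ? pos < len : pos > 0;
}

// Condensed form, as quoted from the diff.
static bool can_step_condensed(size_t pos, size_t len, bool forward) {
    return (forward || 0 < pos) && (!forward || pos < len);
}
```

When `forward` is true the first clause is always satisfied and only `pos < len` matters; when it is false the second clause is always satisfied and only `0 < pos` matters, which is why the two forms agree.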

Jongy · 2019-12-17T22:27:44Z

The failing check is "code size increased for ports/minimal CROSS=1" but I can't reproduce it locally :( Perhaps it's checking against a stale file?

Anyway, I think it's ready.

dpgeorge · 2019-12-18T05:12:23Z

> The failing check is "code size increased for ports/minimal CROSS=1" but I can't reproduce it locally :( Perhaps it's checking against a stale file?

Don't worry about that, it sometimes increases size spuriously.

dpgeorge · 2020-01-12T02:13:00Z

Thanks @Jongy for a well-written PR with tests. I've rebased and merged it in 853aaa0. I made some minor changes during the rebase:

- while (1) -> for (;;) to match existing code style
- rename config MICROPY_REPL_EXTRA_WORDS_MOVE to MICROPY_REPL_EMACS_EXTRA_WORDS_MOVE to signify that it's related to the EMACS option
- enabled the feature on unix micropython-coverage and micropython-dev executables (not the standard one)

Jongy · 2020-01-12T06:43:24Z

Cool, Thanks :) IIRC I named MICROPY_REPL_EXTRA_WORDS_MOVE this way because these extra keys are not related to emacs (while the Alt+ are indeed emacs-based). But since it depends on the emacs keys option, perhaps it's better named this way to show it.

dpgeorge · 2020-01-12T14:08:24Z

> IIRC I named MICROPY_REPL_EXTRA_WORDS_MOVE this way because these extra keys are not related to emacs (while the Alt+ are indeed emacs-based). But since it depends on the emacs keys option, perhaps it's better named this way to show it.

I see. But since the "extra" refers to additional keys on top of the standard EMACS ones, I think it's best to have the word EMACS in the config name. Otherwise it could be something like MICROPY_REPL_ALTERNATIVE_WORDS_MOVE to emphasise it's a different set of key bindings for word movement/deletion. If you think that's a better name we can change it.

Jongy · 2020-01-12T18:38:55Z

> I see. But since the "extra" refers to additional keys on top of the standard EMACS ones, I think it's best to have the word EMACS in the config name. Otherwise it could be something like MICROPY_REPL_ALTERNATIVE_WORDS_MOVE to emphasise it's a different set of key bindings for word movement/deletion. If you think that's a better name we can change it.

A matter of taste in the little bits. However, since the comments on both options are extensive, it doesn't matter that much, let it be :)

robert-hh · 2020-01-12T19:23:17Z

@Jongy Thank you for the commit. It is very useful. Several times a day I used to push Ctrl-Left or Ctrl-Right for moving, which did not work. Now it does.
Besides that, I do not understand why Ctrl-Delete is more difficult to implement. It follows the structure of Ctrl-Left \e[1;5D and Ctrl-Right \e[1;5C with its coding: \e[3;5~

Jongy · 2020-01-12T22:51:55Z

> Thank you for the commit. It is very useful. Several times a day I used to push Ctrl-Left or Ctrl-Right for moving, which did not work. Now it does.

I'm glad you liked it @robert-hh :) It's indeed very useful. I can't recall how I ever used the REPL without having Ctrl+Right/Ctrl+Left.

The new escape codes I added were the first codes handled in this state machine that have multiple parameters separated by ;. To keep things simple, I tried to make as few changes to the state machine as possible, so the state machine accepts 1; as the first parameter, forgets it and resets back to the state of "waiting for parameter".

If we want it to accept 3; as the first parameter as well, then, to avoid the incorrect collisions we may now get (such as \e[3;5C parsed as Ctrl+Right), I think the state machine must be updated to remember the first parameter as well and act accordingly.
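A rough sketch of what "remember the first parameter" could look like. This is not the readline state machine, which consumes one character at a time: `decode_csi`, the `enum key` values, and decoding a whole string in one call are all assumptions made to keep the example short.

```c
#include <stddef.h>

// Hypothetical key codes for this sketch only.
enum key { KEY_NONE, KEY_CTRL_LEFT, KEY_CTRL_RIGHT, KEY_CTRL_DELETE };

// Decode the part of an escape sequence that follows "\e[", e.g. "1;5D".
static enum key decode_csi(const char *s) {
    unsigned params[2] = {0, 0};
    size_t n = 0; // index of the parameter currently being read
    for (; *s != '\0'; s++) {
        if (*s >= '0' && *s <= '9') {
            params[n] = params[n] * 10 + (unsigned)(*s - '0');
        } else if (*s == ';') {
            if (++n == 2) {
                return KEY_NONE; // more parameters than this sketch handles
            }
        } else {
            break; // first non-digit, non-';' byte is the final byte
        }
    }
    if (n != 1 || params[1] != 5) {
        return KEY_NONE; // only the two-parameter Ctrl forms from the thread
    }
    // Both the first parameter and the final byte must match, so
    // "3;5C" is rejected instead of being taken for Ctrl+Right.
    if (params[0] == 1 && *s == 'D') {
        return KEY_CTRL_LEFT;
    }
    if (params[0] == 1 && *s == 'C') {
        return KEY_CTRL_RIGHT;
    }
    if (params[0] == 3 && *s == '~') {
        return KEY_CTRL_DELETE;
    }
    return KEY_NONE;
}
```

The cost is the extra stored parameter and the extra comparisons, which is the kind of state-machine growth described above.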

So Ctrl+Delete wasn't all that important IMO to require this change 🤷‍♂️ I'm not a huge fan of it anyway since it's not supported out-of-the-box by all terminal/shell configurations.

robert-hh · 2020-01-13T06:45:05Z

> it's not supported out-of-the-box by all terminal/shell configurations.

That is a problem for many of these function keys. PuTTY for instance does not forward the Ctrl-Cursor keys.

py/unicode: Add unichar_isalnum().

79b08ad

Jongy force-pushed the readline-word-sequences branch from 15df4f8 to 71d7471 Compare December 17, 2019 00:17

Jongy force-pushed the readline-word-sequences branch from 71d7471 to 5289f41 Compare December 17, 2019 00:20

Jongy added 2 commits December 18, 2019 00:09

readline: Add an assert() to catch buffer overflows.

5352e1e

During readline development, this function may receive bad `pos` values. It's easier to understand the assert() failing error than to have a "stack smashing detected" message.

readline: Add backward-word, backward-kill-word, forward-word, forward-kill-word sequences.

c1a0fd7

Jongy force-pushed the readline-word-sequences branch from 5289f41 to c1a0fd7 Compare December 17, 2019 22:11

dpgeorge added the enhancement Feature requests, new feature implementations label Dec 18, 2019

Jongy changed the title from "WIP: Readline word sequences" to "Readline word sequences" Dec 20, 2019

Jongy mentioned this pull request Jan 1, 2020

RFC: Linux kernel port #5482

Open

dpgeorge closed this Jan 12, 2020

Jongy deleted the readline-word-sequences branch January 12, 2020 06:43

Jongy mentioned this pull request Jan 12, 2020

lib/mp-readline/readline.c: Added backward/forward/kill-word support. #5024

Closed
