Skip to content

Readline word sequences #5420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

Jongy
Copy link
Contributor

@Jongy Jongy commented Dec 15, 2019

Adds an option to build MicroPython's readline with emacs-style word move/kill sequences.
Also, an extra option to build with support for ctrl+left, ctrl+right and ctrl+w sequences is given.

When both options are enabled on stm32, increases code size by 324 bytes.

Redo of #5024.

TODO: Complete the tests. Also, perhaps optimize the function further? I've based my implementation on GNU readline so I'm feeling quite comfortable it really works in all specific edge cases.

@dpgeorge
Copy link
Member

I've based my implementation on GNU readline

That sounds a bit dangerous from a licensing point of view. What do you mean by this, did you copy code, or the algorithm?

@Jongy
Copy link
Contributor Author

Jongy commented Dec 16, 2019

That sounds a bit dangerous from a licensing point of view. What do you mean by this, did you copy code, or the algorithm?

Just based on the algorithm. Code can't really be copied from readline since it does other things as well (handles multibyte characeters, ...)

@dpgeorge
Copy link
Member

Just based on the algorithm.

Even copying the algorithm may lead to licensing issues. I want to be careful here.

@Jongy
Copy link
Contributor Author

Jongy commented Dec 16, 2019

Even copying the algorithm may lead to licensing issues. I want to be careful here.

Okay. I think that anyway it's better to use a more optimized version of this function. I'll just rewrite it.

return pos;
}

pos--; // this also ensures pos <= line length
Copy link
Contributor

@stinos stinos Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this also ensures pos <= line length

I assume you added this comment to explain why you don't use an assert like above. Things is, the comment is only correct under the assumption that cursor pos never crosses line length. Which it probably should never do. But that's actually another reason to use the assert: make sure the calling code behaves, makes sure this code doesn't access out of bounds data, and a correct way of commenting expected behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, without this decrement you'd have to check it in runtime. So by this comment I wanted to remind the reader why we don't need to if it here.

But you are absolutely right about the use of asserts. I'll add one here as well.

#if MICROPY_REPL_EMACS_WORDS_MOVE
STATIC size_t cursor_count_forward_word(size_t cursor_pos, vstr_t *line) {
size_t pos = rl.cursor_pos;
const char *line_buf = vstr_str(line);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this to where it actually gets used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean? To move it to the first line where it's used in unichar_isalnum?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's clearer (to me at least)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, moved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now after optimizing those function, it's organized differently.

}
}

if (pos == rl.line->len) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It';s confusing to sometimes use ->len and other times vstr_len

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's my miss. Will update to vstr_len.

@@ -221,9 +291,43 @@ int readline_process_char(int c) {
case 'O':
rl.escape_seq = ESEQ_ESC_O;
break;
#if MICROPY_REPL_EMACS_WORDS_MOVE
// on stm32, it compiles better when passing parameters to cursr_count_*_word from here.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiling either succeeds or doesn't, so i'm not sure what you mean with 'better' here? Does it produce more optimal code or so? Does that make a runtime difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By better I meant smaller code size. I didn't actually check what has changed - I don't think it matters since this piece of code runs on user interaction, not in the core of MP, and it's not in the hot path of anything.

I'll clarify the comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed since it's incorrect now :/

@@ -0,0 +1,9 @@
# word movement
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't cover all cases I think, i.e the cases where you return early because of reaching the end of the line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it doesn't, still WIP :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now it covers, I think

@stinos
Copy link
Contributor

stinos commented Dec 16, 2019

Doesn't work correctly for me, but I didn't check if the bug is in your code or elsewhere. On unix the following reproduces one of the prioblems for me:

  • start MicroPython
  • type abc = def followed by enter
  • up arrow to get that line back
  • Alt-F, does nothing, as expected
  • Alt-B incorrectly jumps past beginning of line to the second of the >>> characters

This is just one of the combinations, there's a bunch of problems where if you mix enough commands everything goes berserk.

@Jongy
Copy link
Contributor Author

Jongy commented Dec 16, 2019

Doesn't work correctly for me, but I didn't check if the bug is in your code or elsewhere. On unix the following reproduces one of the prioblems for me:

  • start MicroPython
  • type abc = def followed by enter
  • up arrow to get that line back
  • Alt-F, does nothing, as expected
  • Alt-B incorrectly jumps past beginning of line to the second of the >>> characters

This is just one of the combinations, there's a bunch of problems where if you mix enough commands everything goes berserk.

Yup, reproduced. Found the problem pretty quickly - cursor_count_forward_word returned pos instead of 0 when the cursor was already at the end of the line. One of the leftovers from my debugging and different structuring for this function.

@Jongy
Copy link
Contributor Author

Jongy commented Dec 16, 2019

Yup, reproduced. Found the problem pretty quickly - cursor_count_forward_word returned pos instead of 0 when the cursor was already at the end of the line. One of the leftovers from my debugging and different structuring for this function.

Added a specific test-case for this bug

@Jongy Jongy force-pushed the readline-word-sequences branch from 15df4f8 to 71d7471 Compare December 17, 2019 00:17
@Jongy
Copy link
Contributor Author

Jongy commented Dec 17, 2019

Okay, updated with fixes + full tests + rewriting the functions for better optimization, and to remove any risk of copyright infringement as @dpgeorge has been concerned.

I think this test file is one of the most complex test files I've written. readline devs, I salute you, this is one tough product to compose tests for.

Size increase with both options for stm32 is 284 bytes. With the emacs options only, it's 256 bytes.
This can be optimized even further by 36 bytes if unichar_isalnum is inlined. But that's another change (I think it's weird to inline unichar_isalnum but keep the rest as they are)

I've enabled both options by default for unix coverage build, and also for unix, because I really think the extra convenience is worth it...

@Jongy Jongy force-pushed the readline-word-sequences branch from 71d7471 to 5289f41 Compare December 17, 2019 00:20
@@ -74,6 +74,7 @@ STATIC void mp_hal_move_cursor_back(uint pos) {
// snprintf needs space for the terminating null character
int n = snprintf(&vt100_command[0], sizeof(vt100_command), "\x1b[%u", pos);
if (n > 0) {
assert(n < sizeof(vt100_command));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might have to go in a separate commit, plus it's not going to compile with all warnings enabled because of comparision between signed/unsigned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll move to a separate commit.

It compiles with asserts enabled, I've used it to discover one of the bugs (where pos would underflow and this function will actually receive 2 ** 32 - 1). The assert was raised in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It compiles with asserts enabled

Well it indeed doesn't compile on the coverage build! I've added a cast to unsigned.
It's funny we get this warning because it's in an n > 0 block, so I don't see any signed/unsigned conversion problem that might occur.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because it's in an n > 0 block

hehe yes, but compilers do not follow that reasoning, luckily

break;
}

pos += forward ?: forward - 1;
Copy link
Contributor

@stinos stinos Dec 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right? Also it looks like you chose to make forward an int just to be able to use forward - 1, but it's real meaning is it's used as a boolean. But then you add that int to a size_t which isn't ideal anyway. In fact, if forward is 0 I'm nout sure what is going to happen here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, forward is indeed used as kind of a boolean here, but the 1/0 representation allows producing smaller code, -8 bytes on stm32 this way (which is disappointing, I was hoping it'd be optimized correctly).

It's legit to add -1 to a size_t. As long as it doesn't underflow... (Which can't happen here due to the loop checks). On ARM, it boils down to add r4, r8 with r8=0 or r8=0xffffffff.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I was also talking about the ?:. I didn't know it, turns out it's a GNU extension and so won't compile with MSVC and possibly other compilers so you should just write it out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh, you're right, I keep forgetting MP compiles w/ MSVC as well. It's indeed a GNU extension. I'll change.

size_t pos = rl.cursor_pos;
bool in_word = false;

while ((forward || 0 < pos) && (!forward || pos < vstr_len(rl.line))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me personally, this is just overly succinct and not super readable. Not sure what others think, but both for mainainance and clarity I'd rather have code which speaks for itself instead of having to manually parse 'oh, this actually just means if(forward) then X else Y

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rewrote this expression more nicely with comments. But it does have to remain somewhat the same to keep the code small..

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you rewrote the expression you don't really need the comments anymore, since the code now is very clear :)

During readline development, this function may receive bad `pos` values.
It's easier to understand the assert() failing error than to have
a "stack smashing detected" message.
@Jongy Jongy force-pushed the readline-word-sequences branch from 5289f41 to c1a0fd7 Compare December 17, 2019 22:11
@Jongy
Copy link
Contributor Author

Jongy commented Dec 17, 2019

The failing check is "code size increased for ports/minimal CROSS=1" but I can't reproduce it locally :( Perhaps it's checking against a stale file?

Anyway, I think it's ready.

@dpgeorge
Copy link
Member

The failing check is "code size increased for ports/minimal CROSS=1" but I can't reproduce it locally :( Perhaps it's checking against a stale file?

Don't worry about that, it sometimes increases size spuriously.

@dpgeorge dpgeorge added the enhancement Feature requests, new feature implementations label Dec 18, 2019
@Jongy Jongy changed the title WIP: Readline word sequences Readline word sequences Dec 20, 2019
@Jongy Jongy mentioned this pull request Jan 1, 2020
8 tasks
@dpgeorge
Copy link
Member

Thanks @Jongy for a well-written PR with tests. I've rebased and merged it in 853aaa0. I made some minor changes during the rebase:

  • while (1) -> for (;;) to match existing code style
  • rename config MICROPY_REPL_EXTRA_WORDS_MOVE to MICROPY_REPL_EMACS_EXTRA_WORDS_MOVE to signify that it's related to the EMACS option
  • enabled the feature on unix micropython-coverage and micropython-dev executables (not the standard one)

@dpgeorge dpgeorge closed this Jan 12, 2020
@Jongy
Copy link
Contributor Author

Jongy commented Jan 12, 2020

Cool, Thanks :) IIRC I named MICROPY_REPL_EXTRA_WORDS_MOVE this way because these extra keys are not related to emacs (while the Alt+ are indeed emacs-based). But since it depends on the emacs keys option, perhaps it's better named this way to show it.

@dpgeorge
Copy link
Member

IIRC I named MICROPY_REPL_EXTRA_WORDS_MOVE this way because these extra keys are not related to emacs (while the Alt+ are indeed emacs-based). But since it depends on the emacs keys option, perhaps it's better named this way to show it.

I see. But since the "extra" refers to additional keys on top of the standard EMACS ones, I think it's best to have the word EMACS in the config name. Otherwise it could be something like MICROPY_REPL_ALTERNATIVE_WORDS_MOVE to emphasise it's a different set of key bindings for word movement/deletion. If you think that's a better name we can change it.

@Jongy
Copy link
Contributor Author

Jongy commented Jan 12, 2020

I see. But since the "extra" refers to additional keys on top of the standard EMACS ones, I think it's best to have the word EMACS in the config name. Otherwise it could be something like MICROPY_REPL_ALTERNATIVE_WORDS_MOVE to emphasise it's a different set of key bindings for word movement/deletion. If you think that's a better name we can change it.

A matter of taste in the little bits. However, since the comments on both options are extensive, it doesn't matter that much, let it be :)

@robert-hh
Copy link
Contributor

@Jongy Thank you for the commit. It is very useful. Several times a day if used to push Ctrl-Left or Ctrl-Right for moving, which did not work. Now it does.
Besides that, I do not understand why Ctr-Delete is more difficult to implement. It follows the structure of Ctrl-Left \e[1;5D and Ctrl-Right \e[1;5C with it's coding: \e[3;5~

@Jongy
Copy link
Contributor Author

Jongy commented Jan 12, 2020

Thank you for the commit. It is very useful. Several times a day if used to push Ctrl-Left or Ctrl-Right for moving, which did not work. Now it does.

I'm glad you liked it @robert-hh :) It's indeed very useful. I can't recall how I ever used the REPL without having Ctrl+Right/Ctrl+Left.

The new escape codes I added were the first codes handled in this state machine that have multiple parameters separated by ;. To keep things simple, I tried to make as few changes in the state machine as possible , so the state machine accepts 1; as the first parameter, forgets it and resets back to the state of "waiting for parameter".

If we want it to accept 3; as the first parameter as well, then to avoid possible incorrect collisions we now may get (such as \e[3;5C parsed as Ctrl+Right). I think the state machine must be updated to remember the first parameter as well and act accordingly.

So Ctrl+Delete wasn't all that important IMO to require this change 🤷‍♂️ I'm not a huge fan of it anyway since it's not supported out-of-the-box by all terminal/shell configurations.

@robert-hh
Copy link
Contributor

it's not supported out-of-the-box by all terminal/shell configurations.
That is a problem for many of these function keys. Putty for instance does not forward the Ctrl-Cursor keys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Feature requests, new feature implementations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants