Skip to content

ESP32 RMT: two enhancement suggestions #7015

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
peterhinch opened this issue Mar 11, 2021 · 19 comments
Open

ESP32 RMT: two enhancement suggestions #7015

peterhinch opened this issue Mar 11, 2021 · 19 comments

Comments

@peterhinch
Copy link
Contributor

These stem from my use of RMT for its original purpose. My "blaster" libraries for 433Mhz and IR remote controls support Pyboard, RP2 and ESP32. In each case ESP32 RAM use was larger than on Pyboard and RP2. This could be fixed with two non-breaking changes.

  1. RMT.write_pulses(pulses, start) Allow pulses to be any object supporting the buffer protocol. This would allow half-word arrays or bytearrays, saving RAM. This behaviour would be similar to that of other MicroPython interface classes.
  2. Provide a mechanism for looping N times. 433MHz remotes typically send the same set of pulses N times in quick succession, where N is 5-10. Currently I do this (on ESP32 only) by copying. Ugly and wasteful.

I can see two alternatives for doing N times looping.

  1. Add a count=None arg to .write_pulses() or even .loop().
  2. More flexibly, provide a callback to .write_pulses(). The callback would take a single arg being the RMT instance, and would run on completion of the output. This would enable the user to implement a count (typically by passing a method as the callback). This would be my preferred solution because of its flexibility. Complex modulation schemes could be implemented efficiently in a nonblocking fashion.
    My first thought was a callback=None arg to .write_pulses() but this poses a recursion issue. A new .done_callback(cb=None) method would be better, with the default removing any current callback.
@alexeckermann
Copy link

Allow pulses to be any object supporting the buffer protocol.

This would be a great change to the RMT interface. For driving WS2811 LED's on RMT speed is everything. My current approach is to calculate the pulses and store them as lines of bytes in a file. For each animation tick I take that frames (line) bytes and transform them into a list/tuple. This transform step takes time (+ object memory) and impacts the achievable FPS of an animation. I think that C may be better and transforming bytes on the fly than having to create tuple/list objects just to suffice the interface.

That said, for me this project has been on hold for some time and I need to give it a go with the updated firmware (project currently using 1.12) to see if there has been some incremental improvement.

@Andrei-Pozolotin
Copy link

Andrei-Pozolotin commented Oct 4, 2021

now, make it 3 enhancement suggestions :-)

this is inspired by stepper motor use case from
https://forum.micropython.org/viewtopic.php?p=42691#p42659

REQUEST No 3:

please make RMT.clock_div() updatable via rmt_set_clk_div()

https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/rmt.html#change-operation-parameters

USE CASE:

to accelerate/decelerate a stepper motor, RMT.write_pulses() would re-use same static pulse pattern,
by making this pattern fractal/self-similar, up to a clock frequency scale factor
and then update clock scale between RMT.write_pulses() invocations
by using rmt_set_clk_div()

@Andrei-Pozolotin
Copy link

Andrei-Pozolotin commented Apr 15, 2022

RE: #7015

GIST: https://gist.github.com/Andrei-Pozolotin/c4b3dd041efe53cfb92cfb4de9c67267

Here is a working experimental implementation. Basic approach is:

  • use piecewise interpolation for stepper motor pulse profile
  • persist interpolation segments once, inside esp32.RMT.items store
  • start sending pulses from items store with rmt.h | rmt_set_clk_div() and rmt.h | rmt_write_items()
  • continue sending by switching between pulse patterns with rmt.h | rmt_register_tx_end_callback() system ISR

this is accomplished with addition of 2 functions to (a copy with changes:) ports/esp32/esp32_rmt.c

ports/esp32/esp32_rmt.c
// RMT.store_pulses(self, item_list:list[int]) -> None
STATIC mp_obj_t esp32_rmt_store_pulses(size_t n_args, const mp_obj_t *args) {
    esp32_rmt_obj_t *self = MP_OBJ_TO_PTR(args[0]);
    mp_obj_t item_list_obj = args[1];

    size_t num_items = 0;
    mp_obj_t *item_list_ptr = NULL;
    mp_obj_get_array(item_list_obj, &num_items, &item_list_ptr);

    if (num_items > self->num_items) {
        self->items = (rmt_item32_t *)m_realloc(self->items, num_items * sizeof(rmt_item32_t *));
        self->num_items = num_items;
    }
    
    for (mp_uint_t item_index = 0; item_index < num_items; item_index++) {
        self->items[item_index].val = mp_obj_get_int(item_list_ptr[item_index]);
    }
    
    return mp_const_none;
}
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(esp32_rmt_store_pulses_obj, 2, 2, esp32_rmt_store_pulses);
// RMT.issue_pulses(self, tx_ready_func:callable, item_index:int, item_count:int, clock_div:int) -> None
STATIC mp_obj_t esp32_rmt_issue_pulses(size_t n_args, const mp_obj_t *args) {
    esp32_rmt_obj_t *self = MP_OBJ_TO_PTR(args[0]);
    mp_obj_t *tx_ready_fn = MP_OBJ_TO_PTR(args[1]);
    mp_uint_t item_index = mp_obj_get_int(args[2]);
    mp_uint_t item_count = mp_obj_get_int(args[3]);
    self->clock_div = mp_obj_get_int(args[4]);

    check_esp_err(rmt_set_clk_div(self->channel_id, self->clock_div));
    
    rmt_register_tx_end_callback(esp32_rmt_private_tx_end_callback, tx_ready_fn);
    
    check_esp_err(rmt_write_items(self->channel_id, self->items + item_index, item_count, false));
    
    return mp_const_none;
}
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(esp32_rmt_issue_pulses_obj, 5, 5, esp32_rmt_issue_pulses);

and the following micropython user code:

  • class Planner servo-like motion planner for nema-based stepper driver
  • class Ringer isr-safe ring buffer to store stepper micro-commands
  • class Stepper servo-like nema-based stepper driver based on esp32.RMT driver

with core functionality expressed in:
Stepper.make_rotation_one()

def make_rotation_one(self) -> None:
    def make_rotation_one(self) -> None:
        "accelerate then decelerate during single drive turn"
        plan_one = self.planner.make_plan_rotate(+1, 0.5)
        plan_two = self.planner.make_plan_rotate(-1, 0.5)
        plan_full = plan_one + plan_two + [PLAN_CMD_STOP]
        # print(f"{plan_full=}")
        self.persist_motion_plan(plan_full)
        self.transmit_reactor()  # start transmit with isr self call

and

Stepper.transmit_reactor()

def transmit_reactor(self):
    def transmit_reactor(self):
        """
        function to execute stepper micro-program:
        * invoked by self "make rotation" methods for motion start
        * invoked by esp32.RMT driver "transmit ready" ISR to continue motion
        note: use isr rules https://docs.micropython.org/en/latest/reference/isr_rules.html
        """
        buffer = self.ringer
        header = buffer.ring_get()  # micro-command prefix

        if header == PLAN_CMD_RAMP:
            # micro-command parameters
            item_index = buffer.ring_get()
            item_count = buffer.ring_get()
            clock_div = buffer.ring_get()
            # start pulse sequence form rmt driver store
            self.rmt_driver.issue_pulses(# non blocking invocation
                self.transmit_function,  # this is isr setup to self
                item_index,  # place in the store
                item_count,  # pulse ramp block size
                clock_div,  # frequency for this block
            )
            self.event_cmd_ramp.set()
            return

        if header == PLAN_CMD_STOP:
            self.event_cmd_stop.set()
            return

        self.event_cmd_trap.set()  # should not happen

Wind-stormger pushed a commit to BPI-STEAM/micropython that referenced this issue Oct 13, 2022
This module has not been built in years, since the (removed) esp8266 port.
Delete the code, as it is not likely to be useful in its current form.

Closes: micropython#7015
Andrei-Pozolotin added a commit to random-builder/micropython that referenced this issue Oct 20, 2022
Andrei-Pozolotin added a commit to random-builder/micropython that referenced this issue Oct 20, 2022
Andrei-Pozolotin added a commit to random-builder/micropython that referenced this issue Oct 20, 2022
See micropython#7015. This adds 2 new functions to ports/esp32/esp32_rmt.c:
// RMT.store_pulses(self, item_list:list[int]) -> None
// RMT.issue_pulses(self, tx_ready_func:callable, item_index:int,
item_count:int, clock_div:int) -> None
@peterhinch
Copy link
Contributor Author

peterhinch commented Oct 21, 2022

The reference to isr_rules in the code comments suggests a misconception: the rules apply to hard ISR's and the ESP32 supports only soft ones. I'm unsure if this has implications for what you are doing, but ISR latency on ESP32 can be long. Up to 100ms if a GC is interrupted on hardware with SPIRAM.

@jonathanhogg
Copy link
Contributor

Thinking about this, I think the real problem is that the RMT hardware needs the pulse sequence formatted in a specific way in memory (32-bit structs containing two pulses of 15-bit output period and 1-bit output value). The current implementation converts the .write_pulses() arguments into an internal buffer formatted this way.

Perhaps the solution to the memory problems is to expose this safely as an object that supports the array/buffer protocols. For instance as a new RMTPulseArray that can be constructed from a list or built piecewise, a la:

from esp32 import RMT, RMTPulseArray

pulses1 = RMTPulseArray([(100, 1), (50, 0), (50, 1), (100, 0)])

pulses2 = RMTPulseArray()
pulses2.append((100, 1))
pulses2.append((50, 0))
pulses2.append((50, 1))
pulses2.append((100, 0))

Not necessarily wedded to the tuple argument to .add() here, but I'm thinking of symmetry with list.

Then you dispatch these arrays with:

rmt = RMT(0, pin=Pin(2), clk_div=80)

rmt.write(pulses1)
rmt.write(pulses2)

This way you can create as many of these pulse arrays as you need and retain them efficiently. To retain API compatibility, .write_pulses() could just create one of these arrays internally and then dispatch it.

Add an .irq(callback) and an .init(pin=p, clk_div=n, loop=b) to match other hardware APIs and I think you'd cover the use cases described here. I'd be happy to do a broad refactor but I'm up against it with other projects at the moment so it'd have to wait a month or two.

@mattytrentini is the original author of esp32.RMT and so may have some good/better thoughts.

[Also worth remembering that esp32 machine.bitstream() for talking to WS281x LEDs is layered on top of RMT in a very low-level way.]

@Andrei-Pozolotin
Copy link

@peterhinch : RE:

The reference to isr_rules in the code comments suggests a misconception: the rules apply to hard ISR's and the ESP32 supports only soft ones. I'm unsure if this has implications for what you are doing, but ISR latency on ESP32 can be long. Up to 100ms if a GC is interrupted on hardware with SPIRAM.

  1. thank you for pointing this out

  2. my current understanding is that these are "hard isr" coming form underlying rtos api:
    https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/peripherals/rmt.html#register-event-callbacks
    which are used to forward events to the micropython user code via:

https://github.com/micropython/micropython/pull/9685/files#diff-fe463d5375992959d11e8cce2b48a6164ff3c6d2ac606f7520082325d4081873R367

// called from esp32 RMT system ISR provided by rmt_driver_install()
STATIC void esp32_rmt_private_tx_end_callback(rmt_channel_t channel, void *arg)
  1. in my tests so far I do not see unexpected timing jitters in generated pulse patterns

@Andrei-Pozolotin
Copy link

@jonathanhogg : RE:

I think the real problem is that the RMT hardware needs the pulse sequence formatted in a specific way in memory

  1. for the "stepper-motor" - like use case, that does not matter: however slow and inconvenient rmt_item::int32 array/list preparation could be, it would be calculated only once, ahead of time, to create and store inside RMT driver the "whole repertoire of pulse sequences", to be reused for the duration of the application

The current implementation converts the .write_pulses() arguments into an internal buffer formatted this way.

  1. it seems that any implementation of .write_pulses() which uses on-the-fly conversion from any user format to the internal rmt_item::int32 format will invariably introduce long in-between-writes delays that would make it unusable for generation of continuous, non-trivial patterns beyond simple loops

@jonathanhogg
Copy link
Contributor

  1. it seems that any implementation of .write_pulses() which uses on-the-fly conversion from any user format to the internal rmt_item::int32 format will invariably introduce long in-between-writes delays that would make it unusable for generation of continuous, non-trivial patterns beyond simple loops

Which is why I suggested that the API is extended to allow pre-generation of the buffers (RMTPulseArray()) and then issuing those (RMT.write()).

What I'm proposing is a more flexible version of your RMT.store_pulses() idea, allowing multiple buffers to be constructed and stored (at maximum efficiency) in memory and then issued as necessary.

This could support your use case of only issuing a subset of the generated pulse array by implementing efficient (shared memory) slicing of RMTPulseArray, i.e.:

rmt.write(pulses1[2:])

Alternatively…

A simpler API would be to just have RMT.write() take a bytes-like/memoryview object and then just hand that as-is to the RMT hardware. Add an RMT.create_pulses() method with the same signature as RMT.write_pulses() and have it return a bytes object formatted appropriately. Then RMT.write_pulses(...) would become a synonym for RMT.write(RMT.create_pulses(...))

I'm not a giant fan of passing unknown lumps of memory to the RMT hardware, but it would allow direct creation and modification of the buffers via other low-level mechanisms (like a custom C module or ulab maybe).

@Andrei-Pozolotin
Copy link

@jonathanhogg : RE:

A simpler API would be to just have RMT.write() take a bytes-like/memoryview object and then just hand that as-is to the RMT hardware

great idea, thank you. my vote goes to this approach.

@peterhinch
Copy link
Contributor Author

@Andrei-Pozolotin I was referring to the Python code in transmit_reactor(): the C code you cite does appear to be a hard ISR. My understanding is that Python ISR's are soft on ESP32 but it would be good to have confirmation from a maintainer.

@Andrei-Pozolotin
Copy link

@dpgeorge Damien:

can you please help us get un-confused about soft-isr vs hard-isr in the scenario proposed by #9685

specifically, expected code sequence around rmt isr event processing is a follows:

  1. register a c-level call back with rmt driver via:
rmt_register_tx_end_callback()

rmt_register_tx_end_callback(esp32_rmt_private_tx_end_callback, tx_ready_fn);

// RMT.issue_pulses(self, tx_ready_func:callable, item_index:int, item_count:int, clock_div:int) -> None
STATIC mp_obj_t esp32_rmt_issue_pulses(size_t n_args, const mp_obj_t *args) {
    esp32_rmt_obj_t *self = MP_OBJ_TO_PTR(args[0]);
    mp_obj_t *tx_ready_fn = MP_OBJ_TO_PTR(args[1]);
    mp_uint_t item_index = mp_obj_get_int(args[2]);
    mp_uint_t item_count = mp_obj_get_int(args[3]);
    self->clock_div = mp_obj_get_int(args[4]);

    check_esp_err(rmt_set_clk_div(self->channel_id, self->clock_div));

    rmt_register_tx_end_callback(esp32_rmt_private_tx_end_callback, tx_ready_fn);

    check_esp_err(rmt_write_items(self->channel_id, self->items + item_index, item_count, false));

    return mp_const_none;
}
STATIC MP_DEFINE_CONST_FUN_OBJ_VAR_BETWEEN(esp32_rmt_issue_pulses_obj, 5, 5, esp32_rmt_issue_pulses);
  1. receive rmt "end of transmission" c-level isr event, lock gc, lock sheduler, and forward isr to micropython via:
esp32_rmt_private_tx_end_callback()

STATIC void esp32_rmt_private_tx_end_callback(rmt_channel_t channel, void *arg) {

STATIC void esp32_rmt_private_tx_end_callback(rmt_channel_t channel, void *arg) {

    void *state_past = mp_thread_get_state();
    mp_state_thread_t state_next;
    mp_thread_set_state(&state_next);

    mp_stack_set_top(&state_next + 1);
    mp_stack_set_limit(1024);

    mp_locals_set(mp_state_ctx.thread.dict_locals);
    mp_globals_set(mp_state_ctx.thread.dict_globals);

    mp_sched_lock();
    gc_lock();

    mp_obj_t *tx_ready_fn = (mp_obj_t *)arg;
    mp_call_function_0(tx_ready_fn);

    gc_unlock();
    mp_sched_unlock();
    mp_thread_set_state(state_past);

}
  1. process incoming rmt isr event in micropython user code function, such as:
Stepper.transmit_reactor()

https://gist.github.com/Andrei-Pozolotin/c4b3dd041efe53cfb92cfb4de9c67267#file-stepper-py-L106

    @micropython.native  # @UndefinedVariable
    def transmit_reactor(self):
        """
        function to execute stepper micro-program:
        * invoked by self "make rotation" methods for motion start
        * invoked by esp32.RMT driver "transmit ready" ISR to continue motion
        note: use isr rules https://docs.micropython.org/en/latest/reference/isr_rules.html
        """
        buffer = self.ringer
        header = buffer.ring_get()  # micro-command prefix

        if header == PLAN_CMD_RAMP:
            # micro-command parameters
            item_index = buffer.ring_get()
            item_count = buffer.ring_get()
            clock_div = buffer.ring_get()
            # start pulse sequence form rmt driver store
            self.rmt_driver.issue_pulses(# non blocking invocation
                self.transmit_function,  # this is isr setup to self
                item_index,  # place in the store
                item_count,  # pulse ramp block size
                clock_div,  # frequency for this block
            )
            self.event_cmd_ramp.set()
            return

        if header == PLAN_CMD_STOP:
            self.event_cmd_stop.set()
            return

        self.event_cmd_trap.set()  # should not happen

The code shown in snippets above seems to work fine,
but are we missing some important considerations, such as:

  • special rules for system locking?
  • and again, soft-isr or hard-isr rules here as it applies to esp32?
  • where are we to expect "the unexpected" time delays, and with what duration?

Thank you.

@dpgeorge
Copy link
Member

can you please help us get un-confused about soft-isr vs hard-isr in

On esp32, "hard ISR" Python callbacks are difficult. If you can get away with a soft callback then I suggest using that.

The differences are:

  • Hard Python callbacks will be synchronous, the callback will be called there-and-then in the C ISR handler. Python heap memory cannot be allocated in such a callback.
  • Soft Python callbacks will be asynchronous, the callback is registered as a pending callback (see mp_sched_schedule()) in the C ISR handler, then it is executed some time later by the VM.

The issue on esp32 with hard callbacks is that you must make sure the thread local state is correct. It may need an interlock to coordinate with other (Python) threads. See eg extmod/modbluetooth.c:invoke_irq_handler, when MICROPY_PY_BLUETOOTH_USE_SYNC_EVENTS_WITH_INTERLOCK is enabled.

https://github.com/micropython/micropython/blob/master/extmod/modbluetooth.c#L1261-L1294

@Andrei-Pozolotin
Copy link

@dpgeorge : thank you

@jonathanhogg @peterhinch : what do you think?

  1. it seems that we can not use soft-isr or we introduce pulse generation timing errors (was the whole point of the PR)
    and instead must use hard-isr the way it is proposed how in this PR 9685

  2. another option could be to avoid crossing c-level to micropython-level boundary altogether,
    and move rmt micro-command buffer and rmt micro-command parser from micropython-level into c-level,
    then we need to agree on the buffer internals and the command structure

@jonathanhogg
Copy link
Contributor

My goto on these things is always to look at existing implementations. For instance, Pin.irq() has a hard keyword argument, explained thusly:

hard if true a hardware interrupt is used. This reduces the delay between the pin change and the handler being called. Hard interrupt handlers may not allocate memory; see Writing interrupt handlers. Not all ports support this argument.

Supporting both would be consistent and allow the caller to make a decision about whether they need the timing guarantee of a hard interrupt or the flexibility of a soft one. Defaulting the argument to False stops the caller shooting themselves in the foot unless they mean to.

It's not a difficult implementation switch, you either dispatch the callback directly or pass it to the MPy scheduler.

@Andrei-Pozolotin
Copy link

@jonathanhogg : RE:

Supporting both would be consistent and allow the caller to make a decision

excellent approach, my vote goes to this, thank you.

@Andrei-Pozolotin
Copy link

@jonathanhogg : RE:

I'd be happy to do a broad refactor but I'm up against it with other projects at the moment so it'd have to wait

We really appreciate your willingness for "a broad refactor".

Before that bright future arrives, do you have bandwidth now to complete the review of PR 9685 and merge it?

The motivation is to have more people to play with ISR approach now and to share their feedback, i.e.:
Multiaxis stepper motors using RMT

@jonathanhogg
Copy link
Contributor

@Andrei-Pozolotin: I'm afraid that I'm not a core developer so I am unable to merge.

I don't have time for a full review, but taking a quick look at the code, my biggest problem with it is the lack of any bounds/error checking. What happens if I do an issue without having done a store? What happens if I pass an index or length that extends beyond the buffer?

Generally, it feels unready as an idea and an implementation to push to mainline.

@Andrei-Pozolotin
Copy link

@jonathanhogg : RE:

feels unready as an idea and an implementation to push to mainline

got it , thank you

@mattytrentini
Copy link
Contributor

@peterhinch I did want to support the buffer protocol but, as @jonathanhogg has pointed out, the memory had to be carefully laid out and available as a contiguous array when handing it over to the RMT methods. I did consider pre-allocating an internal array and unpacking the buffer but it was effectively doubling the memory.

It's also possible to use two channels; send one while you continue to unpack the next, then toggle...but it all became difficult and error prone - seemed best to provide a simple solution to start with.

However, we're only really scratching the surface of what we can do with RMT - it's a very flexible component! - and maybe it is time to re-think how we present the API. I'll try and find time to re-read the ESP-IDF docs again and see if I can help contribute with some improvements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants