-
-
Notifications
You must be signed in to change notification settings - Fork 8.4k
esp32/machine_uart: Fix leaks of machine.UART irq & event task. #17856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
@dpgeorge This might be the crash that you mentioned you were seeing randomly on ESP32 port, however I think it's probably a pre-existing issue in v1.25 so suggest we bump it to v1.27. (EDIT: Have confirmed this crash also happens in v1.25) @DvdGiessen As you've done some work on this driver I'd appreciate any input you might have. :) |
That's how I would have done it, it would be quite simple. Although I didn't consider the case where you recreate the same UART later on... is that worth worrying about? FWIW, some peripherals on some ports use the static object approach, while others use dynamic+finaliser (eg see zephyr's UART). (Hmm, esp32 I2S implementation seems to use finaliser and static objects...) |
I see... I looked at stm32 and rp2 and they're both static, and also saw that the extmod/machine_uart.c implementation hadn't added finaliser support yet - so I figured the "correct" way was static objects. However, I actually did create a version with the finaliser before changing directions - do you want me to submit that as a PR, instead? The other worry I had with the finaliser version was ordering during soft reset: the UART finaliser should also disable the rxidle timer instance, but |
I'm sometimes not sure myself whether static or finaliser is the "correct" way. They are both valid approaches. But, looking at the code again, I think we need to use static objects: if the user creates a UART, registers an IRQ, and then "forgets" the UART instance and it gets collected by the GC, the IRQ task will still run and access invalid data (the UART object is since reclaimed). That's really why we need static objects, so interrupt handlers can find the data, and so it doesn't get reclaimed. If the above bug exists, then I guess we must do it the static object way, ie this PR. |
I think if the finaliser is mapped to From what I can see most of the weird behaviour shows up if you create two objects for the same UART - IIUC the second constructor unregisters the first object's IDF driver and orphans its irq resources, and then if the second object also registers an irq and the first object is GCed then its finaliser would disable the second object's IRQ... |
Yes, I think you're right about that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read through it and looks good to me! Haven't yet had time to test it on hardware.
ports/esp32/machine_uart.c
Outdated
self->txbuf = 256; | ||
self->rxbuf = 256; // IDF minimum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not specific to this PR: Reading through it again I noticed it is kind of inefficient we don't already set these to the values passed in the args
object.
The TX/RX buffer size can only be changed by uninstalling and re-installing the driver, thus even if we pass the desired buffer size in the constructor of the UART object, we still end up first installing the driver with 256 byte buffers and then immediately uninstalling it so we can then reinstall it with the actually desired buffer sizes.
Might be something for a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reading through it again I noticed it is kind of inefficient we don't already set these to the values passed in the args object.
This sequence is in make_new(), creating the object with default values. init() will not change all of them to new values, and will not set the default values. The intention for init() is to change only parameters defined in the call and leave all others as they were.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention for init() is to change only parameters defined in the call and leave all others as they were.
Yes, I agree, that makes sense. What I meant is that when calling for example UART(1, rxbuf=1024)
in Python (for the first time / assuming it was not initialized before) the following happens:
make_new
setsself->rxbuf = 256
.make_new
callsuart_driver_install(uart_num, self->rxbuf, ...)
, thus initializing the driver with a 256 byte buffermake_new
then it callsinit_helper
- Which actually looks at the argument we passed, which is rxbuf=1024.
- Sees that the driver was initialized with rxbuf=256.
- Calls
uart_driver_delete(...)
to delete the driver we just installed because of that wrong buffer size. - Then calls
uart_driver_install(...)
again to actually set the desired buffer size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New version of this PR streamlines this so uart_driver_install()
is only called once, with (hopefully) correct parameters.
ports/esp32/machine_uart.c
Outdated
if (self->rxidle_timer != NULL) { | ||
machine_timer_disable(self->rxidle_timer); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, since this function is also called inside mp_machine_uart_init_helper
, I'm not sure whether that still works correctly. Let me check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think only disabling the timer here breaks the IRQ when the TX/RX buffer sizes are changed while we're inside the RXIDLE timeout window.
Given an UART that already has an IRQ configured, the following sequence is possible:
Data arrives on the UART -> the event task sets rxidle_state = RXIDLE_ALERT
-> the UART buffer size is changed -> the reinitialization causes the timer to be disabled.
New incoming data will not re-enable the timer, because the state is not RXIDLE_STANDBY
. It will change the state to RXIDLE_ALERT
, which doesn't make much of a difference.
The IRQ is then still configured thus users can reasonably expect it to be functional, but it doesn't do anything until explicitly reconfigured.
Presumably, correctly resetting the rxidle_state
when we disable the timer fixes that:
diff --git a/ports/esp32/machine_uart.c b/ports/esp32/machine_uart.c
index 6838ef0d85..3e056e3ea3 100644
--- a/ports/esp32/machine_uart.c
+++ b/ports/esp32/machine_uart.c
@@ -502,6 +502,9 @@ static void mp_machine_uart_deinit(machine_uart_obj_t *self) {
}
if (self->rxidle_timer != NULL) {
machine_timer_disable(self->rxidle_timer);
+ if (self->rxidle_state > RXIDLE_STANDBY) {
+ self->rxidle_state = RXIDLE_STANDBY;
+ }
}
// Only stop the ESP-IDF driver entirely if it's not the REPL,
// (if it's the REPL then it shouldn't have any ISR configured anyway.)
This makes it so that the state is returned to the STANDBY
state, and thus the timer will correctly be re-enabled when the event task does receive new data.
Optional improvement
And while not directly related, but we should probably clean up this double condition that makes the already moderately difficult to follow RXIDLE timer logic harder to read:
diff --git a/ports/esp32/machine_uart.c b/ports/esp32/machine_uart.c
index 6838ef0d85..8c7a2ad74b 100644
--- a/ports/esp32/machine_uart.c
+++ b/ports/esp32/machine_uart.c
@@ -148,10 +148,8 @@ static void uart_event_task(void *self_in) {
// Event of UART receiving data
case UART_DATA:
if (self->mp_irq_trigger & UART_IRQ_RXIDLE) {
- if (self->rxidle_state != RXIDLE_INACTIVE) {
- if (self->rxidle_state == RXIDLE_STANDBY) {
- machine_timer_enable(self->rxidle_timer);
- }
+ if (self->rxidle_state == RXIDLE_STANDBY) {
+ machine_timer_enable(self->rxidle_timer);
}
self->rxidle_state = RXIDLE_ALERT;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DvdGiessen for thinking this through. Maybe we don't need to disable the timer here at all:
- In the case of soft reset, we could deinit timer before uart - so the timer interrupt is already gone.
- In the case of manual deinit then we disable the IRQ, so at most one spurious timer event occurs (IIUC)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of manual deinit then we disable the IRQ, so at most one spurious timer event occurs (IIUC)
Yes, although that event will also propagate (it will call the IRQ handler set by the user), which may be a bit surprising if you deinited the UART but didn't explicitly set the IRQ handler to None first. (Presumably a handler will then try to read data from the UART, raising an OSError.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks.
I've modified this by always setting rxidle_state = RXIDLE_INACTIVE
during the deinit call - that stops the UART irq from restarting the timer until Python calls calls irq()
or irq().trigger(...)
again. Then at the next line the UART IRQ is disabled when we delete the driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that cause the same original problem, namely the IRQ will no longer function after the buffer size has been changed?
I think that rxidle_state
is always >= RXIDLE_STANDBY
if mp_irq_trigger & UART_IRQ_RXIDLE
, which is why I went with setting it back to standby instead of deactivating it. Otherwise it'd be more consistent to also explicitly remove/disable the IRQ, and also we need to reconfigure the timer in the init_helper
after using deinit
(or change init_helper
to not use deinit
at all, but then it should duplicate a lot more of the logic).
that stops the UART irq from restarting the timer
I don't think that can happen: The timer is started in the uart_event_task
, and just above here in the deinit
method we already called vTaskDelete
so that task should not be scheduled to run anymore at this point. Which is why I thought it was safe to just reset the value to RXIDLE_STANDBY
; the same state it would be in after initially configured.
@projectgus I tested that PR with a GENERIC ESP32 and a ESP32_C6 (after applying the latest change for C6) and it works. |
There's another problem with deiniting and static instances: If you deinit, the driver gets uninstalled. Then if you reinitialise the driver doesn't get installed again, because that only happens when creating the object (unless you change the (Found this in testing, it manifested as a lot of I think the fix for this is similar to what would be needed to fix the downside you mentioned. Potentially this can be done by observing whether the driver is installed (using the same function you used in the deinit), as an indicator of whether the object was de-initialized? Something like this (not extensively tested but seems to work): diff --git a/ports/esp32/machine_uart.c b/ports/esp32/machine_uart.c
index cafeb1de6f..e9e7d31802 100644
--- a/ports/esp32/machine_uart.c
+++ b/ports/esp32/machine_uart.c
@@ -420,9 +420,15 @@ static mp_obj_t mp_machine_uart_make_new(const mp_obj_type_t *type, size_t n_arg
machine_uart_obj_t *self = MP_STATE_PORT(machine_uart_objs)[uart_num];
if (self == NULL) {
+ if (!uart_is_repl(uart_num) && uart_is_driver_installed(uart_num)) {
+ mp_raise_msg(&mp_type_RuntimeError, MP_ERROR_TEXT("UART already has a driver installed"));
+ }
+
// Allocate a new UART object
self = mp_obj_malloc(machine_uart_obj_t, &machine_uart_type);
-
+ MP_STATE_PORT(machine_uart_objs)[uart_num] = self;
+ }
+ if (!uart_is_driver_installed(uart_num) || uart_is_repl(uart_num)) {
// Reset the default state of the UART object
self->uart_num = uart_num;
self->bits = 8;
@@ -479,8 +485,6 @@ static mp_obj_t mp_machine_uart_make_new(const mp_obj_type_t *type, size_t n_arg
check_esp_err(uart_param_config(self->uart_num, &uartcfg));
check_esp_err(uart_driver_install(uart_num, self->rxbuf, self->txbuf, UART_QUEUE_SIZE, &self->uart_queue, 0));
}
-
- MP_STATE_PORT(machine_uart_objs)[uart_num] = self;
}
mp_map_t kw_args; |
Thanks again for the additional testing. That approach seems sensible, will try and make a unit test for this sequence as well. |
Just to clarify this point, the static RAM overhead as implemented in this PR is pretty small - four bytes per available UART. The case where more RAM is used after this change is if the firmware initialises a UART object and then permanently stops using it - some of the memory associated with that UART object won't be reclaimed until soft reset (although >90% of it can be reclaimed by calling |
e114df8
to
b936f71
Compare
@DvdGiessen I need to do more testing, but I've updated this PR to fix the deinit/init case, and as a bonus it no longer has to initialise the driver twice if you set custom buffer sizes. |
b936f71
to
d13ecf1
Compare
Makes machine.UART objects static instances (similar to other ports), adds machine_uart_deinit_all() function for cleanup on soft reset. - Fixes the case where the OS-level uart_event_task is leaked (along with 2KB of heap) when a UART object is garbage collected (including after a soft reset). - Fixes hard fault if a previous UART irq() fires after the UART object is garbage collected (including after a soft reset), but before any new UART object is instantiated for the same UART number. - Constructing multiple UART objects for the same UART now returns the same instance multiple times, not different instances. - Was also able to streamline deinit/init to only install the driver once, rather than install-with-defaults/uninstall/reinstall. Alternative would be to add a finaliser, but this is more consistent with how most other UART objects are implemented. This work was funded through GitHub Sponsors. Signed-off-by: Angus Gratton <angus@redyak.com.au>
d13ecf1
to
f1eb45c
Compare
Observed another difference in behaviour: Previously, because making a new UART object always re-installed the driver, a newly created UART object in Python would always have an empty RX buffer. However after this change we no longer uninstall the driver in Presumably we can add a call to Because data can sit in the RX buffer for a long time that's probably the most relevant one. I don't think it's necessary to do something with the TX buffer. |
Summary
Makes machine.UART objects static instances (similar to other ports), adds
machine_uart_deinit_all()
function for cleanup on soft reset.Found while testing the fix for #17844.
The hard fault is a little awkward to reproduce, because it happens after the soft reset cleans up the UART object, and depends on something toggling the RX pin before that UART number is reinitialised. I initially triggered it when I'd forgotten to add the jumper pin for the UART tests, the tests failed, and then it suddenly crashed while I was attaching the jumper!
This work was funded through GitHub Sponsors.
Testing
For each of ESP32-C2,C3,C6,S2 and original:
extmod_hardware/machine_uart*.py
tests and verified no failures (except for known issue with rxidle on C6, fixed by esp32: Fix UART rxidle and machine.Timer on ESP32-C2,C6 #17844).Manual test sequence of:
mpremote PORT repl --inject-file (file with snippet)
junk_tx
phase.I haven't added an automated test for this exact problem as it's hard to write one: the current bug could potentially be triggered by a manual
gc.collect()
, but that's no longer the case now the UART instances are static.Trade-offs and Alternatives
machine.UART
objects for the sameuart_num
then I think the first one becomes invalid, after this change they would both refer to the exact same object which is how it is on most ports).deinit()
a UART object and then initialise it later without a soft reset, any parameters which aren't passed in explicitly will stay the same as before thedeinit()
. I think it would be better if those parameters reverted to the documented defaults, but it turns out this is somewhat fiddly to implement so leaving it as-is for now.