Skip to content

ESP32 - v1.21.0 slower than v1.20.0 running a large application #12609

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
beyonlo opened this issue Oct 6, 2023 · 11 comments
Closed

ESP32 - v1.21.0 slower than v1.20.0 running a large application #12609

beyonlo opened this issue Oct 6, 2023 · 11 comments
Labels
needs-info This issue needs more information to be resolvable port-esp32

Comments

@beyonlo
Copy link

beyonlo commented Oct 6, 2023

Hi all

I have an application running on the v1.20.0 (official .bin release - not compiled by myself) in my ESP32-S3 (16MB Flash 2MB RAM). So I just flashed it with the new v1.21.0 (official .bin release) and I clearly noted that everything is slowest.

  1. How I note that everything is slowest? My application has tasks (asyncio), where some that are network communications (TCP), UART (485), write files, display, etc. I have also one thread. So, I noted that everything is slowest, I mean, network answer is slowest, thread is slowest, and write file is slowest, etc.
  2. I really don't know exactly what part goes slow and I didn't yet a specific test to find out where is slow.
  3. I see that this version (v1.21.0) was changed to the IDF-5 and was changed also how the heap management works py/gc & esp32: Add MICROPY_GC_SPLIT_HEAP_AUTO to automatically grow heap on esp32 #12141 - that are two big changes. Can be some of these changes made this new version slowest?

Has anyone experimented the same problem with this new version?

Thank you very much

@beyonlo beyonlo added the bug label Oct 6, 2023
@jimmo
Copy link
Member

jimmo commented Oct 6, 2023

I really don't know exactly what part goes slow and I didn't yet a specific test to find out where is slow.

Can you please try to quantify this somehow. How are you measuring this. When you say "it's slower" are we talking a few percent or are things taking twice as long.

Ideally could you please post a code snippet that demonstrates this.

Some small performance difference between releases is expected for ESP32 because it's quite sensitive to how the code is laid out in flash.

The heap changes should actually be a performance improvement (due to improving GC times). IDF5 shouldn't make a difference for regular python code execution.

@beyonlo
Copy link
Author

beyonlo commented Oct 6, 2023

I really don't know exactly what part goes slow and I didn't yet a specific test to find out where is slow.

Can you please try to quantify this somehow. How are you measuring this. When you say "it's slower" are we talking a few percent or are things taking twice as long.

Great. I don't give you this information because I wouldn't like to say a very wrong number and I still do not measure (with real tests) it. I just see by my eyes that are slowest. For example, I have a Master TCP getting data over WiFi from ESP32 in 50 ms frequency. When I changed do 1.21 I see that response from ESP32 is slowest.
My guess that is around 10-30%

Ideally could you please post a code snippet that demonstrates this.

Yes, that's why I wrote about number 2: "I really don't know exactly what part goes slow and I didn't yet a specific test to find out where is slow." I would like so much to find exactly what part (UART, WiFi, asyncio, write files, etc) are causing that problem to demonstrate it in a very simple example to you works on it. But for now I still don't have it. I will try to do better tests to try to find it and demonstrate in an example.

Some small performance difference between releases is expected for ESP32 because it's quite sensitive to how the code is laid out in flash.

That's small performance difference is not affecting me so much. My intention with this thread is just give you a feedback about this new version and maybe try to improve in the next version and share with others my experience, maybe someone have the same problem

The heap changes should actually be a performance improvement (due to improving GC times). IDF5 shouldn't make a difference for regular python code execution.

Yes, I see that (new heap management) is a very big improve in performance on the GC - so that should be faster and not slower - I agree totally with you.

I'm just giving you my first feedback (with nothing examples to prove that). Maybe that slowest by a problem in my code (application) and not in the Micropython, but why my code works faster on the the 1.20 than 1.21 I don't know.

@jimmo
Copy link
Member

jimmo commented Oct 7, 2023

@beyonlo The problem is that without specifics I don't have a way to investigate this without spending a lot of time chasing random things and don't know whether this needs to be prioritised or not.

Maybe that slowest by a problem in my code (application) and not in the Micropython, but why my code works faster on the the 1.20 than 1.21 I don't know.

Exactly. There shouldn't be a reason that changing from one release to the next should cause your application to slow down if it's running the same Python code, regardless of what your code is doing.

I just see by my eyes that are slowest. For example, I have a Master TCP getting data over WiFi from ESP32 in 50 ms frequency. When I changed do 1.21 I see that response from ESP32 is slowest. My guess that is around 10-30%

Likely I'm not understanding what you're saying because I'm not following how you can visually tell a difference of 10-30% on something that happens 20 times per second.

FWIW, here are the execution and network performance tests v1.20 release vs v1.21 release. Note that one thing that isn't tested here is flash read/write performance.

Python execution performance:

$ ./run-perfbench.py -s ~/esp32-s3-1.20 ~/esp32-s3-1.21
diff of scores (higher is better)
N=100 M=100                /home/jimmo/esp32-s3-1.20 -> /home/jimmo/esp32-s3-1.21         diff      diff% (error%)
bm_chaos.py                    373.27 ->     365.59 :      -7.68 =  -2.057% (+/-0.07%)
bm_fannkuch.py                  80.14 ->      81.28 :      +1.14 =  +1.423% (+/-0.07%)
bm_fft.py                     2667.19 ->    2539.25 :    -127.94 =  -4.797% (+/-0.01%)
bm_float.py                   5676.96 ->    4924.24 :    -752.72 = -13.259% (+/-0.02%)
bm_hexiom.py                    43.40 ->      43.72 :      +0.32 =  +0.737% (+/-0.00%)
bm_nqueens.py                 3636.60 ->    3703.46 :     +66.86 =  +1.839% (+/-0.02%)
bm_pidigits.py                 648.13 ->     546.44 :    -101.69 = -15.690% (+/-1.17%)
core_import_mpy_multi.py       430.38 ->     406.91 :     -23.47 =  -5.453% (+/-0.03%)
core_import_mpy_single.py       81.94 ->      79.69 :      -2.25 =  -2.746% (+/-0.19%)
core_qstr.py                    99.09 ->      89.21 :      -9.88 =  -9.971% (+/-0.03%)
core_yield_from.py             121.75 ->     124.95 :      +3.20 =  +2.628% (+/-0.00%)
misc_aes.py                    376.71 ->     398.65 :     +21.94 =  +5.824% (+/-0.01%)
misc_mandel.py                3789.88 ->    3695.71 :     -94.17 =  -2.485% (+/-0.03%)
misc_pystone.py               2313.43 ->    2517.96 :    +204.53 =  +8.841% (+/-0.00%)
misc_raytrace.py               396.92 ->     383.74 :     -13.18 =  -3.321% (+/-0.02%)
viper_call0.py                 287.28 ->     300.30 :     +13.02 =  +4.532% (+/-0.00%)
viper_call1a.py                284.21 ->     296.41 :     +12.20 =  +4.293% (+/-0.00%)
viper_call1b.py                246.49 ->     254.81 :      +8.32 =  +3.375% (+/-0.00%)
viper_call1c.py                246.87 ->     255.61 :      +8.74 =  +3.540% (+/-0.00%)
viper_call2a.py                281.21 ->     292.60 :     +11.39 =  +4.050% (+/-0.00%)
viper_call2b.py                227.54 ->     233.93 :      +6.39 =  +2.808% (+/-0.00%)

Network transfer (over WiFi) for v1.20 (as server):

Accepted connection: ('192.168.1.105', 38378)
Interval           Transfer     Bitrate
  0.00-1.01   sec   784 KBytes  6.35 Mbits/sec
  1.01-2.02   sec   880 KBytes  7.14 Mbits/sec
  2.02-3.03   sec   912 KBytes  7.39 Mbits/sec
  3.03-4.04   sec   912 KBytes  7.43 Mbits/sec
  4.04-5.06   sec   880 KBytes  7.08 Mbits/sec
  5.06-6.07   sec   944 KBytes  7.65 Mbits/sec
  6.07-7.07   sec   912 KBytes  7.45 Mbits/sec
  7.07-8.08   sec   960 KBytes  7.77 Mbits/sec
  8.08-9.09   sec   880 KBytes  7.14 Mbits/sec
  9.09-10.09  sec   800 KBytes  6.57 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
  0.00-10.09  sec  8.66 MBytes  7.20 Mbits/sec  sender

(as client)

Connecting to ('192.168.1.105', 5201)
Interval           Transfer     Bitrate
  0.00-1.01   sec   270 KBytes  2.20 Mbits/sec
  1.01-2.01   sec   272 KBytes  2.23 Mbits/sec
  2.01-3.01   sec   281 KBytes  2.30 Mbits/sec
  3.01-4.01   sec   278 KBytes  2.28 Mbits/sec
  4.01-5.01   sec   281 KBytes  2.29 Mbits/sec
  5.01-6.01   sec   278 KBytes  2.28 Mbits/sec
  6.01-7.01   sec   278 KBytes  2.28 Mbits/sec
  7.01-8.02   sec   272 KBytes  2.23 Mbits/sec
  8.02-9.02   sec   270 KBytes  2.19 Mbits/sec
  9.02-10.00  sec   281 KBytes  2.35 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
  0.00-10.01  sec  2.70 MBytes  2.26 Mbits/sec  sender
  0.00-10.02  sec  2.70 MBytes  2.26 Mbits/sec  receiver

Network transfer (over WiFi) for v1.21 (as server):

Accepted connection: ('192.168.1.105', 44554)
Interval           Transfer     Bitrate
  0.00-1.01   sec   976 KBytes  7.91 Mbits/sec
  1.01-2.02   sec  1.00 MBytes  8.31 Mbits/sec
  2.02-3.03   sec  1.03 MBytes  8.54 Mbits/sec
  3.03-4.03   sec  1.06 MBytes  8.90 Mbits/sec
  4.03-5.04   sec  1.05 MBytes  8.71 Mbits/sec
  5.04-6.06   sec  1.03 MBytes  8.54 Mbits/sec
  6.06-7.07   sec  1.05 MBytes  8.70 Mbits/sec
  7.07-8.08   sec  1.05 MBytes  8.67 Mbits/sec
  8.08-9.08   sec  1.03 MBytes  8.60 Mbits/sec
  9.08-10.06  sec  1.00 MBytes  8.63 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
  0.00-10.06  sec  10.3 MBytes  8.55 Mbits/sec  sender

(as client)

Connecting to ('192.168.1.105', 5201)
Interval           Transfer     Bitrate
  0.00-1.00   sec   293 KBytes  2.40 Mbits/sec
  1.00-2.01   sec   299 KBytes  2.44 Mbits/sec
  2.01-3.02   sec   293 KBytes  2.38 Mbits/sec
  3.02-4.02   sec   290 KBytes  2.37 Mbits/sec
  4.02-5.02   sec   296 KBytes  2.42 Mbits/sec
  5.02-6.02   sec   290 KBytes  2.37 Mbits/sec
  6.02-7.02   sec   293 KBytes  2.40 Mbits/sec
  7.02-8.02   sec   293 KBytes  2.40 Mbits/sec
  8.02-9.02   sec   296 KBytes  2.42 Mbits/sec
  9.02-10.01  sec   278 KBytes  2.32 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
  0.00-10.01  sec  2.85 MBytes  2.39 Mbits/sec  sender
  0.00-10.03  sec  2.85 MBytes  2.39 Mbits/sec  receiver

@beyonlo
Copy link
Author

beyonlo commented Oct 7, 2023

Hi @jimmo

@beyonlo The problem is that without specifics I don't have a way to investigate this without spending a lot of time chasing random things and don't know whether this needs to be prioritised or not.

You are right. I will try to do many other tests isolating each part of my software to find out where exactly is the problem, and after try to create an example where I can paste to you proving that problem, so you can investigate better.

Maybe that slowest by a problem in my code (application) and not in the Micropython, but why my code works faster on the the 1.20 than 1.21 I don't know.

Exactly. There shouldn't be a reason that changing from one release to the next should cause your application to slow down if it's running the same Python code, regardless of what your code is doing.

Yes, exactly. That's very strange because is the same Python code. I only changed the MicroPython version.

I just see by my eyes that are slowest. For example, I have a Master TCP getting data over WiFi from ESP32 in 50 ms frequency. When I changed do 1.21 I see that response from ESP32 is slowest. My guess that is around 10-30%

Likely I'm not understanding what you're saying because I'm not following how you can visually tell a difference of 10-30% on something that happens 20 times per second.

The Client TCP (that request data over WiFi to Micropython Server TCP) has a counter for each response from the Server TCP. Visually that counter is faster on the 1.20.0. I also did now a real test to you have real numbers:
1.21.0 - 600 seconds requesting - 7528 responses
1.20.0 - 600 seconds requesting - 9942 responses

I think that not means that problem is the my network application, because as my application does many other asyncio tasks "simultaneously": UART (RS-485), calculations, write files (rarely), etc, and one thread getting from the SPI sensors, some of those things can affecting the network task, and so on. Visually that responses over UART (RS-485) maybe slowest in the 1.21.0 too. I need to do real test here too.

FWIW, here are the execution and network performance tests v1.20 release vs v1.21 release. Note that one thing that isn't tested here is flash read/write performance.
Python execution performance:

That's great. Thank you for this report.

$ ./run-perfbench.py -s ~/esp32-s3-1.20 ~/esp32-s3-1.21

Following this, 1.21.0 has very same performance than 1.20.0

And by your reports the network performance on the WiFi (Client and Server) is faster on the 1.21.0 than 1.20.0. That's fantastic!

Thank you very much for your time!

@dpgeorge dpgeorge changed the title ESP32 - v1.21.0 slowest than v1.20.0 ESP32 - v1.21.0 slower than v1.20.0 running a large application Oct 8, 2023
@yunnanpl
Copy link

Just my small tip: I had also sometimes a feeling ("impression") that after version changes my "application" got slower (mostly visible by slower http response times, as this is really the only important thing for me).

The solution was clearing the complete memory, flashing the binary again and uploading the scripts again.
So, "dirty update" is not always good for me, and in case of any issues, I do "clean update" first. Up to now, it solves most of the unexpected issues.

Not sure if it has something with this that ESP32 is "quite sensitive to how the code is laid out in flash", but maybe...

@beyonlo
Copy link
Author

beyonlo commented Oct 16, 2023

Hello @yunnanpl

The solution was clearing the complete memory, flashing the binary again and uploading the scripts again.

Sorry, but is there a way to change from 1.20.0 to 1.21.0 without "clearing the complete memory, flashing the binary again and uploading the scripts again"? Like I wrote in my initial text, I'm using the official binary release 1.20.0 and 1.21.0 (not compiled by myself)

So, "dirty update" is not always good for me, and in case of any issues, I do "clean update" first. Up to now, it solves most of the unexpected issues.

Could you please to tell us what exactly is the relation/means "dirty update" and "clean update" with this subject "change version 1.20.0 to 1.21.0"?

Thank you!

@yunnanpl
Copy link

Hi @beyonlo,
in short:

  • dirty update = flashing new micropython without clearing flash, just overwriting micropython area, so that the scripts are not re-uploaded
    esptool.py --chip esp32 --port COM3 write_flash -z 0x1000 ESP32_GENERIC-20230426-v1.20.0.bin

  • clean update = clearing whole flash, and then uploading micropython and re-uploading scripts
    esptool.py --chip esp32 --port COM3 erase_flash
    esptool.py --chip esp32 --port COM3 write_flash -z 0x1000 ESP32_GENERIC-20230426-v1.20.0.bin
    ampy to upload the files

This does not concerns the latest update in a special way, but it concerns any update I did since I use micropython.
So, my approach is, first to do the dirty update and in case of issues, I do clean update. I do it clearly before fixing/correcting the script.

Sorry if this caused confusion ;)
I had it exactly lately while updating 1.19 to 1.20, whereas it concerned 2 out of 5 pieces of ESP32. One was responding slow, and the other did not even booted as it was complaining about the memory. After clean-flash both are working fine.

PS. I always use official binaries.

@beyonlo
Copy link
Author

beyonlo commented Oct 16, 2023

@yunnanpl thank you to clarify.

I always erase the flash before write a new micropython version.

esptool.py --chip esp32s3 --port /dev/ttyACM0 erase_flash

So, that is not the problem, but new micropython version in real

@beyonlo
Copy link
Author

beyonlo commented Oct 16, 2023

I did some more tests and I think that I found the problem. The UART/485 application is slowest in the 1.21.0 than 1.20.0, and as it is slow, all applications (other tasks) goes slow too, because UART/485 is using more time in the 1.21.0 to do the same job. That's not a final conclusion, so I will to do more tests to have a more secure results/answer.

@maxi07
Copy link

maxi07 commented Nov 12, 2023

I am running my app on a Raspberry Pico W with a webserver (Microdot). With micropython v1.20 I have about 15KB more available memory than running the same application in v1.21.

@jonnor
Copy link
Contributor

jonnor commented Sep 1, 2024

To be able to do anything with this, we really need a minimally reproducible code example.

@maxi07 that does not seem to be related to the issue of the original reporter.

@jonnor jonnor added needs-info This issue needs more information to be resolvable and removed bug labels Sep 1, 2024
@jonnor jonnor closed this as completed Mar 3, 2025
@jonnor jonnor closed this as not planned Won't fix, can't repro, duplicate, stale Mar 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs-info This issue needs more information to be resolvable port-esp32
Projects
None yet
Development

No branches or pull requests

6 participants