USB transmission gradually slowing down (in sawtooth pattern) #7479
Have you measured the temperature of your Pico while doing so? Is it more steady when not overclocked? |
@veni-vidi-code Thanks for the tip, but surface temperature is well below 35 °C all the time. No throttling expected, even when overclocking. |
One critical line slowing down the process is this one:
It creates the data object to be sent every time it is called. That will lead to garbage collection after a few iterations. It is more efficient to create that data buffer beforehand, like:
|
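To illustrate the preallocation advice above, here is a minimal sketch. The function name `flood`, the 10000-byte chunk size and the `n_chunks` limit are assumptions made for illustration. On the Pico the stream would be `usys.stdout.buffer`; here any binary stream works, so the snippet can also be tried on desktop Python.

```python
import io

CHUNK = b"x" * 10000  # built once, outside the loop, so no new bytes object per pass


def flood(stream, n_chunks, chunk=CHUNK):
    """Write the same preallocated chunk n_chunks times; return bytes written."""
    write = stream.write  # look the bound method up once, not on every pass
    total = 0
    for _ in range(n_chunks):
        write(chunk)
        total += len(chunk)
    return total


sink = io.BytesIO()  # stand-in for usys.stdout.buffer (assumption)
sent = flood(sink, 5)
```

The loop body allocates no new payload object, which is the point of the advice. Whether the write call itself allocates internally is a separate question.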
@robert-hh Thanks, this improved the speed by a few percent. Interestingly, the speed cycling still occurs with the same period with regard to the amount of data. I am now quite confident the cycling occurs every 418 chunks, i.e. every 4180 kB transmitted. |
At least there seems to be not much difference between sys.stdout and sys.stdout.buffer any more, unless you changed the plotting style. Can you test whether these delays are caused by the RP2 or by your PC? |
@robert-hh Sorry for confusing you. In the second picture above, both datasets used Now I compared clock speeds. The delays are roughly doubled when the Rpi runs at 125 MHz. So I guess the bottleneck is somewhere in the Rpi; I am not sure whether it is micropython or the USB interface. What puzzles me the most is that we can see a repeated, almost fractal-like pattern on the speed curves, like a function of data transmitted modulo 4180 kB. But this pattern does not scale exactly with the clock speed, though some features are maintained between 125 and 250 MHz. |
I also noticed that the transmission speed does not depend on the size of |
It might be worth trying this with the latest (stable) pico-sdk and tinyusb, see #7474. |
@dpgeorge I agree - either I could report the bug persists in the latest μP version (which is good), or I would find that it is no longer there (which is even better). So far I used |
No change here with MicroPython v1.16-48-g0b3332c8e. Times between 0.0126 and 0.084 seconds. |
@robert-hh I can confirm, no change whatsoever. Why is the sawtooth repeated every 418 or 419 chunks? Why are there 10 steps? Note these curves were measured today on a different computer, and they slightly differ in shape from the previous ones. |
I also changed the computer script so that it receives 1kB chunks instead of 10kB. To make it comparable, I normalized the time to 10kB also in this case. We can see a finer structure of the delays, perhaps some mismatch of the RP2 and computer loops. But the general trend of the transmission slowing down is still obvious. |
Have you looked at https://github.com/raspberrypi/pico-examples/tree/master/pio/logic_analyser ? |
@lurch Sure I did. It would be interesting to see if the problem persists - AFAIK the logic analyser uses the There are many options to do the job. Every project is a tradeoff between spending time to try out different ways, and between spending time to fix one chosen way. For example, this week I thought micropython on RP2 could be the right choice. Maybe it is not. The previous two weeks I wasted time almost finishing my project in compiled C for ESP32 -- only to find out its USB communication is slow (limited to 921600 baud or less on some modules) and not reliable (spitting random 0xFF bytes in the data stream). Too bad I found it so late, now I believe that ESP32 is not suitable for serious projects involving USB connection. |
I'm sure @kilograham and @hathach would be very appreciative of any stress-testing you can give to the USB stack 😀 And as a bonus, you get to narrow down whether the problem is with MicroPython or with TinyUSB 😉 |
Now I must admit that the problem seems to be beyond my hacking skills, and beyond the time I have set aside for making the RP2/computer communication fast. Anyway, the initial data rate is 10kB in 0.012 s, that is, 830 kB/s. This is very promising, but after transmitting 320 kB, the speed starts dropping as shown above. This must be some software trouble and I still believe more skilled coders will be able to find its cause. While currently I cannot dig through the micropython + TinyUSB codebase, I have a few ideas that might inspire somebody:
|
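For reference, the rates quoted in this thread follow from simple arithmetic on the chunk size and the measured time per chunk. The sketch below redoes that arithmetic; the helper name is an assumption, and 1 kB is taken as 1000 bytes.

```python
def throughput_kB_per_s(chunk_bytes, seconds):
    """Average rate for one chunk, in kB/s (1 kB = 1000 bytes here)."""
    return chunk_bytes / 1000 / seconds


fast = throughput_kB_per_s(10_000, 0.012)  # start of a sawtooth period
slow = throughput_kB_per_s(10_000, 0.050)  # slowest point reported in the first post
period_kB = 418 * 10                       # 418 chunks of 10 kB per sawtooth period
```

With these inputs `fast` comes out at roughly 833 kB/s, `slow` at 200 kB/s, and `period_kB` at 4180, matching the figures reported above.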
I have seen things run slower due to the heap filling up and finding a free block taking longer. Once garbage collection happens, the cycle starts over. Here is a non-USB example of an analogous situation: adafruit#2602. Also see a PR that ameliorated this to some extent: adafruit#2614, with comments by @jimmo. |
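The mechanism described in this comment can be pictured with a toy model. This is only a sketch under strong assumptions: a heap of equal-sized blocks, a first-fit search that always starts at the beginning, allocations that are never freed individually, and a collection that frees everything once the heap is full. It is not MicroPython's actual allocator, and `simulate_first_fit` is a made-up name.

```python
def simulate_first_fit(heap_blocks, n_allocs):
    """Return the scan cost of each allocation in a toy first-fit heap."""
    used = [False] * heap_blocks
    costs = []
    for _ in range(n_allocs):
        if all(used):
            # Heap exhausted: a collection frees every block (all garbage here).
            used = [False] * heap_blocks
        # First fit: scan from the start until a free block is found.
        index = used.index(False)
        used[index] = True
        costs.append(index + 1)  # number of blocks inspected
    return costs


costs = simulate_first_fit(heap_blocks=4, n_allocs=10)
```

The cost per allocation climbs linearly and drops back after each collection, which gives a sawtooth of the same general shape as the one reported. That resemblance alone does not show that the heap is the cause here.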
@dhalbert I also guessed this will be the correct explanation, but ...
import usys, gc
data = b'x'*1000
buf = usys.stdout.buffer
while True:
    buf.write(data)
    gc.collect()
Still, while this example allocates no extra object on the heap explicitly, this garbage would have to be created somewhere in the implementation of Is it correct? Can anybody more knowledgeable than me review the code for garbage-prone allocation? |
I experimented further, changing the size of
from machine import Pin
import time, machine, usys, gc, micropython
machine.freq(125000000*2)
data = b'x'*14
buf = usys.stdout.buffer
while True:
    buf.write(data)
Naturally, transmitting in such short bursts slows the transmission down due to the overhead of the program loop. What interested me more is its impact on the sawtooth function.
Can this inspire somebody? |
@FilipDominec Do you have an FTDI232 or PL2303 or similar USB-UART adaptor? Might be interesting to repeat your experiment, but sending the data over the RP2040's UART instead of USB? Obviously the maximum data-rate will be much reduced, but it would be interesting to see if it stays constant or shows a similar slowdown over time? |
Using UART, the pattern looks different. For a 10000 byte block at 921600 baud it oscillates between 0.10767 and 0.1131 seconds. Doubling the baud rate halves the time, but the pattern stays. Edit2: Changing the data block size to 100 or 10 and removing the sleep(1) does not change the times or pattern at all. |
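As a sanity check on the UART timings, the time a block spends on the wire can be estimated from the baud rate. The sketch below assumes 8N1 framing (one start bit, eight data bits, one stop bit, so 10 bits per byte), which the thread does not state; the helper name is also an assumption.

```python
def uart_wire_time(n_bytes, baud, bits_per_byte=10):
    """Time n_bytes spend on the wire, in seconds.

    bits_per_byte=10 assumes 8N1 framing (1 start + 8 data + 1 stop bit).
    """
    return n_bytes * bits_per_byte / baud


t_921600 = uart_wire_time(10000, 921600)       # about 0.1085 s
t_doubled = uart_wire_time(10000, 2 * 921600)  # half of the above
```

Under that assumption the wire time alone is about 0.1085 s, which falls inside the measured range and is consistent with the time halving when the baud rate doubles.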
Using online tips and getting inspired by the "Adafruit TinyUSB/CDC/cdc_multi" example in Arduino studio, it is not hard to flood TinyUSB from C, either:
#include <Adafruit_TinyUSB.h>
#include "pico/stdlib.h"
void setup() {
set_sys_clock_khz(250000, false);
Serial.begin(115200);
}
void loop() {
Serial.printf("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");
}
And the result is beautiful: a sustained throughput of 770 kB/s for this simple C program. So there is either a bug in micropython, in its interface to TinyUSB, or in the way I am using it to transmit data to the computer. |
Does MicroPython have any profiling tools we could use to figure out where it’s spending time? |
Could you try to increase Note: CFG_TUD_CDC_RX_BUFSIZE/CFG_TUD_CDC_TX_BUFSIZE may also need to be increased. |
|
Using the Pico & PC code in the first post of this issue, I was able to reproduce the "saw tooth" plots. Using PR #7474 seems to fix the problem completely, I can get a sustained ~7Mbits/sec transfer rate from Pico to PC. I'll merge that PR shortly. |
That is great to hear 🎉 🎉 |
I'm not sure where the issue was and what the fix was, but it should be somewhere in the pico-sdk and/or tinyusb. I don't have time to do a git-bisect on these to find out 🙂 |
Yeah, to be honest, tinyusb isn't speed-optimized at the moment, since I am more focused on portability. I am glad that micropython could pull off a great throughput of 7 Mbps, which will give me a lot more time to work on the speed feature :) |
Data rate confirmed. Setting CFG_TUD_CDC_EP_BUFSIZE to 256 as suggested by @hathach increases the data rate further by 35% to about 9.5 Mbit/sec. |
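To compare these Mbit/s figures with the kB/s figures used earlier in the thread, a unit conversion helps. This is a small sketch with a made-up helper name, using decimal units and 8 bits per byte. For orientation, USB Full Speed signals at a nominal 12 Mbit/s.

```python
def mbit_to_kB_per_s(mbit_per_s):
    """Convert Mbit/s to kB/s (decimal units, 8 bits per byte)."""
    return mbit_per_s * 1_000_000 / 8 / 1000


before = mbit_to_kB_per_s(7.0)        # rate reported with PR #7474 applied
after = mbit_to_kB_per_s(9.5)         # rate reported with the 256-byte endpoint buffer
gain_percent = (9.5 / 7.0 - 1) * 100  # relative improvement between the two
```

That works out to 875 kB/s and 1187.5 kB/s, a gain of a little under 36%, in line with the "35%" quoted above.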
Since the rp2040 has quite a bit of RAM it's probably worth making this the default (or at least bump it to 128). |
Actually, testing this (using 256 for CFG_TUD_CDC_EP_BUFSIZE) shows no difference in data rate (for Pico -> PC) for me. @robert-hh what test script do you use to get 9.5Mbit/sec? |
I'm using stripped down versions of the initial script:
Receive:
|
Good job, everybody! The remaining task is to provide micropython newbies (like me) an official tutorial for interfacing with the computer over USB. As a matter of fact, the popular and seemingly preferred way, Remember the words of a prophet:
|
Ok, thanks. Using that I can also achieve around 9MBit/sec (with 256 endpoint size). And I get a similar rate using the same test on a PYBD-SF2. |
But note that |
This improves the speed of data going out of the MCU and in to the USB host. See related micropython#7479. Signed-off-by: Damien George <damien@micropython.org>
See #7492 for this. |
There's also the obvious difference that But good to hear that you've managed to speed up |
That's not technically true: |
Ahh, I must have been muddling up Python's |
This improves the speed of data going out of the MCU. See related micropython#7479. Signed-off-by: Damien George <damien@micropython.org>
Is there also a possibility to use this in C? |
@derdudee For questions regarding the C SDK, you ought to be asking at https://github.com/raspberrypi/pico-sdk/issues |
@derdudee Without questioning the note of lurch, I would like to point you to https://github.com/FilipDominec/rp2daq/blob/main/rp2daq.c#L157 where I commented the issue. Do not forget to set |
@FilipDominec Are you able to read data from your COM port? Or where is fwrite printing to? |
Use:
and:
to read. This:
will keep binary data safe from changing 0x0D to 0x0D0A. But you need to make the change in https://github.com/raspberrypi/pico-sdk/blob/master/src/rp2_common/pico_stdio_usb/include/tusb_config.h like presented above here in the thread. |
I am trying to (mis)use Rpi Pico as a scientific data acquisition platform, imagine a 500 kSpS cheap oscilloscope. It is easy to set up communication with μPython: writing data to stdout on the Rpi Pico, and reading data from /dev/ttyACM0 (or other quasi-serial port) in the computer.
Its direct USB support would promise decent speed up to 1 MB/s, which would be just fine for the task. In practice, there are tighter bottlenecks described below. I made a little script for Rpi Pico, flooding the USB with data, and measuring the speed.
My code for μPython in Rpi Pico:
My code to receive the data in the computer:
First observation is that on Rpi Pico clocked at 250 MHz, one has to avoid using print() as it is slow (some 30−50 kB/s). Calling usys.stdout.write() is better (up to 200 kB/s), but usys.stdout.buffer.write() can initially transmit 10kB in 0.015 s, getting close to the USB "Full Speed" throughput limit (i.e. ~600 kB/s). So I happily use the latter function for data.
Second observation: The trouble is that this speed drops continuously - after a few seconds of transmission, transmitting 10kB takes 0.050 s (i.e. only 200 kB/s). Then suddenly the transmit time returns to its initial value and starts growing linearly again. This repeats with the transmit times forming a sawtooth pattern.
Here I paste part of the printout on my computer where the drop in transmit time happened:
Complementary observations:
2a) Without overclocking, Rpi Pico running at 125 MHz, the transmit speeds are roughly half of what I wrote above, so the Rpi Pico is the bottleneck here.
2b) When transmit speed goes high, I thought the chip resets, but it is not the case (LED does not blink).
2c) If I put a random time.sleep(1) after some hundreds of transmission cycles, the transmit speed does not change much after the delay and continues to grow steadily when delay expires (so it is probably not a USB buffer error).
2d) Between transmitting cycles, I observed no significant changes in the micropython.mem_info() printout. So it is probably not a memory issue either.
2e) The period of main transmission loop (as measured on oscilloscope) is proportional to the time required to receive data in computer. So no packets are probably dropped.
Achieving fast USB transmission is a generally useful task and micropython on Rpi Pico does surprisingly well. At least initially. Can anybody give me a hint to get constantly high USB throughput, please?
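For anyone wanting to reproduce the measurement, the host side can be sketched as below. This is not the script from this post: `time_chunks` is a hypothetical helper that reads fixed-size blocks from any binary stream and records how long each block took. On Linux the stream could plausibly come from something like `open("/dev/ttyACM0", "rb", buffering=0)`, which is an assumption and may need the port configured for raw mode first.

```python
import io
import time


def time_chunks(stream, chunk_size, n_chunks):
    """Read n_chunks blocks of chunk_size bytes; return seconds taken per block."""
    durations = []
    for _ in range(n_chunks):
        start = time.perf_counter()
        remaining = chunk_size
        while remaining:
            piece = stream.read(remaining)  # may return fewer bytes than requested
            if not piece:
                return durations  # stream ended early
            remaining -= len(piece)
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in for the serial port so the sketch runs anywhere.
fake_port = io.BytesIO(b"x" * 25_000)
durations = time_chunks(fake_port, 10_000, 3)
```

Plotting the returned list against the chunk index is what would reveal a sawtooth, if one is present. With the 25000-byte stand-in, only two complete blocks are timed before the stream runs out.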