-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
extmod/network_wiznet5k.c: Five small bug fixes. #8974
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
That was missing, and network.route() returned an empty list.
Drop an obsolete and wrong argument check. The proper checks are done further down in the code.
If nic.active(True) is called several times in a row, the device may lock up. Even if that is bad coding practice, calling wiznet5k_deinit() in wiznet5k_init() prevents the lock.
Codecov Report
@@ Coverage Diff @@
## master #8974 +/- ##
=======================================
Coverage 98.37% 98.37%
=======================================
Files 156 156
Lines 20281 20281
=======================================
Hits 19951 19951
Misses 330 330 Continue to review full report at Codecov.
|
Using keyword arguments raises an error now.
I think I found an issue with the IRQ mode. If the nic is heavily solicited (a web server for instance), I guess it can happen that an IRQ occurs whilst the program was performing an SPI transfer already. Since the IRQ handler is accessing the WIZNET chip as well, there might be some cases where a SPI access conflict could arise => system locked-up. the
and the
With these fixes, the web server seems to work without lock-ups. To be continued. I have a test running 24/7 to verify that. |
That change looks good. 100MByte bulk transfer w/o problems. I'll run a 10GByte test over night. Transfer speed ~400kByte/s. Will take ~7 hours |
The 10GByte test succeeded fine. All data was transferred, no lock-up. So the change improves the reliability substantially. |
Cool ! I'm glad to hear that. |
That's great. The standard test are still running the same way as before (as expected). You should make a PR for that, or I could add that to the current one. |
I think that you can add it to the current one as it is more or less related. |
Registering the nic with the network module is already part of this PR. It's done in wiznet5k_init(). |
Sorry, my bad. That's your first commit. However, I saw that all other nic drivers are doing this in the object new function. Shan't you move it to this function as well rather than in the init to be consistent? (I know I can be a pain...) |
It was already in the init function for the non-LWIP case, so I left it there. init() is called with nic.active(True). So actually it makes already a small difference, in that the route does not exists before the interface is active. |
Fine by me then. :-) Good job. |
@robert-hh The pain in the neck is back. I kept on making extra tests, and found some cases again where the Ethernet link is dead (without locking up the device). It seemed to be related to timing issues. I therefore moved the clear IRQ AFTER the while
I reversed the flag test so that there is only one call to the Edit: |
Well, the non IRQ version seems to work OK. I keep on thinking that the problem is due to a concurrent access to the SPI bus. Therefore, I added a test for the CS signal level to enable the call to
I pursue my tests and let you know. |
Doesn't the Interrupt have to be cleared at the RP2? Otherwise you would fall back to polling mode after the first interrupt has been ignored. |
We cannot ack the IRQ whilst the SPI bus is used. It will be acknowledged during the next poll anyway. The poll function has a while loop that would read multiple packets, therefore the poll could have read the packet that triggered a new IRQ since last time we acknowledged it, making a new IRQ to be trigger for nothing as the packet has already been read. So I think that the last IRQ shall/can be ignored by acknowledging the IRQ at the end of the poll. |
That's clear. I was concerned about the Pin IRQ of the rp2040, which has to be acknowledged to enable it again. But that is done in the hardware Pin IRQ interrupt handler before calling mpy_wiznet_read_int() as the registered callback. |
Well, after more than 12 hours of test, no lock-ups. 👍 |
That looks good. My bulk transfer test works fine. With the bulk transfer running, ping times vary between 0.6 and 5.7 ms. Still pretty good. The IRQ handler now is:
Do you still have the call to |
Well, I kept it at the end because, in my mind, it makes more sense to clear it after having executed the loop and avoid "false alarms". But you're right the impact is really small (a few nano-seconds). It would require a lot of time to statistically prove it :-). |
Would you please show me your version of the |
Here you go:
If we get there, it's either because of the regular polling every 64 ms, or because of an IRQ. If the LINK is not UP we have to clear the IRQ anyway. That was already the case before because the call to the |
Interestingly I see a difference in the ping times depending, on how the flag test is made. All ping'ing is done during the bulk transfer. With the 'old' test, the range is 0.654/1.985/5.734 ms, with the 'new' flag test, it is 0.639/1.857/10.657 ms (all min/avg/max). |
Avoiding conflicts between the IRQ and an active transfers. Before the change, the device could lock up in heavy traffic situations. Fix found and code supplied by @omogenot.
To have just one exit and a more compact flag test. This is just a style change without impact to the function.
The style of wiznet5k_poll() is changed, but in a separate commit, to keep it apart from the bug fix. |
My overnight test (with new flag):
|
@dpgeorge I think this is now ready to go, after having eliminated a rare lock-up state. These changes should affect the W5100 board as well, but there is nothing directly chip type related in the commits. So it will work, I assume. |
@dpgeorge , @robert-hh |
Sure. Nothing easier than that (almost). Thanks for testing it with that board. I considered getting one, but they seem out of stock at the EU based shops, I usually use. |
I started the tests with the W5100S board (actually I use a standard pico with a Wiznet Hat that uses a W5100S chip).
I added a small comment for future readers and a conditional compilation to avoid this useless call for other Wiznet chips. |
OK. I changed the code accordingly. So it shows again, that testing is required after code changes. |
Here are the results after almost 24 hours of tests. The test was consisting in having an Ajax HTTP request every 2 seconds to the embedded http server, 1 request every 200 ms from a Modbus TCP client, 1 ping every second.
The maximum round-trip time of 64 ms tends to show that the SPI bus congestion was handled by ignoring the IRQ and letting the polling do the work. |
Thanks. Then I'll make the commit for the most recent change. |
Five small commits fixing bugs for WIZNET5K boards:
Commits 2 though 4 are done in behalf of @omogenot, who asked me to do so.
Commit 3 effectively adds only one line, but looks larger since the deinit() function was moved avoiding a forward declaration.
The changes were tested by @omogenot on a W5500_EVB_PICO board and by me with a RP2040 pico + W5500 breakout combo. Ping times are ~0.7s for "pinging" the board and ~3 second for ping originated from the board, when interrupts are enabled. The MP test suite for Network shows a good result: