Weird problems with tcp #5806

Closed
cziter15 opened this issue Feb 22, 2019 · 8 comments
Labels
waiting for feedback: Waiting on additional info. If it's not received, the issue may be closed.

Comments

@cziter15
Contributor

cziter15 commented Feb 22, 2019

Hello,

I have been fighting an issue in my MQTT client for the last three weeks. I'm using the PubSubClient library. It all runs on an ESP-12F module with the 2.5.0 release.

The behaviour is strange: my MQTT connection times out, and how often depends on the lwIP version. There is no error and no crash, just a socket timeout. I'm not using delays, except for one 10 ms delay in loop that lets the CPU sleep in the idle task (light sleep).

I've tried the lwIP2 higher bandwidth, lower memory, and IPv6 variants, and it makes no difference: the connection times out a few times per hour.

After switching to lwIP 1.4 higher bandwidth, it times out only about once a day, but it still happens.

The WiFi connection is not being lost. The router seems to be connected without interruptions.

I tried earlier releases, with the same results.
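
Because the failure shows up only as a silent timeout, one way to narrow it down is to log the value returned by PubSubClient's `state()` whenever `connected()` turns false. The following is a minimal sketch of a helper that turns that code into a readable string. The numeric values mirror the constants defined in PubSubClient.h; the helper names `describeMqttState` and `isTransportFailure` are hypothetical and not part of the library.

```cpp
#include <cstdio>
#include <string>

// Maps the integer returned by PubSubClient::state() to a readable label.
// The values mirror the constants in PubSubClient.h; the helper itself is
// illustrative and not part of the library.
std::string describeMqttState(int state) {
    const char* name;
    switch (state) {
        case -4: name = "MQTT_CONNECTION_TIMEOUT"; break;
        case -3: name = "MQTT_CONNECTION_LOST"; break;
        case -2: name = "MQTT_CONNECT_FAILED"; break;
        case -1: name = "MQTT_DISCONNECTED"; break;
        case 0:  name = "MQTT_CONNECTED"; break;
        case 1:  name = "MQTT_CONNECT_BAD_PROTOCOL"; break;
        case 2:  name = "MQTT_CONNECT_BAD_CLIENT_ID"; break;
        case 3:  name = "MQTT_CONNECT_UNAVAILABLE"; break;
        case 4:  name = "MQTT_CONNECT_BAD_CREDENTIALS"; break;
        case 5:  name = "MQTT_CONNECT_UNAUTHORIZED"; break;
        default: name = "UNKNOWN"; break;
    }
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%s (%d)", name, state);
    return std::string(buf);
}

// True for codes that point at the network path (timeout, lost or failed
// connection) rather than at the broker rejecting the client.
bool isTransportFailure(int state) {
    return state == -4 || state == -3 || state == -2;
}
```

In a sketch this could be called as `Serial.println(describeMqttState(client.state()).c_str());`, where `client` is assumed to be the PubSubClient instance. A run of transport-level codes while WiFi stays associated would hint at a problem further along the network path.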

@devyte
Collaborator

devyte commented Feb 23, 2019

Try the latest git. Issues were found in SDK pre3, which is included in 2.5.0, so the SDK was reverted to 2.2.1 after the 2.5.0 release (ref #5784, see the pinned issue).
Older core releases have other issues that have since been fixed, so reverting the core isn't enough either: you could be swapping one cause of timeouts for a different one.

@devyte added the "waiting for feedback" label Feb 23, 2019
@cziter15
Contributor Author

cziter15 commented Feb 23, 2019

The latest git fails at SPIFFS.

Exception 3: LoadStoreError: Processor internal physical address or data error during load or store
PC: 0x40215f48
EXCVADDR: 0x402497b1

Decoding stack results
0x4020b148: spiffs_hal_read(unsigned int, unsigned int, unsigned char*) at esp8266\2.5.0\cores\esp8266\spiffs_hal.cpp line 45
0x4020b204: spiffs_hal_write(unsigned int, unsigned int, unsigned char*) at esp8266\2.5.0\cores\esp8266\spiffs_hal.cpp line 102
0x4020b340: spiffs_hal_erase(unsigned int, unsigned int) at esp8266\2.5.0\cores\esp8266\spiffs_hal.cpp line 168
0x4020ac60: SPIFFSImpl::open(char const*, fs::OpenMode, fs::AccessMode) at esp8266\2.5.0\cores\esp8266\spiffs_api.cpp line 30
0x40209db8: fs::FS::open(char const*, char const*) at esp8266\2.5.0\cores\esp8266\FS.cpp line 221
0x4020139d: Config::openConfigObject() at \esp8266_espino\Debug\Config\Config.cpp line 12
0x4020115c: BridgeApp::begin() at \esp8266_espino\Debug\BridgeApp.cpp line 16
0x40100124: millis() at esp8266\2.5.0\cores\esp8266\core_esp8266_wiring.cpp line 186
0x4020926e: __yield() at esp8266\2.5.0\cores\esp8266\core_esp8266_main.cpp line 100
0x402010a3: setup() at OOKBridge\OOKBridge/OOKBridge.ino line 25
0x402010a9: setup() at OOKBridge\OOKBridge/OOKBridge.ino line 26
0x40209234: loop_wrapper() at esp8266\2.5.0\cores\esp8266\core_esp8266_main.cpp line 122

Of course, I have error checking in my code. SPIFFS worked properly in 2.5.0.
There is nothing special in my SPIFFS code:

https://pastebin.com/yz1LLp1r

@devyte
Collaborator

devyte commented Feb 23, 2019

Your logs show 2.5.0. Either you're still using the release, or you installed git in a non-supported way.

@cziter15
Contributor Author

cziter15 commented Feb 23, 2019

I just checked out the latest git into the 2.5.0 directory.

@devyte
Collaborator

devyte commented Feb 23, 2019

That's not a supported installation method. Please read the docs.

@cziter15
Contributor Author

cziter15 commented Feb 28, 2019

I've found that this crash was caused by the PROGMEM specifier on my filename variable.
Now, in testing, lwIP2 is significantly more stable. Using the lower memory variant, there have been no disconnections for 53926 seconds, although the test is still running :)
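
For readers who hit the same Exception 3: on the ESP8266, data marked `PROGMEM` stays in flash, and on cores of this era flash-mapped memory can only be read with aligned 32-bit accesses. Code that treats such a string as an ordinary `const char*` and reads it byte by byte can therefore raise a LoadStoreError like the one in the trace above. The following is a minimal sketch of two ways around it. The file name `/config.json` and the helper `copyPathToRam` are hypothetical (the real path is not shown in this thread), and the `#else` branch is only a host-side fallback so the snippet compiles off-target.

```cpp
#include <cstddef>
#include <cstring>

#ifdef ARDUINO
#include <Arduino.h>            // provides PROGMEM and memcpy_P on the ESP8266 core
#else
#define PROGMEM                 // host fallback, for illustration only
#define memcpy_P std::memcpy
#endif

// Option 1 (simplest): drop PROGMEM so the string lives in RAM.
//   static const char kConfigPath[] = "/config.json";

// Option 2: keep the string in flash, but copy it to RAM before use.
// Hypothetical path; the original sketch's file name is not known.
static const char kConfigPath[] PROGMEM = "/config.json";

// Copies srcLen characters from a flash-resident string into dst and
// NUL-terminates it. Returns false if dst is null or too small.
bool copyPathToRam(char* dst, std::size_t dstSize,
                   const char* flashSrc, std::size_t srcLen) {
    if (dst == nullptr || dstSize == 0 || srcLen + 1 > dstSize) {
        return false;
    }
    memcpy_P(dst, flashSrc, srcLen);   // flash-safe copy on the target
    dst[srcLen] = '\0';
    return true;
}
```

On the device, the RAM copy (for example a `char path[32]` filled with `copyPathToRam(path, sizeof(path), kConfigPath, sizeof(kConfigPath) - 1)`) can then be passed to `SPIFFS.open(path, "r")`.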

@cziter15
Contributor Author

cziter15 commented Mar 12, 2019

I've found that my router's antenna was not the original one.

The antenna was connected through a mismatched SMA connector pair (female-to-female), which obviously didn't work. That was a possible cause of all my problems. Now, after adding an adapter, I have a very stable and excellent signal (~ -30 dBm).

My module has been up for 41 hours now without a disconnection. However, I'd like to wait a bit longer to confirm that the antenna was definitely the cause.

@cziter15
Contributor Author

cziter15 commented Apr 7, 2019

I moved to another router firmware (OpenWrt), which allowed me to run some diagnostic tests and scripts. There was a problem with the LTE RSRP parameter that sometimes caused timeouts but didn't restart the whole LTE session.

I changed to another network operator. No problems since: the TCP connection has been alive for 6 days now.

Conclusion: never trust the "connection time" status on the router's configuration panel.
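
One way to act on that conclusion is to measure availability independently, from periodic probe results, instead of reading the router's status page. The following is a minimal sketch of such bookkeeping. The class name `LinkUptimeTracker` and the idea of feeding it one probe result per interval are assumptions for illustration, not the diagnostic scripts actually used here.

```cpp
#include <cstdint>

// Tracks link availability from periodic probe results (for example a TCP
// connect or ping attempt), independent of what the router reports.
// Hypothetical helper for illustration.
class LinkUptimeTracker {
public:
    // Feed one probe result taken at nowSeconds (monotonic clock).
    void sample(std::uint32_t nowSeconds, bool reachable) {
        if (reachable && !up_) {            // link came (back) up
            up_ = true;
            upSince_ = nowSeconds;
        } else if (!reachable && up_) {     // link dropped
            up_ = false;
            ++drops_;
        }
        if (up_) {
            const std::uint32_t streak = nowSeconds - upSince_;
            if (streak > longestUp_) {
                longestUp_ = streak;
            }
        }
    }

    bool isUp() const { return up_; }
    std::uint32_t drops() const { return drops_; }
    std::uint32_t longestUpSeconds() const { return longestUp_; }

private:
    bool up_ = false;
    std::uint32_t upSince_ = 0;
    std::uint32_t drops_ = 0;
    std::uint32_t longestUp_ = 0;
};
```

Comparing `drops()` and `longestUpSeconds()` against the router's reported connection time would show whether short outages are being hidden by a session that never formally restarts.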

Please close this issue.

@devyte closed this as completed Apr 10, 2019