MQTT connect() does not timeout, causing hang #115
Comments
nice work. will be keeping an eye on this
I adapted the code to run both in CircuitPython and also standard Python environment (it lives on https://github.com/vladak/shield/tree/mqtt_bare) and when running it on Raspbian (64-bit) with Python 3.9.2 against the "rogue" Netcat server, it also hangs.
ad infinitum (15 minutes to be precise :-) however I am fairly confident this will be looping forever, as long as the TCP connection is alive). On the terminal running the Netcat process I can see the initial MQTT message being accepted:
Interrupting the program it produces this:
so this is likely looping in Adafruit_CircuitPython_MiniMQTT/adafruit_minimqtt/adafruit_minimqtt.py Lines 517 to 528 in 67e0ade
I confirmed this hypothesis by editing the
The frequency of these debug prints correlates with the timeout values of the poll() syscall observed with So, this |
This sort of |
This is pretty much the same as #110 except:
connect()
I think this is actually a CircuitPython problem, however this was discovered while trying to get adafruit_minimqtt to work on (supposedly) flaky network, so I am filing this here for initial investigation.
This is happening on Adafruit CircuitPython 7.3.2 on 2022-07-20; Adafruit Feather ESP32S2 with ESP32S2, particularly on Adafruit ESP32-S2 Feather with BME280 Sensor - STEMMA QT - 4MB Flash + 2 MB PSRAM. There is 2000 mAh battery and the Adafruit TMP117 ±0.1°C High Accuracy I2C Temperature Sensor connected. I don't think the sensor and battery matter for this issue, I am adding them for completeness.
Assume the following code:
The secrets.py file looks like this:
This is based on CircuitPython example code on https://learn.adafruit.com/mqtt-in-circuitpython/connecting-to-a-mqtt-broker
In my case, sometimes, for some reason, the MQTT broker does not respond to messages sent by the MQTT client, using the above code. The reason might be e.g. packet loss or some software/environment issue, however for this case it does not matter. What happens in such a case is that the client hangs instead of timing out.
If there is a problem like this with MQTT broker communication, the MQTT client should properly time out, however this does not happen. Internally, the socket timeout is set to 1 second (#112). This timeout should matter for various operations as stated on https://docs.python.org/3/library/socket.html#socket.socket.settimeout
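For comparison, here is a minimal sketch of what the documented timeout behaviour looks like on standard CPython, checked in isolation from MiniMQTT. The helper name timed_recv is hypothetical, and socket.socketpair() is used only to get a connected socket whose peer never sends anything; it is not available in CircuitPython.

```python
import socket
import time


def timed_recv(timeout=1.0):
    """Return how long recv_into() blocked before timing out, or None if it returned data.

    Hypothetical helper for illustration; not part of adafruit_minimqtt.
    """
    # A connected pair of sockets; the peer (b) never sends anything.
    a, b = socket.socketpair()
    try:
        a.settimeout(timeout)
        buf = bytearray(1)
        start = time.monotonic()
        try:
            a.recv_into(buf, 1)
        except socket.timeout:
            # Expected on CPython: the blocking call gives up after `timeout` seconds.
            return time.monotonic() - start
        return None
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print("recv_into() timed out after", timed_recv(1.0), "seconds")
```

If the same call on the board never raises, that would point at the socket implementation rather than at the MQTT library.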
Now, instead of letting the above MQTT client connect to a real MQTT broker, I will let it connect to a fake server, which accepts any messages but does not send anything back. This is done via Netcat like so:
nc -l 4444
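Where Netcat is not at hand, roughly the same "accept everything, answer nothing" server can be sketched in Python. This is an approximation under assumptions: the function name start_silent_server is made up for illustration, it handles a single connection, and unlike Netcat it discards what it receives instead of printing it.

```python
import socket
import threading


def start_silent_server(host="127.0.0.1", port=4444):
    """Accept one TCP connection, read and discard its data, never reply.

    Hypothetical stand-in for `nc -l 4444`. Pass port=0 to let the OS pick
    a free port. Returns the port actually bound.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    bound_port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        with conn:
            # Swallow whatever the client sends until it disconnects.
            while conn.recv(1024):
                pass
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    return bound_port


if __name__ == "__main__":
    print("silent server listening on port", start_silent_server())
    threading.Event().wait()  # keep the process alive until Ctrl-C
```

Pointing the MQTT client at this port should give the same situation as above: the initial message is accepted and no reply ever arrives.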
Now, when the MQTT client connects to such server, it reports this:
and that's it. It would stay like this for minutes or hours.
Observing the traffic, it goes like this:
So, the TCP connection is still up, MQTT client is waiting for reply which it is not going to get and the socket timeout value is not honored.
Interestingly, when I terminate the Netcat process, the TCP connection goes down:
yet the MQTT client would not budge - it is still hung.
Terminating the client via Ctrl-C on the serial console, I get this:
This means the recv_into() called inside _sock_exact_recv() does not return at all even though the socket timeout was set to 1 second:
Adafruit_CircuitPython_MiniMQTT/adafruit_minimqtt/adafruit_minimqtt.py, line 267 in 67e0ade
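One way a receive path can be bounded is with an overall deadline on top of the per-call socket timeout. The following is only a sketch, not the library's code: recv_exact_with_deadline and its parameters are hypothetical names, and it helps only where recv_into() itself returns or raises on timeout (as on CPython). It would not help if recv_into() blocks indefinitely, as observed on the board.

```python
import socket
import time


def recv_exact_with_deadline(sock, nbytes, deadline_s):
    """Read exactly nbytes from sock, or raise TimeoutError after deadline_s seconds.

    Hypothetical helper for illustration; not part of adafruit_minimqtt.
    """
    buf = bytearray(nbytes)
    view = memoryview(buf)
    received = 0
    stop_at = time.monotonic() + deadline_s
    while received < nbytes:
        remaining = stop_at - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(
                "got {} of {} bytes before the deadline".format(received, nbytes)
            )
        # Never let a single call block past the overall deadline.
        sock.settimeout(remaining)
        try:
            count = sock.recv_into(view[received:], nbytes - received)
        except socket.timeout:
            continue  # the loop re-checks the deadline and raises
        if count == 0:
            raise ConnectionError("peer closed the connection")
        received += count
    return bytes(buf)
```

The point of the design is that a silent peer produces an error after a known amount of time, and a closed connection is reported instead of being retried.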
Now, when I do not start the Netcat process and run the MQTT client code, the TCP connection cannot be established, and the program properly reports "Repeated socket failures" and exits.
For some other details see https://forums.adafruit.com/viewtopic.php?p=937186#p937186