Skip to content

ESP32/machine_wdt.c: Add WDT.deinit() method. #6631

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

IhorNehrutsa
Copy link
Contributor

@IhorNehrutsa IhorNehrutsa commented Nov 18, 2020

The WDT.deinit() method allows you to execute sys.exit() to enter to REPL(WebREPL) manual mode without reboot caused by the WDT.

This function may be interested in ESP8266, WiPy, RP2, and other ports.

@IhorNehrutsa
Copy link
Contributor Author

IhorNehrutsa commented Nov 18, 2020

Test code:

import sys
from machine import WDT

wdt = WDT(timeout=1000)

try:
    while True:
        # do something

        if check_everything_is_ok():
            wdt.feed()

        if need_to_exit_without_reboot():
            wdt.deinit()
            sys.exit()
            # will enter to REPL 
except:
    raise
    # WDT will reboot the MCU after any exceptions

@peterhinch
Copy link
Contributor

Is this wise? A WDT is supposed to force a reset even in the event that the code runs wild. To guarantee this, there should be no way to disable the WDT in code.

@kevinkk525
Copy link
Contributor

kevinkk525 commented Jun 8, 2021

I can see the usefulness in deinitializing a wdt. Sometimes you might just want to debug something remotely during runtime, so you have to interrupt your program to check variables etc.. But if you have a WDT active, after a few seconds you'll be "thrown out" and debugging could become a nuisance.
My software Watchdogs all have the ability to be deinitialized for this exact reason. Sometimes I would just use the webrepl to connect to a deployed unit and quickly check/test some stuff and therefore disable the watchdog.

@IhorNehrutsa
Copy link
Contributor Author

@kevinkk525 Yes, yes, yes. WDT.deinit () is only needed in debug mode, not in product release.
And I use it with WEBrepl too.

@peterhinch
Copy link
Contributor

I suppose one approach would be to make it a compile-time option. I will be interested to see the views of the maintainers.

@IhorNehrutsa
Copy link
Contributor Author

Let's take a look at a standalone MicroPython device which is only connected to a W-iFi router.
Here there are minimal boot.py and main.py.
boot.py
main.py
boot.py starts after reset, connect to Wi-Fi network, and then main.py starts.
main.py initialize the WDT and creates an infinite working(runtime) loop.

Now you can connect to device via WebREPL, see print() messages, but you can't upload files to device when your program stay in an infinite working loop. You need to exit from the loop.
You can exit loop, but WDT resets the device if you are using WDT.init() and don't have WDT.deinit().

When you connect to the REPL(via USB-UART)(if it possible), you can break boot.py or main.py by Keyboard Interrupt before starting WDT, but you don't have a time to press Ctrl-C in WebREPL.

@IhorNehrutsa
Copy link
Contributor Author

IhorNehrutsa commented Sep 7, 2021

Dear @dpgeorge and other maintainers. Are there any chances of a code review? Thanks.

P.S. Is the maintainer's group exist? How is the right way to request a code review?

https://docs.github.com/en/github/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/requesting-a-pull-request-review

@IhorNehrutsa
Copy link
Contributor Author

@xorbit @jimmo @jonathanhogg @karfas @toybuilder
I noted that you are successful ESP32 port contributors. Thank you all.
Perhaps you would be interested in this function.
Could you please give your expert opinion or take a code review or tell that something needs to be added/corrected in the documentation for this PR.
Thanks.

@rolandvs
Copy link
Contributor

Better to build a debug and production version w/ or w/o watchdog. If the watchdog is enabled there should be NO WAY to disable it again. In almost all cases micro-controllers will run with watchdog (when enabled) until hell_freezes_over or reset.

@IhorNehrutsa
Copy link
Contributor Author

@rolandvs
Please imagine a standalone IoT device with only a network wi-fi connection accessible.
It may be IoT mesh on a multistorey building or a manufactory.
It may be a device IP67 or IP68 class protection without any connectors etc.
It may be a little web server or it may be an MQTT client etc.
It works standalone well and WDT is enabled.
If NO WAY to disable the WDT, how to upgrade software (xxx.py, xxx.cfg files, etc (not the Micropython firmware.bin)) to the next stable revision?
WebREPL is enabled and you can connect to the device but can't send files because the device is in an infinite loop.
If you break the infinite loop, you must call WDT.feed() manually, otherwise, WDT will reset the MCU and you will lose your wi-fi connection.

Is there another way to update the software over the network with WDT enabled?

@jonathanhogg
Copy link
Contributor

A simpler approach to the specific use case described would be to just reinitialise the watchdog with a longer timeout – like an hour or something. You can just call WDT() again to reinitialise it – there's only actually one watchdog timer (it's a singleton object).

You get the same practical effect as turning it off, but retain both the ideological purity of the watchdog being unstoppable and the safety mechanism of an eventual automatic reboot if you cock up the REPL.

Of course, you still run the risk of a bodged OTA update, but that's a whole different problem. My last solution to this involved retaining multiple versioned copies of the Python code with checksums and a fallback to the previous version in the event of a watchdog reset.

@IhorNehrutsa
Copy link
Contributor Author

@jonathanhogg Thank you very much.

Watchdog with a longer timeout – like an hour

That is solution

@kevinkk525
Copy link
Contributor

+1 and not all hardware WDT support timeouts of an hour. Pyboard-D is only a few minutes iirc

@IhorNehrutsa IhorNehrutsa reopened this Sep 10, 2021
@karfas
Copy link

karfas commented Sep 10, 2021

I have really no idea why I'm subscribed to this topic, but here are my 2 cents:

A programming language used in the embedded area needs to give the user as much control as possible about the hardware. If the language/runtime allows to switch on a function like the WDT it should also allow the opposite.

@peterhinch: While I also think that the WDT should remain active forever once enabled, it is also my opinion this should be MY decision as a developer, not some language/runtime restriction "because ugly/bad design/whatever".

It looks like there is some need for a function like this.
Is it really Micropython's (our) task to protect the user from his own bad design decisions at the cost of reduced flexibility ?

@jonathanhogg
Copy link
Contributor

So I'm not against adding a deinit() method by any means. As @karfas says, if the underlying hardware supports it then why not surface the functionality and let the developer decide whether to use it – the ESP32 port already allows the timeout to be changed (though that may originally have been an oversight).

That said, I'd argue that the machine.WDT() docs should represent the baseline functionality supported by all ports and any special functionality available to a specific port should be in the port quickref. Obviously, this PR doesn't attempt to implement deinit() for any other port and I'm not even clear that it's possible on other devices. Same goes for the related #7770 PR.

Generally, I'd say the docs need a bit of work and I note that this PR also introduces a new reset reason code without mention or documentation.

@peterhinch
Copy link
Contributor

On further thought my point about it being desirable to have no way to disable a WDT, while theoretically correct, doesn't actually work in this case.

The fact that it is possible to write code which implements .deinit() implies that there is a sequence of machine code instructions that disables the WDT. Consequently, even if .deinit() is not implemented, if an electrical spike causes the ESP to execute garbage the consequence might be a disabled WDT.

A real WDT is a piece of hardware attached to the reset pin.

@jonathanhogg
Copy link
Contributor

A real WDT is a piece of hardware attached to the reset pin.

I believe that the ESP32 watchdog is actually a per-thread watchdog implemented at the RTOS level.

@karfas
Copy link

karfas commented Sep 13, 2021

A real WDT is a piece of hardware attached to the reset pin.

Abstracting the interface to a generic piece of hardware (chosen by the user, initialized by a combination of I/O pins, a string sent from the UART, whatever) is most likely not micropython's machine.WDT() business.

Are there any ports or boards out there without a watchdog of some kind?

@robert-hh
Copy link
Contributor

Are there any ports or boards out there without a watchdog of some kind?

A watchdog module is present in the ports for STM32, ESP32, ESP8266, RP2 and CC3200.
ESP8266 has a software WDT, the other ports do not seem to have one.
For mimxrt, machine.WDT is prepared but the PR is not yet made.

@peterhinch
Copy link
Contributor

My point is that a real, grown-up, mission critical WDT is implemented in hardware (either on-chip or externally). Such a WDT is designed so it cannot be disabled at a hardware level. If it is possible to disable it in code, then a processor that has run wild might conceivably disable it.

If the hardware design is such that the WDT can be disabled in code then all bets are off.

My initial objection is therefore invalid. Adding a Python method doesn't make any difference: the problem is in hardware and cannot be altered in code.

@chrismas9
Copy link
Contributor

I agree with Peter and Jonathan. There should be no way to disable a Watchdog and increasing the timeout is a safer way to debug. You can only increase it to 32 sec on STM32, but that's probably manageable.

If the hardware design is such that the WDT can be disabled in code then all bets are off.

Not quite, the STM32 requires a write sequence to unlock and disable the WDT. As long as that code sequence doesn't exist it's unlikely a corrupt program counter would find random code to execute the sequence required. But if the disable code is built into the codebase, even if it is never called then a corrupt program counter may find it.

@peterhinch
Copy link
Contributor

Not quite, the STM32 requires a write sequence to unlock and disable the WDT. As long as that code sequence doesn't exist it's unlikely a corrupt program counter would find random code to execute the sequence required. But if the disable code is built into the codebase, even if it is never called then a corrupt program counter may find it.

Good point, but I don't think we've quite got to the end of this rabbit hole.
What you say is undoubtedly true if the code that executes the sequence is written in C. But what if it were written in Python? A CPU that runs wild executes machine code, and machine code to directly issue that sequence would not exist. To issue the sequence it would have to execute the Python VM machine code in such a way that the "right" bytecode ran. If that happened on my watch I'd swallow my view of gambling and rush out for a lottery ticket...

@IhorNehrutsa
Copy link
Contributor Author

The discussion has moved away from the main issue.
The question is not about the reliability of the software, but about the possibility of upgrading the software.
How to update the software with a short period of WDT (like seconds)?

Try to update this:
boot.py

print('boot.py')
from machine import WDT
wdt = WDT(timeout=10000)

The solution is to reinitialize the watchdog with a longer timeout or WDT.deinit().
I selected reinitialization.
Thanks @jonathanhogg

@IhorNehrutsa
Copy link
Contributor Author

IhorNehrutsa commented Dec 15, 2021

PR is closed in favor of
Reinitialize watchdog timer with longer timeout #7770

@mrx23dot
Copy link

isn't there NVIC_SystemReset() ?

@karfas
Copy link

karfas commented Nov 27, 2023

@mrx23dot: NVIC_SystemReset() does only exists for STM32. I don't see any relevance to the PR discussed here (watchdog behavior/interface on ESP32).

@mrx23dot
Copy link

it's CMSIS, every ARM and riscV, maybe they also implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants