-
Notifications
You must be signed in to change notification settings - Fork 2.4k
Output file can be corrupted if library keyword uses BuiltIn.run_keyword
and timeout occurs
#5417
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
The related code was changed recently when fixing #5395, but I tested that the same problem occurs also with earlier versions. I noticed this bug by looking at the new code, but I initially couldn't confirm it because there were not problems on Linux. My understanding on why the bug occurs is this:
I believe that fixing this requires a new approach to avoid problems with timeouts interrupting writing to output files. We could try making it possible to pause timeouts whenever we are writing something to an output file, but getting it to work reliably on all operating systems can be challenging. We probably should also make sure no more than one timeout can be active at the same time. |
Keywords executing other keywords should actually be registered as "run keyword variants" using the register_run_keyword method. If the above earlier example is changed to this problem doesn't anymore occur: from robot.api import logger
from robot.libraries.BuiltIn import BuiltIn, register_run_keyword
register_run_keyword(
library='lib',
keyword='keyword',
deprecation_warning=False
)
MSG = 'A rather long message that is slow to write on the disk. ' * 10000
def keyword():
while True:
BuiltIn().run_keyword('Log Huge Message')
def log_huge_message():
logger.info(MSG) The problem is that this registration isn't well known and that it should be cleaned up (#2190). Output files getting corrupted is such a severe issue, that I hope we could avoid that also without extra configuration. Anyway, if fixing this problem is too complicated for RF 7.3, we should at least try to enhance the related documentation and make the registration API more well known. |
Previous messages weren't restored meaning that messages could be lost. The unused variable also made linters unhappy. This code attemps to avoid problems with timeouts interrupting writing to output files. The current code was written to fix #5395 and is basically a rewrite of the fix for #2839. As #5417 explains, there are still problems and bigger changes are needed.
So on Browser and SeleniumLibrary almost all keywords should be registered as run keywords because of their run on failure functionality? Which they are not done, at least on Browser library |
Yeah, the run-on-failure functionality uses BuiltIn.run_keyword and is affected by this issue. Registering these keywords as run keywords would mean that timeouts wouldn't be able to stop them and that would probably be worse. This makes fixing this issue even more important. |
- Move high level timeout implementation from `__init__.py` to `timeout.py`. - Make it an error to start or otherwise interact with inactive timeouts. - Rename platform specific timeout classes to from `Timeout` to `<Platform>Runner`. - Create common base class for runners. - Add `Timeout.get_runner()` for getting a runner. This can be used later to get an access to a runner to be able to pause it (#5417). - Preserve `Timeout.run()` as a convenience method. - Remove locking and `_finished` flag from Windows implementation. They didn't seem to be actually needed, and also the commit were they were added mentioned that the fixed issue didn't really require them. See 1a5eeaa.
BuiltIn.run_keyword
and timeout occurs
BuiltIn.run_keyword
and timeout occursBuiltIn.run_keyword
and timeout occurs
While investigating this, I noticed that timeouts outside Windows had a bug with nested timeouts (#5422). That explained why this issue occurred only on Windows and after fixing that problem this issue manifested itself also on Linux. |
This bug was fixed by pausing timeouts when |
Missing part of the already closed #5417.
Also small adjustment to when to raise a timeout on resume. Earlier timeout was raised on the first resume after timeout had been exceeded, but now the runner must be fully resumed. This behavior is more logical than the earlier. The change shouldn't affect execution at all. The reason is that if timeout is paused in nested manner, `BuiltIn.run_keyword` has been used in recursively and each recursive call has its own timeout. The last one of them will raise a timeout on resume immediately after the timeout has been exceeded anyway. This is part of #5417.
The fix didn't work properly if |
- Enhance tests related to pausing timeouts. - Fix the aforementioned tests on Windows. - Enhance Windows timeout code to avoid race conditions. Related to #5417.
Explain that using `BuiltIn.register_run_keyword` to disable timeouts isn't relevant anymore now that #5417 is fixed.
Output files can be corrupted if a timeout occurs when Robot is writing log messages. That is a serious issue, because it prevents creating normal log and report files. Investigating this kind of problems is also pretty annoying, because what happens depends on when exactly the timeout occurs. It could be that most times everything goes well, but now and then the output is corrupted.
To avoid these problems, we nowadays delay writing log messages when timeouts are enabled (#2839). Before we had this mechanism in place, the following example reproduced the problem pretty reliably regardless the operating system:
Unfortunately the current mechanism doesn't prevent the problem if a library keyword uses
BuiltIn.run_keyword
for executing other keywords that log messages. For example, this variation corrupts the output file nearly every time I run it on Windows:I haven't been able to get the above example, or any variation of, to fail on Linux. This is most likely due to the platform specific timeout implementation, but looking at the related code, it seems the problem could happen anywhere. Anyway, this is a serious issue even if it would occur only on Windows.
The text was updated successfully, but these errors were encountered: