BenWiederhake
Collaborator

test_round_robin_limited_file_descriptors is flaky and causes real problems.

The test imposes .limit(Resource::NOFILE, 9, 9); that's the point of the test. On my machine, this number can be lowered to 5; it always works with 5 or above, and never works below that. So I would assume that the "real" limit is 5 (plus or minus a bit of wiggle room for version differences).
On CI, it usually works with 9, but sometimes fails in the middle of the run (xaz, the 26th file), so it seems like there is a real issue, like an fd leak. (So we should not just raise the number.)

So let's at least make this test more verbose. This way, the next time it fails, we can see where exactly in <OutFiles as ManageOutFiles>::get_writer it fails. (At least that's where I think it fails.)

I have a bit of a bad feeling that it might be the line out_file.maybe_writer.as_mut().unwrap().flush()?;, i.e. flushing old files while there are no free descriptors left.
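To make the suspected spot concrete, here is a minimal sketch of a writer pool that keeps only a bounded number of files open and flushes an old writer before opening a new one. It is an assumption-laden stand-in, not the actual split code: the `OutFile` and `OutFiles` types, the `created` flag, the `max_open` field, and the eviction order are all hypothetical; only the names `get_writer` and `maybe_writer` are borrowed from the description above.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufWriter, Write};
use std::path::PathBuf;

/// Hypothetical stand-in for one output file (not the real uutils type).
struct OutFile {
    path: PathBuf,
    maybe_writer: Option<BufWriter<File>>,
    created: bool,
}

/// Hypothetical pool that keeps at most `max_open` files open at once.
struct OutFiles {
    files: Vec<OutFile>,
    max_open: usize,
}

impl OutFiles {
    fn new(paths: Vec<PathBuf>, max_open: usize) -> Self {
        let files = paths
            .into_iter()
            .map(|path| OutFile { path, maybe_writer: None, created: false })
            .collect();
        Self { files, max_open }
    }

    fn open_count(&self) -> usize {
        self.files.iter().filter(|f| f.maybe_writer.is_some()).count()
    }

    fn get_writer(&mut self, idx: usize) -> io::Result<&mut BufWriter<File>> {
        if self.files[idx].maybe_writer.is_none() {
            if self.open_count() >= self.max_open {
                // Free a descriptor first: flush one open writer, then drop it.
                if let Some(victim) = self.files.iter_mut().find(|f| f.maybe_writer.is_some()) {
                    victim.maybe_writer.as_mut().unwrap().flush()?;
                    victim.maybe_writer = None; // dropping the writer closes the fd
                }
            }
            let f = &mut self.files[idx];
            // Truncate on first use, append when reopening after an eviction.
            let file = if f.created {
                OpenOptions::new().append(true).open(&f.path)?
            } else {
                File::create(&f.path)?
            };
            f.created = true;
            f.maybe_writer = Some(BufWriter::new(file));
        }
        Ok(self.files[idx].maybe_writer.as_mut().unwrap())
    }

    fn flush_all(&mut self) -> io::Result<()> {
        for f in self.files.iter_mut() {
            if let Some(w) = f.maybe_writer.as_mut() {
                w.flush()?;
            }
        }
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let paths: Vec<PathBuf> = (0..5)
        .map(|i| dir.join(format!("rr_sketch_{i}.txt")))
        .collect();
    let mut out = OutFiles::new(paths, 2);
    // Distribute ten lines round-robin over five files with only two open at a time.
    for line in 0..10usize {
        let w = out.get_writer(line % 5)?;
        writeln!(w, "line {line}")?;
    }
    out.flush_all()
}
```

In this sketch the flush happens before the descriptor is released, so any step that needs a fresh descriptor at that moment would fail while the pool is at its limit. Whether the real code behaves this way is exactly the open question.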

I would also love to run lsof at the time of the crash, but since I cannot reproduce this issue locally, there's no way for me to do so. (And trying to do it automatically seems extremely difficult.)
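One possible substitute for lsof, sketched under the assumption that the failing process runs on Linux with /proc mounted, is to have the process list its own descriptors by reading /proc/self/fd. The helper name `list_open_fds` is hypothetical and nothing like it is claimed to exist in the code base; hooking it into the failure path is left open.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// Returns (fd, target) pairs for the current process.
/// Linux-specific: relies on the /proc/self/fd directory of symlinks.
fn list_open_fds() -> io::Result<Vec<(i32, PathBuf)>> {
    let mut fds = Vec::new();
    for entry in fs::read_dir("/proc/self/fd")? {
        let entry = entry?;
        // Entry names are the descriptor numbers; skip anything else.
        let fd = match entry.file_name().to_str().and_then(|s| s.parse::<i32>().ok()) {
            Some(fd) => fd,
            None => continue,
        };
        // The descriptor used for the directory scan itself may already be
        // gone when we resolve it, so unreadable links are skipped.
        if let Ok(target) = fs::read_link(entry.path()) {
            fds.push((fd, target));
        }
    }
    fds.sort_by_key(|(fd, _)| *fd);
    Ok(fds)
}

fn main() -> io::Result<()> {
    for (fd, target) in list_open_fds()? {
        eprintln!("fd {fd}: {}", target.display());
    }
    Ok(())
}
```

Note that the scan itself needs one free descriptor, so it could fail in precisely the situation it is meant to diagnose.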

@BenWiederhake BenWiederhake marked this pull request as ready for review February 25, 2024 15:36
@BenWiederhake
Collaborator Author

Changes since last push: None, I just want a re-run.

Android build flaked, and this time I'm not gonna create a PR to fix it:

[2024-02-25 15:40:47]    Compiling memchr v2.7.1
[2024-02-25 15:40:47] error: failed to run custom build command for `proc-macro2 v1.0.78`
[2024-02-25 15:40:47] 
[2024-02-25 15:40:47] Caused by:
[2024-02-25 15:40:47]   could not execute process `/data/data/com.termux/files/usr/tmp/cargo-install57bj3O/release/build/proc-macro2-ca558865293f126b/build-script-build` (never executed)
[2024-02-25 15:40:47] 
[2024-02-25 15:40:47] Caused by:
[2024-02-25 15:40:47]   Text file busy (os error 26)
[2024-02-25 15:40:47] warning: build failed, waiting for other jobs to finish...
[2024-02-25 15:40:49] error: failed to compile `cargo-nextest v0.9.67`, intermediate artifacts can be found at `/data/data/com.termux/files/usr/tmp/cargo-install57bj3O`.


GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@BenWiederhake
Collaborator Author

Changes since last push: None, I just want a re-run.

Our copy of the GNU tests flaked. I must have somehow angered the gods of CI flakiness.

Log of `test_uniq::gnu_tests Test 112.stdin`
Test 112.stdin
run: /target/i686-unknown-linux-musl/debug/coreutils uniq -D -c
thread 'test_uniq::gnu_tests' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "failed to write to stdin of child: Broken pipe (os error 32)" }', tests/common/util.rs:2031:18
stack backtrace:
   0: rust_begin_unwind
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/std/src/panicking.rs:578:5
   1: core::panicking::panic_fmt
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/panicking.rs:67:14
   2: core::result::unwrap_failed
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1687:5
   3: core::result::Result<T,E>::unwrap
             at /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/result.rs:1089:23
   4: tests::common::util::UChild::wait_with_output
             at ./tests/common/util.rs:2028:13
   5: tests::common::util::UChild::wait
             at ./tests/common/util.rs:1975:22
   6: tests::common::util::UCommand::run
             at ./tests/common/util.rs:1570:9
   7: tests::common::util::UCommand::run_piped_stdin
             at ./tests/common/util.rs:1578:9
   8: tests::test_uniq::gnu_tests
             at ./tests/by-util/test_uniq.rs:1058:22

@BenWiederhake
Collaborator Author

uniq GNU tests flaked again in the same test. I'll ignore it this time.

@sylvestre sylvestre force-pushed the dev-split-verbose-test branch from fbc6349 to 4eb8997 Compare February 28, 2024 08:43
@BenWiederhake BenWiederhake marked this pull request as draft February 28, 2024 11:24
@BenWiederhake
Collaborator Author

Good news: The test failed exactly in this CI run.

Bad news: Derp, I'm an idiot. Of course "unable to open 'xbm'; aborting" is an ordinary error message and not a panic, so setting RUST_BACKTRACE=1 does absolutely nothing. I'll create a new PR if/when I have a better idea how to test this.
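For illustration, the difference can be sketched as follows: an error that travels as a Result and is printed never triggers the panic machinery, so RUST_BACKTRACE has nothing to act on. One way to still learn which step failed is to label the error where it arises. The helper `with_step`, the function `open_output`, and the path used here are hypothetical and only meant to show the idea.

```rust
use std::fs::File;
use std::io;

/// Hypothetical helper: prefixes an io::Error with the step that failed,
/// keeping the original error kind.
fn with_step<T>(step: &str, result: io::Result<T>) -> io::Result<T> {
    result.map_err(|e| io::Error::new(e.kind(), format!("{step}: {e}")))
}

/// Hypothetical stand-in for the code path that opens an output file.
fn open_output(path: &str) -> io::Result<File> {
    with_step("open output file", File::create(path))
}

fn main() {
    // The parent directory does not exist, so this yields an ordinary Err.
    // No panic occurs, and therefore no backtrace is printed regardless of
    // how RUST_BACKTRACE is set.
    if let Err(e) = open_output("/nonexistent-dir/xbm") {
        eprintln!("error: {e}");
    }
}
```

Labelling each fallible step this way would at least narrow down which call returned the error the next time the test fails.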

@BenWiederhake BenWiederhake deleted the dev-split-verbose-test branch February 28, 2024 11:28