pinky: fix off-by-one in GECOS parsing #3155

jtracey · 2022-02-18T06:08:54Z

This lets the test pass when running cargo test --features unix on my system, a Xubuntu 21.10 install. Without this, the output includes the trailing comma. I am a bit cautious though, since it seems unusual that nobody noticed this in the 6 years since this code was added -- is it possible there's something system-specific going on? Also, all the CI tests pass before and after this change, does the CI not run the Unix-specific tests?

sylvestre · 2022-02-18T08:59:07Z

Could you please add a test to make sure we don't regress in the future?
thanks

jtracey · 2022-02-18T17:02:52Z

Sorry, I should have made my post clearer. This line is tested, in the test_no_flag() pinky test. The test is the only test that fails when I run cargo test --features unix on my system (well, a logname test also fails if I run outside a direct tty, but that's a xfce bug, GNU's logname fails too). The git blame shows this code was unchanged for the last 6 years, but I now realize the test was added 9 months ago, in #2210. The CI flagged it as untested then too, but this was glossed over, probably under the assumption that it was buggy symbol tracking or something. Given that I don't think this test should have ever passed (because of the pinky code I change here, the test itself is good), I think this is actually because of my original theory: none of the CI tests (and apparently, none of us 😉 ) are testing with --features unix. Should I try to fix that in this PR, or file a new one?

sylvestre · 2022-02-21T18:09:50Z

yeah, please try to run the tests too :)
many thanks

jtracey · 2022-02-23T03:43:10Z

Okay, so I think I figured out the problem with the tests. They are actually being run in the CI, but weren't really doing much because 1. most of the tests that specify an account were specifying the root acount, which has no GECOS fields, and more importantly, 2. the environment is running the commands from an account without logging into it, so there is no utmp data. Pinky can't run most of its code if there are no users in the utmp file, and needs something in the GECOS fields to print something meaningful in that part of the output.

I added the default GH Action user to one of the tests, and modified the linux portions of the CI to hack around the two above issues:

We add a name field to the GECOS fields of any lines of the passwd file where there are GECOS fields, but name is empty. This is currently just the default "runner" account, but it should be harmless if other accounts are added later.
We dump the utmp file, add a plausible line for what it would look like if the runner account had logged in, and reverse dump it back into the file. This is not something you're supposed to do, since the text format has no stability guarantees, as documented in the man page for utmpdump, but it hasn't been touched since 2014, and worst case scenario will show up as reduced coverage to where we are prior this PR if it ever breaks.

This caught a similar bug in another part of the code, I've folded the fix into the first commit. The coverage should be pretty good now, but a network blip prevented the codecov action from running. Someone with access should restart it before merging, to make sure it actually worked (or I'll rebase if I notice new commits on main first).

sylvestre · 2022-02-23T08:43:25Z

.github/workflows/CICD.yml

@@ -595,17 +595,6 @@ jobs:
          *-pc-windows-msvc) STRIP="" ;;
        esac;
        outputs STRIP
-    - name: Install/setup prerequisites


why did you remove this?

It's duplicated a few lines below, where I modify it with the additional lines. I believe 39a6e6c was supposed to move it but instead copied it. If I don't remove it, I'd need to choose which to modify, or modify both, in any case becoming a likely spot for bit rot.

Yes, mistakenly left in... deletion should be fine.
Modify the second copy.

sylvestre · 2022-02-23T08:43:49Z

.github/workflows/CICD.yml

@@ -624,6 +613,13 @@ jobs:
        case '${{ matrix.job.os }}' in
          macos-latest) brew install coreutils ;; # needed for testing
        esac
+        case '${{ matrix.job.os }}' in
+          *linux*)
+            sudo sed -i 's/:,/:runner name,/' /etc/passwd


please add a comment explaining why you are doing that

jtracey · 2022-02-23T18:28:18Z

Oh I should also mention: I have a workflow run that clearly shows what these hacks are doing, and why they're necessary. (N.b. the commands there are using the built-in, GNU pinky.)

rivy · 2022-02-23T19:27:41Z

.github/workflows/CICD.yml

+        case '${{ matrix.job.os }}' in
+          *linux*)
+            sudo sed -i 's/:,/:runner name,/' /etc/passwd  # add a full name each account with a gecos field but no full name
+            FAKE_UTMP='[7] [999999] [tty2] [runner] [tty2] [] [0.0.0.0] [2022-02-22T22:22:22,222222+00:00]'
+            (utmpdump /var/run/utmp ; echo $FAKE_UTMP) | sudo utmpdump -r -o /var/run/utmp  # create a fake login entry for the GH runner
+            ;; # helps pinky tests
+        esac


I'm getting some understanding from comments and variable naming, but this is a little opaque to me.
Could you document a little more "why" for this block (locally), expanding on "helps pinky tests"? Maybe just a couple of explanatory lines?
I think it'll help future me (and maybe future you) understand why it helps and if it can be and how it should be modified in the future.

Thanks.

I've updated the comment, how does it look now?

rivy · 2022-02-23T19:42:44Z

@jtracey : "I think this is actually because of my original theory: none of the CI tests (and apparently, none of us 😉 ) are testing with --features unix. Should I try to fix that in this PR, or file a new one?"

Hmm, 🤔; we're testing "--features feat_os_unix" which is a direct feature alias in Cargo.toml and therefore should be synonymous with "--features unix"...

Cargo.toml

[features]
default = [ "feat_common_core" ]
## OS feature shortcodes
macos = [ "feat_os_macos" ]
unix = [ "feat_os_unix" ]
windows = [ "feat_os_windows" ]
# ...

CICD.yml

  build:
    name: Build
    needs: [ min_version, deps ]
    runs-on: ${{ matrix.job.os }}
    strategy:
      fail-fast: false
      matrix:
        job:
          # { os , target , cargo-options , features , use-cross , toolchain }
          - { os: ubuntu-latest  , target: arm-unknown-linux-gnueabihf, features: feat_os_unix_gnueabihf, use-cross: use-cross, }
          - { os: ubuntu-latest  , target: aarch64-unknown-linux-gnu   , features: feat_os_unix_gnueabihf , use-cross: use-cross }
          - { os: ubuntu-latest  , target: x86_64-unknown-linux-gnu    , features: feat_os_unix           , use-cross: use-cross }
          # - { os: ubuntu-latest  , target: x86_64-unknown-linux-gnu    , features: feat_selinux           , use-cross: use-cross }
          # - { os: ubuntu-18.04   , target: i586-unknown-linux-gnu      , features: feat_os_unix           , use-cross: use-cross } ## note: older windows platform; not required, dev-FYI only
          # - { os: ubuntu-18.04   , target: i586-unknown-linux-gnu      , features: feat_os_unix           , use-cross: use-cross } ## note: older windows platform; not required, dev-FYI only
          - { os: ubuntu-18.04   , target: i686-unknown-linux-gnu      , features: feat_os_unix           , use-cross: use-cross }
          - { os: ubuntu-18.04   , target: i686-unknown-linux-musl     , features: feat_os_unix_musl      , use-cross: use-cross }
          - { os: ubuntu-18.04   , target: x86_64-unknown-linux-gnu    , features: feat_os_unix           , use-cross: use-cross }
          - { os: ubuntu-18.04   , target: x86_64-unknown-linux-musl   , features: feat_os_unix_musl      , use-cross: use-cross }
          - { os: macos-latest   , target: x86_64-apple-darwin         , features: feat_os_macos }
          - { os: windows-latest , target: i686-pc-windows-gnu         , features: feat_os_windows }
          - { os: windows-latest , target: i686-pc-windows-msvc        , features: feat_os_windows }
          - { os: windows-latest , target: x86_64-pc-windows-gnu       , features: feat_os_windows } ## note: requires rust >= 1.43.0 to link correctly
          - { os: windows-latest , target: x86_64-pc-windows-msvc      , features: feat_os_windows }
# ...

Have you seen differences between the use of the two features?

jtracey · 2022-02-23T20:30:07Z

@rivy Yeah I was mistaken about the tests not being run in the CI, they are, just in an environment weird enough that pinky was basically untested. When you run in an actual, full Unix-like system, at least one user is going to be logged in, and typically have at least one gecos field (indeed, some of the "users" in a /etc/passwd sense on the Linux CI do, but none are "users" from a utmp perspective). So on all the systems I tested on, the unix tests failed on pinky, leading me to assume they weren't being run in the CI, when they were really just passing on the vacuous test of "no information to print".

The workflow run I mentioned is the clearest place to see what's going on, but the tl;dr is $ pinky goes from

Login    Name                 TTY      Idle   When             Where

to

Login    Name                 TTY      Idle   When             Where
runner   runner name          tty2            2022-02-22 22:22

where runner name would have shown up as runner name, prior to the bugfix.

rivy · 2022-02-23T21:42:15Z

So, just to be clear, GitHub is one of the "limited" *nix environments that doesn't have enough info being used/logged to actually test pinky?

If that's correct, just add a quick note to that effect in a leading comment.
Maybe, something like:

# GHA VMs (as limited environments) need extra configuration for some utility tests (eg, `pinky`)
# * add GECOS info to the known GHA user `runner`
#...
# * add a faked login entry into `/var/run/utmp`
#...

jtracey · 2022-02-23T23:50:58Z

Sorry, I didn't mean to say that the VMs are in some essential way lesser *nix environments, it's more that they're configured to run in unusual ways. I don't know how the commands are run, but it's probably something like launching from the root init with sudo; there's just nobody logged in, and the runner account (the account that runs the action commands) doesn't have any of the info we need to fully test -- info that would almost always get added when creating a new login user.

How does this work for a comment:

case '${{ matrix.job.os }}' in
  *linux*)
    # pinky is a tool to show logged-in users from utmp, and gecos fields from /etc/passwd.
    # In GitHub Action *nix VMs, no accounts log in, even the "runner" account that runs the commands. The account also has empty gecos fields.
    # To work around this for pinky tests, we create a fake login entry for the GH runner account...
    FAKE_UTMP='[7] [999999] [tty2] [runner] [tty2] [] [0.0.0.0] [2022-02-22T22:22:22,222222+00:00]'
    # ... by dumping the login records, adding our fake line, then reverse dumping ...
    (utmpdump /var/run/utmp ; echo $FAKE_UTMP) | sudo utmpdump -r -o /var/run/utmp
    # ... and add a full name to each account with a gecos field but no full name.
    sudo sed -i 's/:,/:runner name,/' /etc/passwd
    ;;
esac

tests/by-util/test_pinky.rs

.github/workflows/CICD.yml

jtracey · 2022-03-05T18:32:13Z

Some more good/bad news: I found out why the extra runner steps weren't working (bad regex for matching the os), and now that it's fixed, it seems to be catching more bugs in pinky, as well as some bugs in who.

jtracey · 2022-03-07T00:10:26Z

I fixed the additional pinky bug (the code to read files was filling a buffer with null values that were being invisibly printed), and I believe identified the cause of the bugs in users and who (#3219). For now, I've disabled those tests, since fixing them seems out of scope for this PR.

sylvestre · 2022-03-07T19:26:29Z

I am not a bit fan of ignoring that many tests :/

jtracey · 2022-03-07T20:55:48Z

Hmm, what would you like to see then? The tests are good tests, my tweaks to the environment uncovered a bug they were finding, but that bug is unrelated to the original point of this PR (fixing the off-by-one error in GECOS parsing). Ignoring the tests annotates them as "we know there's a bug here", leaving them as-is gives the impression that they're testing behavior and passing when in reality, they're just being run in an environment that doesn't exercise relevant code paths.

We could:

merge as is, and track the ignored tests in users and who: need to check the user's login shell is alive #3219
block this PR on users and who: need to check the user's login shell is alive #3219
unstage the changes to CICD.yaml, open a new PR for it, and leave pinky and the buggy parts of who and users untested until it gets merged

For some additional context, the reason I'm filing this PR is because it's blocking another, larger PR I plan to file that fixes the broken Android builds. Among other problems, there is a memory safety bug that my testing on Android uncovered: a syscall uucore::entries relies on can sometimes return null pointers, and does so on Android. Fixing that means some of the structures uucore::entries provides need to change to Option types, which causes a bit of code churn that should be tested. I can't show that the existing behavior still works for all utils on existing platforms though, because the existing behavior with pinky has the off-by-one error, causing the relevant pinky test to fail anyway. I filed this hoping it would be a quick fix that I could rebase that branch on, better demonstrating no regressions, but the issue dependency stack has grown pretty deep...

android builds -> uucore::entries memory safety bug fix -> failing pinky test -> GECOS off by one error -> CI isn't testing pinky -> adding login entries to CI runner -> failing users and who tests -> #3219 ;)

So I'm fine with any of the above three options (or others that I'm perhaps missing), but I'd like to fix the memory safety bug sooner than later, and if we go for the latter two options, the fix won't be fully tested -- perhaps a tradeoff worth making, I just wanted to let you know where this is going.

sylvestre · 2022-03-11T22:52:09Z

@tertsdiepraam wdyt?

tertsdiepraam · 2022-03-17T09:35:09Z

I agree with @jtracey that we should ignore the tests for now. Hopefully, we'll be able to enable them again soon.

jtracey force-pushed the gecos-off-by-one branch 2 times, most recently from 74bf71a to d2aee2f Compare February 23, 2022 00:39

sylvestre reviewed Feb 23, 2022

View reviewed changes

jtracey force-pushed the gecos-off-by-one branch from d2aee2f to 8ef5838 Compare February 23, 2022 18:04

rivy requested changes Feb 23, 2022

View reviewed changes

jtracey force-pushed the gecos-off-by-one branch from 8ef5838 to 42b7c68 Compare February 23, 2022 20:29

jtracey force-pushed the gecos-off-by-one branch from 42b7c68 to 0339a43 Compare February 24, 2022 20:20

sylvestre reviewed Feb 25, 2022

View reviewed changes

tests/by-util/test_pinky.rs Outdated Show resolved Hide resolved

jtracey force-pushed the gecos-off-by-one branch 5 times, most recently from d814494 to 8485e70 Compare February 28, 2022 17:59

jtracey commented Feb 28, 2022

View reviewed changes

.github/workflows/CICD.yml Show resolved Hide resolved

sylvestre requested a review from rivy March 3, 2022 21:55

jtracey force-pushed the gecos-off-by-one branch 2 times, most recently from b62efb6 to 98d7b86 Compare March 5, 2022 18:29

jtracey mentioned this pull request Mar 6, 2022

users and who: need to check the user's login shell is alive #3219

Open

4 tasks

jtracey force-pushed the gecos-off-by-one branch from 98d7b86 to c6e95c9 Compare March 6, 2022 23:35

jtracey force-pushed the gecos-off-by-one branch from c6e95c9 to fea0efe Compare March 7, 2022 05:33

jtracey added 4 commits March 7, 2022 12:32

pinky: fix off-by-one in GECOS parsing

9430a9f

pinky: improve tests

f52f655

pinky: simplify and debug printing files

644c99e

users and who: ignore failing tests for now

687dcae

jtracey force-pushed the gecos-off-by-one branch from fea0efe to 687dcae Compare March 7, 2022 17:32

sylvestre requested a review from tertsdiepraam March 16, 2022 11:35

tertsdiepraam approved these changes Mar 17, 2022

View reviewed changes

sylvestre merged commit a5948ce into uutils:main Mar 17, 2022

This was referenced May 2, 2024

date: fix interaction of flags, fix issues around --set #6142

Draft

Changed /sys/kernel/address_bits to /sys/kernel/profiling in test_cp #6294

Merged

Uh oh!

pinky: fix off-by-one in GECOS parsing #3155

pinky: fix off-by-one in GECOS parsing #3155

Uh oh!

Conversation

jtracey commented Feb 18, 2022

Uh oh!

sylvestre commented Feb 18, 2022

Uh oh!

jtracey commented Feb 18, 2022

Uh oh!

sylvestre commented Feb 21, 2022

Uh oh!

jtracey commented Feb 23, 2022

Uh oh!

sylvestre Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

jtracey Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

rivy Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

sylvestre Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

jtracey Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

jtracey commented Feb 23, 2022

Uh oh!

rivy Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

jtracey Feb 23, 2022

Choose a reason for hiding this comment

Uh oh!

rivy commented Feb 23, 2022

Uh oh!

jtracey commented Feb 23, 2022

Uh oh!

rivy commented Feb 23, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jtracey commented Feb 23, 2022

Uh oh!

Uh oh!

Uh oh!

jtracey commented Mar 5, 2022

Uh oh!

jtracey commented Mar 7, 2022

Uh oh!

sylvestre commented Mar 7, 2022

Uh oh!

jtracey commented Mar 7, 2022

Uh oh!

sylvestre commented Mar 11, 2022

Uh oh!

tertsdiepraam commented Mar 17, 2022

Uh oh!

Uh oh!

rivy commented Feb 23, 2022 •

edited

Loading