Skip to content

pinky: fix off-by-one in GECOS parsing #3155

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 17, 2022
Merged

Conversation

jtracey
Copy link
Contributor

@jtracey jtracey commented Feb 18, 2022

This lets the test pass when running cargo test --features unix on my system, a Xubuntu 21.10 install. Without this, the output includes the trailing comma. I am a bit cautious though, since it seems unusual that nobody noticed this in the 6 years since this code was added -- is it possible there's something system-specific going on? Also, all the CI tests pass before and after this change, does the CI not run the Unix-specific tests?

@sylvestre
Copy link
Contributor

Could you please add a test to make sure we don't regress in the future?
thanks

@jtracey
Copy link
Contributor Author

jtracey commented Feb 18, 2022

Sorry, I should have made my post clearer. This line is tested, in the test_no_flag() pinky test. The test is the only test that fails when I run cargo test --features unix on my system (well, a logname test also fails if I run outside a direct tty, but that's a xfce bug, GNU's logname fails too). The git blame shows this code was unchanged for the last 6 years, but I now realize the test was added 9 months ago, in #2210. The CI flagged it as untested then too, but this was glossed over, probably under the assumption that it was buggy symbol tracking or something. Given that I don't think this test should have ever passed (because of the pinky code I change here, the test itself is good), I think this is actually because of my original theory: none of the CI tests (and apparently, none of us 😉 ) are testing with --features unix. Should I try to fix that in this PR, or file a new one?

@sylvestre
Copy link
Contributor

yeah, please try to run the tests too :)
many thanks

@jtracey jtracey force-pushed the gecos-off-by-one branch 2 times, most recently from 74bf71a to d2aee2f Compare February 23, 2022 00:39
@jtracey
Copy link
Contributor Author

jtracey commented Feb 23, 2022

Okay, so I think I figured out the problem with the tests. They are actually being run in the CI, but weren't really doing much because 1. most of the tests that specify an account were specifying the root acount, which has no GECOS fields, and more importantly, 2. the environment is running the commands from an account without logging into it, so there is no utmp data. Pinky can't run most of its code if there are no users in the utmp file, and needs something in the GECOS fields to print something meaningful in that part of the output.

I added the default GH Action user to one of the tests, and modified the linux portions of the CI to hack around the two above issues:

  • We add a name field to the GECOS fields of any lines of the passwd file where there are GECOS fields, but name is empty. This is currently just the default "runner" account, but it should be harmless if other accounts are added later.
  • We dump the utmp file, add a plausible line for what it would look like if the runner account had logged in, and reverse dump it back into the file. This is not something you're supposed to do, since the text format has no stability guarantees, as documented in the man page for utmpdump, but it hasn't been touched since 2014, and worst case scenario will show up as reduced coverage to where we are prior this PR if it ever breaks.

This caught a similar bug in another part of the code, I've folded the fix into the first commit. The coverage should be pretty good now, but a network blip prevented the codecov action from running. Someone with access should restart it before merging, to make sure it actually worked (or I'll rebase if I notice new commits on main first).

@@ -595,17 +595,6 @@ jobs:
*-pc-windows-msvc) STRIP="" ;;
esac;
outputs STRIP
- name: Install/setup prerequisites
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you remove this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's duplicated a few lines below, where I modify it with the additional lines. I believe 39a6e6c was supposed to move it but instead copied it. If I don't remove it, I'd need to choose which to modify, or modify both, in any case becoming a likely spot for bit rot.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, mistakenly left in... deletion should be fine.
Modify the second copy.

@@ -624,6 +613,13 @@ jobs:
case '${{ matrix.job.os }}' in
macos-latest) brew install coreutils ;; # needed for testing
esac
case '${{ matrix.job.os }}' in
*linux*)
sudo sed -i 's/:,/:runner name,/' /etc/passwd
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a comment explaining why you are doing that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jtracey
Copy link
Contributor Author

jtracey commented Feb 23, 2022

Oh I should also mention: I have a workflow run that clearly shows what these hacks are doing, and why they're necessary. (N.b. the commands there are using the built-in, GNU pinky.)

Comment on lines 616 to 635
case '${{ matrix.job.os }}' in
*linux*)
sudo sed -i 's/:,/:runner name,/' /etc/passwd # add a full name each account with a gecos field but no full name
FAKE_UTMP='[7] [999999] [tty2] [runner] [tty2] [] [0.0.0.0] [2022-02-22T22:22:22,222222+00:00]'
(utmpdump /var/run/utmp ; echo $FAKE_UTMP) | sudo utmpdump -r -o /var/run/utmp # create a fake login entry for the GH runner
;; # helps pinky tests
esac
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm getting some understanding from comments and variable naming, but this is a little opaque to me.
Could you document a little more "why" for this block (locally), expanding on "helps pinky tests"? Maybe just a couple of explanatory lines?
I think it'll help future me (and maybe future you) understand why it helps and if it can be and how it should be modified in the future.

Thanks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the comment, how does it look now?

@rivy
Copy link
Member

rivy commented Feb 23, 2022

@jtracey : "I think this is actually because of my original theory: none of the CI tests (and apparently, none of us 😉 ) are testing with --features unix. Should I try to fix that in this PR, or file a new one?"

Hmm, 🤔; we're testing "--features feat_os_unix" which is a direct feature alias in Cargo.toml and therefore should be synonymous with "--features unix"...

Cargo.toml

[features]
default = [ "feat_common_core" ]
## OS feature shortcodes
macos = [ "feat_os_macos" ]
unix = [ "feat_os_unix" ]
windows = [ "feat_os_windows" ]
# ...

CICD.yml

  build:
    name: Build
    needs: [ min_version, deps ]
    runs-on: ${{ matrix.job.os }}
    strategy:
      fail-fast: false
      matrix:
        job:
          # { os , target , cargo-options , features , use-cross , toolchain }
          - { os: ubuntu-latest  , target: arm-unknown-linux-gnueabihf, features: feat_os_unix_gnueabihf, use-cross: use-cross, }
          - { os: ubuntu-latest  , target: aarch64-unknown-linux-gnu   , features: feat_os_unix_gnueabihf , use-cross: use-cross }
          - { os: ubuntu-latest  , target: x86_64-unknown-linux-gnu    , features: feat_os_unix           , use-cross: use-cross }
          # - { os: ubuntu-latest  , target: x86_64-unknown-linux-gnu    , features: feat_selinux           , use-cross: use-cross }
          # - { os: ubuntu-18.04   , target: i586-unknown-linux-gnu      , features: feat_os_unix           , use-cross: use-cross } ## note: older windows platform; not required, dev-FYI only
          # - { os: ubuntu-18.04   , target: i586-unknown-linux-gnu      , features: feat_os_unix           , use-cross: use-cross } ## note: older windows platform; not required, dev-FYI only
          - { os: ubuntu-18.04   , target: i686-unknown-linux-gnu      , features: feat_os_unix           , use-cross: use-cross }
          - { os: ubuntu-18.04   , target: i686-unknown-linux-musl     , features: feat_os_unix_musl      , use-cross: use-cross }
          - { os: ubuntu-18.04   , target: x86_64-unknown-linux-gnu    , features: feat_os_unix           , use-cross: use-cross }
          - { os: ubuntu-18.04   , target: x86_64-unknown-linux-musl   , features: feat_os_unix_musl      , use-cross: use-cross }
          - { os: macos-latest   , target: x86_64-apple-darwin         , features: feat_os_macos }
          - { os: windows-latest , target: i686-pc-windows-gnu         , features: feat_os_windows }
          - { os: windows-latest , target: i686-pc-windows-msvc        , features: feat_os_windows }
          - { os: windows-latest , target: x86_64-pc-windows-gnu       , features: feat_os_windows } ## note: requires rust >= 1.43.0 to link correctly
          - { os: windows-latest , target: x86_64-pc-windows-msvc      , features: feat_os_windows }
# ...

Have you seen differences between the use of the two features?

@jtracey
Copy link
Contributor Author

jtracey commented Feb 23, 2022

@rivy Yeah I was mistaken about the tests not being run in the CI, they are, just in an environment weird enough that pinky was basically untested. When you run in an actual, full Unix-like system, at least one user is going to be logged in, and typically have at least one gecos field (indeed, some of the "users" in a /etc/passwd sense on the Linux CI do, but none are "users" from a utmp perspective). So on all the systems I tested on, the unix tests failed on pinky, leading me to assume they weren't being run in the CI, when they were really just passing on the vacuous test of "no information to print".

The workflow run I mentioned is the clearest place to see what's going on, but the tl;dr is $ pinky goes from

Login    Name                 TTY      Idle   When             Where

to

Login    Name                 TTY      Idle   When             Where
runner   runner name          tty2            2022-02-22 22:22

where runner name would have shown up as runner name, prior to the bugfix.

@rivy
Copy link
Member

rivy commented Feb 23, 2022

So, just to be clear, GitHub is one of the "limited" *nix environments that doesn't have enough info being used/logged to actually test pinky?

If that's correct, just add a quick note to that effect in a leading comment.
Maybe, something like:

# GHA VMs (as limited environments) need extra configuration for some utility tests (eg, `pinky`)
# * add GECOS info to the known GHA user `runner`
#...
# * add a faked login entry into `/var/run/utmp`
#...

@jtracey
Copy link
Contributor Author

jtracey commented Feb 23, 2022

Sorry, I didn't mean to say that the VMs are in some essential way lesser *nix environments, it's more that they're configured to run in unusual ways. I don't know how the commands are run, but it's probably something like launching from the root init with sudo; there's just nobody logged in, and the runner account (the account that runs the action commands) doesn't have any of the info we need to fully test -- info that would almost always get added when creating a new login user.

How does this work for a comment:

case '${{ matrix.job.os }}' in
  *linux*)
    # pinky is a tool to show logged-in users from utmp, and gecos fields from /etc/passwd.
    # In GitHub Action *nix VMs, no accounts log in, even the "runner" account that runs the commands. The account also has empty gecos fields.
    # To work around this for pinky tests, we create a fake login entry for the GH runner account...
    FAKE_UTMP='[7] [999999] [tty2] [runner] [tty2] [] [0.0.0.0] [2022-02-22T22:22:22,222222+00:00]'
    # ... by dumping the login records, adding our fake line, then reverse dumping ...
    (utmpdump /var/run/utmp ; echo $FAKE_UTMP) | sudo utmpdump -r -o /var/run/utmp
    # ... and add a full name to each account with a gecos field but no full name.
    sudo sed -i 's/:,/:runner name,/' /etc/passwd
    ;;
esac

@jtracey jtracey force-pushed the gecos-off-by-one branch 5 times, most recently from d814494 to 8485e70 Compare February 28, 2022 17:59
@sylvestre sylvestre requested a review from rivy March 3, 2022 21:55
@jtracey jtracey force-pushed the gecos-off-by-one branch 2 times, most recently from b62efb6 to 98d7b86 Compare March 5, 2022 18:29
@jtracey
Copy link
Contributor Author

jtracey commented Mar 5, 2022

Some more good/bad news: I found out why the extra runner steps weren't working (bad regex for matching the os), and now that it's fixed, it seems to be catching more bugs in pinky, as well as some bugs in who.

@jtracey
Copy link
Contributor Author

jtracey commented Mar 7, 2022

I fixed the additional pinky bug (the code to read files was filling a buffer with null values that were being invisibly printed), and I believe identified the cause of the bugs in users and who (#3219). For now, I've disabled those tests, since fixing them seems out of scope for this PR.

@jtracey jtracey force-pushed the gecos-off-by-one branch from c6e95c9 to fea0efe Compare March 7, 2022 05:33
@jtracey jtracey force-pushed the gecos-off-by-one branch from fea0efe to 687dcae Compare March 7, 2022 17:32
@sylvestre
Copy link
Contributor

I am not a bit fan of ignoring that many tests :/

@jtracey
Copy link
Contributor Author

jtracey commented Mar 7, 2022

Hmm, what would you like to see then? The tests are good tests, my tweaks to the environment uncovered a bug they were finding, but that bug is unrelated to the original point of this PR (fixing the off-by-one error in GECOS parsing). Ignoring the tests annotates them as "we know there's a bug here", leaving them as-is gives the impression that they're testing behavior and passing when in reality, they're just being run in an environment that doesn't exercise relevant code paths.

We could:

For some additional context, the reason I'm filing this PR is because it's blocking another, larger PR I plan to file that fixes the broken Android builds. Among other problems, there is a memory safety bug that my testing on Android uncovered: a syscall uucore::entries relies on can sometimes return null pointers, and does so on Android. Fixing that means some of the structures uucore::entries provides need to change to Option types, which causes a bit of code churn that should be tested. I can't show that the existing behavior still works for all utils on existing platforms though, because the existing behavior with pinky has the off-by-one error, causing the relevant pinky test to fail anyway. I filed this hoping it would be a quick fix that I could rebase that branch on, better demonstrating no regressions, but the issue dependency stack has grown pretty deep...

android builds -> uucore::entries memory safety bug fix -> failing pinky test -> GECOS off by one error -> CI isn't testing pinky -> adding login entries to CI runner -> failing users and who tests -> #3219 ;)

So I'm fine with any of the above three options (or others that I'm perhaps missing), but I'd like to fix the memory safety bug sooner than later, and if we go for the latter two options, the fix won't be fully tested -- perhaps a tradeoff worth making, I just wanted to let you know where this is going.

@sylvestre
Copy link
Contributor

@tertsdiepraam wdyt?

@sylvestre sylvestre requested a review from tertsdiepraam March 16, 2022 11:35
@tertsdiepraam
Copy link
Member

I agree with @jtracey that we should ignore the tests for now. Hopefully, we'll be able to enable them again soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants