
allow building multicall binary as dynamic library #7928


Open · wants to merge 1 commit into main

Conversation

@jtracey (Contributor) commented May 13, 2025

It was noted in the latest Ubuntu post that they were having problems with AppArmor profiles on the multicall binary; one possible solution listed was:

> Build coreutils into a dynamic library that simply exposes coreutils_main() and then call that from tiny wrapper binaries

I decided to see how much work that would be, and it turns out to not be too bad. Basically, I just moved the multicall binary logic into a top-level library, and made the multicall binary link to it, either statically (i.e., normal Rust linking) or dynamically, depending on whether the "dynamic" feature is enabled. On my Debian system with --release, the normal coreutils (i.e., the default multicall binary) is ~14M, while when dynamically linking, libcoreutils.so is ~26M and coreutils (i.e., the minimal shim that loads and executes the shared library) is 381K.
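
For illustration, the shim boils down to something like this minimal sketch, assuming libloading and the wrapper symbol added in the diff later in this thread (the soname and error handling are illustrative, not necessarily exactly what the PR ships):

// Load the multicall logic at runtime and jump to its entry point.
fn main() {
    // SAFETY: loading a shared library runs its initialization routines;
    // we trust libcoreutils.so here.
    let lib = unsafe { libloading::Library::new("libcoreutils.so") }
        .expect("failed to load libcoreutils.so");
    // SAFETY: the symbol must really have this signature, which holds as
    // long as the shim and the library come from the same build.
    let entry: libloading::Symbol<fn()> =
        unsafe { lib.get(b"coreutils_multicall_main_wrapper\0") }
            .expect("missing multicall entry point");
    entry();
}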

This allows creating individual binaries with different profiles by just copying the dynamically linked multicall binary under the name of each respective utility. It's only marginally better for security, though, since the behavior of the shim executable is identical to that of the full multicall binary, meaning someone could just modify argv[0] to get the coreutil foo to run with the AppArmor profile of coreutil bar, or exploit a confused deputy with setuid, etc. There are a few possible solutions to that (for a future PR), the main gist being to get each util's (uu)main dynamically linked when the dynamic feature is enabled.

Another thing I'm putting off is proper Windows support in the documentation and CICD. I don't know as much about Windows/.dlls as I do about Unix shared libraries, and I don't have a local Windows dev env right now, so I'm not testing it at all yet, but in theory it should work.

Some things I'd welcome any feedback on:

  • do we want this: Seems simple enough to support, but I'm open to arguments otherwise. There are also possible arguments for doing this in a fundamentally different way, such as writing the shim in a few lines of C.
  • library location: Is relying on the default dynamic loader search behavior good for our use case? The libloading docs recommend against it, saying a specific path should be preferred to avoid platform-specific quirks, but this seems wrong to me: there's a reason different distros and platforms want to decide for themselves where libraries are located, and LD_LIBRARY_PATH works fine for local configurations (see the sketch after this list).
  • CICD tests: I'd rather not include yet another entire test run for each platform when, ostensibly, the behavior should be nearly identical to the multicall binary, but not testing at all seems liable to cause breakage. The middle ground I chose is to build and run the dynamically linked multicall binary on the compatible Unixes (it seems Rust doesn't support .so generation with musl yet) to ensure it loads fine, but not to run the tests with it. On the other hand, the default test suite is rarely the bottleneck in CICD, so we could run it if we think that overhead is acceptable.
  • naming conventions: Feel free to nit the name of the library, feature, etc.
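
To make the library-location question concrete, the two lookup styles look like this (an illustrative sketch; the absolute path is made up):

use libloading::{Error, Library};

// Contrast of the two lookup styles from the "library location" bullet.
fn load_coreutils_lib() -> Result<Library, Error> {
    // Bare soname: defers to the platform loader's search order
    // (DT_RUNPATH, LD_LIBRARY_PATH, /etc/ld.so.cache, default dirs),
    // letting distros and local setups decide where the library lives.
    unsafe { Library::new("libcoreutils.so") }
    // The libloading docs would instead suggest a specific path, e.g.
    //   unsafe { Library::new("/usr/lib/coreutils/libcoreutils.so") }
    // which sidesteps platform-specific search quirks but hardcodes a
    // location that every distro would then have to patch.
}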


GNU testsuite comparison:

Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@jtracey (Contributor, Author) commented May 13, 2025

Because the diff is deceptively large (git has no internal notion of moved files, so a move plus a new file at the old location looks like two large edits), here's the actual main difference, from the output of git diff main:src/bin/coreutils.rs dynamic-multicall:src/lib/coreutils.rs, for simpler reviewing:

diff --git a/src/bin/coreutils.rs b/src/lib/coreutils.rs
index b29e7ea23..68da02af1 100644
--- a/src/bin/coreutils.rs
+++ b/src/lib/coreutils.rs
@@ -20,7 +20,11 @@ const VERSION: &str = env!("CARGO_PKG_VERSION");
 include!(concat!(env!("OUT_DIR"), "/uutils_map.rs"));
 
 fn usage<T>(utils: &UtilityMap<T>, name: &str) {
-    println!("{name} {VERSION} (multi-call binary)\n");
+    #[cfg(not(feature = "dynamic"))]
+    let kind = "binary";
+    #[cfg(feature = "dynamic")]
+    let kind = "dynamic library";
+    println!("{name} {VERSION} (multi-call {kind})\n");
     println!("Usage: {name} [function [arguments...]]");
     println!("       {name} --list\n");
     println!("Options:");
@@ -51,7 +55,7 @@ fn name(binary_path: &Path) -> Option<&str> {
 }
 
 #[allow(clippy::cognitive_complexity)]
-fn main() {
+pub fn multicall_main() {
     uucore::panic::mute_sigpipe_panic();
 
     let utils = util_map();
@@ -239,3 +243,9 @@ fn gen_coreutils_app<T: uucore::Args>(util_map: &UtilityMap<T>) -> Command {
     }
     command
 }
+
+#[cfg(feature = "dynamic")]
+#[unsafe(no_mangle)]
+pub extern "Rust" fn coreutils_multicall_main_wrapper() {
+    multicall_main();
+}

The new src/bin/coreutils.rs can then be directly viewed for review.

@sylvestre (Contributor)

Isn't it just a workaround to bypass limitations of AppArmor? :)

@jtracey (Contributor, Author) commented May 14, 2025

Right now, there's not a lot of advantage to dynamic linking. This is mainly a stepping stone to individual shim applications that would link against the one shared library. Once that exists, it's not so much a limitation of AppArmor being addressed as the entire idea of restricting the capabilities of individual utilities. The multicall binary will never be able to do that, since there is no secure mechanism for figuring out how it was invoked (argv[0] is trivial to manipulate).
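
For concreteness, setting argv[0] takes one line (a minimal sketch; the path is illustrative):

// Any process can exec the multicall binary under an arbitrary name:
// AppArmor sees an exec of /usr/bin/coreutils, but the multicall
// dispatch behaves as if it were invoked as "rm".
use std::os::unix::process::CommandExt;
use std::process::Command;

fn main() {
    let err = Command::new("/usr/bin/coreutils").arg0("rm").exec();
    // exec() only returns on failure.
    eprintln!("exec failed: {err}");
}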

@Ecordonnier (Contributor) commented May 27, 2025

If you are searching for inspiration, you can check how this is handled for GNU coreutils and BusyBox in Ubuntu. It probably makes sense to handle it the same way for uutils coreutils.

I know that in Yocto's meta-selinux, there are two BusyBox wrapper shell scripts, one-liner files carrying different SELinux labels: busybox and busybox.nosuid. However, their current implementation has the drawback of swallowing argv[0].

Also note that I fixed the build issue with musl in my PR "stdbuf: fix cross-compilation".

@jtracey (Contributor, Author) commented May 27, 2025

> If you are searching for inspiration, you can check how this is handled for GNU coreutils and BusyBox in Ubuntu. It probably makes sense to handle it the same way for uutils coreutils.

The GNU coreutils in Ubuntu don't use the multi-call implementation, and it sounds like BusyBox is just given an extremely broad profile. The problem is that the individual uutils are much larger than the individual GNU coreutils, and Ubuntu doesn't want to weaken its profiles when migrating. I suspect one reason the individual GNU utils are so much smaller is that they can rely more heavily on direct invocation of the dynamically linked libc (e.g., printf is a lot simpler if you can just call printf()), so in some ways this PR makes the comparison fairer.

> I know that in Yocto's meta-selinux, there are two BusyBox wrapper shell scripts, one-liner files carrying different SELinux labels: busybox and busybox.nosuid. However, their current implementation has the drawback of swallowing argv[0].

This sounds similar to the other proposed solution by Ubuntu:

> Build tiny wrapper binaries that ensure the coreutils binary is called with the right first argument, that is, /usr/bin/rm essentially does argv[0] = "rm"; execv("/usr/bin/coreutils", argv) - this ensures that an apparmor profile that does e.g. /usr/bin/rm Ux works safely, but doesn't necessarily work for other things.

That works, as long as you're careful when setting up the profiles, but it swallows argv[0] as you say. Linking as a dynamic library also means we don't need exec to invoke any of the utilities, allowing tighter profiles. I would have thought that would also improve performance, but it looks like at least one of ld.so, Rust's FFI invocation, or Rust's main invocation isn't quite as fast as execve in C (results are from a Debian 12 VM on Xen; the C code was compiled with gcc -O3):

#include <unistd.h>

/* Exec the statically linked multicall binary with argv[0] set to "true",
 * i.e., the same trick the proposed wrapper binaries would use. */
int main() {
  char* path = "/home/user/Documents/coreutils/target/release/coreutils.static";
  char* argv[2] = {"true", NULL};
  char* env[1] = { NULL };
  return execve(path, argv, env);
}
$ hyperfine -N -m 5000 -w 100 '/usr/bin/true' '/home/user/Documents/coreutils/target/release/true' '/home/user/Documents/coreutils/target/release/coreutils.static true' '/home/user/Documents/coreutils/target/release/coreutils.dynamic true' /home/user/Documents/sandbox/a.out 
Benchmark 1: /usr/bin/true
  Time (mean ± σ):       1.1 ms ±   0.5 ms    [User: 0.6 ms, System: 0.4 ms]
  Range (min … max):     0.3 ms …   5.8 ms    5000 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: /home/user/Documents/coreutils/target/release/true
  Time (mean ± σ):       1.5 ms ±   0.7 ms    [User: 0.7 ms, System: 0.7 ms]
  Range (min … max):     0.5 ms …   6.2 ms    5000 runs
 
Benchmark 3: /home/user/Documents/coreutils/target/release/coreutils.static true
  Time (mean ± σ):       5.1 ms ±   1.9 ms    [User: 1.1 ms, System: 3.7 ms]
  Range (min … max):     1.6 ms …  12.4 ms    5000 runs
 
Benchmark 4: /home/user/Documents/coreutils/target/release/coreutils.dynamic true
  Time (mean ± σ):       7.2 ms ±   2.6 ms    [User: 2.5 ms, System: 4.3 ms]
  Range (min … max):     2.4 ms …  16.4 ms    5000 runs
 
Benchmark 5: /home/user/Documents/sandbox/a.out
  Time (mean ± σ):       5.3 ms ±   2.0 ms    [User: 1.2 ms, System: 3.8 ms]
  Range (min … max):     1.8 ms …  13.5 ms    5000 runs
 
Summary
  /usr/bin/true ran
    1.36 ± 0.88 times faster than /home/user/Documents/coreutils/target/release/true
    4.52 ± 2.75 times faster than /home/user/Documents/coreutils/target/release/coreutils.static true
    4.66 ± 2.84 times faster than /home/user/Documents/sandbox/a.out
    6.38 ± 3.82 times faster than /home/user/Documents/coreutils/target/release/coreutils.dynamic true

To summarize: the individual GNU and uutils true each take about 1.5 ms; the static multicall adds about 4 ms to that, whether invoked via the C exec wrapper or directly, vs. about 6 ms for the dynamic multicall (all marginal in most real-world invocations, of course, and much faster than using literal shell scripts as wrappers).

@Ecordonnier (Contributor)

In the case of uutils-coreutils, I guess swallowing argv[0] is not an issue. I know that in the case of BusyBox this breaks a corner case with "sh", where passing a dash-prefixed argv[0] (e.g., "-sh") makes it a login shell, but I don't think there is such a corner case in uutils-coreutils.

@julian-klode

I was looking at the options, and I believe that the dynamic linking approach is the best one to get AppArmor profiles (or SELinux ones, I suppose) working correctly.

The problem with the multicall binary, even with the wrapper binaries, is that you still need to allow the multicall binary itself to be run; i.e., you allow factor and coreutils to run in an inherited profile (or factor unconfined).

As for the performance, I think a bunch of that is down to relocations in the larger code base?

What's then missing is generating tiny individual binaries that swallow argv[0] and insert the hardcoded one. Essentially, my thought was that the library exposes the C-style main() function, such that a wrapper just swaps the arguments and jumps to that; I don't know how to minimally express that in Rust. Optimally, you'd avoid any Rust libstd in there to get the size really down to the bare minimum.
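
For what it's worth, such a wrapper might look like the following sketch, assuming the library exported a hypothetical C-style entry point (coreutils_main and its signature are made up here, and this version still links libstd, so it is not yet the bare-minimum binary described above):

use std::ffi::CString;
use std::os::raw::{c_char, c_int};

// Hypothetical C-style entry point; nothing with this name exists in the PR yet.
unsafe extern "C" {
    fn coreutils_main(argc: c_int, argv: *const *const c_char) -> c_int;
}

fn main() {
    // Swallow whatever argv[0] we were invoked with and hardcode "rm".
    let mut args = vec![CString::new("rm").unwrap()];
    args.extend(std::env::args().skip(1).map(|a| CString::new(a).unwrap()));
    let mut argv: Vec<*const c_char> = args.iter().map(|a| a.as_ptr()).collect();
    argv.push(std::ptr::null());
    let code = unsafe { coreutils_main((argv.len() - 1) as c_int, argv.as_ptr()) };
    std::process::exit(code);
}

A truly minimal version would be #![no_std] with a custom entry point, leaving little more than the argv rewrite and a jump.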
