Improving basenc / base32 / base64 #5698

@bal-e

Description

I've identified a few issues with the encoding/decoding tools that I'm interested in solving:

  1. They are not iterative / streaming. The entire input is read before any of it is decoded. On the one hand, this lets you detect errors up front, before anything has been output. On the other hand, it means these tools can't be used for incremental computations (e.g. feeding a while loop), and they may consume large amounts of memory. The GNU implementations of these tools are streaming, so I'd classify this as a compatibility bug.

  2. There is significant optimization potential. I have a fair bit of experience with SIMD and have designed AVX2 and AVX512 algorithms for Base32 / Base64 encoding. It's difficult, but I would like to see whether there are performance benefits, particularly for large inputs. Optimizations can be added iteratively, and can be feature-gated or even selected via runtime CPU feature detection.

  3. The compiled binaries for base32 / base64 on my machine currently take up 23 MiB in debug mode and 3 MiB in release mode. This is surprisingly large, and I believe the majority of that size comes from the argument parsing rather than the actual encoding. Since basenc / base32 / base64 accept essentially the same arguments, the three binaries can be combined.
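To make the streaming issue in point 1 concrete, here is a minimal sketch of what an incremental encoding interface could look like. This is an illustration only, not the current uucore API: `StreamEncoder`, `update`, and `finish` are hypothetical names. Input arrives in arbitrary chunks, output is emitted for every complete 3-byte group, and only the 0-2 leftover bytes are carried between calls, so memory use stays bounded regardless of input size.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Hypothetical streaming Base64 encoder: holds at most 2 carry bytes
/// between chunks instead of buffering the whole input.
struct StreamEncoder {
    carry: Vec<u8>, // 0..=2 leftover bytes from the previous chunk
}

impl StreamEncoder {
    fn new() -> Self {
        StreamEncoder { carry: Vec::new() }
    }

    /// Encode a chunk, emitting output for every complete 3-byte group
    /// and carrying the remainder to the next call.
    fn update(&mut self, chunk: &[u8], out: &mut String) {
        self.carry.extend_from_slice(chunk);
        let full = self.carry.len() / 3 * 3;
        for group in self.carry[..full].chunks(3) {
            let n = (group[0] as u32) << 16 | (group[1] as u32) << 8 | group[2] as u32;
            for shift in [18, 12, 6, 0] {
                out.push(ALPHABET[(n >> shift) as usize & 63] as char);
            }
        }
        self.carry.drain(..full);
    }

    /// Flush the final partial group, adding '=' padding.
    fn finish(self, out: &mut String) {
        match self.carry.as_slice() {
            [] => {}
            [a] => {
                let n = (*a as u32) << 16;
                out.push(ALPHABET[(n >> 18) as usize & 63] as char);
                out.push(ALPHABET[(n >> 12) as usize & 63] as char);
                out.push_str("==");
            }
            [a, b] => {
                let n = (*a as u32) << 16 | (*b as u32) << 8;
                out.push(ALPHABET[(n >> 18) as usize & 63] as char);
                out.push(ALPHABET[(n >> 12) as usize & 63] as char);
                out.push(ALPHABET[(n >> 6) as usize & 63] as char);
                out.push('=');
            }
            _ => unreachable!(),
        }
    }
}

fn main() {
    let mut enc = StreamEncoder::new();
    let mut out = String::new();
    // Feed input in awkward chunk sizes to exercise the carry logic.
    enc.update(b"hell", &mut out);
    enc.update(b"o wo", &mut out);
    enc.update(b"rld", &mut out);
    enc.finish(&mut out);
    println!("{}", out); // prints aGVsbG8gd29ybGQ=
}
```

The same shape works for decoding (with a 4-character carry) and would let base64 run in constant memory over a pipe, matching the GNU behavior.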

To this end, here's my proposal:

  1. Rewrite src/uucore/src/lib/features/encoding.rs to expose a streaming interface rather than a one-shot one. Along the way, I can add a sub-module for each encoding format so that per-format optimized routines can be defined. Runtime CPU detection, if implemented, would be format-generic.

  2. Set up basenc as a multicall binary so that base32 / base64 are symlinks to it. The argument parsing for basenc will have to be updated to expose the format-selecting arguments only when argv[0] is basenc.

  3. Implement format-specific optimized routines over time, supported by benchmarking. For example, if the user provides less than 1 KiB of data, a simpler routine can be used (advanced AVX512 instructions have interesting effects on CPU clock frequencies, so we should avoid them unless a large amount of input needs to be processed).
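The multicall dispatch in step 2 can be sketched as follows. This is only an illustration of the idea, not uutils' actual multicall machinery; `tool_for` is a hypothetical helper. The key point is that when the binary is reached through a base32/base64 symlink, the format is fixed, so the format-selecting flags need not be registered with the argument parser at all.

```rust
use std::env;
use std::path::Path;

/// Map the invocation name (argv[0]) to the tool to run.
/// `file_stem` strips both the directory prefix and any `.exe` suffix,
/// so `/usr/bin/base64` and `base64.exe` both match `base64`.
fn tool_for(argv0: &str) -> &'static str {
    match Path::new(argv0).file_stem().and_then(|s| s.to_str()) {
        Some("base32") => "base32",
        Some("base64") => "base64",
        _ => "basenc",
    }
}

fn main() {
    let argv0 = env::args().next().unwrap_or_default();
    match tool_for(&argv0) {
        // Symlinked invocations: format is fixed, no format flags.
        "base32" => println!("base32 mode: format flags not exposed"),
        "base64" => println!("base64 mode: format flags not exposed"),
        // Direct invocation: format-selecting arguments enabled.
        _ => println!("basenc mode: format-selecting arguments enabled"),
    }
}
```

In practice the match arms would hand off to the shared encoding code with the format pre-selected, so the three tools share one binary and one argument-parsing pass.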

It may be preferable that I write a new, independent crate for optimized data encoding and create a dependency on that (rather than the current data-encoding crate); the alternative is to eliminate the external dependency completely.
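The runtime CPU feature detection mentioned in points 1 and 3 could be combined with the input-size threshold like this. The routine names and the 1 KiB threshold are illustrative placeholders (real thresholds would come from benchmarking); `std::arch::is_x86_feature_detected!` is the standard library's actual detection macro.

```rust
#[cfg(target_arch = "x86_64")]
fn pick_encoder(input_len: usize) -> &'static str {
    // AVX-512 can reduce CPU clock frequency, so reserve it for large
    // inputs where the throughput win outweighs the downclocking.
    // The threshold is a placeholder, to be tuned by benchmarking.
    const AVX512_THRESHOLD: usize = 1024;
    if input_len >= AVX512_THRESHOLD
        && std::arch::is_x86_feature_detected!("avx512bw")
    {
        "avx512"
    } else if std::arch::is_x86_feature_detected!("avx2") {
        "avx2"
    } else {
        "scalar"
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn pick_encoder(_input_len: usize) -> &'static str {
    "scalar" // portable fallback path on other architectures
}

fn main() {
    println!("64 B  -> {}", pick_encoder(64));
    println!("1 MiB -> {}", pick_encoder(1 << 20));
}
```

Because the detection is a cheap runtime check, the same release binary can serve machines with and without AVX2/AVX512 while keeping the scalar path as the correctness baseline.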

I'd like to ask for prior approval from the uutils community before I start working on this. Note that tasks 1 and 2 are effectively independent; while I'm more interested in the first, the second seems more important. Maintainers, what do you think?
