Description
I've identified a few issues with the encoding/decoding tools that I'm interested in solving:
- They are not iterative / streaming. The entirety of the input is read before any of it is decoded. On the one hand, this lets you detect errors upfront before anything has been output, but it means these tools can't be used for iterative computations (e.g. inside `while` loops) and they may consume large amounts of memory. The GNU implementations of these tools are streaming, so I'd classify this as an incompatibility bug.
- Significant optimization potential is available. I have a fair bit of experience with SIMD and I've designed AVX2 and AVX-512 algorithms for Base32 / Base64 encoding. It's difficult, but I would like to see whether there are performance benefits, particularly for large inputs. Optimizations can be added iteratively, and can be feature-gated or even based on runtime CPU feature detection.
- Compiled binaries for `base32` / `base64` currently take up 23 MiB in debug mode and 3 MiB in release mode on my machine. This is surprisingly large, and I believe the majority of that size comes from the argument parsing rather than the actual encoding. Since `basenc` / `base32` / `base64` have essentially the same arguments to deal with, they can be combined.
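To illustrate the streaming point, here is a minimal sketch (not uutils code, and all names are illustrative) of what a chunk-wise Base64 encoder could look like: complete 3-byte groups are emitted as soon as they arrive, and only the 0-2 leftover bytes are carried between calls, so memory use stays constant regardless of input size.

```rust
// Sketch of a streaming Base64 encoder. Complete 3-byte groups are
// encoded immediately; an incomplete tail is carried to the next call.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

struct StreamEncoder {
    carry: Vec<u8>, // 0..=2 leftover bytes from the previous chunk
}

impl StreamEncoder {
    fn new() -> Self {
        Self { carry: Vec::new() }
    }

    /// Encode one chunk; whole 3-byte groups are emitted now, the
    /// remainder is held until the next call or `finish`.
    fn update(&mut self, chunk: &[u8]) -> String {
        self.carry.extend_from_slice(chunk);
        let whole = self.carry.len() / 3 * 3;
        let mut out = String::with_capacity(whole / 3 * 4);
        for group in self.carry[..whole].chunks(3) {
            let n =
                (group[0] as u32) << 16 | (group[1] as u32) << 8 | group[2] as u32;
            for shift in [18, 12, 6, 0] {
                out.push(ALPHABET[(n >> shift) as usize & 63] as char);
            }
        }
        self.carry.drain(..whole);
        out
    }

    /// Flush the final partial group with `=` padding.
    fn finish(self) -> String {
        let mut out = String::new();
        match self.carry.as_slice() {
            [] => {}
            [a] => {
                let n = (*a as u32) << 16;
                out.push(ALPHABET[(n >> 18) as usize & 63] as char);
                out.push(ALPHABET[(n >> 12) as usize & 63] as char);
                out.push_str("==");
            }
            [a, b] => {
                let n = (*a as u32) << 16 | (*b as u32) << 8;
                out.push(ALPHABET[(n >> 18) as usize & 63] as char);
                out.push(ALPHABET[(n >> 12) as usize & 63] as char);
                out.push(ALPHABET[(n >> 6) as usize & 63] as char);
                out.push('=');
            }
            _ => unreachable!(),
        }
        out
    }
}

fn main() {
    // The chunk boundaries deliberately do not align to 3 bytes.
    let mut enc = StreamEncoder::new();
    let mut out = enc.update(b"hell");
    out.push_str(&enc.update(b"o"));
    out.push_str(&enc.finish());
    println!("{out}"); // aGVsbG8=
}
```

A real interface would also need a streaming decoder (with the analogous 4-character carry) and error reporting mid-stream, which is where the upfront-validation behavior of the current one-shot code would change.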
To this end, here's my proposal:
1. Rewrite `src/uucore/src/lib/features/encoding.rs` to expose a streaming rather than one-shot interface. Along the way, I can add a sub-module for each encoding format so that per-format optimization routines can be defined. Runtime CPU detection, if implemented, would be format-generic.
2. Set up `basenc` as a multicall binary so that `base32` / `base64` are symlinks to it. The argument parsing for `basenc` will have to be updated to only expose format-selecting arguments if `argv[0]` is `basenc`.
3. Implement format-specific optimization routines over time, supported by benchmarking. For example, if the user provides less than 1 KiB of data, a less optimized routine can be used (advanced AVX-512 instructions have some interesting effects on CPU clock frequencies, so we should avoid them unless a large amount of input needs to be processed).
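The multicall dispatch in task 2 could be sketched roughly as follows. `Format` and `select_format` are hypothetical names, not an existing uutils API; the real implementation would feed this decision into the clap argument definitions rather than a bare enum.

```rust
// Sketch: argv[0]-based dispatch for a multicall basenc binary.
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Format {
    Base32,
    Base64,
    Basenc, // keeps format-selecting flags (--base16, --base32hex, ...) on the CLI
}

/// Decide which personality to run from the program name (argv[0]).
/// Symlinks named `base32`/`base64` lock the format; anything else,
/// including `basenc` itself, leaves format selection to the arguments.
fn select_format(argv0: &str) -> Format {
    // file_stem strips the directory and any extension (e.g. `.exe`).
    match Path::new(argv0).file_stem().and_then(|s| s.to_str()) {
        Some("base32") => Format::Base32,
        Some("base64") => Format::Base64,
        _ => Format::Basenc,
    }
}

fn main() {
    let argv0 = std::env::args().next().unwrap_or_default();
    println!("running as {:?}", select_format(&argv0));
}
```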
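For task 3, the size- and feature-gated dispatch might look like the sketch below. The threshold constant, the function names, and the dispatch shape are all illustrative; the vectorized kernel is elided, with the scalar routine standing in for it.

```rust
// Sketch: dispatch to an optimized path only for large inputs on CPUs
// that actually support the required features.
const SIMD_THRESHOLD: usize = 1024; // below ~1 KiB, stay on the scalar path

fn encode_scalar(input: &[u8]) -> Vec<u8> {
    // Placeholder: any correct byte-at-a-time baseline encoder goes here.
    input.to_vec()
}

fn encode(input: &[u8]) -> Vec<u8> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if input.len() >= SIMD_THRESHOLD && std::is_x86_feature_detected!("avx2") {
            // A real implementation would call an AVX2/AVX-512 kernel here.
            return encode_scalar(input);
        }
    }
    encode_scalar(input)
}

fn main() {
    let data = vec![0u8; 4096];
    assert_eq!(encode(&data), encode_scalar(&data));
    println!("dispatch consistent for {} bytes", data.len());
}
```

The `#[cfg]` block keeps non-x86 targets compiling with only the scalar path, and the runtime `is_x86_feature_detected!` check means no separate binaries per microarchitecture are needed.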
It may be preferable for me to write a new, independent crate for optimized data encoding and depend on that (rather than the current `data-encoding` crate); the alternative is to eliminate the external dependency completely.
I'd like to ask for prior approval from the uutils community before I start working on this. Note that tasks 1 and 2 are effectively independent - while I'm more interested in the first, the second seems more important. Maintainers, what do you think?