Description
I've identified a few issues with the encoding/decoding tools that I'm interested in solving:
- They are not iterative / streaming. The entirety of the input is read before any of it is decoded. On the one hand, this lets you detect errors upfront before anything has been output, but it means these tools can't be used for iterative computations (e.g. inside `while` loops) and they may consume large amounts of memory. The GNU implementations of these tools are streaming, so I'd classify this as an incompatibility bug.
- Significant optimization potential is available. I have a fair bit of experience with SIMD and I've designed AVX2 and AVX-512 algorithms for Base32 / Base64 encoding. It's difficult, but I would like to see whether there are performance benefits, particularly for large inputs. Optimizations can be added iteratively, and can be feature-gated or even based on runtime CPU feature detection.
- Compiled binaries for `base32` / `base64` currently take up 23 MiB in debug mode and 3 MiB in release mode on my machine. This is surprisingly large, and I believe the majority of that size comes from the argument parsing rather than the actual encoding. Since `basenc` / `base32` / `base64` have essentially the same arguments to deal with, they can be combined.
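To illustrate the streaming point, here is a minimal sketch (not uutils code, and all names are illustrative) of what a chunk-wise Base64 encoder could look like: complete 3-byte groups are emitted as soon as they arrive, and only the 0-2 leftover bytes are carried between calls, so memory use stays constant regardless of input size.

```rust
// Sketch of a streaming Base64 encoder. Complete 3-byte groups are
// encoded immediately; an incomplete tail is carried to the next call.
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

struct StreamEncoder {
    carry: Vec<u8>, // 0..=2 leftover bytes from the previous chunk
}

impl StreamEncoder {
    fn new() -> Self {
        Self { carry: Vec::new() }
    }

    /// Encode one chunk; whole 3-byte groups are emitted now, the
    /// remainder is held until the next call or `finish`.
    fn update(&mut self, chunk: &[u8]) -> String {
        self.carry.extend_from_slice(chunk);
        let whole = self.carry.len() / 3 * 3;
        let mut out = String::with_capacity(whole / 3 * 4);
        for group in self.carry[..whole].chunks(3) {
            let n =
                (group[0] as u32) << 16 | (group[1] as u32) << 8 | group[2] as u32;
            for shift in [18, 12, 6, 0] {
                out.push(ALPHABET[(n >> shift) as usize & 63] as char);
            }
        }
        self.carry.drain(..whole);
        out
    }

    /// Flush the final partial group with `=` padding.
    fn finish(self) -> String {
        let mut out = String::new();
        match self.carry.as_slice() {
            [] => {}
            [a] => {
                let n = (*a as u32) << 16;
                out.push(ALPHABET[(n >> 18) as usize & 63] as char);
                out.push(ALPHABET[(n >> 12) as usize & 63] as char);
                out.push_str("==");
            }
            [a, b] => {
                let n = (*a as u32) << 16 | (*b as u32) << 8;
                out.push(ALPHABET[(n >> 18) as usize & 63] as char);
                out.push(ALPHABET[(n >> 12) as usize & 63] as char);
                out.push(ALPHABET[(n >> 6) as usize & 63] as char);
                out.push('=');
            }
            _ => unreachable!(),
        }
        out
    }
}

fn main() {
    // The chunk boundaries deliberately do not align to 3 bytes.
    let mut enc = StreamEncoder::new();
    let mut out = enc.update(b"hell");
    out.push_str(&enc.update(b"o"));
    out.push_str(&enc.finish());
    println!("{out}"); // aGVsbG8=
}
```

A real interface would also need a streaming decoder (with the analogous 4-character carry) and error reporting mid-stream, which is where the upfront-validation behavior of the current one-shot code would change.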
To this end, here's my proposal:
1. Rewrite `src/uucore/src/lib/features/encoding.rs` to expose a streaming rather than one-shot interface. Along the way, I can add a sub-module for each encoding format so that per-format optimization routines can be defined. Runtime CPU detection, if implemented, would be format-generic.
2. Set up `basenc` as a multicall binary so that `base32` / `base64` are symlinks to it. The argument parsing for `basenc` will have to be updated to only expose format-selecting arguments if `argv[0]` is `basenc`.
3. Implement format-specific optimization routines over time, supported by benchmarking. For example, if the user provides less than 1 KiB of data, a less optimized routine can be used (advanced AVX-512 instructions have some interesting effects on CPU clock frequencies, so we should avoid them unless a large amount of input needs to be processed).
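The multicall dispatch in task 2 could be sketched roughly as follows. `Format` and `select_format` are hypothetical names, not an existing uutils API; the real implementation would feed this decision into the clap argument definitions rather than a bare enum.

```rust
// Sketch: argv[0]-based dispatch for a multicall basenc binary.
use std::path::Path;

#[derive(Debug, PartialEq)]
enum Format {
    Base32,
    Base64,
    Basenc, // keeps format-selecting flags (--base16, --base32hex, ...) on the CLI
}

/// Decide which personality to run from the program name (argv[0]).
/// Symlinks named `base32`/`base64` lock the format; anything else,
/// including `basenc` itself, leaves format selection to the arguments.
fn select_format(argv0: &str) -> Format {
    // file_stem strips the directory and any extension (e.g. `.exe`).
    match Path::new(argv0).file_stem().and_then(|s| s.to_str()) {
        Some("base32") => Format::Base32,
        Some("base64") => Format::Base64,
        _ => Format::Basenc,
    }
}

fn main() {
    let argv0 = std::env::args().next().unwrap_or_default();
    println!("running as {:?}", select_format(&argv0));
}
```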
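For task 3, the size- and feature-gated dispatch might look like the sketch below. The threshold constant, the function names, and the dispatch shape are all illustrative; the vectorized kernel is elided, with the scalar routine standing in for it.

```rust
// Sketch: dispatch to an optimized path only for large inputs on CPUs
// that actually support the required features.
const SIMD_THRESHOLD: usize = 1024; // below ~1 KiB, stay on the scalar path

fn encode_scalar(input: &[u8]) -> Vec<u8> {
    // Placeholder: any correct byte-at-a-time baseline encoder goes here.
    input.to_vec()
}

fn encode(input: &[u8]) -> Vec<u8> {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if input.len() >= SIMD_THRESHOLD && std::is_x86_feature_detected!("avx2") {
            // A real implementation would call an AVX2/AVX-512 kernel here.
            return encode_scalar(input);
        }
    }
    encode_scalar(input)
}

fn main() {
    let data = vec![0u8; 4096];
    assert_eq!(encode(&data), encode_scalar(&data));
    println!("dispatch consistent for {} bytes", data.len());
}
```

The `#[cfg]` block keeps non-x86 targets compiling with only the scalar path, and the runtime `is_x86_feature_detected!` check means no separate binaries per microarchitecture are needed.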
It may be preferable for me to write a new, independent crate for optimized data encoding and depend on that (rather than the current `data-encoding` crate); the alternative is to eliminate the external dependency completely.
I'd like to ask for prior approval from the uutils community before I start working on this. Note that tasks 1 and 2 are effectively independent - while I'm more interested in the first, the second seems more important. Maintainers, what do you think?