-
Notifications
You must be signed in to change notification settings - Fork 16

Description
Apultra is an excellent compression algorithm with great compression ratios. As an optimization enthusiast, I would like to propose a way to significantly improve Apultra's performance while maintaining the integrity of compression.
The idea is to invert certain cycles to reduce processor cache misses. For example, the following cycle can be inverted:
for (int n = 0; n < numElements; n++) {
if (condition) {
// ...
}
}
Into:
for (int n = numElements - 1; n >= 0; n--) {
if (condition) {
// ...
}
}
Benefits of this approach:
- Improves memory locality and cache utilization by accessing memory sequentially
- Reduces branch mispredictions
- Enables better instruction pipelining
With cycle inversion, we can optimize hot loops that dominate the compression ratios like the ones in apultra_optimize_forward, hash search cycles, and other auxiliary algorithms.
My benchmarks demonstrate [2-4x] performance improvements with these optimizations while maintaining bit-exact output on multiple datasets.
It would be great if you can assess the viability of including cycle inversion in Apultra's core optimization pipeline. Please let me know if any other details are required. These micro-optimizations can significantly improve the real-world speed of compression without affecting accuracy.
I optimize apultra_optimize_forward
and receive good results.