Skip to content

Proposal to optimize performance with cycle inversion #22

@ghost

Description

Apultra is an excellent compression algorithm with great compression ratios. As an optimization enthusiast, I would like to propose a way to significantly improve Apultra's performance while maintaining the integrity of compression.

The idea is to invert certain cycles to reduce processor cache misses. For example, the following cycle can be inverted:

for (int n = 0; n < numElements; n++) {
  if (condition) {
    // ...
  }
}

Into:

for (int n = numElements - 1; n >= 0; n--) {
  if (condition) {
    // ... 
  }
}

Benefits of this approach:

  • Improves memory locality and cache utilization by accessing memory sequentially
  • Reduces branch mispredictions
  • Enables better instruction pipelining

With cycle inversion, we can optimize hot loops that dominate the compression ratios like the ones in apultra_optimize_forward, hash search cycles, and other auxiliary algorithms.

My benchmarks demonstrate [2-4x] performance improvements with these optimizations while maintaining bit-exact output on multiple datasets.

It would be great if you can assess the viability of including cycle inversion in Apultra's core optimization pipeline. Please let me know if any other details are required. These micro-optimizations can significantly improve the real-world speed of compression without affecting accuracy.

I optimize apultra_optimize_forward and receive good results.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions