-
Notifications
You must be signed in to change notification settings - Fork 136
Implement post-register-allocation optimization #267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This commit implements redundant move elimination to optimize away unnecessary move operations that are immediately overwritten, targetting common inefficiencies in compiler-generated code. Added 5 optimization patterns: - Consecutive assignments to same destination: {mov rd,rs1; mov rd,rs2} → {mov rd,rs2} - Load immediately overwritten: {load rd,offset; mov rd,rs} → {mov rd,rs} - Constant load immediately overwritten: {li rd,imm; mov rd,rs} → {mov rd,rs} - Consecutive loads to same register: {load rd,off1; load rd,off2} → {load rd,off2} - Consecutive constant loads: {li rd,imm1; li rd,imm2} → {li rd,imm2}
This commit implements dead code elimination that works in conjunction with SCCP to remove unreachable code after constant propagation and branch folding. These optimizations target code that becomes dead after constant propagation, such as: - Branches with constant conditions (if(1), if(0)) - Instructions that are immediately overwritten - Unreachable code blocks after branch folding
This extends load/store elimination with more aggressive patterns, reducing memory traffic by eliminating redundant memory operations. Local memory optimizations: - Dead store elimination: Consecutive stores to same location - Redundant load elimination: Consecutive loads from same location - Store-to-load forwarding: Replace load with stored value - Load-store redundancy: Remove store of just-loaded value Global memory optimizations: - Global dead store elimination - Global redundant load elimination
This implements mathematical identity patterns on register operands: - Self-subtraction: x - x → 0 - Self-XOR: x ^ x → 0 - Self-OR: x | x → x (identity) - Self-AND: x & x → x (identity) These patterns emerge after register allocation when different variables are assigned to the same register. SSA handles constant folding, peephole handles register-based patterns.
This implements power-of-2 strength reduction patterns: - Division by 2^n → right shift by n - Modulo by 2^n → bitwise AND with (2^n - 1) - Multiplication by 2^n → left shift by n This optimization is unique to peephole optimizer since SSA works on virtual registers before actual constants are loaded.
This implements self-comparison optimizations: - x != x → 0 (always false) - x == x → 1 (always true) - x < x → 0 (always false) - x > x → 0 (always false) - x <= x → 1 (always true) - x >= x → 1 (always true) These register-based patterns appear after register allocation when different variables are assigned to the same register. Complements SSA's SCCP constant comparison folding.
This implements bitwise identity and absorption patterns: - Double complement: ~(~x) → x - AND with all-ones: x & -1 → x - OR with zero: x | 0 → x - XOR with zero: x ^ 0 → x - AND with zero: x & 0 → 0 (absorption) - OR with all-ones: x | -1 → -1 (absorption) - Shift by zero: x << 0 → x, x >> 0 → x These patterns are not handled by SSA optimizer and provide significant optimization opportunities for bitwise operations.
This implements 3-instruction sequence optimizations: - Store-load-store elimination: removes unused intermediate loads - Consecutive stores: only last store to same location matters
c093ffc
to
635a774
Compare
Integrates all safe and working peephole optimizations: - Instruction fusion for eliminating redundant moves - Comparison optimization for self-comparisons - Strength reduction for power-of-2 operations - Algebraic simplification for register patterns - Bitwise operation optimizations - Redundant move elimination - Load/store pair elimination - Triple pattern optimization Removed eliminate_dead_instructions() and fold_constant_branches() as they were causing bootstrap failures due to linked list corruption.
635a774
to
0f828a0
Compare
} | ||
|
||
/* Pattern 2: Redundant load immediately overwritten | ||
* {load rd, offset; mov rd, rs} → {mov rd, rs} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the reverse pattern {mov rd, rs; load rd, offset} → {load rd, offset} also apply here? Perhaps we can add a FIXME in the comment for future work?
} | ||
|
||
/* Pattern 3: Load constant immediately overwritten | ||
* {li rd, imm; mov rd, rs} → {mov rd, rs} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto, {mov rd, rs; li rd, imm} → {li rd, imm}
This pull request transforms shecc's peephole optimizer from basic instruction fusion to a comprehensive post-register-allocation optimization framework, providing performance improvements while maintaining educational clarity and bootstrap capability.
It creates lean and effective optimizer cooperation by eliminating redundant work between optimization passes.
Key Optimizations