@wenju-he wenju-he commented Sep 5, 2025

This PR reduces amdgcn--amdhsa.bc size by 1.8% and nvptx64--nvidiacl.bc size by 4%.
The loop trip count is constant, so the backend can decide whether to unroll.

…nction is large

This PR reduces amdgcn--amdhsa.bc size by 3% and nvptx64--nvidiacl.bc
size by 4%.
Loop trip count is constant and backend can decide whether to unroll.
@wenju-he wenju-he requested a review from Copilot September 5, 2025 08:51
@llvmbot llvmbot added the libclc libclc OpenCL library label Sep 5, 2025

@Copilot Copilot AI left a comment

Pull Request Overview

This PR optimizes the erf and erfc vector functions in libclc by switching from scalarization to a loop-based implementation. The change aims to reduce binary size by 3-4% for AMD and NVIDIA targets while maintaining functionality.

  • Replaces scalarized vector implementations with loop-based ones for erf/erfc functions
  • Introduces a new shared header for loop-based unary function scalarization
  • Allows backend compilers to decide on loop unrolling for optimal performance

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| libclc/clc/lib/generic/math/clc_erfc.cl | Switches from scalarized to loop-based vector implementation |
| libclc/clc/lib/generic/math/clc_erf.cl | Switches from scalarized to loop-based vector implementation |
| libclc/clc/include/clc/shared/unary_def_scalarize_loop.inc | New shared header implementing loop-based vector scalarization |

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@wenju-he wenju-he merged commit a271d07 into llvm:main Sep 5, 2025
9 checks passed
@wenju-he wenju-he deleted the unary_def_scalarize_loop.inc branch September 5, 2025 11:58