[ROCm] [CK] Composable Kernel integration for ROCm #156192

Closed

Collaborator

@iupaikov-amd iupaikov-amd commented Jun 17, 2025

This is part of our effort to integrate the Composable Kernel (CK) library into the Inductor backend. CK is currently pulled in as a git submodule, but we would prefer to control its version through a commit pin, as is done for Triton.

The idea is to ship CK in the 2.8 release so that people can use it with Inductor and AOT Inductor, and then gradually move away from the submodule. Right now SDPA is tied to files in the submodule. We would like to avoid putting all of the installation logic in CI scripts, so that locally built versions also get this functionality.
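
For readers unfamiliar with commit pins, here is a minimal sketch of the general idea, not the code in this PR. It assumes the pin is a plain-text file holding one commit hash and that CK can be installed from its Git repository with pip at that commit. The helper names `parse_commit_pin` and `pip_requirement` and the hash `0123abc` are hypothetical.

```python
# Illustrative sketch only: helper names and the example hash are hypothetical,
# and this is not the installation logic added by this PR.

CK_REPO_URL = "https://github.com/ROCm/composable_kernel"  # upstream CK repository


def parse_commit_pin(text: str) -> str:
    """Return the first non-empty, non-comment line of a pin file's contents."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    raise ValueError("no commit pin found")


def pip_requirement(repo_url: str, commit: str) -> str:
    """Build a pip VCS requirement that installs the repo at one exact commit."""
    return f"git+{repo_url}@{commit}"


if __name__ == "__main__":
    # In a real build the text would come from a pin file checked into the repo.
    pin = parse_commit_pin("# CK commit pin (hypothetical hash)\n0123abc\n")
    print(pip_requirement(CK_REPO_URL, pin))
```

With a pin like this, bumping CK is a one-line change to the pin file, and the same logic can run in CI and in local builds.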

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @naromero77amd @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov


pytorch-bot bot commented Jun 17, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/156192

Note: Links to docs will display an error until the docs builds have been completed.

❌ 54 New Failures, 4 Unrelated Failures

As of commit ebd452d with merge base 3058719 (image):

NEW FAILURES - The following jobs have failed:

UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added module: inductor module: rocm AMD GPU support for Pytorch release notes: releng release notes category labels Jun 17, 2025
@iupaikov-amd iupaikov-amd marked this pull request as ready for review June 18, 2025 09:46
@iupaikov-amd iupaikov-amd requested a review from jeffdaily as a code owner June 18, 2025 09:46
@iupaikov-amd iupaikov-amd added ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor ciflow/rocm Trigger "default" config CI on ROCm ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jun 18, 2025
@iupaikov-amd
Collaborator Author

The test failures in CI seem unrelated; they also occur on the main branch. This PR will need to be rebased once that is fixed.

@pytorch-bot pytorch-bot bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor ciflow/rocm Trigger "default" config CI on ROCm ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jun 19, 2025
@jithunnair-amd jithunnair-amd added keep-going Don't stop on first failure, keep running tests until the end ciflow/rocm Trigger "default" config CI on ROCm ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jun 19, 2025
@pytorch-bot pytorch-bot bot removed ciflow/rocm Trigger "default" config CI on ROCm ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jun 19, 2025
@jithunnair-amd jithunnair-amd added the ciflow/rocm Trigger "default" config CI on ROCm label Jun 19, 2025
@jithunnair-amd jithunnair-amd added ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jun 19, 2025
@jerryzh168 jerryzh168 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 21, 2025
@pytorch-bot pytorch-bot bot removed ciflow/rocm Trigger "default" config CI on ROCm ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 labels Jul 15, 2025
@iupaikov-amd
Collaborator Author

@pytorchbot rebase

@iupaikov-amd iupaikov-amd added ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor ciflow/rocm Trigger "default" config CI on ROCm ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/periodic-rocm-mi300 Trigger "distributed" config CI on ROCm MI300 labels Jul 15, 2025
@pytorchmergebot
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Collaborator

Rebase failed due to Command git -C /home/runner/work/pytorch/pytorch rebase refs/remotes/origin/viable/strict pull/156192/head returned non-zero exit code 1

Rebasing (1/57)
Auto-merging setup.py
CONFLICT (content): Merge conflict in setup.py
error: could not apply f61030db195... Implemented CK installation for ROCm builds
hint: Resolve all conflicts manually, mark them as resolved with
hint: "git add/rm <conflicted_files>", then run "git rebase --continue".
hint: You can instead skip this commit: run "git rebase --skip".
hint: To abort and get back to the state before "git rebase", run "git rebase --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Could not apply f61030db195... # Implemented CK installation for ROCm builds

Raised by https://github.com/pytorch/pytorch/actions/runs/16303019055

@iupaikov-amd
Collaborator Author

Bumped the CK commit hash, which should fix the issues with the unit tests.

@iupaikov-amd iupaikov-amd force-pushed the iupaikov_ck_integration_upstream branch from ebd452d to 6abe450 Compare July 21, 2025 15:50
@pytorch-bot pytorch-bot bot removed ciflow/trunk Trigger trunk jobs on your pull request ciflow/inductor ciflow/rocm Trigger "default" config CI on ROCm ciflow/inductor-rocm Trigger "inductor" config CI on ROCm ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/periodic-rocm-mi300 Trigger "distributed" config CI on ROCm MI300 labels Jul 21, 2025
Labels
keep-going Don't stop on first failure, keep running tests until the end module: inductor module: rocm AMD GPU support for Pytorch open source release notes: releng release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
5 participants