-
Notifications
You must be signed in to change notification settings - Fork 24.9k
Add helion x pt2 test #155513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add helion x pt2 test #155513
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155513
Note: Links to docs will display an error until the docs builds have been completed. ❌ 3 Cancelled Jobs, 1 Unrelated FailureAs of commit a30befb with merge base e125970 ( CANCELLED JOBS - The following jobs were cancelled. Please retry:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes. |
.ci/docker/common/install_triton.sh
Outdated
@@ -98,3 +98,4 @@ fi | |||
if [ -n "${NUMPY_VERSION}" ]; then | |||
pip_install "numpy==${NUMPY_VERSION}" | |||
fi | |||
pip_install helion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for now lets install the latest, but we can start pinning later on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty amazed this works out of the box
@helion.kernel(config=helion.Config(block_sizes=[1, 2])) | ||
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor: | ||
# match pytorch broadcasting rules | ||
x, y = torch.broadcast_tensors(x, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL lol
This kinda just worked out of the box, shocking. PT2 traced into helion and emitted it as a user defined triton kernel: P1836496774 In the long run, we do not actually want this, but rather to create a helion HOP so we can do fusions etc. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This kinda just worked out of the box, shocking. PT2 traced into helion and emitted it as a user defined triton kernel: P1836496774 In the long run, we do not actually want this, but rather to create a helion HOP so we can do fusions etc. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This kinda just worked out of the box, shocking. PT2 traced into helion and emitted it as a user defined triton kernel: P1836496774 In the long run, we do not actually want this, but rather to create a helion HOP so we can do fusions etc. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
This kinda just worked out of the box, shocking. PT2 traced into helion and emitted it as a user defined triton kernel: P1836496774 In the long run, we do not actually want this, but rather to create a helion HOP so we can do fusions etc. cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy chenyang78 kadeng muchulee8 amjames chauhang aakhundov [ghstack-poisoned]
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Merge failedReason: 2 jobs have failed, first few of them are: inductor / unit-test / linux-jammy-cpu-py3.12-gcc11-inductor-halide / build, inductor / unit-test / linux-jammy-cpu-py3.9-gcc11-inductor / build Details for Dev Infra teamRaised by workflow job |
@pytorchbot merge -f "the cancelled jobs seem to be just timing out unrelated to this" |
Merge startedYour change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
This kinda just worked out of the box, shocking. PT2 traced into helion and emitted it as a user defined triton kernel: P1836496774 In the long run, we do not actually want this, but rather to create a helion HOP so we can do fusions etc. Pull Request resolved: pytorch#155513 Approved by: https://github.com/zou3519, https://github.com/jansel
Hi @oulgen looks like this breaks triton pin update please see: #156186
Please note it overwrite triton-3.4.0+git76b6977d and uses triton-3.3.1 Helion depend on :
Testing the fix by adding: |
After: #155513 Please see comment: #155513 (comment) Here are the logs: https://github.com/pytorch/pytorch/actions/runs/15838529400/job/44646874281?pr=156664#step:6:16372 Looks like current workflow is : Build triton - triton-3.4.0+git5389ed79-cp310-cp310-linux_x86_64.whl Install Helion - Overwrite triton with production 3.3.1 and install production torch Reinstall triton as final docker build step - triton-3.4.0+git5389ed79-cp310-cp310-linux_x86_64.whl This makes it somewhat messy since we install both torch and triton from prod. This is something we want to avoid when building underlining docker images for CI Log: ``` #55 311.4 + pip_install helion #55 311.4 + as_jenkins conda run -n py_3.10 pip install --progress-bar off helion #55 311.4 + sudo -E -H -u jenkins env -u SUDO_UID -u SUDO_GID -u SUDO_COMMAND -u SUDO_USER env PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/opt/conda/envs/py_3.10/bin:/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin LD_LIBRARY_PATH= conda run -n py_3.10 pip install --progress-bar off helion #55 393.6 Collecting helion #55 393.6 Downloading helion-0.0.7-py3-none-any.whl.metadata (14 kB) #55 393.6 Collecting filecheck (from helion) #55 393.6 Downloading filecheck-1.0.2-py3-none-any.whl.metadata (5.8 kB) #55 393.6 Collecting torch>=2.7.0 (from helion) #55 393.6 Downloading torch-2.7.1-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (29 kB) #55 393.6 Requirement already satisfied: typing-extensions>=4.0.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from helion) (4.14.0) #55 393.6 Requirement already satisfied: filelock in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch>=2.7.0->helion) (3.18.0) #55 393.6 Requirement already satisfied: sympy>=1.13.3 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch>=2.7.0->helion) (1.13.3) #55 393.6 Requirement already satisfied: networkx in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch>=2.7.0->helion) (2.8.8) #55 393.6 Requirement already satisfied: jinja2 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch>=2.7.0->helion) (3.1.6) #55 393.6 Requirement already satisfied: fsspec in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from torch>=2.7.0->helion) (2025.5.1) #55 393.6 Collecting nvidia-cuda-nvrtc-cu12==12.6.77 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB) #55 393.6 Collecting nvidia-cuda-runtime-cu12==12.6.77 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) #55 393.6 Collecting nvidia-cuda-cupti-cu12==12.6.80 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cuda_cupti_cu12-12.6.80-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) #55 393.6 Collecting nvidia-cudnn-cu12==9.5.1.17 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cudnn_cu12-9.5.1.17-py3-none-manylinux_2_28_x86_64.whl.metadata (1.6 kB) #55 393.6 Collecting nvidia-cublas-cu12==12.6.4.1 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cublas_cu12-12.6.4.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) #55 393.6 Collecting nvidia-cufft-cu12==11.3.0.4 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cufft_cu12-11.3.0.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) #55 393.6 Collecting nvidia-curand-cu12==10.3.7.77 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_curand_cu12-10.3.7.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) #55 393.6 Collecting nvidia-cusolver-cu12==11.7.1.2 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cusolver_cu12-11.7.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) #55 393.6 Collecting nvidia-cusparse-cu12==12.5.4.2 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cusparse_cu12-12.5.4.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) #55 393.6 Collecting nvidia-cusparselt-cu12==0.6.3 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl.metadata (6.8 kB) #55 393.6 Collecting nvidia-nccl-cu12==2.26.2 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (2.0 kB) #55 393.6 Collecting nvidia-nvtx-cu12==12.6.77 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_nvtx_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.6 kB) #55 393.6 Collecting nvidia-nvjitlink-cu12==12.6.85 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_nvjitlink_cu12-12.6.85-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl.metadata (1.5 kB) #55 393.6 Collecting nvidia-cufile-cu12==1.11.1.6 (from torch>=2.7.0->helion) #55 393.6 Downloading nvidia_cufile_cu12-1.11.1.6-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.5 kB) #55 393.6 Collecting triton==3.3.1 (from torch>=2.7.0->helion) #55 393.6 Downloading triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (1.5 kB) #55 393.6 Requirement already satisfied: setuptools>=40.8.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from triton==3.3.1->torch>=2.7.0->helion) (80.9.0) #55 393.6 Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from sympy>=1.13.3->torch>=2.7.0->helion) (1.3.0) #55 393.6 Requirement already satisfied: MarkupSafe>=2.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from jinja2->torch>=2.7.0->helion) (3.0.2) #55 393.6 Downloading helion-0.0.7-py3-none-any.whl (149 kB) #55 393.6 Downloading torch-2.7.1-cp310-cp310-manylinux_2_28_x86_64.whl (821.2 MB) #55 393.6 Downloading nvidia_cublas_cu12-12.6.4.1-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (393.1 MB) #55 393.6 Downloading nvidia_cuda_cupti_cu12-12.6.80-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (8.9 MB) #55 393.6 Downloading nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl (23.7 MB) #55 393.6 Downloading nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (897 kB) #55 393.6 Downloading nvidia_cudnn_cu12-9.5.1.17-py3-none-manylinux_2_28_x86_64.whl (571.0 MB) #55 393.6 Downloading nvidia_cufft_cu12-11.3.0.4-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (200.2 MB) #55 393.6 Downloading nvidia_cufile_cu12-1.11.1.6-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (1.1 MB) #55 393.6 Downloading nvidia_curand_cu12-10.3.7.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (56.3 MB) #55 393.6 Downloading nvidia_cusolver_cu12-11.7.1.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (158.2 MB) #55 393.6 Downloading nvidia_cusparse_cu12-12.5.4.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (216.6 MB) #55 393.6 Downloading nvidia_cusparselt_cu12-0.6.3-py3-none-manylinux2014_x86_64.whl (156.8 MB) #55 393.6 Downloading nvidia_nccl_cu12-2.26.2-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (201.3 MB) #55 393.6 Downloading nvidia_nvjitlink_cu12-12.6.85-py3-none-manylinux2010_x86_64.manylinux_2_12_x86_64.whl (19.7 MB) #55 393.6 Downloading nvidia_nvtx_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (89 kB) #55 393.6 Downloading triton-3.3.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (155.6 MB) #55 393.6 Downloading filecheck-1.0.2-py3-none-any.whl (23 kB) #55 393.6 Installing collected packages: nvidia-cusparselt-cu12, triton, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufile-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, filecheck, nvidia-cusparse-cu12, nvidia-cufft-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12, torch, helion #55 393.6 Attempting uninstall: triton #55 393.6 Found existing installation: triton 3.4.0+git5389ed79 #55 393.6 Uninstalling triton-3.4.0+git5389ed79: #55 393.6 Successfully uninstalled triton-3.4.0+git5389ed79 #55 393.6 Successfully installed filecheck-1.0.2 helion-0.0.7 nvidia-cublas-cu12-12.6.4.1 nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77 nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6 nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 torch-2.7.1 triton-3.3.1 #55 393.6 #55 DONE 428.8s #56 [final 1/30] COPY --from=triton-builder /opt/triton /opt/triton #56 DONE 0.0s #57 [final 2/30] RUN if [ -n "yes" ] || [ -n "" ]; then pip install /opt/triton/*.whl; chown -R jenkins:jenkins /opt/conda; fi #57 0.823 Processing /opt/triton/triton-3.4.0+git5389ed79-cp310-cp310-linux_x86_64.whl #57 2.263 Requirement already satisfied: setuptools>=40.8.0 in /opt/conda/envs/py_3.10/lib/python3.10/site-packages (from triton==3.4.0+git5389ed79) (80.9.0) #57 2.589 Installing collected packages: triton #57 6.405 Successfully installed triton-3.4.0+git5389ed79 #57 6.405 WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. #57 DONE 86.5s ``` Pull Request resolved: #156706 Approved by: https://github.com/oulgen, https://github.com/malfet
Stack from ghstack (oldest at bottom):
This kinda just worked out of the box, shocking. PT2 traced into helion and emitted it as a user defined triton kernel: P1836496774
In the long run, we do not actually want this, but rather to create a helion HOP so we can do fusions etc.
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov