
Fallback to contiguous layout in convolution lowering on stride mismatch #159462 #159593


Open · wants to merge 2 commits into main

Conversation

kanavgoyal898

@kanavgoyal898 kanavgoyal898 commented Jul 31, 2025

Fixes #159462

Fallback to .contiguous() Layout When require_stride_order Fails in Convolution Lowering

This PR fixes a stride validation error in the Inductor backend that occurs when a permute() is followed by a Conv1d layer. The issue arises because permute() produces a tensor with a non-standard memory layout, which breaks the stride checks Inductor performs during kernel compilation.
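To see why permute() trips the layout check, the strides can be computed by hand. This is a pure-Python sketch of the row-major stride arithmetic that real tensors carry internally; it uses the same shapes as the repro below:

```python
def contiguous_strides(shape):
    """Row-major strides: each dim's stride is the product of the sizes after it."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return tuple(strides)

shape = (32, 100, 1)                 # the repro input below
strides = contiguous_strides(shape)  # (100, 1, 1)

# permute(0, 2, 1) reorders sizes and strides together, without copying data.
dims = (0, 2, 1)
p_shape = tuple(shape[d] for d in dims)      # (32, 1, 100)
p_strides = tuple(strides[d] for d in dims)  # (100, 1, 1)

# A contiguous tensor of the permuted shape would instead have strides
# (100, 100, 1) -- exactly the "stride 1==100 at dim=1" mismatch that
# Inductor reports later in this thread.
print(p_strides, contiguous_strides(p_shape))
```

The permuted view is thus a legal tensor, but its strides no longer match what the convolution lowering expects for that shape.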

In the aten.convolution lowering function, this patch wraps the require_stride_order(...) call in a try/except. If the stride-order check fails, it falls back to require_contiguous(...), which resolves the mismatch by ensuring a compatible memory layout.

Code:

import torch
import torch.nn as nn

import warnings
warnings.filterwarnings("ignore", message=".*TF32.*deprecated.*")
warnings.filterwarnings("ignore", message=".*Please use the new API settings.*")

class ConvModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 64, kernel_size=3, padding=1)

    def forward(self, x):
        x = x.permute(0, 2, 1)
        return self.conv(x)


model = ConvModel()
x = torch.randn(32, 100, 1, dtype=torch.float32)

def run_test(model, input, backend):
    try:
        model = torch.compile(model, backend=backend)
        output = model(*input)
        print(f"succeed on {backend}")
    except Exception as e:
        print(f"failed on {backend}", str(e))
        

run_test(model, [x], "eager")
run_test(model, [x], "aot_eager")
run_test(model, [x], "inductor")

Output:

succeed on eager
succeed on aot_eager
succeed on inductor

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben


pytorch-bot bot commented Jul 31, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/159593

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure

As of commit 8810e26 with merge base a991e28 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


@RohitRathore1 RohitRathore1 left a comment


For me it's still failing. I built on top of your fix, which is this commit 850db0c......

python3
Python 3.13.5 | packaged by conda-forge | (main, Jun 16 2025, 08:27:50) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
2.9.0a0+git850db0c
>>> exit
(myenv) rohitrathore1@hpe10:~$ python3 test.py
succeed on eager
succeed on aot_eager
failed on inductor expected size 64==64, stride 1==100 at dim=1; expected size 100==100, stride 64==1 at dim=2
Error in op: torch.ops.aten.convolution.default
This error most often comes from a incorrect fake (aka meta) kernel for a custom op.
Use torch.library.opcheck to test your custom op.
See https://pytorch.org/docs/stable/library.html#torch.library.opcheck

@kanavgoyal898
Author

kanavgoyal898 commented Aug 1, 2025

@RohitRathore1
Thanks for checking! I just re-ran the test on my local machine using the exact same commit (850db0cb) and it passes on all three backends, including inductor. I've attached the full output below for reference.

I can't rule out a platform difference, but since the repro uses only standard PyTorch modules on CPU and passes on macOS with a clean build, it doesn't immediately appear to be platform-specific.

(venv) kanavgoyal@MacBook-Pro pytorch % python 
Python 3.13.0 (main, Oct  7 2024, 05:02:14) [Clang 16.0.0 (clang-1600.0.26.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
... import torch.nn as nn
... 
... import warnings
... warnings.filterwarnings("ignore", message=".*TF32.*deprecated.*")
... warnings.filterwarnings("ignore", message=".*Please use the new API settings.*")
... 
... class ConvModel(nn.Module):  
...     def __init__(self):  
...         super().__init__()  
...         self.conv = nn.Conv1d(1, 64, kernel_size=3, padding=1)  
...       
...     def forward(self, x):  
...         x = x.permute(0, 2, 1)
...         return self.conv(x)
... 
... 
... model = ConvModel()
... x = torch.randn(32, 100, 1, dtype=torch.float32)
... 
... def run_test(model, input, backend):
...     try:
...         model = torch.compile(model, backend=backend)
...         output = model(*input)
...         print(f"succeed on {backend}")
...     except Exception as e:
...         print(f"failed on {backend}", str(e))
...         
... 
... run_test(model, [x], "eager")
... run_test(model, [x], "aot_eager")
... run_test(model, [x], "inductor")
... 
/Users/kanavgoyal/Downloads/pytorch/torch/_dynamo/guards.py:787: RuntimeWarning: Guards may run slower on Python 3.13.0. Consider upgrading to Python 3.13.1+.
  warnings.warn(
succeed on eager
/Users/kanavgoyal/Downloads/pytorch/torch/_dynamo/guards.py:787: RuntimeWarning: Guards may run slower on Python 3.13.0. Consider upgrading to Python 3.13.1+.
  warnings.warn(
succeed on aot_eager
/Users/kanavgoyal/Downloads/pytorch/torch/_dynamo/guards.py:787: RuntimeWarning: Guards may run slower on Python 3.13.0. Consider upgrading to Python 3.13.1+.
  warnings.warn(
succeed on inductor
>>> torch.__version__
'2.9.0a0+git850db0c'
>>> torch.utils.collect_env.get_pretty_env_info()
'PyTorch version: 2.9.0a0+git850db0c\nIs debug build: False\nCUDA used to build PyTorch: None\nROCM used to build PyTorch: N/A\n\nOS: macOS 15.5 (arm64)\nGCC version: Could not collect\nClang version: 17.0.0 (clang-1700.0.13.5)\nCMake version: version 4.0.3\nLibc version: N/A\n\nPython version: 3.13.0 (main, Oct  7 2024, 05:02:14) [Clang 16.0.0 (clang-1600.0.26.3)] (64-bit runtime)\nPython platform: macOS-15.5-arm64-arm-64bit-Mach-O\nIs CUDA available: False\nCUDA runtime version: No CUDA\nCUDA_MODULE_LOADING set to: N/A\nGPU models and configuration: No CUDA\nNvidia driver version: No CUDA\ncuDNN version: No CUDA\nIs XPU available: False\nHIP runtime version: N/A\nMIOpen runtime version: N/A\nIs XNNPACK available: True\n\nCPU:\nApple M3 Pro\n\nVersions of relevant libraries:\n[pip3] numpy==2.3.2\n[pip3] optree==0.17.0\n[pip3] torch==2.9.0a0+git3967dbe\n[pip3] torch==2.9.0a0+git3967dbe\n[pip3] torch==2.9.0a0+git850db0c\n[conda] Could not collect'
>>> exit()

cc @eellison @soulitzer


pytorch-bot bot commented Aug 1, 2025

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: argument command: invalid choice: 'can' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

Try @pytorchbot --help for more info.

@janeyx99 janeyx99 requested a review from eellison August 4, 2025 23:08
@janeyx99 janeyx99 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Aug 4, 2025
Contributor

@eellison eellison left a comment


We can't do try ... except here. We should figure out what the actual underlying issue is. Sorry, this may not be a good first issue.

Labels
module: inductor · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Development

Successfully merging this pull request may close these issues.

Inductor Fails on Conv1D After Permute with Stride Mismatch Error
5 participants