Pulse · pytorch/pytorch · GitHub

April 24, 2025 – May 1, 2025

Overview

189 Active pull requests

285 Active issues
- 0 Merged pull requests
- 189 Open pull requests
- 164 Closed issues
- 121 New issues

189 Pull requests opened by 114 people

Extend compute_global_tensor_shape to multi dimension sharding
#152166 opened Apr 25, 2025
Generate test reports for pytest when option is given
#152170 opened Apr 25, 2025
[c10d] Allow split_group to work with non nccl backends
#152175 opened Apr 25, 2025
IGNORE: Testing OIDC
#152181 opened Apr 25, 2025
[WIP] New Win Arm64 Runners - User pre installed Visual Studio
#152184 opened Apr 25, 2025
xpu: get xpu arch flags at runtime in cpp_extensions
#152192 opened Apr 25, 2025
SAC: fix recompute tag propagation for ops with list[tensor] inputs
#152193 opened Apr 25, 2025
SAC: fix recompute tag propagation for ops with list[tensor] inputs
#152194 opened Apr 25, 2025
SAC: fix recompute tag propagation for ops with list[tensor] inputs
#152195 opened Apr 25, 2025
Add detailed triton kernel logging to tlparse
#152197 opened Apr 25, 2025
[inductor] propagate shapes in CSEVariable
#152198 opened Apr 25, 2025
Synchronize mps backend in the timer
#152199 opened Apr 25, 2025
[submodule] Update ONNX to 1.18
#152200 opened Apr 25, 2025
Add support for torch.cuda.FloatTensor()
#152208 opened Apr 25, 2025
[CI] docker images use tags instead of image name
#152209 opened Apr 25, 2025
Move mps_linear forward to use MPS kernels directly instead of MPSGraph
#152210 opened Apr 25, 2025
Mini tutorial for provenance tracking
#152211 opened Apr 25, 2025
Improve error handling in CachingAutotuner for argument mismatches
#152215 opened Apr 25, 2025
[not for land] functionalization hack to try making mutations on graph input slices more efficient
#152217 opened Apr 25, 2025
Add `padding="same"` for transposed convolution
#152228 opened Apr 25, 2025
Fix: Consider input defined unbacked during inductor codegen for runtime asserts
#152231 opened Apr 25, 2025
_get_total_norm should use float64 to avoid rounding errors
#152234 opened Apr 25, 2025
At least one of ROCM_HOME or CUDA_HOME must be None
#152236 opened Apr 26, 2025
[executorch hash update] update the pinned executorch hash
#152238 opened Apr 26, 2025
Updates to build on Noble (Ubuntu24.04) and py3.12
#152240 opened Apr 26, 2025
Enable 8byte vector loading for fp16/bf16
#152242 opened Apr 26, 2025
Move code out of individual token linters
#152256 opened Apr 26, 2025
[BE]: Cleanup traceutils with fmtlib
#152265 opened Apr 26, 2025
[ROCm] Maxpool backward NHWC Perf Improvement targeting Resnet scenarios
#152267 opened Apr 26, 2025
[Dynamo] Replace `unimplemented` with `unimplemented_v2` in `torch/_dynamo/variables/misc.py` [1/2]
#152274 opened Apr 27, 2025
[CI] Add xpu inductor test into periodic workflow
#152281 opened Apr 27, 2025
[DTensor] enable SimpleFSDP's composability with Tensor Parallel
#152286 opened Apr 27, 2025
[inductor] Skip isinf check for FP8 E4M3 dtype
#152289 opened Apr 28, 2025
Enable the AMP precision with freezing for CPU nightly test
#152298 opened Apr 28, 2025
[cp] dispatch flex_attention_backward to CP impl in TorchDispatchMode
#152311 opened Apr 28, 2025
setuptools.build_meta:__legacy__ backend is deprecated
#152313 opened Apr 28, 2025
Fixed RELEASE.md typo
#152315 opened Apr 28, 2025
Correct torch.xpu.is_bf16_supported return False if no XPU detected
#152317 opened Apr 28, 2025
[dynamo] Use getattr when accessing self.value.__module__ in SkipFunctionVariable
#152320 opened Apr 28, 2025
[ROCm] Unskipped test_rnn_dropout_state for ROCm
#152339 opened Apr 28, 2025
[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices
#152341 opened Apr 28, 2025
[Memento] Enable on-demand mode
#152342 opened Apr 28, 2025
[WIP] DeadCodeEliminator Mark(block) improvement
#152348 opened Apr 28, 2025
[inductor][dynamo] Include operator name in size/stride/alignment assertion
#152353 opened Apr 28, 2025
Add codeowner for merge rules
#152354 opened Apr 28, 2025
[Will This Work?] Build libgomp (gcc-11) from src on AArch64
#152361 opened Apr 28, 2025
Format all headers under ATen/cpu/vec, not just top-level
#152364 opened Apr 28, 2025
add is_vec_specialized_for
#152365 opened Apr 28, 2025
vec::map: directly process reduced-precision floats when reasonable
#152366 opened Apr 28, 2025
[Relandx2] Rewrite the guts of torch::jit::Lexer to speed it up
#152372 opened Apr 28, 2025
complex.pow(2) on GPU by replacing with complex * complex to avoid numerical instability
#152373 opened Apr 28, 2025
[FP8][CUTLASS] xFail `honor_sm_carveout` on `sm100`
#152378 opened Apr 28, 2025
fix: outdated contents in dynamo overview
#152382 opened Apr 28, 2025
[inductor][subgraph] Simplify the resulting output code for subgraph
#152383 opened Apr 28, 2025
[inductor][invoke_subgraph] Remove assertion checks for outputs of invoke_subgraph
#152384 opened Apr 28, 2025
Add vec_reduce_all specialization for std::plus on AArch64
#152388 opened Apr 28, 2025
[Hierarchical Compilation] Track node mutations
#152389 opened Apr 29, 2025
[Inductor] Fix typing in cuda_template.py
#152390 opened Apr 29, 2025
[Inductor] Use `torch._dynamo.utils.same` in block pointer tests, adding atol/rtol kwargs to it.
#152392 opened Apr 29, 2025
[Accelerator] Fix Python typing in accelerator
#152394 opened Apr 29, 2025
[NFC] [inductor] [compile async] Warn exception if pickler failed
#152401 opened Apr 29, 2025
[Do not merge] poke CI with FX IR always on
#152405 opened Apr 29, 2025
Call torch.distributed.destroy_process_group() at the end of the example
#152407 opened Apr 29, 2025
[Inductor][CPU] bug fix for int8 GEMM compensation epilogue
#152408 opened Apr 29, 2025
Cleanup DeviceInterface in triton test
#152409 opened Apr 29, 2025
[Hierarchical Compile] Add mutation dependencies to topological sorting
#152410 opened Apr 29, 2025
[Quant][X86] add ops to compute uint8 pointwise add/add_relu
#152411 opened Apr 29, 2025
Add Vectorized FP8 E4M3
#152417 opened Apr 29, 2025
[Inductor][CPP] Enable vectorized fp8 quant dequant
#152418 opened Apr 29, 2025
Relax tolerance for test_quick_baddbmm_cpu_complex64
#152424 opened Apr 29, 2025
[ROCm] cpp_extension allow user to override default flags
#152432 opened Apr 29, 2025
Log aot and idx waitcounters.
#152444 opened Apr 29, 2025
[TorchDynamo] Fix failure to realize LazyVariableTracker on stack
#152446 opened Apr 29, 2025
Add new profiling events to `DebugAutotuner`
#152449 opened Apr 29, 2025
[PT2] Port replace_lce_with_matmul / replace_first_lce_with_fused_matmul_lce to PT2 pre_grad passes
#152450 opened Apr 29, 2025
Implement async manifold cache write
#152452 opened Apr 29, 2025
Add epoch to fake tensor cache key
#152453 opened Apr 29, 2025
[export] add runtime assert messages to python torch checks (#150719)
#152455 opened Apr 29, 2025
Fix XLA issue.
#152456 opened Apr 29, 2025
fix: Update padding_mode to use Literal for type checking
#152458 opened Apr 29, 2025
[IR] Input Adapter refactor prototype
#152459 opened Apr 29, 2025
[pytorch][triton] flex attention fwd kernel with TMA loads (#151923)
#152460 opened Apr 29, 2025
consolidate guard_or_x and definitely_x
#152463 opened Apr 29, 2025
add device generalisation support for distributed tests
#152471 opened Apr 29, 2025
[nativert] Move TensorMeta to pytorch core
#152475 opened Apr 29, 2025
[DO NOT REVIEW] Attempt a mixed precision fused adam
#152477 opened Apr 29, 2025
[ONNX] Suggest users setting dynamo=True when exporting
#152478 opened Apr 29, 2025
Fix flaky test in test_custom_ops
#152484 opened Apr 29, 2025
Change unsafe_marked_cacheable_functions to a dictionary, so that you can specify a static cache key
#152486 opened Apr 29, 2025
[invoke_subgraph] Simplify output code for subgraph output node
#152490 opened Apr 29, 2025
fix tests broken after #152450
#152493 opened Apr 29, 2025
[inductor][invoke_subgraph] Free the buffers before the subgraph call
#152494 opened Apr 29, 2025
[export] Refactor pt2 save/load
#152495 opened Apr 30, 2025
Refactor nested benchmark functions in AlgorithmSelectorCache
#152502 opened Apr 30, 2025
[Hierarchical Compilation] Use universal flatten APIs
#152505 opened Apr 30, 2025
[Hierarchical Compile] Take into account mutation deps in cycle detection
#152506 opened Apr 30, 2025
[inductor] [compile async] Don't compile in eager
#152507 opened Apr 30, 2025
[2/N] Deprecate c10::string_view and at::string
#152509 opened Apr 30, 2025
[MPS] Migrate mul to TensorIterator
#152515 opened Apr 30, 2025
Make torch/csrc/utils.h to be device-agnostic
#152521 opened Apr 30, 2025
Remove the unnecessary cuda/Tensor.cpp
#152522 opened Apr 30, 2025
[compile async] [cache] testing
#152523 opened Apr 30, 2025
elastic: do not shutdown rendezvous on leaving workers
#152525 opened Apr 30, 2025
Use std::apply for CPU code
#152526 opened Apr 30, 2025
Add methods for checking Triton availability to the device interface
#152529 opened Apr 30, 2025
Do not check out nccl when not building it
#152533 opened Apr 30, 2025
[CI] Use cmake from pip instead of conda in CI docker images
#152537 opened Apr 30, 2025
Disable SLEEF implementation of vec::maximum in vec128_float_neon.h | Accelerate aten::hardtanh_ by 21x
#152538 opened Apr 30, 2025
[CUDA] Reset peak memory stats before running `test_set_per_process_memory_fraction`
#152540 opened Apr 30, 2025
Add parameters for monitor
#152541 opened Apr 30, 2025
strict multidimensional slicing
#152543 opened Apr 30, 2025
Migrate perf_test/test_[gc]pu_speed_mnist.sh from conda to venv
#152544 opened Apr 30, 2025
ci: Switch benchmark dependency to use pip
#152545 opened Apr 30, 2025
Remove Conda Instructions
#152546 opened Apr 30, 2025
Implemented `Size.__radd__`
#152554 opened Apr 30, 2025
[BE] Update numba versions
#152557 opened Apr 30, 2025
xpu: rely on sycl/sycl.hpp to include bfloat16.hpp
#152562 opened Apr 30, 2025
[c10d][fr] Make FR vendor neutral so that other backends can use it
#152563 opened Apr 30, 2025
[ROCm] Update spack includes
#152569 opened Apr 30, 2025
[Hierarchical Compile] Replace tracing alias and mutation check with dynamo impl
#152570 opened Apr 30, 2025
[Dynamo] Fix typing in graph_deduplication.py
#152572 opened May 1, 2025
Allow decomposeK to fuse
#152573 opened May 1, 2025
Added documentation for nonzero_static function (#152347)
#152574 opened May 1, 2025
[IR] Input Adapter refactor prototype (#152459)
#152575 opened May 1, 2025
[testing] 1
#152578 opened May 1, 2025
[aoti] skip codegen for sympy expr when codegening input
#152579 opened May 1, 2025
[cutlass backend] cache filtered ops based on layouts
#152580 opened May 1, 2025
[invoke_subgraph] rename identifiers to prevent python mangling
#152581 opened May 1, 2025
add support for 0 size shardedTensor and recalculate metadata from all_gather
#152583 opened May 1, 2025
[c10d][fr] Decouple the core logic of FR with the entry and event type
#152585 opened May 1, 2025
[2/N] Use std::filesystem
#152586 opened May 1, 2025
[Inductor] Introduce Wrapper IR line for symbolic call args
#152587 opened May 1, 2025
[WIP] verbose logging for recompilations
#152588 opened May 1, 2025
[Dynamo] Optimize dedupe region ancestor tracking
#152589 opened May 1, 2025
Fix #152280: add Literal[…] PaddingMode to Conv modules
#152590 opened May 1, 2025
Fix: promote scalar to MPS device in exec_binary_kernel
#152591 opened May 1, 2025
[c10d] Add support for ReduceOp::AVG in ProcessGroupMPI for FSDP2
#152594 opened May 1, 2025
[wip] base commit
#152596 opened May 1, 2025
add backend_specialization kwarg to mark_dynamic
#152597 opened May 1, 2025
[testing] 3
#152599 opened May 1, 2025
store backend specializations in StatelessSymbolicContext
#152600 opened May 1, 2025
use backend specializations in compile_and_call_fx_graph
#152601 opened May 1, 2025
[testing] 4
#152602 opened May 1, 2025
[BE] Delete `Module_CUDA_fix`
#152603 opened May 1, 2025
[Testing] Is FindCUDA.cmake from `Modules_CUDA_fix` called at all?
#152604 opened May 1, 2025
[Environment Variable] Use thread-safe getenv functions
#152609 opened May 1, 2025
Update padding_mode type annotation to use Literal type (PaddingMode)
#152610 opened May 1, 2025
Makefile: refactor build, setup and lint rules
#152611 opened May 1, 2025
Revert "Cleanup VS 2019 refs in pytorch (#145863)"
#152613 opened May 1, 2025
[WIP] Make FR vendor generic and try to enable it for gloo
#152614 opened May 1, 2025
[dynamo] Guard serialization for DUAL LEVEL.
#152615 opened May 1, 2025
[dynamo] Guard serialization for FUNCTORCH_STACK_MATCH
#152616 opened May 1, 2025
[CUDA][TF32] Account for TF32 in `test_conv2d_same_padding`
#152618 opened May 1, 2025
[DO NOT REVIEW] Implement __obj_flatten__ for LinearPackedParamsBase
#152619 opened May 1, 2025
Stop proxy-ing autograd.Function.ctx into the graph
#152621 opened May 1, 2025
Parameterized CUDA Graph Launch
#152622 opened May 1, 2025
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 opened May 1, 2025
[Dynamo] Guard serialization for TENSOR_SUBCLASS_METADATA_MATCH
#152626 opened May 1, 2025
Make PGO code state not sensitive to file path by hashing file content when the file is available.
#152628 opened May 1, 2025
[DCP] Add 30min timeout for IPC communications in async checkpointing
#152629 opened May 1, 2025
[ROCm] Initial AITER Integration for mha_bwd asm kernels
#152630 opened May 1, 2025
Fix two error messages involving Tensor.dense()
#152631 opened May 1, 2025
[ca] wrap flex attention tests with compiled autograd
#152633 opened May 1, 2025
Switch to metal kernel for mul
#152636 opened May 1, 2025
[export] Add draft-export docs
#152637 opened May 1, 2025
[dynamic shapes] use try-catch instead of guard_or_true for reshape_view_helper
#152638 opened May 1, 2025
[AOTAutogradCache][Easy] Move `"einops.einops.rearrange"` to `SAFE_NON_TORCH_FUNCTIONS`
#152640 opened May 1, 2025
[FlexAttention] explicilty create grad_q w/ strides
#152641 opened May 1, 2025
[CUTLASS][WIP] Gate rowwise matmul CUTLASS kernels by compute capability
#152642 opened May 1, 2025
[BE]remove vulkan test
#152643 opened May 1, 2025
[inductor] Realize bucketize/searchsorted output
#152644 opened May 1, 2025
[do-not-land][ca] default on for CI
#152646 opened May 1, 2025
[Flight Recorder] Added logging after FR dump completed
#152648 opened May 2, 2025
thread through specialization to compile_fx
#152650 opened May 2, 2025
Add assert_fp8_close helper for FP8 tensor comparisons
#152651 opened May 2, 2025
Refactor some common autotune-related utils into a new file
#152652 opened May 2, 2025
Remove incorrect assertion
#152653 opened May 2, 2025
Make assertion about pass callable print the bad pass
#152654 opened May 2, 2025
cleanup, refactor and add missing self._dde_suppressed checks
#152657 opened May 2, 2025
Fix the basic description of torch.min(), torch.max(), torch.all(), torch.any()
#152658 opened May 2, 2025
[Inductor] Fix kernel argument ordering when using dynamic shapes with workspace
#152660 opened May 2, 2025
Fix evaluate_expr to include suppress_guards_tls in cache key
#152661 opened May 2, 2025
Re-enable FakeTensor caching for SymInts
#152662 opened May 2, 2025
[MPS][BE] Do not dispatch empty kernels
#152663 opened May 2, 2025
Raise error when no record on extra_files
#152664 opened May 2, 2025
MXFP8 Fix broken bias support for mxfp8
#152665 opened May 2, 2025
[StaticCudaLauncher] Ensure cuda context exists before launching kernels
#152667 opened May 2, 2025
Added documentation for nonzero_static function (#152347)
#152669 opened May 2, 2025
add codegen layer specialization dispatch
#152670 opened May 2, 2025

164 Issues closed by 41 people

DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64 (__main__.TestForeachCUDA)
#150392 closed May 2, 2025
DISABLED test_nvshmem
#152649 closed May 2, 2025
py_limited_api=True in PyTorch2.7 will break the build of extensions
#152243 closed May 2, 2025
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32 (__main__.TestForeachCUDA)
#150350 closed May 2, 2025
[ONNX] Improve and sort out fallback mechanism
#151703 closed May 2, 2025
Should make the doc of `nn.CrossEntropyLoss()` more clear
#134853 closed May 1, 2025
torch.compile should not recompiles when `.requires_grad=True` under `torch.no_grad()` context
#131975 closed May 1, 2025
compiled autograd + dynamic shapes fails with constraint violation
#133575 closed May 1, 2025
[torch.compile] Dynamic shape behavior is different between using torch.compile with and without compiled_autograd.enable
#113129 closed May 1, 2025
Export QAT model is not performing as expected when compared to the original model and FX Graph QAT
#150746 closed May 1, 2025
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_float32 (__main__.TestForeachCUDA)
#149409 closed May 1, 2025
`torch.export` fails on `InstanceNorm1d`
#152467 closed May 1, 2025
module.cuda() doesn't work under FakeTensorMode
#148977 closed May 1, 2025
[CI] [anaconda] CI Perf Tests
#148342 closed May 1, 2025
[Inductor] Dynamo hangs when processing an operator, seemingly depending on a logical argument value
#151743 closed May 1, 2025
[export] Warn users when 0/1 specialization happens
#151582 closed May 1, 2025
The test 'test_host_memory_stats' is failing in torch2.7.0+cu118
#152422 closed May 1, 2025
How does torch.cudagraph capture a hybrid graph?
#152584 closed May 1, 2025
Add switch to disable truncation to long list print
#152427 closed May 1, 2025
`torch.randint` can't handle large `high` argument (and in general high range of `torch.uint64`)
#152564 closed Apr 30, 2025
torch.randint should accept high=2**63
#81446 closed Apr 30, 2025
pytorch index_select is too slow
#111247 closed Apr 30, 2025
cuda graphs produce two additional kernel calls
#143572 closed Apr 30, 2025
[regression] Not getting `CUDA error: device-side assert triggered` on main for CUDA_KERNEL_ASSERT2
#107396 closed Apr 30, 2025
[CI] [anaconda] Benchmarks anaconda removal
#152123 closed Apr 30, 2025
More logs to show why fx graph cache isn't hit / created?
#152065 closed Apr 30, 2025
Mr
#152549 closed Apr 30, 2025
Add Description of `validate_args` in `torch.distributions.`
#152165 closed Apr 30, 2025
[ROCm] "No available kernel" when running EFFICIENT_ATTENTION sdpa
#138864 closed Apr 30, 2025
difficulty creating magma tarball when new rocm or cuda versions are deployed
#151707 closed Apr 30, 2025
[CUDA Graph tree] Cannot capture buffer allocation on side CUDA Streams
#151199 closed Apr 30, 2025
Unify nccl versions for x86 and aarch64 builds
#149554 closed Apr 30, 2025
Release torch with CUDA12.1 for 2.6 and even latest version
#152524 closed Apr 30, 2025
Failed to create Gloo new group after initialized with NCCL
#68726 closed Apr 30, 2025
Cudnn 9.2 is out!
#119400 closed Apr 30, 2025
[ONNX] scatter_reduce with max reduction not correctly converted to ONNX for 2d input
#152419 closed Apr 30, 2025
DISABLED test_fn_grad_grid_sampler_2d_cuda_float64 (__main__.TestBwdGradientsCUDA)
#131079 closed Apr 30, 2025
NotImplementedError: Operator aten.view.dtype does not have a sharding strategy registered.
#152530 closed Apr 30, 2025
DISABLED test_inductor_debug (__main__.LoggingTests)
#152511 closed Apr 30, 2025
DISABLED test_einsum_cpu (__main__.TestUnbackedSymintsCPU)
#151380 closed Apr 30, 2025
[export] export doesn't save custom meta for constant tensors
#151476 closed Apr 30, 2025
Regression: Multiple OpenMP runtimes linked to libtorch_cpu.so
#146603 closed Apr 29, 2025
Floating point exception (core dumped) in `native_channel_shuffle`
#142453 closed Apr 29, 2025
UNSTABLE pull / linux-jammy-py3-clang12-executorch / test (executorch)
#144480 closed Apr 29, 2025
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex64 (__main__.TestForeachCUDA)
#149199 closed Apr 29, 2025
[CI] Remove conda usage from lint related jobs
#148110 closed Apr 29, 2025
`make pdflatex` Sphinx error: Builder name pdflatex not registered or available through entry point
#147027 closed Apr 29, 2025
[CI] [anaconda] Utility scripts and workflows
#152124 closed Apr 29, 2025
cd: There's no way to test changes to container images for binary builds
#149679 closed Apr 29, 2025
[RFC] PyTorch next wheel build platform: manylinux-2.28
#123649 closed Apr 29, 2025
[Performance] Simple arithemtic operations are slower using MPS than Metal
#143874 closed Apr 29, 2025
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_complex64 (__main__.TestForeachCUDA)
#151313 closed Apr 29, 2025
[BUG] when invoking torch::manul_seed() program crashed in libtorch 2.2.1
#121658 closed Apr 29, 2025
Loading weights using `torch.distributed.checkpoint` leads to large loss values
#145378 closed Apr 29, 2025
FSDP OOM during initialization
#152263 closed Apr 29, 2025
[Inductor] Different results with Conv2d and BN2d not in `eval mode`
#141317 closed Apr 29, 2025
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_bool (__main__.TestForeachCUDA)
#151268 closed Apr 29, 2025
DISABLED test_guard_failure_fn2 (__main__.MiscTests)
#148217 closed Apr 29, 2025
DISABLED test_inlined_functions_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148544 closed Apr 29, 2025
DISABLED test_nested_tuple_output_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148606 closed Apr 29, 2025
DISABLED test_make_closure_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148889 closed Apr 29, 2025
DISABLED test_capture_tracked_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148427 closed Apr 29, 2025
DISABLED test_return_captured_var_used_multiple_times_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148624 closed Apr 29, 2025
DISABLED test_export_defaults_ok_dynamic_shapes (__main__.DynamicShapesExportTests)
#148331 closed Apr 29, 2025
DISABLED test_internal_nonlocal_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148558 closed Apr 29, 2025
DISABLED test_int_shape_binops (__main__.MiscTests)
#148296 closed Apr 29, 2025
DISABLED test_user_defined_binop (__main__.MiscTests)
#148443 closed Apr 29, 2025
DISABLED test_empty_graph_nested_calls_fullgraph_True_dynamic_shapes (__main__.DynamicShapesReproTests)
#148311 closed Apr 29, 2025
DISABLED test_wrap_kwarg_default_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148951 closed Apr 29, 2025
DISABLED test_capture_untracked_global_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148442 closed Apr 29, 2025
DISABLED test_freevars_as_inputs_to_wrap_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148514 closed Apr 29, 2025
DISABLED test_donated_buffer1_dynamic_shapes (__main__.DynamicShapesAotAutogradFallbackTests)
#149101 closed Apr 29, 2025
DISABLED test_sys_modules_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148330 closed Apr 29, 2025
DISABLED test_int_shape_inplace_binops (__main__.MiscTests)
#148312 closed Apr 29, 2025
DISABLED test_guard_failure_fn_shape_control_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148216 closed Apr 29, 2025
DISABLED test_lift_tensors_with_shared_symbols_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148870 closed Apr 29, 2025
DISABLED test_wrap_kwarg_only_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#149024 closed Apr 29, 2025
DISABLED test_symint_in_slice_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148643 closed Apr 29, 2025
DISABLED test_capture_tracked_nested_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148851 closed Apr 29, 2025
DISABLED test_wrap_kwarg_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#149000 closed Apr 29, 2025
DISABLED test_param_shape_binops (__main__.MiscTests)
#148369 closed Apr 29, 2025
DISABLED test_wrap_kwarg_default_if_branch_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148961 closed Apr 29, 2025
DISABLED test_dont_aggressively_write_assert_dynamic_shapes (__main__.DynamicShapesReproTests)
#148295 closed Apr 29, 2025
DISABLED test_side_effect_local_list_append_no_graph_break_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148935 closed Apr 29, 2025
DISABLED test_export_with_cond_dynamic_shape_pred_dynamic_shapes (__main__.DynamicShapesExportTests)
#148368 closed Apr 29, 2025
DISABLED test_empty_graph_nested_calls_fullgraph_False_dynamic_shapes (__main__.DynamicShapesReproTests)
#148426 closed Apr 29, 2025
DISABLED test_shape_int_inplace_binops (__main__.MiscTests)
#148392 closed Apr 29, 2025
DISABLED test_capture_untracked_nonlocal_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148464 closed Apr 29, 2025
DISABLED test_sys_modules (__main__.MiscTests)
#148428 closed Apr 29, 2025
DISABLED test_wrap_pytree_kwargs_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#149079 closed Apr 29, 2025
DISABLED test_dynamic_sources_dynamic_override (__main__.MiscTests)
#148218 closed Apr 29, 2025
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_complex128 (__main__.TestForeachCUDA)
#151300 closed Apr 29, 2025
DISABLED test_mark_unbacked_strict_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148215 closed Apr 29, 2025
DISABLED test_nested_wrap_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148914 closed Apr 29, 2025
DISABLED test_mark_unbacked_strict (__main__.MiscTests)
#148332 closed Apr 29, 2025
DISABLED test_wrap_all_kwarg_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148665 closed Apr 29, 2025
'float' object is not callable when using scheduler.step() with MultiplicativeLR
#81554 closed Apr 29, 2025
MPS SDPA `float32` memory leak
#152344 closed Apr 29, 2025
[Break XPU] chunk_cat accuracy failed on XPU Inductor UT.
#152296 closed Apr 29, 2025
[XPU] The updated torch-xpu-ops caused interpolate_bilinear accuracy error.
#152020 closed Apr 29, 2025
Compilation Issues with sm_129 (RTX 5070 Ti) on WSL - Seeking Advice
#152400 closed Apr 29, 2025
Proposal: Beautify torch.distributed.tensor.debug.visualize_sharding
#151857 closed Apr 29, 2025
[CI] [anaconda] Utilities
#152126 closed Apr 29, 2025
torch.bucketize works incorrectly on uint input with negative boundaries after torch.compile-gpu
#145929 closed Apr 28, 2025
Potential bug in torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
#88791 closed Apr 28, 2025
[inductor] Precompilation start time is the time when a config is added to the queue, not when executor starts compiling the config
#148777 closed Apr 28, 2025
[inductor] [dtype] `ReplicationPad` raise dtype error on eager but pass the check on indcutor
#143779 closed Apr 28, 2025
Error building pytorch from source
#138315 closed Apr 28, 2025
Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit
#152351 closed Apr 28, 2025
torch.arange bf16 results are not accurate
#137774 closed Apr 28, 2025
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#151228 closed Apr 28, 2025
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bool (__main__.TestForeachCUDA)
#151229 closed Apr 28, 2025
[discussion] Consolidation of audio-visual I/O in a new package
#81102 closed Apr 28, 2025
[PTD][RPC] Verify RPC Tutorials contents and scripts
#138832 closed Apr 28, 2025
Error in DTensor uneven shard view op
#143372 closed Apr 28, 2025
Incorrect Gradient Computation in `torch.log1p`
#152088 closed Apr 28, 2025
AWS A100 runners reliability issue
#140332 closed Apr 28, 2025
[CI] [anaconda] CI Build and Test scripts MacOS
#152113 closed Apr 28, 2025
peak memory is lower for subsequent fresh runs compared to the first run of a torch.compiled model
#151995 closed Apr 28, 2025
DISABLED test_dynamic_sources_dynamic_override_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148214 closed Apr 28, 2025
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float64 (__main__.TestForeachCUDA)
#151214 closed Apr 28, 2025
DISABLED test_setting_meta_device_model_broadcasting_and_memory (__main__.TestStateDict)
#143994 closed Apr 28, 2025
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised: ModuleNotFoundError: No module named 'expecttest'
#152225 closed Apr 28, 2025
Pytorch aten::col2im not currently supported on the MPS backend
#151820 closed Apr 28, 2025
prod_cpu not implemented for 'BFloat16'
#89372 closed Apr 28, 2025
Windows CUDA Build Failure: Ambiguous std in cuda_vectorized_test.cu (CUDA 12.6/MSVC 2019)
#152291 closed Apr 28, 2025
`torch._inductor.exc.InductorError: CppCompileError: C++ compile error` after Torch 2.7 Release
#152172 closed Apr 28, 2025
Aborted (core dumped) in torch.flipud
#152253 closed Apr 27, 2025
Aborted (core dumped) in torch.fliplr
#152085 closed Apr 27, 2025
Less Check on the triangular tensor of `L` in `torch.cholesky_solve()`
#152164 closed Apr 27, 2025
[inductor] `.to_sparse()-.to_dense()` throws `LoweringException: NotImplementedError:`
#151522 closed Apr 26, 2025
"TypeError: unhashable type: non-nested SymInt" with `torch.compile`
#135099 closed Apr 26, 2025
Pytorch 2.7.0 with XPU (silently) crashing
#152255 closed Apr 26, 2025
[Inductor] weird reordering behavior with `wait_tensor`
#152252 closed Apr 26, 2025
memoryview support for `torch._C.import_ir_module_from_buffer`
#107099 closed Apr 26, 2025
Compute Capability Misrecognition on NVIDIA GeForce RTX 5070 Ti (Blackwell Architecture)
#152223 closed Apr 25, 2025
[MPS/Inductor] polygamma is miscompiled for some inputs
#152205 closed Apr 25, 2025
Lint rule for always using std::optional?
#150313 closed Apr 25, 2025
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float32 (__main__.TestForeachCUDA)
#151136 closed Apr 25, 2025
DISABLED test_foreach_check_stride_ignore_dims_of_one_cuda_float32 (__main__.TestForeachCUDA)
#150026 closed Apr 25, 2025
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float16 (__main__.TestForeachCUDA)
#151114 closed Apr 25, 2025
[AOTI] aoti_compile_and_package + use_runtime_constant_folding gives "Error: CUDA driver error: file not found"
#152067 closed Apr 25, 2025
[`Torch 2.7.0 x Py 3.9`] Incompatible dep versions with networkx
#152191 closed Apr 25, 2025
DISABLED test_pp_fsdp_dp_type_FSDP_ScheduleClass3 (__main__.ComposabilityTest)
#151089 closed Apr 25, 2025
DISABLED test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass0 (__main__.ComposabilityTest)
#151083 closed Apr 25, 2025
DISABLED test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass1 (__main__.ComposabilityTest)
#151084 closed Apr 25, 2025
DISABLED test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass3 (__main__.ComposabilityTest)
#151090 closed Apr 25, 2025
DISABLED test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass2 (__main__.ComposabilityTest)
#151085 closed Apr 25, 2025
DISABLED test_pp_fsdp_dp_type_FSDP_ScheduleClass1 (__main__.ComposabilityTest)
#151087 closed Apr 25, 2025
DISABLED test_pp_ddp_ScheduleClass2 (__main__.ComposabilityTest)
#151082 closed Apr 25, 2025
DISABLED test_pp_fsdp_dp_type_FSDP_ScheduleClass2 (__main__.ComposabilityTest)
#151088 closed Apr 25, 2025
DISABLED test_pp_ddp_ScheduleClass1 (__main__.ComposabilityTest)
#151081 closed Apr 25, 2025
DISABLED test_pp_ddp_ScheduleClass0 (__main__.ComposabilityTest)
#151078 closed Apr 25, 2025
DISABLED test_pp_fsdp_dp_type_FSDP_ScheduleClass0 (__main__.ComposabilityTest)
#151086 closed Apr 25, 2025
UserWarning: There is a performance drop because we have not yet implemented the batching rule for aten::scatter_reduce.two
#152141 closed Apr 25, 2025
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_complex64 (__main__.TestForeachCUDA)
#151099 closed Apr 25, 2025
ENH: Publish full-fledged tarballs also for release candidates
#150649 closed Apr 25, 2025
[ONNX] Exporting with `dynamo=True` and `Dim.DYNAMIC` in `dynamic_shapes` passes for `scaled_dot_product_attention`, but doesn't do anything
#152018 closed Apr 25, 2025
Fix the Inconsistency and Description of `device_type` in `torch.random.fork_rng()`
#151784 closed Apr 25, 2025
`out` should exist as an instance variable out of the func itself
#146676 closed Apr 25, 2025
Size of `tau` can mismatch with the context in `torch.ormqr()`
#150674 closed Apr 25, 2025
Whether `x` and `dx` can be used together in `torch.trapezoid()`?
#151105 closed Apr 25, 2025
[ONNX] Dynamic shapes: support `torch.sym_not`
#136572 closed Apr 25, 2025
What is the difference between normal_tensor.storage().use_count() and viewed_tensor's?
#152100 closed Apr 25, 2025

121 Issues opened by 76 people

Torch BF16 group gemm hangs in backward pass - core issue isolated, needs proper resolution.
#152668 opened May 2, 2025
DISABLED test_comprehensive_nansum_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152666 opened May 2, 2025
Add explicit error message for def infer_size(a, b): that specifies that non broadcast path was picked due to unbacked existing in both inputs.
#152656 opened May 2, 2025
UNSTABLE docker-cache-mi300 / docker-cache
#152655 opened May 2, 2025
Check for if two tensors are overall similar instead of bitwise similar?
#152647 opened May 2, 2025
ProcessGroupGloo.allgather_into_tensor_coalesced crashes with CUDA tensors
#152645 opened May 1, 2025
static cuda launcher causes `RuntimeError: CUDA driver error: invalid device context` in torchtitan CI
#152639 opened May 1, 2025
TestFlexAttentionCUDA.test_GQA_score_mod7_cuda_float16 fails on h100
#152635 opened May 1, 2025
Incorrect strides for `nonzero_static` compilation
#152634 opened May 1, 2025
DISABLED test_torchvision_models_efficientnet_v2_l (__main__.TestVisionTracing)
#152632 opened May 1, 2025
[v2.7.1] Release Tracker
#152627 opened May 1, 2025
modded-nanogpt flaky NCCL hang starting 3/30 nightly
#152623 opened May 1, 2025
Pytorch Profiler crashes while using it with Pytorch Lightning module
#152617 opened May 1, 2025
Enable AOTI for Metal inductor
#152612 opened May 1, 2025
[triton pin update] Run Inductor CI on pin updates for Triton and the PyTorch nightly branch
#152608 opened May 1, 2025
Loops impacting output when utilizing hooks
#152607 opened May 1, 2025
AOTI regression on SAM and tts-angular
#152606 opened May 1, 2025
ROCm, 7900 XTX: Pytorch SDPA is 2.5x slower than manual implementation with non-continuous v
#152595 opened May 1, 2025
Flex Attention doesn't scale with custom bias
#152593 opened May 1, 2025
[ratter-build] Cannot detect CUDA when build from source
#152592 opened May 1, 2025
[MPS] Binary kernels produce incorrect results when one of the tensor arguments is from a wrapped scalar
#152582 opened May 1, 2025
[Benchmark] High compilation time variance on benchmark dashboards
#152566 opened Apr 30, 2025
DISABLED test_graph_partition_reorder_cpu_and_gpu_interleave (__main__.CudaGraphTreeTests)
#152561 opened Apr 30, 2025
DISABLED test_pending_fusion_pro_and_epi (__main__.TestPrologueFusion)
#152560 opened Apr 30, 2025
DISABLED test_comprehensive_signal_windows_hamming_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152559 opened Apr 30, 2025
DISABLED test_comprehensive_amin_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152558 opened Apr 30, 2025
PGO does not work on jobs for frameworks that copy code to different dirs at different attempts.
#152555 opened Apr 30, 2025
MPS varying seq len SDPA memory leak
#152550 opened Apr 30, 2025
FakeTensorUpdater does not trace nodes correctly
#152548 opened Apr 30, 2025
optree package status in PyTorch
#152535 opened Apr 30, 2025
AsyncCollectiveTensor doesn't trigger wait upon dtype cast
#152534 opened Apr 30, 2025
The 2.7.0 release tarball is missing `.ci/docker/ci_commit_pins/nccl-cu12.txt` required for building
#152532 opened Apr 30, 2025
[inductor][triton] Inductor is not compatible with the latest upstream Triton
#152531 opened Apr 30, 2025
flex attention does not leverage masking, memory error
#152528 opened Apr 30, 2025
can't reconstruct the communication group using PyTorch.
#152527 opened Apr 30, 2025
DISABLED test_comprehensive_lu_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152520 opened Apr 30, 2025
DISABLED test_comprehensive_repeat_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152500 opened Apr 30, 2025
UNSTABLE Lint / Lint URLs / linux-job
#152489 opened Apr 29, 2025
[dynamo] Try tracing into einops
#152480 opened Apr 29, 2025
[dynamo] Dynamo fails to run torch.cat() with FakeTensors because it can't confirm 's0 + s1*u0' is nonzero
#152473 opened Apr 29, 2025
DISABLED test_comprehensive_polygamma_polygamma_n_0_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152469 opened Apr 29, 2025
DISABLED test_comprehensive_polygamma_polygamma_n_1_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152470 opened Apr 29, 2025
torch.export with dynamic shapes on Static Cache HF LLama model fails
#152465 opened Apr 29, 2025
[dynamo] `torch.compile` prevents fsdp warning from getting generated
#152451 opened Apr 29, 2025
[dynamo] guard code generation triggers attribute error on DeviceMesh object
#152447 opened Apr 29, 2025
`torch.compile` causes assertion error in distributed checkpoint wrapper test
#152442 opened Apr 29, 2025
Inductor pattern matching on mutable ops
#152441 opened Apr 29, 2025
Newly added lint-urls jobs are very flaky
#152439 opened Apr 29, 2025
`nn.CrossEntropyLoss` accepts negative target probabilities
#152437 opened Apr 29, 2025
Fx Graph cache hit generates guards that do not exist in the original cached program, causing recompilations only at cache hit.
#152435 opened Apr 29, 2025
[pt2] [AOTAutogradCache] Allow users to specify non torch functions as cacheable
#152434 opened Apr 29, 2025
[Manylinux 2.28] Migrate Docker container to use gcc 14
#152426 opened Apr 29, 2025
Silent incorrectness between static torch.compile vs eager
#152425 opened Apr 29, 2025
Invalid handling of nans in compiled torch.quantile / torch.nanquantile on cuda
#152423 opened Apr 29, 2025
torch.nn.functional.ctc_loss raises cuDNN error in PyTorch versions >=2.5.0
#152421 opened Apr 29, 2025
DISABLED test_comprehensive_index_select_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152416 opened Apr 29, 2025
DISABLED test_input_moved_to_cuda_device_script (__main__.TensorPipeCudaRemoteModuleTest)
#152415 opened Apr 29, 2025
[DTensor] Calling .item() on DTensor with Partial placement results in local value
#152406 opened Apr 29, 2025
[CPU][UT] 16 UT of test/inductor/test_cpu_select_algorithm.py failed with PyTorch 2025-04-028 nightly wheel
#152398 opened Apr 29, 2025
Illegal Instruction Caused by `grid_sample` Under Windows
#152385 opened Apr 28, 2025
Outdated contents in dynamo overview
#152381 opened Apr 28, 2025
TORCH_COMPILE_DEBUG=1 does not consistently generate debug logs
#152374 opened Apr 28, 2025
DISABLED test_remote_cache_load_function_device_cuda_float32_dynamic_False_bundle_triton_True_use_static_cuda_launcher_False (__main__.TestFxGraphCache)
#152370 opened Apr 28, 2025
DISABLED test_reduce_stress_cuda (__main__.ProcessGroupGlooTest)
#152367 opened Apr 28, 2025
[AOTI] Package lowered with package_constants_in_so=False still uses lots of memory when loaded
#152356 opened Apr 28, 2025
Pin setuptools runtime dependency
#152355 opened Apr 28, 2025
DISABLED test_e2e_compile_True_model_type1 (__main__.TestE2ESaveAndLoad)
#152349 opened Apr 28, 2025
torch.nonzero_static is not documented on the website
#152347 opened Apr 28, 2025
compile generates inefficient code for mutations on small slices of inputs
#152346 opened Apr 28, 2025
cudagraphs: `static_input_indices` incorrectly including SymInt graph args when using tensor subclasses + dynamic shapes
#152343 opened Apr 28, 2025
Unusually slow draft_export time
#152337 opened Apr 28, 2025
pin_memory crashes for big tensors and leaks page locked memory
#152335 opened Apr 28, 2025
compile generates inefficient code when mutating small slice of a graph input
#152323 opened Apr 28, 2025
DISABLED test_comprehensive_pca_lowrank_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152318 opened Apr 28, 2025
[dynamo] torch._dynamo crashes on `self.value.__module__` inside SkipFunctionVariable.call_function() (PyTorch 2.7, works 2.6)
#152316 opened Apr 28, 2025
[DCP] failure case of save method
#152310 opened Apr 28, 2025
Softmax Decomp Causes Incorrect Gradients when Using `torch.compile` with `F.multi_head_attention_forward`
#152309 opened Apr 28, 2025
bizarre behavior with torch module's Attribute Error
#152308 opened Apr 28, 2025
Recompile issue after fp8 conversion
#152307 opened Apr 28, 2025
NCCL out of memory error after updating to PyTorch 2.7
#152302 opened Apr 28, 2025
Unexpected result from `torch.xpu.is_bf16_supported()` when XPU is unavailable
#152301 opened Apr 28, 2025
Unexpected behavior when using dist.all_reduce(x, op=dist.ReduceOp.SUM)
#152300 opened Apr 28, 2025
`torch.compile()` produces incorrect results for `asinh_()` operation on large/small values
#152299 opened Apr 28, 2025
Flex attention: batch-index-dependent block mask causes error with changing batch size
#152297 opened Apr 28, 2025
`vmap` not working on `torch.arange`, `torch.scalar_tensor`, and `torch.ones`
#152295 opened Apr 28, 2025
Unexpected overflow behavior when using `torch.addcmul`
#152294 opened Apr 28, 2025
`torch.sparse.log_softmax` output mismatch between CPU and CUDA
#152293 opened Apr 28, 2025
`torch==2.6` broke `nn.Module.dtype` typing
#152292 opened Apr 28, 2025
[Intel GPU][PT2.8]scaled_dot_product_attention returns wrong output
#152290 opened Apr 28, 2025
Error after successful build: No module named 'torch._C._distributed_c10d'
#152285 opened Apr 27, 2025
Forward compatibility in torch.export
#152283 opened Apr 27, 2025
Update `torch/nn/modules/conv.py` to use Literal for support padding modes
#152280 opened Apr 27, 2025
Make scaler.step() return if step was skipped or not
#152279 opened Apr 27, 2025
MPS: Conv1d fails with NotImplementedError for output_channels > 65536
#152278 opened Apr 27, 2025
`setup.py develop` command is disappearing soon from `setuptools`
#152276 opened Apr 27, 2025
[cudagraphs][HF][torch 2.7] Excessive cudagraph re-recording for HF LLM models
#152275 opened Apr 27, 2025
Question about that support of torch.compile for a custom CUDA operator?
#152270 opened Apr 27, 2025
Arbitrary Code Execution Risk in `torch.distributed.utils.overload` When Misused in Type Annotations
#152269 opened Apr 27, 2025
`iter()` and `reversed()` do not raise `StopIteration` when exhausted in torch.compile
#152262 opened Apr 26, 2025
Context Parallel -- unsharded output doesn't match output without CP.
#152261 opened Apr 26, 2025
[FR] Support BSHM-layout scaled_dot_product_attention without transpose.
#152257 opened Apr 26, 2025
Windows inductor generated code without function declaration, and compile failed on MSVC.
#152251 opened Apr 26, 2025
[DTensor] [distributed]: Operator aten.masked_fill_.Scalar does not have a sharding strategy registered
#152249 opened Apr 26, 2025
NotImplementedError: Could not run 'aten::index.Tensor' with arguments from the 'SparseCUDA' backend.
#152226 opened Apr 25, 2025
DISABLED test_remote_cache_load_function_device_cuda_bfloat16_dynamic_False_bundle_triton_False_use_static_cuda_launcher_False (__main__.TestFxGraphCache)
#152222 opened Apr 25, 2025
DISABLED test_pending_fusions_multiple (__main__.TestPrologueFusion)
#152221 opened Apr 25, 2025
[C10D] Allow NCCL single P2P ops to use parent/collective communicator
#152220 opened Apr 25, 2025
Have compiled autograd config API support nested compilation
#152219 opened Apr 25, 2025
Outdated install commands
#152213 opened Apr 25, 2025
Have cherry-pick bot always add the current release to the PR
#152212 opened Apr 25, 2025
DISABLED test_reduce_stress_cuda (__main__.ProcessGroupGlooLazyInitTest)
#152201 opened Apr 25, 2025
HUD Dashboard sort by perf speedup doesn't do anything
#152190 opened Apr 25, 2025
The input for layers other than the first layer should be the hidden state from the previous layer.
#152188 opened Apr 25, 2025
GroupNorm compilation errors on UNet-based architecture on torch >= 2.6.0
#152185 opened Apr 25, 2025
write a custom ViewAndMutationmeta.__repr__
#152183 opened Apr 25, 2025
GH200/GB200 NCCL Build Pytorch
#152182 opened Apr 25, 2025
Raise an Error when File Not Found in `torch.jit.load()`
#152178 opened Apr 25, 2025
Add description of several params in the basic usage of `torch.min()`, `torch.max()`, `torch.all()` and `torch.any()`
#152176 opened Apr 25, 2025
/usr/local/lib/python3.11/dist-packages/torch/autograd/graph.py:825: UserWarning: grid_sampler_2d_backward_cuda does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True, warn_only=True)'.
#152171 opened Apr 25, 2025
DISABLED test_e2e_compile_True_model_type2 (__main__.TestE2ESaveAndLoad)
#152169 opened Apr 25, 2025
DISABLED test_e2e_compile_True_model_type0 (__main__.TestE2ESaveAndLoad)
#152168 opened Apr 25, 2025

510 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

[Inductor] FX backend via Wrapper IR
#146942 commented on May 2, 2025 • 18 new comments
[Cutlass] Integrate EVT into CUDACPPScheduling
#150906 commented on May 2, 2025 • 17 new comments
Random Batch Sampler Speedup
#147706 commented on May 1, 2025 • 16 new comments
[cp] dispatch flex_attention to CP impl in TorchDispatchMode
#151497 commented on Apr 29, 2025 • 14 new comments
[aotd] Support saved tensors hooks in aot_autograd
#150032 commented on Apr 30, 2025 • 13 new comments
[Inductor] Add decomposeK as an autotuning choice for mm
#150654 commented on May 2, 2025 • 13 new comments
[1/n][Optimus][Auto-AC] Support activation quantization without scaling
#148380 commented on May 1, 2025 • 13 new comments
Implement util function compute_global_tensor_shape for 1D device mesh
#151990 commented on May 1, 2025 • 10 new comments
[reland][ROCm] remove caffe2 from hipify
#151845 commented on Apr 28, 2025 • 9 new comments
[dynamo] replace `unimplemented` with `unimplemented_v2` in `variables/functions.py`
#151277 commented on May 1, 2025 • 9 new comments
[device_mesh] improve device selection logic
#150897 commented on May 1, 2025 • 9 new comments
[Inductor UT] Generalize device-bias code in `test_flex_attention.py`
#151937 commented on May 2, 2025 • 8 new comments
[ROCm][CI] Enabled fp8 distributed tests in test_micro_pipeline_tp.py for MI300
#151977 commented on May 2, 2025 • 8 new comments
Add infra to run CPython tests under Dynamo
#150787 commented on May 1, 2025 • 7 new comments
Cache code generation during triton template expansion and enable it for mm_template.
#151773 commented on May 2, 2025 • 7 new comments
Fix take_along_dim negative index handling (#146211)
#152161 commented on Apr 29, 2025 • 6 new comments
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on May 1, 2025 • 6 new comments
Unify how we create random inputs for auto-tuning
#152147 commented on Apr 30, 2025 • 6 new comments
Refactoring FSDP2 (_composable/fsdp) test cases to be device agnostic
#149848 commented on Apr 30, 2025 • 6 new comments
[CUDA] Replace deprecated usages of cub iterators and thread operators
#147493 commented on Apr 29, 2025 • 6 new comments
dynamically set tags
#152089 commented on Apr 29, 2025 • 6 new comments
[SymmMem] Add all-to-all
#151498 commented on May 2, 2025 • 5 new comments
[associative_scan] Refactoring of input checking and dynamo invocation
#148657 commented on May 1, 2025 • 5 new comments
NUMA Binding Integration with torchrun
#149334 commented on Apr 29, 2025 • 5 new comments
[dynamo] replace `unimplemented` with `unimplemented_v2` in `variables/torch_functions.py`
#151278 commented on May 2, 2025 • 5 new comments
Avoid differing results in `linalg.(tensor_)solve`
#151896 commented on Apr 30, 2025 • 4 new comments
Do not cover up `__dunder__` method type-hints from `.pyi` file
#150875 commented on May 1, 2025 • 4 new comments
Optimize LRScheduler docs
#146684 commented on Apr 28, 2025 • 4 new comments
Add `load_state_dict` hint doc about invoke order work with lr_scheduler
#149942 commented on Apr 29, 2025 • 4 new comments
autograd: Add VJP and JVP rules for aten::aminmax
#151186 commented on May 2, 2025 • 4 new comments
removed zero dim cpu logic from fake_tensor.py
#147501 commented on May 1, 2025 • 3 new comments
[Intel GPU] Support f32 intermediate dtype, headdim size <=576 and f32 causal mask for SDPA
#152091 commented on Apr 28, 2025 • 3 new comments
Move prologue_supported_inputs computations to def_kernel
#150869 commented on May 2, 2025 • 3 new comments
Add is_pinned to host allocator
#151439 commented on Apr 29, 2025 • 3 new comments
flex attention: fix dispatch order for tensor subclasses, avoid hardcoding call to faketensor impl in dynamo
#151719 commented on Apr 30, 2025 • 3 new comments
use vectorized loads and stores for all datatypes in torch.cat
#151818 commented on Apr 28, 2025 • 3 new comments
Enable type promotions in slice_scatter (pytorch#147842)
#151911 commented on Apr 29, 2025 • 3 new comments
Make `Adam`, `AdamW` work with nonzero-dim Tensor betas
#149939 commented on May 1, 2025 • 3 new comments
[WIP]: track remaining runtime time asserts for backward codegen instead of trying to regenerate all
#151919 commented on Apr 28, 2025 • 3 new comments
auto functionalize base_hop
#151067 commented on May 2, 2025 • 2 new comments
Remove conda usage in condaenv.bat
#151035 commented on May 2, 2025 • 2 new comments
update get_default_device to also respect torch.device ctx manager
#148621 commented on Apr 30, 2025 • 2 new comments
[Inductor] Adjust boundary checking of dimensions using YBLOCK
#149504 commented on Apr 28, 2025 • 2 new comments
Parallelize sort using libstdc++ parallel mode
#150195 commented on May 1, 2025 • 2 new comments
API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+
#150536 commented on Apr 30, 2025 • 2 new comments
[dynamic shapes] guard_or_false for infer_size
#152146 commented on May 2, 2025 • 2 new comments
Exempt overriding methods from docstring_linter (fix #151692)
#151906 commented on Apr 29, 2025 • 2 new comments
[OpenReg] Add _lazy_init and rng_state support for OpenReg
#151914 commented on Apr 30, 2025 • 2 new comments
Add AC_TRACER Infra TorchDispatchMode key
#152158 commented on Apr 26, 2025 • 1 new comment
[Graph Partition] Pass all cudagraph tree tests
#152048 commented on Apr 30, 2025 • 1 new comment
Replace `fw_metadata` info with trace log hint in hint message
#147365 commented on Apr 25, 2025 • 1 new comment
Improve cache key graph printing performance
#151928 commented on Apr 29, 2025 • 1 new comment
[Intel GPU][Inductor] Fallback embedding_dense_backward on XPU
#151637 commented on Apr 29, 2025 • 1 new comment
[ROCm] Maxpool forward NHWC Perf Improvement targeting Resnet scenarios
#151727 commented on Apr 30, 2025 • 1 new comment
Fix #150472 torch.library.custom_op doesn't handle single element tuples returns
#151408 commented on Apr 29, 2025 • 1 new comment
[AOTI][reland] Remove typedef for half and bfloat16
#151109 commented on Apr 30, 2025 • 1 new comment
[training] Adding NUMA support for pytorch
#150597 commented on Apr 29, 2025 • 1 new comment
[ROCm] Add support for SymmetricMemory
#150580 commented on May 1, 2025 • 1 new comment
[Inductor] Restrict block analysis to only match integer dims and strides
#149615 commented on Apr 28, 2025 • 1 new comment
Inductor logging + analysis of torch.profile
#149697 commented on May 2, 2025 • 1 new comment
[dynamic shapes] guard_or_false for computeStorageNbytes
#150483 commented on Apr 25, 2025 • 1 new comment
Add MPS support for getHostAllocator API
#151913 commented on Apr 29, 2025 • 1 new comment
[map] always turn on dynamo for map
#152041 commented on Apr 29, 2025 • 1 new comment
Enable AArch64 CI scripts to be used for local dev
#143190 commented on Apr 30, 2025 • 1 new comment
[pytree] simplify public API exposition with `__module__`
#148328 commented on May 1, 2025 • 0 new comments
[ATen][CUDA] Optimize 128 bit vectorization
#148320 commented on May 2, 2025 • 0 new comments
handle jk for emulation runs
#148240 commented on May 2, 2025 • 0 new comments
[Intel CPU] Fix issue #143483.
#144854 commented on May 2, 2025 • 0 new comments
Enable `_lazy_clone` between CPU and MPS
#148408 commented on May 1, 2025 • 0 new comments
set non_blocking to true in torch._foreach_copy_ to improve performance
#148431 commented on Apr 28, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on May 1, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format`
#148185 commented on May 1, 2025 • 0 new comments
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on May 1, 2025 • 0 new comments
Improvement with comprehensive docstrings and implementation of class method for the code.
#148170 commented on Apr 29, 2025 • 0 new comments
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(Z7)
#148163 commented on Apr 30, 2025 • 0 new comments
Replacing explicit backend search with api call
#144944 commented on May 1, 2025 • 0 new comments
Enable fp16 linear layers in PyTorch via ACL
#144992 commented on Apr 26, 2025 • 0 new comments
draft
#148160 commented on Apr 29, 2025 • 0 new comments
Checks kv pair indexing in OrderedPreservingDictTest.test_range_insert
#148136 commented on May 1, 2025 • 0 new comments
Use myst_nb in docs
#148105 commented on Apr 28, 2025 • 0 new comments
Enable XPU distributed test for PT2.8
#149916 commented on Apr 29, 2025 • 0 new comments
[WIP] no normalizations abstractions
#149899 commented on Apr 28, 2025 • 0 new comments
[Dynamo] Add easydict support
#149851 commented on Apr 26, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/[e-n]*/` to `ruff format`
#144553 commented on May 1, 2025 • 0 new comments
[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100`
#149282 commented on May 1, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/[a-c]*/` to `ruff format`
#144554 commented on May 1, 2025 • 0 new comments
[BE][Ez]: Update CU126 to CUDNN 12.8 too
#149254 commented on Apr 30, 2025 • 0 new comments
[Easy] update pip sources for CUDA in nightly pull tool
#149143 commented on May 1, 2025 • 0 new comments
Update the heuristic for AArch64 bmm/baddbmm
#149122 commented on Apr 29, 2025 • 0 new comments
Unrestrict some onlyCPU tests
#149095 commented on Apr 29, 2025 • 0 new comments
[test] bigger runner
#149003 commented on Apr 30, 2025 • 0 new comments
[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging
#148981 commented on Apr 29, 2025 • 0 new comments
Move token linter code into tools/linter/adaptors/_linter/
#148959 commented on May 1, 2025 • 0 new comments
cpp_wrapper: build non-performance-sensitive code at O1
#148773 commented on May 1, 2025 • 0 new comments
Trunk workflow for Windows Arm64
#148753 commented on May 2, 2025 • 0 new comments
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on May 1, 2025 • 0 new comments
[inductor] lowering for fractional_max_pool3d
#148630 commented on Apr 28, 2025 • 0 new comments
Adjust CMake code for Eigen
#148628 commented on May 1, 2025 • 0 new comments
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on May 1, 2025 • 0 new comments
[triton hash update] update the pinned triton hash
#148492 commented on May 2, 2025 • 0 new comments
[Intel CPU] Fix issue #143482.
#144760 commented on May 2, 2025 • 0 new comments
Fix issue #146018: Improve CachingAutotuner handling
#147580 commented on Apr 25, 2025 • 0 new comments
Update pybind11 submodule to 3.0.0-dev test
#147524 commented on Apr 30, 2025 • 0 new comments
removed check for ConvTranspose3D on MPS
#145366 commented on Apr 28, 2025 • 0 new comments
Fix ptxas warnings on sm_120
#147491 commented on Apr 25, 2025 • 0 new comments
Documentation: fix RNN example for multiple layers
#147490 commented on Apr 25, 2025 • 0 new comments
[test] sccache log
#147470 commented on May 1, 2025 • 0 new comments
handle default in _NamedOptimizer
#147357 commented on Apr 25, 2025 • 0 new comments
[NOT_FOR_COMMIT] Try Triton-cpu-arm
#147341 commented on Apr 26, 2025 • 0 new comments
Fix torch.compile Fallback for Meta Device Tensors
#147339 commented on Apr 27, 2025 • 0 new comments
Optimize `Sequential` methods description
#147304 commented on May 1, 2025 • 0 new comments
[Easy] update pip sources for ROCm in nightly pull tool
#145685 commented on May 1, 2025 • 0 new comments
Do not use username for inductor default_cache_dir
#147291 commented on Apr 26, 2025 • 0 new comments
Fix clang-tidy warnings in torch/jit
#147253 commented on May 2, 2025 • 0 new comments
Add quantized BatchNorm1d module
#147113 commented on Apr 25, 2025 • 0 new comments
Porting Pytorch to AIX Operating System.
#146983 commented on Apr 30, 2025 • 0 new comments
cmake: fix detection logic when using system XNNPACK
#145853 commented on Apr 26, 2025 • 0 new comments
[ARM] Fix TestDataLoader.test_segfault unexpected success on Aarch64
#146090 commented on Apr 29, 2025 • 0 new comments
[CI] Get rid of UCC builds
#146173 commented on May 2, 2025 • 0 new comments
Fix non-bitwise type annotations for Tensor operators (see #145838)
#146845 commented on May 1, 2025 • 0 new comments
[Inductor-CPU] FP16 X int8 WoQ GEMM for M <= 4 with FP16 accum & compute
#146781 commented on Apr 27, 2025 • 0 new comments
Enable pt2e quantization path for arm
#146690 commented on Apr 28, 2025 • 0 new comments
[2/N] Fix cppcoreguidelines-init-variables suppression
#146237 commented on Apr 29, 2025 • 0 new comments
Update code_template.py re.compile() is directly applied to the regex…
#146489 commented on Apr 29, 2025 • 0 new comments
[not for land] temp changes to enable 'simple_fsdp'
#146558 commented on Apr 28, 2025 • 0 new comments
[HOP] Mutation and alias rework
#146658 commented on Apr 30, 2025 • 0 new comments
Refactor layout constraint selection logic
#148104 commented on May 2, 2025 • 0 new comments
Fix test_tensorboard when started w/o tensorboard package
#148079 commented on Apr 29, 2025 • 0 new comments
use identity op for alpha=inf in torch.celu and quantized_celu
#148066 commented on Apr 29, 2025 • 0 new comments
Support `contextlib.suppress`
#147990 commented on Apr 29, 2025 • 0 new comments
xpu: test py_limited_api with SyclExtension
#147984 commented on Apr 27, 2025 • 0 new comments
[aot] reset aot counter on torch._dynamo.reset
#147915 commented on Apr 27, 2025 • 0 new comments
[DONOTLAND] Fix partial + scalar issue
#147910 commented on Apr 27, 2025 • 0 new comments
[PT2][Optimus][Opportunity Finder][1/n] Add opportunity finder in the inductor for GEMM horizontal fusion search
#147908 commented on Apr 28, 2025 • 0 new comments
Remerge of #144974
#147903 commented on Apr 27, 2025 • 0 new comments
Change persistent reduction threshold to 32
#147899 commented on Apr 28, 2025 • 0 new comments
Back out "use copy2d in h2d/d2h copy when possible (#146256)"
#147808 commented on Apr 27, 2025 • 0 new comments
test
#147800 commented on Apr 28, 2025 • 0 new comments
Upgrade to DLPack 1.0.
#145000 commented on Apr 26, 2025 • 0 new comments
Made partitioning more(?) deterministic
#145024 commented on Apr 29, 2025 • 0 new comments
[Inductor-CPU] Avoid memory allocator lock contention in the GEMM template
#147797 commented on Apr 27, 2025 • 0 new comments
[cuda] Add new gamma beta backwards kernel
#147773 commented on Apr 26, 2025 • 0 new comments
add pt2 testing for torch.float8_e8m0fnu
#147770 commented on Apr 27, 2025 • 0 new comments
[DCP][OSS] Rank local checkpointing in DCP without collectives
#147758 commented on Apr 30, 2025 • 0 new comments
Adding MVP of P1 INT16 Full
#147747 commented on Apr 25, 2025 • 0 new comments
Turn Stream into protocol and improve typing in torch/_C/__init__.pyi.in
#145239 commented on Apr 30, 2025 • 0 new comments
[Intel GPU] OneDNN primitive cache support for Int4 WOQ gemm on XPU
#147693 commented on Apr 28, 2025 • 0 new comments
[cuBLAS] restrict input range for `addmm` tests
#147658 commented on Apr 28, 2025 • 0 new comments
[sm100][sm120][fp8][CUDA] skip rowwise scaling tests on SM100+ for now
#147645 commented on Apr 26, 2025 • 0 new comments
Remove backend_type_map from Backend
#147635 commented on Apr 27, 2025 • 0 new comments
Refactor typing: Replace Any with ParamSpec for better type safety
#147582 commented on Apr 26, 2025 • 0 new comments
Add dynamo config to HOP-ify context managers
#152159 commented on Apr 30, 2025 • 0 new comments
[SymmMem] Add all_to_all_vdev
#151819 commented on May 2, 2025 • 0 new comments
Normalize dynamic size symbols in template codegen cache key.
#151778 commented on Apr 28, 2025 • 0 new comments
[Inductor] Modify TritonTemplate store_output function to support TMA stores
#151775 commented on Apr 30, 2025 • 0 new comments
[Inductor] Modify persistent+TMA template for Triton mm and admm to use new TMA API
#151774 commented on Apr 30, 2025 • 0 new comments
[2/n][Optimus][Auto-AC] Support activation quantization with scaling
#151770 commented on May 1, 2025 • 0 new comments
Add adaptive_avg_pool2d input and output_size check
#151769 commented on Apr 25, 2025 • 0 new comments
[Don't merge] Upgrade oneDNN to v3.8 for XPU build
#151767 commented on Apr 25, 2025 • 0 new comments
torch.testing._internal.optests - MPS Support
#151758 commented on Apr 28, 2025 • 0 new comments
Use gather in index_select
#151715 commented on May 2, 2025 • 0 new comments
[dtensor] op_schema recursive check for symints
#151679 commented on Apr 28, 2025 • 0 new comments
[Intel GPU] Use user-friendly err msg in mm
#151655 commented on Apr 28, 2025 • 0 new comments
Add OIDC perms to windows-[build|test] workflows
#151596 commented on May 1, 2025 • 0 new comments
Add OIDC permissions to linux-test workflow
#151585 commented on May 1, 2025 • 0 new comments
Add OIDC permissions to linux-build workflow
#151581 commented on May 1, 2025 • 0 new comments
Update OpenBLAS commit
#151547 commented on Apr 28, 2025 • 0 new comments
Fix `InstanceNorm` wrong suggestion in warning message
#151534 commented on May 1, 2025 • 0 new comments
[WIP] Deprecate getPinnedMemoryAllocator use getHostAllocator instead
#151531 commented on Apr 29, 2025 • 0 new comments
Add OIDC permissions to bazel workflow
#151456 commented on Apr 25, 2025 • 0 new comments
Allow to byteswap data when reading saved torch jit data
#151447 commented on May 1, 2025 • 0 new comments
Implement fast exp for AVX2 and AVX512 for the flash attention
#151441 commented on Apr 29, 2025 • 0 new comments
[Cutlass] Add epilogue inputs/outputs to def_kernel
#151406 commented on May 2, 2025 • 0 new comments
[ROCm] Upgrade ROCm CI to ROCm6.4
#151368 commented on May 2, 2025 • 0 new comments
Fix skipIfXpu and skipIfHpu disables tests when used on class
#151315 commented on Apr 30, 2025 • 0 new comments
Add inductor backend to device interface; make minifier_tests more device agnostic
#151314 commented on Apr 29, 2025 • 0 new comments
[Export] Remove to() from module generated form exported program
#151307 commented on Apr 28, 2025 • 0 new comments
Use Allocator API raw_allocate & raw_dealloc in CUDAAllocator
#151305 commented on Apr 25, 2025 • 0 new comments
WIP: divup op
#152144 commented on Apr 25, 2025 • 0 new comments
[inductor] pass reduction idx to scan inner_fns
#152142 commented on Apr 28, 2025 • 0 new comments
Remove some instances of uninitialized memory use
#152132 commented on Apr 29, 2025 • 0 new comments
Update _torch_docs.py to Fix torch.bernoulli()
#152104 commented on Apr 28, 2025 • 0 new comments
Migrate to new Windows Arm64 runners
#152099 commented on Apr 28, 2025 • 0 new comments
Switch to standard pep517 sdist generation
#152098 commented on May 1, 2025 • 0 new comments
Work around MPSGraph issue in backward pass of nn.ReplicationPad1d/2d
#152094 commented on Apr 28, 2025 • 0 new comments
Add optional device index to AOTIModelPackageLoader
#152093 commented on May 2, 2025 • 0 new comments
[cuDNN][SDPA] Fix head-dim 256 condition for SM 10.0
#152076 commented on May 1, 2025 • 0 new comments
Test
#152055 commented on Apr 28, 2025 • 0 new comments
unbreak fb:operator_benchmark_test
#152049 commented on May 1, 2025 • 0 new comments
Ignore unused structured arguments in member functions
#152019 commented on Apr 29, 2025 • 0 new comments
Add CPython complex tests
#152015 commented on May 1, 2025 • 0 new comments
[Kineto] Upgrade the kineto commit to fb36cce
#152007 commented on Apr 29, 2025 • 0 new comments
[UniformValueConstantFolder] deduce value on CPU rather than on device
#151998 commented on Apr 28, 2025 • 0 new comments
[SymmMem] Use cub's BlockScan instead of in-house impl for offset calculation
#151993 commented on May 2, 2025 • 0 new comments
[inductor] Remove usage of autotune_fallback_to_aten outside inductor code
#151988 commented on Apr 29, 2025 • 0 new comments
Add torchcheck for replication_pad3d_backward
#151986 commented on Apr 25, 2025 • 0 new comments
[Intel GPU] undo broadcast on zero stride tensor for SDPA
#151976 commented on Apr 29, 2025 • 0 new comments
Inductor Tiling Rewrite
#151958 commented on Apr 29, 2025 • 0 new comments
[inductor] fix lowering for cummin, cummax for one element tensors
#151931 commented on May 1, 2025 • 0 new comments
[ROCm][CI] Update dockerfile to use centos9
#151929 commented on May 1, 2025 • 0 new comments
Skip fuse attention on fp32 if not tf32
#151924 commented on Apr 28, 2025 • 0 new comments
[WIP] Deprecate AcceleratorHooksInterface isPinnedPtr, use at::getHostAllocator()->is_pinned instead
#151916 commented on Apr 29, 2025 • 0 new comments
Deprecated pkg_resources and use distributions instead
#151915 commented on Apr 29, 2025 • 0 new comments
Add `LinearLR` compute lr formula in doc
#151894 commented on Apr 25, 2025 • 0 new comments
[inductor] Clean typing in codegen/common.py and codecache.py
#150767 commented on Apr 28, 2025 • 0 new comments
[Dynamo][Typing] Enable `@override` for VTs [1/N]
#150763 commented on Apr 25, 2025 • 0 new comments
[BE][CI][Easy] Run `lintrunner` on generated `.pyi` stub files
#150732 commented on May 2, 2025 • 0 new comments
[BE] Resolve lint errors in `.pyi` stub files
#150731 commented on May 2, 2025 • 0 new comments
[BE] Ensure generated stub files by `gen_pyi` are properly formatted
#150730 commented on May 2, 2025 • 0 new comments
[BE] Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi`
#150729 commented on May 2, 2025 • 0 new comments
[BE] Update `.pyi` stub template to use Generic TypeAlias (PEP 585) and Union Type (PEP 604)
#150728 commented on May 2, 2025 • 0 new comments
[torchgen] Refactor and simplify `gen_pyi.py` to use Generic TypeAlias (PEP 585) and Union Type (PEP 604)
#150727 commented on May 2, 2025 • 0 new comments
[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path`
#150726 commented on May 2, 2025 • 0 new comments
Avoid overwriting COW data in MPS code
#150721 commented on May 2, 2025 • 0 new comments
[export] add runtime assert messages to python torch checks
#150719 commented on Apr 25, 2025 • 0 new comments
[WIP] Support XPU in memory tracker
#150703 commented on Apr 30, 2025 • 0 new comments
Raise `BufferError` for DLPack buffer-related errors.
#150691 commented on Apr 26, 2025 • 0 new comments
AOTI: add all fallback ops that are missing from C-shim
#150673 commented on May 1, 2025 • 0 new comments
[Inductor] Fix CUDA memory usage for CPU only compile
#150669 commented on Apr 28, 2025 • 0 new comments
Refactor `torch/utils/data/datapipes/gen_pyi.py` with `torchgen`
#150626 commented on May 2, 2025 • 0 new comments
Fix nn.LazyModuleMixin examples
#150596 commented on May 1, 2025 • 0 new comments
fix dynamic shapes for kwargs
#150583 commented on Apr 30, 2025 • 0 new comments
Enable lazy cloning in `Tensor.to` between CPU and MPS
#150569 commented on May 2, 2025 • 0 new comments
Inductor respects exact strides on custom ops by default
#150511 commented on May 2, 2025 • 0 new comments
[DLPack] Add support for missing keyword-arguments.
#150218 commented on Apr 26, 2025 • 0 new comments
Fix DLPack stream logic.
#150217 commented on Apr 26, 2025 • 0 new comments
[DLPack] add NumPy exchange tests.
#150216 commented on Apr 26, 2025 • 0 new comments
AOTI freezing: fix test issues and enable by default
#149961 commented on May 1, 2025 • 0 new comments
[inductor] Add more typing to _inductor/ir.py
#149959 commented on Apr 28, 2025 • 0 new comments
[inductor] Add typing to _inductor/ir.py
#149958 commented on Apr 29, 2025 • 0 new comments
[3/N] Use internal linkage in C++ files
#151297 commented on Apr 30, 2025 • 0 new comments
[WIP][SymmMem] Add sendrecv op
#151262 commented on Apr 29, 2025 • 0 new comments
[dynamo] keep C++ symbolic shape guards disabled for benchmarks
#151225 commented on May 1, 2025 • 0 new comments
Implement MKLGenerator
#151218 commented on May 1, 2025 • 0 new comments
Update slow tests
#151207 commented on Apr 28, 2025 • 0 new comments
Fix DWConv in QNNPACK for aarch32
#151191 commented on Apr 26, 2025 • 0 new comments
[dynamo] Prevent lazy variable realization on STORE_FAST
#151184 commented on May 2, 2025 • 0 new comments
Guard additional use of DriverAPI
#151125 commented on Apr 28, 2025 • 0 new comments
TESTING: IGNORE
#151116 commented on May 1, 2025 • 0 new comments
Update auto-tuning support for _scaled_grouped_mm
#150944 commented on May 1, 2025 • 0 new comments
fix shard tensor gather when a local tensor on certain ranks has zero elements
#150914 commented on Apr 27, 2025 • 0 new comments
[DO NOT MERGE] Throwaway changes
#150910 commented on May 2, 2025 • 0 new comments
[Inductor] Fix cuda_template.py typing
#150909 commented on May 2, 2025 • 0 new comments
[Inductor] Fix cuda_kernel typing
#150908 commented on Apr 29, 2025 • 0 new comments
[Cutlass] Changes to gemm template for EVT
#150907 commented on May 2, 2025 • 0 new comments
[device_mesh] replace dim_group_info with group_name
#150898 commented on Apr 30, 2025 • 0 new comments
Add CPython tests for iter/sort
#150797 commented on May 1, 2025 • 0 new comments
Add CPython generator/contextlib tests
#150796 commented on May 2, 2025 • 0 new comments
Add CPython int/float tests
#150795 commented on May 1, 2025 • 0 new comments
Add CPython math/cmath tests
#150794 commented on May 1, 2025 • 0 new comments
Add CPython string tests
#150793 commented on May 1, 2025 • 0 new comments
Add CPython set tests
#150792 commented on May 1, 2025 • 0 new comments
Add CPython dict tests
#150791 commented on May 1, 2025 • 0 new comments
Add CPython list/tuple tests
#150790 commented on May 1, 2025 • 0 new comments
Add CPython exception tests
#150789 commented on May 1, 2025 • 0 new comments
Add CPython tests for unittest
#150788 commented on May 1, 2025 • 0 new comments
The operator 'aten::_linalg_eigh.eigenvalues' is not currently implemented for the MPS device
#151874 commented on Apr 28, 2025 • 0 new comments
CUDA deps cannot be preloaded under Bazel
#117350 commented on Apr 28, 2025 • 0 new comments
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int8 (__main__.TestForeachCUDA)
#150837 commented on Apr 28, 2025 • 0 new comments
DISABLED test_inductor_all_to_all_single (__main__.CompileTest)
#147795 commented on Apr 28, 2025 • 0 new comments
Bug with "make latexpdf"
#135420 commented on Apr 28, 2025 • 0 new comments
DeepSeek: mixed precision optimizers
#146542 commented on Apr 28, 2025 • 0 new comments
`2` and `-2` for `ord` argument of `linalg.norm()` should be explained more clearly
#136453 commented on Apr 28, 2025 • 0 new comments
[inductor] [cuda] [fake tensor] `torch.ones(x.size(0))` becomes a fake tensor for `torch.diagonal_scatter`
#151670 commented on Apr 28, 2025 • 0 new comments
Graph break on .t() when Tensor._make_subclass
#151771 commented on Apr 28, 2025 • 0 new comments
Inductor generates wrong code for `torch.embedding`
#151918 commented on Apr 28, 2025 • 0 new comments
Tracking issue: Incorrect Meta Strides / Turn On PyDispatcher in FakeTensor Mode
#145094 commented on Apr 28, 2025 • 0 new comments
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on Apr 28, 2025 • 0 new comments
Add maxcount Parameter to torch.unique and torch.unique_consecutive
#151722 commented on Apr 28, 2025 • 0 new comments
ExpandableMemorySegments not working on H100s/A100s
#122057 commented on Apr 28, 2025 • 0 new comments
[ROCm] QR decomposition is much slower on MI300x than A100
#151066 commented on Apr 28, 2025 • 0 new comments
Unacceptable OOMs all the time.
#152135 commented on Apr 28, 2025 • 0 new comments
distributed: false positive Grad strides vs Bucket strides warning
#152042 commented on Apr 28, 2025 • 0 new comments
[inductor] nan_asserts doesn't work for FP8, "RuntimeError: "isinf" not implemented for 'Float8_e4m3fn'"
#149002 commented on Apr 28, 2025 • 0 new comments
Support loading and executing a ExportedProgram from torch.export in C++ environment
#144663 commented on Apr 28, 2025 • 0 new comments
[C10D] Autograd Support for Collectives
#152131 commented on Apr 28, 2025 • 0 new comments
[torch.compile][export] `PendingUnbackedSymbolNotFound` for `torch.full`
#151980 commented on Apr 28, 2025 • 0 new comments
FlopCounterMode doesn't support HOP
#134385 commented on Apr 28, 2025 • 0 new comments
DISABLED test_inductor_all_reduce_non_contig_input (__main__.CompileTest)
#147733 commented on Apr 28, 2025 • 0 new comments
torch.compile does not support index with tensor
#151997 commented on Apr 28, 2025 • 0 new comments
DISABLED test_comprehensive_native_layer_norm_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152056 commented on Apr 28, 2025 • 0 new comments
[Dynamo] Exception raised inside torch.autocast causes crash AttributeError: 'NoneType' object has no attribute 'is_python_constant
#152012 commented on Apr 28, 2025 • 0 new comments
linear + relu don't fuse
#152101 commented on Apr 28, 2025 • 0 new comments
FlexAttention + Export / AOTI
#152128 commented on Apr 28, 2025 • 0 new comments
Compiling attention (SDPA) with nested tensors fails when using DDP
#152068 commented on Apr 28, 2025 • 0 new comments
RFC: Torch Native Runtime
#152034 commented on Apr 28, 2025 • 0 new comments
standalone_compile with training errors with no cache artifacts
#152022 commented on Apr 28, 2025 • 0 new comments
Adam optimizer ValueError: beta1 as a Tensor
#149508 commented on Apr 26, 2025 • 0 new comments
[BE] Consolidate PR labeling logic
#151579 commented on Apr 29, 2025 • 0 new comments
[Infra] Jobs got frequently cancelled, sometimes mid-checkout
#151669 commented on Apr 29, 2025 • 0 new comments
The docstring linter should not force overridden methods to be documented
#151692 commented on Apr 29, 2025 • 0 new comments
fbgemm packages are compiled in torchinductor torchbench tests
#152024 commented on Apr 29, 2025 • 0 new comments
[inductor] [silent incorrectness] `torch.nn.PairwiseDistance(p=2)` outputs incorrect results with eager
#151198 commented on Apr 29, 2025 • 0 new comments
Major perf regression with `BatchNorm2d` + `torch.compile` with `reduce-overhead` + DDP
#139207 commented on Apr 29, 2025 • 0 new comments
associative scan is incorrect for certain shapes/kwargs
#137943 commented on Apr 29, 2025 • 0 new comments
Support AC with graph break
#139989 commented on Apr 29, 2025 • 0 new comments
torch.compile + Huggingface GenerationMixin
#141196 commented on Apr 29, 2025 • 0 new comments
nn.MultiheadAttention causes gradients to become NaN under some use cases
#41508 commented on Apr 29, 2025 • 0 new comments
Expanding subset of tensor reads wrong memory
#151799 commented on Apr 29, 2025 • 0 new comments
aten.grid_sampler_3d.default is missing a c-shim implementation, using proxy executor as fallback
#147625 commented on Apr 29, 2025 • 0 new comments
DISABLED test_inductor_inplace_op_on_view (__main__.CompileTest)
#147852 commented on Apr 29, 2025 • 0 new comments
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_uint8 (__main__.TestForeachCUDA)
#150878 commented on Apr 29, 2025 • 0 new comments
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int8 (__main__.TestForeachCUDA)
#149859 commented on Apr 29, 2025 • 0 new comments
DISABLED test_inductor_reduce_scatter_tensor_single (__main__.CompileTest)
#147911 commented on Apr 29, 2025 • 0 new comments
Expand Tag Set: views & reductions
#129020 commented on Apr 29, 2025 • 0 new comments
[inductor] [cpu] `torch.nn.RReLU()` doesn't respect `fallback_random` flag
#147255 commented on Apr 29, 2025 • 0 new comments
DISABLED test_serialized_patterns_up_to_date (__main__.TestPatternMatcher)
#135476 commented on Apr 29, 2025 • 0 new comments
DISABLED test_comprehensive_scatter_reduce_prod_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140294 commented on Apr 29, 2025 • 0 new comments
torch.compile fails in FSDP due to .data assignment with different floating type
#152162 commented on Apr 29, 2025 • 0 new comments
Update quantization to make source files compliant with /Zc:lambda
#92600 commented on Apr 29, 2025 • 0 new comments
[Inductor] define custom pass as list
#151876 commented on Apr 29, 2025 • 0 new comments
Loss parallel's override of log_softmax doesn't support negative dims
#152016 commented on Apr 29, 2025 • 0 new comments
`RuntimeError: UR error` with XPU
#149953 commented on Apr 29, 2025 • 0 new comments
"RuntimeError: makeDeviceForHostname(): unsupported gloo device" with nightly torch 2.8
#150381 commented on Apr 29, 2025 • 0 new comments
DISABLED test_distributed_checkpoint_state_dict_type0_cuda (__main__.TestDistributedCheckpointCUDA)
#145807 commented on Apr 29, 2025 • 0 new comments
TorchInductor CPU Performance Dashboard
#93531 commented on Apr 29, 2025 • 0 new comments
DISABLED test_comprehensive_nn_functional_glu_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#140383 commented on Apr 29, 2025 • 0 new comments
DISABLED test_comprehensive_nanquantile_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139593 commented on Apr 29, 2025 • 0 new comments
[dynamo] Some inefficiencies around handling __torch_function__
#151776 commented on Apr 28, 2025 • 0 new comments
`torch.bmm` is slow on non-contiguous BF16 CPU tensors
#151934 commented on Apr 28, 2025 • 0 new comments
Negative index support for `take_along_dim`
#146211 commented on Apr 26, 2025 • 0 new comments
Add a `TORCH_LOGS_RANK=0` env var that integrates with `TORCH_LOGS`
#146913 commented on Apr 26, 2025 • 0 new comments
[inductor] [assertion error] `torch.select_scatter` crashes on inductor but passes on eager
#151296 commented on Apr 26, 2025 • 0 new comments
[mem profiler] mem fragmentation and pynvml view
#150574 commented on Apr 25, 2025 • 0 new comments
[AotInductor][Export][Triton] how to export custom triton kernels when use torch.export.export
#151746 commented on Apr 25, 2025 • 0 new comments
AOTI cannot move tensors between cuda devices
#152130 commented on Apr 25, 2025 • 0 new comments
AOTInductor package can only be loaded on the first GPU (cuda:0) in C++ via AOTIModelPackageLoader
#152087 commented on Apr 25, 2025 • 0 new comments
[feature request][AOTI] Expand check input assertions to cover input guards created during compilation?
#151925 commented on Apr 25, 2025 • 0 new comments
Exported Module cannot call train() or eval()
#151726 commented on Apr 25, 2025 • 0 new comments
[export] deserialization for unbacked ranges is wrong
#151809 commented on Apr 25, 2025 • 0 new comments
DISABLED test_item_to_inputs_kernel_nobreak_cuda (__main__.TestInductorDynamicCUDA)
#119538 commented on Apr 25, 2025 • 0 new comments
Stack trace from pytest is very far away and far too find on some tests
#141204 commented on Apr 25, 2025 • 0 new comments
Some Performance Bug in `tol` of `torch.lobpcg()`
#152154 commented on Apr 25, 2025 • 0 new comments
Note some limit in docstring of `padding` in Poolnd
#152156 commented on Apr 25, 2025 • 0 new comments
Some Doc Issue about `torch.lobpcg()`
#152107 commented on Apr 25, 2025 • 0 new comments
[Scaled MM] Update to support on B200 TN, NT, NN, TT Layouts are supported
#152150 commented on Apr 25, 2025 • 0 new comments
padding_mode `reflect` works different from others in Conv
#152152 commented on Apr 25, 2025 • 0 new comments
Documentation Clarification Needed for Clamping of Scale Coefficient in clip_grads_with_norm_
#151554 commented on Apr 25, 2025 • 0 new comments
[Inductor] track block shape of intermediary variables
#149905 commented on Apr 25, 2025 • 0 new comments
torch.nested.narrow() or torch.nested.to_padded_tensor() breaks backwards pass - invalid gradient
#145837 commented on Apr 25, 2025 • 0 new comments
Enable TorchInductor to Generate Matmuls Natively via `tl.dot`
#151705 commented on Apr 25, 2025 • 0 new comments
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bfloat16 (__main__.TestForeachCUDA)
#148965 commented on Apr 25, 2025 • 0 new comments
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int16 (__main__.TestForeachCUDA)
#150772 commented on Apr 25, 2025 • 0 new comments
Exporting the operator 'aten::lift_fresh' to ONNX - not supported
#151932 commented on Apr 25, 2025 • 0 new comments
[ONNX] Flip `dynamo` default to True in torch.onnx.export
#151693 commented on Apr 25, 2025 • 0 new comments
Pinned memory doubles memory usage for tensors slightly over 128MB
#150517 commented on Apr 25, 2025 • 0 new comments
Inductor pattern matcher replaces aten.reshape with aten.view in pattern
#151649 commented on Apr 25, 2025 • 0 new comments
DISABLED test_ranks_and_tag (__main__.CompileTest)
#147974 commented on Apr 25, 2025 • 0 new comments
Xcode 16+: duplicate LC_RPATH '@loader_path'
#151592 commented on Apr 25, 2025 • 0 new comments
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_float64 (__main__.TestForeachCUDA)
#150752 commented on Apr 25, 2025 • 0 new comments
[Inductor] atomic_add does not support bf16
#97016 commented on Apr 25, 2025 • 0 new comments
standalone compile FakeTensor from_graph detection with tensor subclass outputs
#151945 commented on Apr 28, 2025 • 0 new comments
Add operator name to the size/strides/alignment assertion
#151930 commented on Apr 28, 2025 • 0 new comments
Runtime assertion not generated in inductor for input unbacked symints
#151879 commented on Apr 28, 2025 • 0 new comments
Optimize printing sympy expressions during logging and cache key computation
#151823 commented on Apr 28, 2025 • 0 new comments
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int64 (__main__.TestForeachCUDA)
#150822 commented on Apr 28, 2025 • 0 new comments
Support Delay Loading of c10.dll when using libtorch as a third-party library.
#105058 commented on Apr 28, 2025 • 0 new comments
Torch 2.1 compile + FSDP (mixed precision) + LlamaForCausalLM: `RuntimeError: attempting to assign a gradient with dtype 'c10::BFloat16' to a tensor with dtype 'float'.`
#111317 commented on Apr 28, 2025 • 0 new comments
[inductor] [cpu] [edge case] When processing `torch.nan_to_num-.long()`, inductor outputs the `reciprocal` of eager
#151510 commented on Apr 28, 2025 • 0 new comments
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on Apr 28, 2025 • 0 new comments
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on Apr 28, 2025 • 0 new comments
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on Apr 28, 2025 • 0 new comments
`torch.jit.script` does not respect `torch.set_default_dtype`
#150607 commented on Apr 28, 2025 • 0 new comments
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int32 (__main__.TestForeachCUDA)
#150800 commented on Apr 28, 2025 • 0 new comments
Python 3.10 + intel-openmp failed to use numactl after import torch._C
#136307 commented on Apr 28, 2025 • 0 new comments
Torch compile issue, AttributeError: 'NoneType' object has no attribute 'store_cubin'
#150980 commented on Apr 27, 2025 • 0 new comments
Elastic training crashes on killed agent
#150916 commented on Apr 27, 2025 • 0 new comments
[Torch Profiler] Only two streams captured in CUDA graph but multiple streams shown in Torch Profiler
#152114 commented on Apr 27, 2025 • 0 new comments
DeepSeek: a2a communication with metadata on the GPU
#146329 commented on Apr 27, 2025 • 0 new comments
ROCm+gcc 15 asserts
#145608 commented on Apr 27, 2025 • 0 new comments
[FSDP2][DTensor] numeric bug for DTensor + python float in gradient clipping
#149768 commented on Apr 26, 2025 • 0 new comments
caching keys+values in TransformerDecoderLayer for faster inference
#107573 commented on Apr 26, 2025 • 0 new comments
[release] Make pytorch source distribution package respect pep-0517
#150461 commented on Apr 26, 2025 • 0 new comments
Change the type hint for nn.Module.__call__ to be friendly to overrides.
#74746 commented on Apr 26, 2025 • 0 new comments
[inductor] [cuda] [silent incorrectness] `F.softmax-torch.argsort` output silent incorrectness when tensor input is very large
#151745 commented on Apr 26, 2025 • 0 new comments
[feature request] [ux] Frontend methods for fused elementwise affine transform: mul+add+dtype convert + support `integer_tensor.mul_(float_constant)` and `float_tensor.mul(some_constant, out = integer_tensor)` maybe via new args `rounding_mode=...` and `dtype=...` + maybe support OpenCV-style saturated dtype conversions (e.g. `clamp_` before conversion)
#106624 commented on Apr 26, 2025 • 0 new comments
[feature request] torch.mix function to generalize/symmetrize addcmul
#104849 commented on Apr 26, 2025 • 0 new comments
`torch.lerp` to support argument type promotion / broadcasting - including of `input` / `end` arguments
#57947 commented on Apr 26, 2025 • 0 new comments
Wrong formula for CosineAnnealingLR
#152081 commented on Apr 26, 2025 • 0 new comments
Build pytorch for rocm failed
#148167 commented on Apr 26, 2025 • 0 new comments
[RFC] Add new CPP builder for inductor on pytorch Windows
#124245 commented on Apr 26, 2025 • 0 new comments
Torch Inductor Windows Path Escape Characters
#135954 commented on Apr 26, 2025 • 0 new comments
Compilation of the post-training quantized model using Nvidia ModelOpt is failing with the error: Unsupported — 'inline in skipfiles: QuantLinearConvBase.quantize_weight
#151450 commented on Apr 26, 2025 • 0 new comments
refine fp32 precision api
#125888 commented on Apr 30, 2025 • 0 new comments
[vision hash update] update the pinned vision hash
#125806 commented on May 2, 2025 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on May 2, 2025 • 0 new comments
[pytree] support PyStructSequence types for Python pytree
#113258 commented on May 1, 2025 • 0 new comments
Automated submodule update: kineto
#106149 commented on May 2, 2025 • 0 new comments
[ATen][Sparse] Use Third-Party Eigen for sparse addmm
#101814 commented on Apr 30, 2025 • 0 new comments
Performance regression on modded-nanogpt torch-2.7.0.dev20250208→torch-2.7.0.dev20250209
#147463 commented on May 2, 2025 • 0 new comments
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16 (__main__.TestForeachCUDA)
#150309 commented on May 2, 2025 • 0 new comments
DISABLED test_remove_noop_view_dtype_cuda (__main__.GPUTests)
#151541 commented on May 2, 2025 • 0 new comments
DISABLED test_remove_noop_view_dtype_cpu (__main__.CpuTests)
#151540 commented on May 2, 2025 • 0 new comments
DISABLED test_comprehensive_floor_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152058 commented on May 2, 2025 • 0 new comments
DISABLED test_comprehensive_nansum_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140693 commented on May 2, 2025 • 0 new comments
DISABLED AotInductorTest.FreeInactiveConstantBufferRuntimeConstantFoldingCuda (build.bin.test_aoti_inference)
#150299 commented on May 2, 2025 • 0 new comments
DISABLED AotInductorTest.FreeInactiveConstantBufferCuda (build.bin.test_aoti_inference)
#149495 commented on May 2, 2025 • 0 new comments
DISABLED test_comprehensive_bitwise_right_shift_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152057 commented on May 2, 2025 • 0 new comments
[ued] Slow start up time for `torch.compile` on GGUF Auraflow
#150706 commented on May 2, 2025 • 0 new comments
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64 (__main__.TestForeachCUDA)
#150298 commented on May 2, 2025 • 0 new comments
`context_parallel` fails for training with `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`
#149306 commented on May 2, 2025 • 0 new comments
The state of sparse Tensors
#9674 commented on May 1, 2025 • 0 new comments
[Feature request] Exclusive prefix sum, `torch.cumsum(input, dim=0, exclusive=True)`
#76191 commented on May 1, 2025 • 0 new comments
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on May 1, 2025 • 0 new comments
`view()` + modify-in-place fails silently with DTensor
#147570 commented on May 1, 2025 • 0 new comments
LoadHIP.cmake should find_package(composable_kernel)
#149809 commented on May 1, 2025 • 0 new comments
at::BlasBackend::Ck does not handle all ROCm BLAS gpus
#150187 commented on May 1, 2025 • 0 new comments
[ROCm] PyTorch slow on TTS
#150168 commented on May 1, 2025 • 0 new comments
Support SDPA flash attention / memory efficient attn on ROCm gfx908
#141958 commented on May 1, 2025 • 0 new comments
[RFC] : Dynamically Quantized 8-bit Matrix Multiplication support
#149500 commented on May 1, 2025 • 0 new comments
dynamo cannot trace global op_set .__contains__
#145761 commented on May 1, 2025 • 0 new comments
Investigate FlexAttention performance degradation on low precision inputs
#147336 commented on May 1, 2025 • 0 new comments
When using torch to convert to ONNX model, testing the inference results with actual images shows tensor mismatch
#152097 commented on May 1, 2025 • 0 new comments
Dynamo unsupported: dynamic padding
#123855 commented on May 1, 2025 • 0 new comments
[ROCm] MI300X FP8 scaled_mm is extremely slow on nightly
#143465 commented on Apr 30, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `torch/_[a-h]*/` to `ruff format`
#144551 commented on May 1, 2025 • 0 new comments
[BE][PYFMT] migrate PYFMT for `{torch,test}/{nn,optim}/**` to `ruff format`
#144548 commented on May 1, 2025 • 0 new comments
codecache: Remove cpp_prefix.h duplication per build, then precompile it
#144293 commented on May 1, 2025 • 0 new comments
[ci] Add riscv opt-int build
#143979 commented on Apr 27, 2025 • 0 new comments
Make init_method deprecated to fix TCP connection refused error
#143858 commented on Apr 25, 2025 • 0 new comments
Make functionalization `ViewMeta` serializable with pickle.
#143712 commented on Apr 27, 2025 • 0 new comments
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on Apr 29, 2025 • 0 new comments
[while_loop][jit inductor] auto-unspecialize int input and output to unbacked symints
#143457 commented on Apr 30, 2025 • 0 new comments
Fix type annotation of `Linear.bias`
#142326 commented on Apr 26, 2025 • 0 new comments
remove redundant assign
#140399 commented on Apr 27, 2025 • 0 new comments
[pt2e][quant] Add simple cond support for annotate, prepare and convert
#140323 commented on Apr 27, 2025 • 0 new comments
cpu: enable gemm-bf16f32 for SDPA BF16
#140159 commented on Apr 29, 2025 • 0 new comments
Support LOAD_BUILD_CLASS opcode in dynamo
#139561 commented on Apr 29, 2025 • 0 new comments
Use the device interface for detecting Triton availability
#139171 commented on May 1, 2025 • 0 new comments
Fix bug of torch.nn.functional.kl_div when broadcast happened
#138810 commented on Apr 27, 2025 • 0 new comments
Add overflow check for negative integer div_floor and div_trunc on CPU
#138684 commented on Apr 30, 2025 • 0 new comments
Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512.
#138388 commented on Apr 29, 2025 • 0 new comments
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on May 1, 2025 • 0 new comments
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on Apr 29, 2025 • 0 new comments
[BE]: Try turning on LTO in CMake in CI
#137866 commented on Apr 25, 2025 • 0 new comments
Load cuda deps more aggressively
#137059 commented on Apr 28, 2025 • 0 new comments
Add back DistributedDataParallel types that were lost when pyi was removed
#136835 commented on Apr 29, 2025 • 0 new comments
[Don't Merge] Try to build custom ops with MKL XPU
#133658 commented on Apr 29, 2025 • 0 new comments
add ranking for grouped benchmarks
#133287 commented on May 2, 2025 • 0 new comments
[torch.special] Adding betainc, betaincc, betaincinv, betainccinv, betaln and beta with backward operation
#132135 commented on Apr 29, 2025 • 0 new comments
[1/N] Use 3.25.3 as the minimum CMake version
#130522 commented on May 1, 2025 • 0 new comments
[pytree] implement key path APIs for CXX pytree
#130141 commented on Apr 29, 2025 • 0 new comments
Remove direct dependency of protobuf from CMake
#127919 commented on Apr 27, 2025 • 0 new comments
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on Apr 30, 2025 • 0 new comments
[inductor] enable bf32 test for mkldnn conv
#127293 commented on Apr 30, 2025 • 0 new comments
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on Apr 30, 2025 • 0 new comments
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on Apr 30, 2025 • 0 new comments
Profiler doesn't seem to work on AMD CPUs
#150052 commented on Apr 30, 2025 • 0 new comments
[ROCm] sdpa group query attention bf16 numeric error
#139352 commented on Apr 30, 2025 • 0 new comments
[NJT] `.bmm`'s BmmBackward0 fails compilation when second arg requires grad
#152122 commented on Apr 30, 2025 • 0 new comments
[RFC] zentorch Integration
#150296 commented on Apr 30, 2025 • 0 new comments
DISABLED test_remove_noop_slice1_cuda (__main__.GPUTests)
#151381 commented on Apr 30, 2025 • 0 new comments
DISABLED test_remove_noop_slice1_cpu (__main__.CpuTests)
#151379 commented on Apr 30, 2025 • 0 new comments
DISABLED test_remove_noop_slice_scatter_cuda (__main__.GPUTests)
#151378 commented on Apr 30, 2025 • 0 new comments
DISABLED test_remove_noop_slice_scatter_cpu (__main__.CpuTests)
#151382 commented on Apr 30, 2025 • 0 new comments
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bool (__main__.TestForeachCUDA)
#150120 commented on Apr 30, 2025 • 0 new comments
Implementation of a numerically stable log(1 - softmax) function in PyTorch
#129657 commented on Apr 30, 2025 • 0 new comments
Release artifacts for rc releases
#124759 commented on Apr 30, 2025 • 0 new comments
DISABLED test_fake_registration (__main__.TestOpProfiles)
#151301 commented on Apr 30, 2025 • 0 new comments
DISABLED test_remove_noop_slice_cuda (__main__.GPUTests)
#151383 commented on Apr 30, 2025 • 0 new comments
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bfloat16 (__main__.TestForeachCUDA)
#150119 commented on Apr 30, 2025 • 0 new comments
DISABLED test_matrix_rank_basic_cuda_float32 (__main__.TestLinalgCUDA)
#150406 commented on Apr 30, 2025 • 0 new comments
Grad strides do not match bucket view strides.
#47163 commented on Apr 30, 2025 • 0 new comments
[torch/elastic] unexpected behavior of torch elastic
#147064 commented on Apr 30, 2025 • 0 new comments
[ONNX] Create a message to suggest users setting dynamo=True when exporting
#152025 commented on Apr 30, 2025 • 0 new comments
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#150902 commented on Apr 30, 2025 • 0 new comments
DISABLED test_aoti (__main__.TestMemoryPlanning)
#145211 commented on Apr 30, 2025 • 0 new comments
DISABLED test_comprehensive_nansum_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#139710 commented on Apr 30, 2025 • 0 new comments
Error when using torch.fx on bert
#67970 commented on Apr 30, 2025 • 0 new comments
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#148966 commented on Apr 30, 2025 • 0 new comments
DISABLED test_repeated_calling_cuda (__main__.AOTInductorTestABICompatibleGpu)
#146185 commented on Apr 30, 2025 • 0 new comments
DISABLED test_inductor_reuse_buffer_after_inplace_collective (__main__.CompileTest)
#147950 commented on Apr 30, 2025 • 0 new comments
DISABLED test_vdd_clamp_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134445 commented on Apr 30, 2025 • 0 new comments
[C10D] Make collectives backwards throw an error
#152127 commented on Apr 30, 2025 • 0 new comments
Numerical inaccuracies in "ddp_apply_optim_in_backward" unit tests for gloo backend
#111834 commented on Apr 29, 2025 • 0 new comments
DISABLED test_duplicate_registration_impl (__main__.TestOpProfiles)
#151281 commented on Apr 29, 2025 • 0 new comments
Is it possible to remove NCCL submodule and use only nccl binaries from pypi instead?
#144768 commented on Apr 29, 2025 • 0 new comments
`version.txt` mismatch with tags in release branch
#151425 commented on Apr 29, 2025 • 0 new comments
Status of pip wheels with _GLIBCXX_USE_CXX11_ABI=1
#51039 commented on May 1, 2025 • 0 new comments
DTensor slicing on sharded dimension leads to replication
#149447 commented on May 1, 2025 • 0 new comments
[graph pickler] [inductor compile async] imprecise filter for non standard op?
#151904 commented on May 1, 2025 • 0 new comments
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_complex128 (__main__.TestForeachCUDA)
#150933 commented on May 1, 2025 • 0 new comments
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128 (__main__.TestForeachCUDA)
#149323 commented on May 1, 2025 • 0 new comments
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32 (__main__.TestForeachCUDA)
#150208 commented on May 1, 2025 • 0 new comments
DISABLED test_remove_noop_view_default_cpu (__main__.CpuTests)
#151512 commented on May 1, 2025 • 0 new comments
RFC: The State of Custom CUDA extensions in PyTorch
#152032 commented on May 1, 2025 • 0 new comments
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
#127176 commented on May 1, 2025 • 0 new comments
AOTI packaged model fails with generic error when run in for loop but succeeds on individual sample
#146524 commented on May 1, 2025 • 0 new comments
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on May 1, 2025 • 0 new comments
DISABLED test_int64_upsample3d_cuda_bfloat16 (__main__.TestTorchDeviceTypeCUDA)
#146007 commented on May 1, 2025 • 0 new comments
DISABLED test_comprehensive_special_xlog1py_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140648 commented on May 1, 2025 • 0 new comments
Ability to do aot/inductor compilation from a jit model (or torch.exported model)
#127928 commented on May 1, 2025 • 0 new comments
General MPS op coverage tracking issue
#77764 commented on May 1, 2025 • 0 new comments
Memory Leak in MPS Backend During LSTM Iterations (Out of Memory Error)
#145374 commented on May 1, 2025 • 0 new comments
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16 (__main__.TestForeachCUDA)
#150173 commented on May 1, 2025 • 0 new comments
[Async TP] all-gather-matmuls not fusing properly when rowwise scales are used
#149990 commented on May 1, 2025 • 0 new comments
[Tracker] Nested tensor op coverage requests
#118107 commented on May 1, 2025 • 0 new comments
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64 (__main__.TestForeachCUDA)
#150161 commented on May 1, 2025 • 0 new comments
DISABLED test_comprehensive_nanmean_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140339 commented on May 1, 2025 • 0 new comments
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on May 1, 2025 • 0 new comments
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on May 1, 2025 • 0 new comments
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on May 1, 2025 • 0 new comments
[CI] No workflows scheduled on PRs
#151322 commented on Apr 30, 2025 • 0 new comments
[inductor] Improve codegen for argmax+max
#146643 commented on Apr 30, 2025 • 0 new comments
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128 (__main__.TestForeachCUDA)
#150141 commented on Apr 30, 2025 • 0 new comments
missing docs for torch.Tag
#126518 commented on Apr 30, 2025 • 0 new comments
Quantile is limited to 16 million elements and has poor performance.
#64947 commented on Apr 30, 2025 • 0 new comments
enhance documentation around the developer build
#108406 commented on Apr 30, 2025 • 0 new comments
Training/Fine-tuning fails with PyTorch 2.8 + 4x 5090 GPUs using DDP/FSDP/DeepSpeed
#150734 commented on Apr 30, 2025 • 0 new comments
torch.compile on MPS progress tracker
#150121 commented on Apr 30, 2025 • 0 new comments