Insights: pytorch/pytorch
Overview
-
- 0 Merged pull requests
- 189 Open pull requests
- 164 Closed issues
- 121 New issues
189 Pull requests opened by 114 people
-
Extend compute_global_tensor_shape to multi dimension sharding
#152166 opened
Apr 25, 2025 -
Generate test reports for pytest when option is given
#152170 opened
Apr 25, 2025 -
[c10d] Allow split_group to work with non nccl backends
#152175 opened
Apr 25, 2025 -
IGNORE: Testing OIDC
#152181 opened
Apr 25, 2025 -
[WIP] New Win Arm64 Runners - User pre installed Visual Studio
#152184 opened
Apr 25, 2025 -
xpu: get xpu arch flags at runtime in cpp_extensions
#152192 opened
Apr 25, 2025 -
SAC: fix recompute tag propagation for ops with list[tensor] inputs
#152193 opened
Apr 25, 2025 -
SAC: fix recompute tag propagation for ops with list[tensor] inputs
#152194 opened
Apr 25, 2025 -
SAC: fix recompute tag propagation for ops with list[tensor] inputs
#152195 opened
Apr 25, 2025 -
Add detailed triton kernel logging to tlparse
#152197 opened
Apr 25, 2025 -
[inductor] propagate shapes in CSEVariable
#152198 opened
Apr 25, 2025 -
Synchronize mps backend in the timer
#152199 opened
Apr 25, 2025 -
[submodule] Update ONNX to 1.18
#152200 opened
Apr 25, 2025 -
Add support for torch.cuda.FloatTensor()
#152208 opened
Apr 25, 2025 -
[CI] docker images use tags instead of image name
#152209 opened
Apr 25, 2025 -
Move mps_linear forward to use MPS kernels directly instead of MPSGraph
#152210 opened
Apr 25, 2025 -
Mini tutorial for provenance tracking
#152211 opened
Apr 25, 2025 -
Improve error handling in CachingAutotuner for argument mismatches
#152215 opened
Apr 25, 2025 -
[not for land] functionalization hack to try making mutations on graph input slices more efficient
#152217 opened
Apr 25, 2025 -
Add `padding="same"` for transposed convolution
#152228 opened
Apr 25, 2025 -
Fix: Consider input defined unbacked during inductor codegen for runtime asserts
#152231 opened
Apr 25, 2025 -
_get_total_norm should use float64 to avoid rounding errors
#152234 opened
Apr 25, 2025 -
At least one of ROCM_HOME or CUDA_HOME must be None
#152236 opened
Apr 26, 2025 -
[executorch hash update] update the pinned executorch hash
#152238 opened
Apr 26, 2025 -
Updates to build on Noble (Ubuntu24.04) and py3.12
#152240 opened
Apr 26, 2025 -
Enable 8byte vector loading for fp16/bf16
#152242 opened
Apr 26, 2025 -
Move code out of individual token linters
#152256 opened
Apr 26, 2025 -
[BE]: Cleanup traceutils with fmtlib
#152265 opened
Apr 26, 2025 -
[ROCm] Maxpool backward NHWC Perf Improvement targeting Resnet scenarios
#152267 opened
Apr 26, 2025 -
[Dynamo] Replace `unimplemented` with `unimplemented_v2` in `torch/_dynamo/variables/misc.py` [1/2]
#152274 opened
Apr 27, 2025 -
[CI] Add xpu inductor test into periodic workflow
#152281 opened
Apr 27, 2025 -
[DTensor] enable SimpleFSDP's composability with Tensor Parallel
#152286 opened
Apr 27, 2025 -
[inductor] Skip isinf check for FP8 E4M3 dtype
#152289 opened
Apr 28, 2025 -
Enable the AMP precision with freezing for CPU nightly test
#152298 opened
Apr 28, 2025 -
[cp] dispatch flex_attention_backward to CP impl in TorchDispatchMode
#152311 opened
Apr 28, 2025 -
setuptools.build_meta:__legacy__ backend is deprecated
#152313 opened
Apr 28, 2025 -
Fixed RELEASE.md typo
#152315 opened
Apr 28, 2025 -
Correct torch.xpu.is_bf16_supported return False if no XPU detected
#152317 opened
Apr 28, 2025 -
[dynamo] Use getattr when accessing self.value.__module__ in SkipFunctionVariable
#152320 opened
Apr 28, 2025 -
[ROCm] Unskipped test_rnn_dropout_state for ROCm
#152339 opened
Apr 28, 2025 -
[ROCm][Inductor][CK] Add ck-tile based universal gemm kernels to torch.mm autotune choices
#152341 opened
Apr 28, 2025 -
[Memento] Enable on-demand mode
#152342 opened
Apr 28, 2025 -
[WIP] DeadCodeEliminator Mark(block) improvement
#152348 opened
Apr 28, 2025 -
[inductor][dynamo] Include operator name in size/stride/alignment assertion
#152353 opened
Apr 28, 2025 -
Add codeowner for merge rules
#152354 opened
Apr 28, 2025 -
[Will This Work?] Build libgomp (gcc-11) from src on AArch64
#152361 opened
Apr 28, 2025 -
Format all headers under ATen/cpu/vec, not just top-level
#152364 opened
Apr 28, 2025 -
add is_vec_specialized_for
#152365 opened
Apr 28, 2025 -
vec::map: directly process reduced-precision floats when reasonable
#152366 opened
Apr 28, 2025 -
[Relandx2] Rewrite the guts of torch::jit::Lexer to speed it up
#152372 opened
Apr 28, 2025 -
complex.pow(2) on GPU by replacing with complex * complex to avoid numerical instability
#152373 opened
Apr 28, 2025 -
[FP8][CUTLASS] xFail `honor_sm_carveout` on `sm100`
#152378 opened
Apr 28, 2025 -
fix: outdated contents in dynamo overview
#152382 opened
Apr 28, 2025 -
[inductor][subgraph] Simplify the resulting output code for subgraph
#152383 opened
Apr 28, 2025 -
[inductor][invoke_subgraph] Remove assertion checks for outputs of invoke_subgraph
#152384 opened
Apr 28, 2025 -
Add vec_reduce_all specialization for std::plus on AArch64
#152388 opened
Apr 28, 2025 -
[Hierarchical Compilation] Track node mutations
#152389 opened
Apr 29, 2025 -
[Inductor] Fix typing in cuda_template.py
#152390 opened
Apr 29, 2025 -
[Inductor] Use `torch._dynamo.utils.same` in block pointer tests, adding atol/rtol kwargs to it.
#152392 opened
Apr 29, 2025 -
[Accelerator] Fix Python typing in accelerator
#152394 opened
Apr 29, 2025 -
[NFC] [inductor] [compile async] Warn exception if pickler failed
#152401 opened
Apr 29, 2025 -
[Do not merge] poke CI with FX IR always on
#152405 opened
Apr 29, 2025 -
Call torch.distributed.destroy_process_group() at the end of the example
#152407 opened
Apr 29, 2025 -
[Inductor][CPU] bug fix for int8 GEMM compensation epilogue
#152408 opened
Apr 29, 2025 -
Cleanup DeviceInterface in triton test
#152409 opened
Apr 29, 2025 -
[Hierarchical Compile] Add mutation dependencies to topological sorting
#152410 opened
Apr 29, 2025 -
[Quant][X86] add ops to compute uint8 pointwise add/add_relu
#152411 opened
Apr 29, 2025 -
Add Vectorized FP8 E4M3
#152417 opened
Apr 29, 2025 -
[Inductor][CPP] Enable vectorized fp8 quant dequant
#152418 opened
Apr 29, 2025 -
Relax tolerance for test_quick_baddbmm_cpu_complex64
#152424 opened
Apr 29, 2025 -
[ROCm] cpp_extension allow user to override default flags
#152432 opened
Apr 29, 2025 -
Log aot and idx waitcounters.
#152444 opened
Apr 29, 2025 -
[TorchDynamo] Fix failure to realize LazyVariableTracker on stack
#152446 opened
Apr 29, 2025 -
Add new profiling events to `DebugAutotuner`
#152449 opened
Apr 29, 2025 -
[PT2] Port replace_lce_with_matmul / replace_first_lce_with_fused_matmul_lce to PT2 pre_grad passes
#152450 opened
Apr 29, 2025 -
Implement async manifold cache write
#152452 opened
Apr 29, 2025 -
Add epoch to fake tensor cache key
#152453 opened
Apr 29, 2025 -
[export] add runtime assert messages to python torch checks (#150719)
#152455 opened
Apr 29, 2025 -
Fix XLA issue.
#152456 opened
Apr 29, 2025 -
fix: Update padding_mode to use Literal for type checking
#152458 opened
Apr 29, 2025 -
[IR] Input Adapter refactor prototype
#152459 opened
Apr 29, 2025 -
[pytorch][triton] flex attention fwd kernel with TMA loads (#151923)
#152460 opened
Apr 29, 2025 -
consolidate guard_or_x and definitely_x
#152463 opened
Apr 29, 2025 -
add device generalisation support for distributed tests
#152471 opened
Apr 29, 2025 -
[nativert] Move TensorMeta to pytorch core
#152475 opened
Apr 29, 2025 -
[DO NOT REVIEW] Attempt a mixed precision fused adam
#152477 opened
Apr 29, 2025 -
[ONNX] Suggest users setting dynamo=True when exporting
#152478 opened
Apr 29, 2025 -
Fix flaky test in test_custom_ops
#152484 opened
Apr 29, 2025 -
Change unsafe_marked_cacheable_functions to a dictionary, so that you can specify a static cache key
#152486 opened
Apr 29, 2025 -
[invoke_subgraph] Simplify output code for subgraph output node
#152490 opened
Apr 29, 2025 -
fix tests broken after #152450
#152493 opened
Apr 29, 2025 -
[inductor][invoke_subgraph] Free the buffers before the subgraph call
#152494 opened
Apr 29, 2025 -
[export] Refactor pt2 save/load
#152495 opened
Apr 30, 2025 -
Refactor nested benchmark functions in AlgorithmSelectorCache
#152502 opened
Apr 30, 2025 -
[Hierarchical Compilation] Use universal flatten APIs
#152505 opened
Apr 30, 2025 -
[Hierarchical Compile] Take into account mutation deps in cycle detection
#152506 opened
Apr 30, 2025 -
[inductor] [compile async] Don't compile in eager
#152507 opened
Apr 30, 2025 -
[2/N] Deprecate c10::string_view and at::string
#152509 opened
Apr 30, 2025 -
[MPS] Migrate mul to TensorIterator
#152515 opened
Apr 30, 2025 -
Make torch/csrc/utils.h to be device-agnostic
#152521 opened
Apr 30, 2025 -
Remove the unnecessary cuda/Tensor.cpp
#152522 opened
Apr 30, 2025 -
[compile async] [cache] testing
#152523 opened
Apr 30, 2025 -
elastic: do not shutdown rendezvous on leaving workers
#152525 opened
Apr 30, 2025 -
Use std::apply for CPU code
#152526 opened
Apr 30, 2025 -
Add methods for checking Triton availability to the device interface
#152529 opened
Apr 30, 2025 -
Do not check out nccl when not building it
#152533 opened
Apr 30, 2025 -
[CI] Use cmake from pip instead of conda in CI docker images
#152537 opened
Apr 30, 2025 -
Disable SLEEF implementation of vec::maximum in vec128_float_neon.h | Accelerate aten::hardtanh_ by 21x
#152538 opened
Apr 30, 2025 -
[CUDA] Reset peak memory stats before running `test_set_per_process_memory_fraction`
#152540 opened
Apr 30, 2025 -
Add parameters for monitor
#152541 opened
Apr 30, 2025 -
strict multidimensional slicing
#152543 opened
Apr 30, 2025 -
Migrate perf_test/test_[gc]pu_speed_mnist.sh from conda to venv
#152544 opened
Apr 30, 2025 -
ci: Switch benchmark dependency to use pip
#152545 opened
Apr 30, 2025 -
Remove Conda Instructions
#152546 opened
Apr 30, 2025 -
Implemented `Size.__radd__`
#152554 opened
Apr 30, 2025 -
[BE] Update numba versions
#152557 opened
Apr 30, 2025 -
xpu: rely on sycl/sycl.hpp to include bfloat16.hpp
#152562 opened
Apr 30, 2025 -
[c10d][fr] Make FR vendor neutral so that other backends can use it
#152563 opened
Apr 30, 2025 -
[ROCm] Update spack includes
#152569 opened
Apr 30, 2025 -
[Hierarchical Compile] Replace tracing alias and mutation check with dynamo impl
#152570 opened
Apr 30, 2025 -
[Dynamo] Fix typing in graph_deduplication.py
#152572 opened
May 1, 2025 -
Allow decomposeK to fuse
#152573 opened
May 1, 2025 -
Added documentation for nonzero_static function (#152347)
#152574 opened
May 1, 2025 -
[IR] Input Adapter refactor prototype (#152459)
#152575 opened
May 1, 2025 -
[testing] 1
#152578 opened
May 1, 2025 -
[aoti] skip codegen for sympy expr when codegening input
#152579 opened
May 1, 2025 -
[cutlass backend] cache filtered ops based on layouts
#152580 opened
May 1, 2025 -
[invoke_subgraph] rename identifiers to prevent python mangling
#152581 opened
May 1, 2025 -
add support for 0 size shardedTensor and recalculate metadata from all_gather
#152583 opened
May 1, 2025 -
[c10d][fr] Decouple the core logic of FR with the entry and event type
#152585 opened
May 1, 2025 -
[2/N] Use std::filesystem
#152586 opened
May 1, 2025 -
[Inductor] Introduce Wrapper IR line for symbolic call args
#152587 opened
May 1, 2025 -
[WIP] verbose logging for recompilations
#152588 opened
May 1, 2025 -
[Dynamo] Optimize dedupe region ancestor tracking
#152589 opened
May 1, 2025 -
Fix #152280: add Literal[…] PaddingMode to Conv modules
#152590 opened
May 1, 2025 -
Fix: promote scalar to MPS device in exec_binary_kernel
#152591 opened
May 1, 2025 -
[c10d] Add support for ReduceOp::AVG in ProcessGroupMPI for FSDP2
#152594 opened
May 1, 2025 -
[wip] base commit
#152596 opened
May 1, 2025 -
add backend_specialization kwarg to mark_dynamic
#152597 opened
May 1, 2025 -
[testing] 3
#152599 opened
May 1, 2025 -
store backend specializations in StatelessSymbolicContext
#152600 opened
May 1, 2025 -
use backend specializations in compile_and_call_fx_graph
#152601 opened
May 1, 2025 -
[testing] 4
#152602 opened
May 1, 2025 -
[BE] Delete `Module_CUDA_fix`
#152603 opened
May 1, 2025 -
[Testing] Is FindCUDA.cmake from `Modules_CUDA_fix` called at all?
#152604 opened
May 1, 2025 -
[Environment Variable] Use thread-safe getenv functions
#152609 opened
May 1, 2025 -
Update padding_mode type annotation to use Literal type (PaddingMode)
#152610 opened
May 1, 2025 -
Makefile: refactor build, setup and lint rules
#152611 opened
May 1, 2025 -
Revert "Cleanup VS 2019 refs in pytorch (#145863)"
#152613 opened
May 1, 2025 -
[WIP] Make FR vendor generic and try to enable it for gloo
#152614 opened
May 1, 2025 -
[dynamo] Guard serialization for DUAL LEVEL.
#152615 opened
May 1, 2025 -
[dynamo] Guard serialization for FUNCTORCH_STACK_MATCH
#152616 opened
May 1, 2025 -
[CUDA][TF32] Account for TF32 in `test_conv2d_same_padding`
#152618 opened
May 1, 2025 -
[DO NOT REVIEW] Implement __obj_flatten__ for LinearPackedParamsBase
#152619 opened
May 1, 2025 -
Stop proxy-ing autograd.Function.ctx into the graph
#152621 opened
May 1, 2025 -
Parameterized CUDA Graph Launch
#152622 opened
May 1, 2025 -
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 opened
May 1, 2025 -
[Dynamo] Guard serialization for TENSOR_SUBCLASS_METADATA_MATCH
#152626 opened
May 1, 2025 -
Make PGO code state not sensitive to file path by hashing file content when the file is available.
#152628 opened
May 1, 2025 -
[DCP] Add 30min timeout for IPC communications in async checkpointing
#152629 opened
May 1, 2025 -
[ROCm] Initial AITER Integration for mha_bwd asm kernels
#152630 opened
May 1, 2025 -
Fix two error messages involving Tensor.dense()
#152631 opened
May 1, 2025 -
[ca] wrap flex attention tests with compiled autograd
#152633 opened
May 1, 2025 -
Switch to metal kernel for mul
#152636 opened
May 1, 2025 -
[export] Add draft-export docs
#152637 opened
May 1, 2025 -
[dynamic shapes] use try-catch instead of guard_or_true for reshape_view_helper
#152638 opened
May 1, 2025 -
[AOTAutogradCache][Easy] Move `"einops.einops.rearrange"` to `SAFE_NON_TORCH_FUNCTIONS`
#152640 opened
May 1, 2025 -
[FlexAttention] explicitly create grad_q w/ strides
#152641 opened
May 1, 2025 -
[CUTLASS][WIP] Gate rowwise matmul CUTLASS kernels by compute capability
#152642 opened
May 1, 2025 -
[BE]remove vulkan test
#152643 opened
May 1, 2025 -
[inductor] Realize bucketize/searchsorted output
#152644 opened
May 1, 2025 -
[do-not-land][ca] default on for CI
#152646 opened
May 1, 2025 -
[Flight Recorder] Added logging after FR dump completed
#152648 opened
May 2, 2025 -
thread through specialization to compile_fx
#152650 opened
May 2, 2025 -
Add assert_fp8_close helper for FP8 tensor comparisons
#152651 opened
May 2, 2025 -
Refactor some common autotune-related utils into a new file
#152652 opened
May 2, 2025 -
Remove incorrect assertion
#152653 opened
May 2, 2025 -
Make assertion about pass callable print the bad pass
#152654 opened
May 2, 2025 -
cleanup, refactor and add missing self._dde_suppressed checks
#152657 opened
May 2, 2025 -
Fix the basic description of torch.min(), torch.max(), torch.all(), torch.any()
#152658 opened
May 2, 2025 -
[Inductor] Fix kernel argument ordering when using dynamic shapes with workspace
#152660 opened
May 2, 2025 -
Fix evaluate_expr to include suppress_guards_tls in cache key
#152661 opened
May 2, 2025 -
Re-enable FakeTensor caching for SymInts
#152662 opened
May 2, 2025 -
[MPS][BE] Do not dispatch empty kernels
#152663 opened
May 2, 2025 -
Raise error when no record on extra_files
#152664 opened
May 2, 2025 -
MXFP8 Fix broken bias support for mxfp8
#152665 opened
May 2, 2025 -
[StaticCudaLauncher] Ensure cuda context exists before launching kernels
#152667 opened
May 2, 2025 -
Added documentation for nonzero_static function (#152347)
#152669 opened
May 2, 2025 -
add codegen layer specialization dispatch
#152670 opened
May 2, 2025
164 Issues closed by 41 people
-
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int64 (__main__.TestForeachCUDA)
#150392 closed
May 2, 2025 -
DISABLED test_nvshmem
#152649 closed
May 2, 2025 -
py_limited_api=True in PyTorch2.7 will break the build of extensions
#152243 closed
May 2, 2025 -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int32 (__main__.TestForeachCUDA)
#150350 closed
May 2, 2025 -
[ONNX] Improve and sort out fallback mechanism
#151703 closed
May 2, 2025 -
Should make the doc of `nn.CrossEntropyLoss()` more clear
#134853 closed
May 1, 2025 -
torch.compile should not recompiles when `.requires_grad=True` under `torch.no_grad()` context
#131975 closed
May 1, 2025 -
compiled autograd + dynamic shapes fails with constraint violation
#133575 closed
May 1, 2025 -
Export QAT model is not performing as expected when compared to the original model and FX Graph QAT
#150746 closed
May 1, 2025 -
`torch.export` fails on `InstanceNorm1d`
#152467 closed
May 1, 2025 -
module.cuda() doesn't work under FakeTensorMode
#148977 closed
May 1, 2025 -
[CI] [anaconda] CI Perf Tests
#148342 closed
May 1, 2025 -
[Inductor] Dynamo hangs when processing an operator, seemingly depending on a logical argument value
#151743 closed
May 1, 2025 -
[export] Warn users when 0/1 specialization happens
#151582 closed
May 1, 2025 -
The test 'test_host_memory_stats' is failing in torch2.7.0+cu118
#152422 closed
May 1, 2025 -
How does torch.cudagraph capture a hybrid graph?
#152584 closed
May 1, 2025 -
Add switch to disable truncation to long list print
#152427 closed
May 1, 2025 -
`torch.randint` can't handle large `high` argument (and in general high range of `torch.uint64`)
#152564 closed
Apr 30, 2025 -
torch.randint should accept high=2**63
#81446 closed
Apr 30, 2025 -
pytorch index_select is too slow
#111247 closed
Apr 30, 2025 -
cuda graphs produce two additional kernel calls
#143572 closed
Apr 30, 2025 -
[regression] Not getting `CUDA error: device-side assert triggered` on main for CUDA_KERNEL_ASSERT2
#107396 closed
Apr 30, 2025 -
[CI] [anaconda] Benchmarks anaconda removal
#152123 closed
Apr 30, 2025 -
More logs to show why fx graph cache isn't hit / created?
#152065 closed
Apr 30, 2025 -
Mr
#152549 closed
Apr 30, 2025 -
Add Description of `validate_args` in `torch.distributions.`
#152165 closed
Apr 30, 2025 -
[ROCm] "No available kernel" when running EFFICIENT_ATTENTION sdpa
#138864 closed
Apr 30, 2025 -
difficulty creating magma tarball when new rocm or cuda versions are deployed
#151707 closed
Apr 30, 2025 -
[CUDA Graph tree] Cannot capture buffer allocation on side CUDA Streams
#151199 closed
Apr 30, 2025 -
Unify nccl versions for x86 and aarch64 builds
#149554 closed
Apr 30, 2025 -
Release torch with CUDA12.1 for 2.6 and even latest version
#152524 closed
Apr 30, 2025 -
Failed to create Gloo new group after initialized with NCCL
#68726 closed
Apr 30, 2025 -
Cudnn 9.2 is out!
#119400 closed
Apr 30, 2025 -
[ONNX] scatter_reduce with max reduction not correctly converted to ONNX for 2d input
#152419 closed
Apr 30, 2025 -
DISABLED test_fn_grad_grid_sampler_2d_cuda_float64 (__main__.TestBwdGradientsCUDA)
#131079 closed
Apr 30, 2025 -
NotImplementedError: Operator aten.view.dtype does not have a sharding strategy registered.
#152530 closed
Apr 30, 2025 -
DISABLED test_inductor_debug (__main__.LoggingTests)
#152511 closed
Apr 30, 2025 -
DISABLED test_einsum_cpu (__main__.TestUnbackedSymintsCPU)
#151380 closed
Apr 30, 2025 -
[export] export doesn't save custom meta for constant tensors
#151476 closed
Apr 30, 2025 -
Regression: Multiple OpenMP runtimes linked to libtorch_cpu.so
#146603 closed
Apr 29, 2025 -
Floating point exception (core dumped) in `native_channel_shuffle`
#142453 closed
Apr 29, 2025 -
UNSTABLE pull / linux-jammy-py3-clang12-executorch / test (executorch)
#144480 closed
Apr 29, 2025 -
[CI] Remove conda usage from lint related jobs
#148110 closed
Apr 29, 2025 -
`make pdflatex` Sphinx error: Builder name pdflatex not registered or available through entry point
#147027 closed
Apr 29, 2025 -
[CI] [anaconda] Utility scripts and workflows
#152124 closed
Apr 29, 2025 -
cd: There's no way to test changes to container images for binary builds
#149679 closed
Apr 29, 2025 -
[RFC] PyTorch next wheel build platform: manylinux-2.28
#123649 closed
Apr 29, 2025 -
[Performance] Simple arithmetic operations are slower using MPS than Metal
#143874 closed
Apr 29, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_complex64 (__main__.TestForeachCUDA)
#151313 closed
Apr 29, 2025 -
[BUG] when invoking torch::manual_seed() program crashed in libtorch 2.2.1
#121658 closed
Apr 29, 2025 -
Loading weights using `torch.distributed.checkpoint` leads to large loss values
#145378 closed
Apr 29, 2025 -
FSDP OOM during initialization
#152263 closed
Apr 29, 2025 -
[Inductor] Different results with Conv2d and BN2d not in `eval mode`
#141317 closed
Apr 29, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_bool (__main__.TestForeachCUDA)
#151268 closed
Apr 29, 2025 -
DISABLED test_guard_failure_fn2 (__main__.MiscTests)
#148217 closed
Apr 29, 2025 -
DISABLED test_inlined_functions_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148544 closed
Apr 29, 2025 -
DISABLED test_nested_tuple_output_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148606 closed
Apr 29, 2025 -
DISABLED test_make_closure_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148889 closed
Apr 29, 2025 -
DISABLED test_capture_tracked_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148427 closed
Apr 29, 2025 -
DISABLED test_return_captured_var_used_multiple_times_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148624 closed
Apr 29, 2025 -
DISABLED test_export_defaults_ok_dynamic_shapes (__main__.DynamicShapesExportTests)
#148331 closed
Apr 29, 2025 -
DISABLED test_internal_nonlocal_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148558 closed
Apr 29, 2025 -
DISABLED test_int_shape_binops (__main__.MiscTests)
#148296 closed
Apr 29, 2025 -
DISABLED test_user_defined_binop (__main__.MiscTests)
#148443 closed
Apr 29, 2025 -
DISABLED test_empty_graph_nested_calls_fullgraph_True_dynamic_shapes (__main__.DynamicShapesReproTests)
#148311 closed
Apr 29, 2025 -
DISABLED test_wrap_kwarg_default_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148951 closed
Apr 29, 2025 -
DISABLED test_capture_untracked_global_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148442 closed
Apr 29, 2025 -
DISABLED test_freevars_as_inputs_to_wrap_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148514 closed
Apr 29, 2025 -
DISABLED test_donated_buffer1_dynamic_shapes (__main__.DynamicShapesAotAutogradFallbackTests)
#149101 closed
Apr 29, 2025 -
DISABLED test_sys_modules_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148330 closed
Apr 29, 2025 -
DISABLED test_int_shape_inplace_binops (__main__.MiscTests)
#148312 closed
Apr 29, 2025 -
DISABLED test_guard_failure_fn_shape_control_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148216 closed
Apr 29, 2025 -
DISABLED test_lift_tensors_with_shared_symbols_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148870 closed
Apr 29, 2025 -
DISABLED test_wrap_kwarg_only_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#149024 closed
Apr 29, 2025 -
DISABLED test_symint_in_slice_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148643 closed
Apr 29, 2025 -
DISABLED test_capture_tracked_nested_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148851 closed
Apr 29, 2025 -
DISABLED test_wrap_kwarg_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#149000 closed
Apr 29, 2025 -
DISABLED test_param_shape_binops (__main__.MiscTests)
#148369 closed
Apr 29, 2025 -
DISABLED test_wrap_kwarg_default_if_branch_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148961 closed
Apr 29, 2025 -
DISABLED test_dont_aggressively_write_assert_dynamic_shapes (__main__.DynamicShapesReproTests)
#148295 closed
Apr 29, 2025 -
DISABLED test_export_with_cond_dynamic_shape_pred_dynamic_shapes (__main__.DynamicShapesExportTests)
#148368 closed
Apr 29, 2025 -
DISABLED test_empty_graph_nested_calls_fullgraph_False_dynamic_shapes (__main__.DynamicShapesReproTests)
#148426 closed
Apr 29, 2025 -
DISABLED test_shape_int_inplace_binops (__main__.MiscTests)
#148392 closed
Apr 29, 2025 -
DISABLED test_capture_untracked_nonlocal_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148464 closed
Apr 29, 2025 -
DISABLED test_sys_modules (__main__.MiscTests)
#148428 closed
Apr 29, 2025 -
DISABLED test_wrap_pytree_kwargs_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#149079 closed
Apr 29, 2025 -
DISABLED test_dynamic_sources_dynamic_override (__main__.MiscTests)
#148218 closed
Apr 29, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_complex128 (__main__.TestForeachCUDA)
#151300 closed
Apr 29, 2025 -
DISABLED test_mark_unbacked_strict_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148215 closed
Apr 29, 2025 -
DISABLED test_nested_wrap_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148914 closed
Apr 29, 2025 -
DISABLED test_mark_unbacked_strict (__main__.MiscTests)
#148332 closed
Apr 29, 2025 -
DISABLED test_wrap_all_kwarg_dynamic_shapes (__main__.DynamicShapesHigherOrderOpTests)
#148665 closed
Apr 29, 2025 -
'float' object is not callable when using scheduler.step() with MultiplicativeLR
#81554 closed
Apr 29, 2025 -
MPS SDPA `float32` memory leak
#152344 closed
Apr 29, 2025 -
[Break XPU] chunk_cat accuracy failed on XPU Inductor UT.
#152296 closed
Apr 29, 2025 -
[XPU] The updated torch-xpu-ops caused interpolate_bilinear accuracy error.
#152020 closed
Apr 29, 2025 -
Compilation Issues with sm_129 (RTX 5070 Ti) on WSL - Seeking Advice
#152400 closed
Apr 29, 2025 -
Proposal: Beautify torch.distributed.tensor.debug.visualize_sharding
#151857 closed
Apr 29, 2025 -
[CI] [anaconda] Utilities
#152126 closed
Apr 29, 2025 -
torch.bucketize works incorrectly on uint input with negative boundaries after torch.compile-gpu
#145929 closed
Apr 28, 2025 -
Potential bug in torch.optim.lr_scheduler.CosineAnnealingWarmRestarts
#88791 closed
Apr 28, 2025 -
[inductor] [dtype] `ReplicationPad` raise dtype error on eager but pass the check on inductor
#143779 closed
Apr 28, 2025 -
Error building pytorch from source
#138315 closed
Apr 28, 2025 -
Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit
#152351 closed
Apr 28, 2025 -
torch.arange bf16 results are not accurate
#137774 closed
Apr 28, 2025 -
DISABLED test_parity__foreach_add_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#151228 closed
Apr 28, 2025 -
DISABLED test_foreach_reduce_large_input__foreach_max_w_empty_False_cuda_bool (__main__.TestForeachCUDA)
#151229 closed
Apr 28, 2025 -
[discussion] Consolidation of audio-visual I/O in a new package
#81102 closed
Apr 28, 2025 -
[PTD][RPC] Verify RPC Tutorials contents and scripts
#138832 closed
Apr 28, 2025 -
Error in DTensor uneven shard view op
#143372 closed
Apr 28, 2025 -
Incorrect Gradient Computation in `torch.log1p`
#152088 closed
Apr 28, 2025 -
AWS A100 runners reliability issue
#140332 closed
Apr 28, 2025 -
[CI] [anaconda] CI Build and Test scripts MacOS
#152113 closed
Apr 28, 2025 -
peak memory is lower for subsequent fresh runs compared to the first run of a torch.compiled model
#151995 closed
Apr 28, 2025 -
DISABLED test_dynamic_sources_dynamic_override_dynamic_shapes (__main__.DynamicShapesMiscTests)
#148214 closed
Apr 28, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float64 (__main__.TestForeachCUDA)
#151214 closed
Apr 28, 2025 -
DISABLED test_setting_meta_device_model_broadcasting_and_memory (__main__.TestStateDict)
#143994 closed
Apr 28, 2025 -
Pytorch aten::col2im not currently supported on the MPS backend
#151820 closed
Apr 28, 2025 -
prod_cpu not implemented for 'BFloat16'
#89372 closed
Apr 28, 2025 -
Windows CUDA Build Failure: Ambiguous std in cuda_vectorized_test.cu (CUDA 12.6/MSVC 2019)
#152291 closed
Apr 28, 2025 -
`torch._inductor.exc.InductorError: CppCompileError: C++ compile error` after Torch 2.7 Release
#152172 closed
Apr 28, 2025 -
Aborted (core dumped) in torch.flipud
#152253 closed
Apr 27, 2025 -
Aborted (core dumped) in torch.fliplr
#152085 closed
Apr 27, 2025 -
Less Check on the triangular tensor of `L` in `torch.cholesky_solve()`
#152164 closed
Apr 27, 2025 -
[inductor] `.to_sparse()-.to_dense()` throws `LoweringException: NotImplementedError:`
#151522 closed
Apr 26, 2025 -
"TypeError: unhashable type: non-nested SymInt" with `torch.compile`
#135099 closed
Apr 26, 2025 -
Pytorch 2.7.0 with XPU (silently) crashing
#152255 closed
Apr 26, 2025 -
[Inductor] weird reordering behavior with `wait_tensor`
#152252 closed
Apr 26, 2025 -
memoryview support for `torch._C.import_ir_module_from_buffer`
#107099 closed
Apr 26, 2025 -
Compute Capability Misrecognition on NVIDIA GeForce RTX 5070 Ti (Blackwell Architecture)
#152223 closed
Apr 25, 2025 -
[MPS/Inductor] polygamma is miscompiled for some inputs
#152205 closed
Apr 25, 2025 -
Lint rule for always using std::optional?
#150313 closed
Apr 25, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float32 (__main__.TestForeachCUDA)
#151136 closed
Apr 25, 2025 -
DISABLED test_foreach_check_stride_ignore_dims_of_one_cuda_float32 (__main__.TestForeachCUDA)
#150026 closed
Apr 25, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_float16 (__main__.TestForeachCUDA)
#151114 closed
Apr 25, 2025 -
[AOTI] aoti_compile_and_package + use_runtime_constant_folding gives "Error: CUDA driver error: file not found"
#152067 closed
Apr 25, 2025 -
[`Torch 2.7.0 x Py 3.9`] Incompatible dep versions with networkx
#152191 closed
Apr 25, 2025 -
DISABLED test_pp_fsdp_dp_type_FSDP_ScheduleClass3 (__main__.ComposabilityTest)
#151089 closed
Apr 25, 2025 -
DISABLED test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass0 (__main__.ComposabilityTest)
#151083 closed
Apr 25, 2025 -
DISABLED test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass1 (__main__.ComposabilityTest)
#151084 closed
Apr 25, 2025 -
DISABLED test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass3 (__main__.ComposabilityTest)
#151090 closed
Apr 25, 2025 -
DISABLED test_pp_fsdp_dp_type_FSDP_MP_ScheduleClass2 (__main__.ComposabilityTest)
#151085 closed
Apr 25, 2025 -
DISABLED test_pp_fsdp_dp_type_FSDP_ScheduleClass1 (__main__.ComposabilityTest)
#151087 closed
Apr 25, 2025 -
DISABLED test_pp_ddp_ScheduleClass2 (__main__.ComposabilityTest)
#151082 closed
Apr 25, 2025 -
DISABLED test_pp_fsdp_dp_type_FSDP_ScheduleClass2 (__main__.ComposabilityTest)
#151088 closed
Apr 25, 2025 -
DISABLED test_pp_ddp_ScheduleClass1 (__main__.ComposabilityTest)
#151081 closed
Apr 25, 2025 -
DISABLED test_pp_ddp_ScheduleClass0 (__main__.ComposabilityTest)
#151078 closed
Apr 25, 2025 -
DISABLED test_pp_fsdp_dp_type_FSDP_ScheduleClass0 (__main__.ComposabilityTest)
#151086 closed
Apr 25, 2025 -
DISABLED test_parity__foreach_acos_fastpath_outplace_cuda_complex64 (__main__.TestForeachCUDA)
#151099 closed
Apr 25, 2025 -
ENH: Publish full-fledged tarballs also for release candidates
#150649 closed
Apr 25, 2025 -
Fix the Inconsistency and Description of `device_type` in `torch.random.fork_rng()`
#151784 closed
Apr 25, 2025 -
`out` should exist as an instance variable out of the func itself
#146676 closed
Apr 25, 2025 -
Size of `tau` can mismatch with the context in `torch.ormqr()`
#150674 closed
Apr 25, 2025 -
Whether `x` and `dx` can be used together in `torch.trapezoid()`?
#151105 closed
Apr 25, 2025 -
[ONNX] Dynamic shapes: support `torch.sym_not`
#136572 closed
Apr 25, 2025 -
What is the difference between normal_tensor.storage().use_count() and viewed_tensor's?
#152100 closed
Apr 25, 2025
121 Issues opened by 76 people
-
Torch BF16 group gemm hangs in backward pass - core issue isolated, needs proper resolution.
#152668 opened
May 2, 2025 -
DISABLED test_comprehensive_nansum_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152666 opened
May 2, 2025 -
UNSTABLE docker-cache-mi300 / docker-cache
#152655 opened
May 2, 2025 -
Check for if two tensors are overall similar instead of bitwise similar?
#152647 opened
May 2, 2025 -
ProcessGroupGloo.allgather_into_tensor_coalesced crashes with CUDA tensors
#152645 opened
May 1, 2025 -
static cuda launcher causes `RuntimeError: CUDA driver error: invalid device context` in torchtitan CI
#152639 opened
May 1, 2025 -
TestFlexAttentionCUDA.test_GQA_score_mod7_cuda_float16 fails on h100
#152635 opened
May 1, 2025 -
Incorrect strides for `nonzero_static` compilation
#152634 opened
May 1, 2025 -
DISABLED test_torchvision_models_efficientnet_v2_l (__main__.TestVisionTracing)
#152632 opened
May 1, 2025 -
[v2.7.1] Release Tracker
#152627 opened
May 1, 2025 -
modded-nanogpt flaky NCCL hang starting 3/30 nightly
#152623 opened
May 1, 2025 -
Pytorch Profiler crashes while using it with Pytorch Lightning module
#152617 opened
May 1, 2025 -
Enable AOTI for Metal inductor
#152612 opened
May 1, 2025 -
[triton pin update] Run Inductor CI on pin updates for Triton and the PyTorch nightly branch
#152608 opened
May 1, 2025 -
Loops impacting output when utilizing hooks
#152607 opened
May 1, 2025 -
AOTI regression on SAM and tts-angular
#152606 opened
May 1, 2025 -
ROCm, 7900 XTX: Pytorch SDPA is 2.5x slower than manual implementation with non-continuous v
#152595 opened
May 1, 2025 -
Flex Attention doesn't scale with custom bias
#152593 opened
May 1, 2025 -
[rattler-build] Cannot detect CUDA when building from source
#152592 opened
May 1, 2025 -
[MPS] Binary kernels produce incorrect results when one of the tensor arguments is from a wrapped scalar
#152582 opened
May 1, 2025 -
[Benchmark] High compilation time variance on benchmark dashboards
#152566 opened
Apr 30, 2025 -
DISABLED test_graph_partition_reorder_cpu_and_gpu_interleave (__main__.CudaGraphTreeTests)
#152561 opened
Apr 30, 2025 -
DISABLED test_pending_fusion_pro_and_epi (__main__.TestPrologueFusion)
#152560 opened
Apr 30, 2025 -
DISABLED test_comprehensive_signal_windows_hamming_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152559 opened
Apr 30, 2025 -
DISABLED test_comprehensive_amin_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152558 opened
Apr 30, 2025 -
PGO does not work on jobs for frameworks that copy code to different dirs at different attempts.
#152555 opened
Apr 30, 2025 -
MPS varying seq len SDPA memory leak
#152550 opened
Apr 30, 2025 -
FakeTensorUpdater does not trace nodes correctly
#152548 opened
Apr 30, 2025 -
optree package status in PyTorch
#152535 opened
Apr 30, 2025 -
AsyncCollectiveTensor doesn't trigger wait upon dtype cast
#152534 opened
Apr 30, 2025 -
The 2.7.0 release tarball is missing `.ci/docker/ci_commit_pins/nccl-cu12.txt` required for building
#152532 opened
Apr 30, 2025 -
[inductor][triton] Inductor is not compatible with the latest upstream Triton
#152531 opened
Apr 30, 2025 -
flex attention does not leverage masking, memory error
#152528 opened
Apr 30, 2025 -
can't reconstruct the communication group using PyTorch.
#152527 opened
Apr 30, 2025 -
DISABLED test_comprehensive_lu_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152520 opened
Apr 30, 2025 -
DISABLED test_comprehensive_repeat_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152500 opened
Apr 30, 2025 -
UNSTABLE Lint / Lint URLs / linux-job
#152489 opened
Apr 29, 2025 -
[dynamo] Try tracing into einops
#152480 opened
Apr 29, 2025 -
[dynamo] Dynamo fails to run torch.cat() with FakeTensors because it can't confirm 's0 + s1*u0' is nonzero
#152473 opened
Apr 29, 2025 -
DISABLED test_comprehensive_polygamma_polygamma_n_0_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152469 opened
Apr 29, 2025 -
DISABLED test_comprehensive_polygamma_polygamma_n_1_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152470 opened
Apr 29, 2025 -
torch.export with dynamic shapes on Static Cache HF LLama model fails
#152465 opened
Apr 29, 2025 -
[dynamo] `torch.compile` prevents fsdp warning from getting generated
#152451 opened
Apr 29, 2025 -
[dynamo] guard code generation triggers attribute error on DeviceMesh object
#152447 opened
Apr 29, 2025 -
`torch.compile` causes assertion error in distributed checkpoint wrapper test
#152442 opened
Apr 29, 2025 -
Inductor pattern matching on mutable ops
#152441 opened
Apr 29, 2025 -
Newly added lint-urls jobs are very flaky
#152439 opened
Apr 29, 2025 -
`nn.CrossEntropyLoss` accepts negative target probabilities
#152437 opened
Apr 29, 2025 -
[pt2] [AOTAutogradCache] Allow users to specify non torch functions as cacheable
#152434 opened
Apr 29, 2025 -
[Manylinux 2.28] Migrate Docker container to use gcc 14
#152426 opened
Apr 29, 2025 -
Silent incorrectness between static torch.compile vs eager
#152425 opened
Apr 29, 2025 -
Invalid handling of nans in compiled torch.quantile / torch.nanquantile on cuda
#152423 opened
Apr 29, 2025 -
torch.nn.functional.ctc_loss raises cuDNN error in PyTorch versions >=2.5.0
#152421 opened
Apr 29, 2025 -
DISABLED test_comprehensive_index_select_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152416 opened
Apr 29, 2025 -
DISABLED test_input_moved_to_cuda_device_script (__main__.TensorPipeCudaRemoteModuleTest)
#152415 opened
Apr 29, 2025 -
[DTensor] Calling .item() on DTensor with Partial placement results in local value
#152406 opened
Apr 29, 2025 -
[CPU][UT] 16 UT of test/inductor/test_cpu_select_algorithm.py failed with PyTorch 2025-04-28 nightly wheel
#152398 opened
Apr 29, 2025 -
Illegal Instruction Caused by `grid_sample` Under Windows
#152385 opened
Apr 28, 2025 -
Outdated contents in dynamo overview
#152381 opened
Apr 28, 2025 -
TORCH_COMPILE_DEBUG=1 does not consistently generate debug logs
#152374 opened
Apr 28, 2025 -
DISABLED test_reduce_stress_cuda (__main__.ProcessGroupGlooTest)
#152367 opened
Apr 28, 2025 -
[AOTI] Package lowered with package_constants_in_so=False still uses lots of memory when loaded
#152356 opened
Apr 28, 2025 -
Pin setuptools runtime dependency
#152355 opened
Apr 28, 2025 -
DISABLED test_e2e_compile_True_model_type1 (__main__.TestE2ESaveAndLoad)
#152349 opened
Apr 28, 2025 -
torch.nonzero_static is not documented on the website
#152347 opened
Apr 28, 2025 -
compile generates inefficient code for mutations on small slices of inputs
#152346 opened
Apr 28, 2025 -
Unusually slow draft_export time
#152337 opened
Apr 28, 2025 -
pin_memory crashes for big tensors and leaks page locked memory
#152335 opened
Apr 28, 2025 -
compile generates inefficient code when mutating small slice of a graph input
#152323 opened
Apr 28, 2025 -
DISABLED test_comprehensive_pca_lowrank_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#152318 opened
Apr 28, 2025 -
[DCP] failure case of save method
#152310 opened
Apr 28, 2025 -
Softmax Decomp Causes Incorrect Gradients when Using `torch.compile` with `F.multi_head_attention_forward`
#152309 opened
Apr 28, 2025 -
bizarre behavior with torch module's Attribute Error
#152308 opened
Apr 28, 2025 -
Recompile issue after fp8 conversion
#152307 opened
Apr 28, 2025 -
NCCL out of memory error after updating to PyTorch 2.7
#152302 opened
Apr 28, 2025 -
Unexpected result from `torch.xpu.is_bf16_supported()` when XPU is unavailable
#152301 opened
Apr 28, 2025 -
Unexpected behavior when using dist.all_reduce(x, op=dist.ReduceOp.SUM)
#152300 opened
Apr 28, 2025 -
`torch.compile()` produces incorrect results for `asinh_()` operation on large/small values
#152299 opened
Apr 28, 2025 -
Flex attention: batch-index-dependent block mask causes error with changing batch size
#152297 opened
Apr 28, 2025 -
`vmap` not working on `torch.arange`, `torch.scalar_tensor`, and `torch.ones`
#152295 opened
Apr 28, 2025 -
Unexpected overflow behavior when using `torch.addcmul`
#152294 opened
Apr 28, 2025 -
`torch.sparse.log_softmax` output mismatch between CPU and CUDA
#152293 opened
Apr 28, 2025 -
`torch==2.6` broke `nn.Module.dtype` typing
#152292 opened
Apr 28, 2025 -
[Intel GPU][PT2.8]scaled_dot_product_attention returns wrong output
#152290 opened
Apr 28, 2025 -
Error after successful build: No module named 'torch._C._distributed_c10d'
#152285 opened
Apr 27, 2025 -
Forward compatibility in torch.export
#152283 opened
Apr 27, 2025 -
Update `torch/nn/modules/conv.py` to use Literal for supported padding modes
#152280 opened
Apr 27, 2025 -
Make scaler.step() return if step was skipped or not
#152279 opened
Apr 27, 2025 -
MPS: Conv1d fails with NotImplementedError for output_channels > 65536
#152278 opened
Apr 27, 2025 -
`setup.py develop` command is disappearing soon from `setuptools`
#152276 opened
Apr 27, 2025 -
[cudagraphs][HF][torch 2.7] Excessive cudagraph re-recording for HF LLM models
#152275 opened
Apr 27, 2025 -
Question about that support of torch.compile for a custom CUDA operator?
#152270 opened
Apr 27, 2025 -
Arbitrary Code Execution Risk in `torch.distributed.utils.overload` When Misused in Type Annotations
#152269 opened
Apr 27, 2025 -
`iter()` and `reversed()` do not raise `StopIteration` when exhausted in torch.compile
#152262 opened
Apr 26, 2025 -
Context Parallel -- unsharded output doesn't match output without CP.
#152261 opened
Apr 26, 2025 -
[FR] Support BSHM-layout scaled_dot_product_attention without transpose.
#152257 opened
Apr 26, 2025 -
Windows inductor generated code without function declaration, and compile failed on MSVC.
#152251 opened
Apr 26, 2025 -
[DTensor] [distributed]: Operator aten.masked_fill_.Scalar does not have a sharding strategy registered
#152249 opened
Apr 26, 2025 -
NotImplementedError: Could not run 'aten::index.Tensor' with arguments from the 'SparseCUDA' backend.
#152226 opened
Apr 25, 2025 -
DISABLED test_pending_fusions_multiple (__main__.TestPrologueFusion)
#152221 opened
Apr 25, 2025 -
[C10D] Allow NCCL single P2P ops to use parent/collective communicator
#152220 opened
Apr 25, 2025 -
Have compiled autograd config API support nested compilation
#152219 opened
Apr 25, 2025 -
Outdated install commands
#152213 opened
Apr 25, 2025 -
Have cherry-pick bot always add the current release to the PR
#152212 opened
Apr 25, 2025 -
DISABLED test_reduce_stress_cuda (__main__.ProcessGroupGlooLazyInitTest)
#152201 opened
Apr 25, 2025 -
HUD Dashboard sort by perf speedup doesn't do anything
#152190 opened
Apr 25, 2025 -
The input for layers other than the first layer should be the hidden state from the previous layer.
#152188 opened
Apr 25, 2025 -
GroupNorm compilation errors on UNet-based architecture on torch >= 2.6.0
#152185 opened
Apr 25, 2025 -
write a custom ViewAndMutationmeta.__repr__
#152183 opened
Apr 25, 2025 -
GH200/GB200 NCCL Build Pytorch
#152182 opened
Apr 25, 2025 -
Raise an Error when File Not Found in `torch.jit.load()`
#152178 opened
Apr 25, 2025 -
DISABLED test_e2e_compile_True_model_type2 (__main__.TestE2ESaveAndLoad)
#152169 opened
Apr 25, 2025 -
DISABLED test_e2e_compile_True_model_type0 (__main__.TestE2ESaveAndLoad)
#152168 opened
Apr 25, 2025
510 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
[Inductor] FX backend via Wrapper IR
#146942 commented on
May 2, 2025 • 18 new comments -
[Cutlass] Integrate EVT into CUDACPPScheduling
#150906 commented on
May 2, 2025 • 17 new comments -
Random Batch Sampler Speedup
#147706 commented on
May 1, 2025 • 16 new comments -
[cp] dispatch flex_attention to CP impl in TorchDispatchMode
#151497 commented on
Apr 29, 2025 • 14 new comments -
[aotd] Support saved tensors hooks in aot_autograd
#150032 commented on
Apr 30, 2025 • 13 new comments -
[Inductor] Add decomposeK as an autotuning choice for mm
#150654 commented on
May 2, 2025 • 13 new comments -
[1/n][Optimus][Auto-AC] Support activation quantization without scaling
#148380 commented on
May 1, 2025 • 13 new comments -
Implement util function compute_global_tensor_shape for 1D device mesh
#151990 commented on
May 1, 2025 • 10 new comments -
[reland][ROCm] remove caffe2 from hipify
#151845 commented on
Apr 28, 2025 • 9 new comments -
[dynamo] replace `unimplemented` with `unimplemented_v2` in `variables/functions.py`
#151277 commented on
May 1, 2025 • 9 new comments -
[device_mesh] improve device selection logic
#150897 commented on
May 1, 2025 • 9 new comments -
[Inductor UT] Generalize device-bias code in `test_flex_attention.py`
#151937 commented on
May 2, 2025 • 8 new comments -
[ROCm][CI] Enabled fp8 distributed tests in test_micro_pipeline_tp.py for MI300
#151977 commented on
May 2, 2025 • 8 new comments -
Add infra to run CPython tests under Dynamo
#150787 commented on
May 1, 2025 • 7 new comments -
Cache code generation during triton template expansion and enable it for mm_template.
#151773 commented on
May 2, 2025 • 7 new comments -
Fix take_along_dim negative index handling (#146211)
#152161 commented on
Apr 29, 2025 • 6 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
May 1, 2025 • 6 new comments -
Unify how we create random inputs for auto-tuning
#152147 commented on
Apr 30, 2025 • 6 new comments -
Refactoring FSDP2 (_composable/fsdp) test cases to be device agnostic
#149848 commented on
Apr 30, 2025 • 6 new comments -
[CUDA] Replace deprecated usages of cub iterators and thread operators
#147493 commented on
Apr 29, 2025 • 6 new comments -
dynamically set tags
#152089 commented on
Apr 29, 2025 • 6 new comments -
[SymmMem] Add all-to-all
#151498 commented on
May 2, 2025 • 5 new comments -
[associative_scan] Refactoring of input checking and dynamo invocation
#148657 commented on
May 1, 2025 • 5 new comments -
NUMA Binding Integration with torchrun
#149334 commented on
Apr 29, 2025 • 5 new comments -
[dynamo] replace `unimplemented` with `unimplemented_v2` in `variables/torch_functions.py`
#151278 commented on
May 2, 2025 • 5 new comments -
Avoid differing results in `linalg.(tensor_)solve`
#151896 commented on
Apr 30, 2025 • 4 new comments -
Do not cover up `__dunder__` method type-hints from `.pyi` file
#150875 commented on
May 1, 2025 • 4 new comments -
Optimize LRScheduler docs
#146684 commented on
Apr 28, 2025 • 4 new comments -
Add `load_state_dict` hint doc about invoke order work with lr_scheduler
#149942 commented on
Apr 29, 2025 • 4 new comments -
autograd: Add VJP and JVP rules for aten::aminmax
#151186 commented on
May 2, 2025 • 4 new comments -
removed zero dim cpu logic from fake_tensor.py
#147501 commented on
May 1, 2025 • 3 new comments -
[Intel GPU] Support f32 intermediate dtype, headdim size <=576 and f32 causal mask for SDPA
#152091 commented on
Apr 28, 2025 • 3 new comments -
Move prologue_supported_inputs computations to def_kernel
#150869 commented on
May 2, 2025 • 3 new comments -
Add is_pinned to host allocator
#151439 commented on
Apr 29, 2025 • 3 new comments -
flex attention: fix dispatch order for tensor subclasses, avoid hardcoding call to faketensor impl in dynamo
#151719 commented on
Apr 30, 2025 • 3 new comments -
use vectorized loads and stores for all datatypes in torch.cat
#151818 commented on
Apr 28, 2025 • 3 new comments -
Enable type promotions in slice_scatter (pytorch#147842)
#151911 commented on
Apr 29, 2025 • 3 new comments -
Make `Adam`, `AdamW` work with nonzero-dim Tensor betas
#149939 commented on
May 1, 2025 • 3 new comments -
[WIP]: track remaining runtime asserts for backward codegen instead of trying to regenerate all
#151919 commented on
Apr 28, 2025 • 3 new comments -
auto functionalize base_hop
#151067 commented on
May 2, 2025 • 2 new comments -
Remove conda usage in condaenv.bat
#151035 commented on
May 2, 2025 • 2 new comments -
update get_default_device to also respect torch.device ctx manager
#148621 commented on
Apr 30, 2025 • 2 new comments -
[Inductor] Adjust boundary checking of dimensions using YBLOCK
#149504 commented on
Apr 28, 2025 • 2 new comments -
Parallelize sort using libstdc++ parallel mode
#150195 commented on
May 1, 2025 • 2 new comments -
API change for new enum in cusparseltsplitkmode-t for cusparseLT 0.7.0+
#150536 commented on
Apr 30, 2025 • 2 new comments -
[dynamic shapes] guard_or_false for infer_size
#152146 commented on
May 2, 2025 • 2 new comments -
Exempt overriding methods from docstring_linter (fix #151692)
#151906 commented on
Apr 29, 2025 • 2 new comments -
[OpenReg] Add _lazy_init and rng_state support for OpenReg
#151914 commented on
Apr 30, 2025 • 2 new comments -
Add AC_TRACER Infra TorchDispatchMode key
#152158 commented on
Apr 26, 2025 • 1 new comment -
[Graph Partition] Pass all cudagraph tree tests
#152048 commented on
Apr 30, 2025 • 1 new comment -
Replace `fw_metadata` info with trace log hint in hint message
#147365 commented on
Apr 25, 2025 • 1 new comment -
Improve cache key graph printing performance
#151928 commented on
Apr 29, 2025 • 1 new comment -
[Intel GPU][Inductor] Fallback embedding_dense_backward on XPU
#151637 commented on
Apr 29, 2025 • 1 new comment -
[ROCm] Maxpool forward NHWC Perf Improvement targeting Resnet scenarios
#151727 commented on
Apr 30, 2025 • 1 new comment -
Fix #150472 torch.library.custom_op doesn't handle single element tuples returns
#151408 commented on
Apr 29, 2025 • 1 new comment -
[AOTI][reland] Remove typedef for half and bfloat16
#151109 commented on
Apr 30, 2025 • 1 new comment -
[training] Adding NUMA support for pytorch
#150597 commented on
Apr 29, 2025 • 1 new comment -
[ROCm] Add support for SymmetricMemory
#150580 commented on
May 1, 2025 • 1 new comment -
[Inductor] Restrict block analysis to only match integer dims and strides
#149615 commented on
Apr 28, 2025 • 1 new comment -
Inductor logging + analysis of torch.profile
#149697 commented on
May 2, 2025 • 1 new comment -
[dynamic shapes] guard_or_false for computeStorageNbytes
#150483 commented on
Apr 25, 2025 • 1 new comment -
Add MPS support for getHostAllocator API
#151913 commented on
Apr 29, 2025 • 1 new comment -
[map] always turn on dynamo for map
#152041 commented on
Apr 29, 2025 • 1 new comment -
Enable AArch64 CI scripts to be used for local dev
#143190 commented on
Apr 30, 2025 • 1 new comment -
[pytree] simplify public API exposition with `__module__`
#148328 commented on
May 1, 2025 • 0 new comments -
[ATen][CUDA] Optimize 128 bit vectorization
#148320 commented on
May 2, 2025 • 0 new comments -
handle jk for emulation runs
#148240 commented on
May 2, 2025 • 0 new comments -
[Intel CPU] Fix issue #143483.
#144854 commented on
May 2, 2025 • 0 new comments -
Enable `_lazy_clone` between CPU and MPS
#148408 commented on
May 1, 2025 • 0 new comments -
set non_blocking to true in torch._foreach_copy_ to improve performance
#148431 commented on
Apr 28, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[p-z]*/` to `ruff format`
#144552 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/ao/` to `ruff format`
#148185 commented on
May 1, 2025 • 0 new comments -
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on
May 1, 2025 • 0 new comments -
Improvement with comprehensive docstrings and implementation of class method for the code.
#148170 commented on
Apr 29, 2025 • 0 new comments -
[Don't merge]Upgrade submodule oneDNN to v3.7 (#147498)(Z7)
#148163 commented on
Apr 30, 2025 • 0 new comments -
Replacing explicit backend search with api call
#144944 commented on
May 1, 2025 • 0 new comments -
Enable fp16 linear layers in PyTorch via ACL
#144992 commented on
Apr 26, 2025 • 0 new comments -
draft
#148160 commented on
Apr 29, 2025 • 0 new comments -
Checks kv pair indexing in OrderedPreservingDictTest.test_range_insert
#148136 commented on
May 1, 2025 • 0 new comments -
Use myst_nb in docs
#148105 commented on
Apr 28, 2025 • 0 new comments -
Enable XPU distributed test for PT2.8
#149916 commented on
Apr 29, 2025 • 0 new comments -
[WIP] no normalizations abstractions
#149899 commented on
Apr 28, 2025 • 0 new comments -
[Dynamo] Add easydict support
#149851 commented on
Apr 26, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[e-n]*/` to `ruff format`
#144553 commented on
May 1, 2025 • 0 new comments -
[cuDNN][SDPA] cuDNN SDPA refactor/cleanup, nested tensor backward, test priority bump for `sm90`, `sm100`
#149282 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/[a-c]*/` to `ruff format`
#144554 commented on
May 1, 2025 • 0 new comments -
[BE][Ez]: Update CU126 to CUDNN 12.8 too
#149254 commented on
Apr 30, 2025 • 0 new comments -
[Easy] update pip sources for CUDA in nightly pull tool
#149143 commented on
May 1, 2025 • 0 new comments -
Update the heuristic for AArch64 bmm/baddbmm
#149122 commented on
Apr 29, 2025 • 0 new comments -
Unrestrict some onlyCPU tests
#149095 commented on
Apr 29, 2025 • 0 new comments -
[test] bigger runner
#149003 commented on
Apr 30, 2025 • 0 new comments -
[Inductor] Record Triton’s Base32 Cache Key in .best_config for Debugging
#148981 commented on
Apr 29, 2025 • 0 new comments -
Move token linter code into tools/linter/adaptors/_linter/
#148959 commented on
May 1, 2025 • 0 new comments -
cpp_wrapper: build non-performance-sensitive code at O1
#148773 commented on
May 1, 2025 • 0 new comments -
Trunk workflow for Windows Arm64
#148753 commented on
May 2, 2025 • 0 new comments -
[BE][PYFMT] remove `black`: finish `black -> ruff format` migration
#144557 commented on
May 1, 2025 • 0 new comments -
[inductor] lowering for fractional_max_pool3d
#148630 commented on
Apr 28, 2025 • 0 new comments -
Adjust CMake code for Eigen
#148628 commented on
May 1, 2025 • 0 new comments -
[BE][pytree] cleanup parameterized pytree tests
#148569 commented on
May 1, 2025 • 0 new comments -
[triton hash update] update the pinned triton hash
#148492 commented on
May 2, 2025 • 0 new comments -
[Intel CPU] Fix issue #143482.
#144760 commented on
May 2, 2025 • 0 new comments -
Fix issue #146018: Improve CachingAutotuner handling
#147580 commented on
Apr 25, 2025 • 0 new comments -
Update pybind11 submodule to 3.0.0-dev test
#147524 commented on
Apr 30, 2025 • 0 new comments -
removed check for ConvTranspose3D on MPS
#145366 commented on
Apr 28, 2025 • 0 new comments -
Fix ptxas warnings on sm_120
#147491 commented on
Apr 25, 2025 • 0 new comments -
Documentation: fix RNN example for multiple layers
#147490 commented on
Apr 25, 2025 • 0 new comments -
[test] sccache log
#147470 commented on
May 1, 2025 • 0 new comments -
handle default in _NamedOptimizer
#147357 commented on
Apr 25, 2025 • 0 new comments -
[NOT_FOR_COMMIT] Try Triton-cpu-arm
#147341 commented on
Apr 26, 2025 • 0 new comments -
Fix torch.compile Fallback for Meta Device Tensors
#147339 commented on
Apr 27, 2025 • 0 new comments -
Optimize `Sequential` methods description
#147304 commented on
May 1, 2025 • 0 new comments -
[Easy] update pip sources for ROCm in nightly pull tool
#145685 commented on
May 1, 2025 • 0 new comments -
Do not use username for inductor default_cache_dir
#147291 commented on
Apr 26, 2025 • 0 new comments -
Fix clang-tidy warnings in torch/jit
#147253 commented on
May 2, 2025 • 0 new comments -
Add quantized BatchNorm1d module
#147113 commented on
Apr 25, 2025 • 0 new comments -
Porting Pytorch to AIX Operating System.
#146983 commented on
Apr 30, 2025 • 0 new comments -
cmake: fix detection logic when using system XNNPACK
#145853 commented on
Apr 26, 2025 • 0 new comments -
[ARM] Fix TestDataLoader.test_segfault unexpected success on AArch64
#146090 commented on
Apr 29, 2025 • 0 new comments -
[CI] Get rid of UCC builds
#146173 commented on
May 2, 2025 • 0 new comments -
Fix non-bitwise type annotations for Tensor operators (see #145838)
#146845 commented on
May 1, 2025 • 0 new comments -
[Inductor-CPU] FP16 X int8 WoQ GEMM for M <= 4 with FP16 accum & compute
#146781 commented on
Apr 27, 2025 • 0 new comments -
Enable pt2e quantization path for arm
#146690 commented on
Apr 28, 2025 • 0 new comments -
[2/N] Fix cppcoreguidelines-init-variables suppression
#146237 commented on
Apr 29, 2025 • 0 new comments -
Update code_template.py re.compile() is directly applied to the regex…
#146489 commented on
Apr 29, 2025 • 0 new comments -
[not for land] temp changes to enable 'simple_fsdp'
#146558 commented on
Apr 28, 2025 • 0 new comments -
[HOP] Mutation and alias rework
#146658 commented on
Apr 30, 2025 • 0 new comments -
Refactor layout constraint selection logic
#148104 commented on
May 2, 2025 • 0 new comments -
Fix test_tensorboard when started w/o tensorboard package
#148079 commented on
Apr 29, 2025 • 0 new comments -
use identity op for alpha=inf in torch.celu and quantized_celu
#148066 commented on
Apr 29, 2025 • 0 new comments -
Support `contextlib.suppress`
#147990 commented on
Apr 29, 2025 • 0 new comments -
xpu: test py_limited_api with SyclExtension
#147984 commented on
Apr 27, 2025 • 0 new comments -
[aot] reset aot counter on torch._dynamo.reset
#147915 commented on
Apr 27, 2025 • 0 new comments -
[DONOTLAND] Fix partial + scalar issue
#147910 commented on
Apr 27, 2025 • 0 new comments -
[PT2][Optimus][Opportunity Finder][1/n] Add opportunity finder in the inductor for GEMM horizontal fusion search
#147908 commented on
Apr 28, 2025 • 0 new comments -
Remerge of #144974
#147903 commented on
Apr 27, 2025 • 0 new comments -
Change persistent reduction threshold to 32
#147899 commented on
Apr 28, 2025 • 0 new comments -
Back out "use copy2d in h2d/d2h copy when possible (#146256)"
#147808 commented on
Apr 27, 2025 • 0 new comments -
test
#147800 commented on
Apr 28, 2025 • 0 new comments -
Upgrade to DLPack 1.0.
#145000 commented on
Apr 26, 2025 • 0 new comments -
Made partitioning more(?) deterministic
#145024 commented on
Apr 29, 2025 • 0 new comments -
[Inductor-CPU] Avoid memory allocator lock contention in the GEMM template
#147797 commented on
Apr 27, 2025 • 0 new comments -
[cuda] Add new gamma beta backwards kernel
#147773 commented on
Apr 26, 2025 • 0 new comments -
add pt2 testing for torch.float8_e8m0fnu
#147770 commented on
Apr 27, 2025 • 0 new comments -
[DCP][OSS] Rank local checkpointing in DCP without collectives
#147758 commented on
Apr 30, 2025 • 0 new comments -
Adding MVP of P1 INT16 Full
#147747 commented on
Apr 25, 2025 • 0 new comments -
Turn Stream into protocol and improve typing in torch/_C/__init__.pyi.in
#145239 commented on
Apr 30, 2025 • 0 new comments -
[Intel GPU] OneDNN primitive cache support for Int4 WOQ gemm on XPU
#147693 commented on
Apr 28, 2025 • 0 new comments -
[cuBLAS] restrict input range for `addmm` tests
#147658 commented on
Apr 28, 2025 • 0 new comments -
[sm100][sm120][fp8][CUDA] skip rowwise scaling tests on SM100+ for now
#147645 commented on
Apr 26, 2025 • 0 new comments -
Remove backend_type_map from Backend
#147635 commented on
Apr 27, 2025 • 0 new comments -
Refactor typing: Replace Any with ParamSpec for better type safety
#147582 commented on
Apr 26, 2025 • 0 new comments -
Add dynamo config to HOP-ify context managers
#152159 commented on
Apr 30, 2025 • 0 new comments -
[SymmMem] Add all_to_all_vdev
#151819 commented on
May 2, 2025 • 0 new comments -
Normalize dynamic size symbols in template codegen cache key.
#151778 commented on
Apr 28, 2025 • 0 new comments -
[Inductor] Modify TritonTemplate store_output function to support TMA stores
#151775 commented on
Apr 30, 2025 • 0 new comments -
[Inductor] Modify persistent+TMA template for Triton mm and addmm to use new TMA API
#151774 commented on
Apr 30, 2025 • 0 new comments -
[2/n][Optimus][Auto-AC] Support activation quantization with scaling
#151770 commented on
May 1, 2025 • 0 new comments -
Add adaptive_avg_pool2d input and output_size check
#151769 commented on
Apr 25, 2025 • 0 new comments -
[Don't merge] Upgrade oneDNN to v3.8 for XPU build
#151767 commented on
Apr 25, 2025 • 0 new comments -
torch.testing._internal.optests - MPS Support
#151758 commented on
Apr 28, 2025 • 0 new comments -
Use gather in index_select
#151715 commented on
May 2, 2025 • 0 new comments -
[dtensor] op_schema recursive check for symints
#151679 commented on
Apr 28, 2025 • 0 new comments -
[Intel GPU] Use user-friendly err msg in mm
#151655 commented on
Apr 28, 2025 • 0 new comments -
Add OIDC perms to windows-[build|test] workflows
#151596 commented on
May 1, 2025 • 0 new comments -
Add OIDC permissions to linux-test workflow
#151585 commented on
May 1, 2025 • 0 new comments -
Add OIDC permissions to linux-build workflow
#151581 commented on
May 1, 2025 • 0 new comments -
Update OpenBLAS commit
#151547 commented on
Apr 28, 2025 • 0 new comments -
Fix `InstanceNorm` wrong suggestion in warning message
#151534 commented on
May 1, 2025 • 0 new comments -
[WIP] Deprecate getPinnedMemoryAllocator use getHostAllocator instead
#151531 commented on
Apr 29, 2025 • 0 new comments -
Add OIDC permissions to bazel workflow
#151456 commented on
Apr 25, 2025 • 0 new comments -
Allow to byteswap data when reading saved torch jit data
#151447 commented on
May 1, 2025 • 0 new comments -
Implement fast exp for AVX2 and AVX512 for the flash attention
#151441 commented on
Apr 29, 2025 • 0 new comments -
[Cutlass] Add epilogue inputs/outputs to def_kernel
#151406 commented on
May 2, 2025 • 0 new comments -
[ROCm] Upgrade ROCm CI to ROCm6.4
#151368 commented on
May 2, 2025 • 0 new comments -
Fix skipIfXpu and skipIfHpu disables tests when used on class
#151315 commented on
Apr 30, 2025 • 0 new comments -
Add inductor backend to device interface; make minifier_tests more device agnostic
#151314 commented on
Apr 29, 2025 • 0 new comments -
[Export] Remove to() from module generated from exported program
#151307 commented on
Apr 28, 2025 • 0 new comments -
Use Allocator API raw_allocate & raw_dealloc in CUDAAllocator
#151305 commented on
Apr 25, 2025 • 0 new comments -
WIP: divup op
#152144 commented on
Apr 25, 2025 • 0 new comments -
[inductor] pass reduction idx to scan inner_fns
#152142 commented on
Apr 28, 2025 • 0 new comments -
Remove some instances of uninitialized memory use
#152132 commented on
Apr 29, 2025 • 0 new comments -
Update _torch_docs.py to Fix torch.bernoulli()
#152104 commented on
Apr 28, 2025 • 0 new comments -
Migrate to new Windows Arm64 runners
#152099 commented on
Apr 28, 2025 • 0 new comments -
Switch to standard pep517 sdist generation
#152098 commented on
May 1, 2025 • 0 new comments -
Work around MPSGraph issue in backward pass of nn.ReplicationPad1d/2d
#152094 commented on
Apr 28, 2025 • 0 new comments -
Add optional device index to AOTIModelPackageLoader
#152093 commented on
May 2, 2025 • 0 new comments -
[cuDNN][SDPA] Fix head-dim 256 condition for SM 10.0
#152076 commented on
May 1, 2025 • 0 new comments -
Test
#152055 commented on
Apr 28, 2025 • 0 new comments -
unbreak fb:operator_benchmark_test
#152049 commented on
May 1, 2025 • 0 new comments -
Ignore unused structured arguments in member functions
#152019 commented on
Apr 29, 2025 • 0 new comments -
Add CPython complex tests
#152015 commented on
May 1, 2025 • 0 new comments -
[Kineto] Upgrade the kineto commit to fb36cce
#152007 commented on
Apr 29, 2025 • 0 new comments -
[UniformValueConstantFolder] deduce value on CPU rather than on device
#151998 commented on
Apr 28, 2025 • 0 new comments -
[SymmMem] Use cub's BlockScan instead of in-house impl for offset calculation
#151993 commented on
May 2, 2025 • 0 new comments -
[inductor] Remove usage of autotune_fallback_to_aten outside inductor code
#151988 commented on
Apr 29, 2025 • 0 new comments -
Add torchcheck for replication_pad3d_backward
#151986 commented on
Apr 25, 2025 • 0 new comments -
[Intel GPU] undo broadcast on zero stride tensor for SDPA
#151976 commented on
Apr 29, 2025 • 0 new comments -
Inductor Tiling Rewrite
#151958 commented on
Apr 29, 2025 • 0 new comments -
[inductor] fix lowering for cummin, cummax for one element tensors
#151931 commented on
May 1, 2025 • 0 new comments -
[ROCm][CI] Update dockerfile to use centos9
#151929 commented on
May 1, 2025 • 0 new comments -
Skip fuse attention on fp32 if not tf32
#151924 commented on
Apr 28, 2025 • 0 new comments -
[WIP] Deprecate AcceleratorHooksInterface isPinnedPtr, use at::getHostAllocator()->is_pinned instead
#151916 commented on
Apr 29, 2025 • 0 new comments -
Deprecate pkg_resources and use distributions instead
#151915 commented on
Apr 29, 2025 • 0 new comments -
Add `LinearLR` compute lr formula in doc
#151894 commented on
Apr 25, 2025 • 0 new comments -
[inductor] Clean typing in codegen/common.py and codecache.py
#150767 commented on
Apr 28, 2025 • 0 new comments -
[Dynamo][Typing] Enable `@override` for VTs [1/N]
#150763 commented on
Apr 25, 2025 • 0 new comments -
[BE][CI][Easy] Run `lintrunner` on generated `.pyi` stub files
#150732 commented on
May 2, 2025 • 0 new comments -
[BE] Resolve lint errors in `.pyi` stub files
#150731 commented on
May 2, 2025 • 0 new comments -
[BE] Ensure generated stub files by `gen_pyi` are properly formatted
#150730 commented on
May 2, 2025 • 0 new comments -
[BE] Add `__all__` to `torch/nn/functional.pyi` and `torch/return_types.pyi`
#150729 commented on
May 2, 2025 • 0 new comments -
[BE] Update `.pyi` stub template to use Generic TypeAlias (PEP 585) and Union Type (PEP 604)
#150728 commented on
May 2, 2025 • 0 new comments -
[torchgen] Refactor and simplify `gen_pyi.py` to use Generic TypeAlias (PEP 585) and Union Type (PEP 604)
#150727 commented on
May 2, 2025 • 0 new comments -
[torchgen] Refactor `torchgen.utils.FileManager` to accept `pathlib.Path`
#150726 commented on
May 2, 2025 • 0 new comments -
Avoid overwriting COW data in MPS code
#150721 commented on
May 2, 2025 • 0 new comments -
[export] add runtime assert messages to python torch checks
#150719 commented on
Apr 25, 2025 • 0 new comments -
[WIP] Support XPU in memory tracker
#150703 commented on
Apr 30, 2025 • 0 new comments -
Raise `BufferError` for DLPack buffer-related errors.
#150691 commented on
Apr 26, 2025 • 0 new comments -
AOTI: add all fallback ops that are missing from C-shim
#150673 commented on
May 1, 2025 • 0 new comments -
[Inductor] Fix CUDA memory usage for CPU only compile
#150669 commented on
Apr 28, 2025 • 0 new comments -
Refactor `torch/utils/data/datapipes/gen_pyi.py` with `torchgen`
#150626 commented on
May 2, 2025 • 0 new comments -
Fix nn.LazyModuleMixin examples
#150596 commented on
May 1, 2025 • 0 new comments -
fix dynamic shapes for kwargs
#150583 commented on
Apr 30, 2025 • 0 new comments -
Enable lazy cloning in `Tensor.to` between CPU and MPS
#150569 commented on
May 2, 2025 • 0 new comments -
Inductor respects exact strides on custom ops by default
#150511 commented on
May 2, 2025 • 0 new comments -
[DLPack] Add support for missing keyword-arguments.
#150218 commented on
Apr 26, 2025 • 0 new comments -
Fix DLPack stream logic.
#150217 commented on
Apr 26, 2025 • 0 new comments -
[DLPack] add NumPy exchange tests.
#150216 commented on
Apr 26, 2025 • 0 new comments -
AOTI freezing: fix test issues and enable by default
#149961 commented on
May 1, 2025 • 0 new comments -
[inductor] Add more typing to _inductor/ir.py
#149959 commented on
Apr 28, 2025 • 0 new comments -
[inductor] Add typing to _inductor/ir.py
#149958 commented on
Apr 29, 2025 • 0 new comments -
[3/N] Use internal linkage in C++ files
#151297 commented on
Apr 30, 2025 • 0 new comments -
[WIP][SymmMem] Add sendrecv op
#151262 commented on
Apr 29, 2025 • 0 new comments -
[dynamo] keep C++ symbolic shape guards disabled for benchmarks
#151225 commented on
May 1, 2025 • 0 new comments -
Implement MKLGenerator
#151218 commented on
May 1, 2025 • 0 new comments -
Update slow tests
#151207 commented on
Apr 28, 2025 • 0 new comments -
Fix DWConv in QNNPACK for aarch32
#151191 commented on
Apr 26, 2025 • 0 new comments -
[dynamo] Prevent lazy variable realization on STORE_FAST
#151184 commented on
May 2, 2025 • 0 new comments -
Guard additional use of DriverAPI
#151125 commented on
Apr 28, 2025 • 0 new comments -
TESTING: IGNORE
#151116 commented on
May 1, 2025 • 0 new comments -
Update auto-tuning support for _scaled_grouped_mm
#150944 commented on
May 1, 2025 • 0 new comments -
fix shard tensor gather when a local tensor on certain ranks has zero elements
#150914 commented on
Apr 27, 2025 • 0 new comments -
[DO NOT MERGE] Throwaway changes
#150910 commented on
May 2, 2025 • 0 new comments -
[Inductor] Fix cuda_template.py typing
#150909 commented on
May 2, 2025 • 0 new comments -
[Inductor] Fix cuda_kernel typing
#150908 commented on
Apr 29, 2025 • 0 new comments -
[Cutlass] Changes to gemm template for EVT
#150907 commented on
May 2, 2025 • 0 new comments -
[device_mesh] replace dim_group_info with group_name
#150898 commented on
Apr 30, 2025 • 0 new comments -
Add CPython tests for iter/sort
#150797 commented on
May 1, 2025 • 0 new comments -
Add CPython generator/contextlib tests
#150796 commented on
May 2, 2025 • 0 new comments -
Add CPython int/float tests
#150795 commented on
May 1, 2025 • 0 new comments -
Add CPython math/cmath tests
#150794 commented on
May 1, 2025 • 0 new comments -
Add CPython string tests
#150793 commented on
May 1, 2025 • 0 new comments -
Add CPython set tests
#150792 commented on
May 1, 2025 • 0 new comments -
Add CPython dict tests
#150791 commented on
May 1, 2025 • 0 new comments -
Add CPython list/tuple tests
#150790 commented on
May 1, 2025 • 0 new comments -
Add CPython exception tests
#150789 commented on
May 1, 2025 • 0 new comments -
Add CPython tests for unittest
#150788 commented on
May 1, 2025 • 0 new comments -
The operator 'aten::_linalg_eigh.eigenvalues' is not currently implemented for the MPS device
#151874 commented on
Apr 28, 2025 • 0 new comments -
CUDA deps cannot be preloaded under Bazel
#117350 commented on
Apr 28, 2025 • 0 new comments -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int8 (__main__.TestForeachCUDA)
#150837 commented on
Apr 28, 2025 • 0 new comments -
DISABLED test_inductor_all_to_all_single (__main__.CompileTest)
#147795 commented on
Apr 28, 2025 • 0 new comments -
Bug with "make latexpdf"
#135420 commented on
Apr 28, 2025 • 0 new comments -
DeepSeek: mixed precision optimizers
#146542 commented on
Apr 28, 2025 • 0 new comments -
`2` and `-2` for `ord` argument of `linalg.norm()` should be explained more clearly
#136453 commented on
Apr 28, 2025 • 0 new comments -
[inductor] [cuda] [fake tensor] `torch.ones(x.size(0))` becomes a fake tensor for `torch.diagonal_scatter`
#151670 commented on
Apr 28, 2025 • 0 new comments -
Graph break on .t() when Tensor._make_subclass
#151771 commented on
Apr 28, 2025 • 0 new comments -
Inductor generates wrong code for `torch.embedding`
#151918 commented on
Apr 28, 2025 • 0 new comments -
Tracking issue: Incorrect Meta Strides / Turn On PyDispatcher in FakeTensor Mode
#145094 commented on
Apr 28, 2025 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
Apr 28, 2025 • 0 new comments -
Add maxcount Parameter to torch.unique and torch.unique_consecutive
#151722 commented on
Apr 28, 2025 • 0 new comments -
ExpandableMemorySegments not working on H100s/A100s
#122057 commented on
Apr 28, 2025 • 0 new comments -
[ROCm] QR decomposition is much slower on MI300x than A100
#151066 commented on
Apr 28, 2025 • 0 new comments -
Unacceptable OOMs all the time
#152135 commented on
Apr 28, 2025 • 0 new comments -
distributed: false positive Grad strides vs Bucket strides warning
#152042 commented on
Apr 28, 2025 • 0 new comments -
[inductor] nan_asserts doesn't work for FP8, "RuntimeError: "isinf" not implemented for 'Float8_e4m3fn'"
#149002 commented on
Apr 28, 2025 • 0 new comments -
Support loading and executing an ExportedProgram from torch.export in a C++ environment
#144663 commented on
Apr 28, 2025 • 0 new comments -
[C10D] Autograd Support for Collectives
#152131 commented on
Apr 28, 2025 • 0 new comments -
[torch.compile][export] `PendingUnbackedSymbolNotFound` for `torch.full`
#151980 commented on
Apr 28, 2025 • 0 new comments -
FlopCounterMode doesn't support HOP
#134385 commented on
Apr 28, 2025 • 0 new comments -
DISABLED test_inductor_all_reduce_non_contig_input (__main__.CompileTest)
#147733 commented on
Apr 28, 2025 • 0 new comments -
torch.compile does not support indexing with a tensor
#151997 commented on
Apr 28, 2025 • 0 new comments -
DISABLED test_comprehensive_native_layer_norm_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#152056 commented on
Apr 28, 2025 • 0 new comments -
[Dynamo] Exception raised inside torch.autocast causes crash: AttributeError: 'NoneType' object has no attribute 'is_python_constant'
#152012 commented on
Apr 28, 2025 • 0 new comments -
linear + relu don't fuse
#152101 commented on
Apr 28, 2025 • 0 new comments -
FlexAttention + Export / AOTI
#152128 commented on
Apr 28, 2025 • 0 new comments -
Compiling attention (SDPA) with nested tensors fails when using DDP
#152068 commented on
Apr 28, 2025 • 0 new comments -
RFC: Torch Native Runtime
#152034 commented on
Apr 28, 2025 • 0 new comments -
standalone_compile with training errors with no cache artifacts
#152022 commented on
Apr 28, 2025 • 0 new comments -
Adam optimizer ValueError: beta1 as a Tensor
#149508 commented on
Apr 26, 2025 • 0 new comments -
[BE] Consolidate PR labeling logic
#151579 commented on
Apr 29, 2025 • 0 new comments -
[Infra] Jobs got frequently cancelled, sometimes mid-checkout
#151669 commented on
Apr 29, 2025 • 0 new comments -
The docstring linter should not force overridden methods to be documented
#151692 commented on
Apr 29, 2025 • 0 new comments -
fbgemm packages are compiled in torchinductor torchbench tests
#152024 commented on
Apr 29, 2025 • 0 new comments -
[inductor] [silent incorrectness] `torch.nn.PairwiseDistance(p=2)` outputs incorrect results with eager
#151198 commented on
Apr 29, 2025 • 0 new comments -
Major perf regression with `BatchNorm2d` + `torch.compile` with `reduce-overhead` + DDP
#139207 commented on
Apr 29, 2025 • 0 new comments -
associative scan is incorrect for certain shapes/kwargs
#137943 commented on
Apr 29, 2025 • 0 new comments -
Support AC with graph break
#139989 commented on
Apr 29, 2025 • 0 new comments -
torch.compile + Huggingface GenerationMixin
#141196 commented on
Apr 29, 2025 • 0 new comments -
nn.MultiheadAttention causes gradients to become NaN under some use cases
#41508 commented on
Apr 29, 2025 • 0 new comments -
Expanding subset of tensor reads wrong memory
#151799 commented on
Apr 29, 2025 • 0 new comments -
aten.grid_sampler_3d.default is missing a c-shim implementation, using proxy executor as fallback
#147625 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_inductor_inplace_op_on_view (__main__.CompileTest)
#147852 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_uint8 (__main__.TestForeachCUDA)
#150878 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_int8 (__main__.TestForeachCUDA)
#149859 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_inductor_reduce_scatter_tensor_single (__main__.CompileTest)
#147911 commented on
Apr 29, 2025 • 0 new comments -
Expand Tag Set: views & reductions
#129020 commented on
Apr 29, 2025 • 0 new comments -
[inductor] [cpu] `torch.nn.RReLU()` doesn't respect `fallback_random` flag
#147255 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_serialized_patterns_up_to_date (__main__.TestPatternMatcher)
#135476 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_comprehensive_scatter_reduce_prod_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140294 commented on
Apr 29, 2025 • 0 new comments -
torch.compile fails in FSDP due to .data assignment with different floating type
#152162 commented on
Apr 29, 2025 • 0 new comments -
Update quantization to make source files compliant with /Zc:lambda
#92600 commented on
Apr 29, 2025 • 0 new comments -
[Inductor] define custom pass as list
#151876 commented on
Apr 29, 2025 • 0 new comments -
Loss parallel's override of log_softmax doesn't support negative dims
#152016 commented on
Apr 29, 2025 • 0 new comments -
`RuntimeError: UR error` with XPU
#149953 commented on
Apr 29, 2025 • 0 new comments -
"RuntimeError: makeDeviceForHostname(): unsupported gloo device" with nightly torch 2.8
#150381 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_distributed_checkpoint_state_dict_type0_cuda (__main__.TestDistributedCheckpointCUDA)
#145807 commented on
Apr 29, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_comprehensive_nn_functional_glu_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#140383 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_comprehensive_nanquantile_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139593 commented on
Apr 29, 2025 • 0 new comments -
[dynamo] Some inefficiencies around handling __torch_function__
#151776 commented on
Apr 28, 2025 • 0 new comments -
`torch.bmm` is slow on non-contiguous BF16 CPU tensors
#151934 commented on
Apr 28, 2025 • 0 new comments -
Negative index support for `take_along_dim`
#146211 commented on
Apr 26, 2025 • 0 new comments -
Add a `TORCH_LOGS_RANK=0` env var that integrates with `TORCH_LOGS`
#146913 commented on
Apr 26, 2025 • 0 new comments -
[inductor] [assertion error] `torch.select_scatter` crashes on inductor but passes on eager
#151296 commented on
Apr 26, 2025 • 0 new comments -
[mem profiler] mem fragmentation and pynvml view
#150574 commented on
Apr 25, 2025 • 0 new comments -
[AotInductor][Export][Triton] How to export custom Triton kernels when using torch.export.export
#151746 commented on
Apr 25, 2025 • 0 new comments -
AOTI cannot move tensors between cuda devices
#152130 commented on
Apr 25, 2025 • 0 new comments -
AOTInductor package can only be loaded on the first GPU (cuda:0) in C++ via AOTIModelPackageLoader
#152087 commented on
Apr 25, 2025 • 0 new comments -
[feature request][AOTI] Expand check input assertions to cover input guards created during compilation?
#151925 commented on
Apr 25, 2025 • 0 new comments -
Exported Module cannot call train() or eval()
#151726 commented on
Apr 25, 2025 • 0 new comments -
[export] deserialization for unbacked ranges is wrong
#151809 commented on
Apr 25, 2025 • 0 new comments -
DISABLED test_item_to_inputs_kernel_nobreak_cuda (__main__.TestInductorDynamicCUDA)
#119538 commented on
Apr 25, 2025 • 0 new comments -
Stack trace from pytest is very far away and hard to find on some tests
#141204 commented on
Apr 25, 2025 • 0 new comments -
Some Performance Bug in `tol` of `torch.lobpcg()`
#152154 commented on
Apr 25, 2025 • 0 new comments -
Note some limit in docstring of `padding` in Poolnd
#152156 commented on
Apr 25, 2025 • 0 new comments -
Some Doc Issue about `torch.lobpcg()`
#152107 commented on
Apr 25, 2025 • 0 new comments -
[Scaled MM] Update to support TN, NT, NN, and TT layouts on B200
#152150 commented on
Apr 25, 2025 • 0 new comments -
padding_mode `reflect` works different from others in Conv
#152152 commented on
Apr 25, 2025 • 0 new comments -
Documentation Clarification Needed for Clamping of Scale Coefficient in clip_grads_with_norm_
#151554 commented on
Apr 25, 2025 • 0 new comments -
[Inductor] track block shape of intermediary variables
#149905 commented on
Apr 25, 2025 • 0 new comments -
torch.nested.narrow() or torch.nested.to_padded_tensor() breaks backwards pass - invalid gradient
#145837 commented on
Apr 25, 2025 • 0 new comments -
Enable TorchInductor to Generate Matmuls Natively via `tl.dot`
#151705 commented on
Apr 25, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_bfloat16 (__main__.TestForeachCUDA)
#148965 commented on
Apr 25, 2025 • 0 new comments -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int16 (__main__.TestForeachCUDA)
#150772 commented on
Apr 25, 2025 • 0 new comments -
Exporting the operator 'aten::lift_fresh' to ONNX - not supported
#151932 commented on
Apr 25, 2025 • 0 new comments -
[ONNX] Flip `dynamo` default to True in torch.onnx.export
#151693 commented on
Apr 25, 2025 • 0 new comments -
Pinned memory doubles memory usage for tensors slightly over 128MB
#150517 commented on
Apr 25, 2025 • 0 new comments -
Inductor pattern matcher replaces aten.reshape with aten.view in pattern
#151649 commented on
Apr 25, 2025 • 0 new comments -
DISABLED test_ranks_and_tag (__main__.CompileTest)
#147974 commented on
Apr 25, 2025 • 0 new comments -
Xcode 16+: duplicate LC_RPATH '@loader_path'
#151592 commented on
Apr 25, 2025 • 0 new comments -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_float64 (__main__.TestForeachCUDA)
#150752 commented on
Apr 25, 2025 • 0 new comments -
[Inductor] atomic_add does not support bf16
#97016 commented on
Apr 25, 2025 • 0 new comments -
standalone compile FakeTensor from_graph detection with tensor subclass outputs
#151945 commented on
Apr 28, 2025 • 0 new comments -
Add operator name to the size/strides/alignment assertion
#151930 commented on
Apr 28, 2025 • 0 new comments -
Runtime assertion not generated in inductor for input unbacked symints
#151879 commented on
Apr 28, 2025 • 0 new comments -
Optimize printing sympy expressions during logging and cache key computation
#151823 commented on
Apr 28, 2025 • 0 new comments -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int64 (__main__.TestForeachCUDA)
#150822 commented on
Apr 28, 2025 • 0 new comments -
Support delay loading of c10.dll when using libtorch as a third-party library
#105058 commented on
Apr 28, 2025 • 0 new comments -
Torch 2.1 compile + FSDP (mixed precision) + LlamaForCausalLM: `RuntimeError: attempting to assign a gradient with dtype 'c10::BFloat16' to a tensor with dtype 'float'.`
#111317 commented on
Apr 28, 2025 • 0 new comments -
[inductor] [cpu] [edge case] When processing `torch.nan_to_num-.long()`, inductor outputs the `reciprocal` of eager
#151510 commented on
Apr 28, 2025 • 0 new comments -
[dynamo] Replace `unimplemented` with `unimplemented_v2`
#147913 commented on
Apr 28, 2025 • 0 new comments -
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on
Apr 28, 2025 • 0 new comments -
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on
Apr 28, 2025 • 0 new comments -
`torch.jit.script` does not respect `torch.set_default_dtype`
#150607 commented on
Apr 28, 2025 • 0 new comments -
DISABLED test_parity__foreach_abs_fastpath_outplace_cuda_int32 (__main__.TestForeachCUDA)
#150800 commented on
Apr 28, 2025 • 0 new comments -
Python 3.10 + intel-openmp failed to use numactl after import torch._C
#136307 commented on
Apr 28, 2025 • 0 new comments -
Torch compile issue, AttributeError: 'NoneType' object has no attribute 'store_cubin'
#150980 commented on
Apr 27, 2025 • 0 new comments -
Elastic training crashes on killed agent
#150916 commented on
Apr 27, 2025 • 0 new comments -
[Torch Profiler] Only two streams captured in CUDA graph but multiple streams shown in Torch Profiler
#152114 commented on
Apr 27, 2025 • 0 new comments -
DeepSeek: a2a communication with metadata on the GPU
#146329 commented on
Apr 27, 2025 • 0 new comments -
ROCm+gcc 15 asserts
#145608 commented on
Apr 27, 2025 • 0 new comments -
[FSDP2][DTensor] numeric bug for DTensor + python float in gradient clipping
#149768 commented on
Apr 26, 2025 • 0 new comments -
caching keys+values in TransformerDecoderLayer for faster inference
#107573 commented on
Apr 26, 2025 • 0 new comments -
[release] Make pytorch source distribution package respect pep-0517
#150461 commented on
Apr 26, 2025 • 0 new comments -
Change the type hint for nn.Module.__call__ to be friendly to overrides.
#74746 commented on
Apr 26, 2025 • 0 new comments -
[inductor] [cuda] [silent incorrectness] `F.softmax-torch.argsort` output silent incorrectness when tensor input is very large
#151745 commented on
Apr 26, 2025 • 0 new comments -
[feature request] [ux] Frontend methods for fused elementwise affine transform: mul+add+dtype convert + support `integer_tensor.mul_(float_constant)` and `float_tensor.mul(some_constant, out = integer_tensor)` maybe via new args `rounding_mode=...` and `dtype=...` + maybe support OpenCV-style saturated dtype conversions (e.g. `clamp_` before conversion)
#106624 commented on
Apr 26, 2025 • 0 new comments -
[feature request] torch.mix function to generalize/symmetrize addcmul
#104849 commented on
Apr 26, 2025 • 0 new comments -
`torch.lerp` to support argument type promotion / broadcasting - including of `input` / `end` arguments
#57947 commented on
Apr 26, 2025 • 0 new comments -
Wrong formula for CosineAnnealingLR
#152081 commented on
Apr 26, 2025 • 0 new comments -
Build pytorch for rocm failed
#148167 commented on
Apr 26, 2025 • 0 new comments -
[RFC] Add new CPP builder for inductor on pytorch Windows
#124245 commented on
Apr 26, 2025 • 0 new comments -
Torch Inductor Windows Path Escape Characters
#135954 commented on
Apr 26, 2025 • 0 new comments -
Compilation of the post-training quantized model using Nvidia ModelOpt is failing with the error: Unsupported — 'inline in skipfiles: QuantLinearConvBase.quantize_weight
#151450 commented on
Apr 26, 2025 • 0 new comments -
refine fp32 precision api
#125888 commented on
Apr 30, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#125806 commented on
May 2, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
May 2, 2025 • 0 new comments -
[pytree] support PyStructSequence types for Python pytree
#113258 commented on
May 1, 2025 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
May 2, 2025 • 0 new comments -
[ATen][Sparse] Use Third-Party Eigen for sparse addmm
#101814 commented on
Apr 30, 2025 • 0 new comments -
Performance regression on modded-nanogpt torch-2.7.0.dev20250208→torch-2.7.0.dev20250209
#147463 commented on
May 2, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_int16 (__main__.TestForeachCUDA)
#150309 commented on
May 2, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cuda (__main__.GPUTests)
#151541 commented on
May 2, 2025 • 0 new comments -
DISABLED test_remove_noop_view_dtype_cpu (__main__.CpuTests)
#151540 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_floor_cuda_float16 (__main__.TestInductorOpInfoCUDA)
#152058 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_nansum_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140693 commented on
May 2, 2025 • 0 new comments -
DISABLED AotInductorTest.FreeInactiveConstantBufferRuntimeConstantFoldingCuda (build.bin.test_aoti_inference)
#150299 commented on
May 2, 2025 • 0 new comments -
DISABLED AotInductorTest.FreeInactiveConstantBufferCuda (build.bin.test_aoti_inference)
#149495 commented on
May 2, 2025 • 0 new comments -
DISABLED test_comprehensive_bitwise_right_shift_cuda_int32 (__main__.TestInductorOpInfoCUDA)
#152057 commented on
May 2, 2025 • 0 new comments -
[ued] Slow start up time for `torch.compile` on GGUF Auraflow
#150706 commented on
May 2, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float64 (__main__.TestForeachCUDA)
#150298 commented on
May 2, 2025 • 0 new comments -
`context_parallel` fails for training with `RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation`
#149306 commented on
May 2, 2025 • 0 new comments -
The state of sparse Tensors
#9674 commented on
May 1, 2025 • 0 new comments -
[Feature request] Exclusive prefix sum, `torch.cumsum(input, dim=0, exclusive=True)`
#76191 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on
May 1, 2025 • 0 new comments -
`view()` + modify-in-place fails silently with DTensor
#147570 commented on
May 1, 2025 • 0 new comments -
LoadHIP.cmake should find_package(composable_kernel)
#149809 commented on
May 1, 2025 • 0 new comments -
at::BlasBackend::Ck does not handle all ROCm BLAS gpus
#150187 commented on
May 1, 2025 • 0 new comments -
[ROCm] PyTorch slow on TTS
#150168 commented on
May 1, 2025 • 0 new comments -
Support SDPA flash attention / memory-efficient attention on ROCm gfx908
#141958 commented on
May 1, 2025 • 0 new comments -
[RFC] : Dynamically Quantized 8-bit Matrix Multiplication support
#149500 commented on
May 1, 2025 • 0 new comments -
dynamo cannot trace global op_set.__contains__
#145761 commented on
May 1, 2025 • 0 new comments -
Investigate FlexAttention performance degradation on low precision inputs
#147336 commented on
May 1, 2025 • 0 new comments -
When using torch to convert to an ONNX model, testing the inference results with actual images shows a tensor mismatch
#152097 commented on
May 1, 2025 • 0 new comments -
Dynamo unsupported: dynamic padding
#123855 commented on
May 1, 2025 • 0 new comments -
[ROCm] MI300X FP8 scaled_mm is extremely slow on nightly
#143465 commented on
Apr 30, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `torch/_[a-h]*/` to `ruff format`
#144551 commented on
May 1, 2025 • 0 new comments -
[BE][PYFMT] migrate PYFMT for `{torch,test}/{nn,optim}/**` to `ruff format`
#144548 commented on
May 1, 2025 • 0 new comments -
codecache: Remove cpp_prefix.h duplication per build, then precompile it
#144293 commented on
May 1, 2025 • 0 new comments -
[ci] Add riscv opt-int build
#143979 commented on
Apr 27, 2025 • 0 new comments -
Make init_method deprecated to fix TCP connection refused error
#143858 commented on
Apr 25, 2025 • 0 new comments -
Make functionalization `ViewMeta` serializable with pickle.
#143712 commented on
Apr 27, 2025 • 0 new comments -
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on
Apr 29, 2025 • 0 new comments -
[while_loop][jit inductor] auto-unspecialize int input and output to unbacked symints
#143457 commented on
Apr 30, 2025 • 0 new comments -
Fix type annotation of `Linear.bias`
#142326 commented on
Apr 26, 2025 • 0 new comments -
remove redundant assign
#140399 commented on
Apr 27, 2025 • 0 new comments -
[pt2e][quant] Add simple cond support for annotate, prepare and convert
#140323 commented on
Apr 27, 2025 • 0 new comments -
cpu: enable gemm-bf16f32 for SDPA BF16
#140159 commented on
Apr 29, 2025 • 0 new comments -
Support LOAD_BUILD_CLASS opcode in dynamo
#139561 commented on
Apr 29, 2025 • 0 new comments -
Use the device interface for detecting Triton availability
#139171 commented on
May 1, 2025 • 0 new comments -
Fix bug of torch.nn.functional.kl_div when broadcast happened
#138810 commented on
Apr 27, 2025 • 0 new comments -
Add overflow check for negative integer div_floor and div_trunc on CPU
#138684 commented on
Apr 30, 2025 • 0 new comments -
Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512.
#138388 commented on
Apr 29, 2025 • 0 new comments -
[pytree] add `treespec_{leaf,tuple,dict}` functions for args_spec modification
#138214 commented on
May 1, 2025 • 0 new comments -
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on
Apr 29, 2025 • 0 new comments -
[BE]: Try turning on LTO in CMake in CI
#137866 commented on
Apr 25, 2025 • 0 new comments -
Load cuda deps more aggressively
#137059 commented on
Apr 28, 2025 • 0 new comments -
Add back DistributedDataParallel types that were lost when pyi was removed
#136835 commented on
Apr 29, 2025 • 0 new comments -
[Don't Merge] Try to build custom ops with MKL XPU
#133658 commented on
Apr 29, 2025 • 0 new comments -
add ranking for grouped benchmarks
#133287 commented on
May 2, 2025 • 0 new comments -
[torch.special] Adding betainc, betaincc, betaincinv, betainccinv, betaln and beta with backward operation
#132135 commented on
Apr 29, 2025 • 0 new comments -
[1/N] Use 3.25.3 as the minimum CMake version
#130522 commented on
May 1, 2025 • 0 new comments -
[pytree] implement key path APIs for CXX pytree
#130141 commented on
Apr 29, 2025 • 0 new comments -
Remove direct dependency of protobuf from CMake
#127919 commented on
Apr 27, 2025 • 0 new comments -
[inductor] enable bf32 for mkldnn linear pointwise/binary in inductor
#127294 commented on
Apr 30, 2025 • 0 new comments -
[inductor] enable bf32 test for mkldnn conv
#127293 commented on
Apr 30, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv backward
#126054 commented on
Apr 30, 2025 • 0 new comments -
allow to use bf16 as fp32 internal precision for mkldnn conv
#126050 commented on
Apr 30, 2025 • 0 new comments -
Profiler doesn't seem to work on AMD CPUs
#150052 commented on
Apr 30, 2025 • 0 new comments -
[ROCm] sdpa group query attention bf16 numeric error
#139352 commented on
Apr 30, 2025 • 0 new comments -
[NJT] `.bmm`'s BmmBackward0 fails compilation when second arg requires grad
#152122 commented on
Apr 30, 2025 • 0 new comments -
[RFC] zentorch Integration
#150296 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_remove_noop_slice1_cuda (__main__.GPUTests)
#151381 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_remove_noop_slice1_cpu (__main__.CpuTests)
#151379 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_scatter_cuda (__main__.GPUTests)
#151378 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_scatter_cpu (__main__.CpuTests)
#151382 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bool (__main__.TestForeachCUDA)
#150120 commented on
Apr 30, 2025 • 0 new comments -
Implementation of a numerically stable log(1 - softmax) function in PyTorch
#129657 commented on
Apr 30, 2025 • 0 new comments -
Release artifacts for rc releases
#124759 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_fake_registration (__main__.TestOpProfiles)
#151301 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cuda (__main__.GPUTests)
#151383 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_bfloat16 (__main__.TestForeachCUDA)
#150119 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_matrix_rank_basic_cuda_float32 (__main__.TestLinalgCUDA)
#150406 commented on
Apr 30, 2025 • 0 new comments -
Grad strides do not match bucket view strides.
#47163 commented on
Apr 30, 2025 • 0 new comments -
[torch/elastic] unexpected behavior of torch elastic
#147064 commented on
Apr 30, 2025 • 0 new comments -
[ONNX] Create a message to suggest users setting dynamo=True when exporting
#152025 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#150902 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_aoti (__main__.TestMemoryPlanning)
#145211 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_comprehensive_nansum_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#139710 commented on
Apr 30, 2025 • 0 new comments -
Error when using torch.fx on bert
#67970 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_parity__foreach_abs_fastpath_inplace_cuda_bfloat16 (__main__.TestForeachCUDA)
#148966 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_repeated_calling_cuda (__main__.AOTInductorTestABICompatibleGpu)
#146185 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_inductor_reuse_buffer_after_inplace_collective (__main__.CompileTest)
#147950 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_vdd_clamp_dynamic_shapes_cpu (__main__.DynamicShapesCpuTests)
#134445 commented on
Apr 30, 2025 • 0 new comments -
[C10D] Make collectives backwards throw an error
#152127 commented on
Apr 30, 2025 • 0 new comments -
Numerical inaccuracies in "ddp_apply_optim_in_backward" unit tests for gloo backend
#111834 commented on
Apr 29, 2025 • 0 new comments -
DISABLED test_duplicate_registration_impl (__main__.TestOpProfiles)
#151281 commented on
Apr 29, 2025 • 0 new comments -
Is it possible to remove NCCL submodule and use only nccl binaries from pypi instead ?
#144768 commented on
Apr 29, 2025 • 0 new comments -
`version.txt` mismatch with tags in release branch
#151425 commented on
Apr 29, 2025 • 0 new comments -
Status of pip wheels with _GLIBCXX_USE_CXX11_ABI=1
#51039 commented on
May 1, 2025 • 0 new comments -
DTensor slicing on sharded dimension leads to replication
#149447 commented on
May 1, 2025 • 0 new comments -
[graph pickler] [inductor compile async] imprecise filter for non standard op?
#151904 commented on
May 1, 2025 • 0 new comments -
DISABLED test_parity__foreach_acos_fastpath_inplace_cuda_complex128 (__main__.TestForeachCUDA)
#150933 commented on
May 1, 2025 • 0 new comments -
DISABLED test_binary_op_with_scalar_self_support__foreach_pow_is_fastpath_True_cuda_complex128 (__main__.TestForeachCUDA)
#149323 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float32 (__main__.TestForeachCUDA)
#150208 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cpu (__main__.CpuTests)
#151512 commented on
May 1, 2025 • 0 new comments -
RFC: The State of Custom CUDA extensions in PyTorch
#152032 commented on
May 1, 2025 • 0 new comments -
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
#127176 commented on
May 1, 2025 • 0 new comments -
AOTI packaged model fails with generic error when run in for loop but succeeds on individual sample
#146524 commented on
May 1, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
May 1, 2025 • 0 new comments -
DISABLED test_int64_upsample3d_cuda_bfloat16 (__main__.TestTorchDeviceTypeCUDA)
#146007 commented on
May 1, 2025 • 0 new comments -
DISABLED test_comprehensive_special_xlog1py_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#140648 commented on
May 1, 2025 • 0 new comments -
Ability to do aot/inductor compilation from a jit model (or torch.exported model)
#127928 commented on
May 1, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
May 1, 2025 • 0 new comments -
Memory Leak in MPS Backend During LSTM Iterations (Out of Memory Error)
#145374 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_float16 (__main__.TestForeachCUDA)
#150173 commented on
May 1, 2025 • 0 new comments -
[Async TP] all-gather-matuls not fusing properly when rowwise scales are used
#149990 commented on
May 1, 2025 • 0 new comments -
[Tracker] Nested tensor op coverage requests
#118107 commented on
May 1, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex64 (__main__.TestForeachCUDA)
#150161 commented on
May 1, 2025 • 0 new comments -
DISABLED test_comprehensive_nanmean_cuda_float64 (__main__.TestInductorOpInfoCUDA)
#140339 commented on
May 1, 2025 • 0 new comments -
DISABLED test_inductor_all_gather_into_tensor_coalesced (__main__.CompileTest)
#146806 commented on
May 1, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on
May 1, 2025 • 0 new comments -
DISABLED test_is_isnot (__main__.TestScript)
#120694 commented on
May 1, 2025 • 0 new comments -
[CI] No workflows scheduled on PRs
#151322 commented on
Apr 30, 2025 • 0 new comments -
[inductor] Improve codegen for argmax+max
#146643 commented on
Apr 30, 2025 • 0 new comments -
DISABLED test_foreach_copy_with_multi_dtypes__foreach_copy_cuda_complex128 (__main__.TestForeachCUDA)
#150141 commented on
Apr 30, 2025 • 0 new comments -
missing docs for torch.Tag
#126518 commented on
Apr 30, 2025 • 0 new comments -
Quantile is limited to 16 million elements and have poor performance.
#64947 commented on
Apr 30, 2025 • 0 new comments -
enhance documentation around the developer build
#108406 commented on
Apr 30, 2025 • 0 new comments -
Training/Fine-tuning fails with PyTorch 2.8 + 4x 5090 GPUs using DDP/FSDP/DeepSpeed
#150734 commented on
Apr 30, 2025 • 0 new comments -
torch.compile on MPS progress tracker
#150121 commented on
Apr 30, 2025 • 0 new comments