Pulse · pytorch/pytorch · GitHub

August 4, 2025 – August 11, 2025

Overview

239 Active pull requests
- 0 Merged pull requests
- 239 Open pull requests

230 Active issues
- 115 Closed issues
- 115 New issues

1 Release published by 1 person

v2.8.0 PyTorch 2.8.0 Release
published Aug 6, 2025

239 Pull requests opened by 125 people

[WIP][symm_mem] Add a wait for signal and put signal for one side API
#159837 opened Aug 5, 2025
Improve README.md formatting and fix documentation errors
#159841 opened Aug 5, 2025
[AOTInductor] ABI-Compatibility for RecordFunction.
#159842 opened Aug 5, 2025
Revert "[BE] Update xpu driver repo for CD used almalinux 8.10 (#1573…
#159849 opened Aug 5, 2025
Guard the CPU cpp wrapper tests on having a cpp wrapper
#159850 opened Aug 5, 2025
[dtensor] fix incorrect norm calculation for Partial DTensors
#159856 opened Aug 5, 2025
[Do not merge][TensorPipe] Test PR https://github.com/pytorch/tensorpipe/pull/464
#159857 opened Aug 5, 2025
[ROCm] Clean up CUDA state between tests
#159858 opened Aug 5, 2025
Added PyTorch LUT optimisation for GELU bf16 operators
#159859 opened Aug 5, 2025
Issue 146167 inductor type hints lowering 1
#159861 opened Aug 5, 2025
Fix skipIfXpu and skipIfHpu and similar skip decorators
#159862 opened Aug 5, 2025
Implement `list(UserDefinedObject)` via `force_unpack_var_sequence`
#159864 opened Aug 5, 2025
[collections.abc] Ensure that binop calls works with UserDefinedObjects
#159865 opened Aug 5, 2025
error message for instantiating CUDA Stream if CUDA not available
#159868 opened Aug 5, 2025
Add linux-aarch64 and windows python 3.14 nightly builds
#159869 opened Aug 5, 2025
ParamwiseScheduler for different schedulers for specific parameter groups
#159873 opened Aug 5, 2025
[pytorch] Moving torch.compile worker process logs to a dedicated rank based log directory
#159874 opened Aug 5, 2025
[ca] enable on PYTORCH_TEST_WITH_INDUCTOR
#159875 opened Aug 5, 2025
Automated submodule update: tensorpipe
#159876 opened Aug 5, 2025
Fix meta for constant_pad_nd
#159878 opened Aug 5, 2025
Make distributed modules importable without distributed build
#159889 opened Aug 5, 2025
Add tests for Gaussian Mixture Model numerical consistency
#159893 opened Aug 5, 2025
tools: Add option to log build output to a file
#159895 opened Aug 5, 2025
Allow torch.hub.load with unauthorized GITHUB_TOKEN
#159896 opened Aug 5, 2025
[DTensor] Migrate tests to Continuous test base
#159898 opened Aug 5, 2025
[pt2e] Avoid getting model device once per node
#159901 opened Aug 5, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_collections`
#159902 opened Aug 5, 2025
[BE] Add linter to detect unused docker images
#159905 opened Aug 5, 2025
[DO NOT MERGE] Perf-testing #158137 addmm_fusion 2
#159909 opened Aug 5, 2025
[CUDA] Bump tolerances for `test_baddmm`
#159915 opened Aug 5, 2025
Fix typo
#159919 opened Aug 6, 2025
Close some sources of fake tensor leakages
#159923 opened Aug 6, 2025
[AOTI] Check if triton is installed when BUILD_AOT_INDUCTOR_TEST=1
#159934 opened Aug 6, 2025
added class or module info for functions blocked by weight-only load
#159935 opened Aug 6, 2025
Fix AdaptiveMaxPoll index error
#159936 opened Aug 6, 2025
[WIP][device_mesh] Move global state into class method
#159937 opened Aug 6, 2025
[Intel GPU PT2E] Infer runtime out dtype based on dequant node in pattern
#159941 opened Aug 6, 2025
[CD] Add ptl aot target in xpu windows build
#159943 opened Aug 6, 2025
[WIP] enable some tests in test_ops.TestCommon on Intel GPU
#159944 opened Aug 6, 2025
avoid bit cast for bfloat16_t
#159946 opened Aug 6, 2025
Update SECURITY.md - added branded name as PyTorch instead of Pytorch and others.
#159953 opened Aug 6, 2025
[inductor] fix triton bucketize mask propagation
#159961 opened Aug 6, 2025
fix fake_tensor for aten._to_copy
#159964 opened Aug 6, 2025
Fix redundant move warnings in dim.cpp
#159966 opened Aug 6, 2025
Support device ordering in Shard
#159967 opened Aug 6, 2025
Add mast job name from env variable
#159971 opened Aug 6, 2025
[bucketing] Bucket only adjacent collectives to prevent reordering
#159983 opened Aug 6, 2025
[ROCm][inductor][dashboard] Add GPT2ForSequenceClassification to use_larger_multiplier_for_smaller_tensor list
#160001 opened Aug 6, 2025
Allow setting quantized engine to none
#160003 opened Aug 6, 2025
[export] Fix custom ops in subgraphs
#160004 opened Aug 6, 2025
Set PYTHONHOME for inductor subprocesses using torch
#160008 opened Aug 6, 2025
[torch.mtia] fix a bug in storage.resize_()
#160017 opened Aug 6, 2025
[1/3][ghstack] [vllm ci build setup ]setup lumen_cli
#160043 opened Aug 7, 2025
[ghstack] setup torch_cli build
#160044 opened Aug 7, 2025
Update (base update)
#160045 opened Aug 7, 2025
ci: Reduce the amount of log spam for manywheel
#160055 opened Aug 7, 2025
ci: Remove app from GH_CHECKSUITES_FRAGMENT
#160056 opened Aug 7, 2025
Fix profiler stack trace names
#160058 opened Aug 7, 2025
Error when there is side effect in strict mode
#160060 opened Aug 7, 2025
Clarify EMA equation in get_ema_multi_avg_fn docstring
#160061 opened Aug 7, 2025
Update torch-xpu-ops commit pin
#160062 opened Aug 7, 2025
Fix hpu backend mapping issue
#160063 opened Aug 7, 2025
Move hardware_destructive_interference_size to c10/core/alignment.h
#160067 opened Aug 7, 2025
[DO NOT MERGE] Stress Test MI325 Capacity.
#160073 opened Aug 7, 2025
Bump to ONNX 1.19.0
#160076 opened Aug 7, 2025
Enable prioritized linker optimization for AArch64 in setup.py and clean up CI script
#160078 opened Aug 7, 2025
Make manylinux build.sh work for AArch64 and AArch64+CUDA builds
#160079 opened Aug 7, 2025
POC for VLA SVE Vectorized class
#160080 opened Aug 7, 2025
[ROCm] [inductor] Added test skips for small ROCm gpus
#160081 opened Aug 7, 2025
Generalize Block/Pool in device caching allocator
#160082 opened Aug 7, 2025
[cpp][inductor] Fix crash on bmm when input is used twice.
#160087 opened Aug 7, 2025
set up vllm build logics in torch_cli
#160088 opened Aug 7, 2025
[2/3 step][ vllm ci build setup] Add vlllm buld logic and dockerfile
#160089 opened Aug 7, 2025
Update (base update)
#160090 opened Aug 7, 2025
[DO NOT MERGE] Testing with TOT Kineto
#160091 opened Aug 7, 2025
[fx] fix split_module with symint
#160093 opened Aug 7, 2025
Try fix AC tag propagation in compile when not using dynamo
#160096 opened Aug 7, 2025
Fix flight recorder for P2P ops
#160097 opened Aug 7, 2025
Add ownership token when needed on GradientEdge
#160098 opened Aug 7, 2025
[OpenReg] Add Event&Stream Support for OpenReg Backend
#160099 opened Aug 7, 2025
[OpenReg] Integrate Event&Stream from OpenReg Backend into PyTorch
#160100 opened Aug 7, 2025
[OpenReg] Improve the Event and Stream capabilities of DeviceGuardImplInterface
#160101 opened Aug 7, 2025
[TEST] Revert "[ROCm][CI] upgrade to 6.4.2 patch release (#158887)"
#160103 opened Aug 7, 2025
[ROCm] Integrate AITER Fav3 fwd kernels
#160105 opened Aug 7, 2025
Add cutedsl template support to compile
#160108 opened Aug 7, 2025
Add flash attention impl to flex attention
#160109 opened Aug 7, 2025
switch prefer_deferred_runtime_asserts_over_guards in export
#160111 opened Aug 7, 2025
ci: Update test_trymerge for stale data
#160112 opened Aug 7, 2025
[inductor] Estimate peak memory allocfree and applying to reordering collectives
#160113 opened Aug 7, 2025
[3/3][ghstack][vllm ci build setup]vllm build workflow
#160116 opened Aug 7, 2025
Update (base update)
#160117 opened Aug 7, 2025
ci: Update permissions to include checks + actions
#160118 opened Aug 7, 2025
Account for triton kernel source code hidden in custom ops properly in AOTAutogradCache
#160120 opened Aug 7, 2025
Gh/yangw dev/13/orig
#160125 opened Aug 7, 2025
Gh/yangw dev/14/orig
#160126 opened Aug 7, 2025
[AOTI] use CudaCachingAllocator for memory allocation
#160127 opened Aug 7, 2025
Add `CUDA_KERNEL_ASSERT_PRINTF`, a more flexible `CUDA_KERNEL_ASSERT_MSG`
#160129 opened Aug 7, 2025
[AOTInductor] Add grid information for Triton Kernels
#160131 opened Aug 7, 2025
[inductor] TLParse tensor metadata logging + test
#160132 opened Aug 7, 2025
[FSDP][Collectives] skipping allgather when world size is 1
#160135 opened Aug 7, 2025
[FSDP][Collectives] skipping reduce_scatter when world size is 1
#160136 opened Aug 7, 2025
[dynamo, nested graph breaks] clean up comments and codegen
#160138 opened Aug 7, 2025
fixing graph break for namedtuple._replace
#160139 opened Aug 7, 2025
Add stable Tensor get_device_index, use more stable DeviceIndex
#160143 opened Aug 7, 2025
[c10d] Error out the case when registering symmetric memory without eager init
#160145 opened Aug 7, 2025
[Still Work in progress] vllm test cli
#160146 opened Aug 7, 2025
[FSDP][Replicate] replicate tests for param registration and input device movements
#160147 opened Aug 7, 2025
[Still WIP]setup test ci workflow
#160149 opened Aug 7, 2025
dynamo: Write a structured trace for supressed exceptions.
#160151 opened Aug 8, 2025
[dynamo] Trace nn.Module __delattr__
#160152 opened Aug 8, 2025
[dynamo] Make ListIterator track mutations to original list
#160154 opened Aug 8, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_dict`/`test_ordered_dict`
#160156 opened Aug 8, 2025
[WIP][1/N] Port 6 fsdp distributed test cases to Intel GPU
#160158 opened Aug 8, 2025
[inductor] turn on windows inductor UTs
#160161 opened Aug 8, 2025
Support NUMA Binding for Callable Entrypoints
#160163 opened Aug 8, 2025
fix(inductor): show intermediate buffers for split reductions in profile
#160166 opened Aug 8, 2025
[CUDAGraph] Skip CUDAGraph when only 1 kernel
#160168 opened Aug 8, 2025
Add build support for RISCV
#160172 opened Aug 8, 2025
[Inductor XPU GEMM] Step 1/N: Add cutlass-sycl repro.
#160173 opened Aug 8, 2025
[Inductor XPU GEMM] Step 2/N: Generalize cutlass configuration.
#160174 opened Aug 8, 2025
Fixes #119272
#160178 opened Aug 8, 2025
Do not rpath CUDA stubs folder in JIT generated code
#160179 opened Aug 8, 2025
[cutlass backend] re-add pip cutlass path
#160180 opened Aug 8, 2025
[do not merge][inductor] optimize welford reduction
#160181 opened Aug 8, 2025
[WIP] [1/N] Introduce a generic CachingDeviceAllocatorImpl for cross backend use
#160182 opened Aug 8, 2025
Update triton xpu commit to support python 3.14
#160183 opened Aug 8, 2025
Draft: separate reqs for manywheel build and pin
#160184 opened Aug 8, 2025
[doc] fix spelling of word - when
#160185 opened Aug 8, 2025
[DO NOT MERGE][Kineto] Testing not clearing per-thread attribute at all
#160186 opened Aug 8, 2025
[DCP][OSS] Remove extra collective on load
#160189 opened Aug 8, 2025
[Triton] [Inductor] Generalize index broadcasting to handle torch.utils._sympy.functions.Identity
#160190 opened Aug 8, 2025
Flex Attention heuristics: a Blackwell config
#160192 opened Aug 8, 2025
[FR] Don't check incomplete ranks for printing
#160195 opened Aug 8, 2025
kill allow_complex_guards_as_runtime_asserts
#160198 opened Aug 8, 2025
Add CUDA installation script for CUDA 13
#160201 opened Aug 8, 2025
Refactors symmetric memory creation through allocator interface
#160202 opened Aug 8, 2025
[WIP][doc] AOTI debugging guide
#160204 opened Aug 8, 2025
[DCP][HF] Add option to parallelize reads in HF Storage Reader
#160205 opened Aug 8, 2025
[ROCm][DO NOT MERGE] treat unset variables in manywheels build_rocm.sh as an error.
#160207 opened Aug 8, 2025
[wip] Support `defaultdict(None, mapping | Iterable[Tuple])`
#160209 opened Aug 8, 2025
Add `is_cpu` and `dtype` convenience methods for stable tensor type
#160212 opened Aug 8, 2025
[muon] Introduce Muon optimizer to PyTorch
#160213 opened Aug 8, 2025
Port amax to stable ABI
#160214 opened Aug 8, 2025
[ROCm] Enable MI355 CI on PRs, and run full set of UTs on PRs
#160215 opened Aug 8, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_set`
#160216 opened Aug 8, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_operator`
#160217 opened Aug 8, 2025
Cap num_stages to 1 on AMD in inductor's triton_heuristics.py
#160218 opened Aug 8, 2025
appending the pythonpath
#160219 opened Aug 8, 2025
[MPS] Sparse enable indices and values
#160223 opened Aug 8, 2025
Remove torch.serialization entries from the doc ignore list
#160224 opened Aug 8, 2025
switch order of ScalarType::Undefined so it equates to -1
#160226 opened Aug 8, 2025
Fix undefined behavior in mul_overflows
#160229 opened Aug 8, 2025
torchdim Python port
#160236 opened Aug 9, 2025
fix(docs): wrong conversion from rst to md in torch.compiler_troubleshooting.md
#160238 opened Aug 9, 2025
[kernacle] add support for addmm and bmm
#160239 opened Aug 9, 2025
guard_or_false cat ops
#160250 opened Aug 9, 2025
unify broadcast_shapes functions and avoid duplicates
#160251 opened Aug 9, 2025
migrate more simple gso checks
#160253 opened Aug 9, 2025
extract shape in _view_has_unbacked_input
#160255 opened Aug 9, 2025
Detect torch function in lists as well
#160256 opened Aug 9, 2025
[inductor] Windows inductor use intel-openmp.
#160258 opened Aug 9, 2025
[vllm hash update] update the pinned vllm hash
#160259 opened Aug 10, 2025
[DO NOT MERGE] Autograd Onboarding Lab
#160264 opened Aug 10, 2025
Support of dtensor redistribute with device order
#160266 opened Aug 10, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_int`/`bool`/`float`/`complex`
#160276 opened Aug 10, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_list`/`tuple`
#160277 opened Aug 10, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_iter`
#160278 opened Aug 10, 2025
[FSDP2] cast `unsharded_param_grad` to correct reduce dtype
#160279 opened Aug 10, 2025
[simplefsdp] add multi parallelism autobucketing
#160282 opened Aug 10, 2025
[C10D] Add check_rng_sync util
#160283 opened Aug 10, 2025
WIP summarize ranks
#160284 opened Aug 10, 2025
[BE] Remove modernize suppression
#160288 opened Aug 11, 2025
[BE] Save attributes for CppCompileError for pickleing
#160294 opened Aug 11, 2025
Recursively descend into lists for TF in getitem
#160297 opened Aug 11, 2025
[wip][claude-code] support multi kernel reductions
#160298 opened Aug 11, 2025
[Do not merge]Upgrade oneDNN to v3.9
#160299 opened Aug 11, 2025
[Intel GPU] check data alignment before contiguousness
#160301 opened Aug 11, 2025
Improve README.md formatting and fix documentation errors
#160307 opened Aug 11, 2025
Enable XPU for test_autograd_function.py
#160309 opened Aug 11, 2025
[inductor] Fix descriptor broadcasting for singleton dimensions
#160310 opened Aug 11, 2025
Fix reinplace optimization issue for index_put when self and source alias
#160311 opened Aug 11, 2025
Optimize `min`, `max` gradient behavior description
#160312 opened Aug 11, 2025
Fix get_free_symbol_uses for several nodes
#160314 opened Aug 11, 2025
Add sdist handling to version finding
#160315 opened Aug 11, 2025
[DO NOT MERGE] ACL Version Upgrade v52.3.0
#160316 opened Aug 11, 2025
[Windows] Update libuv version from 1.39 to 1.51
#160318 opened Aug 11, 2025
Draft Python API for F.linear_cross_entropy
#160319 opened Aug 11, 2025
[Caffe2] Enable SVE128
#160323 opened Aug 11, 2025
[Caffe2] Add SVE128 vectorized<bfloat16_t> template layer
#160324 opened Aug 11, 2025
[Caffe2] Add SVE128 vectorized<float16_t> template layer
#160325 opened Aug 11, 2025
[Caffe2] Add SVE128 vectorized<int##bit##_t> template layers
#160326 opened Aug 11, 2025
[Caffe2] Add SVE128 vectorized<T> template layers for unsigned integers
#160327 opened Aug 11, 2025
[Caffe2] Add SVE128 vectorized<float> template layer
#160328 opened Aug 11, 2025
[Caffe2] Add SVE128 vectorized<double> template layer
#160329 opened Aug 11, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_math`/`cmath`
#160330 opened Aug 11, 2025
Wrap class definitions in `set_fullgraph(False)` in `test_sort`
#160331 opened Aug 11, 2025
Fix typo: 'complext'
#160335 opened Aug 11, 2025
[ROCm] Make triton build rocm agnostic
#160336 opened Aug 11, 2025
Return the loaded library in torch.ops.load_library
#160338 opened Aug 11, 2025
[MPS] Add support for log_normal_
#160339 opened Aug 11, 2025
Get tensor subclasses and torch.library.triton_op to dispatch correctly
#160341 opened Aug 11, 2025
Add batch option for send/recv_object_list
#160342 opened Aug 11, 2025
[FSDP][Replicate] replicate tests for casting module after init
#160344 opened Aug 11, 2025
[retry-land][pytorch][dynamo_compile] Log stack_trace to dynamo_compile
#160348 opened Aug 11, 2025
Parameterized CUDA Graph Launch 2
#160351 opened Aug 11, 2025
[redone][pytorch] Moving torch.compile worker process logs to a dedicated rank based log directory
#160352 opened Aug 11, 2025
Easy decomposeK code refactor
#160353 opened Aug 11, 2025
fix cpp builder to avoid missing-source compile error
#160354 opened Aug 11, 2025
[cutlass backend] Allow bmm use cases when batch stride is 0
#160356 opened Aug 11, 2025
Factor out the strings to templates for better editor integration
#160357 opened Aug 11, 2025
[Let's see CI] Remove use of device_guard for TensorIndex kernel
#160360 opened Aug 11, 2025
setup test ci
#160361 opened Aug 11, 2025
Typing for common.py
#160362 opened Aug 11, 2025
Type cudagraphs.py
#160363 opened Aug 11, 2025
typing debugging.py
#160364 opened Aug 11, 2025
typing distributed.py
#160365 opened Aug 11, 2025
typing inductor and placeholder backends
#160366 opened Aug 11, 2025
typing registry.py
#160367 opened Aug 11, 2025
Type backend torchxla
#160368 opened Aug 11, 2025
typing tvm.py
#160369 opened Aug 11, 2025
[kernacle] add support for addmm and bmm
#160370 opened Aug 11, 2025
[kernacle] add support for addmm and bmm
#160371 opened Aug 11, 2025
[MTIA] Add MTIA dispatch for kernel foreach_maximum (#160358)
#160372 opened Aug 11, 2025
[ROCm][Windows] Include native_transformers srcs to fix link errors.
#160373 opened Aug 11, 2025
[inductor][while_loop][be] improve the readability of output handling
#160374 opened Aug 11, 2025
[while_loop] autograd support
#160375 opened Aug 11, 2025
[while_loop][auto_grad] support dynamic shape
#160376 opened Aug 11, 2025
[while_loop][autograd] support carry of int tensor
#160377 opened Aug 11, 2025
[BE][CI] Adjust `error_inputs` for cat and complex
#160378 opened Aug 11, 2025
[CI] Move CUDA tests to trunk workflow
#160379 opened Aug 11, 2025
[AOTInductor] Add input information for Triton Kernels in AOTI
#160380 opened Aug 11, 2025
[Inductor][Configs] Expose autotune_num_choices_displayed through environment variable
#160381 opened Aug 11, 2025
Turn on part of provenance tracking by default
#160383 opened Aug 12, 2025
[audio hash update] update the pinned audio hash
#160384 opened Aug 12, 2025
[TorchScript] thread-safe ErrorReport::CallStack
#160386 opened Aug 12, 2025
Fix LBFGS warning convert a tensor with requires_grad=True to a scalar
#160389 opened Aug 12, 2025
[FSDP][Replicate] Testing replicate parity in single and multigroup
#160390 opened Aug 12, 2025
wip
#160393 opened Aug 12, 2025
[export] Refactor PT2 Archive weight saving and loading
#160394 opened Aug 12, 2025

115 Issues closed by 37 people

flex_attention + dynamic=True with large batch or heads causes Triton Error [CUDA]: invalid argument
#157018 closed Aug 12, 2025
[FlexAttention] Zero computed input gradients with torch.compile + customized autograd func
#159299 closed Aug 12, 2025
DISABLED test_multiple_mutations_of_buf (__main__.TestOperatorReorderForPeakMemory)
#159952 closed Aug 12, 2025
2.8 flex attention kernels result in triton warning
#158463 closed Aug 11, 2025
Wrong-size gradients in Expert Parallel MoE
#160285 closed Aug 11, 2025
Use FP32 for ConvTranspose3D when using autocast on MPS
#160332 closed Aug 11, 2025
DISABLED test_op_has_batch_rule_tensordot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142769 closed Aug 11, 2025
DISABLED test_cuda_kernel_loop_overflow_large (__main__.TestCuda)
#159285 closed Aug 11, 2025
DISABLED test_deferred_runtime_asserts (__main__.ReproTests)
#156817 closed Aug 11, 2025
DISABLED test_graph_memory_stats_and_use_result_after_destroy_graph (__main__.TestCuda)
#159286 closed Aug 11, 2025
Regression on compile with backend inductor with torch 2.8
#160084 closed Aug 11, 2025
MPS regression on supported dtypes/scalar/alpha combinations on add/sub/rsub
#160208 closed Aug 11, 2025
[RFC] A Distributed CUDA Unified Memory Backend for PyTorch
#158122 closed Aug 11, 2025
DISABLED test_inplace_on_view_makes_base_require_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156209 closed Aug 11, 2025
DISABLED test_dataclass_init_with_default_factory_with_inputs (__main__.ReproTests)
#156799 closed Aug 11, 2025
DISABLED test_triton_fx_graph_with_et_cuda (__main__.TestExecutionTraceCUDA)
#159236 closed Aug 11, 2025
Severe performance regression on deterministic algorithm in torch 2.0
#109856 closed Aug 11, 2025
uint64_t can't be represented as `c10::IValue` (was TorchDispatchMode + full raises an overflow error for uint64 when it's out of int64 range)
#159168 closed Aug 11, 2025
Add __riscv macro detection to support the scalar backend for RISCV
#160171 closed Aug 11, 2025
DISABLED test_qlinear_add_int8_mixed_bf16_use_relu_False_is_qat_False_is_dynamic_True (__main__.TestPatternMatcher)
#157911 closed Aug 11, 2025
DISABLED test_conv_transpose_unary_fusion_ops (__main__.TestMkldnnFusion)
#158115 closed Aug 11, 2025
MPS does not support addmm for non-float input
#154901 closed Aug 11, 2025
[Feature] Implement a CUDA kernel for _weight_int8pack_mm
#158849 closed Aug 10, 2025
aten::grid_sampler_3d'
#160237 closed Aug 10, 2025
`torch.compile` on `.sum() `and `.item()` calls errors from `tensorify_python_scalars`
#158083 closed Aug 9, 2025
torch.jit.script fails with 'undefined value' error when module has @torch.compiler.disable decorator (regression in 2.8.0)
#160059 closed Aug 9, 2025
Aborted (core dumped) in `reflection_pad 2d`
#142455 closed Aug 8, 2025
[ONNX] Exporter crashes when fx node output includes None
#160150 closed Aug 8, 2025
AArch64 Inductor Perf Test build fails installing torchao
#160188 closed Aug 8, 2025
DISABLED test_repeat_interleave_2_dynamic_shapes_xpu (__main__.DynamicShapesCodegenGPUTests)
#159803 closed Aug 8, 2025
error message for `Tensor.index_put_` could be improved for MPS failure mode
#160034 closed Aug 8, 2025
`copy_()` fails with HSDP in FSDP2
#147568 closed Aug 8, 2025
[dynamo, guards] Move SHAPE_ENV guard to C++
#143309 closed Aug 8, 2025
BUG: numpy very slow after import torch
#158005 closed Aug 8, 2025
[ONNX] Improve dynamic_axes to dynamic_shapes conversion in exporter
#150940 closed Aug 8, 2025
TypeError: (): incompatible function arguments
#102832 closed Aug 8, 2025
PyTorch 2.8: PYTHONPATH no longer respected when building from source?
#160092 closed Aug 8, 2025
HAS_CUDA in the inductor tests is really HAS_CUDA_AND_TRITON
#159399 closed Aug 8, 2025
lintrunner not support riscv64
#160170 closed Aug 8, 2025
DISABLED test_inductor_reuse_buffer_after_inplace_collective (__main__.CompileTest)
#147950 closed Aug 8, 2025
DISABLED test_dataclass_in_module (__main__.ReproTests)
#156776 closed Aug 8, 2025
DISABLED test_inductor_reduce_scatter_tensor_single (__main__.CompileTest)
#147911 closed Aug 8, 2025
DISABLED test_graph_two_successive (__main__.TestCuda)
#159113 closed Aug 8, 2025
DISABLED test_inductor_multiple_specializations_cuda (__main__.GPUTests)
#154705 closed Aug 8, 2025
Add the `cutlass-sycl` submodule and set it as the default cutlass path for XPU.
#160176 closed Aug 8, 2025
torch.multinomial sample behavior is not consist with sample numbers
#159927 closed Aug 8, 2025
[RFC] CUDAPluggableAllocator receives malloc request of size zero.
#159892 closed Aug 8, 2025
Feature Request: deterministic adaptive_avg_pool2d_backward_cuda
#84860 closed Aug 7, 2025
[libtorh]Consistency problem of gpu computing
#94976 closed Aug 7, 2025
Add nondeterministic alert to `torch.scatter_`
#70583 closed Aug 7, 2025
nn.LSTM gives nondeterministic results with dropout and multiple layers, OR cuDNN version mismatch
#35661 closed Aug 7, 2025
Add nondeterministic alert to `.scatter_`
#133204 closed Aug 7, 2025
Deterministic support for adaptive_avg_pool2d_backward_cuda
#149130 closed Aug 7, 2025
Inconsistent output for ConvTranspose3d on GPU
#137970 closed Aug 7, 2025
[inductor] tune_scaled_grouped_mm fails with memory layout assertion, despite memory layout assertions prior to op call passing
#156325 closed Aug 7, 2025
Fix the description of `alpha` in `torch.sub`
#159637 closed Aug 7, 2025
DISABLED test_inductor_inplace_op_on_view (__main__.CompileTest)
#147852 closed Aug 7, 2025
DISABLED test_cuda_kernel_loop_overflow (__main__.TestCuda)
#159069 closed Aug 7, 2025
DISABLED test_record_stream (__main__.TestCuda)
#134746 closed Aug 7, 2025
torch.cuda.empty_cache() shall support MemPool as an argument just like the C++ interface
#160069 closed Aug 7, 2025
Bits types cannot be used under deterministic mode
#109802 closed Aug 7, 2025
/usr/local/lib/python3.11/dist-packages/torch/autograd/graph.py:825: UserWarning: grid_sampler_2d_backward_cuda does not have a deterministic implementation, but you set 'torch.use_deterministic_algorithms(True, warn_only=True)'.
#152171 closed Aug 7, 2025
X86InductorQuantizer does not quantize anything
#160095 closed Aug 7, 2025
DISABLED test_cuda_memory_leak_detection_propagates_errors (__main__.TestCuda)
#159039 closed Aug 7, 2025
DISABLED test_inductor_all_reduce_coalesced (__main__.CompileTest)
#147726 closed Aug 7, 2025
DISABLED test_inductor_broadcast (__main__.CompileTest)
#147816 closed Aug 7, 2025
torch.sparse.to_sparse_semi_structured generate irrelevant values with dtype half or int8
#159872 closed Aug 7, 2025
[rocm] HIP Graph capture raises segmentation fault on AMD GPU but CUDA Graph capture succeeds on Nvidia GPU
#155720 closed Aug 7, 2025
`torch.unique` behaves strange for large input arrays on Windows
#135019 closed Aug 7, 2025
addmv bfloat16 inconsistency on x86
#159960 closed Aug 7, 2025
[inductor][fuzzer] Compilation Error in complex64+toint
#157683 closed Aug 7, 2025
Build pytorch for rocm failed
#148167 closed Aug 7, 2025
DISABLED test_comment_graph_fragment (__main__.TritonCodeGenTests)
#159925 closed Aug 7, 2025
DISABLED test_hop_eager (__main__.TorchFunctionModeTests)
#159950 closed Aug 7, 2025
DISABLED test_hop (__main__.TorchFunctionModeTests)
#159951 closed Aug 7, 2025
DISABLED test_remote_cache_load_function_device_cuda_bfloat16_dynamic_False_bundle_triton_True_use_static_cuda_launcher_False (__main__.TestFxGraphCache)
#150444 closed Aug 7, 2025
DISABLED test_inductor_all_reduce_non_contig_input (__main__.CompileTest)
#147733 closed Aug 7, 2025
DISABLED test_inductor_all_to_all_single (__main__.CompileTest)
#147795 closed Aug 7, 2025
DISABLED test_wait_tensor (__main__.CompileTest)
#148014 closed Aug 7, 2025
Release 2.8.0 validations checklist and cherry-picks
#158939 closed Aug 6, 2025
torch_shm_manager: undefined reference to gloo
#146239 closed Aug 6, 2025
Performance bug on `mode` of `torch.autograd.grad_mode.inference_mode`
#159633 closed Aug 6, 2025
[Docs] 2.8 version is listed twice
#159972 closed Aug 6, 2025
`CosineAnnealingWarmRestarts` should use integer epoch
#69841 closed Aug 6, 2025
`torch.export` fails on `torch.cond` that dispatches Triton kernels (`SpecViolationError: missing val`)
#159955 closed Aug 6, 2025
ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'
#159825 closed Aug 6, 2025
DISABLED test_addmm_dtype_mismatch (__main__.TestPatternMatcher)
#159631 closed Aug 6, 2025
[MPS] Remove MacOS-13 support
#159275 closed Aug 6, 2025
Multiple runners shutdown for an autoupdate while still running jobs
#107402 closed Aug 6, 2025
[v.2.8.0] Release Tracker
#156745 closed Aug 6, 2025
PyTorch Improper Resource Shutdown or Release vulnerability
#159963 closed Aug 6, 2025
Inductor Perf MX to_blocked
#153194 closed Aug 6, 2025
Reproducibility on different platform
#159846 closed Aug 6, 2025
DISABLED test_distributed_checkpoint_state_dict_type0_cuda (__main__.TestDistributedCheckpointCUDA)
#145807 closed Aug 6, 2025
UnicodeDecodeError in torch.compile on Windows MSVC
#159537 closed Aug 6, 2025
DISABLED test_inplace_on_view_then_no_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156306 closed Aug 5, 2025
Nightly libtorch links on website are incorrect
#159880 closed Aug 5, 2025
Add CUDA kernel support for RTX 5070 Ti (Ada Lovelace, SM 8.9)
#159851 closed Aug 5, 2025
incorrect _unsafe_index meta
#139312 closed Aug 5, 2025
DISABLED test_grad_with_tracer_ScheduleClass0 (__main__.ScheduleTest)
#159034 closed Aug 5, 2025
DISABLED test_grad_with_tracer_ScheduleClass1 (__main__.ScheduleTest)
#159253 closed Aug 5, 2025
DISABLED test_zero_bubble_with_model_kwargs_ScheduleClass0 (__main__.ScheduleTest)
#154547 closed Aug 5, 2025
DISABLED test_schedule_with_native_zero_bubble_ScheduleClass0 (__main__.ScheduleTest)
#156088 closed Aug 5, 2025
DISABLED test_schedule_with_native_zero_bubble_ScheduleClass1 (__main__.ScheduleTest)
#156328 closed Aug 5, 2025
DISABLED test_grad_with_manual_interleaved_ScheduleClass2_use_new_runtime_False (__main__.ScheduleTest)
#154443 closed Aug 5, 2025
DISABLED test_grad_with_manual_interleaved_ScheduleClass2_use_new_runtime_True (__main__.ScheduleTest)
#154481 closed Aug 5, 2025
DISABLED test_is_isnot (__main__.TestScript)
#120694 closed Aug 5, 2025
DISABLED test_index (__main__.TestPythonBuiltinOP)
#119160 closed Aug 5, 2025
DISABLED test_ranks_and_tag (__main__.CompileTest)
#147974 closed Aug 5, 2025
DISABLED test_tensor_subclasses (__main__.TestScript)
#119949 closed Aug 5, 2025
[Feature Request] Add support for CUDA sm_120 (RTX 5070 Ti) in prebuilt PyTorch binaries
#159847 closed Aug 5, 2025
Inconsistent Model Results and Failures on Windows with CUDA vs. CPU PyTorch Builds
#156547 closed Aug 5, 2025
The recorded tid in torch kineto profiler do not match with syscall(SYS_gettid) or pthread_self() in Linux
#159771 closed Aug 5, 2025
DISABLED test_undefined_grads_mode_warn (__main__.TestAutogradFallback)
#159471 closed Aug 5, 2025
DISABLED test_pin_memory_no_cuda (__main__.TestDictDataLoader)
#159802 closed Aug 5, 2025

115 Issues opened by 87 people

bug in libtorch optimize_for_inference
#160392 opened Aug 12, 2025
`torch.compile` backward pass fails with `AssertionError` in Inductor C++ codegen when model returns tuple outputs
#160391 opened Aug 12, 2025
Perf differences observed between AOT and JIT on RAFT model
#160388 opened Aug 12, 2025
DISABLED RecordDebugHandles.Basic (__main__.test_jit)
#160387 opened Aug 12, 2025
TTGIR error for FlexAttention on B200
#160385 opened Aug 12, 2025
Use `tf32x3` through Inductor
#160359 opened Aug 11, 2025
Max-autotune is slower than eager for specific jagged tensor shapes
#160355 opened Aug 11, 2025
DISABLED test_fused_all_gather_scaled_matmul_gather_dim_1_scale_mode_tensor-wise (__main__.SymmetricMemoryTest)
#160347 opened Aug 11, 2025
DISABLED test_custom_functions_and_tracer (__main__.TestFXNumericSuiteNShadows)
#160346 opened Aug 11, 2025
Meta device initialization on HF models + FSDP leads to different numerical behaviour
#160340 opened Aug 11, 2025
[RFC] Simplify bookkeeping of DeviceMesh slicing, (un)flattening, ...
#160337 opened Aug 11, 2025
Tensor subclasses don't work with torch.library.triton_op
#160333 opened Aug 11, 2025
`torch.Tensor` methods return type annotation too broad for `torch.Tensor` subclasses
#160322 opened Aug 11, 2025
FSDP does not reduce the gradient size during backward
#160320 opened Aug 11, 2025
Memory leak when using mark_dirty in Python custom autograd.Function
#160317 opened Aug 11, 2025
[DTensor] Make Redistribute autograd function twice-differentiable
#160313 opened Aug 11, 2025
Reserved memory is much bigger than allocated memory in multi-stream scenario
#160308 opened Aug 11, 2025
Many failures in inductor/test_max_autotune on H100
#160305 opened Aug 11, 2025
Multi device aoti compile not working
#160303 opened Aug 11, 2025
Redundant installation of CMake and Ninja
#160302 opened Aug 11, 2025
[inductor][cpu] Performance regression in 2025-08-02 nightly release
#160296 opened Aug 11, 2025
torch 2.9.0dev on cuda 13 producing Cuda mismatch error during flash/sage attention install
#160293 opened Aug 11, 2025
[FSDP2] Bug: NaN gradient when both HSDP and CPU offload are enabled
#160291 opened Aug 11, 2025
DISABLED test_small_stability (__main__.TestBase)
#160290 opened Aug 11, 2025
Symmetric memory backend bug: OOM with default, errors with CUDA/NCCL, nvshmem works
#160289 opened Aug 11, 2025
Improve CUDAGraph Tree heuristics on starting new generation
#160281 opened Aug 10, 2025
[Documentation Clarity] torch.min/torch.max gradient behavior
#160273 opened Aug 10, 2025
Add Green Context Support for fsdp2
#160272 opened Aug 10, 2025
Error when using torch.compile with dynamic=True: tensor 'guidance' stride mismatch at index 0
#160271 opened Aug 10, 2025
Questions about Volta Support
#160269 opened Aug 10, 2025
XPU out of memory on Intel iGPUs although plenty of memory is available per error message.
#160265 opened Aug 10, 2025
Graph partition + Fake Dependency from view op
#160263 opened Aug 10, 2025
CPU: Pytorch >= 2.7.0 is broken
#160261 opened Aug 10, 2025
PyTorch 2.8: cuda linking issue, undefined symbol
#160248 opened Aug 9, 2025
wan2.1 vae take more gpu memory after compile
#160247 opened Aug 9, 2025
pytorch.org/get-started/locally is wrong about supported Python version (3.12 is not highest)
#160246 opened Aug 9, 2025
DISABLED test_comprehensive_nn_functional_interpolate_trilinear_xpu_float32 (__main__.TestInductorOpInfoXPU)
#160245 opened Aug 9, 2025
DISABLED test_comprehensive_nn_functional_interpolate_trilinear_xpu_float64 (__main__.TestInductorOpInfoXPU)
#160244 opened Aug 9, 2025
DISABLED test_copy_non_blocking_is_pinned_xpu (__main__.AOTInductorTestABICompatibleGpu)
#160243 opened Aug 9, 2025
`torch.compile` produces output mismatch vs eager float64 for some seeds
#160242 opened Aug 9, 2025
`torch.export.export` fails when model is in `eval()` mode due to `torch.typename` being skipped by Dynamo
#160241 opened Aug 9, 2025
Migrate test.sh into python based tests
#160240 opened Aug 9, 2025
xpu: huggingface accelerate test_dynamo test fails on XPU
#160232 opened Aug 8, 2025
Offer official Pytorch Vulkan backend on pytorch.org
#160230 opened Aug 8, 2025
Building cuda kernels with debug information causes some cuda kernel launches to fail due to capacity constraints
#160225 opened Aug 8, 2025
DISABLED test_fused_all_gather_scaled_matmul_gather_dim_1_scale_mode_row-wise-sharded (__main__.SymmetricMemoryTest)
#160203 opened Aug 8, 2025
Shape propagation through repeated modules during dynamo export
#160199 opened Aug 8, 2025
LBFGS always raises warning about converting a tensor with requires_grad=True to a scalar?
#160197 opened Aug 8, 2025
Assertion Error When Freeing Unbacked Symbolic Shapes in Torch.Export
#160196 opened Aug 8, 2025
[CI] Inherit PYTHONPATH from env instead of overriding it completely
#160193 opened Aug 8, 2025
DISABLED test_fused_all_gather_scaled_matmul_gather_dim_1_scale_mode_row-wise-replicated (__main__.SymmetricMemoryTest)
#160177 opened Aug 8, 2025
[RFC] Enable cutlass to support Intel GPU into PyTorch Inductor.
#160175 opened Aug 8, 2025
When using DP + TP, DP only parameters diverge across TP ranks if using operations with non-deterministic implementations
#160169 opened Aug 8, 2025
libtorch: CMake 4.1: Error evaluating generator expression: TARGET_PROPERTY:torch,INTERFACE_LINK_LIBRARIES
#160167 opened Aug 8, 2025
aarch64 GPU wheel release in pypi for GH200/GB200 etc
#160162 opened Aug 8, 2025
Symbolic function 'aten::scaled_dot_product_attention' already registered for opset 14. Replacing the existing function with new function. This is unexpected. Please report it on github..
#160157 opened Aug 8, 2025
Add a "max-performance" mode to torch.compile for aggressive optimization
#160142 opened Aug 7, 2025
RuntimeError: miopenStatusUnknownError on PyTorch 2.8.0+rocm6.4
#160141 opened Aug 7, 2025
Can't use nvshmem triton device side function
#160137 opened Aug 7, 2025
Torchrun breaks with virtual environments
#160130 opened Aug 7, 2025
FX graph segments for multiple triton kernels from reduction node are broken
#160124 opened Aug 7, 2025
Add tensor-aware put_signal & wait_signal Triton wrappers to NVSHMEM Triton Kernels
#160122 opened Aug 7, 2025
cuda 13 broken
#160104 opened Aug 7, 2025
torch.library.infer_schema doesn't have full support for list types
#160094 opened Aug 7, 2025
DISABLED test_fused_all_gather_scaled_matmul_gather_dim_0_scale_mode_tensor-wise (__main__.SymmetricMemoryTest)
#160085 opened Aug 7, 2025
compile regression on forward pre hook with prepend=True
#160083 opened Aug 7, 2025
'torch._tensor' has no attribute 'split' while using torch.compile under torch.device context.
#160077 opened Aug 7, 2025
FlexAttention backward compilation failure with GQA on NVIDIA B200
#160074 opened Aug 7, 2025
[TorchDynamo] excessive stack use: stack is 3286 deep on A100 with AMD EPYC CPU
#160071 opened Aug 7, 2025
FSDP2 not compatible with transformers >= 4.54.0 GenericForTokenClassification
#160068 opened Aug 7, 2025
`torch.compile` crashes on model exported via `torch.export`, with lifted constant from cache tensor
#160066 opened Aug 7, 2025
MultiheadAttention returns all NaN when using attn_mask to fully mask some heads, and using `need_weights=True`
#160064 opened Aug 7, 2025
Error raised when running torch.compile under FakeTensorMode context
#160057 opened Aug 7, 2025
`torch.Pad(mode="circular")` doesn't work for 4D or 5D input despite error msg
#160053 opened Aug 7, 2025
DISABLED test_flop_counter_op_options0_cuda_float16 (__main__.TestSchedulerCUDA)
#160052 opened Aug 7, 2025
DISABLED test_cat_extern_kernel_dynamic_shapes_mps (__main__.DynamicShapesGPUTests)
#160051 opened Aug 7, 2025
DISABLED test_retracibility_nested_list_out_dynamic_shapes (__main__.DynamicShapesExportTests)
#160050 opened Aug 7, 2025
DISABLED test_fused_all_gather_scaled_matmul_gather_dim_0_scale_mode_row-wise-sharded (__main__.SymmetricMemoryTest)
#160049 opened Aug 7, 2025
Triton kernel generated for torch.compile(create_block_mask) using Flex Attention throws CUDA Illegal memory access
#160018 opened Aug 6, 2025
Requesting improved torch.auto_grad.detect_anomaly() for NaN detection
#160016 opened Aug 6, 2025
Support NUMA Binding for Callable entrypoints to `elastic_launch`
#160006 opened Aug 6, 2025
[BE][download.pytorch.org] Index fix user navigation
#160005 opened Aug 6, 2025
`torch._inductor.aoti_compile_and_package` fails with CUDA kernels inside of a `torch.cond`
#159995 opened Aug 6, 2025
[DTensor] Decide / Document RNG semantics
#159991 opened Aug 6, 2025
@require_world_size(4) macro is ambiguous or buggy
#159987 opened Aug 6, 2025
[RFC] Cuda support matrix for Release 2.9
#159980 opened Aug 6, 2025
Segmentation fault when using torch.compile with PyTorch 2.8.0 XPU on Intel ARC A770
#159974 opened Aug 6, 2025
SAC compatibility with compiled flex attention
#159970 opened Aug 6, 2025
Compile error with gcc-14 when building vec_test_all_types
#159962 opened Aug 6, 2025
TensorMetadata not retained during DTensor convolution double backpropagation
#159959 opened Aug 6, 2025
torch.compile regression on side-effects between torch 2.7.1 and 2.8 (final RC)
#159958 opened Aug 6, 2025
CUDA initialization: CUDA unknown error
#159954 opened Aug 6, 2025
Add official statically linked libtorch libraries
#159947 opened Aug 6, 2025
Wrong backend assigned to Intel Gaudi (HPU) devices in PT 2.8.0
#159945 opened Aug 6, 2025
Is it possible for pytorch pipeline parallelism to support dynamic shapes (input/output)
#159942 opened Aug 6, 2025
scaled_mm Triton implementation causes wrong results on (at least) H100
#159940 opened Aug 6, 2025
contiguous has BUG ?
#159932 opened Aug 6, 2025
`torch.onnx.export` fails when exporting custom operator with bfloat16 constant tensor
#159928 opened Aug 6, 2025
DISABLED test_fused_all_gather_scaled_matmul_gather_dim_0_scale_mode_row-wise-replicated (__main__.SymmetricMemoryTest)
#159921 opened Aug 6, 2025
[export] Tensor subclass decomposition breaks state dict contracts
#159918 opened Aug 6, 2025
`test_non_contiguous_input_mm_plus_mm` looks broken on B200
#159914 opened Aug 5, 2025
New at::HostAllocator interface prevents using more than one allocator implementation for a device type
#159906 opened Aug 5, 2025
UNSTABLE Check mergeability of ghstack PR / ghstack-mergeability-check
#159899 opened Aug 5, 2025
UNSTABLE Check Labels / Check labels
#159894 opened Aug 5, 2025
torch._dynamo documentation
#159886 opened Aug 5, 2025
Implement `grid_sampler_3d` for MPS
#159882 opened Aug 5, 2025
`torch.nn.functional.sigmoid` produces inconsistent results on different device for complex inputs
#159870 opened Aug 5, 2025
BE: enable auto reformat List to list and similar typing
#159866 opened Aug 5, 2025
DISABLED test_upper_bound_i64_cuda (__main__.AOTInductorTestABICompatibleGpu)
#159860 opened Aug 5, 2025
[compile] Correctness and reproducibility issue
#159855 opened Aug 5, 2025
`torch.cond()` behaves inconsistently when using symbolic predicate
#159852 opened Aug 5, 2025
`tree_flatten(some_pytree, lambda x: isinstance(x, Tensor))` yields `[None], _`
#159848 opened Aug 5, 2025
[DTensor] nn.Embedding Compile Failure when Creating FX Graph
#159843 opened Aug 5, 2025
C++23 forward declaration error :`invalid application of 'sizeof' to an incomplete type 'torch::_export::Graph'`
#159838 opened Aug 5, 2025
Missing documentation for device mesh on DDP
#159836 opened Aug 5, 2025

517 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

Introduce Muon optimizer to PyTorch
#159465 commented on Aug 11, 2025 • 32 new comments
[OpenReg] Add OSX/Windows Support for OpenReg
#159441 commented on Aug 11, 2025 • 20 new comments
Add beginnings of torch::stable::accelerator
#159679 commented on Aug 12, 2025 • 19 new comments
[inductor] dont reuse buffers if it affects peak (#145883)
#159530 commented on Aug 12, 2025 • 15 new comments
Enable output padding when only outermost dim is dynamic
#159404 commented on Aug 12, 2025 • 14 new comments
[Intel GPU] Enable backward for SDPA XPU [WIP]
#156272 commented on Aug 12, 2025 • 13 new comments
[DeviceMesh] Add `_unflatten_` api for device mesh to support better UX for some use cases like EP and replicate
#159482 commented on Aug 6, 2025 • 11 new comments
Update torch::stable::Tensor() default constructor
#159507 commented on Aug 11, 2025 • 11 new comments
[ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm
#155357 commented on Aug 8, 2025 • 10 new comments
[PP] Add DualPipeV schedule
#159591 commented on Aug 10, 2025 • 10 new comments
[dynamic shapes] unbacked-safe slicing
#157944 commented on Aug 12, 2025 • 9 new comments
Add utility to get computed kernel in torch.library
#158393 commented on Aug 8, 2025 • 9 new comments
Add support for param mutation under inference mode
#159661 commented on Aug 12, 2025 • 9 new comments
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on Aug 8, 2025 • 9 new comments
Support XPU in --nproc-per-node option to torchrun
#159474 commented on Aug 11, 2025 • 7 new comments
Add placeholder for the User Guide
#159379 commented on Aug 11, 2025 • 6 new comments
Remove accidental host synchronization in autograd cpu offload
#159698 commented on Aug 5, 2025 • 6 new comments
Implement OpenReg device autoload mechanism
#158555 commented on Aug 11, 2025 • 6 new comments
[vllm in torch ci ][step 1/3] add build logics
#159815 commented on Aug 9, 2025 • 6 new comments
Add support for tracing vmap in pre-dispatch export
#154650 commented on Aug 7, 2025 • 5 new comments
[fr] [xpu] Add FlightRecorder support for ProcessGroupXCCL
#158568 commented on Aug 11, 2025 • 5 new comments
[CUDA] Add experimental green context support for SM carveout
#159104 commented on Aug 11, 2025 • 5 new comments
Add option to assert if kernel is not fully fused in foreach_map
#159213 commented on Aug 6, 2025 • 5 new comments
Add `label_smoothing` param in `nn.BCELoss` and `nn.BCEWithLogitsLoss`
#150282 commented on Aug 12, 2025 • 5 new comments
Allow exposing more functions during initial template expansion
#159554 commented on Aug 11, 2025 • 4 new comments
[Draft][CUDA] Upgrade torch._scaled_grouped_mm to SM100+
#156806 commented on Aug 7, 2025 • 4 new comments
Replace setup.py bdist_wheel with python -m build --wheel
#156712 commented on Aug 11, 2025 • 4 new comments
Add fallback support for torch.mm in foreach_map_fn
#159757 commented on Aug 9, 2025 • 4 new comments
[DTensor] Registers sharding rule for rms_norm
#159692 commented on Aug 8, 2025 • 4 new comments
Fix typo in parameter name in cpp.py
#159716 commented on Aug 11, 2025 • 3 new comments
grid_sampler_3d for MPS
#159421 commented on Aug 6, 2025 • 3 new comments
[Intel GPU] Support SDPA backend selection and priority setting on XPU
#159464 commented on Aug 12, 2025 • 3 new comments
Add pad and narrow to torch/csrc/stable/ops.h
#159328 commented on Aug 12, 2025 • 3 new comments
Graph split event tracker
#159795 commented on Aug 11, 2025 • 3 new comments
[caffe2][IGiOS] Fix exhaustive switches
#156832 commented on Aug 8, 2025 • 2 new comments
[inductor] add lowering for repeat_interleave.Tensor with output size specified (#147160)
#158462 commented on Aug 11, 2025 • 2 new comments
Replace `std::runtime_error` with `TORCH_CHECK`
#159344 commented on Aug 8, 2025 • 2 new comments
Bump transformers pin
#159291 commented on Aug 12, 2025 • 2 new comments
[dynamo][guards] Install dict watchers for recursive dict tag optimization
#159796 commented on Aug 12, 2025 • 2 new comments
[Inductor] addmm + activation function fusion
#158137 commented on Aug 8, 2025 • 2 new comments
add fp8 scaled_mm for XPU
#140972 commented on Aug 9, 2025 • 2 new comments
port 2 distributed pipeline test files for Intel GPU
#159140 commented on Aug 12, 2025 • 2 new comments
[Graph Partition] Pass all OSS unit tests
#154667 commented on Aug 12, 2025 • 2 new comments
Add torch.compile support for torch.mm(out_dtype=...)
#159026 commented on Aug 6, 2025 • 2 new comments
[benchmark] Add HF LLM benchmarks
#156967 commented on Aug 11, 2025 • 1 new comment
Add new_empty (with dtype argument only) to torch::stable
#159508 commented on Aug 12, 2025 • 1 new comment
[DTensor] add op support: aten.squeeze_.dim
#159532 commented on Aug 7, 2025 • 1 new comment
[inductor] skip bmm when converting channel last
#159459 commented on Aug 5, 2025 • 1 new comment
Remove usage of fsspec in HF consolidation script
#159392 commented on Aug 11, 2025 • 1 new comment
Fix torch.export.export() GPU failure with RNN modules.
#155734 commented on Aug 10, 2025 • 1 new comment
[CI][CUDA] Add periodic b200 distributed job
#159323 commented on Aug 9, 2025 • 1 new comment
[inductor][cpu] Fix double-offset issue in `GEMM_TEMPLATE`
#159233 commented on Aug 12, 2025 • 1 new comment
[dynamo] Add -> bool to functions named (_?)(is|has)_.*
#155923 commented on Aug 11, 2025 • 1 new comment
Use device agnostic APIs for RNG
#159021 commented on Aug 6, 2025 • 1 new comment
[dict] Implement `__eq__` for dict_items
#155154 commented on Aug 12, 2025 • 1 new comment
[inductor] initial triton static config lookup table
#157699 commented on Aug 11, 2025 • 1 new comment
[inductor] propagate shapes in CSEVariable
#152198 commented on Aug 12, 2025 • 1 new comment
[ARM] Integrate INT4→BF16 via KleidiAI, with fallback
#158250 commented on Aug 6, 2025 • 1 new comment
fixes #156701
#159715 commented on Aug 6, 2025 • 1 new comment
[cuda][cupy] Improve cupy device placement when device is provided with explicit index
#158529 commented on Aug 10, 2025 • 1 new comment
[WIP] Attempt to fix `torch.backends.cudnn.rnn` import
#159828 commented on Aug 6, 2025 • 1 new comment
[Flex Attn][CPU] support flash decoding for cpu
#159835 commented on Aug 6, 2025 • 0 new comments
[C10d][Gloo] Enable complex datatype support in ProcessGroupGloo
#156633 commented on Aug 11, 2025 • 0 new comments
[build] remove upper version pin for `setuptools<80.0`
#156049 commented on Aug 11, 2025 • 0 new comments
Fuse matmul
#157743 commented on Aug 11, 2025 • 0 new comments
unskipped mobilenet_v3 quantization and mobilenet_v2 quantization plus tests from https://github.com/pytorch/pytorch/issues/125438
#157786 commented on Aug 7, 2025 • 0 new comments
[test][do not merge] Upgrade oneDNN to v3.9
#157994 commented on Aug 5, 2025 • 0 new comments
[claude-code] Add top-level module doc for torch/distributed/tensor/_op_schema.py
#157804 commented on Aug 11, 2025 • 0 new comments
For sdists, replace symlink with copy for docs requirements
#157811 commented on Aug 11, 2025 • 0 new comments
Allow docker builds to deal with symlinks
#157812 commented on Aug 8, 2025 • 0 new comments
[ZENDNN] Integrate ZenDNN library, implement Linear op, add unit-tests
#156599 commented on Aug 11, 2025 • 0 new comments
[dynamo, nested graph breaks] implement new resume frame stack/locals/cell layout convention
#157971 commented on Aug 11, 2025 • 0 new comments
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel
#156140 commented on Aug 6, 2025 • 0 new comments
docstring_linter: Fix #151692 and other issues
#156596 commented on Aug 11, 2025 • 0 new comments
Make functorch notebook symlinks PEP 517 valid
#157813 commented on Aug 11, 2025 • 0 new comments
Improve MANIFEST.in for source distribution
#157814 commented on Aug 11, 2025 • 0 new comments
[Doc] remove WSL2 in support matrix for Intel GPU
#156590 commented on Aug 5, 2025 • 0 new comments
Add PEP 517 compliant Python source distribution to release process
#157815 commented on Aug 11, 2025 • 0 new comments
Add functions to setup PrivateUse1 as a python backend device.
#157859 commented on Aug 5, 2025 • 0 new comments
[generator] Close all open generators in compile_subgraph
#157149 commented on Aug 11, 2025 • 0 new comments
[generator] Raise `StopIteration(value)` with value from the return stmt
#157152 commented on Aug 12, 2025 • 0 new comments
[SymmMem] Install NVSHMEM wheel in CI docker
#157411 commented on Aug 6, 2025 • 0 new comments
[build] bootstrap git repo for build for non-git-clone archive
#157432 commented on Aug 10, 2025 • 0 new comments
[contextlib] Fixes for CPython contextlib tests
#157148 commented on Aug 8, 2025 • 0 new comments
[BE][4/5] fix typos in aten/ (aten/src/ATen/native/)
#157553 commented on Aug 10, 2025 • 0 new comments
[BE][5/5] fix typos in aten/ (aten/src/ATen/)
#157554 commented on Aug 10, 2025 • 0 new comments
[dynamo] [guard] Add caching for inside torch.compile.disable function to avoid unnecessary recompilation.
#157566 commented on Aug 11, 2025 • 0 new comments
Introduce a new API torch.accelerator.get_mem_info
#156812 commented on Aug 9, 2025 • 0 new comments
[simplefsdp auto-bucketing] ir node runtime estimation
#157572 commented on Aug 11, 2025 • 0 new comments
[BE][1/6] fix typos in test/
#157635 commented on Aug 10, 2025 • 0 new comments
[BE] add `SHFMT` linter to format shell scripts
#157685 commented on Aug 10, 2025 • 0 new comments
[BE][1/4] format shell scripts with `SHFMT`
#157686 commented on Aug 10, 2025 • 0 new comments
[CI] Run all torchinductor jobs on MPS
#156773 commented on Aug 6, 2025 • 0 new comments
[BE][2/4] format shell scripts with `SHFMT` in .circleci/ and .github/
#157687 commented on Aug 10, 2025 • 0 new comments
[BE][3/4] format shell scripts with `SHFMT` in .ci/
#157688 commented on Aug 10, 2025 • 0 new comments
[BE][4/4] format shell scripts with `SHFMT` in scripts/
#157689 commented on Aug 10, 2025 • 0 new comments
Remove upper pin on setuptools
#156713 commented on Aug 11, 2025 • 0 new comments
Replace setup.py install with pip install
#156711 commented on Aug 11, 2025 • 0 new comments
Replace setup.py develop with pip install -e
#156710 commented on Aug 11, 2025 • 0 new comments
Stop parsing command line arguments every time common_utils is imported.
#156703 commented on Aug 11, 2025 • 0 new comments
[inductor] Add return types to functions named (_?)(is|has)_.*
#155928 commented on Aug 11, 2025 • 0 new comments
[DTensor] Fix aten.all strategy with min instead of sum as the reduce_op
#155420 commented on Aug 9, 2025 • 0 new comments
[Precompile] Integrate PrecompileContext with CompilePackage
#155384 commented on Aug 9, 2025 • 0 new comments
[export] inline into torch.jit.traced nn module
#155381 commented on Aug 8, 2025 • 0 new comments
Try running test_foreach sequentially
#155366 commented on Aug 6, 2025 • 0 new comments
Try adding bfloat16 to test_nn_lstm
#155338 commented on Aug 5, 2025 • 0 new comments
turn off reorder_for_peak_memory in case of collectives
#155271 commented on Aug 8, 2025 • 0 new comments
[WIP][dynamic shapes] if-then-else for meta_select storage offset
#155269 commented on Aug 5, 2025 • 0 new comments
higher_order_ops.py unimplemented_v2 migration, part1
#155264 commented on Aug 5, 2025 • 0 new comments
[Dynamo] Add CPython default dict tests
#155263 commented on Aug 9, 2025 • 0 new comments
Add more logging
#155219 commented on Aug 6, 2025 • 0 new comments
[WIP][fake tensor] invalidate memos for PropagateUnbackedSymInts
#155187 commented on Aug 9, 2025 • 0 new comments
[OrderedDict] Implement `OrderedDict.popitem(last=...)`
#155153 commented on Aug 12, 2025 • 0 new comments
[OrderedDict] Implement `OrderedDict.move_to_end(key, last=False)`
#155152 commented on Aug 12, 2025 • 0 new comments
Convert to markdown: cpp_extension.rst, cpp_index.rst, cpu.rst, cuda_environment_variables.rst, cuda._sanitizer.rst
#155110 commented on Aug 8, 2025 • 0 new comments
docs: link to Nvidia Container Toolkit in README
#155102 commented on Aug 11, 2025 • 0 new comments
[WIP][dynamic shapes] guard_or_false for are_strides_like_channel_last
#155076 commented on Aug 10, 2025 • 0 new comments
[dict] Implement dict.__ior__ and fix return type in dict.__or__
#155072 commented on Aug 12, 2025 • 0 new comments
[test] lintrunner thing
#155062 commented on Aug 5, 2025 • 0 new comments
[WIP] cast to bf16 before mul op in flex bwd
#154922 commented on Aug 7, 2025 • 0 new comments
[ROCm] SDPA fix mem fault when dropout is enabled
#154864 commented on Aug 11, 2025 • 0 new comments
Fix DataLoader to Pass List to getitems When Using BatchSampler. Fixes Issue_#154810
#154844 commented on Aug 6, 2025 • 0 new comments
Add serialization support for register_constant
#154834 commented on Aug 8, 2025 • 0 new comments
Fix in pytorch do_bench_using_profiling
#154766 commented on Aug 6, 2025 • 0 new comments
[Wheel Variant] Experimental Support
#154733 commented on Aug 5, 2025 • 0 new comments
[vision hash update] update the pinned vision hash
#154694 commented on Aug 12, 2025 • 0 new comments
libtorch_cpu.so link libm.so issue:GLIBC_2.29 not found
#159454 commented on Aug 5, 2025 • 0 new comments
[BE][c10d/Store]add check in pyi
#155855 commented on Aug 12, 2025 • 0 new comments
[ATen MTIA backend] Use aten native CPU fallback function on MTIA
#155845 commented on Aug 11, 2025 • 0 new comments
[CD] Move build_magma.bat to build_magma.py
#155804 commented on Aug 11, 2025 • 0 new comments
[Do not merge] DCP ZOC Test Changes
#155802 commented on Aug 11, 2025 • 0 new comments
[dynamo][guards] Skip dispatch key guards for requires_grad=False
#155756 commented on Aug 11, 2025 • 0 new comments
torch.distributed TCP bind address
#155741 commented on Aug 10, 2025 • 0 new comments
[MPS] Add regression test for memory leak in nn.MaxPool2d
#155730 commented on Aug 11, 2025 • 0 new comments
HOP py_impl register to tensor subclass cannot dispatch
#155726 commented on Aug 11, 2025 • 0 new comments
[CUDA][MAGMA][Linalg][WIP] Remove MAGMA
#155694 commented on Aug 12, 2025 • 0 new comments
Remove unused nonlocal declarations from checkpoint and library helper functions
#155686 commented on Aug 11, 2025 • 0 new comments
DOC: update CrossEntropyLoss with note and example of incorrect target specification
#155649 commented on Aug 12, 2025 • 0 new comments
[WIP][PGO] exclude optimizer state from PGO whitelist
#155643 commented on Aug 10, 2025 • 0 new comments
[Scripts] Add refresh script to clean, pull and build repo
#155639 commented on Aug 11, 2025 • 0 new comments
[dict] Implement dict subclass `fromkeys` classmethod
#155608 commented on Aug 12, 2025 • 0 new comments
[DRAFT] Evaluate feasibility of using FunctionalTensor for Example Value
#155606 commented on Aug 6, 2025 • 0 new comments
Remove unnecessary MPSStream initialization
#155602 commented on Aug 12, 2025 • 0 new comments
[do NOT land] DTensor + torch_function_mode + flex_attention dispatch test
#155600 commented on Aug 11, 2025 • 0 new comments
[do NOT land] torch_function_mode + flex_attention dispatch test
#155594 commented on Aug 11, 2025 • 0 new comments
WIP add support for dynamic shapes
#155557 commented on Aug 8, 2025 • 0 new comments
[OrderedDict] Add `bool(OrderedDict)`
#155503 commented on Aug 12, 2025 • 0 new comments
[OrderedDict] Set the correct dict class in UserDefinedDictVariable
#155502 commented on Aug 12, 2025 • 0 new comments
[OrderedDict] Implement `hasattr(..., IteratorVariable)`
#155501 commented on Aug 12, 2025 • 0 new comments
[proxy_tensor] Do not clobber tensor proxies for inplace ops
#155456 commented on Aug 10, 2025 • 0 new comments
Quiet Inductor #135521
#155450 commented on Aug 9, 2025 • 0 new comments
Clean up memory management in impl_func_norm
#155432 commented on Aug 12, 2025 • 0 new comments
[inductor] Improve GEMM loggings
#155427 commented on Aug 11, 2025 • 0 new comments
[cuDNN] cuDNN frontend for LayerNorm RMSNorm
#159682 commented on Aug 6, 2025 • 0 new comments
[dynamo, nested graph breaks] support nested graph breaks x context managers
#159678 commented on Aug 9, 2025 • 0 new comments
[ROCm] Limit number of values per thread for reductions on three dimensions
#159652 commented on Aug 5, 2025 • 0 new comments
[OpenReg] Refactor and Bug Fix
#159640 commented on Aug 8, 2025 • 0 new comments
Moved Autograd Fallback Interface to Header for Use by Out-of-tree Backends
#159639 commented on Aug 8, 2025 • 0 new comments
setup [Do not review]
#159636 commented on Aug 6, 2025 • 0 new comments
[WIP] incomplete view unbacked fix to bypass vllm issue
#159626 commented on Aug 6, 2025 • 0 new comments
Fallback to contiguous layout in convolution lowering on stride mismatch #159462
#159593 commented on Aug 5, 2025 • 0 new comments
[c10d][nvshmem] add nvshmem build rules and dependency for libtorch_cuda
#159562 commented on Aug 12, 2025 • 0 new comments
Add dtype checks in meta dispatch for various ordering ops
#159556 commented on Aug 8, 2025 • 0 new comments
Move config/util to AllocatorConfig for cross-allocator sharing
#159553 commented on Aug 11, 2025 • 0 new comments
Editing and updating glossary to test functionality.
#159544 commented on Aug 6, 2025 • 0 new comments
[1/N]Port 3 distributed/_tools test cases to Intel GPU
#159543 commented on Aug 12, 2025 • 0 new comments
[ci][inductor dashboard] Remove torchao install as its unused
#159501 commented on Aug 5, 2025 • 0 new comments
Add B200 smoke test
#159494 commented on Aug 5, 2025 • 0 new comments
Support `next(iterator, default)`
#159483 commented on Aug 5, 2025 • 0 new comments
Recursively sync fbgemm submodules before build
#159477 commented on Aug 5, 2025 • 0 new comments
[WIP]port several test files under test/distributed to Intel GPU
#159473 commented on Aug 11, 2025 • 0 new comments
[ROCm] Use opportunistic fastatomics based on heuristics
#159430 commented on Aug 8, 2025 • 0 new comments
Fixes for `collections.Counter`
#159368 commented on Aug 5, 2025 • 0 new comments
Fixes for `collections.NamedTuple`
#159367 commented on Aug 5, 2025 • 0 new comments
Change mutation type of `MutableMappingVariable` to `AttributeMutationNew`
#159366 commented on Aug 5, 2025 • 0 new comments
Enable trace through the collections module
#159365 commented on Aug 5, 2025 • 0 new comments
[dynamo] Simplify two methods in ConstDictVariable
#159361 commented on Aug 11, 2025 • 0 new comments
Avoid potential deadlocks in host allocator
#159352 commented on Aug 11, 2025 • 0 new comments
[dynamo, nested graph breaks] support very simple nested graph breaks
#159329 commented on Aug 9, 2025 • 0 new comments
[dynamo, nested graph breaks] use CALL_FUNCTION_EX when calling resume function
#159281 commented on Aug 9, 2025 • 0 new comments
[Inductor] support native_layer_norm_backward mixed dtype for privateuse1
#159830 commented on Aug 5, 2025 • 0 new comments
DO NOT MERGE, testing sequential builds
#159827 commented on Aug 12, 2025 • 0 new comments
Add support for error-ing when there is side effect
#159826 commented on Aug 5, 2025 • 0 new comments
[BE][Dynamo] Type improvements in `_dynamo/utils` to generics
#159824 commented on Aug 11, 2025 • 0 new comments
ci: Add option for sequential wheel building
#159821 commented on Aug 11, 2025 • 0 new comments
[dynamo, nested graph breaks] support nested closures
#159817 commented on Aug 9, 2025 • 0 new comments
[Inductor] Freeze matmul args layouts
#159813 commented on Aug 5, 2025 • 0 new comments
Replace C array with std::array in formatSockAddr
#159812 commented on Aug 6, 2025 • 0 new comments
[inductor] remove no_x_dim
#159810 commented on Aug 11, 2025 • 0 new comments
fix type documentation for context_parallel no_restore_buffers, to prevent user from passing in the wrong type
#159808 commented on Aug 7, 2025 • 0 new comments
[dynamo] fixes to propagate tag safeness
#159807 commented on Aug 12, 2025 • 0 new comments
[146643]Fixed max triton generation
#159797 commented on Aug 6, 2025 • 0 new comments
[dynamo, nested graph breaks] prevent excessive recompilations
#159786 commented on Aug 12, 2025 • 0 new comments
[Tests]: disable logspace tests correctly
#159785 commented on Aug 5, 2025 • 0 new comments
[MTIA Aten Backend] Migrate any.all_out (example diff for tutorial)
#159780 commented on Aug 7, 2025 • 0 new comments
[Caffe2] Add float batch box cox SVE128 implementation
#159778 commented on Aug 11, 2025 • 0 new comments
[Inductor][Triton] Support TMA before strict 3.4 cutoff
#159777 commented on Aug 6, 2025 • 0 new comments
[ROCm] Fix Sliding Window Attention in AOTriton integration code
#159773 commented on Aug 8, 2025 • 0 new comments
Add binary size check to validate current limits for binaries released to pypi
#159768 commented on Aug 5, 2025 • 0 new comments
[CI] Reduce XPU Windows build time
#159763 commented on Aug 11, 2025 • 0 new comments
[FSDP] Add FrozenParamHandle to optimize memory for frozen parameters
#159751 commented on Aug 9, 2025 • 0 new comments
AOT graph capture with dynamo.
#159749 commented on Aug 6, 2025 • 0 new comments
Build and Install Arm Compute Library in manylinux docker image
#159737 commented on Aug 7, 2025 • 0 new comments
Fix GroupNorm(num_groups=1) to match LayerNorm behavior
#159736 commented on Aug 9, 2025 • 0 new comments
Use uv run for lintrunner Python deps
#159735 commented on Aug 11, 2025 • 0 new comments
[Don't Review] Test XPU CI
#159718 commented on Aug 5, 2025 • 0 new comments
dynamo: Remove passing or deleted dynamo_expected_failures
#159691 commented on Aug 8, 2025 • 0 new comments
Recheck Autotune cache on Precompile serialization to prune compilation results
#158656 commented on Aug 5, 2025 • 0 new comments
Update persons of interest for XLA. The previous one is out of date.
#158652 commented on Aug 9, 2025 • 0 new comments
[NumPy] use NumPy 2.x in CI
#158647 commented on Aug 10, 2025 • 0 new comments
[OpenReg] Add Develop Notes for Integrating New Backend into PyTorch(Operator Aspect)
#158644 commented on Aug 8, 2025 • 0 new comments
unskipped flaky conv2d, rmatmul, and matmul as these now pass
#158640 commented on Aug 11, 2025 • 0 new comments
Don't use LLVM libraries
#158623 commented on Aug 12, 2025 • 0 new comments
[simplefsdp auto-bucketing] auto bucketing with greedy algorithm
#158609 commented on Aug 11, 2025 • 0 new comments
[cuda][complex] Use scaling to compute the absolute value of complex number to avoid overflow
#158557 commented on Aug 6, 2025 • 0 new comments
port 3 distributed test to Intel GPU and unified some common functions
#158533 commented on Aug 12, 2025 • 0 new comments
Filter out local timer tests which are unimplemented in Python on AArch64
#158342 commented on Aug 7, 2025 • 0 new comments
Move XPUEvent to c10
#158336 commented on Aug 11, 2025 • 0 new comments
[simplefsdp auto-bucketing] manual bucketing with plan
#158321 commented on Aug 11, 2025 • 0 new comments
autograd: Add VJP and JVP rules for aten::aminmax
#158241 commented on Aug 11, 2025 • 0 new comments
reuse EventPool::Event in CUDAAllocator
#158224 commented on Aug 11, 2025 • 0 new comments
Move EventPool::Event to c10
#158220 commented on Aug 11, 2025 • 0 new comments
Move CUDAEvent to c10
#158219 commented on Aug 11, 2025 • 0 new comments
Multi-threaded concurrent fetching in Dataloader for high-latency storage.
#158218 commented on Aug 7, 2025 • 0 new comments
[scan] cloned aliased input when lowering scan to while_loop
#158168 commented on Aug 12, 2025 • 0 new comments
[DTensor] Assert DTensorSpec has valid placements
#158133 commented on Aug 6, 2025 • 0 new comments
[build] pin `setuptools>=77` to enable PEP 639
#158104 commented on Aug 11, 2025 • 0 new comments
[simplefsdp auto-bucketing] add ir node reorder helper function
#158098 commented on Aug 11, 2025 • 0 new comments
[simplefsdp auto-bucketing] add ir node bucket helper function
#158097 commented on Aug 11, 2025 • 0 new comments
[inductor] add template hashing for template lookup table
#158091 commented on Aug 11, 2025 • 0 new comments
adding types to nn module init
#158065 commented on Aug 11, 2025 • 0 new comments
[dict] Support `dict.update()` with no args
#158061 commented on Aug 12, 2025 • 0 new comments
Update upstream opinfo to generate appropriately scaled sample inputs
#158018 commented on Aug 8, 2025 • 0 new comments
remove unnecessary sync point in AveragedModel update
#158017 commented on Aug 11, 2025 • 0 new comments
[Caffe2] Build perfkernels targeting SVE128
#159274 commented on Aug 11, 2025 • 0 new comments
add try catch around provenance tracking
#159266 commented on Aug 6, 2025 • 0 new comments
[WIP][2/N] Port 5 _composable distributed test to Intel GPU
#159241 commented on Aug 8, 2025 • 0 new comments
Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous.
#159197 commented on Aug 9, 2025 • 0 new comments
[TESTING] Triton pin (Aug 8) 05b2c186c1b6c9a08375389d5efe9cb4c401c075
#159158 commented on Aug 9, 2025 • 0 new comments
[WIP][1/N] Port 5 _composable/fsdp distributed test cases to Intel GPU
#159118 commented on Aug 8, 2025 • 0 new comments
outer heuristic
#159093 commented on Aug 12, 2025 • 0 new comments
port distributed pipeline test files for Intel GPU
#159033 commented on Aug 12, 2025 • 0 new comments
[while_loop] support input mutation with auto_functionalize
#159010 commented on Aug 12, 2025 • 0 new comments
[cond] support input mutation with auto_functionalize
#159009 commented on Aug 12, 2025 • 0 new comments
Docs on export joint with descriptors
#159006 commented on Aug 12, 2025 • 0 new comments
[inductor] add lookup table recorder
#158987 commented on Aug 11, 2025 • 0 new comments
[BE] create an empty shape_env for check_input_alias_and_mutation_return_outputs
#158965 commented on Aug 12, 2025 • 0 new comments
Ensure outer aliasing on DTensor matches inner aliasing
#158954 commented on Aug 12, 2025 • 0 new comments
[Caffe2] Import SVE128 PR
#158932 commented on Aug 11, 2025 • 0 new comments
[triton_heuristics] Optimize the triton launcher in pt2
#158897 commented on Aug 7, 2025 • 0 new comments
[map] support gen_schema for map
#158884 commented on Aug 12, 2025 • 0 new comments
[associative_scan] support gen_schema for associative_scan
#158883 commented on Aug 12, 2025 • 0 new comments
[CI] Switch ROCm MI300 GitHub Actions workflows from 2-GPU to 1-GPU runners
#158882 commented on Aug 11, 2025 • 0 new comments
[scan] support gen_schema for scan
#158864 commented on Aug 12, 2025 • 0 new comments
[while_loop] support gen_schema for while_loop
#158863 commented on Aug 12, 2025 • 0 new comments
Update RandomSampler docstring. data_source must be Sized not Dataset
#158857 commented on Aug 7, 2025 • 0 new comments
[ROCm] Avoid test_conv_backend_cudnn* UTs
#158817 commented on Aug 8, 2025 • 0 new comments
Update nullcontext to return input args
#158776 commented on Aug 8, 2025 • 0 new comments
Guard rocm_smi.h include with a header check
#158771 commented on Aug 5, 2025 • 0 new comments
[ROCm] [CK] Composable Kernel integration for ROCm
#158747 commented on Aug 8, 2025 • 0 new comments
[BE] Upgrade XPU support package to 2025.2
#158733 commented on Aug 8, 2025 • 0 new comments
DISABLED test_empty_cpu_tensor (__main__.CudaGraphTreeTests)
#156735 commented on Aug 7, 2025 • 0 new comments
DISABLED test_graph_break_unsupported_fake (__main__.ReproTests)
#156629 commented on Aug 7, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157335 commented on Aug 7, 2025 • 0 new comments
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on Aug 7, 2025 • 0 new comments
DISABLED test_allocate_in_thread_to_pool (__main__.TestBlockStateAbsorption)
#158764 commented on Aug 7, 2025 • 0 new comments
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on Aug 7, 2025 • 0 new comments
DISABLED test_invalid_status_for_legacy_api (__main__.TestCuda)
#157110 commented on Aug 7, 2025 • 0 new comments
DISABLED test_dynamic_warmup (__main__.CudaGraphTreeTests)
#156693 commented on Aug 7, 2025 • 0 new comments
DISABLED test_get_parameter_dtype (__main__.ReproTests)
#156598 commented on Aug 7, 2025 • 0 new comments
DISABLED test_add_sub_alpha_out (__main__.ReproTests)
#156597 commented on Aug 7, 2025 • 0 new comments
Inductor codegen for float8 dynamic quantization ops for scaled_grouped_mm backward pass is slow
#159769 commented on Aug 7, 2025 • 0 new comments
[BUG]Nan in gradients of scaled_dot_product_attention operation with mem_efficient backend
#125674 commented on Aug 7, 2025 • 0 new comments
non-deterministic issue of torch.einsum function on different GPU.
#137389 commented on Aug 7, 2025 • 0 new comments
Error : torch/utils/_sympy/interp.py:176] [0/2] failed while executing pow_by_natural([VR1, int_oo], VR[-1, -1]])
#148003 commented on Aug 8, 2025 • 0 new comments
[MPS] test_linalg_cholesky fails on M4
#157364 commented on Aug 8, 2025 • 0 new comments
Weird dataloader performance degradation caused by torch and numpy import order
#101188 commented on Aug 8, 2025 • 0 new comments
KeyError when using fx.split_module
#155220 commented on Aug 7, 2025 • 0 new comments
DISABLED test_add_loggers_functions (__main__.TestFXNumericSuiteNShadows)
#140380 commented on Aug 7, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157315 commented on Aug 7, 2025 • 0 new comments
`RuntimeError: UR error` with XPU
#149953 commented on Aug 7, 2025 • 0 new comments
Mutating a tensor while serializing with safetensors crashes free-threaded PyTorch
#158071 commented on Aug 7, 2025 • 0 new comments
extern kernel's get_free_symbols seems incomplete
#159685 commented on Aug 7, 2025 • 0 new comments
Support dict as input/output for pipeline parallelism
#159711 commented on Aug 7, 2025 • 0 new comments
Poor error message when trying to jit a function instead of a module (RuntimeError: Cannot insert a Tensor that requires grad as a constant.)
#55282 commented on Aug 7, 2025 • 0 new comments
foreach CUDA tests flaky on CUDA 12.6+ due to flaky profiler results
#148681 commented on Aug 7, 2025 • 0 new comments
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats - PyTorch compile fails with Python 3.12
#153737 commented on Aug 7, 2025 • 0 new comments
Support installing Python bindings in CMake
#159232 commented on Aug 7, 2025 • 0 new comments
General MPS op coverage tracking issue
#77764 commented on Aug 7, 2025 • 0 new comments
Cannot pass through None for example_inputs in prepare_fx
#159505 commented on Aug 7, 2025 • 0 new comments
Feature request: `DataLoader` using multithreading instead of multiprocessing
#158714 commented on Aug 7, 2025 • 0 new comments
DISABLED test_triton_barrier (__main__.NVSHMEMTritonTest)
#158761 commented on Aug 7, 2025 • 0 new comments
DISABLED test_graph_concurrent_replay (__main__.TestCuda)
#104055 commented on Aug 7, 2025 • 0 new comments
Add official support for CUDA sm_120 (RTX 5090 / Blackwell architecture)
#159207 commented on Aug 8, 2025 • 0 new comments
DISABLED test_end_recording_early (__main__.CudaGraphTreeTests)
#156778 commented on Aug 8, 2025 • 0 new comments
DISABLED test_add_loggers_linear_mod_fp32_quant (__main__.TestFXNumericSuiteNShadows)
#142860 commented on Aug 8, 2025 • 0 new comments
DISABLED test_dont_dce_rand (__main__.ReproTests)
#156580 commented on Aug 8, 2025 • 0 new comments
DISABLED test_assigning_back_deleter_fns_to_tensor (__main__.TestBlockStateAbsorption)
#134810 commented on Aug 8, 2025 • 0 new comments
DISABLED test_aot_autograd_runtime_wrapper_prologue_profiled (__main__.ReproTests)
#156678 commented on Aug 8, 2025 • 0 new comments
DISABLED test_error_on_dealloc_use (__main__.CudaGraphTreeTests)
#156801 commented on Aug 8, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142566 commented on Aug 8, 2025 • 0 new comments
Support for _Float16/C++23 std::float16_t
#157776 commented on Aug 8, 2025 • 0 new comments
Add nn.GradBank for gradient scaling to prevent vanishing/exploding gradients
#159765 commented on Aug 8, 2025 • 0 new comments
RuntimeError: Tried to instantiate dummy base class Stream
#159744 commented on Aug 8, 2025 • 0 new comments
DataLoader num_workers > 0 causes CPU memory from parent process to be replicated in all worker processes
#13246 commented on Aug 8, 2025 • 0 new comments
[discussion, idea] Batched, vectorized base64 decoding / encoding + maybe RLE decoding / encoding
#90560 commented on Aug 8, 2025 • 0 new comments
canUse32BitIndexMath set to False with efficient net
#155225 commented on Aug 8, 2025 • 0 new comments
DISABLED test_sort_large_cuda_float16 (__main__.TestSortAndSelectCUDA)
#159426 commented on Aug 8, 2025 • 0 new comments
Deterministic implementation for grid_sampler_2d_backward_cuda
#68959 commented on Aug 8, 2025 • 0 new comments
Obscure error: Expected a value of type 'List[int]' for argument 'sizes' but instead found type 'immutable_list'
#122129 commented on Aug 8, 2025 • 0 new comments
Incorrect hint calculated for expression involving unbacked SymInt
#130456 commented on Aug 8, 2025 • 0 new comments
Remove redundant type aliases of _device for torch.Device
#152952 commented on Aug 8, 2025 • 0 new comments
DISABLED test_addr_alpha_beta_out (__main__.ReproTests)
#156641 commented on Aug 8, 2025 • 0 new comments
Dead link in `torch.compile` docs
#119272 commented on Aug 8, 2025 • 0 new comments
[feature request] Inplace downcast dtype conversion
#158710 commented on Aug 8, 2025 • 0 new comments
`tensordot` not working for dtype int32 and lower when there is only 1 element in the given axis
#84530 commented on Aug 8, 2025 • 0 new comments
DISABLED test_triton_broadcast (__main__.NVSHMEMTritonTest)
#158908 commented on Aug 8, 2025 • 0 new comments
DISABLED test_add_loggers_linear_mod_fp32_fp32 (__main__.TestFXNumericSuiteNShadows)
#159036 commented on Aug 8, 2025 • 0 new comments
DISABLED test_empty_storage (__main__.CudaGraphTreeTests)
#156755 commented on Aug 8, 2025 • 0 new comments
DISABLED test_relative_import (__main__.ReproTests)
#156679 commented on Aug 8, 2025 • 0 new comments
DISABLED test_compile_kernel_advanced (__main__.TestCompileKernel)
#157172 commented on Aug 8, 2025 • 0 new comments
DISABLED test_relative_import_no_modulename (__main__.ReproTests)
#156691 commented on Aug 8, 2025 • 0 new comments
UNSTABLE rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default)
#158182 commented on Aug 8, 2025 • 0 new comments
UNSTABLE inductor-rocm-mi300 / rocm-py3.10-inductor-mi300 / test (inductor)
#154884 commented on Aug 8, 2025 • 0 new comments
Fix for special.zeta nan handling - follow-up PR #138653
#146618 commented on Aug 8, 2025 • 0 new comments
DISABLED test_function_compiled_multiple_times (__main__.CudaGraphTreeTests)
#157143 commented on Aug 5, 2025 • 0 new comments
DISABLED test_name_match (__main__.TestGuardSerialization)
#156246 commented on Aug 5, 2025 • 0 new comments
[BE] linter to detect unused docker images
#158783 commented on Aug 5, 2025 • 0 new comments
ModuleDict subscription no longer works after compile().
#159831 commented on Aug 5, 2025 • 0 new comments
`torch.load` can't deserialize `datetime` objects, even with the appropriate `safe_globals`
#152985 commented on Aug 5, 2025 • 0 new comments
DISABLED test_add_loggers_conv_bn_relu_fusion_fp32 (__main__.TestFXNumericSuiteNShadows)
#158762 commented on Aug 6, 2025 • 0 new comments
DISABLED test_zero_bubble_with_model_kwargs_ScheduleClass1 (__main__.ScheduleTest)
#154579 commented on Aug 6, 2025 • 0 new comments
DISABLED test_graph_partition (__main__.CudaGraphTreeTests)
#157173 commented on Aug 6, 2025 • 0 new comments
DISABLED test_graph_partition_cpu_op_and_dynamic_shapes (__main__.CudaGraphTreeTests)
#157258 commented on Aug 6, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_nn_functional_conv2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157278 commented on Aug 6, 2025 • 0 new comments
Inconsistent Error Message for Cross-Device Input in torch.compile
#159133 commented on Aug 6, 2025 • 0 new comments
CMake Error: When installing PyTorch from source, CUDA not being detected.
#134331 commented on Aug 6, 2025 • 0 new comments
native_layer_norm_backward supports mixed precision for PrivateUse1
#159829 commented on Aug 6, 2025 • 0 new comments
User Triton Kernels Are not Serialized in Fx Graph Runnable
#153475 commented on Aug 6, 2025 • 0 new comments
[RFC]: PyTorch Low-Precision GEMMs Public API
#157950 commented on Aug 6, 2025 • 0 new comments
Add mean and var operation for Nested Tensors
#138831 commented on Aug 6, 2025 • 0 new comments
foreach_map enhancements
#158968 commented on Aug 5, 2025 • 0 new comments
torch compile produces nans with GQA
#159469 commented on Aug 5, 2025 • 0 new comments
[Feat] Tracking for OpenReg Improvements
#158917 commented on Aug 5, 2025 • 0 new comments
ONNX export via Dynamo sets `dft_length = 1` in `DFT`, breaking shape-inference for `torch.fft.rfft`
#155997 commented on Aug 5, 2025 • 0 new comments
mmap fails on 64k page aarch64 systems for AOTI model loading
#145610 commented on Aug 5, 2025 • 0 new comments
DISABLED test_conv2d_api (__main__.TestQuantizedFunctionalOps)
#157346 commented on Aug 5, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_mv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142697 commented on Aug 5, 2025 • 0 new comments
DISABLED test_mempool_limited_memory_with_allocator (__main__.TestMemPool)
#157256 commented on Aug 5, 2025 • 0 new comments
DDP+TP composition does not work as expected
#157445 commented on Aug 5, 2025 • 0 new comments
Tracking issue: Incorrect Meta Strides / Turn On PyDispatcher in FakeTensor Mode
#145094 commented on Aug 5, 2025 • 0 new comments
Experimental Wheel Variant Support - Technical Discussion
#155141 commented on Aug 5, 2025 • 0 new comments
Compilation of the post-training quantized model using Nvidia ModelOpt is failing with the error: Unsupported — 'inline in skipfiles: QuantLinearConvBase.quantize_weight
#151450 commented on Aug 5, 2025 • 0 new comments
torch.empty should consider np.bool while parsing args
#159739 commented on Aug 5, 2025 • 0 new comments
Inductor doesn't fuse outer dimension softmax into a single kernel.
#93718 commented on Aug 5, 2025 • 0 new comments
[NJT] can only chunk if the 2nd dimension is ragged
#153238 commented on Aug 5, 2025 • 0 new comments
DISABLED test_dont_aggressively_write_assert (__main__.ReproTests)
#156570 commented on Aug 5, 2025 • 0 new comments
DISABLED test_graph_partition_cpu_scalar1 (__main__.CudaGraphTreeTests)
#157280 commented on Aug 7, 2025 • 0 new comments
DISABLED test_vmap_exhaustive_addmv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157617 commented on Aug 7, 2025 • 0 new comments
DISABLED test_call_count_tunableop_cuda_float32 (__main__.TestLinalgCUDA)
#155953 commented on Aug 7, 2025 • 0 new comments
DISABLED test_reentrant_parent_error_on_cpu_cuda (__main__.TestAutogradDeviceTypeCUDA)
#86735 commented on Aug 7, 2025 • 0 new comments
NotImplementedError: Could not run 'aten::q_scale' with arguments from the 'CPU'
#159743 commented on Aug 7, 2025 • 0 new comments
Bug: `torch.compile` triggers C++ compile error due to conflicting declaration in generated `.cpp` code
#159245 commented on Aug 7, 2025 • 0 new comments
[DTensor][FSDP2][DDP] benchmark dtensor cpu overhead for adam optimizer
#159169 commented on Aug 7, 2025 • 0 new comments
[Flex Attention] Accuracy issue with kv length not multiple of kv block size
#159247 commented on Aug 7, 2025 • 0 new comments
torch.export with nn.Transformer creates a non-contiguous memory tensor for aten.view
#159126 commented on Aug 7, 2025 • 0 new comments
Torch profiler corrupted names with Python 3.11
#121219 commented on Aug 7, 2025 • 0 new comments
Segmentation fault with ITIMER_REAL
#57185 commented on Aug 7, 2025 • 0 new comments
Foreach Where support
#117884 commented on Aug 7, 2025 • 0 new comments
get_ema_multi_avg_fn() equation is a little confused
#155551 commented on Aug 7, 2025 • 0 new comments
Autogenerate code example / tutorial outputs in documentation
#6662 commented on Aug 7, 2025 • 0 new comments
Is DTensor support dynamic shapes while using torch.compile ?
#159635 commented on Aug 7, 2025 • 0 new comments
Einsum of 2 dtensors fails in inference mode
#157631 commented on Aug 7, 2025 • 0 new comments
Enable CUDA 12.9 binaries
#155196 commented on Aug 6, 2025 • 0 new comments
extract statistics from attention weights in FlexAttention
#159770 commented on Aug 6, 2025 • 0 new comments
Enable CI and Build Support for PyTorch on PPC64LE Architecture
#141235 commented on Aug 6, 2025 • 0 new comments
Transition Adam and maybe other optimizers to fused code path by default to avoid `foreach=True`-specific VRAM peak due to temp TensorList for bias-corrected moments
#158371 commented on Aug 6, 2025 • 0 new comments
Different behavior between sparse and dense tensors with broadcasting multiplication.
#158861 commented on Aug 6, 2025 • 0 new comments
Performance bug on `inplace` of `nn.ELU`
#159622 commented on Aug 6, 2025 • 0 new comments
cuda memory error thrown by torch.
#150048 commented on Aug 6, 2025 • 0 new comments
which Pythorch suports "TORCH_USE_CUDA_DSA=1" from the shell environment?
#121969 commented on Aug 6, 2025 • 0 new comments
[dynamo] torch.randint_like on DTensor does not work with compile
#156649 commented on Aug 6, 2025 • 0 new comments
[AOTI] Severe Performance Regression with FP16 Autocast in AOTInductor for Small Batch Sizes
#159346 commented on Aug 6, 2025 • 0 new comments
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on Aug 6, 2025 • 0 new comments
DISABLED test_module_and_optimizer_ids (__main__.TestTorchTidyProfiler)
#87581 commented on Aug 7, 2025 • 0 new comments
DISABLED test_add_loggers_conv_bn_relu_fusion_quant (__main__.TestFXNumericSuiteNShadows)
#127814 commented on Aug 7, 2025 • 0 new comments
DISABLED test_comprehensive_pca_lowrank_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139828 commented on Aug 7, 2025 • 0 new comments
DISABLED test_dtensor_seq_par_shard_dim_1 (__main__.MicroPipelineTPTest)
#153223 commented on Aug 7, 2025 • 0 new comments
DISABLED test_triton_alltoall (__main__.NVSHMEMTritonTest)
#158840 commented on Aug 7, 2025 • 0 new comments
[dynamo, nested graph breaks] add nested graph break tests
#144516 commented on Aug 9, 2025 • 0 new comments
Replacing explicit backend search with api call
#144944 commented on Aug 9, 2025 • 0 new comments
Support contextlib.ExitStack
#146506 commented on Aug 12, 2025 • 0 new comments
Enable explicitly vectorized `_weight_int8pack_mm` op for FP16 dtype on x86_64 CPU
#146777 commented on Aug 5, 2025 • 0 new comments
[DO NOT MERGE][Inductor] Migrate from oneDNN Inner Product to oneDNN MatMul for mkldnn._linear_pointwise and mkldnn._linear_pointwise.binary
#147360 commented on Aug 12, 2025 • 0 new comments
[ONNX] Migrate onnx ops decomp functions
#147469 commented on Aug 6, 2025 • 0 new comments
[test] check labels
#147470 commented on Aug 5, 2025 • 0 new comments
Support `contextlib.suppress`
#147990 commented on Aug 11, 2025 • 0 new comments
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on Aug 8, 2025 • 0 new comments
[pytree] simplify public API exposition with `__module__`
#148328 commented on Aug 8, 2025 • 0 new comments
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on Aug 8, 2025 • 0 new comments
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on Aug 8, 2025 • 0 new comments
[triton hash update] update the pinned triton hash
#148492 commented on Aug 12, 2025 • 0 new comments
Remove shebang line from easy_install generated python scripts on Windows only
#148673 commented on Aug 5, 2025 • 0 new comments
Support int step for nonfused optimizer
#148956 commented on Aug 8, 2025 • 0 new comments
Update the heuristic for AArch64 bmm/baddbmm
#149122 commented on Aug 8, 2025 • 0 new comments
TorchInductor CPU Performance Dashboard
#93531 commented on Aug 12, 2025 • 0 new comments
[inductor] [cpu] `torch.nn.Embedding-torch.index_copy` outputs inconsistent results on cpu inductor
#156786 commented on Aug 12, 2025 • 0 new comments
Automated submodule update: kineto
#106149 commented on Aug 11, 2025 • 0 new comments
Automated submodule update: FBGEMM
#115316 commented on Aug 12, 2025 • 0 new comments
[AOTAutograd] tweak min-cut partitioner to avoid saving softmax output
#126348 commented on Aug 6, 2025 • 0 new comments
Remove deprecated torch/csrc/jit/codegen/cuda
#131296 commented on Aug 11, 2025 • 0 new comments
Add decompositions for median and nonmedian
#134881 commented on Aug 7, 2025 • 0 new comments
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on Aug 8, 2025 • 0 new comments
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on Aug 6, 2025 • 0 new comments
Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512.
#138388 commented on Aug 11, 2025 • 0 new comments
Fix `USE_STATIC_MKL` lost functionality
#138996 commented on Aug 11, 2025 • 0 new comments
[Don't Review] Test CI
#139971 commented on Aug 9, 2025 • 0 new comments
Using acc_t for log_softmax
#143896 commented on Aug 5, 2025 • 0 new comments
Defaults to C++20 for torch targets
#143959 commented on Aug 11, 2025 • 0 new comments
[ci] Add riscv opt-int build
#143979 commented on Aug 8, 2025 • 0 new comments
Full static typing for `torch.distributions`
#144219 commented on Aug 11, 2025 • 0 new comments
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 commented on Aug 8, 2025 • 0 new comments
[CUTLASS][WIP] Gate rowwise matmul CUTLASS kernels by compute capability
#152642 commented on Aug 9, 2025 • 0 new comments
[invoke_subgraph] Force the output stride to be same as eager
#152806 commented on Aug 5, 2025 • 0 new comments
[MSVC] Enable updated lambda processor by setting compiler flag /Zc:lambda globally
#152828 commented on Aug 5, 2025 • 0 new comments
[AOTI Debugging] Add Environment Variable to control output path
#153391 commented on Aug 11, 2025 • 0 new comments
[AUTOCAST] FEAT: Allow passing a `torch.device` object to autocast
#153539 commented on Aug 9, 2025 • 0 new comments
Updates contextlib with ParamSpec
#153623 commented on Aug 6, 2025 • 0 new comments
Add TORCH_CHECK for group < channels for native_channel_shuffle
#153781 commented on Aug 5, 2025 • 0 new comments
Fix `LLONG_MIN` errors in `torch.jit.script`
#153793 commented on Aug 5, 2025 • 0 new comments
[Dynamo] Fixes for exceptions
#153966 commented on Aug 11, 2025 • 0 new comments
[cond] support gen_schema for cond
#154193 commented on Aug 12, 2025 • 0 new comments
Revert D74898941 (#154188)
#154203 commented on Aug 8, 2025 • 0 new comments
Updated padding validation in max_pool functions to account for dilation
#154395 commented on Aug 10, 2025 • 0 new comments
[Inductor] Fix remove_noop_ops pass where the types for the same_meta would differ
#154460 commented on Aug 6, 2025 • 0 new comments
[WIP][export][cond] support exporting cond with unbacked symint shaped tensor
#154570 commented on Aug 5, 2025 • 0 new comments
Use official CUDAToolkit module in CMake
#154595 commented on Aug 11, 2025 • 0 new comments
Fix `SequentialLR` deprecate warning about invoke `step(epoch)`
#149392 commented on Aug 8, 2025 • 0 new comments
Introduce test skip markers for Sandcastle
#150934 commented on Aug 5, 2025 • 0 new comments
[hop] Make base_hop share utils with control flow ops in backward
#151146 commented on Aug 5, 2025 • 0 new comments
[WIP] Generalize device caching allocator
#151298 commented on Aug 8, 2025 • 0 new comments
Add inductor backend to device interface; make minifier_tests more device agnostic
#151314 commented on Aug 6, 2025 • 0 new comments
Fix skipIfXpu and skipIfHpu disables tests when used on class
#151315 commented on Aug 5, 2025 • 0 new comments
AMD/ROCm OCP Micro-scaling Format (mx-fp8/mx-fp4) Support
#151360 commented on Aug 11, 2025 • 0 new comments
Add is_pinned to host allocator
#151439 commented on Aug 10, 2025 • 0 new comments
Allow to byteswap data when reading saved torch jit data
#151447 commented on Aug 7, 2025 • 0 new comments
[WIP] Deprecate getPinnedMemoryAllocator use getHostAllocator instead
#151531 commented on Aug 10, 2025 • 0 new comments
GEMM-template Horizontal
#151780 commented on Aug 7, 2025 • 0 new comments
[WIP] Deprecate AcceleratorHooksInterface isPinnedPtr, use at::getHostAllocator()->is_pinned instead
#151916 commented on Aug 10, 2025 • 0 new comments
Expand cache logging
#152026 commented on Aug 10, 2025 • 0 new comments
Work around MPSGraph issue in backward pass of nn.ReplicationPad1d/2d
#152094 commented on Aug 11, 2025 • 0 new comments
[jit] DeadCodeEliminator Mark(block) improvement
#152348 commented on Aug 10, 2025 • 0 new comments
Build libgomp (gcc-13) from src on AArch64
#152361 commented on Aug 7, 2025 • 0 new comments
Bitwise-perfect method for (de)serializing tensors in base64
#93859 commented on Aug 10, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157368 commented on Aug 11, 2025 • 0 new comments
[RFC] Graph generalization
#158827 commented on Aug 11, 2025 • 0 new comments
[export] Unable to trace ops like min/pow
#148389 commented on Aug 11, 2025 • 0 new comments
DISABLED test_add_loggers_linear_mod_quant_fp32 (__main__.TestFXNumericSuiteNShadows)
#159152 commented on Aug 11, 2025 • 0 new comments
DISABLED test_graph_partition_cpu_scalar3 (__main__.CudaGraphTreeTests)
#157339 commented on Aug 11, 2025 • 0 new comments
DISABLED test_graph_partition_cpu_scalar2 (__main__.CudaGraphTreeTests)
#157312 commented on Aug 11, 2025 • 0 new comments
[ARM] multiple test failures in TestQuantizedConv on Aarch64
#144770 commented on Aug 11, 2025 • 0 new comments
Unable to Specify CUDA Stream for Collective Operations Using with torch.cuda.stream() context
#136187 commented on Aug 11, 2025 • 0 new comments
Allow slicing of Nested Tensors along constant dimensions
#108567 commented on Aug 11, 2025 • 0 new comments
torch.nn.InstanceNorm2d throws "mixed dtype" error with track_running_stats set to True
#139140 commented on Aug 11, 2025 • 0 new comments
Data corruption when reading data as CUDA tensor from a different process
#134273 commented on Aug 11, 2025 • 0 new comments
Update epsilon logic to improve numerical stability
#151110 commented on Aug 11, 2025 • 0 new comments
[Doc] [Win] libuv installation doc is not correct.
#148315 commented on Aug 11, 2025 • 0 new comments
[ONNX] broadcast_in_dim: model (ReDimNet)
#138313 commented on Aug 11, 2025 • 0 new comments
Most requested ops for the MPS backend
#154052 commented on Aug 11, 2025 • 0 new comments
Support for non-scalar valued Loss Tensor in `_export_forward_backward()`
#159316 commented on Aug 9, 2025 • 0 new comments
Gross mismatch in PDF between CUDA and CPU for multivariate Gaussian mixture models
#156959 commented on Aug 9, 2025 • 0 new comments
There is a performance drop because we have not yet implemented the batching rule for aten::_scaled_dot_product_efficient_attention_backward
#117016 commented on Aug 9, 2025 • 0 new comments
Performance issue on Windows with a "benchmark" comparing to Linux and WLS
#87692 commented on Aug 9, 2025 • 0 new comments
Get https://github.com/pytorch/benchmark working
#87697 commented on Aug 9, 2025 • 0 new comments
Compiling attention (SDPA) with nested tensors fails when using DDP
#152068 commented on Aug 9, 2025 • 0 new comments
Cannot configure dist_timeout when using device_mesh
#119574 commented on Aug 9, 2025 • 0 new comments
Performance regression: torch.jit.trace() significantly slower on RTX 5090 than RTX 4060 (cu128 nightly)
#159238 commented on Aug 9, 2025 • 0 new comments
MPS Sparse Support
#129842 commented on Aug 9, 2025 • 0 new comments
[feature request] Native checkpointing to/from `s3://` (DCP / `torch.load` / `torch.save`)
#155992 commented on Aug 9, 2025 • 0 new comments
PyTorch 2.8.0 exposes statically linked `libstdc++` CXX11 ABI symbols.
#133437 commented on Aug 10, 2025 • 0 new comments
strange error when distributed training
#150081 commented on Aug 10, 2025 • 0 new comments
torch_function objects passed as non-Tensor args should trigger overrides
#119194 commented on Aug 10, 2025 • 0 new comments
State of Torch Named Tensors
#60832 commented on Aug 10, 2025 • 0 new comments
Activation Checkpointing dtype Mismatch in Recomputed Activations Due to Skipped Type Casting in FSDPv2 _pre_forward
#159359 commented on Aug 10, 2025 • 0 new comments
Torch Compile. CudaGraphs Memory leak
#159669 commented on Aug 10, 2025 • 0 new comments
Triton 3.5 / PyTorch 2.9 Pin Update Tracker
#159704 commented on Aug 12, 2025 • 0 new comments
DISABLED test_remove_noop_slice1_cpu (__main__.CpuTests)
#151379 commented on Aug 12, 2025 • 0 new comments
DISABLED test_remove_noop_slice_scatter_cpu (__main__.CpuTests)
#151382 commented on Aug 12, 2025 • 0 new comments
DISABLED test_cdist_large_batch_cpu (__main__.TestTorchDeviceTypeCPU)
#158909 commented on Aug 12, 2025 • 0 new comments
DISABLED test_gru (__main__.TestXNNPACKQuantizer)
#158116 commented on Aug 12, 2025 • 0 new comments
DISABLED test_execution_into_recording (__main__.CudaGraphTreeTests)
#156838 commented on Aug 12, 2025 • 0 new comments
DISABLED test_remove_noop_slice_scatter_cuda (__main__.GPUTests)
#151378 commented on Aug 12, 2025 • 0 new comments
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on Aug 12, 2025 • 0 new comments
DISABLED test_remove_noop_slice_cuda (__main__.GPUTests)
#151383 commented on Aug 12, 2025 • 0 new comments
DISABLED test_remove_noop_slice1_cuda (__main__.GPUTests)
#151381 commented on Aug 12, 2025 • 0 new comments
DISABLED test_error_on_dealloc_use2 (__main__.CudaGraphTreeTests)
#156808 commented on Aug 12, 2025 • 0 new comments
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on Aug 12, 2025 • 0 new comments
DISABLED test_against_reference_multi_input_jacfwd_cuda (__main__.TestJacCUDA)
#156998 commented on Aug 12, 2025 • 0 new comments
DISABLED test_op_has_batch_rule___rmatmul___cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157003 commented on Aug 12, 2025 • 0 new comments
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#82340 commented on Aug 12, 2025 • 0 new comments
Enable CUDA 13.0 binaries
#159779 commented on Aug 12, 2025 • 0 new comments
Immutable assignment (akin to `array.at[...]` in JAX)
#159784 commented on Aug 11, 2025 • 0 new comments
Re-enable Low Memory Dropout
#102319 commented on Aug 11, 2025 • 0 new comments
dataloader hits CUDA error: invalid argument in data.pin_memory(device) on RTX 3090
#159447 commented on Aug 11, 2025 • 0 new comments
RandomSampler docs wrongly states type for data_source is Dataset, when it is Sized
#158631 commented on Aug 11, 2025 • 0 new comments
Execution of an `ExportProgram` of a model with `torch.autograd.grad(torch.sqrt(x), x, torch.ones_like(torch.sqrt(x)))` returns a `FakeTensor` instead of a real tensor
#155044 commented on Aug 11, 2025 • 0 new comments
DISABLED test_resnet (__main__.TestBlockStateAbsorption)
#157725 commented on Aug 11, 2025 • 0 new comments
DISABLED test_bitwise_print_precedence (__main__.ReproTests)
#156736 commented on Aug 11, 2025 • 0 new comments
DTensor Compile w/ Dynamic Shapes Autograd - Unhashable SymInt in sharding propagation when inputs have requires_grad=True
#159590 commented on Aug 11, 2025 • 0 new comments
Tensor roll with different shifts
#125422 commented on Aug 11, 2025 • 0 new comments
Fused Linear and Cross-Entropy Loss `torch.nn.functional.linear_cross_entropy`
#124480 commented on Aug 11, 2025 • 0 new comments
Parallel Associative Scan
#95408 commented on Aug 11, 2025 • 0 new comments
Wrong meta function for constant_pad_nd
#144187 commented on Aug 11, 2025 • 0 new comments
C10d Elastic Training master_addr ERROR
#74824 commented on Aug 11, 2025 • 0 new comments
torch.export seems to emit invalid code for Tensor.split when used with meta device
#154721 commented on Aug 11, 2025 • 0 new comments
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on Aug 11, 2025 • 0 new comments
[vulkan] Vulkan backend fails creating tensor on x86_64 Linux
#72775 commented on Aug 11, 2025 • 0 new comments