Insights: pytorch/pytorch
Overview
- 0 Merged pull requests
- 239 Open pull requests
- 115 Closed issues
- 115 New issues
1 Release published by 1 person
-
v2.8.0 PyTorch 2.8.0 Release
published
Aug 6, 2025
239 Pull requests opened by 125 people
-
[WIP][symm_mem] Add a wait for signal and put signal for one side API
#159837 opened
Aug 5, 2025 -
Improve README.md formatting and fix documentation errors
#159841 opened
Aug 5, 2025 -
[AOTInductor] ABI-Compatibility for RecordFunction.
#159842 opened
Aug 5, 2025 -
Revert "[BE] Update xpu driver repo for CD used almalinux 8.10 (#1573…
#159849 opened
Aug 5, 2025 -
Guard the CPU cpp wrapper tests on having a cpp wrapper
#159850 opened
Aug 5, 2025 -
[dtensor] fix incorrect norm calculation for Partial DTensors
#159856 opened
Aug 5, 2025 -
[Do not merge][TensorPipe] Test PR https://github.com/pytorch/tensorpipe/pull/464
#159857 opened
Aug 5, 2025 -
[ROCm] Clean up CUDA state between tests
#159858 opened
Aug 5, 2025 -
Added PyTorch LUT optimisation for GELU bf16 operators
#159859 opened
Aug 5, 2025 -
Issue 146167 inductor type hints lowering 1
#159861 opened
Aug 5, 2025 -
Fix skipIfXpu and skipIfHpu and similar skip decorators
#159862 opened
Aug 5, 2025 -
Implement `list(UserDefinedObject)` via `force_unpack_var_sequence`
#159864 opened
Aug 5, 2025 -
[collections.abc] Ensure that binop calls works with UserDefinedObjects
#159865 opened
Aug 5, 2025 -
error message for instantiating CUDA Stream if CUDA not available
#159868 opened
Aug 5, 2025 -
Add linux-aarch64 and windows python 3.14 nightly builds
#159869 opened
Aug 5, 2025 -
ParamwiseScheduler for different schedulers for specific parameter groups
#159873 opened
Aug 5, 2025 -
[pytorch] Moving torch.compile worker process logs to a dedicated rank based log directory
#159874 opened
Aug 5, 2025 -
[ca] enable on PYTORCH_TEST_WITH_INDUCTOR
#159875 opened
Aug 5, 2025 -
Automated submodule update: tensorpipe
#159876 opened
Aug 5, 2025 -
Fix meta for constant_pad_nd
#159878 opened
Aug 5, 2025 -
Make distributed modules importable without distributed build
#159889 opened
Aug 5, 2025 -
Add tests for Gaussian Mixture Model numerical consistency
#159893 opened
Aug 5, 2025 -
tools: Add option to log build output to a file
#159895 opened
Aug 5, 2025 -
Allow torch.hub.load with unauthorized GITHUB_TOKEN
#159896 opened
Aug 5, 2025 -
[DTensor] Migrate tests to Continuous test base
#159898 opened
Aug 5, 2025 -
[pt2e] Avoid getting model device once per node
#159901 opened
Aug 5, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_collections`
#159902 opened
Aug 5, 2025 -
[BE] Add linter to detect unused docker images
#159905 opened
Aug 5, 2025 -
[DO NOT MERGE] Perf-testing #158137 addmm_fusion 2
#159909 opened
Aug 5, 2025 -
[CUDA] Bump tolerances for `test_baddmm`
#159915 opened
Aug 5, 2025 -
Fix typo
#159919 opened
Aug 6, 2025 -
Close some sources of fake tensor leakages
#159923 opened
Aug 6, 2025 -
[AOTI] Check if triton is installed when BUILD_AOT_INDUCTOR_TEST=1
#159934 opened
Aug 6, 2025 -
added class or module info for functions blocked by weight-only load
#159935 opened
Aug 6, 2025 -
Fix AdaptiveMaxPoll index error
#159936 opened
Aug 6, 2025 -
[WIP][device_mesh] Move global state into class method
#159937 opened
Aug 6, 2025 -
[Intel GPU PT2E] Infer runtime out dtype based on dequant node in pattern
#159941 opened
Aug 6, 2025 -
[CD] Add ptl aot target in xpu windows build
#159943 opened
Aug 6, 2025 -
[WIP] enable some tests in test_ops.TestCommon on Intel GPU
#159944 opened
Aug 6, 2025 -
avoid bit cast for bfloat16_t
#159946 opened
Aug 6, 2025 -
Update SECURITY.md - added branded name as PyTorch instead of Pytorch and others.
#159953 opened
Aug 6, 2025 -
[inductor] fix triton bucketize mask propagation
#159961 opened
Aug 6, 2025 -
fix fake_tensor for aten._to_copy
#159964 opened
Aug 6, 2025 -
Fix redundant move warnings in dim.cpp
#159966 opened
Aug 6, 2025 -
Support device ordering in Shard
#159967 opened
Aug 6, 2025 -
Add mast job name from env variable
#159971 opened
Aug 6, 2025 -
[bucketing] Bucket only adjacent collectives to prevent reordering
#159983 opened
Aug 6, 2025 -
Allow setting quantized engine to none
#160003 opened
Aug 6, 2025 -
[export] Fix custom ops in subgraphs
#160004 opened
Aug 6, 2025 -
Set PYTHONHOME for inductor subprocesses using torch
#160008 opened
Aug 6, 2025 -
[torch.mtia] fix a bug in storage.resize_()
#160017 opened
Aug 6, 2025 -
[1/3][ghstack] [vllm ci build setup ]setup lumen_cli
#160043 opened
Aug 7, 2025 -
[ghstack] setup torch_cli build
#160044 opened
Aug 7, 2025 -
Update (base update)
#160045 opened
Aug 7, 2025 -
ci: Reduce the amount of log spam for manywheel
#160055 opened
Aug 7, 2025 -
ci: Remove app from GH_CHECKSUITES_FRAGMENT
#160056 opened
Aug 7, 2025 -
Fix profiler stack trace names
#160058 opened
Aug 7, 2025 -
Error when there is side effect in strict mode
#160060 opened
Aug 7, 2025 -
Clarify EMA equation in get_ema_multi_avg_fn docstring
#160061 opened
Aug 7, 2025 -
Update torch-xpu-ops commit pin
#160062 opened
Aug 7, 2025 -
Fix hpu backend mapping issue
#160063 opened
Aug 7, 2025 -
Move hardware_destructive_interference_size to c10/core/alignment.h
#160067 opened
Aug 7, 2025 -
[DO NOT MERGE] Stress Test MI325 Capacity.
#160073 opened
Aug 7, 2025 -
Bump to ONNX 1.19.0
#160076 opened
Aug 7, 2025 -
Enable prioritized linker optimization for AArch64 in setup.py and clean up CI script
#160078 opened
Aug 7, 2025 -
Make manylinux build.sh work for AArch64 and AArch64+CUDA builds
#160079 opened
Aug 7, 2025 -
POC for VLA SVE Vectorized class
#160080 opened
Aug 7, 2025 -
[ROCm] [inductor] Added test skips for small ROCm gpus
#160081 opened
Aug 7, 2025 -
Generalize Block/Pool in device caching allocator
#160082 opened
Aug 7, 2025 -
[cpp][inductor] Fix crash on bmm when input is used twice.
#160087 opened
Aug 7, 2025 -
set up vllm build logics in torch_cli
#160088 opened
Aug 7, 2025 -
[2/3 step][ vllm ci build setup] Add vlllm buld logic and dockerfile
#160089 opened
Aug 7, 2025 -
Update (base update)
#160090 opened
Aug 7, 2025 -
[DO NOT MERGE] Testing with TOT Kineto
#160091 opened
Aug 7, 2025 -
[fx] fix split_module with symint
#160093 opened
Aug 7, 2025 -
Try fix AC tag propagation in compile when not using dynamo
#160096 opened
Aug 7, 2025 -
Fix flight recorder for P2P ops
#160097 opened
Aug 7, 2025 -
Add ownership token when needed on GradientEdge
#160098 opened
Aug 7, 2025 -
[OpenReg] Add Event&Stream Support for OpenReg Backend
#160099 opened
Aug 7, 2025 -
[OpenReg] Integrate Event&Stream from OpenReg Backend into PyTorch
#160100 opened
Aug 7, 2025 -
[OpenReg] Improve the Event and Stream capabilities of DeviceGuardImplInterface
#160101 opened
Aug 7, 2025 -
[TEST] Revert "[ROCm][CI] upgrade to 6.4.2 patch release (#158887)"
#160103 opened
Aug 7, 2025 -
[ROCm] Integrate AITER Fav3 fwd kernels
#160105 opened
Aug 7, 2025 -
Add cutedsl template support to compile
#160108 opened
Aug 7, 2025 -
Add flash attention impl to flex attention
#160109 opened
Aug 7, 2025 -
switch prefer_deferred_runtime_asserts_over_guards in export
#160111 opened
Aug 7, 2025 -
ci: Update test_trymerge for stale data
#160112 opened
Aug 7, 2025 -
[inductor] Estimate peak memory allocfree and applying to reordering collectives
#160113 opened
Aug 7, 2025 -
[3/3][ghstack][vllm ci build setup]vllm build workflow
#160116 opened
Aug 7, 2025 -
Update (base update)
#160117 opened
Aug 7, 2025 -
ci: Update permissions to include checks + actions
#160118 opened
Aug 7, 2025 -
Account for triton kernel source code hidden in custom ops properly in AOTAutogradCache
#160120 opened
Aug 7, 2025 -
Gh/yangw dev/13/orig
#160125 opened
Aug 7, 2025 -
Gh/yangw dev/14/orig
#160126 opened
Aug 7, 2025 -
[AOTI] use CudaCachingAllocator for memory allocation
#160127 opened
Aug 7, 2025 -
Add `CUDA_KERNEL_ASSERT_PRINTF`, a more flexible `CUDA_KERNEL_ASSERT_MSG`
#160129 opened
Aug 7, 2025 -
[AOTInductor] Add grid information for Triton Kernels
#160131 opened
Aug 7, 2025 -
[inductor] TLParse tensor metadata logging + test
#160132 opened
Aug 7, 2025 -
[FSDP][Collectives] skipping allgather when world size is 1
#160135 opened
Aug 7, 2025 -
[FSDP][Collectives] skipping reduce_scatter when world size is 1
#160136 opened
Aug 7, 2025 -
[dynamo, nested graph breaks] clean up comments and codegen
#160138 opened
Aug 7, 2025 -
fixing graph break for namedtuple._replace
#160139 opened
Aug 7, 2025 -
Add stable Tensor get_device_index, use more stable DeviceIndex
#160143 opened
Aug 7, 2025 -
[c10d] Error out the case when registering symmetric memory without eager init
#160145 opened
Aug 7, 2025 -
[Still Work in progress] vllm test cli
#160146 opened
Aug 7, 2025 -
[FSDP][Replicate] replicate tests for param registration and input device movements
#160147 opened
Aug 7, 2025 -
[Still WIP]setup test ci workflow
#160149 opened
Aug 7, 2025 -
dynamo: Write a structured trace for supressed exceptions.
#160151 opened
Aug 8, 2025 -
[dynamo] Trace nn.Module __delattr__
#160152 opened
Aug 8, 2025 -
[dynamo] Make ListIterator track mutations to original list
#160154 opened
Aug 8, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_dict`/`test_ordered_dict`
#160156 opened
Aug 8, 2025 -
[WIP][1/N] Port 6 fsdp distributed test cases to Intel GPU
#160158 opened
Aug 8, 2025 -
[inductor] turn on windows inductor UTs
#160161 opened
Aug 8, 2025 -
Support NUMA Binding for Callable Entrypoints
#160163 opened
Aug 8, 2025 -
fix(inductor): show intermediate buffers for split reductions in profile
#160166 opened
Aug 8, 2025 -
[CUDAGraph] Skip CUDAGraph when only 1 kernel
#160168 opened
Aug 8, 2025 -
Add build support for RISCV
#160172 opened
Aug 8, 2025 -
[Inductor XPU GEMM] Step 1/N: Add cutlass-sycl repro.
#160173 opened
Aug 8, 2025 -
[Inductor XPU GEMM] Step 2/N: Generalize cutlass configuration.
#160174 opened
Aug 8, 2025 -
Fixes #119272
#160178 opened
Aug 8, 2025 -
Do not rpath CUDA stubs folder in JIT generated code
#160179 opened
Aug 8, 2025 -
[cutlass backend] re-add pip cutlass path
#160180 opened
Aug 8, 2025 -
[do not merge][inductor] optimize welford reduction
#160181 opened
Aug 8, 2025 -
[WIP] [1/N] Introduce a generic CachingDeviceAllocatorImpl for cross backend use
#160182 opened
Aug 8, 2025 -
Update triton xpu commit to support python 3.14
#160183 opened
Aug 8, 2025 -
Draft: separate reqs for manywheel build and pin
#160184 opened
Aug 8, 2025 -
[doc] fix spelling of word - when
#160185 opened
Aug 8, 2025 -
[DO NOT MERGE][Kineto] Testing not clearing per-thread attribute at all
#160186 opened
Aug 8, 2025 -
[DCP][OSS] Remove extra collective on load
#160189 opened
Aug 8, 2025 -
[Triton] [Inductor] Generalize index broadcasting to handle torch.utils._sympy.functions.Identity
#160190 opened
Aug 8, 2025 -
Flex Attention heuristics: a Blackwell config
#160192 opened
Aug 8, 2025 -
[FR] Don't check incomplete ranks for printing
#160195 opened
Aug 8, 2025 -
kill allow_complex_guards_as_runtime_asserts
#160198 opened
Aug 8, 2025 -
Add CUDA installation script for CUDA 13
#160201 opened
Aug 8, 2025 -
Refactors symmetric memory creation through allocator interface
#160202 opened
Aug 8, 2025 -
[WIP][doc] AOTI debugging guide
#160204 opened
Aug 8, 2025 -
[DCP][HF] Add option to parallelize reads in HF Storage Reader
#160205 opened
Aug 8, 2025 -
[ROCm][DO NOT MERGE] treat unset variables in manywheels build_rocm.sh as an error.
#160207 opened
Aug 8, 2025 -
[wip] Support `defaultdict(None, mapping | Iterable[Tuple])`
#160209 opened
Aug 8, 2025 -
Add `is_cpu` and `dtype` convenience methods for stable tensor type
#160212 opened
Aug 8, 2025 -
[muon] Introduce Muon optimizer to PyTorch
#160213 opened
Aug 8, 2025 -
Port amax to stable ABI
#160214 opened
Aug 8, 2025 -
[ROCm] Enable MI355 CI on PRs, and run full set of UTs on PRs
#160215 opened
Aug 8, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_set`
#160216 opened
Aug 8, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_operator`
#160217 opened
Aug 8, 2025 -
Cap num_stages to 1 on AMD in inductor's triton_heuristics.py
#160218 opened
Aug 8, 2025 -
appending the pythonpath
#160219 opened
Aug 8, 2025 -
[MPS] Sparse enable indices and values
#160223 opened
Aug 8, 2025 -
Remove torch.serialization entries from the doc ignore list
#160224 opened
Aug 8, 2025 -
switch order of ScalarType::Undefined so it equates to -1
#160226 opened
Aug 8, 2025 -
Fix undefined behavior in mul_overflows
#160229 opened
Aug 8, 2025 -
torchdim Python port
#160236 opened
Aug 9, 2025 -
fix(docs): wrong conversion from rst to md in torch.compiler_troubleshooting.md
#160238 opened
Aug 9, 2025 -
[kernacle] add support for addmm and bmm
#160239 opened
Aug 9, 2025 -
guard_or_false cat ops
#160250 opened
Aug 9, 2025 -
unify broadcast_shapes functions and avoid duplicates
#160251 opened
Aug 9, 2025 -
migrate more simple gso checks
#160253 opened
Aug 9, 2025 -
extract shape in _view_has_unbacked_input
#160255 opened
Aug 9, 2025 -
Detect torch function in lists as well
#160256 opened
Aug 9, 2025 -
[inductor] Windows inductor use intel-openmp.
#160258 opened
Aug 9, 2025 -
[vllm hash update] update the pinned vllm hash
#160259 opened
Aug 10, 2025 -
[DO NOT MERGE] Autograd Onboarding Lab
#160264 opened
Aug 10, 2025 -
Support of dtensor redistribute with device order
#160266 opened
Aug 10, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_int`/`bool`/`float`/`complex`
#160276 opened
Aug 10, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_list`/`tuple`
#160277 opened
Aug 10, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_iter`
#160278 opened
Aug 10, 2025 -
[FSDP2] cast `unsharded_param_grad` to correct reduce dtype
#160279 opened
Aug 10, 2025 -
[simplefsdp] add multi parallelism autobucketing
#160282 opened
Aug 10, 2025 -
[C10D] Add check_rng_sync util
#160283 opened
Aug 10, 2025 -
WIP summarize ranks
#160284 opened
Aug 10, 2025 -
[BE] Remove modernize suppression
#160288 opened
Aug 11, 2025 -
[BE] Save attributes for CppCompileError for pickleing
#160294 opened
Aug 11, 2025 -
Recursively descend into lists for TF in getitem
#160297 opened
Aug 11, 2025 -
[wip][claude-code] support multi kernel reductions
#160298 opened
Aug 11, 2025 -
[Do not merge]Upgrade oneDNN to v3.9
#160299 opened
Aug 11, 2025 -
[Intel GPU] check data alignment before contiguousness
#160301 opened
Aug 11, 2025 -
Improve README.md formatting and fix documentation errors
#160307 opened
Aug 11, 2025 -
Enable XPU for test_autograd_function.py
#160309 opened
Aug 11, 2025 -
[inductor] Fix descriptor broadcasting for singleton dimensions
#160310 opened
Aug 11, 2025 -
Fix reinplace optimization issue for index_put when self and source alias
#160311 opened
Aug 11, 2025 -
Optimize `min`, `max` gradient behavior description
#160312 opened
Aug 11, 2025 -
Fix get_free_symbol_uses for several nodes
#160314 opened
Aug 11, 2025 -
Add sdist handling to version finding
#160315 opened
Aug 11, 2025 -
[DO NOT MERGE] ACL Version Upgrade v52.3.0
#160316 opened
Aug 11, 2025 -
[Windows] Update libuv version from 1.39 to 1.51
#160318 opened
Aug 11, 2025 -
Draft Python API for F.linear_cross_entropy
#160319 opened
Aug 11, 2025 -
[Caffe2] Enable SVE128
#160323 opened
Aug 11, 2025 -
[Caffe2] Add SVE128 vectorized<bfloat16_t> template layer
#160324 opened
Aug 11, 2025 -
[Caffe2] Add SVE128 vectorized<float16_t> template layer
#160325 opened
Aug 11, 2025 -
[Caffe2] Add SVE128 vectorized<int##bit##_t> template layers
#160326 opened
Aug 11, 2025 -
[Caffe2] Add SVE128 vectorized<T> template layers for unsigned integers
#160327 opened
Aug 11, 2025 -
[Caffe2] Add SVE128 vectorized<float> template layer
#160328 opened
Aug 11, 2025 -
[Caffe2] Add SVE128 vectorized<double> template layer
#160329 opened
Aug 11, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_math`/`cmath`
#160330 opened
Aug 11, 2025 -
Wrap class definitions in `set_fullgraph(False)` in `test_sort`
#160331 opened
Aug 11, 2025 -
Fix typo: 'complext'
#160335 opened
Aug 11, 2025 -
[ROCm] Make triton build rocm agnostic
#160336 opened
Aug 11, 2025 -
Return the loaded library in torch.ops.load_library
#160338 opened
Aug 11, 2025 -
[MPS] Add support for log_normal_
#160339 opened
Aug 11, 2025 -
Get tensor subclasses and torch.library.triton_op to dispatch correctly
#160341 opened
Aug 11, 2025 -
Add batch option for send/recv_object_list
#160342 opened
Aug 11, 2025 -
[FSDP][Replicate] replicate tests for casting module after init
#160344 opened
Aug 11, 2025 -
[retry-land][pytorch][dynamo_compile] Log stack_trace to dynamo_compile
#160348 opened
Aug 11, 2025 -
Parameterized CUDA Graph Launch 2
#160351 opened
Aug 11, 2025 -
[redone][pytorch] Moving torch.compile worker process logs to a dedicated rank based log directory
#160352 opened
Aug 11, 2025 -
Easy decomposeK code refactor
#160353 opened
Aug 11, 2025 -
fix cpp builder to avoid missing-source compile error
#160354 opened
Aug 11, 2025 -
[cutlass backend] Allow bmm use cases when batch stride is 0
#160356 opened
Aug 11, 2025 -
Factor out the strings to templates for better editor integration
#160357 opened
Aug 11, 2025 -
[Let's see CI] Remove use of device_guard for TensorIndex kernel
#160360 opened
Aug 11, 2025 -
setup test ci
#160361 opened
Aug 11, 2025 -
Typing for common.py
#160362 opened
Aug 11, 2025 -
Type cudagraphs.py
#160363 opened
Aug 11, 2025 -
typing debugging.py
#160364 opened
Aug 11, 2025 -
typing distributed.py
#160365 opened
Aug 11, 2025 -
typing inductor and placeholder backends
#160366 opened
Aug 11, 2025 -
typing registry.py
#160367 opened
Aug 11, 2025 -
Type backend torchxla
#160368 opened
Aug 11, 2025 -
typing tvm.py
#160369 opened
Aug 11, 2025 -
[kernacle] add support for addmm and bmm
#160370 opened
Aug 11, 2025 -
[kernacle] add support for addmm and bmm
#160371 opened
Aug 11, 2025 -
[MTIA] Add MTIA dispatch for kernel foreach_maximum (#160358)
#160372 opened
Aug 11, 2025 -
[ROCm][Windows] Include native_transformers srcs to fix link errors.
#160373 opened
Aug 11, 2025 -
[inductor][while_loop][be] improve the readability of output handling
#160374 opened
Aug 11, 2025 -
[while_loop] autograd support
#160375 opened
Aug 11, 2025 -
[while_loop][auto_grad] support dynamic shape
#160376 opened
Aug 11, 2025 -
[while_loop][autograd] support carry of int tensor
#160377 opened
Aug 11, 2025 -
[BE][CI] Adjust `error_inputs` for cat and complex
#160378 opened
Aug 11, 2025 -
[CI] Move CUDA tests to trunk workflow
#160379 opened
Aug 11, 2025 -
[AOTInductor] Add input information for Triton Kernels in AOTI
#160380 opened
Aug 11, 2025 -
[Inductor][Configs] Expose autotune_num_choices_displayed through environment variable
#160381 opened
Aug 11, 2025 -
Turn on part of provenance tracking by default
#160383 opened
Aug 12, 2025 -
[audio hash update] update the pinned audio hash
#160384 opened
Aug 12, 2025 -
[TorchScript] thread-safe ErrorReport::CallStack
#160386 opened
Aug 12, 2025 -
Fix LBFGS warning convert a tensor with requires_grad=True to a scalar
#160389 opened
Aug 12, 2025 -
[FSDP][Replicate] Testing replicate parity in single and multigroup
#160390 opened
Aug 12, 2025 -
wip
#160393 opened
Aug 12, 2025 -
[export] Refactor PT2 Archive weight saving and loading
#160394 opened
Aug 12, 2025
115 Issues closed by 37 people
-
flex_attention + dynamic=True with large batch or heads causes Triton Error [CUDA]: invalid argument
#157018 closed
Aug 12, 2025 -
[FlexAttention] Zero computed input gradients with torch.compile + customized autograd func
#159299 closed
Aug 12, 2025 -
DISABLED test_multiple_mutations_of_buf (__main__.TestOperatorReorderForPeakMemory)
#159952 closed
Aug 12, 2025 -
2.8 flex attention kernels result in triton warning
#158463 closed
Aug 11, 2025 -
Wrong-size gradients in Expert Parallel MoE
#160285 closed
Aug 11, 2025 -
Use FP32 for ConvTranspose3D when using autocast on MPS
#160332 closed
Aug 11, 2025 -
DISABLED test_op_has_batch_rule_tensordot_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142769 closed
Aug 11, 2025 -
DISABLED test_cuda_kernel_loop_overflow_large (__main__.TestCuda)
#159285 closed
Aug 11, 2025 -
DISABLED test_deferred_runtime_asserts (__main__.ReproTests)
#156817 closed
Aug 11, 2025 -
DISABLED test_graph_memory_stats_and_use_result_after_destroy_graph (__main__.TestCuda)
#159286 closed
Aug 11, 2025 -
Regression on compile with backend inductor with torch 2.8
#160084 closed
Aug 11, 2025 -
MPS regression on supported dtypes/scalar/alpha combinations on add/sub/rsub
#160208 closed
Aug 11, 2025 -
[RFC] A Distributed CUDA Unified Memory Backend for PyTorch
#158122 closed
Aug 11, 2025 -
DISABLED test_inplace_on_view_makes_base_require_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156209 closed
Aug 11, 2025 -
DISABLED test_dataclass_init_with_default_factory_with_inputs (__main__.ReproTests)
#156799 closed
Aug 11, 2025 -
DISABLED test_triton_fx_graph_with_et_cuda (__main__.TestExecutionTraceCUDA)
#159236 closed
Aug 11, 2025 -
Severe performance regression on deterministic algorithm in torch 2.0
#109856 closed
Aug 11, 2025 -
Add __riscv macro detection to support the scalar backend for RISCV
#160171 closed
Aug 11, 2025 -
DISABLED test_conv_transpose_unary_fusion_ops (__main__.TestMkldnnFusion)
#158115 closed
Aug 11, 2025 -
MPS does not support addmm for non-float input
#154901 closed
Aug 11, 2025 -
[Feature] Implement a CUDA kernel for _weight_int8pack_mm
#158849 closed
Aug 10, 2025 -
aten::grid_sampler_3d'
#160237 closed
Aug 10, 2025 -
`torch.compile` on `.sum() `and `.item()` calls errors from `tensorify_python_scalars`
#158083 closed
Aug 9, 2025 -
Aborted (core dumped) in `reflection_pad 2d`
#142455 closed
Aug 8, 2025 -
[ONNX] Exporter crashes when fx node output includes None
#160150 closed
Aug 8, 2025 -
AArch64 Inductor Perf Test build fails installing torchao
#160188 closed
Aug 8, 2025 -
DISABLED test_repeat_interleave_2_dynamic_shapes_xpu (__main__.DynamicShapesCodegenGPUTests)
#159803 closed
Aug 8, 2025 -
error message for `Tensor.index_put_` could be improved for MPS failure mode
#160034 closed
Aug 8, 2025 -
`copy_()` fails with HSDP in FSDP2
#147568 closed
Aug 8, 2025 -
[dynamo, guards] Move SHAPE_ENV guard to C++
#143309 closed
Aug 8, 2025 -
BUG: numpy very slow after import torch
#158005 closed
Aug 8, 2025 -
[ONNX] Improve dynamic_axes to dynamic_shapes conversion in exporter
#150940 closed
Aug 8, 2025 -
TypeError: (): incompatible function arguments
#102832 closed
Aug 8, 2025 -
PyTorch 2.8: PYTHONPATH no longer respected when building from source?
#160092 closed
Aug 8, 2025 -
HAS_CUDA in the inductor tests is really HAS_CUDA_AND_TRITON
#159399 closed
Aug 8, 2025 -
lintrunner not support riscv64
#160170 closed
Aug 8, 2025 -
DISABLED test_inductor_reuse_buffer_after_inplace_collective (__main__.CompileTest)
#147950 closed
Aug 8, 2025 -
DISABLED test_dataclass_in_module (__main__.ReproTests)
#156776 closed
Aug 8, 2025 -
DISABLED test_inductor_reduce_scatter_tensor_single (__main__.CompileTest)
#147911 closed
Aug 8, 2025 -
DISABLED test_graph_two_successive (__main__.TestCuda)
#159113 closed
Aug 8, 2025 -
DISABLED test_inductor_multiple_specializations_cuda (__main__.GPUTests)
#154705 closed
Aug 8, 2025 -
Add the `cutlass-sycl` submodule and set it as the default cutlass path for XPU.
#160176 closed
Aug 8, 2025 -
torch.multinomial sample behavior is not consist with sample numbers
#159927 closed
Aug 8, 2025 -
[RFC] CUDAPluggableAllocator receives malloc request of size zero.
#159892 closed
Aug 8, 2025 -
Feature Request: deterministic adaptive_avg_pool2d_backward_cuda
#84860 closed
Aug 7, 2025 -
[libtorh]Consistency problem of gpu computing
#94976 closed
Aug 7, 2025 -
Add nondeterministic alert to `torch.scatter_`
#70583 closed
Aug 7, 2025 -
nn.LSTM gives nondeterministic results with dropout and multiple layers, OR cuDNN version mismatch
#35661 closed
Aug 7, 2025 -
Add nondeterministic alert to `.scatter_`
#133204 closed
Aug 7, 2025 -
Deterministic support for adaptive_avg_pool2d_backward_cuda
#149130 closed
Aug 7, 2025 -
Inconsistent output for ConvTranspose3d on GPU
#137970 closed
Aug 7, 2025 -
Fix the description of `alpha` in `torch.sub`
#159637 closed
Aug 7, 2025 -
DISABLED test_inductor_inplace_op_on_view (__main__.CompileTest)
#147852 closed
Aug 7, 2025 -
DISABLED test_cuda_kernel_loop_overflow (__main__.TestCuda)
#159069 closed
Aug 7, 2025 -
DISABLED test_record_stream (__main__.TestCuda)
#134746 closed
Aug 7, 2025 -
torch.cuda.empty_cache() shall support MemPool as an argument just like the C++ interface
#160069 closed
Aug 7, 2025 -
Bits types cannot be used under deterministic mode
#109802 closed
Aug 7, 2025 -
X86InductorQuantizer does not quantize anything
#160095 closed
Aug 7, 2025 -
DISABLED test_cuda_memory_leak_detection_propagates_errors (__main__.TestCuda)
#159039 closed
Aug 7, 2025 -
DISABLED test_inductor_all_reduce_coalesced (__main__.CompileTest)
#147726 closed
Aug 7, 2025 -
DISABLED test_inductor_broadcast (__main__.CompileTest)
#147816 closed
Aug 7, 2025 -
torch.sparse.to_sparse_semi_structured generate irrelevant values with dtype half or int8
#159872 closed
Aug 7, 2025 -
[rocm] HIP Graph capture raises segmentation fault on AMD GPU but CUDA Graph capture succeeds on Nvidia GPU
#155720 closed
Aug 7, 2025 -
`torch.unique` behaves strange for large input arrays on Windows
#135019 closed
Aug 7, 2025 -
addmv bfloat16 inconsistency on x86
#159960 closed
Aug 7, 2025 -
[inductor][fuzzer] Compilation Error in complex64+toint
#157683 closed
Aug 7, 2025 -
Build pytorch for rocm failed
#148167 closed
Aug 7, 2025 -
DISABLED test_comment_graph_fragment (__main__.TritonCodeGenTests)
#159925 closed
Aug 7, 2025 -
DISABLED test_hop_eager (__main__.TorchFunctionModeTests)
#159950 closed
Aug 7, 2025 -
DISABLED test_hop (__main__.TorchFunctionModeTests)
#159951 closed
Aug 7, 2025 -
DISABLED test_inductor_all_reduce_non_contig_input (__main__.CompileTest)
#147733 closed
Aug 7, 2025 -
DISABLED test_inductor_all_to_all_single (__main__.CompileTest)
#147795 closed
Aug 7, 2025 -
DISABLED test_wait_tensor (__main__.CompileTest)
#148014 closed
Aug 7, 2025 -
Release 2.8.0 validations checklist and cherry-picks
#158939 closed
Aug 6, 2025 -
torch_shm_manager: undefined reference to gloo
#146239 closed
Aug 6, 2025 -
Performance bug on `mode` of `torch.autograd.grad_mode.inference_mode`
#159633 closed
Aug 6, 2025 -
[Docs] 2.8 version is listed twice
#159972 closed
Aug 6, 2025 -
`CosineAnnealingWarmRestarts` should use integer epoch
#69841 closed
Aug 6, 2025 -
`torch.export` fails on `torch.cond` that dispatches Triton kernels (`SpecViolationError: missing val`)
#159955 closed
Aug 6, 2025 -
ghstack-mergeability-check and Check labels failing with 'Resource not accessible by integration'
#159825 closed
Aug 6, 2025 -
DISABLED test_addmm_dtype_mismatch (__main__.TestPatternMatcher)
#159631 closed
Aug 6, 2025 -
[MPS] Remove MacOS-13 support
#159275 closed
Aug 6, 2025 -
Multiple runners shutdown for an autoupdate while still running jobs
#107402 closed
Aug 6, 2025 -
[v.2.8.0] Release Tracker
#156745 closed
Aug 6, 2025 -
PyTorch Improper Resource Shutdown or Release vulnerability
#159963 closed
Aug 6, 2025 -
Inductor Perf MX to_blocked
#153194 closed
Aug 6, 2025 -
Reproducibility on different platform
#159846 closed
Aug 6, 2025 -
DISABLED test_distributed_checkpoint_state_dict_type0_cuda (__main__.TestDistributedCheckpointCUDA)
#145807 closed
Aug 6, 2025 -
UnicodeDecodeError in torch.compile on Windows MSVC
#159537 closed
Aug 6, 2025 -
DISABLED test_inplace_on_view_then_no_grad_cpu (__main__.TestAutogradDeviceTypeCPU)
#156306 closed
Aug 5, 2025 -
Nightly libtorch links on website are incorrect
#159880 closed
Aug 5, 2025 -
Add CUDA kernel support for RTX 5070 Ti (Ada Lovelace, SM 8.9)
#159851 closed
Aug 5, 2025 -
incorrect _unsafe_index meta
#139312 closed
Aug 5, 2025 -
DISABLED test_grad_with_tracer_ScheduleClass0 (__main__.ScheduleTest)
#159034 closed
Aug 5, 2025 -
DISABLED test_grad_with_tracer_ScheduleClass1 (__main__.ScheduleTest)
#159253 closed
Aug 5, 2025 -
DISABLED test_zero_bubble_with_model_kwargs_ScheduleClass0 (__main__.ScheduleTest)
#154547 closed
Aug 5, 2025 -
DISABLED test_schedule_with_native_zero_bubble_ScheduleClass0 (__main__.ScheduleTest)
#156088 closed
Aug 5, 2025 -
DISABLED test_schedule_with_native_zero_bubble_ScheduleClass1 (__main__.ScheduleTest)
#156328 closed
Aug 5, 2025 -
DISABLED test_grad_with_manual_interleaved_ScheduleClass2_use_new_runtime_False (__main__.ScheduleTest)
#154443 closed
Aug 5, 2025 -
DISABLED test_grad_with_manual_interleaved_ScheduleClass2_use_new_runtime_True (__main__.ScheduleTest)
#154481 closed
Aug 5, 2025 -
DISABLED test_is_isnot (__main__.TestScript)
#120694 closed
Aug 5, 2025 -
DISABLED test_index (__main__.TestPythonBuiltinOP)
#119160 closed
Aug 5, 2025 -
DISABLED test_ranks_and_tag (__main__.CompileTest)
#147974 closed
Aug 5, 2025 -
DISABLED test_tensor_subclasses (__main__.TestScript)
#119949 closed
Aug 5, 2025 -
[Feature Request] Add support for CUDA sm_120 (RTX 5070 Ti) in prebuilt PyTorch binaries
#159847 closed
Aug 5, 2025 -
Inconsistent Model Results and Failures on Windows with CUDA vs. CPU PyTorch Builds
#156547 closed
Aug 5, 2025 -
The recorded tid in torch kineto profiler do not match with syscall(SYS_gettid) or pthread_self() in Linux
#159771 closed
Aug 5, 2025 -
DISABLED test_undefined_grads_mode_warn (__main__.TestAutogradFallback)
#159471 closed
Aug 5, 2025 -
DISABLED test_pin_memory_no_cuda (__main__.TestDictDataLoader)
#159802 closed
Aug 5, 2025
115 Issues opened by 87 people
-
bug in libtorch optimize_for_inference
#160392 opened
Aug 12, 2025 -
Perf differences observed between AOT and JIT on RAFT model
#160388 opened
Aug 12, 2025 -
DISABLED RecordDebugHandles.Basic (__main__.test_jit)
#160387 opened
Aug 12, 2025 -
TTGIR error for FlexAttention on B200
#160385 opened
Aug 12, 2025 -
Use `tf32x3` through Inductor
#160359 opened
Aug 11, 2025 -
Max-autotune is slower than eager for specific jagged tensor shapes
#160355 opened
Aug 11, 2025 -
DISABLED test_fused_all_gather_scaled_matmul_gather_dim_1_scale_mode_tensor-wise (__main__.SymmetricMemoryTest)
#160347 opened
Aug 11, 2025 -
DISABLED test_custom_functions_and_tracer (__main__.TestFXNumericSuiteNShadows)
#160346 opened
Aug 11, 2025 -
Meta device initialization on HF models + FSDP leads to different numerical behaviour
#160340 opened
Aug 11, 2025 -
[RFC] Simplify bookkeeping of DeviceMesh slicing, (un)flattening, ...
#160337 opened
Aug 11, 2025 -
Tensor subclasses don't work with torch.library.triton_op
#160333 opened
Aug 11, 2025 -
`torch.Tensor` methods return type annotation too broad for `torch.Tensor` subclasses
#160322 opened
Aug 11, 2025 -
FSDP does not reduce the gradient size during backward
#160320 opened
Aug 11, 2025 -
Memory leak when using mark_dirty in Python custom autograd.Function
#160317 opened
Aug 11, 2025 -
[DTensor] Make Redistribute autograd function twice-differentiable
#160313 opened
Aug 11, 2025 -
Reserved memory is much bigger than allocated memory in multi-stream scenario
#160308 opened
Aug 11, 2025 -
Many failures in inductor/test_max_autotune on H100
#160305 opened
Aug 11, 2025 -
Multi device aoti compile not working
#160303 opened
Aug 11, 2025 -
Redundant installation of CMake and Ninja
#160302 opened
Aug 11, 2025 -
[inductor][cpu] Performance regression in 2025-08-02 nightly release
#160296 opened
Aug 11, 2025 -
torch 2.9.0dev on cuda 13 producing Cuda mismatch error during flash/sage attention install
#160293 opened
Aug 11, 2025 -
[FSDP2] Bug: NaN gradient when both HSDP and CPU offload are enabled
#160291 opened
Aug 11, 2025 -
DISABLED test_small_stability (__main__.TestBase)
#160290 opened
Aug 11, 2025 -
Symmetric memory backend bug: OOM with default, errors with CUDA/NCCL, nvshmem works
#160289 opened
Aug 11, 2025 -
Improve CUDAGraph Tree heuristics on starting new generation
#160281 opened
Aug 10, 2025 -
[Documentation Clarity] torch.min/torch.max gradient behavior
#160273 opened
Aug 10, 2025 -
Add Green Context Support for fsdp2
#160272 opened
Aug 10, 2025 -
Error when using torch.compile with dynamic=True: tensor 'guidance' stride mismatch at index 0
#160271 opened
Aug 10, 2025 -
Questions about Volta Support
#160269 opened
Aug 10, 2025 -
XPU out of memory on Intel iGPUs although plenty of memory is available per error message.
#160265 opened
Aug 10, 2025 -
Graph partition + Fake Dependency from view op
#160263 opened
Aug 10, 2025 -
CPU: Pytorch >= 2.7.0 is broken
#160261 opened
Aug 10, 2025 -
PyTorch 2.8: cuda linking issue, undefined symbol
#160248 opened
Aug 9, 2025 -
wan2.1 vae take more gpu memory after compile
#160247 opened
Aug 9, 2025 -
pytorch.org/get-started/locally is wrong about supported Python version (3.12 is not highest)
#160246 opened
Aug 9, 2025 -
DISABLED test_comprehensive_nn_functional_interpolate_trilinear_xpu_float32 (__main__.TestInductorOpInfoXPU)
#160245 opened
Aug 9, 2025 -
DISABLED test_comprehensive_nn_functional_interpolate_trilinear_xpu_float64 (__main__.TestInductorOpInfoXPU)
#160244 opened
Aug 9, 2025 -
DISABLED test_copy_non_blocking_is_pinned_xpu (__main__.AOTInductorTestABICompatibleGpu)
#160243 opened
Aug 9, 2025 -
`torch.compile` produces output mismatch vs eager float64 for some seeds
#160242 opened
Aug 9, 2025 -
`torch.export.export` fails when model is in `eval()` mode due to `torch.typename` being skipped by Dynamo
#160241 opened
Aug 9, 2025 -
Migrate test.sh into python based tests
#160240 opened
Aug 9, 2025 -
xpu: huggingface accelerate test_dynamo test fails on XPU
#160232 opened
Aug 8, 2025 -
Offer official Pytorch Vulkan backend on pytorch.org
#160230 opened
Aug 8, 2025 -
Shape propagation through repeated modules during dynamo export
#160199 opened
Aug 8, 2025 -
LBFGS always raises warning about converting a tensor with requires_grad=True to a scalar?
#160197 opened
Aug 8, 2025 -
Assertion Error When Freeing Unbacked Symbolic Shapes in Torch.Export
#160196 opened
Aug 8, 2025 -
[CI] Inherit PYTHONPATH from env instead of overriding it completely
#160193 opened
Aug 8, 2025 -
[RFC] Enable cutlass to support Intel GPU into PyTorch Inductor.
#160175 opened
Aug 8, 2025 -
libtorch: CMake 4.1: Error evaluating generator expression: TARGET_PROPERTY:torch,INTERFACE_LINK_LIBRARIES
#160167 opened
Aug 8, 2025 -
aarch64 GPU wheel release in pypi for GH200/GB200 etc
#160162 opened
Aug 8, 2025 -
Add a "max-performance" mode to torch.compile for aggressive optimization
#160142 opened
Aug 7, 2025 -
RuntimeError: miopenStatusUnknownError on PyTorch 2.8.0+rocm6.4
#160141 opened
Aug 7, 2025 -
Can't use nvshmem triton device side function
#160137 opened
Aug 7, 2025 -
Torchrun breaks with virtual environments
#160130 opened
Aug 7, 2025 -
FX graph segments for multiple triton kernels from reduction node are broken
#160124 opened
Aug 7, 2025 -
Add tensor-aware put_signal & wait_signal Triton wrappers to NVSHMEM Triton Kernels
#160122 opened
Aug 7, 2025 -
cuda 13 broken
#160104 opened
Aug 7, 2025 -
torch.library.infer_schema doesn't have full support for list types
#160094 opened
Aug 7, 2025 -
compile regression on forward pre hook with prepend=True
#160083 opened
Aug 7, 2025 -
'torch._tensor' has no attribute 'split' while using torch.compile under torch.device context.
#160077 opened
Aug 7, 2025 -
FlexAttention backward compilation failure with GQA on NVIDIA B200
#160074 opened
Aug 7, 2025 -
[TorchDynamo] excessive stack use: stack is 3286 deep on A100 with AMD EPYC CPU
#160071 opened
Aug 7, 2025 -
FSDP2 not compatible with transformers >= 4.54.0 GenericForTokenClassification
#160068 opened
Aug 7, 2025 -
`torch.compile` crashes on model exported via `torch.export`, with lifted constant from cache tensor
#160066 opened
Aug 7, 2025 -
Error raised when running torch.compile under FakeTensorMode context
#160057 opened
Aug 7, 2025 -
`torch.Pad(mode="circular")` doesn't work for 4D or 5D input despite error msg
#160053 opened
Aug 7, 2025 -
DISABLED test_flop_counter_op_options0_cuda_float16 (__main__.TestSchedulerCUDA)
#160052 opened
Aug 7, 2025 -
DISABLED test_cat_extern_kernel_dynamic_shapes_mps (__main__.DynamicShapesGPUTests)
#160051 opened
Aug 7, 2025 -
DISABLED test_retracibility_nested_list_out_dynamic_shapes (__main__.DynamicShapesExportTests)
#160050 opened
Aug 7, 2025 -
Requesting improved torch.auto_grad.detect_anomaly() for NaN detection
#160016 opened
Aug 6, 2025 -
Support NUMA Binding for Callable entrypoints to `elastic_launch`
#160006 opened
Aug 6, 2025 -
[BE][download.pytorch.org] Index fix user navigation
#160005 opened
Aug 6, 2025 -
`torch._inductor.aoti_compile_and_package` fails with CUDA kernels inside of a `torch.cond`
#159995 opened
Aug 6, 2025 -
[DTensor] Decide / Document RNG semantics
#159991 opened
Aug 6, 2025 -
@require_world_size(4) macro is ambiguous or buggy
#159987 opened
Aug 6, 2025 -
[RFC] Cuda support matrix for Release 2.9
#159980 opened
Aug 6, 2025 -
Segmentation fault when using torch.compile with PyTorch 2.8.0 XPU on Intel ARC A770
#159974 opened
Aug 6, 2025 -
SAC compatibility with compiled flex attention
#159970 opened
Aug 6, 2025 -
Compile error with gcc-14 when building vec_test_all_types
#159962 opened
Aug 6, 2025 -
TensorMetadata not retained during DTensor convolution double backpropagation
#159959 opened
Aug 6, 2025 -
torch.compile regression on side-effects between torch 2.7.1 and 2.8 (final RC)
#159958 opened
Aug 6, 2025 -
CUDA initialization: CUDA unknown error
#159954 opened
Aug 6, 2025 -
Add official statically linked libtorch libraries
#159947 opened
Aug 6, 2025 -
Wrong backend assigned to Intel Gaudi (HPU) devices in PT 2.8.0
#159945 opened
Aug 6, 2025 -
Is it possible for pytorch pipeline parallelism to support dynamic shapes (input/output)
#159942 opened
Aug 6, 2025 -
scaled_mm Triton implementation causes wrong results on (at least) H100
#159940 opened
Aug 6, 2025 -
contiguous has BUG ?
#159932 opened
Aug 6, 2025 -
`torch.onnx.export` fails when exporting custom operator with bfloat16 constant tensor
#159928 opened
Aug 6, 2025 -
[export] Tensor subclass decomposition breaks state dict contracts
#159918 opened
Aug 6, 2025 -
`test_non_contiguous_input_mm_plus_mm` looks broken on B200
#159914 opened
Aug 5, 2025 -
New at::HostAllocator interface prevents using more than one allocator implementation for a device type
#159906 opened
Aug 5, 2025 -
UNSTABLE Check mergeability of ghstack PR / ghstack-mergeability-check
#159899 opened
Aug 5, 2025 -
UNSTABLE Check Labels / Check labels
#159894 opened
Aug 5, 2025 -
torch._dynamo documentation
#159886 opened
Aug 5, 2025 -
Implement `grid_sampler_3d` for MPS
#159882 opened
Aug 5, 2025 -
`torch.nn.functional.sigmoid` produces inconsistent results on different device for complex inputs
#159870 opened
Aug 5, 2025 -
BE: enable auto reformat List to list and similar typing
#159866 opened
Aug 5, 2025 -
DISABLED test_upper_bound_i64_cuda (__main__.AOTInductorTestABICompatibleGpu)
#159860 opened
Aug 5, 2025 -
[compile] Correctness and reproducibility issue
#159855 opened
Aug 5, 2025 -
`torch.cond()` behaves inconsistently when using symbolic predicate
#159852 opened
Aug 5, 2025 -
`tree_flatten(some_pytree, lambda x: isinstance(x, Tensor))` yields `[None], _`
#159848 opened
Aug 5, 2025 -
[DTensor] nn.Embedding Compile Failure when Creating FX Graph
#159843 opened
Aug 5, 2025 -
Missing documentation for device mesh on DDP
#159836 opened
Aug 5, 2025
517 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Introduce Muon optimizer to PyTorch
#159465 commented on
Aug 11, 2025 • 32 new comments -
[OpenReg] Add OSX/Windows Support for OpenReg
#159441 commented on
Aug 11, 2025 • 20 new comments -
Add beginnings of torch::stable::accelerator
#159679 commented on
Aug 12, 2025 • 19 new comments -
[inductor] dont reuse buffers if it affects peak (#145883)
#159530 commented on
Aug 12, 2025 • 15 new comments -
Enable output padding when only outermost dim is dynamic
#159404 commented on
Aug 12, 2025 • 14 new comments -
[Intel GPU] Enable backward for SDPA XPU [WIP]
#156272 commented on
Aug 12, 2025 • 13 new comments -
[DeviceMesh] Add `_unflatten_` api for device mesh to support better UX for some use cases like EP and replicate
#159482 commented on
Aug 6, 2025 • 11 new comments -
Update torch::stable::Tensor() default constructor
#159507 commented on
Aug 11, 2025 • 11 new comments -
[ATen][CPU][Sparse] Use Third-Party Eigen for sparse add and addmm
#155357 commented on
Aug 8, 2025 • 10 new comments -
[PP] Add DualPipeV schedule
#159591 commented on
Aug 10, 2025 • 10 new comments -
[dynamic shapes] unbacked-safe slicing
#157944 commented on
Aug 12, 2025 • 9 new comments -
Add utility to get computed kernel in torch.library
#158393 commented on
Aug 8, 2025 • 9 new comments -
Add support for param mutation under inference mode
#159661 commented on
Aug 12, 2025 • 9 new comments -
[Draft][WIP] Enable XPU path for FlexAttention
#143553 commented on
Aug 8, 2025 • 9 new comments -
Support XPU in --nproc-per-node option to torchrun
#159474 commented on
Aug 11, 2025 • 7 new comments -
Add placeholder for the User Guide
#159379 commented on
Aug 11, 2025 • 6 new comments -
Remove accidental host synchronization in autograd cpu offload
#159698 commented on
Aug 5, 2025 • 6 new comments -
Implement OpenReg device autoload mechanism
#158555 commented on
Aug 11, 2025 • 6 new comments -
[vllm in torch ci ][step 1/3] add build logics
#159815 commented on
Aug 9, 2025 • 6 new comments -
Add support for tracing vmap in pre-dispatch export
#154650 commented on
Aug 7, 2025 • 5 new comments -
[fr] [xpu] Add FlightRecorder support for ProcessGroupXCCL
#158568 commented on
Aug 11, 2025 • 5 new comments -
[CUDA] Add experimental green context support for SM carveout
#159104 commented on
Aug 11, 2025 • 5 new comments -
Add option to assert if kernel is not fully fused in foreach_map
#159213 commented on
Aug 6, 2025 • 5 new comments -
Add `label_smoothing` param in `nn.BCELoss` and `nn.BCEWithLogitsLoss`
#150282 commented on
Aug 12, 2025 • 5 new comments -
Allow exposing more functions during initial template expansion
#159554 commented on
Aug 11, 2025 • 4 new comments -
[Draft][CUDA] Upgrade torch._scaled_grouped_mm to SM100+
#156806 commented on
Aug 7, 2025 • 4 new comments -
Replace setup.py bdist_wheel with python -m build --wheel
#156712 commented on
Aug 11, 2025 • 4 new comments -
Add fallback support for torch.mm in foreach_map_fn
#159757 commented on
Aug 9, 2025 • 4 new comments -
[DTensor] Registers sharding rule for rms_norm
#159692 commented on
Aug 8, 2025 • 4 new comments -
Fix typo in parameter name in cpp.py
#159716 commented on
Aug 11, 2025 • 3 new comments -
grid_sampler_3d for MPS
#159421 commented on
Aug 6, 2025 • 3 new comments -
[Intel GPU] Support SDPA backend selection and priority setting on XPU
#159464 commented on
Aug 12, 2025 • 3 new comments -
Add pad and narrow to torch/csrc/stable/ops.h
#159328 commented on
Aug 12, 2025 • 3 new comments -
Graph split event tracker
#159795 commented on
Aug 11, 2025 • 3 new comments -
[caffe2][IGiOS] Fix exhaustive switches
#156832 commented on
Aug 8, 2025 • 2 new comments -
[inductor] add lowering for repeat_interleave.Tensor with output size specified (#147160)
#158462 commented on
Aug 11, 2025 • 2 new comments -
Replace `std::runtime_error` with `TORCH_CHECK`
#159344 commented on
Aug 8, 2025 • 2 new comments -
Bump transformers pin
#159291 commented on
Aug 12, 2025 • 2 new comments -
[dynamo][guards] Install dict watchers for recursive dict tag optimization
#159796 commented on
Aug 12, 2025 • 2 new comments -
[Inductor] addmm + activation function fusion
#158137 commented on
Aug 8, 2025 • 2 new comments -
add fp8 scaled_mm for XPU
#140972 commented on
Aug 9, 2025 • 2 new comments -
port 2 distributed pipeline test files for Intel GPU
#159140 commented on
Aug 12, 2025 • 2 new comments -
[Graph Partition] Pass all OSS unit tests
#154667 commented on
Aug 12, 2025 • 2 new comments -
Add torch.compile support for torch.mm(out_dtype=...)
#159026 commented on
Aug 6, 2025 • 2 new comments -
[benchmark] Add HF LLM benchmarks
#156967 commented on
Aug 11, 2025 • 1 new comment -
Add new_empty (with dtype argument only) to torch::stable
#159508 commented on
Aug 12, 2025 • 1 new comment -
[DTensor] add op support: aten.squeeze_.dim
#159532 commented on
Aug 7, 2025 • 1 new comment -
[inductor] skip bmm when converting channel last
#159459 commented on
Aug 5, 2025 • 1 new comment -
Remove usage of fsspec in HF consolidation script
#159392 commented on
Aug 11, 2025 • 1 new comment -
Fix torch.export.export() GPU failure with RNN modules.
#155734 commented on
Aug 10, 2025 • 1 new comment -
[CI][CUDA] Add periodic b200 distributed job
#159323 commented on
Aug 9, 2025 • 1 new comment -
[inductor][cpu] Fix double-offset issue in `GEMM_TEMPLATE`
#159233 commented on
Aug 12, 2025 • 1 new comment -
[dynamo] Add -> bool to functions named (_?)(is|has)_.*
#155923 commented on
Aug 11, 2025 • 1 new comment -
Use device agnostic APIs for RNG
#159021 commented on
Aug 6, 2025 • 1 new comment -
[dict] Implement `__eq__` for dict_items
#155154 commented on
Aug 12, 2025 • 1 new comment -
[inductor] initial triton static config lookup table
#157699 commented on
Aug 11, 2025 • 1 new comment -
[inductor] propagate shapes in CSEVariable
#152198 commented on
Aug 12, 2025 • 1 new comment -
[ARM] Integrate INT4→BF16 via KleidiAI, with fallback
#158250 commented on
Aug 6, 2025 • 1 new comment -
fixes #156701
#159715 commented on
Aug 6, 2025 • 1 new comment -
[cuda][cupy] Improve cupy device placement when device is provided with explicit index
#158529 commented on
Aug 10, 2025 • 1 new comment -
[WIP] Attempt to fix `torch.backends.cudnn.rnn` import
#159828 commented on
Aug 6, 2025 • 1 new comment -
[Flex Attn][CPU] support flash decoding for cpu
#159835 commented on
Aug 6, 2025 • 0 new comments -
[C10d][Gloo] Enable complex datatype support in ProcessGroupGloo
#156633 commented on
Aug 11, 2025 • 0 new comments -
[build] remove upper version pin for `setuptools<80.0`
#156049 commented on
Aug 11, 2025 • 0 new comments -
Fuse matmul
#157743 commented on
Aug 11, 2025 • 0 new comments -
unskipped mobilenet_v3 quantization and mobilenet_v2 quantization plus tests from https://github.com/pytorch/pytorch/issues/125438
#157786 commented on
Aug 7, 2025 • 0 new comments -
[test][do not merge] Upgrade oneDNN to v3.9
#157994 commented on
Aug 5, 2025 • 0 new comments -
[claude-code] Add top-level module doc for torch/distributed/tensor/_op_schema.py
#157804 commented on
Aug 11, 2025 • 0 new comments -
For sdists, replace symlink with copy for docs requirements
#157811 commented on
Aug 11, 2025 • 0 new comments -
Allow docker builds to deal with symlinks
#157812 commented on
Aug 8, 2025 • 0 new comments -
[ZENDNN] Integrate ZenDNN library, implement Linear op, add unit-tests
#156599 commented on
Aug 11, 2025 • 0 new comments -
[dynamo, nested graph breaks] implement new resume frame stack/locals/cell layout convention
#157971 commented on
Aug 11, 2025 • 0 new comments -
[cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel
#156140 commented on
Aug 6, 2025 • 0 new comments -
docstring_linter: Fix #151692 and other issues
#156596 commented on
Aug 11, 2025 • 0 new comments -
Make functorch notebook symlinks PEP 517 valid
#157813 commented on
Aug 11, 2025 • 0 new comments -
Improve MANIFEST.in for source distribution
#157814 commented on
Aug 11, 2025 • 0 new comments -
[Doc] remove WSL2 in support matrix for Intel GPU
#156590 commented on
Aug 5, 2025 • 0 new comments -
Add PEP 517 compliant Python source distribution to release process
#157815 commented on
Aug 11, 2025 • 0 new comments -
Add functions to setup PrivateUse1 as a python backend device.
#157859 commented on
Aug 5, 2025 • 0 new comments -
[generator] Close all open generators in compile_subgraph
#157149 commented on
Aug 11, 2025 • 0 new comments -
[generator] Raise `StopIteration(value)` with value from the return stmt
#157152 commented on
Aug 12, 2025 • 0 new comments -
[SymmMem] Install NVSHMEM wheel in CI docker
#157411 commented on
Aug 6, 2025 • 0 new comments -
[build] bootstrap git repo for build for non-git-clone archive
#157432 commented on
Aug 10, 2025 • 0 new comments -
[contextlib] Fixes for CPython contextlib tests
#157148 commented on
Aug 8, 2025 • 0 new comments -
[BE][4/5] fix typos in aten/ (aten/src/ATen/native/)
#157553 commented on
Aug 10, 2025 • 0 new comments -
[BE][5/5] fix typos in aten/ (aten/src/ATen/)
#157554 commented on
Aug 10, 2025 • 0 new comments -
[dynamo] [guard] Add caching for inside torch.compile.disable function to avoid unnecessary recompilation.
#157566 commented on
Aug 11, 2025 • 0 new comments -
Introduce a new API torch.accelerator.get_mem_info
#156812 commented on
Aug 9, 2025 • 0 new comments -
[simplefsdp auto-bucketing] ir node runtime estimation
#157572 commented on
Aug 11, 2025 • 0 new comments -
[BE][1/6] fix typos in test/
#157635 commented on
Aug 10, 2025 • 0 new comments -
[BE] add `SHFMT` linter to format shell scripts
#157685 commented on
Aug 10, 2025 • 0 new comments -
[BE][1/4] format shell scripts with `SHFMT`
#157686 commented on
Aug 10, 2025 • 0 new comments -
[CI] Run all torchinductor jobs on MPS
#156773 commented on
Aug 6, 2025 • 0 new comments -
[BE][2/4] format shell scripts with `SHFMT` in .circleci/ and .github/
#157687 commented on
Aug 10, 2025 • 0 new comments -
[BE][3/4] format shell scripts with `SHFMT` in .ci/
#157688 commented on
Aug 10, 2025 • 0 new comments -
[BE][4/4] format shell scripts with `SHFMT` in scripts/
#157689 commented on
Aug 10, 2025 • 0 new comments -
Remove upper pin on setuptools
#156713 commented on
Aug 11, 2025 • 0 new comments -
Replace setup.py install with pip install
#156711 commented on
Aug 11, 2025 • 0 new comments -
Replace setup.py develop with pip install -e
#156710 commented on
Aug 11, 2025 • 0 new comments -
Stop parsing command line arguments every time common_utils is imported.
#156703 commented on
Aug 11, 2025 • 0 new comments -
[inductor] Add return types to functions named (_?)(is|has)_.*
#155928 commented on
Aug 11, 2025 • 0 new comments -
[DTensor] Fix aten.all strategy with min instead of sum as the reduce_op
#155420 commented on
Aug 9, 2025 • 0 new comments -
[Precompile] Integrate PrecompileContext with CompilePackage
#155384 commented on
Aug 9, 2025 • 0 new comments -
[export] inline into torch.jit.traced nn module
#155381 commented on
Aug 8, 2025 • 0 new comments -
Try running test_foreach sequentially
#155366 commented on
Aug 6, 2025 • 0 new comments -
Try adding bfloat16 to test_nn_lstm
#155338 commented on
Aug 5, 2025 • 0 new comments -
turn off reorder_for_peak_memory in case of collectives
#155271 commented on
Aug 8, 2025 • 0 new comments -
[WIP][dynamic shapes] if-then-else for meta_select storage offset
#155269 commented on
Aug 5, 2025 • 0 new comments -
higher_order_ops.py unimplemented_v2 migration, part1
#155264 commented on
Aug 5, 2025 • 0 new comments -
[Dynamo] Add CPython default dict tests
#155263 commented on
Aug 9, 2025 • 0 new comments -
Add more logging
#155219 commented on
Aug 6, 2025 • 0 new comments -
[WIP][fake tensor] invalidate memos for PropagateUnbackedSymInts
#155187 commented on
Aug 9, 2025 • 0 new comments -
[OrderedDict] Implement `OrderedDict.popitem(last=...)`
#155153 commented on
Aug 12, 2025 • 0 new comments -
[OrderedDict] Implement `OrderedDict.move_to_end(key, last=False)`
#155152 commented on
Aug 12, 2025 • 0 new comments -
Convert to markdown: cpp_extension.rst, cpp_index.rst, cpu.rst, cuda_environment_variables.rst, cuda._sanitizer.rst
#155110 commented on
Aug 8, 2025 • 0 new comments -
docs: link to Nvidia Container Toolkit in README
#155102 commented on
Aug 11, 2025 • 0 new comments -
[WIP][dynamic shapes] guard_or_false for are_strides_like_channel_last
#155076 commented on
Aug 10, 2025 • 0 new comments -
[dict] Implement dict.__ior__ and fix return type in dict.__or__
#155072 commented on
Aug 12, 2025 • 0 new comments -
[test] lintrunner thing
#155062 commented on
Aug 5, 2025 • 0 new comments -
[WIP] cast to bf16 before mul op in flex bwd
#154922 commented on
Aug 7, 2025 • 0 new comments -
[ROCm] SDPA fix mem fault when dropout is enabled
#154864 commented on
Aug 11, 2025 • 0 new comments -
Fix DataLoader to Pass List to getitems When Using BatchSampler. Fixes Issue_#154810
#154844 commented on
Aug 6, 2025 • 0 new comments -
Add serialization support for register_constant
#154834 commented on
Aug 8, 2025 • 0 new comments -
Fix in pytorch do_bench_using_profiling
#154766 commented on
Aug 6, 2025 • 0 new comments -
[Wheel Variant] Experimental Support
#154733 commented on
Aug 5, 2025 • 0 new comments -
[vision hash update] update the pinned vision hash
#154694 commented on
Aug 12, 2025 • 0 new comments -
libtorch_cpu.so link libm.so issue:GLIBC_2.29 not found
#159454 commented on
Aug 5, 2025 • 0 new comments -
[BE][c10d/Store]add check in pyi
#155855 commented on
Aug 12, 2025 • 0 new comments -
[ATen MTIA backend] Use aten native CPU fallback function on MTIA
#155845 commented on
Aug 11, 2025 • 0 new comments -
[CD] Move build_magma.bat to build_magma.py
#155804 commented on
Aug 11, 2025 • 0 new comments -
[Do not merge] DCP ZOC Test Changes
#155802 commented on
Aug 11, 2025 • 0 new comments -
[dynamo][guards] Skip dispatch key guards for requires_grad=False
#155756 commented on
Aug 11, 2025 • 0 new comments -
torch.distributed TCP bind address
#155741 commented on
Aug 10, 2025 • 0 new comments -
[MPS] Add regression test for memory leak in nn.MaxPool2d
#155730 commented on
Aug 11, 2025 • 0 new comments -
HOP py_impl register to tensor subclass cannot dispatch
#155726 commented on
Aug 11, 2025 • 0 new comments -
[CUDA][MAGMA][Linalg][WIP] Remove MAGMA
#155694 commented on
Aug 12, 2025 • 0 new comments -
Remove unused nonlocal declarations from checkpoint and library helper functions
#155686 commented on
Aug 11, 2025 • 0 new comments -
DOC: update CrossEntropyLoss with note and example of incorrect target specification
#155649 commented on
Aug 12, 2025 • 0 new comments -
[WIP][PGO] exclude optimizer state from PGO whitelist
#155643 commented on
Aug 10, 2025 • 0 new comments -
[Scripts] Add refresh script to clean, pull and build repo
#155639 commented on
Aug 11, 2025 • 0 new comments -
[dict] Implement dict subclass `fromkeys` classmethod
#155608 commented on
Aug 12, 2025 • 0 new comments -
[DRAFT] Evaluate feasibility of using FunctionalTensor for Example Value
#155606 commented on
Aug 6, 2025 • 0 new comments -
Remove unnecessary MPSStream initialization
#155602 commented on
Aug 12, 2025 • 0 new comments -
[do NOT land] DTensor + torch_function_mode + flex_attention dispatch test
#155600 commented on
Aug 11, 2025 • 0 new comments -
[do NOT land] torch_function_mode + flex_attention dispatch test
#155594 commented on
Aug 11, 2025 • 0 new comments -
WIP add support for dynamic shapes
#155557 commented on
Aug 8, 2025 • 0 new comments -
[OrderedDict] Add `bool(OrderedDict)`
#155503 commented on
Aug 12, 2025 • 0 new comments -
[OrderedDict] Set the correct dict class in UserDefinedDictVariable
#155502 commented on
Aug 12, 2025 • 0 new comments -
[OrderedDict] Implement `hasattr(..., IteratorVariable)`
#155501 commented on
Aug 12, 2025 • 0 new comments -
[proxy_tensor] Do not clobber tensor proxies for inplace ops
#155456 commented on
Aug 10, 2025 • 0 new comments -
Quiet Inductor #135521
#155450 commented on
Aug 9, 2025 • 0 new comments -
Clean up memory management in impl_func_norm
#155432 commented on
Aug 12, 2025 • 0 new comments -
[inductor] Improve GEMM loggings
#155427 commented on
Aug 11, 2025 • 0 new comments -
[cuDNN] cuDNN frontend for LayerNorm RMSNorm
#159682 commented on
Aug 6, 2025 • 0 new comments -
[dynamo, nested graph breaks] support nested graph breaks x context managers
#159678 commented on
Aug 9, 2025 • 0 new comments -
[ROCm] Limit number of values per thread for reductions on three dimensions
#159652 commented on
Aug 5, 2025 • 0 new comments -
[OpenReg] Refactor and Bug Fix
#159640 commented on
Aug 8, 2025 • 0 new comments -
Moved Autograd Fallback Interface to Header for Use by Out-of-tree Backends
#159639 commented on
Aug 8, 2025 • 0 new comments -
setup [Do not review]
#159636 commented on
Aug 6, 2025 • 0 new comments -
[WIP] incomplete view unbacked fix to bypass vllm issue
#159626 commented on
Aug 6, 2025 • 0 new comments -
Fallback to contiguous layout in convolution lowering on stride mismatch #159462
#159593 commented on
Aug 5, 2025 • 0 new comments -
[c10d][nvshmem] add nvshmem build rules and dependency for libtorch_cuda
#159562 commented on
Aug 12, 2025 • 0 new comments -
Add dtype checks in meta dispatch for various ordering ops
#159556 commented on
Aug 8, 2025 • 0 new comments -
Move config/util to AllocatorConfig for cross-allocator sharing
#159553 commented on
Aug 11, 2025 • 0 new comments -
Editing and updating glossary to test functionality.
#159544 commented on
Aug 6, 2025 • 0 new comments -
[1/N]Port 3 distributed/_tools test cases to Intel GPU
#159543 commented on
Aug 12, 2025 • 0 new comments -
[ci][inductor dashboard] Remove torchao install as its unused
#159501 commented on
Aug 5, 2025 • 0 new comments -
Add B200 smoke test
#159494 commented on
Aug 5, 2025 • 0 new comments -
Support `next(iterator, default)`
#159483 commented on
Aug 5, 2025 • 0 new comments -
Recursively sync fbgemm submodules before build
#159477 commented on
Aug 5, 2025 • 0 new comments -
[WIP] port several test files under test/distributed to Intel GPU
#159473 commented on
Aug 11, 2025 • 0 new comments -
[ROCm] Use opportunistic fastatomics based on heuristics
#159430 commented on
Aug 8, 2025 • 0 new comments -
Fixes for `collections.Counter`
#159368 commented on
Aug 5, 2025 • 0 new comments -
Fixes for `collections.NamedTuple`
#159367 commented on
Aug 5, 2025 • 0 new comments -
Change mutation type of `MutableMappingVariable` to `AttributeMutationNew`
#159366 commented on
Aug 5, 2025 • 0 new comments -
Enable trace through the collections module
#159365 commented on
Aug 5, 2025 • 0 new comments -
[dynamo] Simplify two methods in ConstDictVariable
#159361 commented on
Aug 11, 2025 • 0 new comments -
Avoid potential deadlocks in host allocator
#159352 commented on
Aug 11, 2025 • 0 new comments -
[dynamo, nested graph breaks] support very simple nested graph breaks
#159329 commented on
Aug 9, 2025 • 0 new comments -
[dynamo, nested graph breaks] use CALL_FUNCTION_EX when calling resume function
#159281 commented on
Aug 9, 2025 • 0 new comments -
[Inductor] support native_layer_norm_backward mixed dtype for privateuse1
#159830 commented on
Aug 5, 2025 • 0 new comments -
DO NOT MERGE, testing sequential builds
#159827 commented on
Aug 12, 2025 • 0 new comments -
Add support for error-ing when there is side effect
#159826 commented on
Aug 5, 2025 • 0 new comments -
[BE][Dynamo] Type improvements in `_dynamo/utils` to generics
#159824 commented on
Aug 11, 2025 • 0 new comments -
ci: Add option for sequential wheel building
#159821 commented on
Aug 11, 2025 • 0 new comments -
[dynamo, nested graph breaks] support nested closures
#159817 commented on
Aug 9, 2025 • 0 new comments -
[Inductor] Freeze matmul args layouts
#159813 commented on
Aug 5, 2025 • 0 new comments -
Replace C array with std::array in formatSockAddr
#159812 commented on
Aug 6, 2025 • 0 new comments -
[inductor] remove no_x_dim
#159810 commented on
Aug 11, 2025 • 0 new comments -
fix type documentation for context_parallel no_restore_buffers, to prevent user from passing in the wrong type
#159808 commented on
Aug 7, 2025 • 0 new comments -
[dynamo] fixes to propagate tag safeness
#159807 commented on
Aug 12, 2025 • 0 new comments -
[146643]Fixed max triton generation
#159797 commented on
Aug 6, 2025 • 0 new comments -
[dynamo, nested graph breaks] prevent excessive recompilations
#159786 commented on
Aug 12, 2025 • 0 new comments -
[Tests]: disable logspace tests correctly
#159785 commented on
Aug 5, 2025 • 0 new comments -
[MTIA Aten Backend] Migrate any.all_out (example diff for tutorial)
#159780 commented on
Aug 7, 2025 • 0 new comments -
[Caffe2] Add float batch box cox SVE128 implementation
#159778 commented on
Aug 11, 2025 • 0 new comments -
[Inductor][Triton] Support TMA before strict 3.4 cutoff
#159777 commented on
Aug 6, 2025 • 0 new comments -
[ROCm] Fix Sliding Window Attention in AOTriton integration code
#159773 commented on
Aug 8, 2025 • 0 new comments -
Add binary size check to validate current limits for binaries released to pypi
#159768 commented on
Aug 5, 2025 • 0 new comments -
[CI] Reduce XPU Windows build time
#159763 commented on
Aug 11, 2025 • 0 new comments -
[FSDP] Add FrozenParamHandle to optimize memory for frozen parameters
#159751 commented on
Aug 9, 2025 • 0 new comments -
AOT graph capture with dynamo.
#159749 commented on
Aug 6, 2025 • 0 new comments -
Build and Install Arm Compute Library in manylinux docker image
#159737 commented on
Aug 7, 2025 • 0 new comments -
Fix GroupNorm(num_groups=1) to match LayerNorm behavior
#159736 commented on
Aug 9, 2025 • 0 new comments -
Use uv run for lintrunner Python deps
#159735 commented on
Aug 11, 2025 • 0 new comments -
[Don't Review] Test XPU CI
#159718 commented on
Aug 5, 2025 • 0 new comments -
dynamo: Remove passing or deleted dynamo_expected_failures
#159691 commented on
Aug 8, 2025 • 0 new comments -
Recheck Autotune cache on Precompile serialization to prune compilation results
#158656 commented on
Aug 5, 2025 • 0 new comments -
Update persons of interest for XLA. The previous one is out of date.
#158652 commented on
Aug 9, 2025 • 0 new comments -
[NumPy] use NumPy 2.x in CI
#158647 commented on
Aug 10, 2025 • 0 new comments -
[OpenReg] Add Develop Notes for Integrating New Backend into PyTorch(Operator Aspect)
#158644 commented on
Aug 8, 2025 • 0 new comments -
unskipped flaky conv2d, rmatmul, and matmul as these now pass
#158640 commented on
Aug 11, 2025 • 0 new comments -
Don't use LLVM libraries
#158623 commented on
Aug 12, 2025 • 0 new comments -
[simplefsdp auto-bucketing] auto bucketing with greedy algorithm
#158609 commented on
Aug 11, 2025 • 0 new comments -
[cuda][complex] Use scaling to compute the absolute value of complex number to avoid overflow
#158557 commented on
Aug 6, 2025 • 0 new comments -
port 3 distributed test to Intel GPU and unified some common functions
#158533 commented on
Aug 12, 2025 • 0 new comments -
Filter out local timer tests which are unimplemented in Python on AArch64
#158342 commented on
Aug 7, 2025 • 0 new comments -
Move XPUEvent to c10
#158336 commented on
Aug 11, 2025 • 0 new comments -
[simplefsdp auto-bucketing] manual bucketing with plan
#158321 commented on
Aug 11, 2025 • 0 new comments -
autograd: Add VJP and JVP rules for aten::aminmax
#158241 commented on
Aug 11, 2025 • 0 new comments -
reuse EventPool::Event in CUDAAllocator
#158224 commented on
Aug 11, 2025 • 0 new comments -
Move EventPool::Event to c10
#158220 commented on
Aug 11, 2025 • 0 new comments -
Move CUDAEvent to c10
#158219 commented on
Aug 11, 2025 • 0 new comments -
Multi-threaded concurrent fetching in Dataloader for high-latency storage.
#158218 commented on
Aug 7, 2025 • 0 new comments -
[scan] cloned aliased input when lowering scan to while_loop
#158168 commented on
Aug 12, 2025 • 0 new comments -
[DTensor] Assert DTensorSpec has valid placements
#158133 commented on
Aug 6, 2025 • 0 new comments -
[build] pin `setuptools>=77` to enable PEP 639
#158104 commented on
Aug 11, 2025 • 0 new comments -
[simplefsdp auto-bucketing] add ir node reorder helper function
#158098 commented on
Aug 11, 2025 • 0 new comments -
[simplefsdp auto-bucketing] add ir node bucket helper function
#158097 commented on
Aug 11, 2025 • 0 new comments -
[inductor] add template hashing for template lookup table
#158091 commented on
Aug 11, 2025 • 0 new comments -
adding types to nn module init
#158065 commented on
Aug 11, 2025 • 0 new comments -
[dict] Support `dict.update()` with no args
#158061 commented on
Aug 12, 2025 • 0 new comments -
Update upstream opinfo to generate appropriately scaled sample inputs
#158018 commented on
Aug 8, 2025 • 0 new comments -
remove unnecessary sync point in AveragedModel update
#158017 commented on
Aug 11, 2025 • 0 new comments -
[Caffe2] Build perfkernels targeting SVE128
#159274 commented on
Aug 11, 2025 • 0 new comments -
add try catch around provenance tracking
#159266 commented on
Aug 6, 2025 • 0 new comments -
[WIP][2/N] Port 5 _composable distributed test to Intel GPU
#159241 commented on
Aug 8, 2025 • 0 new comments -
Remove guard_size_oblivious from default contiguity python check, and add aten.sym_is_contiguous.
#159197 commented on
Aug 9, 2025 • 0 new comments -
[TESTING] Triton pin (Aug 8) 05b2c186c1b6c9a08375389d5efe9cb4c401c075
#159158 commented on
Aug 9, 2025 • 0 new comments -
[WIP][1/N] Port 5 _composable/fsdp distributed test cases to Intel GPU
#159118 commented on
Aug 8, 2025 • 0 new comments -
outer heuristic
#159093 commented on
Aug 12, 2025 • 0 new comments -
port distributed pipeline test files for Intel GPU
#159033 commented on
Aug 12, 2025 • 0 new comments -
[while_loop] support input mutation with auto_functionalize
#159010 commented on
Aug 12, 2025 • 0 new comments -
[cond] support input mutation with auto_functionalize
#159009 commented on
Aug 12, 2025 • 0 new comments -
Docs on export joint with descriptors
#159006 commented on
Aug 12, 2025 • 0 new comments -
[inductor] add lookup table recorder
#158987 commented on
Aug 11, 2025 • 0 new comments -
[BE] create an empty shape_env for check_input_alias_and_mutation_return_outputs
#158965 commented on
Aug 12, 2025 • 0 new comments -
Ensure outer aliasing on DTensor matches inner aliasing
#158954 commented on
Aug 12, 2025 • 0 new comments -
[Caffe2] Import SVE128 PR
#158932 commented on
Aug 11, 2025 • 0 new comments -
[triton_heuristics] Optimize the triton launcher in pt2
#158897 commented on
Aug 7, 2025 • 0 new comments -
[map] support gen_schema for map
#158884 commented on
Aug 12, 2025 • 0 new comments -
[associative_scan] support gen_schema for associative_scan
#158883 commented on
Aug 12, 2025 • 0 new comments -
[CI] Switch ROCm MI300 GitHub Actions workflows from 2-GPU to 1-GPU runners
#158882 commented on
Aug 11, 2025 • 0 new comments -
[scan] support gen_schema for scan
#158864 commented on
Aug 12, 2025 • 0 new comments -
[while_loop] support gen_schema for while_loop
#158863 commented on
Aug 12, 2025 • 0 new comments -
Update RandomSampler docstring. data_source must be Sized not Dataset
#158857 commented on
Aug 7, 2025 • 0 new comments -
[ROCm] Avoid test_conv_backend_cudnn* UTs
#158817 commented on
Aug 8, 2025 • 0 new comments -
Update nullcontext to return input args
#158776 commented on
Aug 8, 2025 • 0 new comments -
Guard rocm_smi.h include with a header check
#158771 commented on
Aug 5, 2025 • 0 new comments -
[ROCm] [CK] Composable Kernel integration for ROCm
#158747 commented on
Aug 8, 2025 • 0 new comments -
[BE] Upgrade XPU support package to 2025.2
#158733 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_empty_cpu_tensor (__main__.CudaGraphTreeTests)
#156735 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_graph_break_unsupported_fake (__main__.ReproTests)
#156629 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_with_bias_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157335 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_while_loop_schema_gen (__main__.TestHopSchema)
#141202 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_allocate_in_thread_to_pool (__main__.TestBlockStateAbsorption)
#158764 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_rng (__main__.TestCompilerBisector)
#139590 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_invalid_status_for_legacy_api (__main__.TestCuda)
#157110 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_dynamic_warmup (__main__.CudaGraphTreeTests)
#156693 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_get_parameter_dtype (__main__.ReproTests)
#156598 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_add_sub_alpha_out (__main__.ReproTests)
#156597 commented on
Aug 7, 2025 • 0 new comments -
Inductor codegen for float8 dynamic quantization ops for scaled_grouped_mm backward pass is slow
#159769 commented on
Aug 7, 2025 • 0 new comments -
[BUG]Nan in gradients of scaled_dot_product_attention operation with mem_efficient backend
#125674 commented on
Aug 7, 2025 • 0 new comments -
non-deterministic issue of torch.einsum function on different GPU.
#137389 commented on
Aug 7, 2025 • 0 new comments -
Error : torch/utils/_sympy/interp.py:176] [0/2] failed while executing pow_by_natural([VR1, int_oo], VR[-1, -1]])
#148003 commented on
Aug 8, 2025 • 0 new comments -
[MPS] test_linalg_cholesky fails on M4
#157364 commented on
Aug 8, 2025 • 0 new comments -
Weird dataloader performance degradation caused by torch and numpy import order
#101188 commented on
Aug 8, 2025 • 0 new comments -
KeyError when using fx.split_module
#155220 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_add_loggers_functions (__main__.TestFXNumericSuiteNShadows)
#140380 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_nn_functional_conv2d_strided_padding_dilation_no_bias_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157315 commented on
Aug 7, 2025 • 0 new comments -
`RuntimeError: UR error` with XPU
#149953 commented on
Aug 7, 2025 • 0 new comments -
Mutating a tensor while serializing with safetensors crashes free-threaded PyTorch
#158071 commented on
Aug 7, 2025 • 0 new comments -
extern kernel's get_free_symbols seems incomplete
#159685 commented on
Aug 7, 2025 • 0 new comments -
Support dict as input/output for pipeline parallelism
#159711 commented on
Aug 7, 2025 • 0 new comments -
Poor error message when trying to jit a function instead of a module (RuntimeError: Cannot insert a Tensor that requires grad as a constant.)
#55282 commented on
Aug 7, 2025 • 0 new comments -
foreach CUDA tests flaky on CUDA 12.6+ due to flaky profiler results
#148681 commented on
Aug 7, 2025 • 0 new comments -
SystemError: PY_SSIZE_T_CLEAN macro must be defined for '#' formats - PyTorch compile fails with Python 3.12
#153737 commented on
Aug 7, 2025 • 0 new comments -
Support installing Python bindings in CMake
#159232 commented on
Aug 7, 2025 • 0 new comments -
General MPS op coverage tracking issue
#77764 commented on
Aug 7, 2025 • 0 new comments -
Cannot pass through None for example_inputs in prepare_fx
#159505 commented on
Aug 7, 2025 • 0 new comments -
Feature request: `DataLoader` using multithreading instead of multiprocessing
#158714 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_triton_barrier (__main__.NVSHMEMTritonTest)
#158761 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_graph_concurrent_replay (__main__.TestCuda)
#104055 commented on
Aug 7, 2025 • 0 new comments -
Add official support for CUDA sm_120 (RTX 5090 / Blackwell architecture)
#159207 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_end_recording_early (__main__.CudaGraphTreeTests)
#156778 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_add_loggers_linear_mod_fp32_quant (__main__.TestFXNumericSuiteNShadows)
#142860 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_dont_dce_rand (__main__.ReproTests)
#156580 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_assigning_back_deleter_fns_to_tensor (__main__.TestBlockStateAbsorption)
#134810 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_aot_autograd_runtime_wrapper_prologue_profiled (__main__.ReproTests)
#156678 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_error_on_dealloc_use (__main__.CudaGraphTreeTests)
#156801 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose1d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142566 commented on
Aug 8, 2025 • 0 new comments -
Support for _Float16/C++23 std::float16_t
#157776 commented on
Aug 8, 2025 • 0 new comments -
Add nn.GradBank for gradient scaling to prevent vanishing/exploding gradients
#159765 commented on
Aug 8, 2025 • 0 new comments -
RuntimeError: Tried to instantiate dummy base class Stream
#159744 commented on
Aug 8, 2025 • 0 new comments -
DataLoader num_workers > 0 causes CPU memory from parent process to be replicated in all worker processes
#13246 commented on
Aug 8, 2025 • 0 new comments -
[discussion, idea] Batched, vectorized base64 decoding / encoding + maybe RLE decoding / encoding
#90560 commented on
Aug 8, 2025 • 0 new comments -
canUse32BitIndexMath set to False with efficient net
#155225 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_sort_large_cuda_float16 (__main__.TestSortAndSelectCUDA)
#159426 commented on
Aug 8, 2025 • 0 new comments -
Deterministic implementation for grid_sampler_2d_backward_cuda
#68959 commented on
Aug 8, 2025 • 0 new comments -
Obscure error: Expected a value of type 'List[int]' for argument 'sizes' but instead found type 'immutable_list'
#122129 commented on
Aug 8, 2025 • 0 new comments -
Incorrect hint calculated for expression involving unbacked SymInt
#130456 commented on
Aug 8, 2025 • 0 new comments -
Remove redundant type aliases of _device for torch.Device
#152952 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_addr_alpha_beta_out (__main__.ReproTests)
#156641 commented on
Aug 8, 2025 • 0 new comments -
Dead link in `torch.compile` docs
#119272 commented on
Aug 8, 2025 • 0 new comments -
[feature request] Inplace downcast dtype conversion
#158710 commented on
Aug 8, 2025 • 0 new comments -
`tensordot` not working for dtype int32 and lower when there is only 1 element in the given axis
#84530 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_triton_broadcast (__main__.NVSHMEMTritonTest)
#158908 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_add_loggers_linear_mod_fp32_fp32 (__main__.TestFXNumericSuiteNShadows)
#159036 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_empty_storage (__main__.CudaGraphTreeTests)
#156755 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_relative_import (__main__.ReproTests)
#156679 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_compile_kernel_advanced (__main__.TestCompileKernel)
#157172 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_relative_import_no_modulename (__main__.ReproTests)
#156691 commented on
Aug 8, 2025 • 0 new comments -
UNSTABLE rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default)
#158182 commented on
Aug 8, 2025 • 0 new comments -
UNSTABLE inductor-rocm-mi300 / rocm-py3.10-inductor-mi300 / test (inductor)
#154884 commented on
Aug 8, 2025 • 0 new comments -
Fix for special.zeta nan handling - follow-up PR #138653
#146618 commented on
Aug 8, 2025 • 0 new comments -
DISABLED test_function_compiled_multiple_times (__main__.CudaGraphTreeTests)
#157143 commented on
Aug 5, 2025 • 0 new comments -
DISABLED test_name_match (__main__.TestGuardSerialization)
#156246 commented on
Aug 5, 2025 • 0 new comments -
[BE] linter to detect unused docker images
#158783 commented on
Aug 5, 2025 • 0 new comments -
ModuleDict subscription no longer works after compile().
#159831 commented on
Aug 5, 2025 • 0 new comments -
`torch.load` can't deserialize `datetime` objects, even with the appropriate `safe_globals`
#152985 commented on
Aug 5, 2025 • 0 new comments -
DISABLED test_add_loggers_conv_bn_relu_fusion_fp32 (__main__.TestFXNumericSuiteNShadows)
#158762 commented on
Aug 6, 2025 • 0 new comments -
DISABLED test_zero_bubble_with_model_kwargs_ScheduleClass1 (__main__.ScheduleTest)
#154579 commented on
Aug 6, 2025 • 0 new comments -
DISABLED test_graph_partition (__main__.CudaGraphTreeTests)
#157173 commented on
Aug 6, 2025 • 0 new comments -
DISABLED test_graph_partition_cpu_op_and_dynamic_shapes (__main__.CudaGraphTreeTests)
#157258 commented on
Aug 6, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_nn_functional_conv2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157278 commented on
Aug 6, 2025 • 0 new comments -
Inconsistent Error Message for Cross-Device Input in torch.compile
#159133 commented on
Aug 6, 2025 • 0 new comments -
CMake Error: When installing PyTorch from source, CUDA not being detected.
#134331 commented on
Aug 6, 2025 • 0 new comments -
native_layer_norm_backward supports mixed precision for PrivateUse1
#159829 commented on
Aug 6, 2025 • 0 new comments -
User Triton Kernels Are not Serialized in Fx Graph Runnable
#153475 commented on
Aug 6, 2025 • 0 new comments -
[RFC]: PyTorch Low-Precision GEMMs Public API
#157950 commented on
Aug 6, 2025 • 0 new comments -
Add mean and var operation for Nested Tensors
#138831 commented on
Aug 6, 2025 • 0 new comments -
foreach_map enhancements
#158968 commented on
Aug 5, 2025 • 0 new comments -
torch compile produces nans with GQA
#159469 commented on
Aug 5, 2025 • 0 new comments -
[Feat] Tracking for OpenReg Improvements
#158917 commented on
Aug 5, 2025 • 0 new comments -
ONNX export via Dynamo sets `dft_length = 1` in `DFT`, breaking shape-inference for `torch.fft.rfft`
#155997 commented on
Aug 5, 2025 • 0 new comments -
mmap fails on 64k page aarch64 systems for AOTI model loading
#145610 commented on
Aug 5, 2025 • 0 new comments -
DISABLED test_conv2d_api (__main__.TestQuantizedFunctionalOps)
#157346 commented on
Aug 5, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_mv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#142697 commented on
Aug 5, 2025 • 0 new comments -
DISABLED test_mempool_limited_memory_with_allocator (__main__.TestMemPool)
#157256 commented on
Aug 5, 2025 • 0 new comments -
DDP+TP composition does not work as expected
#157445 commented on
Aug 5, 2025 • 0 new comments -
Tracking issue: Incorrect Meta Strides / Turn On PyDispatcher in FakeTensor Mode
#145094 commented on
Aug 5, 2025 • 0 new comments -
Experimental Wheel Variant Support - Technical Discussion
#155141 commented on
Aug 5, 2025 • 0 new comments -
Compilation of the post-training quantized model using Nvidia ModelOpt is failing with the error: Unsupported — 'inline in skipfiles: QuantLinearConvBase.quantize_weight
#151450 commented on
Aug 5, 2025 • 0 new comments -
torch.empty should consider np.bool while parsing args
#159739 commented on
Aug 5, 2025 • 0 new comments -
Inductor doesn't fuse outer dimension softmax into a single kernel.
#93718 commented on
Aug 5, 2025 • 0 new comments -
[NJT] can only chunk if the 2nd dimension is ragged
#153238 commented on
Aug 5, 2025 • 0 new comments -
DISABLED test_dont_aggressively_write_assert (__main__.ReproTests)
#156570 commented on
Aug 5, 2025 • 0 new comments -
DISABLED test_graph_partition_cpu_scalar1 (__main__.CudaGraphTreeTests)
#157280 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_vmap_exhaustive_addmv_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157617 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_call_count_tunableop_cuda_float32 (__main__.TestLinalgCUDA)
#155953 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_reentrant_parent_error_on_cpu_cuda (__main__.TestAutogradDeviceTypeCUDA)
#86735 commented on
Aug 7, 2025 • 0 new comments -
NotImplementedError: Could not run 'aten::q_scale' with arguments from the 'CPU'
#159743 commented on
Aug 7, 2025 • 0 new comments -
Bug: `torch.compile` triggers C++ compile error due to conflicting declaration in generated `.cpp` code
#159245 commented on
Aug 7, 2025 • 0 new comments -
[DTensor][FSDP2][DDP] benchmark dtensor cpu overhead for adam optimizer
#159169 commented on
Aug 7, 2025 • 0 new comments -
[Flex Attention] Accuracy issue with kv length not multiple of kv block size
#159247 commented on
Aug 7, 2025 • 0 new comments -
torch.export with nn.Transformer creates a non-contiguous memory tensor for aten.view
#159126 commented on
Aug 7, 2025 • 0 new comments -
Torch profiler corrupted names with Python 3.11
#121219 commented on
Aug 7, 2025 • 0 new comments -
Segmentation fault with ITIMER_REAL
#57185 commented on
Aug 7, 2025 • 0 new comments -
Foreach Where support
#117884 commented on
Aug 7, 2025 • 0 new comments -
get_ema_multi_avg_fn() equation is a little confused
#155551 commented on
Aug 7, 2025 • 0 new comments -
Autogenerate code example / tutorial outputs in documentation
#6662 commented on
Aug 7, 2025 • 0 new comments -
Is DTensor support dynamic shapes while using torch.compile ?
#159635 commented on
Aug 7, 2025 • 0 new comments -
Einsum of 2 dtensors fails in inference mode
#157631 commented on
Aug 7, 2025 • 0 new comments -
Enable CUDA 12.9 binaries
#155196 commented on
Aug 6, 2025 • 0 new comments -
extract statistics from attention weights in FlexAttention
#159770 commented on
Aug 6, 2025 • 0 new comments -
Enable CI and Build Support for PyTorch on PPC64LE Architecture
#141235 commented on
Aug 6, 2025 • 0 new comments -
Transition Adam and maybe other optimizers to fused code path by default to avoid `foreach=True`-specific VRAM peak due to temp TensorList for bias-corrected moments
#158371 commented on
Aug 6, 2025 • 0 new comments -
Different behavior between sparse and dense tensors with broadcasting multiplication.
#158861 commented on
Aug 6, 2025 • 0 new comments -
Performance bug on `inplace` of `nn.ELU`
#159622 commented on
Aug 6, 2025 • 0 new comments -
cuda memory error thrown by torch.
#150048 commented on
Aug 6, 2025 • 0 new comments -
which Pythorch suports "TORCH_USE_CUDA_DSA=1" from the shell environment?
#121969 commented on
Aug 6, 2025 • 0 new comments -
[dynamo] torch.randint_like on DTensor does not work with compile
#156649 commented on
Aug 6, 2025 • 0 new comments -
[AOTI] Severe Performance Regression with FP16 Autocast in AOTInductor for Small Batch Sizes
#159346 commented on
Aug 6, 2025 • 0 new comments -
Label tracking meta-issue (edit me to get automatically CC'ed on issues! cc bot)
#24422 commented on
Aug 6, 2025 • 0 new comments -
DISABLED test_module_and_optimizer_ids (__main__.TestTorchTidyProfiler)
#87581 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_add_loggers_conv_bn_relu_fusion_quant (__main__.TestFXNumericSuiteNShadows)
#127814 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_comprehensive_pca_lowrank_cuda_float32 (__main__.TestInductorOpInfoCUDA)
#139828 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_dtensor_seq_par_shard_dim_1 (__main__.MicroPipelineTPTest)
#153223 commented on
Aug 7, 2025 • 0 new comments -
DISABLED test_triton_alltoall (__main__.NVSHMEMTritonTest)
#158840 commented on
Aug 7, 2025 • 0 new comments -
[dynamo, nested graph breaks] add nested graph break tests
#144516 commented on
Aug 9, 2025 • 0 new comments -
Replacing explicit backend search with api call
#144944 commented on
Aug 9, 2025 • 0 new comments -
Support contextlib.ExitStack
#146506 commented on
Aug 12, 2025 • 0 new comments -
Enable explicitly vectorized `_weight_int8pack_mm` op for FP16 dtype on x86_64 CPU
#146777 commented on
Aug 5, 2025 • 0 new comments -
[DO NOT MERGE][Inductor] Migrate from oneDNN Inner Product to oneDNN MatMul for mkldnn._linear_pointwise and mkldnn._linear_pointwise.binary
#147360 commented on
Aug 12, 2025 • 0 new comments -
[ONNX] Migrate onnx ops decomp functions
#147469 commented on
Aug 6, 2025 • 0 new comments -
[test] check labels
#147470 commented on
Aug 5, 2025 • 0 new comments -
Support `contextlib.suppress`
#147990 commented on
Aug 11, 2025 • 0 new comments -
[pytree] add another simplified pytree module `torch.pytree`
#148180 commented on
Aug 8, 2025 • 0 new comments -
[pytree] simplify public API exposition with `__module__`
#148328 commented on
Aug 8, 2025 • 0 new comments -
[BE][pytree] rename `NodeDef` member to match the type annotations: `*_fn -> *_func`
#148474 commented on
Aug 8, 2025 • 0 new comments -
[BE][pytree] rename argument name in register function to match the type annotations: `*_fn -> *_func`
#148484 commented on
Aug 8, 2025 • 0 new comments -
[triton hash update] update the pinned triton hash
#148492 commented on
Aug 12, 2025 • 0 new comments -
Remove shebang line from easy_install generated python scripts on Windows only
#148673 commented on
Aug 5, 2025 • 0 new comments -
Support int step for nonfused optimizer
#148956 commented on
Aug 8, 2025 • 0 new comments -
Update the heuristic for AArch64 bmm/baddbmm
#149122 commented on
Aug 8, 2025 • 0 new comments -
TorchInductor CPU Performance Dashboard
#93531 commented on
Aug 12, 2025 • 0 new comments -
[inductor] [cpu] `torch.nn.Embedding-torch.index_copy` outputs inconsistent results on cpu inductor
#156786 commented on
Aug 12, 2025 • 0 new comments -
Automated submodule update: kineto
#106149 commented on
Aug 11, 2025 • 0 new comments -
Automated submodule update: FBGEMM
#115316 commented on
Aug 12, 2025 • 0 new comments -
[AOTAutograd] tweak min-cut partitioner to avoid saving softmax output
#126348 commented on
Aug 6, 2025 • 0 new comments -
Remove deprecated torch/csrc/jit/codegen/cuda
#131296 commented on
Aug 11, 2025 • 0 new comments -
Add decompositions for median and nonmedian
#134881 commented on
Aug 7, 2025 • 0 new comments -
[pytree] Add public pytree module `torch.utils.pytree`
#137400 commented on
Aug 8, 2025 • 0 new comments -
Add TORCH_CHECK_INDEX in convert_indices_from_coo_to_csr_cpu
#138068 commented on
Aug 6, 2025 • 0 new comments -
Extending SVE VEC Backend Support in PyTorch to SVE128 and SVE512.
#138388 commented on
Aug 11, 2025 • 0 new comments -
Fix `USE_STATIC_MKL` lost functionality
#138996 commented on
Aug 11, 2025 • 0 new comments -
[Don't Review] Test CI
#139971 commented on
Aug 9, 2025 • 0 new comments -
Using acc_t for log_softmax
#143896 commented on
Aug 5, 2025 • 0 new comments -
Defaults to C++20 for torch targets
#143959 commented on
Aug 11, 2025 • 0 new comments -
[ci] Add riscv opt-int build
#143979 commented on
Aug 8, 2025 • 0 new comments -
Full static typing for `torch.distributions`
#144219 commented on
Aug 11, 2025 • 0 new comments -
[pytree] make `tree_*` functions accept both Python and C++ `PyTreeSpec`
#152624 commented on
Aug 8, 2025 • 0 new comments -
[CUTLASS][WIP] Gate rowwise matmul CUTLASS kernels by compute capability
#152642 commented on
Aug 9, 2025 • 0 new comments -
[invoke_subgraph] Force the output stride to be same as eager
#152806 commented on
Aug 5, 2025 • 0 new comments -
[MSVC] Enable updated lambda processor by setting compiler flag /Zc:lambda globally
#152828 commented on
Aug 5, 2025 • 0 new comments -
[AOTI Debugging] Add Environment Variable to control output path
#153391 commented on
Aug 11, 2025 • 0 new comments -
[AUTOCAST] FEAT: Allow passing a `torch.device` object to autocast
#153539 commented on
Aug 9, 2025 • 0 new comments -
Updates contextlib with ParamSpec
#153623 commented on
Aug 6, 2025 • 0 new comments -
Add TORCH_CHECK for group < channels for native_channel_shuffle
#153781 commented on
Aug 5, 2025 • 0 new comments -
Fix `LLONG_MIN` errors in `torch.jit.script`
#153793 commented on
Aug 5, 2025 • 0 new comments -
[Dynamo] Fixes for exceptions
#153966 commented on
Aug 11, 2025 • 0 new comments -
[cond] support gen_schema for cond
#154193 commented on
Aug 12, 2025 • 0 new comments -
Revert D74898941 (#154188)
#154203 commented on
Aug 8, 2025 • 0 new comments -
Updated padding validation in max_pool functions to account for dilation
#154395 commented on
Aug 10, 2025 • 0 new comments -
[Inductor] Fix remove_noop_ops pass where the types for the same_meta would differ
#154460 commented on
Aug 6, 2025 • 0 new comments -
[WIP][export][cond] support exporting cond with unbacked symint shaped tensor
#154570 commented on
Aug 5, 2025 • 0 new comments -
Use official CUDAToolkit module in CMake
#154595 commented on
Aug 11, 2025 • 0 new comments -
Fix `SequentialLR` deprecate warning about invoke `step(epoch)`
#149392 commented on
Aug 8, 2025 • 0 new comments -
Introduce test skip markers for Sandcastle
#150934 commented on
Aug 5, 2025 • 0 new comments -
[hop] Make base_hop share utils with control flow ops in backward
#151146 commented on
Aug 5, 2025 • 0 new comments -
[WIP] Generalize device caching allocator
#151298 commented on
Aug 8, 2025 • 0 new comments -
Add inductor backend to device interface; make minifier_tests more device agnostic
#151314 commented on
Aug 6, 2025 • 0 new comments -
Fix skipIfXpu and skipIfHpu disables tests when used on class
#151315 commented on
Aug 5, 2025 • 0 new comments -
AMD/ROCm OCP Micro-scaling Format (mx-fp8/mx-fp4) Support
#151360 commented on
Aug 11, 2025 • 0 new comments -
Add is_pinned to host allocator
#151439 commented on
Aug 10, 2025 • 0 new comments -
Allow to byteswap data when reading saved torch jit data
#151447 commented on
Aug 7, 2025 • 0 new comments -
[WIP] Deprecate getPinnedMemoryAllocator use getHostAllocator instead
#151531 commented on
Aug 10, 2025 • 0 new comments -
GEMM-template Horizontal
#151780 commented on
Aug 7, 2025 • 0 new comments -
[WIP] Deprecate AcceleratorHooksInterface isPinnedPtr, use at::getHostAllocator()->is_pinned instead
#151916 commented on
Aug 10, 2025 • 0 new comments -
Expand cache logging
#152026 commented on
Aug 10, 2025 • 0 new comments -
Work around MPSGraph issue in backward pass of nn.ReplicationPad1d/2d
#152094 commented on
Aug 11, 2025 • 0 new comments -
[jit] DeadCodeEliminator Mark(block) improvement
#152348 commented on
Aug 10, 2025 • 0 new comments -
Build libgomp (gcc-13) from src on AArch64
#152361 commented on
Aug 7, 2025 • 0 new comments -
Bitwise-perfect method for (de)serializing tensors in base64
#93859 commented on
Aug 10, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose2d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157368 commented on
Aug 11, 2025 • 0 new comments -
[RFC] Graph generalization
#158827 commented on
Aug 11, 2025 • 0 new comments -
[export] Unable to trace ops like min/pow
#148389 commented on
Aug 11, 2025 • 0 new comments -
DISABLED test_add_loggers_linear_mod_quant_fp32 (__main__.TestFXNumericSuiteNShadows)
#159152 commented on
Aug 11, 2025 • 0 new comments -
DISABLED test_graph_partition_cpu_scalar3 (__main__.CudaGraphTreeTests)
#157339 commented on
Aug 11, 2025 • 0 new comments -
DISABLED test_graph_partition_cpu_scalar2 (__main__.CudaGraphTreeTests)
#157312 commented on
Aug 11, 2025 • 0 new comments -
[ARM] multiple test failures in TestQuantizedConv on Aarch64
#144770 commented on
Aug 11, 2025 • 0 new comments -
Unable to Specify CUDA Stream for Collective Operations Using with torch.cuda.stream() context
#136187 commented on
Aug 11, 2025 • 0 new comments -
Allow slicing of Nested Tensors along constant dimensions
#108567 commented on
Aug 11, 2025 • 0 new comments -
torch.nn.InstanceNorm2d throws "mixed dtype" error with track_running_stats set to True
#139140 commented on
Aug 11, 2025 • 0 new comments -
Data corruption when reading data as CUDA tensor from a different process
#134273 commented on
Aug 11, 2025 • 0 new comments -
Update epsilon logic to improve numerical stability
#151110 commented on
Aug 11, 2025 • 0 new comments -
[Doc] [Win] libuv installation doc is not correct.
#148315 commented on
Aug 11, 2025 • 0 new comments -
[ONNX] broadcast_in_dim: model (ReDimNet)
#138313 commented on
Aug 11, 2025 • 0 new comments -
Most requested ops for the MPS backend
#154052 commented on
Aug 11, 2025 • 0 new comments -
Support for non-scalar valued Loss Tensor in `_export_forward_backward()`
#159316 commented on
Aug 9, 2025 • 0 new comments -
Gross mismatch in PDF between CUDA and CPU for multivariate Gaussian mixture models
#156959 commented on
Aug 9, 2025 • 0 new comments -
There is a performance drop because we have not yet implemented the batching rule for aten::_scaled_dot_product_efficient_attention_backward
#117016 commented on
Aug 9, 2025 • 0 new comments -
Performance issue on Windows with a "benchmark" comparing to Linux and WLS
#87692 commented on
Aug 9, 2025 • 0 new comments -
Get https://github.com/pytorch/benchmark working
#87697 commented on
Aug 9, 2025 • 0 new comments -
Compiling attention (SDPA) with nested tensors fails when using DDP
#152068 commented on
Aug 9, 2025 • 0 new comments -
Cannot configure dist_timeout when using device_mesh
#119574 commented on
Aug 9, 2025 • 0 new comments -
Performance regression: torch.jit.trace() significantly slower on RTX 5090 than RTX 4060 (cu128 nightly)
#159238 commented on
Aug 9, 2025 • 0 new comments -
MPS Sparse Support
#129842 commented on
Aug 9, 2025 • 0 new comments -
[feature request] Native checkpointing to/from `s3://` (DCP / `torch.load` / `torch.save`)
#155992 commented on
Aug 9, 2025 • 0 new comments -
PyTorch 2.8.0 exposes statically linked `libstdc++` CXX11 ABI symbols.
#133437 commented on
Aug 10, 2025 • 0 new comments -
strange error when distributed training
#150081 commented on
Aug 10, 2025 • 0 new comments -
torch_function objects passed as non-Tensor args should trigger overrides
#119194 commented on
Aug 10, 2025 • 0 new comments -
State of Torch Named Tensors
#60832 commented on
Aug 10, 2025 • 0 new comments -
Activation Checkpointing dtype Mismatch in Recomputed Activations Due to Skipped Type Casting in FSDPv2 _pre_forward
#159359 commented on
Aug 10, 2025 • 0 new comments -
Torch Compile. CudaGraphs Memory leak
#159669 commented on
Aug 10, 2025 • 0 new comments -
Triton 3.5 / PyTorch 2.9 Pin Update Tracker
#159704 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_remove_noop_slice1_cpu (__main__.CpuTests)
#151379 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_scatter_cpu (__main__.CpuTests)
#151382 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_cdist_large_batch_cpu (__main__.TestTorchDeviceTypeCPU)
#158909 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_gru (__main__.TestXNNPACKQuantizer)
#158116 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_execution_into_recording (__main__.CudaGraphTreeTests)
#156838 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_scatter_cuda (__main__.GPUTests)
#151378 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_remove_noop_view_default_cuda (__main__.GPUTests)
#151511 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cuda (__main__.GPUTests)
#151383 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_remove_noop_slice1_cuda (__main__.GPUTests)
#151381 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_error_on_dealloc_use2 (__main__.CudaGraphTreeTests)
#156808 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_remove_noop_slice_cpu (__main__.CpuTests)
#151384 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_against_reference_multi_input_jacfwd_cuda (__main__.TestJacCUDA)
#156998 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule___rmatmul___cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#157003 commented on
Aug 12, 2025 • 0 new comments -
DISABLED test_op_has_batch_rule_nn_functional_conv_transpose3d_cuda_float32 (__main__.TestVmapOperatorsOpInfoCUDA)
#82340 commented on
Aug 12, 2025 • 0 new comments -
Enable CUDA 13.0 binaries
#159779 commented on
Aug 12, 2025 • 0 new comments -
Immutable assignment (akin to `array.at[...]` in JAX)
#159784 commented on
Aug 11, 2025 • 0 new comments -
Re-enable Low Memory Dropout
#102319 commented on
Aug 11, 2025 • 0 new comments -
dataloader hits CUDA error: invalid argument in data.pin_memory(device) on RTX 3090
#159447 commented on
Aug 11, 2025 • 0 new comments -
RandomSampler docs wrongly states type for data_source is Dataset, when it is Sized
#158631 commented on
Aug 11, 2025 • 0 new comments -
Execution of an `ExportProgram` of a model with `torch.autograd.grad(torch.sqrt(x), x, torch.ones_like(torch.sqrt(x)))` returns a `FakeTensor` instead of a real tensor
#155044 commented on
Aug 11, 2025 • 0 new comments -
DISABLED test_resnet (__main__.TestBlockStateAbsorption)
#157725 commented on
Aug 11, 2025 • 0 new comments -
DISABLED test_bitwise_print_precedence (__main__.ReproTests)
#156736 commented on
Aug 11, 2025 • 0 new comments -
DTensor Compile w/ Dynamic Shapes Autograd - Unhashable SymInt in sharding propagation when inputs have requires_grad=True
#159590 commented on
Aug 11, 2025 • 0 new comments -
Tensor roll with different shifts
#125422 commented on
Aug 11, 2025 • 0 new comments -
Fused Linear and Cross-Entropy Loss `torch.nn.functional.linear_cross_entropy`
#124480 commented on
Aug 11, 2025 • 0 new comments -
Parallel Associative Scan
#95408 commented on
Aug 11, 2025 • 0 new comments -
Wrong meta function for constant_pad_nd
#144187 commented on
Aug 11, 2025 • 0 new comments -
C10d Elastic Training master_addr ERROR
#74824 commented on
Aug 11, 2025 • 0 new comments -
torch.export seems to emit invalid code for Tensor.split when used with meta device
#154721 commented on
Aug 11, 2025 • 0 new comments -
MPS operator coverage tracking issue (2.6+ version)
#141287 commented on
Aug 11, 2025 • 0 new comments -
[vulkan] Vulkan backend fails creating tensor on x86_64 Linux
#72775 commented on
Aug 11, 2025 • 0 new comments