chore(deps): update dependency torch to v2.8.0 [security] #13405
Conversation
Hello @renovate-bot, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, opened by renovate-bot, aims to update the torch dependency in the dataflow/run-inference/requirements.txt file. The primary motivation for this update is to address recently disclosed security vulnerabilities in older versions of torch, specifically CVE-2025-32434 (a Remote Command Execution vulnerability) and CVE-2025-2953 (a Denial of Service vulnerability). The PR updates the version from 2.2.2 to 2.7.1 to incorporate the necessary security fixes.
Highlights
- Security Fixes: This update is crucial as it includes fixes for significant security vulnerabilities (CVE-2025-32434 and CVE-2025-2953) found in previous versions of torch.
Changelog
- dataflow/run-inference/requirements.txt
  - Updated the torch dependency from version 2.2.2 to 2.7.1.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Dependency update,
Fixing bugs, making code safe,
Security patch.
Footnotes
- Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request updates the torch dependency from version 2.2.2 to 2.7.1. This is an important update as it addresses two security vulnerabilities: CVE-2025-32434 (Remote Command Execution in torch.load) and CVE-2025-2953 (Denial of Service in torch.mkldnn_max_pool2d).
The change is confined to the requirements.txt file and correctly updates the version string. The Renovate bot's compatibility checks indicate that this new version should be compatible with the existing dependencies, which is reassuring.
Given the security implications of the older torch version, this update is highly recommended. The code in main.py (via Apache Beam's PytorchModelHandlerTensor) and download_model.py involves model loading and saving, so ensuring PyTorch is up-to-date with security patches is crucial.
Overall, this is a beneficial and necessary update.
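For reviewers who want to double-check locally, here is a minimal, hedged sketch (not taken from this repository) for confirming that the environment actually resolves to the patched release; it assumes the packaging library is available and uses the version the PR ultimately pins.

```python
# Hypothetical sanity check, not part of this repo: confirm the installed
# torch meets the patched minimum before running the Dataflow pipeline.
import torch
from packaging.version import Version

assert Version(torch.__version__) >= Version("2.8.0"), torch.__version__
print(f"torch {torch.__version__} includes the CVE-2025-32434 / CVE-2025-2953 fixes")
```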
Summary of Findings
- Security Vulnerability Remediation: The primary purpose of this PR is to update torch to v2.7.1, which addresses critical security vulnerabilities (CVE-2025-32434 and CVE-2025-2953) present in the previous version (v2.2.2). This is a crucial improvement for the security posture of the application.
- Dependency Update: The torch dependency has been updated from 2.2.2 to 2.7.1 in dataflow/run-inference/requirements.txt. This change is correctly implemented.
Merge Readiness
This pull request directly addresses known security vulnerabilities by updating the torch library. The change is minimal and appears to be compatible according to automated checks. I recommend merging this PR to enhance the security of the project. As I am an AI assistant, I am not authorized to approve pull requests; please ensure it undergoes any further necessary human review and testing procedures before merging.
This PR contains the following updates:

| Package | Change |
|---|---|
| torch | ==2.2.2 -> ==2.8.0 |
GitHub Vulnerability Alerts
CVE-2025-2953
A vulnerability, which was classified as problematic, has been found in PyTorch 2.6.0+cu124. Affected by this issue is the function torch.mkldnn_max_pool2d. The manipulation leads to denial of service. An attack has to be approached locally. The exploit has been disclosed to the public and may be used.
CVE-2025-3730
A vulnerability, which was classified as problematic, was found in PyTorch 2.6.0. Affected is the function torch.nn.functional.ctc_loss of the file aten/src/ATen/native/LossCTC.cpp. The manipulation leads to denial of service. An attack has to be approached locally. The exploit has been disclosed to the public and may be used. The name of the patch is 46fc5d8e360127361211cb237d5f9eef0223e567. It is recommended to apply a patch to fix this issue.
CVE-2025-32434
Description
I found a Remote Command Execution (RCE) vulnerability in PyTorch. When loading a model using torch.load with weights_only=True, RCE can still be achieved.
Background knowledge
https://github.com/pytorch/pytorch/security

As you can see, the PyTorch official documentation considers using torch.load() with weights_only=True to be safe. Since everyone knows that weights_only=False is unsafe, they use weights_only=True to mitigate the security issue. But now, I have proved that even with weights_only=True, it is still possible to achieve RCE.
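For context, here is a minimal sketch of the load pattern this advisory targets; the file name and module are illustrative only.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
torch.save(model.state_dict(), "model.pt")

# weights_only=True restricts unpickling to tensors and simple containers and
# was documented as the safe way to load untrusted checkpoints.
# CVE-2025-32434 showed that vulnerable torch releases could still be tricked
# into code execution despite this flag, so loading untrusted files is only
# as safe as the torch version doing the loading.
state_dict = torch.load("model.pt", weights_only=True, map_location="cpu")
model.load_state_dict(state_dict)
```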
Credit
This vulnerability was found by Ji'an Zhou.
Release Notes
pytorch/pytorch (torch)
v2.8.0: PyTorch 2.8.0 Release
PyTorch 2.8.0 Release Notes
Highlights
For more details about these highlighted features, you can look at the release blogpost.
Below are the full release notes for this release.
Tracked Regressions
Windows wheel builds with CUDA 12.9.1 stack overflow during build (#156181)
Due to a bug introduced in CUDA 12.9.1, we are unable to complete full Windows wheel builds with this version, as compilation of torch.segment_reduce() crashes the build. Thus, we provide a wheel without torch.segment_reduce() included in order to sidestep the issue. If you need support for torch.segment_reduce(), please utilize a different version.
Backwards Incompatible Changes
CUDA Support
Removed support for Maxwell and Pascal architectures with CUDA 12.8 and 12.9 builds (#157517, #158478, #158744)
Due to binary size limitations, support for sm50 - sm60 architectures with CUDA 12.8 and 12.9 has
been dropped for the 2.8.0 release. If you need support for these architectures, please utilize
CUDA 12.6 instead.
Python Frontend
Calling an op with an input dtype that is unsupported now raises NotImplementedError instead of RuntimeError (#155470). Please update exception handling logic to reflect this.
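As a rough illustration of the exception-handling update; the op/dtype pairing below is only an assumed example of an unsupported combination, not taken from the release notes.

```python
import torch

x = torch.zeros(4, dtype=torch.complex64)

try:
    # relu is not defined for complex inputs; per the note above, 2.8 surfaces
    # unsupported-dtype failures as NotImplementedError where 2.7 raised
    # RuntimeError.
    torch.nn.functional.relu(x)
except (NotImplementedError, RuntimeError) as err:
    # Catching both keeps the handler working across 2.7 and 2.8.
    print(f"unsupported dtype: {err}")
```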
Added missing in-place on view check to custom autograd.Function (#153094). In 2.8.0, if a custom autograd.Function mutates a view of a leaf requiring grad, it now properly raises an error. Previously, it would silently leak memory.
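A minimal sketch of the pattern that now errors; the exact error type and message are assumptions, the behavior change itself is from the note above.

```python
import torch

class InplaceOnView(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        inp.add_(1)          # in-place mutation of the incoming view
        ctx.mark_dirty(inp)
        return inp

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

leaf = torch.zeros(4, requires_grad=True)
view = leaf[:]               # a view of a leaf that requires grad

try:
    InplaceOnView.apply(view)   # 2.7: silently leaked memory; 2.8: raises
except RuntimeError as err:
    print(err)
```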
An error is now properly thrown for the out variant of tensordot when called with a requires_grad=True tensor (#150270). Please avoid passing an out tensor with requires_grad=True, as gradients cannot be computed for this tensor.
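A small sketch of the now-rejected call and the straightforward alternative:

```python
import torch

a = torch.randn(3, 4, requires_grad=True)
b = torch.randn(4, 5)

# 2.7 accepted this silently; 2.8 raises because gradients cannot be computed
# for an out= tensor. Avoid requires_grad=True on `out`.
bad_out = torch.empty(3, 5, requires_grad=True)
# torch.tensordot(a, b, dims=([1], [0]), out=bad_out)  # error in 2.8

# Instead, let tensordot allocate its result; it still participates in
# autograd through inputs that require grad.
result = torch.tensordot(a, b, dims=([1], [0]))
result.sum().backward()
```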
torch.compile
Specialization of a tensor shape with mark_dynamic applied now correctly errors (#152661). Prior to 2.8, it was possible for a guard on a symbolic shape to be incorrectly omitted if the symbolic shape evaluation was previously tested with guards suppressed (this often happens within the compiler itself). This has been fixed in 2.8 and usually will just silently "do the right thing" and add the correct guard. However, if the new guard causes a tensor marked with mark_dynamic to become specialized, this can result in an error. One workaround is to use maybe_mark_dynamic instead of mark_dynamic. See the discussion in issue #157921 for more context.
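A hedged sketch of the workaround; it assumes torch._dynamo.mark_dynamic and torch._dynamo.maybe_mark_dynamic are available as in current releases.

```python
import torch

def f(x):
    return x * 2

compiled = torch.compile(f)

x = torch.randn(8, 4)
# mark_dynamic asserts that dim 0 must stay dynamic; if a newly-added guard
# forces specialization under 2.8, compilation errors out.
torch._dynamo.mark_dynamic(x, 0)
print(compiled(x).shape)

y = torch.randn(8, 4)
# Workaround from the note: maybe_mark_dynamic only expresses a preference,
# so specialization no longer turns into a hard error.
torch._dynamo.maybe_mark_dynamic(y, 0)
print(compiled(y).shape)
```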
Several config variables related to torch.compile have been renamed or removed:
- enable_cpp_framelocals_guard_eval has changed to no longer have any effect (#151008).
- rocm.n_max_profiling_configs is deprecated (#152341). Instead, use the ck-tile based configs rocm.ck_max_profiling_configs and rocm.ck_tile_max_profiling_configs.
- autotune_fallback_to_aten is deprecated (#154331). Inductor will no longer silently fall back to ATen. Please add "ATEN" to max_autotune_gemm_backends for the old behavior.
- use_mixed_mm and mixed_mm_choice are deprecated (#152071). Inductor now supports prologue fusion, so there is no need for special cases now.
- descriptive_names = False is deprecated (#151481). Please use one of the other available options: "torch", "original_aten", or "inductor_node".
- custom_op_default_layout_constraint has moved from inductor config to functorch config (#148104). Please reference it via torch._functorch.config.custom_op_default_layout_constraint instead of torch._inductor.config.custom_op_default_layout_constraint.
- emit_current_arch_binary is deprecated (#155768).
- aot_inductor.embed_cubin has been renamed to aot_inductor.embed_kernel_binary (#154412).
- aot_inductor.compile_wrapper_with_O0 has been renamed to compile_wrapper_opt_level (#148714).

Added a stricter aliasing/mutation check for HigherOrderOperators (e.g. cond), which will explicitly error out if alias/mutation among inputs and outputs is unsupported (#148953, #146658). For affected HigherOrderOperators, add .clone() to aliased outputs to address this.
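A minimal sketch of the suggested fix using torch.cond; the branch functions are illustrative only.

```python
import torch

def true_fn(x):
    # Returning `x` itself would alias the cond input, which 2.8 now rejects
    # for HigherOrderOperators; returning a clone avoids the error.
    return x.clone()

def false_fn(x):
    return x.sin()

@torch.compile(fullgraph=True)
def f(pred, x):
    return torch.cond(pred, true_fn, false_fn, (x,))

print(f(torch.tensor(True), torch.randn(3)))
```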
guard_or_x and definitely_x have been consolidated (#152463). We removed definitely_true / definitely_false and associated APIs, replacing them with guard_or_true / guard_or_false, which offer similar functionality and can be used to achieve the same effect. Please migrate to the latter.
torch.export
torch.export.export_for_inference has been removed in favor of torch.export.export_for_training().run_decompositions() (#149078).
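A small sketch of the replacement call chain; the model is illustrative.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU())
example_args = (torch.randn(2, 4),)

# 2.7: torch.export.export_for_inference(model, example_args)
# 2.8: compose export_for_training with run_decompositions instead.
ep = torch.export.export_for_training(model, example_args).run_decompositions()
print(ep)
```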
Switched default to strict=False in torch.export.export and export_for_training (#148790, #150941). This differs from the previous release default of strict=True. To revert to the old default behavior, please explicitly pass strict=True.
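A minimal sketch of keeping the 2.7 behavior by passing strict=True explicitly:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
example_args = (torch.randn(3, 4),)

# 2.8 defaults to non-strict tracing; opt back in to strict mode explicitly.
ep_strict = torch.export.export(model, example_args, strict=True)
ep_default = torch.export.export(model, example_args)  # strict=False in 2.8
```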
ONNX
Default opset in torch.onnx.export is now 18 (#156023). When dynamo=False, the default ONNX opset version has been updated from 17 to 18. Users can set opset_version to explicitly select an opset version.
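A hedged sketch of pinning the opset when the new default is not wanted; the model and output file name are illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
example_args = (torch.randn(1, 4),)

# With dynamo=False the default opset is now 18; pin opset_version=17 to keep
# the 2.7 behavior if downstream tooling has not caught up yet.
torch.onnx.export(model, example_args, "linear.onnx",
                  dynamo=False, opset_version=17)
```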
The JitTraceConvertStrategy has been removed (#152556). Support for JIT traced and scripted modules in the ONNX exporter when dynamo=True has been removed. You are encouraged to export an nn.Module directly, or create an ExportedProgram using torch.export before exporting to ONNX.

onnxscript>=0.3.1 is required for the dynamo=True option (#157017). You must upgrade onnxscript to version 0.3.1 or higher for it to be compatible with PyTorch 2.8.

Build Frontend
Removed the torch/types.h include from Dispatcher.h (#149557). This can cause build errors in C++ code that implicitly relies on this include (e.g. very old versions of torchvision). Note that Dispatcher.h does not belong as an include from torch/types.h and was only present as a short-term hack to appease torchvision. If you run into torchvision build errors, please update to a more recent version of torchvision to resolve this.

Upgraded DLPack to 1.0 (#145000). As part of the upgrade, some of the DLDeviceType enum values have been renamed. Please switch to the new names.
NVTX3 code has been moved from cmake/public/cuda.cmake to cmake/Dependencies.cmake (#151583). This is a BC-breaking change for the build system interface. Downstream projects that previously got NVTX3 through cmake/public/cuda.cmake (i.e. calling find_package(TORCH REQUIRED)) will now need to explicitly configure NVTX3 support in the library itself (i.e. use USE_SYSTEM_NVTX=1). The change is to fix the broken behavior where downstream projects couldn't find NVTX3 anyway due to the PROJECT_SOURCE_DIR mismatch.

Version 2.7.0:
- -DUSE_SYSTEM_NVTX would be able to find NVTX3 and torch::nvtx3 via PyTorch's cmake/public/cuda.cmake logic.
- -DUSE_SYSTEM_NVTX would encounter build errors with CUDA 12.8 or above.

Version 2.8.0:
- -DUSE_SYSTEM_NVTX will not be able to find NVTX3 or torch::nvtx3 via PyTorch's cmake/public/cuda.cmake. The downstream project now needs to explicitly find NVTX3 and torch::nvtx3 by implementing the same logic in PyTorch's cmake/Dependencies.cmake.
- -DUSE_SYSTEM_NVTX will proceed building without NVTX unless another part of the build process re-enables NVTX.

Deprecations
MPS support for MacOS Ventura will be removed in 2.9
PyTorch 2.8 is the last release that will support GPU acceleration on MacOS Ventura. In the next
release (2.9), MacOS Sonoma (released in Sept. 2023) or above will be required to use the MPS
backend.
torch.ao.quantization is deprecated and will be removed in 2.10 (#153892). To migrate:
- Eager mode quantization (torch.ao.quantization.quantize, torch.ao.quantization.quantize_dynamic): use torchao eager mode quantize_ or torchao PT2E quantization.
- FX graph mode quantization (torch.ao.quantization.quantize_fx.prepare_fx, torch.ao.quantization.quantize_fx.convert_fx): use torchao PT2E quantization (torchao.quantization.quantize_pt2e.prepare_pt2e, torchao.quantization.quantize_pt2e.convert_pt2e).

Note that PT2E quantization has been migrated to torchao (https://github.com/pytorch/ao/tree/main/torchao/quantization/pt2e). See pytorch/ao#2259 and https://docs.pytorch.org/ao/main/quick_start.html#pytorch-2-export-quantization for more details.

The dynamo=False (current default) option for torch.onnx.export is deprecated (#152478, #155580). The default will be dynamo=True starting from PyTorch 2.9. You are encouraged to migrate to use the dynamo=True option in torch.onnx.export. This flag makes torch.export.export the default export path, replacing TorchScript. To maintain the old behavior, set dynamo=False explicitly. You are encouraged to also experiment with the fallback=True option that will make the exporter fall back to the dynamo=False path if there are errors.
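A minimal sketch of opting in ahead of the 2.9 default change; it assumes onnxscript>=0.3.1 is installed, per the requirement above, and the model is illustrative.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
example_args = (torch.randn(1, 4),)

# dynamo=True routes through torch.export; fallback=True retries with the
# legacy TorchScript exporter if the new path fails.
torch.onnx.export(model, example_args, "linear.onnx",
                  dynamo=True, fallback=True)
```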
New Features
CUDA
torch.compile
Dynamo
- nested_compile_region (#156449)
- guard_filter_fn (#150936)
- dont_skip_tracing decorator to skip over most Dynamo skipfiles rules (#150586)
Inductor
torch.export
- draft-export, an export variant designed to consistently produce a graph and generate a debugging report of issues encountered during tracing (#152637, #153219, #149465, #153627, #154190, #155744, #150876, #150948, #151051, #151065, #150809, #151797)
Ahead-Of-Time Inductor (AOTI)
- TorchBind objects (#150196, #154265)
- aot_inductor.model_name_for_generated_files for specifying model name (#154129)
MPS
- MPSInductor: torch.compile for Apple GPUs (#150121, #149342, #151449, #151754, #149687, #149180, #149221, #153598, #152788, #153787, #152214, #151152, #155891, #154578, #151272, #151288, #153997, #151871, #153362, #156566, #150661, #153582)
ONNX
- Added new strategy draft_export (#147529, docs) to provide debugging information upon data-dependent / constraint errors when obtaining an ExportedProgram with torch.onnx.export
- Added support for symbolic operators in the dynamo=True export path (#148905, #149678, #150038, docs). Two operators, torch.onnx.ops.symbolic and torch.onnx.ops.symbolic_multi_out, are defined to allow you to create symbolic ONNX operators directly in your PyTorch models; you can use them in a forward method.
Python Frontend
Quantization
- torch.float4_e2m1fn_x2 dtype (#148791)
XPU
Improvements
Build Frontend
- TORCH_CUDA_ARCH_LIST (#152715, #155314)
Composability
C++ Frontend
- bicubic mode for torch::nn::functional::grid_sample (#150817)
CUDA
- no_implicit_headers mode for load_inline() on custom CUDA extensions (#149480)
cuDNN
Distributed
c10d
- TCPStore with clone and queuing features (#150966, #151045, #150969, #151485)
- getDefaultBackend more fault tolerant without relying on exceptions (#149152)
- masterListenFd in TCPStoreLibUvBackend (#150215)
- TORCH_NCCL_USE_TENSOR_REGISTER_ALLOCATOR_HOOK (#150682)
- global_rank when group_rank is used (#151373)
- ProcessGroupNCCL via an unsafe API (#152496)
- needs_contiguous_strides tag in functional collective (#153399, #153523)
- split_group to work with non-nccl backends (#152175)
- new_subgroups() by using new_subgroups_by_enumeration() (#153843)
- ProcessGroupNCCL (#153990)
- c10::Half for gloo (#153862)
- get_process_group_ranks() to accept group=None (#154902)
- init_process_group support index-only device id (#156214)
- ProcessGroup (#151723)
- reduce_scatter and ReduceOp::AVG in ProcessGroupGloo (#149781, #149869)
- ProcessGroupNCCL (#152706)
- ibverbs backend in gloo and enabled gloo CUDA when used with a backend that supports GPUDirect (#153015, #153425, #153406)
DeviceMesh
DistributedDataParallel (DDP)
- use_python_reducer to C++ reducer (#152735)
DistributedStateDict (DSD)
- write_size in planner write items (#149699)
DTensor
- StridedShard support uneven sharding (#150490)
- torch.cumsum (#151071)
- DTensor redistribute fwd/bwd datatype conversion to enable SimpleFSDP mixed precision training (#150740)
- torch.distributed.tensor.debug.visualize_sharding (#152027)
FullyShardedDataParallel2 (FSDP2)
- PrivateUse1 backend in FSDP collectives and device type to pre forward hook (#147260, #149487)
- set_reshard_after_forward (#149103)
- reshard_after_forward=True for root model and kept root unsharded when not specifying reshard_after_forward (#154704, #155319)
- all_reduce_event only if it's not CPU device (#150316)
Pipeline Parallelism
- get_pipeline_order() for Gpipe and 1F1B (#155935)
ShardedTensor
- ShardedTensor and recalculated metadata from all_gather (#152583)
TensorParallel
- ParallelStyle PrepareModuleInputOutput (#150372)
torchelastic
torch.compile
Dynamo
- __torch_function__, and namedtuple subclasses (#153150, #149792, #153982)
- (#151962, #149707, #149709, #148799, #148801)
- reason field to torch.compiler.disable (#150341)
- lru_cache warnings for functions in the top-level torch namespace (#157718)
Inductor
- aot_inductor.custom_ops_to_c_shims and aot_inductor.custom_op_libs: allow for specifying custom op C shim (#153968)
- max_fusion_buffer_group_pairwise_attempts: limits fusions to specified node distance (#154688)
- cuda.cutlass_enabled_ops: controls CUTLASS operation selection (#155770)
- triton.cudagraph_capture_sizes: allows specifying certain shapes for which to capture CUDAGraphs; skips CUDAGraphs for other shapes (#156551)
- use_static_cuda_launcher: enables launching compiled triton statically to improve cold start times (#148890)
- assume_unaligned_fallback_output: allows inductor to track unaligned outputs (#150777)
- cuda.cutlass_tma_only: controls whether or not to only use TMA-compatible kernels in CUTLASS (#152815)
- static_launch_user_defined_triton_kernels: enables statically launching user defined triton kernels (#153725)
- precompilation_timeout_seconds: controls the timeout on precompilation (#153788)
- disable_decompose_k: disables new DecomposeK GEMM Kernels (#154421)
- min_num_split: sets the minimum number of splits in a split reduction (#155941)
- max_autotune_flex_search_space: allows specifying the size of the search space for flex attention autotuning (#156307)
- LOG_AUTOTUNE_RESULTS for autotune log (#156254)
torch.export
- (min, max, math.pow) (#151348)
- pytree.register_dataclass (#147752)
- jit.script-ed functions in export (#155180)
Ahead-Of-Time Inductor (AOTI)
- num_runners to AOTIModelPackageLoader (#149364)
FX
- == (#150611)
- normalize_function (#143689)
- graph_code_verbose_log artifact for FX passes (#153775)
- fx.passes.split_module to normalize input names (#157793)
Linear Algebra Frontend
- cross (#154999)
MPS
- torch.special operations as well as index_copy, hardshrink, rsub, col2im, and isin (#149174, #149203, #149123, #149368, #149378, #149563, #149687, #149705, #149783, #149407/#149680, #150279, #151754, #153786, #154326, #155304, #156263, #155382, #154010, #149816, #152282, #156090, #150060, #151600, #155002, #154671)
- index_put with half precision floats (#151869)
- ConvTranspose3D with FP32 and complex (#154696)
- log1p and sigmoid with int64 (#151791)
Nested Tensor (NJT)
torch.nn
- weight_norm on CPU (#148878)
ONNX
- dynamo=True (#149901, #154596)
- Attention-23 and RotaryEmbedding-23 as native PyTorch ops (#156431, #156367, #154745)
- torch.scan (#154513)
- group_norm support from opset 21 (#152138)
- asdict method to VerificationInfo class (#151024)
- dynamic_shapes behavior to use torch.export.dim.DYNAMIC (#153065)
- sym_float, sym_not, sym_min, sym_max (#153200, #152111, #152196)
Optimizer
- TensorLR variant for fused Adagrad on CPU (#153078)
- lr_lambda type check in MultiplicativeLR (#151973)
Profiler
Python Frontend
- torch.AcceleratorError (#152023)
- Size.__radd__() (#152554)
- get_default_device() to also respect torch.device context manager (#148621)
Quantization
- (mul/add/add_relu and batch_norm2d), qconv1d-relu fusion, and lowering pass (#151112, #152411, #152811, #150751, #149708)
- torch.fused_moving_avg_obs_fake_quant on CUDA (#153699)
Release Engineering
ROCm
- cpp_extension (#152432)
- mm/bmm/addmm (#153262)
Sparse Frontend
- PrivateUse1 extension (#149374)
torch.func
- torch.Tensor.scatter_add_ (#150543), torch.matrix_exp (#155202)
XPU
- embed_cubin and multi_arch_kernel_binary options in AOTI for Intel GPU (#154514, #153924)
- UserDefineClass (#155787)
Bug Fixes
Build Frontend
- CMake-4.x (#150203)
- gcc-12+ (#150847)
- /permissive- flag (#149035)
Composability
- torch.norm for scalar input (#144073)
CPU (x86)
- log_softmax reduced-precision fp kernel (#156379)
CUDA
- torch.backends.cuda.matmul.allow_fp16_accumulation crash when using cuBLASLt (#153083)
- AsyncMM on Blackwell (#153519)
- torch.cuda.MemPool for multithreaded use-cases (#153356)
- sum() on a default-constructed gamma / beta in layer_norm (#156600)
- empty_cache under mempool context (#158180)
Distributed
c10d
- all_to_all (#149485)
- group input argument in new_subgroups() (#152765, #153798)
Distributed Checkpointing (DCP)
- broadcast_object util function (#155912)
DistributedDataParallel (DDP)
- DDPOptimizer issue on static tensor index (#155746)
DTensor
- local_map with multi-threading (#149070)
- new_local_tensor in redistribute be None case (#152303)
Pipeline Parallelism
RPC
Configuration
📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Never, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.