From 4962e4f20f77680f3f748db1921902cb7d3f47d6 Mon Sep 17 00:00:00 2001
From: Chris Abraham
Date: Wed, 23 Apr 2025 18:14:55 -0700
Subject: [PATCH] blog edit

Signed-off-by: Chris Abraham
---
 _posts/2025-04-23-pytorch-2-7.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/_posts/2025-04-23-pytorch-2-7.md b/_posts/2025-04-23-pytorch-2-7.md
index cad9d2fd0ae8..1f31b9f2e6c3 100644
--- a/_posts/2025-04-23-pytorch-2-7.md
+++ b/_posts/2025-04-23-pytorch-2-7.md
@@ -41,13 +41,13 @@ This release is composed of 3262 commits from 457 contributors since PyTorch 2.6
-FlexAttention LLM first token processing on X86 CPUs
+FlexAttention LLM first token processing on x86 CPUs
-FlexAttention LLM throughput mode optimization on X86 CPUs
+FlexAttention LLM throughput mode optimization on x86 CPUs
@@ -135,9 +135,9 @@ For more information regarding Intel GPU support, please refer to [Getting Start
 See also the tutorials [here](https://pytorch.org/tutorials/prototype/inductor_windows.html) and [here](https://pytorch.org/tutorials/prototype/pt2e_quant_xpu_inductor.html).
-### [Prototype] FlexAttention LLM first token processing on X86 CPUs
+### [Prototype] FlexAttention LLM first token processing on x86 CPUs
-FlexAttention X86 CPU support was first introduced in PyTorch 2.6, offering optimized implementations — such as PageAttention, which is critical for LLM inference—via the TorchInductor C++ backend. In PyTorch 2.7, more attention variants for first token processing of LLMs are supported. With this feature, users can have a smoother experience running FlexAttention on x86 CPUs, replacing specific *scaled_dot_product_attention* operators with a unified FlexAttention API, and benefiting from general support and good performance when using torch.compile.
+FlexAttention x86 CPU support was first introduced in PyTorch 2.6, offering optimized implementations — such as PageAttention, which is critical for LLM inference—via the TorchInductor C++ backend. In PyTorch 2.7, more attention variants for first token processing of LLMs are supported. With this feature, users can have a smoother experience running FlexAttention on x86 CPUs, replacing specific *scaled_dot_product_attention* operators with a unified FlexAttention API, and benefiting from general support and good performance when using torch.compile.
 ### [Prototype] FlexAttention LLM throughput mode optimization
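
For context on the paragraph being patched, below is a minimal sketch (not part of the patch or the official blog post) of what "replacing *scaled_dot_product_attention* with a unified FlexAttention API" and running it under torch.compile can look like. The tensor shapes and the causal `score_mod` are illustrative assumptions, not taken from the source.

```python
# Illustrative sketch only: uses the public FlexAttention API with torch.compile,
# which the patched paragraph describes for x86 CPUs. Shapes, dtype, and the
# causal score_mod are assumptions for demonstration.
import torch
from torch.nn.attention.flex_attention import flex_attention


def causal(score, b, h, q_idx, kv_idx):
    # Keep the score where the query position may attend to the key position,
    # otherwise mask with -inf (a standard causal pattern for prefill /
    # first-token processing).
    return torch.where(q_idx >= kv_idx, score, -float("inf"))


# Compiling lets TorchInductor generate optimized kernels (its C++ backend on CPU).
compiled_flex_attention = torch.compile(flex_attention)

# Hypothetical prefill-sized inputs: batch=1, heads=8, seq_len=1024, head_dim=64.
q = torch.randn(1, 8, 1024, 64)
k = torch.randn(1, 8, 1024, 64)
v = torch.randn(1, 8, 1024, 64)

out = compiled_flex_attention(q, k, v, score_mod=causal)
print(out.shape)  # torch.Size([1, 8, 1024, 64])
```

The same call replaces per-variant `scaled_dot_product_attention` usages: changing the attention behavior is a matter of swapping the `score_mod` function rather than switching operators.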