diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml index 908ececa7f86..23a8f5ffbad9 100644 --- a/.github/workflows/build.yml +++ b/.github/workflows/build.yml @@ -8,7 +8,7 @@ on: jobs: tests: - uses: pytorch/test-infra/.github/workflows/linux_job.yml@main + uses: pytorch/test-infra/.github/workflows/linux_job_v2.yml@main secrets: inherit with: runner: linux.12xlarge diff --git a/.github/workflows/update-quick-start-module.yml b/.github/workflows/update-quick-start-module.yml index 704d6643cbbb..7d070eb7ff8a 100644 --- a/.github/workflows/update-quick-start-module.yml +++ b/.github/workflows/update-quick-start-module.yml @@ -63,7 +63,7 @@ jobs: update-quick-start: needs: [linux-nightly-matrix, windows-nightly-matrix, macos-arm64-nightly-matrix, linux-release-matrix, windows-release-matrix, macos-arm64-release-matrix] - runs-on: "ubuntu-20.04" + runs-on: "ubuntu-latest" environment: pytorchbot-env steps: - name: Checkout pytorch.github.io diff --git a/CNAME b/CNAME index c101f6da020d..583993f7b85f 100644 --- a/CNAME +++ b/CNAME @@ -1 +1 @@ -pytorch.org \ No newline at end of file +docs.pytorch.org diff --git a/_community_blog/optimize-llms.md b/_community_blog/optimize-llms.md new file mode 100644 index 000000000000..e0ecb819ac05 --- /dev/null +++ b/_community_blog/optimize-llms.md @@ -0,0 +1,8 @@ +--- +title: "Optimize LLMs for Efficiency & Sustainability" +ext_url: /blog/optimize-llms/ +date: Feb 19, 2025 +author: "Zach Lasiuk, Arm" +--- + +The rapid growth of large language model (LLM) applications is linked to rapid growth in energy demand. According to the International Energy Agency (IEA), data center electricity consumption is projected to roughly double by 2026 primarily driven by AI. This is due to the energy-intensive training requirements for massive LLMs – however, the increase in AI Inferencing workloads also plays a role. For example, compared with traditional search queries, a single AI inference can consume about [10x more energy](https://www.weforum.org/stories/2024/07/generative-ai-energy-emissions/). diff --git a/_community_blog/pt-fedora-os-communities.md b/_community_blog/pt-fedora-os-communities.md new file mode 100644 index 000000000000..ec37d275c4a5 --- /dev/null +++ b/_community_blog/pt-fedora-os-communities.md @@ -0,0 +1,9 @@ +--- +title: "Powering AI with PyTorch, Fedora, and Open Source Communities" +author: Sudhir Dharanendraiah +ext_url: /blog/pt-fedora-os-communities/ +date: Mar 7, 2025 +--- + +At [DevConf.IN 2025](https://www.devconf.info/in/) in Pune, I had the opportunity to host a **[PyTorch Meetup](https://pretalx.devconf.info/devconf-in-2025/talk/W3YURM/)** on February 28th. The session, titled "**Powering AI with PyTorch, Fedora, and Open Source Communities**" was aimed at introducing PyTorch to students and professionals, explaining why **PyTorch+Fedora** form an ideal AI development platform. The other key aspect I covered was collaboration between open source communities. + diff --git a/_community_blog/pytorch-at-gtc.md b/_community_blog/pytorch-at-gtc.md new file mode 100644 index 000000000000..da3632fa17fe --- /dev/null +++ b/_community_blog/pytorch-at-gtc.md @@ -0,0 +1,8 @@ +--- +title: "PyTorch at GTC 2025" +author: "Team PyTorch at NVIDIA" +ext_url: /blog/pytorch-at-gtc/ +date: Mar 16, 2025 +--- + +[GTC](https://www.nvidia.com/gtc/) is coming back to San Jose on March 17–21, 2025. 
Join PyTorch Foundation members Arm, AWS, Google Cloud, IBM, Lightning AI, Meta, Microsoft Azure, Snowflake, and thousands of developers as we celebrate PyTorch. Together learn how AI & accelerated computing are helping humanity solve our most complex challenges. diff --git a/_community_blog/sglang-joins-pytorch.md b/_community_blog/sglang-joins-pytorch.md new file mode 100644 index 000000000000..6a05a4714873 --- /dev/null +++ b/_community_blog/sglang-joins-pytorch.md @@ -0,0 +1,8 @@ +--- +title: "SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine" +author: "SGLang Team" +ext_url: /blog/sglang-joins-pytorch/ +date: Mar 19, 2025 +--- + +We’re thrilled to announce that the SGLang project has been integrated into the PyTorch ecosystem! This integration ensures that SGLang aligns with PyTorch’s standards and practices, providing developers with a reliable and community-supported framework for fast and flexible serving of LLMs. \ No newline at end of file diff --git a/_community_stories/57.md b/_community_stories/57.md new file mode 100644 index 000000000000..7e717dfd000b --- /dev/null +++ b/_community_stories/57.md @@ -0,0 +1,8 @@ +--- +title: 'How IBM Research Uses PyTorch and TerraTorch to Make Geospatial Computer Vision Accessible for Everyone' +ext_url: /blog/how-ibm-uses-pt-terratorch/ +date: May 1, 2025 +tags: ["Computer Vision"] +--- + +Geospatial computer vision is essential for understanding our planet — from monitoring deforestation to tracking urban development and analyzing the impacts of climate change. However, the coding and deep learning skills for applying AI models to satellite imagery and earth observation data has traditionally been a major barrier for many practitioners. diff --git a/_events/autonomous-language-model-systems.md b/_events/autonomous-language-model-systems.md new file mode 100644 index 000000000000..8532258afef0 --- /dev/null +++ b/_events/autonomous-language-model-systems.md @@ -0,0 +1,23 @@ +--- +category: event +title: "Towards Autonomous Language Model Systems" +date: May 21, 2025 +poster: assets/images/pt-day-cfp.png +--- + + +Towards Autonomous Language Model Systems + + +**Date**: May 21, 2025, 11AM PT / 2PM ET +**Location**: Online + +Language models (LMs) are increasingly used to assist users in day-to-day tasks such as programming (Github Copilot) or search (Google's AI Overviews). But can we build language model systems that are able to autonomously complete entire tasks end-to-end? + +In this talk, Ofir Press will discuss efforts to build autonomous LM systems, focusing on the software engineering domain. Ofir will present SWE-bench, a novel method for measuring AI systems on their abilities to fix real issues in popular software libraries. Ofir will then discuss SWE-agent, a system for solving SWE-bench tasks. + +SWE-bench and SWE-agent are used by many leading AI organizations in academia and industry, including OpenAI, Anthropic, Meta, and Google, and SWE-bench has been downloaded over 2 million times. These projects show that academics on tight budgets can have a substantial impact in steering the research community toward building autonomous systems that can complete challenging tasks. + +Ofir is a postdoc at Princeton University, where they mainly work with Karthik Narasimhan's lab. Ofir previously completed their PhD at the University of Washington in Seattle, where Ofir was advised by Noah Smith. During their PhD, Ofir spent two years at Facebook AI Research Labs on Luke Zettlemoyer's team. 
+ +[Register Now](/autonomous-language-model-systems) diff --git a/_events/ce1.md b/_events/ce1.md new file mode 100644 index 000000000000..94c9e66165d9 --- /dev/null +++ b/_events/ce1.md @@ -0,0 +1,14 @@ +--- +category: event +title: "COLING 2025" +date: Jan 19, 2025 +--- +Community Event + +**Date**: Jan 19 - 25, 2025 + +COLING, the International Conference on Computational Linguistics, is one of the premier conferences for the natural language processing and computational linguistics. + +First established in 1965, the biennial COLING conference is held in diverse parts of the globe and attracts participants from both top-ranked research centers and emerging countries. Today, the most important developments in our field are taking place not only in universities and academic research institutes but also in industrial research departments including tech-startups. COLING provides opportunities for all these communities to showcase their exciting discovery. + +[Learn more about this event](https://coling2025.org/) \ No newline at end of file diff --git a/_events/ce10.md b/_events/ce10.md new file mode 100644 index 000000000000..67d9e00f66f8 --- /dev/null +++ b/_events/ce10.md @@ -0,0 +1,13 @@ +--- +category: event +title: "PyCon 2025" +date: May 14, 2025 +--- +Community Event + +**Date**: May 15 - 22, 2025 +**Location**: Pittsburgh, PA + +At PyCon US 2025, find a program filled with pre-conference tutorials and sponsor presentations, 90+ of our community’s best talks, which includes the Charlas track, brilliant keynote speakers, posters on display, a lively Expo Hall filled with incredible Sponsors’ booths, and famed lightning talks on each main conference day. + +[Learn more about this event](https://us.pycon.org/2025/) diff --git a/_events/ce11.md b/_events/ce11.md new file mode 100644 index 000000000000..7cc0095a96cd --- /dev/null +++ b/_events/ce11.md @@ -0,0 +1,15 @@ +--- +category: event +title: "Gamesbeat Summit 2025" +date: May 19, 2025 +--- +Community Event + +**Date**: May 19 - 20, 2025 +**Location**: Los Angeles, CA + +The gaming industry is on the cusp of a transformative era, driven by innovation, cultural impact, and new economic opportunities. At GamesBeat Summit 2025, explore how creative storytelling, community engagement, and effective business strategies that are shaping the future of gaming industry. + +Delve into the diverse influences—ranging from player experiences to industry collaborations—that are paving the way for the next phase of growth. + +[Learn more about this event](https://gbs.venturebeat.com/) diff --git a/_events/ce12.md b/_events/ce12.md new file mode 100644 index 000000000000..d2ea93af6df7 --- /dev/null +++ b/_events/ce12.md @@ -0,0 +1,13 @@ +--- +category: event +title: "NYC Tech Week" +date: Jun 2, 2025 +--- +Community Event + +**Date**: Jun 2 - 8, 2025 +**Location**: New York City + +Tech Week is a decentralized tech conference presented by a16z. Every Tech Week, hundreds of events take place across the host city - from hackathons to panel events, community meetups and more. Every event is organized individually by startups, companies and VCs. 
+ +[Learn more about this event](https://www.tech-week.com/) diff --git a/_events/ce14.md b/_events/ce14.md new file mode 100644 index 000000000000..fcfab07f890f --- /dev/null +++ b/_events/ce14.md @@ -0,0 +1,13 @@ +--- +category: event +title: "Data + AI Summit" +date: Jun 9, 2025 +--- +Community Event + +**Date**: Jun 9 - 12, 2025 +**Location**: San Francisco, CA + +Join 20,000 peers for 700+ sessions, keynotes and training at the world’s largest data, analytics and AI conference. + +[Learn more about this event](https://www.databricks.com/dataaisummit) diff --git a/_events/ce15.md b/_events/ce15.md new file mode 100644 index 000000000000..e85a7403d1e8 --- /dev/null +++ b/_events/ce15.md @@ -0,0 +1,13 @@ +--- +category: event +title: "CVPR 2025" +date: Jun 10, 2025 +--- +Community Event + +**Date**: Jun 10 - 17, 2025 +**Location**: Nashville, TN + +The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) is the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. With its high quality and low cost, it provides an exceptional value for students, academics and industry researchers. + +[Learn more about this event](https://cvpr.thecvf.com/) diff --git a/_events/ce16.md b/_events/ce16.md new file mode 100644 index 000000000000..eda670bc7191 --- /dev/null +++ b/_events/ce16.md @@ -0,0 +1,13 @@ +--- +category: event +title: "We are Developers Conference" +date: Jul 9, 2025 +--- +Community Event + +**Date**: Jul 9 - 11, 2025 +**Location**: Berlin, Germany + +Join the largest gathering of software innovators, tech leaders, and decision-makers shaping the future of AI-powered technology. + +[Learn more about this event](https://www.wearedevelopers.com/world-congress) diff --git a/_events/ce17.md b/_events/ce17.md new file mode 100644 index 000000000000..ded03e328983 --- /dev/null +++ b/_events/ce17.md @@ -0,0 +1,13 @@ +--- +category: event +title: "ICML 2025" +date: Jul 13, 2025 +--- +Community Event + +**Date**: Jul 13 - 19, 2025 +**Location**: Berlin, Germany + +Forty-Second International Conference on Machine Learning. + +[Learn more about this event](https://icml.cc/) diff --git a/_events/ce18.md b/_events/ce18.md new file mode 100644 index 000000000000..dd61d8531f90 --- /dev/null +++ b/_events/ce18.md @@ -0,0 +1,13 @@ +--- +category: event +title: "SIGGRAPH 2025" +date: Aug 10, 2025 +--- +Community Event + +**Date**: Aug 10 - 14, 2025 +**Location**: Vancouver, B.C. + +[ACM SIGGRAPH](https://www.siggraph.org/) is a special interest group (SIG) devoted to computer graphics (GRAPH) within the [Association for Computing Machinery](https://www.acm.org/) (ACM), the world’s largest educational and scientific computing society devoted to advancing computing as a science and a profession. Its annual conference, first held in 1974, is the premier conference on computer graphics and interactive techniques worldwide. At SIGGRAPH 2025, we boldly look toward the future, imagining how humanity and technology will be increasingly connected and examining how we can create a future that connects our physical and digital worlds for the better. 
+ +[Learn more about this event](https://s2025.siggraph.org/) diff --git a/_events/ce19.md b/_events/ce19.md new file mode 100644 index 000000000000..2e9625dd9a67 --- /dev/null +++ b/_events/ce19.md @@ -0,0 +1,13 @@ +--- +category: event +title: "San Francisco Tech Week" +date: Oct 6, 2025 +--- +Community Event + +**Date**: Oct 6 - 12, 2025 +**Location**: San Francisco + +Tech Week is a decentralized tech conference presented by a16z. Every Tech Week, hundreds of events take place across the host city - from hackathons to panel events, community meetups and more. Every event is organized individually by startups, companies and VCs. + +[Learn more about this event](https://www.tech-week.com/) diff --git a/_events/ce2.md b/_events/ce2.md new file mode 100644 index 000000000000..f0857e44a475 --- /dev/null +++ b/_events/ce2.md @@ -0,0 +1,15 @@ +--- +category: event +title: "Open Source AI Summit" +date: Jan 22, 2025 +--- +Community Event + +**Date**: Jan 22, 2025 +**Location**: Paris, France + +Open Source AI has become a major trend in the industry, with even many digital giants adopting an Open Source approach. While Open Source AI isn't magic, it does offer the potential to address many challenges more effectively than proprietary AI models. + +This first edition of the Paris Open Source AI Summit will bring together global leaders and industry players to address these issues. The summit will aim to establish a common set of ideas, vocabulary and definitions to create a shared understanding of the current state of Open Source AI. + +[Learn more about this event](https://opensourceaisummit.eu/#rec838155366) diff --git a/_events/ce20.md b/_events/ce20.md new file mode 100644 index 000000000000..de0a07092616 --- /dev/null +++ b/_events/ce20.md @@ -0,0 +1,13 @@ +--- +category: event +title: "LA Tech Week" +date: Oct 13, 2025 +--- +Community Event + +**Date**: Oct 13 - 19, 2025 +**Location**: Los Angeles, CA + +Tech Week is a decentralized tech conference presented by a16z. Every Tech Week, hundreds of events take place across the host city - from hackathons to panel events, community meetups and more. Every event is organized individually by startups, companies and VCs. + +[Learn more about this event](https://www.tech-week.com/) diff --git a/_events/ce21.md b/_events/ce21.md new file mode 100644 index 000000000000..c7b0e5dae932 --- /dev/null +++ b/_events/ce21.md @@ -0,0 +1,13 @@ +--- +category: event +title: "ICCV 2025" +date: Oct 20, 2025 +--- +Community Event + +**Date**: Oct 20 - 24, 2025 +**Location**: Honolulu, HI + +International Conference on Computer Vision, ICCV 2025. + +[Learn more about this event](https://iccv.thecvf.com/) diff --git a/_events/ce22.md b/_events/ce22.md new file mode 100644 index 000000000000..07ef894b515a --- /dev/null +++ b/_events/ce22.md @@ -0,0 +1,15 @@ +--- +category: event +title: "Open Source AI Week" +date: Oct 18, 2025 +--- +Community Event + +**Date**: Oct 18 - 26, 2025 +**Location**: San Francisco, CA + +Open Source AI Week is the premier event that brings together the best AI and ML conferences, hackathons, startup showcases, and networking opportunities exploring the intersection of artificial intelligence, machine learning, and open source technology. Taking place between October 18 – 26, 2025 in San Francisco area. This week-long celebration is dedicated to fostering innovation, collaboration, and community-driven solutions in the rapidly evolving AI landscape, featuring the PyTorch Conference as the flagship event. 
+ +[Submit your event](https://linuxfoundation.research.net/r/FD6JMH5) to be included in Open Source AI Week, and check back mid-May to see the Open Source AI Week event lineup! + +[Learn more about this event](https://events.linuxfoundation.org/open-source-ai-week/) diff --git a/_events/ce23.md b/_events/ce23.md new file mode 100644 index 000000000000..e06dedf1e645 --- /dev/null +++ b/_events/ce23.md @@ -0,0 +1,13 @@ +--- +category: event +title: "NeurIPS 2025" +date: Dec 7, 2025 +--- +Community Event + +**Date**: Dec 7 - 10, 2025 +**Location**: San Diego, CA + +The Thirty-Ninth Annual Conference on Neural Information Processing Systems. + +[Learn more about this event](https://neurips.cc/) diff --git a/_events/ce24.md b/_events/ce24.md new file mode 100644 index 000000000000..d08216a6e078 --- /dev/null +++ b/_events/ce24.md @@ -0,0 +1,15 @@ +--- +category: event +title: "ECCV 2026" +date: Sep 9, 2025 +--- +Community Event + +**Date**: Sep 9 - 13, 2026 +**Location**: Malmö, Sweden + +ECCV is the official event under the European Computer Vision Association and is biannual on even numbered years. Any other event trying to utilize this title is not a sanctioned event. + +The European Conference on Computer Vision (ECCV) is a biennial premier research conference in Computer Vision and Machine Learning, managed by the [European Computer Vision Association (ECVA)](https://www.ecva.net/). It is held on even years and gathers the scientific and industrial communities on these areas. The first ECCV was held in 1990 in Antibes, France, and subsequently organized all over Europe. Paper proceedings are published by [Springer Science+Business Media](https://en.wikipedia.org/wiki/Springer_Science%2BBusiness_Media). + +[Learn more about this event](https://eccv.ecva.net/) diff --git a/_events/ce25.md b/_events/ce25.md new file mode 100644 index 000000000000..2d9d6d02d568 --- /dev/null +++ b/_events/ce25.md @@ -0,0 +1,11 @@ +--- +category: event +title: "GOSIM AI" +date: May 6, 2025 +--- +Community Event + +**Date**: May 6 - 7, 2025 +**Location**: Paris, France + +[Learn more about this event](https://paris2025.gosim.org/) diff --git a/_events/ce26.md b/_events/ce26.md new file mode 100644 index 000000000000..328b0fd3d870 --- /dev/null +++ b/_events/ce26.md @@ -0,0 +1,13 @@ +--- +category: event +title: "PyTorch ATX Community Meetup" +date: April 30, 2025 +--- +Community Event + +**Date**: April 30, 2025 +**Location**: Austin, TX + +The Triton framework provides a hardware agnostic way of programming and targeting GPUs. As Triton becomes more widely adopted, it will be essential in understanding how to write, optimize and troubleshoot the Triton kernel in order to optimize GPU efficiency for algorithms. Join the PyTorch community meetup to learn how Red Hat, Intel, AMD, IBM Research and University of Texas are working on developing Triton kernels. + +[Learn more about this event](https://meetu.ps/e/NYlm0/qrnF8/i) diff --git a/_events/ce3.md b/_events/ce3.md new file mode 100644 index 000000000000..9a4e195afee3 --- /dev/null +++ b/_events/ce3.md @@ -0,0 +1,15 @@ +--- +category: event +title: "Open Source Forum" +date: Feb 13, 2025 +--- +Community Event + +**Date**: Feb 13, 2025 +**Location**: Los Angeles, CA + +The Academy Software Foundation’s (ASWF) annual Open Source Forum brings together Foundation members and select guests from the motion picture and media industries to collaborate and discuss the future of open source software. 
+ +Open Source Forum 2025 features a new format to better enable open dialogue and interactive discussion. Hosted at Walt Disney Animation Studios in Burbank, CA, the half-day event will kick off with several presentations around the anatomy of a studio, emerging technologies impacting studios, and open source opportunities, followed by a moderated discussion. + +[Learn more about this event](https://events.linuxfoundation.org/aswf-open-source-forum/) diff --git a/_events/ce4.md b/_events/ce4.md new file mode 100644 index 000000000000..1b1063abf142 --- /dev/null +++ b/_events/ce4.md @@ -0,0 +1,13 @@ +--- +category: event +title: "AAAI Conference on AI" +date: Feb 25, 2025 +--- +Community Event + +**Date**: Feb 25 - Mar 4, 2025 +**Location**: Philadelphia, PA + +The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced. + +[Learn more about this event](https://aaai.org/conference/aaai/) diff --git a/_events/ce5.md b/_events/ce5.md new file mode 100644 index 000000000000..6be2a635a465 --- /dev/null +++ b/_events/ce5.md @@ -0,0 +1,13 @@ +--- +category: event +title: "Nvidia GTC 2025" +date: Mar 17, 2025 +--- +Community Event + +**Date**: Mar 17 - 21, 2025 +**Location**: San Jose, CA + +Nvidia's GTC 2025, a global AI conference for developers, showcased advancements in AI, robotics, and data centers, with key announcements including the Blackwell Ultra AI chip and the Vera Rubin architecture. + +[Learn more about this event](https://www.nvidia.com/gtc/) diff --git a/_events/ce6.md b/_events/ce6.md new file mode 100644 index 000000000000..1a45335fedf1 --- /dev/null +++ b/_events/ce6.md @@ -0,0 +1,15 @@ +--- +category: event +title: "LF Member Summit" +date: Mar 18, 2025 +--- +Community Event + +**Date**: Mar 18 - 20, 2025 +**Location**: Napa, CA + +The Linux Foundation Member Summit is the annual gathering for Linux Foundation member organizations. + +An annual gathering for Linux Foundation members that fosters collaboration, innovation, and partnerships among the leading projects and organizations working to drive digital transformation with open source technologies. It is a must-attend for business and technical leaders looking to advance open source strategy, implementation, and investment in their organizations and learn how to collaboratively manage the largest shared technology investment of our time. + +[Learn more about this event](https://events.linuxfoundation.org/lf-member-summit/) diff --git a/_events/ce7.md b/_events/ce7.md new file mode 100644 index 000000000000..37a87c50453f --- /dev/null +++ b/_events/ce7.md @@ -0,0 +1,15 @@ +--- +category: event +title: "ICLR 2025" +date: Apr 24, 2025 +--- +Community Event + +**Date**: Apr 24 - 28, 2025 +**Location**: Singapore + +The International Conference on Learning Representations (ICLR) is the premier gathering of professionals dedicated to the advancement of the branch of artificial intelligence called representation learning, but generally referred to as deep learning. 
+ +ICLR is globally renowned for presenting and publishing cutting-edge research on all aspects of deep learning used in the fields of artificial intelligence, statistics and data science, as well as important application areas such as machine vision, computational biology, speech recognition, text understanding, gaming, and robotics. + +[Learn more about this event](https://iclr.cc/) diff --git a/_events/ce8.md b/_events/ce8.md new file mode 100644 index 000000000000..13d99e29d4bc --- /dev/null +++ b/_events/ce8.md @@ -0,0 +1,15 @@ +--- +category: event +title: "Dubai AI Festival" +date: Apr 23, 2025 +--- +Community Event + +**Date**: Apr 23 - 24, 2025 +**Location**: Dubai, UAE + +At Dubai AI Festival, attendees will experience the convergence of artificial intelligence, blockchain, XR, decentralised systems, driving the progression of digital economies and technological innovation. + +This dynamic platform is designed to foster collaboration, innovation, and knowledge-sharing among industry leaders, entrepreneurs, and tech enthusiasts from around the world. Join us to engage with the future of technology at Dubai AI Festival. + +[Learn more about this event](https://dubaiaifestival.com/) diff --git a/_events/ce9.md b/_events/ce9.md new file mode 100644 index 000000000000..99bfe5b69ed9 --- /dev/null +++ b/_events/ce9.md @@ -0,0 +1,13 @@ +--- +category: event +title: "MLSys" +date: May 12, 2025 +--- +Community Event + +**Date**: May 12 - 15, 2025 +**Location**: Santa Clara, CA + +The Eighth Annual Conference on Machine Learning and Systems + +[Learn more about this event](https://mlsys.org/) diff --git a/_events/devcon-meetup.md b/_events/devcon-meetup.md new file mode 100644 index 000000000000..a93c10cd4c6b --- /dev/null +++ b/_events/devcon-meetup.md @@ -0,0 +1,10 @@ +--- +category: event +title: "PyTorch Meetup at DevConf.IN 2025" +date: Feb 28, 2025 +--- + +**Date**: Feb 28, 2025 +**Location**: Pune, India + +[Event Blog](https://pytorch.org/blog/pt-fedora-os-communities/) \ No newline at end of file diff --git a/_events/docathon-2025.md b/_events/docathon-2025.md new file mode 100644 index 000000000000..88bc55a52724 --- /dev/null +++ b/_events/docathon-2025.md @@ -0,0 +1,16 @@ +--- +category: event +title: "Docathon 2025" +date: Jun 3, 2025 +--- + +**Date**: June 3-18, 2025 +**Location**: Online + + +PyTorch Docathon + + +The PyTorch Docathon 2025, akin to a hackathon, is an event dedicated to enhancing the quality of the PyTorch documentation with the invaluable assistance of our community. This is an inclusive event designed to be accessible to all levels of expertise, from newcomers to experienced ML/PyTorch users. It offers a rewarding experience as participants can see the direct impact of their contributions on the project's usability and accessibility. The Docathon promotes a collaborative environment, allowing participants to work with other contributors and PyTorch maintainers, fostering the exchange of ideas and networking. It also provides a rich learning experience, offering the opportunity to explore PyTorch modules, update docstrings, and test tutorials. 
+ +[RSVP Now](https://community.linuxfoundation.org/events/details/lfhq-pytorch-foundation-presents-pytorch-docathon-june-3rd-18th-2025/) \ No newline at end of file diff --git a/_events/kr-conf.md b/_events/kr-conf.md new file mode 100644 index 000000000000..2acc9671a2a9 --- /dev/null +++ b/_events/kr-conf.md @@ -0,0 +1,12 @@ +--- +category: event +title: "PyTorch KR Conference" +date: March 30, 2025 +--- + +**Date**: March 30, 2025, 13:00 ~ 18:00 +**Location**: Seoul, Republic of Korea + +Hear from speakers from the PyTorch Foundation, Meta, FuriosaAI, Lablup, Nota AI, Rebellions, etc. + +[Event Info](https://event-us.kr/pytorchkr/event/100142) \ No newline at end of file diff --git a/_events/multi-modal-dl-frame.md b/_events/multi-modal-dl-frame.md index 0add0e3d9b68..ed2539f2d0d0 100644 --- a/_events/multi-modal-dl-frame.md +++ b/_events/multi-modal-dl-frame.md @@ -11,8 +11,8 @@ poster: assets/images/multi-modal-dl-frame.png Multi-Modal Tabular Deep Learning with PyTorch Frame -In this talk, Akihiro introduces PyTorch Frame, a modular framework for multi-modal tabular deep learning. PyTorch Frame enables seamless integration with the PyTorch ecosystem, including PyTorch Geometric for graph-based message passing across relational data and Hugging Face Transformers for extracting rich text features. The talk also highlights its specialized data structures for efficiently handling sparse features, making PyTorch Frame an essential tool for modern tabular data. +In this talk, Akihiro introduced PyTorch Frame, a modular framework for multi-modal tabular deep learning. PyTorch Frame enables seamless integration with the PyTorch ecosystem, including PyTorch Geometric for graph-based message passing across relational data and Hugging Face Transformers for extracting rich text features. The talk also highlights its specialized data structures for efficiently handling sparse features, making PyTorch Frame an essential tool for modern tabular data. Akihiro Nitta is a software engineer on the ML team at Kumo.ai and a core contributor to PyTorch Frame and PyTorch Geometric, with prior experience as a maintainer of PyTorch Lightning. -[Register now to join the event](/multi-modal-dl-frame) +[Learn more about the event](/multi-modal-dl-frame) diff --git a/_events/pt-27-release-qa.md b/_events/pt-27-release-qa.md new file mode 100644 index 000000000000..d1e75363137e --- /dev/null +++ b/_events/pt-27-release-qa.md @@ -0,0 +1,25 @@ +--- +category: event +title: "PyTorch 2.7 Release Live Q&A" +date: Apr 28, 2025 +poster: assets/images/pt27qa.png +--- + + +PyTorch 2.7 Release Q&A + + +**Date**: April 28, 12 pm PT +**Speakers**: Piotr Bialecki (NVIDIA) and Nikita Shulga (Meta) +**Location**: Online + +Have questions about PyTorch 2.7? Join PyTorch Core Maintainers Piotr Bialecki (NVIDIA) and Nikita Shulga (Meta) for a live Q&A session on Monday, April 28 at 12 PM PST. + +Piotr joined the PyTorch team at NVIDIA in 2019 and currently manages the team. He drives NVIDIA’s effort in maintaining and advancing PyTorch’s CUDA backend and received the PyTorch SUPERHERO award in 2023 for his community contributions, especially in the PyTorch discussion board. As a Core Maintainer, he is also focused on PyTorch’s long-term vision and development. + +Nikita is a Software Engineer at Meta where, among other things, he is responsible for PyTorch releases and continuous integration. Nikita is committed to uplifting the developer community and continuously improving PyTorch. 
He earned a Master’s degree in Applied Mathematics from the Moscow Institute of Physics and Technology (MIPT). + +Bring your PyTorch 2.7 questions for Piotr & Nikita during this live Q&A session. + +[Learn more about this event](/pt-27-release-qa) + diff --git a/_events/pt-day-china-2025.md b/_events/pt-day-china-2025.md new file mode 100644 index 000000000000..a8cb293c7fb8 --- /dev/null +++ b/_events/pt-day-china-2025.md @@ -0,0 +1,18 @@ +--- +category: event +title: "PyTorch Day China 2025" +date: June 7, 2025 +--- + + +PyTorch Day China 2025 + + +**Date:** June 7, 2025 +**Location:** Beijing, China + +PyTorch Day China 2025, proudly hosted by the PyTorch Foundation, is the premier gathering dedicated to open-source AI and machine learning innovation. Scheduled for June 7th in Beijing, China and co-located with the BAAI Conference, this community-driven event provides an unparalleled platform for PyTorch enthusiasts, machine learning engineers, AI researchers, and industry professionals. + +Immerse yourself in a vibrant day of insightful technical talks, interactive discussions, and engaging poster sessions designed to foster knowledge exchange and collaboration. PyTorch Day China is your gateway to connecting with leading experts and peers in the open-source AI community, offering you unique opportunities to explore cutting-edge advancements and shape the future of deep learning. + +[Read more about the event](https://www.lfasiallc.com/pytorch-day-china/) \ No newline at end of file diff --git a/_events/pt-day-france-2025.md b/_events/pt-day-france-2025.md new file mode 100644 index 000000000000..09b44cb627cd --- /dev/null +++ b/_events/pt-day-france-2025.md @@ -0,0 +1,18 @@ +--- +category: event +title: "PyTorch Day France 2025: Registration Open" +date: May 7, 2025 +poster: assets/images/pt-day-cfp.png +--- + + +PyTorch Day France 2025 + + +**Date**: May 7, 2025 +**Location**: Paris, France + +PyTorch Day France 2025, proudly hosted by the PyTorch Foundation, is the premier gathering dedicated to open-source AI and machine learning innovation. Scheduled for 7 May in Paris, France and co-located with the GOSIM AI Paris, this community-driven event provides an unparalleled platform for PyTorch enthusiasts, machine learning engineers, AI researchers, and industry professionals. +Immerse yourself in a vibrant day of insightful technical talks, interactive discussions, and engaging poster sessions designed to foster knowledge exchange and collaboration. PyTorch Day France is your gateway to connecting with leading experts and peers in the open-source AI community, offering you unique opportunities to explore cutting-edge advancements and shape the future of deep learning. 
+ +[Register Now](https://events.linuxfoundation.org/pytorch-day-france/) diff --git a/_events/pt-dinov2-multi-label-plant-species-classification.md b/_events/pt-dinov2-multi-label-plant-species-classification.md new file mode 100644 index 000000000000..f4b7edede489 --- /dev/null +++ b/_events/pt-dinov2-multi-label-plant-species-classification.md @@ -0,0 +1,18 @@ +--- +category: event +title: "Using PyTorch and DINOv2 for Multi-label Plant Species Classification" +date: March 27 +poster: assets/images/pt-dinov2-multi-label-plant-species-classification.png +--- + +**Date**: March 27th, 12 PM PST + + +Using PyTorch and DINOv2 for Multi-label Plant Species Classification + + +Join us for an engaging webinar on our innovative transfer learning approach using self-supervised Vision Transformers (DINOv2) for multi-label plant species classification in the PlantCLEF 2024 challenge. We’ll cover how we efficiently extract feature embeddings from a dataset of 1.4 million images and utilize PyTorch Lightning for model training and Apache Spark for data management. Learn about our image processing techniques, including transforming images into grids of tiles and aggregating predictions to overcome computational challenges. Discover the significant performance improvements achieved and get insights into multi-label image classification. Perfect for PyTorch developers, this session will include a Q&A and access to our complete codebase at [github.com/dsgt-kaggle-clef/plantclef-2024](https://github.com/dsgt-kaggle-clef/plantclef-2024). + +Murilo Gustineli is a Senior AI Software Solutions Engineer at Intel, and is currently pursuing a Master’s in Computer Science at Georgia Tech focusing on machine learning. His work involves creating synthetic datasets, fine-tuning large language models, and training multi-modal models using Intel® Gaudi® Al accelerators as part of the Development Enablement team. He is particularly interested in deep learning, information retrieval, and biodiversity research, aiming to improve species identification and support conservation efforts. + +[Learn more about the event](/pt-dinov2-multi-label-plant-species-classification) diff --git a/_get_started/installation/linux.md b/_get_started/installation/linux.md index 35deaa0cde02..7461e2dfcd26 100644 --- a/_get_started/installation/linux.md +++ b/_get_started/installation/linux.md @@ -40,26 +40,10 @@ If you decide to use APT, you can run the following command to install it: sudo apt install python ``` -> If you use [Anaconda](#anaconda) to install PyTorch, it will install a sandboxed version of Python that will be used for running PyTorch applications. - ### Package Manager {: #linux-package-manager} -To install the PyTorch binaries, you will need to use one of two supported package managers: [Anaconda](https://www.anaconda.com/download/#linux) or [pip](https://pypi.org/project/pip/). Anaconda is the recommended package manager as it will provide you all of the PyTorch dependencies in one, sandboxed install, including Python. - -#### Anaconda - -To install Anaconda, you will use the [command-line installer](https://www.anaconda.com/download/#linux). Right-click on the 64-bit installer link, select `Copy Link Location`, and then use the following commands: - -```bash -# The version of Anaconda may be different depending on when you are installing` -curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -sh Miniconda3-latest-Linux-x86_64.sh -# and follow the prompts. 
The defaults are generally good.` -``` - -> You may have to open a new terminal or re-source your `~/.bashrc `to get access to the `conda` command. - +To install the PyTorch binaries, you will need to use the supported package manager: [pip](https://pypi.org/project/pip/). #### pip *Python 3* @@ -75,24 +59,6 @@ sudo apt install python3-pip ## Installation {: #linux-installation} -### Anaconda -{: #linux-anaconda} - -#### No CUDA/ROCm - -To install PyTorch via Anaconda, and do not have a [CUDA-capable](https://developer.nvidia.com/cuda-zone) or [ROCm-capable](https://rocm.docs.amd.com/) system or do not require CUDA/ROCm (i.e. GPU support), in the above selector, choose OS: Linux, Package: Conda, Language: Python and Compute Platform: CPU. -Then, run the command that is presented to you. - -#### With CUDA - -To install PyTorch via Anaconda, and you do have a [CUDA-capable](https://developer.nvidia.com/cuda-zone) system, in the above selector, choose OS: Linux, Package: Conda and the CUDA version suited to your machine. Often, the latest CUDA version is better. -Then, run the command that is presented to you. - -#### With ROCm - -PyTorch via Anaconda is not supported on ROCm currently. Please use pip instead. - - ### pip {: #linux-pip} @@ -148,7 +114,7 @@ For the majority of PyTorch users, installing from a pre-built binary via a pack ### Prerequisites {: #linux-prerequisites-2} -1. Install [Anaconda](#anaconda) or [Pip](#pip) +1. Install [Pip](#pip) 2. If you need to build PyTorch with GPU support a. for NVIDIA GPUs, install [CUDA](https://developer.nvidia.com/cuda-downloads), if your machine has a [CUDA-enabled GPU](https://developer.nvidia.com/cuda-gpus). b. for AMD GPUs, install [ROCm](https://rocm.docs.amd.com/), if your machine has a [ROCm-enabled GPU](https://rocm.docs.amd.com/) diff --git a/_get_started/installation/mac.md b/_get_started/installation/mac.md index 5294b865a38c..d3803a1ee278 100644 --- a/_get_started/installation/mac.md +++ b/_get_started/installation/mac.md @@ -1,7 +1,7 @@ # Installing on macOS {:.no_toc} -PyTorch can be installed and used on macOS. Depending on your system and GPU capabilities, your experience with PyTorch on a Mac may vary in terms of processing time. +PyTorch can be installed and used on macOS. Depending on your system and GPU capabilities, your experience with PyTorch on macOS may vary in terms of processing time. ## Prerequisites {: #mac-prerequisites} @@ -14,24 +14,13 @@ PyTorch is supported on macOS 10.15 (Catalina) or above. {: #mac-python} It is recommended that you use Python 3.9 - 3.12. -You can install Python either through the Anaconda -package manager (see [below](#anaconda)), [Homebrew](https://brew.sh/), or +You can install Python either through [Homebrew](https://brew.sh/) or the [Python website](https://www.python.org/downloads/mac-osx/). ### Package Manager {: #mac-package-manager} -To install the PyTorch binaries, you will need to use one of two supported package managers: [pip](https://pypi.org/project/pip/) or [Anaconda](https://www.anaconda.com/download/#macos). -#### Anaconda - -To install Anaconda, you can [download graphical installer](https://www.anaconda.com/download/#macos) or use the command-line installer. 
If you use the command-line installer, you can right-click on the installer link, select `Copy Link Address`, or use the following commands on Mac computer with Apple silicon: - -```bash -# The version of Anaconda may be different depending on when you are installing` -curl -O https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh -sh Miniconda3-latest-MacOSX-arm64.sh -# and follow the prompts. The defaults are generally good.` -``` +To install the PyTorch binaries, you will need to use the supported package manager: [pip](https://pypi.org/project/pip/). #### pip *Python 3* @@ -43,19 +32,10 @@ If you installed Python via Homebrew or the Python website, `pip` was installed ## Installation {: #mac-installation} -### Anaconda -{: #mac-anaconda} - -To install PyTorch via Anaconda, use the following conda command: - -```bash -conda install pytorch torchvision -c pytorch -``` - ### pip {: #mac-pip} -To install PyTorch via pip, use one of the following two commands, depending on your Python version: +To install PyTorch via pip, use the following command, depending on your Python version: ```bash # Python 3.x @@ -91,7 +71,7 @@ For the majority of PyTorch users, installing from a pre-built binary via a pack ### Prerequisites {: #mac-prerequisites-2} -1. [Optional] Install [Anaconda](#anaconda) +1. [Optional] Install [pip](https://pypi.org/project/pip/) 2. Follow the steps described here: [https://github.com/pytorch/pytorch#from-source](https://github.com/pytorch/pytorch#from-source) You can verify the installation as described [above](#mac-verification). diff --git a/_get_started/installation/windows.md b/_get_started/installation/windows.md index 2d0ec5c041f5..8000e9cddbc6 100644 --- a/_get_started/installation/windows.md +++ b/_get_started/installation/windows.md @@ -24,9 +24,6 @@ As it is not installed by default on Windows, there are multiple ways to install * [Chocolatey](https://chocolatey.org/) * [Python website](https://www.python.org/downloads/windows/) -* [Anaconda](#anaconda) - -> If you use Anaconda to install PyTorch, it will install a sandboxed version of Python that will be used for running PyTorch applications. > If you decide to use Chocolatey, and haven't installed Chocolatey yet, ensure that you are running your command prompt as an administrator. @@ -39,12 +36,7 @@ choco install python ### Package Manager {: #windows-package-manager} -To install the PyTorch binaries, you will need to use at least one of two supported package managers: [Anaconda](https://www.anaconda.com/download/#windows) and [pip](https://pypi.org/project/pip/). Anaconda is the recommended package manager as it will provide you all of the PyTorch dependencies in one, sandboxed install, including Python and `pip.` - -#### Anaconda - -To install Anaconda, you will use the [64-bit graphical installer](https://www.anaconda.com/download/#windows) for PyTorch 3.x. Click on the installer link and select `Run`. Anaconda will download and the installer prompt will be presented to you. The default options are generally sane. - +To install the PyTorch binaries, you will need to use the supported package manager: [pip](https://pypi.org/project/pip/). #### pip If you installed Python by any of the recommended ways [above](#windows-python), [pip](https://pypi.org/project/pip/) will have already been installed for you. 
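As a quick sanity check (not part of the original guide, and the exact output will vary with how and when Python was installed), you can confirm that Python and pip are available from a command prompt before moving on:

```
# Hypothetical verification step, not from the original docs:
# confirm Python and pip are on PATH; versions will differ by install method.
python --version
python -m pip --version
```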
@@ -52,22 +44,6 @@ If you installed Python by any of the recommended ways [above](#windows-python), ## Installation {: #windows-installation} -### Anaconda -{: #windows-anaconda} - -To install PyTorch with Anaconda, you will need to open an Anaconda prompt via `Start | Anaconda3 | Anaconda Prompt`. - -#### No CUDA - -To install PyTorch via Anaconda, and do not have a [CUDA-capable](https://developer.nvidia.com/cuda-zone) system or do not require CUDA, in the above selector, choose OS: Windows, Package: Conda and CUDA: None. -Then, run the command that is presented to you. - -#### With CUDA - -To install PyTorch via Anaconda, and you do have a [CUDA-capable](https://developer.nvidia.com/cuda-zone) system, in the above selector, choose OS: Windows, Package: Conda and the CUDA version suited to your machine. Often, the latest CUDA version is better. -Then, run the command that is presented to you. - - ### pip {: #windows-pip} @@ -126,7 +102,7 @@ For the majority of PyTorch users, installing from a pre-built binary via a pack ### Prerequisites {: #windows-prerequisites-2} -1. Install [Anaconda](#anaconda) +1. Install [pip](https://pypi.org/project/pip/) 2. Install [CUDA](https://developer.nvidia.com/cuda-downloads), if your machine has a [CUDA-enabled GPU](https://developer.nvidia.com/cuda-gpus). 3. If you want to build on Windows, Visual Studio with MSVC toolset, and NVTX are also needed. The exact requirements of those dependencies could be found out [here](https://github.com/pytorch/pytorch#from-source). 4. Follow the steps described here: [https://github.com/pytorch/pytorch#from-source](https://github.com/pytorch/pytorch#from-source) diff --git a/_get_started/mobile.md b/_get_started/mobile.md index 2a640293144c..d709ee61e2f8 100644 --- a/_get_started/mobile.md +++ b/_get_started/mobile.md @@ -1,6 +1,6 @@ --- layout: get_started -title: ExecuTorch +title: PyTorch for Edge permalink: /get-started/executorch/ background-class: get-started-background body-class: get-started @@ -10,11 +10,29 @@ published: true ## Get Started with PyTorch ExecuTorch -

- - ExecuTorch Documentation - -

+PyTorch’s edge specific library is [ExecuTorch](https://github.com/pytorch/executorch/) and is designed to be lightweight, very performant even on devices with constrained hardware such as mobile phones, embedded systems and microcontrollers. + +ExecuTorch relies heavily on PyTorch core technologies such as [torch.compile](https://pytorch.org/docs/stable/torch.compiler.html) and [torch.export](https://pytorch.org/docs/stable/export.html), and should be very familiar to anyone who has used PyTorch in the past. + +### Getting Started +You can get started by following the [general getting started guide](https://pytorch.org/executorch/stable/getting-started.html#) or jump to the specific steps for your target device. + +* [Using ExecuTorch on Android](https://pytorch.org/executorch/stable/using-executorch-android.html) +* [Using ExecuTorch on iOS](https://pytorch.org/executorch/stable/using-executorch-ios.html) +* [Using ExecuTorch with C++](https://pytorch.org/executorch/stable/using-executorch-cpp.html) + +### Hardware Acceleration +ExecuTorch provides out of the box hardware acceleration for a growing number of chip manufacturers. See the following resources to learn more about how to leverage them: + +* [Backend Overview](https://pytorch.org/executorch/stable/backends-overview.html) +* [XNNPACK](https://pytorch.org/executorch/stable/backends-xnnpack.html) +* [Core ML](https://pytorch.org/executorch/stable/backends-coreml.html) +* [MPS](https://pytorch.org/executorch/stable/backends-mps.html) +* [Vulkan](https://pytorch.org/executorch/stable/backends-vulkan.html) +* [ARM Ethos-U](https://pytorch.org/executorch/stable/backends-arm-ethos-u.html) +* [Qualcomm AI Engine](https://pytorch.org/executorch/stable/backends-qualcomm.html) +* [MediaTek](https://pytorch.org/executorch/stable/backends-mediatek.html) +* [Cadence Xtensa](https://pytorch.org/executorch/stable/backends-cadence.html) diff --git a/_get_started/previous-versions.md b/_get_started/previous-versions.md index e8456fe12968..d86ae87de17e 100644 --- a/_get_started/previous-versions.md +++ b/_get_started/previous-versions.md @@ -17,6 +17,33 @@ your convenience. ## Commands for Versions >= 1.0.0 +### v2.6.0 + +#### Wheel + +##### OSX + +``` +pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 +``` + +##### Linux and Windows + +``` +# ROCM 6.1 (Linux only) +pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/rocm6.1 +# ROCM 6.2.4 (Linux only) +pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/rocm6.2.4 +# CUDA 11.8 +pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu118 +# CUDA 12.4 +pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124 +# CUDA 12.6 +pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126 +# CPU only +pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cpu +``` + ### v2.5.1 #### Conda diff --git a/_includes/footer.html b/_includes/footer.html index 4e1ada721a59..a74402d61751 100644 --- a/_includes/footer.html +++ b/_includes/footer.html @@ -87,7 +87,6 @@

Resources

- {% include google_pixel.html %} {% include mobile_menu.html %} diff --git a/_includes/head.html b/_includes/head.html index 06be73f6c60f..b86b1e202467 100644 --- a/_includes/head.html +++ b/_includes/head.html @@ -34,7 +34,6 @@ {% if jekyll.environment == 'production' %} - {% include analytics.html %} {% include pixel.html %} {% include twitter_pixel.html %} {% endif %} diff --git a/_includes/header.html b/_includes/header.html index cd3d2370eddd..45c484e1845e 100644 --- a/_includes/header.html +++ b/_includes/header.html @@ -1,6 +1,6 @@
- Join us in Silicon Valley September 18-19 at the 2024 PyTorch Conference. Learn more. + Join us at PyTorch Conference in San Francisco, October 22-23. CFP open now! Learn more.
diff --git a/_includes/main_menu.html b/_includes/main_menu.html index a5f41f2f51f6..46cc727fedf5 100644 --- a/_includes/main_menu.html +++ b/_includes/main_menu.html @@ -26,6 +26,9 @@ Intro to PyTorch - YouTube Series

Master PyTorch basics with our engaging YouTube tutorial series

+ + New to PyTorch Foundation +
@@ -36,10 +39,13 @@ Ecosystem
- + Tools

Learn about the tools and frameworks in the PyTorch Ecosystem

+ + Join the Ecosystem + Community

Join the PyTorch developer community to contribute, learn, and get your questions answered.

diff --git a/_includes/mobile_menu.html b/_includes/mobile_menu.html index e9ce82726284..70e11e57ec2a 100644 --- a/_includes/mobile_menu.html +++ b/_includes/mobile_menu.html @@ -42,13 +42,19 @@
  • Introduction to PyTorch - YouTube Series
  • +
  • + New to PyTorch Foundation +
  • Ecosystem
  • diff --git a/_layouts/blog_detail.html b/_layouts/blog_detail.html index 9b3726de5552..eb80011a163b 100644 --- a/_layouts/blog_detail.html +++ b/_layouts/blog_detail.html @@ -7,7 +7,7 @@
    - Join us in Silicon Valley September 18-19 at the 2024 PyTorch Conference. Learn more. + Join us at PyTorch Conference in San Francisco, October 22-23. CFP open now! Learn more.
    diff --git a/_posts/2019-07-18-pytorch-ecosystem.md b/_posts/2019-07-18-pytorch-ecosystem.md index 7351cbbd9d4f..1be05469bb83 100644 --- a/_posts/2019-07-18-pytorch-ecosystem.md +++ b/_posts/2019-07-18-pytorch-ecosystem.md @@ -41,7 +41,7 @@ When we review project submissions for the PyTorch ecosystem, we take into accou 5. *Ongoing maintenance:* Project authors need to be committed to supporting and maintaining their projects. 6. *Community:* Projects should have (or be on track to building) an active, broad-based community. -If you would like to have your project included in the PyTorch ecosystem and featured on [pytorch.org/ecosystem](http://pytorch.org/ecosystem), please complete the form [here](https://pytorch.org/ecosystem/join). If you've previously submitted a project for consideration and haven't heard back, we promise to get back to you as soon as we can - we've received a lot of submissions! +If you would like to have your project included in the PyTorch ecosystem and featured on [pytorch.org/ecosystem](http://pytorch.org/ecosystem), please complete the form [here](https://github.com/pytorch-fdn/ecosystem). If you've previously submitted a project for consideration and haven't heard back, we promise to get back to you as soon as we can - we've received a lot of submissions! ## PyTorch Hub for reproducible research | New models diff --git a/_posts/2024-05-11-enhancing-deep-learning.md b/_posts/2024-05-11-enhancing-deep-learning.md index fc5af1bc3c57..456ba8b9e658 100644 --- a/_posts/2024-05-11-enhancing-deep-learning.md +++ b/_posts/2024-05-11-enhancing-deep-learning.md @@ -8,7 +8,7 @@ Welcome to the thriving PyTorch ecosystem, where a wealth of tools and libraries Initially, PyTorch aimed to establish a thriving community, enabling developers to access each other's tools, engage in meaningful discussions, and explore the wealth of resources available within the community. -Today, the PyTorch ecosystem has grown to feature over 100 projects tailored to your needs, providing robust support, enhanced speed, and effortless integration with PyTorch. If your project aligns with our mission, we invite you to [submit](https://pytorch.org/ecosystem/join) it and join this dynamic ecosystem. +Today, the PyTorch ecosystem has grown to feature over 100 projects tailored to your needs, providing robust support, enhanced speed, and effortless integration with PyTorch. If your project aligns with our mission, we invite you to [submit](https://github.com/pytorch-fdn/ecosystem) it and join this dynamic ecosystem. New this month, we’ve moved all of our Ecosystem blogs over to our PyTorch.org website to host a space where our community can show off the latest innovations with our users. Read on to hear about the latest projects in the ecosystem! @@ -94,7 +94,7 @@ Our diverse ecosystem tools are instrumental in PyTorch's success.. They provid Leveraging these tools empowers developers and researchers to accelerate their deep learning workflows and unlock new possibilities in the field of AI. -Have a tool that would be a good fit for the [PyTorch Ecosystem](https://pytorch.org/ecosystem/)? If you can answer the below questions, we’d love for you to [submit your tool for review](https://pytorch.org/ecosystem/join). +Have a tool that would be a good fit for the [PyTorch Ecosystem](https://pytorch.org/ecosystem/)? If you can answer the below questions, we’d love for you to [submit your tool for review](https://github.com/pytorch-fdn/ecosystem). 
diff --git a/_posts/2024-12-23-2024-year-in-review.md b/_posts/2024-12-23-2024-year-in-review.md index f9bae5c6e48c..4b972e0c4c4d 100644 --- a/_posts/2024-12-23-2024-year-in-review.md +++ b/_posts/2024-12-23-2024-year-in-review.md @@ -40,7 +40,7 @@ Throughout the year the PyTorch Team has been working hard to introduce a number We’ve also had a number of strong technical showcases throughout the year to highlight how PyTorch can be used! [TorchTitan](https://arxiv.org/html/2410.06511v1) exhibited what an open source, PyTorch-native distributed training system could look like for training large language models (LLMs). [TorchChat](https://pytorch.org/blog/torchchat-local-llm-inference/) showcased how to seamlessly and performantly run LLMs across laptop, desktop, and mobile devices. -As well we were very excited to include [multiple new projects](https://pytorch.org/blog/enhancing-deep-learning/) into the PyTorch ecosystem throughout 2024, including the introduction of [vLLM](https://pytorch.org/blog/vllm-joins-pytorch/) into the PyTorch Ecosystem, a state-of-the-art inference engine, which gives machine learning engineers an easy, fast, and cheap way of serving LLMs. If you are interested in joining the PyTorch Ecosystem, please [join](https://pytorch.org/ecosystem/join)! +As well we were very excited to include [multiple new projects](https://pytorch.org/blog/enhancing-deep-learning/) into the PyTorch ecosystem throughout 2024, including the introduction of [vLLM](https://pytorch.org/blog/vllm-joins-pytorch/) into the PyTorch Ecosystem, a state-of-the-art inference engine, which gives machine learning engineers an easy, fast, and cheap way of serving LLMs. If you are interested in joining the PyTorch Ecosystem, please [join](https://github.com/pytorch-fdn/ecosystem)! ![people at a conference](/assets/images/2024-year-in-review/fg4.jpg){:style="width:100%"} diff --git a/_posts/2025-01-28-2025-priorities-for-tac.md b/_posts/2025-01-28-2025-priorities-for-tac.md index 197c39dbacff..8e55be0b3338 100644 --- a/_posts/2025-01-28-2025-priorities-for-tac.md +++ b/_posts/2025-01-28-2025-priorities-for-tac.md @@ -22,4 +22,4 @@ In 2025, the TAC will focus on four key areas: By focusing on these priorities, the TAC aims to maintain PyTorch’s position as the leading deep learning framework, while ensuring it remains open, accessible, and responsive to the needs of its diverse community. -As members of the TAC, we’re extremely excited to contribute to the success of PyTorch and to the impact it’s having in the real world. If you are a PyTorch user or developer, consider [participating in our monthly calls](https://zoom-lfx.platform.linuxfoundation.org/meetings/pytorch?__hstc=132719121.a26416c161ac91bef494ffc19f91a62e.1723036593114.1738082449904.1738088158683.375&__hssc=132719121.1.1738088158683&__hsfp=810579359) (they are open to everyone, and the recordings are available [here](https://lists.pytorch.org/g/tac)). Also, if you develop or maintain a project based on PyTorch, consider contributing it to the new PyTorch ecosystem ([instructions](https://pytorch.org/ecosystem/join)). \ No newline at end of file +As members of the TAC, we’re extremely excited to contribute to the success of PyTorch and to the impact it’s having in the real world. 
If you are a PyTorch user or developer, consider [participating in our monthly calls](https://zoom-lfx.platform.linuxfoundation.org/meetings/pytorch?__hstc=132719121.a26416c161ac91bef494ffc19f91a62e.1723036593114.1738082449904.1738088158683.375&__hssc=132719121.1.1738088158683&__hsfp=810579359) (they are open to everyone, and the recordings are available [here](https://lists.pytorch.org/g/tac)). Also, if you develop or maintain a project based on PyTorch, consider contributing it to the new PyTorch ecosystem ([instructions](https://github.com/pytorch-fdn/ecosystem)). \ No newline at end of file diff --git a/_posts/2025-02-19-optimize-llms.md b/_posts/2025-02-19-optimize-llms.md new file mode 100644 index 000000000000..b2dfec99bd0b --- /dev/null +++ b/_posts/2025-02-19-optimize-llms.md @@ -0,0 +1,176 @@ +--- +layout: blog_detail +title: "Optimize LLMs for Efficiency & Sustainability" +hidden: true +author: "Zach Lasiuk, Arm" +--- + +The rapid growth of large language model (LLM) applications is linked to rapid growth in energy demand. According to the International Energy Agency (IEA), data center electricity consumption is projected to roughly double by 2026 primarily driven by AI. This is due to the energy-intensive training requirements for massive LLMs – however, the increase in AI Inferencing workloads also plays a role. For example, compared with traditional search queries, a single AI inference can consume about [10x more energy](https://www.weforum.org/stories/2024/07/generative-ai-energy-emissions/). + +As developers, we directly affect how energy-intensive our AI solution is. There are technical decisions we can take to help make our AI solution more environmentally sustainable. Minimizing compute to deliver LLM solutions is not the only requirement for creating sustainable AI use. For example, systemic changes, such as policy interventions may be needed, but utilizing energy efficient solutions is an important factor and is an impactful intervention we can adopt right away. + +With that said, minimizing your LLM inference cloud compute requirements also leads to reducing your cloud bill and makes your app more energy efficient, creating a win-win situation. In this blog, we will take you through the steps to creating an LLM chatbot by optimizing and deploying a Llama 3.1 model on PyTorch, quantifying the computational efficiency benefits of specific architecture decisions. + + +## What will we evaluate? + +For this blog, our goal is to create an immersive fantasy storytelling app where users enter a fantasy world by chatting with a Generative AI. The first location is the land of Wicked, allowing people to role-play walking around the Emerald City and observe the sights and scenes in real-time. We’ll implement this via a chatbot and a custom system prompt. + +We will be evaluating LLM performance on CPUs. You can see the advantages of[ CPU vs GPU inference here](https://www.arm.com/resources/ebook/cpu-inference). In general, leveraging CPUs in the cloud for LLM inference is a great choice for models around 10B parameters or less like the Llama series. + +We will also be using Arm-based CPUs, specifically the AWS Graviton series. Based on studies,[ the Arm-based Graviton3 server can provide 67.6 percent lower workload carbon intensity built in](https://newsroom.arm.com/blog/aws-graviton-decarbonize-compute). While this study was based on a simulation, it is an excellent start to showing the possibilities for minimizing our app’s energy requirements. 
+ +First, you’ll see how to run a simple LLM chatbot on PyTorch, then explore three techniques to optimize your application for computational efficiency: + +1. Model optimization: Utilizing 4-bit quantization and added KleidiAI kernels. +2. Shortcut optimization: Implementing a vector database to handle common queries. +3. Architecture optimization: Adopting a serverless architecture. + +Let’s get started. + + +## Run Llama-3.1 via PyTorch on AWS Graviton4 + +To maximize energy efficiency, we will only use the minimum server resources needed to support this LLM chatbot. For this [Llama-3.1 8-billion parameter model](https://huggingface.co/meta-llama/Llama-3.1-8B), 16 cores, 64GB RAM, and disk space of 50GB is required. We will use the r8g.4xlarge Graviton4 instance running Ubuntu 24.04, as it meets these specifications. + +Spin up this EC2 instance, connect to it, and start installing the requirements: + + +``` + sudo apt-get update + sudo apt install gcc g++ build-essential python3-pip python3-venv google-perftools -y +``` + + +Then install Torchchat, the library developed by the PyTorch team that enables running LLMs across devices: + + +``` + git clone https://github.com/pytorch/torchchat.git + cd torchchat + python3 -m venv .venv + source .venv/bin/activate + ./install/install_requirements.sh +``` + + +Next, install the Llama-3.1-8b model from Hugging Face through the CLI. You will first need to make a Hugging Face access token on your HF account. This will download the 16GB model to your instance, which may take a few minutes: + + +``` + pip install -U "huggingface_hub[cli]" + huggingface-cli login + + python torchchat.py export llama3.1 --output-dso-path exportedModels/llama3.1.so --device cpu --max-seq-length 1024 +``` + + +Now you are ready to run the LLM model, adding a system prompt to be a guiding storyteller in the land of Wicked: + + +``` + LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python torchchat.py generate llama3.1 --device cpu --chat +``` + + +Type ‘y’ to enter a system prompt and enter the following prompt: + + +*You are the guiding storyteller for a fantasy adventure application. Immerse users in the enchanting world of Wicked, guiding them through interactive, real-time experiences in the Emerald City. Describe vivid sights, dynamic scenes, and engage users in storytelling that feels alive and responsive. Allow users to make choices that shape their journey while maintaining the magical tone of the Wicked universe.* + +Then enter your user query: + + +*I walk through the Emerald City gates and look up* + +The output will show on the screen, taking about 7 seconds to generate the first token with less than 1 token per second. + + +![terminal](/assets/images/optimize-llms.png){:style="width:100%"} + + +This example took 245 seconds, or 4 minutes, to generate its complete reply—not very fast. The first optimization we’ll look at will speed up the LLM generation, reducing its computational footprint. + + +### Optimization 1: KleidiAI and Quantization + +Several optimizations are possible from the basic implementation above. The simplest and quickest one t to do is to quantize the model from FP16 to INT4. This approach trades-off some accuracy while cutting the model size from 16Gb to about 4Gb, increasing the inference speed in the process. 
+ +Another common optimization comes in leveraging TorchAO (Torch Architecture Optimization), the PyTorch library that works seamlessly with TorchChat to enhance model performance through various quantization and sparsity methods. + +Lastly, we’ll use Arm KleidiAI optimizations. These are micro-kernels written in assembly that lead to significant performance improvements for LLM inference on Arm CPUs. You can read more about [how KleidiAI kernels work if interested](https://learn.arm.com/learning-paths/cross-platform/kleidiai-explainer/). + +To implement these optimizations, spin up a fresh EC2 instance and follow the instructions [on how to run a Large Language Model (LLM) chatbot with PyTorch](https://learn.arm.com/learning-paths/servers-and-cloud-computing/pytorch-llama/). When ready, run the model and enter the same system prompt and user query as above. You’ll get results that significantly speed up the inference: Less than 1 second to first token, and about 25 tokens per second. + +This cuts the inference time from 245 seconds to about 10 seconds. This results in less power-draw from your server, as it is spending more time idle vs running a power-hungry inference. All else being equal, this is a more carbon-friendly solution than the non-optimized app. The next two approaches go beyond model inference optimization, modifying the solution architectural to further reduce computational load. + + +### Optimization 2: FAISS to match database for common questions + +As stated in the introduction, model inferences are typically more computationally expensive than other search techniques. What if you could automatically respond to common user queries without performing an LLM inference? Using a query/response database is an option to bypass LLM inference and respond efficiently. For this interactive storytelling app, you can imagine common questions about specific characters, the world itself, and rules about what the chatbot is/is not capable of that can have pre-generated answers. + +However, a traditional exact-match database isn’t sufficient as users can phrase the same query in many ways. Asking about the chatbot’s capabilities could all invite the same answer but be phrased differently: + + + +* “What are you capable of?” +* “Tell me what you can do.” +* “How can I interact with you?” + +Implementing semantic search solves this issue by matching a user’s query to the most relevant pre-generated answer by understanding the user’s intent. The [FAISS library](https://github.com/facebookresearch/faiss) is a great option to implement semantic search. + +The computational savings of this approach depends on three factors: + + + +1. Percentage of user queries that can be serviced by semantic search instead of LLM. +2. Computational cost of running the LLM inference. +3. Computational cost of running the semantic search. + +With the savings equation being: + + +``` + Computational_savings = (% of queries) * (LLM_cost – search_cost). +``` + + +This type of architecture makes sense in a few situations. One is if your system has common queries with many repeat questions. Another is large-scale systems with hundreds of thousands of incoming queries, where small percentage savings add up to meaningful changes. Lastly, if your LLM inference is very computationally expensive compared to the search cost, particularly with larger parameter models. + +The final optimization approach is transitioning from server to serverless. 
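Before moving on, here is a minimal sketch of the semantic-search shortcut described in Optimization 2. It is illustrative rather than prescriptive: the embedding model (`all-MiniLM-L6-v2` via sentence-transformers), the distance threshold, and the canned question/answer pairs are all assumptions you would tune for your own app.

```
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative embedding model

canned_pairs = [
    ("What are you capable of?", "I can guide you through the Emerald City and its stories."),
    ("Tell me what you can do.", "I can guide you through the Emerald City and its stories."),
    ("Who are you?", "I am your storyteller in the land of Wicked."),
]
questions = [q for q, _ in canned_pairs]
answers = [a for _, a in canned_pairs]

# Build a flat L2 index over the embeddings of the canned questions.
question_vecs = np.asarray(
    embedder.encode(questions, normalize_embeddings=True), dtype="float32"
)
index = faiss.IndexFlatL2(question_vecs.shape[1])
index.add(question_vecs)

def try_canned_answer(query, threshold=0.5):
    """Return a pre-generated answer if the query is close enough, else None."""
    query_vec = np.asarray(
        embedder.encode([query], normalize_embeddings=True), dtype="float32"
    )
    distances, ids = index.search(query_vec, 1)
    if distances[0][0] < threshold:  # close match: skip the LLM inference
        return answers[ids[0][0]]
    return None  # fall back to the LLM
```

In a deployed app, the chatbot would call `try_canned_answer` first and only run the LLM inference when it returns `None`, which is where the computational savings come from.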
+ + +### Optimization 3: Serverless approach + +Using serverless architectures are popular for many reasons, one being only paying for active compute time, and eliminating costs with idle servers. Idling servers require a non-trivial amount of power to keep on, wasting energy while waiting. + +This cost efficiency translates into being an inherently more environmentally friendly architecture, as it reduces wasteful energy consumption. Further, multiple applications share underlying physical infrastructure, improving resource efficiency. + +To set up your own serverless chatbot, you need to first containerize the quantized Llama-3.1-8b with TorchChat, TorchAO, and Arm KleidiAI optimizations with a python script containing a Lambda entry function `lambda_handler`. One deployment option is to upload your container to AWS ECR and attach the container to your Lambda function. Then set up an API Gateway WebSocket or similar to interact with your Lambda through an API. + +There are two notable limitations to using a serverless architecture to host your LLM, the first being token generation speed. Recall that the server-based approach delivered about 25 tokens/second with KleidiAI optimizations. The serverless approach delivers an order of magnitude slower, which we measured at around about 2.5 tokens/second. This limitation mainly results from Lambda functions deploying onto Graviton2 servers. When deployment moves to CPUs with more SIMD channels, like Graviton3 and Graviton4, the tokens/second should increase over time. Learn more about architecture optimizations introduced in Graviton3 via the [Arm Neoverse-V1 CPU here](https://developer.arm.com/Processors/Neoverse%20V1). + +This slower speed restricts the viable use cases for serverless LLM architectures, but there are certain cases where this can be seen as an advantage. In our use cases of interactive storytelling, slowly revealing information creates a sense of immersion, building anticipation and mimicking real-time narration. Other use cases include: + + + +* Guided meditation apps with slow, relaxing word delivery +* Virtual friend engaging in thoughtful conversation, or a therapeutic conversation. +* Poetry generation or interactive art to slow delivery creating a contemplative aesthetic. + +Users may have a better experience with slower token generation in the right applications. When prioritizing a more sustainable solution, restrictions end up becoming strengths. As an analogy, a common critique of modern movies today is that their overreliance on visual effects leads to fewer compelling storylines vs older movies. The cost restrictions of VFX meant older movies had to craft captivating dialog, leveraging skillful camera angles and character positioning to fully engage viewers. Similarly, focusing on sustainable AI architectures can lead to more engaging, immersive experiences when done thoughtfully. + +The second serverless limitation on LLM inferences is the cold-start time of about 50 seconds. If implemented poorly, a user waiting 50 seconds with no alternative will likely leave the app. You can turn this limitation into a feature in our Wicked-based experience with several design tricks: + + + +* Create a “prologue experience” where you guide users through hard-coded questions and answers, priming them for where they will land in Emerald City and collecting input to shape their upcoming experience. +* Make the waiting period a countdown timer, revealing hard-coded text snippets of the story or world-building. 
A character, like the wizard, could communicate with the user with fragmented lines to build suspense and prime the user into the right mindset. +* Create an audio intro with music from the movie or musical, along with rotating visuals to draw users into the atmosphere of the Wicked world. + + +### Thinking outside the box + +Implementing a sustainability-minded solution architecture includes and goes beyond optimizing your AI inferences. Understand how users will interact with your system, and right-size your implementation accordingly. Always optimizing for fast tokens per second or time to first token will hide opportunities for engaging features. + +With that said, you should be leveraging straightforward optimizations when possible. Using TorchAO and Arm KleidiAI micro-kernels are great ways to speed up your LLM chatbot. By combining creative solution architectures and optimizing where possible, you can build more sustainable LLM-based applications. Happy coding! \ No newline at end of file diff --git a/_posts/2025-02-26-accelerating-generative-ai-segment-anything-2.md b/_posts/2025-02-26-accelerating-generative-ai-segment-anything-2.md new file mode 100644 index 000000000000..87751067df7b --- /dev/null +++ b/_posts/2025-02-26-accelerating-generative-ai-segment-anything-2.md @@ -0,0 +1,1342 @@ +--- +layout: blog_detail +title: "Accelerating Generative AI with PyTorch: Segment Anything 2 - Fast and furious inference with low latency and fast cold starts" +--- + +This post is a follow-up to our [first entry in the multi-series blog focused on how to accelerate generative AI models](https://pytorch.org/blog/accelerating-generative-ai/) with pure, native PyTorch and a focus on latency and elastic scalability. We use torch.compile and torch.export to create highly optimized low latency versions of SAM2 that can be quickly scaled up on new instances. + +By utilizing AOTInductor's (AOTI) ahead-of-time compilation via torch.export, reduced precision, batched prompts and GPU preprocessing we observe up to **13x improvement in p90 execution latency** and **queue times compared to regular eager mode PyTorch**. + +We calculate our final results and demonstrate the improvement in a realistic deployment on auto-scaling cloud infrastructure from [Modal](https://modal.com). + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Task | eager float32, p50 (ms) | AOTI float16, p50 (ms / improvement) | eager float32, p90 (ms) | AOTI float16, p90 (ms / improvement) |
|---|---|---|---|---|
| AMG | 741 | 112 (6.6x) | 1140 | 176 (6.5x) |
| SPS | 98 | 20 (4.9x) | 130 | 28 (4.6x) |
| MPS | 269 | 38 (7.1x) | 714 | 52 (13.7x) |
| Task | eager float32, p50 queue time (ms) | AOTI float16, p50 queue time (ms / improvement) | eager float32, p90 queue time (ms) | AOTI float16, p90 queue time (ms / improvement) |
|---|---|---|---|---|
| AMG | 201 | 41 (4.9x) | 815 | 327 (2.6x) |
| SPS | 31 | 33 (0.9x) | 441 | 49 (9.0x) |
| MPS | 40 | 37 (1.1x) | 942 | 75 (12.6x) |
    + + + +## The Tasks + +The first post focused on processing a small number of varying prompts (points of interest) per image. These points represented the center points of the ground truth masks. For this post, we'll now focus on a broader set of tasks. Single prompt segmentation (SPS), multi prompt segmentation (MPS), automatic mask generation (AMG) which generates the full set of masks for the input image without a given set of prompts. The first post focused on MPS only. + +![comparison of 3 images](/assets/images/accelerating-generative-ai-2.jpg){:style="width:100%"} + + + +The little star in the image represents a user prompt. For AMG there are no prompts and masks are filtered down heuristically from a dense grid of initial candidate prompts (guesses). For SPS and MPS user prompts are derived from the center points of AMG masks. For SPS we choose the mask with the largest area. + +**Note that SAM2 uses a different backbone than SAM1. In particular, we only consider the largest and most accurate sam2.1_hiera_large backbone for this blog.** + +We aggregate the scripts needed to reproduce the results in [torchao's example folder](https://github.com/pytorch/ao/tree/main/examples/sam2_amg_server) and incrementally upstream the more stable parts of the [changes to the SAM2 model in torchao](https://github.com/pytorch/ao/tree/main/torchao/_models/sam2) to the main [SAM2](https://github.com/facebookresearch/sam2) repository. So if you are interested in taking a look at the cutting-edge variant or would like to contribute experimental features, please don't hesitate to reach out to the torchao repository and team. For the more stable and latest model version, please head on over to SAM2 directly. + + +## Overview + +We categorize the changes presented here into two. **Fast** changes constrain themselves to techniques that are not meant to affect model accuracy. **Furious** changes sacrifice some numerical accuracy for additional speed by making use of approximations such as low-precision data types. + +Approximations may slightly lower precision metrics in favor of significantly improved performance while still passing an end-to-end check based on mean intersection over union (mIoU). + +To measure the performance improvements we processed 1000 images, which were selected at random from the SAM2 validation dataset. We look at the p50 and p90 latency per image. To measure accuracy we consider the mIoU. Most notably for the AMG task we also define a fail count metric. We consider a comparison failed if the **number of masks** differs. This turns out to be a fairly unstable quantity and we can see that the other tasks are not as sensitive to small numeric changes as AMG. + + +## The Setup + +We are running the offline experiments on a regular H100 devserver, which is a fairly beefy and performant machine. + +However, we try to look at these tasks with realistic constraints. In particular, we would like to emulate a server-side inference environment. That means we don't use DataLoader to hide the latency of image preprocessing or decoding routines. + +For the latency calculations we include decoding, segmentation and conversion of masks to a dictionary of run-length encoded masks. Or put differently, we exclude loading the images into in-memory host bytearrays and storing the resulting dictionaries as json files on disk. This is meant to emulate a more realistic setting. + +More concretely, consider the code below for the routines we include in our measurements. 
For any task `gen_masks` produces a batched bool Tensor bitmask that represents the corresponding object masks. We then compress this bitmask into a run length encoded (rle) format that can be used to transfer back the results from a remote server much more efficiently. + + +``` +image_tensors = decode_img_bytes(...) +masks = gen_masks(image_tensors, ...) +rle_dicts = [rle_dict_from_masks(m) for m in masks] +``` + + + +## Optimizations + + +### ao: eager code optimizations + +The most effective tool for this work is the PyTorch autograd profiler combined with `record_function`. To build this software, we've used the profiler repeatedly to observe the program and confirm the effectiveness of any changes. It's also important to keep in mind that the profiler itself has overhead. The more data you collect, such as stack traces, the more overhead you introduce, which might skew the collected trace. But it is excellent to find synchronization points, space between kernels and GPU kernels that take a long time. + +GPU traces help you understand bottlenecks that are not necessarily easily addressed by compile. We found that AutomaticMaskGeneration in particular is dominated by the data structure used to store the masks and by the routine used to convert the masks to a run-length encoded compressed format. We also found a large part of AMG performance is dominated by the large number of masks created as a single batch. Sometimes candidate masks can be filtered down to fewer candidates earlier in the postprocessing stage by reordering operations. This in turn significantly speeds up the later operations. + +In order to confirm the accuracy of our implementation we first compare without any changes in settings and using float32 precision. We see that mIoU is unchanged and the masks match perfectly when using the exact same settings. This means that these eager mode changes did not affect the accuracy of these tasks. + +AMG + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU / fail count |
|---|---|---|---|---|
| Baseline | 864 | 1144 | 4350 | reference |
| AO | 693 | 786 | 4010 | 1 / 0 |
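For reference, the profiling workflow described above boils down to wrapping the regions of interest in `record_function` and inspecting the resulting trace. The sketch below uses the standard `torch.profiler` APIs; the `gen_masks` call and the region names are placeholders rather than the exact functions from the torchao scripts.

```
import torch
from torch.profiler import ProfilerActivity, profile, record_function

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    record_shapes=True,
) as prof:
    with record_function("gen_masks"):
        masks = gen_masks(image_tensors)  # placeholder for the task under study
    with record_function("rle_encoding"):
        rle_dicts = [rle_dict_from_masks(m) for m in masks]

# Sort by GPU time to spot long-running kernels and gaps between them.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=25))
```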
    + + + +### ao: batching prompts + +Another lossless performance optimization that we were able to apply is batching the user input prompt calculations. When optimizing for latency at batch size 1 on a server-grade GPU such as an H100 we are often left with a lot of spare memory. We can easily trade off that memory for more performance by processing more points of interest (also called user prompts) at once. Remember that SAM2 is split into two parts: First the backbone (image encoder), second the prediction and decoding of masks based on a set of user prompts / points of interest. It is the second part where we may expect a larger or even varying number of inputs and it is this second part where we apply batching. + +This causes a large increase in memory, but also much better latency. The baseline generates one mask per prompt in a loop. For AMG the baseline processes 64 prompts at once and all that is needed is to change it to 1024, which is the number of candidate prompts generated. For SPS we process one prompt at a time, but it's still included below for completeness. + +AMG + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU / fail count |
|---|---|---|---|---|
| Baseline | 864 | 1144 | 4350 | reference |
| AO + batching | 613 | 706 | 33786 | 0.9999995 / 0 |
SPS

| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU |
|---|---|---|---|---|
| Baseline | 116 | 181 | 1337 | reference |
| AO | 110 | 170 | 1339 | 1 |
MPS

| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU |
|---|---|---|---|---|
| Baseline | 276 | 681 | 1337 | reference |
| AO + batching | 126 | 225 | 8021 | 0.9999992 |
    + + +As a technical side note: Most notably to enable batching for MPS, and to avoid a significant manual rewrite of the code base to support multiple prompts at the same time, we used a Tensor subclass we call MapTensor. A MapTensor allows us to pass a batch of N prompts, but have it advertise a batch size of 1. Any operation is then automatically broadcast to the wrapped Tensor and propagated throughout the prediction part of the model. This works because individual prompt predictions are independent of one another. This is very similar to torch.vmap. + + +``` +center_points_torch = to_map_tensor(center_points_torch) +center_points_label_torch = to_map_tensor(center_points_label_torch) +masks, scores, _ = mask_generator.predictor.predict( + point_coords=center_points_torch, + point_labels=center_points_label_torch, + multimask_output=True, + return_logits=False, + return_type="torch", +) +# Unwrapping MapTensor +masks = masks.elems +scores = scores.elems +``` + + + +### fast: fullgraph compilation + +Just as with our first post, we first remove GPU syncs and graph breaks to make use of fullgraph compiled model code with max-autotune kernels where appropriate. After some rewriting, we are able to compile the image encoder and the prediction of masks. + +We run the experiments twice to get a sense of the overhead due to compilation. We run it once in an environment with an empty TORCHINDUCTOR_CACHE_DIR and then again while ingesting the artifacts from the previous run. In particular, auto-tuning can take a long time and happens on the first call in a pristine environment. We call the second run "warm". The first iteration is typically expected to be slow due to various other related initialization processes, but compile increases it significantly, even if an existing cache is used and the same exact shapes are fed again. Having said that, an overhead of a few seconds in a warm environment is often still stomachable on the very first call. + +Most of these drawbacks can be mitigated and compiling causes a significant improvement in latency and reduction in memory. + +AMG + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU / fail count | first iteration (ms) |
|---|---|---|---|---|---|
| AO + batching | 613 | 706 | 33786 | 0.9999995 / 0 | 1125 |
| + compile (cold) | 423 | 513 | 29349 | skipped | 404866 |
| + compile (warm) | 439 | 530 | 29349 | 0.994 / 190 | 8544 |
    + + +The number of masks produced per mask can vary slightly when using automatic mask segmentation. There is ambiguity in the number of masks per object the model may produce. For example, a car may be subdivided into frames, windows and doors or treated as a whole. When a modification causes the number of masks to change, we consider the comparison failed and we only calculate the mIoU on masks with an exact match. This does not apply to the other tasks. We found that the number of masks generated is very sensitive to small numerical changes. The other tasks use the same code and MPS in particular can help us further verify correctness. + +SPS + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU | first iteration (ms) |
|---|---|---|---|---|---|
| AO | 110 | 170 | 1339 | 1 | 562 |
| + compile (cold) | 102 | 158 | 1343 | skipped | 319954 |
| + compile (warm) | 100 | 160 | 1302 | 0.9999 | 8947 |
MPS

| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU | first iteration (ms) |
|---|---|---|---|---|---|
| AO + batching | 126 | 225 | 8021 | 0.9999992 | 504 |
| + compile (cold) | 129 | 215 | 8021 | skipped | 333308 |
| + compile (warm) | 113 | 213 | 8021 | 0.998 | 8617 |
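In code, the fullgraph compilation amounts to wrapping the two compute-heavy stages with `torch.compile`. The attribute paths below are placeholders (the actual torchao scripts organize the model differently); the relevant pieces are `fullgraph=True`, which errors out if a graph break reappears, and `mode="max-autotune"` for the more aggressive kernel selection.

```
import torch

# Compile the image encoder and the mask prediction path separately.
predictor.image_encoder = torch.compile(
    predictor.image_encoder, mode="max-autotune", fullgraph=True
)
predictor.predict_masks = torch.compile(
    predictor.predict_masks, mode="max-autotune", fullgraph=True
)
```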
    + + + +### furious: TF32, float16 and GPU preprocessing + +We found that using float16 is the right level of precision for a few significant subcomponents of the model. In particular, the image encoder and mask decoder weights can be converted entirely to float16. We can also use TensorFloat32 precision for the remaining float32 matrix operations. It should be possible to further reduce the precision and we may address this in a future post. We also move image preprocessing such as image normalization onto the GPU with the furious mode. We can't use GPU decoding (nvJPEG) routines, because the differences are too significant and the model suffers from significant degradation in mIoU, so image decoding still happens on the CPU. + +AMG + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU / fail count |
|---|---|---|---|---|
| AO + batching + compile (warm) | 439 | 530 | 29349 | 0.994 / 190 |
| + furious | 165 | 240 | 28335 | 0.978 / 306 |
    + + +This causes a significant degradation in mIoU for the AMG task, but doesn't affect the other tasks. After an in-depth investigation, we still chalk this up to numerical instability and reordering of operations. More work is needed to further investigate this and it may not be interesting to run the AMG task in lower precision. The other tasks, however, benefit drastically in latency with minimal changes in mIoU. + +SPS + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU |
|---|---|---|---|---|
| AO + compile (warm) | 100 | 160 | 1302 | 0.9999 |
| + furious | 32 | 63 | 861 | 0.9997 |
MPS

| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU |
|---|---|---|---|---|
| AO + batching + compile (warm) | 113 | 213 | 8021 | 0.998 |
| + furious | 36 | 64 | 4222 | 0.997 |
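A simplified sketch of the "furious" settings is shown below. The module names and the ImageNet-style normalization constants are assumptions for illustration; the key ideas are enabling TF32 for the remaining float32 matrix operations, casting the heavy submodules to float16, and normalizing on the GPU while keeping JPEG decoding on the CPU.

```
import torch

# Allow TensorFloat32 for the remaining float32 matrix operations.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Convert the heavy submodules to float16 (placeholder module names).
predictor.image_encoder = predictor.image_encoder.half()
predictor.mask_decoder = predictor.mask_decoder.half()

def preprocess_on_gpu(image_uint8):
    # Decoding stays on the CPU; normalization happens on the GPU.
    x = image_uint8.permute(2, 0, 1).cuda().to(torch.float16) / 255.0
    mean = torch.tensor([0.485, 0.456, 0.406], device=x.device, dtype=x.dtype).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225], device=x.device, dtype=x.dtype).view(3, 1, 1)
    return (x - mean) / std
```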
    + + + +### AOTInductor's (AOTI) ahead-of-time compilation via torch.export + +When scaling elastically it often is not possible to accommodate long startup times. That means the first iteration cannot be slow, but we must quickly deliver results. This is when torch.compile's current compilation overhead can get in the way. To address this we can use AOTInductor's (AOTI) ahead-of-time compilation via torch.export. AOTI lets us compile the model on a representative input and store the resulting code in a binary that is quick to load and run. + +AOTI via torch.export is a new feature and we currently can't export everything that is compilable. We've been able to export the image encoder for all tasks but have only been able to export the mask prediction for the AMG and SPS tasks due to varying prompts. torch.export also supports dynamic shapes, but we need to invest a bit more time to prepare the code for it. + +AMG: AO + batching + furious + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU / fail count | first iteration (ms) |
|---|---|---|---|---|---|
| + compile (warm) | 165 | 240 | 28335 | 0.978 / 306 | 10341 |
| + load export (cold) | 162 | 233 | 27927 | 0.974 / 308 | 906 |
SPS: AO + furious

| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU | first iteration (ms) |
|---|---|---|---|---|---|
| + compile (warm) | 32 | 63 | 861 | 0.9997 | 7989 |
| + load export (cold) | 35 | 66 | 1686 | 0.9997 | 763 |
    + + +Note that loading the exported model significantly increases memory. It likely only increases peak memory utilization, because initialization really needs to be delayed before loading up an exported model to avoid having twice the weights in memory at once. This is something we could address, but the memory consumption is nowhere near the limit. We don't see an increase in the other tasks, because AMG and MPS peak memory is dominated by processing batches of masks. One way to reduce that could be to operate on masks in the rle format (or some other sparse format) earlier on, but for now, there is no reason for this given the current memory consumption and focus on latency. + +MPS: AO + batching + furious + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU | first iteration (ms) |
|---|---|---|---|---|---|
| + compile (warm) | 36 | 64 | 4222 | 0.997 | 9626 |
| + load export (cold) | 43 | 72 | 3813 | 0.997 | 747 |
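A rough sketch of the export-and-load flow is shown below. Treat it as a sketch: the packaging helpers (`aoti_compile_and_package` / `aoti_load_package`) and their exact signatures have shifted across recent PyTorch releases, and the example input shape is only representative of SAM2's 1024×1024 input.

```
import torch

# Ahead of time, on a build machine: trace and compile the image encoder.
example_inputs = (torch.randn(1, 3, 1024, 1024, dtype=torch.float16, device="cuda"),)
exported = torch.export.export(image_encoder, example_inputs)
torch._inductor.aoti_compile_and_package(exported, package_path="image_encoder.pt2")

# Later, on a freshly started replica: load and run without recompiling.
compiled_encoder = torch._inductor.aoti_load_package("image_encoder.pt2")
features = compiled_encoder(*example_inputs)
```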
    + + +Using export by itself doesn't seem to benefit from extensive warmup and can be run in a pristine new inductor cache directory. But again, we do not evict the CUDA cache or other caches. In the section on Modal, we are running some of these experiments in a pristine environment. + +When only processing 1000 images in a new process, using export can really be worth it to save out on compile and other cold start overhead. + + +### bonus: More GPU preprocessing + +At this point, the latency is fairly low. In particular, for the SPS and MPS tasks we are processing at around 30ms to 40ms. Let's bring back the pseudo-code from the setup section again. + + +``` +image_tensors = decode_img_bytes(...) +masks = gen_masks(image_tensors, ...) +rle_dicts = [rle_dict_from_masks(m) for m in masks] +``` + + +Further profiling showed that at this point `decode_img_bytes` takes about 10ms. In particular, it uses torchvision's ToTensor transform to convert from a numpy Tensor to a scaled, float32 torch.Tensor. The bytes passed to ToTensor have already been decoded and converted to an numpy ndarray. By slightly rewriting ToTensor, using torchvision's v2 API and moving the uint8 decoded smaller integer Tensor to GPU first before scaling, we can gain another 10ms in latency. Without including `decode_img_bytes` in our analysis we would have missed this opportunity that has real-world impact on server-side inference. + + +``` +image_tensor = torch.from_numpy(image_tensor) +image_tensor = image_tensor.permute((2, 0, 1)) +image_tensor = image_tensor.cuda() +image_tensor = v2.ToDtype(torch.float32, scale=True)( image_tensor) +``` + + +Note in particular that using pinned memory to perform asynchronous data transfers doesn't apply, since the time it takes to move the Tensor into pinned memory isn't worth the gain in asynchronicity for this data movement. For future work, we might want to explore further improvements here by using more advanced direct memory transfer techniques. + +AMG: AO + batching + furious + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU / fail count | first iteration (ms) |
|---|---|---|---|---|---|
| + load export (cold) | 162 | 233 | 27927 | 0.974 / 308 | 906 |
| + load export (warm) | 157 | 230 | 27927 | 0.974 / 308 | 799 |
| + load export (warm) + preproc | 136 | 208 | 27950 | 0.977 / 311 | 908 |
SPS: AO + furious

| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU | first iteration (ms) |
|---|---|---|---|---|---|
| + load export (cold) | 35 | 66 | 1686 | 0.9997 | 763 |
| + load export (warm) | 31 | 63 | 1686 | 0.9997 | 683 |
| + load export (warm) + preproc | 19 | 25 | 1711 | 0.9997 | 658 |
MPS: AO + batching + furious

| | p50 latency (ms) | p90 latency (ms) | memory (MiB) | mIoU | first iteration (ms) |
|---|---|---|---|---|---|
| + load export (cold) | 43 | 72 | 3813 | 0.997 | 747 |
| + load export (warm) | 53 | 81 | 3813 | 0.997 | 807 |
| + load export (warm) + preproc | 31 | 41 | 3837 | 0.997 | 671 |
    + + +This small change has a significant impact on the SPS and MPS task. + + +## Deploying on Modal + +Finally, we deployed our optimized inference onto [Modal](https://modal.com), a serverless infrastructure provider, to demonstrate that the benefits of these optimizations can be realized in a more realistic deployment setting. + +In particular, compilation and AOTI via torch.export requires extra work. In a naïve deployment that work might be added to every single inference execution, adding latency that dwarfs any improvements from a faster model. This is particularly challenging with elastic or autoscaling infrastructure, where replicas of our inference service need to be regularly and automatically created and destroyed. + +We share a deployment script in the torchao repository ([cli_on_modal.py](https://github.com/pytorch/ao/tree/main/examples/sam2_amg_server)) to demonstrate one pattern for an elastic deployment. We build the exported models ahead of time and then upload them to [distributed storage](https://modal.com/docs/guide/volumes). Relative to eager execution, this adds a bit of extra work when replicas spin up since they need to read this data over a network, but this is far less costly than compilation or export. + +We benchmarked this deployment with a large batch inference workload: sending 1000 images for concurrent processing. The deployment scales up to ten replicas on ten GPUs at peak and scales down to zero GPUs when inactive. + +First, let’s look at the execution latencies. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Task | eager float32, p50 (ms) | AOTI float16 Modal, p50 (ms / improvement) | AOTI float16 Offline, p50 (ms / improvement) | eager float32, p90 (ms) | AOTI float16 Modal, p90 (ms / improvement) | AOTI float16 Offline, p90 (ms / improvement) |
|---|---|---|---|---|---|---|
| AMG | 741 | 112 (6.6x) | 136 (5.4x) | 1140 | 176 (6.5x) | 208 (5.5x) |
| SPS | 98 | 20 (4.9x) | 19 (5.2x) | 130 | 28 (4.6x) | 25 (5.2x) |
| MPS | 269 | 38 (7.1x) | 31 (8.7x) | 714 | 52 (13.7x) | 41 (17.4x) |
    + + +We notice that execution latencies on Modal and Offline are fairly close, especially relative to the baseline, indicating that optimizing the deployment offline was a reasonable proxy for optimizing the deployment directly. + +In addition to execution latency, our batch workload has queueing time, since there are fewer replicas than there are inputs, and so some inputs have to wait in line. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Task | eager float32, p50 queue time (ms) | AOTI float16, p50 queue time (ms / improvement) | eager float32, p90 queue time (ms) | AOTI float16, p90 queue time (ms / improvement) |
|---|---|---|---|---|
| AMG | 201 | 41 (4.9x) | 815 | 327 (2.6x) |
| SPS | 31 | 33 (0.9x) | 441 | 49 (9.0x) |
| MPS | 40 | 37 (1.1x) | 942 | 75 (12.6x) |
    + + +Even though the queueing system provided by the infrastructure is unchanged, the queue latencies also decrease when we use our optimized model – in the p90 case by a factor of 2 to 12. That’s because when we finish previous inputs faster (from reduced execution latency) we can pull our next inputs sooner (reducing their queueing time). + +If you’re interested in optimizing SAM2 inference or deployments further, don’t hesitate to reach out to us at the [torchao repository](https://github.com/pytorch/ao)! + + +## Conclusions + +We rewrote Meta's original SAM2 in pure PyTorch with little loss of accuracy and a strong focus on latency. We deployed our optimized inference onto [Modal](https://modal.com), a serverless infrastructure provider, to demonstrate that the benefits of these optimizations can be realized in a more realistic deployment setting. + +By utilizing AOTInductor's (AOTI) ahead-of-time compilation via torch.export, reduced precision, batched prompts and GPU preprocessing we observe up to 13x improvement in p90 execution latency and queue times compared to regular eager mode PyTorch. + +With elastic or autoscaling infrastructure, where replicas of our inference service need to be regularly and automatically created and destroyed, a naïve deployment of torch.compile can add work to inference execution that dwarfs any improvements from a faster model. By utilizing AOTInductor's (AOTI) ahead-of-time compilation via torch.export, we are able to upload exported models ahead of time and read this data over a network, which enables us to get the benefits of compilation without significantly increased work. + +For more details on how to reproduce the data in this blog post, [check out the experiments folder of torchao](https://github.com/pytorch/ao/tree/main/examples/sam2_amg_server). Please don't hesitate to contact us or [open an issue](https://github.com/pytorch/ao/issues/new) if you run into any technical issues. \ No newline at end of file diff --git a/_posts/2025-03-04-submit-to-speak.md b/_posts/2025-03-04-submit-to-speak.md new file mode 100644 index 000000000000..89d9907b682d --- /dev/null +++ b/_posts/2025-03-04-submit-to-speak.md @@ -0,0 +1,79 @@ +--- +layout: blog_detail +title: "📣 Submit to Speak at PyTorch Conference + Save on Registration" +--- + +Step into the Future of AI at PyTorch Conference 2025. + + +![banner ad for conference](/assets/images/submit-to-speak/fg1.png){:style="width:100%"} + + +The Call for Proposals for **PyTorch Conference 2025** is officially open! + +**Join us in San Francisco from October 22–23, 2025,** to showcase your expertise and innovations with PyTorch—the industry-leading, open-source machine learning framework powering innovations from bare-metal infrastructure to sophisticated application and agent layers. This is your opportunity to share insights, breakthroughs, and case studies with a global audience of AI and Generative AI practitioners, researchers, and developers. + +![people watching presentation at conference](/assets/images/submit-to-speak/fg2.jpg){:style="width:100%"} + + +Submit your proposals and prepare to engage, learn, and network alongside some of the brightest minds in the AI/ML community. 
We’re seeking sessions, Birds of a Feather discussions, lightning talks, and poster sessions on the following topics: + +* Core PyTorch Framework +* PyTorch on Accelerator Hardware +* PyTorch Ecosystem and Tools +* AI Applications and Use Cases +* AI in Research and Academia +* AI in Industry and Enterprise Applications +* AI Infrastructure and Scalability +* Ethical AI, Governance, and Regulation +* Training, Fine-Tuning, and Alignment +* Inference, Deployment, and Serving +* Performance Measurement and Benchmarking +* Data Engineering and Management for AI +* Generative AI and Large Language Models (LLMs) +* Model Optimization and Efficiency +* Open Source Collaboration, Education and Community Building +* Edge AI and On-Device +* DL Compilers and Kernel Authoring + + +
    +

Learn more and submit your talk by Sunday, June 1, at 11:59 PM PDT!

    + + SUBMIT TO SPEAK + +
    + + +--- + +![people arriving at conference](/assets/images/submit-to-speak/fg3.jpg){:style="max-width:300px; display: block; float: right;"} + +**Save up to USD$500 with Super Early Bird Pricing!** + +* Reserve your pass by **11:59 PM PDT on March 21** and score Super Early Bird pricing for just **USD$499**. That’s a savings of up to USD$500! +* Student or faculty? Learn more about our **[discounted academic rate](https://events.linuxfoundation.org/pytorch-conference/register/#registration-rates)**. +* Need help covering travel costs? We offer discretionary travel funding for those community members who would otherwise not be able to attend. **[Learn more](https://events.linuxfoundation.org/pytorch-conference/register/#additional-information)**. + +
    + + REGISTER NOW & SAVE + +
    + +--- + + +**Become a Sponsor at PyTorch Conference 2025!** + +Seize your opportunity to influence the future of Generative AI and Machine Learning by sponsoring PyTorch Conference 2025. PyTorch is at the forefront of innovation—empowering rapid experimentation, flexible model development, and efficient deployment into production environments with its powerful, versatile ecosystem of tools and thriving community of dedicated users. + +As a sponsor, you'll gain more than visibility; you'll strategically position your organization at the heart of a vibrant, global AI/ML ecosystem. Connect directly with **3,000+** expert attendees, researchers, engineers, and decision-makers, and actively shape the conversations driving the next generation of AI advancements. + +
    + + BECOME A SPONSOR + +
    + +For more details on CFP submissions, registration, and sponsorship, visit **the** [PyTorch Conference Website](https://events.linuxfoundation.org/pytorch-conference/). \ No newline at end of file diff --git a/_posts/2025-03-05-activation-checkpointing-techniques.md b/_posts/2025-03-05-activation-checkpointing-techniques.md new file mode 100644 index 000000000000..782722e96681 --- /dev/null +++ b/_posts/2025-03-05-activation-checkpointing-techniques.md @@ -0,0 +1,233 @@ +--- +layout: blog_detail +title: "Current and New Activation Checkpointing Techniques in PyTorch" +--- + +As models scale in depth, batch size, and sequence length, etc, activation memory becomes an increasingly significant contributor to the overall memory usage. To help address this, PyTorch provides utilities for [activation checkpointing](https://pytorch.org/docs/stable/checkpoint.html), which reduce the number of saved tensors by recomputing them when needed, trading off memory usage for additional compute. + +In this post, we’ll walk through the basics of what activation memory is, the high-level ideas behind existing activation checkpointing techniques, and also introduce some newer techniques that aim to improve flexibility and provide more optimization/automation out of the box. + +As we look at these techniques, we'll compare how these methods fit into a speed vs. memory trade-off diagram and hopefully provide some insight on how to choose the right strategy for your use case. + +*(If you prefer to jump straight to the new APIs, please skip ahead to the “Selective Activation Checkpoint” and “Memory Budget API” sections below.)* + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg1.png){:style="width:100%"} + + +--- + + +## Activation Memory Basics + +By default, in eager mode (rather than using `torch.compile`), PyTorch’s autograd preserves intermediate activations for backward computation. For example, if you call `sin` on a tensor `x` during the forward pass, autograd must remember `x` to compute `cos(x)` during backward. + + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg2.png){:style="max-width:400px; display: block; margin-left: auto; margin-right: auto"} + + +If this tensor `x` is saved at the beginning of the forward pass, it remains in memory throughout both the forward and backward phases. It can only be cleared after it is used to compute the gradient, which happens at the end of the backward pass (due to the reverse order of execution). + +Thus, as you proceed through the forward pass and perform more and more operations, you accumulate more and more activations, resulting in more and more activation memory until it (typically) reaches its peak at the start of backward (at which point activations can start to get cleared). + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg3.png){:style="width:100%"} + + +*In the diagram above, the orange boxes represent operations, black arrows represent their tensor inputs and outputs. The black arrows that cross over the right represent tensors that autograd saves for backward.* + +A useful way to visually organize this default saving behavior in eager as well as the techniques we're about to introduce is based on how they trade off speed versus memory. + + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg4.png){:style="width:100%"} + + +The ideal place to be on this diagram is the top-left, where you have "high" speed but also low memory usage. 
+ +We begin by putting the default saving behavior on the **top-right** (for reasons we'll explain in more detail as we introduce more points for other techniques). + + +--- + + +## Activation Checkpointing (AC) + +**[Activation checkpointing (AC)](https://pytorch.org/docs/stable/checkpoint.html)** is a popular technique to reduce memory usage in PyTorch. + +During forward, any operations performed inside the AC'd region do not save tensors for backward. (Only the inputs to the function are saved.) During backward, the intermediate activations needed for gradient computation are rematerialized by running the function a second time. + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg5.png){:style="width:100%"} + + +*In the diagram (right), the black box shows where activation checkpointing is applied. Compared to the default eager approach (left), this setup results in fewer tensors being saved (1 versus 3).* + +Applying AC on the right parts of the model has the effect of reducing peak memory, because the intermediate activations are no longer materialized in memory when the memory usage typically peaks (at the beginning of backward). + +On the speed-versus-memory tradeoff diagram, AC is plotted on the **bottom-left.** Relative to eager mode, it reduces the amount of memory saved for backward but comes with an added cost in compute due to recomputation. + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg6.png){:style="width:100%"} + + +Note that AC’s speed–memory tradeoff /can/ be adjusted by selecting which parts of the forward pass to checkpoint and by defining how many checkpoint regions to use. However, implementing these changes may require modifying your model’s structure and can be cumbersome depending on how your code is organized. For the purposes of this diagram, we assume only one region is checkpointed; under this assumption, AC appears as a single point on the tradeoff diagram. + +Also note that “memory” here does not refer to peak memory usage; rather, it indicates the how much memory is saved for backward for a fixed region. + + +--- + + +## torch.compile and min-cut partitioner + +Another notable approach to keep in mind is **torch.compile** (introduced in PyTorch 2.0). Like activation checkpointing, `torch.compile` can also perform some level of recomputation under the hood. Specifically, it traces the forward and backward computations into a single joint graph, which is then processed by a [“min-cut” partitioner](https://dev-discuss.pytorch.org/t/min-cut-optimal-recomputation-i-e-activation-checkpointing-with-aotautograd/467). This partitioner uses a min-cut/max-flow algorithm to split the graph such that it minimizes the number of tensors that need to be saved for backward. + +At first glance, this might sound a lot like what we want for activation memory reduction. However, the reality is more nuanced. By default, the partitioner’s primary goal is to reduce runtime. As a result, it only recomputes certain types of operations—primarily simpler, fusible, and non-compute-intensive ops (like pointwise ops). + +Placing "compile" on the speed-versus-memory tradeoff diagram... + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg7.png){:style="width:100%"} + + +It is to the top-left of the eager non-AC point, as we expect `torch.compile` to improve on both speed and memory. 
+ +On the other hand, relative to activation checkpointing, torch.compile is more conservative about what it recomputes, placing it closer to the top-left on the speed-versus-memory diagram. + + +--- + + +## Selective Activation Checkpoint [NEW!] + +While normal checkpointing recomputes every op in a chosen region, [selective activation checkpointing (SAC)](https://pytorch.org/docs/main/checkpoint.html#torch.utils.checkpoint.create_selective_checkpoint_contexts) is an additional setting on top of activation checkpointing that you can apply to have a more granular control over which operations to recompute. + +This can be useful if you have certain more expensive operations like matmuls which you prefer to avoid recomputing, but still generally want to recompute cheaper operations like pointwise. + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg8.png){:style="width:100%"} + + +*Where plain AC (left) would save a single tensor and then recompute the entire AC'd region, with SAC (right) you can selectively save specific operations (marked red) in the region, so you can avoid recomputing them.* + +To specify what to selectively save, you can specify a policy_fn. To illustrate the additional trade offs you can make with this, we present two simple policy functions. + + +### Policy 1: Not recomputing matmuls: + + +``` +aten = torch.ops.aten +compute_intensive_ops = [ + aten.mm, + aten.bmm, + aten.addmm, +] +def policy_fn(ctx, op, *args, **kwargs): + if op in compute_intensive_ops: + return CheckpointPolicy.MUST_SAVE + else: + return CheckpointPolicy.PREFER_RECOMPUTE +``` + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg9.png){:style="width:100%"} + + +### Policy 2: More aggressively save anything compute intensive + + +``` +# torch/_functorch/partitioners.py +aten = torch.ops.aten +compute_intensive_ops = [ + aten.mm, + aten.convolution, + aten.convolution_backward, + aten.bmm, + aten.addmm, + aten._scaled_dot_product_flash_attention, + aten._scaled_dot_product_efficient_attention, + aten._flash_attention_forward, + aten._efficient_attention_forward, + aten.upsample_bilinear2d, + aten._scaled_mm +] +def policy_fn(ctx, op, *args, **kwargs): + if op in compute_intensive_ops: + return CheckpointPolicy.MUST_SAVE + else: + return CheckpointPolicy.PREFER_RECOMPUTE +``` + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg10.png){:style="width:100%"} + + +On the speed-versus-memory diagram, SAC is plotted as a range of points from closer to AC to closer to Eager, depending on your chosen policy. + + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg11.png){:style="width:100%"} + + +**Try it out!** (Available in 2.5 as a prototype feature; see [docs](https://pytorch.org/docs/main/checkpoint.html#torch.utils.checkpoint.create_selective_checkpoint_contexts) for more info + copy-pastable example) + + +``` +from torch.utils.checkpoint import checkpoint, create_selective_checkpoint_contexts + +# Create a policy function that returns a CheckpointPolicy +def policy_fn(ctx, op, *args, **kwargs): + if op in ops_to_save: + return CheckpointPolicy.MUST_SAVE + else: + return CheckpointPolicy.PREFER_RECOMPUTE + +# Use the context_fn= arg of the existing checkpoint API +out = checkpoint( + fn, *args, + use_reentrant=False, + # Fill in SAC context_fn's policy_fn with functools.partial + context_fn=partial(create_selective_checkpoint_contexts, policy_fn), +) + +``` +--- + + + +## (compile-only) Memory Budget API [NEW!] 
+ +As mentioned previously, any given SAC policy can be represented as a point on a speed-memory tradeoff diagram. Not all policies are created equal, however. The "optimal" policies are the ones that fall on a pareto curve, e.g. for all policies that incur the same memory overhead, this policy is the one that minimizes the amount of required compute. + +For users who are using torch.compile, we offer a **memory budget API** that automatically applies SAC over your compiled region with a pareto-optimal policy given a user-specified "memory budget" between 0 and 1, where a budget of 0 behaves like plain-AC and a budget of 1 behaves like default torch.compile. + + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg12.png){:style="width:100%"} + + +Below are some real results on a transformer model: + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg13.png){:style="width:100%"} + + +We observe a 50% memory reduction by recomputing only pointwise ops, with a steady drop-off as you recompute more and more of your matmuls. Attention is the most expensive, so you tend to want to recompute those last. + +**Try it out!** (Available in 2.4 as an experimental feature; see this [comment block](https://github.com/pytorch/pytorch/blob/68a363548409a3ff17965770304ee5e12fe718d9/torch/_functorch/config.py#L110-L122) for more info) + + +``` +torch._dynamo.config.activation_memory_budget = 0.5 + +out = torch.compile(fn)(inp) +``` + +--- + + + + +## Conclusion + + +![flow diagram](/assets/images/activation-checkpointing-techniques/fg14.png){:style="width:100%"} + + +In summary, activation checkpointing techniques in PyTorch offer a variety of ways to balance memory and compute demands, from simple region-based checkpointing to more selective and automated methods. By choosing the option that best matches your model’s structure and resource constraints, you can achieve significant memory savings with an acceptable trade-off in compute. + + +## Acknowledgements + +We would like to thank Meta's [xformers](https://github.com/facebookresearch/xformers) team including [Francisco Massa](https://github.com/fmassa) for working on the original version of Selective Activation Checkpoint. \ No newline at end of file diff --git a/_posts/2025-03-06-peak-performance-minimized-memory.md b/_posts/2025-03-06-peak-performance-minimized-memory.md new file mode 100644 index 000000000000..6271d6412aff --- /dev/null +++ b/_posts/2025-03-06-peak-performance-minimized-memory.md @@ -0,0 +1,152 @@ +--- +layout: blog_detail +title: "Peak Performance, Minimized Memory: Optimizing torchtune’s performance with torch.compile & Liger Kernel" +author: LinkedIn and Meta +--- + +**LinkedIn**: Shivam Sahni, Byron Hsu, Yanning Chen +**Meta**: Ankith Gunapal, Evan Smothers + +This blog explores the integration of a custom triton kernel, Liger Kernel with `torch.compile` to enhance the performance of fine-tuning large language models (LLMs) using torchtune. torchtune, a PyTorch-native library, offers modular building blocks and customizable finetuning recipes which include `torch.compile` support for various LLMs, while Liger Kernel provides optimized Triton kernels to improve training efficiency and reduce memory usage. The integration involves modifying the `TransformerDecoder` module in torchtune to bypass the linear layer computation, allowing the Liger Fused Linear Cross Entropy Loss to handle the forward projection weights. 
Experiments conducted on an NVIDIA A100 instance demonstrate that `torch.compile` outperforms PyTorch Eager in throughput and memory efficiency, with Liger Kernel further reducing peak memory allocation and enabling larger batch sizes. The results show a 47% reduction in peak memory at batch size 256 and a marginal increase in throughput with `meta-llama/Llama-3.2-1B` , confirming the effectiveness of the integration without affecting the loss curves. + + +## Introduction to torchtune + +torchtune is a PyTorch-native library which has been designed for finetuning LLMs. torchtune provides composable and modular building blocks along with finetuning recipes that can be easily customized for your use case, as will be shown in this blog. \ +torchtune provides: + + + +* PyTorch implementations of popular LLM model architectures from Llama, Gemma, Mistral, Phi, and Qwen model families +* Hackable training recipes for full finetuning, LoRA, QLoRA, DPO, PPO, QAT, knowledge distillation, and more +* Out-of-the-box memory efficiency, performance improvements, and scaling with the latest PyTorch APIs, including `torch.compile` +* YAML configs for easily configuring training, evaluation, quantization or inference recipes +* Built-in support for many popular dataset formats and prompt templates + + +## Introduction to Liger Kernel + +Liger Kernel is an open source library of optimized Triton kernels designed to enhance the efficiency and scalability of training Large Language Models (LLMs). It focuses on kernel-level optimizations such as operation fusing and input chunking, achieving significant improvements in training throughput and GPU memory usage compared to existing implementations like those from HuggingFace. By using a single line of code, Liger Kernel can improve [training throughput by 20% and reduce memory usage by 60%](https://www.linkedin.com/blog/engineering/open-source/liger-kernel-open-source-ecosystem-for-efficient-llm-training). + + +![Fused Linear Cross Entropy](/assets/images/peak-performance-minimized-memory/fg1.png){:style="width:100%"} + +
*Figure 1: Fused Linear Cross Entropy*
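As a point of reference, the "single line of code" integration mentioned above usually amounts to patching a Hugging Face model with one of Liger Kernel's helpers. The sketch below is illustrative only — the helper name `apply_liger_kernel_to_llama` is assumed from Liger Kernel's Hugging Face integration and may differ across versions, and the torchtune integration described later in this post takes a different path.

```
# Illustrative sketch (not the torchtune path used below): patch a Hugging Face
# Llama model with Liger's fused Triton kernels before training.
# `apply_liger_kernel_to_llama` is assumed from Liger Kernel's HF integration
# and may differ across versions.
from transformers import AutoModelForCausalLM
from liger_kernel.transformers import apply_liger_kernel_to_llama

apply_liger_kernel_to_llama()  # the "single line of code"

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")
```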
    + + +The bulk of LIger Kernel’s performance improvement comes from the Fused Linear Cross Entropy (FLCE) Loss, whose core idea is as follows: + +In LLMs, the vocabulary size has increased significantly, leading to a large logit tensor during cross-entropy (CE) loss computation. This logit tensor consumes excessive memory, causing a bottleneck in training. For example, when training with a batch size of 8 and sequence length of 4096, the 256k vocabulary size results in a 16.8 GB logit tensor. The FLCE kernel breaks down the computation into smaller chunks, reducing memory consumption. + +Here's how it works: + + + +1. Flattens the 3D hidden states into a 2D matrix by collapsing the batch size and sequence length dimensions. +2. Applies the linear projection head sequentially on the chunked hidden states. +3. Computes the partial loss and returns the chunked logits gradient using the Liger CE kernel. +4. Derives the chunked hidden states gradients and accumulates the projection head gradients. + +Torchtune’s recipes provide `torch.compile` support out of the box. It has been shown that utilizing `torch.compile` with FLCE makes [FLCE 2x faster](https://github.com/linkedin/Liger-Kernel/issues/227). + + +## Integrating Liger Kernel with torch.compile & torchtune + +We demonstrate integration of Liger Kernel with `torch.compile` & torchtune by running a full fine-tuning recipe with `meta-llama/Llama-3.2-1B`. To make this integration happen, we have defined a custom full finetuning recipe, the details of the changes are mentioned below. + + +``` +CUDA_VISIBLE_DEVICES=0,1,2,3 tune run --nproc_per_node 4 recipes/full_finetune_distributed.py --config llama3_2/1B_full optimizer=torch.optim.AdamW optimizer.fused=True optimizer_in_bwd=False gradient_accumulation_steps=1 dataset.packed=True compile=True enable_activation_checkpointing=True tokenizer.max_seq_len=512 batch_size=128 +``` + + +One of the inputs to the LCE Kernel is the forward projection weights. torchtune is designed as a modular library with composable blocks. There is a `TransformerDecoder` [block](https://github.com/pytorch/torchtune/blob/main/torchtune/modules/transformer.py#L322) where at the end of the block, we pass the final hidden state through a linear layer to get the final output. Since the linear layer is combined with the CE loss in LCE Kernel, we write a custom `forward` function for `TransformerDecoder` where we skip the computation through the linear layer. + +In the full finetuning recipe, we override the model's forward method with this custom method + + +``` +import types +from liger_kernel.torchtune.modules.transformers import decoder_forward +self._model.forward = types.MethodType(decoder_forward, self._model) +``` + + +We then pass the model's forward projection weights to calculate the loss with LCE Kernel + + +``` +from liger_kernel.transformers.fused_linear_cross_entropy import ( + LigerFusedLinearCrossEntropyLoss, +) + +# Use LCE loss instead of CE loss +self._loss_fn = LigerFusedLinearCrossEntropyLoss() + +# call torch.compile on the loss function +if self._compile: + training.compile_loss(self._loss_fn, verbose=self._is_rank_zero) + +# pass the model's forward projection weights for loss computation +current_loss = ( + self._loss_fn( + self._model.output.tied_module.weight, + logits, + labels, + ) + * current_num_tokens + ) +``` + + +The complete code and instructions can be found in the [GitHub repo](https://github.com/pytorch-labs/applied-ai/tree/liger_kernel/third_party). 
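To make the four FLCE steps above concrete, here is a small, plain-PyTorch sketch of the chunking idea. It is only an illustration — the actual Liger kernel implements this in fused Triton code — and the function name and tensor shapes are assumptions for the example.

```
import torch
import torch.nn.functional as F

@torch.no_grad()
def chunked_flce(hidden, weight, labels, num_chunks=4):
    """Toy version of the FLCE chunking idea (steps 1-4 above) in plain PyTorch.
    Returns the mean CE loss plus gradients w.r.t. the hidden states and the
    projection weight, without materializing the full (tokens, vocab) logits."""
    hidden_2d = hidden.reshape(-1, hidden.shape[-1])   # step 1: (B, T, H) -> (B*T, H)
    labels_1d = labels.reshape(-1)
    n_total = labels_1d.numel()

    grad_hidden = torch.zeros_like(hidden_2d)
    grad_weight = torch.zeros_like(weight)             # weight: (vocab, H)
    loss, start = hidden.new_zeros(()), 0

    for h_c, y_c in zip(hidden_2d.chunk(num_chunks), labels_1d.chunk(num_chunks)):
        n = y_c.numel()
        idx = torch.arange(n, device=h_c.device)
        logits = h_c @ weight.t()                      # step 2: chunked projection
        log_probs = F.log_softmax(logits, dim=-1)
        loss += -log_probs[idx, y_c].sum()
        d_logits = log_probs.exp()                     # step 3: softmax(logits) - one_hot
        d_logits[idx, y_c] -= 1.0
        grad_hidden[start:start + n] = d_logits @ weight   # step 4: chunked hidden grads
        grad_weight += d_logits.t() @ h_c                  # step 4: accumulate head grads
        start += n                                     # chunked logits freed each iteration

    return loss / n_total, grad_hidden.reshape_as(hidden) / n_total, grad_weight / n_total
```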
+ + +## Experiments & Benchmarking Results + +We conduct 3 types of experiments to demonstrate how Liger Kernel integration with `torch.compile` enhances the performance of torchtune. We set up our experiments on an instance running NVIDIA A100. We fine-tune a small LLM `meta-llama/Llama-3.2-1B `with differing batch sizes. We record the throughput in terms of tokens/second and measure the peak memory allocated during finetuning. Since it's a small model, we only use 4 A100 GPUs for the benchmarking. The following are the experiments we conducted: + + + +1. Increase batch_size in powers of 2 with PyTorch eager +2. Increase batch_size in powers of 2 with torch.compile +3. Increase batch_size in powers of 2 with torch.compile & Liger integration + +We notice that with PyTorch Eager, throughput increases with increasing batch_size till we hit OOM at batch_size 256. With `torch.compile`, the throughput is higher than PyTorch Eager for each batch_size. We see that the peak memory allocation reduces drastically with increasing batch_size and more than 50% reduction in peak memory at batch_size 128. This results in `torch.compile` being able to support batch_size 256 and hence, the overall throughput with `torch.compile` being 36% greater than PyTorch Eager. Integrating Liger Kernel with `torch.compile` doesn’t drop the throughput at lower batch_size but with increasing batch_size, we notice that torchtune is consuming less memory compared to torch.compile. At batch_size 256, we see a 47% reduction in peak memory allocation with the Liger kernel. This allows us to use batch_size 512 with `torch.compile` & Liger. We notice that there is a marginal 1-2% increase in throughput compared to `torch.compile` without custom triton kernels. + + +![Plot of tokens/sec per rank vs batch_size](/assets/images/peak-performance-minimized-memory/fg2.png){:style="width:100%"} + +
*Figure 2: Plot of tokens/sec per rank vs batch_size*
    + +![Peak memory allocated vs batch_size](/assets/images/peak-performance-minimized-memory/fg3.png){:style="width:100%;margin-top: 60px;"} + +
*Figure 3: Peak memory allocated vs batch_size*
    + +To rule out any potential functional issues with our integration of Liger Kernel with torchtune, we plot the loss curve against training steps with & without Liger. We see that there is no visible difference in the loss curves. + + +![Plot of loss vs training steps for batch_size=128](/assets/images/peak-performance-minimized-memory/fg4.png){:style="width:100%"} + +
*Figure 4: Plot of loss vs training steps for batch_size=128*
    + + +## Next Steps + + + +* Enable Liger kernels for [DPO loss](https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/chunked_loss/dpo_loss.py#L7) and [distillation loss](https://github.com/linkedin/Liger-Kernel/blob/main/src/liger_kernel/chunked_loss/fused_linear_distillation.py#L9) in torchtune’s recipes for [DPO](https://pytorch.org/torchtune/main/recipes/dpo.html) and [knowledge distillation](https://pytorch.org/blog/llama-into-torchtune/), respectively. +* Support Liger integration in torchtune with [tensor parallel training](https://github.com/pytorch/torchtune/pull/2330). + + +## Acknowledgments + +We thank Hamid Shojanazeri (Meta), Less Wright (Meta), Horace He (Meta) & Gregory Chanan (Meta) for their feedback and support in making this blog post happen. diff --git a/_posts/2025-03-07-pt-fedora-os-communities.md b/_posts/2025-03-07-pt-fedora-os-communities.md new file mode 100644 index 000000000000..77081b55ea04 --- /dev/null +++ b/_posts/2025-03-07-pt-fedora-os-communities.md @@ -0,0 +1,52 @@ +--- +layout: blog_detail +title: "Powering AI with PyTorch, Fedora, and Open Source Communities" +author: Sudhir Dharanendraiah +hidden: true +--- + + +![man speaking at a conference](/assets/images/pt-fedora-os-communities/fg1.jpg){:style="width:100%"} + + +At [DevConf.IN 2025](https://www.devconf.info/in/) in Pune, I had the opportunity to host a **[PyTorch Meetup](https://pretalx.devconf.info/devconf-in-2025/talk/W3YURM/)** on February 28th. The session, titled "**Powering AI with PyTorch, Fedora, and Open Source Communities**" was aimed at introducing PyTorch to students and professionals, explaining why **PyTorch+Fedora** form an ideal AI development platform. The other key aspect I covered was collaboration between open source communities. + + +## Introduction to PyTorch + + +## The Power of Deep Learning made simple + + +With the explosion of GPTs, there is a renowned interest in the field of AI and ML. The myth of developing AI/ML technologies and its applications is rocket science and far-fetched, needs correction. Only open source has the power to demystify this myth and further evolve the technology to make it versatile and developer friendly. Since its inception, PyTorch has evolved and has been a driving force to make AI/ML development extremely simple. I covered the aspects of PyTorch key components, its features and why PyTorch is the best choice as a deep learning framework. + + +![man speaking at a conference](/assets/images/pt-fedora-os-communities/fg2.jpg){:style="width:100%"} + + + +The codewalk through was designed to showcase how easy and simple it is to utilise the power of GPUs, creating a simple neural network and training the model. The code walkthrough was very well received and it was great to hear back from the attendees that they never knew how powerful PyTorch is for deep learning. The real world examples sighted how this powerful framework can be used beyond the common GPTs and has the power to influence across a broad spectrum of applications. + + +## Fedora+PyTorch the Ideal AI/ML Development Platform + +![man speaking at a conference](/assets/images/pt-fedora-os-communities/fg3.jpg){:style="width:100%"} + +![man speaking at a conference](/assets/images/pt-fedora-os-communities/fg4.jpg){:style="width:100%"} + + +One of the highlights of the event was the discussion on Fedora’s role as an AI platform. 
Fedora’s reliability, flexibility, and strong community support make it an ideal partner for PyTorch, allowing developers to focus on model-building without worrying about infrastructure. The students were intrigued by the idea of contributing to Fedora’s AI/ML ecosystem while building their own projects. Sumantro Mukherjee spoke about the AI policy in Fedora and how one can start contributing to the AI/ML using Fedora as a platform. He highlighted how Fedora is evolving to meet the needs of AI practitioners. The idea that an open-source operating system could provide the perfect foundation for AI research sparked an engaging conversation. + + +## Innovation in Open Source When Communities Come Together + +![charts](/assets/images/pt-fedora-os-communities/fg5.jpg){:style="width:100%"} + +It is important that we learn from history and repeat the good things! When open source communities come together they can create seismic shifts in the industry. To drive this home, I took the audience on a journey through history, revisiting a pivotal moment when Apache and Linux came together, solving common problems and fundamentally reshaping enterprise computing. That moment was not just about technology; it was about collaboration. It was about two powerful communities recognizing that they were stronger together. Today, we stand at the cusp of another such moment - PyTorch and Linux, particularly Fedora, are coming together to shape the future of AI/ML. This is not just an opportunity but a responsibility for contributors, developers, and AI/ML enthusiasts to be part of this movement. + + +## Looking Ahead + +![man speaking at a conference](/assets/images/pt-fedora-os-communities/fg6.jpg){:style="width:100%"} + +One of the best parts of the event was the enthusiasm it generated. Diverse audience, including students, AI enthusiasts, and industry professionals. Notably, Vincent Caldeira (CTO, APAC, Red Hat) and Chris Butler (Senior Principal Chief Architect, Red Hat) were present, reinforcing the growing interest in open-source AI/ML. Many students were eager to explore PyTorch and Fedora, contribute to open-source AI projects, and start their own AI experiments. Industry experts saw the potential for scalable, community-driven AI innovation. The session sparked curiosity and conversations that continued long after the event ended. \ No newline at end of file diff --git a/_posts/2025-03-11-scaling-recommendation-2d-sparse-parallelism.md b/_posts/2025-03-11-scaling-recommendation-2d-sparse-parallelism.md new file mode 100644 index 000000000000..230b3d0337bb --- /dev/null +++ b/_posts/2025-03-11-scaling-recommendation-2d-sparse-parallelism.md @@ -0,0 +1,219 @@ +--- +layout: blog_detail +title: "Scaling Recommendation Systems Training to Thousands of GPUs with 2D Sparse Parallelism" +author: "PyTorch Team at Meta: Chunzhi Yang, Rich Zhu, Zain Huda, Liangbei Xu, Xin Zhang, Jiyan Yang, Dennis van der Staay, Wang Zhou, Jin Fang, Jade Nie, Yuxi Hu" +--- + +At Meta, recommendation systems are the cornerstone of delivering relevant and personalized ads to billions of users globally. Through technologies like PyTorch's TorchRec, we've successfully developed solutions that enable model training across hundreds of GPUs. While these systems have served us well, recent research on scaling laws has revealed a compelling opportunity: we can achieve significantly better model performance by training dramatically larger neural networks. + +However, this insight presents us with a new challenge. 
Our current training infrastructure, though highly optimized for hundreds of GPUs, cannot efficiently scale to the thousands of GPUs needed to train these larger models. The leap from hundreds to thousands of GPUs introduces complex technical challenges, particularly around handling sparse operations in recommendation models. These challenges require fundamentally new approaches to distributed training, which we address with a novel parallelization strategy. + +**To address these issues, we introduced 2D embedding parallel, a novel parallelism strategy that overcomes the sparse scaling challenges inherent in training large recommendation models across thousands of GPUs. This is available today in TorchRec through the DMPCollection API.** This approach combines two complementary parallelization techniques: data parallelism for the sparse components of the model, and model parallelism for the embedding tables, leveraging TorchRec's robust sharding capabilities. By strategically integrating these techniques, we've created a solution that scales to thousands of GPUs and now powers Meta's largest recommendation model training runs. + +**What are the sparse scaling challenges?** + +We identified three key challenges that prevented us from naively scaling our model to thousands of GPUs: + +* **Imbalancing and straggler issue:** with more GPUs it’s harder to achieve balanced sharding, some ranks can have much heavier workload for embedding computations, which can slow down the entire training. +* **Communication across nodes:** As training jobs utilize an increased number of GPUs, the all-to-all communication bandwidth can drop under certain network topologies which can increase communication latency significantly. +* **Memory overhead:** The memory used by input features is often negligible, however, as we use thousands of GPUs, we can introduce larger input features and the memory requirements can become significant. + +With 2D embedding parallel, we can describe our new parallelism scheme like this, in this example we have 2 model replicas (Replica 1: GPU1/GPU3, Replica 2: GPU2/GPU4) + + +![Flow diagram](/assets/images/scaling-recommendation-2d-sparse-parallelism/fg1.png){:style="width:100%"} + +***Figure 1: Layout illustration of 2D Sparse Parallelism*** + +With 2D sparse parallelism we address these challenges, instead of sharding tables across all ranks, we first evenly divide all ranks into several parallel groups: + + + +1. Within each group, we use model parallel for the embedding tables, such as column-wise/row-wise sharding. At scale, for our largest tables, we have also developed a grid sharding, which shards embedding tables on the row and column dimension. +2. Across groups, we do data parallel, such that each rank in a group has its corresponding replica rank in the other groups (replica rank means storing the same embedding table shards). + 1. After each group has completed its own backward pass, we all reduce the embedding table weights across the replicas to keep them synchronized. + +## Our production solution + +TorchRec is our library to build the sparse part of the recommendation models in native PyTorch. With the traditional API being DistributedModelParallel which applies model parallel to the embedding tables. We introduce a new API alongside it, known as DMPCollection, which serves as the main entry point for enabling 2D parallel on TorchRec models. We designed it to be as easy of a change as applying FSDP/DDP is. 
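As a rough illustration of how small that change is meant to be, a hedged sketch is shown below; the DMPCollection import path and keyword arguments (such as `sharding_group_size`) are assumptions for illustration, so consult the TorchRec documentation for the exact API.

```
import torch
# Import path is an assumption; check your TorchRec version.
from torchrec.distributed.model_parallel import DMPCollection

def wrap_for_2d_parallel(model, plan, device, sharding_group_size=256):
    """Wrap a TorchRec model for 2D embedding parallel.

    The keyword arguments passed to DMPCollection here (notably
    sharding_group_size) are assumptions for illustration only.
    """
    # Before (1D model parallel over the global world size):
    #   return DistributedModelParallel(model, device=device, plan=plan)

    # After (2D parallel: model parallel within a sharding group,
    # replica-group all-reduce keeps embedding weights in sync):
    return DMPCollection(
        module=model,
        device=device,
        plan=plan,                      # sharding plan for one sharding group
        sharding_group_size=sharding_group_size,
        world_size=torch.distributed.get_world_size(),
    )
```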
+ +To understand what DMPCollection does, we have to understand what DistributedModelParallel (DMP) does first: + + + +1. Create embedding tables, known as EmbeddingBagCollection and EmbeddingCollections. +2. Generate a sharding plan with respect to GPU topology, embedding tables, memory available, input data, and more. +3. Wrap model with DMP and the associated sharding plan passed in. +4. DMP initializes and shards the embedding tables in accordance with the sharding plan. +5. On a train step, DMP takes an input batch, communicates it to the appropriate GPUs containing the embedding table shard of interest, looks up the value, and returns it back to the GPU that requested it. This is all done on the global process group, with some exceptions for special sharding (such as table row wise sharding) + +DistributedModelParallel was built for model parallel with many parts working under the assumption of sharding and working around the global world size. We need to change these parts in a way where we can introduce additional dimensions of parallelism without losing the optimizations and feature set of TorchRec. + +DMPCollection changes a few key parts to enable 2D parallel in an extensible way, + + + +* Generate sharding plans for the smaller sharding group once, once passed in we communicate to the appropriate ranks across the global group and remap the ranks to fit the new sharding group ranks. +* Create two new NCCL process groups, known as sharding and replica process groups. The sharding process group is passed into sharding and train step components of TorchRec. The replica process group is used for the weight and optimizer state synchronization, the all reduce call happens over this process group. + * The sub NCCL process groups allow us to efficiently communicate only between the ranks that are relevant for a particular comm. Each rank will have two associated process groups. + +To the user, the change is very simple, while taking away all the complexity around applying the parallelism strategies to the model. + +## How do we create these sharding and replication groups? + +These process groups are one of the keys to DMPCollection’s performant implementation. From our earlier diagram, we showed a simple 2x2 GPU setup, however, at scale, how do we assign which ranks are part of a given sharding group and what are their replica ranks across the sharding groups? + +Consider the following setup with 2 nodes, each with 4 GPUs. The sharding and replication groups under 2D parallel will be, + + + + + + + +
| Sharding Group | Sharding Ranks |
|----------------|----------------|
| 0 | 0, 2, 4, 6 |
| 1 | 1, 3, 5, 7 |

| Replication Group | Replication Ranks |
|-------------------|-------------------|
| 0 | 0, 1 |
| 1 | 2, 3 |
| 2 | 4, 5 |
| 3 | 6, 7 |
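To see how this assignment generalizes beyond the 2-node example, here is a small illustrative helper (not TorchRec code) that reproduces the groups in the tables above from the total number of trainers and the sharding group size, matching the formulation spelled out next.

```
def make_2d_groups(total_ranks: int, sharding_group_size: int):
    """Illustrative helper (not TorchRec code): build the sharding and
    replication rank groups shown in the tables above."""
    num_groups = total_ranks // sharding_group_size  # G = T / L
    # Sharding group i takes every G-th rank starting at i: [i, G+i, 2G+i, ...]
    sharding_groups = [
        [num_groups * r + i for r in range(sharding_group_size)]
        for i in range(num_groups)
    ]
    # Replication groups are consecutive runs of G ranks: [0..G-1], [G..2G-1], ...
    replication_groups = [
        list(range(g * num_groups, (g + 1) * num_groups))
        for g in range(sharding_group_size)
    ]
    return sharding_groups, replication_groups

# 2 nodes x 4 GPUs, sharding group size 4 (matches the tables above):
# sharding_groups    -> [[0, 2, 4, 6], [1, 3, 5, 7]]
# replication_groups -> [[0, 1], [2, 3], [4, 5], [6, 7]]
print(make_2d_groups(8, 4))
```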
    + + +We use the following formulation, + + + +1. Divide all trainers into G sharding groups, each with L trainers + 1. Groups, G, is determined by G = T / L, where T is total number of trainers +2. For each group, G, we assigned non-contiguous trainer ranks based on the group it’s in, following, + 2. [i, G+i, 2G+i, ..., (L - 1) G+i], where* i = 0 to G-1* +3. From the groups, G, we can create the replication group, which is every G continuous ranks + 3. (0 to G-1, G to 2* G - 1) each continuous set stores the duplicate embedding table shards. + +This means our sharding groups, G, are of size L, which can be known as the number of ranks to apply model parallel across. This, in turn, gives us replica groups, each of size G, which are the ranks we data parallel across. + +In DMPCollection, we’re able to create these process groups efficiently with the use of DeviceMesh, we create the entire GPU topology in a 2x2 matrix, with each row representing the group of sharding ranks and each column representing the corresponding replica ranks, + +``` +create peer matrix +num_groups = global_world_size // sharding_group_size +for each group_rank in num_groups: + peers = [num_groups * rank + group_rank for rank in range(sharding_group_size)] + add peer to peer matrix + +initalize DeviceMesh with two dimensions (shard, replicate) +slice DeviceMesh on shard for sharding process group +slide DeviceMesh on replicate for replica process group +``` + +With our DeviceMesh approach, should we want to change the topology or provide further flexibility in the future, we can easily extend our creation logic to any form of topologies and even extend for further dimensions of parallelism if needed. + +## Performance of 2D parallel + +Our rank partitioning strategy optimizes communication patterns by strategically placing model replica ranks for each shard within the same compute node. This architecture provides significant performance benefits for the weight synchronization operation. After the backward pass, we perform all-reduce operations to synchronize model weights—which is an expensive process given the large parameter counts we have to communicate and sync—with our setup of placing replicas on the same node we leverage intra node’s high-bandwidth over-relying on slower inter-node bandwidth. + +The effect of this design choice on the other communication collectives generally improves the latencies. The improvement stems from two factors. + + + +1. By sharding the embedding tables over a reduced number of ranks and conducting communications for the model within the smaller group, we achieve a lower all-to-all latency. +2. With the replication in 2D parallel, our embedding lookup latency on a rank reduces, we can reduce the local batch size to 1/Nth of the equivalent global batch size, where N is the number of model replicas. + +A production model trace exemplifies these two factors, here we run the 2D parallel job on 1024 GPUs, with a sharding group size of 256 GPUs. + +![State diagram](/assets/images/scaling-recommendation-2d-sparse-parallelism/fg2.png){:style="width:100%"} + +***Figure 2: Comparing latencies between non 2D parallel and 2D parallel workloads*** + +There are two key levers users have to tune to maximize performance for their workloads: + + + +1. The size of the model sharding group relative to the global world size. The global world size divided by the sharding group size represents the number of model replicas we will have. + 1. 
To maximize performance, users can look to scale up their model up to 8x, this scaling factor maintains the intra-host all reduce. + 1. For further scaling, the all reduce would have to happen over inter host. From our experiments, we did not see an obvious performance regression and in fact note advantages of an inter host all reduce. We can change our sharding and replica topology to inter host all reduce, which can help us introduce fault tolerance strategies should a particular host go down. +2. Frequency of all reduce synchronization, DMPCollection comes with a sync() call, which can be tuned to be called every N training steps, performing a sort of local SGD training. With scale, reducing the frequency of synchronization can bring significant gains to performance. + +## Future Work + +Readers should note that 2D sparse parallel training differs from non-parallelized training because we synchronize the embedding table weights rather than the gradients. This approach is made possible by TorchRec's use of FBGEMM, which provides optimized kernels under the hood. One of FBGEMM's key optimizations is the fusion of the optimizer in the backward pass. Instead of fully materializing the embedding table gradients—which would consume significant memory—they are passed directly to the optimizer update. Attempting to materialize and synchronize these gradients would create substantial overhead, making that approach impractical. + +Our exploration revealed that to achieve training results comparable to the baseline, we synchronize optimizer states on a delayed schedule, with the timing dependent on the number of sharding/replica groups (ie: for Adagrad we update the momentum behind by one sync step). This approach also enables users to implement local SGD or semi-synchronized training strategies, which can achieve convergence and potentially produce better loss curves than the baseline. + +We thank you for reading our post! This is an exciting direction we have come across that we hope to develop further to maximize performance of recommendation systems and push the state of the art. + + \ No newline at end of file diff --git a/_posts/2025-03-13-pytorch-landscape.md b/_posts/2025-03-13-pytorch-landscape.md new file mode 100644 index 000000000000..4cc3687be952 --- /dev/null +++ b/_posts/2025-03-13-pytorch-landscape.md @@ -0,0 +1,44 @@ +--- +layout: blog_detail +title: "Introducing the New PyTorch Landscape: Your Guide to the PyTorch Ecosystem" +--- + +We’re excited to reveal our brand new PyTorch Landscape. The [PyTorch Landscape](https://landscape.pytorch.org/) helps researchers, developers, and organizations easily locate useful, curated, community-built tools that augment the PyTorch core framework. + + +landscape banner + +## What the Landscape Offers + +The Landscape visually organizes projects into three categories—Modeling, Training, and Optimizations—making finding relevant frameworks, libraries, and projects easy. Users can quickly locate curated, valuable tools for a variety of use cases that complement the PyTorch framework. Each tool that is part of the Landscape has been reviewed and vetted by PyTorch project experts. The projects in the Landscape are considered to be mature and healthy and provide valuable capabilities that complement the PyTorch framework in their respective domains. + + +## Explore the AI Landscape + +The **Explore** page presents platforms, tools, and libraries, each with a logo, description, and links to GitHub and further details. 
This categorized, visual approach simplifies discovery and provides quick access to essential technologies. + + +## Guide Page: A Closer Look + +For deeper insights, the **Guide** page expands on each project, highlighting methodologies and trends shaping AI development, from adversarial robustness to self-supervised learning. There are also project statistics provided for each project, including metrics such as number of stars, contributors, commit history, languages used, license, and other valuable metrics that provide an in-depth understanding of the project and how it may be used. + + +## Tracking AI’s Growth: The Stats Page + +The **Stats** page provides insights into AI development trends, tracking repository activity, programming languages, and industry funding data. + +* Repositories: 117 repositories, 20.5k contributors, and 797.2k stars across 815MB of source code. +* Development Trends: Weekly commit activity over the last year. +* Licensing Breakdown: Repositories are categorized by license type. +* Funding & Acquisitions: Insights into investment trends, including funding rounds and acquisitions. + + +## Why Use the PyTorch Landscape? + +Finding useful and high quality open source projects that complement the PyTorch core system can be overwhelming. The PyTorch Landscape offers a clear, accessible way to explore the ecosystem of community-built tools, whether you're researching, building models, or making strategic decisions. + +Stay ahead with the [PyTorch Landscape](https://landscape.pytorch.org/) — your guide to the PyTorch Ecosystem. + +## Want to Contribute a Project to the PyTorch Landscape? + +Have you built a useful open source tool that you would like to share with the PyTorch community? Then help us grow the Ecosystem by contributing your tool! You can find the [instructions to apply here](https://github.com/pytorch-fdn/ecosystem). We welcome all contributions from the community! \ No newline at end of file diff --git a/_posts/2025-03-16-pytorch-at-gtc.md b/_posts/2025-03-16-pytorch-at-gtc.md new file mode 100644 index 000000000000..94be8a113f5f --- /dev/null +++ b/_posts/2025-03-16-pytorch-at-gtc.md @@ -0,0 +1,109 @@ +--- +layout: blog_detail +title: "PyTorch at GTC 2025" +author: "Team PyTorch at NVIDIA" +hidden: true +--- + +[GTC](https://www.nvidia.com/gtc/) is coming back to San Jose on March 17–21, 2025. Join PyTorch Foundation members Arm, AWS, Google Cloud, IBM, Lightning AI, Meta, Microsoft Azure, Snowflake, and thousands of developers as we celebrate PyTorch. Together learn how AI & accelerated computing are helping humanity solve our most complex challenges. + +Join in person with [discounted GTC registration](https://www.nvidia.com/gtc/?ncid=GTC-NVI0K8HVX) for PyTorch Foundation or [watch online](https://register.nvidia.com/flow/nvidia/gtcs25/registration/) with free registration. 
+ + +![book cover](/assets/images/pytorch-at-gtc.jpg){:style="max-width:500px; display: block; margin-left: auto; margin-right: auto"} + + +### [Scaling Open Source AI: From Foundation Models to Ecosystem Success](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1738966749087001K1dG) + +Hear from PyTorch Foundation’s Executive Director Matt White & panelists from UC Berkeley, Meta, NVIDIA, & Sequoia Capital how open source is transforming AI development, bringing together experts from industry, academia, and venture capital to discuss the technical and business aspects of collaborative open source AI development They’ll examine how open source projects like PyTorch, vLLM, Ray, and NVIDIA's NeMo are accelerating AI innovation while creating new opportunities for businesses and researchers. They'll share real-world experiences from PyTorch's development, Berkeley's research initiatives, and successful AI startups. Take away valuable insights into the technical and business aspects of open source AI. – Monday, Mar 17 10:00 AM - 11:00 AM PDT + + +## PyTorch @ GTC + +[The Performance of CUDA with the Flexibility of PyTorch ](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1726155993061001WWZM) +Mark Saroufim, Software Engineer, Meta Platforms + +This talk explores how PyTorch users are also becoming CUDA developers. We'll start with motivating examples from eager, the launch of torch.compile and the more recent trend of kernel zoos. We will share details on how we went about integrating low bit matmuls in torchao and the torch.compile CUTLASS backend. We'll also discuss details on how you can define, build and package your own custom ops in PyTorch so you get the raw performance of CUDA while maintaining the flexibility of PyTorch. + +[Make My PyTorch Model Fast, and Show Me How You Did It](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1727978036338001UVLu) +Thomas Viehmann, Principal Research Engineer, Lightning AI +Luca Antiga, CTO, Lightning AI + +PyTorch is popular in deep learning and LLMs for richness and ease of expressions. To make the most of compute resources, PyTorch models benefit from nontrivial optimizations, but this means losing some of their ease and understandability. Learn how with Thunder, a PyTorch-to-Python compiler focused on usability, understandability, and extensibility, you can optimize and transform (i.e., distribute across many machines) models while • leaving the PyTorch code unchanged • targeting a variety of models without needing to adapt to each of them • understanding each transformation step because the results are presented as simple Python code • accessing powerful extension code for your own optimizations with just one or a few lines of code We'll show how the combination of Thunder transforms and the NVIDIA stack (NVFuser, cuDNN, Apex) delivers optimized performance in training and inference on a variety of models. 
+ +[FlexAttention: The Flexibility of PyTorch With the Performance of FlashAttention](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1726184633014001Jh5G) +Driss Guessous, Machine Learning Engineer, Meta Platforms + +Introducing FlexAttention: a novel PyTorch API that enables custom, user-defined attention mechanisms with performance comparable to state-of-the-art solutions. By leveraging the PyTorch compiler stack, FlexAttention supports dynamic modifications to attention scores within SDPA, achieving both runtime and memory efficiency through kernel fusion with the FlashAttention algorithm. Our benchmarks on A100 GPUs show FlexAttention achieves 90% of FlashAttention2's performance in forward passes and 85% in backward passes. On H100 GPUs, FlexAttention's forward performance averages 85% of FlashAttention3 and is ~25% faster than FlashAttention2, while backward performance averages 76% of FlashAttention3 and is ~3% faster than FlashAttention2. Explore how FlexAttention balances near-state-of-the-art performance with unparalleled flexibility, empowering researchers to rapidly iterate on attention mechanisms without sacrificing efficiency. + +[Keep Your GPUs Going Brrr : Crushing Whitespace in Model Training](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1731693095418001cruA) +Syed Ahmed, Senior Software Engineer, NVIDIA +Alban Desmaison, Research Engineer, Meta +Aidyn Aitzhan, Senior Software Engineer, NVIDIA + +Substantial progress has recently been made on the compute-intensive portions of model training, such as high-performing attention variants. While invaluable, this progress exposes previously hidden bottlenecks in model training, such as redundant copies during collectives and data loading time. We'll present recent improvements in PyTorch achieved through Meta/NVIDIA collaboration to tackle these newly exposed bottlenecks and how practitioners can leverage them. + +[Accelerated Python: The Community and Ecosystem](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1727176757800001qp7T) +Andy Terrel, CUDA Python Product Lead, NVIDIA +Jeremy Tanner, Open Source Programs, NVIDIA +Anshuman Bhat, CUDA Product Management, NVIDIA + +Python is everywhere. Simulation, data science, and Gen AI all depend on it. Unfortunately, the dizzying array of tools leaves a newcomer baffled at where to start. We'll take you on a guided tour of the vibrant community and ecosystem surrounding accelerated Python programming. Explore a variety of tools, libraries, and frameworks that enable efficient computation and performance optimization in Python, including CUDA Python, RAPIDS, Warp, and Legate. We'll also discuss integration points with PyData, PyTorch, and JAX communities. Learn about collaborative efforts within the community, including open source projects and contributions that drive innovation in accelerated computing. We'll discuss best practices for leveraging these frameworks to enhance productivity in developing AI-driven applications and conducting large-scale data analyses. 
+ +[Supercharge large scale AI with Google Cloud AI hypercomputer (Presented by Google Cloud)](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1734571562315001xMKM) +Deepak Patil, Product Manager, Google Cloud +Rajesh Anantharaman, Product Management Lead, ML Software, Google Cloud + +Unlock the potential of your large-scale AI workloads with Google Cloud AI Hypercomputer – a supercomputing architecture designed for maximum performance and efficiency. In this session, we will deep dive into PyTorch and JAX stacks on Google Cloud on NVIDIA GPUs, and showcase capabilities for high performance foundation model building on Google Cloud. + +[Peering Into the Future: What AI and Graph Networks Can Mean for the Future of Financial Analysis](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1739906058885001OxEF) +Siddharth Samsi, Sr. Solutions Architect, NVIDIA +Sudeep Kesh, Chief Innovation Officer, S&P Global + +Artificial Intelligence, agentic systems, and graph neural networks (GNNs) are providing the new frontier to assess, monitor, and estimate opportunities and risks across work portfolios within financial services. Although many of these technologies are still developing, organizations are eager to understand their potential. See how S&P Global and NVIDIA are working together to find practical ways to learn and integrate such capabilities, ranging from forecasting corporate debt issuance to understanding capital markets at a deeper level. We'll show a graph representation of market data using the PyTorch-Geometric library and a dataset of issuances spanning three decades and across financial and non-financial industries. Technical developments include generation of a bipartite graph and link-prediction GNN forecasting. We'll address data preprocessing, pipelines, model training, and how these technologies can broaden capabilities in an increasingly complex world. + +[Unlock Deep Learning Performance on Blackwell With cuDNN](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1727984645671001Y9eq) +Yang Xu (Enterprise Products), DL Software Engineering Manager, NVIDIA + +Since its launch, cuDNN, a library for GPU-accelerating deep learning (DL) primitives, has been powering many AI applications in domains such as conversational AI, recommender systems, and speech recognition, among others. CuDNN remains a core library for DL primitives in popular frameworks such as PyTorch, JAX, Tensorflow, and many more while covering training, fine-tuning, and inference use cases. Even in the rapidly evolving space of Gen AI — be it Llama, Gemma, or mixture-of-experts variants requiring complex DL primitives such as flash attention variants — cuDNN is powering them all. Learn about new/updated APIs of cuDNN pertaining to Blackwell’s microscaling format, and how to program against those APIs. We'll deep dive into leveraging its graph APIs to build some fusion patterns, such as matmul fusion patterns and fused flash attention from state-of-the-art models. Understand how new CUDA graph support in cuDNN, not to be mistaken with the cuDNN graph API, could be exploited to avoid rebuilding CUDA graphs, offering an alternative to CUDA graph capture with real-world framework usage. 
+ +[Train and Serve AI Systems Fast With the Lightning AI Open-Source Stack (Presented by Lightning AI)](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1736347047099001au7y) +Luca Antiga, CTO, Lightning AI + +See how the Lightning stack can cover the full life cycle, from data preparation to deployment, with practical examples and particular focus on distributed training and high-performance inference. We'll show examples that focus on new features like support for multi-dimensional parallelism through DTensors, as well as quantization through torchao. + + +## Connect With Experts (Interactive Sessions) + +[Meet the Experts From Deep Learning Framework Teams ](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1728516848639001tO7H) +Eddie Yan, Technical Lead of PyTorch, NVIDIA +Masaki Kozuki, Senior Software Engineer in PyTorch, NVIDIA +Patrick Wang (Enterprise Products), Software Engineer in PyTorch, NVIDIA +Mike Ruberry, Distinguished Engineer in Deep Learning Frameworks, NVIDIA +Rishi Puri, Sr. Deep Learning Engineer and Lead for PyTorch Geometric, NVIDIA + + +## Training Labs + +[Kernel Optimization for AI and Beyond: Unlocking the Power of Nsight Compute ](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1726073884811001C0za) +Felix Schmitt, Sr. System Software Engineer, NVIDIA +Peter Labus, Senior System Software Engineer, NVIDIA + +Learn how to unlock the full potential of NVIDIA GPUs with the powerful profiling and analysis capabilities of Nsight Compute. AI workloads are rapidly increasing the demand for GPU computing, and ensuring that they efficiently utilize all available GPU resources is essential. Nsight Compute is the most powerful tool for understanding kernel execution behavior and performance. Learn how to configure and launch profiles customized for your needs, including advice on profiling accelerated Python applications, AI frameworks like PyTorch, and optimizing Tensor Core utilization essential to modern AI performance. Learn how to debug your kernel and use the expert system built into Nsight Compute, known as “Guided Analysis,” that automatically detects common issues and directs you to the most relevant performance data all the way down to the source code level. + +[Make Retrieval Better: Fine-Tuning an Embedding Model for Domain-Specific RAG](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1725042189130001cmoW) +Gabriel Moreira, Sr. Research Scientist, NVIDIA +Ronay Ak, Sr. Data Scientist, NVIDIA + +LLMs power AI applications like conversational chatbots and content generators, but are constrained by their training data. This might lead to hallucinations in content generation, which requires up-to-date or domain-specific information. Retrieval augmented generation (RAG) addresses this issue by enabling LLMs to access external context without modifying model parameters. Embedding or dense retrieval models are a key component of a RAG pipeline for retrieving relevant context to the LLM. However, an embedding model’s effectiveness to capture the unique characteristics of the custom data hinges on the quality and domain relevance of its training data. 
Fine-tuning embedding models is gaining interest to provide more accurate and relevant responses tailored to users’ specific domain. + +In this lab, you'll learn to generate a synthetic dataset with question-context pairs from a domain-specific corpus, and process the data for fine-tuning. Then, fine-tune a text embedding model using synthetic data and evaluate it. + + +## Poster Presentations + +[Single-View X-Ray 3D Reconstruction Using Neural Back Projection and Frustum Resampling](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1729781473379001KiPD) +Tran Minh Quan, Developer Technologist, NVIDIA + +[Enable Novel Applications in the New AI Area in Medicine: Accelerated Feature Computation for Pathology Slides](https://www.nvidia.com/gtc/session-catalog/?regcode=no-ncid&ncid=no-ncid&tab.catalogallsessionstab=16566177511100015Kus&search=pytorch#/session/1729757102989001KDG4) +Nils Bruenggel, Principal Software Engineer, Roche Diagnostics Int. AG \ No newline at end of file diff --git a/_posts/2025-03-19-pt-day-china-2025-cfp.md b/_posts/2025-03-19-pt-day-china-2025-cfp.md new file mode 100644 index 000000000000..44f98dfd7ee1 --- /dev/null +++ b/_posts/2025-03-19-pt-day-china-2025-cfp.md @@ -0,0 +1,60 @@ +--- +layout: blog_detail +title: "PyTorch Day China 2025 Call for Proposals Open" +--- + +We’re excited to announce the **first-ever [PyTorch Day China](https://www.lfasiallc.com/pytorch-day-china/)**! This new event, hosted by the PyTorch Foundation, will take place on **June 7 in Beijing, China**, bringing together AI practitioners, researchers, and industry professionals to explore the latest advancements in open source AI and machine learning. Co-located with the **BAAI Conference**, PyTorch Day China is a chance to connect with the community, share knowledge, and help shape the future of deep learning. + + +![PyTorch Day China 2025 Call for Proposals Open](/assets/images/pt-day-china-2025-cfp.jpg){:style="max-width:500px; display: block; margin-left: auto; margin-right: auto"} + + +## Why Submit a Proposal? + +PyTorch Day China offers a platform for AI practitioners and researchers to showcase their work, exchange ideas, and connect with others in the community. If you're working on innovative applications, tools, or research in the PyTorch ecosystem, we encourage you to share your expertise. + + +## Topics for Submission: + + + +* AI Applications and Use Cases +* Core PyTorch Framework +* DL Compilers and Kernel Authoring +* Edge AI and On-Device +* Ethical AI, Governance, and Regulation +* Generative AI and Large Language Models (LLMs) with PyTorch +* Open Source Collaboration, Education, and Community Building +* Optimization for Training and Inference +* PyTorch on Accelerator Hardware +* PyTorch Ecosystem and Tools +* PyTorch in Research and Academia +* Performance Measurement and Benchmarking +* Scaling Training and Inference + +**The submission deadline is April 13. Submit and learn more here:** [https://www.lfasiallc.com/pytorch-day-china/call-for-proposals-cfp/](https://www.lfasiallc.com/pytorch-day-china/call-for-proposals-cfp/) + + +## Why Attend? + +PyTorch Day China will feature **technical talks, discussions, and poster sessions** that highlight real-world applications and developments in AI and machine learning. Attendees will have the opportunity to learn from experts, contribute to the open source community, and engage with fellow PyTorch users. 
Registration information will be available in April. + + +## Event Details + +* **Date:** June 7, 2025 +* **Location:** Zhongguancun Exhibition Center, Beijing, China +* **Address:** 索家坟, Hai Dian Qu, Bei Jing Shi, China, 100080 +* **Co-located with:** BAAI Conference + + +## Travel Information + +The venue, **Zhongguancun Exhibition Center**, is approximately **39 km from Beijing International Airport**. More details on travel and accommodation will be available on the **BAAI Conference website** and updated here as they become available. + + +## Have Questions? + +For inquiries, please contact pytorchevents@linuxfoundation.org. + +Submit your proposal by **April 13** and join the conversation shaping the future of PyTorch. \ No newline at end of file diff --git a/_posts/2025-03-19-sglang-joins-pytorch.md b/_posts/2025-03-19-sglang-joins-pytorch.md new file mode 100644 index 000000000000..1334a6b6a52c --- /dev/null +++ b/_posts/2025-03-19-sglang-joins-pytorch.md @@ -0,0 +1,105 @@ +--- +layout: blog_detail +title: "SGLang Joins PyTorch Ecosystem: Efficient LLM Serving Engine" +author: "SGLang Team" +hidden: true +--- + + +![sglang logo](/assets/images/sglang-join-pytorch/fg1.png){:style="max-width:400px; display: block; margin-left: auto; margin-right: auto"} + + +We’re thrilled to announce that the SGLang project has been integrated into the PyTorch ecosystem! This integration ensures that SGLang aligns with PyTorch’s standards and practices, providing developers with a reliable and community-supported framework for fast and flexible serving of LLMs. + +To view the PyTorch Ecosystem, see the [PyTorch Landscape](https://landscape.pytorch.org/) and learn more about how projects can [join the PyTorch Ecosystem](https://github.com/pytorch-fdn/ecosystem). + + +## About SGLang + +SGLang is a fast-serving engine for large language models and vision language models. It makes the interaction with models faster and more controllable by co-designing the backend runtime and frontend language. + +The core features include: + +* Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, zero-overhead CPU scheduler, continuous batching, token attention (paged attention), speculative decoding, tensor parallelism, chunked prefill, structured outputs, and quantization (FP8/INT4/AWQ/GPTQ). +* Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions. +* Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse) and reward models (Skywork), with easy extensibility for integrating new models. +* Active Community: SGLang is open source and backed by an active community with industry adoption. + +SGLang is famous for its fast speed. It can often significantly outperform other state-of-the-art frameworks in terms of serving throughput and latency. You can learn more about the underlying techniques from the past release blog posts: [v0.2 blog](https://lmsys.org/blog/2024-07-25-sglang-llama3/), [v0.3 blog](https://lmsys.org/blog/2024-09-04-sglang-v0-3/), [v0.4 blog](https://lmsys.org/blog/2024-12-04-sglang-v0-4/). + +SGLang has been widely adopted by leading industry companies and frontier research labs. 
For example, xAI uses SGLang to serve its flagship model, [Grok 3](https://grok.com/), which is currently the best model according to the Chatbot Arena leaderboard. Microsoft Azure uses SGLang to serve [DeepSeek R1](https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/running-deepseek-r1-on-a-single-ndv5-mi300x-vm/4372726) on AMD GPUs, which is currently the best open source model. + + +## Serving DeepSeek Models + +You can easily launch a Docker container to serve a DeepSeek model with the following command: + +``` +# Pull the latest image +docker pull lmsysorg/sglang:latest + +# Launch a server +docker run --gpus all --shm-size 32g -p 30000:30000 -v ~/.cache/huggingface:/root/.cache/huggingface --ipc=host --network=host --privileged lmsysorg/sglang:latest \ + python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --tp 8 --trust-remote-code --port 30000 +``` + +Then you can query the server with the OpenAI-compatible API + +``` +import openai +client = openai.Client(base_url=f"http://127.0.0.1:30000/v1", api_key="None") + +response = client.chat.completions.create( + model="deepseek-ai/DeepSeek-V3", + messages=[ + {"role": "user", "content": "List 3 countries and their capitals."}, + ], + temperature=0, + max_tokens=64, +) +``` + +The server launch command above works for 8xH200. You can find detailed instructions for other hardware (MI300X, H100, A100, H20, L40S) at https://docs.sglang.ai/references/deepseek.html. + +SGLang integrates DeepSeek-specific optimizations, such as MLA throughput optimizations, MLA-optimized kernels, data-parallel attention, multi-token prediction, and DeepGemm, making it the top choice for serving DeepSeek models by dozens of [companies](https://x.com/lmsysorg/status/1887262321636221412), including AMD, NVIDIA, and many cloud providers. The team is actively working on integrating more optimizations following the 2025 H1 roadmap below. + + +## Serving Llama Models + +Similarly, you can launch the server for a Llama 3.1 text model with: + +``` +python -m sglang.launch_server --model-path meta-llama/Meta-Llama-3.1-8B-Instruct +``` + +Or a Llama 3.2 multimodal model with: + +``` +python3 -m sglang.launch_server --model-path meta-llama/Llama-3.2-11B-Vision-Instruct --chat-template=llama_3_vision +``` + + +## Roadmap + +This year, the SGLang team will continue to push the boundaries of system efficiency. You can find the roadmap of 2025H1 [here](https://github.com/sgl-project/sglang/issues/4042). The focus is + +- Throughput-oriented large-scale deployment similar to the DeepSeek inference system +- Long context optimizations +- Low latency speculative decoding +- Reinforcement learning training framework integration +- Kernel optimizations + +## Community + +SGLang has been deployed to large-scale production, generating trillions of tokens every day. It has an active community with over three hundred contributors on GitHub. It is supported by the following institutions: AMD, Atlas Cloud, Baseten, Cursor, DataCrunch, Etched, Hyperbolic, iFlytek, Jam & Tea Studios, LinkedIn, LMSYS, Meituan, Nebius, Novita AI, NVIDIA, RunPod, Stanford, UC Berkeley, UCLA, xAI, and 01.AI. + + +![logos](/assets/images/sglang-join-pytorch/fg2.png){:style="width:100%;"} + + + +## Conclusion + +We’re excited to welcome SGLang to the PyTorch ecosystem. SGLang accelerates the serving of large language and vision language models. It’s widely adopted by industry, powering the large-scale online serving of frontier models like Grok and DeepSeek. 
+ +We invite you to explore the [SGLang GitHub repo](https://github.com/sgl-project/sglang/tree/main), join the [community on Slack](https://slack.mindee.com/), and reach out to [contact@sglang.ai](mailto:contact@sglang.ai) for inquiries or collaboration opportunities. Together, we can make powerful AI models accessible to everyone. \ No newline at end of file diff --git a/_posts/2025-04-03-pt-day-france-cfp.md b/_posts/2025-04-03-pt-day-france-cfp.md new file mode 100644 index 000000000000..9ed63b302833 --- /dev/null +++ b/_posts/2025-04-03-pt-day-france-cfp.md @@ -0,0 +1,58 @@ +--- +layout: blog_detail +title: "PyTorch Day France 2025: Call For Proposals Open" +--- + +We’re pleased to announce **[PyTorch Day France 2025](https://events.linuxfoundation.org/pytorch-day-france/)**, a dedicated gathering of the PyTorch community held **7 May 2025** in **Paris, France**. Proudly hosted by the **PyTorch Foundation** and co-located with **[GOSIM AI Paris 2025](https://paris2025.gosim.org/)**, this event will bring together developers, researchers, and practitioners driving innovation in open source AI and machine learning. + +Whether you're building cutting-edge models or contributing to the ecosystem, PyTorch Day France is your opportunity to connect, collaborate, and help shape the future of deep learning. + + + +![PT Day CFP](/assets/images/pt-day-cfp.png){:style="max-width:600px; display: block; margin-left: auto; margin-right: auto"} + + +## Why Attend? + +Set in the vibrant atmosphere of STATION F, the world’s largest startup campus, PyTorch Day France will offer a full day of: + +* Insightful Technical Talks +* Interactive Discussions +* Engaging Poster Sessions + +The event is designed to foster open exchange across the PyTorch ecosystem, providing a space to learn from peers, share practical insights, and explore the latest research and applications in AI. + + +## Submit a Proposal + +We are currently accepting proposals for talks. If you have a project, idea, or research story you'd like to share with the PyTorch community, we want to hear from you. + +📩 Email your **talk title and abstract** to [pytorchevents@linuxfoundation.org](mailto:pytorchevents@linuxfoundation.org) for consideration. + + +## Registration + +To register for PyTorch Day France, please visit the **GOSIM AI Paris website**, and use the code PYTORCHFRIEND to receive 25% off. + +👉 [https://paris2025.gosim.org/](https://paris2025.gosim.org/) + +We encourage early registration to secure your spot and ensure access to both PyTorch Day France and the broader GOSIM AI Paris programming. + + +## Venue + +STATION F +5 Parv. Alan Turing, 75013 Paris, France +A landmark of innovation and entrepreneurship in the heart of Paris. + + +## Travel and Accommodations + +Participants are responsible for their own travel and lodging. For those arriving internationally, Paris Charles de Gaulle Airport is approximately 38.4 km from STATION F. Additional information about accommodations and transportation may be available on the [GOSIM AI Paris website](https://paris2025.gosim.org/). + + +## Questions? + +For any inquiries, please contact us at [pytorchevents@linuxfoundation.org](mailto:pytorchevents@linuxfoundation.org). + +We look forward to welcoming the PyTorch community to Paris this May for a day of collaboration, learning, and open source AI innovation. 
\ No newline at end of file diff --git a/_posts/2025-04-08-accelerating-whisper-arm-w-transformers.md b/_posts/2025-04-08-accelerating-whisper-arm-w-transformers.md new file mode 100644 index 000000000000..10db0cabc270 --- /dev/null +++ b/_posts/2025-04-08-accelerating-whisper-arm-w-transformers.md @@ -0,0 +1,39 @@ +--- +layout: blog_detail +title: "Accelerating Whisper on Arm with PyTorch and Hugging Face Transformers" +author: Pareena Verma, Arm +--- + +Automatic speech recognition (ASR) has revolutionized how we interact with technology, clearing the way for applications like real-time audio transcription, voice assistants, and accessibility tools. OpenAI Whisper is a powerful model for ASR, capable of multilingual speech recognition and translation. + +A new Arm Learning Path is now available that explains how to accelerate Whisper on Arm-based cloud instances using PyTorch and Hugging Face transformers. + +**Why Run Whisper on Arm?** + +Arm processors are popular in cloud infrastructure for their efficiency, performance, and cost-effectiveness. With major cloud providers such as AWS, Azure, and Google Cloud offering Arm-based instances, running machine learning workloads on this architecture is becoming increasingly attractive. + +**What You’ll Learn** + +The [Arm Learning Path](https://learn.arm.com/learning-paths/servers-and-cloud-computing/whisper/) provides a structured approach to setting up and accelerating Whisper on Arm-based cloud instances. Here’s what you cover: + +**1. Set Up Your Environment** + +Before running Whisper, you must set up your development environment. The learning path walks you through setting up an Arm-based cloud instance and installing all dependencies, such as PyTorch, Transformers, and ffmpeg. + +**2. Run Whisper with PyTorch and Hugging Face Transformers** + +Once the environment is ready, you will use the Hugging Face transformer library with PyTorch to load and execute Whisper for speech-to-text conversion. The tutorial provides a step-by-step approach for processing audio files and generating audio transcripts. + +**3. Measure and Evaluate Performance** + +To ensure efficient execution, you learn how to measure transcription speeds and compare different optimization techniques. The guide provides insights into interpreting performance metrics and making informed decisions on your deployment. + +**Try it Yourself** + +Upon completion of this tutorial, you know how to: + +* Deploy Whisper on an Arm-based cloud instance. +* Implement performance optimizations for efficient execution. +* Evaluate transcription speeds and optimize further based on results. + +**Try the live demo today** and see audio transcription in action on Arm: [Whisper on Arm Demo](https://learn.arm.com/learning-paths/servers-and-cloud-computing/whisper/_demo/). \ No newline at end of file diff --git a/_posts/2025-04-23-pytorch-2-7.md b/_posts/2025-04-23-pytorch-2-7.md new file mode 100644 index 000000000000..1f31b9f2e6c3 --- /dev/null +++ b/_posts/2025-04-23-pytorch-2-7.md @@ -0,0 +1,161 @@ +--- +layout: blog_detail +title: "PyTorch 2.7 Release" +--- + +We are excited to announce the release of PyTorch® 2.7 ([release notes](https://github.com/pytorch/pytorch/releases/tag/v2.7.0))! This release features: + +* support for the [NVIDIA Blackwell GPU architecture](https://www.nvidia.com/en-us/data-center/technologies/blackwell-architecture/) and pre-built wheels for [CUDA 12.8](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html) across Linux x86 and arm64 architectures. 
+* *torch.compile* support for Torch Function Modes which enables users to override any *torch.** operation to implement custom user-defined behavior. +* Mega Cache which allows users to have end-to-end portable caching for torch; +* new features for FlexAttention - LLM first token processing, LLM throughput mode optimization and Flex Attention for Inference. + +This release is composed of 3262 commits from 457 contributors since PyTorch 2.6. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.7. More information about how to get started with the PyTorch 2-series can be found at our [Getting Started](https://pytorch.org/get-started/pytorch-2.0/) page. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| Beta | Prototype |
|------|-----------|
| Torch.Compile support for Torch Function Modes | NVIDIA Blackwell Architecture Support |
| Mega Cache | PyTorch Native Context Parallel |
| | Enhancing Intel GPU Acceleration |
| | FlexAttention LLM first token processing on x86 CPUs |
| | FlexAttention LLM throughput mode optimization on x86 CPUs |
| | Foreach Map |
| | Flex Attention for Inference |
| | Prologue Fusion Support in Inductor |
    + + +*To see a full list of public feature submissions click [here](https://docs.google.com/spreadsheets/d/1TzGkWuUMF1yTe88adz1dt2mzbIsZLd3PBasy588VWgk/edit?usp=sharing). + + +## BETA FEATURES + + +### [Beta] Torch.Compile support for Torch Function Modes + +This feature enables users to override any *torch.** operation to implement custom user-defined behavior. For example, ops can be rewritten to accommodate a specific backend. This is used in FlexAttention to re-write indexing ops. + +See the [tutorial](https://pytorch.org/tutorials/recipes/torch_compile_torch_function_modes.html) for more information. + + +### [Beta] Mega Cache + +Mega Cache allows users to have end-to-end portable caching for torch. The intended use case is after compiling and executing a model, the user calls *torch.compiler.save_cache_artifacts()* which will return the compiler artifacts in a portable form. Later, potentially on a different machine, the user may call *torch.compiler.load_cache_artifacts()* with these artifacts to pre-populate the torch.compile caches in order to jump-start their cache. + +See the [tutorial](https://pytorch.org/tutorials/recipes/torch_compile_caching_tutorial.html#torch-compile-end-to-end-caching-mega-cache) for more information. + + +## PROTOTYPE FEATURES + + +### [Prototype] NVIDIA Blackwell Architecture Support + +PyTorch 2.7 introduces support for NVIDIA's new Blackwell GPU architecture and ships pre-built wheels for CUDA 12.8. For more details on CUDA 12.8 see [CUDA Toolkit Release](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html). + + + +* Core components and libraries including cuDNN, NCCL, and CUTLASS have been upgraded to ensure compatibility with Blackwell platforms. +* PyTorch 2.7 includes Triton 3.3, which adds support for the Blackwell architecture with torch.compile compatibility. +* To utilize these new features, install PyTorch with CUDA 12.8 using: *pip install torch==2.7.0 --index-url https://download.pytorch.org/whl/cu128* + +More context can also be found [here](https://github.com/pytorch/pytorch/issues/145949). + + +### [Prototype] PyTorch Native Context Parallel + +PyTorch Context Parallel API allows users to create a Python context so that every *torch.nn.functional.scaled_dot_product_attention() *call within will run with context parallelism. Currently, PyTorch Context Parallel supports 3 attention backends: 1. Flash attention; 2. Efficient attention; and 3. cuDNN attention. + +As an example, this is [used within TorchTitan as the Context Parallel solution for LLM training](https://discuss.pytorch.org/t/distributed-w-torchtitan-breaking-barriers-training-long-context-llms-with-1m-sequence-length-in-pytorch-using-context-parallel/215082). + +See [tutorial](https://pytorch.org/tutorials/prototype/context_parallel.html) here. + + +### [Prototype] Enhancing Intel GPU Acceleration + +This latest release introduces enhanced performance optimizations for Intel GPU architectures. These improvements accelerate workloads across various Intel GPUs through the following key enhancements: + + + +* Enable torch.compile on Windows 11 for Intel GPUs, delivering the performance advantages over eager mode as on Linux. +* Optimize the performance of PyTorch 2 Export Post Training Quantization (PT2E) on Intel GPU to provide a full graph mode quantization pipelines with enhanced computational efficiency. +* Improve Scaled Dot-Product Attention (SDPA) inference performance with bfloat16 and float16 to accelerate attention-based models on Intel GPUs. 
+* Enable AOTInuctor and torch.export on Linux to simplify deployment workflows. +* Implement more Aten operators to enhance the continuity of operators execution on Intel GPU and increase the performance on Intel GPU in eager mode. +* Enable profiler on both Windows and Linux to facilitate model performance analysis. +* Expand the Intel GPUs support to [Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics](https://www.intel.com/content/www/us/en/products/details/processors/core-ultra.html), and [Intel® Arc™ B-Series graphics](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/desktop/b-series/overview.html) on both Windows and Linux. + +For more information regarding Intel GPU support, please refer to [Getting Started Guide](https://pytorch.org/docs/main/notes/get_start_xpu.html). + +See also the tutorials [here](https://pytorch.org/tutorials/prototype/inductor_windows.html) and [here](https://pytorch.org/tutorials/prototype/pt2e_quant_xpu_inductor.html). + + +### [Prototype] FlexAttention LLM first token processing on x86 CPUs + +FlexAttention x86 CPU support was first introduced in PyTorch 2.6, offering optimized implementations — such as PageAttention, which is critical for LLM inference—via the TorchInductor C++ backend. In PyTorch 2.7, more attention variants for first token processing of LLMs are supported. With this feature, users can have a smoother experience running FlexAttention on x86 CPUs, replacing specific *scaled_dot_product_attention* operators with a unified FlexAttention API, and benefiting from general support and good performance when using torch.compile. + + +### [Prototype] FlexAttention LLM throughput mode optimization + +The performance of FlexAttention on x86 CPUs for LLM inference throughput scenarios has been further improved by adopting the new C++ micro-GEMM template ability. This addresses the performance bottlenecks for large batch size scenarios present in PyTorch 2.6. With this enhancement, users can transparently benefit from better performance and a smoother experience when using FlexAttention APIs and torch.compile for LLM throughput serving on x86 CPUs. + + +### [Prototype] Foreach Map + +This feature uses torch.compile to allow users to apply any pointwise or user-defined function (e.g. torch.add) to lists of tensors, akin to the existing *torch._foreach_** ops. The main advantage over the existing *torch._foreach_** ops is that any mix of scalars or lists of tensors can be supplied as arguments, and even user-defined python functions can be lifted to apply to lists of tensors. Torch.compile will automatically generate a horizontally fused kernel for optimal performance. + +See [tutorial](https://pytorch.org/tutorials/recipes/foreach_map.html) here. + + +### [Prototype] Flex Attention for Inference + +In release 2.5.0, [FlexAttention](https://pytorch.org/blog/flexattention/)* torch.nn.attention.flex_attention* was introduced for ML researchers who’d like to customize their attention kernels without writing kernel code. This update introduces a decoding backend optimized for inference, supporting GQA and PagedAttention, along with feature updates including nested jagged tensor support, performance tuning guides and trainable biases support. + +### [Prototype] Prologue Fusion Support in Inductor + +Prologue fusion optimizes matrix multiplication (matmul) operations by fusing operations that come before the matmul into the matmul kernel itself, improving performance by reducing global memory bandwidth. 
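
As a rough illustration (not taken from the release notes; the shapes and the choice of prologue op are hypothetical), the kind of pattern prologue fusion targets is an op such as a dtype cast feeding directly into a matmul, which Inductor may fuse into the matmul template when compiling with max-autotune:

```
import torch

# Hypothetical sketch: the bf16 cast of the weight is a "prologue" op that
# Inductor may fuse into the matmul kernel itself (avoiding a separately
# materialized casted copy of the weight in global memory) when Triton
# matmul templates are selected under max-autotune.
x = torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16)
w = torch.randn(4096, 4096, device="cuda")  # fp32 master weight

def fwd(x, w):
    return x @ w.to(torch.bfloat16)  # cast + matmul

compiled_fwd = torch.compile(fwd, mode="max-autotune")
out = compiled_fwd(x, w)
```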
diff --git a/_posts/2025-04-25-pytorch-2-7-intel-gpus.md b/_posts/2025-04-25-pytorch-2-7-intel-gpus.md new file mode 100644 index 000000000000..7643d20ae51b --- /dev/null +++ b/_posts/2025-04-25-pytorch-2-7-intel-gpus.md @@ -0,0 +1,92 @@ +--- +layout: blog_detail +title: "Accelerate PyTorch 2.7 on Intel® GPUs" +author: the Intel PyTorch Team +--- + +[PyTorch 2.7](https://pytorch.org/blog/pytorch-2-7/) continues to deliver significant functionality and performance enhancements on Intel® GPU architectures to streamline AI workflows. Application developers and researchers seeking to fine-tune, inference and develop PyTorch models on Intel GPUs will now have a consistent user experience across various operating systems, including Windows, Linux and Windows Subsystem for Linux (WSL2). This is made possible through improved installation, eager mode script debugging, a performance profiler, and graph model (torch.compile) deployment. As a result, developers have greater options with a unified GPU programming paradigm for both front-end and back-end development. + +## Incremental improvements of Intel GPU support in PyTorch + +Since PyTorch 2.4, we've made steady improvements to Intel GPU support with each release. With PyTorch 2.7, we are excited to share that we have established a solid foundation to have Intel GPU work in both graph mode (torch.compile) and eager mode on Windows and Linux. This includes a wide range of Intel GPU products, many of which you may already access. We hope these enhancements will unlock more ubiquitous hardware for your AI research and development. + +* Over time, we have expanded Intel GPU Support across Windows and Linux, including these products: + * [Intel® Arc™ A-Series Graphics](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/desktop/a-series/overview.html) + * [Intel® Arc™ B-Series Graphics](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/arc/desktop/b-series/overview.html) + * [Intel® Core™ Ultra Processors with Intel Arc Graphics](https://www.intel.com/content/www/us/en/support/articles/000097599/processors.html) + * [Intel® Core™ Ultra Mobile Processors (Series 2) with Intel Arc Graphics](https://www.intel.com/content/www/us/en/products/docs/processors/core-ultra/core-ultra-series-2-mobile-product-brief.html) + * [Intel® Core™ Ultra Desktop Processors (Series 2) with Intel Arc Graphics](https://www.intel.com/content/www/us/en/products/docs/processors/core-ultra/core-ultra-desktop-processors-series-2-brief.html) + * [Intel® Data Center GPU Max Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html) +* [Simpler installation](https://pytorch.org/docs/2.7/notes/get_start_xpu.html) of torch-xpu PIP wheels and an effortless setup experience. +* High ATen operation coverage with SYCL and oneDNN for smooth eager mode support with functionality and performance. +* Notable speedups with torch.compile through default TorchInductor and Triton backend, proved by measurable performance gains with Hugging Face, TIMM, and TorchBench benchmarks. + +Check out the detailed advancements in these related release blogs:[ PyTorch 2.4](https://pytorch.org/blog/intel-gpus-pytorch-2-4/),[ PyTorch 2.5](https://pytorch.org/blog/intel-gpu-support-pytorch-2-5/), and[ PyTorch 2.6](https://pytorch.org/blog/unlocking-pt-2-6-intel/). + + +## What's New in PyTorch 2.7 + +These are the features in PyTorch 2.7 that were added to help accelerate performance on Intel GPUs. 
+ + + +* Improve scaled dot-product attention (SDPA) inference performance with bfloat16 and float16 to accelerate attention-based models on Intel GPUs. +With the new SDPA optimization for Intel GPUs on PyTorch 2.7, Stable Diffusion float16 inference achieved up to 3x gain over PyTorch 2.6 release on Intel® Arc™ B580 Graphics and Intel® Core™ Ultra 7 Processor 258V with Intel® Arc™ Graphics 140V on eager mode. See Figure 1 below. + + +![chart](/assets/images/pytorch-2-7-intel-gpus/fg1.png){:style="width:100%"} + +**Figure 1. PyTorch 2.7 Stable Diffusion Performance Gains Over PyTorch 2.6** + +* Enable torch.compile on Windows 11 for Intel GPUs, delivering the performance advantages over eager mode as on Linux. With this, Intel GPUs became the first accelerator to support torch.compile on Windows. Refer to[ Windows tutorial](https://pytorch.org/tutorials/prototype/inductor_windows.html) for details. +Graph model (torch.compile) is enabled in Windows 11 for the first time across Intel GPUs, delivering the performance advantages over eager mode as on Linux by PyTorch 2.7. The latest performance data was measured on top of PyTorch Dynamo Benchmarking Suite using Intel® Arc™ B580 Graphics on Windows showcase torch.compile speedup ratio over eager mode as shown in Figure 2. Both training and inference achieved similar significant improvements. + + +![chart](/assets/images/pytorch-2-7-intel-gpus/fg2.png){:style="width:100%"} + +**Figure 2. Torch.compile Performance Gains Over Eager Mode on Windows** + + + +* Optimize the performance of PyTorch 2 Export Post Training Quantization (PT2E) on Intel GPU to provide full graph mode quantization pipelines with enhanced computational efficiency. Refer to [PT2E tutorial](https://pytorch.org/tutorials/prototype/pt2e_quant_xpu_inductor.html) for details. +* Enable AOTInductor and torch.export on Linux to simplify deployment workflows. Refer to[ AOTInductor tutorial](https://pytorch.org/docs/main/torch.compiler_aot_inductor.html) for details. +* Enable profiler on both Windows and Linux to facilitate model performance analysis. Refer to the[ PyTorch profiler tutorial](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html#pytorch-profiler) for details. + +Review the [Getting Started on Intel GPU Guide](https://pytorch.org/docs/2.7/notes/get_start_xpu.html) for a tour of the environment setup and a quick start on Intel GPUs. + + +## Future Work + +Looking ahead, we will continue the Intel GPU upstream efforts in future PyTorch releases to: + +* Attain state-of-the-art PyTorch-native performance to showcase competitive GEMM computational efficiency for torch.compile, and enhance performance for LLM models through FlexAttention and lower precision data types. +* Broaden feature compatibility by delivering distributed XCCL backend support for Intel® Data Center GPU Max Series. +* Expand accelerator support across core PyTorch ecosystem components including torchao, torchtune, and torchtitan. + +Follow along in the [PyTorch Dev Discussion](https://dev-discuss.pytorch.org/t/intel-gpu-cpu-enabling-status-and-feature-plan-2025-h1-update/2913) to learn more about Intel GPU & CPU enabling status and features. As we get further along, we will create tickets on GitHub to document our progress. + + +## Summary + +In this blog, we reviewed the Intel GPU upstream progress starting in PyTorch 2.4 and highlighted the new features of PyTorch 2.7 that accelerate AI workload performance across various Intel GPUs. 
These new features, especially SDPA on Windows, achieved up to 3x inference (Stable Diffusion, float16) gain over PyTorch 2.6 release on Intel Arc B580 Graphics and Intel Core Ultra 7 Processor 258V with Intel Arc Graphics 140V. Also, torch.compile on Windows delivers similar performance advantages over eager mode on Dynamo benchmarks as on Linux. + + +## Acknowledgments + +We want to thank the following PyTorch maintainers for their technical discussions and insights: [Nikita Shulga](https://github.com/malfet), [Jason Ansel](https://github.com/jansel), [Andrey Talman](https://github.com/atalman), [Alban Desmaison](https://github.com/alband), and [Bin Bao](https://github.com/desertfire). + +We also thank collaborators from PyTorch for their professional support and guidance. + +## Product and Performance Information + +Measurement on Intel Core Ultra 7 258V: 2200 MHz, 8 Core(s), 8 Logical Processor(s) with Intel Arc 140V GPU (16GB), GPU memory 18.0 GB, using Intel Graphics Driver 32.0.101.6647 (WHQL Certified), Windows 11 Pro - 24H2. And Intel Core Ultra 5 245KF: 4200 MHz, 14 Core(s), 14 Logical Processor(s), Intel Arc B580 Graphics, dedicated GPU memory 12.0 GB, shared GPU memory 15.8 GB, using Intel Graphics Driver 32.0.101.6647 (WHQL Certified), Windows 11 Enterprise LTSC - 24H2. Test by Intel on Apr 8th, 2025. + +## Notices and Disclaimers + +Performance varies by use, configuration and other factors. Learn more on the Performance Index site. Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates.  See backup for configuration details.  No product or component can be absolutely secure. Your costs and results may vary. Intel technologies may require enabled hardware, software or service activation. + +Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others. + +## AI Disclaimer + +AI features may require software purchase, subscription or enablement by a software or platform provider, or may have specific configuration or compatibility requirements. Details at [www.intel.com/AIPC](http://www.intel.com/AIPC). Results may vary. \ No newline at end of file diff --git a/_posts/2025-04-28-accelerating-training-float8-rowwise-crusoe.md b/_posts/2025-04-28-accelerating-training-float8-rowwise-crusoe.md new file mode 100644 index 000000000000..245688c07605 --- /dev/null +++ b/_posts/2025-04-28-accelerating-training-float8-rowwise-crusoe.md @@ -0,0 +1,195 @@ +--- +layout: blog_detail +title: "Accelerating Large Scale Training and Convergence with PyTorch Float8 Rowwise on Crusoe 2K H200s" +author: Meta and Crusoe +--- + +**Meta**: Less Wright, Hamid Shojanazeri, Vasiliy Kuznetsov, Daniel Vega-Myhre, Gokul Nadathur, Will Constable, Tianyu Liu, Tristan Rice, Driss Guessous, Josh Fromm, Luca Wehrstedt, Jiecao Yu +**Crusoe**: Ethan Petersen, Martin Cala, Chip Smith + +Working with [Crusoe.AI](http://Crusoe.AI) we were provided access to one of their new 2K H200 clusters in Iceland, which enabled us to showcase training accelerations of 34 - 43% at scale by leveraging TorchTitan’s HSDP2 and TorchAO’s new float8 rowwise, with comparable convergence and stability vs BF16. 
+ + +![bar chart](/assets/images/accelerating-training-float8-rowwise-crusoe/fg1.png){:style="width:100%;"} + + +In this post we detail the synergy of H200’s with PyTorch’s new Float8 rowwise training with TorchTitan’s FSDP2/HSDP2 and CP at scale. + +## Background - what is an H200? + +H200’s are an ‘enhanced’ H100, offering the exact same compute as an H100, but with two additional improvements. + +* Larger global memory, 141GiB HBM3e vs the standard 80GiB HBM3 +* Memory bandwidth is ~43% faster with 4.8TB/s vs 3.35 TB/s. The faster memory transfer has an outsized effect on training speed, especially for PyTorch’s AsyncTP. + +## What is PyTorch Float8 rowwise? + +Float 8 Rowwise is a finer grained resolution for Float8 vs the previous ‘tensor wise’ Float8. It is designed to ensure finer grained accuracy to support larger workloads that tend to become more sensitive to quantization at scale and as training progresses. + +There are two key improvements with Float8 rowwise: + +* Each row now maintains its own scaling factor versus a single scaling factor for the entire tensor, thus improving quantization precision. Finer grained scaling per row helps reduce the effect of outliers (extreme values that force the quantization scaling factor to stretch and degrade the precision of the normally distributed values) and thus ensures better precision. +* The scaling factor itself is now implemented by rounding down to the nearest power of 2. This has been shown to help reduce quantization errors when multiplying/dividing by the scaling factor as well as ensuring large values remain scaled to the same value in both the forward and backward passes. + +Note that other large scale models have been trained using Float8 at 2K scale with a combination of 1x128 groupwise and 128x128 blockwise, with power of 2 scaling factors. They had the same goal of improving Float8’s precision for supporting large scale training. + +Thus, Float8 rowwise offers a similar promise to enable Float8 for very large scale training, but we wanted to provide proof of stability and convergence at scale, which training on the Crusoe H200 2k cluster provided initial verification thereof. + +## Showcasing Float8 Rowwise Loss convergence vs BF16 at 1600 and 1920 GPU Scale: + +In order to verify comparable loss convergence, we ran two separate runs at both 1920 and then 1600 (1.6k) gpu scale using TorchTitan and Lllama3 70B. The 1.6K GPU runs were set for 2.5k iterations, using TorchTitans’ HSDP2 and Context Parallel to enable 2D parallelism. + +The loss convergence tests were run using Titan’s deterministic mode - this mode effectively freezes most potential sources of variation from run to run, and thus helps ensure that the only substantial change is what we want to test, namely the loss convergence and loss curves of BF16 vs Float8 Rowwise. + +Note that deterministic mode also slows down training speed because various kernels will not be autotuned to maximize throughput (otherwise we risk using different kernels between runs and introducing variance). + +Two runs were completed, one with BF16 and the other with Float8 Rowwise. + +Both runs completed their assigned 2.5k iters without issue, showcasing the Crusoe cluster stability, with FP8 completing at exactly 24 hours and BF16 finishing after 31 hours, 19 minutes. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
| DType | Time / Iters | Loss |
|-------|--------------|------|
| BF16 | 24 hours | 3.15453 |
| Float8 Rowwise | 24 hours | 2.86386 |
| BF16 | 31 hours, 19 minutes / 2.5K | 2.88109 |
| Float8 Rowwise | 24 hours / 2.5K | 2.86386 |
    + + +At the 24 hour mark, Float8 completed 2.5K iterations showcasing the comparative speed up (even in deterministic mode) of float8 training. At the 24 hour mark, Float8 enabled a **+9.21%** relative improvement in loss compared to BF16 for the same 24 hours of large scale training time. + + +After 31 hours, 19 minutes, the BF16 run finally completed its 2.5k iters. + + +The final loss numbers: +BF16 = **2.88109** +Float8 = **2.86386** + +From the loss curves we observed very similar curves at the first and last ⅓ and then a turbulent zone in the middle where both showed similar spikes, but with a slight skew to the relative timing of the spikes. + + +![line chart](/assets/images/accelerating-training-float8-rowwise-crusoe/fg2.png){:style="width:100%;"} + + +As a result of this, we can see that PyTorch’s Float8 rowwise offers similar convergence but over 33% speedup for the same amount of training time. + +## Long Term Training stability with Float8 Rowwise + +Beyond showcasing comparable convergence, we also wanted to show longer term training stability with Float8 and thus we launched a 4 day, 15K run at 256 scale. + +![line chart](/assets/images/accelerating-training-float8-rowwise-crusoe/fg3.png){:style="width:100%;"} + + +As shown above, Float8 training ran for over 100 hours with no issues, highlighting the long term stability of Float8 Rowwise. + +## Determinism in TorchTitan + +To verify determinism and to see if the spikiness in the longer runs was from scale, we also ran a smaller run comprising of 2 runs of BF16, and 1 run of Float8 at 256 scale, and with HSDP2 only (i.e. without 2D Context parallel). + +In this case both BF16 runs had identical curves and final loss, and we saw a similar spikiness zone for all three runs. + +At the 2K iteration mark, both Float8 and BF16 ending at nearly identical points: +BF16 *2 = **3.28538** +Float8 rowwise = **3.28203** + +![line chart](/assets/images/accelerating-training-float8-rowwise-crusoe/fg4.png){:style="width:100%;"} + + +The above result confirms that neither CP nor scale (2k) are responsible for spikiness in the loss as we saw similar effect at 256 scale as well. The most likely explanation for the loss spikes could be content distribution in the dataset. + +For the sake of determinism, the experiments were run with a serialized C4 dataset (not shuffled), meaning the spikes could be from encountering new content within the dataset. + +## Net speedups at various Scales with Float8 rowwise: + +We performed shorter runs at various GPU scales to understand how Float8 Rowwise would scale in terms of training acceleration as cluster sizes expanded. Doubling in scale from 960 to 1920, Float8 continued to deliver impressive training speedups, with a range of over 34-43% gains compared to BF16. We also want to note that scaling from 1k to 2k GPUs communication overhead likely kicked in and we observed a 4% hit on throughput with BF16. + +![bar chart](/assets/images/accelerating-training-float8-rowwise-crusoe/fg5.png){:style="width:100%;"} + + +As shown in the longer training runs at scale above, Float8 rowwise delivered substantial speedups with equal or even slightly improved loss endpoints while delivering 34% speedups at 1920 (DeepSeek) scale. + +## How can I use Float8 Rowwise in my training? + +Float8 Rowwise is available now for you to use in your large scale training. 
It is packaged in [TorchAO’s](https://github.com/pytorch/ao) latest builds (0.9 and higher) and integrated into [TorchTitan](https://github.com/pytorch/torchtitan) natively if you want to get up and running quickly. + +To activate Float8 Rowwise in TorchTitan: + +First enable the model converter to hotswap the nn.linears into float8 linear layers in your models .toml file - see line 29: + + +![code](/assets/images/accelerating-training-float8-rowwise-crusoe/fg6.png){:style="max-width:600px; display: block; margin-left: auto; margin-right: auto"} + +Secondly, specify the ‘rowwise’ float8 recipe - see line 72: + + +![code](/assets/images/accelerating-training-float8-rowwise-crusoe/fg7.png){:style="max-width:600px; display: block; margin-left: auto; margin-right: auto"} + + +Note that you have three choices for the ‘recipe_name’: + +* rowwise which is the recommended default, +* tensorwise (the older style float8) and +* rowwise_with_gw_hp. + +The gw_hp rowwise option keeps the gradients to the weights in BF16 precision during the backwards pass, and this can further enhance float8 precision for extremely sensitive workloads. But, it can ironically be a bit more performant than generic rowwise if the majority of the matmul sizes in your model are smaller (with an estimated tipping point at roughly 13-16K dimensions on H100). + +Thus while we recommend rowwise as the default, it may be worth comparing with gw_hp on your model to verify which provides the best performance, with an upside of even greater precision. + +By toggling the model converter on and off with a #, you can directly compare training acceleration between BF16 and Float8 Rowwise to understand the potential speedups for your own training. + +## Future Updates: + +We’ll have an additional update coming showcasing multiple improvements for Pipeline Parallel and Async Distributed Checkpointing so please stay tuned. \ No newline at end of file diff --git a/_posts/2025-04-29-pt-foundation-expands.md b/_posts/2025-04-29-pt-foundation-expands.md new file mode 100644 index 000000000000..a0b0454ae588 --- /dev/null +++ b/_posts/2025-04-29-pt-foundation-expands.md @@ -0,0 +1,50 @@ +--- +layout: blog_detail +title: "PyTorch Foundation Expands to an Umbrella Foundation to Accelerate AI Innovation" +author: Matt White, Executive Director, PyTorch Foundation +--- + +Today, I am thrilled to announce a significant milestone for the PyTorch Foundation: we are expanding our scope to become an umbrella foundation, allowing us to host additional projects. This expansion positions the PyTorch Foundation to foster a broader ecosystem of high-value, trusted, and innovative AI projects that cater to all stages of the AI lifecycle—from training and inference to industry-specific applications. + +## Why Expand? + +Since its inception at the Linux Foundation two and a half years ago, the PyTorch Foundation has rapidly grown, now encompassing over 30 member organizations and 120 vibrant ecosystem projects. PyTorch itself has become the framework of choice for AI researchers, practitioners, and industry leaders worldwide. Our flagship PyTorch Conference has seen attendance multiply sixfold over just two years, reflecting the community’s tremendous enthusiasm and engagement. 
+ +With new initiatives such as PyTorch Day events, global community meetups, the PyTorch Ambassador Program, Open Source Program Office (OSPO) outreach, the Speaker’s Bureau, and our upcoming training and certification programs, we have significantly deepened our community’s expertise and collaboration capabilities. To sustain and accelerate this momentum, the logical next step was to expand the PyTorch Foundation into an umbrella organization. + +## What Does an Umbrella Foundation Mean? + +By transitioning into an umbrella foundation, PyTorch will now host a range of diverse, high-quality AI and ML projects beyond PyTorch Core. These include foundation-hosted projects in two categories: + + +* **Platform Projects**: Domain-agnostic solutions essential across various stages of the AI lifecycle, such as training, inference, model optimization, and deployment as well as agentic systems. +* **Vertical Projects**: Domain-specific projects tailored to particular industries or applications, such as biomedical imaging, protein folding, and geospatial analysis. + +Projects under our umbrella gain immediate access to vendor-neutral governance, enhanced visibility, increased funding opportunities, and robust community engagement and support. + +## Foundation-Hosted vs. Ecosystem Projects + +As we expand, it’s important to clarify the distinction between foundation-hosted and ecosystem projects: + +* **Foundation-Hosted Projects** are projects that fall under the umbrella, they are officially governed and administered under the PyTorch Foundation’s neutral and transparent governance model. Project maintainers continue to oversee their project, and they transfer assets to the Linux Foundation for independent stewardship and adopt an open governance model significantly reducing vendor bias and encouraging broader community contributions and adoption. These projects have greater stability and longevity and integrate with the larger PyTorch community. +* **Ecosystem Projects** remain independently managed but receive recognition and increased visibility by aligning themselves closely with the PyTorch Foundation community standards. These projects meet specific quality and maturity criteria but retain full independence in governance and asset management. + +## How to Join the PyTorch Ecosystem or Become a Foundation-Hosted Project + +We have clearly defined pathways for projects looking to become part of the PyTorch community: + +1. **[Ecosystem Project Status](https://github.com/pytorch-fdn/ecosystem)**: Projects must meet defined criteria, such as active development, comprehensive documentation, CI/CD infrastructure, clear governance, and community engagement. Approved ecosystem projects benefit from increased exposure and official recognition on the [PyTorch Landscape](https://landscape.pytorch.org/). +2. **[Candidate Project Status](https://github.com/pytorch-fdn/foundation-hosted)**: Ecosystem projects aspiring to foundation-hosted status can become candidates by securing sponsorship from a PyTorch Foundation [Technical Advisory Council (TAC)](/tac) voting member. Candidates receive guidance on meeting all necessary governance, technical, and strategic criteria. +3. **[Foundation-Hosted Project Status](https://github.com/pytorch-fdn/foundation-hosted)**: Candidate projects demonstrating high maturity, stability, multi-platform support, security best practices, and strategic value to the PyTorch community can be approved by the TAC. 
These projects gain extensive benefits, including neutral trademark hosting, foundation support, marketing and events resources, governance guidance, and strategic funding opportunities. + +## Ensuring Long-Term Success and Innovation + +By expanding our scope to become an umbrella foundation, the PyTorch Foundation is uniquely positioned to enhance collaboration, innovation, and sustained growth across the entire AI community. Our mission is clear: create a vendor-neutral, open source environment where the best AI and ML tools can thrive, benefiting users, contributors, and industry stakeholders worldwide. + +*“PyTorch is absolutely the foundation of the innovation happening in AI today and with projects like Llama, ChatGPT, and hundreds of thousands of open projects built on PyTorch, it has cemented itself as a critical ingredient to the world of AI. This move to create an umbrella foundation enables PyTorch to significantly expand its ecosystem both horizontally and vertically in this new era of agentic systems. I am very excited about this opportunity to take the PyTorch community to the next level!” - Joe Spisak, Product Director for PyTorch at Meta.* + +*"PyTorch sits at the very core of AI today. Meanwhile, the depth of the AI stack has grown dramatically—evolving from enabling accelerated compute to powering fully autonomous systems. Broadening the PyTorch Foundation is a key step in keeping the AI revolution open and accessible to all, across the stack and aligned with the principles PyTorch was built on." - Luca Antiga, CTO at Lightning AI.* + +We are incredibly optimistic about the opportunities ahead and excited to welcome new projects into our growing family. The PyTorch Foundation remains deeply committed to driving AI innovation forward, and together, we will continue to build the future of open source artificial intelligence. + +Stay tuned for more updates, announcements, and opportunities to participate! \ No newline at end of file diff --git a/_posts/2025-04-30-6x-faster-async-checkpointing.md b/_posts/2025-04-30-6x-faster-async-checkpointing.md new file mode 100644 index 000000000000..12a2f9e1b1de --- /dev/null +++ b/_posts/2025-04-30-6x-faster-async-checkpointing.md @@ -0,0 +1,108 @@ +--- +layout: blog_detail +title: "6x faster Async Checkpointing in PyTorch, using Cached Plans, no GIL contention" +author: Meta and Crusoe +--- + +**Meta**: Less Wright, Meet Vadakkanchery, Saurabh Mishra, Ela Krepska, Hamid Shojanazeri, Pradeep Fernando +**Crusoe**: Ethan Petersen, Martin Cala, Chip Smith + +PyTorch DCP (Distributed Checkpointing) has recently enabled new optimizations in asynchronous checkpointing to reduce GPU utilization drop by minimizing collective overhead and improving overall checkpointing efficiency. + +Using Crusoe’s 2K H200 cluster, with TorchTitan and training a Llama3-70B, we were able to verify these new features deliver substantial speedups at 1856 GPU scale, reducing the background processing time for async DCP checkpoints from ~436 seconds to ~67 seconds. + +This is roughly a 6.5x reduction in background checkpoint processing time, enabling even more total training time to proceed at full training throughput. + +![chart](/assets/images/6x-faster-async-checkpointing/fg1.png){:style="width:100%"} + + +*Fig 1: 1856 training run with high frequency checkpointing. 
The first checkpoint (drop down in tps) does not have a cached save plan, and the background processing takes far longer than the rest where the cached plan is used.* + + +## Background: What is Asynchronous Checkpointing? + +In a standard checkpointing workflow, GPUs are blocked while the checkpointing data is offloaded from GPU to CPU and then written to storage. After the save to physical media is complete, training can resume. + +Asynchronous checkpointing greatly reduces this downtime by enabling the actual saving to storage to be done via CPU threads, allowing GPU-based training to continue while the checkpoint data is being persisted in parallel. It is used primarily for intermediate/fault tolerant checkpoints as it unblocks the GPUs much faster compared to the synchronous checkpoints. \ +For example, in our large-scale experiment, GPU training was blocked for less than a second (.78 seconds at 1856 scale) while checkpoint data was moved from GPU to CPU (staging). At that point, GPU training immediately continues, which is a substantial training time improvement over traditional checkpointing. For reference, Async Checkpointing is covered in more detail [here](https://pytorch.org/blog/reducing-checkpointing-times/). + + +## Challenges with Asynchronous Checkpointing + +However, the background processing inherent in Asynchronous Checkpointing has additional challenges that result in a temporary reduction of training throughput while the storage phase is being completed. These are highlighted below. + + +### GPU utilization drop from GIL contention: + +The Global Interpreter Lock (GIL) in Python is a mechanism that prevents multiple native threads from executing Python bytecode at the same time. This lock is necessary mainly because CPython's memory management is not thread-safe. + +DCP currently uses background threads for metadata collectives and uploading to storage. Although these expensive steps are done asynchronously, it leads to contention for the GIL with the trainer threads. This causes the GPU utilization (QPS) to suffer significantly and also increases the e2e upload latency. For large-scale checkpoints, the overhead of the CPU parallel processing has a suppressive effect on net GPU training speed since CPUs also drive the training process via GPU kernel launches. + +Please refer to the following figure from our experiments: + +![chart](/assets/images/6x-faster-async-checkpointing/fg2.png){:style="width:100%"} + + +*Fig 2: One can see a sustained drop in training QPS even after staging (i.e. blocking operation to trainer) is complete.* + +The first dip in Figure 2 (marked by the purple line) indicates that staging is complete, and training can continue. However, a second drop is evident (marked by the area between the purple and yellow lines) which is due to trainer thread and checkpointing threads contending for the Python GIL, leading to degraded training QPS until the checkpoint thread completes execution. + + +### Collective communications cost: + +DCP performs multiple collectives today for various reasons: dedupe, global metadata for the checkpoint, resharding, and distributed exception handling. Collectives are costly as these require network I/O and pickling/unpickling of the large metadata being sent across the GPU network. These collectives become extremely expensive as the job scale grows, leading to significantly higher e2e latency and potential for collective timeouts. 
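
For context, a minimal sketch of how an asynchronous save is issued from the training loop (a single-rank toy setup here for brevity; the model and checkpoint path are illustrative) might look like:

```
import os
import torch
import torch.distributed as dist
import torch.distributed.checkpoint as dcp

# Illustrative single-rank sketch (hypothetical path): async_save stages the
# state_dict off the trainer, returns a Future, and persists it in the
# background while training continues.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

model = torch.nn.Linear(16, 16)
state_dict = {"model": model.state_dict()}

future = dcp.async_save(state_dict, checkpoint_id="/tmp/ckpt/step_100")
# ... training iterations keep running here, unblocked ...
future.result()  # wait for the background write before issuing the next save

dist.destroy_process_group()
```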
+ + +## Solutions + + +### Process based async checkpointing + +DCP now supports async checkpoint save via a background process. This helps avoid the training QPS drop by eliminating the python GIL contention with the trainer threads. Please see Fig 2 for checkpointing via threads and Fig 3 for checkpointing via background process. + + +### Caching of the save plans + +DCP has a clear boundary between the planning and storage I/O steps. SavePlanner in DCP is a stateful component which acts as an access proxy to the state_dict. Planner manages save plans prepared by individual ranks, which carry metadata information necessary to do the write I/O. The planning step involves a collective operation to gather a comprehensive view of the checkpoint on the coordinator rank. The coordinator rank is responsible for de-duplicating parameters/weights to eliminate redundancies, validating the global plan to ensure accuracy and consistency, and creating the global metadata structs. This is followed by a scatter collective where the coordinator rank assigns I/O tasks to each rank. Any transformations done on the plans affect how the storage components finally write the data. + +During the course of a training job, multiple checkpoints are saved. In the majority of these cases, only the checkpoint data changes between different save instances, and thus, the plan remains the same. This presented an opportunity for us to cache the plans, pay the planning cost only on the first save, and then amortize that cost across all the subsequent attempts. Only the updated plans (plans which changed in the next attempt) are sent via collective, thus reducing the collective overhead significantly. + + +## Experiment Results + +**Set up:** 1856 H200 GPUs, Llama3-70B, HSDP2 with TorchTitan + +After deploying both the solutions above, the following are the key results: + +* TPS drop has significantly narrowed, with a peak dip to 372 vs 315 tps, and for a greatly reduced time window (~67 seconds vs ~437 seconds). This time window is now mostly attributed to the blocking for CPU processing. +* Subsequent checkpoint save attempts also continue to be much faster due to very low overhead at the planning stage. E2E latency is thus improved by over 6.5x. This will allow our partners to increase the checkpointing frequency and reduce the lost training progress (i.e. wasted training time). + +If you look at the very first downspike in Figure 1, this drawdown in GPU processing time takes training throughput from 700 down to 320 tps, and suppresses it for roughly 7 minutes (467 seconds). Once the CPUs have finished processing, training continues again at full speed. + +Previously, this ~7 minute suppression would be repeated at *every* checkpoint. However, with the new process-based checkpointing feature, only the first checkpoint has the full drawdown time (mainly due to overhead from daemon process initialization), as all future checkpoints are executed via the background process, mitigating GIL contention with the trainer threads. + +This is visually shown in all the subsequent checkpoints where the average MFU suppression time drops to just over a minute, reflected by the sharp spikes that almost immediately revert to full MFU throughput. 
+ + +![chart](/assets/images/6x-faster-async-checkpointing/fg3.png){:style="width:100%"} + + +*Fig 3: The red box shows the non-cached plan checkpoint, which also includes Checkpoint Background Init process overhead, while the purple box highlights the first checkpoint to run with the cached plan.* + +This means that even large-scale checkpointing, such as shown in Fig 2 at 1856 GPU scale, can be done with ~6x reduced training throughput impact. This enables Asynchronous DCP checkpointing to be run more frequently (thus better rollback protection) while enhancing total training throughput relative to previous Async Checkpointing overhead. + +**Using DCP’s cached checkpointing:** + +This feature is already available as part of the PyTorch nightly builds, and you can test out PyTorch’s Asynchronous DCP checkpointing directly in TorchTitan. Following are the instructions to enable these features: + +* Process-based asynchronous checkpointing: + * Set the **async_checkpointer_type** to AsyncCheckpointerType.PROCESS in the [async_save](https://github.com/pytorch/pytorch/blob/main/torch/distributed/checkpoint/state_dict_saver.py#L193) API. (*file*: pytorch/torch/distributed/checkpoint/state_dict_saver.py) +* Save plan caching: + * Set the **enable_plan_caching** flag to true in the [DefaultSavePlanner](https://github.com/pytorch/pytorch/blob/main/torch/distributed/checkpoint/default_planner.py#L78C9-L78C28). (*file*: pytorch/torch/distributed/checkpoint/default_planner.py) + + +## Future work + +DCP will be rolling out additional optimizations to further improve the checkpointing cost. Currently even though the save plans are cached, coordinator rank still prepares the metadata. For larger jobs and models with many tensors, this overhead is non-trivial. In the next iteration, DCP will eliminate the metadata overhead and improve the e2e latency further. DCP will also introduce additional optimizations, such as zero-overhead checkpointing, to enable efficient checkpointing in large-scale jobs. + +Stay tuned! diff --git a/_posts/2025-04-30-flexattention-for-inference.md b/_posts/2025-04-30-flexattention-for-inference.md new file mode 100644 index 000000000000..587aedf2158a --- /dev/null +++ b/_posts/2025-04-30-flexattention-for-inference.md @@ -0,0 +1,380 @@ +--- +layout: blog_detail +title: "FlexAttention Part II: FlexAttention for Inference" +author: Joy Dong, Boyuan Feng, Driss Guessous, Joel Schlosser, Yanbo Liang, Horace He +--- + +## Overview + +In PyTorch 2.5.0 release, we introduced [FlexAttention](https://pytorch.org/blog/flexattention/) `torch.nn.attention.flex_attention` for ML researchers who’d like to customize their attention kernels without writing kernel code. This blog introduces our decoding backend optimized for inference, supporting GQA and PagedAttention, along with feature updates including nested jagged tensor support, performance tuning guides and trainable biases support. + +If you’re looking for an easy way to play around with FlexAttention in your post-training / inference pipeline, PyTorch native post-training library [torchtune](https://github.com/pytorch/torchtune) and inference codebase [gpt-fast](https://github.com/pytorch-labs/gpt-fast) already have FlexAttention integrated. Try it out! + +We are excited to share that our paper on FlexAttention has been accepted for presentation at the MLSys2025 Conference held from May 12-15th in Santa Clara, California. 
+ +Title: **FlexAttention: A Programming Model for Generating Optimized Attention Kernels.** [Poster](https://mlsys.org/virtual/2025/poster/3007) + + +## FlexAttention for Inference + +TL;DR: `torch.compile` lowers `flex_attention` to a fused [FlashDecoding](https://pytorch.org/blog/flash-decoding/) kernel when it runs on a very short query. + +One fused attention kernel does not suit all – especially in long-context LLM inference. + +The decoding phase of LLM inference is an iterative process: tokens are generated one at a time, requiring `N` forward passes to generate an `N`-token sentence. Fortunately, each iteration doesn’t need to recompute self-attention over the full sentence — previously calculated tokens are cached, therefore we only need to attend the newly generated token to the cached context. + + +![chart](/assets/images/flexattention-for-inference/fg1.png){:style="width:100%"} + + +This results in a unique attention pattern where a short query sequence (1 token) attends to a long key-value cache (context length up to 128k). Traditional optimizations for square attention kernels (`q_len ≈ kv_len`) don’t directly apply here. This pattern poses new challenges for GPU memory utilization and occupancy. We build a dedicated FlexDecoding backend optimized for long-context LLM inference incorporating decoding-specific techniques from [FlashDecoding](https://pytorch.org/blog/flash-decoding/). + +FlexDecoding is implemented as an alternative backend for the `torch.nn.attention.flex_attention `operator. `flex_attention` automatically switches to the FlexDecoding backend for its JIT compilation when given a short query and a long KV cache. If the input shape changes significantly, for example transitioning from the prefill phase to decoding, JIT recompilation generates a separate kernel for each scenario. + +``` +flex_attention = torch.compile(flex_attention) + +k_cache = torch.random(B, H, 16384, D) +v_cache = torch.random(B, H, 16384, D) + +... + +# Prefill Phase: query shape = [B, H, 8000, D] +flex_attention(q_prefill, k_cache, v_cache, ...) # Uses FlexAttention backend optimized for prefill & training + +# Decoding Phase: q_last_token shape = [B, H, 1, D] +flex_attention(q_last_token , k_cache, v_cache, ...) # Recompiles with the FlexDecoding backend + +# decode 2 tokens at the same time: q_last_2_tokens shape = [B, H, 2, D] +flex_attention(q_last_2_tokens, k_cache, v_cache, ...) # No recompilation needed! Runs the decoding kernel again. +``` + + +## Working with KV Cache + +One of the key optimizations for efficient inference is maintaining a preallocated KV cache that updates **in place** as new tokens are generated. Instead of enforcing a specific KV cache policy with a dedicated API, FlexDecoding allows users to define and manage the KV cache themselves. + +Similar to FlexAttention, FlexDecoding takes user-defined `mask_mod` and `score_mod` functions. These functions modify attention scores before the softmax operation. + +![chart](/assets/images/flexattention-for-inference/fg2.png){:style="width:100%"} + +``` +score_mod(score, b, h, q_idx, kv_idx) -> tensor # return updated score +``` + +Score is a scalar pytorch tensor that represents the dot product of a query token and a key token. 
The rest of the arguments specify which score is being computed: + + + +* `b` batch index +* `h` attention head index +* `q_idx` token position in query tensor +* `kv_idx` token position in key/value tensor + +In the decoding phase, previously calculated tokens are cached, and only the latest generated token (i-th) is used as the query. A naive causal mask on this one token query looks like this: + +``` +def causal(score, b, h, q_idx, kv_idx): + return torch.where(q_idx >= kv_idx, score, -float("inf")) +``` + + +![chart](/assets/images/flexattention-for-inference/fg3.png){:style="width:100%"} + + +This is problematic: the new token “*saw*” should attend to all previously generated tokens i.e. “*The cat sat on the mat and saw*”, not just the first entry in the kv cache. To correct this, the `score_mod` needs to **offset q_idx** **by i **for accurate decoding. + + +![chart](/assets/images/flexattention-for-inference/fg4.png){:style="width:100%"} + + +Creating a new `score_mod` for each token to accommodate the offset is slow since it means FlexAttention needs to be recompiled every iteration for a different `score_mod`. Instead, + +We define this `offset` as a tensor and increment its value at each iteration: + +``` +offset = torch.tensor(i, "cuda") +def causal_w_offset(score, b, h, q_idx, kv_idx): + return torch.where(q_idx + offset >= kv_idx, score, -float("inf")) + +# Attend the i-th token +flex_attention(..., score_mod=causal_w_offset ) # Compiles the kernel here +... +# Attend the i+1-th token +offset = offset + 1 # Increment offset +flex_attention(..., score_mod=causal_w_offset ) # Doesn't need to recompile! +``` + +Notably, here `offset` becomes a captured tensor and it does not need to recompile if `offset` changes values. + +Manually rewriting your `score_mod` and `mask_mod` for offset handling isn't necessary. We can automate this process with a generic rewriter: + +``` +offset = torch.tensor(i, "cuda") + +def get_score_mod_w_offset(score_mod: _score_mod_signature, _offset: tensor): + def _score_mod(score, b, h, q, kv): + return score_mod(score, b, h, q + _offset, kv) + return _score_mod + +def get_mask_mod_w_offset(mask_mod: _mask_mod_signature, _offset: tensor): + def _mask_mod(b, h, q, kv): + return mask_mod(b, h, q + _offset, kv) + return _mask_mod + +causal_w_offset = get_score_mod_w_offset(causal, offset) +``` + +## BlockMask for Inference + +We can also use BlockMask with inference to leverage mask sparsity. The idea is to precompute the BlockMask once during model setup and use slices of it during decoding + + +### Precomputing BlockMask + +During setup, we create a squared BlockMask for `MAX_SEQ_LEN x MAX_SEQ_LEN`: + +``` +from torch.nn.attention.flex_attention import create_block_mask + +def causal_mask(b, h, q_idx, kv_idx): + return q_idx >= kv_idx + +block_mask = create_block_mask(causal_mask, B=None, H=None, Q_LEN=MAX_SEQ_LEN,KV_LEN=MAX_SEQ_LEN) +``` + +![chart](/assets/images/flexattention-for-inference/fg5.png){:style="width:100%"} + + +### Using BlockMask During Decoding + +For the i-th token, we use a slice of the mask: + +``` +block_offset = i // block_mask.BLOCK_SIZE[0] +block_mask_slice = block_mask[:, :, block_offset] + +# don't forget to use the mask_mod with offset! 
+block_mask_slice.mask_mod = get_mask_mod_w_offset(causal_mask) +``` + +![chart](/assets/images/flexattention-for-inference/fg6.png){:style="width:100%"} + + +## Performance + + +![chart](/assets/images/flexattention-for-inference/fg7.png){:style="width:100%"} + +FlexDecoding kernel performs on par with FlashDecoding (FAKV) and significantly outperforms pytorch scaled_dot_product_attention ([code](https://github.com/pytorch/pytorch/blob/main/benchmarks/transformer/score_mod.py)). + + +![chart](/assets/images/flexattention-for-inference/fg8.png){:style="width:100%"} + +FlexDecoding boosts LLaMa3.1-8B serving performance by 1.22x-2.04x, and LLaMa3.1-70B performance by 0.99x - 1.66x compared to SDPA in gpt-fast. ([code](https://github.com/pytorch-labs/gpt-fast)) + + +## Paged Attention + +[vLLM](https://blog.vllm.ai/2023/06/20/vllm.html) is one of the popular LLM serving engines, powered by the efficient memory management from PagedAttention. Existing [PagedAttention](https://github.com/vllm-project/vllm/blob/main/csrc/attention/paged_attention_v2.cu) implementation requires dedicated CUDA kernels and shows limited flexibility on supporting emerging attention variants. In this section, we present a PT2-native PagedAttention implementation that is enabled by flex attention and torch.compile. + +PagedAttention scatters KV cache to reduce memory fragmentation and support higher batch sizes. Without PagedAttention, KV cache from the same request are stored in a contiguous memory, requiring 2 tensor of shape *B x H x KV LEN x D*. We call it a logical KV cache. Here, KV_LEN is the maximum sequence length over all requests in a batch. Considering the Figure 1(a), KV_LEN is 9 thus all requests must be padded to 9 tokens, leading to large memory waste. With PagedAttention, we can chunk each request into multiple pages of the same size page_size and scatter these pages into a physical KV cache of shape *1 x H x max seq len x D*, where max_seq_len=n_pages x page_size. This avoids padding requests to the same length and saves memory. Specifically, we provide an `assign` API to update KV cache via index computations: + +``` +def assign( + batch_idx: torch.Tensor, + input_pos: torch.Tensor, + k_val: torch.Tensor, + v_val: torch.Tensor, + k_cache: torch.Tensor, + v_cache: torch.Tensor, +) -> None +``` + +Behind this `assign` API is a page table, a tensor mapping logical KV cache to physical KV cache: + +[batch_idx, logical_page_idx] -> physical_page_idx + +`assign` takes `k_val` and `v_val` and scatters to physical KV cache guided by the mapping from the page table. + + +![chart](/assets/images/flexattention-for-inference/fg9.png){:style="width:100%"} + + +**Paged Attention with Page Table** + +A natural question is, how to integrate PagedAttention with flex attention to support diverse attention variants? A naive idea is to materialize the logical KV cache before computing with flex attention. But this leads to redundant memory copy and bad performance. Another idea is to build a dedicated CUDA or Triton kernel for paged attention, similar to [existing PagedAttention implementation](https://github.com/vllm-project/vllm/blob/main/csrc/attention/paged_attention_v2.cu). However, this adds much manual effort and code complexity. + +Instead, we design a fused indirect memory access by converting a logical block mask according to the page table. In FlexAttention, we exploit BlockMask to identify logical blocks and skip redundant computation. 
+A natural question is: how do we integrate PagedAttention with flex attention to support diverse attention variants? A naive idea is to materialize the logical KV cache before computing with flex attention, but this leads to redundant memory copies and poor performance. Another idea is to build a dedicated CUDA or Triton kernel for paged attention, similar to the [existing PagedAttention implementation](https://github.com/vllm-project/vllm/blob/main/csrc/attention/paged_attention_v2.cu). However, this adds significant manual effort and code complexity.
+
+Instead, we design a fused indirect memory access by converting the logical block mask according to the page table. In FlexAttention, we exploit the BlockMask to identify logical blocks and skip redundant computation. While Paged Attention adds an extra layer of indirect memory access, we can further convert the logical block mask to the physical block mask corresponding to the page table, as illustrated in Figure 2. Our PagedAttention implementation provides a `convert_logical_block_mask` API built on torch.gather calls:
+
+```
+def convert_logical_block_mask(
+    block_mask: BlockMask,
+    batch_idx: Optional[torch.Tensor] = None,
+) -> BlockMask
+```
+
+![chart](/assets/images/flexattention-for-inference/fg10.png){:style="width:100%"}
+
+**Paged Attention via Block Mask Conversion**
+
+One remaining question is how to rewrite user-specified `mask_mod` and `score_mod` for PagedAttention. When users specify these modifications, they write them against logical indices, without knowledge of the page table maintained at runtime. The following code shows the automated conversion performed at runtime to rewrite user-specified modifications with physical KV indices. `new_mask_mod` takes the physical_kv_idx, converts it back to the logical_kv_idx, and applies the user-specified `mask_mod` on the logical_kv_idx to produce the correct mask. For efficiency, we maintain physical_to_logical, a mapping from physical_kv_block to logical_kv_block, to facilitate the conversion. For correctness, we mask out-of-boundary blocks as False with a `torch.where` call: after batching logical KV caches from multiple requests into the same physical KV cache, there are many more physical blocks than logical blocks for each request, so a physical block may not have a corresponding logical block for a specific request during block mask conversion. Masking such blocks as False ensures that data from different requests do not interfere with each other. Similarly, we can convert the [score_mod](https://github.com/pytorch/pytorch/blob/main/torch/nn/attention/experimental/_paged_attention.py#L308-L338) automatically.
+
+```
+def get_mask_mod(mask_mod: Optional[_mask_mod_signature]) -> _mask_mod_signature:
+    if mask_mod is None:
+        mask_mod = noop_mask
+
+    def new_mask_mod(
+        b: torch.Tensor,
+        h: torch.Tensor,
+        q_idx: torch.Tensor,
+        physical_kv_idx: torch.Tensor,
+    ):
+        physical_kv_block = physical_kv_idx // page_size
+        physical_kv_offset = physical_kv_idx % page_size
+        logical_block_idx = physical_to_logical[b, physical_kv_block]
+        logical_kv_idx = logical_block_idx * page_size + physical_kv_offset
+        return torch.where(
+            logical_block_idx >= 0, mask_mod(b, h, q_idx, logical_kv_idx), False
+        )
+
+    return new_mask_mod
+```
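+
+Putting the pieces together, a minimal usage sketch might look like the following. The cache tensors and the `batch_idx` argument are placeholders, and the exact call pattern of the experimental API may differ:
+
+```
+# Hypothetical sketch: build the logical mask once, convert it for the
+# current batch's page table, and run flex_attention directly against the
+# physical (paged) KV cache.
+logical_block_mask = create_block_mask(causal_mask, B=None, H=None, Q_LEN=MAX_SEQ_LEN, KV_LEN=MAX_SEQ_LEN)
+physical_block_mask = convert_logical_block_mask(logical_block_mask, batch_idx)
+
+out = flex_attention(query, k_cache_physical, v_cache_physical, block_mask=physical_block_mask)
+```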
+Figure 3 shows the latency of Paged Attention ([code](https://github.com/pytorch-labs/attention-gym/blob/main/attn_gym/paged_attention/latency.py)). Overall, there is less than 5% overhead from Flex Attention with Paged Attention compared with Flex Attention alone. We also observe performance on par with Flash Attention v2. A [minimal serving example](https://github.com/pytorch-labs/attention-gym/blob/main/attn_gym/paged_attention/throughput.py) further shows that PagedAttention can support a 76x higher batch size when evaluated on the [OpenOrca dataset](https://huggingface.co/datasets/Open-Orca/OpenOrca), which includes 1M GPT-4 completions and 3.2M GPT-3.5 completions.
+
+![chart](/assets/images/flexattention-for-inference/fg11.png){:style="width:100%"}
+
+**Paged Attention: Latency under diverse sequence length**
+
+## Ragged input sequences with Nested Jagged Tensors (NJTs)
+
+FlexAttention now supports ragged-sized input sequences through the use of Nested Jagged Tensors (NJTs). NJTs represent ragged-sized sequences by packing them into a single “stacked sequence” and maintaining a set of offsets delimiting sequence boundaries for each batch item.
+
+A block mask can be created for input NJTs through the new `create_nested_block_mask()` API. The returned block mask is compatible with the ragged structure of the given NJT, treating it as a single “stacked sequence” with inter-sequence attention automatically masked out. The `mask_mod` or `score_mod` function can be written as usual.
+
+```
+from torch.nn.attention.flex_attention import create_nested_block_mask, flex_attention
+
+BATCH = 8
+NUM_HEADS = 8
+D = 16
+device = "cuda"
+
+# Input NJTs of shape (BATCH, SEQ_LEN*, D) with ragged SEQ_LEN
+sequence_lengths = [torch.randint(5, 30, ()).item() for _ in range(BATCH)]
+query = torch.nested.nested_tensor([
+    torch.randn(seq_len, NUM_HEADS * D, device=device)
+    for seq_len in sequence_lengths
+], layout=torch.jagged)
+key = torch.randn_like(query)
+value = torch.randn_like(query)
+
+# View as shape (BATCH, NUM_HEADS, SEQ_LEN*, HEAD_DIM)
+query = query.unflatten(-1, [NUM_HEADS, D]).transpose(1, 2)
+key = key.unflatten(-1, [NUM_HEADS, D]).transpose(1, 2)
+value = value.unflatten(-1, [NUM_HEADS, D]).transpose(1, 2)
+
+# Simple causal mask
+def my_mask_mod(b, h, q_idx, kv_idx):
+    return q_idx >= kv_idx
+
+# Construct a block mask using the ragged structure of the
+# specified query NJT. Ragged-sized sequences are treated as a single
+# "stacked sequence" with inter-sequence attention masked out.
+block_mask = create_nested_block_mask(my_mask_mod, 1, 1, query)
+
+# For cross attention, create_nested_block_mask() also supports a
+# rectangular block mask using the ragged structures of both query / key.
+# block_mask = create_nested_block_mask(my_mask_mod, 1, 1, query, key)
+
+output = flex_attention(query, key, value, block_mask=block_mask)
+```
+
+## Trainable Biases
+
+FlexAttention now supports trainable parameters in `score_mod` functions. This feature enables users to reference tensors that require gradients within their `score_mod` implementations, with gradients automatically backpropagating through these parameters during training.
+
+### Memory-Efficient Gradient Accumulation
+
+Instead of materializing the full attention scores matrix, FlexAttention uses atomic additions (`tl.atomic_add`) to accumulate gradients. This approach significantly reduces memory usage at the cost of introducing some non-determinism in gradient calculations.
+
+### Handling Broadcasted Operations
+
+Broadcasting operations in the forward pass (e.g., `score + bias[h]`) require special consideration in the backward pass. When broadcasting a tensor across multiple attention scores within a head or other dimensions, we need to reduce these gradients back to the original tensor shape. Rather than materializing the full attention score matrix to perform this reduction, we use atomic operations. While this incurs some runtime overhead, it allows us to maintain memory efficiency by avoiding the materialization of large intermediate tensors.
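+
+As a plain-PyTorch illustration of this reduction (ordinary autograd rather than the fused Triton kernel), a per-head bias that is broadcast over the batch, query, and key/value dimensions receives a gradient that is summed back over all of those dimensions:
+
+```
+import torch
+
+B, H, Q, KV = 2, 4, 8, 8
+scores = torch.randn(B, H, Q, KV)
+bias = torch.randn(H, requires_grad=True)
+
+# Broadcast the per-head bias across batch, q_idx, and kv_idx.
+out = scores + bias[None, :, None, None]
+out.sum().backward()
+
+# Autograd reduces the broadcasted gradient back to the bias' original
+# shape; the fused kernel reaches the same result with atomic adds instead
+# of materializing the full score matrix.
+assert bias.grad.shape == bias.shape  # (H,)
+```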
+
+### Current Limitations
+
+The implementation currently allows only a single read from each input tensor in the `score_mod` function. For example, `bias[q_idx] + bias[kv_idx]` would not be supported as it reads from the same tensor twice. We hope to remove this restriction in the future.
+
+### Simple Example
+
+```
+bias = torch.randn(num_heads, requires_grad=True)
+def score_mod(score, b, h, q_idx, kv_idx):
+    return score + bias[h]
+```
+
+## Performance Tuning for FlexAttention
+
+### TL;DR
+
+For optimal performance, compile FlexAttention using `max-autotune`, especially when dealing with complex `score_mod`s and `mask_mod`s:
+
+flex_attention = torch.compile(flex_attention, dynamic=True, mode='max-autotune')
+
+### What is `max-autotune`?
+
+`max-autotune` is a `torch.compile` mode in which TorchInductor sweeps many kernel parameters (e.g., tile size, `num_stages`) and selects the best-performing configuration. During this sweep, configurations that fail are simply discarded, so the search reliably lands on the best viable configuration.
+
+While compilation takes longer with `max-autotune`, the optimal configuration is cached for future kernel executions.
+
+Here’s an example of FlexAttention compiled with `max-autotune`:
+
+```
+triton_flex_attention_backward_7 0.2528 ms 100.0% BLOCKS_ARE_CONTIGUOUS=False, BLOCK_M1=32, BLOCK_M2=32, BLOCK_N1=32, BLOCK_N2=32, FLOAT32_PRECISION="'ieee'", GQA_SHARED_HEADS=7, HAS_FULL_BLOCKS=False, IS_DIVISIBLE=False, OUTPUT_LOGSUMEXP=True, PRESCALE_QK=False, QK_HEAD_DIM=128, ROWS_GUARANTEED_SAFE=False, SM_SCALE=0.08838834764831843, SPARSE_KV_BLOCK_SIZE=1073741824, SPARSE_Q_BLOCK_SIZE=1073741824, V_HEAD_DIM=128, num_stages=4, num_warps=4
+```
+
+### Why Use `max-autotune` for FlexAttention?
+
+The amount of shared memory used by FlexAttention depends on the `score_mod` and `mask_mod` methods. This variability means that the preconfigured default kernel parameters may lead to performance cliffs or even out-of-shared-memory errors on certain hardware for some masks/mods.
+
+For instance, with document masks, the default configuration can halve GPU occupancy, reducing performance to ~75% of its potential on some GPUs. To avoid such issues, we strongly recommend enabling `max-autotune`.
+
+## Updates and Enhancements
+
+* Now available as a prototype feature in PyTorch 2.5.0
+* Fixed critical correctness issues, including a bug affecting multiple calls to FlexAttention within the same call to torch.compile
+
+## Expanded Architecture Support
+
+* Arbitrary sequence length support - no longer requires multiples of 128
+* Added native grouped-query attention (GQA) support via `is_gqa=True`
+* Enhanced dimension flexibility:
+    * Different QK and V head dimensions
+    * Non-power-of-two head dimensions
+* Trainable attention biases (prototype)
+
+## Under the Hood
+
+* New fused CPU backend
+* Improved TF32 handling for float32 inputs
+* Resolved various dynamic shape issues
+* Output layout matching query strides
+
+These updates make FlexAttention more robust and flexible while maintaining its core promise of combining PyTorch's ease of use with FlashAttention's performance benefits.
\ No newline at end of file diff --git a/_posts/2025-05-01-docathon-2025.md b/_posts/2025-05-01-docathon-2025.md new file mode 100644 index 000000000000..1ad33370e775 --- /dev/null +++ b/_posts/2025-05-01-docathon-2025.md @@ -0,0 +1,54 @@ +--- +layout: blog_detail +title: 'Announcing the PyTorch Docathon 2025' +--- + +![PyTorch Docathon 2025](/assets/images/docathon-2025.png){:style="max-width:600px; display: block; margin-left: auto; margin-right: auto"} + + +We're thrilled to announce the [2025 PyTorch Docathon](https://community.linuxfoundation.org/events/details/lfhq-pytorch-foundation-presents-pytorch-docathon-june-3rd-18th-2025/)! This is a hackathon-style event aimed at enhancing PyTorch documentation with the support of the community. Documentation is a vital component of any technology, and by refining it, we can simplify the onboarding process for new users, help them effectively utilize PyTorch's features, and ultimately speed up the transition from research to production in machine learning. + + +## WHY PARTICIPATE + + +### Low Barrier to Entry + +Unlike many open-source projects that require deep knowledge of the codebase and previous contributions to join hackathon events, the Docathon is tailored for newcomers. While we expect participants to be familiar with Python, and have basic knowledge of PyTorch and machine learning, there are tasks related to website issues that don't even require that level of expertise. + + +### Tangible Results + +A major advantage of the Docathon is witnessing the immediate impact of your contributions. Enhancing documentation significantly boosts a project's usability and accessibility, and you'll be able to observe these improvements directly. Seeing tangible outcomes can also be a strong motivator to continue contributing. + + +### Collaborative Environment + +The Docathon fosters a collaborative atmosphere, offering you the chance to work alongside other contributors and PyTorch maintainers to improve the documentation. This is a fantastic opportunity to learn from peers, exchange ideas, and build connections. + + +### Learning Opportunities + +Even if you're not a PyTorch expert, the Docathon offers a valuable learning experience. You'll have the chance to delve into PyTorch modules, test tutorials on your machine, and explore them in the CI environment. + + +## WHO SHOULD PARTICIPATE + +Whether you’re a seasoned documentation expert or just starting out, we invite everyone to join in the PyTorch docathon to contribute and develop your skills and knowledge to help improve the documentation for everyone! We will have issues labelled by skill level, and the PyTorch Discord will be available for collaboration and help. + + +## EVENT DETAILS + + + +* June 3: Kick-off 10 AM PT +* June 4 - June 15: Submissions and Feedback +* June 16 - June 17: Final Reviews +* June 18: Winner Announcements + +Make sure to [RSVP](https://community.linuxfoundation.org/events/details/lfhq-pytorch-foundation-presents-pytorch-docathon-june-3rd-18th-2025/) to the event so you receive all the notifications and instructions on how to participate. + +Further details about the Docathon will be shared during the Kick-off call on June 3. 
+
+
+**Don't forget to register for this year's event: [RSVP now](https://community.linuxfoundation.org/events/details/lfhq-pytorch-foundation-presents-pytorch-docathon-june-3rd-18th-2025/)**
\ No newline at end of file
diff --git a/_posts/2025-05-01-how-ibm-uses-pt-terratorch.md b/_posts/2025-05-01-how-ibm-uses-pt-terratorch.md
new file mode 100644
index 000000000000..db6955023bc0
--- /dev/null
+++ b/_posts/2025-05-01-how-ibm-uses-pt-terratorch.md
@@ -0,0 +1,90 @@
+---
+layout: blog_detail
+title: 'How IBM Research Uses PyTorch and TerraTorch to Make Geospatial Computer Vision Accessible for Everyone'
+hidden: true
+---
+
+Earth Observation-based analytics are becoming essential for understanding our planet — from monitoring deforestation to tracking urban development and analyzing the impacts of climate change. However, the coding and deep learning skills for applying AI models to satellite imagery and earth observation data have traditionally been a major barrier for many practitioners.
+
+With the launch of TerraTorch 1.0, a PyTorch domain library for fine-tuning Geospatial Computer Vision Foundation Models, IBM Research is making geospatial AI not only more accessible but also more practical for the wider PyTorch community. Our goal: simplify the process so that any data scientist, researcher, or enthusiast can build powerful geospatial models with ease and with low GPU and data processing requirements.
+
+![globes](/assets/images/how-ibm-uses-pt-terratorch/fg1.png){:style="width:100%"}
+
+**The power of foundation models: even with 75-95% of the input data removed, the models do a fantastic job of reconstructing the input, thereby learning the underlying physics of our planet in a deep, latent space**
+
+## The Business Challenge
+
+Our goal was to remove the technical barriers that prevent people from working with satellite imagery, weather and climate data at scale. Together with NASA, we’ve developed the Prithvi family of foundation models. Integrating the latest innovations from AI research was made much easier by the clean API that PyTorch provides.
+
+We wanted to create a framework that anyone can use to go from raw data to inference-ready models in just a few steps.
+
+![globes](/assets/images/how-ibm-uses-pt-terratorch/fg2.png){:style="width:100%"}
+
+**How a weather and climate foundation model created and fine-tuned on PyTorch is used for weather forecasts**
+
+## How IBM Research Used PyTorch
+
+We’ve built TerraTorch on top of PyTorch, leveraging its dynamic ecosystem to integrate:
+
+* PyTorch Lightning for clean, scalable training loops
+* TorchGeo for geospatial data handling and transformations (PyTorch transforms)
+* For foundation models like the leading generative multimodal foundation model ['Terramind'](https://research.ibm.com/blog/terramind-esa-earth-observation-model), co-developed by IBM and ESA, and [the ‘Prithvi’ family](https://huggingface.co/ibm-nasa-geospatial), co-developed by IBM and NASA, TerraTorch has been used to fine-tune all of the downstream geospatial models for satellite imagery, weather and climate data. It includes the family of fine-tuned models that IBM has released as part of [Granite](https://huggingface.co/collections/ibm-granite/granite-geospatial-models-667dacfed21bdcf60a8bc982). In addition, other interesting foundation models and ecosystem components like Clay, SatMAE, Satlas, DeCur and DOFA are included in TerraTorch.
+* Powerful and state-of-the-art vision transformers to experiment with modern neural network architectures
+* TerraTorch-Iterate, built on top of PyTorch, Optuna, MLFlow and Ray Tune, for Hyperparameter Optimization (HPO), Neural Architecture Search (NAS) and Foundation Model Benchmarking (GeoBench), where TerraTorch became the reference implementation
+
+
+![flow diagram](/assets/images/how-ibm-uses-pt-terratorch/fg5.png){:style="width:100%"}
+
+**The fine-tuning and inference process is completely described in a single YAML config file. There, the architectural building blocks of the model (backbone, neck, decoder, head) are defined. The Model Factory assembles the model using the built-in and custom registries. In addition, the Optimizer and Data Modules are created as defined in the config. Finally, everything is passed to the Lightning Trainer, which executes the task.**
+
+
+With PyTorch’s flexibility, we were able to prototype quickly, iterate on model architectures, and deploy pipelines for a range of geospatial applications — from flood and biomass detection to increasing the resolution of climate data, where some of our work became part of the [IBM Granite Geospatial Model Family](https://huggingface.co/collections/ibm-granite/granite-geospatial-models-667dacfed21bdcf60a8bc982).
+
+
+![flow diagram](/assets/images/how-ibm-uses-pt-terratorch/fg3.png){:style="width:100%"}
+
+
+**Architecture of the Prithvi-EO-2.0-600M foundation model which IBM Research developed together with NASA**
+
+## Solving AI Challenges with PyTorch
+
+PyTorch helped us to tackle three major challenges:
+
+* Ease of experimentation: Dynamic computation graphs, automatic differentiation, full abstraction of CUDA and rich visualization tools made it simple to test different models and training strategies.
+* Scalability: With DDP, FSDP, PyTorch Lightning and TorchGeo, we could train models on large-scale datasets without worrying about infrastructure.
+* Community support: PyTorch - the de-facto standard in AI research - with its active community and excellent documentation made it easy to overcome hurdles and stay up to date with the latest advancements in AI research.
+
+## A Word from IBM Research
+
+*"PyTorch gave me the power to turn complex linear algebra and optimization problems into accessible, shareable solutions for the community.
It feels empowering that we’re building and fine-tuning models for anyone curious about understanding our planet through AI."* + +— Romeo Kienzler, AI Research Engineer at IBM Research Zurich, Rueschlikon + + +![quote](/assets/images/how-ibm-uses-pt-terratorch/fg4.png){:style="width:100%"} + + +## The Benefits of Using PyTorch + +Using PyTorch allowed us to: + + + +* Build a reproducible, open-source framework for fine-tuning geospatial foundation models +* Share our work with the community through easy-to-follow notebooks, TerraTorch configuration files, tutorials and model checkpoints on HuggingFace +* Rapidly iterate over foundation model architectures and deploy fine-tuned models for inference, from research to real-world client products + +## Learn More + +For more information about this project and to explore the code, visit: + +* [GitHub Repository](https://github.com/IBM/terratorch) +* [IBM Research: Simplifying Geospatial AI with TerraTorch 1.0](https://research.ibm.com/blog/simplifying-geospatial-ai-with-terra-torch-1-0) +* [TerraTorch PrithviEOv2 example notebooks](https://github.com/IBM/terratorch/tree/main/examples/tutorials/PrithviEOv2) +* [TerraMind example notebooks](https://github.com/IBM/terramind/tree/main/notebooks) +* [Run TerraMind using TerraTorch on Colab](https://colab.research.google.com/github/IBM/terramind/blob/main/notebooks/terramind_v1_base_sen1floods11.ipynb) diff --git a/_posts/2025-05-02-pt-day-france-featured-sessions.md b/_posts/2025-05-02-pt-day-france-featured-sessions.md new file mode 100644 index 000000000000..36bd9bacd37b --- /dev/null +++ b/_posts/2025-05-02-pt-day-france-featured-sessions.md @@ -0,0 +1,49 @@ +--- +layout: blog_detail +title: 'PyTorch Day France Featured Sessions: A Defining Moment for Open Source AI' +--- + +[PyTorch Day France](https://events.linuxfoundation.org/pytorch-day-france/) offers a front-row seat to the future of open source AI. Taking place **7 May at Station F in Paris** and co-located with **[GOSIM AI Paris](https://paris2025.gosim.org/)**, this one-day event will bring together developers, researchers, and industry leaders for a day of technical sessions, real-world insights, and community exchange. + + +## 🌍 A Major Milestone for the PyTorch Foundation + +This event marks the very first **PyTorch Day**, launching a new international series hosted annually in different regions to convene AI researchers, developers, engineers, and enthusiasts. PyTorch Days are designed to spotlight open source AI advancements, foster community collaboration, and provide a forum to learn about active, high-impact AI projects built using PyTorch. + +PyTorch Day France also represents a pivotal moment in the PyTorch Foundation’s journey. With its recent [expansion into an umbrella foundation]( https://pytorch.org/blog/pt-foundation-expands/), PyTorch is now positioned to support a broader ecosystem of trusted, community-driven AI projects across the full AI lifecycle. + +At PyTorch Day France, you’ll hear directly from PyTorch Foundation **Executive Director, Matt White,** about this transition—and get a first look at some exciting announcements. + + +## 🎟️ Registration Details + +[Register now](https://www.eventbrite.com/e/gosim-ai-paris-tickets-1265928669729?aff=oddtdtcreator) with code **PYTORCH** for **free access** to the full day of **PyTorch Day France** sessions, **plus** **GOSIM AI Paris**. + +🔗Two events, one registration—double the sessions, double the innovation. 
\ +[Register here](https://www.eventbrite.com/e/gosim-ai-paris-tickets-1265928669729?aff=oddtdtcreator) + + +## 📅 Featured Sessions + +The day’s agenda includes deep technical dives and applied AI use cases from across the community, including the following talks: + + + +* [Luca Antiga (Lightning AI)](https://sched.co/21nz4) + *Lightning Thunder: Supercharged PyTorch for Modern Hardware* +* [Erwan Gallen & Eldar Kurtic (Red Hat)](https://sched.co/21nyd) + *Scaling LLM Inference with vLLM: Multi‑Accelerator Serving and Quantized LLMs* +* [Pierre Rouanet (Pollen Robotics)](https://sched.co/21nyX) + *Real-World Robotics as the Next Frontier for AI?* +* [Pablo Montalvo (Hugging Face)](https://sched.co/21nzG) + *PyTorch x Transformers: Pythonicity, Autodiff, and Modularity Defining Modern AI* +* [Pedro Ortis (Common Crawl)](https://sched.co/21nym) + *Harnessing Common Crawl for AI and ML Applications* +* [Meriem Bendris (NVIDIA)](https://sched.co/21nys) + *Teaching Mistral to Reason: Post-Training with PyTorch and NVIDIA* +* [Olatunji Ruwase (Snowflake)](https://sched.co/21nyy) + *DeepSpeed – Efficient Training Scalability for Deep Learning Models* + +[View the full schedule](https://pytorchdayfrance2025.sched.com/). + +Whether you’re a contributor, practitioner, or simply curious about what’s ahead, PyTorch Day France is an opportunity to connect with the community and shape what’s next for our ecosystem. diff --git a/_posts/2025-05-02-pt-korea-user-group-recap.md b/_posts/2025-05-02-pt-korea-user-group-recap.md new file mode 100644 index 000000000000..b5a2126271b8 --- /dev/null +++ b/_posts/2025-05-02-pt-korea-user-group-recap.md @@ -0,0 +1,87 @@ +--- +layout: blog_detail +title: 'Recap of the PyTorch Korea User Group Meetup: A Technical Conference with a PyTorch Core Maintainer' +author: 'Jiho Kim, PyTorch Korea User Group' +--- + +At the end of March, the PyTorch Korea User Group hosted a special meetup that brought together prominent speakers for deep discussions on the PyTorch core and its broader ecosystem. With the event more than doubling in size compared to past gatherings, we were able to connect with even more developers and share insights. Huge thanks to [goorm](https://goorm.co/) for sponsoring the fantastic venue! 😄 + + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg1.jpg){:style="width:100%"} + + + +This recap is for those who couldn’t attend in person, as well as for participants who want to revisit the energy and insights of the day. The event featured experts in core PyTorch, AI accelerators, inference optimization, and large language model development. Below is a quick overview of the key sessions that anchored the conference. + + + +## 1️⃣ Jerry Lee | PyTorch Foundation + +Representing the PyTorch Foundation, part of the Linux Foundation, Jaeung provided an overview of how PyTorch is driving core open source technologies forward. He shared PyTorch's growth story, the many global projects currently in motion, and the ecosystem’s impressive 20%+ annual growth. The session also covered how the foundation operates, how member organizations are involved, and upcoming plans that are particularly useful for practitioners. + + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg2.jpg){:style="width:100%"} + + +## 2️⃣ Alban Desmaison | PyTorch Roadmap + +Alban shared the design philosophy behind PyTorch and Meta’s official contribution roadmap ([link](https://dev-discuss.pytorch.org/t/meta-pytorch-team-2025-h1-roadmaps/2794)). 
He provided a deep technical dive into the differences between Eager and Compiled modes, especially breaking down the backend architecture of device Eager execution. Practical tools and improvements were also introduced—such as memory profilers, enhanced custom operator support, and pinned memory optimizations. + + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg3.jpg){:style="width:100%"} + + + + +## 3️⃣ Hongseok Kim | PyTorch on Rebellions AI Accelerators: Status + +Rebellions is building runtime integration for their proprietary NPU architecture, fully aligned with the structural changes in PyTorch 2.0. This talk introduced the performance and scalability of their upcoming chip, their integration strategy with the PyTorch runtime, and challenges in supporting Eager Mode. Hongseok also previewed their roadmap toward releasing these features within the year. + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg4.jpg){:style="width:100%"} + + + +## 4️⃣ Kyujin Cho | Backend.AI: A Unified Platform for All AI Accelerators + +Backend.AI abstracts and integrates various AI accelerators into a unified workflow. As the diversity of accelerator architectures grows, the need for portability and infrastructure unification becomes even more important. This session showcased features across development and operations—from NPU scheduling and resource allocation to monitoring. Backend.AI currently supports accelerators from NVIDIA, Intel, Tenstorrent, Rebellions, and more. + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg5.jpg){:style="width:100%"} + + + +## 5️⃣ Taeho Kim | Optimizing & Deploying Models Across Multiple Chipsets Using NetsPresso + +This talk focused on the challenges of inference in real-world industrial applications of AI models. As new state-of-the-art models emerge rapidly, there’s a growing need for environments that can quickly validate device compatibility—ideally with one-click ease. NetsPresso is actively working on a static graph representation compatible with PyTorch, offering efficient support for model development, optimization, and testing. + + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg6.jpg){:style="width:100%"} + + +## 6️⃣ Jungyeop Lee | The Journey to Reproduce Deepseek-R1 + +Jungyeop took us through his journey of reproducing Deepseek, a large language model—an effort that involved 201 experiments. He shared real-world lessons from training with Korean data, tokenizer modifications, and fine-tuning strategies. His practical insights and next steps were especially valuable for those building or re-implementing large models from scratch. + + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg7.jpg){:style="width:100%"} + + +## 7️⃣ Sol Kim | A journey from TCP architecture to production-level LLMs + +Sol presented an integrated optimization approach to deploying large models using the TCP(Tensor Contraction Processor) architecture, which supports tensor contraction at the hardware level. The talk highlighted optimization techniques built on hardware abstraction layers (HALs) and bottom-up integration strategies with PyTorch—offering a hybrid hardware-software perspective. + + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg8.jpg){:style="width:100%"} + +## 💡 Panel Talk & Q&A 💡 + +The event wrapped up with an engaging panel discussion. Attendees asked sharp questions, and the speakers offered insightful answers. 
It was a powerful moment that captured the community’s enthusiasm for PyTorch and their hunger for deeper technical understanding. + + +![people at a conference](/assets/images/pt-korea-user-group-recap/fg9.jpg){:style="width:100%"} + + +## Final Thoughts + +Since our first offline meetup in October 2022, the PyTorch Korea User Group has held five major technical conferences. Each event deepens our appreciation for the scale and depth of the PyTorch ecosystem. With perspectives from users, contributors, and ecosystem builders, the stories we share are only growing—and we’re committed to continuing this journey together. + +See you at the next conference—with even more exciting talks to come! 🙌 \ No newline at end of file diff --git a/_sass/events.scss b/_sass/events.scss index 7397707b5df2..18e89c238ca2 100644 --- a/_sass/events.scss +++ b/_sass/events.scss @@ -37,6 +37,16 @@ } } } + .community-event { + margin: 0; + padding: 3px 10px; + border: 1px solid #8c8c8c; + border-radius: 3px; + text-transform: uppercase; + font-size: 14px; + font-weight: 700; + color: #8c8c8c; + } .event-side-nav-container { padding-left: 3rem; ul { diff --git a/_sass/navigation.scss b/_sass/navigation.scss index fd84dc74f890..420978c613c1 100644 --- a/_sass/navigation.scss +++ b/_sass/navigation.scss @@ -2,7 +2,7 @@ height: $mobile_header_height; @include full-nav-menu-desktop { - height: $desktop_header_height; + height: $desktop_header_height - 20px; } align-items: center; @@ -13,6 +13,9 @@ position: fixed; right: 0; top: 0; + @include full-nav-menu-desktop { + top: 32px; + } width: 100%; z-index: 9999; @@ -36,7 +39,7 @@ @include full-nav-menu-desktop { background-color: #CC2F90; color: $white; - display: none; + display: flex; letter-spacing: .34px; justify-content: center; padding: 4px 0; diff --git a/_videos/vid10.md b/_videos/vid10.md new file mode 100644 index 000000000000..faf1c637b5ae --- /dev/null +++ b/_videos/vid10.md @@ -0,0 +1,5 @@ +--- +title: 'Using PyTorch and DINOv2 for Multi-label Plant Species Classification' +youtube_id: rxVg3yrc51s +date: Mar 28, 2025 +--- diff --git a/_videos/vid11.md b/_videos/vid11.md new file mode 100644 index 000000000000..b7720dd02abb --- /dev/null +++ b/_videos/vid11.md @@ -0,0 +1,5 @@ +--- +title: 'PyTorch Expert Exchange – Multi-Modal Tabular Deep Learning with PyTorch Frame' +youtube_id: zPjLHf0X78w +date: Feb 20, 2025 +--- diff --git a/_videos/vid12.md b/_videos/vid12.md new file mode 100644 index 000000000000..f3ba5fc289fa --- /dev/null +++ b/_videos/vid12.md @@ -0,0 +1,5 @@ +--- +title: 'PyTorch 2.6 Release Live Q&A' +youtube_id: 1OopuwTq6oE +date: Feb 8, 2025 +--- diff --git a/_videos/vid13.md b/_videos/vid13.md new file mode 100644 index 000000000000..747642d8aea4 --- /dev/null +++ b/_videos/vid13.md @@ -0,0 +1,5 @@ +--- +title: 'How does batching work on modern GPUs?' +youtube_id: HTcnp9NEHGY +date: Nov 14, 2024 +--- diff --git a/announcement.html b/announcement.html index 9cc4d3985cfc..90eda81bc8d7 100644 --- a/announcement.html +++ b/announcement.html @@ -19,11 +19,13 @@

    PyTorch
           Foundation

    +

    Accelerating Open Source AI

    - Welcome to the PyTorch Foundation—a vibrant, collaborative hub created for and by the deep learning community. Here, developers, researchers, and industry leaders come together to shape and expand the open source PyTorch framework and ecosystem. Through a network of dedicated contributors to the PyTorch project, the PyTorch Foundation fuels discussion, innovation, and hands-on collaboration across the PyTorch landscape. + Welcome to the PyTorch Foundation—a vibrant, community-driven hub for open source AI. Developers, researchers, and industry pioneers collaborate here to advance the PyTorch framework and strengthen the open source AI ecosystem.

    - Community-driven collaboration is at the heart of PyTorch's growth and evolution. From advancing the core framework to building essential tools that power PyTorch at a production scale, your contributions are key to moving this ecosystem forward. As part of the Linux Foundation, the PyTorch community also supports a variety of initiatives: developer training, regional and local events, open source tooling, research, and guides for both newcomers and seasoned contributors—all to make your journey with PyTorch more accessible and rewarding. + From cutting-edge development to production-ready tools and libraries, the PyTorch Foundation thrives through transparent collaboration and collective innovation. As part of the Linux Foundation, we host global events, deliver specialized training, support research, and provide resources to accelerate your AI journey. + Whether you are contributing code, sharing your expertise, or deploying real-world AI solutions, the PyTorch Foundation actively empowers you to shape the future of accessible and impactful open source AI.

    @@ -46,127 +48,18 @@

    Our Guiding Principles

    -

    Premier Members

    -
    - {% for card in cards %} - {% assign card_title = card.title | split: ' ' %} - - {% endfor %} -
    -
    -
    -
    -
    +

    PyTorch Members

    -
    -
    -
    -
    -

    General Members

    -
    - - - -
    -
    -
    -
    -
    - -
    -
    -
    -
    -

    Associate Members

    - + + + + + + + +
    @@ -179,7 +72,10 @@

    PROM, IIT Rajasthan

    Our Governance

    - The PyTorch Foundation’s Governing Board oversees the Foundation’s activities according to its Guiding Principles and the PyTorch Foundation Charter. + The PyTorch Foundation’s Governing Board oversees the Foundation’s activities according to its Guiding Principles and the PyTorch Foundation Charter. +
    +
    + The PyTorch Foundation Code of Conduct details our commitment to fostering an inclusive, welcoming, and safe environment for everyone involved in the PyTorch Foundation community.

    The technical governance structure for the PyTorch open source project is defined by the PyTorch maintainers and is available on our PyTorch Technical Governance page. diff --git a/assets/brand-guidelines/PyTorch Foundation Charter.pdf b/assets/brand-guidelines/PyTorch Foundation Charter.pdf deleted file mode 100644 index 3a2d110c15bd..000000000000 Binary files a/assets/brand-guidelines/PyTorch Foundation Charter.pdf and /dev/null differ diff --git a/assets/images/6x-faster-async-checkpointing/fg1.png b/assets/images/6x-faster-async-checkpointing/fg1.png new file mode 100644 index 000000000000..4c3d3bf50d02 Binary files /dev/null and b/assets/images/6x-faster-async-checkpointing/fg1.png differ diff --git a/assets/images/6x-faster-async-checkpointing/fg2.png b/assets/images/6x-faster-async-checkpointing/fg2.png new file mode 100644 index 000000000000..1eaddbc43e68 Binary files /dev/null and b/assets/images/6x-faster-async-checkpointing/fg2.png differ diff --git a/assets/images/6x-faster-async-checkpointing/fg3.png b/assets/images/6x-faster-async-checkpointing/fg3.png new file mode 100644 index 000000000000..4c3d3bf50d02 Binary files /dev/null and b/assets/images/6x-faster-async-checkpointing/fg3.png differ diff --git a/assets/images/accelerating-generative-ai-2.jpg b/assets/images/accelerating-generative-ai-2.jpg new file mode 100644 index 000000000000..d2bddef62d8f Binary files /dev/null and b/assets/images/accelerating-generative-ai-2.jpg differ diff --git a/assets/images/accelerating-training-float8-rowwise-crusoe/fg1.png b/assets/images/accelerating-training-float8-rowwise-crusoe/fg1.png new file mode 100644 index 000000000000..7dcf02db043e Binary files /dev/null and b/assets/images/accelerating-training-float8-rowwise-crusoe/fg1.png differ diff --git a/assets/images/accelerating-training-float8-rowwise-crusoe/fg2.png b/assets/images/accelerating-training-float8-rowwise-crusoe/fg2.png new file mode 100644 index 000000000000..2245f96c5fff Binary files /dev/null and b/assets/images/accelerating-training-float8-rowwise-crusoe/fg2.png differ diff --git a/assets/images/accelerating-training-float8-rowwise-crusoe/fg3.png b/assets/images/accelerating-training-float8-rowwise-crusoe/fg3.png new file mode 100644 index 000000000000..e5797aedd0ca Binary files /dev/null and b/assets/images/accelerating-training-float8-rowwise-crusoe/fg3.png differ diff --git a/assets/images/accelerating-training-float8-rowwise-crusoe/fg4.png b/assets/images/accelerating-training-float8-rowwise-crusoe/fg4.png new file mode 100644 index 000000000000..3adae3b02e6b Binary files /dev/null and b/assets/images/accelerating-training-float8-rowwise-crusoe/fg4.png differ diff --git a/assets/images/accelerating-training-float8-rowwise-crusoe/fg5.png b/assets/images/accelerating-training-float8-rowwise-crusoe/fg5.png new file mode 100644 index 000000000000..7dcf02db043e Binary files /dev/null and b/assets/images/accelerating-training-float8-rowwise-crusoe/fg5.png differ diff --git a/assets/images/accelerating-training-float8-rowwise-crusoe/fg6.png b/assets/images/accelerating-training-float8-rowwise-crusoe/fg6.png new file mode 100644 index 000000000000..9c77b71f5d4f Binary files /dev/null and b/assets/images/accelerating-training-float8-rowwise-crusoe/fg6.png differ diff --git a/assets/images/accelerating-training-float8-rowwise-crusoe/fg7.png b/assets/images/accelerating-training-float8-rowwise-crusoe/fg7.png new file mode 100644 index 000000000000..35695c3de6d0 Binary files /dev/null and 
b/assets/images/accelerating-training-float8-rowwise-crusoe/fg7.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg1.png b/assets/images/activation-checkpointing-techniques/fg1.png new file mode 100644 index 000000000000..e4805cb40ea6 Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg1.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg10.png b/assets/images/activation-checkpointing-techniques/fg10.png new file mode 100644 index 000000000000..91bd1c909173 Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg10.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg11.png b/assets/images/activation-checkpointing-techniques/fg11.png new file mode 100644 index 000000000000..d4fa91fb677c Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg11.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg12.png b/assets/images/activation-checkpointing-techniques/fg12.png new file mode 100644 index 000000000000..e6c1679433dd Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg12.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg13.png b/assets/images/activation-checkpointing-techniques/fg13.png new file mode 100644 index 000000000000..ea5a5cbe0bf8 Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg13.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg14.png b/assets/images/activation-checkpointing-techniques/fg14.png new file mode 100644 index 000000000000..cc20d543962d Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg14.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg2.png b/assets/images/activation-checkpointing-techniques/fg2.png new file mode 100644 index 000000000000..00c20f76c09a Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg2.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg3.png b/assets/images/activation-checkpointing-techniques/fg3.png new file mode 100644 index 000000000000..412639ab92b8 Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg3.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg4.png b/assets/images/activation-checkpointing-techniques/fg4.png new file mode 100644 index 000000000000..5b4af130db49 Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg4.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg5.png b/assets/images/activation-checkpointing-techniques/fg5.png new file mode 100644 index 000000000000..d4cdc3202836 Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg5.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg6.png b/assets/images/activation-checkpointing-techniques/fg6.png new file mode 100644 index 000000000000..919609dbabce Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg6.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg7.png b/assets/images/activation-checkpointing-techniques/fg7.png new file mode 100644 index 000000000000..bbddbd9bf91a Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg7.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg8.png 
b/assets/images/activation-checkpointing-techniques/fg8.png new file mode 100644 index 000000000000..42b413e2118f Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg8.png differ diff --git a/assets/images/activation-checkpointing-techniques/fg9.png b/assets/images/activation-checkpointing-techniques/fg9.png new file mode 100644 index 000000000000..a4b748ead8e9 Binary files /dev/null and b/assets/images/activation-checkpointing-techniques/fg9.png differ diff --git a/assets/images/autonomous-language-model-systems.png b/assets/images/autonomous-language-model-systems.png new file mode 100644 index 000000000000..06b75fe2c6be Binary files /dev/null and b/assets/images/autonomous-language-model-systems.png differ diff --git a/assets/images/docathon-2025.png b/assets/images/docathon-2025.png new file mode 100644 index 000000000000..aad9c70d1f36 Binary files /dev/null and b/assets/images/docathon-2025.png differ diff --git a/assets/images/executorch-chip-logo.svg b/assets/images/executorch-chip-logo.svg new file mode 100644 index 000000000000..11e5ed60956b --- /dev/null +++ b/assets/images/executorch-chip-logo.svg @@ -0,0 +1,205 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/images/flexattention-for-inference/fg1.png b/assets/images/flexattention-for-inference/fg1.png new file mode 100644 index 000000000000..c42a3bf5717f Binary files /dev/null and b/assets/images/flexattention-for-inference/fg1.png differ diff --git a/assets/images/flexattention-for-inference/fg10.png b/assets/images/flexattention-for-inference/fg10.png new file mode 100644 index 000000000000..70d9e441b97c Binary files /dev/null and b/assets/images/flexattention-for-inference/fg10.png differ diff --git a/assets/images/flexattention-for-inference/fg11.png b/assets/images/flexattention-for-inference/fg11.png new file mode 100644 index 000000000000..94697c426b7e Binary files /dev/null and b/assets/images/flexattention-for-inference/fg11.png differ diff --git a/assets/images/flexattention-for-inference/fg2.png b/assets/images/flexattention-for-inference/fg2.png new file mode 100644 index 000000000000..47ae6ab99d26 Binary files /dev/null and b/assets/images/flexattention-for-inference/fg2.png differ diff --git a/assets/images/flexattention-for-inference/fg3.png b/assets/images/flexattention-for-inference/fg3.png new file mode 100644 index 000000000000..06bc61656d47 Binary files /dev/null and b/assets/images/flexattention-for-inference/fg3.png differ diff --git a/assets/images/flexattention-for-inference/fg4.png b/assets/images/flexattention-for-inference/fg4.png new file mode 100644 index 000000000000..b78a15172977 Binary files /dev/null and b/assets/images/flexattention-for-inference/fg4.png differ diff --git a/assets/images/flexattention-for-inference/fg5.png b/assets/images/flexattention-for-inference/fg5.png new file mode 100644 index 000000000000..dbb7081efe98 Binary files /dev/null and b/assets/images/flexattention-for-inference/fg5.png differ diff --git a/assets/images/flexattention-for-inference/fg6.png b/assets/images/flexattention-for-inference/fg6.png new file mode 100644 index 000000000000..d2221e66d982 Binary files /dev/null and b/assets/images/flexattention-for-inference/fg6.png differ diff --git a/assets/images/flexattention-for-inference/fg7.png b/assets/images/flexattention-for-inference/fg7.png new file mode 100644 index 000000000000..6ec36ad490c5 Binary files /dev/null and 
b/assets/images/flexattention-for-inference/fg7.png differ diff --git a/assets/images/flexattention-for-inference/fg8.png b/assets/images/flexattention-for-inference/fg8.png new file mode 100644 index 000000000000..a6c6a5227db8 Binary files /dev/null and b/assets/images/flexattention-for-inference/fg8.png differ diff --git a/assets/images/flexattention-for-inference/fg9.png b/assets/images/flexattention-for-inference/fg9.png new file mode 100644 index 000000000000..8187641ba4b5 Binary files /dev/null and b/assets/images/flexattention-for-inference/fg9.png differ diff --git a/assets/images/governing-board/Dwarak-Rajagopal.jpg b/assets/images/governing-board/Dwarak-Rajagopal.jpg deleted file mode 100644 index dc8d4a070614..000000000000 Binary files a/assets/images/governing-board/Dwarak-Rajagopal.jpg and /dev/null differ diff --git a/assets/images/governing-board/dwarakrajagopal2.jpg b/assets/images/governing-board/dwarakrajagopal2.jpg new file mode 100644 index 000000000000..9036b956605d Binary files /dev/null and b/assets/images/governing-board/dwarakrajagopal2.jpg differ diff --git a/assets/images/governing-board/ricardo-aravena.jpg b/assets/images/governing-board/ricardo-aravena.jpg new file mode 100644 index 000000000000..4c76381a73cf Binary files /dev/null and b/assets/images/governing-board/ricardo-aravena.jpg differ diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg1.png b/assets/images/how-ibm-uses-pt-terratorch/fg1.png new file mode 100644 index 000000000000..140186a272cf Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg1.png differ diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg2.png b/assets/images/how-ibm-uses-pt-terratorch/fg2.png new file mode 100644 index 000000000000..7a37b893773d Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg2.png differ diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg3.png b/assets/images/how-ibm-uses-pt-terratorch/fg3.png new file mode 100644 index 000000000000..bcbe77ea9eca Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg3.png differ diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg4.png b/assets/images/how-ibm-uses-pt-terratorch/fg4.png new file mode 100644 index 000000000000..798947a41f20 Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg4.png differ diff --git a/assets/images/how-ibm-uses-pt-terratorch/fg5.png b/assets/images/how-ibm-uses-pt-terratorch/fg5.png new file mode 100644 index 000000000000..a8306bf3ed84 Binary files /dev/null and b/assets/images/how-ibm-uses-pt-terratorch/fg5.png differ diff --git a/assets/images/landscape.jpg b/assets/images/landscape.jpg new file mode 100644 index 000000000000..b9702fdb895f Binary files /dev/null and b/assets/images/landscape.jpg differ diff --git a/assets/images/members/common-crawl-logo.svg b/assets/images/members/common-crawl-logo.svg new file mode 100644 index 000000000000..2a9efcd9ef62 --- /dev/null +++ b/assets/images/members/common-crawl-logo.svg @@ -0,0 +1,52 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/images/members/dodlf-logo.jpg b/assets/images/members/dodlf-logo.jpg new file mode 100644 index 000000000000..c153de2adb11 Binary files /dev/null and b/assets/images/members/dodlf-logo.jpg differ diff --git a/assets/images/members/iabfu-logo.svg b/assets/images/members/iabfu-logo.svg new file mode 100644 index 000000000000..ac630fa9079e --- /dev/null +++ b/assets/images/members/iabfu-logo.svg @@ -0,0 +1,265 @@ + 
+ + + +LOGO-01 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/images/members/iit-logo.png b/assets/images/members/iit-logo.png new file mode 100644 index 000000000000..1cdf841f2aa2 Binary files /dev/null and b/assets/images/members/iit-logo.png differ diff --git a/assets/images/members/rensselaer-logo.png b/assets/images/members/rensselaer-logo.png new file mode 100644 index 000000000000..cc30e72e6df4 Binary files /dev/null and b/assets/images/members/rensselaer-logo.png differ diff --git a/assets/images/openreg.png b/assets/images/openreg.png new file mode 100644 index 000000000000..71fab0973309 Binary files /dev/null and b/assets/images/openreg.png differ diff --git a/assets/images/optimize-llms.png b/assets/images/optimize-llms.png new file mode 100644 index 000000000000..ba6e73cf4899 Binary files /dev/null and b/assets/images/optimize-llms.png differ diff --git a/assets/images/peak-performance-minimized-memory/fg1.png b/assets/images/peak-performance-minimized-memory/fg1.png new file mode 100644 index 000000000000..175eadfbe04d Binary files /dev/null and b/assets/images/peak-performance-minimized-memory/fg1.png differ diff --git a/assets/images/peak-performance-minimized-memory/fg2.png b/assets/images/peak-performance-minimized-memory/fg2.png new file mode 100644 index 000000000000..365dfa313c7d Binary files /dev/null and b/assets/images/peak-performance-minimized-memory/fg2.png differ diff --git a/assets/images/peak-performance-minimized-memory/fg3.png b/assets/images/peak-performance-minimized-memory/fg3.png new file mode 100644 index 000000000000..6d28237582f5 Binary files /dev/null and b/assets/images/peak-performance-minimized-memory/fg3.png differ diff --git a/assets/images/peak-performance-minimized-memory/fg4.png b/assets/images/peak-performance-minimized-memory/fg4.png new file mode 100644 index 000000000000..3685c1c81f98 Binary files /dev/null and b/assets/images/peak-performance-minimized-memory/fg4.png differ diff --git a/assets/images/pt-day-cfp.png b/assets/images/pt-day-cfp.png new file mode 100644 index 000000000000..f8f6a849f3ab Binary files /dev/null and b/assets/images/pt-day-cfp.png differ diff --git a/assets/images/pt-day-china-2025-cfp.jpg b/assets/images/pt-day-china-2025-cfp.jpg new file mode 100644 index 000000000000..d42c377175a5 Binary files /dev/null and b/assets/images/pt-day-china-2025-cfp.jpg differ diff --git a/assets/images/pt-dinov2-multi-label-plant-species-classification.png b/assets/images/pt-dinov2-multi-label-plant-species-classification.png new file mode 100644 index 000000000000..c544f914043a Binary files /dev/null and b/assets/images/pt-dinov2-multi-label-plant-species-classification.png differ diff --git a/assets/images/pt-fedora-os-communities/fg1.jpg b/assets/images/pt-fedora-os-communities/fg1.jpg new file mode 100644 index 000000000000..e9c0de7b24ef Binary files /dev/null and b/assets/images/pt-fedora-os-communities/fg1.jpg differ diff --git a/assets/images/pt-fedora-os-communities/fg2.jpg b/assets/images/pt-fedora-os-communities/fg2.jpg new file mode 100644 index 000000000000..1aa340f71de9 Binary files /dev/null and b/assets/images/pt-fedora-os-communities/fg2.jpg differ diff --git a/assets/images/pt-fedora-os-communities/fg3.jpg b/assets/images/pt-fedora-os-communities/fg3.jpg new file mode 100644 index 000000000000..11ff09aaff08 Binary files /dev/null and 
b/assets/images/pt-fedora-os-communities/fg3.jpg differ diff --git a/assets/images/pt-fedora-os-communities/fg4.jpg b/assets/images/pt-fedora-os-communities/fg4.jpg new file mode 100644 index 000000000000..008d80e99dd4 Binary files /dev/null and b/assets/images/pt-fedora-os-communities/fg4.jpg differ diff --git a/assets/images/pt-fedora-os-communities/fg5.jpg b/assets/images/pt-fedora-os-communities/fg5.jpg new file mode 100644 index 000000000000..8761774d551b Binary files /dev/null and b/assets/images/pt-fedora-os-communities/fg5.jpg differ diff --git a/assets/images/pt-fedora-os-communities/fg6.jpg b/assets/images/pt-fedora-os-communities/fg6.jpg new file mode 100644 index 000000000000..9d06bd98d994 Binary files /dev/null and b/assets/images/pt-fedora-os-communities/fg6.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg1.jpg b/assets/images/pt-korea-user-group-recap/fg1.jpg new file mode 100644 index 000000000000..dbd408d6baf8 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg1.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg1.png b/assets/images/pt-korea-user-group-recap/fg1.png new file mode 100644 index 000000000000..02caf444eb0b Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg1.png differ diff --git a/assets/images/pt-korea-user-group-recap/fg2.jpg b/assets/images/pt-korea-user-group-recap/fg2.jpg new file mode 100644 index 000000000000..924780bb6a3c Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg2.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg2.png b/assets/images/pt-korea-user-group-recap/fg2.png new file mode 100644 index 000000000000..3153c75f603c Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg2.png differ diff --git a/assets/images/pt-korea-user-group-recap/fg3.jpg b/assets/images/pt-korea-user-group-recap/fg3.jpg new file mode 100644 index 000000000000..b49e18e82f9d Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg3.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg3.png b/assets/images/pt-korea-user-group-recap/fg3.png new file mode 100644 index 000000000000..ed9e8ccadb3e Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg3.png differ diff --git a/assets/images/pt-korea-user-group-recap/fg4.jpg b/assets/images/pt-korea-user-group-recap/fg4.jpg new file mode 100644 index 000000000000..b89e21c71d78 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg4.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg4.png b/assets/images/pt-korea-user-group-recap/fg4.png new file mode 100644 index 000000000000..4831610f703d Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg4.png differ diff --git a/assets/images/pt-korea-user-group-recap/fg5.jpg b/assets/images/pt-korea-user-group-recap/fg5.jpg new file mode 100644 index 000000000000..5e41bbd65b32 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg5.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg5.png b/assets/images/pt-korea-user-group-recap/fg5.png new file mode 100644 index 000000000000..02e5b01ac852 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg5.png differ diff --git a/assets/images/pt-korea-user-group-recap/fg6.jpg b/assets/images/pt-korea-user-group-recap/fg6.jpg new file mode 100644 index 000000000000..ce7c789c2385 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg6.jpg differ diff --git 
a/assets/images/pt-korea-user-group-recap/fg6.png b/assets/images/pt-korea-user-group-recap/fg6.png new file mode 100644 index 000000000000..5e5650ab16cf Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg6.png differ diff --git a/assets/images/pt-korea-user-group-recap/fg7.jpg b/assets/images/pt-korea-user-group-recap/fg7.jpg new file mode 100644 index 000000000000..e0b7f94f2a3b Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg7.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg7.png b/assets/images/pt-korea-user-group-recap/fg7.png new file mode 100644 index 000000000000..3371ea3c6073 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg7.png differ diff --git a/assets/images/pt-korea-user-group-recap/fg8.jpg b/assets/images/pt-korea-user-group-recap/fg8.jpg new file mode 100644 index 000000000000..8d57c2a30bc4 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg8.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg8.png b/assets/images/pt-korea-user-group-recap/fg8.png new file mode 100644 index 000000000000..4ba3b87290b3 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg8.png differ diff --git a/assets/images/pt-korea-user-group-recap/fg9.jpg b/assets/images/pt-korea-user-group-recap/fg9.jpg new file mode 100644 index 000000000000..e631c2a29545 Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg9.jpg differ diff --git a/assets/images/pt-korea-user-group-recap/fg9.png b/assets/images/pt-korea-user-group-recap/fg9.png new file mode 100644 index 000000000000..02d9224b8c7d Binary files /dev/null and b/assets/images/pt-korea-user-group-recap/fg9.png differ diff --git a/assets/images/pt27qa.png b/assets/images/pt27qa.png new file mode 100644 index 000000000000..dbc60c8fcd0e Binary files /dev/null and b/assets/images/pt27qa.png differ diff --git a/assets/images/pytorch-2-7-intel-gpus/fg1.png b/assets/images/pytorch-2-7-intel-gpus/fg1.png new file mode 100644 index 000000000000..a0b4ee57da90 Binary files /dev/null and b/assets/images/pytorch-2-7-intel-gpus/fg1.png differ diff --git a/assets/images/pytorch-2-7-intel-gpus/fg2.png b/assets/images/pytorch-2-7-intel-gpus/fg2.png new file mode 100644 index 000000000000..cb39643891c1 Binary files /dev/null and b/assets/images/pytorch-2-7-intel-gpus/fg2.png differ diff --git a/assets/images/pytorch-at-gtc.jpg b/assets/images/pytorch-at-gtc.jpg new file mode 100644 index 000000000000..7380cfb0125a Binary files /dev/null and b/assets/images/pytorch-at-gtc.jpg differ diff --git a/assets/images/scaling-recommendation-2d-sparse-parallelism/fg1.png b/assets/images/scaling-recommendation-2d-sparse-parallelism/fg1.png new file mode 100644 index 000000000000..08674e9efda3 Binary files /dev/null and b/assets/images/scaling-recommendation-2d-sparse-parallelism/fg1.png differ diff --git a/assets/images/scaling-recommendation-2d-sparse-parallelism/fg2.png b/assets/images/scaling-recommendation-2d-sparse-parallelism/fg2.png new file mode 100644 index 000000000000..45b60ca30c15 Binary files /dev/null and b/assets/images/scaling-recommendation-2d-sparse-parallelism/fg2.png differ diff --git a/assets/images/sglang-join-pytorch/fg1.png b/assets/images/sglang-join-pytorch/fg1.png new file mode 100644 index 000000000000..a7838c59ac6e Binary files /dev/null and b/assets/images/sglang-join-pytorch/fg1.png differ diff --git a/assets/images/sglang-join-pytorch/fg2.png b/assets/images/sglang-join-pytorch/fg2.png new file mode 
100644 index 000000000000..5e7e3b1d1f0a Binary files /dev/null and b/assets/images/sglang-join-pytorch/fg2.png differ diff --git a/assets/images/submit-to-speak/fg1.png b/assets/images/submit-to-speak/fg1.png new file mode 100644 index 000000000000..37ce1065eddc Binary files /dev/null and b/assets/images/submit-to-speak/fg1.png differ diff --git a/assets/images/submit-to-speak/fg2.jpg b/assets/images/submit-to-speak/fg2.jpg new file mode 100644 index 000000000000..b8e0a6f2cbfc Binary files /dev/null and b/assets/images/submit-to-speak/fg2.jpg differ diff --git a/assets/images/submit-to-speak/fg3.jpg b/assets/images/submit-to-speak/fg3.jpg new file mode 100644 index 000000000000..7beba8019952 Binary files /dev/null and b/assets/images/submit-to-speak/fg3.jpg differ diff --git a/assets/pytorch-foundation-charter-04052023.pdf b/assets/pytorch-foundation-charter-04052023.pdf deleted file mode 100644 index 3a2d110c15bd..000000000000 Binary files a/assets/pytorch-foundation-charter-04052023.pdf and /dev/null differ diff --git a/assets/pytorch-foundation-charter.pdf b/assets/pytorch-foundation-charter.pdf new file mode 100644 index 000000000000..7dac6f5ac972 Binary files /dev/null and b/assets/pytorch-foundation-charter.pdf differ diff --git a/assets/pytorch-frame-expert-exchange.pdf b/assets/pytorch-frame-expert-exchange.pdf new file mode 100644 index 000000000000..6930f03c3ccb Binary files /dev/null and b/assets/pytorch-frame-expert-exchange.pdf differ diff --git a/assets/quick-start-module.js b/assets/quick-start-module.js index 2a3e6116a969..f8fa31902fcd 100644 --- a/assets/quick-start-module.js +++ b/assets/quick-start-module.js @@ -11,8 +11,8 @@ var archInfoMap = new Map([ ['accnone', {title: "CPU", platforms: new Set(['linux', 'macos', 'windows'])}] ]); -let version_map={"nightly": {"accnone": ["cpu", ""], "cuda.x": ["cuda", "11.8"], "cuda.y": ["cuda", "12.4"], "cuda.z": ["cuda", "12.6"], "rocm5.x": ["rocm", "6.3"]}, "release": {"accnone": ["cpu", ""], "cuda.x": ["cuda", "11.8"], "cuda.y": ["cuda", "12.4"], "cuda.z": ["cuda", "12.6"], "rocm5.x": ["rocm", "6.2.4"]}} -let stable_version="Stable (2.6.0)"; +let version_map={"nightly": {"accnone": ["cpu", ""], "cuda.x": ["cuda", "12.6"], "cuda.y": ["cuda", "12.8"], "cuda.z": ["cuda", "12.9"], "rocm5.x": ["rocm", "6.4"]}, "release": {"accnone": ["cpu", ""], "cuda.x": ["cuda", "11.8"], "cuda.y": ["cuda", "12.6"], "cuda.z": ["cuda", "12.8"], "rocm5.x": ["rocm", "6.3"]}} +let stable_version="Stable (2.7.1)"; var default_selected_os = getAnchorSelectedOS() || getDefaultSelectedOS(); var opts = { @@ -267,7 +267,7 @@ $("[data-toggle='cloud-dropdown']").on("click", function(e) { }); function commandMessage(key) { - var object = {"preview,pip,linux,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,linux,cuda.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", "preview,pip,linux,cuda.y,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", "preview,pip,linux,cuda.z,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "preview,pip,linux,rocm5.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", "preview,conda,linux,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,linux,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,linux,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,linux,rocm5.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,linux,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,libtorch,linux,accnone,cplusplus": "Download here (Pre-cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip
    Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.x,cplusplus": "Download here (Pre-cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu118/libtorch-shared-with-deps-latest.zip
    Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu118/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.y,cplusplus": "Download here (Pre-cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu124/libtorch-shared-with-deps-latest.zip
    Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu124/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.z,cplusplus": "Download here (Pre-cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu126/libtorch-shared-with-deps-latest.zip
    Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,rocm5.x,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/rocm6.3/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,pip,macos,cuda.x,python": "# CUDA is not available on MacOS, please use default package
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,cuda.y,python": "# CUDA is not available on MacOS, please use default package
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,cuda.z,python": "# CUDA is not available on MacOS, please use default package
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,rocm5.x,python": "# ROCm is not available on MacOS, please use default package
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,conda,macos,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,macos,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,macos,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,macos,rocm5.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,macos,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,libtorch,macos,accnone,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.y,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.z,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,rocm5.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,pip,windows,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,windows,cuda.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", "preview,pip,windows,cuda.y,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", "preview,pip,windows,cuda.z,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "preview,pip,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "preview,conda,windows,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,windows,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,windows,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,conda,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "preview,conda,windows,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "preview,libtorch,windows,accnone,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-latest.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.x,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-latest.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.y,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/nightly/cu124/libtorch-win-shared-with-deps-latest.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/nightly/cu124/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.z,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-latest.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,rocm5.x,cplusplus": "NOTE: ROCm is not available on Windows", "stable,pip,linux,accnone,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "stable,pip,linux,cuda.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "stable,pip,linux,cuda.y,python": "pip3 install torch torchvision torchaudio", "stable,pip,linux,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "stable,pip,linux,rocm5.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", "stable,conda,linux,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,linux,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,linux,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,linux,rocm5.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,linux,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,libtorch,linux,accnone,cplusplus": "Download here (Pre-cxx11 ABI):
    https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-2.6.0%2Bcpu.zip
    Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcpu.zip", "stable,libtorch,linux,cuda.x,cplusplus": "Download here (Pre-cxx11 ABI):
    https://download.pytorch.org/libtorch/cu118/libtorch-shared-with-deps-2.6.0%2Bcu118.zip
    Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcu118.zip", "stable,libtorch,linux,cuda.y,cplusplus": "Download here (Pre-cxx11 ABI):
    https://download.pytorch.org/libtorch/cu124/libtorch-shared-with-deps-2.6.0%2Bcu124.zip
    Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/cu124/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcu124.zip", "stable,libtorch,linux,cuda.z,cplusplus": "Download here (Pre-cxx11 ABI):
    https://download.pytorch.org/libtorch/cu126/libtorch-shared-with-deps-2.6.0%2Bcu126.zip
    Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/cu126/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcu126.zip", "stable,libtorch,linux,rocm5.x,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/rocm6.2.4/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Brocm6.2.4.zip", "stable,pip,macos,cuda.x,python": "# CUDA is not available on MacOS, please use default package
    pip3 install torch torchvision torchaudio", "stable,pip,macos,cuda.y,python": "# CUDA is not available on MacOS, please use default package
    pip3 install torch torchvision torchaudio", "stable,pip,macos,cuda.z,python": "# CUDA is not available on MacOS, please use default package
    pip3 install torch torchvision torchaudio", "stable,pip,macos,rocm5.x,python": "# ROCm is not available on MacOS, please use default package
    pip3 install torch torchvision torchaudio", "stable,pip,macos,accnone,python": "pip3 install torch torchvision torchaudio", "stable,conda,macos,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,macos,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,macos,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,macos,rocm5.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,macos,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,libtorch,macos,accnone,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.6.0.zip", "stable,libtorch,macos,cuda.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.6.0.zip", "stable,libtorch,macos,cuda.y,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.6.0.zip", "stable,libtorch,macos,cuda.z,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.6.0.zip", "stable,libtorch,macos,rocm5.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.6.0.zip", "stable,pip,windows,accnone,python": "pip3 install torch torchvision torchaudio", "stable,pip,windows,cuda.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "stable,pip,windows,cuda.y,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124", "stable,pip,windows,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "stable,pip,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "stable,conda,windows,cuda.x,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,windows,cuda.y,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,windows,cuda.z,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,conda,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "stable,conda,windows,accnone,python": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", "stable,libtorch,windows,accnone,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.6.0%2Bcpu.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-2.6.0%2Bcpu.zip", "stable,libtorch,windows,cuda.x,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.6.0%2Bcu118.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-debug-2.6.0%2Bcu118.zip", "stable,libtorch,windows,cuda.y,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/cu124/libtorch-win-shared-with-deps-2.6.0%2Bcu124.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/cu124/libtorch-win-shared-with-deps-debug-2.6.0%2Bcu124.zip", "stable,libtorch,windows,cuda.z,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-2.6.0%2Bcu126.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-debug-2.6.0%2Bcu126.zip", "stable,libtorch,windows,rocm5.x,cplusplus": "NOTE: ROCm is not available on Windows"}; + var object = {"preview,pip,linux,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,linux,cuda.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "preview,pip,linux,cuda.y,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "preview,pip,linux,cuda.z,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "preview,pip,linux,rocm5.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4", "preview,libtorch,linux,accnone,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.x,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.y,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu128/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,cuda.z,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/cu129/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,libtorch,linux,rocm5.x,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/nightly/rocm6.4/libtorch-cxx11-abi-shared-with-deps-latest.zip", "preview,pip,macos,cuda.x,python": "# CUDA is not available on MacOS, please use default package
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,cuda.y,python": "# CUDA is not available on MacOS, please use default package
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,cuda.z,python": "# CUDA is not available on MacOS, please use default package
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,rocm5.x,python": "# ROCm is not available on MacOS, please use default package
    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,macos,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,libtorch,macos,accnone,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.y,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,cuda.z,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,libtorch,macos,rocm5.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "preview,pip,windows,accnone,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "preview,pip,windows,cuda.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "preview,pip,windows,cuda.y,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "preview,pip,windows,cuda.z,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "preview,pip,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "preview,libtorch,windows,accnone,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-latest.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.x,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-latest.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.y,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-latest.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,cuda.z,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/nightly/cu129/libtorch-win-shared-with-deps-latest.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/nightly/cu129/libtorch-win-shared-with-deps-debug-latest.zip", "preview,libtorch,windows,rocm5.x,cplusplus": "NOTE: ROCm is not available on Windows", "stable,pip,linux,accnone,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "stable,pip,linux,cuda.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "stable,pip,linux,cuda.y,python": "pip3 install torch torchvision torchaudio", "stable,pip,linux,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "stable,pip,linux,rocm5.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", "stable,libtorch,linux,accnone,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcpu.zip", "stable,libtorch,linux,cuda.x,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu118.zip", "stable,libtorch,linux,cuda.y,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/cu126/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu126.zip", "stable,libtorch,linux,cuda.z,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/cu128/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu128.zip", "stable,libtorch,linux,rocm5.x,cplusplus": "Download here (cxx11 ABI):
    https://download.pytorch.org/libtorch/rocm6.3/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Brocm6.3.zip", "stable,pip,macos,cuda.x,python": "# CUDA is not available on MacOS, please use default package
    pip3 install torch torchvision torchaudio", "stable,pip,macos,cuda.y,python": "# CUDA is not available on MacOS, please use default package
    pip3 install torch torchvision torchaudio", "stable,pip,macos,cuda.z,python": "# CUDA is not available on MacOS, please use default package
    pip3 install torch torchvision torchaudio", "stable,pip,macos,rocm5.x,python": "# ROCm is not available on MacOS, please use default package
    pip3 install torch torchvision torchaudio", "stable,pip,macos,accnone,python": "pip3 install torch torchvision torchaudio", "stable,libtorch,macos,accnone,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,libtorch,macos,cuda.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,libtorch,macos,cuda.y,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,libtorch,macos,cuda.z,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,libtorch,macos,rocm5.x,cplusplus": "Download arm64 libtorch here (ROCm and CUDA are not supported):
    https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "stable,pip,windows,accnone,python": "pip3 install torch torchvision torchaudio", "stable,pip,windows,cuda.x,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "stable,pip,windows,cuda.y,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "stable,pip,windows,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "stable,pip,windows,rocm5.x,python": "NOTE: ROCm is not available on Windows", "stable,libtorch,windows,accnone,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.7.1%2Bcpu.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-2.7.1%2Bcpu.zip", "stable,libtorch,windows,cuda.x,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.7.1%2Bcu118.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu118.zip", "stable,libtorch,windows,cuda.y,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-2.7.1%2Bcu126.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu126.zip", "stable,libtorch,windows,cuda.z,cplusplus": "Download here (Release version):
    https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-2.7.1%2Bcu128.zip
    Download here (Debug version):
    https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu128.zip", "stable,libtorch,windows,rocm5.x,cplusplus": "NOTE: ROCm is not available on Windows"}; if (!object.hasOwnProperty(key)) { $("#command").html( diff --git a/autonomous-language-model-systems.html b/autonomous-language-model-systems.html new file mode 100644 index 000000000000..3b065fafb852 --- /dev/null +++ b/autonomous-language-model-systems.html @@ -0,0 +1,46 @@ +--- +layout: default +title: "Towards Autonomous Language Model Systems" +body-class: announcement +background-class: announcement-background +permalink: /autonomous-language-model-systems +--- + +
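For context on the quick-start-module.js hunk above: commandMessage() resolves a comma-joined key of the form build,package,os,accelerator,language (for example "stable,pip,linux,cuda.z,python") against the command table, which this change refreshes for the CUDA 12.6/12.8/12.9 and ROCm 6.4 nightlies, the 2.7.1 stable binaries, and the removal of the pre-cxx11 ABI libtorch links. Below is a rough, self-contained sketch of that lookup with a few entries copied from the updated table; the helper function and its fallback message are illustrative only and are not code from this PR.

// Minimal sketch, assuming a trimmed-down command table whose entries are taken
// verbatim from the updated quick-start table above.
const installCommands = {
  "stable,pip,linux,cuda.y,python": "pip3 install torch torchvision torchaudio",
  "stable,pip,linux,cuda.z,python": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128",
  "preview,pip,linux,rocm5.x,python": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4"
};

// Hypothetical helper (not part of quick-start-module.js): joins the visitor's
// selections into the comma-separated key format that commandMessage() checks
// with hasOwnProperty before rendering a command.
function lookupCommand(build, pkg, os, accelerator, language) {
  const key = [build, pkg, os, accelerator, language].join(",");
  return Object.prototype.hasOwnProperty.call(installCommands, key)
    ? installCommands[key]
    : "Selection not supported; see the options on the get-started page.";
}

// Example: resolves to the cu128 wheel index introduced by this change.
console.log(lookupCommand("stable", "pip", "linux", "cuda.z", "python"));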

    +
    +
    +

    PyTorch Webinars

    +
    +
    +
    + +
    +
    +
    +
    + Towards Autonomous Language Model Systems +

    Towards Autonomous Language Model Systems

    +

    + Date: May 21, 2025, 11AM PT / 2PM ET +
    + Speaker: Ofir Press +
    +
    + Language models (LMs) are increasingly used to assist users in day-to-day tasks such as programming (Github Copilot) or search (Google's AI Overviews). But can we build language model systems that are able to autonomously complete entire tasks end-to-end? +

    + +In this talk, Ofir Press will discuss efforts to build autonomous LM systems, focusing on the software engineering domain. Ofir will present SWE-bench, a novel method for measuring AI systems on their abilities to fix real issues in popular software libraries. Ofir will then discuss SWE-agent, a system for solving SWE-bench tasks. +

    + +SWE-bench and SWE-agent are used by many leading AI organizations in academia and industry, including OpenAI, Anthropic, Meta, and Google, and SWE-bench has been downloaded over 2 million times. These projects show that academics on tight budgets can have a substantial impact in steering the research community toward building autonomous systems that can complete challenging tasks. +

    + +Ofir is a postdoc at Princeton University, where they mainly work with Karthik Narasimhan's lab. Ofir previously completed their PhD at the University of Washington in Seattle, where Ofir was advised by Noah Smith. During their PhD, Ofir spent two years at Facebook AI Research Labs on Luke Zettlemoyer's team. +

    +

    Register now to attend this event

    +
    +

    +
    +
    +
    +
    \ No newline at end of file diff --git a/code-of-conduct.html b/code-of-conduct.html new file mode 100644 index 000000000000..419ba4a38970 --- /dev/null +++ b/code-of-conduct.html @@ -0,0 +1,224 @@ +--- +layout: default +title: PyTorch Foundation Code of Conduct +body-class: announcement +background-class: announcement-background +permalink: /code-of-conduct +--- +{% assign cards = site.board_info %} + +
    +
    +
    +

    PyTorch Foundation
    Code of Conduct

    +
    +
    +
    + +
    +
    +
    +
    + + +

    Our Commitment

    + + +

    + The PyTorch Foundation is committed to fostering an inclusive, welcoming, and safe environment for everyone involved in the PyTorch Foundation community. This commitment extends across all Foundation activities, including but not limited to our technical projects, events, communication channels, and social media presence. We pledge to make participation in our community a harassment-free experience for everyone, regardless of age, body size, disability, ethnicity, sex characteristics, gender identity and expression, level of experience, education, socio-economic status, nationality, personal appearance, race, religion, or sexual identity and orientation. +

    +

    Scope

    + + +

    + This code of conduct applies to Governing Board meetings, Technical Advisory Council meetings and outreach programs (such as the Ambassador Program) of the PyTorch Foundation and any other activity of the PyTorch Foundation that is not otherwise covered by a code of conduct of either The Linux Foundation or an applicable technical project. +

    +

    PyTorch Foundation Events

    +

    + PyTorch Foundation events that are produced by the Linux Foundation with professional events staff are governed by the Linux Foundation Events Code of Conduct available on the event page, which is designed to be used in conjunction with this PyTorch Foundation Code of Conduct. +

    +

    Technical Projects in the PyTorch Foundation Umbrella

    +

    + Technical projects supported by the PyTorch Foundation are organized as separate projects and each maintains a code of conduct that applies to participants in those projects. +

    +

    Expected Behavior

    + + +

    + Community members are expected to: +

    +
      + +
    • Use welcoming and inclusive language
    • + +
    • Respect differing viewpoints and experiences
    • + +
    • Accept constructive criticism gracefully
    • + +
    • Prioritize what benefits the community as a whole
    • + +
    • Show empathy and kindness toward others
    • + +
    • Be professional and responsible in all interactions
    • + +
    • Follow health and safety requirements at in-person events
    • + +
    • Exercise consideration and respect in speech and actions
    • + +
    • Collaborate with other community members in a constructive manner
    • +
    +

    Unacceptable Behavior

    + + +

    + The following behaviors are considered unacceptable within our community: +

    +

    Harassment and Discrimination

    + + +
      + +
    • Harassment of any kind, whether verbal, physical, or visual
    • + +
    • Discrimination based on protected characteristics
    • + +
    • Sexual harassment or unwelcome sexual attention
    • + +
    • Deliberate intimidation, stalking, or following
    • + +
    • Sustained disruption of talks, events, or online discussions
    • + +
    • Inappropriate physical contact
    • +
    +

    Communication and Content

    + + +
      + +
    • Use of sexualized language or imagery
    • + +
    • Violent or threatening language or imagery
    • + +
    • Trolling, insulting/derogatory comments, or personal attacks
    • + +
    • Public or private harassment
    • + +
    • Publishing others’ private information without permission
    • + +
    • Using Foundation platforms for political campaigning or promotion of political causes that are unrelated to technology
    • + +
    • Other conduct which could reasonably be considered inappropriate in a professional setting
    • +
    +

    Online and Social Media Behavior

    + + +
      + +
    • Harassment or bullying through social media platforms
    • + +
    • Spreading misinformation about the Foundation or its members
    • + +
    • Using Foundation channels for commercial promotion without permission
    • + +
    • Creating multiple accounts to evade moderation
    • + +
    • Impersonating Foundation members or officials
    • +
    +

    Behavior During Investigations

    + + +
      + +
    • Providing knowingly false or misleading information in connection with a Code of Conduct investigation or otherwise intentionally tampering with an investigation.
    • + +
    • Retaliating against a person because they reported an incident or provided information about an incident as a witness.
    • +
    +

    Enforcement

    + + +

    Reporting Violations

    + + +

    + Violations can be reported to conduct@pytorch.org. All reports will be: +

    +
      + +
    • Reviewed promptly and thoroughly
    • + +
    • Treated with strict confidentiality
    • + +
    • Investigated and addressed appropriately
    • + +
    • Documented for future reference
    • +
    +

    Consequences

    + + +

    + Violations may result in: +

    +
      + +
    • Warning to the offending individual
    • + +
    • Temporary or permanent ban from Foundation spaces
    • + +
    • Removal from leadership or contributory roles
    • + +
    • Expulsion from events without refund
    • + +
    • Reporting to appropriate authorities if necessary
    • + +
    • Other consequences
    • +
    +

    Appeals Process

    + + +
      + +
    • Individuals may appeal enforcement decisions
    • + +
    • Appeals must be submitted in writing within 30 days to the PyTorch Foundation via email to conduct@pytorch.org
    • + +
    • Decisions on appeals are final
    • +
    +

    Pre-Event Concerns

    + + +

    + If you have concerns about attending an upcoming event where specific individuals may be present: +

    +
      + +
    • Contact conduct@pytorch.org in advance
    • + +
    • Arrangements can be made for your safety and comfort
    • + +
    • Precautions may include providing security escorts and notifying staff
    • +
    +

    Amendments

    + + +

    + This Code of Conduct may be amended by the PyTorch Foundation as needed. Changes will be communicated to the community, and continued participation in the community indicates agreement to the current version. +

    +

    Questions and Reporting - Contact

    + + +

    + For questions, concerns, or reports: +
    + Email: conduct@pytorch.org +

    +

Acknowledgements

    +

    + This Code of Conduct is adapted from the Contributor Covenant, version 2.0 available here. +

    + + + +
    +
    +
    +
    diff --git a/credits.html b/credits.html index ce48847b9725..d4e3dc24f111 100644 --- a/credits.html +++ b/credits.html @@ -21,9 +21,9 @@

    PyTorch Cloud
       Credit Program
    -

    We believe providing public, self-service, and automated access to Cloud Infrastructure is crucial for every project to incubate, grow, and succeed. +

    We believe providing public, self-service, and automated access to cloud infrastructure is essential for every project's incubation, growth, and success.

- That’s why PyTorch has created the Cloud Credits program, focussed on the mutual success of projects and participating companies. To date, supporters like AWS have donated cloud credits to showcase direct support of our incubating and graduated projects. In fact, many organizations sponsor PyTorch projects for their business to succeed. + To support this, PyTorch has established a program that enables organizations to contribute either cloud credits or financial donations directly towards maintaining and expanding our Continuous Integration (CI) infrastructure and other foundation-hosted project infrastructure. Organizations like AWS have already contributed cloud credits, demonstrating a clear commitment to the success and sustainability of the PyTorch Foundation's hosted projects. Many organizations continue to sponsor PyTorch projects, recognizing that supporting foundational infrastructure contributes directly to their own business growth and success.

    diff --git a/ecosystem/ecosystem.html b/ecosystem/ecosystem.html index 2fe0e044cdcb..b60f4bc5efd2 100644 --- a/ecosystem/ecosystem.html +++ b/ecosystem/ecosystem.html @@ -4,6 +4,7 @@ permalink: ecosystem/ background-class: ecosystem-background body-class: ecosystem +redirect_to: https://landscape.pytorch.org/ ---
    @@ -14,7 +15,8 @@

    Tap into a rich ecosystem of tools, libraries, and more to support, accelerate, and explore AI development.

    -

    Join the Ecosystem

    +

    Join the Ecosystem

    +
    @@ -64,7 +66,7 @@

    Have a project you want featured?

    -

    Join the PyTorch ecosystem

    +

    Join the PyTorch ecosystem

    diff --git a/ecosystem/join.html b/ecosystem/join.html index 05f246cf4e8c..ce102e84e325 100644 --- a/ecosystem/join.html +++ b/ecosystem/join.html @@ -4,57 +4,5 @@ permalink: ecosystem/join.html body-class: ecosystem background-class: ecosystem-join-background +redirect_to: https://github.com/pytorch-fdn/ecosystem --- - -
    -
    -

    Join The
    Ecosystem

    - -
    -
    - -
    -
    -
    - - - -
    -
    -
    -

    Thank you for being part of the PyTorch community! The PyTorch ecosystem consists of projects, tools, and libraries from a broad set of researchers in academia and industry, application developers, and ML engineers. - The goal of this ecosystem is to support, accelerate, and aid in your exploration with PyTorch. Below are a few questions to think about if you'd like to have your project included as part of the ecosystem:

    -
    - -
    -
    -

    01

    -
    - -
    -

    Can your project be used with or within PyTorch to help augment the user experience, enable new capabilities or speed up training/inference?
    - Examples could include visualization tools, a kernel library or a framework that sits on top to enable research in a particular area such as NLP.

    -
    - -
    - -
    -
    -

    02

    -
    - -
    -

    Is the project ready for broad developer usage?
    - For example, is the project stable, will it be maintained, and is there adequate supporting infrastructure, documentation, and technical support to allow a developer to successfully use it?

    -
    -
    - -

    If you answered 'yes' to both, please fill out the form below with more details, and we'll follow up with next steps. If your tool seems like a good fit, the PyTorch Foundation team will work with you to find a time present your tool to the TAC. -

    -
    - - {% include ecosystem_form.html %} -
    -
    -
    -
    diff --git a/executorch.html b/executorch.html index c9ca7d74a168..705b8daad87d 100644 --- a/executorch.html +++ b/executorch.html @@ -22,10 +22,17 @@

    ExecuTorch

    -

    What is ExecuTorch?

    -

    ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of various PyTorch models (vision, speech, Generative AI, and more) to edge devices. Key value propositions of ExecuTorch are:

    -
    +
    +
    +

    What is ExecuTorch?

    +

    ExecuTorch is an end-to-end solution for enabling on-device inference capabilities across mobile and edge devices including wearables, embedded devices and microcontrollers. It is part of the PyTorch Edge ecosystem and enables efficient deployment of various PyTorch models (vision, speech, Generative AI, and more) to edge devices. Key value propositions of ExecuTorch are:

    +
    +
    + ExecuTorch logo +
    +
    +
    Mobile icon diff --git a/governing-board.html b/governing-board.html index 55fe3c48246d..588aa3db0d42 100644 --- a/governing-board.html +++ b/governing-board.html @@ -56,6 +56,20 @@

    Brian Granger, Amazon Web Services

    +
    +
    + Dwarak Rajagopal +
    +
    +

    Dwarak Rajagopal, Snowflake

    +

    Dwarak Rajagopal is a distinguished technology leader with over two decades of experience at some of the world’s most innovative companies, including Google, Meta, Uber, Apple, and AMD. As Vice President of AI Engineering at Snowflake, Dwarak oversees the AI and ML engineering organizations, helping drive innovation across Snowflake Cortex AI, Snowflake’s AI Research Team, and open source offerings. He previously served as Senior Engineering Director at Google, where he spearheaded the AI Frameworks and On-Device Machine Learning teams, driving cutting-edge advancements in AI and large language models (LLMs) to enhance efficiency across server and mobile platforms. Earlier in his career, Dwarak led Meta’s PyTorch Core Frameworks team, bridging the gap between AI research and production. His career also includes developing autonomous driving software at Uber, pioneering graphics technology at Apple, and shaping compiler innovations at AMD. +
    +
    + Dwarak is passionate about democratizing AI through open source initiatives and ensuring the ethical advancement of technology. His visionary leadership continues to shape the future of AI and empower global innovation. +
    +

    +
    +
    Fred Li diff --git a/host-your-project.html b/host-your-project.html new file mode 100644 index 000000000000..9e392fb50f5b --- /dev/null +++ b/host-your-project.html @@ -0,0 +1,37 @@ +--- +layout: default +title: Host Your Project +body-class: announcement +background-class: announcement-background +permalink: /host-your-project +--- +{% assign cards = site.board_info %} + +
    +
    +
    +

    Host Your
           Project

    +
    +
    +
    + +
    +
    +
    +
    +

    Host Your Project with the PyTorch Foundation

    +

    + Have an open source project that advances the PyTorch ecosystem? The PyTorch Foundation offers a neutral, community-driven home for projects that share our commitment to open, accessible deep learning innovation. By hosting your project with us, you’ll gain support from a global network of developers, researchers, and industry leaders—plus access to events, training, and infrastructure that help your project grow and thrive. +
    +
    + Ready to take the next step? Submit your project for consideration and become part of the broader PyTorch ecosystem. +
    +
    + + Host Your Project + +

    +
    +
    +
    +
    diff --git a/index.html b/index.html index cc1e2256281d..3f0e5a4976bf 100644 --- a/index.html +++ b/index.html @@ -1,225 +1,12 @@ ---- -layout: default -background-class: home-page-background -body-class: homepage -display-news-banner: true ---- - -
    -
    -

    PyTorch logo
    Get Started

    - -

    Choose Your Path: Install PyTorch Locally or Launch Instantly on Supported Cloud Platforms

    - - - Get started - -
    -
    - -
    - -
    -
    -
    -

    Blog

    -

    - Stay up-to-date on the latest news and technical topics from the PyTorch Foundation. -

    -

    - Read more -

    -
    -
    -

    PyTorch 2.6

    -

    - Featuring Python 3.13 support for torch.compile, several AOTInductor enhancements, FP16 support on X86 CPUs, and more. -

    - Learn more -

    -
    -
    -

    Membership Available

    -

    - Become an integral part of the PyTorch Foundation, to build and shape the future of AI. -

    -

    - Join -

    -
    -
    -
    - - -
    - -
    -
    -
    -
    -

    Key Features &
    Capabilities

    - - - See all Features - -
    -
    - -
    - {% assign features = site.features | where: "featured-home", true | sort: 'order' %} - - {% for feature in features %} -
    -
    {{ feature.title }}
    -

    {{ feature.summary-home }}

    -
    - {% endfor %} -
    -
    -
    -
    - - {% include quick_start_module.html %} - -
    -
    -
    -
    -

    Ecosystem

    -
    Feature Projects
    - - - See all Projects - - -
    -
    -

    Explore a rich ecosystem of libraries, tools, and more to support development.

    -
    -
    - -
    - {% assign ecosystems = site.ecosystem | where: "featured-home", true | sort: 'order' %} - {% for item in ecosystems %} - - {% endfor %} -
    - -
    -
    -

    Community

    - -
    -
    -

    Join the PyTorch developer community to contribute, learn, and get your questions answered.

    -
    -
    - -
    - {% assign resources = site.resources | where: "featured-home",true | sort: 'order' %} - - {% for resource in resources %} - {% assign card_title = resource.title | split: ' ' %} - - - -
    -
    -
    - -
    -
    -
    -
    -

    Companies & Universities
    Using PyTorch

    - - - -
    - {% assign case_studies = site.case_studies | where: "featured-home", true | sort: "order" %} - - {% for case_study in case_studies %} -
    - -

    {{ case_study.excerpt }}

    - - Learn More - -
    - {% endfor %} -
    -
    -
    -
    -
    -
    - - - - -
    - -
    - - - - - - - - - - - + + + + + Redirecting... + + + + +

    If you are not redirected automatically, follow this link to the documentation.

    + + \ No newline at end of file diff --git a/join-ecosystem.html b/join-ecosystem.html new file mode 100644 index 000000000000..255cb92eff40 --- /dev/null +++ b/join-ecosystem.html @@ -0,0 +1,65 @@ +--- +layout: default +title: Join the PyTorch Ecosystem +body-class: announcement +background-class: announcement-background +permalink: /join-ecosystem +--- +{% assign cards = site.board_info %} + +
    +
    +
    +

    Join the
           PyTorch Ecosystem

    +
    +
    +
    + +
    +
    +
    +
    +

    + The PyTorch Ecosystem is made up of innovative open source AI projects that extend, integrate with, or build upon PyTorch. If you're developing a project that supports the PyTorch community, you’re welcome to apply for inclusion in the Ecosystem. +

    + Please review the PyTorch Ecosystem review process to ensure that you meet the minimum expectations before applying. +

    +

    Application Process

    +

    + Applying to join the PyTorch Ecosystem is simple: +

    +
      + +
    1. Open a new application using the Ecosystem Application GitHub issue form.
    2. + +
    3. Complete all required sections.
    4. + +
    5. Submit the issue.
    6. +
    +

    + Once submitted, your application enters the review pipeline managed by the PyTorch Ecosystem Working Group. +

    +

    Review Workflow

    +

    + All applications are tracked via the PyTorch Ecosystem Project Board. +

    + Applications are reviewed on a first-come, first-served basis. +

    + The Working Group reviews approximately 7–10 projects per session, depending on availability. +

    + Projects are moved to “In Progress” roughly two weeks before a scheduled review session. +

    + During review, Working Group members assess your application based on technical merit, alignment with the PyTorch mission, and community readiness. Questions and requests for clarification may be posted directly in your GitHub issue. +

    +

    Ready to apply?

    +

    + + + Submit your application on GitHub + +

    + +
    +
    +
    +
    diff --git a/join.html b/join.html index 18be388bbe4a..256d5bb64cec 100644 --- a/join.html +++ b/join.html @@ -41,7 +41,7 @@

    Apply for Membership Today

    If your organization is interested in becoming a Member of the PyTorch Foundation, please complete the Member Enrollment form and designate someone with signature authority for your institution for the membership approval process.

- Premier Membership requires approval from the PyTorch Governing Board. After completing the Membership Application form, please fill out the Premier Membership Questionnaire. The Governing Board will review your application within 14 days. + For Premier Membership, you must also complete the Premier Membership Questionnaire after submitting your membership application. The Foundation will review your application after both have been submitted.

    Please note, membership is not required to participate in PyTorch as an individual contributor. You can join the community at any time and collaborate with others interested in the project. diff --git a/multi-modal-dl-frame.html b/multi-modal-dl-frame.html index f2468400cbdb..5fc616d414e2 100644 --- a/multi-modal-dl-frame.html +++ b/multi-modal-dl-frame.html @@ -25,15 +25,14 @@

    Multi-Modal Tabular Deep Learning with PyTorch Frame


    Speaker: Akihiro Nitta, Software Engineer, Kumo.ai
    - Location: Online + Link to session video
    + Download slides
    - In this talk, Akihiro introduces PyTorch Frame, a modular framework for multi-modal tabular deep learning. PyTorch Frame enables seamless integration with the PyTorch ecosystem, including PyTorch Geometric for graph-based message passing across relational data and Hugging Face Transformers for extracting rich text features. The talk also highlights its specialized data structures for efficiently handling sparse features, making PyTorch Frame an essential tool for modern tabular data. +
+ In this talk, Akihiro introduced PyTorch Frame, a modular framework for multi-modal tabular deep learning. PyTorch Frame enables seamless integration with the PyTorch ecosystem, including PyTorch Geometric for graph-based message passing across relational data and Hugging Face Transformers for extracting rich text features. The talk also highlighted its specialized data structures for efficiently handling sparse features, making PyTorch Frame an essential tool for modern tabular data.

    Akihiro Nitta is a software engineer on the ML team at Kumo.ai and a core contributor to PyTorch Frame and PyTorch Geometric, with prior experience as a maintainer of PyTorch Lightning. -

    - Register now to attend this event. -

    diff --git a/new.html b/new.html index ec2bedaa5923..65a7707c07ad 100644 --- a/new.html +++ b/new.html @@ -122,6 +122,19 @@

    Get Started

    +
    +
    +
    +
    +

    Get Linux Foundation Help

    +

    + Need to reach Member Projects Support for The Linux Foundation? Visit the LF Service Desk. +

    +
    +
    +
    +
    +
    diff --git a/pt-27-release-qa.html b/pt-27-release-qa.html new file mode 100644 index 000000000000..12a713ab06b2 --- /dev/null +++ b/pt-27-release-qa.html @@ -0,0 +1,46 @@ +--- +layout: default +title: "PyTorch 2.7 Release Live Q&A" +body-class: announcement +background-class: announcement-background +permalink: /pt-27-release-qa +--- + +
    +
    +
    +

    PyTorch Webinars

    +
    +
    +
    + +
    +
    +
    +
    + PyTorch 2.7 Release Live Q&A +

    PyTorch 2.7 Release Live Q&A

    +

    + Date: April 28, 12 pm PT +
    + Speakers: Piotr Bialecki (NVIDIA) and Nikita Shulga (Meta) +
    +
+ Have questions about PyTorch 2.7? Join PyTorch Core Maintainers Piotr Bialecki (NVIDIA) and Nikita Shulga (Meta) for a live Q&A session on Monday, April 28 at 12 PM PT. +

    + +Piotr joined the PyTorch team at NVIDIA in 2019 and currently manages the team. He drives NVIDIA’s effort in maintaining and advancing PyTorch’s CUDA backend and received the PyTorch SUPERHERO award in 2023 for his community contributions, especially in the PyTorch discussion board. As a Core Maintainer, he is also focused on PyTorch’s long-term vision and development. +

    + +Nikita is a Software Engineer at Meta where, among other things, he is responsible for PyTorch releases and continuous integration. Nikita is committed to uplifting the developer community and continuously improving PyTorch. He earned a Master’s degree in Applied Mathematics from the Moscow Institute of Physics and Technology (MIPT). +

    + +Bring your PyTorch 2.7 questions for Piotr & Nikita during this live Q&A session. +

    + Watch the recording: +

    +

    +
    +
    +
    +
    \ No newline at end of file diff --git a/pt-dinov2-multi-label-plant-species-classification.html b/pt-dinov2-multi-label-plant-species-classification.html new file mode 100644 index 000000000000..10b99f213935 --- /dev/null +++ b/pt-dinov2-multi-label-plant-species-classification.html @@ -0,0 +1,39 @@ +--- +layout: default +title: "Using PyTorch and DINOv2 for Multi-label Plant Species Classification" +body-class: announcement +background-class: announcement-background +permalink: /pt-dinov2-multi-label-plant-species-classification +--- + +
    +
    +
    +

    PyTorch Webinars

    +
    +
    +
    + +
    +
    +
    +
    + Using PyTorch and DINOv2 for Multi-label Plant Species Classification +

    Using PyTorch and DINOv2 for Multi-label Plant Species Classification

    +

    + Date: March 27th, 12 PM PST +
    + Speaker: Murilo Gustineli +
    +
    + Join us for an engaging webinar on our innovative transfer learning approach using self-supervised Vision Transformers (DINOv2) for multi-label plant species classification in the PlantCLEF 2024 challenge. We’ll cover how we efficiently extract feature embeddings from a dataset of 1.4 million images and utilize PyTorch Lightning for model training and Apache Spark for data management. Learn about our image processing techniques, including transforming images into grids of tiles and aggregating predictions to overcome computational challenges. Discover the significant performance improvements achieved and get insights into multi-label image classification. Perfect for PyTorch developers, this session will include a Q&A and access to our complete codebase at github.com/dsgt-kaggle-clef/plantclef-2024. +

+ Murilo Gustineli is a Senior AI Software Solutions Engineer at Intel, and is currently pursuing a Master’s in Computer Science at Georgia Tech focusing on machine learning. His work involves creating synthetic datasets, fine-tuning large language models, and training multi-modal models using Intel® Gaudi® AI accelerators as part of the Development Enablement team. He is particularly interested in deep learning, information retrieval, and biodiversity research, aiming to improve species identification and support conservation efforts. Visit Murilo on GitHub. +

    + Watch the recording: +

    +

    +
    +
    +
    +
    \ No newline at end of file diff --git a/published_versions.json b/published_versions.json index 680d66fd5592..655a9608ed45 100644 --- a/published_versions.json +++ b/published_versions.json @@ -1,5 +1,5 @@ { - "latest_stable": "2.6.0", + "latest_stable": "2.7.1", "latest_lts": "lts-1.8.2", "versions": { "preview": { @@ -11,76 +11,50 @@ }, "cuda.x": { "note": null, - "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118" + "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126" }, "cuda.y": { "note": null, - "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124" + "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128" }, "cuda.z": { "note": null, - "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126" + "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129" }, "rocm5.x": { "note": null, - "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3" - } - }, - "conda": { - "cuda.x": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - }, - "cuda.y": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - }, - "cuda.z": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - }, - "rocm5.x": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - }, - "accnone": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null + "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4" } }, "libtorch": { "accnone": { "note": null, "versions": { - "Download here (Pre-cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip", "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cpu/libtorch-cxx11-abi-shared-with-deps-latest.zip" } }, "cuda.x": { "note": null, "versions": { - "Download here (Pre-cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu118/libtorch-shared-with-deps-latest.zip", - "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu118/libtorch-cxx11-abi-shared-with-deps-latest.zip" + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip" } }, "cuda.y": { "note": null, "versions": { - "Download here (Pre-cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu124/libtorch-shared-with-deps-latest.zip", - "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu124/libtorch-cxx11-abi-shared-with-deps-latest.zip" + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu128/libtorch-cxx11-abi-shared-with-deps-latest.zip" } }, "cuda.z": { "note": null, "versions": { - "Download here (Pre-cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-shared-with-deps-latest.zip", - "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip" + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/cu129/libtorch-cxx11-abi-shared-with-deps-latest.zip" } }, "rocm5.x": { "note": null, "versions": { - "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/rocm6.3/libtorch-cxx11-abi-shared-with-deps-latest.zip" + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/nightly/rocm6.4/libtorch-cxx11-abi-shared-with-deps-latest.zip" } } } @@ -112,32 +86,6 @@ "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu" } }, - "conda": { - "cuda.x": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null, - "default": true - }, - "cuda.y": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null, - "default": true - }, - "cuda.z": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null, - "default": true - }, - "rocm5.x": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null, - "default": true - }, - "accnone": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - } - }, "libtorch": { "accnone": { "note": null, @@ -183,43 +131,21 @@ }, "cuda.x": { "note": null, - "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118" + "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126" }, "cuda.y": { "note": null, - "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124" + "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128" }, "cuda.z": { "note": null, - "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126" + "command": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129" }, "rocm5.x": { "note": "NOTE: ROCm is not available on Windows", "command": null } }, - "conda": { - "cuda.x": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - }, - "cuda.y": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - }, - "cuda.z": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - }, - "rocm5.x": { - "note": "NOTE: ROCm is not available on Windows", - "command": null - }, - "accnone": { - "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", - "command": null - } - }, "libtorch": { "accnone": { "note": null, @@ -231,22 +157,22 @@ "cuda.x": { "note": null, "versions": { - "Download here (Release version):": "https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-latest.zip", - "Download here (Debug version):": "https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-debug-latest.zip" + "Download here (Release version):": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-latest.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-debug-latest.zip" } }, "cuda.y": { "note": null, "versions": { - "Download here (Release version):": "https://download.pytorch.org/libtorch/nightly/cu124/libtorch-win-shared-with-deps-latest.zip", - "Download here (Debug version):": "https://download.pytorch.org/libtorch/nightly/cu124/libtorch-win-shared-with-deps-debug-latest.zip" + "Download here (Release version):": "https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-latest.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-debug-latest.zip" } }, "cuda.z": { "note": null, "versions": { - "Download here (Release version):": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-latest.zip", - "Download here (Debug version):": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-debug-latest.zip" + "Download here (Release version):": "https://download.pytorch.org/libtorch/nightly/cu129/libtorch-win-shared-with-deps-latest.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/nightly/cu129/libtorch-win-shared-with-deps-debug-latest.zip" } }, "rocm5.x": { @@ -5861,6 +5787,414 @@ } } } + }, + "2.7.0": { + "linux": { + "pip": { + "accnone": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu" + }, + "cuda.x": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118" + }, + "cuda.y": { + "note": null, + "command": "pip3 install torch torchvision torchaudio" + }, + "cuda.z": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128" + }, + "rocm5.x": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3" + } + }, + "libtorch": { + "accnone": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcpu.zip" + } + }, + "cuda.x": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcu118.zip" + } + }, + "cuda.y": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/cu126/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcu126.zip" + } + }, + "cuda.z": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/cu128/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Bcu128.zip" + } + }, + "rocm5.x": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": 
"https://download.pytorch.org/libtorch/rocm6.3/libtorch-cxx11-abi-shared-with-deps-2.7.0%2Brocm6.3.zip" + } + } + } + }, + "macos": { + "pip": { + "cuda.x": { + "note": "# CUDA is not available on MacOS, please use default package", + "command": "pip3 install torch torchvision torchaudio", + "default": true + }, + "cuda.y": { + "note": "# CUDA is not available on MacOS, please use default package", + "command": "pip3 install torch torchvision torchaudio", + "default": true + }, + "cuda.z": { + "note": "# CUDA is not available on MacOS, please use default package", + "command": "pip3 install torch torchvision torchaudio", + "default": true + }, + "rocm5.x": { + "note": "# ROCm is not available on MacOS, please use default package", + "command": "pip3 install torch torchvision torchaudio", + "default": true + }, + "accnone": { + "note": null, + "command": "pip3 install torch torchvision torchaudio" + } + }, + "conda": { + "cuda.x": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null, + "default": true + }, + "cuda.y": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null, + "default": true + }, + "cuda.z": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null, + "default": true + }, + "rocm5.x": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null, + "default": true + }, + "accnone": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null + } + }, + "libtorch": { + "accnone": { + "note": null, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip" + } + }, + "cuda.x": { + "note": null, + "default": true, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip" + } + }, + "cuda.y": { + "note": null, + "default": true, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip" + } + }, + "cuda.z": { + "note": null, + "default": true, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip" + } + }, + "rocm5.x": { + "note": null, + "default": true, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.0.zip" + } + } + } + }, + "windows": { + "pip": { + "accnone": { + "note": null, + "command": "pip3 install torch torchvision torchaudio" + }, + "cuda.x": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118" + }, + "cuda.y": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126" + }, + "cuda.z": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128" + }, + "rocm5.x": { + "note": "NOTE: ROCm is not available on Windows", + "command": null + } + }, + "conda": { + "cuda.x": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null + }, + "cuda.y": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null + }, + "cuda.z": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null + }, + "rocm5.x": { + "note": "NOTE: ROCm is not available on Windows", + "command": null + }, + "accnone": { + "note": "NOTE: Conda packages are no longer available. Please use pip instead.
    ", + "command": null + } + }, + "libtorch": { + "accnone": { + "note": null, + "versions": { + "Download here (Release version):": "https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.7.0%2Bcpu.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-2.7.0%2Bcpu.zip" + } + }, + "cuda.x": { + "note": null, + "versions": { + "Download here (Release version):": "https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.7.0%2Bcu118.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-debug-2.7.0%2Bcu118.zip" + } + }, + "cuda.y": { + "note": null, + "versions": { + "Download here (Release version):": "https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-2.7.0%2Bcu126.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-debug-2.7.0%2Bcu126.zip" + } + }, + "cuda.z": { + "note": null, + "versions": { + "Download here (Release version):": "https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-2.7.0%2Bcu128.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-debug-2.7.0%2Bcu128.zip" + } + }, + "rocm5.x": { + "note": "NOTE: ROCm is not available on Windows", + "versions": null + } + } + } + }, + "2.7.1": { + "linux": { + "pip": { + "accnone": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu" + }, + "cuda.x": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118" + }, + "cuda.y": { + "note": null, + "command": "pip3 install torch torchvision torchaudio" + }, + "cuda.z": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128" + }, + "rocm5.x": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3" + } + }, + "libtorch": { + "accnone": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcpu.zip" + } + }, + "cuda.x": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu118.zip" + } + }, + "cuda.y": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/cu126/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu126.zip" + } + }, + "cuda.z": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/cu128/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu128.zip" + } + }, + "rocm5.x": { + "note": null, + "versions": { + "Download here (cxx11 ABI):": "https://download.pytorch.org/libtorch/rocm6.3/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Brocm6.3.zip" + } + } + } + }, + "macos": { + "pip": { + "cuda.x": { + "note": "# CUDA is not available on MacOS, please use default package", + "command": "pip3 install torch torchvision torchaudio", + "default": true + }, + "cuda.y": { + "note": "# CUDA is not available on MacOS, please use default package", + "command": "pip3 install torch torchvision torchaudio", + "default": true + }, + "cuda.z": { + "note": "# CUDA is not available on MacOS, please use default 
package", + "command": "pip3 install torch torchvision torchaudio", + "default": true + }, + "rocm5.x": { + "note": "# ROCm is not available on MacOS, please use default package", + "command": "pip3 install torch torchvision torchaudio", + "default": true + }, + "accnone": { + "note": null, + "command": "pip3 install torch torchvision torchaudio" + } + }, + "libtorch": { + "accnone": { + "note": null, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip" + } + }, + "cuda.x": { + "note": null, + "default": true, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip" + } + }, + "cuda.y": { + "note": null, + "default": true, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip" + } + }, + "cuda.z": { + "note": null, + "default": true, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip" + } + }, + "rocm5.x": { + "note": null, + "default": true, + "versions": { + "Download arm64 libtorch here (ROCm and CUDA are not supported):": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip" + } + } + } + }, + "windows": { + "pip": { + "accnone": { + "note": null, + "command": "pip3 install torch torchvision torchaudio" + }, + "cuda.x": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118" + }, + "cuda.y": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126" + }, + "cuda.z": { + "note": null, + "command": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128" + }, + "rocm5.x": { + "note": "NOTE: ROCm is not available on Windows", + "command": null + } + }, + "libtorch": { + "accnone": { + "note": null, + "versions": { + "Download here (Release version):": "https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.7.1%2Bcpu.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-2.7.1%2Bcpu.zip" + } + }, + "cuda.x": { + "note": null, + "versions": { + "Download here (Release version):": "https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.7.1%2Bcu118.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu118.zip" + } + }, + "cuda.y": { + "note": null, + "versions": { + "Download here (Release version):": "https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-2.7.1%2Bcu126.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu126.zip" + } + }, + "cuda.z": { + "note": null, + "versions": { + "Download here (Release version):": "https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-2.7.1%2Bcu128.zip", + "Download here (Debug version):": "https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu128.zip" + } + }, + "rocm5.x": { + "note": "NOTE: ROCm is not available on Windows", + "versions": null + } + } + } } } } \ No newline at end of file diff --git 
a/releases.json b/releases.json index 67ab80679def..603a47ed621f 100644 --- a/releases.json +++ b/releases.json @@ -13,82 +13,82 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_9-cuda11_8", + "build_name": "manywheel-py3_9-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_9-cuda12_4", + "build_name": "manywheel-py3_9-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "manywheel", - "build_name": "manywheel-py3_9-cuda12_6", + "build_name": "manywheel-py3_9-cuda12_9", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_9-rocm6_2_4", + "build_name": "manywheel-py3_9-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2.4", + 
"installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.3", - "desired_cuda": "rocm6.3", - "container_image": "pytorch/manylinux2_28-builder:rocm6.3", + "gpu_arch_version": "6.4", + "desired_cuda": "rocm6.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.4", "package_type": "manywheel", - "build_name": "manywheel-py3_9-rocm6_3", + "build_name": "manywheel-py3_9-rocm6_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -103,82 +103,82 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_10-cuda11_8", + "build_name": "manywheel-py3_10-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_10-cuda12_4", + "build_name": "manywheel-py3_10-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "manywheel", - "build_name": "manywheel-py3_10-cuda12_6", + "build_name": "manywheel-py3_10-cuda12_9", "validation_runner": 
"linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_10-rocm6_2_4", + "build_name": "manywheel-py3_10-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2.4", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.3", - "desired_cuda": "rocm6.3", - "container_image": "pytorch/manylinux2_28-builder:rocm6.3", + "gpu_arch_version": "6.4", + "desired_cuda": "rocm6.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.4", "package_type": "manywheel", - "build_name": "manywheel-py3_10-rocm6_3", + "build_name": "manywheel-py3_10-rocm6_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -193,82 +193,82 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_11-cuda11_8", + "build_name": "manywheel-py3_11-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": 
"pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_11-cuda12_4", + "build_name": "manywheel-py3_11-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "manywheel", - "build_name": "manywheel-py3_11-cuda12_6", + "build_name": "manywheel-py3_11-cuda12_9", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_11-rocm6_2_4", + "build_name": "manywheel-py3_11-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2.4", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.3", - "desired_cuda": "rocm6.3", - "container_image": "pytorch/manylinux2_28-builder:rocm6.3", + "gpu_arch_version": "6.4", + "desired_cuda": "rocm6.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.4", "package_type": "manywheel", - "build_name": "manywheel-py3_11-rocm6_3", + "build_name": "manywheel-py3_11-rocm6_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -283,82 +283,82 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - 
"desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_12-cuda11_8", + "build_name": "manywheel-py3_12-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_12-cuda12_4", + "build_name": "manywheel-py3_12-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "manywheel", - "build_name": "manywheel-py3_12-cuda12_6", + "build_name": "manywheel-py3_12-cuda12_9", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_12-rocm6_2_4", + "build_name": "manywheel-py3_12-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2.4", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.3", - "desired_cuda": "rocm6.3", - "container_image": "pytorch/manylinux2_28-builder:rocm6.3", + "gpu_arch_version": "6.4", + 
"desired_cuda": "rocm6.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.4", "package_type": "manywheel", - "build_name": "manywheel-py3_12-rocm6_3", + "build_name": "manywheel-py3_12-rocm6_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -373,143 +373,173 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_13-cuda11_8", + "build_name": "manywheel-py3_13-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_13-cuda12_4", + "build_name": "manywheel-py3_13-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "manywheel", - "build_name": "manywheel-py3_13-cuda12_6", + "build_name": "manywheel-py3_13-cuda12_9", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": 
"rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_13-rocm6_2_4", + "build_name": "manywheel-py3_13-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.2.4", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.3", - "desired_cuda": "rocm6.3", - "container_image": "pytorch/manylinux2_28-builder:rocm6.3", + "gpu_arch_version": "6.4", + "desired_cuda": "rocm6.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.4", "package_type": "manywheel", - "build_name": "manywheel-py3_13-rocm6_3", + "build_name": "manywheel-py3_13-rocm6_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { + "python_version": "3.13t", "gpu_arch_type": "cpu", "gpu_arch_version": "", "desired_cuda": "cpu", - "libtorch_variant": "shared-with-deps", - "libtorch_config": "", - "devtoolset": "pre-cxx11", - "container_image": "pytorch/manylinux-builder:cpu", - "package_type": "libtorch", - "build_name": "libtorch-cpu-shared-with-deps-pre-cxx11", + "container_image": "pytorch/manylinux2_28-builder:cpu", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-cpu", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cpu/libtorch-shared-with-deps-latest.zip", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", - "stable_version": "2.6.0" + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false }, { + "python_version": "3.13t", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "libtorch_variant": "shared-with-deps", - "libtorch_config": "", - "devtoolset": "pre-cxx11", - "container_image": "pytorch/manylinux-builder:cuda11.8", - "package_type": "libtorch", - "build_name": "libtorch-cuda11_8-shared-with-deps-pre-cxx11", - "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu118/libtorch-shared-with-deps-latest.zip", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-cuda12_6", + "validation_runner": "linux.g5.4xlarge.nvidia.gpu", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", - "stable_version": "2.6.0" + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + 
"use_split_build": false }, { + "python_version": "3.13t", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "libtorch_variant": "shared-with-deps", - "libtorch_config": "", - "devtoolset": "pre-cxx11", - "container_image": "pytorch/manylinux-builder:cuda12.4", - "package_type": "libtorch", - "build_name": "libtorch-cuda12_4-shared-with-deps-pre-cxx11", - "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu124/libtorch-shared-with-deps-latest.zip", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-cuda12_8", + "validation_runner": "linux.g5.4xlarge.nvidia.gpu", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", - "stable_version": "2.6.0" + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false }, { + "python_version": "3.13t", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "libtorch_variant": "shared-with-deps", - "libtorch_config": "", - "devtoolset": "pre-cxx11", - "container_image": "pytorch/manylinux-builder:cuda12.6", - "package_type": "libtorch", - "build_name": "libtorch-cuda12_6-shared-with-deps-pre-cxx11", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-cuda12_9", + "validation_runner": "linux.g5.4xlarge.nvidia.gpu", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", + "channel": "nightly", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "rocm", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-rocm6_3", + "validation_runner": "linux.2xlarge", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.3", + "channel": "nightly", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "rocm", + "gpu_arch_version": "6.4", + "desired_cuda": "rocm6.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.4", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-rocm6_4", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-shared-with-deps-latest.zip", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm6.4", "channel": "nightly", - "stable_version": "2.6.0" + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false }, { "gpu_arch_type": "cpu", @@ -524,82 +554,82 @@ "validation_runner": "linux.2xlarge", "installation": "https://download.pytorch.org/libtorch/nightly/cpu/libtorch-cxx11-abi-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", + "gpu_arch_version": "12.6", + "desired_cuda": 
"cu126", "libtorch_variant": "shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:cuda11.8", + "container_image": "pytorch/libtorch-cxx11-builder:cuda12.6", "package_type": "libtorch", - "build_name": "libtorch-cuda11_8-shared-with-deps-cxx11-abi", + "build_name": "libtorch-cuda12_6-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu118/libtorch-cxx11-abi-shared-with-deps-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", "libtorch_variant": "shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:cuda12.4", + "container_image": "pytorch/libtorch-cxx11-builder:cuda12.8", "package_type": "libtorch", - "build_name": "libtorch-cuda12_4-shared-with-deps-cxx11-abi", + "build_name": "libtorch-cuda12_8-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu124/libtorch-cxx11-abi-shared-with-deps-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu128/libtorch-cxx11-abi-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", "libtorch_variant": "shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:cuda12.6", + "container_image": "pytorch/libtorch-cxx11-builder:cuda12.9", "package_type": "libtorch", - "build_name": "libtorch-cuda12_6-shared-with-deps-cxx11-abi", + "build_name": "libtorch-cuda12_9-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-cxx11-abi-shared-with-deps-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu129/libtorch-cxx11-abi-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", "libtorch_variant": "shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:rocm6.2.4", + "container_image": "pytorch/libtorch-cxx11-builder:rocm6.3", "package_type": "libtorch", - "build_name": "libtorch-rocm6_2_4-shared-with-deps-cxx11-abi", + "build_name": "libtorch-rocm6_3-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/rocm6.2.4/libtorch-cxx11-abi-shared-with-deps-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/rocm6.3/libtorch-cxx11-abi-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "rocm", - "gpu_arch_version": "6.3", - "desired_cuda": "rocm6.3", + "gpu_arch_version": "6.4", + "desired_cuda": "rocm6.4", "libtorch_variant": 
"shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:rocm6.3", + "container_image": "pytorch/libtorch-cxx11-builder:rocm6.4", "package_type": "libtorch", - "build_name": "libtorch-rocm6_3-shared-with-deps-cxx11-abi", + "build_name": "libtorch-rocm6_4-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/rocm6.3/libtorch-cxx11-abi-shared-with-deps-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/rocm6.4/libtorch-cxx11-abi-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" } ], "windows": [ @@ -615,52 +645,52 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_9-cuda11_8", + "build_name": "wheel-py3_9-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_9-cuda12_4", + "build_name": "wheel-py3_9-cuda12_8", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "wheel", - "build_name": "wheel-py3_9-cuda12_6", + "build_name": "wheel-py3_9-cuda12_9", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -675,52 +705,52 @@ 
"installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_10-cuda11_8", + "build_name": "wheel-py3_10-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_10-cuda12_4", + "build_name": "wheel-py3_10-cuda12_8", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "wheel", - "build_name": "wheel-py3_10-cuda12_6", + "build_name": "wheel-py3_10-cuda12_9", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -735,52 +765,52 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_11-cuda11_8", + "build_name": "wheel-py3_11-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch 
torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_11-cuda12_4", + "build_name": "wheel-py3_11-cuda12_8", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "wheel", - "build_name": "wheel-py3_11-cuda12_6", + "build_name": "wheel-py3_11-cuda12_9", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -795,52 +825,52 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_12-cuda11_8", + "build_name": "wheel-py3_12-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_12-cuda12_4", + "build_name": "wheel-py3_12-cuda12_8", 
"validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", "package_type": "wheel", - "build_name": "wheel-py3_12-cuda12_6", + "build_name": "wheel-py3_12-cuda12_9", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -855,52 +885,112 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", - "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_13-cuda11_8", + "build_name": "wheel-py3_13-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu118", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_13-cuda12_4", + "build_name": "wheel-py3_13-cuda12_8", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", + "package_type": "wheel", + "build_name": "wheel-py3_13-cuda12_9", + "validation_runner": "windows.g4dn.xlarge", + "installation": "pip3 
install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", + "channel": "nightly", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cpu", + "gpu_arch_version": "", + "desired_cuda": "cpu", + "container_image": "pytorch/manylinux2_28-builder:cpu", + "package_type": "wheel", + "build_name": "wheel-py3_13t-cpu", + "validation_runner": "windows.4xlarge", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", + "channel": "nightly", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cuda", "gpu_arch_version": "12.6", "desired_cuda": "cu126", "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_13-cuda12_6", + "build_name": "wheel-py3_13t-cuda12_6", "validation_runner": "windows.g4dn.xlarge", "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu126", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cuda", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", + "package_type": "wheel", + "build_name": "wheel-py3_13t-cuda12_8", + "validation_runner": "windows.g4dn.xlarge", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128", + "channel": "nightly", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cuda", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", + "container_image": "pytorch/manylinux2_28-builder:cuda12.9", + "package_type": "wheel", + "build_name": "wheel-py3_13t-cuda12_9", + "validation_runner": "windows.g4dn.xlarge", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129", + "channel": "nightly", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -916,52 +1006,52 @@ "validation_runner": "windows.4xlarge", "installation": "https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", "libtorch_variant": "shared-with-deps", "libtorch_config": "release", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda11_8-shared-with-deps-release", + "build_name": "libtorch-cuda12_6-shared-with-deps-release", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", + "gpu_arch_version": "12.8", + "desired_cuda": 
"cu128", "libtorch_variant": "shared-with-deps", "libtorch_config": "release", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda12_4-shared-with-deps-release", + "build_name": "libtorch-cuda12_8-shared-with-deps-release", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu124/libtorch-win-shared-with-deps-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", "libtorch_variant": "shared-with-deps", "libtorch_config": "release", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda12_6-shared-with-deps-release", + "build_name": "libtorch-cuda12_9-shared-with-deps-release", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu129/libtorch-win-shared-with-deps-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cpu", @@ -976,52 +1066,52 @@ "validation_runner": "windows.4xlarge", "installation": "https://download.pytorch.org/libtorch/nightly/cpu/libtorch-win-shared-with-deps-debug-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "11.8", - "desired_cuda": "cu118", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", "libtorch_variant": "shared-with-deps", "libtorch_config": "debug", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda11_8-shared-with-deps-debug", + "build_name": "libtorch-cuda12_6-shared-with-deps-debug", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu118/libtorch-win-shared-with-deps-debug-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-debug-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", "libtorch_variant": "shared-with-deps", "libtorch_config": "debug", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda12_4-shared-with-deps-debug", + "build_name": "libtorch-cuda12_8-shared-with-deps-debug", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu124/libtorch-win-shared-with-deps-debug-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu128/libtorch-win-shared-with-deps-debug-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", + "gpu_arch_version": "12.9", + "desired_cuda": "cu129", "libtorch_variant": "shared-with-deps", "libtorch_config": "debug", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda12_6-shared-with-deps-debug", + "build_name": 
"libtorch-cuda12_9-shared-with-deps-debug", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/nightly/cu126/libtorch-win-shared-with-deps-debug-latest.zip", + "installation": "https://download.pytorch.org/libtorch/nightly/cu129/libtorch-win-shared-with-deps-debug-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" } ], "macos": [ @@ -1037,7 +1127,7 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1052,7 +1142,7 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1067,7 +1157,7 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1082,7 +1172,7 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1097,7 +1187,22 @@ "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", "channel": "nightly", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cpu", + "gpu_arch_version": "", + "desired_cuda": "cpu", + "container_image": "pytorch/manylinux2_28-builder:cpu", + "package_type": "wheel", + "build_name": "wheel-py3_13t-cpu", + "validation_runner": "macos-m1-stable", + "installation": "pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu", + "channel": "nightly", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1113,7 +1218,7 @@ "validation_runner": "macos-m1-stable", "installation": "https://download.pytorch.org/libtorch/nightly/cpu/libtorch-macos-arm64-latest.zip", "channel": "nightly", - "stable_version": "2.6.0" + "stable_version": "2.7.1" } ] }, @@ -1131,7 +1236,7 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1146,67 +1251,67 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_9-cuda12_4", + "build_name": 
"manywheel-py3_9-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_9-cuda12_6", + "build_name": "manywheel-py3_9-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.1", - "desired_cuda": "rocm6.1", - "container_image": "pytorch/manylinux2_28-builder:rocm6.1", + "gpu_arch_version": "6.2.4", + "desired_cuda": "rocm6.2.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", "package_type": "manywheel", - "build_name": "manywheel-py3_9-rocm6_1", + "build_name": "manywheel-py3_9-rocm6_2_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_9-rocm6_2_4", + "build_name": "manywheel-py3_9-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1221,7 +1326,7 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1236,67 +1341,67 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + 
"container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_10-cuda12_4", + "build_name": "manywheel-py3_10-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_10-cuda12_6", + "build_name": "manywheel-py3_10-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.1", - "desired_cuda": "rocm6.1", - "container_image": "pytorch/manylinux2_28-builder:rocm6.1", + "gpu_arch_version": "6.2.4", + "desired_cuda": "rocm6.2.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", "package_type": "manywheel", - "build_name": "manywheel-py3_10-rocm6_1", + "build_name": "manywheel-py3_10-rocm6_2_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_10-rocm6_2_4", + "build_name": "manywheel-py3_10-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1311,7 +1416,7 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1326,22 +1431,7 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", - "use_split_build": false - }, - { - "python_version": "3.11", - "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": 
"cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", - "package_type": "manywheel", - "build_name": "manywheel-py3_11-cuda12_4", - "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install torch torchvision torchaudio", - "channel": "release", - "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1353,25 +1443,25 @@ "package_type": "manywheel", "build_name": "manywheel-py3_11-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", - "gpu_arch_type": "rocm", - "gpu_arch_version": "6.1", - "desired_cuda": "rocm6.1", - "container_image": "pytorch/manylinux2_28-builder:rocm6.1", + "gpu_arch_type": "cuda", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_11-rocm6_1", - "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1", + "build_name": "manywheel-py3_11-cuda12_8", + "validation_runner": "linux.g5.4xlarge.nvidia.gpu", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1386,7 +1476,22 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.11", + "gpu_arch_type": "rocm", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", + "package_type": "manywheel", + "build_name": "manywheel-py3_11-rocm6_3", + "validation_runner": "linux.2xlarge", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", + "channel": "release", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1401,7 +1506,7 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1416,67 +1521,67 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_12-cuda12_4", + "build_name": "manywheel-py3_12-cuda12_6", 
"validation_runner": "linux.g5.4xlarge.nvidia.gpu", "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_12-cuda12_6", + "build_name": "manywheel-py3_12-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.1", - "desired_cuda": "rocm6.1", - "container_image": "pytorch/manylinux2_28-builder:rocm6.1", + "gpu_arch_version": "6.2.4", + "desired_cuda": "rocm6.2.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", "package_type": "manywheel", - "build_name": "manywheel-py3_12-rocm6_1", + "build_name": "manywheel-py3_12-rocm6_2_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_12-rocm6_2_4", + "build_name": "manywheel-py3_12-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1491,7 +1596,7 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1506,128 +1611,158 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": 
"pytorch/manylinux2_28-builder:cuda12.6", "package_type": "manywheel", - "build_name": "manywheel-py3_13-cuda12_4", + "build_name": "manywheel-py3_13-cuda12_6", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "manywheel", - "build_name": "manywheel-py3_13-cuda12_6", + "build_name": "manywheel-py3_13-cuda12_8", "validation_runner": "linux.g5.4xlarge.nvidia.gpu", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.1", - "desired_cuda": "rocm6.1", - "container_image": "pytorch/manylinux2_28-builder:rocm6.1", + "gpu_arch_version": "6.2.4", + "desired_cuda": "rocm6.2.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", "package_type": "manywheel", - "build_name": "manywheel-py3_13-rocm6_1", + "build_name": "manywheel-py3_13-rocm6_2_4", "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.1", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", - "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", "package_type": "manywheel", - "build_name": "manywheel-py3_13-rocm6_2_4", + "build_name": "manywheel-py3_13-rocm6_3", "validation_runner": "linux.2xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { + "python_version": "3.13t", "gpu_arch_type": "cpu", "gpu_arch_version": "", "desired_cuda": "cpu", - "libtorch_variant": "shared-with-deps", - "libtorch_config": "", - "devtoolset": "pre-cxx11", - "container_image": "pytorch/manylinux-builder:cpu", - "package_type": "libtorch", - "build_name": "libtorch-cpu-shared-with-deps-pre-cxx11", + "container_image": "pytorch/manylinux2_28-builder:cpu", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-cpu", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/cpu/libtorch-shared-with-deps-2.6.0%2Bcpu.zip", + "installation": "pip3 install torch 
torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu", "channel": "release", - "stable_version": "2.6.0" + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false }, { + "python_version": "3.13t", "gpu_arch_type": "cuda", "gpu_arch_version": "11.8", "desired_cuda": "cu118", - "libtorch_variant": "shared-with-deps", - "libtorch_config": "", - "devtoolset": "pre-cxx11", - "container_image": "pytorch/manylinux-builder:cuda11.8", - "package_type": "libtorch", - "build_name": "libtorch-cuda11_8-shared-with-deps-pre-cxx11", - "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/cu118/libtorch-shared-with-deps-2.6.0%2Bcu118.zip", + "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-cuda11_8", + "validation_runner": "linux.g5.4xlarge.nvidia.gpu", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", - "stable_version": "2.6.0" + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false }, { + "python_version": "3.13t", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "libtorch_variant": "shared-with-deps", - "libtorch_config": "", - "devtoolset": "pre-cxx11", - "container_image": "pytorch/manylinux-builder:cuda12.4", - "package_type": "libtorch", - "build_name": "libtorch-cuda12_4-shared-with-deps-pre-cxx11", - "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/cu124/libtorch-shared-with-deps-2.6.0%2Bcu124.zip", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-cuda12_6", + "validation_runner": "linux.g5.4xlarge.nvidia.gpu", + "installation": "pip3 install torch torchvision torchaudio", "channel": "release", - "stable_version": "2.6.0" + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false }, { + "python_version": "3.13t", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "libtorch_variant": "shared-with-deps", - "libtorch_config": "", - "devtoolset": "pre-cxx11", - "container_image": "pytorch/manylinux-builder:cuda12.6", - "package_type": "libtorch", - "build_name": "libtorch-cuda12_6-shared-with-deps-pre-cxx11", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-cuda12_8", + "validation_runner": "linux.g5.4xlarge.nvidia.gpu", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", + "channel": "release", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "rocm", + "gpu_arch_version": "6.2.4", + "desired_cuda": "rocm6.2.4", + "container_image": "pytorch/manylinux2_28-builder:rocm6.2.4", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-rocm6_2_4", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/cu126/libtorch-shared-with-deps-2.6.0%2Bcu126.zip", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2.4", + "channel": "release", + 
"upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "rocm", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", + "container_image": "pytorch/manylinux2_28-builder:rocm6.3", + "package_type": "manywheel", + "build_name": "manywheel-py3_13t-rocm6_3", + "validation_runner": "linux.2xlarge", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3", "channel": "release", - "stable_version": "2.6.0" + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false }, { "gpu_arch_type": "cpu", @@ -1640,9 +1775,9 @@ "package_type": "libtorch", "build_name": "libtorch-cpu-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcpu.zip", + "installation": "https://download.pytorch.org/libtorch/cpu/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcpu.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", @@ -1655,69 +1790,69 @@ "package_type": "libtorch", "build_name": "libtorch-cuda11_8-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcu118.zip", + "installation": "https://download.pytorch.org/libtorch/cu118/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu118.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", "libtorch_variant": "shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:cuda12.4", + "container_image": "pytorch/libtorch-cxx11-builder:cuda12.6", "package_type": "libtorch", - "build_name": "libtorch-cuda12_4-shared-with-deps-cxx11-abi", + "build_name": "libtorch-cuda12_6-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/cu124/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcu124.zip", + "installation": "https://download.pytorch.org/libtorch/cu126/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu126.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", "libtorch_variant": "shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:cuda12.6", + "container_image": "pytorch/libtorch-cxx11-builder:cuda12.8", "package_type": "libtorch", - "build_name": "libtorch-cuda12_6-shared-with-deps-cxx11-abi", + "build_name": "libtorch-cuda12_8-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/cu126/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Bcu126.zip", + "installation": "https://download.pytorch.org/libtorch/cu128/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Bcu128.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "rocm", - "gpu_arch_version": "6.1", - "desired_cuda": "rocm6.1", + "gpu_arch_version": "6.2.4", + "desired_cuda": "rocm6.2.4", "libtorch_variant": 
"shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:rocm6.1", + "container_image": "pytorch/libtorch-cxx11-builder:rocm6.2.4", "package_type": "libtorch", - "build_name": "libtorch-rocm6_1-shared-with-deps-cxx11-abi", + "build_name": "libtorch-rocm6_2_4-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/rocm6.1/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Brocm6.1.zip", + "installation": "https://download.pytorch.org/libtorch/rocm6.2.4/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Brocm6.2.4.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "rocm", - "gpu_arch_version": "6.2.4", - "desired_cuda": "rocm6.2.4", + "gpu_arch_version": "6.3", + "desired_cuda": "rocm6.3", "libtorch_variant": "shared-with-deps", "libtorch_config": "", "devtoolset": "cxx11-abi", - "container_image": "pytorch/libtorch-cxx11-builder:rocm6.2.4", + "container_image": "pytorch/libtorch-cxx11-builder:rocm6.3", "package_type": "libtorch", - "build_name": "libtorch-rocm6_2_4-shared-with-deps-cxx11-abi", + "build_name": "libtorch-rocm6_3-shared-with-deps-cxx11-abi", "validation_runner": "linux.2xlarge", - "installation": "https://download.pytorch.org/libtorch/rocm6.2.4/libtorch-cxx11-abi-shared-with-deps-2.6.0%2Brocm6.2.4.zip", + "installation": "https://download.pytorch.org/libtorch/rocm6.3/libtorch-cxx11-abi-shared-with-deps-2.7.1%2Brocm6.3.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" } ], "windows": [ @@ -1733,7 +1868,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1748,37 +1883,37 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_9-cuda12_4", + "build_name": "wheel-py3_9-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.9", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_9-cuda12_6", + "build_name": "wheel-py3_9-cuda12_8", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio 
--index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1793,7 +1928,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1808,37 +1943,37 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_10-cuda12_4", + "build_name": "wheel-py3_10-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.10", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_10-cuda12_6", + "build_name": "wheel-py3_10-cuda12_8", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1853,7 +1988,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1868,37 +2003,37 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_11-cuda12_4", + "build_name": "wheel-py3_11-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "channel": "release", "upload_to_base_bucket": "no", - 
"stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.11", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_11-cuda12_6", + "build_name": "wheel-py3_11-cuda12_8", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1913,7 +2048,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1928,37 +2063,37 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_12-cuda12_4", + "build_name": "wheel-py3_12-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.12", "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", - "container_image": "pytorch/manylinux2_28-builder:cuda12.6", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", "package_type": "wheel", - "build_name": "wheel-py3_12-cuda12_6", + "build_name": "wheel-py3_12-cuda12_8", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1973,7 +2108,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -1988,37 +2123,97 @@ "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", 
"gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", - "container_image": "pytorch/manylinux2_28-builder:cuda12.4", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", + "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_13-cuda12_4", + "build_name": "wheel-py3_13-cuda12_6", "validation_runner": "windows.g4dn.xlarge", - "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { "python_version": "3.13", "gpu_arch_type": "cuda", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", + "package_type": "wheel", + "build_name": "wheel-py3_13-cuda12_8", + "validation_runner": "windows.g4dn.xlarge", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", + "channel": "release", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cpu", + "gpu_arch_version": "", + "desired_cuda": "cpu", + "container_image": "pytorch/manylinux2_28-builder:cpu", + "package_type": "wheel", + "build_name": "wheel-py3_13t-cpu", + "validation_runner": "windows.4xlarge", + "installation": "pip3 install torch torchvision torchaudio", + "channel": "release", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cuda", + "gpu_arch_version": "11.8", + "desired_cuda": "cu118", + "container_image": "pytorch/manylinux2_28-builder:cuda11.8", + "package_type": "wheel", + "build_name": "wheel-py3_13t-cuda11_8", + "validation_runner": "windows.g4dn.xlarge", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118", + "channel": "release", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cuda", "gpu_arch_version": "12.6", "desired_cuda": "cu126", "container_image": "pytorch/manylinux2_28-builder:cuda12.6", "package_type": "wheel", - "build_name": "wheel-py3_13-cuda12_6", + "build_name": "wheel-py3_13t-cuda12_6", "validation_runner": "windows.g4dn.xlarge", "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cuda", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", + "container_image": "pytorch/manylinux2_28-builder:cuda12.8", + "package_type": "wheel", + "build_name": "wheel-py3_13t-cuda12_8", + "validation_runner": "windows.g4dn.xlarge", + "installation": "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128", + "channel": "release", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -2032,9 +2227,9 @@ "package_type": "libtorch", "build_name": "libtorch-cpu-shared-with-deps-release", "validation_runner": "windows.4xlarge", - "installation": 
"https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.6.0%2Bcpu.zip", + "installation": "https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-2.7.1%2Bcpu.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", @@ -2047,39 +2242,39 @@ "package_type": "libtorch", "build_name": "libtorch-cuda11_8-shared-with-deps-release", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.6.0%2Bcu118.zip", + "installation": "https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-2.7.1%2Bcu118.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", "libtorch_variant": "shared-with-deps", "libtorch_config": "release", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda12_4-shared-with-deps-release", + "build_name": "libtorch-cuda12_6-shared-with-deps-release", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/cu124/libtorch-win-shared-with-deps-2.6.0%2Bcu124.zip", + "installation": "https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-2.7.1%2Bcu126.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", "libtorch_variant": "shared-with-deps", "libtorch_config": "release", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda12_6-shared-with-deps-release", + "build_name": "libtorch-cuda12_8-shared-with-deps-release", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-2.6.0%2Bcu126.zip", + "installation": "https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-2.7.1%2Bcu128.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cpu", @@ -2092,9 +2287,9 @@ "package_type": "libtorch", "build_name": "libtorch-cpu-shared-with-deps-debug", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-2.6.0%2Bcpu.zip", + "installation": "https://download.pytorch.org/libtorch/cpu/libtorch-win-shared-with-deps-debug-2.7.1%2Bcpu.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", @@ -2107,39 +2302,39 @@ "package_type": "libtorch", "build_name": "libtorch-cuda11_8-shared-with-deps-debug", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-debug-2.6.0%2Bcu118.zip", + "installation": "https://download.pytorch.org/libtorch/cu118/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu118.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.4", - "desired_cuda": "cu124", + "gpu_arch_version": "12.6", + "desired_cuda": "cu126", "libtorch_variant": "shared-with-deps", "libtorch_config": "debug", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": 
"libtorch-cuda12_4-shared-with-deps-debug", + "build_name": "libtorch-cuda12_6-shared-with-deps-debug", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/cu124/libtorch-win-shared-with-deps-debug-2.6.0%2Bcu124.zip", + "installation": "https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu126.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" }, { "gpu_arch_type": "cuda", - "gpu_arch_version": "12.6", - "desired_cuda": "cu126", + "gpu_arch_version": "12.8", + "desired_cuda": "cu128", "libtorch_variant": "shared-with-deps", "libtorch_config": "debug", "devtoolset": "", "container_image": "", "package_type": "libtorch", - "build_name": "libtorch-cuda12_6-shared-with-deps-debug", + "build_name": "libtorch-cuda12_8-shared-with-deps-debug", "validation_runner": "windows.4xlarge", - "installation": "https://download.pytorch.org/libtorch/cu126/libtorch-win-shared-with-deps-debug-2.6.0%2Bcu126.zip", + "installation": "https://download.pytorch.org/libtorch/cu128/libtorch-win-shared-with-deps-debug-2.7.1%2Bcu128.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" } ], "macos": [ @@ -2155,7 +2350,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -2170,7 +2365,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -2185,7 +2380,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -2200,7 +2395,7 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -2215,7 +2410,22 @@ "installation": "pip3 install torch torchvision torchaudio", "channel": "release", "upload_to_base_bucket": "no", - "stable_version": "2.6.0", + "stable_version": "2.7.1", + "use_split_build": false + }, + { + "python_version": "3.13t", + "gpu_arch_type": "cpu", + "gpu_arch_version": "", + "desired_cuda": "cpu", + "container_image": "pytorch/manylinux2_28-builder:cpu", + "package_type": "wheel", + "build_name": "wheel-py3_13t-cpu", + "validation_runner": "macos-m1-stable", + "installation": "pip3 install torch torchvision torchaudio", + "channel": "release", + "upload_to_base_bucket": "no", + "stable_version": "2.7.1", "use_split_build": false }, { @@ -2229,9 +2439,9 @@ "package_type": "libtorch", "build_name": "libtorch-cpu-shared-with-deps-cxx11-abi", "validation_runner": "macos-m1-stable", - "installation": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.6.0.zip", + "installation": "https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.7.1.zip", "channel": "release", - "stable_version": "2.6.0" + "stable_version": "2.7.1" } ] } diff --git a/scripts/gen_quick_start_module.py b/scripts/gen_quick_start_module.py index a3ceb1a35bc3..5fd20d79949f 100755 --- a/scripts/gen_quick_start_module.py +++ b/scripts/gen_quick_start_module.py @@ -157,8 +157,10 @@ def update_versions(versions, release_matrix, release_version): if x["libtorch_variant"] == 
"shared-with-deps" } if instr["versions"] is not None: - for ver in [PRE_CXX11_ABI, CXX11_ABI]: - if gpu_arch_type == "rocm" and ver == PRE_CXX11_ABI: + for ver in [CXX11_ABI, PRE_CXX11_ABI]: + # temporarily remove setting pre-cxx11-abi. For Release 2.7 we + # should remove pre-cxx11-abi completely. + if ver == PRE_CXX11_ABI: continue else: instr["versions"][LIBTORCH_DWNL_INSTR[ver]] = ( diff --git a/tac.html b/tac.html index 10844f1af8da..171e68422fb2 100644 --- a/tac.html +++ b/tac.html @@ -81,6 +81,20 @@

    Milos Puzovic, Arm

+    Ricardo Aravena
+
+    Ricardo Aravena, Snowflake
+
+    Ricardo is a seasoned technology leader with over two decades of experience across the enterprise and startup landscape. He works at Snowflake as a Cloud Infrastructure and Open Source Lead, focusing on automating AI/ML infrastructure using cloud-native technologies at scale. A passionate open source advocate, Ricardo also serves as a shadow member of the CNCF Technical Oversight Committee and CNCF AI Sub-project, where he helps shape the future of AI computing infrastructure.
+
+    Throughout his career, Ricardo has held key engineering and leadership roles at major companies such as Rakuten, Cisco, and VMware, as well as at innovative startups including Truera, Branch Metrics, Coupa, HyTrust, Exablox, and SnapLogic. He's committed to community-driven innovation and regularly contributes to industry initiatives that bridge the gap between open source communities and enterprise adoption.
+
    Shauheen Zahirazami
@@ -103,6 +117,10 @@
    Shauheen Zahirazami, Google

    Yikun Jiang, Huawei

+    Yikun Jiang is a principal software engineer on Huawei's computing open source development team, working on multi-arch and heterogeneous hardware support and on improving projects across the computing area. He has more than 10 years of experience in computing and leads an active, creative R&D team under the principle of “upstream first”, which aims to make diverse computing power ubiquitous.