Skip to content

Commit c8c63e2

Browse files
authored
Merge pull request pytorch#1019 from Jam3/TerryCE-introducing-accelerated-pytorch-training-on-mac
Terry ce introducing accelerated pytorch training on mac
2 parents 4ad55c4 + 4764f4d commit c8c63e2

4 files changed

+34
-0
lines changed
Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
---
2+
layout: blog_detail
3+
title: "Introducing Accelerated PyTorch Training on Mac"
4+
author: PyTorch
5+
featured-img: "/assets/images/METAPT-002-BarGraph-02-static.png"
6+
---
7+
8+
In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac. Until now, PyTorch training on Mac only leveraged the CPU, but with the upcoming PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training. This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac.
9+
10+
<p align="center">
11+
<img src="/assets/images/intro-graphic-accelerated-pytorch-training.jpg" width="100%">
12+
</p>
13+
14+
## Metal Acceleration
15+
16+
Accelerated GPU training is enabled using Apple’s Metal Performance Shaders (MPS) as a backend for PyTorch. The MPS backend extends the PyTorch framework, providing scripts and capabilities to set up and run operations on Mac. MPS optimizes compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family. The new device maps machine learning computational graphs and primitives on the MPS Graph framework and tuned kernels provided by MPS.
17+
18+
## Training Benefits on Apple Silicon
19+
20+
Every Apple silicon Mac has a unified memory architecture, providing the GPU with direct access to the full memory store. This makes Mac a great platform for machine learning, enabling users to train larger networks or batch sizes locally. This reduces costs associated with cloud-based development or the need for additional local GPUs. The Unified Memory architecture also reduces data retrieval latency, improving end-to-end performance.
21+
22+
In the graphs below, you can see the performance speedup from accelerated GPU training and evaluation compared to the CPU baseline:
23+
24+
<p align="center">
25+
<img src="/assets/images/METAPT-002-BarGraph-02.gif" width="100%">
26+
</p>
27+
28+
Testing conducted by Apple in April 2022 using production Mac Studio systems with Apple M1 Ultra, 20-core CPU, 64-core GPU 128GB of RAM, and 2TB SSD. Tested with macOS Monterey 12.3, prerelease PyTorch 1.12, ResNet50 (batch size=128), HuggingFace BERT (batch size=64), and VGG16 (batch size=64). Performance tests are conducted using specific computer systems and reflect the approximate performance of Mac Studio.
29+
30+
## Getting Started
31+
32+
To get started, just install the latest [Preview (Nightly) build](https://pytorch.org/get-started/locally/) on your Apple silicon Mac running macOS 12.3 or later with a native version (arm64) of Python.
33+
34+
You can also learn more about Metal and MPS on [Apple’s Metal page](https://developer.apple.com/metal/).
19.9 KB
Loading
4.06 MB
Loading

0 commit comments

Comments
 (0)