
Commit 0681bc9

Merge pull request #428 from pytorch/1.6-blog-post
1.6 blog post
2 parents 2153a09 + 58f8db1

File tree

5 files changed: +224 -6 lines changed


_news/news-item-1.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,7 +1,7 @@
 ---
 order: 1
-link: https://pytorch.org/blog/updates-improvements-to-pytorch-tutorials/
-summary: Click Here to Read About Latest Updates and Improvements to PyTorch Tutorials
+link: https://pytorch.org/blog/pytorch-1.6-released/
+summary: PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintainers for Windows.
 ---
```

_news/news-item-2.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,6 +1,6 @@
 ---
 order: 2
-link: https://pytorch.org/blog/pytorch-1-dot-5-released-with-new-and-updated-apis
-summary: PyTorch 1.5 released, new and updated APIs including C++ frontend API parity with Python.
+link: https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/
+summary: Accelerating Training on NVIDIA GPUs with PyTorch Automatic Mixed Precision.
 ---
```


_news/news-item-3.md

Lines changed: 2 additions & 2 deletions

```diff
@@ -1,6 +1,6 @@
 ---
 order: 3
-link: https://pytorch.org/blog/pytorch-library-updates-new-model-serving-library
-summary: PyTorch library updates including new model serving library
+link: https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch/
+summary: Microsoft becomes maintainer of the Windows version of PyTorch.
 ---
```


_news/news-item-4.md

Lines changed: 5 additions & 0 deletions

```diff
@@ -0,0 +1,5 @@
+---
+order: 4
+link: https://pytorch.org/blog/pytorch-feature-classification-changes/
+summary: See the new PyTorch feature classification changes
+---
```

Lines changed: 213 additions & 0 deletions

---
layout: blog_detail
title: 'PyTorch 1.6 released w/ Native AMP Support, Microsoft joins as maintainers for Windows'
author: Team PyTorch
---

Today, we’re announcing the availability of PyTorch 1.6, along with updated domain libraries. We are also excited to announce that the team at [Microsoft is now maintaining Windows builds and binaries](https://pytorch.org/blog/microsoft-becomes-maintainer-of-the-windows-version-of-pytorch) and will also be supporting the community on GitHub as well as the PyTorch Windows discussion forums.

The PyTorch 1.6 release includes a number of new APIs, tools for performance improvement and profiling, as well as major updates to both distributed data parallel (DDP) and remote procedure call (RPC) based distributed training. A few of the highlights include:

1. Automatic mixed precision (AMP) training is now natively supported and a stable feature (see [here](https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/) for more details), thanks to NVIDIA’s contributions;
2. Native TensorPipe support now added for tensor-aware, point-to-point communication primitives built specifically for machine learning;
3. Added support for complex tensors to the frontend API surface;
4. New profiling tools providing tensor-level memory consumption information;
5. Numerous improvements and new features for both the distributed data parallel (DDP) training and remote procedure call (RPC) packages.

Additionally, from this release onward, features will be classified as Stable, Beta, and Prototype. Prototype features are not included as part of the binary distribution and are instead available by building from source, using nightlies, or via a compiler flag. You can learn more about what this change means in the post [here](https://pytorch.org/blog/pytorch-feature-classification-changes/). You can also find the full release notes [here](https://github.com/pytorch/pytorch/releases).
# Performance & Profiling

## [Stable] Automatic Mixed Precision (AMP) Training

AMP lets users easily enable mixed precision training, with higher performance and memory savings of up to 50% on Tensor Core GPUs. Using the natively supported `torch.cuda.amp` API, AMP provides convenience methods for mixed precision, where some operations use the `torch.float32` (float) datatype and other operations use `torch.float16` (half). Some ops, like linear layers and convolutions, are much faster in `float16`. Other ops, like reductions, often require the dynamic range of `float32`. Mixed precision tries to match each op to its appropriate datatype. A minimal usage sketch follows the links below.

* Design doc ([Link](https://github.com/pytorch/pytorch/issues/25081))
* Documentation ([Link](https://pytorch.org/docs/stable/amp.html))
* Usage examples ([Link](https://pytorch.org/docs/stable/notes/amp_examples.html))
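
The typical pattern combines `torch.cuda.amp.autocast` for the forward pass with `torch.cuda.amp.GradScaler` for the backward pass. A minimal sketch, using a hypothetical toy model and random data for illustration (requires a CUDA device):

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()   # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    inputs = torch.randn(64, 1024, device="cuda")
    targets = torch.randn(64, 1024, device="cuda")
    optimizer.zero_grad()
    # Ops inside autocast run in float16 where it is safe, float32 elsewhere
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(inputs), targets)
    # Scale the loss to avoid float16 gradient underflow, then unscale and step
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```
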
## [Beta] Fork/Join Parallelism

This release adds support for a language-level construct as well as runtime support for coarse-grained parallelism in TorchScript code. This support is useful for situations such as running models in an ensemble in parallel, or running bidirectional components of recurrent nets in parallel, and unlocks the computational power of parallel architectures (e.g. many-core CPUs) for task-level parallelism.

Parallel execution of TorchScript programs is enabled through two primitives: `torch.jit.fork` and `torch.jit.wait`. In the example below, we parallelize execution of `foo`:

```python
import torch
from typing import List

def foo(x):
    return torch.neg(x)

@torch.jit.script
def example(x):
    # fork launches foo asynchronously; wait blocks until each result is ready
    futures = [torch.jit.fork(foo, x) for _ in range(100)]
    results = [torch.jit.wait(future) for future in futures]
    return torch.sum(torch.stack(results))

print(example(torch.ones([])))
```

* Documentation ([Link](https://pytorch.org/docs/stable/jit.html))

## [Beta] Memory Profiler

The `torch.autograd.profiler` API now includes a memory profiler that lets you inspect the tensor memory cost of different operators inside your CPU and GPU models.

Here is an example usage of the API:

```python
import torch
import torchvision.models as models
import torch.autograd.profiler as profiler

model = models.resnet18()
inputs = torch.randn(5, 3, 224, 224)
with profiler.profile(profile_memory=True, record_shapes=True) as prof:
    model(inputs)

# NOTE: some columns were removed for brevity
print(prof.key_averages().table(sort_by="self_cpu_memory_usage", row_limit=10))
# ---------------------------  ---------------  ---------------  ---------------
# Name                         CPU Mem          Self CPU Mem     Number of Calls
# ---------------------------  ---------------  ---------------  ---------------
# empty                        94.79 Mb         94.79 Mb         123
# resize_                      11.48 Mb         11.48 Mb         2
# addmm                        19.53 Kb         19.53 Kb         1
# empty_strided                4 b              4 b              1
# conv2d                       47.37 Mb         0 b              20
# ---------------------------  ---------------  ---------------  ---------------
```

* PR ([Link](https://github.com/pytorch/pytorch/pull/37775))
* Documentation ([Link](https://pytorch.org/docs/stable/autograd.html#profiler))

# Distributed Training & RPC

## [Beta] TensorPipe backend for RPC

PyTorch 1.6 introduces a new backend for the RPC module which leverages the TensorPipe library, a tensor-aware, point-to-point communication primitive targeted at machine learning. It is intended to complement the current primitives for distributed training in PyTorch (Gloo, MPI, ...), which are collective and blocking. The pairwise and asynchronous nature of TensorPipe lends itself to new networking paradigms that go beyond data parallel: client-server approaches (e.g., parameter server for embeddings, actor-learner separation in Impala-style RL, ...), model and pipeline parallel training (think GPipe), gossip SGD, etc.

```python
# One-line change needed to opt in
torch.distributed.rpc.init_rpc(
    ...
    backend=torch.distributed.rpc.BackendType.TENSORPIPE,
)

# No changes to the rest of the RPC API
torch.distributed.rpc.rpc_sync(...)
```

* Design doc ([Link](https://github.com/pytorch/pytorch/issues/35251))
* Documentation ([Link](https://pytorch.org/docs/stable/rpc/index.html))

## [Beta] DDP+RPC

PyTorch Distributed supports two powerful paradigms: DDP for full-sync data parallel training of models and the RPC framework, which allows for distributed model parallelism. Previously, these two features worked independently and users couldn’t mix and match them to try out hybrid parallelism paradigms.

Starting in PyTorch 1.6, we’ve enabled DDP and RPC to work together seamlessly so that users can combine these two techniques to achieve both data parallelism and model parallelism. An example is where users would like to place large embedding tables on parameter servers and use the RPC framework for embedding lookups, but store smaller dense parameters on trainers and use DDP to synchronize the dense parameters. Below is a simple, schematic code snippet (`create_emb`, `dense_model`, and `batch` stand in for user-defined pieces):

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

# On each trainer

remote_emb = create_emb(on="ps", ...)
ddp_model = DDP(dense_model)

for data in batch:
    with torch.distributed.autograd.context():
        res = remote_emb(data)
        loss = ddp_model(res)
        torch.distributed.autograd.backward([loss])
```

* DDP+RPC Tutorial ([Link](https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html))
* Documentation ([Link](https://pytorch.org/docs/stable/rpc/index.html))
* Usage Examples ([Link](https://github.com/pytorch/examples/pull/800))

## [Beta] RPC - Asynchronous User Functions

RPC asynchronous user functions support the ability to yield and resume on the server side when executing a user-defined function. Prior to this feature, when a callee processes a request, one RPC thread waits until the user function returns. If the user function contains IO (e.g., nested RPC) or signaling (e.g., waiting for another request to unblock), the corresponding RPC thread sits idle waiting for these events. As a result, some applications have to use a very large number of threads and send additional RPC requests, which can potentially lead to performance degradation. To make a user function yield on such events, applications need to: 1) decorate the function with the `@rpc.functions.async_execution` decorator; and 2) let the function return a `torch.futures.Future` and install the resume logic as callbacks on the `Future` object. See below for an example:

```python
import torch
import torch.distributed.rpc as rpc

# Assumes rpc.init_rpc(...) has been called on workers "worker1" and "worker2"

@rpc.functions.async_execution
def async_add_chained(to, x, y, z):
    return rpc.rpc_async(to, torch.add, args=(x, y)).then(
        lambda fut: fut.wait() + z
    )

ret = rpc.rpc_sync(
    "worker1",
    async_add_chained,
    args=("worker2", torch.ones(2), 1, 1)
)

print(ret)  # prints tensor([3., 3.])
```

* Tutorial for performant batch RPC using Asynchronous User Functions ([Link](https://github.com/pytorch/tutorials/blob/release/1.6/intermediate_source/rpc_async_execution.rst))
* Documentation ([Link](https://pytorch.org/docs/stable/rpc.html#torch.distributed.rpc.functions.async_execution))
* Usage examples ([Link](https://github.com/pytorch/examples/tree/master/distributed/rpc/batch))

# Frontend API Updates

## [Beta] Complex Numbers

The PyTorch 1.6 release brings beta-level support for complex tensors, including the `torch.complex64` and `torch.complex128` dtypes. A complex number is a number that can be expressed in the form a + bj, where a and b are real numbers and j is a solution of the equation x^2 = −1. Complex numbers frequently occur in mathematics and engineering, especially in signal processing, and complex neural networks are an active area of research. The beta release of complex tensors will support common PyTorch and complex tensor functionality, plus functions needed by Torchaudio, ESPnet and others. While this is an early version of this feature, and we expect it to improve over time, the overall goal is to provide a NumPy-compatible user experience that leverages PyTorch’s ability to run on accelerators and work with autograd to better support the scientific community.

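A minimal sketch of what beta complex tensors look like; operator coverage in the beta may be incomplete, so treat this as illustrative:

```python
import torch

# Build a complex64 tensor from Python complex literals
z = torch.tensor([1 + 3j, 2 - 1j], dtype=torch.complex64)

print(z.real)        # real parts:      tensor([1., 2.])
print(z.imag)        # imaginary parts: tensor([3., -1.])
print(torch.abs(z))  # elementwise magnitude sqrt(a^2 + b^2)
```
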
# Updated Domain Libraries

## torchvision 0.7

torchvision 0.7 introduces two new pretrained semantic segmentation models, [FCN ResNet50](https://arxiv.org/abs/1411.4038) and [DeepLabV3 ResNet50](https://arxiv.org/abs/1706.05587), both trained on COCO and using smaller memory footprints than the ResNet101 backbone. We also introduced support for AMP (Automatic Mixed Precision) autocasting for torchvision models and operators, which automatically selects the floating point precision for different GPU operations to improve performance while maintaining accuracy.

* Release notes ([Link](https://github.com/pytorch/vision/releases))

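As a quick sketch, the new COCO-pretrained segmentation models can be loaded directly (weights download on first use):

```python
import torch
import torchvision

# New in torchvision 0.7: ResNet50-backbone segmentation models
fcn = torchvision.models.segmentation.fcn_resnet50(pretrained=True).eval()
deeplab = torchvision.models.segmentation.deeplabv3_resnet50(pretrained=True).eval()

with torch.no_grad():
    out = fcn(torch.randn(1, 3, 224, 224))["out"]  # per-pixel class scores
print(out.shape)  # torch.Size([1, 21, 224, 224]) -> 21 classes
```
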
## torchaudio 0.6

torchaudio now officially supports Windows. This release also introduces a new model module (with wav2letter included), new functionals (contrast, cvm, dcshift, overdrive, vad, phaser, flanger, biquad), datasets (GTZAN, CMU), and a new optional sox backend with support for TorchScript.

* Release notes ([Link](https://github.com/pytorch/audio/releases))

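A small illustrative sketch of the new functionals; the names come from the list above, but exact signatures are in the torchaudio 0.6 docs and may differ:

```python
import torch
import torchaudio.functional as F

# Hypothetical 1-second mono clip at 16 kHz, for illustration
waveform = torch.randn(1, 16000).clamp(-1.0, 1.0)

boosted = F.contrast(waveform, enhancement_amount=75.0)  # SoX-style contrast effect
shifted = F.dcshift(waveform, shift=0.1)                 # apply a DC shift
```
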
# Additional updates

## HACKATHON

The Global PyTorch Summer Hackathon is back! This year, teams can compete virtually in three categories:

1. **PyTorch Developer Tools:** Tools or libraries designed to improve productivity and efficiency of PyTorch for researchers and developers
2. **Web/Mobile Applications powered by PyTorch:** Applications with web/mobile interfaces and/or embedded devices powered by PyTorch
3. **PyTorch Responsible AI Development Tools:** Tools, libraries, or web/mobile apps for responsible AI development

This is a great opportunity to connect with the community and practice your machine learning skills.

* [Join the hackathon](http://pytorch2020.devpost.com/)
* [Watch educational videos](https://www.youtube.com/pytorch)

## LPCV Challenge

The [2020 CVPR Low-Power Vision Challenge (LPCV) - Online Track for UAV video](https://lpcv.ai/2020CVPR/video-track) submission deadline is coming up shortly. You have until July 31, 2020 to build a system that can accurately discover and recognize characters in video captured by an unmanned aerial vehicle (UAV), using PyTorch and a Raspberry Pi 3B+.

## Prototype Features

To reiterate, Prototype features in PyTorch are early features that we are looking to gather feedback on, gauge the usefulness of, and improve ahead of graduating them to Beta or Stable. The following features are not part of the PyTorch 1.6 release and instead are available in nightlies, with separate docs/tutorials to help facilitate early usage and feedback.

#### Distributed RPC/Profiler

Allows users to profile training jobs that use `torch.distributed.rpc` using the autograd profiler, and to remotely invoke the profiler in order to collect profiling information across different nodes. The RFC can be found [here](https://github.com/pytorch/pytorch/issues/39675) and a short recipe on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source).

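A minimal sketch of the idea: wrap RPC calls in the existing autograd profiler context. This assumes an RPC group has already been initialized with `init_rpc`; the remote-profiling pieces live in nightlies:

```python
import torch
import torch.autograd.profiler as profiler
import torch.distributed.rpc as rpc

# ... after rpc.init_rpc(...) has set up this worker ...
with profiler.profile() as prof:
    fut = rpc.rpc_async("worker1", torch.add, args=(torch.ones(2), 1))
    fut.wait()

print(prof.key_averages().table(sort_by="cpu_time_total"))
```
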
#### TorchScript Module Freezing

Module freezing is the process of inlining module parameters and attribute values into the TorchScript internal representation. Parameter and attribute values are treated as final values, and they cannot be modified in the frozen module. The PR for this feature can be found [here](https://github.com/pytorch/pytorch/pull/32178) and a short tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source).

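A short sketch of the workflow using the `torch.jit.freeze` entry point available in nightlies; the prototype API may shift before stabilization:

```python
import torch

class MyModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.ones(2, 2))

    def forward(self, x):
        return x @ self.weight

# Freezing requires a scripted module in eval mode
scripted = torch.jit.script(MyModule().eval())
frozen = torch.jit.freeze(scripted)

# self.weight is now baked into the graph as a constant
print(frozen.graph)
```
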
#### Graph Mode Quantization

Eager mode quantization requires users to make changes to their model, including explicitly quantizing activations, fusing modules, and rewriting uses of torch ops with functional modules; quantization of functionals is not supported. If we can trace or script the model, then quantization can be done automatically with graph mode quantization, without any of the complexities of eager mode, and it is configurable through a `qconfig_dict`. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source).

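A hypothetical sketch of the flow, assuming the prototype `quantize_jit` entry point from the tutorials; names and signatures may differ across nightlies:

```python
import torch
from torch.quantization import get_default_qconfig, quantize_jit  # prototype API (assumed)

class SmallModel(torch.nn.Module):  # toy model for illustration
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, x):
        return self.fc(x)

model = torch.jit.script(SmallModel().eval())
qconfig_dict = {'': get_default_qconfig('fbgemm')}

def calibrate(m, data):
    # Run representative inputs through the model to record activation ranges
    with torch.no_grad():
        for x in data:
            m(x)

calib_data = [torch.randn(1, 16) for _ in range(8)]
quantized = quantize_jit(model, qconfig_dict, calibrate, [calib_data])
```
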
#### Quantization Numerical Suite

Quantization is good when it works, but it’s difficult to know what's wrong when it doesn't satisfy the expected accuracy. A prototype is now available for a Numerical Suite that measures comparison statistics between quantized modules and float modules. This is available for testing in eager mode and on CPU only, with more support coming. A tutorial on how to use this feature can be found [here](https://github.com/pytorch/tutorials/tree/master/prototype_source).

Cheers!

Team PyTorch
