
Commit ef47fd4

Update 2020-08-08-pytorch-1.6-now-includes-stochastic-weight-averaging.md
1 parent 49ff5cb commit ef47fd4

1 file changed: +1 -1 lines changed


_posts/2020-08-08-pytorch-1.6-now-includes-stochastic-weight-averaging.md: +1 -1
@@ -8,7 +8,7 @@ Do you use stochastic gradient descent (SGD) or Adam? Regardless of the procedur
 
 
 SWA has a wide range of applications and features:
-* SWA significantly improves performance compared to standard training techniques in computer vision (e.g., VGG, ResNets, Wide ResNets and DenseNets on ImageNet and CIFAR benchmarks [1, 2])
+* SWA significantly improves performance compared to standard training techniques in computer vision (e.g., VGG, ResNets, Wide ResNets and DenseNets on ImageNet and CIFAR benchmarks [1, 2]).
 * SWA provides state-of-the-art performance on key benchmarks in semi-supervised learning and domain adaptation [2].
 * SWA was shown to improve performance in language modeling (e.g., AWD-LSTM on WikiText-2 [4]) and policy-gradient methods in deep reinforcement learning [3].
 * SWAG, an extension of SWA, can approximate Bayesian model averaging in Bayesian deep learning and achieves state-of-the-art uncertainty calibration results in various settings. Moreover, its recent generalization MultiSWAG provides significant additional performance gains and mitigates double-descent [4, 10]. Another approach, Subspace Inference, approximates the Bayesian posterior in a small subspace of the parameter space around the SWA solution [5].
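
For context, the post touched by this commit describes the SWA support that ships in PyTorch 1.6 via torch.optim.swa_utils. Below is a minimal sketch of how that API is typically wired into a training loop; the linear model, toy data, swa_start epoch, and learning rates are placeholder assumptions for illustration, not values taken from the post or this commit.

# Minimal SWA sketch using torch.optim.swa_utils (PyTorch 1.6+).
# Model, data, and hyperparameters below are illustrative placeholders.
import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = nn.Linear(10, 2)                                    # placeholder model
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,)))   # toy (input, target) batches
          for _ in range(8)]
loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
swa_model = AveragedModel(model)                            # keeps the running weight average
swa_scheduler = SWALR(optimizer, swa_lr=0.05)               # LR schedule used during the SWA phase
swa_start = 15                                              # epoch at which averaging begins (placeholder)

for epoch in range(20):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)                  # fold current weights into the average
        swa_scheduler.step()
    else:
        scheduler.step()

update_bn(loader, swa_model)                                # recompute BatchNorm statistics for the averaged model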
