
Commit ef47fd4

Update 2020-08-08-pytorch-1.6-now-includes-stochastic-weight-averaging.md
1 parent 49ff5cb commit ef47fd4

1 file changed: +1 -1 lines changed


_posts/2020-08-08-pytorch-1.6-now-includes-stochastic-weight-averaging.md: +1 -1
@@ -8,7 +8,7 @@ Do you use stochastic gradient descent (SGD) or Adam? Regardless of the procedur
 
 
 SWA has a wide range of applications and features:
-* SWA significantly improves performance compared to standard training techniques in computer vision (e.g., VGG, ResNets, Wide ResNets and DenseNets on ImageNet and CIFAR benchmarks [1, 2])
+* SWA significantly improves performance compared to standard training techniques in computer vision (e.g., VGG, ResNets, Wide ResNets and DenseNets on ImageNet and CIFAR benchmarks [1, 2]).
 * SWA provides state-of-the-art performance on key benchmarks in semi-supervised learning and domain adaptation [2].
 * SWA was shown to improve performance in language modeling (e.g., AWD-LSTM on WikiText-2 [4]) and policy-gradient methods in deep reinforcement learning [3].
 * SWAG, an extension of SWA, can approximate Bayesian model averaging in Bayesian deep learning and achieves state-of-the-art uncertainty calibration results in various settings. Moreover, its recent generalization MultiSWAG provides significant additional performance gains and mitigates double-descent [4, 10]. Another approach, Subspace Inference, approximates the Bayesian posterior in a small subspace of the parameter space around the SWA solution [5].
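
For context, the post touched by this commit describes the SWA support that ships in PyTorch 1.6 via torch.optim.swa_utils. Below is a minimal sketch of how that API is typically wired into a training loop; the linear model, toy data, swa_start epoch, and learning rates are placeholder assumptions for illustration, not values taken from the post or this commit.

# Minimal SWA sketch using torch.optim.swa_utils (PyTorch 1.6+).
# Model, data, and hyperparameters below are illustrative placeholders.
import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

model = nn.Linear(10, 2)                                    # placeholder model
loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,)))   # toy (input, target) batches
          for _ in range(8)]
loss_fn = nn.CrossEntropyLoss()

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=20)
swa_model = AveragedModel(model)                            # keeps the running weight average
swa_scheduler = SWALR(optimizer, swa_lr=0.05)               # LR schedule used during the SWA phase
swa_start = 15                                              # epoch at which averaging begins (placeholder)

for epoch in range(20):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    if epoch >= swa_start:
        swa_model.update_parameters(model)                  # fold current weights into the average
        swa_scheduler.step()
    else:
        scheduler.step()

update_bn(loader, swa_model)                                # recompute BatchNorm statistics for the averaged model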
