@@ -53,7 +53,7 @@ are trained, we can train the :math:`k+1`-th layer because we can now
compute the code or latent representation from the layer below.
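As a toy illustration of this layer-wise scheme (illustrative only, not the ``SdA`` class developed below; it assumes the ``dA`` class from the previous chapter with its ``get_hidden_values`` and ``get_cost_updates`` methods, and uses made-up layer sizes), one could train two stacked denoising autoencoders like this:

.. code-block:: python

    import numpy
    import theano
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    from dA import dA  # the denoising autoencoder of the previous chapter

    numpy_rng = numpy.random.RandomState(123)
    theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
    x = T.matrix('x')  # a minibatch of input vectors

    # first layer: a denoising autoencoder trained on the raw input
    dA1 = dA(numpy_rng=numpy_rng, theano_rng=theano_rng, input=x,
             n_visible=28 * 28, n_hidden=500)
    cost1, updates1 = dA1.get_cost_updates(corruption_level=0.3,
                                           learning_rate=0.1)
    train_layer1 = theano.function([x], cost1, updates=updates1)

    # second layer: trained on the code computed by the first layer;
    # its updates only touch dA2's parameters, so dA1 stays fixed
    code1 = dA1.get_hidden_values(x)
    dA2 = dA(numpy_rng=numpy_rng, theano_rng=theano_rng, input=code1,
             n_visible=500, n_hidden=500)
    cost2, updates2 = dA2.get_cost_updates(corruption_level=0.3,
                                           learning_rate=0.1)
    train_layer2 = theano.function([x], cost2, updates=updates2)

Calling ``train_layer1`` on minibatches until convergence, and only then calling ``train_layer2``, matches the greedy schedule described above.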
Once all layers are pre-trained, the network goes through a second stage
- of training called **fine-tuning**,
+ of training called **fine-tuning**. Here we consider **supervised fine-tuning**,
where we want to minimize prediction error on a supervised task.
For this, we first add a logistic regression
layer on top of the network (more precisely on the output code of the
@@ -66,15 +66,14 @@ training. (See the :ref:`mlp` for details on the multilayer perceptron.)
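As a rough sketch of this setup (illustrative only, not the ``SdA`` class defined below; it assumes the ``HiddenLayer`` class of :ref:`mlp`, the ``LogisticRegression`` class from the tutorial's ``logistic_sgd.py``, and made-up layer sizes), the supervised fine-tuning cost looks like this once the logistic regression layer sits on top of the stack:

.. code-block:: python

    import numpy
    import theano.tensor as T

    from mlp import HiddenLayer
    from logistic_sgd import LogisticRegression

    rng = numpy.random.RandomState(1234)
    x = T.matrix('x')   # input minibatch
    y = T.ivector('y')  # labels of the supervised task

    # two sigmoid layers standing in for the pre-trained part of the network
    layer0 = HiddenLayer(rng=rng, input=x, n_in=28 * 28, n_out=500,
                         activation=T.nnet.sigmoid)
    layer1 = HiddenLayer(rng=rng, input=layer0.output, n_in=500, n_out=500,
                         activation=T.nnet.sigmoid)

    # logistic regression layer on top of the output code of the last layer
    log_layer = LogisticRegression(input=layer1.output, n_in=500, n_out=10)

    # fine-tuning cost: negative log-likelihood on the supervised task; the
    # gradient is taken with respect to the parameters of *all* layers, so
    # fine-tuning updates the whole network, not just the added top layer
    finetune_cost = log_layer.negative_log_likelihood(y)
    params = layer0.params + layer1.params + log_layer.params
    gparams = T.grad(finetune_cost, params)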
This can be easily implemented in Theano, using the class defined
previously for a denoising autoencoder. We can see the stacked denoising
- autoencoder as having two facades: One is a list of
- autoencoders. The other is an MLP. During pre-training we use the first facade, i.e., we treat our model
+ autoencoder as having two facades: a list of
+ autoencoders, and an MLP. During pre-training we use the first facade, i.e., we treat our model
as a list of autoencoders, and train each autoencoder separately. In the
- second stage of training, we use the second facade. These two
- facades are linked
+ second stage of training, we use the second facade. These two facades are linked because:
- * by the parameters shared by the autoencoders and the sigmoid layers of the MLP, and
+ * the autoencoders and the sigmoid layers of the MLP share parameters, and
- * by feeding the latent representations of intermediate layers of the MLP as input to the autoencoders.
+ * the latent representations computed by intermediate layers of the MLP are fed as input to the autoencoders.
.. literalinclude:: ../code/SdA.py
  :start-after: start-snippet-1
@@ -83,8 +82,8 @@ facades are linked
``self.sigmoid_layers`` will store the sigmoid layers of the MLP facade, while
``self.dA_layers`` will store the denoising autoencoder associated with the layers of the MLP.
- Next, we construct ``n_layers`` denoising autoencoders and ``n_layers`` sigmoid
- layers , where ``n_layers`` is the depth of our model. We use the
+ Next, we construct ``n_layers`` sigmoid layers and ``n_layers`` denoising
+ autoencoders, where ``n_layers`` is the depth of our model. We use the
``HiddenLayer`` class introduced in :ref:`mlp`, with one
modification: we replace the ``tanh`` non-linearity with the
logistic function :math:`s(x) = \frac{1}{1+e^{-x}}`.
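To make the parameter sharing between the two facades concrete, here is a condensed sketch of such a construction loop (a sketch in the spirit of ``SdA.py`` rather than a copy of it; it assumes the ``HiddenLayer`` class of :ref:`mlp` and the ``dA`` class of the previous chapter, whose constructor accepts an existing weight matrix and hidden bias through ``W`` and ``bhid``; sizes and names are illustrative):

.. code-block:: python

    import numpy
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    from dA import dA
    from mlp import HiddenLayer

    numpy_rng = numpy.random.RandomState(89677)
    theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))

    x = T.matrix('x')                 # symbolic input minibatch
    n_ins = 28 * 28                   # size of the input
    hidden_layers_sizes = [500, 500]  # depth n_layers = 2

    sigmoid_layers, dA_layers, params = [], [], []
    for i in range(len(hidden_layers_sizes)):
        # the input of layer i is the data itself for the first layer and
        # the code produced by the layer below for the others
        input_size = n_ins if i == 0 else hidden_layers_sizes[i - 1]
        layer_input = x if i == 0 else sigmoid_layers[-1].output

        # MLP facade: a hidden layer with a sigmoid non-linearity
        sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                    input=layer_input,
                                    n_in=input_size,
                                    n_out=hidden_layers_sizes[i],
                                    activation=T.nnet.sigmoid)
        sigmoid_layers.append(sigmoid_layer)
        params.extend(sigmoid_layer.params)

        # autoencoder facade: a denoising autoencoder that shares its
        # weights and hidden bias with the sigmoid layer just created
        dA_layer = dA(numpy_rng=numpy_rng,
                      theano_rng=theano_rng,
                      input=layer_input,
                      n_visible=input_size,
                      n_hidden=hidden_layers_sizes[i],
                      W=sigmoid_layer.W,
                      bhid=sigmoid_layer.b)
        dA_layers.append(dA_layer)

Because the two facades refer to the same shared variables, pre-training the ``dA`` objects directly moves the weights used by the MLP facade, so nothing has to be copied before fine-tuning starts.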