This tutorial aims to provide an example of how a Recurrent Neural Network
(RNN) using the Long Short Term Memory (LSTM) architecture can be implemented
using Theano. In this tutorial, this model is used to perform sentiment
analysis on movie reviews from the `Large Movie Review Dataset
<http://ai.stanford.edu/~amaas/data/sentiment/>`_, sometimes known as the
IMDB dataset.

In this task, given a movie review, the model attempts to predict whether it
is positive or negative. This is a binary classification task.

Data
++++

As previously mentioned, the provided scripts are used to train an LSTM
recurrent neural network on the Large Movie Review Dataset.

While the dataset is public, in this tutorial we provide a copy of the dataset
that has previously been preprocessed according to the needs of this LSTM
implementation. Running the code provided in this tutorial will automatically
download the data to the local directory.
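
Once downloaded, the preprocessed data can be inspected directly. The snippet
below is a minimal, hypothetical sketch of what that might look like; the file
name ``imdb.pkl`` and the ``(sequences, labels)`` layout are assumptions made
for illustration and are not specified by the text above.

.. code-block:: python

    import pickle

    # Hypothetical peek at the preprocessed IMDB data (file name and layout
    # are assumptions): each review is a list of word indices, each label is
    # 0 (negative) or 1 (positive).
    with open("imdb.pkl", "rb") as f:
        train_x, train_y = pickle.load(f)

    print("number of training reviews:", len(train_x))
    print("first review (as word indices):", train_x[0][:10])
    print("label (1 = positive, 0 = negative):", train_y[0])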

Model
+++++

LSTM
====

In a *traditional* recurrent neural network, during the gradient
back-propagation phase, the gradient signal can end up being multiplied a
large number of times (as many as the number of timesteps) by the weight
matrix associated with the connections between the neurons of the recurrent
hidden layer. This means that the magnitude of the weights in the transition
matrix can have a strong impact on the learning process.

If the weights in this matrix are small (or, more formally, if the leading
eigenvalue of the weight matrix is smaller than 1.0), it can lead to a
situation called *vanishing gradients*, where the gradient signal gets so
small that learning either becomes very slow or stops working altogether. It
can also make the task of learning long-term dependencies in the data more
difficult. Conversely, if the weights in this matrix are large (or, again,
more formally, if the leading eigenvalue of the weight matrix is larger than
1.0), it can lead to a situation where the gradient signal is so large that
it can cause learning to diverge. This is often referred to as *exploding
gradients*.
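
To make this concrete, the toy example below (not part of the original
tutorial; all names and sizes are made up for illustration) repeatedly
multiplies a signal by a weight matrix rescaled so that the magnitude of its
leading eigenvalue is either just below or just above 1.0, mimicking what the
prose above describes happening to the gradient over many timesteps.

.. code-block:: python

    import numpy as np

    rng = np.random.RandomState(0)
    n_steps = 50                 # number of timesteps the signal passes through
    W = rng.randn(10, 10)

    def propagate(W, leading_eigenvalue, n_steps):
        """Rescale W so its spectral radius (magnitude of the leading
        eigenvalue) equals the requested value, then apply it repeatedly."""
        spectral_radius = np.max(np.abs(np.linalg.eigvals(W)))
        W_scaled = W * (leading_eigenvalue / spectral_radius)
        signal = np.ones(W.shape[0])
        for _ in range(n_steps):
            signal = W_scaled.dot(signal)
        return np.linalg.norm(signal)

    print(propagate(W, 0.9, n_steps))  # shrinks towards zero: vanishing gradients
    print(propagate(W, 1.1, n_steps))  # grows very large: exploding gradients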

These issues are the main motivation behind the LSTM model, which introduces a
new structure called a *memory cell* (see Figure 1 below). A memory cell is
composed of four main elements: an input gate, a neuron with a self-recurrent
connection (a connection to itself), a forget gate and an output gate. The
self-recurrent connection has a weight of 1.0 and ensures that, barring any
outside interference, the state of a memory cell can remain constant from one
timestep to another. The gates serve to modulate the interactions between the
memory cell itself and its environment. The input gate can allow the incoming
signal to alter the state of the memory cell or block it. On the other hand,
the output gate can allow the state of the memory cell to have an effect on
other neurons or prevent it from doing so. Finally, the forget gate can
modulate the memory cell's self-recurrent connection, allowing the cell to
remember or forget its previous state, as needed.

.. figure:: images/lstm_memorycell.png
    :align: center

    **Figure 1**: Illustration of an LSTM memory cell.
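
Before turning to the equations, the following minimal numpy sketch (not part
of the original tutorial) shows one timestep of the gating mechanism just
described, using the standard LSTM formulation; all variable names and sizes
are illustrative assumptions.

.. code-block:: python

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x_t, h_prev, c_prev, W, U, b):
        """One timestep of a standard LSTM memory-cell layer.

        W, U and b stack the parameters of the input gate, forget gate,
        candidate state and output gate, in that order.
        """
        n = h_prev.shape[0]
        z = np.dot(W, x_t) + np.dot(U, h_prev) + b   # all four pre-activations at once
        i_t = sigmoid(z[0 * n:1 * n])                # input gate
        f_t = sigmoid(z[1 * n:2 * n])                # forget gate
        c_tilde = np.tanh(z[2 * n:3 * n])            # candidate cell state
        o_t = sigmoid(z[3 * n:4 * n])                # output gate
        c_t = f_t * c_prev + i_t * c_tilde           # forget old state, admit new input
        h_t = o_t * np.tanh(c_t)                     # expose a gated view of the state
        return h_t, c_t

    # Toy usage with made-up sizes: 5 input features, 3 memory cells.
    rng = np.random.RandomState(42)
    n_in, n_hid = 5, 3
    W = 0.1 * rng.randn(4 * n_hid, n_in)
    U = 0.1 * rng.randn(4 * n_hid, n_hid)
    b = np.zeros(4 * n_hid)
    h_t, c_t = lstm_step(rng.randn(n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
    print("new output h_t:", h_t)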

The equations below describe how a layer of memory cells is updated at every
timestep :math:`t`. In these equations:

* :math:`x_t` is the input to the memory cell layer at time :math:`t`
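
As a reference, a standard (non-peephole) formulation of the memory-cell
update described above is given below. Here :math:`W_i, W_f, W_c, W_o` are
input-to-hidden weight matrices, :math:`U_i, U_f, U_c, U_o` hidden-to-hidden
weight matrices, :math:`b_i, b_f, b_c, b_o` bias vectors, :math:`h_{t-1}` the
previous output of the layer and :math:`C_{t-1}` the previous cell state;
:math:`\sigma` is the logistic sigmoid and :math:`\odot` denotes element-wise
multiplication. This is standard LSTM notation offered as a sketch, not
necessarily the exact variant implemented by this tutorial's code.

.. math::

    i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)

    f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)

    \widetilde{C}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)

    C_t = f_t \odot C_{t-1} + i_t \odot \widetilde{C}_t

    o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)

    h_t = o_t \odot \tanh(C_t)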

Papers
======

If you use this tutorial, please cite the following papers.

Introduction of the LSTM model:

* `[pdf] <http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf>`_ Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.

Addition of the forget gate to the LSTM model:

* `[pdf] <http://www.mitpressjournals.org/doi/pdf/10.1162/089976600300015015>`_ Gers, F. A., Schmidhuber, J., & Cummins, F. (2000). Learning to forget: Continual prediction with LSTM. Neural computation, 12(10), 2451-2471.

References
==========

* Graves, A. (2012). Supervised sequence labelling with recurrent neural networks (Vol. 385). Springer.

* Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies.

* Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. Neural Networks, IEEE Transactions on, 5(2), 157-166.

* Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011, June). Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1 (pp. 142-150). Association for Computational Linguistics.