hidden layer. This means that the magnitude of weights in the transition
matrix can have a strong impact on the learning process.

If the weights in this matrix are small (or, more formally, if the leading
eigenvalue of the weight matrix is smaller than 1.0), it can lead to a
situation called *vanishing gradients*, where the gradient signal gets so
small that learning either becomes very slow or stops working altogether. It
can also make the task of learning long-term dependencies in the data more
difficult. Conversely, if the weights in this matrix are large (or, again,
more formally, if the leading eigenvalue of the weight matrix is larger than
1.0), it can lead to a situation where the gradient signal is so large that
it can cause learning to diverge. This is often referred to as *exploding
gradients*.
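
To see why the leading eigenvalue matters, consider a rough NumPy sketch
(it ignores the nonlinearity's derivative, and the 5-unit hidden layer, 100
time steps, and eigenvalue targets of 0.9 and 1.1 are arbitrary illustrative
choices). Backpropagation through time multiplies the gradient by the
transpose of the transition matrix once per step, so its magnitude scales
roughly like the leading eigenvalue raised to the number of steps::

    import numpy as np

    rng = np.random.default_rng(0)

    def scale_to_leading_eigenvalue(W, target):
        # Rescale W so the magnitude of its leading eigenvalue equals target.
        return W * (target / np.max(np.abs(np.linalg.eigvals(W))))

    W = rng.standard_normal((5, 5))   # hypothetical transition matrix
    grad = rng.standard_normal(5)     # gradient arriving at the last step

    for label, eig in [("vanishing", 0.9), ("exploding", 1.1)]:
        g = grad.copy()
        W_s = scale_to_leading_eigenvalue(W, eig)
        for _ in range(100):          # backpropagate through 100 time steps
            g = W_s.T @ g
        print(label, np.linalg.norm(g))   # tiny norm vs. a huge one

A real recurrent gradient also includes the pointwise derivative of the
activation at every step, which, for saturating nonlinearities such as tanh,
can only make the vanishing case worse.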

These issues are the main motivation behind the LSTM model, which introduces
a new structure called a *memory cell* (see Figure 1 below). A memory cell is