Week 5a. Recurrent Neural Networks (RNNs)
Sonit Singh
School of Computer Science and Engineering
June 25, 2024
Outline
Processing Temporal Sequences
There are many tasks that require a sequence of inputs to be processed rather than a single input:
• speech recognition
• time series prediction
• machine translation
• handwriting recognition
Sliding Window
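One way to handle temporal data with a feed-forward network is to show it a fixed-size window that slides along the sequence, one position at a time. A minimal sketch in Python (the window size and example string are illustrative, not from the slides):

# Build fixed-size windows that slide along a sequence, so that a
# feed-forward network can process one window per time step.
def sliding_windows(sequence, window_size):
    return [sequence[i:i + window_size]
            for i in range(len(sequence) - window_size + 1)]

for window in sliding_windows("_hello_", 3):
    print(window)   # _he, hel, ell, llo, lo_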
NetTalk Task
NetTalk Architecture
NetTalk Test
https://www.youtube.com/watch?v=gakJlr3GecE
NetTalk
Simple Recurrent Network (Elman, 1990)
• at each time step, hidden layer activations are copied to “context” layer
• hidden layer receives connections from input and context layers
• the inputs are fed to the network one at a time; the network uses the context layer to “remember” whatever information it needs in order to produce the correct output (see the sketch below)
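A minimal sketch of one Elman-style SRN step in Python/NumPy (the sizes, names and random weights are illustrative): the context is simply a copy of the previous hidden activations, fed back alongside the current input.

import numpy as np

# One step of a simple recurrent network: the hidden layer receives the
# current input plus a copy of its own previous activations (the context).
def srn_step(x, context, W_in, W_context, b):
    return np.tanh(W_in @ x + W_context @ context + b)

rng = np.random.default_rng(0)
n_in, n_hid = 3, 5
W_in = rng.normal(size=(n_hid, n_in))
W_context = rng.normal(size=(n_hid, n_hid))
b = np.zeros(n_hid)

context = np.zeros(n_hid)               # empty context before the first input
for x in rng.normal(size=(4, n_in)):    # inputs are fed one at a time
    context = srn_step(x, context, W_in, W_context, b)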
Back Propagation Through Time
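Backpropagation through time unrolls the recurrent network over the whole sequence, sums the loss across time steps, and lets gradients flow back through every step. A hedged PyTorch sketch (all sizes and data here are made up for illustration):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
readout = nn.Linear(8, 1)

seq = torch.randn(1, 10, 4)        # one sequence of 10 time steps
targets = torch.randn(1, 10, 1)

outputs, _ = rnn(seq)              # hidden states for all 10 steps (unrolled)
loss = ((readout(outputs) - targets) ** 2).mean()
loss.backward()                    # gradients propagate back through every time step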
Other Recurrent Neural Architectures
x_j^t = \tanh\left( W_{j0}^{\sigma_t} + \sum_{k=1}^{d} W_{jk}^{\sigma_t} \, x_k^{t-1} \right)

z = \tanh\left( P_0 + \sum_{j=1}^{d} P_j \, x_j^n \right)
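A sketch of these equations in Python/NumPy, under the reading that the weight matrix W^{σ_t} applied at step t is selected by the current input symbol σ_t, and the output z (computed from the final hidden state x^n) gives the accept/reject decision; the dimensions, random weights and threshold below are purely illustrative.

import numpy as np

rng = np.random.default_rng(1)
d, n_symbols = 2, 2                  # hidden dimension d, alphabet {0, 1}

# One (d x (d+1)) weight matrix per input symbol; column 0 holds the bias W_j0.
W = rng.normal(size=(n_symbols, d, d + 1))
P = rng.normal(size=d + 1)           # output weights, P[0] is the bias P_0

def recognise(string):
    x = np.zeros(d)                                  # initial hidden state
    for symbol in string:                            # weights chosen by sigma_t
        Ws = W[int(symbol)]
        x = np.tanh(Ws[:, 0] + Ws[:, 1:] @ x)
    z = np.tanh(P[0] + P[1:] @ x)                    # decision from the final state
    return z > 0                                     # accept iff z is positive

print(recognise("1101"))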
Task: Formal Language Recognition
Accept Reject
1 0
11 10
111 01
1111 00
11111 011
111111 110
1111111 11111110
11111111 10111111
Dynamical Recognizers
Task: Formal Language Recognition
Accept Reject
1 000
0 11000
10 0001
01 000000000
00 11111000011
100100 1101010000010111
001111110100 1010010001
0100100100 0000
11100 00000
0010
Dynamical Recognizers
Phase Transition
Chomsky Hierarchy
Task: Formal Language Prediction
abaabbabaaabbbaaaabbbbabaabbaaaaabbbbb . . .
• Scan a sequence of characters one at a time, and try at each step to predict
the next character in the sequence.
• In some cases, the prediction is probabilistic.
• For the a^n b^n task, the first b is not predictable, but subsequent b's and the initial a in the next subsequence are predictable (see the sketch below).
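As a small illustration of the prediction setup (simplified to isolated a^n b^n strings rather than the continuous stream above; the lengths below are arbitrary):

# Next-character prediction pairs for strings of the form a^n b^n.
def anbn_examples(max_n):
    for n in range(1, max_n + 1):
        s = "a" * n + "b" * n
        for i in range(len(s) - 1):
            yield s[:i + 1], s[i + 1]    # (prefix seen so far, next character)

for prefix, target in anbn_examples(2):
    print(prefix, "->", target)
# Once the first b appears, the number of remaining b's is determined,
# so every later b is predictable; the first b itself is not.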
Elman Network for predicting a^n b^n
Oscillating Solution for a^n b^n
Learning to Predict a^n b^n
• the network does not implement a Finite State Automaton but instead uses
two fixed points in activation space – one attracting, the other repelling (Wiles
& Elman, 1995)
• networks trained only up to a^10 b^10 could generalize up to a^12 b^12
• training the weights by evolution is more stable than by backpropagation
• networks trained by evolution were sometimes monotonic rather than
oscillating
Monotonic Solution for a^n b^n
Hidden Unit Analysis for a^n b^n
Counting by Spiralling
• for this task, a sequence is accepted if it contains equal numbers of a's and b's
• the network counts up by spiralling inwards and counts down by spiralling outwards
Hidden Unit Dynamics for a^n b^n c^n
Partly Monotonic Solution for a^n b^n c^n
Long Range Dependencies
Long Short Term Memory
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
christianherta.de/lehre/dataScience/machineLearning/neuralNetworks/LSTM.php
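The linked posts walk through the cell diagrams in detail. As a compressed sketch, one LSTM step in Python/NumPy using the standard formulation (variable names are mine), showing the forget, input and output gates acting on the cell state:

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# One LSTM step: the forget, input and output gates control what is removed
# from, added to, and read out of the cell state c.
def lstm_step(x, h, c, W, b):
    z = W @ np.concatenate([x, h]) + b            # all four blocks in one matmul
    f, i, o, g = np.split(z, 4)
    f, i, o, g = sigmoid(f), sigmoid(i), sigmoid(o), np.tanh(g)
    c = f * c + i * g                             # update the cell state
    h = o * np.tanh(c)                            # new hidden state / output
    return h, c

rng = np.random.default_rng(2)
n_in, n_hid = 3, 4
W = rng.normal(size=(4 * n_hid, n_in + n_hid))
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):
    h, c = lstm_step(x, h, c, W, b)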
Reber Grammar
Embedded Reber Grammar
Simple Recurrent Network
• SRN – context layer is combined directly with the input to produce the next
hidden layer.
• SRN can learn Reber Grammar, but not Embedded Reber Grammar.
Long Short Term Memory
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Long Short Term Memory
Gated Recurrent Unit
GRU is similar to LSTM but has only two gates instead of three.
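For comparison with the LSTM sketch above, one GRU step in Python/NumPy (one common formulation; names and random weights are illustrative): a reset gate and an update gate, with no separate cell state.

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# One GRU step with its two gates: reset (r) and update (u).
def gru_step(x, h, Wr, Wu, Wh, br, bu, bh):
    xh = np.concatenate([x, h])
    r = sigmoid(Wr @ xh + br)                          # reset gate
    u = sigmoid(Wu @ xh + bu)                          # update gate
    h_candidate = np.tanh(Wh @ np.concatenate([x, r * h]) + bh)
    return (1 - u) * h + u * h_candidate               # blend old and new state

rng = np.random.default_rng(3)
n_in, n_hid = 3, 4
Wr = rng.normal(size=(n_hid, n_in + n_hid))
Wu = rng.normal(size=(n_hid, n_in + n_hid))
Wh = rng.normal(size=(n_hid, n_in + n_hid))
zeros = np.zeros(n_hid)
h = gru_step(rng.normal(size=n_in), zeros, Wr, Wu, Wh, zeros, zeros, zeros)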
End-to-End Text Classification
Seq2Seq model
Encoder-Decoder Framework
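A minimal encoder-decoder sketch in PyTorch (the vocabulary sizes, dimensions and module choices are illustrative, not from the slides): the encoder compresses the source sequence into its final hidden state, which initialises the decoder.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    # The encoder GRU reads the source sequence; its final hidden state seeds
    # the decoder GRU, which predicts the target sequence token by token.
    def __init__(self, src_vocab, tgt_vocab, dim=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, dim)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.decoder = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, tgt_vocab)

    def forward(self, src, tgt):
        _, h = self.encoder(self.src_emb(src))      # fixed-size summary of the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
        return self.out(dec_out)                    # logits for each target position

model = Seq2Seq(src_vocab=100, tgt_vocab=120)
logits = model(torch.randint(0, 100, (2, 7)), torch.randint(0, 120, (2, 5)))
print(logits.shape)                                 # torch.Size([2, 5, 120])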
Seq2Seq model applications
Machine Translation
Seq2Seq model applications
Seq2Seq model applications
Google’s Neural Machine Translation
CNNs + LSTM : Image Captioning
Show and Tell: Neural Image Caption Generator (Vinyals et al. 2015)
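A rough sketch of the encode-with-a-CNN, decode-with-an-LSTM idea (not the authors' code; the toy CNN, sizes and layout below are stand-ins for a pretrained image encoder):

import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    # A CNN turns the image into a feature vector, which is fed to an LSTM
    # as the first "token"; the LSTM then generates the caption word by word.
    def __init__(self, vocab, dim=256):
        super().__init__()
        self.cnn = nn.Sequential(                   # stand-in for a pretrained CNN
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, dim))
        self.emb = nn.Embedding(vocab, dim)
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, image, caption):
        feat = self.cnn(image).unsqueeze(1)                # (batch, 1, dim)
        x = torch.cat([feat, self.emb(caption)], dim=1)    # image first, then words
        h, _ = self.lstm(x)
        return self.out(h)                                 # next-word logits

model = CaptionModel(vocab=1000)
logits = model(torch.randn(2, 3, 64, 64), torch.randint(0, 1000, (2, 6)))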
CNNs + LSTM : Image Captioning
Show and Tell: Neural Image Caption Generator (Vinyals et al. 2015)
CNNs + LSTM : Image Captioning
CNNs + LSTM : Image Captioning
Show, Attend and Tell (Xu et al. 2015) Examples
Summary
• Recurrent Neural Networks (RNNs) are specialised neural networks suitable
for modelling sequential or time-series data.
• RNNs have a looping mechanism that acts as a highway to allow information
to flow from one step to the next. This information is the hidden state, which is
a representation of previous inputs.
• Simple RNNs suffer from the vanishing gradient problem
◦ As the RNN processes more steps, it has trouble retaining information from earlier steps.
◦ During back-propagation, the earliest time steps receive extremely small gradients, so their weights are barely adjusted and they do little learning.
◦ As a result, long-range dependencies across time steps are not learned.
• LSTMs and GRUs are two special RNNs capable of learning long-term dependencies using mechanisms called gates.
• These gates are different tensor operations that can learn what information to add to or remove from the hidden state.