Sequence Modeling RNN-LSTM-APPL-Anand Kumar JUNE2021
h', y = f(h, x)
h' = σ( Wh h + Wi x )
y = softmax( Wo h' )    (note: y is computed from h', not from h)
[Figure: the same f unrolled over time: h1 = f(h0, x1), h2 = f(h1, x2), h3 = f(h2, x3), ..., with outputs y1, y2, y3, ...]
No matter how long the input/output sequence is, we only need one function f. If the f's were different at each step, the model would just be a feedforward NN. Sharing f may be seen as another form of compression compared with a fully connected network.
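A minimal NumPy sketch of this idea (the weight names Wh, Wi, Wo and the sizes are illustrative assumptions, not taken from the slides): the same f, i.e. the same weights, is applied at every time step.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def rnn_step(h, x, W_h, W_i, W_o):
    # one application of the shared f: h' = sigmoid(Wh h + Wi x), y = softmax(Wo h')
    h_new = 1.0 / (1.0 + np.exp(-(W_h @ h + W_i @ x)))
    y = softmax(W_o @ h_new)
    return h_new, y

rng = np.random.default_rng(0)
hidden, n_in, n_out = 4, 3, 2                   # illustrative sizes
W_h = rng.normal(size=(hidden, hidden))
W_i = rng.normal(size=(hidden, n_in))
W_o = rng.normal(size=(n_out, hidden))

h = np.zeros(hidden)                            # h0
for x_t in rng.normal(size=(5, n_in)):          # a sequence of 5 inputs
    h, y = rnn_step(h, x_t, W_h, W_i, W_o)      # the same weights are reused at every step
```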
Recurrent Neural Networks
• Some examples of important design patterns for recurrent neural networks include the following.
Bi-RNN
• In many applications, however, we want to output a prediction y(t) that may depend on the whole input sequence.
• Bidirectional RNNs have been extremely successful in applications where this need arises, such as handwriting recognition, speech recognition, and bioinformatics.
Bi-RNN
• As the name suggests, bidirectional RNNs combine an RNN that moves forward through time, beginning from the start of the sequence, with another RNN that moves backward through time, beginning from the end of the sequence.
Bidirectional RNN
Forward RNN f1: y_t, h_t = f1(x_t, h_{t-1}), reading x1, x2, x3, ... from the start of the sequence.
Backward RNN f2: z_t, g_t = f2(x_t, g_{t+1}), reading the same inputs from the end of the sequence.
Combined output: p_t = f3(y_t, z_t), so every p_t depends on the whole input sequence.
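A rough sketch of this bidirectional structure, reusing rnn_step from the earlier vanilla-RNN sketch (f1 and f2 get separate weight tuples; the combining function f3 is assumed here to be simple concatenation):

```python
import numpy as np  # rnn_step is the function defined in the earlier sketch

def bidirectional(xs, h0, g0, f1_params, f2_params):
    hs, h = [], h0
    for x in xs:                          # forward RNN f1, from the start of the sequence
        h, _ = rnn_step(h, x, *f1_params)
        hs.append(h)
    gs, g = [], g0
    for x in reversed(xs):                # backward RNN f2, from the end of the sequence
        g, _ = rnn_step(g, x, *f2_params)
        gs.append(g)
    gs = gs[::-1]                         # align g_t with x_t
    # f3: each p_t sees the whole input sequence through (h_t, g_t)
    return [np.concatenate([h_t, g_t]) for h_t, g_t in zip(hs, gs)]
```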
Encoder-Decoder Sequence-to-Sequence Architectures
[Figure: an encoder RNN f1 reads x1, x2, x3, ... into hidden states h1, h2, h3, ...; a decoder RNN f2 then produces the output sequence z1, z2, z3, ... from its own states g1, g2, g3, ...]
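A minimal sketch of the encoder-decoder idea, again reusing rnn_step from the earlier sketch (the shapes are an assumption: the decoder feeds its own prediction back in, so its input size must equal its output size):

```python
def seq2seq(xs, h0, enc_params, dec_params, start_token, n_steps):
    h = h0
    for x in xs:                              # encoder f1 compresses x1..xT into one state
        h, _ = rnn_step(h, x, *enc_params)
    g, y, outputs = h, start_token, []
    for _ in range(n_steps):                  # decoder f2 generates from the final encoder state
        g, y = rnn_step(g, y, *dec_params)    # the previous output becomes the next input
        outputs.append(y)
    return outputs
```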
Recursive NN
Problems with naive RNN
• When dealing with a time series, it tends to forget old information. When there is a distant relationship of unknown length, we would like the network to keep a “memory” of it.
• Vanishing gradient problem (a toy illustration follows below).
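A toy illustration (not from the slides) of the vanishing gradient: backpropagation through T steps multiplies the gradient by the recurrent Jacobian T times, so a per-step factor below 1 wipes out the signal from distant inputs.

```python
grad = 1.0
for t in range(50):
    grad *= 0.5          # stands in for a per-step factor |dh_t/dh_{t-1}| < 1
print(grad)              # ~8.9e-16: the contribution of the oldest input has vanished
```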
LSTM
The sigmoid layers output numbers between 0 and 1 that determine how much of each component should be let through; the pink × operations are point-wise multiplications.
• Forget gate: this sigmoid gate determines how much of the old information goes through.
• Input gate: it decides which components are to be updated; C’t provides the change contents to add to the cell state.
• Output gate: controls what goes into the output.
The core idea is the cell state Ct: it is changed slowly, with only minor linear interactions, so it is very easy for information to flow along it unchanged.
Why sigmoid or tanh? Sigmoid gives 0–1 gating that acts as a switch. The vanishing gradient problem is already handled in the LSTM, so is it OK for a ReLU to replace the tanh?
Naïve RNN: h_t, y_t = f(h_{t-1}, x_t)
LSTM: c_t, h_t, y_t = LSTM(c_{t-1}, h_{t-1}, x_t)
The four signals are computed from x_t and h_{t-1}:
z = tanh( W [x_t, h_{t-1}] )    (updating information)
zi = σ( Wi [x_t, h_{t-1}] )    (controls the input gate)
zf = σ( Wf [x_t, h_{t-1}] )    (controls the forget gate)
zo = σ( Wo [x_t, h_{t-1}] )    (controls the output gate)
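A minimal sketch of one LSTM step implementing the equations above (the weight names and the concatenation [x_t, h_{t-1}] follow the slide's notation; sizes and the output projection for y_t are left out):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(c_prev, h_prev, x, W, W_i, W_f, W_o):
    v = np.concatenate([x, h_prev])      # [x_t, h_{t-1}]
    z   = np.tanh(W @ v)                 # updating information (the change contents C't)
    z_i = sigmoid(W_i @ v)               # input gate
    z_f = sigmoid(W_f @ v)               # forget gate
    z_o = sigmoid(W_o @ v)               # output gate
    c = z_f * c_prev + z_i * z           # cell state: changed slowly, only linear interactions
    h = z_o * np.tanh(c)                 # what is let through to the output
    return c, h
```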
Sequence to sequence chat model
Chat with context
[Figure: example dialogue turns (M: Hi, M: Hello, U: Hi, ...) showing that the model's reply is conditioned on the previous turns of the conversation]
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau, “Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models”, 2015.
Baidu’s speech recognition using RNN
Attention
Image Caption Generation
[Figure: a CNN produces a vector for each image region; attention weights over the regions (e.g. 0.0, 0.8, 0.2, 0.0, 0.0, 0.0) form a weighted sum of the region vectors, which is fed to the decoder (states z0, z1, z2, ...) as it generates Word 1, Word 2, ...]
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, Yoshua Bengio, “Show, Attend and Tell: Neural Image Caption Generation with Visual Attention”, ICML, 2015.
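A small sketch of the soft attention step the figure describes (an assumed formulation, not the paper's exact one): scores computed from the current decoder state are turned into weights over the region vectors, and their weighted sum conditions the next word.

```python
import numpy as np

def attend(region_vectors, decoder_state, W_score):
    # region_vectors: one CNN feature vector per image region, shape (regions, dim)
    scores = region_vectors @ (W_score @ decoder_state)   # one score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                               # e.g. 0.0, 0.8, 0.2, ...
    z = weights @ region_vectors                           # weighted sum of region vectors
    return z, weights
```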
Li Yao, Atousa Torabi, Kyunghyun Cho, Nicolas Ballas, Christopher Pal, Hugo Larochelle, Aaron
Courville, “Describing Videos by Exploiting Temporal Structure”, ICCV, 2015
Demo