
Commit 74aba5c

Grégoire committed: convert bold subtitles to subsections
1 parent ff9b62d commit 74aba5c

File tree: 1 file changed (+26, -12 lines)

doc/rnnslu.txt

Lines changed: 26 additions & 12 deletions
@@ -1,3 +1,5 @@
+.. _rnnslu:
+
 Recurrent Neural Networks with Word Embeddings
 **********************************************
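
(The added ``.. _rnnslu:`` line is a reStructuredText cross-reference target: other pages in the documentation can link to this tutorial with ``:ref:`rnnslu```.)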

@@ -15,11 +17,13 @@ in order to perform Semantic Parsing / Slot-Filling (Spoken Language Understandi
 Code - Citations - Contact
 ++++++++++++++++++++++++++

-**Code**
+Code
+====

 Directly running experiments is also possible using this `github repository <https://github.com/mesnilgr/is13>`_.

-**Papers**
+Papers
+======

 If you use this tutorial, cite the following papers:
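
(Context for the change: in reStructuredText, ``**Code**`` is merely bold text, while a title underlined with ``====`` is a real subsection that appears in the generated table of contents and can be linked to. That is what this commit converts throughout the file.)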

@@ -35,7 +39,8 @@ If you use this tutorial, cite the following papers:

 Thank you!

-**Contact**
+Contact
+=======

 Please email `Grégoire Mesnil <http://www-etud.iro.umontreal.ca/~mesnilgr/>`_ with any
 problem report or feedback. We will be glad to hear from you.
@@ -91,7 +96,8 @@ measure the performance of our models.
 Recurrent Neural Network Model
 ++++++++++++++++++++++++++++++

-**Raw input encoding**
+Raw input encoding
+==================

 A token corresponds to a word. Each token in the ATIS vocabulary is associated with an index. Each sentence is an
 array of indexes (``int32``). Then, each set (train, valid, test) is a list of arrays of indexes. A Python
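
As a minimal sketch of this encoding (the toy vocabulary and the names ``word2idx``/``idx2word`` are illustrative; the tutorial's ATIS loader provides the real dictionaries):

.. code-block:: python

    import numpy

    # toy vocabulary: index <-> word mappings
    idx2word = {0: 'i', 1: 'want', 2: 'a', 3: 'flight', 4: 'to', 5: 'boston'}
    word2idx = {w: i for i, w in idx2word.items()}

    # one sentence becomes an array of int32 indexes
    sentence = numpy.asarray(
        [word2idx[w] for w in ['i', 'want', 'a', 'flight', 'to', 'boston']],
        dtype='int32')

    # each set (train, valid, test) is simply a list of such arrays
    train_lex = [sentence]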
@@ -115,7 +121,8 @@ Same thing for labels corresponding to this particular sentence.
     'O', 'B-arrive_time.time_relative', 'B-arrive_time.time',
     'I-arrive_time.time', 'I-arrive_time.time']

-**Context window**
+Context window
+==============

 Given a sentence, i.e. an array of indexes, and a window size, i.e. 1, 3, 5, ..., we
 need to convert each word in the sentence to a context window surrounding this
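
A helper in the spirit of the tutorial's context-window code (a sketch: it pads the sentence borders with a special ``-1`` index, which is later mapped to a dedicated ``PADDING`` embedding):

.. code-block:: python

    def contextwin(l, win):
        """Given a list of word indexes ``l`` and an odd window size ``win``,
        return one ``win``-sized window of indexes per word, padding the
        sentence borders with -1."""
        assert (win % 2) == 1 and win >= 1
        l = list(l)
        lpadded = (win // 2) * [-1] + l + (win // 2) * [-1]
        out = [lpadded[i:i + win] for i in range(len(l))]
        assert len(out) == len(l)
        return out

    # contextwin([0, 1, 2, 3, 4], 3)
    # -> [[-1, 0, 1], [0, 1, 2], [1, 2, 3], [2, 3, 4], [3, 4, -1]]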
@@ -165,7 +172,8 @@ Here is a sample:
 To summarize, we started with an array of indexes and ended with a matrix of
 indexes. Each line corresponds to the context window surrounding this word.

-**Word embeddings**
+Word embeddings
+===============

 Once we have the sentence converted to context windows, i.e. a matrix of indexes, we have to associate
 these indexes with the embeddings (a real-valued vector associated with each word).
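
In numpy terms, this lookup is advanced indexing into the embedding matrix (a sketch; the tutorial does the equivalent with a Theano shared variable, and the sizes below are illustrative):

.. code-block:: python

    import numpy

    vocsize, de, cs = 1000, 50, 3      # vocabulary size, embedding dim, window size
    rng = numpy.random.RandomState(42)

    # one extra row so the -1 padding index selects its own embedding
    emb = 0.2 * rng.uniform(-1.0, 1.0, (vocsize + 1, de))

    cwords = numpy.asarray(contextwin([0, 1, 2, 3, 4], cs))  # shape (5, 3)
    # (5, 3) index matrix -> (5, 3, de) tensor -> one (de * cs) vector per word
    x = emb[cwords].reshape(cwords.shape[0], de * cs)        # shape (5, 150)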
@@ -218,7 +226,8 @@ We now have a sequence (of length 5, which corresponds to the length of the
 sentence) of **context window word embeddings** which is easy to feed to a simple
 recurrent neural network to iterate with.

-**Elman recurrent neural network**
+Elman recurrent neural network
+==============================

 The following (Elman) recurrent neural network (E-RNN) takes as input the current input
 (time ``t``) and the previous hidden state (time ``t-1``). Then it iterates.
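
The recurrence can be sketched in plain numpy (the tutorial implements it with ``theano.scan``; parameter names and sizes here are illustrative):

.. code-block:: python

    import numpy

    rng = numpy.random.RandomState(0)
    nh, nc, din = 100, 127, 150        # hidden units, label classes, input size

    Wx = 0.2 * rng.uniform(-1.0, 1.0, (din, nh))   # input -> hidden
    Wh = 0.2 * rng.uniform(-1.0, 1.0, (nh, nh))    # hidden -> hidden
    W = 0.2 * rng.uniform(-1.0, 1.0, (nh, nc))     # hidden -> labels
    bh, b, h0 = numpy.zeros(nh), numpy.zeros(nc), numpy.zeros(nh)

    def recurrence(x_t, h_tm1):
        # new hidden state from the current input and the previous hidden state
        h_t = numpy.tanh(x_t.dot(Wx) + h_tm1.dot(Wh) + bh)
        # softmax over the slot labels at this time step
        scores = h_t.dot(W) + b
        e = numpy.exp(scores - scores.max())
        return h_t, e / e.sum()

    h = h0
    for x_t in x:          # x: the context-window embeddings built above
        h, s_t = recurrence(x_t, h)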
@@ -329,22 +338,25 @@ Note that the extension is `txt` and you will have to change it to `pl`.
 Training
 ++++++++

-**Updates**
+Updates
+=======

 For the stochastic gradient descent (SGD) update, we consider the whole sentence as a mini-batch
 and perform one update per sentence. It is possible to perform a pure SGD (contrary to mini-batch)
 where the update is done on a single word at a time.

 After each iteration/update, we normalize the word embeddings to keep them on a unit sphere.

-**Stopping Criterion**
+Stopping Criterion
+==================

 Early-stopping on a validation set is our regularization technique:
 the training is run for a given number of epochs (a single pass through the
 whole dataset) and we keep the best model with respect to the F1 score
 computed on the validation set after each epoch.

-**Hyper-Parameter Selection**
+Hyper-Parameter Selection
+=========================

 Although there is interesting research/`code
 <https://github.com/JasperSnoek/spearmint>`_ on the topic of automatic
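
Put together, one epoch of this training scheme looks roughly like the following (a sketch; ``compute_gradients``, ``evaluate_f1`` and ``save_model`` are hypothetical stand-ins for the tutorial's Theano functions and the CoNLL evaluation script):

.. code-block:: python

    best_f1 = -numpy.inf
    for epoch in range(n_epochs):
        for cwords, labels in train_set:     # one sentence = one mini-batch
            grads = compute_gradients(params, cwords, labels)
            for param, grad in zip(params, grads):
                param -= lr * grad           # one SGD update per sentence
            # project every word embedding back onto the unit sphere
            emb /= numpy.sqrt((emb ** 2).sum(axis=1))[:, None]

        f1 = evaluate_f1(valid_set)          # validation F1 after each epoch
        if f1 > best_f1:                     # early stopping: keep the best model
            best_f1 = f1
            save_model(params)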
@@ -373,7 +385,8 @@ The user can then run the code by calling:
     ...
     ('BEST RESULT: epoch', 57, 'valid F1', 97.23, 'best test F1', 94.2, 'with the model', 'rnnslu')

-**Timing**
+Timing
+======

 Running experiments on ATIS using this `repository <https://github.com/mesnilgr/is13>`_
 will run one epoch in less than 40 seconds on an i7 CPU 950 @ 3.07GHz using less than 200 MB of RAM::
@@ -394,7 +407,8 @@ After a few epochs, you obtain decent performance, **94.48% F1 score**::
     [learning] epoch 44 >> 100.00% completed in 35.31 (sec) <<
     [...]

-**Word Embedding Nearest Neighbors**
+Word Embedding Nearest Neighbors
+================================

 We can check the k-nearest neighbors of the learned embeddings. L2 and
 cosine distance gave the same results, so we plot them for the cosine distance.
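
A small helper for this check could look as follows (illustrative, not part of the repository; once the embeddings are normalized to the unit sphere, cosine similarity reduces to a dot product):

.. code-block:: python

    import numpy

    def knn(word, emb, word2idx, idx2word, k=5):
        # normalize rows so cosine similarity becomes a plain dot product
        e = emb / numpy.sqrt((emb ** 2).sum(axis=1))[:, None]
        sims = e.dot(e[word2idx[word]])
        best = numpy.argsort(-sims)[1:k + 1]  # skip the query word itself
        return [idx2word[i] for i in best]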
