+ .. _rnnslu:
+
Recurrent Neural Networks with Word Embeddings
**********************************************
@@ -15,11 +17,13 @@ in order to perform Semantic Parsing / Slot-Filling (Spoken Language Understanding)
Code - Citations - Contact
++++++++++++++++++++++++++

- **Code**
+ Code
+ ====

Directly running experiments is also possible using this `github repository <https://github.com/mesnilgr/is13>`_.

- **Papers**
+ Papers
+ ======

If you use this tutorial, cite the following papers:

@@ -35,7 +39,8 @@ If you use this tutorial, cite the following papers:

Thank you!

- **Contact**
+ Contact
+ =======

Please email to `Grégoire Mesnil <http://www-etud.iro.umontreal.ca/~mesnilgr/>`_ for any
problem report or feedback. We will be glad to hear from you.
@@ -91,7 +96,8 @@ measure the performance of our models.
Recurrent Neural Network Model
++++++++++++++++++++++++++++++

- **Raw input encoding**
+ Raw input encoding
+ ==================

A token corresponds to a word. Each token in the ATIS vocabulary is associated with an index. Each sentence is an
array of indexes (``int32``). Then, each set (train, valid, test) is a list of arrays of indexes. A python
@@ -115,7 +121,8 @@ Same thing for labels corresponding to this particular sentence.
'O', 'B-arrive_time.time_relative', 'B-arrive_time.time',
'I-arrive_time.time', 'I-arrive_time.time']
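
As a rough sketch of what these data structures look like (the vocabulary, label set and index values below are invented for illustration; the real dictionaries come with the ATIS pickle)::

    import numpy

    # Hypothetical index-to-string dictionaries standing in for the real ones.
    index2word = {0: 'i', 1: 'want', 2: 'a', 3: 'flight', 4: 'to', 5: 'boston'}
    index2label = {0: 'O', 1: 'B-toloc.city_name'}

    # One sentence: an array of word indexes, and the array of its label indexes.
    sentence = numpy.array([0, 1, 2, 3, 4, 5], dtype='int32')
    labels = numpy.array([0, 0, 0, 0, 0, 1], dtype='int32')

    print([index2word[t] for t in sentence])  # ['i', 'want', 'a', 'flight', 'to', 'boston']
    print([index2label[t] for t in labels])   # ['O', 'O', 'O', 'O', 'O', 'B-toloc.city_name']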

- **Context window**
+ Context window
+ ==============

Given a sentence, i.e. an array of indexes, and a window size, i.e. 1, 3, 5, ..., we
need to convert each word in the sentence to a context window surrounding this
@@ -165,7 +172,8 @@ Here is a sample:
To summarize, we started with an array of indexes and ended with a matrix of
indexes. Each line corresponds to the context window surrounding this word.
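
A minimal sketch of such a context-window expansion (the padding value ``-1`` and the example word indexes are assumptions made for illustration; the helper in the repository may differ in its details)::

    def contextwin(sentence, win, pad=-1):
        """For each word index in ``sentence``, return the window of ``win``
        indexes centred on it, padding the borders with ``pad``.
        ``win`` must be odd (1, 3, 5, ...)."""
        assert win % 2 == 1
        # Pad the sentence on both sides so every word has a full window.
        padded = [pad] * (win // 2) + list(sentence) + [pad] * (win // 2)
        # One row per word: the ``win`` indexes surrounding it.
        return [padded[i:i + win] for i in range(len(sentence))]

    print(contextwin([383, 189, 13, 193, 208], win=3))
    # [[-1, 383, 189], [383, 189, 13], [189, 13, 193], [13, 193, 208], [193, 208, -1]]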

- **Word embeddings**
+ Word embeddings
+ =================

Once we have the sentence converted to context windows, i.e. a matrix of indexes, we have to associate
these indexes with the embeddings (the real-valued vector associated with each word).
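
A rough numpy sketch of this lookup, assuming one embedding row per word plus an extra row used by the ``-1`` padding index (all sizes and values are illustrative; the tutorial performs the equivalent indexing on a Theano shared variable)::

    import numpy

    rng = numpy.random.RandomState(0)
    vocab_size, de, win = 1000, 50, 3   # assumed vocabulary size, embedding dimension, window size

    # One row per word, plus a final row that the padding index -1 will pick up.
    embeddings = 0.2 * rng.uniform(-1.0, 1.0, (vocab_size + 1, de))

    # Context-window matrix from the previous step: one row of word indexes per word.
    cwords = numpy.array([[-1, 383, 189],
                          [383, 189, 13],
                          [189, 13, 193],
                          [13, 193, 208],
                          [193, 208, -1]], dtype='int32')

    # Index the embedding matrix and concatenate the window embeddings of each word.
    x = embeddings[cwords].reshape((cwords.shape[0], win * de))
    print(x.shape)   # (5, 150)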
@@ -218,7 +226,8 @@ We now have a sequence (of length 5 which corresponds to the length of the
sentence) of **context window word embeddings** which is easy to feed to a simple
recurrent neural network to iterate with.

- **Elman recurrent neural network**
+ Elman recurrent neural network
+ ==============================

The following (Elman) recurrent neural network (E-RNN) takes as input the current input
(time ``t``) and the previous hidden state (time ``t-1``). Then it iterates.
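
At each time step the hidden state is ``h_t = sigmoid(x_t W_x + h_{t-1} W_h + b_h)`` and the label distribution is ``s_t = softmax(h_t W + b)``. Below is a minimal numpy sketch of one forward pass over a sentence (all sizes, names and weight scales are illustrative assumptions; the tutorial itself expresses this recurrence in Theano)::

    import numpy

    def sigmoid(z):
        return 1.0 / (1.0 + numpy.exp(-z))

    def softmax(z):
        e = numpy.exp(z - z.max())
        return e / e.sum()

    # Assumed sizes: each input is a context-window embedding of dimension de * win,
    # with nh hidden units and nc output classes (slot labels).
    de_win, nh, nc = 150, 100, 127
    rng = numpy.random.RandomState(0)
    Wx = 0.2 * rng.uniform(-1.0, 1.0, (de_win, nh))   # input  -> hidden
    Wh = 0.2 * rng.uniform(-1.0, 1.0, (nh, nh))       # hidden -> hidden (recurrence)
    W  = 0.2 * rng.uniform(-1.0, 1.0, (nh, nc))       # hidden -> output
    bh, b, h0 = numpy.zeros(nh), numpy.zeros(nc), numpy.zeros(nh)

    def forward(x_sequence):
        """Elman recurrence: h_t depends on the current input x_t and on h_{t-1}."""
        h_prev, outputs = h0, []
        for x_t in x_sequence:
            h_t = sigmoid(numpy.dot(x_t, Wx) + numpy.dot(h_prev, Wh) + bh)
            outputs.append(softmax(numpy.dot(h_t, W) + b))
            h_prev = h_t
        return numpy.array(outputs)   # one label distribution per word

    probs = forward(rng.uniform(-1.0, 1.0, (5, de_win)))
    print(probs.shape)   # (5, 127)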
@@ -329,22 +338,25 @@ Note that the extension is `txt` and you will have to change it to `pl`.
Training
++++++++

- **Updates**
+ Updates
+ =======

For stochastic gradient descent (SGD) update, we consider the whole sentence as a mini-batch
and perform one update per sentence. It is possible to perform a pure SGD (contrary to mini-batch)
where the update is done on only one single word at a time.

After each iteration/update, we normalize the word embeddings to keep them on a unit sphere.
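
A sketch of what one such update step looks like (parameter names, shapes and the learning rate below are placeholders; in the tutorial the gradients are obtained through Theano's automatic differentiation over the whole sentence)::

    import numpy

    rng = numpy.random.RandomState(0)
    # Hypothetical parameters: the embedding matrix plus one weight matrix of the RNN.
    params = {'embeddings': rng.randn(1001, 50), 'Wx': rng.randn(150, 100)}
    # Gradients of the sentence-level cost with respect to each parameter (zeros here).
    gradients = {name: numpy.zeros_like(p) for name, p in params.items()}
    lr = 0.05   # learning rate, illustrative value

    # One SGD step per sentence: the whole sentence acts as the mini-batch.
    for name in params:
        params[name] -= lr * gradients[name]

    # After the update, project every word embedding back onto the unit sphere.
    emb = params['embeddings']
    emb /= numpy.sqrt((emb ** 2).sum(axis=1, keepdims=True))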

- **Stopping Criterion**
+ Stopping Criterion
+ ==================

Early-stopping on a validation set is our regularization technique:
the training is run for a given number of epochs (a single pass through the
whole dataset) and we keep the best model with respect to the F1 score
computed on the validation set after each epoch.
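
In outline, the training loop looks like the following sketch (``run_one_epoch_and_get_valid_f1`` is a placeholder, not a function from the repository; saving the best parameters is only indicated by a comment)::

    import random

    def run_one_epoch_and_get_valid_f1():
        # Stand-in for one pass over the training data followed by computing
        # the F1 score on the validation set; here it returns a random score.
        return random.uniform(90.0, 98.0)

    best_f1, best_epoch = -1.0, -1
    for epoch in range(50):
        valid_f1 = run_one_epoch_and_get_valid_f1()
        if valid_f1 > best_f1:
            best_f1, best_epoch = valid_f1, epoch
            # Here one would also save the current parameters as the best model so far.
    print('best valid F1 %.2f obtained at epoch %d' % (best_f1, best_epoch))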

- **Hyper-Parameter Selection**
+ Hyper-Parameter Selection
+ =========================

Although there is interesting research/`code
<https://github.com/JasperSnoek/spearmint>`_ on the topic of automatic
@@ -373,7 +385,8 @@ The user can then run the code by calling:
...
('BEST RESULT: epoch', 57, 'valid F1', 97.23, 'best test F1', 94.2, 'with the model', 'rnnslu')

- **Timing**
+ Timing
+ ======

Running experiments on ATIS using this `repository <https://github.com/mesnilgr/is13>`_
will run one epoch in less than 40 seconds on an i7 CPU 950 @ 3.07GHz using less than 200 MB of RAM::
@@ -394,7 +407,8 @@ After a few epochs, you obtain decent performance **94.48 % of F1 score**.::
[learning] epoch 44 >> 100.00% completed in 35.31 (sec) <<
[...]

- **Word Embedding Nearest Neighbors**
+ Word Embedding Nearest Neighbors
+ ================================

We can check the k-nearest neighbors of the learned embeddings. L2 and
cosine distance gave the same results so we plot them for the cosine distance.
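
A small numpy sketch of such a nearest-neighbour query under the cosine distance (the embedding matrix here is random, standing in for the learned one; sizes are illustrative)::

    import numpy

    rng = numpy.random.RandomState(0)
    vocab_size, de = 1000, 50
    embeddings = rng.uniform(-1.0, 1.0, (vocab_size, de))

    def k_nearest(word_index, k=5):
        """Indexes of the k words whose embeddings are most cosine-similar
        to the embedding of ``word_index``."""
        normed = embeddings / numpy.sqrt((embeddings ** 2).sum(axis=1, keepdims=True))
        similarities = numpy.dot(normed, normed[word_index])
        # argsort is ascending; take the top k, skipping the word itself.
        return numpy.argsort(similarities)[::-1][1:k + 1]

    print(k_nearest(42))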