@@ -178,7 +178,7 @@ Using Theano, it gives::
(nv+1, de)).astype(theano.config.floatX)) # add one for PADDING at the end
idxs = T.imatrix() # as many columns as words in the context window and as many lines as words in the sentence
- x, _ = theano.scan(fn = lambda idx: embeddings[idx].flatten(), sequences = idxs)
+ x, _ = theano.scan(fn= lambda idx: embeddings[idx].flatten(), sequences= idxs)
The x symbolic variable corresponds to a matrix of shape (number of words in the
sentences, dimension of the embedding space X context window size).
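As a rough, illustrative aside (not part of the patch above), the lookup-and-flatten that this scan performs can be mirrored in plain NumPy; the sizes below are made up for the example::

    import numpy

    nv, de = 1000, 50                        # illustrative vocabulary size and embedding dimension
    embeddings = 0.2 * numpy.random.uniform(-1.0, 1.0, (nv + 1, de))

    # context-window indices for a 5-word sentence with a window of 7;
    # -1 selects the last row, i.e. the extra PADDING embedding
    cx = numpy.array([[-1, -1, -1, 0, 1, 2, 3],
                      [-1, -1,  0, 1, 2, 3, 4],
                      [-1,  0,  1, 2, 3, 4, -1],
                      [ 0,  1,  2, 3, 4, -1, -1],
                      [ 1,  2,  3, 4, -1, -1, -1]], dtype='int32')

    x = numpy.array([embeddings[row].flatten() for row in cx])
    print(x.shape)                           # (5, 350): words in the sentence x (de * window size)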
@@ -193,7 +193,7 @@ Let's compile a theano function to do so
[-1, 0, 1, 2, 3, 4,-1],
[ 0, 1, 2, 3, 4,-1,-1],
[ 1, 2, 3, 4,-1,-1,-1]]
- >>> f = theano.function( inputs=[idxs], outputs=x)
+ >>> f = theano.function(inputs=[idxs], outputs=x)
>>> f(cx)
array([[-0.08088442, 0.08458307, 0.05064092, ..., 0.06876887,
-0.06648078, -0.15192257],
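The index matrix ``cx`` used above can be produced by a small helper along these lines; the function name and exact signature are illustrative rather than taken from the shown commit::

    def context_window(sentence, win=7):
        '''Return one window of word indices (of odd size win) per word in
        ``sentence``; -1 is the PADDING index of the extra embedding row.'''
        assert win % 2 == 1
        half = win // 2
        padded = [-1] * half + list(sentence) + [-1] * half
        return [padded[i:i + win] for i in range(len(sentence))]

    # context_window([0, 1, 2, 3, 4]) reproduces the 5 x 7 index matrix cx shown above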
@@ -250,25 +250,25 @@ It gives the following code::
de :: dimension of the word embeddings
cs :: word window context size
'''
- self.emb = theano.shared(name='embeddings', value=0.2 * numpy.random.uniform(-1.0, 1.0,\
+ self.emb = theano.shared(name='embeddings', value=0.2 * numpy.random.uniform(-1.0, 1.0,
(ne+1, de)).astype(theano.config.floatX)) # add one for PADDING at the end
- self.Wx = theano.shared(name='Wx', value=0.2 * numpy.random.uniform(-1.0, 1.0,\
+ self.Wx = theano.shared(name='Wx', value=0.2 * numpy.random.uniform(-1.0, 1.0,
(de * cs, nh)).astype(theano.config.floatX))
- self.Wh = theano.shared(name='Wh', value=0.2 * numpy.random.uniform(-1.0, 1.0,\
+ self.Wh = theano.shared(name='Wh', value=0.2 * numpy.random.uniform(-1.0, 1.0,
(nh, nh)).astype(theano.config.floatX))
- self.W = theano.shared(name='W', value=0.2 * numpy.random.uniform(-1.0, 1.0,\
+ self.W = theano.shared(name='W', value=0.2 * numpy.random.uniform(-1.0, 1.0,
(nh, nc)).astype(theano.config.floatX))
self.bh = theano.shared(name='bh', value=numpy.zeros(nh, dtype=theano.config.floatX))
self.b = theano.shared(name='b', value=numpy.zeros(nc, dtype=theano.config.floatX))
self.h0 = theano.shared(name='h0', value=numpy.zeros(nh, dtype=theano.config.floatX))
# bundle
- self.params = [ self.emb, self.Wx, self.Wh, self.W, self.bh, self.b, self.h0 ]
+ self.params = [self.emb, self.Wx, self.Wh, self.W, self.bh, self.b, self.h0]
Then we integrate the way to build the input from the embedding matrix::
idxs = T.imatrix() # as many columns as context window size/lines as words in the sentence
- x, _ = theano.scan(fn = lambda idx: self.emb[idx].flatten(), sequences = idxs)
+ x, _ = theano.scan(fn= lambda idx: self.emb[idx].flatten(), sequences= idxs)
y = T.ivector('y') # label
We use the scan operator to construct the recursion, works like a charm::
@@ -278,33 +278,33 @@ We use the scan operator to construct the recursion, works like a charm::
s_t = T.nnet.softmax(T.dot(h_t, self.W) + self.b)
return [h_t, s_t]
- [h, s], _ = theano.scan(fn=recurrence, \
- sequences=x, outputs_info=[self.h0, None], \
+ [h, s], _ = theano.scan(fn=recurrence,
+ sequences=x, outputs_info=[self.h0, None],
n_steps=x.shape[0])
- p_y_given_x_sentence = s[:,0, :]
+ p_y_given_x_sentence = s[:, 0, :]
y_pred = T.argmax(p_y_given_x_sentence, axis=1)
Theano will then compute all the gradients automatically to maximize the log-likelihood::
lr = T.scalar('lr')
nll = -T.mean(T.log(p_y_given_x_sentence)[T.arange(x.shape[0]),y])
gradients = T.grad( nll, self.params )
- updates = OrderedDict(( p, p- lr*g ) for p, g in zip( self.params , gradients))
+ updates = OrderedDict((p, p - lr*g) for p, g in zip(self.params, gradients))
Next compile those functions::
self.classify = theano.function(inputs=[idxs], outputs=y_pred)
- self.train = theano.function( inputs = [idxs, y, lr],
- outputs = nll,
- updates = updates )
+ self.train = theano.function(inputs= [idxs, y, lr],
+ outputs= nll,
+ updates= updates)
We keep the word embeddings on the unit sphere by normalizing them after each update::
- self.normalize = theano.function( inputs = [],
- updates = {self.emb:\
- self.emb/ T.sqrt((self.emb**2).sum(axis=1)).dimshuffle(0,'x')})
+ self.normalize = theano.function(inputs= [],
+ updates = {self.emb:
+ self.emb / T.sqrt((self.emb**2).sum(axis=1)).dimshuffle(0, 'x')})
And that's it!
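To see how the compiled functions fit together, one stochastic-gradient step per sentence could look roughly like the sketch below; the class name ``RNNSLU``, the ``context_window`` helper sketched earlier and the hyper-parameter values are placeholders rather than part of the shown commit::

    import numpy

    # illustrative sizes; the real ATIS vocabulary and label set are larger
    rnn = RNNSLU(nh=100, nc=10, ne=1000, de=50, cs=7)

    sentence = [12, 7, 42, 3, 99]            # word indices of one toy sentence
    labels = [0, 2, 2, 1, 0]                 # one label per word
    lr = 0.0627                              # learning rate

    words = numpy.asarray(context_window(sentence, 7), dtype='int32')   # (5, 7) index matrix
    cost = rnn.train(words, numpy.asarray(labels, dtype='int32'), lr)   # one update of all params
    rnn.normalize()                                                     # rescale the embeddings
    predictions = rnn.classify(words)                                   # predicted label per word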
@@ -358,7 +358,7 @@ Results
**Timing**
Running experiments on ATIS using this `repository <https://github.com/mesnilgr/is13>`_
- will run one epoch in less than 40 seconds on bart1 processor using less than 200 Mo of RAM::
+ will run one epoch in less than 40 seconds on i7 CPU 950 @ 3.07GHz using less than 200 Mo of RAM::
[learning] epoch 0 >> 100.00% completed in 34.48 (sec) <<
@@ -378,7 +378,7 @@ After a few epochs, you obtain decent performance **94.48 % of F1 score**.::
**Word Embedding Nearest Neighbors**
- We can check the k-nearest neighbors of the thus learned embeddings. L2 and
+ We can check the k-nearest neighbors of the learned embeddings. L2 and
cosine distance gave the same results so we plot them for the cosine distance.
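A rough way to inspect those neighbours directly from the trained embedding matrix is sketched below; the ``word2index`` mapping is a placeholder and this snippet is illustrative, not part of the patch::

    import numpy

    def nearest_neighbors(emb, word_index, k=5):
        '''Indices of the k closest embeddings to ``word_index`` by cosine similarity.'''
        normed = emb / numpy.sqrt((emb ** 2).sum(axis=1))[:, None]
        sims = normed.dot(normed[word_index])       # cosine similarity to every word
        return sims.argsort()[::-1][1:k + 1]        # drop the word itself

    # e.g. nearest_neighbors(rnn.emb.get_value(), word2index['monday'])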
+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+------------------------------+