doc/rnnrbm.txt (26 additions, 24 deletions)
@@ -67,7 +67,7 @@ Note that for clarity of the implementation, contrarily to [BoulangerLewandowski
 Implementation
 ++++++++++++++
 
-We wish to construct two Theano functions: one to to train the RNN-RBM, and one to generate sample sequences from it.
+We wish to construct two Theano functions: one to train the RNN-RBM, and one to generate sample sequences from it.
 
 For *training*, i.e. given :math:`\{v^{(t)}\}`, the RNN hidden state :math:`\{u^{(t)}\}` and the associated :math:`\{b_v^{(t)}, b_h^{(t)}\}` parameters are deterministic and can be readily computed for each training sequence.
 A stochastic gradient descent (SGD) update on the parameters can then be estimated via contrastive divergence (CD) on the individual time steps of a sequence in the same way that individual training examples are treated in a mini-batch for regular RBMs.
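The per-time-step CD update described in the hunk above can be sketched in plain NumPy (a minimal illustration assuming binary units and CD-1; the function name `cd1_update` is ours, and the tutorial's actual implementation builds this update symbolically with Theano rather than by hand):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v, W, bv_t, bh_t, lr=1e-3):
    """One CD-1 step for a single time step t of an RNN-RBM.

    bv_t and bh_t are the time-dependent biases, already computed
    deterministically from the RNN hidden state u^{(t-1)}.
    """
    # Positive phase: hidden activations given the data v^{(t)}.
    h_mean = sigmoid(v @ W + bh_t)
    h_sample = (rng.random(h_mean.shape) < h_mean).astype(float)

    # Negative phase: one Gibbs step back to the visibles and up again.
    v_mean = sigmoid(h_sample @ W.T + bv_t)
    v_sample = (rng.random(v_mean.shape) < v_mean).astype(float)
    h_neg = sigmoid(v_sample @ W + bh_t)

    # CD-1 approximate gradients of the log-likelihood.
    dW = np.outer(v, h_mean) - np.outer(v_sample, h_neg)
    dbv = v - v_sample
    dbh = h_mean - h_neg
    return W + lr * dW, bv_t + lr * dbv, bh_t + lr * dbh
```

Averaging these per-step gradients over a sequence mirrors how a mini-batch of independent examples is treated in a regular RBM.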
@@ -312,30 +312,30 @@ The output was the following:
 
 .. code-block:: text
 
-  Epoch 1/150 -15.0154373583
-  Epoch 2/150 -10.4948703701
-  Epoch 3/150 -10.2507567848
-  Epoch 4/150 -10.1417621708
-  Epoch 5/150 -9.69403756276
-  Epoch 6/150 -8.6036962785
-  Epoch 7/150 -8.35180803953
-  Epoch 8/150 -8.26202621624
-  Epoch 9/150 -8.21526214665
-  Epoch 10/150 -8.16552397791
+  Epoch 1/200 -15.0308940028
+  Epoch 2/200 -10.4892606673
+  Epoch 3/200 -10.2394696138
+  Epoch 4/200 -10.1431669994
+  Epoch 5/200 -9.7005382843
+  Epoch 6/200 -8.5985647524
+  Epoch 7/200 -8.35115428534
+  Epoch 8/200 -8.26453580552
+  Epoch 9/200 -8.21208991542
+  Epoch 10/200 -8.16847274143
 
   ... truncated for brevity ...
 
-  Epoch 140/150 -5.09668220315
-  Epoch 141/150 -5.08657006002
-  Epoch 142/150 -5.09776776338
-  Epoch 143/150 -5.10151042486
-  Epoch 144/150 -5.07677377181
-  Epoch 145/150 -5.07374453388
-  Epoch 146/150 -inf
-  Epoch 147/150 -5.06393939067
-  Epoch 148/150 -5.07493685431
-  Epoch 149/150 -5.06504525246
-  Epoch 150/150 -5.04567771601
+  Epoch 190/200 -4.74799179994
+  Epoch 191/200 -4.73488515216
+  Epoch 192/200 -4.7326138489
+  Epoch 193/200 -4.73841636884
+  Epoch 194/200 -4.70255511452
+  Epoch 195/200 -4.71872634914
+  Epoch 196/200 -4.7276415885
+  Epoch 197/200 -4.73497644728
+  Epoch 198/200 -inf
+  Epoch 199/200 -4.75554987143
+  Epoch 200/200 -4.72591935412
 
 
 
@@ -346,7 +346,7 @@ The figures below show the piano-rolls of two sample sequences and we provide th
 
 Listen to `sample1.mid <http://www-etud.iro.umontreal.ca/~boulanni/sample1.mid>`_
 
-.. figure:: images/sample1.png
+.. figure:: images/sample2.png
   :scale: 60%
 
 Listen to `sample2.mid <http://www-etud.iro.umontreal.ca/~boulanni/sample2.mid>`_
@@ -357,10 +357,12 @@ How to improve this code
 
 The code shown in this tutorial is a stripped-down version that can be improved in the following ways:
 
+* Preprocessing: transposing the sequences in a common tonality (e.g. C major / minor) and normalizing the tempo in beats (quarternotes) per minute can have the most effect on the generative quality of the model.
 * Pretraining techniques: initialize the :math:`W,b_v,b_h` parameters with independent RBMs with fully shuffled frames (i.e. :math:`W_{uh}=W_{uv}=W_{uu}=W_{vu}=0`); initialize the :math:`W_{uv},W_{uu},W_{vu},b_u` parameters of the RNN with the auxiliary cross-entropy objective via either SGD or, preferably, Hessian-free optimization [BoulangerLewandowski12]_.
 * Optimization techniques: gradient clipping, Nesterov momentum and the use of NADE for conditional density estimation.
-* Preprocessing: transposing the sequences in a common tonality (e.g. C major / minor) and normalizing the tempo in beats (quarternotes) per minute can yield substantial improvement in the generative quality of the model.
 * Hyperparameter search: learning rate (separately for the RBM and RNN parts), learning rate schedules, batch size, number of hidden units (recurrent and RBM), momentum coefficient, momentum schedule, Gibbs chain length :math:`k` and early stopping.
 * Learn the initial condition :math:`u^{(0)}` as a model parameter.
 
 
+A few samples generated with code including these features are available `here <http://www-etud.iro.umontreal.ca/~boulanni/sequences.zip>`_.
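Two of the optimization techniques named in the bullet list above, gradient clipping and Nesterov momentum, can be sketched in plain NumPy (an illustrative sketch only; the function names, threshold, and coefficients are ours, not values used by the tutorial):

```python
import numpy as np

def clip_gradients(grads, threshold):
    """Rescale all gradients jointly if their global L2 norm
    exceeds `threshold` (helps against exploding RNN gradients)."""
    norm = np.sqrt(sum(np.sum(g * g) for g in grads))
    if norm > threshold:
        grads = [g * (threshold / norm) for g in grads]
    return grads

def nesterov_step(param, velocity, grad_fn, lr=0.01, momentum=0.9):
    """One Nesterov-momentum update: the gradient is evaluated at the
    look-ahead point param + momentum * velocity, not at param itself."""
    grad = grad_fn(param + momentum * velocity)
    velocity = momentum * velocity - lr * grad
    return param + velocity, velocity
```

In a full training loop one would clip the per-sequence gradients first, then feed them into the momentum update.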