Feature alpha dropout #855
base: master
Conversation
Due to the similarities and shared code with DropoutLayer, should it extend DropoutLayer and override `get_output_for()`?
Welcome, and thank you for the PR! It seems a good strategy to attract contributors is to wait for an interesting paper to come out and not implement it right away ;)

I've got some detailed comments, and a general note on the implementation. You're right that it shares a lot of code with the DropoutLayer. Making AlphaDropoutLayer inherit from DropoutLayer will not immediately reduce the duplication, though; we'd need some refactoring. Your current implementation requires the mask to be known, so we cannot simply call the super `get_output_for()`, we'd still have to implement it. It's possible to formulate alpha dropout in terms of the original dropout by shifting the input, calling the super `get_output_for()`, and shifting and scaling the output. Theano will not be able to optimize this, though.
So I'd suggest the following (a rough sketch of how the pieces could fit together follows below):

- Create a read-only property `self.q` that encapsulates the conversion from `p`.
- In `DropoutLayer`, create a method `apply_dropout(self, input, const=0)` that rescales the input, creates the mask, multiplies the input with it, and adds `const * (1 - mask)` if `const != 0`. This can be called from the `DropoutLayer` (with `const=0`) and the `AlphaDropoutLayer` (with `const=self.alpha`).
- Make `AlphaDropoutLayer` inherit from `DropoutLayer` and overwrite `get_output_for()`. It will have minimal code duplication.
A nuisance is that a separate `AlphaDropoutLayer` means the `spatial_dropout` and `dropout_locations` convenience functions would have to be duplicated. But merging everything into a single `DropoutLayer` implementation would be ugly as well, as `AlphaDropoutLayer` with `alpha=0` is still not the same as `DropoutLayer`, so we'd need at least two additional parameters for the `DropoutLayer` if we were to unify the implementations.
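A rough sketch of how that layout could look (simplified: `shared_axes` handling, docstrings, and the convenience functions are omitted, and passing `rescale=False` for alpha dropout is an assumption based on the paper's formulation, not something decided in this thread):

```python
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

from lasagne.layers import Layer
from lasagne.random import get_rng


class DropoutLayer(Layer):
    def __init__(self, incoming, p=0.5, rescale=True, **kwargs):
        super(DropoutLayer, self).__init__(incoming, **kwargs)
        self._srng = RandomStreams(get_rng().randint(1, 2147462579))
        self.p = p
        self.rescale = rescale

    @property
    def q(self):
        # retain probability as a symbolic expression, so p may be a
        # float or a scalar tensor without causing an upcast
        return T.constant(1) - self.p

    def apply_dropout(self, input, const=0):
        # rescale the kept units, draw the mask, and fill dropped
        # positions with `const` instead of zero when requested
        if self.rescale:
            input /= self.q
        mask = self._srng.binomial(input.shape, p=self.q, dtype=input.dtype)
        if const != 0:
            return input * mask + const * (T.constant(1) - mask)
        return input * mask

    def get_output_for(self, input, deterministic=False, **kwargs):
        if deterministic or self.p == 0:
            return input
        return self.apply_dropout(input)


class AlphaDropoutLayer(DropoutLayer):
    def __init__(self, incoming, p=0.5, alpha=None, **kwargs):
        # assumption: alpha dropout relies on the affine correction below
        # instead of the 1/q rescaling, following the paper
        super(AlphaDropoutLayer, self).__init__(incoming, p=p, rescale=False,
                                                **kwargs)
        if alpha is None:
            # alpha' = -lambda * alpha from the SELU paper
            alpha = (-1.0507009873554804934193349852946 *
                     1.6732632423543772848170429916717)
        self.alpha = alpha

    def get_output_for(self, input, deterministic=False, **kwargs):
        if deterministic or self.p == 0:
            return input
        dropped = self.apply_dropout(input, const=self.alpha)
        # affine transform restoring zero mean and unit variance
        a = T.pow(self.q + T.square(self.alpha) * self.q * self.p, -0.5)
        b = -a * self.p * self.alpha
        return a * dropped + b
```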
lasagne/layers/noise.py (outdated):

    Sets values to zero with probability p. Will also converge the remaining
    neurons to mean 0 and a variance of 1 unit. See notes for disabling dropout
    during testing.
    Parameters
Missing blank line before the `Parameters` heading.
lasagne/layers/noise.py (outdated):

    ``False`` or not specified, dropout (and possibly scaling) is enabled.
    Usually, you would use ``deterministic=False`` at train time and
    ``deterministic=True`` at test time.
    References
Missing blank line.
lasagne/layers/noise.py (outdated):

    class AlphaDropoutLayer(Layer):
        """Dropout layer.

        Sets values to zero with probability p. Will also converge the remaining
Actually it sets values to `alpha`.
lasagne/layers/noise.py (outdated):

    super(AlphaDropoutLayer, self).__init__(incoming, **kwargs)
    self._srng = RandomStreams()
    self.p = p
    self.alpha = -1.7580993408473766
I think alpha should be a parameter for the layer.
Oh, and we should use all available digits, because we can. And maybe call it differently, since it's called alpha prime in the paper, computed as minus lambda times alpha? Not sure.
The default value would be `-1.0507009873554804934193349852946 * 1.6732632423543772848170429916717`.
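For reference, a quick check of that product (the two constants are the SELU lambda and alpha from the paper; the result matches the hard-coded value above up to floating-point precision):

```python
selu_scale = 1.0507009873554804934193349852946   # lambda from the SELU paper
selu_alpha = 1.6732632423543772848170429916717   # alpha from the SELU paper
alpha_prime = -selu_scale * selu_alpha            # ~ -1.7580993408473766
```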
Good idea, I'll default it to that and create an additional parameter. The only difference between the calculated alpha and the value I have is floating-point precision.
lasagne/layers/noise.py (outdated):

    mask_shape = tuple(1 if a in shared_axes else s
                       for a, s in enumerate(mask_shape))

    mask = self._srng.uniform(mask_shape, dtype=input.dtype) < q
This will result in a binary mask. Why not use `binomial` as in the original dropout layer?
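For reference, the mask could be drawn directly with `binomial` instead of thresholding a uniform sample, along the lines of the existing dropout layer (assuming `mask_shape`, `q`, and `input` from the quoted code):

```python
# draw a binary mask directly, as the original DropoutLayer does
mask = self._srng.binomial(mask_shape, p=q, dtype=input.dtype)
```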
docs/modules/layers/noise.rst (outdated):

    @@ -21,3 +21,6 @@ Noise layers
    .. autoclass:: GaussianNoiseLayer
        :members:

    .. autoclass:: AlphaDropoutLayer
        :members:
I'd place it after the dropout layers.
lasagne/layers/noise.py (outdated):

    @@ -14,6 +14,7 @@
        "spatial_dropout",
        "dropout_locations",
        "GaussianNoiseLayer",
        "AlphaDropoutLayer",
I'd place it after the dropout layers.
lasagne/layers/noise.py
Outdated
bcast = tuple(bool(s == 1) for s in mask_shape) | ||
mask = T.patternbroadcast(mask, bcast) | ||
|
||
a = T.pow(q + self.alpha ** 2 * q * (1 - q), -0.5) |
Should use `p`, not `1 - q`, otherwise it might result in an unwanted upcast again. Should also use `T.square` instead of `** 2` to make the graph smaller.
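Applied to the quoted line, that would look roughly like:

```python
# use p directly and T.square to keep the graph small and avoid an upcast
a = T.pow(q + T.square(self.alpha) * q * self.p, -0.5)
```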
lasagne/layers/noise.py
Outdated
|
||
a = T.pow(q + self.alpha ** 2 * q * (1 - q), -0.5) | ||
|
||
b = -a * (1 - q) * self.alpha |
Again, should use `p`, not `1 - q`.
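I.e., roughly (matching the change above):

```python
b = -a * self.p * self.alpha
```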
I apologize for all of the commits, @f0k; I'm having some trouble squashing them together. Any help with this would be very much appreciated. Thanks!
Thank you for continuing, and sorry for the delay!
I guess the easiest way to get rid of all the merge commits in between will be to squash everything into a single commit for now. In the end I'd like two commits -- one refactoring the dropout layer and one introducing alpha dropout -- but we can take care of that later. To squash, do:

    git reset --soft edbbc12      # set pointer to the first commit of the PR, leave other changes staged
    git commit --amend --no-edit  # tack all other changes onto the first commit
    git push --force              # overwrite the current PR with the squashed version

I have made a local backup of the current state of your PR in case anything goes wrong. I will review the code afterwards, as I don't know if the reviews will stick after squashing.
Force-pushed from 2010d19 to f92c309.
Thanks for the assist! I think everything is still working as I left it. Should be ready for another round of reviews, @f0k.
@f0k Have you had a chance to look at the changes?
lasagne/layers/noise.py (outdated):

    **kwargs)
    if alpha is None:
        self.alpha = -1.0507009873554804934193349852946 * \
            1.6732632423543772848170429916717
Would it make sense to mention in a comment that these two values are fixed-point solutions to equations (4) and (5) in [1] for zero mean and unit variance input, and that the analytic expressions for them are given in equation (14), also in [1]?

Furthermore, this value for alpha should be consistent with a `SELU` input layer. Would a possible option be to pass a `SELU` instance for the parameter `alpha` and take the value from there, e.g.:

    elif isinstance(alpha, SELU):
        self.alpha = -alpha.scale * alpha.scale_neg

?
You bring up good points. I will update the documentation accordingly. Regarding the option to pass a SELU instance and calculate the alpha based on that, I think that since AlphaDropout is meant to work with SELU, it only makes sense for it to be able to use SELU. I will make this change.
Sorry, I had to finish my thesis first. You're including the commits of the SELU PR in your PR; can you please rebase onto master instead? It's more difficult to review otherwise. Cheers!

    git fetch upstream master
    git rebase upstream/master
    git push --force  # assuming the rebase went through without conflicts, otherwise `git rebase --abort` and let me know
Force-pushed from 73f220a to 4304373.
No worries, and congrats! I rebased onto master, resolved a few conflicts, and pushed. Everything is as it was.
Oops, sorry, I introduced a conflict by merging #871. I will review and can then rebase your PR onto the new master myself, resolving the conflict. Thank you for rebasing onto yesterday's master!
Thank you for rebasing, much easier to review now! There are still a couple of issues with the code and the documentation. The latter will need some thinking to explain well what the layer does.
lasagne/tests/layers/test_noise.py (outdated):

    def test_alpha(self, input_layer):
        input = theano.shared(numpy.ones((100, 100)))
        from lasagne.layers.noise import AlphaDropoutLayer
        layer = AlphaDropoutLayer(input_layer, alpha=-1.7580993408473766)
But that's just the default value for `alpha`. It should test another value, otherwise you never know whether that value was actually respected or it took the default anyway :)
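For instance, within the existing fixture (the particular value is arbitrary, anything other than the default works):

```python
layer = AlphaDropoutLayer(input_layer, alpha=-2.0)  # deliberately not the default
```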
lasagne/tests/layers/test_noise.py
Outdated
input = theano.shared(numpy.ones((100, 100))) | ||
from lasagne.layers.noise import AlphaDropoutLayer | ||
from lasagne.nonlinearities import selu # SELU instance | ||
layer = AlphaDropoutLayer(input_layer, alpha=selu) |
Same here, it should be a `SELU` instance with nonstandard scales. You can then test whether `self.alpha` was set correctly, in addition to checking the output.
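A sketch of what that could look like, assuming `SELU` from `lasagne.nonlinearities` takes `scale` and `scale_neg` arguments and reusing the existing `input_layer` fixture (the particular scale values are arbitrary):

```python
from lasagne.nonlinearities import SELU
from lasagne.layers.noise import AlphaDropoutLayer

custom_selu = SELU(scale=2.0, scale_neg=3.0)  # nonstandard scales
layer = AlphaDropoutLayer(input_layer, alpha=custom_selu)
# alpha should be derived from the SELU instance, not the default constant
assert layer.alpha == -custom_selu.scale * custom_selu.scale_neg
```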
lasagne/layers/noise.py (outdated):

    self.q = one - self.p
    if self.rescale:
        input /= self.q
`self.q` should not be set here; it should become a property of the layer (i.e., not an attribute):

    @property
    def q(self):
        return T.constant(1) - self.p

Then you never need to set `self.q` and can use it in `apply_dropout()` and in `get_output_for()`.
lasagne/layers/noise.py (outdated):

    mask = T.patternbroadcast(mask, bcast)

    if const != 0:
        return (input * mask) + (const * (1 - mask))
This has to use `T.constant(1)` instead of `1` again.
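I.e., something like:

```python
# use a symbolic constant instead of the Python int 1 to avoid the unwanted upcast
return (input * mask) + (const * (T.constant(1) - mask))
```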
lasagne/layers/noise.py (outdated):

    """Dropout layer.

    Sets values to alpha. Will also converge the remaining
    neurons to mean 0 and a variance of 1 unit. See notes for disabling dropout
Sorry, this explanation is a bit imprecise. Can you be more explicit what the layer does? (If you can't find anything, I'll try that myself.)
lasagne/layers/noise.py (outdated):

    incoming : a :class:`Layer` instance or a tuple
        the layer feeding into this layer, or the expected input shape
    p : float or scalar tensor
        The probability of setting a value to zero
Only that it's not set to zero...
lasagne/layers/noise.py
Outdated
alpha : float or SELU instance | ||
The default values are fixed point solutions to equations (4) and (5) | ||
in [1] for zero mean and unit variance input. The analytic expressions | ||
for them are given in equation (14) also in [1]. |
Should probably first state what role `alpha` plays for the operation of the layer. Also the reference must be given as `[1]_`, not `[1]`.
lasagne/layers/noise.py
Outdated
----- | ||
The alpha dropout layer is a regularizer that randomly sets input values to | ||
zero and also applies a normalization function to bring the mean to 0 | ||
and the variance to 1; see [1]_for why this might improve |
That used to reference the original dropout paper. Either the reference or the text should be changed.
lasagne/layers/noise.py (outdated):

    a = T.pow(self.q + T.square(self.alpha) * self.q * self.p, -0.5)
    b = -a * self.p * self.alpha

    return T.cast(a * mask + b, input.dtype)
Why do you need this cast? Can you trace back why it is necessary? If possible, it would be better to cast earlier, otherwise Theano might execute part of the graph on CPU and then only cast and copy the results back to GPU afterwards.
I believe it returns a float64 so I just want to make sure the dtype of the output is the same as the dtype of the input.
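One possible way to cast earlier, as suggested above (a sketch only; whether this actually removes the float64 intermediate depends on where the upcast originates):

```python
# cast the affine coefficients up front so the final expression never
# leaves input.dtype, instead of casting the combined result at the end
a = T.cast(T.pow(self.q + T.square(self.alpha) * self.q * self.p, -0.5),
           input.dtype)
b = T.cast(-self.p * self.alpha, input.dtype) * a
return a * mask + b
```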
@f0k I have made the changes that address all your comments. Let me know what you think.
Thank you, the code looks good now! For the docstring I'll need to have a closer look; unfortunately, I'll be abroad without web access for the next two weeks -- sorry to keep you waiting, I'll get back to this when I return!
Have you had a chance to review the docstring in more detail, @f0k? Please let me know if anything is inaccurate. Thanks!
This is my first PR to Lasagne so I'm definitely looking for suggestions. To complement SELU activation, I've made a new AlphaDropoutLayer as specified in the paper: https://arxiv.org/abs/1706.02515