[MRG] added leaky_relu activation and derivative to multilayer_perceptron #10665
Conversation
Apart from this needing tests, I see two problems. One is a question of how mature the technology is and whether we should be maintaining various recent inventions in this space. The other is that the user can't set alpha here. Should we support custom (activation, gradient) instead?
I'm -1 on adding this. We don't even have dropout now, right? I feel recent advances are better suited for Keras, and we don't really want or need to reimplement Keras here.
So essentially this adds an activation function and a derivative to the multilayer perceptron module?
There's no way to change the alpha, though? And the question is a bit what "simple" means. Is there evidence that this really helps in practice, in particular for dense networks?
Yes, that is a significant limitation of this approach.
Haven't searched the literature, but this is the third PR or issue about it, so there is certainly interest. However, following the dev meeting today, it seems there was a consensus on not adding new DL features. Another approach, requiring a one-line change to support custom activation functions, can be found in #14815.
I think "delta[Z < 0] = alpha" should be replaced by " delta[Z < 0] *= alpha " in the computation of the derivative in order to be consistent with other derivative computations. For example, for the tanh, the function inplace derivative is computed as : delta *= (1 - Z ** 2). So, delta needs to be multiplied by the derivation function. |
Implementation of the leaky ReLU activation function and its derivative, to be used as an activation in multilayer_perceptron neurons.
This is proposed as a possible solution to the dying ReLU problem, a situation in which a ReLU unit outputs 0 for every input.
The leaky ReLU function allows a small, non-zero gradient when the unit is not active.
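For context, a minimal sketch of the forward activation under the same assumptions (the helper name `inplace_leaky_relu` and the default `alpha=0.01` are illustrative, not part of the merged API):

```python
import numpy as np

def inplace_leaky_relu(X, alpha=0.01):
    """Compute the leaky ReLU in place: x if x > 0, else alpha * x."""
    # Scale only the negative entries; positive entries pass through
    # unchanged, so inactive units still propagate a small, alpha-scaled
    # signal instead of a constant zero.
    X[X < 0] *= alpha

X = np.array([[-2.0, 0.0, 3.0]])
inplace_leaky_relu(X)
# X is now [[-0.02, 0.0, 3.0]]
```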