[MRG] added leaky_relu activation and derivative to multilayer_perceptron #10665

Closed

wants to merge 4 commits into from

Conversation

@dpstart dpstart commented Feb 20, 2018

Implementation of the leaky ReLU activation function and its derivative, to be used as an activation in multilayer_perceptron.

This is a possible remedy for the dying ReLU problem, a situation in which a ReLU unit outputs 0 for every input, so its gradient is 0 and it stops learning.

The leaky ReLU function allows a small gradient when the unit is not active.
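
For reference, a minimal NumPy sketch of the leaky ReLU and its derivative (illustrative only, not the PR's actual diff; the `alpha=0.01` default is an assumption):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # x for x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

def leaky_relu_derivative(x, alpha=0.01):
    # 1 where the unit is active, alpha (rather than 0) where it is not,
    # so negative pre-activations still receive a small gradient
    return np.where(x > 0, 1.0, alpha)

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(leaky_relu(x))             # [-0.02 -0.01  0.    1.    2.  ]
print(leaky_relu_derivative(x))  # [0.01 0.01 0.01 1.   1.  ]
```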

Member

@jnothman jnothman left a comment

Apart from this needing tests, I see two problems. One is a question of how mature the technology is and whether we should be maintaining various recent inventions in this space. The other is that the user can't set alpha here. Should we support custom (activation, gradient) instead?

@amueller
Member

I'm -1 on adding this. We don't even have dropout now, right? I feel recent advances are better suited for Keras and we don't really want or need to reimplement Keras here.

@rth
Member

rth commented Jun 16, 2019

So essentially this adds an activation function and a derivative to neural_network.{ACTIVATIONS,DERIVATIVES}. Maybe we could just document that and provide a simple example? That way anyone could add the activation function they want without much maintenance effort on our side. Sometimes it's still handy to do a simple NN without installing tensorflow or pytorch.
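
A hypothetical sketch of what such a documented example could look like, assuming the `ACTIVATIONS` and `DERIVATIVES` dictionaries live in `sklearn.neural_network._base` and follow its inplace conventions; whether the MLP estimators accept an activation name outside their built-in list depends on the scikit-learn version, so treat this as illustrative rather than a supported recipe:

```python
import numpy as np
from sklearn.neural_network import _base  # private module; path may differ across versions

ALPHA = 0.01  # illustrative slope for the negative part

def inplace_leaky_relu(X):
    # f(x) = x for x > 0, alpha * x otherwise, computed in place like inplace_relu
    np.maximum(X, ALPHA * X, out=X)

def inplace_leaky_relu_derivative(Z, delta):
    # Scale the backpropagated error by f'(Z): 1 where the unit was active,
    # alpha where it was not
    delta[Z < 0] *= ALPHA

_base.ACTIVATIONS["leaky_relu"] = inplace_leaky_relu
_base.DERIVATIVES["leaky_relu"] = inplace_leaky_relu_derivative

# MLPClassifier(activation="leaky_relu") would then look these functions up here,
# provided the estimator's hyperparameter validation allows the new name.
```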

@amueller
Member

amueller commented Aug 6, 2019

there's no way to change the alpha, though? And the question is a bit what "simple" means. Is there evidence that this really helps in practice? In particular for dense networks?

@amueller amueller added the Needs Decision label Aug 6, 2019
@rth
Member

rth commented Aug 26, 2019

> there's no way to change the alpha, though?

Yes, that is a significant limitation of this approach.

> And the question is a bit what "simple" means. Is there evidence that this really helps in practice?

I haven't searched the literature, but this is the third PR or issue about it, so there is certainly interest. Though following the dev meeting today, it seems there was a consensus on not adding new deep learning features.

Another approach, requiring a one-line change to support custom activation functions, can be found in #14815.

@aurel-av

aurel-av commented Dec 3, 2020

I think "delta[Z < 0] = alpha" should be replaced by " delta[Z < 0] *= alpha " in the computation of the derivative in order to be consistent with other derivative computations. For example, for the tanh, the function inplace derivative is computed as : delta *= (1 - Z ** 2). So, delta needs to be multiplied by the derivation function.
I used the california housing database to test this leaky_relu function. It performs well with this modification.
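
A small numerical illustration of the difference (the arrays and `alpha` are made up; the point is only that the inplace derivative should scale the incoming error, as the tanh case does):

```python
import numpy as np

Z = np.array([[-2.0, 3.0]])      # layer activations
delta = np.array([[0.5, 0.5]])   # backpropagated error arriving at the layer
alpha = 0.01

buggy = delta.copy()
buggy[Z < 0] = alpha             # -> [[0.01, 0.5]]: the incoming 0.5 is discarded

fixed = delta.copy()
fixed[Z < 0] *= alpha            # -> [[0.005, 0.5]]: error scaled by f'(Z) = alpha
```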

@adrinjalali adrinjalali deleted the branch scikit-learn:master January 22, 2021 10:54