It's quite likely that the parameter initialization scheme implemented in scikit-learn is not optimal for the logistic activation function, especially when using only a few hidden units.
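As a minimal sketch of that effect (on an assumed toy quadratic target, not code from the original report), one can refit the same small "logistic" network with several random_state values and compare the fits:

```python
# Minimal sketch (toy data, not from the original report): with few hidden units and
# the "logistic" activation, the quality of the fit can depend strongly on the
# random initialization.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = X.ravel() ** 2  # simple nonlinear (quadratic) target

for seed in range(5):
    model = MLPRegressor(hidden_layer_sizes=(3,), activation="logistic",
                         solver="lbfgs", max_iter=2000, random_state=seed)
    model.fit(X, y)
    print(f"random_state={seed}  R^2 = {model.score(X, y):.3f}")
# A large spread across seeds would point at initialization/optimization rather
# than model capacity.
```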
If you increase the hidden_layer_sizes, the problem somewhat goes away:
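A rough sketch of that workaround, again on an assumed toy target rather than the original data:

```python
# Sketch of the workaround on the same assumed toy problem: widen the hidden layer.
# The specific sizes (3,) and (100,) are illustrative choices, not from the report.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = X.ravel() ** 2

for sizes in [(3,), (100,)]:
    model = MLPRegressor(hidden_layer_sizes=sizes, activation="logistic",
                         solver="lbfgs", max_iter=2000, random_state=0)
    model.fit(X, y)
    print(f"hidden_layer_sizes={sizes}  R^2 = {model.score(X, y):.3f}")
# The wider network typically fits the nonlinear target much better.
```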
I would not consider this a bug: it has been known since the 90s that the "logistic" activation function is harder to optimize than "tanh". Modern alternatives, like the "relu" activation function used by default in scikit-learn, empirically lead to better fits. We mostly keep "logistic" in scikit-learn for educational/historical purposes, as this was the original activation function used by NN pioneers in the 80s (or even before that). Maybe we could make that more explicit in the docstring and/or the user guide.
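For illustration, a small sketch (same assumed toy setup as above) fitting the available activation options side by side:

```python
# Sketch comparing activation functions on the same assumed toy problem.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = X.ravel() ** 2

for activation in ["logistic", "tanh", "relu"]:
    model = MLPRegressor(hidden_layer_sizes=(10,), activation=activation,
                         solver="lbfgs", max_iter=2000, random_state=0)
    model.fit(X, y)
    print(f"{activation:8s}  R^2 = {model.score(X, y):.3f}")
# "tanh" and "relu" usually reach a good fit more reliably than "logistic".
```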
Describe the bug
With the sigmoid ("logistic") activation function, the predicted probabilities repeatedly come out nearly identical (to several decimal places), clustered around the average of the target values, much like the output of a linear (or constant) fit. Prediction works when the target is a linear function of the input, but higher-order targets tend to cause issues.
Steps/Code to Reproduce
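The original reproduction snippet is not preserved in this copy; what follows is only an illustrative sketch of the kind of setup described above, with an assumed quadratic target and assumed MLPRegressor settings:

```python
# Illustrative sketch only -- the original reproduction code is not preserved here.
# The quadratic target, the [0, 1] scaling, and the MLPRegressor settings are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-1, 1, size=(200, 1)), axis=0)
y = X.ravel() ** 2                       # higher-order (quadratic) target
y = (y - y.min()) / (y.max() - y.min())  # scale to [0, 1] so it reads like a probability

model = MLPRegressor(hidden_layer_sizes=(3,), activation="logistic",
                     solver="lbfgs", max_iter=2000, random_state=0)
model.fit(X, y)

pred = model.predict(X)
# With few hidden units and the logistic activation, the predictions often collapse
# to nearly the same value, close to the mean of y.
print("std of predictions:", pred.std())
print("mean of y:         ", y.mean())
```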
Expected Results
The prediction resembles the target data.
Actual Results
The prediction does not resemble the target data; the predicted values are nearly constant around the mean of the target.
Versions