
MLP Classifier "Logistic" activation function providing ~constant prediction probabilities for all inputs when predicting quadratic function #31235


Closed
KyleEMol opened this issue Apr 21, 2025 · 1 comment
KyleEMol commented Apr 21, 2025

Describe the bug

With the "logistic" (sigmoid) activation function, the model repeatedly produces nearly identical prediction probabilities for all inputs, equal to several decimal places and clustered around the mean of the target values, much like a constant or linear predictor would. It works when predicting a linear function, but higher-order targets tend to cause this issue.

Steps/Code to Reproduce

from sklearn.neural_network import MLPClassifier
import numpy as np

np.random.seed(1)
Data_X = np.random.random((500, 2))
# Label is 1 where x0 + (2*(x1 - 0.5))**2 - 0.75 >= 0 (a quadratic decision boundary)
Data_Y = np.array([int(x[0] + (2 * (x[1] - 0.5)) ** 2 - 0.75 >= 0) for x in Data_X])

NN = MLPClassifier(hidden_layer_sizes=(20, 20), activation="logistic", random_state=42)
NN.fit(Data_X, Data_Y)
print(NN.predict(Data_X[:20]))
print(Data_Y[:20])

Expected Results

Predictions that resemble the target data.

Actual Results

[1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]
[0 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 1 0]
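The near-constant behavior can be confirmed by inspecting predict_proba rather than the hard predictions. A minimal check of this (re-fitting the same model as above; the exact probability values depend on the scikit-learn version) might look like:

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

# Silence the convergence warning from the default max_iter=200.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

np.random.seed(1)
X = np.random.random((500, 2))
y = np.array([int(x[0] + (2 * (x[1] - 0.5)) ** 2 - 0.75 >= 0) for x in X])

clf = MLPClassifier(hidden_layer_sizes=(20, 20), activation="logistic",
                    random_state=42).fit(X, y)

proba = clf.predict_proba(X)[:, 1]
# The spread of the predicted probabilities across all 500 samples is tiny:
# the model has effectively collapsed toward the base rate of y.
print(proba.min(), proba.max(), y.mean())
```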

Versions

System:
    python: 3.13.2 (tags/v3.13.2:4f8bb39, Feb  4 2025, 15:23:48) [MSC v.1942 64 bit (AMD64)]
executable: c:\Users\km\AppData\Local\Programs\Python\Python313\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.6.1
          pip: 25.0.1
   setuptools: None
        numpy: 2.2.3
        scipy: 1.15.2
       Cython: None
       pandas: 2.2.3
   matplotlib: 3.10.1
       joblib: 1.4.2
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libscipy_openblas
       filepath: C:\Users\km\AppData\Local\Programs\Python\Python313\Lib\site-packages\numpy.libs\libscipy_openblas64_-43e11ff0749b8cbe0a615c9cf6737e0e.dll
        version: 0.3.28
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: vcomp
       filepath: C:\Users\km\AppData\Local\Programs\Python\Python313\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libscipy_openblas
       filepath: C:\Users\km\AppData\Local\Programs\Python\Python313\Lib\site-packages\scipy.libs\libscipy_openblas-f07f5a5d207a3a47104dca54d6d0c86a.dll
        version: 0.3.28
threading_layer: pthreads
   architecture: Haswell
@KyleEMol KyleEMol added Bug Needs Triage Issue requires triage labels Apr 21, 2025
ogrisel (Member) commented Apr 25, 2025

It's quite likely that the parameter initialization scheme implemented in scikit-learn is not optimal for the logistic activation function, especially with only a few hidden units.

If you increase the hidden_layer_sizes, the problem somewhat goes away:

...
NN = MLPClassifier(hidden_layer_sizes=(200, 200), activation="logistic", random_state=42)
...
[0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 0]
[0 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 1 0]
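A sketch of that comparison on the full training set, fitting both widths with the "logistic" activation and comparing training accuracy (not part of the original report; the exact scores depend on the scikit-learn version):

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

np.random.seed(1)
X = np.random.random((500, 2))
y = np.array([int(x[0] + (2 * (x[1] - 0.5)) ** 2 - 0.75 >= 0) for x in X])

scores = {}
for width in [(20, 20), (200, 200)]:
    clf = MLPClassifier(hidden_layer_sizes=width, activation="logistic",
                        random_state=42).fit(X, y)
    scores[width] = clf.score(X, y)  # training-set accuracy

# The narrow network stays near the base rate; the wide one fits much better.
print(scores)
```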

and, of course, if you use a more modern activation function such as "relu", the model can fit the training set:

...
NN = MLPClassifier(hidden_layer_sizes=(200, 200), activation="relu", random_state=42, max_iter=1000)
...
[0 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 1 0]
[0 0 1 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 1 0]

I would not consider this a bug: it has been known since the 90s that the "logistic" activation function is harder to optimize than "tanh". Modern alternatives, like the "relu" activation function used by default in scikit-learn, empirically lead to better fits. We mostly keep "logistic" in scikit-learn for educational/historical purposes, as this was the original activation function used by the NN pioneers in the 80s (or even before that). Maybe we could make that more explicit in the docstring and/or the user guide.
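One way to see why "logistic" is harder to optimize: its derivative sigma'(z) = sigma(z) * (1 - sigma(z)) never exceeds 0.25, so backpropagated gradients shrink by at least a factor of 4 per layer near initialization, whereas tanh's derivative peaks at 1 and relu's is exactly 1 on its active side. A quick numerical check:

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 2001)  # symmetric grid that includes z = 0

d_logistic = logistic(z) * (1 - logistic(z))  # peaks at 0.25 (at z = 0)
d_tanh = 1 - np.tanh(z) ** 2                  # peaks at 1.0  (at z = 0)
d_relu = (z > 0).astype(float)                # exactly 1 for z > 0

print(d_logistic.max(), d_tanh.max(), d_relu.max())  # → 0.25 1.0 1.0
```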

@ogrisel ogrisel closed this as not planned Won't fix, can't repro, duplicate, stale Apr 25, 2025
@ogrisel ogrisel removed the Needs Triage Issue requires triage label Apr 25, 2025