
Different Python version causes a different distribution of classification result #31206

Closed
GloC99 opened this issue Apr 15, 2025 · 6 comments

@GloC99

GloC99 commented Apr 15, 2025

Describe the bug

Running the same code under Python 3.10 and Python 3.13 with n_jobs > 1 produces a variety of results, and Python 3.10 and Python 3.13 show different distributions of those results.

Steps/Code to Reproduce

import numpy as np
import random
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, recall_score, confusion_matrix

# Control the randomness
random.seed(0)  
np.random.seed(0)

iris = load_iris()  
x, y = iris.data, iris.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20, random_state=0)

# Define and create a model
model = RandomForestClassifier(
    n_estimators=np.int64(101),
    criterion='gini',
    max_depth=np.int64(31),
    min_samples_split=7.291122019556396e-304,
    min_samples_leaf=np.int64(14876671),
    min_weight_fraction_leaf=0.0,
    max_features=None,
    max_leaf_nodes=None,
    min_impurity_decrease=0.0,
    bootstrap=True,
    oob_score=False,
    n_jobs= np.int64(255),
    random_state=0,
    verbose=np.int64(0),
    warm_start=False,
    class_weight='balanced_subsample',
    ccp_alpha=0.0,
    max_samples=None)

model.fit(x_train, y_train)

# Evaluate model
y_pred = model.predict(x_test)
print("Accuracy: ", accuracy_score(y_test,
                                    y_pred))
print("Recall:",
    recall_score(y_test, y_pred, average='micro'))
# Print confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y_test, y_pred))

Expected Results

If n_jobs is 1, the result is:

    Accuracy:  0.43333333333333335
    Recall: 0.43333333333333335
    Confusion Matrix:
    [[ 0 11  0]
    [ 0 13  0]
    [ 0  6  0]]

Actual Results

When the program is run 10,000 times:
With n_jobs=255, Python 3.10 produces two distinct results:

    Group:
    Accuracy:  0.43333333333333335
    Recall: 0.43333333333333335
    Confusion Matrix:
    [[ 0 11  0]
    [ 0 13  0]
    [ 0  6  0]]
    Count: 9887
    
    Group:
    Accuracy:  0.36666666666666664
    Recall: 0.36666666666666664
    Confusion Matrix:
    [[11  0  0]
    [13  0  0]
    [ 6  0  0]]
    Count: 113

With n_jobs=255, Python 3.13 produces three distinct results:

    Group:
    Accuracy:  0.36666666666666664
    Recall: 0.36666666666666664
    Confusion Matrix:
    [[11  0  0]
    [13  0  0]
    [ 6  0  0]]
    Count: 7790
    
    Group:
    Accuracy:  0.43333333333333335
    Recall: 0.43333333333333335
    Confusion Matrix:
    [[ 0 11  0]
    [ 0 13  0]
    [ 0  6  0]]
    Count: 1965
    
    Group:
    Accuracy:  0.2
    Recall: 0.2
    Confusion Matrix:
    [[ 0  0 11]
    [ 0  0 13]
    [ 0  0  6]]
    Count: 245

Versions

System:
    python: 3.13.2 (main, Mar 27 2025, 14:05:19) [GCC 11.4.0]
executable: /opt/python/3.13.2/bin/python3.13
   machine: Linux-5.15.0-122-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.6.1
          pip: 24.3.1
   setuptools: 75.6.0
        numpy: 2.2.4
        scipy: 1.15.2
       Cython: None
       pandas: 2.2.3
   matplotlib: None
       joblib: 1.4.2
threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /users/GloC99/.local/lib/python3.13/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so
        version: 0.3.28
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /users/GloC99/.local/lib/python3.13/site-packages/scipy.libs/libscipy_openblas-68440149.so
        version: 0.3.28
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libgomp
       filepath: /users/GloC99/.local/lib/python3.13/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None

===============

System:
    python: 3.10.17 (main, Apr 10 2025, 12:04:30) [GCC 11.4.0]
executable: /opt/python/3.10.17/bin/python3.10
   machine: Linux-5.15.0-122-generic-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.6.1
          pip: 24.3.1
   setuptools: 75.6.0
        numpy: 2.2.0
        scipy: 1.14.1
       Cython: None
       pandas: 2.2.3
   matplotlib: None
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /users/GloC99/.local/lib/python3.10/site-packages/numpy.libs/libscipy_openblas64_-6bb31eeb.so
        version: 0.3.28
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libscipy_openblas
       filepath: /users/GloC99/.local/lib/python3.10/site-packages/scipy.libs/libscipy_openblas-c128ec02.so
        version: 0.3.27.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libgomp
       filepath: /users/GloC99/.local/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
GloC99 added the Bug and Needs Triage labels Apr 15, 2025
@GAVARA-PRABHAS-RAM

Hi, I'd like to work on this issue.
Can I go ahead and open a PR?

@GloC99
Author

GloC99 commented Apr 17, 2025

Hi, I'd like to work on this issue. Can I go ahead and open a PR?

If you need any more information, please let me know. I also have a script that runs the program many times and counts how often each distinct result occurs.
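
For reference, a sketch of what such a counting script could look like (the reproduce.py filename is an assumption; it stands for the reproducer in the issue description):

# Hypothetical counting script: run the reproducer many times and count how
# often each distinct output occurs.
import subprocess
from collections import Counter

counts = Counter()
for _ in range(10_000):
    result = subprocess.run(
        ["python", "reproduce.py"], capture_output=True, text=True
    )
    counts[result.stdout] += 1

for output, count in counts.most_common():
    print("Group:")
    print(output)
    print("Count:", count)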

@GAVARA-PRABHAS-RAM

I didn't understand what exactly I have to do?

@GloC99
Author

GloC99 commented Apr 23, 2025

I didn't understand what exactly I have to do?

It would be great if you could open a PR. If you link it here, I can have a look and see what I can contribute.

@ogrisel
Member

ogrisel commented Apr 24, 2025

I didn't understand what exactly I have to do?

@GAVARA-PRABHAS-RAM First, we need to analyze the root cause of the behavior reported by @GloC99 and assess whether this is a bug or not.

@GloC99 thanks for the report. I confirm I can reproduce locally with a fresh conda-forge based environment running Python 3.13 on macOS. But, strangely, I could not reproduce using my usual dev env running Python 3.12 and scikit-learn main built from source.

Some preliminary remarks:

  • fitting with n_jobs=255 on a machine with 16 cores is likely to be useless, or even detrimental, from a performance point of view.

  • setting min_samples_leaf=np.int64(14876671) when fitting on the 150 data points of the iris dataset means that the trees never perform any split, and each single-leaf tree consistently predicts the marginal class frequencies observed on its (bootstrapped) training set. This can be checked via:

>>> np.unique([e.tree_.node_count for e in model.estimators_])
array([1])

I will try to investigate a bit further to understand the source of the non-deterministic behavior now that I can reproduce.

ogrisel added the Needs Investigation label and removed the Needs Triage label Apr 24, 2025
ogrisel closed this as not planned Apr 24, 2025
@ogrisel
Member

ogrisel commented Apr 24, 2025

I think I understand. Because this model is fit with class_weight='balanced_subsample', all trees are fit with exactly balanced training sets: 1/3 weight for each of the 3 classes
(this is possible because, currently, bagging is implemented with sample_weight). You can confirm this by commenting out the line that sets class_weight='balanced_subsample': the outcome of the code becomes deterministic again.

This can also be confirmed by the fact that the class frequencies stored in each single leaf's value attribute are all 1/n_classes:

>>> np.allclose(
...     np.vstack([e.tree_.value.squeeze() for e in model.estimators_]),
...     np.full(shape=(model.n_estimators, 3), fill_value=1/3),
... )
True

So the individual trees return identically tied predict_proba values. But then, when n_jobs > 1, the forest aggregates the predict_proba returned by the trees in parallel using Python threads and accumulates the results in shared memory:

Parallel(n_jobs=n_jobs, verbose=self.verbose, require="sharedmem")(
    delayed(_accumulate_prediction)(e.predict_proba, X, all_proba, lock)
    for e in self.estimators_
)

which calls into:

def _accumulate_prediction(predict, X, out, lock):
    """
    This is a utility function for joblib's Parallel.

    It can't go locally in ForestClassifier or ForestRegressor, because joblib
    complains that it cannot pickle it when placed there.
    """
    prediction = predict(X, check_input=False)
    with lock:
        if len(out) == 1:
            out[0] += prediction
        else:
            for i in range(len(out)):
                out[i] += prediction[i]

Because floating point operations have rounding errors, the ordering of the operations matters; that ordering is not deterministic when n_jobs > 1 and depends on thread scheduling, hence the observed dependency on the Python version.

As a result, the predict function, which returns np.argmax(y_pred_proba, axis=1), is unstable: the exactly tied predictions are broken by rounding errors that are non-deterministic because of the use of threads when accumulating the predicted probabilities.
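
As a minimal standalone illustration (not scikit-learn code) of how summation order changes the floating point result and can flip an argmax between otherwise tied values:

import numpy as np

# The same three numbers summed in a different order round differently.
a = (0.1 + 0.2) + 0.3  # 0.6000000000000001
b = (0.3 + 0.2) + 0.1  # 0.6
print(a == b)  # False

# For exactly tied class probabilities, such tiny differences decide the argmax.
print(np.argmax(np.array([a, b])))  # 0
print(np.argmax(np.array([b, a])))  # 1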

I therefore think this is not a bug. If you want to get deterministic predictions, you can call model.set_params(n_jobs=1) (after fitting).
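
For example, reusing model and x_test from the reproducer above:

# Force sequential prediction to get a deterministic (though still arbitrary)
# tie-breaking of the exactly tied class probabilities.
model.set_params(n_jobs=1)
y_pred = model.predict(x_test)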

In the future, we could change the code to aggregate the parallel predictions in a deterministic order, without allocating too much memory for temporary prediction arrays when the forest has a very large number of trees, by using the return_as="generator" feature of joblib. However, this is quite a new feature of joblib (released a year ago), so I would rather not depend on it yet, to follow our minimum dependency version support guidelines.
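
For the record, a rough sketch of what such a deterministic aggregation could look like (a hypothetical helper, not the actual scikit-learn implementation, assuming a joblib version that supports return_as="generator"):

from joblib import Parallel, delayed

def predict_proba_deterministic(forest, X, n_jobs=2):
    # Results are yielded in submission order, so the accumulation order is
    # fixed regardless of which thread finishes first, while only a few
    # per-tree prediction arrays need to be held in memory at a time.
    results = Parallel(n_jobs=n_jobs, prefer="threads", return_as="generator")(
        delayed(tree.predict_proba)(X) for tree in forest.estimators_
    )
    total = None
    for proba in results:
        total = proba if total is None else total + proba
    return total / len(forest.estimators_)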

ogrisel removed the Needs Investigation label Apr 24, 2025