Trees fitted in 1.3.2 produce different outcome when evaluated in 1.4 #28229

mosalx · 2024-01-23T13:47:12Z

Describe the bug

We have a number of ensemble tree models in production fitted using 1.3.2 or older. Many of these models produce different outcomes when evaluated on the same data in sklearn 1.3.2 and 1.4.

Analysis led me to the change in DecisionTreeClassifier.predict_proba() from this PR: #27639

Based on this conversation I understand that in order to support monotonicity constraint, probabilities were allowed to be outside of [0, 1] bounds as see in this diff:

Tree splitting criterion was modified to ensure that probabilities are within [0, 1] bounds. This works only if the user fits the tree in 1.4 and evaluates it in the same version. I was not able to find an example where a tree fitted in 1.4 produces probabilities outside of [0,1]. However, trees fitted in older versions do violate this constraint.

Would you consider rolling back probability normalization for trees that don't have monotonicity constraint so this method produces correctly normalized probabilities on trees fitted in prior versions? I don't think scikit-learn makes any guarantees about compatibility of estimators created in older versions. However, it is quite disruptive for anyone who maintains older tree models essentially forcing people to re-fit them if they switch to 1.4.

Note: I have not provided code to reproduce because it requires 2 different versions of sklearn.

Steps/Code to Reproduce

N/A

Expected Results

probabilities add up to 1

Actual Results

Probabilities do not add up to 1

Versions

System:
    python: 3.11.7 | packaged by conda-forge | (main, Dec 23 2023, 14:27:59) [MSC v.1937 64 bit (AMD64)]
executable: C:\Users\smozharov\AppData\Local\miniconda3\envs\env311_3a\python.exe
   machine: Windows-10-10.0.22631-SP0

Python dependencies:
      sklearn: 1.4.0
          pip: 23.3.2
   setuptools: 69.0.3
        numpy: 1.26.3
        scipy: 1.12.0
       Cython: 3.0.7
       pandas: 2.2.0
   matplotlib: 3.8.2
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libomp
       filepath: C:\Users\smozharov\AppData\Local\miniconda3\envs\env311_3a\Library\bin\libomp.dll
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libblas
       filepath: C:\Users\smozharov\AppData\Local\miniconda3\envs\env311_3a\Library\bin\libblas.dll
        version: 0.3.25
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: vcomp
       filepath: C:\Users\smozharov\AppData\Local\miniconda3\envs\env311_3a\vcomp140.dll

The text was updated successfully, but these errors were encountered:

adrinjalali · 2024-01-23T13:57:55Z

That's expected. We only support loading models with the same scikit-learn version as they were persisted: https://scikit-learn.org/stable/model_persistence.html

adrinjalali · 2024-01-23T13:58:53Z

We can reopen the issue if you provide a minimal reproducible with the latest version.

mosalx · 2024-01-24T14:31:35Z

Thanks for the quick response! I looked deeper into the history of changes and understood the reasoning behind the change I suggested to revert. Cython Tree class now normalizes node values when the tree is fitted, so additional normalization in DecisionTreeClassifier is no longer necessary. I now think of this as a good change.
We were able to make trees fitted in previous versions compatible with sklearn 1.4 by normalizing tree values after deserializing them. I know it is not recommended or advised as described in the user guide: https://scikit-learn.org/stable/model_persistence.html#security-maintainability-limitations. However, in our circumstances keeping an existing model unchanged for as long as possible (by maintaining custom serializers and extensive testing) is more productive than re-fitting it.

mosalx added Bug Needs Triage Issue requires triage labels Jan 23, 2024

adrinjalali closed this as not planned Won't fix, can't repro, duplicate, stale Jan 23, 2024

ogrisel removed the Needs Triage Issue requires triage label Apr 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Trees fitted in 1.3.2 produce different outcome when evaluated in 1.4 #28229

Trees fitted in 1.3.2 produce different outcome when evaluated in 1.4 #28229

mosalx commented Jan 23, 2024

adrinjalali commented Jan 23, 2024

adrinjalali commented Jan 23, 2024

mosalx commented Jan 24, 2024

Trees fitted in 1.3.2 produce different outcome when evaluated in 1.4 #28229

Trees fitted in 1.3.2 produce different outcome when evaluated in 1.4 #28229

Comments

mosalx commented Jan 23, 2024

Describe the bug

Steps/Code to Reproduce

Expected Results

Actual Results

Versions

adrinjalali commented Jan 23, 2024

adrinjalali commented Jan 23, 2024

mosalx commented Jan 24, 2024