scikit-learn 1.6 changed behavior of growing trees #30554

Closed
@sebp

Describe the bug

While porting scikit-survival to support scikit-learn 1.6, I noticed that one test failed due to trees in a random forest having a different structure (see this GitHub Actions log).

Using git bisect, I could determine that #29458 is the culprit.

The PR imports log from libc.math:

from libc.math cimport isnan, log

Previously, log was imported from ._utils:

from ._utils cimport log

which actually implements log2:

cdef inline float64_t log(float64_t x) noexcept nogil:
    return ln(x) / ln(2.0)

Replacing

from libc.math cimport isnan, log

with

from libc.math cimport isnan, log2 as log

fixes the problem.
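
To illustrate why the import change matters, here is a minimal Python sketch of the two functions involved. The helper names `log2_via_ln` and `natural_log` are made up for this illustration and are not part of scikit-learn. The last loop shows a hypothetical integer quantity derived from the log value; where exactly the tree code uses `log` is not covered here, so treat that part as an assumption about how a change of base could alter discrete behavior.

```python
import math


def log2_via_ln(x: float) -> float:
    """Mirror of the old ._utils helper: ln(x) / ln(2), i.e. log base 2."""
    return math.log(x) / math.log(2.0)


def natural_log(x: float) -> float:
    """What `log` from the C math library computes: the natural logarithm."""
    return math.log(x)


# The two functions differ by the constant factor ln(2) for every input.
for n in (8, 100, 400):
    print(n, log2_via_ln(n), natural_log(n))

# Hypothetical illustration: any integer derived by truncating the log
# value changes when the base changes, which can flip discrete decisions.
for n in (8, 100, 400):
    print(n, int(log2_via_ln(n)), int(natural_log(n)))
```

Because the two results differ for every input other than 1, code that silently relied on the base-2 helper can behave differently once `log` resolves to the natural logarithm.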

Steps/Code to Reproduce

from collections import namedtuple
import numpy as np
from sksurv.datasets import load_whas500
from sksurv.column import standardize, categorical_to_numeric
from sksurv.tree import SurvivalTree

from sklearn.tree import export_graphviz

DataSetWithNames = namedtuple("DataSetWithNames", ["x", "y", "names", "x_data_frame"])


def _make_whas500(with_mean=True, with_std=True, to_numeric=False):
    x, y = load_whas500()
    if with_mean:
        x = standardize(x, with_std=with_std)
    if to_numeric:
        x = categorical_to_numeric(x)
    names = ["(Intercept)"] + x.columns.tolist()
    return DataSetWithNames(x=x.values, y=y, names=names, x_data_frame=x)


whas500 = _make_whas500(to_numeric=True)

rng = np.random.RandomState(42)
mask = rng.binomial(n=1, p=0.15, size=whas500.x.shape)
mask = mask.astype(bool)
X = whas500.x.copy()
X[mask] = np.nan

X_train = X[:400]
y_train = whas500.y[:400]
weights = np.array([
    4,5,1,1,2,1,1,1,2,1,0,1,0,1,4,2,0,0,1,0,1,0,1,2,1,1,1,1,1,0,0,0,1,1,3,1,2,1,2,1,0,3,1,0,0,3,0,1,4,1,0,0,2,1,0,1,0,
    1,0,2,1,0,1,1,4,4,2,1,2,2,4,2,1,1,2,1,0,1,0,1,0,0,1,1,1,1,1,1,1,1,1,0,1,3,0,3,0,1,1,1,3,0,1,2,2,3,0,0,1,1,2,0,0,2,
    0,0,1,0,0,1,2,1,2,0,1,1,0,0,0,2,1,1,2,1,0,1,0,1,1,0,2,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,2,0,0,0,0,0,0,0,2,3,0,0,2,0,
    0,1,2,0,1,0,1,2,0,0,0,0,1,2,0,0,2,0,0,0,1,0,2,2,0,1,0,1,4,1,0,0,3,0,1,1,0,1,1,0,2,0,2,1,4,0,1,0,1,0,2,1,3,1,0,0,2,
    1,1,0,2,1,2,2,0,2,0,0,1,1,1,3,0,2,2,0,0,1,3,0,2,0,0,1,1,4,1,0,0,1,1,2,1,1,1,2,0,2,1,1,1,2,0,0,0,1,0,0,2,0,0,0,0,0,
    0,3,1,2,0,3,1,4,1,2,0,0,1,1,2,2,1,1,3,1,1,1,1,1,0,0,0,0,2,2,1,2,0,2,1,2,0,2,0,1,0,0,1,1,1,1,1,3,1,2,0,2,2,2,1,3,1,
    0,0,0,0,0,1,0,2,2,1,1,2,0,0,0,2,2,1,0,1,0,2,0,1,0,2,2,0,3,2,2,1,0,3,0,0,2,2,0,1,0,2,1,1,0,0,2,1,1,0,0,2,1,0,0,2,2,3
], dtype=float)

t = SurvivalTree(
    low_memory=True,
    max_depth=3,
    max_features='sqrt',
    max_leaf_nodes=None,
    min_samples_leaf=3,
    min_samples_split=6,
    min_weight_fraction_leaf=0.0,
    random_state=1608637542,
    splitter='best',
)
t.fit(X_train, y_train, weights)

export_graphviz(
    t, "tree.dot", label="none", impurity=False
)

Expected Results

[Image: tree-1-5]

Actual Results

[Image: tree-1-6]

Versions

System:
    python: 3.13.0 (main, Oct  7 2024, 23:47:22) [Clang 18.1.8 ]
executable: /…/.venv/bin/python
   machine: macOS-15.2-arm64-arm-64bit-Mach-O

Python dependencies:
      sklearn: 1.6.0
          pip: None
   setuptools: 75.6.0
        numpy: 2.2.1
        scipy: 1.14.1
       Cython: 3.0.11
       pandas: 2.2.3
   matplotlib: None
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /opt/homebrew/Cellar/libomp/19.1.6/lib/libomp.dylib
        version: None
