Describe the bug
While porting scikit-survival to support scikit-learn 1.6, I noticed that one test failed due to trees in a random forest having a different structure (see this GitHub Actions log).
Using git bisect, I determined that #29458 is the culprit.
The PR imports log from libc.math.
Previously, log was imported from ._utils (scikit-learn/sklearn/tree/_splitter.pyx, line 10 in 215be2e), which actually implements log2 (scikit-learn/sklearn/tree/_utils.pyx, lines 65 to 66 in 215be2e).
Replacing
from libc.math cimport isnan, log
with
from libc.math cimport isnan, log2 as log
fixes the problem.
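To illustrate why the base of the logarithm matters, here is a minimal Python sketch comparing the natural logarithm with log2. The helper depth_limit and the expression 2 * int(log(n)) are assumptions made for illustration only; this report does not state how the splitter uses log.

```python
import math


def depth_limit(n, log_fn):
    """Hypothetical integer limit derived from a logarithm of n.

    The formula 2 * int(log(n)) is an assumption for illustration,
    not a statement about scikit-learn's actual code.
    """
    return 2 * int(log_fn(n))


for n in (8, 100, 400):
    natural = depth_limit(n, math.log)   # natural logarithm (libc.math log)
    base2 = depth_limit(n, math.log2)    # base-2 logarithm (the old behavior)
    print(f"n={n}: natural-log limit={natural}, log2 limit={base2}")
```

Because log2(n) equals ln(n) / ln(2), it is roughly 1.44 times larger than ln(n), so any integer quantity derived from it can change once the import is swapped.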
Steps/Code to Reproduce
from collections import namedtuple
import numpy as np
from sksurv.datasets import load_whas500
from sksurv.column import standardize, categorical_to_numeric
from sksurv.tree import SurvivalTree
from sklearn.tree import export_graphviz
DataSetWithNames = namedtuple("DataSetWithNames", ["x", "y", "names", "x_data_frame"])
def _make_whas500(with_mean=True, with_std=True, to_numeric=False):
    # Load the WHAS500 data, optionally standardize and encode categoricals.
    x, y = load_whas500()
    if with_mean:
        x = standardize(x, with_std=with_std)
    if to_numeric:
        x = categorical_to_numeric(x)
    names = ["(Intercept)"] + x.columns.tolist()
    return DataSetWithNames(x=x.values, y=y, names=names, x_data_frame=x)
whas500 = _make_whas500(to_numeric=True)
rng = np.random.RandomState(42)
mask = rng.binomial(n=1, p=0.15, size=whas500.x.shape)
mask = mask.astype(bool)
X = whas500.x.copy()
X[mask] = np.nan
X_train = X[:400]
y_train = whas500.y[:400]
weights = np.array([
4,5,1,1,2,1,1,1,2,1,0,1,0,1,4,2,0,0,1,0,1,0,1,2,1,1,1,1,1,0,0,0,1,1,3,1,2,1,2,1,0,3,1,0,0,3,0,1,4,1,0,0,2,1,0,1,0,
1,0,2,1,0,1,1,4,4,2,1,2,2,4,2,1,1,2,1,0,1,0,1,0,0,1,1,1,1,1,1,1,1,1,0,1,3,0,3,0,1,1,1,3,0,1,2,2,3,0,0,1,1,2,0,0,2,
0,0,1,0,0,1,2,1,2,0,1,1,0,0,0,2,1,1,2,1,0,1,0,1,1,0,2,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,1,2,0,0,0,0,0,0,0,2,3,0,0,2,0,
0,1,2,0,1,0,1,2,0,0,0,0,1,2,0,0,2,0,0,0,1,0,2,2,0,1,0,1,4,1,0,0,3,0,1,1,0,1,1,0,2,0,2,1,4,0,1,0,1,0,2,1,3,1,0,0,2,
1,1,0,2,1,2,2,0,2,0,0,1,1,1,3,0,2,2,0,0,1,3,0,2,0,0,1,1,4,1,0,0,1,1,2,1,1,1,2,0,2,1,1,1,2,0,0,0,1,0,0,2,0,0,0,0,0,
0,3,1,2,0,3,1,4,1,2,0,0,1,1,2,2,1,1,3,1,1,1,1,1,0,0,0,0,2,2,1,2,0,2,1,2,0,2,0,1,0,0,1,1,1,1,1,3,1,2,0,2,2,2,1,3,1,
0,0,0,0,0,1,0,2,2,1,1,2,0,0,0,2,2,1,0,1,0,2,0,1,0,2,2,0,3,2,2,1,0,3,0,0,2,2,0,1,0,2,1,1,0,0,2,1,1,0,0,2,1,0,0,2,2,3
], dtype=float)
t = SurvivalTree(
    low_memory=True,
    max_depth=3,
    max_features='sqrt',
    max_leaf_nodes=None,
    min_samples_leaf=3,
    min_samples_split=6,
    min_weight_fraction_leaf=0.0,
    random_state=1608637542,
    splitter='best',
)
t.fit(X_train, y_train, weights)
export_graphviz(
    t, "tree.dot", label="none", impurity=False
)
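One way to check whether two environments produce the same tree is to diff the exported DOT text. The following is a sketch using only the standard library; the function name diff_dot and the file labels expected.dot and actual.dot are hypothetical and not part of the original report.

```python
import difflib


def diff_dot(expected_text, actual_text):
    """Return unified-diff lines between two DOT exports (empty list if identical)."""
    return list(
        difflib.unified_diff(
            expected_text.splitlines(),
            actual_text.splitlines(),
            fromfile="expected.dot",  # hypothetical label for the older version's export
            tofile="actual.dot",      # hypothetical label for the newer version's export
            lineterm="",
        )
    )


# Tiny hand-written stand-ins for two exports that differ in one split.
old = 'digraph Tree {\n0 [label="x[3] <= 0.5"] ;\n}'
new = 'digraph Tree {\n0 [label="x[7] <= 1.2"] ;\n}'
for line in diff_dot(old, new):
    print(line)
```

In practice, the two strings would be read from the tree.dot files written by running the script above under each scikit-learn version.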
Expected Results
Actual Results
Versions
System:
    python: 3.13.0 (main, Oct 7 2024, 23:47:22) [Clang 18.1.8 ]
    executable: /…/.venv/bin/python
    machine: macOS-15.2-arm64-arm-64bit-Mach-O

Python dependencies:
    sklearn: 1.6.0
    pip: None
    setuptools: 75.6.0
    numpy: 2.2.1
    scipy: 1.14.1
    Cython: 3.0.11
    pandas: 2.2.3
    matplotlib: None
    joblib: 1.4.2
    threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
    user_api: openmp
    internal_api: openmp
    num_threads: 8
    prefix: libomp
    filepath: /opt/homebrew/Cellar/libomp/19.1.6/lib/libomp.dylib
    version: None