FIX Draw indices using sample_weight in Forest #31529

antoinebaker · 2025-06-12T08:19:04Z

Part of #16298. Similar to #31414 (Bagging estimators) but for Forest estimators.

What does this implement/fix? Explain your changes.

When subsampling is activated (bootstrap=True), sample_weight is now normalized and used as probabilities to draw the bootstrap indices. Forest estimators then pass the statistical repeated/weighted equivalence test.
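A minimal sketch of the drawing step, and of why it matches the repeated/weighted equivalence, assuming NumPy's legacy `RandomState` as used in the PR's diff (`random_instance.choice`). The helper name `weighted_bootstrap_indices` is hypothetical. With integer weights, drawing indices with probabilities proportional to sample_weight should give the same distribution over the original samples as a uniform bootstrap on a dataset where sample i is repeated sample_weight[i] times:

```python
import numpy as np


def weighted_bootstrap_indices(sample_weight, n_samples_bootstrap, random_state=None):
    """Hypothetical helper: draw bootstrap indices (with replacement) with
    probabilities proportional to sample_weight."""
    rng = np.random.RandomState(random_state)
    sample_weight = np.asarray(sample_weight, dtype=np.float64)
    p = sample_weight / sample_weight.sum()  # normalize weights to probabilities
    return rng.choice(len(sample_weight), n_samples_bootstrap, replace=True, p=p)


# Weighted dataset: 3 samples with integer weights 1, 2 and 3.
weights = np.array([1, 2, 3])
# Repeated dataset: sample i copied weights[i] times -> ids [0, 1, 1, 2, 2, 2].
repeated_ids = np.repeat(np.arange(len(weights)), weights)

n_draws = 60_000
weighted_idx = weighted_bootstrap_indices(weights, n_draws, random_state=0)

# Uniform bootstrap on the repeated dataset, mapped back to the original ids.
rng = np.random.RandomState(1)
repeated_idx = repeated_ids[rng.randint(0, len(repeated_ids), n_draws)]

freq_weighted = np.bincount(weighted_idx, minlength=len(weights)) / n_draws
freq_repeated = np.bincount(repeated_idx, minlength=len(weights)) / n_draws
# Both empirical frequencies should be close to weights / weights.sum() = [1/6, 1/3, 1/2].
print(freq_weighted, freq_repeated)
```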

Comments

This PR does not fix Forest estimators when bootstrap=False (no subsampling). In that case sample_weight is still passed to the decision trees, and Forest estimators fail the statistical repeated/weighted equivalence test because the individual trees also fail it (probably because of tied splits in decision trees, #23728).

TODO

choose how to generate indices in the sample_weight=None case
fix relative (float) max_samples, as done in #31414 (FIX Draw indices using sample_weight in Bagging)
docstrings
how to handle balanced_subsample option
changelog
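For the relative max_samples item, one natural reading (an assumption here; the actual change in #31414 may differ in details such as rounding) is that a float max_samples should be a fraction of the total weight rather than of n_samples, since the equivalent repeated dataset has sample_weight.sum() rows. A sketch with a hypothetical helper `n_samples_bootstrap_from_weights`:

```python
import numpy as np


def n_samples_bootstrap_from_weights(sample_weight, max_samples):
    """Hypothetical helper: number of bootstrap draws for a given max_samples.

    Assumption: a float max_samples is a fraction of sample_weight.sum(),
    i.e. of the number of rows of the equivalent repeated dataset.
    """
    if isinstance(max_samples, float):
        total_weight = float(np.sum(sample_weight))
        return max(round(max_samples * total_weight), 1)
    # An int max_samples is taken as an absolute number of draws.
    return max_samples


weights = np.array([1, 2, 3])
repeated = np.ones(weights.sum())  # 6 rows, each with unit weight

# The weighted and the repeated datasets ask for the same number of draws.
print(n_samples_bootstrap_from_weights(weights, 0.5))   # 3
print(n_samples_bootstrap_from_weights(repeated, 0.5))  # 3
```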

github-actions · 2025-06-12T08:19:52Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a55643b.

antoinebaker · 2025-06-12T08:53:13Z

sklearn/ensemble/_forest.py

+    if sample_weight is None:
+        sample_weight = np.ones(n_samples)
+    normalized_sample_weight = sample_weight / np.sum(sample_weight)
+    sample_indices = random_instance.choice(
+        n_samples, n_samples_bootstrap, replace=True, p=normalized_sample_weight
    )


I hesitate between two options for dealing with the sample_weight=None case.

1. Convert to all ones.

```python
if sample_weight is None:
    sample_weight = np.ones(n_samples)
normalized_sample_weight = sample_weight / np.sum(sample_weight)
sample_indices = random_instance.choice(
    n_samples, n_samples_bootstrap, replace=True, p=normalized_sample_weight
)
```

2. Use the old code path when sample_weight=None.

```python
if sample_weight is None:
    sample_indices = random_instance.randint(
        0, n_samples, n_samples_bootstrap, dtype=np.int32
    )
else:
    normalized_sample_weight = sample_weight / np.sum(sample_weight)
    sample_indices = random_instance.choice(
        n_samples,
        n_samples_bootstrap,
        replace=True,
        p=normalized_sample_weight,
    )
```

The benefit of option 2 is backward compatibility when sample_weight=None: this PR and main give exactly the same fit for a given random_state.

The benefit of option 1 is that sample_weight=None and sample_weight=np.ones(n_samples) give exactly the same fit for a given random_state.

With option 1, test_set_estimator_drop, test_rfe_features_importance and test_forest_classifier_oob fail even though these tests do not use sample_weight, presumably because replacing randint with choice changes the indices drawn for a given random_state.

With option 2, test_class_weights fails because it checks that sample_weight=None and an all-ones sample_weight give the same results.
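The trade-off between the two options can be reproduced outside the forest code. This sketch re-implements both code paths with a fixed seed; the function names `indices_option_1`/`indices_option_2` and the sizes are hypothetical. Option 1 draws identical indices for sample_weight=None and all-ones weights, while option 2 keeps, for sample_weight=None, the same `randint` draws as the old code path:

```python
import numpy as np

n_samples, n_samples_bootstrap, seed = 10, 10, 42


def indices_option_1(sample_weight):
    """Option 1: convert None to all ones, always draw with choice."""
    rng = np.random.RandomState(seed)
    if sample_weight is None:
        sample_weight = np.ones(n_samples)
    p = sample_weight / np.sum(sample_weight)
    return rng.choice(n_samples, n_samples_bootstrap, replace=True, p=p)


def indices_option_2(sample_weight):
    """Option 2: keep the old randint path when sample_weight is None."""
    rng = np.random.RandomState(seed)
    if sample_weight is None:
        return rng.randint(0, n_samples, n_samples_bootstrap, dtype=np.int32)
    p = sample_weight / np.sum(sample_weight)
    return rng.choice(n_samples, n_samples_bootstrap, replace=True, p=p)


ones = np.ones(n_samples)
# Option 1: None and all-ones weights take the same code path, so the draws match.
print(np.array_equal(indices_option_1(None), indices_option_1(ones)))  # True
# Option 2: None uses randint while all-ones uses choice, so the two
# consume the random stream differently and the draws generally differ.
print(indices_option_2(None), indices_option_2(ones))
```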

antoinebaker · 2025-06-12T08:56:40Z

sklearn/ensemble/_forest.py

+            # NOTE: "balanced_subsample" option is ignored, treated as "balanced"
+            class_weight = self.class_weight
+            if class_weight == "balanced_subsample":
+                class_weight = "balanced"
+            expanded_class_weight = compute_sample_weight(class_weight, y_original)


Here I choose to simply ignore the "balanced_subsample" option and treat it as the "balanced" case.
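To see what treating "balanced_subsample" as "balanced" means for the drawing step, here is a NumPy-only sketch. It re-implements the documented "balanced" formula n_samples / (n_classes * np.bincount(y)) instead of calling compute_sample_weight, and it assumes the expanded class weights are multiplied into sample_weight before normalization. With these weights, each class receives the same total probability mass, so the bootstrap draws are balanced across classes in expectation:

```python
import numpy as np


def balanced_sample_weight(y):
    """Per-sample weights n_samples / (n_classes * count[class]),
    re-implemented with NumPy for illustration."""
    classes, y_encoded = np.unique(y, return_inverse=True)
    class_counts = np.bincount(y_encoded)
    class_weight = len(y) / (len(classes) * class_counts)
    return class_weight[y_encoded]


y = np.array([0, 0, 0, 1])
expanded_class_weight = balanced_sample_weight(y)  # approx. [2/3, 2/3, 2/3, 2]

# Multiply into the user-provided sample_weight (all ones here), then normalize.
sample_weight = np.ones(len(y)) * expanded_class_weight
p = sample_weight / sample_weight.sum()

# Total drawing probability per class: approx. [0.5, 0.5].
mass_per_class = np.bincount(y, weights=p)
print(expanded_class_weight, mass_per_class)
```

Unlike "balanced_subsample", which recomputes the class weights on each tree's bootstrap sample, these weights are computed once on the full y.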

antoinebaker · 2025-06-12T09:14:29Z

The forest estimators now pass the statistical repeated/weighted equivalence test, for example

use sample_weight in choice (9458a1c)

github-actions bot added the module:ensemble label Jun 12, 2025


antoinebaker added 2 commits June 13, 2025 17:40

use old code path (2f30d7d)

relative max_samples (a55643b)
