BUG Fixes sample weights when there are missing values in DecisionTrees #26376

thomasjpfan · 2023-05-15T20:33:21Z

Reference Issues/PRs

Follow up to #23595

What does this implement/fix? Explain your changes.

On main, weighted_n_missing was computed incorrectly; this PR fixes it. For reference, the computation is exactly the same as the one for sum_total:

scikit-learn/sklearn/tree/_criterion.pyx

Lines 818 to 824 in 6be774b

    
    for k in range(self.n_outputs):
        y_ik = self.y[i, k]
        w_y_ik = w * y_ik
        self.sum_total[k] += w_y_ik
        self.sq_sum_total += w_y_ik * y_ik

    self.weighted_n_node_samples += w
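To make the comparison concrete, here is a minimal pure-Python sketch of the same kind of accumulation, restricted to samples whose feature value is missing. It is not the Cython code from this PR: the helper missing_stats, its arguments, and the name sum_missing are made up for illustration, and only the single-output case is covered. The point is that each missing sample should contribute its weight w, not 1, to weighted_n_missing.

```python
import math


def missing_stats(feature, y, sample_weight=None):
    """Accumulate weighted statistics over samples whose feature is NaN.

    Hypothetical helper for illustration; single-output case only.
    """
    sum_missing = 0.0
    weighted_n_missing = 0.0
    for i, x_i in enumerate(feature):
        if not math.isnan(x_i):
            continue  # only samples with a missing feature value count here
        w = 1.0 if sample_weight is None else sample_weight[i]
        sum_missing += w * y[i]
        weighted_n_missing += w  # add the weight, not 1
    return sum_missing, weighted_n_missing


feature = [0.5, float("nan"), 1.5, float("nan")]
y = [1.0, 2.0, 3.0, 4.0]
weights = [1.0, 0.5, 1.0, 2.0]

print(missing_stats(feature, y, weights))  # (9.0, 2.5)
```

With uniform weights the weighted count equals the plain count of missing samples, which is why a bug of this kind would only show up once non-uniform weights are used.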

jjerphan · 2023-05-15T21:02:42Z

sklearn/tree/tests/test_tree.py

@@ -2549,7 +2549,8 @@ def test_missing_values_poisson():
        (datasets.make_classification, DecisionTreeClassifier),
    ],
 )
-def test_missing_values_is_resilience(make_data, Tree):
+@pytest.mark.parametrize("sample_weight_train", [None, "ones"])


Should we also test the behavior when using non-uniform weights?

glemaitre · 2023-05-16

Probably. An easier test than non-uniform weights is to assign zero weight to some specific samples.

glemaitre · 2023-05-16

It might be worth having a separate test for checking this equivalence.
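The equivalence in question is presumably that a sample with zero weight should influence the fit no more than a sample that has been removed. Below is a minimal sketch of that idea at the level of per-node accumulators. The helper weighted_node_stats is hypothetical, and this is not the test that was added in the PR.

```python
def weighted_node_stats(y, sample_weight):
    """Return (weighted sample count, weighted sum of y, weighted sum of y**2)."""
    n = s = sq = 0.0
    for y_i, w in zip(y, sample_weight):
        n += w
        s += w * y_i
        sq += w * y_i * y_i
    return n, s, sq


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Give zero weight to every other sample, starting with the first one.
weights = [0.0 if i % 2 == 0 else 1.0 for i in range(len(y))]

with_zero_weights = weighted_node_stats(y, weights)
samples_removed = weighted_node_stats(y[1::2], [1.0] * len(y[1::2]))

# Zero-weighted samples contribute nothing to any accumulator.
assert with_zero_weights == samples_removed
```

A test on fitted trees would follow the same pattern: fit one tree with the zero weights, fit another on the reduced data, and compare their predictions.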

jjerphan

LGTM. Thank you, @thomasjpfan.

ogrisel

LGTM!

ogrisel · 2023-05-16T16:34:04Z

sklearn/tree/tests/test_tree.py

+    X, y = make_data(n_samples=n_samples, n_features=n_features, random_state=rng)
+
+    # Create dataset with missing values
+    X[rng.choice([False, True], size=X.shape, p=[0.9, 0.1])] = np.nan


neat idiom :)
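For readers unfamiliar with the idiom, here is a small standalone sketch of what the line above does. The array shape and the use of rng.normal to build X are assumptions for illustration; the test in the PR builds X with make_data instead.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 10))

# Draw one boolean per entry of X: True with probability 0.1.
mask = rng.choice([False, True], size=X.shape, p=[0.9, 0.1])

# Boolean indexing assigns NaN to exactly the entries where mask is True.
X[mask] = np.nan

# Roughly 10% of the entries should now be missing.
print(mask.dtype, np.isnan(X).mean())
```

Because the mask has the same shape as X, missing values are scattered independently over all entries rather than over whole rows or columns.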

BUG Fixes sample weights when there are missing values in Regression …

ef48585

…Trees

thomasjpfan added this to the 1.3 milestone May 15, 2023

thomasjpfan changed the title ~~BUG Fixes sample weights when there are missing values in Regression Trees~~ BUG Fixes sample weights when there are missing values in DecisionTrees May 15, 2023

github-actions bot added cython module:tree labels May 15, 2023

thomasjpfan added 2 commits May 15, 2023 16:33

DOC Adds PR number

b385c45

REV Less diffs

c0365a7

thomasjpfan mentioned this pull request May 15, 2023

ENH Adds missing value support for trees #23595

Merged

jjerphan reviewed May 15, 2023

thomasjpfan added 2 commits May 16, 2023 06:52

TST Adds test for non-uniform sample weights

32f0030

TST Stronger test

d595565

jjerphan reviewed May 16, 2023

jjerphan approved these changes May 16, 2023

thomasjpfan added 2 commits May 16, 2023 10:45

TST Speed up test

3b4dbc1

DOC Adjust comments

a32d7bb

ogrisel approved these changes May 16, 2023

ogrisel merged commit 43cf7d4 into scikit-learn:main May 16, 2023

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

BUG Fixes sample weights when there are missing values in DecisionTre…

8778929

…es (scikit-learn#26376)
