Stratified Split for Regression #30033

Pablitosalinero · 2024-10-08T15:24:57Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This PR introduces a new implementation of the train_test_split function with the following improvements:
- Stratified splitting: Keeps the class distribution of the full dataset in both the training and test sets for classification tasks.
- Balanced regression splitting: Balances continuous target variables across the two sets by binning their values into intervals.
- Flexible control: Test and training sizes can be set explicitly, along with the random state and whether to shuffle the data.
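
To make the binning idea concrete, here is a minimal sketch that uses only the existing scikit-learn API, not the code added in this PR. The helper name `binned_train_test_split`, its `n_bins` parameter, and the choice of quantile bins are assumptions made for illustration; the PR's actual binning strategy may differ.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def binned_train_test_split(X, y, n_bins=5, test_size=0.25, random_state=None):
    """Hypothetical helper: stratify a regression split on quantile bins of y."""
    y = np.asarray(y)
    # Interior quantile edges give bins holding roughly equal numbers of samples.
    edges = np.quantile(y, np.linspace(0, 1, n_bins + 1)[1:-1])
    # Each sample gets the index of the interval its target value falls into.
    bin_ids = np.digitize(y, edges)
    # The bin indices act as pseudo-classes for the existing stratify option.
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=bin_ids
    )


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.exponential(size=200)  # skewed continuous target

X_train, X_test, y_train, y_test = binned_train_test_split(
    X, y, n_bins=5, test_size=0.25, random_state=0
)

# Count how many test samples land in each bin of the full target distribution.
edges = np.quantile(y, np.linspace(0, 1, 6)[1:-1])
test_bin_counts = np.bincount(np.digitize(y_test, edges), minlength=5)
```

With a skewed target like this one, a purely random split can leave the sparse upper tail under-represented in the test set; stratifying on bins keeps every interval present in both parts.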

Additionally, unit tests were added to cover various edge cases, including:
- Stratified splitting for classification tasks.
- Balanced splits for regression targets.
- Custom train/test sizes.
- Shuffle and random state behavior.
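
For readers who want to see what checks of this kind look like, the sketch below exercises the existing `train_test_split` behavior for custom sizes, `random_state`, and `shuffle`. It is an illustration under that assumption, not a copy of the tests added in this PR.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)
y = np.arange(20)

# Custom sizes: integer values are absolute sample counts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=12, test_size=5, random_state=0
)

# Same random_state: two calls should produce identical splits.
first = train_test_split(X, y, test_size=0.25, random_state=42)
second = train_test_split(X, y, test_size=0.25, random_state=42)
reproducible = all(np.array_equal(a, b) for a, b in zip(first, second))

# shuffle=False: the original order is kept and the test set is the tail.
_, _, y_head, y_tail = train_test_split(X, y, test_size=0.25, shuffle=False)
```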

Any other comments?

This enhancement provides a more flexible and robust dataset splitting utility for users working with both classification and regression problems, addressing the needs discussed in issue #30009.

github-actions · 2024-10-08T15:26:19Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 96fc167. Link to the linter CI: here


Pablitosalinero added 4 commits October 7, 2024 18:08

[Feat] Include StratifiedShuffleSplitRegression

52d4a1e

[Feat] Change changelog and split init

49dceb6

[Feat] Include StratifiedShuffleSplitRegression into train_test_split

47fddf1

Update changelog

a9a04df

github-actions bot added the module:model_selection label Oct 8, 2024

Pablitosalinero added 6 commits October 8, 2024 15:35

[Feat] Include StratifiedShuffleSplitRegression

5371798

[Feat] Change changelog and split init

c8f5550

[Feat] Include StratifiedShuffleSplitRegression into train_test_split

8c5f5b0

Update changelog

3c4a95b

Merge main

cf7ada8

Fix cv_params

e7b9244

Pablitosalinero mentioned this pull request Oct 8, 2024

Add balance_regression option to train_test_split for regression problems #30009

Closed

Pablitosalinero added 14 commits October 8, 2024 16:35

Fix docstrings

5479c53

Fix docstrings

68d29a3

Update documentation

14a7d58

Include grid search test

031ffe7

Include halvin search test

f488887

Include split tests

fe3d507

Fix stratified shuffle split iter test

bb08670

[Feat] Include tests

8cf4259

:Merge branch 'main' of https://github.com/scikit-learn/scikit-learn …

1999628

…into feat/stratified-split-for-regression

Fix doctests

e0556f5

Fix doctests

b0049d3

Merge branch 'main' of https://github.com/scikit-learn/scikit-learn i…

b0dc958

…nto feat/stratified-split-for-regression

Fix doctest

9b6bdfd

Merge branch 'main' into feat/stratified-split-for-regression

96fc167
