
Stratified Split for Regression #30033


Open: wants to merge 24 commits into main


Pablitosalinero

Reference Issues/PRs

Fixes #30009.

What does this implement/fix? Explain your changes.

  • This PR introduces a new implementation of the train_test_split function that extends it with the following improvements:
    • Stratified splitting: Ensures that both the training and test sets maintain the class distribution in classification tasks.
    • Balanced regression splitting: Balances the distribution of a continuous target variable across the training and test sets by binning the target values into intervals.
    • Flexible control: Allows precise control over the test and training sizes, the random state, and whether the data is shuffled.
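One common way to realize the binning idea is to discretize the continuous target into quantile bins and pass the bin labels to the existing stratify parameter of scikit-learn's train_test_split. The sketch below assumes quantile-based bins and uses a hypothetical helper, balanced_regression_split, with an n_bins argument; neither the helper nor its parameters are taken from this PR's code.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def balanced_regression_split(X, y, n_bins=5, test_size=0.25, random_state=None):
    """Hypothetical helper: stratify a regression split on quantile bins of y."""
    y = np.asarray(y)
    # Quantile edges give bins with roughly equal counts; np.unique drops
    # duplicate edges that appear when y has many tied values.
    edges = np.unique(np.quantile(y, np.linspace(0, 1, n_bins + 1)))
    # Use only the interior edges so every value maps to a bin in 0..len(edges)-2.
    bins = np.digitize(y, edges[1:-1])
    # Stratifying on the bin labels keeps each target range proportionally
    # represented in both the training and test sets.
    return train_test_split(
        X, y, test_size=test_size, random_state=random_state, stratify=bins
    )


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = balanced_regression_split(
    X, y, n_bins=5, test_size=0.25, random_state=42
)
print(X_train.shape, X_test.shape)  # (150, 3) (50, 3)
```

Quantile bins are one design choice among several (equal-width bins are another). Whatever scheme is used, each bin must contain at least two samples, otherwise train_test_split raises a ValueError when stratifying.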

Additionally, unit tests were added to cover various edge cases, including:

  • Stratified splitting for classification tasks.
  • Balanced splits for regression targets.
  • Custom train/test sizes.
  • Shuffle and random state behavior.
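Tests along these lines could check the classification-stratification, random-state, and shuffle cases. This is a sketch written against scikit-learn's existing train_test_split API; the test names are hypothetical and are not the tests included in this PR.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def test_stratified_split_preserves_class_proportions():
    # 80 samples of class 0 and 20 of class 1 (an 80/20 imbalance).
    y = np.array([0] * 80 + [1] * 20)
    X = np.arange(100).reshape(-1, 1)
    _, _, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=0
    )
    # Both splits should keep the 20% minority share.
    assert np.isclose(y_train.mean(), 0.2)
    assert np.isclose(y_test.mean(), 0.2)


def test_random_state_is_reproducible():
    X = np.arange(50).reshape(-1, 1)
    y = np.arange(50)
    first = train_test_split(X, y, test_size=0.2, random_state=123)
    second = train_test_split(X, y, test_size=0.2, random_state=123)
    # The same seed must yield identical train/test arrays.
    for a, b in zip(first, second):
        assert np.array_equal(a, b)


def test_shuffle_false_keeps_order():
    X = np.arange(10).reshape(-1, 1)
    y = np.arange(10)
    # An integer test_size is an absolute count of test samples.
    X_train, X_test, _, _ = train_test_split(X, y, test_size=3, shuffle=False)
    # Without shuffling, the last 3 rows become the test set.
    assert X_train.ravel().tolist() == list(range(7))
    assert X_test.ravel().tolist() == [7, 8, 9]
```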

Any other comments?

This enhancement provides a more flexible and robust dataset splitting utility for users working with both classification and regression problems, addressing the needs discussed in issue #30009.


github-actions bot commented Oct 8, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 96fc167.

Development

Successfully merging this pull request may close these issues.

Add balance_regression option to train_test_split for regression problems