
ENH: Ability to name columns/index levels when using .str.split(..., expand=True) on Index/Series #61515


Open
2 of 3 tasks
nachomaiz opened this issue May 29, 2025 · 2 comments
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@nachomaiz

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

When using .str.split(..., expand=True):

  • On a Series, the columns of the resulting DataFrame are labeled 0, 1, 2, ... by default
  • On an Index, the levels of the resulting MultiIndex have no names

It would be great if we could specify the names that the new columns or levels will take once the split is performed.

Feature Description

I think it would be helpful if the method had a names parameter that would, at a minimum, accept a sequence of labels for the newly created columns/levels, similar to how a MultiIndex is initialized.

It could work like so:

>>> index = pd.Index(["a_b"])
>>> index.str.split("_", expand=True, names=["A", "B"])
MultiIndex([('a', 'b')], names=["A", "B"], length=1)
>>> series = pd.Series(["a_b"])
>>> series.str.split("_", expand=True, names=["A", "B"])
   A  B
0  a  b

The length of the names sequence should match the number of expanded columns/levels; otherwise, a ValueError should be raised.
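Until such a parameter exists, the proposed behavior can be approximated with existing methods. The minimal sketch below uses a hypothetical helper, split_with_names (not a pandas API), which calls str.split(..., expand=True) and then labels the result: set_names for the MultiIndex returned when splitting an Index, and set_axis for the DataFrame returned when splitting a Series. It raises ValueError on a length mismatch, as proposed above.

```python
import pandas as pd


def split_with_names(obj, pat, names):
    """Hypothetical stand-in for the proposed ``str.split(..., names=...)``.

    Splits a Series or Index of strings on ``pat`` with ``expand=True`` and
    labels the resulting columns (Series) or levels (Index) with ``names``.
    """
    names = list(names)
    result = obj.str.split(pat, expand=True)

    if isinstance(result, pd.Index):
        # Index input: usually a MultiIndex (a plain Index if every value
        # splits into a single piece), so count its levels.
        width = result.nlevels
    else:
        # Series input: a DataFrame, so count its columns.
        width = result.shape[1]

    if len(names) != width:
        raise ValueError(
            f"names has {len(names)} entries, but the split produced {width}"
        )

    if isinstance(result, pd.Index):
        return result.set_names(names)
    return result.set_axis(names, axis=1)


index = pd.Index(["a_b"])
print(split_with_names(index, "_", ["A", "B"]))

series = pd.Series(["a_b"])
print(split_with_names(series, "_", ["A", "B"]))
```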

Alternative Solutions

For an Index, renaming after the split already works almost exactly the same way:

>>> index.str.split("_", expand=True).rename(["A", "B"])

So I think the proposal is less impactful for Index.
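A short illustration of the Index workaround, using made-up two-element data: MultiIndex.rename (an alias of set_names) takes one name per level, and it already raises ValueError when the number of names does not match the number of levels.

```python
import pandas as pd

index = pd.Index(["a_b", "c_d"])
split = index.str.split("_", expand=True)  # MultiIndex with unnamed levels

# One name per level labels the levels without touching the values.
named = split.rename(["A", "B"])
print(named)

# A list whose length differs from the number of levels is rejected.
try:
    split.rename(["A"])
except ValueError as exc:
    print("ValueError:", exc)
```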

For Series, however, this is more cumbersome: the renaming has to be specified through a dictionary, which feels inconsistent with the simpler Index renaming and with MultiIndex construction:

>>> series.str.split("_", expand=True).rename(columns={0: "A", 1: "B"})

So my proposal would provide a consistent interface for the str accessor's split method across Series and Index.
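With the current API, one way to avoid writing the dictionary by hand is to build it from a plain list of names. This is a sketch, not part of the proposal: because the expanded columns are labeled 0, 1, ..., dict(enumerate(names)) produces the positional mapping that rename expects.

```python
import pandas as pd

series = pd.Series(["a_b", "c_d"])
names = ["A", "B"]

# str.split(expand=True) labels the new columns 0, 1, ..., so enumerate()
# builds the {position: name} mapping instead of writing it out by hand.
mapping = dict(enumerate(names))  # {0: "A", 1: "B"}
df = series.str.split("_", expand=True).rename(columns=mapping)
print(df)
```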

Additional Context

No response

@nachomaiz added the Enhancement and Needs Triage labels May 29, 2025
@yuanx749
Contributor

What about series.str.split("_", expand=True).set_axis(["A", "B"], axis=1)?
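One practical difference between the two workarounds is how they handle a wrong number of labels, sketched below with illustrative data whose rows split into different numbers of pieces. set_axis requires exactly one label per column and raises ValueError otherwise, which matches the length check proposed in the issue, while rename with a mapping silently skips keys it cannot match.

```python
import pandas as pd

series = pd.Series(["a_b_c", "d_e"])
# Rows split into different numbers of pieces; the widest row sets the
# column count (3 here), and shorter rows are padded with missing values.
expanded = series.str.split("_", expand=True)

# set_axis requires exactly one label per column.
labeled = expanded.set_axis(["A", "B", "C"], axis=1)

# A wrong number of labels raises ValueError ...
try:
    expanded.set_axis(["A", "B"], axis=1)
except ValueError as exc:
    print("set_axis:", exc)

# ... whereas rename with a mapping silently skips unmatched keys
# (errors="ignore" is the default), leaving column 2 unnamed.
partial = expanded.rename(columns={0: "A", 1: "B"})
print(list(partial.columns))
```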

@nachomaiz
Author

Ah, good point! Yes, that would work as well and is less cumbersome than the rename method.

I still think it would be worth having this functionality in str.split, but I agree the gap isn't as large as I thought.

So please consider it, or feel free to close the issue if it's not worth pursuing. 😊

Thanks as always!

Projects
None yet
Development

No branches or pull requests

2 participants