ENH: Add `force_suffixes` boolean argument to `pd.merge` #61498
Hey @mroeschke, can you please take a look and tell me whether the direction is right for you (i.e. you are OK with an additional argument) before I fix the failing tests and linting errors and adjust the documentation? Thanks in advance!
Thanks for opening the PR, but I would say this feature needs more discussion and agreement from the core team before moving forward with a PR
`pd.merge(df1.add_suffix("_left"), df2.add_suffix("_right"))` Let me know if I'm missing something, but while it seems some people would appreciate that, it doesn't seem any core dev is excited about this, and I understand why, since it makes a tricky method even trickier. @TomAugspurger, in the issue it seems you were a bit more positive than others about adding this when it was discussed some time ago. Would you move on with this PR? Otherwise let's close it.
Thanks for your feedback! Let me motivate the additional flag approach. I agree with you, using [...] Using [...] However, I also agree about the additional complexity of the upcoming implementation. The internal logic of renaming and returning columns is already quite complex, which makes it hard to grasp, maintain and test. I will wait for your decision.
Thanks for the clarification. I didn't realize the suffix would be added to the columns to join on, and that it'd make things more complex than just adding the [...] @pandas-dev/pandas-core, any opinion on adding a flag to [...]
I think this is a nice idea. If we default it to [...]
This is a situation I've run into occasionally. It's a few lines of user code, and yes, you need to track what you're joining on. I don't think it's unreasonable for the onus to be on users here, but no objection to adding a flag.
Motivation
Often, when working with wide (i.e. many-column) dataframes during exploratory analysis, merging them leads to an even wider dataframe. Currently, the `suffixes` mechanism is only applied to equally named columns from both dataframes. However, developers often alter the column names beforehand, or use solutions similar to the one suggested here.
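To illustrate the current behaviour and the rename-beforehand workaround, here is a small sketch. The frames `left` and `right` and their column names are made up for illustration; only existing pandas calls (`pd.merge`, `DataFrame.add_suffix`) are used.

```python
import pandas as pd

# Illustrative frames (not from the PR): "value" overlaps, "a" and "b" do not.
left = pd.DataFrame({"key": [1, 2], "value": [10, 20], "a": [0.1, 0.2]})
right = pd.DataFrame({"key": [1, 2], "value": [30, 40], "b": [0.3, 0.4]})

# Current behaviour: only the overlapping non-key column "value" is suffixed.
default = pd.merge(left, right, on="key", suffixes=("_left", "_right"))
default_cols = list(default.columns)
# ['key', 'value_left', 'a', 'value_right', 'b']

# Workaround: suffix every column first. The join keys are renamed as well,
# so they have to be passed explicitly via left_on / right_on.
renamed = pd.merge(
    left.add_suffix("_left"),
    right.add_suffix("_right"),
    left_on="key_left",
    right_on="key_right",
)
renamed_cols = list(renamed.columns)
# ['key_left', 'value_left', 'a_left', 'key_right', 'value_right', 'b_right']
```

Note that with the workaround the user has to keep track of the renamed key columns, and both key columns end up in the result.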
Changes
This PR adds a `force_suffixes` boolean argument to `pd.merge`, which applies the suffixes to all columns, no matter whether they are equally named or not. The goal is to have the following:
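The intended effect can be approximated with today's API. The helper below, `merge_force_suffixes`, is hypothetical and is not the PR's implementation. It assumes the join keys are passed via `on` and are left unsuffixed, which may differ from how the proposed flag treats key columns.

```python
import pandas as pd


def merge_force_suffixes(left, right, on, suffixes=("_x", "_y"), **kwargs):
    """Hypothetical helper: suffix every non-key column, then merge.

    Assumption: key columns named in `on` keep their original names.
    """
    keys = [on] if isinstance(on, str) else list(on)
    left_suffix, right_suffix = suffixes
    # Rename all non-key columns on each side before merging.
    left_renamed = left.rename(
        columns={c: f"{c}{left_suffix}" for c in left.columns if c not in keys}
    )
    right_renamed = right.rename(
        columns={c: f"{c}{right_suffix}" for c in right.columns if c not in keys}
    )
    return pd.merge(left_renamed, right_renamed, on=keys, **kwargs)


# Illustrative data (not from the PR).
left = pd.DataFrame({"key": [1, 2], "value": [10, 20], "a": [0.1, 0.2]})
right = pd.DataFrame({"key": [1, 2], "value": [30, 40], "b": [0.3, 0.4]})

merged = merge_force_suffixes(left, right, on="key", suffixes=("_left", "_right"))
merged_cols = list(merged.columns)
# ['key', 'value_left', 'a_left', 'value_right', 'b_right']
```

Compared with calling `add_suffix` on both frames, this keeps a single key column in the result, at the cost of the caller having to name the keys explicitly.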
`doc/source/whatsnew/vX.X.X.rst` file if fixing a bug or adding a new feature.