-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
bpo-34861 Make cProfile default output more useful #9655
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
ce14a9f
to
4bce0c3
Compare
Excuse the ping here, but what can I do to move this forward? This is a very simple change that make a HUGE difference in usability, and it has been reviewed already. How can this be merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @boxed and thanks for the PR. I would suggest splitting this PR into the one changing the default and the one changing the path because I have some doubts regarding the path approach but I don't want to block on the default, as that is indeed a good improvement.
Misc/NEWS.d/next/Library/2018-10-01-13-20-00.bpo-34861.8JanqQ.rst
Outdated
Show resolved
Hide resolved
Lib/pstats.py
Outdated
@@ -492,7 +492,10 @@ def compare (self, left, right): | |||
|
|||
def func_strip_path(func_name): | |||
filename, line, name = func_name | |||
return os.path.basename(filename), line, name | |||
path, base_name = os.path.split(filename) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although I agree that this happens substantially more with __init__
files, I don't think this is an elegant solution as there may be many files with the same file name in different directories or packages. I think the correct solution will be incorporating a way for the user to get the full path (or how many levels of the path maybe). Notice as well that the docs refer to this field as filename
. Including the folder instead in some cases is technically backwards incompatible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is an elegant solution as there may be many files with the same file name in different directories or packages
Absolutely. The correct solution is to cut the paths until they are all unique but also the shortest possible. But this change is going to do that almost always at a much smaller implementation and maintenance cost.
I think the correct solution will be incorporating a way for the user to get the full path (or how many levels of the path maybe).
I disagree. The full path is way too long in almost all cases, which is the reason the default is to drop all of the path all of the time. That just happens to be slightly too much to cover even the 80-20 rule. This change will get us well within 80% with a super small and understandable change to the behavior.
Notice as well that the docs refer to this field as filename. Including the folder instead in some cases is technically backwards incompatible.
It would surprise me greatly if "filename" is never used for the full filename with path anywhere in the docs :) And yes, this is technically incompatible. The change to the sort order is also incompatible if someone has built tools to parse this output and depend on the alphabetical output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would surprise me greatly if "filename" is never used for the full filename with path anywhere in the docs :) And yes, this is technically incompatible. The change to the sort order is also incompatible if someone has built tools to parse this output and depend on the alphabetical output.
I understand you points, but we care greatly about backwards compatibility and IMHO this does not justify breaking it. Take into account that pstats
can be used as a library and therefore people may be relying programmatically upon (without parsing any output) that filename
is what the name suggests: a filename. I am referring uniquely to the change in the filename, the change in the default on the command line is fine.
I disagree. The full path is way too long in almost all cases, which is the reason the default is to drop all of the path all of the time. That just happens to be slightly too much to cover even the 80-20 rule. This change will get us well within 80% with a super small and understandable change to the behavior.
I understand what you say, but I still think that making special cases for files that start with double underscores makes the working of the tool inconsistent to work with. This is the standard library and therefore we need to target correctness, not something that works with 80% of the cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take into account that pstats can be used as a library and therefore people may be relying programmatically upon (without parsing any output) that filename is what the name suggests: a filename. I am referring uniquely to the change in the filename, the change in the default on the command line is fine.
Oh, sorry, I misremembered my change. Yes, I see it clearly now. My change should have been to introduce another func_strip_path() that is used by the command line tool by default.
I understand what you say, but I still think that making special cases for files that start with double underscores makes the working of the tool inconsistent to work with.
Inconsistent can be good, if consistent is not usable though, which I think is the current situation in many cases.
This is the standard library and therefore we need to target correctness, not something that works with 80% of the cases.
I don't agree, but I do see your point. If I were to take your position I think the correct way to do this would be to introduce a func_shortest_unique_path
function that is called by the command line by default that does the full job to make filenames minimally unique.
If you agree with this, I can get started right away, it doesn't seem like it would be too hard, it just unfortunately requires the function to have access to all paths instead of just the one at a time. A mutation tested test suite for such a function seems pretty straight forward to produce too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I were to take your position I think the correct way to do this would be to introduce a
func_shortest_unique_path
function that is called by the command line by default that does the full job to make filenames minimally unique.
If this were something that only affects the command line output, I would be ok with such change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent can be good, if consistent is not usable though, which I think is the current situation in many cases.
I am referring to the following scenario: imagine we do the change you propose. Then users start complaining that sometimes they see a filename and sometimes a path and they don't understand why is that. Then, when they understand that is for files with __
, they immediately start complaining (rightfully) that there are plenty of other stuff that is not being normalized and why we chose only files with __
and not include namespace packages or other common prefixes to disambiguate the output. Then other users will come in and ask why we are only showing one folder because they have an implicit namespace with 10 different directories with a init.py file that has the same parent folder. And so on and so forth.
Inconsistent here is that (1) the output may be strange to users (why is doing that?) and (2) is not solving the generic problem, just making a bandaid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, giving an option to receive the full path (not in the command line, that can be too big, we would need a different solution for that) will make the tool consistent and will cover 100% of all cases.
I am not suggesting we do that here, I am just explaining my point further :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bandaids stop bleeding, they are a very good thing :)
But yes ok, I'll do the full thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented this suggested change. It's much more complex obviously, but actually using it, I think it's worse. The problem is that we've been thinking about this wrong. Unique filenames isn't the only relevant thing. If the filename is __init__.py
, this is pretty much identical to the empty string in usefulness unless your entire project has only one__init__.py
with code in it. That the profiler only hit one of those files doesn't actually mean that just showing __init__.py
gives any useful information. So I think a special case for __init__.py
is still very much warranted.
I have committed an updated piece of code with the changes you suggested plus a special case for __init__.py
. There's also a test suite with only one surviving mutant for mutmut and that is a <=
-> <
mutation for the step
loop. I think this is fine.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I clicked on the "resolve conversation" button to let you know I've answered the questions. I hope that was the right thing to do and doesn't appear rude or something. I'm unsure what the etiquette is... |
Ah, I need to do this too: I have made the requested changes; please review again. I haven't made all the requested changes, but I have replied to the discussion at least. |
Thanks for making the requested changes! @pablogsal: please review the changes made to this pull request. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check out this comment: https://github.com/python/cpython/pull/9655/files#r504605933
I kindly insist on what I said before: I think is easier to split the PR.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
I have made the requested changes; please review again. |
Thanks for making the requested changes! @pablogsal: please review the changes made to this pull request. |
@boxed Did you forgot to push your changes? The last commit here was 3 days ago |
@pablogsal Yes I did :( I have pushed now and resolved a merge conflict I got now too. |
99d996d
to
f13da29
Compare
I rebased and squashed everything, and fixed the docs. I was so far out of date with master that this was needed (due to this PR started its life in 2018). |
f13da29
to
63e8549
Compare
This PR is stale because it has been open for 30 days with no activity. |
Display one folder level for pstats.stripdirs() when filename starts with __ (__init__.py, __main__.py primarily)
32bde8f
to
f6da3b6
Compare
I have reapplied the change on top of current |
How do I remove the stale tag? I don't think it's correct. |
For a test script:
The old output is:
after this change the output is:
https://bugs.python.org/issue34861