Skip to content

GH-112906: Fix performance regression in pathlib path initialisation #112907

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 10, 2023

Conversation

barneygale
Copy link
Contributor

@barneygale barneygale commented Dec 9, 2023

This was caused by 76929fd, specifically its use of super() and its packing/unpacking of *args.

…ation

This was caused by 76929fd, specifically its use of `super()` and its
packing/unpacking `*args`.
@barneygale
Copy link
Contributor Author

barneygale commented Dec 9, 2023

$ ./python -m timeit -n 1000000 -s "from pathlib import PurePath" "PurePath('a', 'b')"
1000000 loops, best of 5: 636 nsec per loop  # before regression (baseline)
1000000 loops, best of 5: 827 nsec per loop  # after regression (30% slower)
1000000 loops, best of 5: 662 nsec per loop  # after this fix (4% slower)

The last few % are hard to recoup unfortunately

@barneygale
Copy link
Contributor Author

Skipping news because this only regressed yesterday.

Copy link
Member

@terryjreedy terryjreedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that PurePath may be instantiated multiple times in an app, the speed recovery looks worthwhile. The net 5% loss is the cost of a new class.

The logic looks correct (and I presume testing verifies it); just add docstrings with minimal explanation. Without the explanation in the issue and PR, this could puzzle future readers.

@bedevere-app
Copy link

bedevere-app bot commented Dec 9, 2023

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@barneygale
Copy link
Contributor Author

barneygale commented Dec 9, 2023

I have made the requested changes; please review again

Copy link
Member

@AlexWaygood AlexWaygood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered either of these two alternatives? (Both diffs are relative to main, not this PR branch)

diff --git a/Lib/pathlib/__init__.py b/Lib/pathlib/__init__.py
index f4668ab327..626f817733 100644
--- a/Lib/pathlib/__init__.py
+++ b/Lib/pathlib/__init__.py
@@ -90,7 +90,9 @@ def __init__(self, *args):
                         "object where __fspath__ returns a str, "
                         f"not {type(path).__name__!r}")
                 paths.append(path)
-        super().__init__(*paths)
+        # As an optimisation, avoid calling super().__init__() here
+        self._raw_paths = paths
+        self._resolving = False

or

diff --git a/Lib/pathlib/__init__.py b/Lib/pathlib/__init__.py
index f4668ab327..5c605547c0 100644
--- a/Lib/pathlib/__init__.py
+++ b/Lib/pathlib/__init__.py
@@ -90,7 +90,7 @@ def __init__(self, *args):
                         "object where __fspath__ returns a str, "
                         f"not {type(path).__name__!r}")
                 paths.append(path)
-        super().__init__(*paths)
+        super().__init__(paths)

     def __reduce__(self):
         # Using the parts tuple helps share interned path parts
diff --git a/Lib/pathlib/_abc.py b/Lib/pathlib/_abc.py
index 4808d0e61f..2d7fba392a 100644
--- a/Lib/pathlib/_abc.py
+++ b/Lib/pathlib/_abc.py
@@ -206,7 +206,7 @@ class PurePathBase:
     )
     pathmod = os.path

-    def __init__(self, *paths):
+    def __init__(self, paths):
         self._raw_paths = paths
         self._resolving = False

The first is slightly faster than your PR here; the second is slightly slower.

These both feel like slightly simpler patches. The second might also provide speedups to third-party subclasses of PurePathBase as well as subclasses in the Lib/pathlib/ directory

@barneygale
Copy link
Contributor Author

Thanks Alex

First alternative: it's been a while since I've dealt with this, but I think not calling super() from __init__() can snooker users of multiple inheritance in certain cases. Maybe? My editor warns me about it but I don't recall specifics.

Second alternative: would make the initialiser signature of PurePathBase and PathBase different from PurePath and Path, which I think would be hard to explain to users.

@barneygale
Copy link
Contributor Author

I think I was wrong about the first alternative: it's fine to omit the super() call. Will adjust the patch.

@AlexWaygood
Copy link
Member

Yeah, if you're looking to do cooperative multiple inheritance, it's always advisable to always call super().__init__() when overriding __init__ in a subclass.

Outside of that situation, though, I don't think there's any reason to say you have to always call super().__init__() in a subclass -- there are often good reasons to avoid doing so. Linters often warn about it because a common mistake is to accidentally forget to include the super().__init__() call in the subclass -- but we know what we're doing here, so it's all good :)

The cooperative multiple inheritance concern doesn't apply here either, I don't think, since PurePath itself only has one base -- PurePathBase. But it's also been a while since I've tried to do cooperative multiple inheritance with anything; I might have to rewatch https://www.youtube.com/watch?v=EiOglTERPEo&list=PLd9YubueMBAAHenyO2yyf4bg27uwzx-Fc&index=11 to brush up :)

Second alternative: would make the initialiser signature of PurePathBase and PathBase different from PurePath and Path, which I think would be hard to explain to users.

Yeah, good point!

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@barneygale barneygale enabled auto-merge (squash) December 9, 2023 23:49
@barneygale barneygale merged commit 23df46a into python:main Dec 10, 2023
@barneygale
Copy link
Contributor Author

Thanks for the reviews, both!

aisk pushed a commit to aisk/cpython that referenced this pull request Feb 11, 2024
…ation (python#112907)

This was caused by 76929fd, specifically its use of `super()` and its
packing/unpacking `*args`.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Glyphack pushed a commit to Glyphack/cpython that referenced this pull request Sep 2, 2024
…ation (python#112907)

This was caused by 76929fd, specifically its use of `super()` and its
packing/unpacking `*args`.

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage skip news topic-pathlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants