-
Notifications
You must be signed in to change notification settings - Fork 44
perf: further avoid using proto-plus wrapper when unmarshalling entities #190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is actually a big win: in my comparison, we go from 3x to 2x slower than the older/faster version.
Always take the raw protobuf message, rather than the proto-plus wrapper. This is another big win: we are back to within a few percent of the older/faster version on my comparison test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! After a few read-throughs of the PR, I think I have pieced together how/why this improves performance. This LGTM pending your confirmation of my understanding (which is only important in that, if I am still misunderstanding, then I have not actually reviewed this PR as much as I think 😆).
|
||
return meaning | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make sure I'm following, do the edits to _get_meaning()
impact performance, or is this just for readability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big hit: the three-item getattr
can be much faster than a try: ... except AttributeError:
, but likely not the conditional assignment. OTOH, since in later commits we no longer risk handing the proto-plus wrapper into _get_meaning
, we could likely pull that getattr
out. The other changes to _get_meaning
are micro-optimizaitons, which neither helped nor hurt on my comparison test.
google/cloud/datastore/helpers.py
Outdated
values = ( | ||
value_pb._pb.array_value.values | ||
if hasattr(value_pb, "_pb") | ||
else value_pb.array_value.values | ||
) | ||
# Use 'raw' protobuf message if possible | ||
value_pb = getattr(value_pb, "_pb", value_pb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes like this (there are a few in the PR) seem be a pure refactor of the old implementation, unless I am missing something that further helps performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Three-arg getattr
does the test in C, versus in Python's own branch machinery. At least in some cases, it can be a speedup.
@@ -124,21 +108,18 @@ def entity_from_protobuf(pb): | |||
:rtype: :class:`google.cloud.datastore.entity.Entity` | |||
:returns: The entity derived from the protobuf. | |||
""" | |||
if not isinstance(pb, entity_pb2.Entity): | |||
proto_pb = entity_pb2.Entity.wrap(pb) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the removal of this line and focusing entirely on raw protobuf objects what delivers the final performance gains?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, indeed. The proto-plus wrappers are an enormous perf-suck. Most of the win in this PR is avoiding touching them anywhere except at the outermost entry point, and then unwrapping there ASAP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had not pieced together that we could avoid .wrap()
altogether. The last time I was in here, I'd mistakenly assumed calling that (which is at least the least-slow proto-plus entrypoint) was still required.
Very nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
🤖 I have created a release \*beep\* \*boop\* --- ### [2.1.4](https://www.github.com/googleapis/python-datastore/compare/v2.1.3...v2.1.4) (2021-07-09) ### Performance Improvements * further avoid using proto-plus wrapper when unmarshalling entities ([#190](https://www.github.com/googleapis/python-datastore/issues/190)) ([d0481bf](https://www.github.com/googleapis/python-datastore/commit/d0481bf8caa84a829808e7f512fda8709f38d0cc)) ### Documentation * omit mention of Python 2.7 in 'CONTRIBUTING.rst' ([#1127](https://www.github.com/googleapis/python-datastore/issues/1127)) ([#181](https://www.github.com/googleapis/python-datastore/issues/181)) ([6efde70](https://www.github.com/googleapis/python-datastore/commit/6efde70db751bf708091b24a932ab8571bd981a6)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Follow-on to @craiglabenz's work in PR #155. These tweaks combine to get performance on my comparison with 1.5.3 script to within a few percentage points of the 1.5.3 implementation.
Closes #150.