-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Load .data.json files on-demand in fine-grained mode #4910
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The expected performance improvement is very impressive! Looks mostly good, just some minor notes. It would be nice to get rid of the MypyFile
subclass, but maybe we can leave it for another PR.
I think that this would be worth testing against some collection of real-world commits, as this is a bit of a risky change.
mypy/server/update.py
Outdated
if to_process: | ||
manager.log("Calling process_fresh_scc on an 'scc' of size {} ({})".format( | ||
len(to_process), to_process)) | ||
process_fresh_scc(graph, to_process, manager) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The modules to process may contain multiple SCCs, so the process_fresh_scc
function name is a misnomer. Maybe rename the function, and mention in the docstring that it can be used to process a SCC?
mypy/server/update.py
Outdated
"""Find names of all targets that need to reprocessed, given some triggers. | ||
|
||
Returns: Dictionary from module id to a set of stale targets. | ||
Returns: a tuple containing a: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: Capitalize the first 'a'.
mypy/server/update.py
Outdated
Returns: Dictionary from module id to a set of stale targets. | ||
Returns: a tuple containing a: | ||
* Dictionary from module id to a set of stale targets. | ||
* A set of module ids for unparsed modules with stale targets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar nit: missing period at the end.
mypy/build.py
Outdated
def calculate_mros(self) -> None: | ||
assert self.tree is not None, "Internal error: method must be called on parsed file only" | ||
fixup_module_pass_two(self.tree, self.manager.modules) | ||
fixup_module(self.tree, self.manager.modules, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain why the two fixup passes now can be combined into one?
mypy/nodes.py
Outdated
@@ -259,6 +259,31 @@ def deserialize(cls, data: JsonDict) -> 'MypyFile': | |||
return tree | |||
|
|||
|
|||
class UnloadedMypyFile(MypyFile): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is pretty unfortunate. How hard would it be to not need this?
mypy/server/update.py
Outdated
module, we don't need to explore its dependencies. (This | ||
invariant is slightly violated when dependencies are added, which | ||
can be handled by calling find_unloaded_deps directly on the new | ||
dependencies) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: missing period at the end of final sentence.
mypy/server/update.py
Outdated
|
||
def ensure_trees_loaded(manager: BuildManager, graph: Dict[str, State], | ||
initial: Sequence[str]) -> None: | ||
"""Ensure that the modules in initial and their deps have loaded trees""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Missing period at the end of the docstring.
This is a prerequisite for loading .data.json files on demand (#4910), since if MROs are computed on-demand when a tree is loaded, it is impossible to detect changes in the MRO caused by a change in some other file that triggered an on-demand load.
This is a prerequisite for loading .data.json files on demand (#4910), since if MROs are computed on-demand when a tree is loaded, it is impossible to detect changes in the MRO caused by a change in some other file that triggered an on-demand load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feel free to merge once you've tried this with a bunch of real commits in Dropbox internal code, and please monitor the Dropbox internal builds and metrics afterwards for failures.
The logic in build to determine what imported modules are depended on used to elide dependencies to m in `from m import a, b, c` if all of a, b, c were submodules. This was removed in #4910 because it seemed like it ought not be necessary (and that semantically there *was* a dependency), and early versions of #4910 depended removing it. The addition of this dependency, though, can cause cycles that wouldn't be there otherwise, which can cause #4498 (invalid type when using aliases in import cycles) to trip when it otherwise wouldn't. We've seen this once in a bug report and once internally, so restore the `all_are_submodules` logic in avoid triggering #4498 in these cases.
The logic in build to determine what imported modules are depended on used to elide dependencies to m in `from m import a, b, c` if all of a, b, c were submodules. This was removed in #4910 because it seemed like it ought not be necessary (and that semantically there *was* a dependency), and early versions of #4910 depended removing it. The addition of this dependency, though, can cause cycles that wouldn't be there otherwise, which can cause #4498 (invalid type when using aliases in import cycles) to trip when it otherwise wouldn't. We've seen this once in a bug report and once internally, so restore the `all_are_submodules` logic in avoid triggering #4498 in these cases. Fixes #5015
) The logic in build to determine what imported modules are depended on used to elide dependencies to m in `from m import a, b, c` if all of a, b, c were submodules. This was removed in #4910 because it seemed like it ought not be necessary (and that semantically there *was* a dependency), and early versions of #4910 depended on removing it. The addition of this dependency, though, can cause cycles that wouldn't be there otherwise, which can cause #4498 (invalid type when using aliases in import cycles) to trip when it otherwise wouldn't. Unfortunately the dependency on the module is actually required for correctness in some corner cases, so instead of eliding the import, we lower its priority. This causes the cycles in the regressions we are looking at to get processed in the order that works. This is obviously just a workaround.
Depends on #4906.
This gives around a 13s speed up for cold runs on S.