Load .data.json files on-demand in fine-grained mode #4910

msullivan · 2018-04-14T00:45:42Z

Depends on #4906.

This gives a speedup of around 13s for cold runs on S.

JukkaL

The expected performance improvement is very impressive! Looks mostly good, just some minor notes. It would be nice to get rid of the MypyFile subclass, but maybe we can leave it for another PR.

I think that this would be worth testing against some collection of real-world commits, as this is a bit of a risky change.

JukkaL · 2018-04-17T15:18:53Z

mypy/server/update.py

+    if to_process:
+        manager.log("Calling process_fresh_scc on an 'scc' of size {} ({})".format(
+            len(to_process), to_process))
+        process_fresh_scc(graph, to_process, manager)


The modules to process may contain multiple SCCs, so the process_fresh_scc function name is a misnomer. Maybe rename the function, and mention in the docstring that it can be used to process an SCC?
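To illustrate the point about naming: an arbitrary set of modules handed to this call can split into several SCCs. The following is a minimal sketch, not code from the PR, that runs Tarjan's algorithm over a hypothetical `deps` mapping (module id to the ids it imports). Recursion depth grows with the longest dependency chain, which is fine for an illustration.

```python
from typing import Dict, List, Set


def strongly_connected_components(deps: Dict[str, Set[str]]) -> List[Set[str]]:
    """Return the SCCs of a module dependency graph (Tarjan's algorithm).

    Edges to modules that are not keys of `deps` are ignored. SCCs come out
    with dependencies before the modules that import them.
    """
    index: Dict[str, int] = {}
    lowlink: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[Set[str]] = []
    counter = 0

    def visit(v: str) -> None:
        nonlocal counter
        index[v] = lowlink[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in deps[v]:
            if w not in deps:
                continue
            if w not in index:
                visit(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif w in on_stack:
                lowlink[v] = min(lowlink[v], index[w])
        # v is the root of an SCC: pop everything above it off the stack.
        if lowlink[v] == index[v]:
            scc: Set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                scc.add(w)
                if w == v:
                    break
            result.append(scc)

    for v in sorted(deps):
        if v not in index:
            visit(v)
    return result


# "a" and "b" import each other; "c" stands alone: two SCCs in one batch.
deps = {"a": {"b"}, "b": {"a"}, "c": set()}
print(sorted(sorted(s) for s in strongly_connected_components(deps)))  # [['a', 'b'], ['c']]
```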

JukkaL · 2018-04-17T15:20:54Z

mypy/server/update.py

    """Find names of all targets that need to reprocessed, given some triggers.

-    Returns: Dictionary from module id to a set of stale targets.
+    Returns: a tuple containing a:


Style nit: Capitalize the first 'a'.

JukkaL · 2018-04-17T15:21:22Z

mypy/server/update.py

-    Returns: Dictionary from module id to a set of stale targets.
+    Returns: a tuple containing a:
+     * Dictionary from module id to a set of stale targets.
+     * A set of module ids for unparsed modules with stale targets


Grammar nit: missing period at the end.

JukkaL · 2018-04-17T15:22:57Z

mypy/build.py

-    def calculate_mros(self) -> None:
-        assert self.tree is not None, "Internal error: method must be called on parsed file only"
-        fixup_module_pass_two(self.tree, self.manager.modules)
+        fixup_module(self.tree, self.manager.modules,


Can you explain why the two fixup passes can now be combined into one?

JukkaL · 2018-04-17T15:30:28Z

mypy/nodes.py

@@ -259,6 +259,31 @@ def deserialize(cls, data: JsonDict) -> 'MypyFile':
        return tree


+class UnloadedMypyFile(MypyFile):


Yeah, this is pretty unfortunate. How hard would it be to not need this?

JukkaL · 2018-04-17T15:43:30Z

mypy/server/update.py

+    module, we don't need to explore its dependencies.  (This
+    invariant is slightly violated when dependencies are added, which
+    can be handled by calling find_unloaded_deps directly on the new
+    dependencies)


Nit: missing period at the end of final sentence.

JukkaL · 2018-04-17T15:43:48Z

mypy/server/update.py

+
+def ensure_trees_loaded(manager: BuildManager, graph: Dict[str, State],
+                        initial: Sequence[str]) -> None:
+    """Ensure that the modules in initial and their deps have loaded trees"""


Nit: Missing period at the end of the docstring.
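For illustration, a minimal sketch of the traversal these docstrings describe, assuming a hypothetical `State` node with a `dependencies` list and an optional `tree` (None meaning the module's .data.json has not been loaded yet). It treats the graph as the source of truth for what is loaded rather than using a placeholder `MypyFile` subclass. This is not the code from the PR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class State:
    """Hypothetical stand-in for a build graph node."""
    id: str
    dependencies: List[str] = field(default_factory=list)
    tree: Optional[object] = None  # None: cache data not loaded yet

    def load_tree(self) -> None:
        # Placeholder for deserializing the module's .data.json file.
        self.tree = f"<tree for {self.id}>"


def find_unloaded_deps(graph: Dict[str, State], roots: Sequence[str]) -> List[str]:
    """Return modules reachable from roots whose trees are not loaded.

    Relies on the invariant that a loaded module's dependencies are loaded
    too, so the search never explores past a loaded module.
    """
    pending = list(roots)
    seen = set()
    unloaded = []
    while pending:
        mod = pending.pop()
        if mod in seen or mod not in graph or graph[mod].tree is not None:
            continue
        seen.add(mod)
        unloaded.append(mod)
        pending.extend(graph[mod].dependencies)
    return unloaded


def ensure_trees_loaded(graph: Dict[str, State], initial: Sequence[str]) -> None:
    """Ensure that the modules in initial and their deps have loaded trees."""
    for mod in find_unloaded_deps(graph, initial):
        graph[mod].load_tree()


graph = {"a": State("a", ["b"]), "b": State("b", ["c"]), "c": State("c", tree="cached")}
ensure_trees_loaded(graph, ["a"])
print([m for m, s in graph.items() if s.tree is None])  # []
```

When a change adds new dependencies, the invariant can be broken for modules that are already loaded; as the docstring notes, calling `find_unloaded_deps` directly on the new dependencies covers that case.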

This is a prerequisite for loading .data.json files on demand (#4910), since if MROs are computed on-demand when a tree is loaded, it is impossible to detect changes in the MRO caused by a change in some other file that triggered an on-demand load.
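To make the reasoning concrete: detecting an MRO change needs an old value to compare against, which is what storing MROs in the cache provides. The following is a minimal sketch, assuming MROs are kept as lists of class names (a hypothetical format) and recomputed with C3 linearization, the algorithm Python itself uses. It is not mypy's implementation.

```python
from typing import Dict, List, Set


def c3_mro(cls: str, bases: Dict[str, List[str]]) -> List[str]:
    """Compute a class's MRO by C3 linearization over a name -> bases map."""
    direct = bases.get(cls, [])
    # Merge the MROs of the direct bases plus the list of direct bases itself.
    seqs = [c3_mro(b, bases) for b in direct] + [list(direct)]
    result = [cls]
    while True:
        seqs = [s for s in seqs if s]
        if not seqs:
            return result
        # Pick the first head that does not appear in the tail of any sequence.
        for seq in seqs:
            head = seq[0]
            if not any(head in s[1:] for s in seqs):
                break
        else:
            raise TypeError(f"Cannot create a consistent MRO for {cls}")
        result.append(head)
        for s in seqs:
            if s[0] == head:
                del s[0]


def changed_mros(old: Dict[str, List[str]], new: Dict[str, List[str]]) -> Set[str]:
    """Names of classes whose MRO differs between two snapshots."""
    return {name for name in old.keys() | new.keys() if old.get(name) != new.get(name)}


# Stored MROs (as if read from cache) vs. MROs after B stops inheriting from A.
before = {c: c3_mro(c, {"A": [], "B": ["A"], "C": ["B"]}) for c in "ABC"}
after = {c: c3_mro(c, {"A": [], "B": [], "C": ["B"]}) for c in "ABC"}
print(sorted(changed_mros(before, after)))  # ['B', 'C']
```

If the MRO of C were only computed when its tree is loaded on demand, there would be no `before` entry for C, and the change could not be turned into a trigger.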

JukkaL

Feel free to merge once you've tried this with a bunch of real commits in Dropbox internal code, and please monitor the Dropbox internal builds and metrics afterwards for failures.


The logic in build to determine what imported modules are depended on used to elide dependencies to m in `from m import a, b, c` if all of a, b, c were submodules. This was removed in #4910 because it seemed like it ought not be necessary (and that semantically there *was* a dependency), and early versions of #4910 depended on removing it. The addition of this dependency, though, can cause cycles that wouldn't be there otherwise, which can cause #4498 (invalid type when using aliases in import cycles) to trip when it otherwise wouldn't. We've seen this once in a bug report and once internally, so restore the `all_are_submodules` logic to avoid triggering #4498 in these cases. Fixes #5015
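A minimal sketch of the elision rule described above, using a hypothetical helper `from_import_deps` and a hypothetical `known_modules` set; the real logic in build.py is structured differently.

```python
from typing import List, Set


def from_import_deps(module: str, names: List[str], known_modules: Set[str]) -> List[str]:
    """Dependencies recorded for `from module import names` (hypothetical helper)."""
    # Imported names that are themselves modules become direct dependencies.
    submodules = [f"{module}.{name}" for name in names if f"{module}.{name}" in known_modules]
    all_are_submodules = bool(names) and len(submodules) == len(names)
    # Elide the dependency on `module` itself when every name is a submodule.
    return submodules if all_are_submodules else [module] + submodules


known = {"m", "m.a", "m.b", "m.c"}
print(from_import_deps("m", ["a", "b", "c"], known))  # ['m.a', 'm.b', 'm.c']
print(from_import_deps("m", ["a", "x"], known))  # ['m', 'm.a']
```

Dropping the edge to `m` is what keeps the extra cycle from forming; the cost is that the graph no longer records the dependency on `m` that semantically exists.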

The logic in build to determine what imported modules are depended on used to elide dependencies to m in `from m import a, b, c` if all of a, b, c were submodules. This was removed in #4910 because it seemed like it ought not be necessary (and that semantically there *was* a dependency), and early versions of #4910 depended on removing it. The addition of this dependency, though, can cause cycles that wouldn't be there otherwise, which can cause #4498 (invalid type when using aliases in import cycles) to trip when it otherwise wouldn't. Unfortunately the dependency on the module is actually required for correctness in some corner cases, so instead of eliding the import, we lower its priority. This causes the cycles in the regressions we are looking at to get processed in the order that works. This is obviously just a workaround.
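A minimal sketch of the "lower its priority instead of eliding it" idea. The priority constants, the helper names, and the ordering rule (ignore edges whose priority number exceeds a threshold, then topologically sort the cycle) are all hypothetical simplifications, not mypy's actual algorithm. It uses `graphlib`, which requires Python 3.9+.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical priorities: lower numbers mark edges that matter more for ordering.
PRI_HIGH = 5
PRI_LOW = 20


def from_import_priorities(module: str, names: List[str],
                           known_modules: Set[str]) -> Dict[str, int]:
    """Dependency priorities for `from module import names` (hypothetical)."""
    submodules = [f"{module}.{name}" for name in names if f"{module}.{name}" in known_modules]
    all_are_submodules = bool(names) and len(submodules) == len(names)
    pri = {sub: PRI_HIGH for sub in submodules}
    # Keep the dependency on `module` for correctness, but demote it.
    pri[module] = PRI_LOW if all_are_submodules else PRI_HIGH
    return pri


def order_cycle(scc: Set[str], edges: Dict[str, Dict[str, int]], max_pri: int) -> List[str]:
    """Order an import cycle using only edges whose priority number is <= max_pri."""
    ts = TopologicalSorter()
    for mod in sorted(scc):
        preds = [dep for dep, pri in edges.get(mod, {}).items()
                 if dep in scc and pri <= max_pri]
        ts.add(mod, *preds)
    # Dependencies come out before the modules that import them; raises
    # graphlib.CycleError if the kept edges still form a cycle.
    return list(ts.static_order())


# p does `from m import a`; m's __init__ imports p, forming the cycle {m, p}.
known = {"m", "m.a", "p"}
edges = {"p": from_import_priorities("m", ["a"], known), "m": {"p": PRI_HIGH}}
print(order_cycle({"m", "p"}, edges, PRI_HIGH))  # ['p', 'm']
```

The demoted edge from p to m still exists in the graph, but it no longer constrains the order in which the cycle is processed, which is the sense in which this is a workaround rather than a fix for #4498.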

msullivan requested review from JukkaL and ilevkivskyi April 14, 2018 00:45

msullivan force-pushed the less-deser branch from 6202c5c to 5c9e84c April 14, 2018 01:14

JukkaL reviewed Apr 17, 2018


msullivan mentioned this pull request Apr 17, 2018

Store MROs in cache files instead of recomputing them #4921

Merged

msullivan added 2 commits April 17, 2018 12:19

Get rid of all_are_submodules and add a test for a bug this fixes

7b8c58c

Postpone load_tree()s until they are needed

3d732f2

msullivan force-pushed the less-deser branch from 02d0568 to a1e8fd1 April 17, 2018 19:20

msullivan changed the base branch from deps-file to master April 17, 2018 19:21

Some cleanups.

3a61d28

msullivan force-pushed the less-deser branch from a1e8fd1 to 3a61d28 April 17, 2018 19:21

Get rid of UnloadedMypyFile and use the graph as the source of truth.

3c028d9

JukkaL approved these changes Apr 23, 2018


Merge branch 'master' into less-deser

63873a0

msullivan merged commit 250eafe into master Apr 24, 2018

msullivan deleted the less-deser branch April 24, 2018 02:45

msullivan mentioned this pull request May 8, 2018

"Invalid Type" error in v0.600 on uncached runs involving file with same name as parent package #5015

Closed

msullivan mentioned this pull request May 9, 2018

Restore all_are_submodules import logic as workaround for #4498 #5016

Merged

hauntsaninja mentioned this pull request Aug 3, 2021

Cache causes mypy to fail every 2nd run for module importing from aws_cdk #9852

Closed
