Improve performance of deletions in initial fine-grained cache load #4701

msullivan · 2018-03-08T01:34:21Z

To do this we just fully ditch process_graph for doing initial cache loads, since we don't want to do anything but load the graph and trees from the cache.

This has the side effect of disabling any writes to the cache while doing a successful cache load, so for consistency we also disable cache writes when a fine-grained cache load fails and we give up on it.

JukkaL

Looks good, but I'm a little worried about test coverage -- cache load performance seems like a tricky area that regress in the future (see my comments for some more concrete ideas).

JukkaL · 2018-03-08T17:10:31Z

mypy/dmypy_server.py

@@ -280,6 +280,9 @@ def initialize_fine_grained(self, sources: List[mypy.build.BuildSource]) -> Dict
            result = mypy.build.build(sources=sources,
                                      options=self.options,
                                      alt_lib_path=self.alt_lib_path)
+            # build will clear use_fine_grained_cache if it needs to give up


Style nit: looks like it would be better to move this outside the try statement, since this can't generate a CompileError exception.

JukkaL · 2018-03-08T17:12:30Z

mypy/server/update.py

@@ -174,8 +174,8 @@ def __init__(self,
        # for the cache. This is kind of a hack and it might be better to have
        # this directly reflected in load_graph's interface.
        self.options.cache_dir = os.devnull
+        self.options.use_fine_grained_cache = False


Modifying the options object it's kind of ugly. Maybe at least mention that it gets mutated in the docstring?

JukkaL · 2018-03-08T17:16:25Z

mypy/dmypy_server.py

@@ -280,6 +280,9 @@ def initialize_fine_grained(self, sources: List[mypy.build.BuildSource]) -> Dict
            result = mypy.build.build(sources=sources,
                                      options=self.options,
                                      alt_lib_path=self.alt_lib_path)
+            # build will clear use_fine_grained_cache if it needs to give up
+            # on doing a cache load.
+            cache_load_succeeded = self.options.use_fine_grained_cache


Instead of mutating the use_fine_grained_cache attribute, would it be cleaner to add another attribute to the build result? Having a side channel like this seems questionable.

JukkaL · 2018-03-08T17:18:26Z

mypy/build.py

+    # we process all SCCs as fresh SCCs so that we have all of the symbol
+    # tables and fine-grained dependencies available.
+    # We fail the loading of any SCC that we can't load a meta for, so we
+    # don't have anything *but* fresh SCCs.


Have you tested what happens if we fail loading some SCCs? Will they be picked up by the build later?

Yeah, that is tested by the testAddModuleAfterCache tests.

JukkaL · 2018-03-08T17:20:28Z

mypy/build.py

    if manager.options.use_fine_grained_cache and len(graph) < 0.50 * len(sources):
-        manager.log("Redoing load_graph because too much was missing")
-        manager.only_load_from_cache = False
+        manager.log("Redoing load_graph without cache because too much was missing")


Did you test that this still gets triggered correctly? What about setting a flag when we get here and having a test case where we test the existence of the flag (enabled using a magic comment in test description, for example)? The test could also delete some subset of cache files (maybe based on module name or something).

JukkaL · 2018-03-08T17:22:53Z

mypy/build.py

@@ -2128,7 +2133,10 @@ def dispatch(sources: List[BuildSource], manager: BuildManager) -> Graph:
    if manager.options.dump_graph:
        dump_graph(graph)
        return graph
-    process_graph(graph, manager)
+    if manager.options.use_fine_grained_cache:


Add comment here describing why we have two different ways of processing the cache.

JukkaL · 2018-03-08T17:25:02Z

test-data/unit/fine-grained-modules.test

@@ -624,6 +624,22 @@ c.f(1)
 ==
 a.py:2: error: Too many arguments for "f"

+[case testRenameAndDeleteModule]


Was this test case failing on master? Do we have sufficient test coverage? The details of loading the import graph are pretty subtle. Maybe we could, say, keep track of the number of files parsed during load_graph and test that it is zero in certain cases when using the cache (and non-zero when not using the cache)?

It wasn't failing, no. I ran this test with verbosity on to track that the right thing was happening. Which, of course, is a pretty good sign that there should be some automated scheme.

…l load

msullivan · 2018-03-08T23:09:39Z

I added some test hooks that I think should cover things, and fixed a bug in the fallback to doing a full load that it revealed.

JukkaL

Thanks for the updates! I have some remaining questions and ideas about comments.

JukkaL · 2018-03-09T17:06:20Z

test-data/unit/fine-grained.test

@@ -15,6 +15,11 @@
 --   <optional output from batch pass>
 --   ==
 --   <optional output from first incremental pass>
+--
+--
+-- Modules that are expected to be detected as changed can be checked with [stale ...]


Please make the description more detailed. Say, does changed mean that source code is changed or the file is added/deleted, or only some of these? How does it differ for the first run and successive runs, and what about tests using cache or not using cache?

Also mention [stale2 etc.

JukkaL · 2018-03-09T17:07:59Z

test-data/unit/fine-grained.test

+--
+-- Modules that are expected to be detected as changed can be checked with [stale ...]
+-- while modules that are reprocessed by update (which can include cached files
+-- that need to be loaded) can be checked with [rechecked ...]


Make this more explicit -- here reprocessing only means full module reprocessing, not that we process only some target in the module when following fine-grained dependencies.

JukkaL · 2018-03-09T17:26:03Z

test-data/unit/fine-grained-modules.test

+
+-- in no cache mode, there is no way to know about b yet,
+-- but a should still get triggered
+[stale a]


I don't fully understand why a is included here, since only b had file changes (it was deleted). Please explain in the comment why a is triggered.

The initial cache mode run picks up changes to files that are not "fresh", in order to pick up modules with changed dependencies.

JukkaL · 2018-03-09T17:29:56Z

test-data/unit/fine-grained-modules.test

-- regular tests.
-[case testAddModuleAfterCache1]
+-- doesn't appear in the cache, for cache mode. They are run in
+-- cache mode oly because stale and rechecked differ heavily between


Typo: oly

JukkaL · 2018-03-09T17:31:46Z

test-data/unit/fine-grained-modules.test

 [file b.py.3]
 def foo(x: int) -> None: pass
+
+[stale2 b]


Include [rechecked2 ...?

JukkaL · 2018-03-09T17:34:05Z

test-data/unit/fine-grained-modules.test

+
+-- No files should be stale or reprocessed in the first step since the large number
+-- of missing files will force build to give up on cache loading.
+[stale]


Should we check for [rechecked here as well?

When there is no rechecked, the value of stale is used. This is behavior inherited from the regular incremental test suite. I can document this better in the block comment or just add all the recheckeds in.

JukkaL · 2018-03-09T17:34:12Z

test-data/unit/fine-grained-modules.test

+
+[file b.py.3]
+def foo(x: int) -> None: pass
+[stale2 b]


Similar to above.

JukkaL · 2018-03-09T17:34:44Z

test-data/unit/fine-grained-modules.test

@@ -1064,14 +1093,20 @@ def foo() -> None: pass
 [file b.py.2]
 import a
 a.foo(10)
+
+[stale a, b]


Here and various other places below we don't check for [rechecked. Is this intentional?

JukkaL · 2018-03-09T17:37:41Z

mypy/build.py

@@ -569,6 +571,7 @@ class BuildManager:
      flush_errors:    A function for processing errors after each SCC
      saved_cache:     Dict with saved cache state for coarse-grained dmypy
                       (read-write!)
+      cache_enabled:   Whether cache usage is enabled


Could be worth mentioning how this relates to use_fine_grained_cache -- the latter means that we attempt to use the cache if it exists, and this indicates whether we actually used it or something.

msullivan requested a review from JukkaL March 8, 2018 01:34

JukkaL reviewed Mar 8, 2018

View reviewed changes

msullivan added 3 commits March 8, 2018 10:27

Refactor fine-grained graph processing and optimize deletes in initia…

9dfe8dd

…l load

Refactor away only_load_from_cache, never write to cache

870e8c9

A bunch of cleanup

3a27fa2

msullivan force-pushed the delete_optimize branch from b182f15 to 133a413 Compare March 8, 2018 23:05

msullivan changed the base branch from dmypy-fg-testing to master March 8, 2018 23:06

Add more test facilities, fix a bug revealed by them

e282180

msullivan force-pushed the delete_optimize branch from 133a413 to e282180 Compare March 8, 2018 23:07

msullivan added 2 commits March 8, 2018 16:04

Add type annotations

9c0b5c0

Remove an assert that I shouldn't have added

83c59a2

JukkaL reviewed Mar 9, 2018

View reviewed changes

Some cleanups

9835709

msullivan merged commit 9374acf into master Mar 9, 2018

msullivan deleted the delete_optimize branch March 9, 2018 22:22

Uh oh!

Improve performance of deletions in initial fine-grained cache load #4701

Improve performance of deletions in initial fine-grained cache load #4701

Uh oh!

Conversation

msullivan commented Mar 8, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JukkaL left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

msullivan commented Mar 8, 2018

Uh oh!

JukkaL left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

msullivan commented Mar 8, 2018 •

edited

Loading