Skip to content

Load .data.json files on-demand in fine-grained mode #4910

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Apr 24, 2018
Merged

Conversation

msullivan
Copy link
Collaborator

Depends on #4906.

This gives around a 13s speed up for cold runs on S.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The expected performance improvement is very impressive! Looks mostly good, just some minor notes. It would be nice to get rid of the MypyFile subclass, but maybe we can leave it for another PR.

I think that this would be worth testing against some collection of real-world commits, as this is a bit of a risky change.

if to_process:
manager.log("Calling process_fresh_scc on an 'scc' of size {} ({})".format(
len(to_process), to_process))
process_fresh_scc(graph, to_process, manager)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The modules to process may contain multiple SCCs, so the process_fresh_scc function name is a misnomer. Maybe rename the function, and mention in the docstring that it can be used to process a SCC?

"""Find names of all targets that need to reprocessed, given some triggers.

Returns: Dictionary from module id to a set of stale targets.
Returns: a tuple containing a:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: Capitalize the first 'a'.

Returns: Dictionary from module id to a set of stale targets.
Returns: a tuple containing a:
* Dictionary from module id to a set of stale targets.
* A set of module ids for unparsed modules with stale targets
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grammar nit: missing period at the end.

mypy/build.py Outdated
def calculate_mros(self) -> None:
assert self.tree is not None, "Internal error: method must be called on parsed file only"
fixup_module_pass_two(self.tree, self.manager.modules)
fixup_module(self.tree, self.manager.modules,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why the two fixup passes now can be combined into one?

mypy/nodes.py Outdated
@@ -259,6 +259,31 @@ def deserialize(cls, data: JsonDict) -> 'MypyFile':
return tree


class UnloadedMypyFile(MypyFile):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is pretty unfortunate. How hard would it be to not need this?

module, we don't need to explore its dependencies. (This
invariant is slightly violated when dependencies are added, which
can be handled by calling find_unloaded_deps directly on the new
dependencies)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: missing period at the end of final sentence.


def ensure_trees_loaded(manager: BuildManager, graph: Dict[str, State],
initial: Sequence[str]) -> None:
"""Ensure that the modules in initial and their deps have loaded trees"""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Missing period at the end of the docstring.

msullivan added a commit that referenced this pull request Apr 17, 2018
This is a prerequisite for loading .data.json files on demand (#4910),
since if MROs are computed on-demand when a tree is loaded, it is
impossible to detect changes in the MRO caused by a change in some
other file that triggered an on-demand load.
msullivan added a commit that referenced this pull request Apr 17, 2018
This is a prerequisite for loading .data.json files on demand (#4910),
since if MROs are computed on-demand when a tree is loaded, it is
impossible to detect changes in the MRO caused by a change in some
other file that triggered an on-demand load.
@msullivan msullivan changed the base branch from deps-file to master April 17, 2018 19:21
Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to merge once you've tried this with a bunch of real commits in Dropbox internal code, and please monitor the Dropbox internal builds and metrics afterwards for failures.

@msullivan msullivan merged commit 250eafe into master Apr 24, 2018
@msullivan msullivan deleted the less-deser branch April 24, 2018 02:45
msullivan added a commit that referenced this pull request May 9, 2018
The logic in build to determine what imported modules are depended on
used to elide dependencies to m in `from m import a, b, c` if all of
a, b, c were submodules. This was removed in #4910 because it seemed
like it ought not be necessary (and that semantically there *was*
a dependency), and early versions of #4910 depended removing it.

The addition of this dependency, though, can cause cycles that
wouldn't be there otherwise, which can cause #4498 (invalid type when
using aliases in import cycles) to trip when it otherwise wouldn't.

We've seen this once in a bug report and once internally, so restore
the `all_are_submodules` logic in avoid triggering #4498 in these
cases.
msullivan added a commit that referenced this pull request May 9, 2018
The logic in build to determine what imported modules are depended on
used to elide dependencies to m in `from m import a, b, c` if all of
a, b, c were submodules. This was removed in #4910 because it seemed
like it ought not be necessary (and that semantically there *was*
a dependency), and early versions of #4910 depended removing it.

The addition of this dependency, though, can cause cycles that
wouldn't be there otherwise, which can cause #4498 (invalid type when
using aliases in import cycles) to trip when it otherwise wouldn't.

We've seen this once in a bug report and once internally, so restore
the `all_are_submodules` logic in avoid triggering #4498 in these
cases.

Fixes #5015
msullivan added a commit that referenced this pull request May 14, 2018
)

The logic in build to determine what imported modules are depended on
used to elide dependencies to m in `from m import a, b, c` if all of
a, b, c were submodules. This was removed in #4910 because it seemed
like it ought not be necessary (and that semantically there *was*
a dependency), and early versions of #4910 depended on removing it.

The addition of this dependency, though, can cause cycles that
wouldn't be there otherwise, which can cause #4498 (invalid type when
using aliases in import cycles) to trip when it otherwise wouldn't.

Unfortunately the dependency on the module is actually required for
correctness in some corner cases, so instead of eliding the import, we
lower its priority. This causes the cycles in the regressions we are
looking at to get processed in the order that works.

This is obviously just a workaround.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants