Add option to tag known messages with an error code #6152

chadrik · 2019-01-06T19:51:41Z

Adds --show-error-codes for tagging known error messages (and their notes) with error codes. This is a stepping stone toward native message filtering (#2417), but it is useful on its own for users who want to implement their own basic message filtering without relying on brittle regex against messages. Also adds --list-error-codes to get a list of all known error codes.

Design goal

Provide a way of registering message ids that is:

accessible from main, so that they can be listed with --list-error-codes
concise
able to handle the 3 types of error messages in a fairly consistent way:
1. static messages, with no args to format: currently have constants in mypy.messages
2. messages with basic string args to format: currently have constants in mypy.messages (mostly)
3. messages with more complex formatting: currently have methods on MessageBuilder
type safe for the code reporting the error, to avoid mistakes with the necessary arguments for the given message type

Current Design

The design that I settled on is this:

convert all messages to methods of MessageBuilder
use a @tracked decorator to identify which methods emit messages which should be tracked
- by default, the name of the method is the message id, but this can be overridden
- multiple methods can have the same id, which places them all into the same group, which can be filtered together

Questions

naming: I was a bit non-committal on the terminology to use: "message id", "error code"? Internally I went with "message id", but the cli uses "error code"
should there be a less verbose way of tracking constant error messages? I created a descriptor-based prototype of this that I did not include, which allowed simple messages to be added to MessageBuilder like this:
```
  no_return_exepected = ErrorReporter('No return value expected')
  invalid_implicit_return = ErrorReporter('Implicit return in function which does not return')
```
this would be called like a normal MessageBuilder failure message: self.msg.invalid_implicit_return(ctx). I was concerned it was a bit too magical.

What's next

This PR is not complete. My idea is to finish converting message constants to MessageBuilder methods, as I already have with no_return_exepected and must_have_none_return_type, but I wanted to get feedback on the design before doing this tedious work.

Once this is complete we can begin expanding tracked messages and adding native filtering of tracked messages.

ilevkivskyi

Not a real review, just some random comments (but I like the idea in general).

ilevkivskyi · 2019-01-07T15:35:32Z

mypy/main.py

+                        group=error_group)
+    error_group.add_argument(
+        '--list-error-codes', action='store_true',
+        help="List known error codes and exit")


Why didn't you add tests for the new flags? I would be great to see how they work.

I wanted to get tentative approval of the idea first. Sounds like I have that, so I'll do that shortly.

ilevkivskyi · 2019-01-07T15:39:45Z

mypy/main.py

+        import mypy.messages
+        for msg_id in sorted(mypy.messages.MessageBuilder.get_message_ids()):
+            print(msg_id)
+        raise SystemExit(0)


SystemExit is never used in build.py and main.py. I think it can cause bad interference with mypy.api etc.

Yes, I was wary of doing it, but without it we need a way of communicating back to main that it should exit there without doing anything. I figured SystemExit behavior for this flag is the same behavior as with --version.

The return of process_options is Tuple[List[BuildSource], Options]. I can make List[BuildSource] optional, and then check for that in main. What do you think?

I addressed this by making List[BuildSource] optional.

ilevkivskyi · 2019-01-07T15:45:06Z

mypy/main.py

+                        help="Show error codes in error messages",
+                        group=error_group)
+    error_group.add_argument(
+        '--list-error-codes', action='store_true',


I think this should accept an optional pattern: mypy --list-error-codes return would show only error codes that have a "return" substring.

sounds good

mypy/messages.py

ilevkivskyi · 2019-01-07T16:02:34Z

mypy/messages.py

-    def has_no_attr(self, original_type: Type, typ: Type, member: str, context: Context) -> Type:
+    @tracked()
+    def no_return_exepected(self, context: Context) -> None:
+        self.fail('No return value expected', context)


Converting all static messages to this format may cause lots of code churn, but in principle I am fine with this.

Another possible option is to have a global dictionary (or JSON file) mapping error message to error code:

message_to_id = { NO_RETURN_VALUE_EXPECTED: 'unexpected_return', MUST_HAVE_NONE_RETURN_TYPE: 'unexpected_return', MISSING_RETURN_STATEMENT: 'missing_return', ... }

Another possible option is to have a global dictionary (or JSON file) mapping error message to error code

I think that could be a good compromise, but it doesn't work for messages that require formatting so we'd have to be very clear about this. A developer might expect this to work with the message_to_id dict:

self.fail(messages.MUST_HAVE_NONE_RETURN_TYPE.format(fdef.name()), item)

We can protect against this in a few ways:

clear comments near message_to_id documenting that this is only for static messages

validate message_to_id keys and raise an internal error if formatting operations {} are found.

add a variation of MessageBuilder.fail, or a flag, that raises an exception if the message is not in message_to_id (I'm assuming here that we're not going to get to 100% message id coverage in the near term)

I just pushed a commit inspired your message_to_id idea, which addresses most of my concerns about type safety. I used an Enum, where the names are ids and the values are the messages. This is used for messages that are static or require only basic formatting (though I haven transitioned all of the code yet). If you like it, I'll finish it up.

@ilevkivskyi Sorry to bug you. Can I get a sign off on the idea of using Enum? If that sounds like a good idea to you I think I can take this PR to completion.

NP, I will think more about this tomorrow and let you know, I was really busy at work last two days.

Everything you've written makes sense. The main problem is that I can't get over the notion that the message_registry dictionary is going to be awfully redundant and tedious to maintain, since it's mostly going to be the message constant followed by a lowercase string of the message constant's name (depending on how broad vs granular we are with grouping) :

message_registry = { NO_RETURN_VALUE_EXPECTED: 'no_return_value_expected', MISSING_RETURN_STATEMENT: 'missing_return_statement', INVALID_IMPLICIT_RETURN: 'invalid_implicit_return', INCOMPATIBLE_RETURN_VALUE_TYPE: 'incompatible_return_type', RETURN_VALUE_EXPECTED: 'return_value_expected', NO_RETURN_EXPECTED: 'no_return_expected', INVALID_EXCEPTION: 'invalid_exception', INVALID_EXCEPTION_TYPE: 'invalid_exception', RETURN_IN_ASYNC_GENERATOR: 'return_in_async_generator', }

I also don't like that the message and the id are stored in different locations. I think making the constants into basic data classes would be an improvement:

NO_RETURN_VALUE_EXPECTED = Msg('No return value expected', 'no_return_value_expected') # type: Final MISSING_RETURN_STATEMENT = Msg('Missing return statement', 'missing_return_statement') # type: Final

But I'd really love to design a system where the constant doubles as the id, unless it's overridden by a group definition. If enums won't work, the only other way I can think of to do this is with a descriptor. Should I just give up on this idea?

But I'd really love to design a system where the constant doubles as the id

Why not move all the constants into a dedicated module, and iterate over the module namespace and calculate the id based on the attribute name? This sort of metaprogramming should be okay with mypyc. Something like this:

... FOO_BAR = Msg('Foo bar') MSG_B = Msg('Another message') for name, val in list(globals().items()): if isinstance(val, Msg) and val.id is None: val.id = name.lower()

@chadrik I like what Jukka proposed. We should probably still support passing bare string to self.fail(), this will allow:

Not breaking existing plugins

Gradual transition to the new scheme for existing error messages (I strongly prefer two-three smaller PRs, than one large).

Passing a bare string would turn it into Msg(string, 'uncategorized_error').

Btw, the module may be named message_registry.py :-)

Also as I said above it should be clearly documented (probably both in the new module docstring and in docstring for self.register()) that there are two ways to register an error message: simple static and format strings should just go as constants in the module (and nothing else), error messages with complex formatting logic can be manually registered with @register().

Gradual transition to the new scheme for existing error messages (I strongly prefer two-three smaller PRs, than one large)

How about I start with a new PR that moves all of the existing constants into mypy.message_registry with no other changes? That seems to be one part of the design we've all come to agreement on.

TBH, I don't see how this is important now. Anyway, we agreed that @JukkaL will follow on this and related PRs.

ilevkivskyi · 2019-01-07T16:09:06Z

messages with basic string args to format: currently have constants in mypy.messages (mostly)

Maybe we should fix the "(mostly)" part first? (I mean finish moving the error message constants to mypy.messages.)

JukkaL · 2019-01-11T13:15:33Z

I think that we should discuss how to represent error codes in abstract before looking at merging any code. This is a fairly big change, and it would be good to consider this from a few different viewpoints.

Here are some questions to consider (some of these were raised by Ivan above):

How are new static messages defined?
How are new runtime constructed messages defined?
What would the error message ids look like in mypy output or in the config file (give examples for a few common messages)?
How would the message ids be shown in mypy output?
At what granularity should the message ids be (i.e., should we group similar messages under one id)?
How well does this the approach work with mypyc? In particular, mypyc is bad at compiling certain Python idioms such as decorated methods and enums. We should perhaps either use those idioms sparingly or first improve how mypyc handles them. This can be important since there are many error messages, so whatever approach we choose it will probably have many instances in the code.
How can user plugins define new error messages?
Do we need to do something to avoid conflicts between different plugins (e.g. both define a similar message)?
Does the proposed approach slow down mypy startup? This is probably not an issue, but in case we construct some complex data structures this might plausibly have a measurable effect.

Iterating on a big PR is a lot of work, especially if we need to go through multiple iterations. So I'd suggest creating an issue that describes the current proposed approach in some detail and we can look for answers to the above questions. We can move on from there once we have a clearer picture of the tradeoffs.

ilevkivskyi · 2019-05-08T16:25:22Z

IIUC this is now replaced by #6472, so closing this PR in favour of the latter.

ilevkivskyi reviewed Jan 7, 2019

View reviewed changes

chadrik added 9 commits January 8, 2019 00:03

Add basic prototype of message id tracking

d88fe24

Add --show-error-codes option to display known error codes in messages

51146b0

Add --list-error-codes to list known message error codes

9052981

get all tests passing

69d921a

cleanup

3c3b52f

bugfix

a978e83

add docs for @Tracked

8654084

rename tracked to register

3cec328

don't raise SystemExit in process_options

cfb1913

chadrik force-pushed the message_ids branch from a4bb1f3 to cfb1913 Compare January 8, 2019 08:04

chadrik added 2 commits January 8, 2019 22:06

Prototype of messages and ids as an Enum

3818829

Add id tracking for Enum messages

f721e2d

chadrik mentioned this pull request Jan 15, 2019

Move message constants to mypy.message_registry #6194

Merged

chadrik mentioned this pull request Feb 23, 2019

Add flag to tag known messages with an error code (take 2) #6472

Closed

ilevkivskyi closed this May 8, 2019

Uh oh!

Add option to tag known messages with an error code #6152

Add option to tag known messages with an error code #6152

Uh oh!

Conversation

chadrik commented Jan 6, 2019

Design goal

Current Design

Questions

What's next

Uh oh!

ilevkivskyi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

chadrik Jan 8, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ilevkivskyi Jan 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ilevkivskyi commented Jan 7, 2019

Uh oh!

JukkaL commented Jan 11, 2019

Uh oh!

ilevkivskyi commented May 8, 2019

Uh oh!

Uh oh!

chadrik Jan 8, 2019 •

edited

Loading

ilevkivskyi Jan 14, 2019 •

edited

Loading