-
-
Notifications
You must be signed in to change notification settings - Fork 8.2k
tools/verifygitlog.py: Apply stricter rules on git subject line. #15823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
tools/verifygitlog.py: Apply stricter rules on git subject line. #15823
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #15823 +/- ##
=======================================
Coverage 98.54% 98.54%
=======================================
Files 169 169
Lines 21890 21890
=======================================
Hits 21572 21572
Misses 318 318 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
423269c
to
37ec6a4
Compare
Code size report:
|
I support whatever tooling makes it easier for you to generate changelogs(*), but I am concerned that the "Subject prefix must correspond to a path" requirement will make certain kinds of commit messages less informative generally, and might feel pretty punitive for contributors to bounce off in CI. Based on a quick survey of recent history(^), the commit message I'm most concerned about are: Commit title is summarising multiple things with a common prefixThe prefix is meaningful, but the the actual prefix name doesn't exist. Some examples:
For any of these examples, changing them to confirm to the rule would seemingly make the title a significantly worse summary of the change. Commit makes some small change in multiple placesIn particular, multiple ports. Example:
You could argue that this should be split into N commits, but in a lot of cases I'd argue that makes the commit history much noisier for understanding what a particular change is (because need to figure out retrospectively which changes were applied atomically, rather than having them grouped in a commit). What if the prefix requirement was:
This should still allow programmatic extraction of which section a commit applies to in the changelog, and I am guessing that is the main goal here? Separate suggestion, but after they're decided please enumerate all of these rules very clearly in the CODECONVENTIONS doc, and maybe with a few more examples. MicroPython is already quite strict regarding commit message formats, and having this kind of detail arbitrarily rejected (from a new contributor's point of view) can be very dispiriting - especially if it's not obvious what exactly will be accepted. (*) Note there also some third party tools we could lean on for this stuff, i.e. https://git-cliff.org/ (^) My survey method was |
I realised this was my assumption rather than stated as the reason. Is there a different reason? |
Angus' examples above of historical commits that don't match real paths is quite relevant to my experience -- I often find it hard to know which path to use for a given commit, and e.g. using a prefix was a good escape hatch. But I'm not sure I'm entirely convinced that the filename is useful here -- when I'm looking through the history I want to know the conceptual area that was changed, rather than the exact file (we're just duplicating information that the tools already know). As an example, let's say I was changing the code in If the purpose of the subject prefix is to make it easy e.g. to generate the release notes then maybe we should come up with a finite, curated set of valid areas/tags, and then the validation is just against that list. (Similar to what we did to get the board json features somewhat under control). We might even be able to generate that set from the existing git log. Worth mentioning while we're here -- I did spend considerable time a while back trying to make https://github.com/jorisroovers/gitlint work for this repo (and some other tools too), but could not find a way to capture our rules. The change in this PR would make it even less likely to be able to move to a third-party tool. |
Can you add these rules, or examples thereof, to the commit message so that it's clear what this actually does? In any case, more often than not when I'm writing a commit message I look at the files' history to figure out what the proper prefix is. So it would definitely not be bad to have a clear and fixed set of rules there.
Indeed; I assume
I could see that being clear, but since none of the previous commits modifying the scheduler do that breaking consistency would also not be ideal. |
The arguments Angus brought up are valid in my opinion. There are quite a few commits which are not limited to a single file. And sometimes even if it is a single file it may be inconvenient to use when the path name is long. Making a good and informative commit message is a challenge, like making a good headline in the newspaper business. The length is limited, it shall carry the core message and still be acceptable language. More restrictions make it even harder. Having a clear message in the commit message seems to me more important than calling a single file that has been affected. However, if it is a single file, it can be named. Other examples where you would break the single file rule would be a change to a xyz.c file and the related xxx.h header file(s). So one would mention the xyz.c file in the commit message, but not the other changes. Or consider changes in the configuration split between a mpconfigport.h and mpconfigboard.h pair. Which file to mention? Or recently the change to the .mk files handling variants. |
Hmm, I really didn't expect this to be so controversial! Most of the rules here are already followed implicitly and my aim was to simply encode them so that they are properly enforced. Eg sometimes a contributor starts a commit message with Doing a quick check of the 1000 most recent commits, about 98% of them conform to the rules here (would need to do a proper count to get a more accurate number, because some files were renamed and then the check fails on them after the rename). That's pretty consistent, and IMO worth making the remaining 2% also conform.
Yes, I missed this, it should accept a comma-separated list of valid prefixes.
Yes, good idea. |
Apologies for derailing the discussion a bit but since an issue for this topic is already here... Can this be also updated to not discard the commit message if there's a typo in there? It's really frustrating to type a 10-12 lines commit message only for it to disappear due to a typo, it should probably be saved somewhere else. The typo doesn't even have to be in the commit message - even names of files that are not scheduled for commit trigger the rejection. Ideally the checks should occur between line 1 and wherever the sign-off should be. If needed I can submit a new issue specifically for those two bits. |
This one is a git pre-commit hook thing, rather than a If you can see a better way to retain the message when using pre-commit then we could also add that. Maybe echoing the old message to the console, or even just a clue about "Existing message saved in {path}" would help a lot?
This seems like a bug in the pre-commit hook for sure, could you open an issue for it please? |
I don't want this to sound overly argumentative, but what problem is caused by these occasional inconsistencies? Is there some task that will become a lot easier with stricter enforcement here? If there is, then absolutely in favour of adding rules to make that thing easier (assuming that's the best way to improve it). As @jimmo said, the git commit already has information about paths, etc. in the actual commit data. There's no end of options like On the other hand, like @robert-hh said, commit messages have a human component - they're subjective metadata by their nature. They give the author a chance to communicate intent, context, and similar things which aren't in the actual commit data. Those things are often slightly fuzzy. Moreover, there's no set of mechanistic rules that will force someone to write a clear commit message if they're not inclined to do that. On the other hand, as discussed above, there might be excessively strict rules that force someone to write a less communicative commit message in order to get it past the checks. Should also not rule out the increased frustration that strict and arbitrary-seeming hook & CI failures create for contributors. To be clear, I'm not arguing in favour of messy git commit logs or that commit messages don't matter. I'm mostly saying that it seems to me MicroPython already has a very good commit history! So if we're going to make the bots stricter, it should be in return for significant other benefit(s). |
Imo the benefit of consistency alone is worth pursuing and significant enough. Makes reading easier, and writing as well. And since the vast majortity of the messages are in fact correct already, it is unlikely that this is going to lead to substantial frustration? |
@projectgus Thanks for the info about recovering the commit message - I didn't know that was possible.
Something like that or the git command line to recover the commit would help me for sure, but maybe I'm the exception here.
Done. |
No I second your pain, I hate losing a carefully scripted commit message because of a pre-commit fail. That's one of the things I love about the smartgit gui app I use, it remembers the message and reuses it by default if the commit doesn't go through. I didn't realise there's possible ways to do the same on command line, I'll have to look them up too |
From the perspective of someone who's run afoul of the commit message formatting a number of times:
|
Well I'll chime in for the other side and say the rules were already too strict (and the error messages WAY too vague) and making them stricter is going to definitely push more people away from contributing... even with clearer error messages. For example... leaving a Also, I feel if the initial commit message used to raise a PR follows the correct formatting, then subsequent commit messages in a PR should not have to, if it's something minor like "fixing a code formatting fail" - like when I get told I can't use |
Let me give some recent examples of PRs and their commit messages (I'm not intending to single out people here, it's just that these were recent PRs I processed that are good examples of why I wanted to make the rules stricter): #15822This is a very simple, small PR, and it touches only
That should be This may seem like a very small inconsistency but when you scan through the commit history and some things start with (Note that I rewrote this commit message to start with #13572This PR has two commits. Both of them touch exactly the same set of files (
I'm really not sure why they are different like that, but IMO they should both be the same and both start with #15798This was titled:
I rewrote it to start with #15854This was titled:
I rewrote it to start with #12283The two commits in this PR were:
In addition to starting with
All the above examples pass the current checks but would have failed the checks added in this PR. Of course this PR doesn't check grammar/tense, that would be very hard to do (maybe with an LLM). So there's always going to be some level of human checking needed, if we want to keep consistency. Maybe I'm being very picky here? I actually don't mind tweaking commit messages during merge, and I do like the consistency it brings. But moving forward, if other maintainers are going to be merging commits, it would be good to have a set of rules in place around what is an acceptable commit message. This PR tries to do that in a simple way using path names (although still needs some exceptions to the rule). Maybe there's a better way to force consistency without being as strict. |
In a sense, yes, but in the end it's just different semantics to say 'we want consistency' :) As mentioned earlier: imo what is lacking from this commit is spelling out more clearly what is and what is not allowed. For example codeconventions.md still says "It's also ok to drop file extensions." whereas this commit enforces a slightly different rule. |
Hrm, would something like vale together with languagetool help here? Given that it won't work 100% of the time, maybe having extra checks in the precommit hooks using those tools can still be helpful. Just run them only if the appropriate tools are installed on the system and report a warning instead of an error. It's still better than nothing, especially for people who aren't native English speakers :) Vale already has an integration for GitHub CI, not sure about languagetool. |
37ec6a4
to
c077308
Compare
There is a bit of ambiguity as to how the prefix of the git subject line should look like. Eg `py/vm: ...` vs `py/vm.c: ...` (whether the extension should be there or not). This commit makes the existing CI check of the git commit message stricter, by applying extra rules to the prefix, the bit before the : in the subject line. Full error messages are given when a rule does not pass. This helps to reduce maintainer burden by applying stricter rules, to keep the git commit history consistent. Signed-off-by: Damien George <damien@micropython.org>
c077308
to
50e9378
Compare
I've updated this to not be as strict. It no longer checks that the prefix of the subject line is a file in the repo. So now it just checks:
|
Summary
There is a bit of ambiguity as to how the prefix of the git subject line should look like. Eg
py/vm: ...
vspy/vm.c: ...
(whether the extension should be there or not).This commit makes the existing CI check of the git commit message stricter, by applying extra rules to the prefix, the bit before the
:
in the subject line. Full error messages are given when a rule does not pass.This helps to reduce maintainer burden by applying stricter rules, to keep the git commit history consistent.
Testing
Tested by making a dummy commit with various errors in the subject prefix.