Skip to content

Failure to determine correct level bump with custom parser in 9.17.0 #1162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pwrmiller opened this issue Jan 28, 2025 · 7 comments · Fixed by #1165
Closed

Failure to determine correct level bump with custom parser in 9.17.0 #1162

pwrmiller opened this issue Jan 28, 2025 · 7 comments · Fixed by #1165
Labels
awaiting-reply Waiting for response bug Something isn't working properly

Comments

@pwrmiller
Copy link

pwrmiller commented Jan 28, 2025

Bug Report

Description

Different / breaking behavior between 9.16.1 and 9.17.0 in algorithm.py Step 5. Parse commits to determine the bump level that should be applied.

Expected behavior

I would have expected the same behavior between 9.16.1 and 9.17.0 as minor version changes should not exhibit breaking behavior. This seems to be a regression.

Actual behavior

Specifically, in 9.16.1, parsed_levels will contain the results of the parsed levels in the commits_since_last_release, but 9.17.0 will not. It is not entirely clear to me where this is failing due to the complexity of the mapped/filtered set.

Environment

  • Operating System (w/ version): MacOS
  • Python version: 3.12
  • Pip version: 25.0 (using Poetry)
  • Semantic-release version: 9.17.0 has the regression
  • Build tool (w/ version): poetry 1.8.3
poetry show python-semantic-release
 name         : python-semantic-release                           
 version      : 9.16.1                                            
 description  : Automatic Semantic Versioning for Python projects 

dependencies
 - click >=8.0,<9.0
 - click-option-group >=0.5,<1.0
 - dotty-dict >=1.3,<2.0
 - gitpython >=3.0,<4.0
 - importlib-resources >=6.0,<7.0
 - jinja2 >=3.1,<4.0
 - pydantic >=2.0,<3.0
 - python-gitlab >=4.0,<5.0
 - requests >=2.25,<3.0
 - rich >=13.0,<14.0
 - shellingham >=1.5,<2.0
 - tomlkit >=0.11,<1.0


git log --oneline --decorate --graph --all -n 50
"""
*   5947a27 (origin/main, origin/HEAD) Merged PR 34224: some description
|\
| * 484adac (HEAD -> fix/950777-something) fix(XYZ-950777): some description
|/
*   6a320ac (main) Merged PR 34214: feat(XYZ-950230): some description
|\
| * fdc2c4b (feat/950230-ruff-version-for-check) feat(XYZ-950230): some description
|/
* b3d8aab (tag: v3.17.0) chore(release): release v3.17.0 [skip ci]
*   3ba9e71 Merged PR 33609: feat(XYZ-938710): some description
|\
| * eed8b07 feat(XYZ-938710): some description
|/
* 1886245 (tag: v3.16.1, fix/947167-something) chore(release): release v3.16.1 [skip ci]
*   a4ca462 Merged PR 33438: fix(XYZ-945996): some description
|\
| * d3096b5 fix(XYZ-945996): some description
| * 89e7a32 fix(XYZ-945996): some description
| * 241cee5 fix(XYZ-945996): some description
|/
*   eb53d60 Merged PR 32668: docs(XYZ-936992): some description
|\
| * ed1823c docs(XYZ-936992): some description
|/
* 1f4aeac (tag: v3.16.0) chore(release): release v3.16.0 [skip ci]
* e944328 Merged PR 32255: feat(XYZ-877699): some description
* b8f74f1 (tag: v3.15.0) chore(release): release v3.15.0 [skip ci]
*   b16df6e Merged PR 32425: some description
|\
| * dcbcb90 feat(XYZ-929489): some description
| | * 22ecbd7 (origin/feat/877699-something) feat(XYZ-877699): some description
| | * 19a171e feat(XYZ-877699): some description
| | * cc1a1ed feat(XYZ-877699): some description
| | * 2d2b874 feat(XYZ-877699): some description
| | * b734132 feat(XYZ-877699): some description
| | * fbcbab3 feat(XYZ-877699): some description
| | * 6c1b9f1 feat(XYZ-877699): some description
| |/
|/|
* | 44e166e (tag: v3.14.1) chore(release): release v3.14.1 [skip ci]
* |   d3d9c78 Merged PR 32406: fix(XYZ-929227): some description
|\ \
| * | fb90a03 fix(XYZ-929227): some description
|/ /
* | 60427a2 (tag: v3.14.0) chore(release): release v3.14.0 [skip ci]
* |   c9e0ac6 Merged PR 32349: feat: some description
|\ \
| |/
|/|
| * 921073b feat: update something
|/
* eab864b (tag: v3.13.1) chore(release): release v3.13.1 [skip ci]
*   4ab8ff1 Merged PR 32022: increasing minimum aws-cdk-lib version
|\
| * 4552a43 fix(933941): some description
* | 5fc2b5c (tag: v3.13.0) chore(release): release v3.13.0 [skip ci]
* |   651895b Merged PR 31947: feat(XYZ-933508): some description
|\ \
| * | 6a67df4 feat(XYZ-933508): some description
|/ /
| | * 028a9dd (origin/feat/933508-merge-coverage) Initial commit
| | * 99a7bcb feat(XYZ-933508): some description
| |/
|/|
| | * 46f6fb3 (origin/feat/877699-something, feat/877699-something) feat(XYZ-877699): some description
| |/
|/|
* | ad8093f (tag: v3.12.0) chore(release): release v3.12.0 [skip ci]
* |   8dd80e3 Merged PR 31794: feat(XYZ-932586): some description
|\ \
| * | 2ad1331 (feature/932586-something) feat(XYZ-932586): some description
* | | e456008 (tag: v3.11.0) chore(release): release v3.11.0 [skip ci]
* | |   f04be2d Merged PR 31765: XYZ-930364: some description
|\ \ \
| * | | 17a541e (origin/feature/XYZ-930364-something) feat(XYZ-930364): some description
|/ / /
| | | *   f103abc (refs/stash) WIP on feature/XYZ-930364-something: 2bc9a7e some description
| | | |\
| | | | * a4d443c index on feature/XYZ-930364-something: 2bc9a7e some description
| | | |/
| | | * 2bc9a7e (feature/XYZ-930364-something) some description

Configuration

Semantic Release Configuration I use a custom commit parser to support working with Azure Devops.
import re
from typing import Any, Optional

import git
import toml
from semantic_release import (
    CommitParser,
    LevelBump,
    ParsedCommit,
    ParseError,
    ParseResult,
    ParserOptions,
)


class MyCommitParserOptions(ParserOptions):
    def __init__(self, **_: Any):
        # Load whole configuration
        super().__init__(**_)
        pyproject = toml.load("pyproject.toml")

        # Extract commit parsers options
        commit_parser_options = pyproject["tool"]["semantic_release"][
            "commit_parser_options"
        ]

        # Load options as instance variables
        self.allowed_tags = commit_parser_options["allowed_tags"]
        self.minor_tags = commit_parser_options["minor_tags"]
        self.patch_tags = commit_parser_options["patch_tags"]
        self.default_bump_level = LevelBump(commit_parser_options["default_bump_level"])


class MyCommitParser(CommitParser[ParseResult, MyCommitParserOptions]):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.options = ADOCommitParserOptions()

    def parse(self, commit: git.objects.commit.Commit) -> Optional[ParseResult]:
        sentences = commit.message.strip().split("\n\n")
        unique_sentences = list(dict.fromkeys(sentences))
        message = "\n\n".join(unique_sentences)
        allowed_tags_regex_part = "|".join(map(re.escape, self.options.allowed_tags))
        match = re.match(
            rf"^.*({allowed_tags_regex_part})(?:\(([^)]+)\))?: (.+)(?:\n|$)([\s\S]*)",
            message,
        )

        if match:
            type_ = match.group(1)
            scope = match.group(2) or ""
            description = match.group(3)
            rest_of_message = match.group(4)

            descriptions = [description]
            breaking_descriptions = []

            for line in rest_of_message.split("\n"):
                line = line.strip()
                if line.startswith("BREAKING CHANGE: "):
                    breaking_descriptions.append(line.replace("BREAKING CHANGE: ", ""))
                elif line != "":
                    descriptions.append(line)

            bump = self.options.default_bump_level
            if breaking_descriptions:
                bump = LevelBump.MAJOR
            elif type_ in self.options.minor_tags:
                bump = LevelBump.MINOR
            elif type_ in self.options.patch_tags:
                bump = LevelBump.PATCH

            return ParsedCommit(
                bump=bump,
                type=type_,
                scope=scope,
                descriptions=descriptions,
                breaking_descriptions=breaking_descriptions,
                commit=commit,
            )

        else:
            return ParseError(commit=commit, error="Could not parse commit message")
[tool.semantic_release]
assets = []
build_command_env = []
commit_message = "chore(release): release v{version} [skip ci]"
commit_parser = "commit_parser:MyCommitParser"
logging_use_named_masks = false
major_on_zero = true
allow_zero_version = true
no_git_verify = false
tag_format = "v{version}"
version_toml = [
    "pyproject.toml:tool.poetry.version",
]
version_variables = [
    "calabrio_cdk/_version.py:__version__"
]

[tool.semantic_release.branches.main]
match = "(main|master)"
prerelease_token = "rc"
prerelease = false

[tool.semantic_release.changelog]
template_dir = "templates"
changelog_file = "CHANGELOG.md"
exclude_commit_patterns = [
  '''chore(?:\([^)]*?\))?: .+''',
  '''ci(?:\([^)]*?\))?: .+''',
  '''refactor(?:\([^)]*?\))?: .+''',
  '''style(?:\([^)]*?\))?: .+''',
  '''test(?:\([^)]*?\))?: .+''',
  '''build\((?!deps\): .+)''',
  '''Merged? .*''',
]

[tool.semantic_release.changelog.environment]
block_start_string = "{%"
block_end_string = "%}"
variable_start_string = "{{"
variable_end_string = "}}"
comment_start_string = "{#"
comment_end_string = "#}"
trim_blocks = false
lstrip_blocks = false
newline_sequence = "\n"
keep_trailing_newline = false
extensions = []
autoescape = true

[tool.semantic_release.commit_author]
env = "GIT_COMMIT_AUTHOR"
default = "semantic-release <semantic-release>"

[tool.semantic_release.commit_parser_options]
allowed_tags = ["build", "chore", "ci", "docs", "feat", "fix", "perf", "style", "refactor", "test"]
minor_tags = ["feat"]
patch_tags = ["fix", "perf"]
default_bump_level = 0

[tool.semantic_release.remote]
name = "origin"
type = "github"
domain = "dev.azure.com"
api_domain = "dev.azure.com"
ignore_token_for_push = true
insecure = false
url = { env = "AZURE_REPOSITORY_URL" }

[tool.semantic_release.remote.token]
env = "AZURE_TOKEN"

[tool.semantic_release.publish]
dist_glob_patterns = ["dist/*"]
upload_to_vcs_release = false

Build System Configuration

GitHub Actions Job Definition

Execution Log

poetry run semantic-release -vv version --no-vcs-release

9.17.1

           DEBUG    [cmd.execute] Popen(['git', 'rev-list', 'v3.17.0', '--'], cwd=/Users/me/project, stdin=None, shell=False, universal_newlines=False)                                            cmd.py:1253
           INFO     [algorithm._traverse_graph_for_commits] Found 3 commits since the last release!                                                                                                                algorithm.py:112
           DEBUG    [algorithm.next_version] parsed the following distinct levels from the commits since the last release: set()     

9.16.1

           DEBUG    [cmd.execute] Popen(['git', 'rev-list', 'v3.17.0', '--'], cwd=/Users/me/project, stdin=None, shell=False, universal_newlines=False)                                            cmd.py:1253
           DEBUG    [algorithm.dfs] queuing parent commit 6a320ac                                                                                                                                                   algorithm.py:92
           DEBUG    [algorithm.dfs] queuing parent commit b3d8aab                                                                                                                                                   algorithm.py:92
           DEBUG    [algorithm.dfs] queuing parent commit fdc2c4b                                                                                                                                                   algorithm.py:92
           DEBUG    [algorithm.dfs] queuing parent commit b3d8aab                                                                                                                                                   algorithm.py:92
           INFO     [algorithm._traverse_graph_for_commits] Found 3 commits since the last release!                                                                                                                algorithm.py:112
           DEBUG    [algorithm.next_version] parsed the following distinct levels from the commits since the last release: {<LevelBump.PATCH: 2>, <LevelBump.MINOR: 3>}                                   

Additional context

Debugging suggests that perhaps the result of the filter on isinstance for ParsedCommit might be failing, since at this point the parsed_result is already a LevelBump. Though this is not entirely clear due to the complexity of step 5.

The commit parser returns the correct ParsedCommit(s) in both versions of PSR, however, the filtered list in parsed_levels differs between versions.

@pwrmiller pwrmiller added bug Something isn't working properly triage waiting for initial maintainer review labels Jan 28, 2025
@codejedi365
Copy link
Contributor

@pwrmiller, thanks for pointing me to a specific place in the code that you think there is a change, however, this is not enough information for me to understand what is happening and how to replicate.

Please fill in your configuration in the boxes above and provide your commit messages in a git tree so that I can replicate the problem because all of our extensive test cases did not catch/replicate your environment.

Obviously, it was not intended to be a breaking change otherwise I would of bumped it otherwise but people make mistakes.

@codejedi365 codejedi365 added awaiting-reply Waiting for response and removed triage waiting for initial maintainer review labels Jan 28, 2025
@pwrmiller
Copy link
Author

@codejedi365 I've provided additional context on my configuration. My apologies for not providing sufficient context for your review.

@codejedi365
Copy link
Contributor

@pwrmiller, I did find a bug in the code. Basically I was attempting to maintain the legacy return value of the parse() method which is a single ParseResult value--now with the feature to parse squash commits, this has changed to also allow a list[ParseResult] objects to be returned. To prevent a breaking change and maintain compatibility, I attempted to detect and cast all return values to a list with isinstance(result, Iterable). This ends up not working correctly because at the heart of it a ParseCommit extends the NamedTuple class which is iterable itself (which I overlooked as it's my first real time using NamedTuples). The custom parser with the single return value was the different scenario that was not covered in testing with the last release.

I've came up with an initial fix to apply more stricter type checking from return values of parsers, followed with a better type cast.

@codejedi365
Copy link
Contributor

codejedi365 commented Jan 29, 2025

Separately, I wanted to offer a few suggestions to your custom parser implementation.

  1. I don't know if you happened to see my note on issue Commit parsers to handle merge commit prefixes #540 related to Asure DevOps, but it is my general recommendation to automatically ignore merge commits as they generally create duplicative information in changelogs. It looks like one of the features is that this parser removes the merge commit prefix but you have some other stuff happening. I recommend re-evaluating because the downside with a custom parser is you will miss out on a lot of the new updates and implementations I have been releasing over time.

  2. Here is a few changes that I would make to your implementation. First off, PSR will automatically load your configuration from pyproject.toml and pass it into your options object (if it is defined in the parser class, see below) so there is no need to load that part yourself. I don't see any unique options defined so I just had it extend AngularParserOptions as that will inherit all of your expected values. If you want to hard code them instead then I would just copy the fields to define the dataclass rather than using the init function. The base CommitParser class will automatically load the options so all you need on the parser init is to call the super class.

+ from semantic_release.commit_parser.angular import AngularParserOptions
+ from semantic_release.helpers import force_str

+ class MyCommitParserOptions(AngularParserOptions):
- class MyCommitParserOptions(ParserOptions):
-      def __init__(self, **_: Any):
-          # Load whole configuration
-          super().__init__(**_)
-          pyproject = toml.load("pyproject.toml")
-
-          # Extract commit parsers options
-          commit_parser_options = pyproject["tool"]["semantic_release"][
-              "commit_parser_options"
-          ]
-
-          # Load options as instance variables
-          self.allowed_tags = commit_parser_options["allowed_tags"]
-          self.minor_tags = commit_parser_options["minor_tags"]
-          self.patch_tags = commit_parser_options["patch_tags"]
-          self.default_bump_level = LevelBump(commit_parser_options["default_bump_level"])


  class MyCommitParser(CommitParser[ParseResult, MyCommitParserOptions]):
+
+    parser_options = ADOCommitParserOptions   # removed in v10
+
-     def __init__(self, **kwargs):
-         super().__init__(**kwargs)
-         self.options = MyCommitParserOptions()
+     def __init__(self, options: MyCommitParserOptions | None = None) -> None:
+         super().__init__(options)
+
+     @staticmethod
+     def get_default_options() -> MyCommitParserOptions:
+         return MyCommitParserOptions()

      def parse(self, commit: git.objects.commit.Commit) -> Optional[ParseResult]:
-         sentences = commit.message.strip().split("\n\n")
+         sentences = force_str(commit.message).strip().split("\n\n")
          unique_sentences = list(dict.fromkeys(sentences))
          message = "\n\n".join(unique_sentences)
          allowed_tags_regex_part = "|".join(map(re.escape, self.options.allowed_tags))
          match = re.match(
              rf"^.*({allowed_tags_regex_part})(?:\(([^)]+)\))?: (.+)(?:\n|$)([\s\S]*)",
              message,
          )

          if match:
              type_ = match.group(1)
              scope = match.group(2) or ""
              description = match.group(3)
              rest_of_message = match.group(4)

              descriptions = [description]
              breaking_descriptions = []

              for line in rest_of_message.split("\n"):
                  line = line.strip()
                  if line.startswith("BREAKING CHANGE: "):
                      breaking_descriptions.append(line.replace("BREAKING CHANGE: ", ""))
                  elif line != "":
                      descriptions.append(line)

              bump = self.options.default_bump_level
              if breaking_descriptions:
                  bump = LevelBump.MAJOR
              elif type_ in self.options.minor_tags:
                  bump = LevelBump.MINOR
              elif type_ in self.options.patch_tags:
                  bump = LevelBump.PATCH

              return ParsedCommit(
                  bump=bump,
                  type=type_,
                  scope=scope,
                  descriptions=descriptions,
                  breaking_descriptions=breaking_descriptions,
                  commit=commit,
              )

          else:
              return ParseError(commit=commit, error="Could not parse commit message")

@codejedi365 codejedi365 changed the title Regression between 9.17.0 and 9.16.1 in commit parsing to determine level bump (Step 5. in algorithm.py) Failure to determine correct level bump with custom parser in 9.17.0 Jan 29, 2025
@pwrmiller
Copy link
Author

@codejedi365 thank you so much for the detailed review and recommendations on my commit parser. I'll take a look on my end into avoiding the custom CommitParser as recommended. We were doing this to filter out certain commits that weren't matching any pattern, if I recall, to avoid version bumps that didn't make sense for a project that moved towards semantic versioning later in its life.

I'll be revisiting that re.match to see if it's doing anything special related to our earlier commits, but I don't see anything obvious other than perhaps dealing with formatting for multi-line commit messages (those with a more verbose description) since we wanted to see those in our CHANGELOG.md for completeness (these are auto-generated by your library after a certain release number, then pushed up to a static site with mkdocs).

@codejedi365
Copy link
Contributor

codejedi365 commented Jan 30, 2025

We were doing this to filter out certain commits that weren't matching any pattern, if I recall, to avoid version bumps that didn't make sense for a project that moved towards semantic versioning later in its life.

I would check out exclude_commit_patterns as well when it comes to changelog content if you use the init mode for changelog generation, but you likely will not need to consider older messages if you migrate to the update mode.

I'll be revisiting that re.match to see if it's doing anything special related to our earlier commits, but I don't see anything obvious other than perhaps dealing with formatting for multi-line commit messages (those with a more verbose description) since we wanted to see those in our CHANGELOG.md for completeness (these are auto-generated by your library after a certain release number, then pushed up to a static site with mkdocs).

If it is a more verbose description, then you may want to look at issue #1132 which discusses how to customize the changelog output. We now support parsing squash commits if that is what you were describing as a more verbose description. The default changelog currently prints out the entire commit body but in v10 it will only display the subject/summary line.

codejedi365 added a commit to codejedi365/python-semantic-release that referenced this issue Jan 30, 2025
resolves the error where the casting to a list was incorrectly
implemented causing single results back from custom parsers
(legacy return value) to be mis-interpreted which meant no parse
level would be determined.

Resolves: python-semantic-release#1162
codejedi365 added a commit to codejedi365/python-semantic-release that referenced this issue Jan 30, 2025
resolves the error where the casting to a list was incorrectly
implemented causing single results back from custom parsers
(legacy return value) to be mis-interpreted which meant no parse
level would be determined.

Resolves: python-semantic-release#1162
codejedi365 added a commit to codejedi365/python-semantic-release that referenced this issue Jan 30, 2025
resolves the error where the casting to a list was incorrectly
implemented causing single results back from custom parsers
(legacy return value) to be mis-interpreted which meant no parse
level would be determined.

Resolves: python-semantic-release#1162
codejedi365 added a commit to codejedi365/python-semantic-release that referenced this issue Jan 30, 2025
resolves the error where the casting to a list was incorrectly
implemented causing single results back from custom parsers
(legacy return value) to be mis-interpreted which meant no parse
level would be determined.

Resolves: python-semantic-release#1162
@codejedi365
Copy link
Contributor

🎉 This issue has been resolved in Version 9.18.0 🎉

You can find more information about this release on the GitHub Releases page.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-reply Waiting for response bug Something isn't working properly
Projects
None yet
2 participants