Skip to content

Conversation

rickeylev
Copy link
Collaborator

@rickeylev rickeylev commented Jun 25, 2025

When parsing the local identifier segment of <digit><letter> the parser would
give an error saying the letter was unexpected.

What was happening was accept_digits() consumed up to the first non-digit, and
considered this success, which prevented calling accept_alnum() to finish
the parsing.

To fix, only call accept_alnum(), then post-process the value to normalize
an all-digit segment.

I'm guessing accept_digits() stopping at the first non-digit is WAI because
it expects to parse e.g. "3.14b", where the caller handles subsequent
characters.

Along the way, some minor doc improvements to the parser code.

Fixes #3030

@rickeylev rickeylev force-pushed the fix.version.parsing.normalized.local branch from 6e85b32 to f655a46 Compare June 25, 2025 23:41
@rickeylev rickeylev changed the title wip: test for version parsing local with fix: parsing local version with digit followed by non-digits Jul 4, 2025
@rickeylev
Copy link
Collaborator Author

The commit 9d5360b in this PR fixes it, but it's not obvious to me how. I had an AI churn away on it for a bit and that's what it came up with. (it might have also fixed a bug with comparing int and non-int segments ?)

It did give me an idea of what might be going wrong, though:

accept_alnum() is implemented using accept_digits() or accept_alnum(). I think accept_digits() consumes as many digits as it can, normalizes the string to a number (e.g. "007" -> 7), and then lets parsing continue from where it left off (accept_digits() returns true which would prevent the or accept_alnum() from firing). This leaves the parser in the state of having a "stray" alpha character as the next input char, which can't parse. I'm guessing the parser state is thinking the segment is done parsing, so it's expecting e.g. ".", not e.g. "e".

I'm assuming intended behavior is:

  • 1.0+brt.9e -> treat "9e" as a plain string, e.g. (1, 0, "brt", "9e")
  • 1.0+brt.99 -> treat "99" as the integer 99, e.g. (1, 0, "brt", 99)

i.e. segment of all digits gets converted to an integer, otherwise its a literal string

So, I think the proper fix would be to change the segment parsing to something like:

  • if it starts with a non-digit, consume up to a separator, treat all of it as plain string.
  • if it starts with a digit, consume up to a separator or non-digit.
    • If it stopped at a separator, convert accumulated segment to integer.
    • Otherwise, act as if we had treated it as a plain string from the start (e.g. backtrack or just finish parsing as a plain string).

If I squint, I the AI code looks like it's sort of doing that. It sort of unrolled it in a weird, way, though, which I don't like.

@rickeylev rickeylev force-pushed the fix.version.parsing.normalized.local branch from 9d5360b to ba2d225 Compare July 7, 2025 03:12
@rickeylev rickeylev marked this pull request as ready for review July 7, 2025 03:24
@rickeylev rickeylev requested a review from aignas as a code owner July 7, 2025 03:24
@rickeylev
Copy link
Collaborator Author

PTAL.

My analysis was correct. Switched it to consume alnum and detect if it was all-digits afterwards.

Copy link
Collaborator

@aignas aignas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@rickeylev rickeylev added this pull request to the merge queue Jul 7, 2025
Merged via the queue into bazel-contrib:main with commit b2c3926 Jul 7, 2025
3 checks passed
@rickeylev rickeylev deleted the fix.version.parsing.normalized.local branch August 17, 2025 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1.5.0-rc3 fails when parsing local identifiers starting with digits followed by non digits
2 participants