Skip to content

Ignore first element if it is a comment when determining page title #3956

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mschoettle
Copy link

@mschoettle mschoettle commented Apr 15, 2025

When a Markdown document starts with a comment (for example, a copyright notice, such as one gets when adopting REUSE, the heading is not found.

Comments are converted to HTML placeholders by markdown.
Change the logic to ignore the first element if it is a placeholder (in case it comes before the first-level heading).

@oprypin
Copy link
Contributor

oprypin commented Apr 15, 2025

I can assure you that this break was not accidentally mis-indented. Accepting a heading only at the very beginning folows the same idea as the original behavior since the year 2017. I had to comment because this is written as if prior to #3191 this title in your case would have been found, but that's not the case.

Allowing a comment, if there's some problem with it, should probably be a much more targeted change.

@mschoettle mschoettle changed the title Break only when h1 tag is finding when extracting a page title Break only when h1 tag is found when extracting a page title Apr 15, 2025
@mschoettle
Copy link
Author

I can assure you that this break was not accidentally mis-indented. Accepting a heading only at the very beginning folows the same idea as the original behavior since the year 2017. I had to comment because this is written as if prior to #3191 this title in your case would have been found, but that's not the case.

Allowing a comment, if there's some problem with it, should probably be a much more targeted change.

It was not my intention to suggest that #3191 broke previous behaviour. Sorry if that came across. I was trying to figure out where the title is determined and found the most recent change, so I did not look at original behaviour at all. I updated the description.

The way I understood "report the first H1 tag" in the PR description was that it would look until it found it. I agree that the heading should come first. I'll try to figure something out that still expects the heading to come first but allows for comments.

@mschoettle mschoettle changed the title Break only when h1 tag is found when extracting a page title Ignore first element if it is a comment when determining page title Apr 15, 2025
@mschoettle
Copy link
Author

I reverted my change and added a more targeted change to skip the first element if it is an HTML placeholder.

@oprypin
Copy link
Contributor

oprypin commented Apr 15, 2025

Nice :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants