Handle (pass-through) character and entity references in TOC parser #612

Merged (3 commits) on Jun 8, 2015

Contributor

zmousm commented Jun 7, 2015

I have noticed that the ampersand in a header like this:

### Tips & tricks

converted to &amp; by Markdown, is dropped in the resulting TOC:

Tips  tricks

This patch fixes the problem by adding handle_entityref (and handle_charref, while we're at it) to mkdocs.toc.TOCParser; both handlers pass such references through unchanged.
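To illustrate the idea, here is a minimal sketch built on the standard library's HTMLParser. TitleCollector and collect_titles are hypothetical names used only for this example; this is not the actual mkdocs.toc.TOCParser code. It assumes Python 3, where convert_charrefs=False has to be passed explicitly for the reference handlers to be called at all.

```python
from html.parser import HTMLParser


class TitleCollector(HTMLParser):
    """Hypothetical stand-in for a TOC parser: collects the text of <a> tags."""

    def __init__(self):
        # With the Python 3.5+ default (convert_charrefs=True) the two
        # reference handlers below would never be called.
        super().__init__(convert_charrefs=False)
        self.in_anchor = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.in_anchor = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self.titles[-1] += data

    def handle_entityref(self, name):
        # Named reference such as &amp;: pass it through unchanged.
        self.handle_data("&%s;" % name)

    def handle_charref(self, name):
        # Numeric reference such as &#169;: pass it through unchanged.
        self.handle_data("&#%s;" % name)


def collect_titles(html):
    """Return the text of every <a> element found in the HTML fragment."""
    parser = TitleCollector()
    parser.feed(html)
    parser.close()
    return parser.titles
```

HTMLParser's default handle_entityref and handle_charref do nothing, so without the two overrides the reference is silently skipped and the title comes out as "Tips  tricks" with a double space.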

Member

d0ugal commented Jun 7, 2015

Interesting, good catch! Would you be able to add a test and a quick mention in the release notes?

d0ugal added the Bug label Jun 7, 2015
d0ugal added this to the 0.14.0 milestone Jun 7, 2015
Contributor Author

zmousm commented Jun 8, 2015

Testing for more entities would require the input to be unicode...

d0ugal added a commit that referenced this pull request Jun 8, 2015
Handle (pass-through) character and entity references in TOC parser
d0ugal merged commit 2c1d166 into mkdocs:master Jun 8, 2015
Member

d0ugal commented Jun 8, 2015

This is great, thanks very much!

Member

waylan commented Jun 9, 2015

> Testing for more entities would require the input to be unicode...

It should be Unicode already. The TOC is initially returned by Markdown and the Markdown library is all Unicode all the time. Additionally, due to a recent change, all literal strings defined in MkDocs are also Unicode as every module uses Unicode literals (from __future__ import unicode_literals), including the test modules. So, unless MkDocs converts the Unicode string to a byte string somewhere (which would be a bug), then it should be Unicode.

Actually, I see that the tests wrap the output with str(). They should probably be using mkdocs.utils.text_type() which will always return a Unicode string regardless of Python version.
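For readers unfamiliar with the pattern, the following is a rough sketch of how a text_type alias is commonly defined and why it differs from str() on Python 2. The alias definition and the FakeToc class are assumptions made for illustration; the real mkdocs.utils.text_type may be defined differently.

```python
import sys

# Hypothetical compatibility alias; MkDocs's own definition may differ.
PY2 = sys.version_info[0] == 2
text_type = unicode if PY2 else str  # noqa: F821 (unicode exists only on Python 2)


class FakeToc(object):
    """Hypothetical stand-in for a TOC object whose text is not pure ASCII."""

    def __str__(self):
        return u"Heading \u00a9 1 - #heading-1"

    # Python 2 looks up __unicode__ when unicode(obj) is called.
    __unicode__ = __str__


# text_type() yields a Unicode string on both Python 2 and Python 3.
rendered = text_type(FakeToc())
```

On Python 2, str() is expected to return a byte string, which is why wrapping output in str() gets in the way of testing non-ASCII headings.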

Contributor Author

zmousm commented Jun 9, 2015

I tried your suggestion, and you are right. However, contrary to my belief, Markdown does not seem to convert e.g. © to an entity; it returns the Unicode character in the TOC, so a test like this fails:

    def test_charref(self):
        md = dedent("""
        # Heading © 1
        """)
        expected = dedent("""
        Heading © 1 - #heading-1
        """)
        toc = markdown_to_toc(md)
        self.assertEqual(text_type(toc).strip(), expected)

@facelessuser
Contributor

I don't think Python Markdown has ever auto-converted things like that. As I recall, you need to write the correct HTML entity yourself, or you can write a Python Markdown extension to convert those for you.
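As a rough illustration of the conversion step such an extension might perform, the sketch below maps non-ASCII characters to named HTML entities using only the standard library. to_named_entities is a hypothetical helper, not part of Python Markdown, and hooking it into an extension is left out.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace non-ASCII characters that have a named HTML entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Leave ASCII alone so markup characters such as & and < are untouched.
        if ord(ch) > 127 and name:
            out.append("&%s;" % name)
        else:
            out.append(ch)
    return "".join(out)
```

For example, to_named_entities("Heading © 1") gives "Heading &copy; 1", which is the form a TOC parser with pass-through handlers would then preserve.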

4 participants