Handle (pass-through) character and entity references in TOC parser #612

zmousm · 2015-06-07T19:53:13Z

I have noticed the ampersand in a header like this:

### Tips & tricks

converted to &amp;amp; by markdown, is dropped in the resulting TOC:

Tips  tricks

This patch fixes that by adding handle_entityref (and handle_charref, while we're at it) to mkdocs.toc.TOCParser; both pass such references through unchanged.
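For illustration, the pass-through idea can be sketched with the standard library's HTMLParser. This is a minimal sketch, not the actual patch: HeadingTextParser and heading_text are hypothetical names standing in for mkdocs.toc.TOCParser, and convert_charrefs=False is an assumption needed on Python 3.5+, where the parser would otherwise decode references before these handlers are called.

```python
from html.parser import HTMLParser


class HeadingTextParser(HTMLParser):
    """Hypothetical stand-in for mkdocs.toc.TOCParser; collects heading text."""

    def __init__(self):
        # Assumption: on Python 3.5+ references are decoded by default, so
        # ask the parser to report them as separate events instead.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        # Plain text between tags.
        self.parts.append(data)

    def handle_entityref(self, name):
        # A named reference such as &amp; arrives here as name == "amp".
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        # A numeric reference such as &#169; arrives here as name == "169".
        self.parts.append("&#%s;" % name)


def heading_text(html):
    """Return the text of an HTML fragment with references left intact."""
    parser = HeadingTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

With these two handlers, heading_text("<h3>Tips &amp;amp; tricks</h3>") keeps the reference in place; without them the base class ignores it, which matches the "Tips  tricks" symptom above.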

d0ugal · 2015-06-07T21:18:04Z

Interesting, good catch! Would you be able to add a test and a quick mention in the release notes?

zmousm · 2015-06-08T02:28:30Z

Testing for more entities would require the input to be unicode...

d0ugal · 2015-06-08T08:57:58Z

This is great, thanks very much!

waylan · 2015-06-09T12:54:45Z

> Testing for more entities would require the input to be unicode...

It should be Unicode already. The TOC is initially returned by Markdown and the Markdown library is all Unicode all the time. Additionally, due to a recent change, all literal strings defined in MkDocs are also Unicode as every module uses Unicode literals (from __future__ import unicode_literals), including the test modules. So, unless MkDocs converts the Unicode string to a byte string somewhere (which would be a bug), then it should be Unicode.

Actually, I see that the tests wrap the output with str(). They should probably be using mkdocs.utils.text_type() which will always return a Unicode string regardless of Python version.

zmousm · 2015-06-09T16:16:50Z

I tried your suggestion, and you are right. However, contrary to my belief, markdown does not seem to convert e.g. © to an entity; it returns the Unicode character in the TOC, so a test like this fails:

    def test_charref(self):
        md = dedent("""
        # Heading © 1
        """)
        expected = dedent("""
        Heading &copy; 1 - #heading-1
        """)
        toc = markdown_to_toc(md)
        self.assertEqual(text_type(toc).strip(), expected)

facelessuser · 2015-06-09T16:23:27Z

I don't think Python Markdown has ever auto converted things like that. As I recall, you need to use the correct HTML entity. Or you can write a Python Markdown extension to convert those for you.
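As a rough sketch of what such a conversion could look like, the standard library's html.entities.codepoint2name table maps code points to entity names. The helper below is hypothetical (to_named_entities is not part of Python Markdown or MkDocs) and is plain string processing rather than a Markdown extension; hooking it into Markdown is left out.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace non-ASCII characters with named HTML entities where one exists."""
    out = []
    for ch in text:
        code = ord(ch)
        name = codepoint2name.get(code)
        # Only touch non-ASCII characters, so Markdown syntax such as
        # "&", "<" and ">" in the source is left alone.
        if code > 127 and name:
            out.append("&%s;" % name)
        else:
            out.append(ch)
    return "".join(out)
```

Applied to "Heading © 1" this gives "Heading &amp;copy; 1"; characters with no named entity are left unchanged.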

d0ugal added the Bug label Jun 7, 2015

d0ugal added this to the 0.14.0 milestone Jun 7, 2015

zmousm added 3 commits June 8, 2015 05:14

Handle (pass-through) character and entity references in TOC parser

10e0798

A test for TOCParser handle_entityref

9379ce6

Add a note about mkdocs.toc.TOCParser handle_entityref() and handle_charref() in release notes.

0a152cf

zmousm force-pushed the master branch from 0656fce to 0a152cf Compare June 8, 2015 02:16

d0ugal added a commit that referenced this pull request Jun 8, 2015

Merge pull request #612 from zmousm/master

2c1d166

Handle (pass-through) character and entity references in TOC parser

d0ugal merged commit 2c1d166 into mkdocs:master Jun 8, 2015
