Handle (pass-through) character and entity references in TOC parser #612
Interesting, good catch! Would you be able to add a test and a quick mention in the release notes?
Testing for more entities would require the input to be unicode...
This is great, thanks very much!
It should be Unicode already. The TOC is initially returned by Markdown and the Markdown library is all Unicode all the time. Additionally, due to a recent change, all literal strings defined in MkDocs are also Unicode as every module uses Unicode literals ( Actually, I see that the tests wrap the output with
I tried your suggestion, and you are right, however contrary to my belief markdown does not seem to convert e.g.

    def test_charref(self):
        md = dedent("""
        # Heading © 1
        """)
        expected = dedent("""
        Heading © 1 - #heading-1
        """)
        toc = markdown_to_toc(md)
        self.assertEqual(text_type(toc).strip(), expected)
I don't think Python Markdown has ever auto converted things like that. As I recall, you need to use the correct HTML entity. Or you can write a Python Markdown extension to convert those for you.
I have noticed the ampersand in a header like this:
converted to &amp; by markdown, is dropped in the resulting TOC:
This patch fixes this by adding handle_entityref (and handle_charref, while we're at it) to mkdocs.toc.TOCParser, which pass through such entities unchanged.
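
The pass-through idea can be sketched with the standard library's HTMLParser. This is only an illustration, not the actual mkdocs.toc.TOCParser code: the class HeadingTextParser and the helper heading_titles are hypothetical names, the sketch only collects the text of h1 elements, and it assumes Python 3, where convert_charrefs=False is needed for the two handlers to be called at all.

```python
from html.parser import HTMLParser


class HeadingTextParser(HTMLParser):
    """Hypothetical stand-in for a TOC parser: collects the text of <h1> elements."""

    def __init__(self):
        # convert_charrefs=False makes the parser report references through
        # handle_entityref/handle_charref instead of decoding them itself.
        super().__init__(convert_charrefs=False)
        self.in_heading = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.in_heading = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "h1":
            self.in_heading = False

    def handle_data(self, data):
        if self.in_heading:
            self.titles[-1] += data

    def handle_entityref(self, name):
        # Pass a named reference such as &amp; through unchanged.
        if self.in_heading:
            self.titles[-1] += "&%s;" % name

    def handle_charref(self, name):
        # Pass a numeric reference such as &#169; through unchanged.
        if self.in_heading:
            self.titles[-1] += "&#%s;" % name


def heading_titles(html):
    """Return the text of every <h1> in html, with references left as written."""
    parser = HeadingTextParser()
    parser.feed(html)
    parser.close()
    return parser.titles
```

Without the two reference handlers, a parser configured this way would receive only the plain text pieces around each reference in handle_data, so the ampersand would silently vanish from the collected title, which matches the behaviour described above.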