Skip to content

Add optional pypandoc rst parsing #220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

chsasank
Copy link

@chsasank chsasank commented Apr 7, 2017

Please see #219 for discussion on this. Fix #219.

Can somebody point to me where I should add optional requirements.

@lesteve
Copy link
Member

lesteve commented Apr 8, 2017

Please use "Fix #issueNumber" this way the associated issue gets closed automatically when the PR is merged. For more details, look at this.

I have edited your description but please remember to do it next time.

@lesteve
Copy link
Member

lesteve commented Apr 8, 2017

I think we would need tests for this and probably a new entry in the build matrix with pypandoc and pandoc installed. The latter can be installed through anaconda.

@lesteve
Copy link
Member

lesteve commented Apr 8, 2017

Also I am wondering what happens if you have pypandoc but not pandoc installed. Is there a clear error message?

@chsasank
Copy link
Author

chsasank commented Apr 8, 2017

If pypandoc is installed but not pandoc, OSError is raised.
That's why two errors are caught. We can be logging which implementation is used like this

try:
    import pypandoc
    print('Using pandoc for converting rst to md in notebooks')
except (ImportError, OSError) as e:
    print('Pandoc not found. Error: {}. Falling back to regex based parser'.format(e))

Please show me where I should add the above and tests.

@Titan-C
Copy link
Member

Titan-C commented Apr 8, 2017

I don't like having our rst2md code in the except block. I would rather write a new wrapper function like

def rst2md(args):
    try:
        import pypandoc
        return pypandoc.rst2md(args)
    except:
        return sphx_glr_rst2md(args)

Then we can document this function explaining the pypandoc use and the default option of a regexp based conversion in our library. Also our function documentation remains available.

I'm fine not throwing a message if pandoc is not installed, is an optional dependecy and just as we don't complain about mayavi or seaborn being installed here it shall not be the case. Optional dependencies remain user responsibility.

We do need to add this case to the test matrix this dependency. I don't like the anaconda install because it has to go trough conda-forge(but it is just another line in the .travis.yml, and travis time is for free so include it at will). I would do the install trough pip and then use pypandoc itself to download the official binary.

$ python -c "import pypandoc;pypandoc.download_pandoc()"

This command takes care of it. Finally we have to include this in the changes and put some documentation in the advanced_configuration.rst

@lesteve
Copy link
Member

lesteve commented Apr 8, 2017

I don't like the anaconda install because it has to go trough conda-forge

pandoc is accessible through conda without conda-forge. pypandoc is accessible through pip. This is what I would do for simplicity.

@chsasank
Copy link
Author

chsasank commented Apr 8, 2017

I don't like having our rst2md code in the except block.

Moved existing code to a new function as you suggested.

We do need to add this case to the test matrix this dependency.

Do I have to create a new environment here? or Use existing one?
I prefer using pip install pypandoc and
python -c "import pypandoc;pypandoc.download_pandoc()" . I need to these two lines to this block right?

I'm sorry but I'm a noob when it comes to travis.

@Titan-C
Copy link
Member

Titan-C commented Apr 16, 2017

For testing with travis and also I would like to see in the circle ci artifacts this output you need this

diff --git a/.travis.yml b/.travis.yml
index 438e17ec..ed79e259 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -32,6 +32,8 @@ matrix:
       env: DISTRIB="conda" PYTHON_VERSION="3.4"
     - os: linux
       env: DISTRIB="conda" PYTHON_VERSION="3.5"
+    - os: linux
+      env: DISTRIB="conda" PYTHON_VERSION="3.5" PYPANDOC="True"
 before_install:
     - if [ "$DISTRIB" == "conda" ]; then
          wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh;
@@ -51,6 +53,9 @@ install:
         if [ "$PYTHON_VERSION" == "2.7" ]; then
           conda install --yes mayavi;
         fi;
+        if [ "$PYPANDOC" == "True"]; then
+            python -c "import pypandoc;pypandoc.download_pandoc()";
+        fi;
       fi;
     - pip install -r requirements.txt
     - if [ "$DISTRIB" == "ubuntu 14" ]; then pip install seaborn sphinx==1.4.9; fi;
diff --git a/circle.yml b/circle.yml
index dc3a807f..b153aebd 100644
--- a/circle.yml
+++ b/circle.yml
@@ -16,6 +16,7 @@ dependencies:
     - sed -i "s/ENABLE_USER_SITE = .*/ENABLE_USER_SITE = False/g" /home/ubuntu/miniconda/envs/circleenv/lib/python2.7/site.py
     - conda install -n circleenv --yes numpy scipy seaborn mayavi setuptools matplotlib pillow sphinx nose sphinx_rtd_theme
     - pip install -r requirements.txt
+    - python -c "import pypandoc;pypandoc.download_pandoc()"
 
   override:
     - python setup.py develop

Pandoc for converting rst to markdown for notebook downloads
============================================================

Sphinx uses restructured text while Jupyter notebooks uses markdown for
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

notebook use markdown to declare the format of the rendered text.

============================================================

Sphinx uses restructured text while Jupyter notebooks uses markdown for
markup of the text. Therefore, restructured text need to be translated to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to this mismatch in input formats, the generated Jupyter notebooks may not display correctly if you have heavily used the formatting syntax of restructured text. Sphinx-Gallery provides a minimal parser to transform between this two formats suitable for basic use cases and is the default parse. Sphinx-Gallery is nevertheless aware of pypandoc and if found in your system it will be used to convert between this formats. To install pypandoc in your system you can use the following commands:

I'm not so good with writing. These paragraphs might need some extra reviews to give a clear and easy to understand message.

Copy link
Author

@chsasank chsasank Apr 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, here's what I think we should write:

Sphinx uses restructured text while Jupyter notebooks uses markdown for formatting the text. Due to this mismatch in markup formats, the generated Jupyter notebooks may not display the text correctly if you have formatted your text heavily with restructured text.

Sphinx-Gallery provides a minimal parser to convert from restructured text to markdown. It is suitable for basic use cases and is the default parser. Sphinx-Gallery is nevertheless aware of pypandoc and if it is found in your system, it will be used to conversion. To install pypandoc in your system you can use the following commands:

What do you think?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reminding @Titan-C

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are getting to a clearer text:

Sphinx uses reStructuredText while Jupyter notebooks use Markdown for
formatting the text blocks. Due to this mismatch in markup formats, the
generated Jupyter notebooks may not render the text cells correctly if
you make heavy use of the syntax features in reStructuredText

Sphinx-Gallery provides a minimal parser to convert from
reStructuredText to Markdown. It is suitable for most basic use cases
and is the default parser. Sphinx-Gallery is nevertheless aware of
pypandoc and when it is found in
your system, it will be used instead to do the conversion. To install
pypandoc in your system you can use the following commands:

@@ -98,6 +99,18 @@ def rst2md(text):

return text

def rst2md(text):
"""Converts the RST text from the examples docstrigs and comments
into markdown text for the Jupyter notebooks"""
Copy link
Member

@Titan-C Titan-C Apr 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add:

Delegates the conversion to pypandoc if found in the system, otherwise uses our custom converter

@GaelVaroquaux
Copy link
Contributor

GaelVaroquaux commented Apr 17, 2017 via email

@chsasank
Copy link
Author

@Titan-C I think this is ready to be merged.

@Titan-C
Copy link
Member

Titan-C commented Apr 22, 2017

@chsasank Thank you so much for keeping up with this. The code looks great and I'm happy with the documentation. But just now with the Circle CI in place I checked the output of pypandoc and comparied it against what Sphinx-gallery supports.

I'm disappointed on the output of Pandoc as it misses conversions we already support, we certainly need more calibration for it beyond the default. Have a look at:

@Titan-C
Copy link
Member

Titan-C commented Apr 22, 2017

We have test for the latex conversion. Travis being green and pandoc not having good latex conversion means travis was not setup correctly. (My error) The if clause needs to be

-        if [ "$PYPANDOC" == "True"]; then
+        if [ "$PYPANDOC" == "True" ]; then

The white-space is relevant

@chsasank
Copy link
Author

Updated. Build fails because

  1. '$T<0$ and $U&gt;0$\n' is produced instead of '$T<0$ and $U&gt;0$'. New line is inserted. I don't think this is a issue. Test can be fixed with strip()
  2. note and warning directives are not converted to divs as expected by the tests.

Sasank.

@Titan-C
Copy link
Member

Titan-C commented Apr 22, 2017

Yes build should fail on Pandoc because it is not converting everything as we would want. We expect warning blocks and multiline Latex to work. (The inline Latex, first fail is indeed no big deal)

@chsasank
Copy link
Author

Hey @Titan-C @lesteve,

Is there anything I should do before closing/merging this PR?

@Titan-C
Copy link
Member

Titan-C commented May 28, 2017 via email

@chsasank
Copy link
Author

note and warning directives are not converted to divs as expected by the tests.

Is this the feature you want but pandoc doesn't support?

@Titan-C
Copy link
Member

Titan-C commented May 29, 2017 via email

@chsasank
Copy link
Author

Arguably, that's just two tags that are not supported by pandoc. Output is not unreasonable either:

note

Interpolation is a linear operation that can be performed also on

Compare this to current rst2md parser which misses so many tags (see rst and output)

Anyway, closing the PR

@chsasank chsasank closed this May 29, 2017
@GaelVaroquaux
Copy link
Contributor

GaelVaroquaux commented May 29, 2017 via email

@goodlux
Copy link

goodlux commented Jul 16, 2018

So, is there no way to present titles, or h3 and h4 headers in .ipynb that are converted from .py files? Seems like a big limitation.

@lesteve
Copy link
Member

lesteve commented Jul 16, 2018

Rather than commenting in a closed PR, please open an issue with more details and ideally a way to reproduce what you are seeing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants