Add optional pypandoc rst parsing #220

chsasank · 2017-04-07T17:57:57Z

Please see #219 for discussion on this. Fix #219.

Can somebody point me to where I should add optional requirements?

lesteve · 2017-04-08T08:18:30Z

Please use "Fix #issueNumber"; this way the associated issue gets closed automatically when the PR is merged. For more details, look at this.

I have edited your description but please remember to do it next time.

lesteve · 2017-04-08T08:21:03Z

I think we would need tests for this and probably a new entry in the build matrix with pypandoc and pandoc installed. The latter can be installed through anaconda.

lesteve · 2017-04-08T08:34:14Z

Also I am wondering what happens if you have pypandoc but not pandoc installed. Is there a clear error message?

chsasank · 2017-04-08T09:21:30Z

If pypandoc is installed but pandoc is not, OSError is raised.
That's why two errors are caught. We could log which implementation is used, like this:

try:
    import pypandoc
    print('Using pandoc for converting rst to md in notebooks')
except (ImportError, OSError) as e:
    print('Pandoc not found. Error: {}. Falling back to regex based parser'.format(e))

Please show me where I should add the above and tests.
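A minimal sketch of the same check using the logging module instead of print. It assumes that pypandoc.get_pandoc_version() raises OSError when the pandoc binary cannot be found; the function name pick_converter and the logger name are hypothetical and not part of this PR.

```python
import logging

logger = logging.getLogger("rst2md_sketch")  # hypothetical logger name


def pick_converter():
    """Return 'pandoc' if pypandoc and the pandoc binary are usable, else 'regex'."""
    try:
        import pypandoc
        # Needs the pandoc binary; assumed to raise OSError when it is missing
        version = pypandoc.get_pandoc_version()
    except (ImportError, OSError) as exc:
        logger.info("Pandoc not found (%s). Falling back to regex based parser", exc)
        return "regex"
    logger.info("Using pandoc %s for converting rst to md in notebooks", version)
    return "pandoc"


print(pick_converter())
```

The probe runs once and the result can be reused, so the message is logged a single time rather than on every conversion.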

Titan-C · 2017-04-08T09:52:06Z

I don't like having our rst2md code in the except block. I would rather write a new wrapper function like:

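def rst2md(args):
    try:
        import pypandoc
        # Delegate to pypandoc when it and the pandoc binary are available
        return pypandoc.convert_text(args, 'md', format='rst')
    except (ImportError, OSError):
        # Otherwise use our regex based converter
        return sphx_glr_rst2md(args)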

Then we can document this function explaining the pypandoc use and the default option of a regexp based conversion in our library. Also our function documentation remains available.
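To try the wrapper idea outside of Sphinx-Gallery, here is a self-contained sketch. The fallback regex_rst2md is a hypothetical stand-in that only handles inline literals; it is not the converter shipped with the library. The pypandoc call assumes convert_text accepts 'md' as the target format.

```python
import re


def regex_rst2md(text):
    """Hypothetical stand-in for the regex based converter (inline literals only)."""
    # RST writes inline literals as ``name``; Markdown uses `name`
    return re.sub(r"``(.*?)``", r"`\1`", text)


def rst2md(text):
    """Convert RST to Markdown with pypandoc when usable, else with regexes."""
    try:
        import pypandoc
        return pypandoc.convert_text(text, "md", format="rst")
    except (ImportError, OSError):
        # pypandoc missing (ImportError) or pandoc binary missing (OSError)
        return regex_rst2md(text)


print(rst2md("Call ``print`` here"))
```

Callers only see rst2md, so the documentation of the wrapper can describe both paths in one place.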

I'm fine with not showing a message if pandoc is not installed. It is an optional dependency, and just as we don't complain about mayavi or seaborn not being installed, we should not complain here. Optional dependencies remain the user's responsibility.

We do need to add this dependency to the test matrix. I don't like the anaconda install because it has to go through conda-forge (but it is just another line in the .travis.yml, and Travis time is free, so include it at will). I would do the install through pip and then use pypandoc itself to download the official binary.

$ python -c "import pypandoc;pypandoc.download_pandoc()"

This command takes care of it. Finally, we have to include this in the changes and put some documentation in advanced_configuration.rst.

lesteve · 2017-04-08T10:05:15Z

I don't like the anaconda install because it has to go through conda-forge

pandoc is accessible through conda without conda-forge. pypandoc is accessible through pip. This is what I would do for simplicity.

chsasank · 2017-04-08T11:09:51Z

I don't like having our rst2md code in the except block.

Moved existing code to a new function as you suggested.

We do need to add this dependency to the test matrix.

Do I have to create a new environment here, or use the existing one?
I prefer using pip install pypandoc and
python -c "import pypandoc;pypandoc.download_pandoc()". I need to add these two lines to this block, right?

I'm sorry but I'm a noob when it comes to travis.

Titan-C · 2017-04-16T19:25:11Z

For testing with Travis, and because I would also like to see this output in the Circle CI artifacts, you need this:

diff --git a/.travis.yml b/.travis.yml
index 438e17ec..ed79e259 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -32,6 +32,8 @@ matrix:
       env: DISTRIB="conda" PYTHON_VERSION="3.4"
     - os: linux
       env: DISTRIB="conda" PYTHON_VERSION="3.5"
+    - os: linux
+      env: DISTRIB="conda" PYTHON_VERSION="3.5" PYPANDOC="True"
 before_install:
     - if [ "$DISTRIB" == "conda" ]; then
          wget http://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh;
@@ -51,6 +53,9 @@ install:
         if [ "$PYTHON_VERSION" == "2.7" ]; then
           conda install --yes mayavi;
         fi;
+        if [ "$PYPANDOC" == "True"]; then
+            python -c "import pypandoc;pypandoc.download_pandoc()";
+        fi;
       fi;
     - pip install -r requirements.txt
     - if [ "$DISTRIB" == "ubuntu 14" ]; then pip install seaborn sphinx==1.4.9; fi;
diff --git a/circle.yml b/circle.yml
index dc3a807f..b153aebd 100644
--- a/circle.yml
+++ b/circle.yml
@@ -16,6 +16,7 @@ dependencies:
     - sed -i "s/ENABLE_USER_SITE = .*/ENABLE_USER_SITE = False/g" /home/ubuntu/miniconda/envs/circleenv/lib/python2.7/site.py
     - conda install -n circleenv --yes numpy scipy seaborn mayavi setuptools matplotlib pillow sphinx nose sphinx_rtd_theme
     - pip install -r requirements.txt
+    - python -c "import pypandoc;pypandoc.download_pandoc()"
 
   override:
     - python setup.py develop

Titan-C · 2017-04-16T19:26:00Z

doc/advanced_configuration.rst

+Pandoc for converting rst to markdown for notebook downloads
+============================================================
+
+Sphinx uses restructured text while Jupyter notebooks uses markdown for


notebook use markdown to declare the format of the rendered text.

Titan-C · 2017-04-16T19:39:59Z

doc/advanced_configuration.rst

+============================================================
+
+Sphinx uses restructured text while Jupyter notebooks uses markdown for
+markup of the text. Therefore, restructured text need to be translated to


Due to this mismatch in input formats, the generated Jupyter notebooks may not display correctly if you have heavily used the formatting syntax of restructured text. Sphinx-Gallery provides a minimal parser to transform between this two formats suitable for basic use cases and is the default parse. Sphinx-Gallery is nevertheless aware of pypandoc and if found in your system it will be used to convert between this formats. To install pypandoc in your system you can use the following commands:

I'm not so good with writing. These paragraphs might need some extra reviews to give a clear and easy to understand message.

chsasank · 2017-04-17

Ok, here's what I think we should write:

Sphinx uses restructured text while Jupyter notebooks uses markdown for formatting the text. Due to this mismatch in markup formats, the generated Jupyter notebooks may not display the text correctly if you have formatted your text heavily with restructured text.

Sphinx-Gallery provides a minimal parser to convert from restructured text to markdown. It is suitable for basic use cases and is the default parser. Sphinx-Gallery is nevertheless aware of pypandoc and if it is found in your system, it will be used to conversion. To install pypandoc in your system you can use the following commands:

What do you think?

chsasank · 2017-04-19

Reminding @Titan-C

Titan-C · 2017-04-19

We are getting to a clearer text:

Sphinx uses reStructuredText while Jupyter notebooks use Markdown for
formatting the text blocks. Due to this mismatch in markup formats, the
generated Jupyter notebooks may not render the text cells correctly if
you make heavy use of the syntax features in reStructuredText.

Sphinx-Gallery provides a minimal parser to convert from
reStructuredText to Markdown. It is suitable for most basic use cases
and is the default parser. Sphinx-Gallery is nevertheless aware of
pypandoc and when it is found in
your system, it will be used instead to do the conversion. To install
pypandoc in your system you can use the following commands:

Titan-C · 2017-04-16T19:41:45Z

sphinx_gallery/notebook.py

@@ -98,6 +99,18 @@ def rst2md(text):

    return text

+def rst2md(text):
+    """Converts the RST text from the examples docstrings and comments
+    into markdown text for the Jupyter notebooks"""


Add:

Delegates the conversion to pypandoc if found in the system, otherwise uses our custom converter

GaelVaroquaux · 2017-04-17T08:31:29Z

I think we would need tests for this and probably a new entry in the build matrix with pypandoc and pandoc installed. The latter can be installed through anaconda.

+1

chsasank · 2017-04-22T07:21:59Z

@Titan-C I think this is ready to be merged.

Titan-C · 2017-04-22T09:30:04Z

@chsasank Thank you so much for keeping up with this. The code looks great and I'm happy with the documentation. But just now, with Circle CI in place, I checked the output of pypandoc and compared it against what Sphinx-Gallery supports.

I'm disappointed with the output of Pandoc, as it misses conversions we already support. We certainly need more calibration for it beyond the default. Have a look at:

https://46-25860190-gh.circle-artifacts.com/0/home/ubuntu/sphinx-gallery/rtd_html/auto_examples/plot_quantum.html#sphx-glr-auto-examples-plot-quantum-py
It misses multiline Latex equations
https://46-25860190-gh.circle-artifacts.com/0/home/ubuntu/sphinx-gallery/rtd_html/auto_examples/sin_func/plot_sin.html#sphx-glr-auto-examples-sin-func-plot-sin-py
The See also box is removed instead of being kept unconverted
https://46-25860190-gh.circle-artifacts.com/0/home/ubuntu/sphinx-gallery/rtd_html/tutorials/plot_parse.html#sphx-glr-tutorials-plot-parse-py
Warning boxes are washed out and the code-block is removed.

Titan-C · 2017-04-22T10:28:08Z

We have a test for the LaTeX conversion. Travis being green while pandoc does not have good LaTeX conversion means Travis was not set up correctly (my error). The if clause needs to be:

-        if [ "$PYPANDOC" == "True"]; then
+        if [ "$PYPANDOC" == "True" ]; then

The white-space is relevant
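To see why the space matters, here is a small sketch that uses Python's shlex module as an approximation of how a POSIX shell splits the test into words. The value True is substituted for $PYPANDOC for illustration. Without the space, the closing bracket is glued to the last word, so `[` never receives a separate `]` argument; presumably the condition then fails and the download step is skipped.

```python
import shlex


def shell_words(command):
    """Split a command line into words roughly the way a POSIX shell would."""
    return shlex.split(command)


broken = shell_words('[ "True" == "True"]')
fixed = shell_words('[ "True" == "True" ]')

print(broken)  # last word is 'True]', so there is no separate closing ']'
print(fixed)   # the closing ']' is its own word
```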

chsasank · 2017-04-22T13:17:10Z

Updated. The build fails because:

'$T<0$ and $U>0$\n' is produced instead of '$T<0$ and $U>0$'. A newline is inserted. I don't think this is an issue. The test can be fixed with strip().
note and warning directives are not converted to divs as expected by the tests.
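A minimal sketch of the strip() fix for the first failure. The helper name same_markdown is hypothetical; the actual test in the repository may compare the strings differently.

```python
def same_markdown(produced, expected):
    """Compare converter output while ignoring leading and trailing whitespace."""
    return produced.strip() == expected.strip()


# pandoc appends a trailing newline that the regex based parser does not
print(same_markdown('$T<0$ and $U>0$\n', '$T<0$ and $U>0$'))
```

This only hides whitespace at the ends of the string; differences inside the text still make the comparison fail.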

Sasank.

Titan-C · 2017-04-22T13:38:50Z

Yes, the build should fail with Pandoc because it is not converting everything as we would want. We expect warning blocks and multiline LaTeX to work. (The inline LaTeX, the first failure, is indeed no big deal.)

chsasank · 2017-05-28T19:04:02Z

Hey @Titan-C @lesteve,

Is there anything I should do before closing/merging this PR?

Titan-C · 2017-05-28T22:36:36Z

@chsasank I think we won't merge this because pypandoc, at this moment, does not support some features we currently use. We do need to improve our RST parser, as I mention in #232 (comment). It is just not an easy problem to solve.

chsasank · 2017-05-29T03:56:16Z

note and warning directives are not converted to divs as expected by the tests.

Is this the feature you want but pandoc doesn't support?

Titan-C · 2017-05-29T08:29:51Z

Yes, I want the notes and warning boxes to be parsed.
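For illustration, a sketch of the kind of conversion the tests expect: turning a note or warning directive into a div. The HTML produced here is an assumption, since the exact markup Sphinx-Gallery emits is not shown in this thread, and the function name admonition_to_div is hypothetical. Only single-line directives are handled.

```python
import re

# Matches single-line directives such as ".. note:: some text"
ADMONITION = re.compile(r"^\.\. (note|warning)::[ \t]*(.*)$", flags=re.M)


def admonition_to_div(text):
    """Wrap single-line note/warning directives in a div (assumed markup)."""
    def repl(match):
        kind, body = match.group(1), match.group(2)
        return '<div class="{0}"><h4>{1}</h4><p>{2}</p></div>'.format(
            kind, kind.capitalize(), body)
    return ADMONITION.sub(repl, text)


print(admonition_to_div(".. note:: Be careful"))
```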

chsasank · 2017-05-29T08:53:33Z

Arguably, that's just two tags that are not supported by pandoc. Output is not unreasonable either:

note

Interpolation is a linear operation that can be performed also on

Compare this to the current rst2md parser, which misses so many tags (see rst and output).

Anyway, closing the PR.

GaelVaroquaux · 2017-05-29T08:58:47Z

I agree that it was a great idea to explore. It's a pity that we cannot normalize the two outputs enough. I think that it boils down to that.

goodlux · 2018-07-16T21:04:26Z

So, is there no way to present titles, or h3 and h4 headers in .ipynb that are converted from .py files? Seems like a big limitation.

lesteve · 2018-07-16T22:14:34Z

Rather than commenting in a closed PR, please open an issue with more details and ideally a way to reproduce what you are seeing.

Add optional pypandoc rst parsing

ab60835

Move regex based parsing code to a function

8ef49ac

Documentation for pandoc

92814a7

Titan-C requested changes Apr 16, 2017

chsasank added 2 commits April 17, 2017 10:38

Travis & circle CI for pypandoc

4f086cc

Add pypandoc install in circle adn travis

7194981

Requested documentation changes

808cf67

Fix travis not building with pandoc

f60011d

Titan-C mentioned this pull request May 1, 2017

SG incorrectly grabbing the description when a label is defined in docstring #232

Closed

Titan-C mentioned this pull request May 24, 2017

Support jupyter notebooks #245

Open

chsasank closed this May 29, 2017

goodlux mentioned this pull request Jul 24, 2018

Notebooks (ipynb) are not created properly during build process. pytorch/tutorials#274

Closed

choldgraf mentioned this pull request Oct 14, 2018

adding a notebook to python converter #423

Closed

chsasank mentioned this pull request Jun 25, 2019

Format tutorial Markdown for jupyter notebook pytorch/tutorials#495

Closed
