Skip to content

Pyplot update in setup.py #928

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 11, 2012
Merged

Pyplot update in setup.py #928

merged 9 commits into from
Jun 11, 2012

Conversation

pelson
Copy link
Member

@pelson pelson commented Jun 6, 2012

It seems that it is currently very easy for the pyplot.py file to get out of date.

I have included an update process in setup.py which always re-builds the pyplot.py file.

@efiring
Copy link
Member

efiring commented Jun 6, 2012

Providing a function for building pyplot.py is good, but I don't think it should be run automatically by setup. We don't want the act of installation to modify the code being installed, if it can possibly be avoided. In addition, I don't like the mechanism of switching stdout to grab what is printed when boilerplate is imported. Importing should import capabilities, not generate output.

Suggestion: take all this out of setup.py, and turn the present boilerplate.py into a "make_pyplot.py". The present contents of boilerplate would become a function returning a string; another function would take that string and substitute it in the correct location in pyplot.py, as you are doing; and the whole thing would end with the usual "if name == 'main':" block to scriptify it.

@mdboom
Copy link
Member

mdboom commented Jun 6, 2012

I actually don't have a problem with code being generated by setup.py. It already happens, for example, when running 2to3 over all of the code when installing on Python 3, not to mention all of the C++ code that gets compiled. I personally prefer that over an extra step that we need to remember to run.

However, there are some things about this approach that could be improved.

This is putting generated code in the git repository (pyplot.py is in the repo), so it's possible that just the act of building matplotlib could cause unexpected git merge conflicts on the next git pull. The source and build products need to be clearly separated in the directory tree.

Here's what I would propose:

  1. Leave the manually written part of pyplot.py where it is, and then add from .pyplot_generated.py import * at the end.

  2. Put the generated stuff in pyplot_generated.py -- but this file should only be in the build tree so as not to pollute the source tree. There is a way to query distutils for the build directory (usually build/lib...), and it should go there.

  3. This file should only be run during distutils' build step, otherwise it will get run even if the user does setup.py --help.

  4. The file should only be written to disk if the contents are different from the file already on disk (if any). This will prevent unnecessary copies and rebuilds.

  5. I agree with @efiring that boilerplate.py should just have a function that returns a string, or alternatively be a function that takes an output stream and writes to that. Importing alone shouldn't write things to stdout.

@efiring
Copy link
Member

efiring commented Jun 6, 2012

I'm still a bit uncomfortable with this, but I will get used to it. But please, where you have the "from .pyplot_generated.py..." line, include a comment to the effect that this is generated in the build directory at build time, so some poor soul doesn't waste an hour hunting around for it in the git repo or tarball. And it probably should have a leading underscore to help drive home the point.

@mdboom
Copy link
Member

mdboom commented Jun 6, 2012

@efiring: good point about the comment and leading underscore.

@pelson
Copy link
Member Author

pelson commented Jun 6, 2012

Thanks to you both. I was hoping for a quick win by improving some stuff re boilerplating without having to change all of it, but clearly there are some un-ignorable warts :-) .

We don't want the act of installation to modify the code being installed, if it can possibly be avoided.

I don't really have a good response to this statement. Your absolutely correct. A standard user could do an install and end up with a different pyplot to the one we thought we had released (admittedly this is not very likely). On the other hand, pyplot is currently not being kept perfectly up to date and that is unlikely to change even if there was a simple script for a developer to run. I guess one approach could be to put a git hook inplace which runs the generate_pyplot script on commit.

  1. I'm not a fan of having the extra file and doing a import *. With regards to your specific concern, conflicts should be simple to resolve: the first part of pyplot would use the normal git way and conflicts to the second part could simply be ignored by re-building using setup.py.

The source and build products need to be clearly separated in the directory tree.

In general, I agree with this statement. Indeed, I started off by having the leading part of pyplot in the src directory, but I felt that it added yet another thing to remember for very little benefit.

  1. & 4) Will do.

  2. We all agree that boilerplate.py shouldn't be doing prints on the top level. I will implement this, but it will make things noisier/longer to review.

From this point I'm considering taking the easy answer and creating a script as @efiring suggested. This will at least make building pyplot easier, but it does leave the outstanding issue of keeping the content up-to-date.

@mdboom
Copy link
Member

mdboom commented Jun 6, 2012

@pelson: github doesn't allow custom git hooks, so that's out.

Generated code should never go in the git repository -- unfortunately it has been that way for a long time in the case of pyplot.py, but since the generation of that file was manually controlled (i.e. not as part of setup.py), the problems with that approach were minimized -- you have to consciously know what you're doing. The problem with the approach in this PR is that you could have a particular version of pyplot.py checked into repo, a user checks it out, installs matplotlib and pyplot.py gets changed -- this could be as simple as updating the timestamp or perhaps dictionary ordering causes the elements to come out in a different order than before. Either way, now the user has a file that git considers "changed", yet she did nothing active where she would expect a change. Doing a git diff shows changes. Doing a git pull after this happens could result in a merge conflict. That could be alarming for someone who is just tracking the latest updates using git. It's true that setup.py would overwrite the correct contents, but git still forces you to mark the merge conflict as resolved using git add. I just see that as being a major pain for users and developers alike.

In case there was some confusion -- the leading part of pyplot.py would stay where it is in lib/matplotlib, not be moved to the src directory -- that is for C/C++ code. It would only be the generated parts that would be in a different place -- in build/lib.../matplotlib, which will ultimately be installed alongside all the source code from lib/matplotlib.

I agree that creating a separate script is a reasonable first step, and worthwhile work even if it gets automated in setup.py down the road.

@efiring
Copy link
Member

efiring commented Jun 6, 2012

There are different kinds of generated code, and some belong in the repo, some don't. For example, cython-generated C-code typically is in the repo, for good reason. Its generation should be under developer control, not installer or user control.
pyplot boilerplate could go either way. I think the reasonable alternatives are (1) a script that a developer uses to rebuild it when warranted, with the code always being in the repo, or (2) automatic building at build time of code that is not in the repo.

@pelson
Copy link
Member Author

pelson commented Jun 7, 2012

I have updated the boilerplate script. Unfortunately, the diffs look bigger than they are due to simple indentation. Fortunately, pyplot has changed very little, so that should give confidence that boilerplate is still doing what it was originally.

@efiring
Copy link
Member

efiring commented Jun 7, 2012

This does look like a step in the right direction. I like it. @mdboom, are there any potential 2to3 or other version issues involved in the use of file() and binary versus text mode, for example? Does this need to be tested with git on Windows? (I don't know who would do that.)

@mdboom
Copy link
Member

mdboom commented Jun 7, 2012

It would be nice (though not entirely necessary) for boilerplate to run under Python 3. At the moment, it doesn't, since it doesn't get run through 2to3.

>python3 boilerplate.py 
  File "boilerplate.py", line 171
    raise ValueError, ('default value %s unknown to boilerplate.formatvalue'
                    ^
SyntaxError: invalid syntax

As @efiring points out, the calls to file should be replaced by open. It doesn't necessarily have to open the files in binary mode -- testing will tell.

This approach (which isn't running in setup.py anymore) shouldn't have any issues wrt to git on Windows, unless I'm missing something.

@pelson
Copy link
Member Author

pelson commented Jun 8, 2012

That makes things a little neater. There are still some improvements that could be made, but I am resisting the temptation.

The motivation for this whole PR is so that I can submit a small change to the content of pyplot. However I am reluctant to include it here as it would make the review harder.

import inspect
import random
import re
import sys
import types

# import the local copy of matplotlib, not the installed one
#sys.path.insert(0, './lib')
from matplotlib.axes import Axes
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that this line probably puts the scuppers on including the boilerplate script in setup.py (at least before building the c code) in the future.

@mdboom
Copy link
Member

mdboom commented Jun 8, 2012

I've filed a pull request against your branch that adds Python 3 compatibility for boilerplate.py.

One other thing while we're in here -- maybe we should add a paragraph to the developer docs about this? I can't seem to find it if it already exists.

Other than that, I think this is good to go and is a worthwhile improvement.

Phil Elson and others added 3 commits June 8, 2012 07:34
Add Python 3 compatiblity to boilerplate.py
…ot_build

* 'pyplot_build' of github.com:pelson/matplotlib:
  Make boilerplate.py Python 3 compatible
@pelson
Copy link
Member Author

pelson commented Jun 8, 2012

Added some docs to the developer guide. I have built them and it all renders nicely.

manually written and some autogenerated functions. Modifications can be made
to pyplot where the function definition does not explicitly state otherwise
and autogenerated functions can be modified by changing, and subsequently
runnning, the :file:`boilerplate.py` script which will re-generate the complete
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What changes to boilerplate.py are required?

@pelson
Copy link
Member Author

pelson commented Jun 8, 2012

I was trying to keep it succinct and to avoid any redundancy by documenting the boilerplate mechanism in two places (here and in boilerplate.py itself). It was my intention to ensure that a developer was aware of boilerplate.py without fully documenting its functionality here. If you've got a feeling as to what you would like to see here, please let me know.

@jdh2358
Copy link
Collaborator

jdh2358 commented Jun 9, 2012

The perfect is the enemy of the good, so I have no problem accepting an improved boilerplate.py solution. But the real solution is to probably just get rid of it. As discussed in the thread linked below, when I wrote boilerplate.py there were already code magic solutions in python2.4 that could have avoided the need for boilerplate.py

http://groups.google.com/group/comp.lang.python/browse_frm/thread/dcd63ec13096a0f6/1b14640f3a4ad3dc?pli=1

But I was supporting python 2.2 at the time so I just did the brute force thing. There is now so much magic cleverness in python function generation that I expect we could do away with it entirely,

@pelson
Copy link
Member Author

pelson commented Jun 11, 2012

But the real solution is to probably just get rid of it. As discussed in the thread linked below, when I wrote boilerplate.py there were already code magic solutions in python2.4 that could have avoided the need for boilerplate.py.

Technically, as you say, there is a far more elegant solution using new which would avoid the need for the boilerplate script. However, one of the biggest benefits of the boilerplate script is that it generates python code which a novice user could understand with very little effort; I am not confident that providing a "code magic" solution would be able to maintain such simple and readable code (in the user facing pyplot.py).

@mdboom: Do you have any suggestions for the documentation in the developer guide?

@mdboom
Copy link
Member

mdboom commented Jun 11, 2012

How about something like:

Large portions of the pyplot interface is automatically generated by the `boilerplate.py` script (in the root of the source tree).  To add or remove a plotting method from pyplot, edit the appropriate list in `boilerplate.py` and then run the script.  The new versions of `boilerplate.py` and `lib/matplotlib/pyplot.py` should be checked into git.

I think John makes a good point about removing code generation in the future if possible. I don't think it would be terribly complex, and __new__ won't be needed. As far as I can tell, boilerplate.py on generates functions. This is easy enough to do with nested functions, which while somewhat unusual are not terribly difficult relative to metaclasses.

@pelson
Copy link
Member Author

pelson commented Jun 11, 2012

@mdboom I've used your content and tweaked a little.

@mdboom
Copy link
Member

mdboom commented Jun 11, 2012

Looks good, thanks!

@pelson
Copy link
Member Author

pelson commented Jun 11, 2012

I think this resolves all of the issues. Unless anyone objects, I will merge in a couple of hours.

pelson pushed a commit that referenced this pull request Jun 11, 2012
Improved boilerplate.py and associated docs to simplify the generation of pyplot.py
@pelson pelson merged commit 45493c4 into matplotlib:master Jun 11, 2012
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants