Add 3 new styles with color schemes from Tableau [backport to 1.4.x] #3700
Conversation
That's a very nice example plot that you show above; it would be great to have something like this in the gallery. Is it all automated, or did you use manual operations to position the labels on the right? |
Thanks! I've promised @tacaswell that I'll put together a full example of that plot for the gallery. But that's for another PR. :-) Here's my blog post with the full code for the plot: http://www.randalolson.com/2014/06/28/how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/ I'll have to rework it to use base Python instead of pandas. The labels are semi-automated: They all use the same x position, and I use the last y value in each series to determine the y position of the corresponding label. This chart in particular took a little tweaking to make it so the labels don't overlap. |
Doing the y-layout might be a good use of the linear constraint solver (see kiwi; there is an issue about this someplace).
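The semi-automated label placement described above (one shared x position, the last y value of each series as the label's y position, then nudging so labels don't overlap) can be sketched roughly as follows. This is a simple greedy pass, not a constraint solver, and it is not the code from the blog post; the function name `label_positions`, the `min_gap` parameter, and the sample data are all hypothetical.

```python
def label_positions(series, min_gap=1.0):
    """Return {name: y} for right-hand labels, one label per series.

    Each label starts at the last y value of its series. Labels that would
    sit closer than `min_gap` to the label below them are pushed upward.
    """
    # Sort by final value so neighbours in the list are neighbours on the plot.
    ordered = sorted(series.items(), key=lambda item: item[1][-1])
    positions = {}
    previous_y = None
    for name, values in ordered:
        y = values[-1]
        if previous_y is not None and y - previous_y < min_gap:
            y = previous_y + min_gap  # nudge up just enough to clear the gap
        positions[name] = y
        previous_y = y
    return positions


# Hypothetical data: three series whose last two end points nearly collide.
series = {
    "A": [10.0, 12.0, 20.0],
    "B": [15.0, 18.0, 20.4],
    "C": [30.0, 28.0, 25.0],
}
positions = label_positions(series, min_gap=1.0)
```

A greedy pass like this only ever moves labels upward, so a crowded chart may still need manual tweaking or a real solver.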
I just committed a working example of the full plot. I originally started coding it up without pandas, but the code became so gross that I simply wasn't willing to publish it that way. I would never handle tabulated data without pandas, so I can't publish code that recommends someone else not use pandas. :-) |
Also, where is Tableau from and are there IP issues with it? |
Thanks; maybe this evening I will see about stripping out pandas. I really don't want it to be a dependency here. I would also use a dictionary for your offsets--it will be more compact and readable that way. |
Tableau is made by Tableau Software. I doubt they have any kind of ownership over the color scheme used here, or if that's even possible.
Sure! I'll make that change real quick. |
Provides a more concise way of storing the offsets rather than a long string of if statements.
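As a rough illustration of that change, a dictionary lookup can replace a chain of `if` statements for the per-label offsets. The label names and offset values below are made up for the sketch, and `label_y` is a hypothetical helper, not a function from the PR.

```python
# Hypothetical per-label vertical nudges; names and values are illustrative only.
Y_OFFSETS = {
    "Foreign Languages": 0.5,
    "English": -0.5,
    "Math and Statistics": 0.75,
}


def label_y(name, last_value):
    """Return the label's y position: the series' last value plus any offset."""
    # Labels without an entry in the dictionary are left where they are.
    return last_value + Y_OFFSETS.get(name, 0.0)
```

For example, `label_y("English", 65.0)` gives `64.5`, while a label with no entry is returned unchanged.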
plt.style.use("tableau20")

gender_degree_data = pd.read_csv("http://files.figshare.com/1726892/percent_bachelors_degrees_women_usa.csv")
You can put the file in lib/matplotlib/mpl-data/sample_data, and then access it using matplotlib.cbook.get_sample_data(). See the example pylab_examples/loadrec.py. Then you can read it using matplotlib.mlab.csv2rec.
from matplotlib.mlab import csv2rec
from matplotlib.cbook import get_sample_data
fname = get_sample_data('percent_bachelors_degrees_women_usa.csv')
gender_degree_data = csv2rec(fname)
That leaves you with a recarray, which supports both dictionary and attribute styles of field access.
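To illustrate the two access styles, here is a small sketch that builds a recarray with NumPy directly instead of `csv2rec` (which, as far as I know, is no longer shipped in later matplotlib releases). The column names and values are invented for the example and are not taken from the real CSV file.

```python
import io

import numpy as np

# Invented stand-in for the CSV file; the real file has different columns and values.
csv_text = "year,agriculture,biology\n1970,4.2,29.1\n1971,5.5,29.4\n"

# names=True reads the header row into field names; viewing the structured
# array as a recarray adds attribute-style access on top of key access.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True).view(np.recarray)

years_by_key = data["year"]    # dictionary-style field access
years_by_attr = data.year      # attribute-style field access
```

Both expressions refer to the same column, so code written either way keeps working.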
Using matplotlib’s csv2rec function instead. Also made sure the file is pep8 compliant.
Thanks for the tips @efiring - I've managed to eliminate the pandas dependency now. I'm pretty sure I put the csv file in the right place, but would appreciate if someone would double-check that. When testing locally, I had to copy the csv into the corresponding directory of my local install of matplotlib. |
This PR brings up strategy questions: what should be the criteria for adding styles to the mpl distribution? How should they be named? And how much should the styles specify? If two styles differ by a single line, does it make sense to have both of them? I think we need to come up with a plan, before this gets out of control. |
# Author: Randal S. Olson (randalolson.com / @randal_olson)
# Uses Tableau's Color Blind 10 color scheme

figure.figsize: 12, 7
Are you sure this should be part of the style? How often do people want a figure that is 12 inches wide? How does this interact with the ever-confusing dpi variables?
> How often do people want a figure that is 12 inches wide?
For me: All the time! I find the default figure size to be way too small. Of course, we could remove it if you think it's inappropriate to change the default figure size in a base style.
> How does this interact with the ever-confusing dpi variables?
No clue, but I've never had an issue.
That's what I was thinking. I'm considering breaking the style up into the "Tableau" styles for coloring, then have a separate (fourth) style to do all of my other plotting customizations. |
Multiple styles can be used (ex [...] I am in favor of being as permissive as possible with adding these. It might be worth coming up with a way to have an 'endorsed' and a 'contributed' set of styles?
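A minimal sketch of combining styles, with one style for colors and a separate one for layout. It assumes a matplotlib version whose style functions accept dicts as well as style names; the two dicts and their settings are hypothetical stand-ins, not the styles from this PR.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical split: one style for non-layout aesthetics, one for layout.
color_style = {"axes.facecolor": "white", "axes.edgecolor": "0.3"}
layout_style = {"figure.figsize": (12, 7), "font.size": 14}

# Styles in the list are applied in order, so later entries override earlier ones.
with plt.style.context([color_style, layout_style]):
    figsize = list(plt.rcParams["figure.figsize"])
    font_size = plt.rcParams["font.size"]
    edge_color = plt.rcParams["axes.edgecolor"]
```

With named styles installed, the same list form takes style names instead of dicts.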
@rhiever I suspect the reason you find the default figure size to be too small is partly because the default screen dpi (80) is too low for most present-day machines, and differs from the savefig default dpi, which is 100. The problem with setting figure.figsize to 12 inches wide is that then it doesn't fit on a standard page without being scaled. I think it is preferable to be able to display or print a pdf without scaling it, even if in the end, when a figure is used in a paper or presentation, it does have to be scaled. |
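The arithmetic behind that point, as a quick sketch: a figure's pixel size is its size in inches times the dpi. The dpi values are the defaults quoted above; the page width of 8.5 inches (US letter, ignoring margins) is my assumption for "a standard page", and `pixel_size` is a hypothetical helper.

```python
def pixel_size(figsize_inches, dpi):
    """Return (width, height) in pixels for a figure size given in inches."""
    width_in, height_in = figsize_inches
    return (round(width_in * dpi), round(height_in * dpi))


SCREEN_DPI = 80    # default screen dpi quoted in the discussion
SAVEFIG_DPI = 100  # default savefig dpi quoted in the discussion
style_figsize = (12, 7)  # figure.figsize from the proposed style

on_screen = pixel_size(style_figsize, SCREEN_DPI)  # 960 x 560 pixels
saved = pixel_size(style_figsize, SAVEFIG_DPI)     # 1200 x 700 pixels

# Assumption: a US letter page is 8.5 inches wide (margins ignored).
PAGE_WIDTH_IN = 8.5
scale_to_fit = min(1.0, PAGE_WIDTH_IN / style_figsize[0])  # below 1.0 means scaling is needed
```

Here `scale_to_fit` comes out at roughly 0.71, so a 12-inch-wide figure has to shrink by about 29% to fit the page width.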
In my opinion, a single stylesheet should change either layout (e.g. figsize, font sizes) or non-layout aesthetics (e.g. colors, font family). That said, there's a bit of overlap between the two. The reason |
Ping. Anything else needed from me for this PR? Still nothing from Tableau. I presume they aren't going to answer. |
Without explicit permission from Tableau, I don't think we should use their name or their color sequences in mpl. See http://www.tableausoftware.com/ip. Color sequences are designed, with genuine thought and effort. They are IP. It is up to the designers to decide what rights, if any, they want to retain for that IP. |
That is also the precedent set in #2871 where we got the company to release the color map under MIT. |
Does [...] Related: http://matplotlib.1069221.n5.nabble.com/Matlab-parula-colormap-tt44174.html#none To borrow an example from here (see License for Colors & Palettes), I could not create a 3-color palette of red, white, and blue and then prevent people from printing US flags. According to another page on that same site, it's not the colors per se, but the specific arrangement as a colorbar that is copyrightable. Matplotlib is a tool. If people use the tool in a way that violates copyright, then that isn't really Matplotlib's concern, as people can also use Matplotlib in a way that doesn't violate copyright. For example, it seems to me (but please tell me if I sound way off base) that for your own personal use, you can use whatever colors you want. This could include colormaps identical to Tableau's colormaps, MATLAB's parula colormap, ggplot2 colormaps, etc. If you aren't publishing the figure, then it seems like you couldn't be breaking copyright---think of an image that is generated on the fly and is never saved to disk! Even if you choose to publish the figure, it's not entirely clear that you would be breaking copyright, as your figure is much more than just a rectangular colorbar---there is the actual plot material, and even the colorbar itself has a custom axis with custom tickmarks and custom ticklabels. Worst case, you would have to find a more creative way to display the colorbar. I guess I feel like Matplotlib's main concern should be with code copyright. So long as we aren't violating that, then these colormaps should be included. If it happens that published figures which use a particular copyrighted colormap are violating copyright (which is not clear at all), then that is a decision that users of Matplotlib can make or not make. We could even put a warning for users.
Matplotlib provides a very easy mechanism for adding colormaps and style [...] We know that colors can be trademarked (UPS has brown trademarked, for [...]
I wonder if there would be any legal issues if we very slightly modified the colors and didn't use Tableau's name? |
@rhiever I doubt it, as that is what I initially suggested with #2871. Once Wistia released it, it was moot. @WeatherGod I understand the conservative stance (e.g. avoiding the potential for legal trouble). Could you comment, though, on my point about users creating figures that are not even published? Isn't the potential for legal trouble only for published figures that users create? Or is that too naive? I guess I don't know how anyone can claim a copyright violation on something that isn't published. That's why I was stressing code copyright, for which I think we'd be in the clear. I think an external colormap library is probably the best option, but not out of principle. |
IANAL. It would seem correct that unpublished works are not subject to [...] I wouldn't know any of this for sure. I didn't go to law school like my [...]
Oh, and I think it would make sense that any such external package can not [...]
@chebee7i Yes, you can do whatever you want with plots that you don't publish. And in practice, if you were to copy the tableau color sequence and use it in a publication, chances are no one would care enough to do anything about it. But we absolutely must not include such IP in our distribution. We would have nothing to gain and much to lose. And it would be wrong. Slightly tweaking the colors wouldn't help. Granted, the point at which colors or a sequence of colors becomes protected IP may be murky, but I don't see much murk in this case, where the clear intention is to copy the tableau set. |
@efiring I disagree that there is nothing to gain...it's clear to me that people want such colormaps in Matplotlib because they think it makes Matplotlib better. However, I would agree that it's probably not worth any potential legal issues. [Aside: I assume by "wrong" you meant "not worth the risk", as opposed to injecting some notion of morality into this discussion.] But I'm splitting hairs... Mathworks does not have a trademark for their colormap. See here for their list. There is also no code that we are copying that is copyrighted, as far as I can see. These colors can be reverse-engineered. Maybe a patent is possible, but I doubt it. Anyway, this seems like a very abstract (yet understandable) concern. So while I understand the desire to play it safe, I personally think it's playing safer than necessary. Not gonna fight it though. @rhiever if you want to start on a package that gathers a bunch of colormaps (matlab, tableau, d3, etc), I'd be more than happy to help out. Adding [...]
Actually, in my research for the history of matplotlib, we did not copy it [...]
Good info. The point stands however, since then IDL must have the IP claim (unless it was taken from yet another place). If Matlab just copied them from IDL, maybe that indicates that Matlab didn't think copying a set of colors was infringing on anything? |
Feel free to research it further. It may very well have been "open sourced" [...]
Seems like if people are so concerned about potential liabilities, then there is an obvious safe action. If new colormaps cannot be added b/c of concerns for IP, why are existing ones with similar concerns being kept in? Anyway, unless the conversation direction changes, I should probably stop commenting. Just wanted to voice my opinion that I find the current 'policy' a bit inconsistent. I think it's a shame that this really nice example figure is probably not going to make it in as an example, at least not without changing its colors. They're just colors, folks.
Probably something akin to statute of limitations, I guess. They have been [...] It would be prudent to double-check the rights for what we have now, but [...] Ben Root
One key difference is that we know of explicit claims of ip in the case of [...] There is also public record of us knowing of the ip claims so if we ignore [...] I am with Eric, there is minimal up side and a whole lot of possible down [...] The style module makes changing your local defaults very easy and there are [...]
Just to add some info: the Tableau color-cycle colors seem to be at least partly identical to some of MS Office's color schemes.
Yeah this is all messy...everyone seems to be copying everyone's colormaps. Personally, I don't think IP claims for colormaps hold water, but IANAL. It's all about calculated risk I guess. I agree the risk is greater for these new colormaps than for the old, but there is still risk for both as ignorance of an IP claim doesn't really get you much. Anyway, fun conversation! Given the various tolerance levels for risk, the external package seems by far the best solution for this. |
I am going to close this due to lack of response from Tableau and the general uncertainty of the whole thing. @rhiever Would it be possible for the example to go in without this color cycle? This started as a PR to get a cool example in, and it got sidetracked by IP issues.
I've taken three color schemes from Tableau Public and ported them into a matplotlib style. I've also added some custom styling that makes the plots look cleaner and clearer by default.
Some things to discuss before merging:
These styles should result in plots that look similar to this one: