Add 3 new styles with color schemes from Tableau [backport to 1.4.x] #3700

Closed
wants to merge 12 commits into from

rhiever
Contributor

@rhiever rhiever commented Oct 22, 2014

I've taken three color schemes from Tableau Public and ported them into a matplotlib style. I've also added some custom styling that makes the plots look cleaner and clearer by default.

Some things to discuss before merging:

  • Given that only the color scheme and not the rest of the styling is from Tableau, should these styles be called "tableau"?
  • Should there be separate styles that introduce the color schemes, and then another separate style that introduces my custom stylings?

These styles should result in plots that look similar to this one:

[Image: Example visualization]

@efiring
Member

efiring commented Oct 22, 2014

That's a very nice example plot that you show above; it would be great to have something like this in the gallery. Is it all automated, or did you use manual operations to position the labels on the right?

@rhiever
Contributor Author

rhiever commented Oct 22, 2014

Thanks! I've promised @tacaswell that I'll put together a full example of that plot for the gallery. But that's for another PR. :-)

Here's my blog post with the full code for the plot: http://www.randalolson.com/2014/06/28/how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/

I'll have to rework it to use base Python instead of pandas.

The labels are semi-automated: They all use the same x position, and I use the last y value in each series to determine the y position of the corresponding label. This chart in particular took a little tweaking to make it so the labels don't overlap.
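The placement rule described above can be sketched in plain Python. This is only an illustration: the `label_positions` helper, the `min_gap` parameter, and the numbers are hypothetical, and the automatic upward nudge stands in for the manual tweaking mentioned in the comment.

```python
def label_positions(series, min_gap=1.0):
    """Return {name: y} for labels drawn at the right-hand end of each line.

    Each label starts at the last y value of its series. A label that would
    sit closer than `min_gap` to the label below it is pushed upward.
    """
    # Sort by final value so neighbours in the list are neighbours on the plot.
    ordered = sorted(series.items(), key=lambda item: item[1][-1])
    positions = {}
    previous_y = None
    for name, values in ordered:
        y = values[-1]
        if previous_y is not None and y - previous_y < min_gap:
            y = previous_y + min_gap
        positions[name] = y
        previous_y = y
    return positions


# Made-up series, for illustration only.
series = {
    "Biology": [29.1, 45.0, 59.0],
    "Psychology": [44.4, 70.0, 77.0],
    "Art": [59.7, 62.0, 59.5],
}
positions = label_positions(series, min_gap=1.5)
```

Each entry of `positions` could then be passed to `plt.text(x, y, name)` with one shared `x` for all labels.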

@tacaswell
Member

Doing the y-layout might be a good use of the linear constraint solver (see kiwi; there is an issue about this someplace).

@tacaswell tacaswell added this to the v1.4.2 milestone Oct 22, 2014
@rhiever
Contributor Author

rhiever commented Oct 23, 2014

I just committed a working example of the full plot. I originally started coding it up without pandas, but the code became so gross that I simply wasn't willing to publish it that way. I would never handle tabulated data without pandas, so I can't publish code that recommends someone else not use pandas. :-)

@tacaswell tacaswell changed the title Add 3 new styles with color schemes from Tableau Add 3 new styles with color schemes from Tableau [backpont to 1.4.x] Oct 23, 2014
@tacaswell tacaswell changed the title Add 3 new styles with color schemes from Tableau [backpont to 1.4.x] Add 3 new styles with color schemes from Tableau [backport to 1.4.x] Oct 23, 2014
@tacaswell
Member

Also, where is Tableau from and are there IP issues with it?

@efiring
Member

efiring commented Oct 23, 2014

Thanks; maybe this evening I will see about stripping out pandas. I really don't want it to be a dependency here. I would also use a dictionary for your offsets--it will be more compact and readable that way.
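A minimal sketch of the dictionary idea, assuming the offsets are small vertical nudges in data units. The names `Y_OFFSETS` and `label_y` and the offset values are hypothetical, not taken from the PR.

```python
# Hypothetical per-label nudges; labels that are not listed stay where they are.
Y_OFFSETS = {
    "Foreign Languages": 0.5,
    "English": -0.5,
    "Communications and Journalism": 0.75,
}


def label_y(name, last_value, offsets=Y_OFFSETS):
    """Return the final y value of a series plus any manual nudge."""
    return last_value + offsets.get(name, 0.0)
```

The `.get(name, 0.0)` lookup replaces one `if` branch per label, and adding a new nudge only means adding one dictionary entry.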

@rhiever
Contributor Author

rhiever commented Oct 23, 2014

Also, where is Tableau from and are there IP issues with it?

Tableau is made by Tableau Software. I doubt they have any kind of ownership over the color scheme used here, or if that's even possible.

I would also use a dictionary for your offsets--it will be more compact and readable that way.

Sure! I'll make that change real quick.

Provides a more concise way of storing the offsets rather than a long string of if statements.
plt.style.use("tableau20")

gender_degree_data = pd.read_csv("http://files.figshare.com/1726892/percent_bachelors_degrees_women_usa.csv")

Member

You can put the file in lib/matplotlib/mpl-data/sample_data, and then access it using matplotlib.cbook.get_sample_data(). See the example pylab_examples/loadrec.py. Then you can read it using matplotlib.mlab.csv2rec.

from matplotlib.mlab import csv2rec
from matplotlib.cbook import get_sample_data
fname = get_sample_data('percent_bachelors_degrees_women_usa.csv')
gender_degree_data = csv2rec(fname)

That leaves you with a recarray, which supports both dictionary and attribute styles of field access.
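The two access styles can be illustrated as follows. To stay runnable without the sample file, this sketch swaps in `numpy.genfromtxt` on an in-memory CSV instead of `csv2rec`; the column names and numbers are made up.

```python
import io

import numpy as np

# Stand-in for the sample CSV; columns and values are invented for illustration.
csv_text = "Year,Biology,Physics\n1970,29.1,13.8\n1971,29.4,14.9\n"

# names=True takes the field names from the header row.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# Viewing the structured array as a recarray adds attribute access
# on top of the dictionary-style access it already has.
rec = data.view(np.recarray)

years_by_key = rec["Year"]   # dictionary style
years_by_attr = rec.Year     # attribute style
```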

Using matplotlib’s csv2rec function instead.

Also made sure the file is pep8 compliant.
@rhiever
Contributor Author

rhiever commented Oct 23, 2014

Thanks for the tips @efiring - I've managed to eliminate the pandas dependency now. I'm pretty sure I put the csv file in the right place, but would appreciate if someone would double-check that. When testing locally, I had to copy the csv into the corresponding directory of my local install of matplotlib.

@efiring
Member

efiring commented Oct 23, 2014

This PR brings up strategy questions: what should be the criteria for adding styles to the mpl distribution? How should they be named? And how much should the styles specify? If two styles differ by a single line, does it make sense to have both of them? I think we need to come up with a plan, before this gets out of control.

# Author: Randal S. Olson (randalolson.com / @randal_olson)
# Uses Tableau's Color Blind 10 color scheme

figure.figsize: 12, 7
Member

Are you sure this should be part of the style? How often do people want a figure that is 12 inches wide? How does this interact with the ever-confusing dpi variables?

Contributor Author

How often do people want a figure that is 12 inches wide?

For me: All the time! I find the default figure size to be way too small. Of course, we could remove it if you think it's inappropriate to change the default figure size in a base style.

How does this interact with the ever-confusing dpi variables?

No clue, but I've never had an issue.

@rhiever
Contributor Author

rhiever commented Oct 23, 2014

That's what I was thinking. I'm considering breaking the style up into the "Tableau" styles for coloring, then have a separate (fourth) style to do all of my other plotting customizations.

@tacaswell
Member

Multiple styles can be used (e.g. mpl.style.use([style1, style2])), so in the case of the one-line difference something like that should be done (and if there isn't a notion of "extends" in the style, there should be).

I am in favor of being as permissive as possible with adding these. It might be worth coming up with a way to have an 'endorsed' and a 'contributed' set of styles?

@efiring
Member

efiring commented Oct 24, 2014

@rhiever I suspect the reason you find the default figure size to be too small is partly because the default screen dpi (80) is too low for most present-day machines, and differs from the savefig default dpi, which is 100. The problem with setting figure.figsize to 12 inches wide is that then it doesn't fit on a standard page without being scaled. I think it is preferable to be able to display or print a pdf without scaling it, even if in the end, when a figure is used in a paper or presentation, it does have to be scaled.
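The arithmetic behind this can be written out in a few lines. The helpers `pixel_size` and `fits_width` are hypothetical, and the 8.5 inch page width (US Letter) is an assumption; the dpi values are the ones quoted above.

```python
def pixel_size(figsize_inches, dpi):
    """Pixel dimensions of a figure: inches multiplied by dots per inch."""
    width, height = figsize_inches
    return (round(width * dpi), round(height * dpi))


def fits_width(figsize_inches, page_width_inches=8.5):
    """True if the figure is no wider than the page (US Letter assumed)."""
    return figsize_inches[0] <= page_width_inches


on_screen = pixel_size((12, 7), 80)   # at the screen dpi quoted above
saved = pixel_size((12, 7), 100)      # at the savefig dpi quoted above
needs_scaling = not fits_width((12, 7))
```

The same 12 x 7 inch figure is 960 x 560 pixels on screen and 1200 x 700 pixels when saved, and at 12 inches wide it cannot be printed on the assumed page without scaling.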

@tonysyu
Contributor

tonysyu commented Oct 24, 2014

This PR brings up strategy questions: what should be the criteria for adding styles to the mpl distribution? How should they be named? And how much should the styles specify?

In my opinion, a single stylesheet should change either layout (e.g. figsize, font sizes) or non-layout aesthetics (e.g. colors, font family). That said, there's a bit of overlap between the two.

The reason style.use takes a list as an argument is to facilitate this separation. For example, I might have the same script generate figures for a presentation and for a journal paper just by switching between style.use([pretty_style, presentation_layout]) and style.use([pretty_style, jfm_journal_layout])
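A minimal sketch of that separation, assuming a matplotlib version in which style functions accept plain dicts as well as style names. The three dicts and the `settings_for` helper are hypothetical stand-ins for real style sheets.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Non-layout aesthetics (hypothetical values).
pretty_style = {"font.family": "serif", "axes.facecolor": "#f5f5f5"}

# Layout-only styles (hypothetical values).
presentation_layout = {"figure.figsize": (10.0, 6.0), "font.size": 18.0}
journal_layout = {"figure.figsize": (3.5, 2.5), "font.size": 8.0}


def settings_for(layout):
    """Apply the shared aesthetics plus one layout and report the result."""
    # Styles later in the list override earlier ones on conflicting keys.
    with plt.style.context([pretty_style, layout]):
        return {
            "figsize": tuple(plt.rcParams["figure.figsize"]),
            "font_size": plt.rcParams["font.size"],
            "facecolor": plt.rcParams["axes.facecolor"],
        }
```

Using `plt.style.context` instead of `style.use` keeps the change local to the `with` block, which is convenient when one script produces both variants.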

@rhiever
Contributor Author

rhiever commented Nov 12, 2014

Ping. Anything else needed from me for this PR?

Still nothing from Tableau. I presume they aren't going to answer.

@efiring
Member

efiring commented Nov 12, 2014

Without explicit permission from Tableau, I don't think we should use their name or their color sequences in mpl. See http://www.tableausoftware.com/ip. Color sequences are designed, with genuine thought and effort. They are IP. It is up to the designers to decide what rights, if any, they want to retain for that IP.

@tacaswell
Member

That is also the precedent set in #2871 where we got the company to release the color map under MIT.

@chebee7i
Contributor

Does matplotlib have access to a lawyer who could help us answer some of these questions? While I agree that colormaps do take some creative work, there seems to be a lot of uncertainty as to whether you can actually copyright a collection of colors.

Related: http://matplotlib.1069221.n5.nabble.com/Matlab-parula-colormap-tt44174.html#none

To borrow an example from here (http://www.colourlovers.com/faq/18/How_can_you_copyright_a_color_palette; see License for Colors & Palettes), I could not create a 3-color palette of red, white, and blue and then prevent people from printing US flags. According to another page on that same site, it's not the colors per se, but the specific arrangement as a colorbar that is copyrightable.

Matplotlib is a tool. If people use the tool in a way that violates copyright, then that isn't really Matplotlib's concern, as people can also use Matplotlib in a way that doesn't violate copyright.

For example, it seems to me (but please tell me if I sound way off base) that for your own personal use, you can use whatever colors you wanted. This could include colormaps identical to Tableau's colormaps, MATLAB's parula colormap, ggplot2 colormaps, etc. If you aren't publishing the figure, then it seems like you couldn't be breaking copyright---think of an image that is generated on the fly and is never saved to disk! Even if you choose to publish the figure, it's not entirely clear that you would be breaking copyright, as your figure is much more than just a rectangular colorbar---there is the actual plot material, and even the colorbar itself has a custom axis with custom tickmarks and custom ticklabels. Worst case, you would have to find a more creative way to display the colorbar.

I guess I feel like Matplotlib's main concern should be with code copyright. So long as we aren't violating that, then these colormaps should be included. If it happens that published figures which use a particular copyright'd colormap are violating copyright (which is not clear at all), then that is a decision that users of Matplotlib can make or not make. We could even put a warning for users.

@WeatherGod
Member

Matplotlib provides a very easy mechanism for adding colormaps and style
sheets from external sources. If you want to make a package that registers
your own copy of those things upon import, feel free to do so. As for
Matplotlib itself, we have an obligation to make our software freely
redistributable and not to put those downstream of us into potential legal
situations.
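As a rough sketch of what such an external package might do on import, assuming matplotlib 3.5 or later (where the `matplotlib.colormaps` registry exists). The palette, the name `example_palette`, and the `register_palette` helper are placeholders invented for illustration and are not taken from any product.

```python
import matplotlib
from matplotlib.colors import ListedColormap

# Placeholder colours for illustration only.
PALETTE = ["#1b4f72", "#b9770e", "#1e8449", "#922b21"]
CMAP_NAME = "example_palette"  # hypothetical name


def register_palette():
    """Register the palette as a named colormap and return it."""
    cmap = ListedColormap(PALETTE, name=CMAP_NAME)
    # force=True lets the call be repeated without raising on re-registration.
    matplotlib.colormaps.register(cmap, force=True)
    return cmap


# A third-party package would typically run this in its __init__.py.
register_palette()
```

After the import, users could refer to the colormap by name, for example `cmap="example_palette"` in a plotting call.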

We know that colors can be trademarked (UPS has brown trademarked, for
goodness' sake!). Tableau has claimed IP on these colors for plotting
purposes, and so has MATLAB. Whether such claims can be defended in court
is up to a lawyer to decide, but in the meantime, I think Thomas is
absolutely correct in erring on the side of caution, not only for
ourselves, but for our downstream repackagers. Even if we were to get
explicit permission for us to use the colors, it doesn't help anybody else
(look at all of the fun happening with MKL licensing for numpy). Meanwhile,
the brewer colors and others were explicitly made open for anybody to use,
so we included them.

@rhiever
Contributor Author

rhiever commented Nov 13, 2014

I wonder if there would be any legal issues if we very slightly modified the colors and didn't use Tableau's name?

@chebee7i
Contributor

@rhiever I doubt it, as that is what I initially suggested with #2871. Once Wistia released it, it was moot.

@WeatherGod I understand the conservative stance (e.g. avoiding the potential for legal trouble). Could you comment, though, on my point about users creating figures that are not even published? Isn't the potential for legal trouble only for published figures that users create? Or is that too naive? I guess I don't know how anyone can claim a copyright violation on something that isn't published. That's why I was stressing code copyright, for which I think we'd be in the clear.

I think an external colormap library is probably the best option, but not out of principle.

@WeatherGod
Member

IANAL. It would seem correct that unpublished works are not subject to
copyright issues (in other words, "fair use"), but note that the "fair use"
doctrine is not the same everywhere. As for code, what code do you speak
of? Are you talking about the code that would be required to implement the
colors? I guess it would be in the clear since it is entirely our own code,
but those configuration files might be questionable.

I wouldn't know any of this for sure. I didn't go to law school like my
parents wanted me to...

@WeatherGod
Member

Oh, and I think it would make sense that any such external package can not
get our official "blessing" (we wouldn't link to it from our website).

@efiring
Member

efiring commented Nov 13, 2014

@chebee7i Yes, you can do whatever you want with plots that you don't publish. And in practice, if you were to copy the tableau color sequence and use it in a publication, chances are no one would care enough to do anything about it. But we absolutely must not include such IP in our distribution. We would have nothing to gain and much to lose. And it would be wrong. Slightly tweaking the colors wouldn't help. Granted, the point at which colors or a sequence of colors becomes protected IP may be murky, but I don't see much murk in this case, where the clear intention is to copy the tableau set.

@chebee7i
Contributor

@efiring I disagree that there is nothing to gain...it's clear to me that people want such colormaps in Matplotlib because they think it makes Matplotlib better. However, I would agree that it's probably not worth any potential legal issues. [Aside: I assume by "wrong" you meant "not worth the risk", as opposed to injecting some notion of morality into this discussion.] But I'm splitting hairs...

Mathworks does not have a trademark for their colormap. See here for their list. There is also no code that we are copying that is copyrighted, as far as I can see. These colors can be reverse-engineered. Maybe a patent is possible, but I doubt it. Anyway, this seems like a very abstract (yet understandable) concern. So while I understand the desire to play it safe, I personally think it's playing safer than necessary. Not gonna fight it though.

@rhiever if you want to start on a package that gathers a bunch of colormaps (matlab, tableau, d3, etc), I'd be more than happy to help out. Adding brewer2mpl as a dependency would bring in the ColorBrewer colormaps as well.

@chebee7i
Contributor

Out of curiosity: If we apply this logic consistently, shouldn't we take out all the original colormaps, since they were obviously copied from MATLAB?

[Image: MATLAB Colormaps]

@WeatherGod
Member

Actually, in my research for the history of matplotlib, we did not copy it
from Matlab, but from IDL. Matlab seemed to have copied IDL in this respect
(or IDL got it from someplace else...). The relevant commit message
from John Hunter states that they were Matlab colormaps, but the patch was
not his. It would seem that he merely recognized them as Matlab names
because of his Matlab background, but the contributor had zero Matlab
experience at the time.

@chebee7i
Contributor

Good info. The point stands however, since then IDL must have the IP claim (unless it was taken from yet another place). If Matlab just copied them from IDL, maybe that indicates that Matlab didn't think copying a set of colors was infringing on anything?

@WeatherGod
Member

Feel free to research it further. It may very well have been "open sourced"
stuff back then. Attitudes about this sort of stuff were very different 15
years ago. It may also very well be that Matlab and Tableau decided to
assert their IP rights now, while everything else is considered
"grandfathered", so to speak.

@chebee7i
Contributor

Seems like if people are so concerned about potential liabilities, then there is an obvious safe action. If new colormaps cannot be added b/c of concerns for IP, why are existing ones with similar concerns being kept in?

Anyway, unless the conversation direction changes, I should probably stop commenting. Just wanted to voice my opinion that I find the current 'policy' a bit inconsistent. I think it's a shame that this really nice example figure is probably not going to make it in as an example, at least not without changing his colors. They're just colors, folks.

@WeatherGod
Member

Probably something akin to statute of limitations, I guess. They have been
here and in other projects for 15+ years. Again, though, I wouldn't say
that there are similar concerns. We already know that jet and family did
not come from Matlab, but IDL. It may very well be that these colormaps are
"in the open" via some other means (maybe IDL got it from some place else,
maybe they released it for others, etc.). But we know that these
colormaps are not touchable. The IP rights have been explicitly asserted.

It would be prudent to double-check the rights for what we have now, but
unless we see any sort of affirmative action with regards to somebody
protecting their IP on these colors, it would be safe to assume that they
are grandfathered in. If anything, we should track down the source of the
colormap list so that we can at least properly cite the source in
documentation and code.

Ben Root

@tacaswell
Member

One key difference is that we know of explicit claims of IP in the case of
these color maps and the new Matlab color map; I know of no such claim for
any other color map.

There is also a public record of us knowing of the IP claims, so if we ignore
those claims then we are knowingly infringing, which to my understanding is
frowned on even more strongly.

I am with Eric: there is minimal upside and a whole lot of possible
downside for both us and the downstream packagers/distributors.

The style module makes changing your local defaults very easy and there are
multiple projects that provide the same thing.

@Tillsten
Contributor

Just to add some info: the Tableau color-cycle colors seem to be at least partly identical to some of MS Office's color schemes.

@chebee7i
Contributor

Yeah this is all messy...everyone seems to be copying everyone's colormaps. Personally, I don't think IP claims for colormaps hold water, but IANAL.

It's all about calculated risk I guess. I agree the risk is greater for these new colormaps than for the old, but there is still risk for both as ignorance of an IP claim doesn't really get you much. Anyway, fun conversation! Given the various tolerance levels for risk, the external package seems by far the best solution for this.

@tacaswell tacaswell mentioned this pull request Nov 26, 2014
@tacaswell
Member

I am going to close this due to lack of response from Tableau and the general uncertainty of the whole thing.

@rhiever Would it be possible for the example to go in without this color cycle? This started as a PR to get a cool example that got side tracked by IP issues.
