Skip to content

Logit scale #3753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 3, 2015
Merged

Logit scale #3753

merged 2 commits into from
Mar 3, 2015

Conversation

iosonofabio
Copy link

As discussed earlier today on the mailing list, here the proposal of PR for a new scale, the LogitScale. This is essentially a nonlinear scale that is log both towards 0+ and log towards 1-. The exact transformation is:

x -> log10 (x / 1 - x)

(the 10 base of the log is just the most common case for plotting, but I could extend to natural or base 2 logs if there is interest). The inverse transform is the logistic function:

x -> 1 / (1 + 10^-x)

(again, 10 is just one possible base).

The LogitScale is useful when one has frequencies in a population (i.e. floats between 0 and 1) and both rare events and very common ones are interesting. For instance, say you ask about the fraction of people with blue eyes in various world populations, you want to spot even tiny deviations from zero or one.

Pierre Haessig had one implementation with example plots here:

http://nbviewer.ipython.org/gist/pierre-haessig/7e3e6a818edeb6819708

I had developed another, virtually identically one in the last few days, but this PR takes essentially his code and slightly modifies it for style and a few design points.

Note: there are a few FIXME in the code in case of doubts, please enlighten me on the best practice in those cases. Tests and docs also still missing.


def __init__(self, p_min, p_max):
Transform.__init__(self)
self.p_min = p_min
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these intentionally part of the public api?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought they are as useful as e.g. symmetric log transformations, which are already public API.

It's possible to define LogitTransform and LogisticTransform within LogitScale, but then LogitScale looks different from any other scale in that file...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this a bit more, they should probably be hidden behind properties so that the values can not be set to out-of-range values.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean p_min and p_max? I can make a @Property around it and another _p_min/_p_max layer to hide from the user, with a setter that checks for ranges. Is that the idea?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

On Mon Nov 03 2014 at 11:49:49 AM Fabio Zanini notifications@github.com
wrote:

In lib/matplotlib/scale.py:

@@ -478,10 +478,107 @@ def get_transform(self):
return self._transform

+class LogitTransform(Transform):

  • input_dims = 1
  • output_dims = 1
  • is_separable = True
  • has_inverse = True
  • def init(self, p_min, p_max):
  •    Transform.**init**(self)
    
  •    self.p_min = p_min
    

You mean p_min and p_max? I can make a @Property
https://github.com/property around it and another _p_min/_p_max layer
to hide from the user, with a setter that checks for ranges. Is that the
idea?


Reply to this email directly or view it on GitHub
https://github.com/matplotlib/matplotlib/pull/3753/files#r19745773.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I coded this via a sister class that implements the properties. This actually cleans up constructors quite a bit.

@iosonofabio
Copy link
Author

I added a FIXME about having side-effects in transform_non_affine. That might be not fine, depending on how transform_non_affine is supposed to work.


@p_min.setter
def p_min(self, value):
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lean towards letting the user handle exceptions, but this is fine.

@tacaswell
Copy link
Member

A tad annoying thing about the auto-hiding line comments is that sometimes it hides active discussion.

I would drop the **kwargs here and say that the usage else where (where they get ignored) is wrong and should be changed (in a different PR as that is an api change and requires extra documentation).

@iosonofabio
Copy link
Author

I filed a separare PR for the **kwargs issue: PR #3756.

@iosonofabio
Copy link
Author

I think now only the side effects FIXME is to be solved, then I should probably add some documentation and it's done, right?

@tacaswell
Copy link
Member

plus tests and cleaning up the pep8 violation that is making the current tests fail (https://travis-ci.org/matplotlib/matplotlib/builds/40041025)

@iosonofabio
Copy link
Author

Testing is helping, I fixed a few bugs and the PEP8 offenders. I still have to:

  • make a figure for the tests to compare to
  • find out why I get a divisionbyzero warning about log10
  • document

@iosonofabio
Copy link
Author

After many pains, I decided to refactor the whole logit to make it follow closely LogScale. Now not only it does not throw any warnings, but it does not contains any arbitrary parameters like p_min. Instead, I coded a Locator class similar to LogLocator that finds out the plot range from the data.

I am sorry if this forces you to recheck things, but as it happens I was finding so many little bugs here and there that the code was becoming very dirty.

I did a few tests and everything seems to work over here. I am wondering where I should document the logit scale addition... could you please enlighten me on that?
Thx!

@tacaswell
Copy link
Member

See https://github.com/matplotlib/matplotlib/tree/master/doc/users/whats_new for documentation on how to document the new feature.

@@ -14,6 +14,14 @@ def test_log_scales():
ax.axhline(24.1)


@image_comparison(baseline_images=['logit_scales'], remove_text=True)
def test_logit_scales():
ax = plt.subplot(122, yscale='logit', xscale='logit')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make just one axes?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By that, I mean a single axes the fills up the whole figure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sry, ok let me do this. I just copied from above in the same file.

@tacaswell
Copy link
Member

It looks like there are more decades on the x-axis than the y-axis, is that correct?

@iosonofabio
Copy link
Author

I set nothing in particular about that, I just make sure all the data are visible. It is up to the user to specify axis limits if he wishes to.

Maybe I should clarify: I follow the same design like LogScale, which defaults to axis.get_minpos if the data is not clear about axis limits.

@iosonofabio
Copy link
Author

Ok, the problem that get_minpos is kind of specially crafted for log axes stays, but I fixed the logit tick locator with a hack for now. In case limits for the axis are outside ]0, 1[, I assume 1 - get_minpos() as axis limit. That defaults to 1 - 1e-7 and should be good for most cases.

I'll work on the documentation a bit now and then I hope the PR should get into a final state for acceptance.

@iosonofabio
Copy link
Author

Ok I added 2 lines of documentation, is that it or should I put examples and other user friendly explanations somewhere else?

@tacaswell
Copy link
Member

Putting an example in the examples directory/gallery would be great.

Finding someplace in the prose docs to write a paragraph explaining why this is useful and how to use it would be better.

@tacaswell
Copy link
Member

Also, can you sort out why all of the tests are failing?

@iosonofabio
Copy link
Author

Sry ta, I made a mess, do you want me to rebase against master to make clean diffs of the whole PR changes? I don't want you to go through the 50 commits I made in the last few hours!

Docs are still a bit of question marks, because I am not sure whether my text/formatting is ok.

@tacaswell
Copy link
Member

I would add a plot of some data with a known shape the test, the ticks look
kinda funny and not symmetric to me.

Over all the docs look pretty good to me, but I just skimmed them.

On Fri Nov 14 2014 at 4:14:52 AM Fabio Zanini notifications@github.com
wrote:

Sorry, I made a little mess that required some rebasing, but everything
should work now.

Docs are still a bit of question marks, because I am not sure whether my
text/formatting is ok.


Reply to this email directly or view it on GitHub
#3753 (comment)
.

@iosonofabio
Copy link
Author

I changed the test figure to one which is by design more symmetric, so that it's evident that x/y axis follow the exact same rules.
btw: it's hard to make anything more symmetric than this in logit scales ;-)

@tacaswell
Copy link
Member

I was thinking more like one of the extinction curves we talked about when
you first proposed this feature, as that is what it is for.

When you change the test data can you do a rebase to remove the old tests
from history.

On Fri Nov 14 2014 at 1:29:47 PM Fabio Zanini notifications@github.com
wrote:

I changed the test figure to one which is by design more symmetric, so
that it's evident that x/y axis follow the exact same rules ;-)


Reply to this email directly or view it on GitHub
#3753 (comment)
.

@iosonofabio
Copy link
Author

I don't think this matters much, because it's a test, not an example. If anything, expecting a circle and seeing a circle is a strong indication that the feature is working, which is what tests are good for. But anyway, what curve would you like to see, I am not sure I get the meaning of "extinction curve". Something like the example plot, but without randomly generated numbers? I.e. a monotonically increasing curve or so, is that the idea?

@iosonofabio
Copy link
Author

Both suggestions followed. Now I'll rebase.

@iosonofabio iosonofabio force-pushed the logit branch 2 times, most recently from 071c0b0 to c48bacf Compare March 2, 2015 22:35
@@ -478,10 +478,111 @@ def get_transform(self):
return self._transform


def _mask_non_logit(a):
"""
Return a Numpy masked array where all values outside ]0, 1[ are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring does not match what the function is doing. It is always returning a copy of the input array in which any values outside ]0, 1[ are replaced by nan.

@iosonofabio
Copy link
Author

He Eric, thanks for all the comments, which I will address ASAP.
I must confess: if those came out before I rebase for the n-th time, I'd not particularly mind. But as they are legitimate, no problem.

efiring added a commit that referenced this pull request Mar 3, 2015
@efiring efiring merged commit ed0ce1a into matplotlib:master Mar 3, 2015
@efiring
Copy link
Member

efiring commented Mar 3, 2015

The ticker module needs more work, such as #4172, so any remaining fine-tuning of the logit functionality can be done later.

@iosonofabio
Copy link
Author

Cool, thanks!

@iosonofabio iosonofabio deleted the logit branch March 3, 2015 18:38
@pierre-haessig
Copy link
Contributor

Hi @iosonofabio, great to this that this PR went in! I just have a question, is the logit scale also usable for the norm argument of plt.imshow ? (like http://stackoverflow.com/a/2546622) ?

Last week I was plotting the transition matrix of a Markov chain, with many values close to 0 and many other values close to 1. Plotting the logit transform of that matrix with a diverging colormap really gave a nice result.

@WeatherGod
Copy link
Member

Currently, I don't think that is possible, but there have been discussion
in past years to merge the scales and norm concepts in some manner. It
isn't something that can be represented via a single codebase, I don't
think, but it would be interesting to see if we could create a Normalize
subclass that could accept a scale.

On Mon, Jul 13, 2015 at 8:46 AM, Pierre Haessig notifications@github.com
wrote:

Hi @iosonofabio https://github.com/iosonofabio, great to this that this
PR went in! I just have a question, is the logit scale also usable for the
norm argument of plt.imshow ? (like http://stackoverflow.com/a/2546622) ?

Last week I was plotting the transition matrix of a Markov chain, with
many values close to 0 and many other values close to 1. Plotting the logit
transform of that matrix with a diverging colormap really gave a nice
result.


Reply to this email directly or view it on GitHub
#3753 (comment)
.

@pierre-haessig
Copy link
Contributor

To illustrate this notion of logit norm, I've created a Notebook example http://nbviewer.ipython.org/gist/pierre-haessig/2d4b46078d46e6fc3842

Thanks Ben for your feedback. I guess that the easiest path would be to create a new Normalize subclass and leave for fancy refactoring for another day. And it's not that clear that this would lead to that much code duplication.

@tacaswell
Copy link
Member

+1 on doing the simplest thing that will solve this problem.

On Mon, Jul 13, 2015, 10:48 AM Pierre Haessig notifications@github.com
wrote:

To illustrate this notion of logit norm, I've created a Notebook example
http://nbviewer.ipython.org/gist/pierre-haessig/2d4b46078d46e6fc3842

Thanks Ben for your feedback. I guess that the easiest path would be to
create a new Normalize
http://matplotlib.org/api/colors_api.html#matplotlib.colors.Normalize
subclass and leave for fancy refactoring for another day. And it's not that
clear that this would lead to that much code duplication.


Reply to this email directly or view it on GitHub
#3753 (comment)
.

@iosonofabio
Copy link
Author

I am unfortunately terribly busy these days but I would be interested in
seeing that implemented, simplest thing is the best for now I think.

On 13/07/2015 17:53, Thomas A Caswell wrote:

+1 on doing the simplest thing that will solve this problem.

On Mon, Jul 13, 2015, 10:48 AM Pierre Haessig notifications@github.com
wrote:

To illustrate this notion of logit norm, I've created a Notebook example
http://nbviewer.ipython.org/gist/pierre-haessig/2d4b46078d46e6fc3842

Thanks Ben for your feedback. I guess that the easiest path would be to
create a new Normalize
http://matplotlib.org/api/colors_api.html#matplotlib.colors.Normalize
subclass and leave for fancy refactoring for another day. And it's not
that
clear that this would lead to that much code duplication.


Reply to this email directly or view it on GitHub

#3753 (comment)
.


Reply to this email directly or view it on GitHub
#3753 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants