Skip to content

Refactor color parsing of Axes.scatter #11663

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 4, 2018

Conversation

timhoffm
Copy link
Member

@timhoffm timhoffm commented Jul 15, 2018

PR Summary

As discussed in #11383, it would be helpful to extract the color parameter parsing of Axes.scatter() to a separate method. This helps in managing the complexity of the code: separted from other stuff in scatter(), easier and faster to test because one does need a figure to test the parameter parsing.

Some things that can be improved in the future, but are beyond the scope of this PR:

  • I did only some simple refactorings so far. I'm quite sure this function can be untangled more. This will become easier after this PR, because we can now add a further tests.
  • The tests from TestScatter::test_scatter_c should be moved to simple tests on the Axes._parse_scatter_color_args later on.
  • Ideally, Axes._parse_scatter_color_args would be a static method. So far, it still has a single self reference for obtaining self._get_patches_for_fill.get_next_color(). We cannot easily get rid of that.
    Passing the fallback color in is not an option, because it's only evaluated conditionally and evaluation has the side effect of advancing the color cycle. On the other hand, we cannot return a 'DUMMY_FALLBACK_COLOR' and evaluate it outside the method, because the result is further processed within the method. Good ideas how to handle this are welcome.

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant

@jklymak
Copy link
Member

jklymak commented Sep 6, 2018

OK, so for this PR, no functionality changes? Its just a refactor? I'm 👍 on the refactor....

@timhoffm
Copy link
Member Author

timhoffm commented Sep 6, 2018

Jep, no functional changes. This just extracts the function and regroups independent parts of its logic.

And of course, added some tests. 😄.

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better to me. Thanks for all the tests too....

@timhoffm timhoffm force-pushed the scatter-parameter-parsing branch from 05516df to 07c1c62 Compare November 1, 2018 21:42
@timhoffm
Copy link
Member Author

timhoffm commented Nov 1, 2018

Rebased and updated to handle merge conflict due to #12431.

@timhoffm
Copy link
Member Author

timhoffm commented Nov 3, 2018

Rebased to handle merge conflict due to #12673

if edgecolors is None and not rcParams['_internal.classic_mode']:
edgecolors = 'face'

c_is_none = c is None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"c_was_none"? (otherwise the next line reads a bit weird)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


c_is_none = c is None
if c is None:
if facecolors is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would tend to merge the conditionals:

c = (facecolors if facecolors is not None
     else "b" if classic_mode
     else self._get_patches_for_fill.get_next_color())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

try: # First, does 'c' look suitable for value-mapping?
c_array = np.asanyarray(c, dtype=float)
n_elem = c_array.shape[0]
if c_array.shape in [xshape, yshape]:
Copy link
Contributor

@anntzer anntzer Nov 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something funny is going on here (I know it was already there before, but still seems worth pointing out): we take x and y of any shapes and flatten them (they just need to have the same size), but c has to match either the shape of x or y, not just the size.

In other words the following "work" (i.e. perform an implicit ravel()):

scatter(np.arange(12).reshape((3, 4)), np.arange(12).reshape((4, 3)), c=np.arange(12).reshape((3, 4)))
scatter(np.arange(12).reshape((3, 4)), np.arange(12).reshape((4, 3)), c=np.arange(12).reshape((4, 3)))

but the following fail:

scatter(np.arange(12).reshape((3, 4)), np.arange(12).reshape((4, 3)), c=np.arange(12).reshape((6, 2)))
# and even
scatter(np.arange(12).reshape((3, 4)), np.arange(12).reshape((4, 3)), c=np.arange(12))

Of course that last one has the best error message (irony intended):

ValueError: 'c' argument has 12 elements, which is not acceptable for use with 'x' with size 12, 'y' with size 12.

Copy link
Member Author

@timhoffm timhoffm Nov 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well observed. I prefer not to change functionality as part of this refactoring, but I'm happy to open an issue/pr for that. - Also not sure what the intended behavior should be (should we modify c to be as permissive as with x/y, or should we be more strict about x/y shape?

Edit: Issue added #12735

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ValueError: 'c' argument has 12 elements, which is not acceptable for use with 'x' with size 12, 'y' with size 12.

is a really awful error. Has the intuitive functionality that was present before been restored?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please continue this conversation at #12735. I think a PR that gave a better error for this case would be most welcome.

)
# Both the mapping *and* the RGBA conversion failed: pretty
# severe failure => one may appreciate a verbose feedback.
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else: (goes together with the if: above)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@timhoffm timhoffm force-pushed the scatter-parameter-parsing branch 2 times, most recently from 892b72f to 99cae22 Compare November 4, 2018 12:09
"'c' argument has {nc} elements, which is not "
"acceptable for use with 'x' with size {xs}, "
"'y' with size {ys}."
.format(nc=n_elem, xs=xsize, ys=ysize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of tracking n_elem I think you can just do nc=np.shape(c)[0] here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can have edge cases in which np.shape(c)[0] will raise here. E.g.

plt.scatter([1,2], [1,2], c='rgb')

...
ValueError:

During handling of the above exception, another exception occurred:

~/dev/matplotlib/lib/matplotlib/axes/_axes.py in _parse_scatter_color_args(self, c, edgecolors, kwargs, xshape, yshape)
   4144                         "acceptable for use with 'x' with size {xs}, "
   4145                         "'y' with size {ys}."
-> 4146                             .format(nc=np.shape(c)[0], xs=xsize, ys=ysize)
   4147                     )
   4148                 else:

IndexError: tuple index out of range

Makes shure the intended message is printed, even if we cannot determine the shape of c. Even if it says -1 there it's better than just "tuple index out of range".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair enough

raise ValueError(
"'c' argument must either be valid as mpl color(s) "
"or as numbers to be mapped to colors. "
"Here c = {}." # <- beware, could be long depending on c.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the standard wording would be something like

"'c' must be either this or that, not\n{}".format(c)

(the newline being there only to handle cases of very long c's)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Message rewritten. I don't think that exception messages should contain explicit newlines.

"with a single row if you really want to specify "
"the same RGB or RGBA value for all points.")
# Wrong size; it must not be intended for mapping.
valid_shape = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have a mild preference for moving the invalid shape error message up and just do raise ValueError(invalid_shape.format(...)) here and below instead of keeping the valid_shape and n_elem tracking variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel that the overall structure is not very clear and that it can be written much more concise, but that's beyond this PR. Therefore I'd like to defer any further code changes within the function. As written in the original PR message:

I'm quite sure this function can be untangled more. This will become easier after this PR, because we can now add a further tests.

So the strategy is:

  1. Get this PR in without changing more than neccesary of the original code.
  2. Move the tests to directly testing Axes._parse_scatter_color_args, which is significantly faster than creating a figure each time.
  3. This allows adding more tests an thus be surer that code changes do not modify the functionality.

else:
# Both the mapping *and* the RGBA conversion failed: pretty
# severe failure => one may appreciate a verbose feedback.
raise ValueError(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At that point the only error we can get is the one thrown by to_rgba_array which I think is just fine to let through (Invalid RGBA argument: ...).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above would like to defer this to future PRs.

@timhoffm timhoffm force-pushed the scatter-parameter-parsing branch from 0466d49 to 1b68e69 Compare November 4, 2018 14:46
@anntzer anntzer merged commit ac0525e into matplotlib:master Nov 4, 2018
@timhoffm timhoffm deleted the scatter-parameter-parsing branch November 4, 2018 16:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants