Skip to content

MAINT Use np.full when possible #12379

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 22, 2019
Merged

Conversation

rth
Copy link
Contributor

@rth rth commented Oct 3, 2018

This uses np.full in a few places instead of cst*np.ones which avoids a memory copy and should be ~2x faster.

@anntzer
Copy link
Contributor

anntzer commented Oct 3, 2018

You'll probably need to wait until we bump to numpy>=1.12 (next january, by #12358) or specify a bunch of dtypes manually, see #10571 for basically the same thing.

@rth
Copy link
Contributor Author

rth commented Oct 3, 2018

Thanks for your feedback @anntzer ! I was not aware of the change of behavior in numpy 1.12.

I have specified the dtype manually in the few places it was missing -- I think it's more reliable in any case than relying on the dtype of the second argument (i.e. 5 vs 5.).

This PR is less comprehensive that the one you did earlier, as I only made the changes in places where I though this had a change to impact the run time performance of the library (e.g. excluding examples, and some tests where the array size is very small).

@@ -3978,7 +3978,7 @@ def dopatch(xs, ys, **kwargs):
# maybe draw the fliers
if showfliers:
# fliers coords
flier_x = np.ones(len(stats['fliers'])) * pos
flier_x = np.full(len(stats['fliers']), pos, dtype='float64')
Copy link
Contributor

@anntzer anntzer Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dtype=float is generally preferrable, unless you have a good reason to force float64 even on 32bit architectures (which at least we didn't do before).
same comment applies throughout.

Copy link
Member

@WeatherGod WeatherGod Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, no, dtype=float is not preferable as it is referencing python floats. dtype='float64' or dtype=np.float_ (if you don't want to specify the width) is generally considered preferable. Also note dtype=np.int_ and dtype=np.bool_ all exist for similar reasons.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but dtype=float will give you a proper np.float_ array... at this place, it's literally adding 4 characters for no change in behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You stated "is generally preferrable". Meanwhile: https://stackoverflow.com/questions/6205020/numpy-types-with-underscore-int-float-etc

Travis's Numpy book states reasons for why he created those dtypes. Yes, in many cases, you end up with the same thing, but they do exist for a reason.

Copy link
Contributor

@anntzer anntzer Oct 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eh... the link you quote literally says "Whenever a data type is required, as an argument, the standard Python types are recognized as well" and I don't think (?) there's a single sentence in this quote saying that the dtypes are preferable in this use-case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was, and still is, that the fact that python datatypes are accepted is merely a convenience, and is not declared as "generally preferable". I have never seen anyone before state that the python datatypes are "generally preferable", and instead, I have seen people state the opposite because "explicit is better than implicit". I think what the submitter has is perfectly fine. And if the submitter had it as dtype=float in the original submission, I probably wouldn't have said anything anyway, either. I am only opposing the assertion that your suggestion was "generally preferable".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am only opposing the assertion that your suggestion was "generally preferable".

That's fine with me, let's agree to not derail this PR and postpone this discussion to, well, not now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To address the initial comment,

>>> import numpy as np
>>> np.dtype(float)
dtype('float64')
>>> np.dtype(np.float64)
dtype('float64')
>>> np.dtype(np.float_)
dtype('float64')

so the actual dtype when providing dtype=float will be float64. That will be true for both 32 bit and 64 bit systems (checked in a 32 bit Debian docker container), unlike np.dtype(int) which can be int32 or int64 depending on the system.

I also think it's better to say float64 explicitly even if it's 3 char longer, because that's the actual numpy dtype we get, and it removes the need for readers to think about the sizeof float on 32/64 bit architecture and about possible implications of that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, np.float64 is a slightly better spelling than 'float64'

@@ -169,7 +169,8 @@ def __init__(self, hatch, density):
self.num_rows = (hatch.count('*')) * density
path = Path.unit_regular_star(5)
self.shape_vertices = path.vertices
self.shape_codes = np.ones(len(self.shape_vertices)) * Path.LINETO
self.shape_codes = np.full(len(self.shape_vertices), Path.LINETO,
dtype='float64')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should be some integer type, i think?

@@ -231,7 +231,7 @@ def create_artists(self, legend, orig_handle,
xdata, xdata_marker = self.get_xdata(legend, xdescent, ydescent,
width, height, fontsize)

ydata = ((height - ydescent) / 2.) * np.ones(xdata.shape, float)
ydata = np.full(xdata.shape, ((height - ydescent) / 2.), float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may as well drop the dot after "/2" while you're at it, the __future__(division) is now

@@ -316,7 +316,7 @@ def make_compound_path_from_polys(cls, XY):
stride = numsides + 1
nverts = numpolys * stride
verts = np.zeros((nverts, 2))
codes = np.ones(nverts, int) * cls.LINETO
codes = np.full(nverts, cls.LINETO, dtype=int)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should perhaps have been cls.code_type?

@anntzer
Copy link
Contributor

anntzer commented Oct 5, 2018

There's a bunch of FutureWarnings on the numpy==1.10 build (related to the change in dtype) which we probably don't want. You can make the dtype explicit in these calls, or just see whether this is gone in numpy 1.11 and bump the dependency. (If numpy 1.12 is needed, that's allowed next january.)

@rth
Copy link
Contributor Author

rth commented Oct 5, 2018

Thanks @anntzer! I know. Trying to track them down but so far unsuccessfully as the warning are not raised with stacklevel=2 so they don't say where in matplotlib they are actually raised. All the np.full calls from this PR have an explicit dtype.
Also one of the latest dtype changes made a test fail, and I need to find out which one.

I'll run the tests locally to investigate, and will fix it -- it may take a few days though.

@anntzer
Copy link
Contributor

anntzer commented Oct 5, 2018

the warning are not raised with stacklevel=2

would be nice (for the future) for numpy to do that :)

@rth
Copy link
Contributor Author

rth commented Jan 22, 2019

I rebased on the latest Python3 only master with a more recent numpy. CI seems happy now.

@@ -169,7 +169,8 @@ def __init__(self, hatch, density):
self.num_rows = (hatch.count('*')) * density
path = Path.unit_regular_star(5)
self.shape_vertices = path.vertices
self.shape_codes = np.ones(len(self.shape_vertices)) * Path.LINETO
self.shape_codes = np.full(len(self.shape_vertices), Path.LINETO,
dtype=np.int32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably should be Path.code_type.

@@ -231,7 +231,7 @@ def create_artists(self, legend, orig_handle,
xdata, xdata_marker = self.get_xdata(legend, xdescent, ydescent,
width, height, fontsize)

ydata = ((height - ydescent) / 2.) * np.ones(xdata.shape, float)
ydata = np.full(xdata.shape, ((height - ydescent) / 2), float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.full_like(xdata, (height - ydescent) / 2)

@@ -552,7 +552,8 @@ def interpolated(self, steps):
vertices = simple_linear_interpolation(self.vertices, steps)
codes = self.codes
if codes is not None:
new_codes = Path.LINETO * np.ones(((len(codes) - 1) * steps + 1, ))
new_codes = np.full(((len(codes) - 1) * steps + 1, ), Path.LINETO,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can pass the shape as scalar instead of 1-tuple

@@ -864,7 +865,7 @@ def arc(cls, theta1, theta2, n=None, is_wedge=False):
if is_wedge:
length = n * 3 + 4
vertices = np.zeros((length, 2), float)
codes = cls.CURVE4 * np.ones((length, ), cls.code_type)
codes = np.full((length, ), cls.CURVE4, dtype=cls.code_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can pass shape as scalar

@@ -873,7 +874,7 @@ def arc(cls, theta1, theta2, n=None, is_wedge=False):
else:
length = n * 3 + 1
vertices = np.empty((length, 2), float)
codes = cls.CURVE4 * np.ones((length, ), cls.code_type)
codes = np.full((length, ), cls.CURVE4, dtype=cls.code_type)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can pass shape as scalar

@@ -243,7 +243,7 @@ def _refine_triangulation_once(triangulation, ancestors=None):
np.arange(ntri, dtype=np.int32)]))
edge_apexes = np.ravel(np.vstack([np.zeros(ntri, dtype=np.int32),
np.ones(ntri, dtype=np.int32),
np.ones(ntri, dtype=np.int32)*2]))
np.full(ntri, 2, dtype=np.int32)]))
Copy link
Contributor

@anntzer anntzer Jan 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps replace the previous two lines to use np.full too? Looks nicer :)

Also I think the whole thing is just

np.concatenate([np.full(ntri, 0, np.uint32),
                np.full(ntri, 1, np.uint32),
                np.full(ntri, 2, np.uint32)])

right?

edge_apexes = np.ravel(np.vstack([np.zeros(ntri, dtype=np.int32),
np.ones(ntri, dtype=np.int32),
np.full(ntri, 2, dtype=np.int32)]))
edge_elems = np.tile(np.arange(3, dtype=np.int32), ntri)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like you got 3 and ntri swapped?

@rth
Copy link
Contributor Author

rth commented Jan 22, 2019

Thanks for the review @anntzer ! I think I addressed all your comments, waiting for CI now.

looks like you got 3 and ntri swapped?

Yes, thanks! using np.tile / np.repeat in #12379 (comment) should be simpler / faster..

@@ -231,7 +231,7 @@ def create_artists(self, legend, orig_handle,
xdata, xdata_marker = self.get_xdata(legend, xdescent, ydescent,
width, height, fontsize)

ydata = ((height - ydescent) / 2.) * np.ones(xdata.shape, float)
ydata = np.full_like(xdata, ((height - ydescent) / 2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specify dtype?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copying xdata's dtype (which full_like does) seems normal.

@dopplershift dopplershift merged commit 86d961c into matplotlib:master Jan 22, 2019
@rth rth deleted the use-np-full branch January 22, 2019 19:39
@QuLogic QuLogic added this to the v3.1 milestone Jan 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants