Skip to content
This repository was archived by the owner on Jun 10, 2020. It is now read-only.

Add NumPy scalar hierarchy #14

Merged
merged 14 commits into from
Apr 10, 2018
Merged

Add NumPy scalar hierarchy #14

merged 14 commits into from
Apr 10, 2018

Conversation

alanhdu
Copy link
Contributor

@alanhdu alanhdu commented Mar 20, 2018

No description provided.

_Scalar = TypeVar("_Scalar", bound="generic")

class generic:
@property
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we define a private base class for all the shared methods/properties between generic and ndarray? Otherwise we'll have quite a bit to keep in sync.

class int16(signedinteger): ...
class int32(signedinteger): ...
class int64(signedinteger): ...
int_ = int64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regrettably, this is actually platform dependent, e.g., on Windows, it's int32: numpy/numpy#9464

class number(generic): ...
class bool_(generic): ...
class object_(generic): ...
class datetime64(generic): ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget timedelta64. Unfortunately it inherits from np.integer, though this is likely to change in the future: numpy/numpy#10685

@@ -358,6 +358,187 @@ class ndarray(Iterable, Sized, SupportsInt, SupportsFloat, SupportsComplex,
def __getattr__(self, name) -> Any: ...


_Scalar = TypeVar("_Scalar", bound="generic")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can actually define circular references in .pyi files, like I did for _DtypeLike above.

Copy link
Member

@eric-wieser eric-wieser Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach in this patch is the approach recommended by PEP484

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, in that case we should probably fix _DtypeLike :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, this is the recommended approach for typing self

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting that bound="generic" could be written as bound=generic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize. I suppose the PEP self-typing example is not within a stub file, so perhaps my point is irrelevant.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In typeshed we prefer not doing any quoting of this sort.


class generic:
@property
def real(self: _Scalar) -> _Scalar: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

real and imag are not guaranteed to return the same scalar type -- they convert complex numbers to reals.

class int8(signedinteger): ...
class int16(signedinteger): ...
class int32(signedinteger): ...
class int64(signedinteger): ...
Copy link
Member

@eric-wieser eric-wieser Mar 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sadly this doesn't actually reflect the underlying model - in reality, the types are byte, short, intc, int_, and longlong, and the sized aliases are attached in a platform-dependent manner.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it really matter whether we define aliases or the real classes here? I would actually be OK with doing only these aliases. For annotations (and clearly written code in general), it's best to avoid platform dependent types like int.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well since there are actually 5 underlying types, sometimes numpy returns a type that is not referred to by any of these four aliases (which is made worse by numpy/numpy#10151 making them look the same).

I think what you have there is an argument about removing the unsized types from numpy - which I think is valid. But I think the type annotations should reflect what actually happens, not how we want things to happen.

class float16(floating): ...
class float32(floating): ...
class float64(floating): ...
class float128(floating): ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly the actual types here are half, single, double, longdouble, and similarly for complex.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which is relevant because float128 is sometimes called float96, but both are always aliases for longdouble

@alanhdu
Copy link
Contributor Author

alanhdu commented Mar 20, 2018

I've factored out an _ArrayLike class (although I need to play around with the types some more to see how to get it to typecheck the way I want).

I've also moved the platform-specific types together. AFAICT though, on the Python side the platform-specific types are aliases for the sized types (e.g. for me, np.short.__name__ == 'int16'), so the problem reduces to how to decide which sized type the unsized types correspond to (mypy supports sys.platform and maybe also sys.maxsize?). Or is that not how it actually works?

@eric-wieser
Copy link
Member

eric-wieser commented Mar 20, 2018

on the Python side the platform-specific types are aliases for the sized types

The __name__ attributes are misleading and currently non-unique (numpy/numpy#10151)

>>> np.int_.__name__  # on windows, this gives int32, so use `intc` below
'int64'
>>> np.longlong.__name__
'int64'
>>> np.int_ is np.longlong  # C long and C longlong are different types
False

There really is a unique type for each of the underlying C types, but the python-visible names are the sized name for that type. Does mypy use the stubs to print the type name in a message, or the actual __name__ attribute?

@alanhdu
Copy link
Contributor Author

alanhdu commented Mar 21, 2018

I believe that MyPy uses the actual type name (from a small test case):

Argument 1 to "f" has incompatible type "int64"; expected "longlong"

Is the "right" thing to do here is to just to have each of the platform specific types be their own distinct type? How important is it to get all the sized aliases matched up with their underlying components and how hard is that to do?

@JelleZijlstra
Copy link

Mypy definitely doesn't look at the runtime __name__ attribute; all it knows about numpy types necessarily comes from the stubs.

@eric-wieser
Copy link
Member

As part of setup.py, we could import numpy and generate the alias stubs based on the actual alias values

@alanhdu
Copy link
Contributor Author

alanhdu commented Mar 25, 2018

I really like generating the alias stubs when the user installs them! I've opened up a follow-up issue (#15) to discuss that idea further.

For now, I've fixed the type errors and decided to leave the platform-specific types for another PR (which by default leaves them as Any types). Do you mind taking another look @shoyer @eric-wieser?

(Also sorry for the super long iteration cycles -- I unfortunately don't have a ton of time to really zero-in on this so it usually takes a couple days for me to circle back to stubbing out numpy).

real: ndarray

_ArraySelf = TypeVar("_ArraySelf", bound=_ArrayLike)
class _ArrayLike(SupportsInt, SupportsFloat, SupportsComplex, SupportsBytes,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a good name - in the docs, array_like means can be converted to array.

Perhaps _ArrayOrScalarCommon?

class generic(_ArrayLike):
@property
def base(self) -> None: ...
class _is_real(generic):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing newline above _is_real, I think

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_real_generic would be a better name


class flexible(_is_real): ...
class void(flexible): ...
class character(_is_real): ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, but it's also bizarre:

>>> np.bytes_(b'test').real
'test'
>>> np.bytes_(b'test').imag
''

What were we thinking here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I had to guess, I think imag by default returns the "0" version of the data type -- while this makes sense for numbers, it has bizarre consequences for things that aren't numbers:

In [4]: np.datetime64(dt.datetime.now()).real
Out[4]: numpy.datetime64('2018-03-26T15:22:14.672804')

In [5]: np.datetime64(dt.datetime.now()).imag
Out[5]: numpy.datetime64('1970-01-01T00:00:00.000000')

or

In [13]: np.object_(["hello world"]).real
Out[13]: array(['hello world'], dtype=object)

In [14]: np.object_(["hello world"]).imag
Out[14]: array([0], dtype=object)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue opened at numpy/numpy#10818

alanhdu added 2 commits March 26, 2018 15:19
array_like means "can be converted to array"
@shoyer
Copy link
Member

shoyer commented Mar 29, 2018

👍 this looks great to me!

Two things that would be nice to have:

  • Update the return value of dtype.type to Type[generic].
  • If you can think of any, add a few additional smoke tests to our test file (e.g., verify that you can construct a numpy scalar and check its properties without an error from mypy)

@eric-wieser
Copy link
Member

eric-wieser commented Mar 29, 2018

Update the return value of dtype.type to Type[generic].

What about object?

Edit: nevermind

@alanhdu
Copy link
Contributor Author

alanhdu commented Mar 29, 2018

@shoyer @eric-wieser I went a little overboard with the test suite and ended up writing a small test framework (I wanted to assert that certain things failed and that led me down a bit of a rabbit hole).

I put the documentation in the testing readme, the important bit is that:

There are three main directories of tests right now:

There are three main directories of tests right now:

  • pass/ which contain Python files that must pass mypy checking with
    no type errors
  • fail/ which contain Python files that must fail mypy checking
    with the annotated errors
  • reveal/ which contain Python files that must output the correct
    types with reveal_type

fail and reveal are annotated with comments that specify what error
mypy threw and what type should be revealed respectively. The format
looks like:

bad_function   # E: <error message>
reveal_type(x)   # E: <type name>

Right now, the error messages and types are must be contained within
corresponding mypy message
.

Running the tests

We use py.test to orchestrate our tests. You can just run:

py.test

to run the entire test suite. To run mypy on a specific file (which
can be useful for debugging), you can also run:

$ cd tests
$ MYPYPATH=.. mypy <file_path>

Is that ok? If not, I'm happy to move the test suite stuff to another PR for a longer discussion (I'm not totally sold on the # E: annotations and I'm not super happy with having to parse mypy's string output).

@shoyer
Copy link
Member

shoyer commented Mar 29, 2018

The test suite bit looks pretty sweet to me! Thanks @alanhdu

@alanhdu
Copy link
Contributor Author

alanhdu commented Apr 3, 2018

@shoyer @eric-wieser Do you mind taking another look at this? Hopefully it's close 😆.

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor suggestions

assert lineno in errors, f'Extra error "{marker}"'
assert marker in errors[lineno]
else:
assert "# E:" in target_line, f'Error "{errors[lineno]}" not found'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of the redundant assert, can you just use pytest.fail()?

assert lineno in errors, f'Extra error "{marker}"'
assert marker in errors[lineno]
else:
assert "# E:" in target_line, f'Error "{errors[lineno]}" not found'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To guarantee quotes are escaped properly, let's cast with repr() inside the f-string, e.g., f'Error {repr(errors[lineno])}" not found'.

@shoyer
Copy link
Member

shoyer commented Apr 5, 2018

OK, looks good to me.

@eric-wieser any more thoughts before I merge?

@shoyer shoyer merged commit f0cb9ff into numpy:master Apr 10, 2018
@shoyer
Copy link
Member

shoyer commented Apr 10, 2018

OK, merged. Thanks @alanhdu

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants