MAINT: Refactor of `numpy/core/_type_aliases.py` #24651

mtsokol · 2023-09-06T10:33:19Z

I'm starting here a new PR for reviewing as the discussion in the original one concluded.

Here I'm sharing a proposition of numpy/core/_type_aliases.py file refactor.
While working on removing other scalar aliases I had some difficulties understanding how allTypes/sctypeDict dictionaries are being created. For example, I noticed that the float alias is originally present in sctypeDict, then it's removed and then added again.

Also, as NEP 52 suggest, float96 and float128 could be removed (extended precision) and it would also cover modifying _type_aliases.py. This refactor makes it easier to introduce new changes to the available aliases list.

I decided to try to refactor this file to decompose the creation of these dictionaries into well-defined stages.
I think the purpose of this file is straightforward: Read types from the compiled multiarray module, and expose them as Python dictionaries, while mapping C naming conventions to Python conventions (e.g. interpret float as double instead of single).

I identified which "group" of entries is present in each dictionary (could it be modified for NumPy 2.0?):

	allTypes	sctypeDict
`words` (e.g. "float", "cdouble", "int_")	✅	✅
`words+bits` (e.g. "int16", "complex64")	✅	✅
`symbols` (e.g. "p", "L")	❌	✅
`symbols+bytes` (e.g. "c8", "b1")	❌	✅
`abstracts` (e.g. "inexact", "integer")	✅	❌
`numbers` (e.g. 1, 2, 3)	❌	✅
`aliases` (e.g. "complex_", "longfloat")	✅	✅
`extra/other aliases` (e.g. "bool")	❌	✅
...and there are just a few custom cases for both

Additionally, here's a decomposition of words and words+bits into three columns:

Canonical Python API name	Python API "C-like" name	Actual C type
bool
int8	byte	char
uint8	ubyte	unsigned char
int16	short	short
uint16	ushort	unsigned short
int32	intc	int
uint32	uintc	unsigned int
intp		intptr_t
uintp		uintptr_t
int32 / int64	long	long
uint32 / uint64	ulong	unsigned long
	longlong	longlong
	ulonglong	unsigned longlong
float16	half
float32	single	float
float64	double	double
float128	longdouble	long double
complex64	csingle	complex float
complex128	cdouble	complex double
complex256	clongdouble	complex long double
object_
bytes_
str_
void		void
datetime64
timedelta64

In the refactored file I build each of these groups to then join them into final dicts. I do it in a loop in def _build_dicts(), where sized int & uint aliases must take under consideration a "priority" of types - in case at least two of them are same bitsize.
After that I rename and add aliases to the created groups. In the last stage I clean entries that aren't present in the original implementation (these are custom cases).

The last, and separate, stage creates sctypes which I copied, as the implementation was clear to me.

Important to mention, this modifies a crucial piece of code that could be tricky to debug (e.g. canonical int names are aliased incorrectly on some architecture (I tested it locally only on macbook with Intel)).
I renamed the old implementation to _expired_type_aliases.py and added a test test_new_vs_expired_type_aliases that checks if dictionaries created by the new implementation are exactly the same as done by the previous code. This should flag in the CI if the new implementation fails to mimic the previous one on a specific architecture. (After some time, the original impl could be completely removed)

Please share your feedback!

seberg · 2023-09-07T13:26:36Z

We talked a bit about this and I tried to remove some C-side stuff and change the exposure a bit via: https://github.com/seberg/numpy/tree/typeinfo-remove (it could probably be simplified more, but at least 300 lines of weird C-code gone 🎉). That also removes the redefinition of the "integer priority" list in Python.

I am a bit unsure about float128. On the one hand, I hate it and want everyone to use longdouble. OTOH, I think it may be used (unnecessarily) quite a lot.
So, I am wondering if we can replace float128/float96 (when they exist) with a FutureWarning("This is a terrible alias for longdouble and may go away or be redefined to proper 128bit precision (which this does not have normally). We may also decide to remove longdouble support from NumPy at some future point.")?

mtsokol · 2023-09-09T17:00:50Z

We talked a bit about this and I tried to remove some C-side stuff and change the exposure a bit via: https://github.com/seberg/numpy/tree/typeinfo-remove (it could probably be simplified more, but at least 300 lines of weird C-code gone 🎉). That also removes the redefinition of the "integer priority" list in Python.

@seberg Thank you! I built typeinfo-remove branch locally and one thing that I noticed is that the typeinfo after refactor returns np.dtypes.Int64DType or numpy.dtypes.BoolDType as dict values instead of np.int64 and np.bool_ (it was type attribute in typeinfo). Should I transform them?

seberg · 2023-09-09T19:27:49Z

It should return the dtype, you are fetching the type(dtype) from it. Either way should work in practice via .type although no need for type(dtype) here. You can change it to return the scalar type also.

eric-wieser · 2023-09-22T19:11:03Z

In that table, I'd argue intp and uintp should be intptr_t and uintptr_t, not long and unsigned long. As written that table is wrong on windows.

eric-wieser · 2023-09-23T12:55:33Z

In fact, the table is also wrong on Linux where int64 is the C long type, not the C long long type. You can see that at https://numpy.org/doc/stable/reference/arrays.scalars.html#numpy.int_, which says that on 64-bit linux, int_ is the implementation of int64.

mtsokol · 2023-09-25T13:52:42Z

@eric-wieser thank you for your feedback!
I updated the table - would you have a spare minute to say if it's more accurate?

ngoldbaum · 2023-10-05T14:46:24Z

@mtsokol can you clear the merge conflict?

I believe all comments have been addressed so once this is mergeable I'd like to get this merged this week along with #24867. Please comment if you still have concerns.

mtsokol · 2023-10-05T14:50:46Z

@mtsokol can you clear the merge conflict?

I believe all comments have been addressed so once this is mergeable I'd like to get this merged this week along with #24867. Please comment if you still have concerns.

Sorry for confusion!
This was the old one - I created new PR that has Sebastian's changes: #24679

mtsokol added 3 commits September 6, 2023 12:23

MAINT: Refactor core/_type_aliases.py

614d969

MAINT: Add (c)longdouble guard

a510cf2

Update _type_aliases.py

17e6b8e

mtsokol self-assigned this Sep 6, 2023

github-actions bot added the 03 - Maintenance label Sep 6, 2023

mtsokol added the Numpy 2.0 API Changes label Sep 6, 2023

mtsokol mentioned this pull request Sep 6, 2023

MAINT: Refactor of core/_type_aliases.py #24411

Closed

rgommers mentioned this pull request Sep 10, 2023

DOC: Fix reference warning in some rst and code files. #24487

Merged

mtsokol mentioned this pull request Sep 10, 2023

MAINT: Refactor of numpy/core/_type_aliases.py #24679

Merged

rgommers mentioned this pull request Sep 18, 2023

Discussion: uint / int_ / float_ / complex_ in NEP 52 #24743

Closed

mtsokol mentioned this pull request Oct 5, 2023

DOC: Update types table #24867

Merged

mtsokol changed the title ~~MAINT: Refactor of numpy/core/_type_aliases.py~~ ~~MAINT: Refactor of numpy/core/_type_aliases.py~~ Oct 5, 2023

mtsokol changed the title ~~MAINT: Refactor of numpy/core/_type_aliases.py~~ MAINT: Refactor of numpy/core/_type_aliases.py Oct 5, 2023

mtsokol closed this Oct 5, 2023

mtsokol deleted the type-aliases-refactor-new-branch branch October 5, 2023 14:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

MAINT: Refactor of `numpy/core/_type_aliases.py` #24651

MAINT: Refactor of `numpy/core/_type_aliases.py` #24651

mtsokol commented Sep 6, 2023 •

edited

Loading

seberg commented Sep 7, 2023

mtsokol commented Sep 9, 2023

seberg commented Sep 9, 2023

eric-wieser commented Sep 22, 2023

eric-wieser commented Sep 23, 2023 •

edited

Loading

mtsokol commented Sep 25, 2023

ngoldbaum commented Oct 5, 2023

mtsokol commented Oct 5, 2023

MAINT: Refactor of numpy/core/_type_aliases.py #24651

MAINT: Refactor of numpy/core/_type_aliases.py #24651

Conversation

mtsokol commented Sep 6, 2023 • edited Loading

seberg commented Sep 7, 2023

mtsokol commented Sep 9, 2023

seberg commented Sep 9, 2023

eric-wieser commented Sep 22, 2023

eric-wieser commented Sep 23, 2023 • edited Loading

mtsokol commented Sep 25, 2023

ngoldbaum commented Oct 5, 2023

mtsokol commented Oct 5, 2023

MAINT: Refactor of `numpy/core/_type_aliases.py` #24651

MAINT: Refactor of `numpy/core/_type_aliases.py` #24651

mtsokol commented Sep 6, 2023 •

edited

Loading

eric-wieser commented Sep 23, 2023 •

edited

Loading