MAINT: Refactor of `core/_type_aliases.py` #24411

mtsokol · 2023-08-13T21:35:22Z

Here I'm sharing a proposition of numpy/core/_type_aliases.py file refactor.
While working on removing other scalar aliases I had some difficulties understanding how allTypes/sctypeDict dictionaries are being created. For example, I noticed that the float alias is originally present in sctypeDict, then it's removed and then added again.

I decided to try to refactor this file to decompose the creation of these dictionaries into well-defined stages.
I think the purpose of this file is straightforward: Read types from the compiled multiarray module, and expose them as Python dictionaries, while mapping C naming conventions to Python conventions (e.g. interpret float as double instead of single).

I identified which "group" of entries is present in each dictionary (could it be modified for NumPy 2.0?):

	allTypes	sctypeDict
`words` (e.g. "float", "cdouble", "int_")	✅	✅
`words+bits` (e.g. "int16", "complex64")	✅	✅
`symbols` (e.g. "p", "L")	❌	✅
`symbols+bytes` (e.g. "c8", "b1")	❌	✅
`abstracts` (e.g. "inexact", "integer")	✅	❌
`numbers` (e.g. 1, 2, 3)	❌	✅
`aliases` (e.g. "complex_", "longfloat")	✅	✅
`extra/other aliases` (e.g. "bool")	❌	✅
...and there are just a few custom cases for both

Additionally, here's a decomposition of words and words+bits into three columns (/ symbol means that it may vary depending on 32 or 64-bit arch):

Canonical Python API name	Python API "C-like" name	Actual C type
bool
int8	byte	char
uint8	ubyte	unsigned char
int16	short	short
uint16	ushort	unsigned short
int32	intc	int
uint32	uintc	unsigned int
intp		long
uintp		unsigned long
int64	longlong	longlong
uint64	ulonglong	unsigned longlong
float16	half
float32	single	float
float64	double	double
float128	longdouble	long double
complex64	csingle	complex float
complex128	cdouble	complex double
complex256	clongdouble	complex long double
object_
bytes_
str_
void		void
datetime64
timedelta64

In the refactored file I build each of these groups to then join them into final dicts. I do it in a loop in def _build_dicts(), where sized int & uint aliases must take under consideration a "priority" of types - in case at least two of them are same bitsize.
After that I rename and add aliases to the created groups. In the last stage I clean entries that aren't present in the original implementation (these are custom cases).

The last, and separate, stage creates sctypes which I copied, as the implementation was clear to me.

Important to mention, this modifies a crucial piece of code that could be tricky to debug (e.g. canonical int names are aliased incorrectly on some architecture (I tested it locally only on macbook with Intel)).
I renamed the old implementation to _expired_type_aliases.py and added a test test_new_vs_expired_type_aliases that checks if dictionaries created by the new implementation are exactly the same as done by the previous code. This should flag in the CI if the new implementation fails to mimic the previous one on a specific architecture. (After some time, the original impl could be completely removed)

Please share your feedback!

rgommers · 2023-08-16T13:10:52Z

That's a very useful analysis and clear table, thanks @mtsokol. I think what may be useful for added insight here is explicitly listing all the names for the first two rows. I'd name those "canonical names for C types" and "canonical names in Python". There will be some adjustments here already, e.g. the actual C name float is what I'd expect us to call np.single. Something like (add complex types, str/object/intp/etc. too):

Canonical C name	Canonical Python API name
half	float16
single	float32
double	float64
longdouble	n/a
byte	int8
...
ubyte	uint8
...
n/a	bool

allTypes doesn't seem to be public, so I'm not sure we should focus on it in the first instance.

mtsokol · 2023-08-16T16:45:07Z

Sure! I included such table and I filled it with values coming from sctypeDict. Both columns are present in the dictionary (so all names can be used in python code) but they should clarify relationships between specific names.

allTypes is here to help understand how proposed implementation works (the refactored file builds allTypes and sctypeDict at the same time, as original version).

rgommers · 2023-08-16T18:30:18Z

Sorry, maybe I should have suggested 3 columns. What I meant was the Python "C-like" and "Pythonic" names. The actual type in C could be useful too. So for this one for example:

int (only in C) | int32 / intc
-- | --

in Python code we'd prefer if people wrote np.int32. And the "C-like" name we also want to keep is intc. And any other aliases are on the list for removal.

The long/int64 one doesn't look quite correct. It's longlong in Python I believe as the "C-like" alias.

intp/uintp should be separate entries probably.

mtsokol · 2023-08-17T10:12:17Z

I updated the 2nd table with three columns ("Canonical Python", "C-like python", "compatible C type") - I wanted to clarify some rows.

The long/int64 one doesn't look quite correct. It's longlong in Python I believe as the "C-like" alias.

As int / long / longlong Python counterparts may vary depending on architecture, I checked mappings on my machine (Darwin x86_64) and Emscripten wasm32, and added all combinations that I found with / in these rows. WDYT?

intp/uintp should be separate entries probably.

Sure! Moved to separate rows.

numpy/core/_type_aliases.py

rgommers · 2023-08-18T08:01:05Z

As int / long / longlong Python counterparts may vary depending on architecture, I checked mappings on my machine (Darwin x86_64) and Emscripten wasm32, and added all combinations that I found with / in these rows. WDYT?

I think I'd write it like this:

Canonical Python API name	Python API "C-like" name	Actual C type
-	intc	int
-	uintc	unsigned int
intp	?	long

So for C types that vary in size are exposed in the Python API, we don't have a canonical sized alias, and there should be one preferred name only. So C int becomes intc (or int_, but I think intc would be preferred), and C long becomes intp. I admit I don't know why we don't have a np.long or np.long_.

mtsokol · 2023-08-18T10:34:00Z

I think I'd write it like this:
Canonical Python API name Python API "C-like" name Actual C type
intc int
uintc unsigned int
intp ? long

So for C types that vary in size are exposed in the Python API, we don't have a canonical sized alias, and there should be one preferred name only. So C int becomes intc (or int_, but I think intc would be preferred), and C long becomes intp. I admit I don't know why we don't have a np.long or np.long_.

I updated the table so there are no multiple options depending on architecture (thus int_ is not in the table).

rgommers · 2023-08-18T13:40:18Z

Getting there! Still some more to do for integers I think. What I'd like to get to is for each row to be unique. C int / long / longlong are separate types, so they should never be on the same row. I think the situation is:

Python name int32 corresponding to C-like name intc and C type int (well, the latter is technically not true - https://stackoverflow.com/questions/14515874/difference-between-int32-int-int32-t-int8-and-int8-t - but in a practical sense it always is true I believe)
same for int64 and longlong, C type long long
there is no fixed-size equivalent for long; that is intp

mtsokol · 2023-08-18T14:14:26Z

Getting there! Still some more to do for integers I think. What I'd like to get to is for each row to be unique. C int / long / longlong are separate types, so they should never be on the same row. I think the situation is: ...

My previous update accidentally didn't get saved - I updated the table one more time. It should finally have unique values in each cell.

rgommers · 2023-08-18T14:24:09Z

This looks quite good to me now, thanks Mateusz. Last thoughts for now:

we need to add bool as the canonical Python name, and keep bool_ as an alias because np.bool does not exist in 1.25.x/1.26.x
the C types for complex types can be added: complex float, complex double, and complex long double,

mtsokol · 2023-09-06T10:35:12Z

Closing as #24651 has been opened to hide outdated comments.

github-actions bot added the 03 - Maintenance label Aug 13, 2023

mtsokol force-pushed the type-aliases-refactor branch 2 times, most recently from 91d92ec to 1a495ab Compare August 14, 2023 16:36

mtsokol mentioned this pull request Aug 16, 2023

API: Cleaning numpy/__init__.py and main namespace - Part 3 [NEP 52] #24376

Merged

mtsokol mentioned this pull request Aug 17, 2023

ENH: Reflect changes from numpy namespace refactor Part 3 jax-ml/jax#17155

Closed

eendebakpt reviewed Aug 17, 2023

View reviewed changes

numpy/core/_type_aliases.py Outdated Show resolved Hide resolved

numpy/core/_type_aliases.py Outdated Show resolved Hide resolved

numpy/core/_type_aliases.py Outdated Show resolved Hide resolved

numpy/core/_type_aliases.py Outdated Show resolved Hide resolved

ngoldbaum added the Numpy 2.0 API Changes label Aug 29, 2023

mtsokol added 2 commits August 29, 2023 16:57

MAINT: Refactor core/_type_aliases.py

7ad889c

MAINT: Add (c)longdouble guard

109b1aa

mtsokol force-pushed the type-aliases-refactor branch from 1a495ab to 109b1aa Compare August 29, 2023 14:59

mtsokol self-assigned this Aug 29, 2023

Update _type_aliases.py

bdcb9ac

mtsokol force-pushed the type-aliases-refactor branch from 6eae269 to bdcb9ac Compare August 29, 2023 16:06

mtsokol closed this Sep 6, 2023

mtsokol deleted the type-aliases-refactor branch April 3, 2024 09:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

MAINT: Refactor of `core/_type_aliases.py` #24411

MAINT: Refactor of `core/_type_aliases.py` #24411

Uh oh!

mtsokol commented Aug 13, 2023 •

edited

Loading

Uh oh!

rgommers commented Aug 16, 2023

Uh oh!

mtsokol commented Aug 16, 2023 •

edited

Loading

Uh oh!

rgommers commented Aug 16, 2023

Uh oh!

mtsokol commented Aug 17, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rgommers commented Aug 18, 2023

Uh oh!

mtsokol commented Aug 18, 2023 •

edited

Loading

Uh oh!

rgommers commented Aug 18, 2023

Uh oh!

mtsokol commented Aug 18, 2023

Uh oh!

rgommers commented Aug 18, 2023

Uh oh!

mtsokol commented Sep 6, 2023

Uh oh!

Uh oh!

Uh oh!

MAINT: Refactor of core/_type_aliases.py #24411

MAINT: Refactor of core/_type_aliases.py #24411

Uh oh!

Conversation

mtsokol commented Aug 13, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rgommers commented Aug 16, 2023

Uh oh!

mtsokol commented Aug 16, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rgommers commented Aug 16, 2023

Uh oh!

mtsokol commented Aug 17, 2023

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rgommers commented Aug 18, 2023

Uh oh!

mtsokol commented Aug 18, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rgommers commented Aug 18, 2023

Uh oh!

mtsokol commented Aug 18, 2023

Uh oh!

rgommers commented Aug 18, 2023

Uh oh!

mtsokol commented Sep 6, 2023

Uh oh!

Uh oh!

MAINT: Refactor of `core/_type_aliases.py` #24411

MAINT: Refactor of `core/_type_aliases.py` #24411

mtsokol commented Aug 13, 2023 •

edited

Loading

mtsokol commented Aug 16, 2023 •

edited

Loading

mtsokol commented Aug 18, 2023 •

edited

Loading