-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
MAINT: Refactor of core/_type_aliases.py
#24411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
91d92ec
to
1a495ab
Compare
That's a very useful analysis and clear table, thanks @mtsokol. I think what may be useful for added insight here is explicitly listing all the names for the first two rows. I'd name those "canonical names for C types" and "canonical names in Python". There will be some adjustments here already, e.g. the actual C name
|
Sure! I included such table and I filled it with values coming from
|
Sorry, maybe I should have suggested 3 columns. What I meant was the Python "C-like" and "Pythonic" names. The actual type in C could be useful too. So for this one for example:
in Python code we'd prefer if people wrote The
|
I updated the 2nd table with three columns ("Canonical Python", "C-like python", "compatible C type") - I wanted to clarify some rows.
As
Sure! Moved to separate rows. |
I think I'd write it like this:
So for C types that vary in size are exposed in the Python API, we don't have a canonical sized alias, and there should be one preferred name only. So C int becomes |
I updated the table so there are no multiple options depending on architecture (thus |
Getting there! Still some more to do for integers I think. What I'd like to get to is for each row to be unique. C
|
My previous update accidentally didn't get saved - I updated the table one more time. It should finally have unique values in each cell. |
This looks quite good to me now, thanks Mateusz. Last thoughts for now:
|
1a495ab
to
109b1aa
Compare
6eae269
to
bdcb9ac
Compare
Closing as #24651 has been opened to hide outdated comments. |
Hi @rgommers @seberg @ngoldbaum,
Here I'm sharing a proposition of
numpy/core/_type_aliases.py
file refactor.While working on removing other scalar aliases I had some difficulties understanding how
allTypes/sctypeDict
dictionaries are being created. For example, I noticed that thefloat
alias is originally present insctypeDict
, then it's removed and then added again.I decided to try to refactor this file to decompose the creation of these dictionaries into well-defined stages.
I think the purpose of this file is straightforward: Read types from the compiled
multiarray
module, and expose them as Python dictionaries, while mapping C naming conventions to Python conventions (e.g. interpretfloat
asdouble
instead ofsingle
).I identified which "group" of entries is present in each dictionary (could it be modified for NumPy 2.0?):
words
(e.g. "float", "cdouble", "int_")words+bits
(e.g. "int16", "complex64")symbols
(e.g. "p", "L")symbols+bytes
(e.g. "c8", "b1")abstracts
(e.g. "inexact", "integer")numbers
(e.g. 1, 2, 3)aliases
(e.g. "complex_", "longfloat")extra/other aliases
(e.g. "bool")Additionally, here's a decomposition of
words
andwords+bits
into three columns (/
symbol means that it may vary depending on 32 or 64-bit arch):In the refactored file I build each of these groups to then join them into final dicts. I do it in a loop in
def _build_dicts()
, where sizedint
&uint
aliases must take under consideration a "priority" of types - in case at least two of them are same bitsize.After that I rename and add aliases to the created groups. In the last stage I clean entries that aren't present in the original implementation (these are custom cases).
The last, and separate, stage creates
sctypes
which I copied, as the implementation was clear to me.Important to mention, this modifies a crucial piece of code that could be tricky to debug (e.g. canonical int names are aliased incorrectly on some architecture (I tested it locally only on macbook with Intel)).
I renamed the old implementation to
_expired_type_aliases.py
and added a testtest_new_vs_expired_type_aliases
that checks if dictionaries created by the new implementation are exactly the same as done by the previous code. This should flag in the CI if the new implementation fails to mimic the previous one on a specific architecture. (After some time, the original impl could be completely removed)Please share your feedback!