BUG: np.asarray return a copy with shared memory #24478
Comments
FWIW, in 1.23.5, That said, the examples suggest a little more than what the rest of the docstring claims. The docstring doesn't claim that the given object is returned, though this is often the case. It only claims that no copy (implicitly, "of the data") is performed if none is needed, given the other arguments. And that's what happens here. A new |
I think it's worth adjusting our examples, at least, to not suggest that |
I think it probably makes sense for us to try to make the pickle behavior more sensible, I would naively expect |
I bisected this to #21995. I suspect that pickle is going through the code path that calls |
Ah no, not quite right. Getting back a dtype object that is not the object dtype singleton when unpickling an object array is older behavior; #21995 just made it so that this older behavior leads to a new array object being created.
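A small sketch of how one might check the dtype identity described above. The array contents and variable names are illustrative assumptions, and the identity result is printed rather than asserted because it may differ between NumPy versions.

```python
import pickle
import numpy as np

# Illustrative object array (contents are arbitrary).
a = np.array(["x", None], dtype=object)

# Round-trip the array through pickle.
restored = pickle.loads(pickle.dumps(a))

# The dtypes compare equal either way.
print("dtypes equal:       ", restored.dtype == a.dtype)

# Whether the unpickled dtype *is* the built-in object dtype instance
# may depend on the NumPy version, so this is only printed.
print("dtype is np.dtype(O):", restored.dtype is np.dtype("O"))
```

Equality (`==`) is the meaningful comparison for dtypes here; the identity check only serves to inspect the behavior discussed in this comment.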
Hey, I'll be diving into this np.asarray pickle issue tomorrow. Looks like the crux might be with how np.dtype('O') is pickled/unpickled and its interaction with np.asarray. I'll start by isolating the problem in the code and running some tests to see what's going on under the hood. Any quick pointers before I jump in? Cheers,
You'll want to look at how dtype objects are pickled and unpickled. The dtype object (what numpy internally calls a descriptor) is defined in |
Thanks, I have been doing a bit of digging. When unpickling a NumPy array, is there a mechanism in place to ensure that, if the dtype of the array corresponds to one of the built-in singleton dtype objects, the singleton is reused instead of a new instance being created? I'm concerned about the potential memory and performance overhead of having redundant dtype objects in memory, as this is likely where the problem comes from. If not, here is an idea that may work. Object signature matching: this would involve comparing the signature of the unpickled dtype with the built-in singleton dtypes and reusing the singleton if a match is found. It would be great to get some wider opinions on the best strategy before I try anything. Best,
Honestly, I think the best strategy is to fix the examples in the docstring. No one should be using |
I agree about the |
Would a note like "*Note: The use of the |
It's more informative and less verbose to change the examples themselves to show the right thing. The right way to do these checks is not more complicated.
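As an illustration of the kind of check meant here, a minimal sketch that asks whether the data was copied instead of comparing object identity. The array and variable names are assumptions made for the example, not the wording of the actual docstring.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

same_dtype = np.asarray(a, dtype=np.int64)     # no conversion needed
other_dtype = np.asarray(a, dtype=np.float64)  # conversion forces a copy

# Ask about the data, not about object identity.
print(np.shares_memory(same_dtype, a))   # True: no data was copied
print(np.shares_memory(other_dtype, a))  # False: the data was copied

# A write through the non-copy is visible in the original.
same_dtype[0] = 99
print(a[0])  # 99
```

`np.shares_memory` answers the question the docstring actually addresses (was the data copied?), regardless of whether a new array object was created.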
I'd like to point out that the data-api standard specifies a |
No, but that's not relevant to this reported issue.
Based on our discussion here, I've made changes to the asarray docstring and submitted a PR addressing this issue: #24714. Any feedback or further suggestions would be appreciated! Best,
Describe the issue:
According to the `np.asarray(a)` docs, it may return the original array or a copy. In the case below, while `np.asarray(a, dtype='object') is not a`, `np.shares_memory(np.asarray(a, dtype='object'), a)` is true, even though `a` appears to have been copied.
Reproduce the code example:
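A minimal sketch of a reproduction along these lines, assuming (based on the discussion in the comments) that the array was round-tripped through pickle. The array contents and names are illustrative and are not the reporter's original snippet; the identity result is printed without an expected value because it may vary by NumPy version.

```python
import pickle
import numpy as np

# Hypothetical stand-in for the reporter's array: an object array
# that has been round-tripped through pickle.
a = pickle.loads(pickle.dumps(np.array(["x", "y"], dtype=object)))

b = np.asarray(a, dtype="object")

print("b is a:       ", b is a)                  # version-dependent
print("shares memory:", np.shares_memory(b, a))  # the data is not copied

# Because the buffer is shared, a write through b shows up in a.
b[0] = "changed"
print(a[0])
```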
Error message:
No response
Runtime information:
1.25.2
3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:58:44) [GCC 11.3.0]
[{'numpy_version': '1.25.2',
'python': '3.10.11 | packaged by conda-forge | (main, May 10 2023, 18:58:44) '
'[GCC 11.3.0]',
'uname': uname_result(system='Linux', node='itay-jether', release='6.2.0-26-generic', version='#26~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Jul 13 16:27:29 UTC 2', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': '/home/itay/miniforge3/envs/jpy310/lib/python3.10/site-packages/numpy.libs/libopenblas64_p-r0-5007b62f.3.23.dev.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23.dev'},
{'architecture': 'SkylakeX',
'filepath': '/home/itay/miniforge3/envs/jpy310/lib/python3.10/site-packages/scipy.libs/libopenblasp-r0-41284840.3.18.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.18'},
{'architecture': 'SkylakeX',
'filepath': '/home/itay/miniforge3/envs/jpy310/lib/python3.10/site-packages/cvxopt.libs/libopenblasp-r0-5c2b7639.3.23.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23'},
{'architecture': 'Prescott',
'filepath': '/home/itay/miniforge3/envs/jpy310/lib/python3.10/site-packages/scs.libs/libopenblas-r0-f650aae0.3.3.so',
'internal_api': 'openblas',
'num_threads': 1,
'prefix': 'libopenblas',
'threading_layer': 'disabled',
'user_api': 'blas',
'version': None},
{'filepath': '/home/itay/miniforge3/envs/jpy310/lib/python3.10/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0',
'internal_api': 'openmp',
'num_threads': 8,
'prefix': 'libgomp',
'user_api': 'openmp',
'version': None}]
Context for the issue:
I'm using pandas and have pickled DataFrames. When using `df.astype(str)`, the DataFrame is changed in place (see the issue in pandas).