Closed
Describe the issue:
Cross-posted from scikit-learn's repository:
from sklearn.neighbors import KDTree
import time
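def mwe(n):
    start = time.time()
    # `a` is a ragged object array holding one small index array per query point
    a = KDTree([[1]]).query_radius([[1]]*int(n), 1)
    print(f'Function completed in: {time.time()-start:.2f} seconds')
    # `a` is deallocated when the function returns, after the print above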
start = time.time()
mwe(1e6)
print(f'Function returned after {time.time()-start:.1f} seconds')
start = time.time()
mwe(1e7)
print(f'Function returned after {time.time()-start:.1f} seconds')
With numpy==1.24.3, this results in the output:
Function completed in: 0.36 seconds
Function returned after 4.4 seconds
Function completed in: 3.75 seconds
Function returned after 44.0 seconds
Compare this to the output with numpy==1.21.6:
Function completed in: 0.57 seconds
Function returned after 0.6 seconds
Function completed in: 5.73 seconds
Function returned after 6.4 seconds
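With numpy==1.24.3, roughly 4 seconds (n=1e6) and 40 seconds (n=1e7) elapse between the function body completing and the function actually returning, which is about 4 microseconds per query point; with numpy==1.21.6 that gap is well under a second in both cases. The only work left at that point is freeing the result `a`, so this appears to be related to garbage collecting many numpy arrays nested in a ragged object array.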
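To help narrow this down without scikit-learn, here is a minimal sketch that times only the teardown of an object array filled with many tiny arrays. The helper names `build_ragged` and `time_teardown` are hypothetical and introduced just for illustration, and the arrays are created with plain `np.zeros`, which may differ from how `query_radius` allocates its results. It is therefore an assumption that this mirrors the original case, and it may or may not show the same slowdown.

```python
import time

import numpy as np


def build_ragged(n):
    """Return an object array holding n tiny integer arrays.

    This is a stand-in for the ragged result of KDTree.query_radius;
    the real arrays may be allocated differently (assumption).
    """
    out = np.empty(n, dtype=object)
    for i in range(n):
        out[i] = np.zeros(1, dtype=np.intp)
    return out


def time_teardown(n):
    """Time how long it takes to free an object array of n nested arrays."""
    ragged = build_ragged(n)
    start = time.perf_counter()
    # `ragged` is the only reference, so `del` frees the outer array
    # and, with it, all n nested arrays.
    del ragged
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"numpy {np.__version__}")
    for n in (10**5, 10**6):
        print(f"n={n}: teardown took {time_teardown(n):.3f} s")
```

Running this under both numpy versions and comparing the teardown times would indicate whether the regression is in freeing nested arrays in general or is specific to the arrays scikit-learn produces.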
Runtime information:
[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
                      'found': ['SSSE3',
                                'SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2'],
                      'not_found': ['AVX512F',
                                    'AVX512CD',
                                    'AVX512_SKX',
                                    'AVX512_CLX',
                                    'AVX512_CNL',
                                    'AVX512_ICL']}},
 {'filepath': 'env_crest\Library\bin\mkl_rt.2.dll',
  'internal_api': 'mkl',
  'num_threads': 6,
  'prefix': 'mkl_rt',
  'threading_layer': 'intel',
  'user_api': 'blas',
  'version': '2023.1-Product'}]