BUG: Garbage collection order of magnitude slower in numpy>=1.22 #24232

Closed
@BrandonSmithJ

Describe the issue:

Cross post from scikit-learn's repository:

from sklearn.neighbors import KDTree
import time

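def mwe(n):
    start = time.time()
    # Keep the result alive in `a` so it is only released when the function returns.
    a = KDTree([[1]]).query_radius([[1]]*int(n), 1)
    # Time for the query itself, measured before `a` is released.
    print(f'Function completed in: {time.time()-start:.2f} seconds')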

start = time.time()
mwe(1e6)
print(f'Function returned after {time.time()-start:.1f} seconds')

start = time.time()
mwe(1e7)
print(f'Function returned after {time.time()-start:.1f} seconds')

With numpy==1.24.3, this results in the output:

Function completed in: 0.36 seconds
Function returned after 4.4 seconds

Function completed in: 3.75 seconds
Function returned after 44.0 seconds

Compare this to the output with numpy==1.21.6:

Function completed in: 0.57 seconds
Function returned after 0.6 seconds

Function completed in: 5.73 seconds
Function returned after 6.4 seconds

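The extra time is spent between the end of the function body and the function actually returning, i.e. while the result `a` is being released. This appears to be related to garbage collecting many numpy arrays nested in a ragged object array.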
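A scikit-learn-free sketch may help isolate this. It assumes the `query_radius` result resembles an object array holding one small index array per query point. The helper names `build_ragged` and `time_teardown` are hypothetical, and whether this reproduces the slowdown on a given numpy version is untested here:

```python
import gc
import time

import numpy as np


def build_ragged(n):
    """Return an object array holding n separate one-element index arrays.

    Assumption: this mimics the shape of the query_radius result above.
    """
    out = np.empty(n, dtype=object)
    for i in range(n):
        out[i] = np.zeros(1, dtype=np.intp)
    return out


def time_teardown(n):
    """Time how long it takes to release a ragged object array of n arrays."""
    ragged = build_ragged(n)
    gc.collect()  # settle unrelated garbage before timing
    start = time.perf_counter()
    del ragged  # drops the last reference, so the nested arrays are freed here
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (10**5, 10**6):
        print(f"n={n:>8}: teardown took {time_teardown(n):.3f} seconds")
```

Running this under numpy 1.21.6 and 1.24.3 and comparing the teardown times would indicate whether the regression is independent of `KDTree`.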

Runtime information:

[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'filepath': 'env_crest\Library\bin\mkl_rt.2.dll',
'internal_api': 'mkl',
'num_threads': 6,
'prefix': 'mkl_rt',
'threading_layer': 'intel',
'user_api': 'blas',
'version': '2023.1-Product'}]
