PyPy tests timeouts / memory usage investigation #27662
Thanks for looking into this. The PyPy garbage collector kicks into action when it sees there is too much memory allocated (the …). Another possible cause for the lack of collection is that PyPy does not break C-API memory cycles. There is some discussion of the problem in an open PyPy issue. TL;DR: we hope adoption of HPy will eliminate the dual representation of objects in C and in RPython that makes this problem very difficult. However, I would expect such cycles to at least show up in the …
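As a point of reference, here is a minimal sketch of querying the PyPy GC from Python, using the PyPy-specific `gc.get_stats()` API that also appears in the reproducer below; the numbers it reports can then be compared with the process RSS from psutil:

```python
# PyPy-only sketch: ask the GC what it thinks it has allocated, and compare
# with the process RSS reported by psutil (in this thread the two diverge by
# roughly a factor of 10 on PyPy).
import gc
import platform

import psutil

print(f"rss: {psutil.Process().memory_info().rss / 1e9:.3f}GB")
if platform.python_implementation() == "PyPy":
    stats = gc.get_stats()
    print(stats.memory_allocated_sum)  # total memory allocated by the GC
    gc.collect()  # force a major collection; C/Python cycles may still survive
```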
I checked the Cython code for the loss following the earlier snippet. I think there are 2 relevant pieces of info. The allocation is done in the Python part:

scikit-learn/sklearn/_loss/loss.py Lines 186 to 198 in 5c85b58

So we have a …

scikit-learn/sklearn/_loss/_loss.pyx.tp Lines 1025 to 1048 in 5c85b58
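In rough outline, the pattern looks like this (a simplified sketch, not the actual scikit-learn code; a `%%cython` cell as in the reproducer below is assumed):

```python
%%cython
import numpy as np
import cython

def fill_loss(cython.floating[::1] loss_out):
    # The Python caller allocates loss_out with numpy and passes it in.
    cdef int i
    for i in range(loss_out.shape[0]):
        loss_out[i] = 0.0  # the per-sample loss would be computed here
    # Wrapping the memoryview creates a second ndarray object that shares
    # loss_out's data buffer; this is the call under suspicion.
    return np.asarray(loss_out)
```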
In the Cython part, the only thing that could be weird is the call to `np.asarray`. @lesteve would you mind trying the following patch that avoids the call to `np.asarray`?

```diff
diff --git a/sklearn/_loss/_loss.pyx.tp b/sklearn/_loss/_loss.pyx.tp
index 8efeeea77f..915380d260 100644
--- a/sklearn/_loss/_loss.pyx.tp
+++ b/sklearn/_loss/_loss.pyx.tp
@@ -1045,7 +1045,7 @@ cdef class {{name}}(CyLossFunction):
         ):
             loss_out[i] = sample_weight[i] * {{closs}}(y_true[i], raw_prediction[i]{{with_param}})
 
-        return np.asarray(loss_out)
+        # return np.asarray(loss_out)
 
     {{if closs_grad is not None}}
     def loss_gradient(
diff --git a/sklearn/_loss/loss.py b/sklearn/_loss/loss.py
index 11cb0e42c4..bd3d0df884 100644
--- a/sklearn/_loss/loss.py
+++ b/sklearn/_loss/loss.py
@@ -189,13 +189,14 @@ class BaseLoss:
         if raw_prediction.ndim == 2 and raw_prediction.shape[1] == 1:
             raw_prediction = raw_prediction.squeeze(1)
 
-        return self.closs.loss(
+        self.closs.loss(
             y_true=y_true,
             raw_prediction=raw_prediction,
             sample_weight=sample_weight,
             loss_out=loss_out,
             n_threads=n_threads,
         )
+        return loss_out
 
     def loss_gradient(
         self,
```
I found my Linux machine and could try @lesteve's script with the above changes. Interestingly, it removes the memory leak:
So we should probably try to look at the …
We do not need those `np.asarray` calls.
We could report a minimal reproducer to Cython and/or PyPy so that they could work out a way to avoid this problem in the first place. +1 for simplifying our code anyway to drop the useless `np.asarray` returns.
Great news. A smaller reproducer would be welcome, although I imagine it hits the C-Python mutual reference cycle problem I mentioned above. At least I could add it to the PyPy FAQ of "things to avoid".
FYI with the fix from #27662 (comment), the full test suite still uses ~9.5GB on PyPy so this saved about 1.5GB. The Azure VM has 7GB of RAM (according to this) so more effort is needed to find other places where similar issues occur ...
I also recall that we use …
I checked how many times the pattern appears:

```
> git grep "np.asarray(" | grep ".pyx:" | wc -l
37
```

We might need to fix those if we want to be on the safe side, hoping that what is in the …
I could come up with the following minimal reproducer based on @lesteve's script:

```python
%%cython
import psutil
import gc
from functools import partial
import platform

import cython
import numpy as np

IS_PYPY = platform.python_implementation() == "PyPy"


def cy_func(cython.floating[::1] arr_out):
    cdef:
        int i
        int size_arr = arr_out.shape[0]

    for i in range(size_arr):
        arr_out[i] = 10.0

    return np.asarray(arr_out)


def func():
    for _ in range(100):
        arr_out = np.ones(1_000_000, dtype=np.float64)
        cy_func(arr_out)


def main():
    for i in range(10):
        memory_usage = psutil.Process().memory_info().rss / 1e9
        message = f'{i} psutil: {memory_usage:.3f}GB'
        if IS_PYPY:
            pypy_memory_usage = gc.get_stats().memory_allocated_sum
            message += f' , pypy total allocated: {pypy_memory_usage}'
        print(message)
        func()

    if IS_PYPY:
        print(gc.get_stats())


if __name__ == '__main__':
    main()
```

Output:

```
0 psutil: 0.287GB , pypy total allocated: 193.2MB
1 psutil: 1.087GB , pypy total allocated: 183.0MB
2 psutil: 1.888GB , pypy total allocated: 183.0MB
3 psutil: 2.688GB , pypy total allocated: 183.1MB
4 psutil: 3.489GB , pypy total allocated: 183.0MB
5 psutil: 4.289GB , pypy total allocated: 183.1MB
6 psutil: 5.089GB , pypy total allocated: 183.0MB
7 psutil: 5.890GB , pypy total allocated: 183.0MB
8 psutil: 6.690GB , pypy total allocated: 183.0MB
9 psutil: 7.490GB , pypy total allocated: 183.1MB
Total memory consumed:
    GC used:            49.0MB (peak: 204.6MB)
    in arenas:          0.2kB
    rawmalloced:        46.0MB
    nursery:            3.0MB
    raw assembler used: 921.6kB
    -----------------------------
    Total:              49.9MB

Total memory allocated:
    GC allocated:            181.0MB (peak: 207.9MB)
    in arenas:               128.7MB
    rawmalloced:             72.9MB
    nursery:                 3.0MB
    raw assembler allocated: 2.0MB
    -----------------------------
    Total:                   183.0MB

Total time spent in GC: 4.735
```
Thanks for the minimal reproducer. I opened a PyPy issue.
So maybe we can wait before attempting to fix the 37 occurrences of this pattern in scikit-learn. Please let us know @mattip if this is too complex to fix on the PyPy/Cython side: we could then change scikit-learn to avoid this pattern in the first place.
I agree when the allocation is done by the Cython code itself. But in this case, the array is allocated (with numpy) and passed by the Python function as an input argument to the Cython function. The Python caller could just return the original array it allocated, and the Cython function would not need to return anything. This would be slightly more efficient (a single numpy array is created instead of 2 numpy arrays that share the same data buffer).
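A minimal sketch of that calling convention (illustrative names, not the actual scikit-learn code):

```python
# Out-parameter style: the Cython helper only fills the buffer and returns
# nothing; the Python caller returns the very array it allocated, so a single
# ndarray object exists.
import numpy as np


def fill_loss(loss_out):
    # Stand-in for the compiled Cython helper from the sketch above, with the
    # final `return np.asarray(loss_out)` dropped: fill in place, return None.
    loss_out[:] = 0.0


def loss(raw_prediction, loss_out=None):
    # Python-level caller: allocate if needed, let the helper fill the buffer,
    # then hand back the original allocation -- no np.asarray needed.
    if loss_out is None:
        loss_out = np.empty_like(raw_prediction)
    fill_loss(loss_out)
    return loss_out
```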
PyPy timed out 3 times in the last 6 runs; this is going in the right direction but not quite there yet ...
Saving the reproducer as …

So Cython has a class …
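For running the reproducer outside a notebook, one possible setup (the file names and commands here are assumptions, not taken from this thread) is to put `cy_func` in a `.pyx` file, build it in place with `cythonize -i`, and drive it from a small script:

```python
# run_repro.py -- hypothetical driver script.
#
# Assumed setup (names are illustrative):
#   1. copy cy_func from the snippet above into repro.pyx (drop the %%cython line)
#   2. build it in place:  cythonize -i repro.pyx
#   3. run this script with pypy3 (or python3 for comparison)
import numpy as np
import psutil

from repro import cy_func  # compiled extension built in step 2

for i in range(10):
    rss_gb = psutil.Process().memory_info().rss / 1e9
    print(f"{i} psutil: {rss_gb:.3f}GB")
    for _ in range(100):
        arr_out = np.ones(1_000_000, dtype=np.float64)
        cy_func(arr_out)
```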
Can we close this now that #27670 is merged?
I would keep this one open since I may have another look in the future. Having said that, this is not very high priority, since PyPy tests time out a lot more rarely since #27670 was merged. A quick look at #27750 seems to indicate that PyPy tests timed out 5 times since November 8 (i.e. in 23 days or so), so the situation is a lot better than before, when it was always timing out.
The slowest test by far is tests/test_multioutput.py::test_classifier_chain_tuple_order[list], at 267 seconds, whereas the next slowest is 162 seconds. Thanks for looking at this.
Thanks for the input! I said "timed out" which is not very accurate. I am reasonably confident the tests get killed because they use too much memory.
Which in turn causes the CI host to swap memory and slow the test run down to the point of reaching the job-level timeout.
We could very easily reduce the …
As discussed at the last dev meeting, since none of us plans to invest more effort in fixing the root cause of the memory usage problem regularly observed on the PyPy nightly CI run, I posted the following call for help on our LinkedIn account: … and on my personal Mastodon account: … Feel free to re-post.
Thanks. I think if someone has some interest in scikit-learn on PyPy and wants to look at this, a good starting point could be to: …
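For example, one concrete way to start (an illustration, not a prescription) would be to log the process RSS after each test with a local `conftest.py` hook, so the worst offenders can be ranked:

```python
# conftest.py (local, hypothetical) -- report the resident set size after each
# test so that memory-hungry tests and modules stand out in the log.
import psutil
import pytest


@pytest.hookimpl(trylast=True)
def pytest_runtest_teardown(item):
    rss_gb = psutil.Process().memory_info().rss / 1e9
    print(f"\n[memory] {item.nodeid}: rss={rss_gb:.3f}GB")
```

Running e.g. `pytest -s sklearn/linear_model` with such a hook in place should show where the RSS jumps.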
EDIT: one of the main causes of the problem described below has already been fixed by #27670. However, despite this improvement, there are still significant memory problems remaining when running the scikit-learn test suite on PyPy. Similar investigation and fixes are needed to iteratively tackle the next worst offenders until the tests can run with an amount of memory comparable to what we observe with CPython (instead of roughly 10 times more).
Original description:
I had a closer look at the PyPy tests, which have been timing out for a while; here is the result of my investigation. It may also help in the future to have a central issue for this, rather than the discussion being split across different automatically created issues in this repo and in scikit-learn-feedstock.
The PyPy tests need ~11GB locally on my machine, whereas they need ~1.2GB with CPython. I ran them without pytest-xdist to simplify things.
PyPy: (memory usage plot of the full test suite)

CPython: (memory usage plot of the full test suite)
It seems like one of the places where the memory usage grows over time is the linear_model tests (3.4GB with PyPy and 200MB with CPython locally).
PyPy: (memory usage plot of the linear_model tests)

CPython: (memory usage plot of the linear_model tests)
I managed to reproduce the issue (memory growing way more than on CPython) with the following snippet, where one of our Cython loss functions is called many times in a tight loop:
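A rough sketch of that kind of loop (the `HalfSquaredError` loss class and the array sizes here are assumptions, not necessarily what the original script used):

```python
# Rough sketch: call one of the Cython loss functions many times in a tight
# loop and print the process RSS at each outer iteration.  HalfSquaredError
# and the sizes below are assumptions.
import numpy as np
import psutil
from sklearn._loss.loss import HalfSquaredError

loss = HalfSquaredError()
y_true = np.random.rand(1_000_000)
raw_prediction = np.random.rand(1_000_000)

for i in range(10):
    rss_gb = psutil.Process().memory_info().rss / 1e9
    print(f"{i} psutil: {rss_gb:.3f}GB")
    for _ in range(100):
        loss.loss(y_true=y_true, raw_prediction=raw_prediction)
```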
The output shows that the memory usage grows over time and that it is about 10 times the total allocated memory reported by PyPy:
Not entirely sure whether this is a red herring, an issue with our Cython loss implementation, or an issue in the interaction between Cython and PyPy. Maybe @mattip has some insights about this?
On CPython, the memory usage is stable at around 100MB.
My environment, for good measure: