
Memory leak in decision trees #8623

Closed
@ppallesen

Description

I encounter a memory leak in decision trees, both on my own machine (Ubuntu 16.04, Anaconda Python 2.7, fresh scikit-learn from pip) and in a Kaggle kernel on Python 3.5 (https://www.kaggle.com/ppallesen/titanic/notebook388ea683bf/). It seems to be related to the number of jobs, since the leak is strongly reduced by setting the number of jobs to 1, so I suspect the leak is in the parallel code path. The amount of memory retained varies a lot from run to run, which seems odd.

import gc
import os

import numpy as np
import psutil
from sklearn.ensemble import ExtraTreesClassifier

p = psutil.Process(os.getpid())

# Random toy data: 10000 samples, 50 features, binary labels.
X = np.random.normal(size=(10000, 50))
Y = np.random.binomial(1, 0.5, size=(10000,))


def print_mem():
    # Resident set size of this process, in MB.
    print("{:.0f}MB".format(p.memory_info().rss / 1e6))


print_mem()

for i in range(5):
    # Fit a large forest in parallel, then drop it and force a collection.
    et = ExtraTreesClassifier(n_estimators=1000, max_features=1,
                              n_jobs=4).fit(X, Y)
    del et
    gc.collect()
    print_mem()

out:

115MB
402MB
387MB
715MB
703MB
879MB

So the memory consumption before creating the ExtraTreesClassifier is 115 MB, and after fitting it and deleting it the consumption is several hundred megabytes higher, though not necessarily increasing from run to run.
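
To check whether the retained memory really comes from the parallel path, a simple control (a sketch reusing the X, Y, and print_mem defined above) is to rerun the same loop with n_jobs=1 and compare the reported RSS; if the joblib workers are responsible, the growth should largely disappear:

# Control run: identical forest, but fitted with a single job.
# If the leak is in the parallel code path, RSS should stay roughly flat here.
for i in range(5):
    et = ExtraTreesClassifier(n_estimators=1000, max_features=1,
                              n_jobs=1).fit(X, Y)
    del et
    gc.collect()
    print_mem()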
