scikit's GridSearch and Python in general are not freeing memory #3973


Closed
rasbt opened this issue Dec 16, 2014 · 6 comments

@rasbt
Contributor

rasbt commented Dec 16, 2014

Hi,
I recently asked a question on StackOverflow here about an issue that I encountered with scikit-learn's GridSearch and memory utilization.

Basically, it allocates more and more memory the longer it runs, until the job fails when it hits the 128 GB available on the system I am running it on. More details are in the StackOverflow question linked above, and I also created a GitHub repo with the script and data if you want to reproduce the issue.

https://github.com/rasbt/bugreport/tree/master/scikit-learn/gridsearch_memory
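For context, here is a minimal sketch (not the actual script from the repo above; data, estimator, and grid are made up) of the kind of loop that exposes this: repeatedly fit a fresh GridSearchCV and watch the process's peak RSS. With a leak, the reported value keeps climbing across iterations.

```python
# Minimal sketch of watching GridSearchCV's memory footprint grow
# across repeated fits. All names here are illustrative, not the
# original reproduction script.
import resource

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(500, 20)
y = (X[:, 0] > 0.5).astype(int)

param_grid = {"C": [0.1, 1.0, 10.0]}

rss_after_fit = []
for _ in range(3):
    gs = GridSearchCV(SVC(), param_grid, cv=3, n_jobs=1)
    gs.fit(X, y)
    # ru_maxrss is the peak resident set size so far (KiB on Linux,
    # bytes on macOS); a value that keeps growing across iterations
    # would hint at memory not being released between fits.
    rss_after_fit.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)

print(rss_after_fit)
```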

@amueller amueller added the Bug label Jan 22, 2015
@rth
Member

rth commented Sep 27, 2016

As this is a two-year-old issue that concerns v0.15, and both GridSearchCV and joblib have probably seen significant changes since, @rasbt is this issue still relevant, or should it be closed?

@rasbt
Contributor Author

rasbt commented Sep 27, 2016

Haven't tried this particular setting in a while. I would say let's close it for the reasons you mentioned, but let me try to re-run this in the next few days to double-check.

@amueller amueller modified the milestone: 0.19 Sep 29, 2016
@rasbt
Contributor Author

rasbt commented Sep 30, 2016

I just ran the same code (the one I posted on GitHub in Dec 2014) overnight and it was very stable this time (sklearn 0.18 and Python 3.5) :); looks like the issue was resolved some time ago!

@rasbt rasbt closed this as completed Sep 30, 2016
@amueller
Member

thanks for checking :)

@rishabhgit

rishabhgit commented Apr 15, 2019

Hi @amueller , @rasbt ,
I've run into the same issue while trying to optimise hyper-parameters for a Random Forest Regressor.
I'm using Python 3.6 and sklearn version 0.20.3
My data set is not huge (~350k rows × 370 cols). Here's a snippet of the code that I'm using:

    # imports inferred from the snippet's usage
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor as sk_rfreg
    from sklearn.model_selection import RandomizedSearchCV

    y_train = np.array(df_train[label])
    X_train = np.array(df_train[x_cols])
    weights = np.array(df_train['PREMISES_COUNT'])
    print('X_train shape ', X_train.shape)
    print('y_train shape ', y_train.shape)
    print('Sample weight shape ', weights.shape)
    
    param_grid = {'n_estimators': [100, 150, 200],
                  'min_samples_leaf': [2, 10, 20],
                  'min_samples_split': [10, 15, 20],
                  'max_features': ['auto', 'sqrt', 'log2']
    }
    rf = sk_rfreg( random_state=0)
    #scorer = make_scorer(r2_score,sample_weight=weights)
    rs = RandomizedSearchCV(estimator=rf, param_distributions=param_grid, n_iter=15, 
                           scoring = 'neg_mean_absolute_error',
                           n_jobs=5, cv=3)
    
    rs = rs.fit(X_train, y_train)
    print('Best R^2 score', rs.best_score_)
    print ('Best Params:')
    print(rs.best_params_)

I'm running this code on an EC2 server with 128 GB RAM. RandomizedSearchCV spawns more than 5 Python processes, which slowly consume all the RAM and run out of memory before the execution completes. Here's the output of the `top` command:

[screenshot of `top` output]

Could this issue be reopened and investigated?
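One knob worth trying in setups like this (my own suggestion, not something proposed in the thread) is RandomizedSearchCV's documented `pre_dispatch` parameter, which caps how many tasks joblib queues up at once, so fewer copies of the data are held in flight. A small self-contained sketch with made-up data:

```python
# Sketch: limit concurrently dispatched tasks with pre_dispatch and
# keep n_jobs modest, so the workers do not hold every parameter
# setting's data in memory at the same time. Data and grid are toy
# placeholders, not the reporter's actual setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.RandomState(0)
X = rng.rand(300, 10)
y = rng.rand(300)

param_grid = {"n_estimators": [10, 20],
              "min_samples_leaf": [2, 10]}

rs = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions=param_grid,
    n_iter=3,
    cv=3,
    n_jobs=2,
    pre_dispatch="1*n_jobs",  # dispatch at most n_jobs tasks at a time
)
rs.fit(X, y)
print(rs.best_params_)
```

Whether this fixes an actual leak or merely lowers the peak footprint depends on where the memory is going; profiling the workers would still be needed to tell.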

@jnothman
Member

@rishabhgit I don't think this issue is relevant...
