Re: [Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?

muhammad waseem Thu, 18 Feb 2016 10:47:05 -0800

@Sebastian:
Thanks for your reply.
Yes, I just saw the email saying that the new version is out. I will give it a try
tomorrow.


Thanks
Regards
Waseem

On Thu, Feb 18, 2016 at 6:41 PM, Sebastian Raschka <se.rasc...@gmail.com>
wrote:

> > @Your code: Is this the full code or is some part of it missing? I can see
> > ...
> > after
>
> Yes, there is part of it missing -- I removed it for clarity. It's
> essentially just a whole bunch of nested for-loops (bad style anyway, but
> that was just a quick work-around). It's basically just iterating over
> different parameter sets to do the grid search "manually."
>
> Btw I just saw that scikit-learn 0.17.1 came out today, including an
> updated version of joblib. Maybe it's worth a try to see if it solves
> the problem?
>
>
>
>
> >> On Feb 17, 2016, at 2:53 PM, muhammad waseem <m.waseem.ah...@gmail.com>
> wrote:
> >>
> >> @Sebastian: I will add to the discussion; it looks like it is not very
> >> active :(
> >>
> >> @Your code: Is this the full code or is some part of it missing? I can see
> >> ...
> >> after
> >> parameterset2:
> >>
> >> for p2 in
> >> which means there is something missing there, no?
> >>
> >> Thanks
> >>
> >> On Wed, Feb 17, 2016 at 7:40 PM, Sebastian Raschka <
> se.rasc...@gmail.com> wrote:
> >> @Waseem Oh, wait, I just saw that we already have an open issue for that,
> >> please see: https://github.com/scikit-learn/scikit-learn/issues/3973
> >> It would be great if you could add to the discussion there. Meanwhile, I will
> >> try to run my code again in the next few days to check if this bug still
> >> persists.
> >>
> >>
> >> > On Feb 15, 2016, at 4:25 PM, Sebastian Raschka <se.rasc...@gmail.com>
> wrote:
> >> >
> >> > Hm, unfortunately, that's what I thought -- sounds like a bug
> >> > in joblib? Does anyone have any ideas on how to track this down?
> >> >
> >> > @Waseem Can you also try n_jobs=2? Here, I'd expect the following:
> >> > 1) If everything is working correctly with the multi-threading, it would
> >> >    use maybe 2 times the 12% plus a little bit extra.
> >> > 2) If you see something like ~30%, I'd say that there's an unnecessary
> >> >    copy being made.
> >> > 3) If you see something like > 30%, there would be a memory leak somewhere.
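The three scenarios above can be written out as a small check. This is only a sketch: the function name, the `slack` allowance standing in for "a little bit extra", and the exact cut-offs are assumptions for illustration. The thread itself only gives the rough figures of 12% at n_jobs=1 and ~30% at n_jobs=2.

```python
def interpret_memory_usage(baseline_pct, observed_pct, n_jobs=2, slack=3.0):
    """Map an observed RAM percentage onto the three scenarios.

    baseline_pct: RAM usage seen with n_jobs=1 (12% in the thread).
    observed_pct: RAM usage seen when running with `n_jobs` workers.
    slack: assumed allowance, in percentage points, for "a little bit extra".
    """
    expected = n_jobs * baseline_pct  # e.g. 2 * 12% = 24%
    if observed_pct <= expected + slack:
        return "as expected"
    if observed_pct <= expected + 2 * slack:  # roughly the ~30% case
        return "possible unnecessary copy"
    return "possible memory leak"


# Figures from the thread: 12% baseline, checked against n_jobs=2.
for observed in (25.0, 30.0, 45.0):
    print(observed, "->", interpret_memory_usage(12.0, observed))
```

The labels are deliberately worded as "possible": a single reading cannot distinguish a one-off copy from a leak, so watching whether the number keeps growing over time is the more telling signal.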
> >> >
> >> > I mentioned scenario 3, because I observed a very similar behavior
> once:
> >> > (see https://github.com/scikit-learn/scikit-learn/issues/3973)
> >> >
> >> > "I made some weird observations that my GridSearches keep failing
> >> > after a couple of hours and I initially couldn't figure out why. I
> >> > monitored the memory usage over time and saw that it started with a
> >> > few gigabytes (~6 Gb) and kept increasing until it crashed the node when it
> >> > reached the max. 128 Gb the hardware can take. I was experimenting with
> >> > random forests for classification of a large number of text documents. For
> >> > simplicity -- to figure out what's going on -- I went back to naive Bayes.
> >> > ...
> >> > After some experimentation, I finally found out that
> >> >
> >> > gc.collect()
> >> > len(gc.get_objects()) # particularly this part!
> >> >
> >> > in the for loop solves the problem and the memory usage stays
> >> > constant at 6.5 Gb over the run time of ~10 hours."
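A minimal sketch of the workaround described above, assuming a manual grid search written as a loop over parameter combinations. The `evaluate` function, the parameter names, and the grid values are hypothetical stand-ins for the actual model fitting and scoring, which the thread does not show. Only `gc.collect()` and `len(gc.get_objects())` come from the original message.

```python
import gc
from itertools import product


def evaluate(params):
    """Hypothetical stand-in for fitting and scoring a model.

    A real version would fit an estimator with `params` and return a
    cross-validation score.
    """
    # Allocate some temporary objects so there is something to collect.
    scratch = [[i] * 10 for i in range(1000)]
    del scratch
    return params["alpha"] * params["n_features"]


def manual_grid_search(param_grid):
    """Try every parameter combination, collecting garbage after each one."""
    names = sorted(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        results.append((score, params))

        # Workaround from the thread: force a collection and touch the
        # list of tracked objects on every iteration.
        gc.collect()
        n_tracked = len(gc.get_objects())
        print(params, "score:", score, "tracked objects:", n_tracked)
    return results


# Hypothetical parameter sets, for illustration only.
grid = {"alpha": [0.1, 1.0], "n_features": [100, 1000, 10000]}
results = manual_grid_search(grid)
best_score, best_params = max(results, key=lambda item: item[0])
```

Printing the tracked-object count on each iteration also gives a cheap way to see whether the number of live Python objects keeps growing from one parameter set to the next.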
> >> >
> >> >
> >> >> On Feb 15, 2016, at 9:37 AM, muhammad waseem <
> m.waseem.ah...@gmail.com> wrote:
> >> >>
> >> >> @Sebastian: I have tried running cross_validation with n_jobs=1,
> >> >> and it did not use swap memory; even the RAM usage was quite low (maximum
> >> >> 12%). However, this will take a longer time to finish. Any idea what to try
> >> >> now?
> >> >>
> >> >> Thanks
> >> >> Kindest Regards
> >> >> Waseem
> >> >>
> >> >> On Fri, Feb 12, 2016 at 9:58 PM, Jacob Schreiber <
> jmschreibe...@gmail.com> wrote:
> >> >> I don't think that the data is copied for tree-based classifiers. It
> >> >> uses the threading backend, so each thread should be sharing memory.
> >> >>
> >> >> On Fri, Feb 12, 2016 at 12:32 PM, Sebastian Raschka <
> se.rasc...@gmail.com> wrote:
> >> >> I'd suggest trying n_jobs=1 and checking whether swap memory is used (you
> >> >> don't have to run it until completion). If this runs fine without swap, we
> >> >> can work further from there.
> >> >>
> >> >> Sent from my iPhone
> >> >>
> >> >> On Feb 12, 2016, at 2:57 PM, muhammad waseem <
> m.waseem.ah...@gmail.com> wrote:
> >> >>
> >> >>> @Sebastian: I tried with n_jobs=10 (the total is 12) and it
> >> >>> still caused the same problem. I could try running it with n_jobs=1,
> >> >>> but it would be so slow that it would take ages to complete. The machine has
> >> >>> 32GB RAM and it started using swap memory after consuming the full RAM.
> >> >>>
> >> >>> Is there a way to tackle this, or do you really think that all this k-fold
> >> >>> cross-validation and training should be done using Spark's MLlib?
> >> >>>
> >> >>> Thanks
> >> >>> Regards
> >> >>> Waseem
> >> >>>
> >> >>>
> >> >>> On Fri, Feb 12, 2016 at 6:40 PM, Sebastian Raschka <
> se.rasc...@gmail.com> wrote:
> >> >>> Thanks for the note, Manoj, didn't know that!
> >> >>>
> >> >>> @muhammad So if there's no duplication of data across all
> >> >>> processes, I guess that you would also run into trouble with n_jobs=1.
> >> >>> But just to make sure that data duplication is not an issue, could you try
> >> >>> running it with n_jobs=1? In that case, probably only a smaller data set or
> >> >>> a machine with more memory would help. Here, I'd probably think about using
> >> >>> Spark's MLlib to deal with this particular dataset.
> >> >>>
> >> >>>> On Feb 12, 2016, at 12:30 PM, muhammad waseem <
> m.waseem.ah...@gmail.com> wrote:
> >> >>>>
> >> >>>> Hi Sebastian and Manoj,
> >> >>>> @Manoj: What should the value of the max_nbytes parameter be, and will
> >> >>>> this affect the results and the time it takes to run cross_validation,
> >> >>>> grid_search, etc.?
> >> >>>> @Sebastian: Will the Spark implementation also improve the
> >> >>>> memory use, or just the CPU?
> >> >>>>
> >> >>>>
> >> >>>> Thanks
> >> >>>> Kindest Regards
> >> >>>>
> >> >>>> On Fri, Feb 12, 2016 at 5:29 PM, muhammad waseem <
> m.waseem.ah...@gmail.com> wrote:
> >> >>>> Hi Sebastian and Manoj,
> >> >>>> @Manoj: What should the value of the max_nbytes parameter be, and will
> >> >>>> this affect the results and the time it takes to run cross_validation,
> >> >>>> grid_search, etc.?
> >> >>>>
> >> >>>> Thanks
> >> >>>> Kindest Regards
> >> >>>> Waseem
> >>
> >
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
