Thanks Sebastian.

This is basically what we are doing too. The hard, time-consuming part is 
determining which attributes of each scikit-learn object need to be saved and 
how best to extract them.
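
A rough sketch of one way to approach the extraction, not a general solution: it assumes the scikit-learn naming convention that attributes learned during fit end with a trailing underscore. The helper name fitted_attributes is hypothetical, and the sketch ignores nested estimators, sparse matrices, and any value that is not a NumPy array or scalar.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def fitted_attributes(estimator):
    """Collect attributes that look like learned state (hypothetical helper).

    Assumption: learned attributes follow the scikit-learn convention of
    ending with a trailing underscore and not starting with one.
    """
    out = {}
    for name, value in vars(estimator).items():
        if not name.endswith("_") or name.startswith("_"):
            continue  # skip constructor parameters and private attributes
        if isinstance(value, np.ndarray):
            value = value.tolist()   # arrays become nested lists
        elif isinstance(value, np.generic):
            value = value.item()     # NumPy scalars become Python scalars
        out[name] = value
    return out


# Tiny synthetic example: y = 2 * x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
regr = LinearRegression().fit(X, y)

attrs = fitted_attributes(regr)
print(sorted(attrs))
```

For more complex estimators (trees, pipelines, anything holding other objects) each attribute would still need a case-by-case decision, which is exactly the time-consuming part.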

- Keith

-----Original Message-----
From: Sebastian Raschka [mailto:se.rasc...@gmail.com] 
Sent: Wednesday, March 23, 2016 4:05 PM
To: scikit-learn-general@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] Scikit-learn standards for 
serializing/saving objects

I also had some issues with pickle in the past and have to admit that I 
actually don't trust pickle files ;). Maybe I am too paranoid, but I am always 
afraid of corrupting or losing the data.
It is probably not the most elegant solution, but I typically store estimator 
settings and model parameters as JSON files, since they stay human-readable in 
the worst-case scenario (having "reproducible research" in mind ;)).


For example:


# Model fitting and saving params to JSON

from sklearn.linear_model import LinearRegression
from sklearn.datasets import load_diabetes

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
regr = LinearRegression()
regr.fit(X, y)

import json

with open('./params.json', 'w', encoding='utf-8') as outfile:
    json.dump(regr.get_params(), outfile)
    
with open('./weights.json', 'w', encoding='utf-8') as outfile:    
    json.dump(regr.coef_.tolist(), outfile, separators=(',', ':'),
              sort_keys=True, indent=4)
    
with open('./intercept.json', 'w', encoding='utf-8') as outfile:    
    json.dump(regr.intercept_, outfile)  


# In a new session: load the params from the JSON files


import json

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# Read each JSON file back; the with-block closes the file handle
with open('./params.json', 'r', encoding='utf-8') as infile:
    params = json.load(infile)

with open('./weights.json', 'r', encoding='utf-8') as infile:
    weights = json.load(infile)

with open('./intercept.json', 'r', encoding='utf-8') as infile:
    intercept = json.load(infile)

regr = LinearRegression()
regr.set_params(**params)
regr.intercept_, regr.coef_ = intercept, np.array(weights)

regr.predict(X[:10])

array([ 206.11706979,   68.07234761,  176.88406035,  166.91796559,
        128.45984241,  106.34908972,   73.89417947,  118.85378669,
        158.81033076,  213.58408893])
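
The same idea can also be sketched with a single JSON file instead of three, plus a check that the restored model predicts the same values. This is only an illustration under assumptions: the helper names save_linear_model and load_linear_model are made up, the data is synthetic rather than the diabetes set, and it only covers a single-target LinearRegression restored with the same scikit-learn version.

```python
import json
import os
import tempfile

import numpy as np
from sklearn.linear_model import LinearRegression


def save_linear_model(model, path):
    """Write settings and learned weights to one JSON file (hypothetical helper)."""
    payload = {
        "params": model.get_params(),
        "coef": model.coef_.tolist(),
        "intercept": float(model.intercept_),  # assumes a single target
    }
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(payload, outfile, indent=4, sort_keys=True)


def load_linear_model(path):
    """Rebuild a LinearRegression from the JSON file written above."""
    with open(path, "r", encoding="utf-8") as infile:
        payload = json.load(infile)
    model = LinearRegression()
    model.set_params(**payload["params"])
    model.coef_ = np.array(payload["coef"])
    model.intercept_ = payload["intercept"]
    return model


# Synthetic data so the example does not depend on a bundled dataset
rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0

regr = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "model.json")
    save_linear_model(regr, path)
    restored = load_linear_model(path)

# The restored model should reproduce the original predictions
print(np.allclose(restored.predict(X), regr.predict(X)))
```

Keeping everything in one file avoids the three files drifting apart, at the cost of a slightly less readable weights section.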


In any case, I know that this isn't pretty, and I would also be looking forward 
to a better solution!

Best,
Sebastian Raschka


> On Mar 23, 2016, at 12:47 PM, Keith Lehman <kleh...@intercapenergy.com> wrote:
> 
> Hi:
>  
> I’m fairly new to scikit-learn, python, and machine learning. This community 
> has built a great set of libraries though, and is actually a large part of 
> the reason why my company has selected python to experiment with ML.
>  
> As we are developing our product, however, we keep running into trouble 
> saving various objects. When possible, we use pickle to save the objects, but 
> this can cause problems in development – objects saved during a debug session 
> cannot be loaded outside of the debugger. The reason appears to be that, 
> even when pickling a “pickleable” object (such as a trained 
> LinearRegression), pickle finds and saves more primitive objects that have 
> been instantiated within the debug environment. Dill and cPickle have the 
> same issue. My question is: does the scikit-learn community plan to add 
> standard load/save or dump/dumps and load/loads methods that would not create 
> these dependencies?
>  
> If there is a better forum for posting questions like these, please let me 
> know and I’ll be happy to post there instead.
>  
> Thanks! 
>  
> Keith Lehman
> Cell: 617-834-2863
> Skype: k.lehman
> e-mail: kleh...@intercapenergy.com
>  
> ----------------------------------------------------------------------
> --------
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with Intel Data 
> Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140______
> _________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

