DOC More docs needed on model and data persistence #2801
Comments
scipy.io also deserves a mention when discussing dataset loading.
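A minimal sketch of what such a mention could demonstrate, assuming scipy.io's .mat and Matrix Market helpers (the filenames and shapes here are illustrative, not from the docs):

```python
import numpy as np
from scipy import io, sparse

# Dense data round-trips through MATLAB-style .mat files.
X = np.random.rand(10, 5)
io.savemat("data.mat", {"X": X})
X_loaded = io.loadmat("data.mat")["X"]

# Sparse matrices round-trip through Matrix Market files.
S = sparse.random(100, 50, density=0.01, format="csr")
io.mmwrite("data.mtx", S)
S_loaded = io.mmread("data.mtx").tocsr()
```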
A good issue for a novice to work on?
Is all the current persistence documentation in model_persistence.rst?
Basically, yes.
Starting review of model_persistence.rst…
Thanks @MattpSoftware!
Hey @amueller / @jnothman - does this still need some work? I'm happy to open a PR and work on it.
Well, this is certainly an old issue, but it would be great if someone who knows both common and efficient ways to store data could review what our user guides and tutorials currently say on the topic.
Ha, yeah, the new bug issues are a little too competitive, so I thought I'd start working on some of the older issues. I'm not 100% sure about the efficiency side, but I'm wondering if there's scope to do a model persistence code tutorial? Here's the overview of what I'd do:
Keen to hear your thoughts @jnothman
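For reference, a minimal sketch of the kind of example such a tutorial might open with, using joblib for pickle-style persistence (the estimator, dataset, and filename are illustrative assumptions, not a proposal from the thread):

```python
import joblib  # older scikit-learn: from sklearn.externals import joblib
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# Train a small model to persist.
X, y = load_iris(return_X_y=True)
clf = SVC().fit(X, y)

# joblib serializes estimators carrying large numpy arrays
# more efficiently than the stdlib pickle module.
joblib.dump(clf, "model.joblib")

# Later, possibly in another process:
clf_restored = joblib.load("model.joblib")
print(clf_restored.predict(X[:5]))
```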
Hi guys, I was searching for a good easy issue to start with. This seems already done to me. Can I help here somehow?
It seems the only documentation of model persistence is in the Quick Start tutorial. I think it needs to be covered in the User Guide, mentioning security and forward-compatibility caveats pertaining to pickle. It might note the benefits of joblib (for large models at least) over pickle. Most other ML toolkits (particularly command-line tools) treat persistence as a very basic operation of the package. (This was fixed in #3317.)

Similarly, there should be some comment on saving and loading custom data (input and output). Indeed, I can't find a direct description of the supported data types; users are unlikely to have played with scipy.sparse before (although the feature_extraction module means they may not need to). Noting the benefits of joblib (without which the user may dump a sparse matrix's data, indices, and indptr using .tofile) and memmapping is worthwhile. So may be a reference to Pandas, which could help manipulate datasets before/after entering Scikit-learn, and provides import/export to a variety of formats (http://pandas.pydata.org/pandas-docs/dev/io.html).

I have recently discovered new users who think the way to import/export large sparse arrays is load_svmlight_format, but then note that the loading/saving takes much more time than the processing they're trying to do... Let's give them a hand.
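To illustrate the sparse-data and memmapping points above, a hedged sketch assuming joblib's dump/load API (the filenames, shapes, and density are arbitrary):

```python
import numpy as np
from scipy import sparse
import joblib

# A large, very sparse matrix in CSR format.
X = sparse.random(100000, 1000, density=0.001, format="csr")

# One call persists the whole matrix; no need to dump
# X.data, X.indices and X.indptr separately with .tofile.
joblib.dump(X, "X.joblib")
X_restored = joblib.load("X.joblib")

# For large dense arrays, mmap_mode loads lazily from disk
# instead of reading everything into memory up front.
A = np.random.rand(10000, 100)
joblib.dump(A, "A.joblib")
A_mm = joblib.load("A.joblib", mmap_mode="r")
```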