DOC More docs needed on model and data persistence

~~It seems the only documentation of model persistence is in the Quick Start tutorial. I think it needs to be covered in the User Guide, mentioning security and forward-compatibility caveats pertaining to pickle. It might note the benefits of joblib (for large models at least) over Pickle. Most other ML toolkits (particularly command-line tools) treat persistence as a very basic operation of the package.~~ (This was fixed in #3317)

Similarly, there should be some comment on saving and loading custom data (input and output). Indeed, I can't find a direct description of the supported data-types; users are unlikely to have played with `scipy.sparse` before (although the `feature_extraction` module means they may not need to). Noting the benefits of joblib (without which the user may dump a sparse matrix's `data`, `indices`, `indptr` using `.tofile`) and memmapping is worthwhile. So may be reference to Pandas which could help manipulate datasets before/after entering Scikit-learn, and provides import/export to a variety of formats (http://pandas.pydata.org/pandas-docs/dev/io.html).

I have recently discovered new users who think the way to import/export large sparse arrays is `load_svmlight_format`, but then note that the loading/saving takes much more time than the processing they're trying to do... Let's give them a hand.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

DOC More docs needed on model and data persistence #2801

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

DOC More docs needed on model and data persistence #2801

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions