DOC revamp model persistence documentation #18046

cmarmo · 2020-07-31T16:25:16Z

Reference Issues/PRs

Following @jnothman's comment in #16875.

Closes #2801

What does this implement/fix? Explain your changes.

Add details in the model persistence documentation section.

Not sure this was the suggested direction... LMK

NicolasHug

Thanks @cmarmo, I gave this a quick first pass.

doc/getting_started.rst

NicolasHug · 2020-07-31T16:35:03Z

doc/getting_started.rst

@@ -212,6 +214,34 @@ the best set of parameters. Read more in the :ref:`User Guide
    Using a pipeline for cross-validation and searching will largely keep
    you from this common pitfall.

+Model persistence


I think we should keep the getting started guide as concise as possible and with a limited amount of info.

IMO model persistence is more of a next step, sort of like model inspection, and thus doesn't really belong here.

It also doesn't really fit into the premise of the guide:

The purpose of this guide is to illustrate some of the main features that scikit-learn provides

So I'd suggest writing this in the User Guide (UG) instead.

I understand your point: I have removed model persistence from the getting started guide.
But I need a way to give more visibility to model persistence, which is not a subsection of the model selection and evaluation section but the final step of the process before deployment in production (if I understand this comment correctly). Let me know if 2d2581d is closer to what you had in mind. Thanks.

doc/modules/model_persistence.rst

NicolasHug · 2020-07-31T16:39:46Z

doc/modules/model_persistence.rst

+
+PMML is an extension of the `XML
+<https://fr.wikipedia.org/wiki/Extensible_Markup_Language>`_ document standard
+defined to represent data mining and models. Beeing human and machine readable,


Suggested change

-defined to represent data mining and models. Beeing human and machine readable,
+defined to represent data mining and models. Being human and machine readable,

Also:

to represent data mining and models

I'm not sure what "to represent data mining" means?

Is that more understandable now? Thanks.

doc/modules/model_persistence.rst

glemaitre · 2020-08-01T07:58:56Z

doc/modules/model_persistence.rst

+Interoperable formats
+---------------------
+
+For production and quality control needs, exporting the model in `Predictive


I think we should state explicitly that, once exported, the model can only be used for prediction and cannot be refitted.

I have added a note at the beginning of the section: is that ok with you?

Actually it might make more sense to move this note here as you do not have this limitation when using pickle (you can refit the model if you wish as all the hyperparams are shipped in the pickled model).

Sorry, I'm still puzzled about the need to retrain from a serialized model... using the original Python script seems a more transparent option to me, especially if you are looking for architecture or environment dependencies...
I've modified the note as you suggested... but I'm still not convinced it should move into the interoperable formats section...

doc/modules/model_persistence.rst

rth

Thanks @cmarmo, minor comments, otherwise LGTM.

doc/modules/model_persistence.rst

ogrisel · 2020-08-05T14:58:25Z

doc/modules/model_persistence.rst

+Interoperable formats
+---------------------
+
+For production and quality control needs, exporting the model in `Predictive


Actually it might make more sense to move this note here as you do not have this limitation when using pickle (you can refit the model if you wish as all the hyperparams are shipped in the pickled model).
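The claim that a pickled model keeps its hyperparameters and can be refitted can be illustrated with a minimal sketch. The estimator (`DecisionTreeClassifier`), the dataset (iris), and the variable names are assumptions chosen for illustration only; they do not come from the PR.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Assumed toy setup: a small tree fitted on the iris dataset.
X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Round-trip the fitted estimator through pickle (in memory for brevity).
restored = pickle.loads(pickle.dumps(clf))

# The hyperparameters travel with the pickled object...
params_kept = restored.get_params()["max_depth"] == 2
# ...and so does the fitted state, so predictions are unchanged.
same_predictions = bool((restored.predict(X) == clf.predict(X)).all())

# Unlike a prediction-only export, the restored object can be refitted,
# here on every other sample.
restored.fit(X[::2], y[::2])
refit_ok = restored.predict(X).shape == y.shape
```

This only shows the behaviour within a single environment; it says nothing about loading the pickle under a different scikit-learn version.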

ogrisel · 2020-08-05T15:04:38Z

doc/modules/model_persistence.rst

+Model Markup Language (PMML)
+<http://dmg.org/pmml/v4-4-1/GeneralStructure.html>`_ or `Open Neural Network
+Exchange <https://onnx.ai/>`_ format
+would be a better approach than using `pickle`.


Regarding the phrasing "For production and quality control needs... PMML / ONNX would be a better approach than using pickle": I think that most people who use scikit-learn models in production typically deploy pickled models in Docker containers. That is perfectly fine as long as you are aware of the limitations of the pickle format, namely that you must treat the pickled model as a piece of executable software (as is typically the rest of the contents of the Docker image) rather than a piece of data or self-describing source code. For organizations that are used to deploying Python and Docker in production, this is probably the best approach.

To me the pros of using interoperable formats are:

- they make it possible to deploy to platforms that do not have Python installed for any reason (e.g. Java-oriented sysadmin culture and JVM-centric production & diagnosis tools),
- they decouple the trained model from the specific runtime type and version used to train the model.

E.g. some organizations might decide to allow their data scientists to develop models using any technology (e.g. Python, R, Spark) but then only deploy PMML or ONNX models on the inference servers for operational reasons.
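The remark about treating a pickle as executable software rather than data can be seen with the standard library alone. The sketch below is an illustration, not part of the PR: the class name `NotJustData` is hypothetical, and the harmless built-in `sorted` stands in for whatever callable a pickle might name.

```python
import pickle


class NotJustData:
    """Hypothetical object whose pickle asks the loader to call a function."""

    def __reduce__(self):
        # On load, pickle calls sorted([3, 1, 2]) and returns its result.
        # Any importable callable could be named here, which is why a pickle
        # from an untrusted source must be handled like executable code.
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(NotJustData())

# Loading does not rebuild a NotJustData instance: it runs the callable.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The loaded value is whatever the named callable returned, which is the sense in which the file behaves like a program shipped inside the container image.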

I've added a line about pickle "containerisation" and reformulated a bit. To me, your two points are part of the definition of interoperability... should I go into this level of detail? Thanks.

rth

A few more comments, otherwise LGTM, thanks @cmarmo!

rth · 2020-08-06T10:16:07Z

doc/modules/model_persistence.rst

@@ -86,7 +87,40 @@ same range as before.
 Since a model internal representation may be different on two different
 architectures, dumping a model on one architecture and loading it on
 another architecture is not supported.


Actually I'm not sure this sentence is correct (see #17644 (comment)). As long as the Python version and the versions of dependencies are the same, it should work. When it doesn't, it could be a bug on our side, particularly given that pickle (+ docker) is still the primary method of deployment.

Also (sentence below) containers are architecture specific. They are a solution to reliable deployments and fixing the environment including the dependencies, but not to architecture portability.
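One way to make the "same Python version, same architecture" condition explicit is to record the environment next to the pickle and compare it before loading. The sketch below is only an assumption about how that could look: the helper names `current_env`, `dump_with_env`, and `load_checked` are hypothetical, a plain dict stands in for a fitted model, and a real deployment would also record the versions of scikit-learn and its dependencies.

```python
import json
import pickle
import platform
import sys


def current_env() -> dict:
    """Describe the running interpreter and machine architecture."""
    return {
        "python": list(sys.version_info[:2]),
        "machine": platform.machine(),
    }


def dump_with_env(obj) -> bytes:
    """Prefix the pickle with a one-line JSON header describing the environment."""
    header = json.dumps(current_env()).encode("utf-8")
    return header + b"\n" + pickle.dumps(obj)


def load_checked(payload: bytes):
    """Read the header first and unpickle only if the environment matches."""
    header, _, body = payload.partition(b"\n")
    recorded = json.loads(header)
    if recorded != current_env():
        raise RuntimeError(f"environment mismatch: {recorded} != {current_env()}")
    return pickle.loads(body)


# A plain dict stands in for a fitted model in this sketch.
model_stand_in = {"coef": [0.1, 0.2], "intercept": 0.5}
payload = dump_with_env(model_stand_in)
restored = load_checked(payload)
```

Keeping the header in JSON means the check happens before any pickle bytes are interpreted. A container image pins the same information implicitly, which is the role the comment above assigns to Docker.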

Actually I'm not sure this sentence is correct (see #17644 (comment)). As long as the Python version and the versions of dependencies are the same, it should work. When it doesn't, it could be a bug on our side, particularly given that pickle (+ docker) is still the primary method of deployment.

I've tried to soften the sentence, but the point here is more whether scikit-learn developers are willing to guarantee pickle portability than pickle portability itself.

Also (sentence below) containers are architecture specific. They are a solution to reliable deployments and fixing the environment including the dependencies, but not to architecture portability.

Again, I've tried to soften the sentence, but IMO any solution enforcing portability and interoperability is architecture dependent; the goal is to make this dependence invisible to the user. The "pickle + docker" option offers the user a way to run predictions with a model trained on a different architecture... obviously the producer of the model should provide the right "packaging" to the user.

cmarmo · 2020-08-26T10:32:36Z

@NicolasHug, towards #18257, I have removed the model persistence section from the tutorial. It was a duplicate of the same entry in the User Guide.

jnothman

Otherwise lgtm

doc/modules/model_persistence.rst

jnothman · 2020-09-23T12:54:31Z

Thanks @cmarmo!

cmarmo · 2020-09-23T13:34:18Z

Thank you @jnothman!

cmarmo added 3 commits July 23, 2020 11:02

Add model persistence in Getting Started.

d8c0ae6

Merge branch 'master' into doc_model_persistence

fdf0b43

Describe PMML and ONNX.

49ca688

NicolasHug reviewed Jul 31, 2020

cmarmo added 2 commits July 31, 2020 21:31

Make model persistence an independent chapter of the User Guide.

2d2581d

Clarifying.

cabbf02

glemaitre reviewed Aug 1, 2020

cmarmo added 2 commits August 1, 2020 23:00

Merge branch 'master' into doc_model_persistence

331ac00

Add a note about refitting.

acbf0ea

glemaitre reviewed Aug 3, 2020

cmarmo added 2 commits August 3, 2020 10:07

Sync with upstream.

d8b7f26

Reformulate recommendation.

bd9ef73

rth reviewed Aug 3, 2020

ogrisel reviewed Aug 5, 2020

cmarmo added 3 commits August 6, 2020 01:14

Merge branch 'master' into doc_model_persistence

1c2cbf0

Address comments.

3667b51

Address comments.

563fce0

rth reviewed Aug 6, 2020

cmarmo and others added 6 commits August 6, 2020 12:41

Update doc/modules/model_persistence.rst

2a224e0

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>

Merge branch 'master' into doc_model_persistence

709c502

Some clarifications.

8b2e393

Merge branch 'doc_model_persistence' of https://github.com/cmarmo/sci…

6d4ba59

…kit-learn into doc_model_persistence

Merge branch 'master' into doc_model_persistence

2c9dec7

Merge branch 'doc_model_persistence' of https://github.com/cmarmo/sci…

dc18f09

…kit-learn into doc_model_persistence

cmarmo added the Documentation label Aug 23, 2020

cmarmo added 2 commits August 26, 2020 12:26

Merge branch 'master' into doc_model_persistence

dad6ceb

Remove model persistence from tutorial.

2ae86d8

cmarmo and others added 2 commits September 1, 2020 14:19

Merge branch 'master' into doc_model_persistence

3aef493

Merge branch 'master' into doc_model_persistence

05abc39

cmarmo added 6 commits September 7, 2020 15:09

Merge branch 'master' into doc_model_persistence

ea07a01

Merge branch 'doc_model_persistence' of https://github.com/cmarmo/sci…

1854c82

…kit-learn into doc_model_persistence

Merge branch 'master' into doc_model_persistence

a3799a0

Merge branch 'master' into doc_model_persistence

e13adc1

Merge branch 'master' into doc_model_persistence

85b288b

Merge branch 'master' into doc_model_persistence

ccf6f59

jnothman reviewed Sep 21, 2020

cmarmo added 3 commits September 22, 2020 08:44

Merge branch 'master' into doc_model_persistence

7b7ff28

Address comments.

653efb5

Merge branch 'master' into doc_model_persistence

853577f

jnothman approved these changes Sep 23, 2020

jnothman merged commit 54ce422 into scikit-learn:master Sep 23, 2020

cmarmo deleted the doc_model_persistence branch September 23, 2020 13:04

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

DOC revamp model persistence documentation (scikit-learn#18046)

8e9ce3a

Co-authored-by: Roman Yurchak <rth.yurchak@gmail.com>
