What is preventing sklearn from achieving true model persistence? #30609
-
Basically, it is more of a maintenance burden: with the team, we estimate that we could not maintain it. However, we had a recent discussion in which we think we could have a trimmed inference estimator for each estimator, reducing the impact of private changes when updating scikit-learn versions in this setting. Basically, it would make life easier for downstream packages, as it would be possible to work on persistence against such a trimmed estimator.
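As a purely illustrative sketch (this is not an existing scikit-learn API), a "trimmed inference estimator" could be a small object that keeps only the fitted state needed for `predict()`, decoupled from the training-time class and its private attributes. The class and method names below are hypothetical:

```python
# Hypothetical sketch, not an existing scikit-learn API: an inference-only
# object that keeps just the fitted state needed for predict(), so that
# private training-time attributes never need to be persisted.
from dataclasses import dataclass

import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


@dataclass
class LinearClassifierInference:
    """Trimmed inference counterpart of a fitted linear classifier."""

    coef_: np.ndarray
    intercept_: np.ndarray
    classes_: np.ndarray

    @classmethod
    def from_fitted(cls, model):
        # Copy only the public fitted state out of the full estimator.
        return cls(model.coef_.copy(), model.intercept_.copy(), model.classes_.copy())

    def predict(self, X):
        scores = np.asarray(X) @ self.coef_.T + self.intercept_
        if scores.shape[1] == 1:
            # Binary case: a single column of decision scores.
            indices = (scores.ravel() > 0).astype(int)
        else:
            # Multiclass: pick the class with the highest score.
            indices = scores.argmax(axis=1)
        return self.classes_[indices]


X, y = load_iris(return_X_y=True)
full = LogisticRegression(max_iter=1000).fit(X, y)
slim = LinearClassifierInference.from_fitted(full)
assert np.array_equal(full.predict(X), slim.predict(X))
```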
-
I concur with you, Pierre-Bartet: it should be feasible to implement model persistence as a community effort. Issue #31143 is relevant for this discussion. There is no need to decide on a persistence format; the only requirement is that parameters/state can be retrieved from a model as either numpy or Python native data structures, and conversely, that a model can consume the same as input for initialisation.
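A minimal sketch of that idea, assuming one is willing to enumerate the relevant fitted attributes per estimator (here for `LogisticRegression`): the state is exported as plain numpy/Python data, and an equivalent estimator is rebuilt from it.

```python
# Minimal sketch of the comment above: export a fitted model's parameters and
# state as plain numpy/Python data, then rebuild an equivalent estimator.
# The list of fitted attributes is specific to LogisticRegression and would
# have to be defined per estimator in a real implementation.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Everything needed to reconstruct the model, as plain data structures.
state = {
    "params": model.get_params(),
    "attributes": {
        "classes_": model.classes_,
        "coef_": model.coef_,
        "intercept_": model.intercept_,
        "n_features_in_": model.n_features_in_,
    },
}

# Rebuild a fresh estimator from the exported state.
restored = LogisticRegression(**state["params"])
for name, value in state["attributes"].items():
    setattr(restored, name, value)

assert np.array_equal(model.predict(X), restored.predict(X))
```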
-
What is preventing `sklearn` from achieving true model persistence? For example, `model.dump(..)` + `LogisticRegression.load(...)`? All the existing solutions are brittle or force users to use exactly the same `sklearn` version for training and inference: https://scikit-learn.org/1.6/model_persistence.html

I understand that this is a deliberate choice because of the sklearn team's lack of resources, but offloading serialization logic to external libraries can only end up in a much worse maintenance, communication, and interdependence nightmare. For example, `sklearn-onnx` accesses private `sklearn` components to be able to serialize them (such as `PolynomialFeatures`'s `_min_degree`, or gradient boosting's `_predictors`). Covering all of the `sklearn` components would be a tremendous task, but it could be done step by step, and it is also somewhat parallelizable by assigning a few models to anyone who would be happy to help.
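For contrast, this is roughly what the joblib route documented on the linked model_persistence page looks like; the resulting artifact is tied to the scikit-learn version used at training time, which is exactly the coupling a `model.dump(..)` / `LogisticRegression.load(...)` pair would aim to remove:

```python
# For context: the route currently documented on the linked model_persistence
# page is pickle/joblib, which ties the saved artifact to the scikit-learn
# version (and often Python/numpy versions) used at training time.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Training side: serialize the whole Python object graph.
joblib.dump(model, "model.joblib")

# Inference side: must run a compatible scikit-learn version, otherwise
# loading may fail or silently misbehave.
restored = joblib.load("model.joblib")
print(restored.predict(X[:5]))
```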