DOC add FAQ entry for the many linear model classes #19861
Conversation
Thanks @lorentzenchr !
Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
Thanks @lorentzenchr, I think it's great that we start addressing this common question upfront.
However, the current version reads like a justification from a developer point of view, but not from a user perspective. Just by reading this, it seems that it would still be perfectly possible to unify all this into a single class.
Also I think we'd need to address another important oddity of the `linear_model` module: the fact that the SGD estimators even exist, instead of being just another solver option of the aforementioned classes.
@NicolasHug Thanks for having a look and for your feedback. I divided it into a maintainer and a user perspective. Is it better now?
I'm wondering: knowing what we know now, if we were to re-design the `linear_model` module, would we do it the same way?

For now I don't think the current text will make a reader say "Oh ok, that makes sense, I understand why everything is the way it is now." At least it doesn't have that effect on me (and probably not on them either :p). I wasn't around when these decisions were made, so I'm not able to provide any answer here. Perhaps @GaelVaroquaux @ogrisel @agramfort @amueller @jnothman can provide some insights?

Also, again, I think it's relevant to include the SGD estimators.
@NicolasHug breaking something vastly used to satisfy a feeling of aesthetics is probably not a good idea. Design is a difficult thing, and it is impossible to satisfy everybody (in particular on reddit).

People approach the library with a background knowledge, in terms of names that they know and think that they understand. Our design was driven to answer this approach. A fraction of people have good-enough training in statistics and ML to understand that l1-penalized squared loss is a lasso, and hinge or squared hinge are SVMs. We certainly do not speak well to these people, but other libraries do (for instance lightning, http://contrib.scikit-learn.org/lightning/ ). Separating the various classes also enables algorithmic optimizations, such as good choices for the solvers (arguably, we could maybe separate out the sparse logistic regression from the non-sparse one).

I wouldn't really revisit the separation of the various classes. It overall helps make the library approachable. I do think that better documenting the links is a good idea (thank you @lorentzenchr). Ideally, we should go through all the estimators to check the "see also" section of the docstring. We should also proofread our docs to check that the links are well specified.

Where I find that there is vast room for improvement is in our online solvers, SGDClassifier/SGDRegressor. Not only do they not map well to the user's vocabulary, but they are also certainly not "fire and forget". I, for one, would welcome work to add MiniBatch versions of some of the linear model classes, with carefully-tuned algorithmics so that they are easily useful (such as early stopping). If we do this, after a few releases, we could maybe consider retiring SGDClassifier/SGDRegressor.
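A minimal sketch of the loss/penalty correspondence mentioned above, with an illustrative dataset and `alpha`: `SGDRegressor` with an L1 penalty optimizes (approximately) the same L1-penalized squared loss as `Lasso`, only with a stochastic solver, so the coefficients roughly agree but are not identical.

```python
# Rough sketch: SGDRegressor(penalty="l1") targets (approximately) the same
# L1-penalized squared loss as Lasso, but with a stochastic gradient solver,
# so the coefficients only roughly agree.  Dataset and alpha are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, SGDRegressor

X, y = make_regression(n_samples=1000, n_features=5, noise=10.0, random_state=0)

lasso = Lasso(alpha=0.1).fit(X, y)                       # the "named" estimator
sgd = SGDRegressor(penalty="l1", alpha=0.1, max_iter=10000,
                   tol=1e-6, random_state=0).fit(X, y)   # same objective, SGD solver

print(np.round(lasso.coef_, 2))
print(np.round(sgd.coef_, 2))  # close, but not identical (different solver)
```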
Overall, I think that the explanations capture the big picture and are useful. Thank you!
I made a few minor comments, but to me, this is good to go.
doc/faq.rst
Outdated
Why are there so many different estimators for linear models?
--------------------------------------------------------------
Usually, there is one classifier and one regressor per algorithms, e.g.
You probably mean "model", rather than "algorithm". An algorithm is, e.g., the type of solver used.
How about "model type"?
For linear models, there are many estimator classes which are very close to
each other. Let us have a look at

- :class:`~linear_model.LinearRegression`, no penalty
Should we say here that it is there only for teaching purposes? IMHO, it should never be used, and I would love to retire this class (though I do know that not everybody agrees with this desire).
I think it is the wrong place to mention this. And I belong to the mentioned small group...
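As a small, purely illustrative aside on the "very close to each other" point in the quoted hunk: `Lasso` and `ElasticNet` with `l1_ratio=1.0` optimize the same objective with the same coordinate-descent solver, so their fits essentially coincide (the dataset and `alpha` below are made up for the sketch).

```python
# Illustration of how close these classes are: ElasticNet with l1_ratio=1.0
# optimizes the same objective as Lasso, so the fitted coefficients
# essentially coincide.  Dataset and alpha are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=0)

lasso = Lasso(alpha=0.5).fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X, y)

print(np.allclose(lasso.coef_, enet.coef_))  # True: same objective and solver
```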
For linear models, there are many estimator classes which are very close to
each other. Let us have a look at

- :class:`~linear_model.LinearRegression`, no penalty
- :class:`~linear_model.LinearRegression`, no penalty
- :class:`~linear_model.LinearRegression`, no penalty (for teaching purposes, prefer Ridge with a small penalty)
This can be read as "prefer Ridge with a small penalty for teaching purposes", so maybe:
- :class:`~linear_model.LinearRegression`, no penalty
- :class:`~linear_model.LinearRegression`, no penalty. This estimator exists mostly for teaching purpose. In general Ridge with a small penalty is preferable.
Again, I'm not sure that this is the right place.
I agree with @lorentzenchr. I think the message to convey boils down to a listing of the linear models for regression. We can make it explicit beforehand that we are not stating which models to use but rather making a catalogue of the models.
Now, considering that this is only an entry in the FAQ, I think it is even more relevant not to add usage information.
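A minimal sketch of the "Ridge with a small penalty" suggestion discussed above, assuming an arbitrary toy dataset: with a tiny `alpha`, `Ridge` reproduces the ordinary least-squares coefficients almost exactly while staying numerically better behaved on collinear data.

```python
# Minimal sketch of the suggestion: with a tiny penalty, Ridge gives
# essentially the ordinary least-squares solution.  Dataset and alpha
# are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1e-8).fit(X, y)

print(np.allclose(ols.coef_, ridge.coef_, atol=1e-4))  # True in this setup
```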
Not much to add except this minor suggestion.
doc/faq.rst
Outdated
**User perspective:**
If all the 4 above mentioned linear models were unified into a single class,
there would be parameters with a lot of options like the ``solver`` parameter.
Another argument is that it would be harder to discover and certainly less straightforward to teach.
Without a `LogisticRegression` class, I feel it would be harder for complete beginners to discover and use.
I added this point, but with ridge instead of logistic as ridge is already mentioned.
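To make the parameter-explosion argument from the quoted paragraph concrete, here is a purely hypothetical sketch; `UnifiedLinearRegressor` and its parameters do not exist in scikit-learn and are only meant to illustrate what a single merged class would have to expose.

```python
# Purely hypothetical -- this class does NOT exist in scikit-learn.  It only
# illustrates the parameter explosion a single unified class would need:
# every penalty supports a different subset of solvers, and several
# parameters would be meaningless for some penalty choices.
class UnifiedLinearRegressor:
    def __init__(
        self,
        penalty="none",    # "none" | "l2" | "l1" | "elasticnet"
        alpha=1.0,         # ignored when penalty="none"
        l1_ratio=0.5,      # only meaningful when penalty="elasticnet"
        solver="auto",     # "svd", "cholesky", "lsqr", "sag", "cd", ...
    ):
        ...
```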
@GaelVaroquaux I never suggested to re-design the `linear_model` module.
@NicolasHug I added the SGD estimators.
I think that this is indeed an improvement. I added a small comment, but I would already be happy to see this in the documentation.
each other. Let us have a look at

- :class:`~linear_model.LinearRegression`, no penalty
- :class:`~linear_model.Ridge`, L2 penalty
Do you think that it would make sense to list `RidgeCV`, `LassoCV`, and `ElasticNetCV` (maybe together after SGD), mentioning that these are penalized models where the optimum penalty is found via CV?
I thought about that while I was writing this PR and decided against it. I'd say that they form their own group, meaning `LinearRegression`, `Ridge`, `Lasso` and `ElasticNet` all do one and the same thing, while `RidgeCV`, `LassoCV` and `ElasticNetCV` all do another.
Not a big deal on my side, we can keep it as it is.
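For reference, a small sketch of the two groups described in this thread, with illustrative values: the plain estimators take the penalty strength as a fixed hyperparameter, while the *CV variants select it by internal cross-validation.

```python
# Sketch of the two groups: the plain estimators take alpha as a fixed
# hyperparameter, while the *CV variants select it by cross-validation.
# Dataset and values are illustrative.
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoCV

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)

lasso = Lasso(alpha=0.5).fit(X, y)   # alpha chosen by the user
lasso_cv = LassoCV(cv=5).fit(X, y)   # alpha chosen by 5-fold cross-validation

print(lasso_cv.alpha_)               # the penalty selected by CV
```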
doc/faq.rst
Outdated
estimator classes for different penalties.

**User perspective:**
First, the current design makes it easy to find the models by their well known
I might be slightly more assertive, mentioning that these models are given specific names in the scientific literature. A design choice was to follow this trend, for better or worse depending on users' expectations.
Do you have a concrete proposal or suggestion for improvement?
Maybe:
First, the current design is inspired by the scientific literature where the linear models for regression settings with different regularization/penalty were given different names. Having different models might make it easier for users to discover the right regressor to choose.
I rephrased slightly.
Let's include @glemaitre's last suggestions (or not) and merge. I feel that the current state is already good and we are better off merging than waiting.
Thanks @lorentzenchr
Reference Issues/PRs
See discussion of the dev meeting https://github.com/scikit-learn/administrative/blob/master/meeting_notes/2021-03-29.md.
What does this implement/fix? Explain your changes.
This PR adds a FAQ entry that explains why there are so many linear model classes.