DOC add FAQ entry for the many linear model classes #19861

Merged (10 commits) on Apr 16, 2021

Conversation

lorentzenchr (Member)

Reference Issues/PRs

See discussion of the dev meeting https://github.com/scikit-learn/administrative/blob/master/meeting_notes/2021-03-29.md.

What does this implement/fix? Explain your changes.

This PR adds a FAQ entry that explains why there are so many linear model classes.

@cmarmo (Contributor) left a comment

Thanks @lorentzenchr !

Co-authored-by: Chiara Marmo <cmarmo@users.noreply.github.com>
@NicolasHug (Member) left a comment

Thanks @lorentzenchr , I think it's great that we start addressing this common question upfront.

However, the current version reads like a justification from a developer's point of view, not from a user's perspective. Just by reading it, it seems that it would still be perfectly possible to unify all of this into a single class.

Also I think we'd need to address another important oddity of the linear_model module: the fact that the SGD estimators even exist, instead of being just another solver option of the aforementioned classes.

@lorentzenchr (Member, Author)

@NicolasHug Thanks for having a look and your feedback. I divided it into maintainer and user perspective. Is it better now?

@NicolasHug (Member)

I'm wondering: knowing what we know now, if we were to re-design the linear_model module from scratch, would we make the same choices again? My impression is that we definitely would not. And I think that for the sake of transparency, it's important to address and acknowledge that in the FAQ entry. If mistakes were made (if any, I don't know), it's OK to say so. Design is hard, things happen for historical reasons, and it's worth explaining that if we're writing such an entry.

For now I don't think the current text will make a reader say "Oh ok, that makes sense, I understand why everything is the way it is now." At least it doesn't have that effect on me (and probably not on them either :p)

I wasn't around when these decisions were made, so I'm not able to provide any answer here. Perhaps @GaelVaroquaux @ogrisel @agramfort @amueller @jnothman can provide some insights?

Also, again, I think it's relevant to include the SGD estimators in the conversation, since they're pretty much outsiders within the linear_model module.

@GaelVaroquaux (Member)

@NicolasHug breaking something that is vastly used to satisfy a sense of aesthetics is probably not a good idea.

Design is a difficult thing, and it is impossible to satisfy everybody (in particular on reddit). People approach the library with a background knowledge, in terms of names that they know and think that they understand. Our design was driven to answer this approach. A fraction of people have good-enough training in statistics and ML to understand that l1-penalized squared loss is a lasso, and hinge or squared hinge are SVMs. We certainly do not speak well to these people, but other libraries do (for instance lightning, http://contrib.scikit-learn.org/lightning/ ).
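(Editor's note: the equivalence mentioned above, "l1-penalized squared loss is a lasso", can be made concrete. A minimal sketch, not part of the PR:)

```python
# Minimal sketch (not part of the PR): "l1-penalized squared loss" is
# exactly the Lasso; ElasticNet with l1_ratio=1.0 reduces to it.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

lasso = Lasso(alpha=0.5).fit(X, y)
enet = ElasticNet(alpha=0.5, l1_ratio=1.0).fit(X, y)

# Same objective solved by the same coordinate-descent code:
# the coefficients coincide up to numerical precision.
max_diff = np.max(np.abs(lasso.coef_ - enet.coef_))
```

The named class exists for discoverability, not because it fits a different model.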

Separating the various classes also enables algorithmic optimizations, such as good choices for the solvers (arguably, we could maybe separate out the sparse logistic regression from the non-sparse one).

I wouldn't really revisit the separation of the various classes. Overall, it helps make the library approachable. I do think that better documenting the links is a good idea (thank you @lorentzenchr). Ideally, we should go through all the estimators to check the "see also" section of the docstring. We should also proofread our docs to check that the links are well specified.

Where I see vast room for improvement is in our online solvers, SGDClassifier/SGDRegressor. Not only do they not map well to the user's vocabulary, they are also certainly not "fire and forget".

I, for one, would welcome work to add MiniBatch versions of some of the linear model classes, with carefully-tuned algorithmics so that they are easily useful (such as early stopping). If we do this, after a few releases, we could maybe consider retiring SGDClassifier/SGDRegressor.
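(Editor's note: to illustrate the "not fire and forget" point, here is a sketch, not from the PR, of the out-of-core workflow with `partial_fit` on synthetic data. Batch size, number of passes, and scaling are all illustrative choices left to the user.)

```python
# Sketch (not part of the PR) of the out-of-core workflow that makes
# SGDRegressor "not fire and forget": the user handles feature scaling,
# batching and the number of passes explicitly.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=1000, n_features=5, noise=1.0, random_state=0)

# SGD is sensitive to feature scale; standardizing is on the user.
X = StandardScaler().fit_transform(X)

sgd = SGDRegressor(penalty="l2", alpha=1e-4, random_state=0)
batch_size = 100
for epoch in range(5):                          # the user picks the passes...
    for start in range(0, len(X), batch_size):  # ...and the batching
        batch = slice(start, start + batch_size)
        sgd.partial_fit(X[batch], y[batch])

r2 = sgd.score(X, y)
```

A hypothetical MiniBatch class would fold the scaling sensitivity, stopping rule and batching into the estimator itself.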

@GaelVaroquaux (Member) left a comment

Overall, I think the explanations capture the big picture and are useful. Thank you!

I made a few minor comments, but to me, this is good to go.

doc/faq.rst Outdated

Why are there so many different estimators for linear models?
-------------------------------------------------------------
Usually, there is one classifier and one regressor per algorithm, e.g.
Member

You probably mean "model", rather than "algorithm". An algorithm eg is the type of solver used.

Member Author

How about "model type"?

For linear models, there are many estimator classes which are very close to
each other. Let us have a look at

- :class:`~linear_model.LinearRegression`, no penalty
Member

Should we say here that it is there only for teaching purpose. IMHO, it should never be used, and I would love to retire this class (though I do know that not everybody agrees with this desire).

Member Author

I think it is the wrong place to mention this. And I belong to the mentioned small group...

For linear models, there are many estimator classes which are very close to
each other. Let us have a look at

- :class:`~linear_model.LinearRegression`, no penalty
Member

Suggested change
- :class:`~linear_model.LinearRegression`, no penalty
- :class:`~linear_model.LinearRegression`, no penalty (for teaching purposes, prefer Ridge with a small penalty)

Member

This can be read as "prefer Ridge with a small penalty for teaching purposes", so maybe:

Suggested change
- :class:`~linear_model.LinearRegression`, no penalty
- :class:`~linear_model.LinearRegression`, no penalty. This estimator exists mostly for teaching purpose. In general Ridge with a small penalty is preferable.

Member Author

Again, I'm not sure that this is the right place.

Member

I agree with @lorentzenchr. I think the message to convey boils down to a listing of the linear models for regression. We can make it explicit beforehand that we are not stating which models to use but rather making a catalogue of the models.

Member

Now, considering that this is only an entry in the FAQ, I think it is even more relevant not to add usage information.

@agramfort (Member) left a comment

not much to add except this minor suggestion.

doc/faq.rst Outdated

**User perspective:**
If all four of the above-mentioned linear models were unified into a single class,
there would be parameters with a lot of options, like the ``solver`` parameter.
Member

Another argument is that it would be harder to discover and certainly less straightforward to teach.
Without a class LogisticRegression, I feel it would be harder for complete beginners to discover and use.

Member Author

I added this point, but with ridge instead of logistic as ridge is already mentioned.

@NicolasHug (Member) commented Apr 12, 2021

@NicolasHug breaking something that is vastly used to satisfy a sense of aesthetics is probably not a good idea.

@GaelVaroquaux I never suggested re-designing the linear_model module. I'm just suggesting to explain what led to the current state of things in the FAQ entry.

@lorentzenchr (Member, Author)

Also, again, I think it's relevant to include the SGD estimators in the conversation, since they're pretty much outsiders within the linear_model module.

@NicolasHug I added SGDRegressor.

@glemaitre (Member) left a comment

I think this is indeed an improvement. I added a small comment, but I would already be happy to see this in the documentation.

For linear models, there are many estimator classes which are very close to
each other. Let us have a look at

- :class:`~linear_model.LinearRegression`, no penalty
Member

I agree with @lorentzenchr. I think the message to convey boils down to a listing of the linear models for regression. We can make it explicit beforehand that we are not stating which models to use but rather making a catalogue of the models.

each other. Let us have a look at

- :class:`~linear_model.LinearRegression`, no penalty
- :class:`~linear_model.Ridge`, L2 penalty
Member

Do you think it would make sense to list RidgeCV, LassoCV, and ElasticNetCV (maybe together after SGD), mentioning that these are penalized models where the optimum penalty is found via CV?

Member Author

I thought about that while I was writing this PR and decided against it. I'd say that they form their own group: LinearRegression, Ridge, Lasso and ElasticNet all do one and the same thing, while RidgeCV, LassoCV and ElasticNetCV all do another.
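(Editor's note: the distinction between the two groups can be sketched. This is illustrative only, not from the PR; the alpha grid is an arbitrary assumption.)

```python
# Sketch (not part of the PR): Ridge fits for one fixed alpha, while
# RidgeCV additionally selects alpha by (efficient leave-one-out)
# cross-validation. Its stored model matches a plain Ridge refit at
# the selected alpha.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, RidgeCV

X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=0)

alphas = [0.01, 0.1, 1.0, 10.0]  # arbitrary illustrative grid
ridge_cv = RidgeCV(alphas=alphas).fit(X, y)

# The refit model at the selected alpha coincides with plain Ridge.
ridge = Ridge(alpha=ridge_cv.alpha_).fit(X, y)
max_diff = np.max(np.abs(ridge.coef_ - ridge_cv.coef_))
```

So the CV variants are "search plus fit" wrappers around the same underlying models, which supports treating them as a separate group.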

Member

Not a big deal on my side, we can keep it as it is.

doc/faq.rst Outdated
estimator classes for different penalties.

**User perspective:**
First, the current design makes it easy to find the models by their well known
Member

I might be slightly more assertive, mentioning that these models are given specific names in the scientific literature. A design choice was to follow this trend, for better or worse depending on users' expectations.

Member Author

Do you have a concrete proposal or suggestion for improvement?

Member

Maybe:

First, the current design is inspired by the scientific literature, where linear models for regression settings with different regularizations/penalties were given different names. Having different models might make it easier for users to discover the right regressor to choose.

Member Author

I rephrased slightly.

@GaelVaroquaux (Member)

Let's include @glemaitre 's last suggestions (or not) and merge. I feel that the current state is already good and we are better off merging than waiting.

@glemaitre glemaitre merged commit e1f879e into scikit-learn:main Apr 16, 2021
@glemaitre (Member)

Thanks @lorentzenchr

@lorentzenchr lorentzenchr deleted the faq_linear_models branch April 16, 2021 20:47