Skip to content

DOC improve docs in DecisionBoundaryDisplay and linear models #30729

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jan 30, 2025

Conversation

StefanieSenger
Copy link
Contributor

This PR proposes doc improvements in the docstring of DecisionBoundaryDisplay.from_estimator and in and linear_models.rst to facilitate readability.

Copy link

github-actions bot commented Jan 28, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2227ef9. Link to the linter CI: here

@@ -248,14 +248,12 @@ def from_estimator(
:func:`contour <matplotlib.pyplot.contour>`,
:func:`pcolormesh <matplotlib.pyplot.pcolormesh>`.

response_method : {'auto', 'predict_proba', 'decision_function', \
response_method : {'auto', 'decision_function', 'predict_proba', \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioning the options for response_method in the same order as they are checked through to reduce mental load for users.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix.

I find it weird that we don't attempt to use predict_proba first when available.

Maybe we could open a dedicated follow-up PR if people agree this would be a better default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that predict_proba is less custom and easier to interpret than decision_function, so yes if users use the default auto they should maybe better see the more general info. Also, it would be good to add a title to the plot informing the user which response_method was used.

Comment on lines 40 to 42
:class:`LinearRegression` will take in its ``fit`` method arguments ``X``, ``y``,
``sample_weight`` and will store the coefficients :math:`w` of the linear model in its
``coef_`` and ``intercept_`` attributes::
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioning intercept_ attribute to avoid misunderstanding as if LinearRegression fits intercept by default.

I also substituted arrays with arguments, since the input can be array-likes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also drop the "will" and instead use the present tense while we are at it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have made it present tense.

@StefanieSenger
Copy link
Contributor Author

StefanieSenger commented Jan 28, 2025

There seems to be a floating point rounding issue with some CIs expecting np.float64(1.1102230246251565e-16) and others np.float64(2.22044...e-16) (twice the first number). These are both close to 0 and I am trying to find out how to test just near closeness to zero.

Otherwise, we could change the doc example into

>>> np.isclose(reg.intercept_, 0)
True

maybe?

@StefanieSenger
Copy link
Contributor Author

I have solved it by rounding. Please let me know if I should do it another way.

Comment on lines 40 to 42
:class:`LinearRegression` will take in its ``fit`` method arguments ``X``, ``y``,
``sample_weight`` and will store the coefficients :math:`w` of the linear model in its
``coef_`` and ``intercept_`` attributes::
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also drop the "will" and instead use the present tense while we are at it.

@@ -248,14 +248,12 @@ def from_estimator(
:func:`contour <matplotlib.pyplot.contour>`,
:func:`pcolormesh <matplotlib.pyplot.pcolormesh>`.

response_method : {'auto', 'predict_proba', 'decision_function', \
response_method : {'auto', 'decision_function', 'predict_proba', \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this fix.

I find it weird that we don't attempt to use predict_proba first when available.

Maybe we could open a dedicated follow-up PR if people agree this would be a better default.

StefanieSenger and others added 2 commits January 30, 2025 10:00
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@betatim betatim merged commit a996f43 into scikit-learn:main Jan 30, 2025
31 checks passed
@betatim
Copy link
Member

betatim commented Jan 30, 2025

Merged because it looked good, not to end the discussion :)

@StefanieSenger StefanieSenger deleted the doc_improvements branch January 30, 2025 15:40
@DeaMariaLeon
Copy link
Contributor

Hi @StefanieSenger, I get this on my mac:

_______________________________ [doctest] linear_model.rst ________________________________
041 ``sample_weight`` and stores the coefficients :math:`w` of the linear model in its
042 ``coef_`` and ``intercept_`` attributes::
043 
044     >>> from sklearn import linear_model
045     >>> reg = linear_model.LinearRegression()
046     >>> reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2])
047     LinearRegression()
048     >>> reg.coef_
049     array([0.5, 0.5])
050     >>> reg.intercept_
Expected:
    0.0
Got:
    np.float64(4.440892098500626e-16)

/Users/dealeon/projects/scikit-learn/doc/modules/linear_model.rst:50: DocTestFailure
================================= short test summary info =================================
FAILED linear_model.rst::linear_model.rst

What do you think? It's not blocking me in any way, it's just FYI.. or if I should do something about it (open an issue or so). Thanks

@StefanieSenger
Copy link
Contributor Author

StefanieSenger commented Feb 4, 2025

Hi @DeaMariaLeon,

thanks for bringing this up. I also get this locally with python -m doctest doc/modules/linear_model.rst, see #30729 (comment).

I would think this might be due to differences between by local doctest setup and the CI doctest setup, but not sure to be honest.

A lot of other doctest things are also failling for me because of different rounding. It would be nice to learn how to mimic the CI's doctest locally, because it could save time in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants