
DOC Fix documentation of the base module #17548


Merged: 5 commits merged on Aug 21, 2020


alfaro96 (Member) commented Jun 9, 2020

Reference Issues/PRs

Related to #3791.

What does this implement/fix? Explain your changes.

This PR fixes the documentation of the `sklearn.base` module for consistency.

alfaro96 changed the title from "DOC Fix documentation of the base module" to "[WIP] DOC Fix documentation of the base module" on Jun 9, 2020
alfaro96 (Member, Author) commented Jun 9, 2020

I am wondering whether:

  • The shape of `row_ind` and `col_ind` may be specified in `get_indices` for `sklearn.base.BiclusterMixin`.

  • To remove the `score` method in `sklearn.base.DensityMixin`, since it is empty.

  • The data type of `X` (`np.int64` or `np.float64`) and `y` (`np.int64` in classification and `np.float64` in regression) can be specified in `score` methods, and whether to use `{array-like, sparse matrix, dataframe}` instead of `array-like`.

  • To link terms (e.g., _classifier_) to the glossary, since this may help users understand the base classes.
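To make the first point concrete, here is a minimal sketch of what `get_indices` and `get_shape` return on a fitted bicluster estimator. It assumes `SpectralCoclustering` as a representative `BiclusterMixin` subclass and a synthetic dataset from `make_biclusters`; the data parameters are arbitrary choices for illustration.

```python
from sklearn.base import BiclusterMixin
from sklearn.cluster import SpectralCoclustering
from sklearn.datasets import make_biclusters

# Synthetic data with planted biclusters (parameters are arbitrary).
data, rows, columns = make_biclusters(
    shape=(300, 300), n_clusters=5, noise=5, shuffle=False, random_state=0
)

model = SpectralCoclustering(n_clusters=5, random_state=0).fit(data)
print(isinstance(model, BiclusterMixin))

# get_indices(i) returns two 1-D integer arrays: the row indices and the
# column indices that belong to bicluster i.
row_ind, col_ind = model.get_indices(0)
print(row_ind.shape, col_ind.shape)

# get_shape(i) returns (n_rows, n_cols), the lengths of those two arrays.
n_rows, n_cols = model.get_shape(0)
print(n_rows, n_cols)
```

Since both arrays are always 1-D, with lengths given by `get_shape`, their shapes are well defined and could be stated in the docstring.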

alfaro96 changed the title from "[WIP] DOC Fix documentation of the base module" to "DOC Fix documentation of the base module" on Jun 9, 2020
adrinjalali (Member)

* The shape of `row_ind` and `col_ind` may be specified in `get_indices` for `sklearn.base.BiclusterMixin`.

If there are constraints in the code, then yes.

* To remove the `score` method in `sklearn.base.DensityMixin`, since it is empty.

Changing the interface of existing public classes is tricky, and we tend to avoid it.
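A minimal sketch of the behaviour behind the `DensityMixin` point, assuming a recent scikit-learn release: the mixin's own `score` has an empty body and so returns `None`, while concrete density estimators such as `GaussianMixture` (which, to our understanding, inherits `DensityMixin` through its base class) override it. The random data is arbitrary.

```python
import numpy as np
from sklearn.base import DensityMixin
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 2))  # arbitrary 2-D data

# DensityMixin.score has an empty body, so calling it returns None.
base_score = DensityMixin().score(X)
print(base_score)

# GaussianMixture is a DensityMixin subclass that overrides score to
# return the mean per-sample log-likelihood of X.
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
gmm_score = gmm.score(X)
print(isinstance(gmm, DensityMixin), gmm_score)
```

Removing the empty method would therefore only affect code that calls `score` on an object that does not override it, which is part of why changing the public interface is delicate.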

* The data type of `X` (`np.int64` or `np.float64`) and `y` (`np.int64` in classification and `np.float64` in regression) can be specified in `score` methods, and whether to use `{array-like, sparse matrix, dataframe}` instead of `array-like`.

If the input is converted to the desired type regardless of the input type, then the input's type doesn't matter.
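A small sketch of the input-conversion point, assuming `LogisticRegression` as a representative classifier. Its `score` comes from `ClassifierMixin` and computes mean accuracy, and the inputs are validated and converted internally, so the same values passed as a list, an `int64` array, or a `float64` array give the same score. The toy data is made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny toy classification problem given as plain Python lists.
X_list = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y_list = [0, 0, 0, 1, 1, 1]

clf = LogisticRegression().fit(X_list, y_list)

# Same values, three different container/dtype combinations.
score_list = clf.score(X_list, y_list)
score_float = clf.score(np.asarray(X_list, dtype=np.float64), np.asarray(y_list))
score_int = clf.score(np.asarray(X_list, dtype=np.int64), np.asarray(y_list))
print(score_list, score_float, score_int)
```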

* To link terms (e.g., _classifier_) to the glossary, since this may help users understand the base classes.

It does help in some cases, but please be mindful of the number of hyperlinks when you do it, since too many can make the doc less readable.

alfaro96 (Member, Author) commented Jun 10, 2020

Thanks for the review @adrinjalali!

I have applied the suggested changes, plus a few others that I think are helpful.

adrinjalali (Member) left a comment

LGTM otherwise.

Comment on lines +625 to +629
n_rows : int
    Number of rows in the bicluster.

n_cols : int
    Number of columns in the bicluster.
Member

I'm not sure this is less confusing for users than `shape : tuple`. WDYT @NicolasHug?

NicolasHug (Member) commented Jun 11, 2020

I think we generally separate the entries like that (NumPy does it too).
For 2-tuples, it's OK to merge them too, IMHO.

alfaro96 (Member, Author)

I decided to separate them into two entries because that is the approach used in the `get_indices` method. Nevertheless, either approach LGTM.

Which approach do you prefer?
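For reference, a sketch of the separated-entries style discussed in this thread, using a hypothetical helper `bicluster_shape` (not a scikit-learn function). It assumes boolean indicator arrays `rows` and `columns` of the kind stored in a bicluster estimator's `rows_` and `columns_`, and documents the return value as two numpydoc entries rather than a single shape tuple.

```python
import numpy as np


def bicluster_shape(rows, columns, i):
    """Shape of the i'th bicluster (hypothetical helper for illustration).

    Parameters
    ----------
    rows : ndarray of shape (n_biclusters, n_samples), dtype=bool
        Row membership indicators, one row per bicluster.

    columns : ndarray of shape (n_biclusters, n_features), dtype=bool
        Column membership indicators, one row per bicluster.

    i : int
        The index of the bicluster.

    Returns
    -------
    n_rows : int
        Number of rows in the bicluster.

    n_cols : int
        Number of columns in the bicluster.
    """
    # Indices are the nonzero positions of the indicator vectors, and the
    # shape is the number of such positions along each axis.
    row_ind = np.nonzero(rows[i])[0]
    col_ind = np.nonzero(columns[i])[0]
    return len(row_ind), len(col_ind)


rows = np.array([[True, True, False], [False, False, True]])
columns = np.array([[True, False], [False, True]])
n_rows, n_cols = bicluster_shape(rows, columns, 0)
print(n_rows, n_cols)  # 2 1
```

Callers unpack the result the same way under either docstring style; the separated entries simply give each returned value its own name and description.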

glemaitre (Member) left a comment

I just have a doubt regarding `BaseEstimator`. We have started to use "estimator instance" in other places. I really think we should add this rule to the contributing guidelines.

glemaitre (Member)

@adrinjalali, do you recall anything about this case?

adrinjalali (Member)

No, I don't. I myself would write "estimator instance" instead of "BaseEstimator" because I find it more intuitive for users, but I really don't mind either.

NicolasHug (Member)

+1 for "estimator instance" as well. The existence of BaseEstimator is an implementation detail for most users.

alfaro96 (Member, Author)

+1 for using "estimator instance". I think users usually know the concept of an estimator (and of an instance), but they are not concerned with implementation details (private API).

I will apply these changes!
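A short sketch of the agreed wording in practice, using a hypothetical helper `describe_estimator` (not part of scikit-learn). The parameter is documented as `estimator instance`, and the body relies only on public utilities (`sklearn.base.clone` and `sklearn.base.is_classifier`), so nothing in it depends on which base class the estimator inherits from.

```python
from sklearn.base import clone, is_classifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeRegressor


def describe_estimator(estimator):
    """Describe an unfitted copy of an estimator (hypothetical helper).

    Parameters
    ----------
    estimator : estimator instance
        Any scikit-learn compatible estimator. The wording describes what
        the user passes in, not the class it happens to inherit from.

    Returns
    -------
    description : str
        The estimator's class name and whether it is a classifier.
    """
    # clone builds a new, unfitted estimator with the same parameters.
    fresh = clone(estimator)
    kind = "classifier" if is_classifier(fresh) else "not a classifier"
    return f"{type(fresh).__name__} ({kind})"


print(describe_estimator(LogisticRegression()))     # LogisticRegression (classifier)
print(describe_estimator(DecisionTreeRegressor()))  # DecisionTreeRegressor (not a classifier)
```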

glemaitre self-assigned this on Aug 21, 2020
glemaitre merged commit 2fa4272 into scikit-learn:master on Aug 21, 2020
alfaro96 deleted the review_base_module branch on Aug 21, 2020, 21:45
jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>

4 participants