Skip to content

DOC Added references to plot_weighted_samples example in SVM documentation #30676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

simarssidhu
Copy link
Contributor

Changes

  • added reference links to 'plot_weighted_samples.py' example in the documentation for LinearSVC and SVC
  • found and fixed minor typo in doc, from comaprison to comparison

Purpose:

  • to make it easier for users to find examples relevant to SVMs

Related Issues:

Copy link

github-actions bot commented Jan 20, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e0d9e4c. Link to the linter CI: here

Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution, @simarssidhu!

I have left you a few comments. Otherwise it's fine.

@simarssidhu simarssidhu force-pushed the svm_plot_weighted_samples branch from 55520aa to 0f37ec7 Compare January 21, 2025 03:27
@simarssidhu
Copy link
Contributor Author

Hi @StefanieSenger,

Thanks for your comments! I've gone ahead and incorporated the suggested changes, and squashed my commits for your review. Please let me know if it looks good now.

Thank you! You've been a tremendous help with my first open source contribution! I'll try and tackle some more tomorrow

Best,
Simar :)

Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @simarssidhu, it looks great.

I am just wondering if instead of linking the example here, it would be more valuable to link it in BaseLibSVM.fit() when the sample_weight param is described.

Maybe @marenwestermann or @ArturoAmorQ want to have a look at this and continue with the merge of this PR?

(Note: please avoid to force push, because in more complex scenarios, PRs are much easier to review if we can see the history. The PRs are squashed before being merged into main anyways.)

@simarssidhu
Copy link
Contributor Author

Thanks @StefanieSenger,

I've updated the reference, and moved it to the BaseLibSVM.fit() function in _base.py, as suggested.

However, I think there might be an issue with my commit history. I've synced my forked origin with the scikit-learn upstream, pulled the changes locally, and committed my update (1188b15). Unfortunately, I noticed some "merge branch" commits (23faac1 and 5da3019) in the process. Normally, I would resolve this locally and force push to my fork to clean up the history. However, since we're avoiding force pushes, I was wondering if you could advise on the best way to proceed in this situation?

Thank you,
Simar

@simarssidhu
Copy link
Contributor Author

Update:

  • failed tests, perhaps due to reference not being under "Notes" section. Moved to "Notes" section, and am waiting for tests to clear

Copy link
Member

@ArturoAmorQ ArturoAmorQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @simarssidhu thanks for the PR. Here are a couple of comments.

Sorry to be a little bitter here, but it's been a while since I stated that this issue has led to a chaotic documentation. Your PR just showed me how much that is the case for svm.SVC. Currently there is Nested versus non-nested cross-validation linked from the main paragraph, Plot classification probability at the end of the Example section (section that by the way was originally reserved for snippets, i.e. code that can be easily copied and pasted as warm-start) and now your PR intends to add SVM: Weighted samples to the Notes section, which by the way is now known to be a very bad practice.

Comment on lines 846 to 847
For a comparison of the SVC with other classifiers see:
:ref:`sphx_glr_auto_examples_classification_plot_classification_probability.py`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! I just noticed there are a few places in the documentation where this particular typo appears. Probably result from copy-pasting from already merged pulled request. Would you feel like fixing those?

Being said that, I don't think the intention of the linked example was to compare SVC with other classifiers as stated here, but rather to showcase the use of DecisionBoundaryDisplay when plotting predict_proba in multiclass classification. I then suggest to completely remove this paragraph, as one day we may use a different set of estimators to serve the real purpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about it, there is no need to remove the paragraph. It can be rephrased to something more inline with:

Suggested change
For a comparison of the SVC with other classifiers see:
:ref:`sphx_glr_auto_examples_classification_plot_classification_probability.py`.
For an example on how to enable probability estimates for SVC, see:
:ref:`sphx_glr_auto_examples_classification_plot_classification_probability.py`.

@@ -183,6 +183,9 @@ def fit(self, X, y, sample_weight=None):

If X is a dense array, then the other methods will not support sparse
matrices as input.

For an example of handling weighed samples with :class:`SVC`s, please see:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rendering is currently broken, let's see if this fixes the issue:

Suggested change
For an example of handling weighed samples with :class:`SVC`s, please see:
For an example of handling weighed samples with :class:`svm.SVC`s, please see:

@StefanieSenger
Copy link
Contributor

StefanieSenger commented Jan 22, 2025

Sorry to be a little bitter here, but it's been a while since I stated that #26927 (comment). Your PR just showed me how much that is the case for svm.SVC. Currently there is Nested versus non-nested cross-validation linked from the main paragraph, Plot classification probability at the end of the Example section (section that by the way was originally reserved for snippets, i.e. code that can be easily copied and pasted as warm-start) and now your PR intends to add SVM: Weighted samples to the Notes section, which by the way is now known to be a very bad practice.

I see things a bit differently: I believe examples should be linked from where they are most useful and directly relevant to the context. This way they provide immediate value to users. Creating another condensed version of the Sphinx Gallery as just a collection of links might not serve the same purpose effectively.

The author of this PR has implemented just what issue #30621 was asking for. There is no gain in calling that into question now, unless you want to close that issue and open a new issue and take care of it.
What do you think @ArturoAmorQ, @adrinjalali ?

@StefanieSenger
Copy link
Contributor

However, I think there might be an issue with my commit history. I've synced my forked origin with the scikit-learn upstream, pulled the changes locally, and committed my update (1188b15). Unfortunately, I noticed some "merge branch" commits (23faac1 and 5da3019) in the process. Normally, I would resolve this locally and force push to my fork to clean up the history. However, since we're avoiding force pushes, I was wondering if you could advise on the best way to proceed in this situation?

No worries @simarssidhu, once you have started force pushing it might be difficult to get out of that. Just keep this in mind for future PRs.

@ArturoAmorQ
Copy link
Member

The author of this PR has implemented just what issue #30621 was asking for.

I am definitely not blaming the author of this PR. I want to clarify as well that #30621 asks for adding references to examples in the relevant sections, without clearly stating what those relevant sections are.

There is no gain in calling that into question now, unless you want to close that issue and open a new issue and take care of it.

I'm not questioning the issue, just sharing my opinion, which I hope you don't take personally. To be clear I am not against adding links to the examples, so no need to close the issue. I’ve simply been advocating for a consistent approach. In particular I personally would discourage:

  • adding links to the Example section, as it was originally reserved for code snippets;

  • adding links to a Notes section, as they are originally meant to provide information on specific behavior of the class/classmethod/method;

  • moreover, adding a link inside the Notes section inside the fit method makes it feel verbose, unexpected and a bit too nested. The effect could then be counterproductive, as it would be more easily overlooked at.

Instead, as suggested in my original comment, I suggest:

  • adding the links to the examples from the main paragraph, right before the Read more in the User Guide phrase.

  • for the sake of adding links to relevant places, I would agree that an example regarding the use of a classmethod (such as from_estimator which is really a thing on it's own) should be linked from it's own docstring. Happily, the Read more in the User Guide phrase is also present in the classmethod docstrings.

I see things a bit differently: I believe examples should be linked from where they are most useful and directly relevant to the context. This way they provide immediate value to users. Creating another condensed version of the Sphinx Gallery as just a collection of links might not serve the same purpose effectively.

I understand the perspective, and I agree they should be linked from where they are relevant and helpful. However, I don't know in which place they would be more useful, as there is a trade-off to consider: examples in the relevant places may be harder to find, and a more centralized and consistent location (like the main paragraph, where they would be the first thing to read) could make the documentation easier to navigate for users, at the risk of reducing relevancy.

In any case I insist that this is only my opinion, and as it has been disregarded before, I decided to step back from reviewing related PRs.

@StefanieSenger
Copy link
Contributor

In any case I insist that this is only my opinion, and as it has been disregarded before, I decided to step back from reviewing related PRs.

No hard feelings, @ArturoAmorQ, I understand that we have different opinions and I appreciate that you have shared yours. I totally respect your decision and won't ping you on related issues any more.

@StefanieSenger
Copy link
Contributor

The PR looks pretty good now, @simarssidhu!

Note that by linking to the example from BaseLibSVM, we are showing it in the docs for several other SVM based classifiers as well, which in my eyes is a gain.

Would you like to have a look, @adrinjalali?

@ArturoAmorQ
Copy link
Member

The PR looks pretty good now.

Putting my comment about placement aside, I still expect my other comments to be addressed. The render is still broken and I still think the wording "For a comparison of the SVC with other classifiers" does not reflect the real intention of the example.

Would you like to have a look, @adrinjalali?

I would rather ping somebody more neutral on the subject, such as @lucyleeow.

@simarssidhu
Copy link
Contributor Author

Hi @StefanieSenger and @ArturoAmorQ ,

Sorry, I've been travelling the past few days haha 😅 Thanks so much for reviewing my work! I've gone ahead and made the suggested changes, and believe it's sufficient according to the issue (#30621).

I'll address the merge conflict issues tonight. I believe all I have to do is rebase to the upstream/main branch?

Thanks,
Simar

@StefanieSenger
Copy link
Contributor

I'll address the merge conflict issues tonight. I believe all I have to do is rebase to the upstream/main branch?

It's better to merge the main branch into your branch directly. This way, you can resolve the merge conflicts by keeping both main's and your branche's changes.

@lucyleeow
Copy link
Member

Thanks for the discussion.

I have to admit that I have had similar thoughts to @ArturoAmorQ regarding this issue, but did not feel that it was necessary at the time to voice my opinion.

Since I'm being pinged I have to say that I don't love adding links to examples to the notes, examples and even parameter/attribute sections, unless there was a compelling reason. For similar reasons as @ArturoAmorQ .

In this case, there are quite a few SVC examples (though some are not very relevant, SVC is simply used in the example), why is this one in particular more relevant than others to add to the note section?

For this particular example, I would say adding a link to the sample_weight parameter section would be more relevant than the note section BUT I really don't like having example links in parameter sections. I feel like these should be concise, it should explain completely the parameters' functionality but no more.

What I think may be a better idea is to improve the examples list at the end of all API pages. E.g., the SVC one: https://scikit-learn.org/dev/modules/generated/sklearn.svm.SVC.html#gallery-examples includes many examples that only use SVC, and are are much less relevant than sphx_glr_auto_examples_svm_plot_weighted_samples.py.

I don't have a good solution to fixing this though; we could order them from most relevant to least (maybe something vaguely akin to the great recommender system you added to Sphinx-Gallery: sphinx-gallery/sphinx-gallery#1125), exclude some, or group them into 2 sections (examples completely about SVC , and examples using SVC ..?)

@simarssidhu
Copy link
Contributor Author

I'll address the merge conflict issues tonight. I believe all I have to do is rebase to the upstream/main branch?

It's better to merge the main branch into your branch directly. This way, you can resolve the merge conflicts by keeping both main's and your branche's changes.

Hi @StefanieSenger. perfect, thanks so much! That seemed to fix the issue :)

Hi @lucyleeow, thank you for your input! I must admit, it's a bit above my understanding at the moment, as this is my first commit to scikit-learn.

Hi everyone, do you think this problem should be addressed in a newer, separate issue? For this one, I believe I've focused on fulfilling the requirements outlined in #30621. In the meantime, I'll move onto another issue, until I gain more clarity on how to proceed here.

Thanks,
Simar

@StefanieSenger
Copy link
Contributor

Hi everyone, do you think this problem should be addressed in a newer, separate issue? For this one, I believe I've focused on fulfilling the requirements outlined in #30621. In the meantime, I'll move onto another issue, until I gain more clarity on how to proceed here.

Hi @simarssidhu, thanks for your further work and for your patience. I'm sorry this discussion was led on your PR; it should have been led on the issue instead and unfortunately, I didn't see it coming at all.

I hope you're not tired of contributing yet. There is another meta-issue you could try (#22827) for your first contribution, in case you wonder if weather to continue adding links to the examples or not.

I think it's best to close this issue, because lamentably maintainers disagree on it.

@StefanieSenger
Copy link
Contributor

StefanieSenger commented Jan 28, 2025

A new PR can potentially be opened for this task in the future.

I believe it is not the contributor's responsibility to propose further solutions until maintainers agree on a broader path forward. Let's await further guidance from maintainers and concentrate on other tasks in the meantime.

@simarssidhu
Copy link
Contributor Author

Thanks so much for you help @StefanieSenger! I'll move onto (#22827) for my first contribution :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants