-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
DOC Merge plot_svm_margin.py and plot_separating_hyperplane.py into plot_svm_hyperplane_margin.py #31045
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SwathiR1999,
thank you for your patience and for your efforts to merge the examples. It looks very good and I like the design of the plots. The plots all have heading and legend and are easy on the eye.
I think we could add some information and I have left you some suggestions on that, too.
When I build the docs locally, I cannot click to open it and also there is some text overflow in the background. I believe this is because of a rending issue in the heading, and I have added a suggestions for a fix.
What is lacking is to remove the old examples and to add redirections from both old links to the new example link in the doc/conf.py
file. You can have a look at #30028 to see how to add the redirections.
You need to delete all links to the old examples as well (and replace it with the new one if suitable). There are some, probably too many, links to plot_separating_hyperplane_unbalanced
currently.
- **Small C (e.g., 0.05)**: | ||
- Allows some misclassifications, resulting in a wider margin. | ||
- **Moderate C (e.g., 1)**: | ||
- Balances classification accuracy and margin width. | ||
- **Large C (e.g., 1000)**: | ||
- Prioritizes classifying all points correctly, leading to a narrower margin. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think while reworking the example, we could add some more information.
Do you think you could add some suggestion of when to use a small, a moderate or a high C value in a practical use case?
We could also talk about overfitting and underfitting in this context.
Also, we could talk about miss-classification / error for different margins.
I think this could be added after the code for the plot using the # %
syntax to make this example more notebook-like. (Also have a look at #30028 for a good example on the notebook style.)
""" | ||
========================================================================= | ||
SVM: Effect of Regularization (C) on Maximum Margin Separating Hyperplane | ||
======================================================================== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
======================================================================== | |
========================================================================= |
The lacking "=" leads to the title not being rendered correctly. Adding "=" should fix it.
I also hope that this fixes the problem with the example not being clickable from the main examples page and the text overflow there (see other comment).
clf.fit(X, y) | ||
|
||
plt.subplot(1, 3, i) | ||
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired, edgecolors="k") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add axis labels. Something simple like "Feature 1" and "Feature 2" would be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resolved. (Just the "resolved" button doesn't show here.)
s=100, | ||
linewidth=1.5, | ||
facecolors="none", | ||
edgecolors="r", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about having a different color for miss-classified samples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is resolved. (Just the "resolved" button doesn't show here.)
…nto doc_addlink
Hi @StefanieSenger , thanks for the feedback!
Just to clarify - should I also apply similar changes (merge/update/remove references) to |
Oh, my bad. I didn't see there were |
…nto doc_addlink
- Modified \`doc/conf.py\` and \`doc/modules/svm.rst\` to update documentation and add example reference.\n- Removed outdated example: \`examples/svm/plot_svm_margin.py\`.
Hi @StefanieSenger, I’ve implemented your suggestions. I wanted to check with you on one small point: While It is currently referenced early in the SVM documentation, which aligns well with its simplicity. The new merged example, on the other hand, focuses more on the effect of Would it be alright to:
Looking forward to your thoughts! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @SwathiR1999,
thank you for your further work. I went through the example and left the next round of comments. (I might have more afterwards. 😬)
In general, please feel very free to add any information you think would help people who start learning on SVCs and regularization to understand. We're very happy to add new passages if they are helpful and fitting the context of the example.
On your question:
Retain plot_separating_hyperplane.py as an introductory visual example?
I can definitely see what you mean and sort of agree. But on the other hand, we have too many examples to take care of and plot_separating_hyperplane.py is pretty similar to the re-newed example you are working on. This new example would (with your help) ideally grow into a demonstration that also helps beginners.
Maybe @adrinjalali has a different opinion on it?
@@ -404,13 +404,14 @@ Tips on Practical Use | |||
|
|||
* **Setting C**: ``C`` is ``1`` by default and it's a reasonable default | |||
choice. If you have a lot of noisy observations you should decrease it: | |||
decreasing C corresponds to more regularization. | |||
decreasing C corresponds to more regularization (see example below). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decreasing C corresponds to more regularization (see example below). | |
decreasing C corresponds to more regularization. |
I think it's better to leave this unchanged. We add links in suitable spots, but these quasi-links just occupy the reader's attention without adding real value.
|
||
:class:`LinearSVC` and :class:`LinearSVR` are less sensitive to ``C`` when | ||
it becomes large, and prediction results stop improving after a certain | ||
threshold. Meanwhile, larger ``C`` values will take more time to train, | ||
sometimes up to 10 times longer, as shown in [#3]_. | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also leave this unchanged.
.. rubric:: Examples | ||
|
||
* :ref:`sphx_glr_auto_examples_svm_plot_svm_hyperplane_margin.py` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to add the reference to the example in the text instead, including a description of what the example is about.
@@ -632,7 +636,11 @@ indicates a perfect prediction. But problems are usually not always perfectly | |||
separable with a hyperplane, so we allow some samples to be at a distance :math:`\zeta_i` from | |||
their correct margin boundary. The penalty term `C` controls the strength of | |||
this penalty, and as a result, acts as an inverse regularization parameter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this penalty, and as a result, acts as an inverse regularization parameter | |
this penalty, and as a result, acts as an inverse regularization parameter: |
@@ -632,7 +636,11 @@ indicates a perfect prediction. But problems are usually not always perfectly | |||
separable with a hyperplane, so we allow some samples to be at a distance :math:`\zeta_i` from | |||
their correct margin boundary. The penalty term `C` controls the strength of | |||
this penalty, and as a result, acts as an inverse regularization parameter | |||
(see note below). | |||
(see the figure below). Also please refer to the note below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see the figure below). Also please refer to the note below. |
I would suggest to simply finish the last sentence with a colon and then show the plot directly.
|
||
# %% [markdown] | ||
# - **Small `C` (e.g., 0.01, 0.05)**: | ||
# - Use when: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Use when
are not rendered correctly.
plt.show() | ||
|
||
# %% [markdown] | ||
# - **Small `C` (e.g., 0.01, 0.05)**: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to have the representation of C
(like in: "Small `C`") the same as above, either with or without the backticks everywhere.
This script demonstrates the concept of maximum margin separating hyperplane | ||
in a two-class separable dataset using a Support Vector Machine (SVM) | ||
with a linear kernel and how different values of `C` influence margin width. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to add a short note on what a margin is to make this more beginner-friendly.
# Visualize | ||
plt.figure(figsize=(12, 4)) | ||
for i, C_val in enumerate(C_values, 1): | ||
clf = svm.SVC(kernel="linear", C=C_val) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add a random state here, so this example looks the same with every run.
X[misclassified, 0], | ||
X[misclassified, 1], | ||
facecolors="none", | ||
edgecolors="k", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the same color (edgecolors="k") for correctly classified and for misclassified points, doesn't make them stand out as expected. Could you fix this please?
Reference Issues/PRs
References #30621
What does this implement/fix? Explain your changes.
plot_svm_margin.py
andplot_separating_hyperplane.py
) into a single fileplot_svm_hyperplane_margin.py
.Any other comments?
This PR addresses the feedback from #30975, which was closed.
@StefanieSenger Looking forward to your feedback. Thanks!