-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
MNT accelerate examples/kernel_approximation/plot_scalable_poly_kernels.py #22903
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MNT accelerate examples/kernel_approximation/plot_scalable_poly_kernels.py #22903
Conversation
Corrected a typo in the original file. It read ``featrues`` instead of ``features``
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…b.com:sveneschlbeck/scikit-learn into plot_scalable_poly_kernels_speedup
… Adjust spacing and fix a typo.
The changes you suggest are relevant and would be appriciated. Please do that in a separate PR since this one is focused on a specific issue. |
Weird, in the CI, the example took more than 1 min. https://183218-843222-gh.circle-artifacts.com/0/doc/auto_examples/kernel_approximation/plot_scalable_poly_kernels.html |
Ok, looks better, It was probably due to an issue in circleci. The example now takes 15sec instead of 36s on main. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @jsilke
ping @glemaitre, you reviewed the original PR |
# (`n_runs` = 1) in this example, in practice one should repeat the experiment several | ||
# times to compensate for the stochastic nature of :class:`PolynomialCountSketch`. | ||
|
||
n_runs = 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be nicer if you just remove the whole loop instead of running the loop once and just have the content of the loop in this example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a fair point and would make the code more concise; however, it does come at the cost of making repeat experimental runs (a practice encouraged in the documentation) non-trivial in the sense that it is less clear at first glance how to best repeat the experiment.
The user would now have to think about where to place the internal loop and adjust both the ps_lsvm_time
and ps_lsvm_score
for the number of runs which, while not terribly difficult, certainly requires more effort than simply altering the value of n_runs
.
If you believe this is a worthwhile tradeoff, I am happy to make the change you suggest here and update the PR. Otherwise, might I suggest keeping the loop to facilitate the user experience in this arguably minor way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that keeping it makes it easier for a user to run the example with several runs to see how it improves the results. I'd be happy to keep it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also fine keeping the for loop with the above comment.
Merging. Thanks for the PR. |
…ls.py (scikit-learn#22903) Co-authored-by: sveneschlbeck <sven.eschlbeck@t-online.de> Co-authored-by: Sven Eschlbeck <66701689+sveneschlbeck@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Reference Issues/PRs
Continues stalled PR #21731 working on issue #21598
What does this implement/fix? Explain your changes.
Adjusts examples/kernel_approximation/plot_scalable_poly_kernels.py as per the comments in the stalled PR:
N_COMPONENTS
is now in a more sensible locationn_runs
reduced from 3 to 1 to accelerate example run timen_runs
Any other comments?
Although this example is already formatted in notebook style, I believe its structure can be further improved in two ways. Appropriate headings could be added to each subsection, and import statements can be relocated to appropriate cells (indeed, line 143 includes an import despite all others being found in the first code cell). I would be happy to make these adjustments as well if anyone else thinks they would be worthwhile changes, although they are beyond the scope of this particular issue.
If there is interest in the changes I have suggested, would it be appropriate to make a separate issue to submit a PR for? The closest issue that I am presently aware of is #22406, however this example is not listed.