Skip to content

Conversation

jsilke
Copy link
Contributor

@jsilke jsilke commented Mar 19, 2022

Reference Issues/PRs

Continues stalled PR #21731 working on issue #21598

What does this implement/fix? Explain your changes.

Adjusts examples/kernel_approximation/plot_scalable_poly_kernels.py as per the comments in the stalled PR:

  • N_COMPONENTS is now in a more sensible location
  • n_runs reduced from 3 to 1 to accelerate example run time
  • documentation amended to reflect change to n_runs
  • minor typo and spacing adjustment

Any other comments?

Although this example is already formatted in notebook style, I believe its structure can be further improved in two ways. Appropriate headings could be added to each subsection, and import statements can be relocated to appropriate cells (indeed, line 143 includes an import despite all others being found in the first code cell). I would be happy to make these adjustments as well if anyone else thinks they would be worthwhile changes, although they are beyond the scope of this particular issue.

If there is interest in the changes I have suggested, would it be appropriate to make a separate issue to submit a PR for? The closest issue that I am presently aware of is #22406, however this example is not listed.

sveneschlbeck and others added 5 commits November 21, 2021 15:01
Corrected a typo in the original file. It read ``featrues`` instead of ``features``
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…b.com:sveneschlbeck/scikit-learn into plot_scalable_poly_kernels_speedup
@jsilke jsilke changed the title Plot scalable poly kernels speedup MNT accelerate examples/kernel_approximation/plot_scalable_poly_kernels.py Mar 19, 2022
@jeremiedbb
Copy link
Member

If there is interest in the changes I have suggested, would it be appropriate to make a separate issue to submit a PR for?

The changes you suggest are relevant and would be appriciated. Please do that in a separate PR since this one is focused on a specific issue.

@jeremiedbb
Copy link
Member

Weird, in the CI, the example took more than 1 min. https://183218-843222-gh.circle-artifacts.com/0/doc/auto_examples/kernel_approximation/plot_scalable_poly_kernels.html
Let me rerun that

@jeremiedbb
Copy link
Member

Ok, looks better, It was probably due to an issue in circleci. The example now takes 15sec instead of 36s on main.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @jsilke

@jeremiedbb
Copy link
Member

ping @glemaitre, you reviewed the original PR

@jeremiedbb jeremiedbb added the Quick Review For PRs that are quick to review label Mar 21, 2022
# (`n_runs` = 1) in this example, in practice one should repeat the experiment several
# times to compensate for the stochastic nature of :class:`PolynomialCountSketch`.

n_runs = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be nicer if you just remove the whole loop instead of running the loop once and just have the content of the loop in this example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair point and would make the code more concise; however, it does come at the cost of making repeat experimental runs (a practice encouraged in the documentation) non-trivial in the sense that it is less clear at first glance how to best repeat the experiment.

The user would now have to think about where to place the internal loop and adjust both the ps_lsvm_time and ps_lsvm_score for the number of runs which, while not terribly difficult, certainly requires more effort than simply altering the value of n_runs.

If you believe this is a worthwhile tradeoff, I am happy to make the change you suggest here and update the PR. Otherwise, might I suggest keeping the loop to facilitate the user experience in this arguably minor way?

Copy link
Member

@jeremiedbb jeremiedbb Mar 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that keeping it makes it easier for a user to run the example with several runs to see how it improves the results. I'd be happy to keep it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also fine keeping the for loop with the above comment.

@glemaitre glemaitre merged commit cbe5d4a into scikit-learn:main Mar 29, 2022
@glemaitre
Copy link
Member

Merging. Thanks for the PR.

@jsilke jsilke deleted the plot_scalable_poly_kernels_speedup branch March 30, 2022 14:42
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
…ls.py (scikit-learn#22903)

Co-authored-by: sveneschlbeck <sven.eschlbeck@t-online.de>
Co-authored-by: Sven Eschlbeck <66701689+sveneschlbeck@users.noreply.github.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Quick Review For PRs that are quick to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants