Run more examples that do not start with plot_ on CircleCI #8849

Open
lesteve opened this issue May 9, 2017 · 17 comments

Comments

@lesteve
Member

lesteve commented May 9, 2017

From #8847 (comment):

and we should have a CI test for non-plotted examples or convert as many as possible to plots

My proposal is to have a convention like run_ for examples that do not produce any plots. sphinx-gallery lets you specify a regex that selects which examples to run. It could be something like plot_|run_. See the doc for more details.
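As a rough sketch of the idea, the fragment below shows how such a regex could look in a Sphinx conf.py. `filename_pattern` is the sphinx-gallery setting that selects which files are executed; the `run_` prefix is only the convention proposed here, the two prefixed file names are hypothetical, and the `would_run` helper merely imitates the selection with `re.search` on POSIX-style paths (an assumption about how the pattern is applied).

```python
import re

# Sketch of a conf.py fragment: run files whose name starts with
# plot_ or with the proposed run_ prefix.
sphinx_gallery_conf = {
    "filename_pattern": r"/(plot_|run_)",
}


def would_run(path, pattern=sphinx_gallery_conf["filename_pattern"]):
    """Imitate the selection with a regex search on the file path."""
    return re.search(pattern, path) is not None


candidates = [
    "examples/cluster/plot_some_example.py",    # hypothetical file name
    "examples/text/run_some_text_example.py",   # hypothetical renamed file
    "examples/applications/svm_gui.py",         # no prefix: not executed
]
selected = [path for path in candidates if would_run(path)]
print(selected)
```

With a pattern like this, svm_gui.py stays out of the run simply because it carries neither prefix.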

I looked at the examples whose filename is not starting with plot_. Timings are in seconds and in increasing order.

examples/feature_selection/feature_selection_pipeline.py 1.39
examples/exercises/digits_classification_exercise.py 1.47
examples/applications/svm_gui.py 1.86
examples/missing_values.py 2.01
examples/model_selection/randomized_search.py 2.02
examples/feature_stacker.py 2.14
examples/text/document_clustering.py 3.21
examples/linear_model/lasso_dense_vs_sparse_data.py 3.98
examples/text/hashing_vs_dict_vectorizer.py 4.78
examples/model_selection/grid_search_digits.py 8.29
examples/text/document_classification_20newsgroups.py 8.93
examples/applications/topics_extraction_with_nmf_lda.py 10.53
examples/applications/face_recognition.py 25.02
examples/bicluster/bicluster_newsgroups.py 25.72
examples/hetero_feature_union.py 116.22
examples/applications/wikipedia_principal_eigenvector.py 139.77
examples/model_selection/grid_search_text_feature_extraction.py 156.86

With this in mind I would be in favour of running all the examples but svm_gui.py and the last three examples.

More details:
svm_gui.py pops up a GUI so it should probably not be run. Whether we should run wikipedia_principal_eigenvector.py and grid_search_text_feature_extraction.py, which each take more than 2 minutes, is up for debate. On top of that, some of them may download data to a location other than the typical ~/scikit_learn_data (e.g. the Wikipedia one). If that is the case, these examples would not benefit from the CircleCI cache.
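To illustrate the caching point, here is a minimal sketch of a download helper that stores files under the shared data directory instead of the current directory, so a CI cache of that directory would cover them. The helper names `cached_path` and `fetch_once` are hypothetical, and the download callable is a stand-in; the demo uses a temporary directory rather than the real ~/scikit_learn_data.

```python
import os
import tempfile
from pathlib import Path


def cached_path(filename, data_home=None):
    """Return where `filename` would live under the shared data directory."""
    if data_home is None:
        # SCIKIT_LEARN_DATA overrides the default location when set.
        data_home = os.environ.get("SCIKIT_LEARN_DATA", "~/scikit_learn_data")
    target_dir = Path(data_home).expanduser()
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / filename


def fetch_once(filename, download, data_home=None):
    """Call `download(path)` only when the file is not cached yet."""
    path = cached_path(filename, data_home)
    if not path.exists():
        download(path)
    return path


# Demo with a fake download function that records how often it is called.
calls = []


def fake_download(path):
    calls.append(path)
    path.write_text("payload")


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_once("dump.txt", fake_download, data_home=tmp)
    second = fetch_once("dump.txt", fake_download, data_home=tmp)

print(len(calls))
```

The second call finds the file already in place and skips the download, which is the behaviour a cached CI build would rely on.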

@jnothman
Member

jnothman commented May 9, 2017 via email

@GaelVaroquaux
Member

GaelVaroquaux commented May 10, 2017 via email

@jnothman
Member

jnothman commented May 10, 2017 via email

@GaelVaroquaux
Member

GaelVaroquaux commented May 10, 2017 via email

@jnothman
Member

jnothman commented May 10, 2017 via email

@lesteve
Member Author

lesteve commented May 10, 2017

Sorry I was not clearer: I had CircleCI in mind from the start, i.e. run more examples as part of the doc generation.

@lesteve lesteve changed the title Run more examples that do not start with plot_ Run more examples that do not start with plot_ on CircleCI May 10, 2017
@GaelVaroquaux
Member

GaelVaroquaux commented May 10, 2017 via email

@jnothman
Member

jnothman commented May 10, 2017 via email

@GaelVaroquaux
Member

GaelVaroquaux commented May 10, 2017 via email

@lesteve
Member Author

lesteve commented May 10, 2017

According to https://circleci.com/build-insights, the median build time of the full documentation on CircleCI is about 40 minutes. Adding all the examples but the last three (svm_gui should be skipped too) costs less than 2 minutes and is a no-brainer IMO. The simplest thing to do is to rename these examples by adding the plot_ prefix.

Running the three slow examples costs ~7 min. Even if it is deemed acceptable to run them, maybe we should leave it for a separate PR though. In particular I double-checked and the Wikipedia one downloads ~900MB into the current directory, so we would need to modify that to benefit from CircleCI caching.
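For reference, the two cost estimates can be reproduced from the timings listed in the top post. The snippet below is only a quick arithmetic check on those numbers (measured on one machine, not on CircleCI): it sets aside svm_gui.py and the three slowest examples, then sums each group.

```python
# Timings in seconds, copied from the top post.
timings = {
    "examples/feature_selection/feature_selection_pipeline.py": 1.39,
    "examples/exercises/digits_classification_exercise.py": 1.47,
    "examples/applications/svm_gui.py": 1.86,
    "examples/missing_values.py": 2.01,
    "examples/model_selection/randomized_search.py": 2.02,
    "examples/feature_stacker.py": 2.14,
    "examples/text/document_clustering.py": 3.21,
    "examples/linear_model/lasso_dense_vs_sparse_data.py": 3.98,
    "examples/text/hashing_vs_dict_vectorizer.py": 4.78,
    "examples/model_selection/grid_search_digits.py": 8.29,
    "examples/text/document_classification_20newsgroups.py": 8.93,
    "examples/applications/topics_extraction_with_nmf_lda.py": 10.53,
    "examples/applications/face_recognition.py": 25.02,
    "examples/bicluster/bicluster_newsgroups.py": 25.72,
    "examples/hetero_feature_union.py": 116.22,
    "examples/applications/wikipedia_principal_eigenvector.py": 139.77,
    "examples/model_selection/grid_search_text_feature_extraction.py": 156.86,
}

gui_examples = {"examples/applications/svm_gui.py"}

# Sort by duration and split off the three slowest examples.
by_time = sorted(timings.items(), key=lambda item: item[1])
slow = dict(by_time[-3:])
fast = {name: t for name, t in by_time[:-3] if name not in gui_examples}

fast_total = sum(fast.values())
slow_total = sum(slow.values())

print(f"{len(fast)} fast examples: {fast_total:.2f} s")
print(f"{len(slow)} slow examples: {slow_total / 60:.1f} min")
```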

@GaelVaroquaux
Member

GaelVaroquaux commented May 10, 2017 via email

@jnothman
Member

jnothman commented May 10, 2017 via email

@GaelVaroquaux
Member

Shouldn't this be closed?

@lesteve
Member Author

lesteve commented Jun 9, 2017

There are still more examples we can run:

  • wikipedia_principal_eigenvector.py and grid_search_text_feature_extraction.py as mentioned in Run more examples that do not start with plot_ on CircleCI #8849 (comment)
  • examples using sys.argv. There is a bug in sphinx-gallery where sys.argv holds the arguments of the sphinx-build command, which results in errors. I need to report that to sphinx-gallery, and I may have a quick fix that I can turn into a PR.
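To make the sys.argv problem concrete, here is a minimal sketch of one way a runner could shield an example from the arguments of the surrounding command: swap sys.argv for the duration of the run and restore it afterwards. This is not sphinx-gallery's actual code; the `patched_argv` helper, the example path, and the simulated sphinx-build arguments are all assumptions made for illustration.

```python
import contextlib
import sys


@contextlib.contextmanager
def patched_argv(script_path, extra_args=()):
    """Temporarily replace sys.argv, restoring the old value on exit."""
    saved = sys.argv
    sys.argv = [script_path, *extra_args]
    try:
        yield
    finally:
        sys.argv = saved


def demo():
    """Show what an example would see inside and outside the patch."""
    original = sys.argv
    # Simulate arguments left over from a documentation build command.
    sys.argv = ["sphinx-build", "-b", "html", "doc", "doc/_build/html"]
    try:
        with patched_argv("examples/some_example.py"):  # hypothetical path
            seen = list(sys.argv)
        after = list(sys.argv)
    finally:
        sys.argv = original
    return seen, after


seen_by_example, after_block = demo()
print(seen_by_example)
print(after_block)
```

Inside the block an option parser would only see the script name, so it would not choke on flags meant for the build command.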

@GaelVaroquaux
Member

I think that we should give up on running the examples with sys.argv. But the wikipedia one might be nice.

@cmarmo
Contributor

cmarmo commented Dec 16, 2021

@lesteve is there something still needed here, or can we close?

@lesteve
Member Author

lesteve commented Dec 17, 2021

There are still some examples that we don't run (i.e. that don't start with plot_):

❯ find examples -name '*.py' | grep -v plot_
examples/neighbors/approximate_nearest_neighbors.py
examples/applications/wikipedia_principal_eigenvector.py
examples/applications/svm_gui.py
examples/model_selection/grid_search_text_feature_extraction.py
  • examples/neighbors/approximate_nearest_neighbors.py is new (compared to ~4 years ago). I am guessing this is not run because it needs more dependencies, e.g. annoy and nmslib (available on conda-forge: python-annoy and nmslib). This was part of FEA Generalize the use of precomputed sparse distance matr… #10482 if more context is needed. This example takes ~3.5 minutes on my machine so maybe a bit too long to run in the CI ...
  • examples/applications/wikipedia_principal_eigenvector.py: needs a closer look at how much time it would take in the CI (the ~2 minute timing in the top post was very likely from a run on my machine) and whether we can afford it
  • examples/model_selection/grid_search_text_feature_extraction.py: needs a closer look at how much time it would take in the CI (the ~2 minute timing in the top post was from a run on my machine) and whether we can afford it
  • examples/applications/svm_gui.py: we should not run it since it opens a GUI as noted above
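The listing above can also be produced from Python, which makes it easy to reuse in a check. The sketch below tests the file name itself, whereas `grep -v plot_` drops any path that contains `plot_` anywhere; the `non_plot_examples` helper is hypothetical and the demo runs on a throwaway directory tree rather than a real checkout.

```python
import tempfile
from pathlib import Path


def non_plot_examples(root):
    """Return .py files under `root` whose file name lacks the plot_ prefix."""
    root = Path(root)
    return sorted(
        path.relative_to(root).as_posix()
        for path in root.rglob("*.py")
        if not path.name.startswith("plot_")
    )


# Demo on a temporary tree that mimics part of the examples folder.
with tempfile.TemporaryDirectory() as tmp:
    for rel in [
        "applications/svm_gui.py",
        "applications/plot_demo.py",  # hypothetical file name
        "neighbors/approximate_nearest_neighbors.py",
    ]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    found = non_plot_examples(tmp)

print(found)
```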

For more context why this matters (at least a little bit):

@cmarmo cmarmo added help wanted and removed Needs Decision - Close Requires decision for closing labels Dec 17, 2021