Run more examples that do not start with plot_ on CircleCI #8849
But all of that, if run in a separate Travis job, would be unlikely to exceed the
usual test suite.
On 9 May 2017 11:59 pm, "Loïc Estève" wrote:
From #8847 (comment):
> and we should have a CI test for non-plotted examples or convert as many
> as possible to plots
My proposal is to have a convention like run_ for examples that do not
produce any plots. sphinx-gallery lets you specify a regex for the
examples you want to run; it could be something like plot_|run_. See the
doc
<http://sphinx-gallery.readthedocs.io/en/latest/advanced_configuration.html?highlight=filename_pattern#building-examples-matching-a-pattern>
for more details.
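As a rough illustration of the proposed convention, here is what a conf.py entry might look like. It assumes, as the linked sphinx-gallery docs describe, that filename_pattern is a regex tested with re.search against each example's path; anchoring on the path separator keeps it from matching plot_ or run_ in the middle of a file name. The run_ prefix and the would_run helper are hypothetical and only there to check the pattern locally.

```python
import os
import re

# Hypothetical conf.py fragment: examples whose file name starts with
# plot_ or run_ are executed by sphinx-gallery.
# Assumption: filename_pattern is matched with re.search against the
# example's path, so we anchor on the path separator to match only the
# start of the file name.
sphinx_gallery_conf = {
    "examples_dirs": "../examples",
    "gallery_dirs": "auto_examples",
    "filename_pattern": re.escape(os.sep) + r"(plot|run)_",
}


def would_run(path):
    """Return True if the example at `path` matches filename_pattern."""
    return re.search(sphinx_gallery_conf["filename_pattern"], path) is not None
```

With this pattern, a file renamed to e.g. run_document_clustering.py (a hypothetical rename) would be executed, while document_clustering.py and svm_gui.py would not.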
I looked at the examples whose filename does not start with plot_.
Timings are in seconds, in increasing order.
examples/feature_selection/feature_selection_pipeline.py 1.39
examples/exercises/digits_classification_exercise.py 1.47
examples/applications/svm_gui.py 1.86
examples/missing_values.py 2.01
examples/model_selection/randomized_search.py 2.02
examples/feature_stacker.py 2.14
examples/text/document_clustering.py 3.21
examples/linear_model/lasso_dense_vs_sparse_data.py 3.98
examples/text/hashing_vs_dict_vectorizer.py 4.78
examples/model_selection/grid_search_digits.py 8.29
examples/text/document_classification_20newsgroups.py 8.93
examples/applications/topics_extraction_with_nmf_lda.py 10.53
examples/applications/face_recognition.py 25.02
examples/bicluster/bicluster_newsgroups.py 25.72
examples/hetero_feature_union.py 116.22
examples/applications/wikipedia_principal_eigenvector.py 139.77
examples/model_selection/grid_search_text_feature_extraction.py 156.86
With this in mind, I would be in favour of running all the examples except
svm_gui.py and the last three examples.
More details:
svm_gui.py pops up a GUI, so it should probably not be run. Whether we
should run wikipedia_principal_eigenvector.py and
grid_search_text_feature_extraction.py, which each take more than 2 minutes,
is up for debate. On top of that, some of them may require a data download
that does not use the usual ~/scikit_learn_data directory (e.g. the Wikipedia
one). If that is the case, these examples would not benefit from the
CircleCI cache.
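For a quick sanity check on the cost, the timings in the table can simply be summed. The sketch below copies them verbatim and splits them into the examples proposed for running and the excluded ones, taking "the last three examples" to mean the three slowest rows of the table.

```python
# Timings in seconds, copied from the table above.
timings = {
    "examples/feature_selection/feature_selection_pipeline.py": 1.39,
    "examples/exercises/digits_classification_exercise.py": 1.47,
    "examples/applications/svm_gui.py": 1.86,
    "examples/missing_values.py": 2.01,
    "examples/model_selection/randomized_search.py": 2.02,
    "examples/feature_stacker.py": 2.14,
    "examples/text/document_clustering.py": 3.21,
    "examples/linear_model/lasso_dense_vs_sparse_data.py": 3.98,
    "examples/text/hashing_vs_dict_vectorizer.py": 4.78,
    "examples/model_selection/grid_search_digits.py": 8.29,
    "examples/text/document_classification_20newsgroups.py": 8.93,
    "examples/applications/topics_extraction_with_nmf_lda.py": 10.53,
    "examples/applications/face_recognition.py": 25.02,
    "examples/bicluster/bicluster_newsgroups.py": 25.72,
    "examples/hetero_feature_union.py": 116.22,
    "examples/applications/wikipedia_principal_eigenvector.py": 139.77,
    "examples/model_selection/grid_search_text_feature_extraction.py": 156.86,
}

gui = {"examples/applications/svm_gui.py"}
# "The last three examples": the three slowest entries of the table.
slow = set(sorted(timings, key=timings.get)[-3:])

proposed = sum(t for path, t in timings.items() if path not in gui | slow)
slowest_three = sum(timings[path] for path in slow)
total = sum(timings.values())

print(f"proposed examples: {proposed:.0f} s")  # proposed examples: 99 s
print(f"three slowest: {slowest_three / 60:.1f} min")  # three slowest: 6.9 min
print(f"all examples: {total / 60:.1f} min")  # all examples: 8.6 min
```

So the proposed set adds well under two minutes of run time, while the three slow examples account for roughly seven.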
|
From #8847 (comment):
> and we should have a CI test for non-plotted examples or convert as
> many as possible to plots
Often, examples are not named "plot_*" because they take a long time to
run, or require a large download. Back when we created them, we considered
that we did not have enough horsepower on the CI to run them. Maybe we
should indeed reconsider this decision, but first we need to evaluate our
computing power on the CI.
|
If they were run as a separate Travis job (and perhaps only on master),
what's the concern?
(In reply to Gael Varoquaux, 10 May 2017 at 15:58.)
|
> If they were run as a separate Travis job (and perhaps only on master),
> what's the concern?
I think that some of them download a lot, and take a lot of time, but
this should be checked, as my memory may be wrong.
CircleCI might be a better place to run them, because there is more time
available, and because of the persistent cache.
|
@lesteve's timings above suggest that "a long time" is only relative to our
usual testing. It's still on the order of a couple of minutes, and is
unlikely to exceed Travis's allotted 120 minutes. But no problem using
CircleCI; I'm not familiar with the relative benefits of one over the other.
(In reply to Gael Varoquaux, 10 May 2017 at 16:14.)
|
Sorry I was not clearer: I had CircleCI in mind from the start, i.e. running more examples as part of the doc generation. |
Indeed, based on the numbers, I agree that:
- Some of the examples should be turned into plot_ examples. A plot is always nice.
- All the examples should be run on CircleCI.
|
I don't think we should run all examples on CircleCI for PRs. I want CircleCI
to render the docs quickly for review.
I suggested Travis because it's good at parallel jobs, and a dedicated
slow-tests job is easy to set up there.
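One hypothetical way to reconcile quick PR builds with running every example is to choose the pattern per build. The sketch assumes the doc build reads CircleCI's CIRCLE_BRANCH environment variable and runs every example (except svm_gui.py, which opens a GUI) only on master; the helper name and the exact policy are illustrative, not an agreed configuration.

```python
import os
import re

SEP = re.escape(os.sep)

# Pull requests: only plot_ examples, so the docs render quickly for review.
PR_PATTERN = SEP + r"plot_"
# master: every example except svm_gui.py (it opens a GUI window). The
# negative lookahead rejects the last path component if it is svm_gui.py,
# and [^SEP]+$ forces the match onto that last component.
FULL_PATTERN = SEP + r"(?!svm_gui\.py$)[^" + SEP + r"]+$"


def choose_filename_pattern(environ=os.environ):
    """Pick the sphinx-gallery filename_pattern for this build (hypothetical)."""
    # CIRCLE_BRANCH is set by CircleCI; treating master as the only branch
    # that runs everything is an assumption for illustration.
    if environ.get("CIRCLE_BRANCH") == "master":
        return FULL_PATTERN
    return PR_PATTERN


sphinx_gallery_conf = {
    "examples_dirs": "../examples",
    "gallery_dirs": "auto_examples",
    "filename_pattern": choose_filename_pattern(),
}
```

The branch check is only one possible switch; a dedicated slow job on Travis would move the same decision into the CI configuration instead.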
(In reply to Gael Varoquaux, 10 May 2017 4:58 pm.)
|
> I suggested Travis because it's good at parallel jobs and a dedicated slow
> tests job is easy there
I am just worried about data downloads. CircleCI has a persistent cache
that will be useful.
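A persistent cache only helps if downloads land in a directory that is cached between builds. A minimal sketch, assuming ~/scikit_learn_data is that directory: data_home below mirrors the default lookup of sklearn.datasets.get_data_home (the SCIKIT_LEARN_DATA environment variable, else ~/scikit_learn_data), and cached_download fetches a file only if it is not already there. Both function names are hypothetical; a real example would call get_data_home directly.

```python
import os
from urllib.request import urlretrieve


def data_home():
    """Return the data directory, mirroring get_data_home's default lookup.

    Uses $SCIKIT_LEARN_DATA if set, else ~/scikit_learn_data, and creates
    the directory if it does not exist yet.
    """
    path = os.environ.get("SCIKIT_LEARN_DATA", os.path.join("~", "scikit_learn_data"))
    path = os.path.expanduser(path)
    os.makedirs(path, exist_ok=True)
    return path


def cached_download(url, filename, download=urlretrieve):
    """Download `url` into the data directory unless it is already there.

    `download(url, target_path)` performs the transfer; it defaults to
    urllib's urlretrieve and can be swapped out, e.g. in tests.
    """
    target = os.path.join(data_home(), filename)
    if not os.path.exists(target):
        download(url, target)
    return target
```

On CircleCI, that directory would then also have to be listed in the cache configuration for the file to survive between builds.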
|
According to https://circleci.com/build-insights, the median build time of the full documentation on CircleCI is about 40 minutes. Adding all the examples but the last three (…). As for the three slow examples, they cost ~7 min. Even if it is deemed acceptable to run them, maybe we should leave that for a separate PR. In particular, I double-checked: the Wikipedia one downloads ~900 MB into the current directory, so we would need to modify it to benefit from CircleCI caching. |
+1 with your whole suggestion, @lesteve
|
okay
(In reply to Gael Varoquaux, 10 May 2017 at 19:12.)
|
Shouldn't this be closed? |
There are still more examples we can run:
|
I think we should give up on running the examples that use sys.argv. But the Wikipedia one might be nice to run. |
@lesteve, is there something still needed here, or can we close this? |
There are still some examples that we don't run (i.e. that don't start with
For more context on why this matters (at least a little bit):
|
From #8847 (comment):
> and we should have a CI test for non-plotted examples or convert as many as possible to plots
My proposal is to have a convention like run_ for examples that do not produce any plots. sphinx-gallery lets you specify a regex for the examples you want to run; it could be something like plot_|run_. See the doc for more details.
I looked at the examples whose filename does not start with plot_. Timings are in seconds, in increasing order.
With this in mind, I would be in favour of running all the examples except svm_gui.py and the last three examples.
More details: svm_gui.py pops up a GUI, so it should probably not be run. Whether we should run wikipedia_principal_eigenvector.py and grid_search_text_feature_extraction.py, which each take more than 2 minutes, is up for debate. On top of that, some of them may require a data download that does not use the usual ~/scikit_learn_data directory (e.g. the Wikipedia one). If that is the case, these examples would not benefit from the CircleCI cache.