Skip to content

[MRG] circle-ci should only run/build plot_* files #12797

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Dec 28, 2018

Conversation

adrinjalali
Copy link
Member

Discovered in #12791 that if an example using the TkAgg matplotlib bcakend is touched, the doc build fails complaining that the tk library is not present.

This PR is to test this hypothesis, and then to propose a fix to those ubuntu images.

@adrinjalali
Copy link
Member Author

It's not really that, I'm not sure if we should run this example at all. Ping @qinhanmin2014

@jnothman
Copy link
Member

jnothman commented Dec 18, 2018 via email

@adrinjalali adrinjalali changed the title [WIP] Fix the tk bacend issue in circle-ci [MRG] circle-ci should only run/build plot_* files Dec 18, 2018
@adrinjalali
Copy link
Member Author

Although, I really think we should run the examples even if they're not touched, since many changes in the codebase do affect the examples.

@adrinjalali
Copy link
Member Author

I guess this PR now "fixes" the issue which was introduced in #10407, but doesn't solve the larger issue of us missing the new problems introduced in examples due to some changes in the codebase. We have a few options to improve the situation:

  1. Run all examples on all PRs and error on new warnings
  • cons: takes 4 times longer for each circle-ci job to finish (which I think still wouldn't be much more than appveyor or travis times)!
  • pros: detects all potential issues in examples immediately, and the same PR would have to deal with it
  1. Keep the script as is for PRs, run all examples always on master and fail on warnings. This should be feasible once we merge [MRG] DOC Fix warnings in examples #12654 and fix the remaining issues.
  • cons: pretty frequently master will go red and mostly core-devs would have to deal with it.
  • pros: issues are detected immediately, and ci jobs in PRs won't take long.
  1. Keep the script as is, for master and the PRs, but run the docs in a daily cron job and error on warnings
  • cons: issues are detected by some of the core devs and they'll have to deal with it.
  • pros: master stays green, and PR CIs run fast
  1. Leave things as is, but we kinda try to remember to run the examples ourselves rather frequently locally and report or fix the issues.
  • cons: this is mostly what we have now, and it doesn't seem like warnings are kept at a low count.
  • pros: status quo on the CIs are kept, things run fast.

If all options are actually feasible, I personally prefer option 1; but I guess there are legitimate reasons for not running the examples all the time, and I have no idea about the capacity of circle-ci and if it can handle the load if we switch to 1.

@ogrisel
Copy link
Member

ogrisel commented Dec 21, 2018

Thanks for the summary @adrinjalali.

Maybe we can try to go with 1 and see if the impact on the CI duration is unbearable.

Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we can try running all plot_ examples if the runtime is manageable.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know how we did this before #10407 @adrinjalali I'm pretty sure we have a way to do this previously.

@@ -51,7 +51,7 @@ get_build_type() {
fi
git_range="origin/master...$CIRCLE_SHA1"
git fetch origin master >&2 || (echo QUICK BUILD: failed to get changed filenames for $git_range; return)
filenames=$(git diff --name-only $git_range)
filenames=$(git diff --name-only $git_range -- `find . -name plot_*`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the logs, maybe we should filter these files at changed_examples below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, didn't think it's important, but now I applied the same filter there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems inconsistent with our output

filenames=XXX
if [ -z "$filenames" ]
    echo QUICK BUILD: no changed filenames for $git_range
    return
changed_examples=XXX
if [[ -n "$changed_examples" ]]
    echo BUILD: detected examples/ filename modified in $git_range: $changed_examples
    return
echo QUICK BUILD: no examples/ filename modified in $git_range:

so seems that filenames include all the files, not only examples.

@adrinjalali
Copy link
Member Author

Do you know how we did this before #10407 @adrinjalali I'm pretty sure we have a way to do this previously.

I haven't checked yet. In the meantime, I'll let this PR be as is. We can either have it as a temporary solution, or wait until the other solution is ready.

@qinhanmin2014
Copy link
Member

qinhanmin2014 commented Dec 23, 2018

I haven't checked yet. In the meantime, I'll let this PR be as is. We can either have it as a temporary solution, or wait until the other solution is ready.

I don't think I'll approve if we're unable to figure out how we did this previously (i.e., how #10407 introduce the regression). Unfortunately I'm unable to figure this out.

@qinhanmin2014
Copy link
Member

ping @lesteve what's your opinion on this? How did we filter examples which do not begin with plot_ previously? Thanks.

@adrinjalali
Copy link
Member Author

@qinhanmin2014 I'm not sure what it is that you're not approving though.

@qinhanmin2014
Copy link
Member

(1) I think filenames should include all the files, not only examples. See https://github.com/scikit-learn/scikit-learn/pull/12797/files#r243750313
(2) I'm not familiar with our CI scripts so I'd like to know how did the regression happen (i.e., how did we filter examples which do not begin with plot_ previously). I think we need these insights to avoid introducing bugs.
I don't mind if someone else give +2 and merge this one.

@qinhanmin2014
Copy link
Member

And I've found #8849: Run more examples that do not start with plot_ on CircleCI :)

@adrinjalali
Copy link
Member Author

@qinhanmin2014 there are two situations where we build the docs. In some cases we build all examples (building the master or release branches for instance), and when we choose to run only the changed examples (on PRs for instance).

Before #10407 we were running all examples on all PRs, and #10407 was a proposal to only run changed files, and was accepted. The issue is that sphinx-gallery by default only chooses the files with a plot_ prefix in their name (unless something else specified in doc/conf.py) , and #10407 changed that behavior to only changed files. The issue was that "changed files" sometimes include files that don't start with plot_, and this PR fixes this issue.

Regarding the issue of including all example and documentation files, I changed the filter pattern to include all .rst files as well. But I don't think it's needed, really, not sure.

Now if at some point #8849 gets in, and we haven't switched to what we're proposing here (i.e. going back to always running all examples), it should also modify the filter introduced in this PR, to include also files starting with run_. But that PR itself is more about when we run all examples, and not when we select only the changed ones.

The issue you pointed out about affected_doc_paths only affects what we show in _changed.html, which is not crucial and we already do recognize that that file is not accurate. Run filtering the files for that file only introduces some extra false positives there.

This all assumes we keep the proposal of #10407, and the discussion here is about reverting that, and always returning BUILD in get_build_type no matter which files are touched (unless some other directive such as [doc skip] is present in the commit message).

I hope this clears up things a bit.

@amueller
Copy link
Member

Took me a while to get what's happening but #10407 overwrote sphinx_gallery_conf.filename_pattern which removed the plot_ filtering. It's a bit ugly that we reimplement that filtering here (in a slightly different way), but I think the false positives are probably negligible?

so +1 for this PR

@@ -57,7 +57,7 @@ get_build_type() {
echo QUICK BUILD: no changed filenames for $git_range
return
fi
changed_examples=$(echo "$filenames" | grep -e ^examples/)
changed_examples=$(echo "$filenames" | grep -E "^examples/(.*/)*plot_")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grep -e seems enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrinjalali Hmm, I tried locally and seems that grep -E won't work, I'm not familiar with -E.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the -E just gives special meaning to ( and ), works for me. For me it's equivalent to grep -e "^examples/\(.*/\)*plot_"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies I went through your code too quickly.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM if CIs are green

@qinhanmin2014 qinhanmin2014 merged commit 77b9b1f into scikit-learn:master Dec 28, 2018
@lesteve
Copy link
Member

lesteve commented Jan 2, 2019

A bit late to the party, but I agree with @amueller #12797 (comment). It is a bit unfortunate to have the plot_ regex in two places but seems the most reasonable thing to do.

A few comments:

  • running all the examples on each PR touching the examples folder was painful if memory serves me right. When someone works on an example you just want to quickly double-check the output on this example and not have to wait on all the examples to run.
  • because we run all the examples on master we catch problems very quickly after the PR is merged anyway. I think those are quite rare cases and I don't remember something like this happening since [MRG+1] CircleCI: run only modified examples in CircleCI #10407 was merged although I may have missed it if happened more recently.
  • you can always add a commit with [doc build] in the message to force the full rebuild if you want to double-check that all your examples run before merging.

adrinjalali added a commit to adrinjalali/scikit-learn that referenced this pull request Jan 7, 2019
@qinhanmin2014 qinhanmin2014 mentioned this pull request Feb 19, 2019
17 tasks
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Feb 20, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants