MAINT More test runtime optimizations #14136
Conversation
A few explanations below.
The methodology was to measure runtimes with `pytest <module> --durations=10` and modify the slowest tests. All the changes here are those that had a measurable impact on runtime.
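As a minimal sketch of what the profiling step does (the test functions and timings here are purely illustrative): time each test and report the slowest ones, similar to what pytest's `--durations=10` option prints.

```python
# Illustrative sketch of a --durations-style report: time each test and
# list the slowest first. The tests and sleep duration are made up.
import time

def fast_test():
    pass

def slow_test():
    time.sleep(0.05)

durations = []
for test in (fast_test, slow_test):
    start = time.perf_counter()
    test()
    durations.append((time.perf_counter() - start, test.__name__))

# slowest first, as in pytest's --durations report
for elapsed, name in sorted(durations, reverse=True):
    print(f"{elapsed:.3f}s {name}")
```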
@@ -605,7 +603,7 @@ def test_kl_divergence_not_nan(method):
     X = random_state.randn(50, 2)
     tsne = TSNE(n_components=2, perplexity=2, learning_rate=100.0,
-                random_state=0, method=method, verbose=0, n_iter=1003)
+                random_state=0, method=method, verbose=0, n_iter=503)
We just want to be sure that `tsne.kl_divergence_` is computed when `n_iter % n_iter_check != 0`; cf. the comment in the code above.
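To illustrate the point, a quick check (the 50-iteration check interval `N_ITER_CHECK` is an assumption about t-SNE's internals, not taken from this diff): neither the old nor the new iteration count is a multiple of the check interval, so the shortened run still exercises the final-assignment code path.

```python
# Hypothetical check interval: the optimizer only records the error every
# N_ITER_CHECK iterations, plus once at the very end of the run.
N_ITER_CHECK = 50

# Both the old (1003) and new (503) values leave a nonzero remainder, so
# kl_divergence_ must be filled in by the final assignment rather than by
# the periodic bookkeeping -- which is exactly what the test verifies.
for n_iter in (1003, 503):
    assert n_iter % N_ITER_CHECK != 0
```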
@@ -678,7 +678,7 @@ def test_matthews_corrcoef_multiclass():
     assert_almost_equal(mcc, 0.)


-@pytest.mark.parametrize('n_points', [100, 10000, 1000000])
+@pytest.mark.parametrize('n_points', [100, 10000])
The last case was taking 6s on my laptop, which is too much for a test, and doesn't seem to add much compared to the other two.
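As a back-of-the-envelope illustration (the linear per-point cost is an assumption, scaled so the largest case takes about 6 s as reported), the biggest parametrized case dominates the total runtime, so dropping it yields almost the entire saving:

```python
# Illustrative only: assume roughly linear cost per point, scaled so the
# 1_000_000-point case takes ~6 s, matching the reviewer's measurement.
sizes = [100, 10_000, 1_000_000]
cost_per_point = 6e-6  # seconds per point, an assumption for illustration
cost = {n: n * cost_per_point for n in sizes}

total = sum(cost.values())
# under this model, the removed case accounts for ~99% of the runtime
assert cost[1_000_000] / total > 0.98
```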
for iid in (True, False):
    for refit in (True, False):
        random_searches = []
        for scoring in (('accuracy', 'recall'), 'accuracy', 'recall'):
            # If True, for multi-metric pass refit='accuracy'
            if refit:
                probability = True
`predict_proba` is only used below when `refit=True`.
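A minimal sketch of the pattern (the helper name and parameter dict are hypothetical, not the test's actual code): expensive probability calibration on the estimator is only enabled when the refit path will actually call `predict_proba`.

```python
# Hypothetical helper mirroring the optimization: only pay for probability
# calibration when predict_proba will actually be called afterwards.
def estimator_params(refit):
    params = {"kernel": "linear"}
    if refit:
        # predict_proba is exercised only on the refit code path
        params["probability"] = True
    return params

assert "probability" not in estimator_params(refit=False)
assert estimator_params(refit=True)["probability"] is True
```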
@@ -812,20 +812,21 @@ def split(self, X, y=None, groups=None):
                        'not match total number of classes (3). '
                        'Results may not be appropriate for your use case.')
     assert_warns_message(RuntimeWarning, warning_message,
-                         cross_val_predict, LogisticRegression(),
+                         cross_val_predict,
+                         LogisticRegression(solver="liblinear"),
liblinear converges faster than lbfgs here
LGTM
LGTM, let's see if it breaks the cron jobs :P
* Feature extraction / feature selection
* Metrics, manifold, impute, GP optimization
* Optimize mixture
* Optimize model_selection
* Fix tests
* Lint
Optimizes the test runtime of a few more modules.
For the changed modules, this PR reduces the runtime by 40% (or 30s),