Run common checks on estimators with non default parameters #17441


Open
wants to merge 4 commits into
base: main

Conversation

rth
Member

@rth rth commented Jun 3, 2020

This adds a mechanism for running common checks on estimators with non-default parameters. It works as follows:

  1. Try to auto-determine valid values for the parameters of each estimator (e.g. solvers, losses, etc.) (Programmatically finding all supported solvers and losses for an estimator #14063). The parameters currently taken into account are:
    • categorical parameters: "solver", "algorithm", "loss", "strategy", "selection", "criterion", "multi_class", "kernel"
    • boolean parameters: "fit_prior", "fit_intercept", "positive", "normalize", "dual", "average", "shuffle".
      The detection of valid parameters for an estimator is a bit of a hack, but it works in most cases (with a few hard-coded special cases):
>>> from sklearn.tests.test_common_non_default import detect_all_params
>>> from sklearn.linear_model import LogisticRegression
>>> detect_all_params(LogisticRegression)
{'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga'],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'fit_intercept': [False, True],
 'dual': [False, True]}

The full list of detected parameters can be obtained with:

python sklearn/tests/test_common_non_default.py
ARDRegression
{'fit_intercept': [False, True], 'normalize': [False, True]}
AdaBoostClassifier
{'algorithm': ['SAMME', 'SAMME.R']}
AdaBoostRegressor
{'loss': ['exponential', 'linear', 'square']}
BayesianRidge
{'fit_intercept': [False, True], 'normalize': [False, True]}
BernoulliNB
{'fit_prior': [False, True]}
CategoricalNB
{'fit_prior': [False, True]}
ComplementNB
{'fit_prior': [False, True]}
DecisionTreeClassifier
{'criterion': ['gini', 'entropy']}
DecisionTreeRegressor
{'criterion': ['mse', 'friedman_mse', 'mae']}
DummyClassifier
{'strategy': ['constant', 'most_frequent', 'prior', 'stratified', 'uniform']}
DummyRegressor
{'strategy': ['constant', 'mean', 'median', 'quantile']}
ElasticNet
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True],
 'selection': ['cyclic', 'random']}
ElasticNetCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True],
 'selection': ['cyclic', 'random']}
ExtraTreeClassifier
{'criterion': ['gini', 'entropy']}
ExtraTreeRegressor
{'criterion': ['mse', 'friedman_mse', 'mae']}
ExtraTreesClassifier
{'criterion': ['gini', 'entropy']}
ExtraTreesRegressor
{'criterion': ['mse', 'friedman_mse', 'mae']}
GammaRegressor
{'fit_intercept': [False, True]}
GaussianProcessClassifier
{'multi_class': ['one_vs_one', 'one_vs_rest']}
GradientBoostingClassifier
{'criterion': ['gini', 'entropy'], 'loss': ['deviance', 'exponential']}
GradientBoostingRegressor
{'criterion': ['mse', 'friedman_mse', 'mae'],
 'loss': ['huber', 'lad', 'ls', 'quantile']}
HistGradientBoostingClassifier
{'loss': ['auto', 'binary_crossentropy', 'categorical_crossentropy']}
HistGradientBoostingRegressor
{'loss': ['least_absolute_deviation', 'least_squares', 'poisson']}
HuberRegressor
{'fit_intercept': [False, True]}
KNeighborsClassifier
{'algorithm': ['auto', 'ball_tree', 'brute', 'kd_tree']}
KNeighborsRegressor
{'algorithm': ['auto', 'ball_tree', 'brute', 'kd_tree']}
KernelRidge
{'kernel': ['precomputed']}
LabelPropagation
{'kernel': ['knn', 'rbf']}
LabelSpreading
{'kernel': ['knn', 'rbf']}
Lars
{'fit_intercept': [False, True], 'normalize': [False, True]}
LarsCV
{'fit_intercept': [False, True], 'normalize': [False, True]}
Lasso
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True],
 'selection': ['cyclic', 'random']}
LassoCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True],
 'selection': ['cyclic', 'random']}
LassoLars
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True]}
LassoLarsCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True]}
LassoLarsIC
{'criterion': ['aic', 'bic'],
 'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True]}
LinearDiscriminantAnalysis
{'solver': ['eigen', 'lsqr', 'svd']}
LinearRegression
{'fit_intercept': [False, True], 'normalize': [False, True]}
LinearSVC
{'dual': [False, True],
 'fit_intercept': [False, True],
 'loss': ['epsilon_insensitive', 'squared_epsilon_insensitive'],
 'multi_class': ['crammer_singer']}
LinearSVR
{'dual': [False, True],
 'fit_intercept': [False, True],
 'loss': ['epsilon_insensitive', 'squared_epsilon_insensitive']}
LogisticRegression
{'dual': [False, True],
 'fit_intercept': [False, True],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga']}
LogisticRegressionCV
{'dual': [False, True],
 'fit_intercept': [False, True],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga']}
MLPClassifier
{'shuffle': [False, True], 'solver': ['adam', 'lbfgs', 'sgd']}
MLPRegressor
{'shuffle': [False, True], 'solver': ['adam', 'lbfgs', 'sgd']}
MultiTaskElasticNet
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'selection': ['cyclic', 'random']}
MultiTaskElasticNetCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'selection': ['cyclic', 'random']}
MultiTaskLasso
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'selection': ['cyclic', 'random']}
MultiTaskLassoCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'selection': ['cyclic', 'random']}
MultinomialNB
{'fit_prior': [False, True]}
NuSVC
{'kernel': ['linear', 'poly', 'precomputed', 'rbf', 'sigmoid']}
NuSVR
{'kernel': ['linear', 'poly', 'precomputed', 'rbf', 'sigmoid']}
OrthogonalMatchingPursuit
{'fit_intercept': [False, True], 'normalize': [False, True]}
OrthogonalMatchingPursuitCV
{'fit_intercept': [False, True], 'normalize': [False, True]}
PLSCanonical
{'algorithm': ['nipals', 'svd']}
PassiveAggressiveClassifier
{'average': [False, True],
 'fit_intercept': [False, True],
 'loss': ['hinge', 'squared_hinge'],
 'shuffle': [False, True]}
PassiveAggressiveRegressor
{'average': [False, True],
 'fit_intercept': [False, True],
 'loss': ['epsilon_insensitive', 'squared_epsilon_insensitive'],
 'shuffle': [False, True]}
Perceptron
{'fit_intercept': [False, True], 'shuffle': [False, True]}
PoissonRegressor
{'fit_intercept': [False, True]}
RANSACRegressor
{'loss': ['absolute_loss', 'squared_loss']}
RadiusNeighborsClassifier
{'algorithm': ['auto', 'ball_tree', 'brute', 'kd_tree']}
RadiusNeighborsRegressor
{'algorithm': ['auto', 'ball_tree', 'brute', 'kd_tree']}
RandomForestClassifier
{'criterion': ['gini', 'entropy']}
RandomForestRegressor
{'criterion': ['mse', 'friedman_mse', 'mae']}
Ridge
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'solver': ['auto', 'cholesky', 'lsqr', 'sag', 'saga', 'sparse_cg', 'svd']}
RidgeCV
{'fit_intercept': [False, True], 'normalize': [False, True]}
RidgeClassifier
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'solver': ['auto', 'cholesky', 'lsqr', 'sag', 'saga', 'sparse_cg', 'svd']}
RidgeClassifierCV
{'fit_intercept': [False, True], 'normalize': [False, True]}
SGDClassifier
{'average': [False, True],
 'fit_intercept': [False, True],
 'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge', 'perceptron'],
 'shuffle': [False, True]}
SGDRegressor
{'average': [False, True],
 'fit_intercept': [False, True],
 'loss': ['squared_loss',
          'huber',
          'epsilon_insensitive',
          'squared_epsilon_insensitive'],
 'shuffle': [False, True]}
SVC
{'kernel': ['linear', 'poly', 'precomputed', 'rbf', 'sigmoid']}
SVR
{'kernel': ['linear', 'poly', 'precomputed', 'rbf', 'sigmoid']}
TheilSenRegressor
{'fit_intercept': [False, True]}
TweedieRegressor
{'fit_intercept': [False, True]}
This step can likely be improved further, but it's a start.
  2. Compute the Cartesian product of all valid parameter values for each estimator.
  3. Run the common estimator checks on the resulting estimators with non-default parameters.
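The Cartesian-product step can be sketched with `itertools.product` (scikit-learn's `ParameterGrid` implements the same expansion); the helper name below is illustrative, not this PR's actual API:

```python
from itertools import product


def expand_param_grid(valid_params):
    """Yield one dict per element of the Cartesian product of valid values."""
    names = sorted(valid_params)
    for combo in product(*(valid_params[name] for name in names)):
        yield dict(zip(names, combo))


grid = {"solver": ["lbfgs", "liblinear"], "fit_intercept": [False, True]}
combos = list(expand_param_grid(grid))
len(combos)  # 4
```

Each resulting dict can then be passed to `set_params` on a fresh estimator instance before running the checks.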

This generates 26424 additional tests (the scikit-learn test suite currently has 16052 tests), which run in 11 min. The run time can likely be improved by using pytest-xdist on multi-core CPUs.

I think passing these checks would both help detect new bugs and make our common checks more robust.

This is a somewhat orthogonal solution to #11324 although they both address the same problem.

There are 2858 errors out of 26424 tests. I haven't checked the errors in detail; some might be due to parameter detection failures (in particular, some detected parameter combinations might be incompatible). The logs (without tracebacks) can be found in pytest.log.

cc @NicolasHug @thomasjpfan @jeremiedbb @jnothman

TODO:

  • run these tests conditionally in CI (maybe in some cron job)
  • support other types of estimators besides classifiers and regressors

@rth
Member Author

rth commented Jun 4, 2020

@amueller Investigating some of these failures, and likely fixing the corresponding estimators (or common tests), could also be a nice project for MLH (or sprints).

To run the common checks for a particular estimator with all detected parameters:

pytest sklearn/tests/test_common_non_default.py -k LogisticRegression

For instance, solver='newton-cg' and solver='sag' fail a number of common checks here.
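A single non-default combination can also be checked by hand with the public `check_estimator` helper, independently of this PR's test file (a sketch; the failures mentioned above come from this PR's extended checks, so a released scikit-learn may well pass this):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# check_estimator runs the full battery of common checks on the given
# instance and raises an AssertionError on the first failing check.
est = LogisticRegression(solver="newton-cg")
check_estimator(est)
```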

@jnothman
Member

jnothman commented Jun 4, 2020 via email

@thomasjpfan
Member

This is really cool. If we want to run this in CI, we would need some way to skip some combinations because of invalid inputs. This can be involved for LogisticRegression (valid penalty x solver combinations).

@rth
Member Author

rth commented Jun 4, 2020

If we want to run this in CI, we would need some way to skip some combinations because of invalid inputs. This can be involved for LogisticRegression (valid penalty x solver combinations).

Yes, good point. I have now skipped the checks if Iris couldn't be fitted with a given combination of parameters, as that likely means the parameters are incompatible. Now there are 22554 tests and "only" 502 failures.
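A minimal sketch of that skip heuristic (the function name is made up): a parameter combination is treated as incompatible, and skipped, if the estimator cannot be fitted on Iris.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def can_fit_iris(estimator, params):
    """Return True if the estimator fits Iris with the given parameters."""
    X, y = load_iris(return_X_y=True)
    try:
        clone(estimator).set_params(**params).fit(X, y)
    except Exception:
        # Any fit failure is taken as "incompatible combination".
        return False
    return True


can_fit_iris(LogisticRegression(), {"solver": "lbfgs"})                   # True
can_fit_iris(LogisticRegression(), {"solver": "lbfgs", "penalty": "l1"})  # False
```

The heuristic is deliberately coarse: it can also skip combinations that merely fail on Iris for unrelated reasons.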

Though yes, trying meta-estimators with different types of
underlying estimator per #11324 would be especially/additionally valuable.

Indeed. Actually, so far this only runs tests for classifiers and regressors. I'll add other types of non-meta estimators shortly.

For meta-estimators, yeah, that would be interesting, but probably a whole new project, as we can't realistically run all estimators for each combination of parameters of each meta-estimator. That could actually be a good use case for hypothesis (#13846): randomly sampling the parameter space we want to check.
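As an illustration, random sampling of the detected parameter space could look like the stdlib sketch below (hypothesis would cover the space more systematically and shrink failures); all names here are made up:

```python
import random


def sample_param_combinations(valid_params, n_samples, seed=0):
    """Draw random parameter combinations instead of the full product."""
    rng = random.Random(seed)  # seeded for reproducible test runs
    return [
        {name: rng.choice(values) for name, values in valid_params.items()}
        for _ in range(n_samples)
    ]


grid = {"solver": ["lbfgs", "liblinear", "saga"], "fit_intercept": [False, True]}
samples = sample_param_combinations(grid, n_samples=10)
```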

Comment on lines 18 to 36
categorical_params = [
"solver",
"algorithm",
"loss",
"strategy",
"selection",
"criterion",
"multi_class",
"kernel",
]
bool_params = [
"fit_prior",
"fit_intercept",
"positive",
"normalize",
"dual",
"average",
"shuffle",
]
Member

how about using type hints and getting these from there?

Member Author

Yeah, we could probably try to auto-detect this in the future, but this PR already has a lot of moving parts, and I prefer to manually list the parameters to check for now.

Member

There are two aspects to my comment which I kind of failed to explain properly:

  • it's nice to get the info from type hints cause it's nice ;)
  • it allows better extending this functionality to third party estimators.

My personal motivation behind my comment was the second one. I'm happy to leave these hard-coded here instead of adding type hints to our estimators in this PR, but it would be really nice if this PR also introduced detecting the type from the type hints, unless you think it'd make this PR too big/complicated to review.

Member Author

I have looked into auto-detecting it, at least for booleans, but then you end up with a long list of parameters to exclude, e.g.

['verbose', 'copy', 'copy_X', 'warm_start', 'compute_score', 'compute_labels', 'validate', 'accept_sparse', 'check_inverse', 'copy_X_train', 'add_indicator', 'skip_complete', 'fit_path']

and I have only gone through estimators from A to D. Each of those shouldn't impact the model results but would add to the total runtime. So I think it's easier to whitelist parameter names than to blacklist them in this case. It could still be possible to use this in contrib projects, by passing the whitelist as an input parameter to the detect_all_params function (or just monkeypatching the categorical_params / bool_params lists for now).
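A sketch of the whitelist-as-parameter idea (the function, constant, and estimator names are hypothetical, not the PR's actual API):

```python
import inspect

# Default whitelist, mirroring the bool_params list from the PR's diff above.
DEFAULT_BOOL_PARAMS = ("fit_prior", "fit_intercept", "positive",
                       "normalize", "dual", "average", "shuffle")


def detect_bool_params(estimator_cls, bool_params=DEFAULT_BOOL_PARAMS):
    """Return the whitelisted boolean constructor parameters of an estimator."""
    sig = inspect.signature(estimator_cls.__init__)
    return {name: [False, True] for name in sig.parameters
            if name in bool_params}


class ThirdPartyEstimator:  # stand-in for a contrib-project estimator
    def __init__(self, fit_intercept=True, robust=False, verbose=0):
        pass


detect_bool_params(ThirdPartyEstimator, DEFAULT_BOOL_PARAMS + ("robust",))
# {'fit_intercept': [False, True], 'robust': [False, True]}
```

A contrib project would extend the whitelist it passes in, while `verbose`-style parameters stay excluded by default.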

@rth
Member Author

rth commented Jun 4, 2020

Added checks for clusterers and transformers; there are now 58.6k added tests, with:

1508 failed, 56931 passed, 2 skipped, 138 xfailed, 24 xpassed, 59857 warnings in 830.60s (0:13:50)

@jnothman
Member

jnothman commented Jun 5, 2020 via email

return False


def detect_valid_categorical_params(
Member

would it make sense to do this by introducing type hints for categorical params? Or by parsing the docstring?

Member Author

@rth rth Jun 5, 2020

For type annotations we would need PEP 586 literal types, and a backport for Python < 3.8. That's probably indeed the best solution long term.

I'm not very keen on parsing docstrings, as that can break easily and is less reliable than the current approach. For instance, for estimators taking metric names as input, not all valid names are always listed; sometimes there's just a link to the user guide.
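For reference, a sketch of what Literal-based detection could look like on Python 3.8+ (`typing_extensions.Literal` before that); the estimator class here is a dummy, not a scikit-learn API:

```python
from typing import Literal, get_args, get_type_hints


class DummyEstimator:
    def __init__(self, solver: Literal["lbfgs", "saga"] = "lbfgs",
                 fit_intercept: bool = True):
        self.solver = solver
        self.fit_intercept = fit_intercept


# Collect parameters whose annotation carries Literal arguments; plain
# annotations like `bool` have no args and are filtered out.
hints = get_type_hints(DummyEstimator.__init__)
valid = {name: list(get_args(tp)) for name, tp in hints.items() if get_args(tp)}
# valid == {'solver': ['lbfgs', 'saga']}
```

This would replace the name-based whitelists entirely for estimators that carry such annotations.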


FWIW the typing-extensions package is an official source of backports too.

And Hypothesis' introspection can inspect typing or typing_extensions.Literal[...] and automatically sample from the parameters 😁

Member

We are working toward having typing in #17799

Moving forward, we are going to run into invalid parameter combinations when sampling from the parameter space. Libraries such as ConfigSpace use a combination of conditional clauses and forbidden clauses to help define the relationships between parameters.

Member Author

@rth rth Jul 26, 2020

Thanks for the references! I think using hypothesis to sample this parameter space could be interesting in the long term, as it could also generalize to float parameters, and would help limit the runtime once the exhaustive combination of parameters becomes intractable.
