Run common checks on estimators with non default parameters #17441


Open
wants to merge 4 commits into
base: main

Conversation

rth
Member

@rth rth commented Jun 3, 2020

This adds a mechanism for running common checks on estimators with non-default parameters. It works as follows:

  1. Try to auto-determine valid values for the parameters of each estimator (e.g. solvers, losses, etc.) (Programmatically finding all supported solvers and losses for an estimator #14063). The parameters currently taken into account are:
    • categorical parameters: "solver", "algorithm", "loss", "strategy", "selection", "criterion", "multi_class", "kernel"
    • boolean parameters: "fit_prior", "fit_intercept", "positive", "normalize", "dual", "average", "shuffle".
      The detection of valid parameters for an estimator is a bit of a hack, but it works in most cases (with a few hard-coded special cases):
>>> from sklearn.tests.test_common_non_default import detect_all_params
>>> from sklearn.linear_model import LogisticRegression
>>> detect_all_params(LogisticRegression)
{'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga'],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'fit_intercept': [False, True],
 'dual': [False, True]}

The full list of detected parameters can be obtained with:

python sklearn/tests/test_common_non_default.py
ARDRegression
{'fit_intercept': [False, True], 'normalize': [False, True]}
AdaBoostClassifier
{'algorithm': ['SAMME', 'SAMME.R']}
AdaBoostRegressor
{'loss': ['exponential', 'linear', 'square']}
BayesianRidge
{'fit_intercept': [False, True], 'normalize': [False, True]}
BernoulliNB
{'fit_prior': [False, True]}
CategoricalNB
{'fit_prior': [False, True]}
ComplementNB
{'fit_prior': [False, True]}
DecisionTreeClassifier
{'criterion': ['gini', 'entropy']}
DecisionTreeRegressor
{'criterion': ['mse', 'friedman_mse', 'mae']}
DummyClassifier
{'strategy': ['constant', 'most_frequent', 'prior', 'stratified', 'uniform']}
DummyRegressor
{'strategy': ['constant', 'mean', 'median', 'quantile']}
ElasticNet
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True],
 'selection': ['cyclic', 'random']}
ElasticNetCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True],
 'selection': ['cyclic', 'random']}
ExtraTreeClassifier
{'criterion': ['gini', 'entropy']}
ExtraTreeRegressor
{'criterion': ['mse', 'friedman_mse', 'mae']}
ExtraTreesClassifier
{'criterion': ['gini', 'entropy']}
ExtraTreesRegressor
{'criterion': ['mse', 'friedman_mse', 'mae']}
GammaRegressor
{'fit_intercept': [False, True]}
GaussianProcessClassifier
{'multi_class': ['one_vs_one', 'one_vs_rest']}
GradientBoostingClassifier
{'criterion': ['gini', 'entropy'], 'loss': ['deviance', 'exponential']}
GradientBoostingRegressor
{'criterion': ['mse', 'friedman_mse', 'mae'],
 'loss': ['huber', 'lad', 'ls', 'quantile']}
HistGradientBoostingClassifier
{'loss': ['auto', 'binary_crossentropy', 'categorical_crossentropy']}
HistGradientBoostingRegressor
{'loss': ['least_absolute_deviation', 'least_squares', 'poisson']}
HuberRegressor
{'fit_intercept': [False, True]}
KNeighborsClassifier
{'algorithm': ['auto', 'ball_tree', 'brute', 'kd_tree']}
KNeighborsRegressor
{'algorithm': ['auto', 'ball_tree', 'brute', 'kd_tree']}
KernelRidge
{'kernel': ['precomputed']}
LabelPropagation
{'kernel': ['knn', 'rbf']}
LabelSpreading
{'kernel': ['knn', 'rbf']}
Lars
{'fit_intercept': [False, True], 'normalize': [False, True]}
LarsCV
{'fit_intercept': [False, True], 'normalize': [False, True]}
Lasso
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True],
 'selection': ['cyclic', 'random']}
LassoCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True],
 'selection': ['cyclic', 'random']}
LassoLars
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True]}
LassoLarsCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True]}
LassoLarsIC
{'criterion': ['aic', 'bic'],
 'fit_intercept': [False, True],
 'normalize': [False, True],
 'positive': [False, True]}
LinearDiscriminantAnalysis
{'solver': ['eigen', 'lsqr', 'svd']}
LinearRegression
{'fit_intercept': [False, True], 'normalize': [False, True]}
LinearSVC
{'dual': [False, True],
 'fit_intercept': [False, True],
 'loss': ['epsilon_insensitive', 'squared_epsilon_insensitive'],
 'multi_class': ['crammer_singer']}
LinearSVR
{'dual': [False, True],
 'fit_intercept': [False, True],
 'loss': ['epsilon_insensitive', 'squared_epsilon_insensitive']}
LogisticRegression
{'dual': [False, True],
 'fit_intercept': [False, True],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga']}
LogisticRegressionCV
{'dual': [False, True],
 'fit_intercept': [False, True],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga']}
MLPClassifier
{'shuffle': [False, True], 'solver': ['adam', 'lbfgs', 'sgd']}
MLPRegressor
{'shuffle': [False, True], 'solver': ['adam', 'lbfgs', 'sgd']}
MultiTaskElasticNet
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'selection': ['cyclic', 'random']}
MultiTaskElasticNetCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'selection': ['cyclic', 'random']}
MultiTaskLasso
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'selection': ['cyclic', 'random']}
MultiTaskLassoCV
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'selection': ['cyclic', 'random']}
MultinomialNB
{'fit_prior': [False, True]}
NuSVC
{'kernel': ['linear', 'poly', 'precomputed', 'rbf', 'sigmoid']}
NuSVR
{'kernel': ['linear', 'poly', 'precomputed', 'rbf', 'sigmoid']}
OrthogonalMatchingPursuit
{'fit_intercept': [False, True], 'normalize': [False, True]}
OrthogonalMatchingPursuitCV
{'fit_intercept': [False, True], 'normalize': [False, True]}
PLSCanonical
{'algorithm': ['nipals', 'svd']}
PassiveAggressiveClassifier
{'average': [False, True],
 'fit_intercept': [False, True],
 'loss': ['hinge', 'squared_hinge'],
 'shuffle': [False, True]}
PassiveAggressiveRegressor
{'average': [False, True],
 'fit_intercept': [False, True],
 'loss': ['epsilon_insensitive', 'squared_epsilon_insensitive'],
 'shuffle': [False, True]}
Perceptron
{'fit_intercept': [False, True], 'shuffle': [False, True]}
PoissonRegressor
{'fit_intercept': [False, True]}
RANSACRegressor
{'loss': ['absolute_loss', 'squared_loss']}
RadiusNeighborsClassifier
{'algorithm': ['auto', 'ball_tree', 'brute', 'kd_tree']}
RadiusNeighborsRegressor
{'algorithm': ['auto', 'ball_tree', 'brute', 'kd_tree']}
RandomForestClassifier
{'criterion': ['gini', 'entropy']}
RandomForestRegressor
{'criterion': ['mse', 'friedman_mse', 'mae']}
Ridge
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'solver': ['auto', 'cholesky', 'lsqr', 'sag', 'saga', 'sparse_cg', 'svd']}
RidgeCV
{'fit_intercept': [False, True], 'normalize': [False, True]}
RidgeClassifier
{'fit_intercept': [False, True],
 'normalize': [False, True],
 'solver': ['auto', 'cholesky', 'lsqr', 'sag', 'saga', 'sparse_cg', 'svd']}
RidgeClassifierCV
{'fit_intercept': [False, True], 'normalize': [False, True]}
SGDClassifier
{'average': [False, True],
 'fit_intercept': [False, True],
 'loss': ['hinge', 'log', 'modified_huber', 'squared_hinge', 'perceptron'],
 'shuffle': [False, True]}
SGDRegressor
{'average': [False, True],
 'fit_intercept': [False, True],
 'loss': ['squared_loss',
          'huber',
          'epsilon_insensitive',
          'squared_epsilon_insensitive'],
 'shuffle': [False, True]}
SVC
{'kernel': ['linear', 'poly', 'precomputed', 'rbf', 'sigmoid']}
SVR
{'kernel': ['linear', 'poly', 'precomputed', 'rbf', 'sigmoid']}
TheilSenRegressor
{'fit_intercept': [False, True]}
TweedieRegressor
{'fit_intercept': [False, True]}
This step can likely be improved further, but it's a start.
  2. Compute the Cartesian product of all valid parameter values for each estimator.
  3. Run the common estimator checks on the resulting estimators with non-default parameters.
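The Cartesian-product step can be sketched with `itertools.product` (scikit-learn's `ParameterGrid` implements the same expansion); the helper name below is illustrative, not this PR's actual API:

```python
from itertools import product


def expand_param_grid(valid_params):
    """Yield one dict per element of the Cartesian product of valid values."""
    names = sorted(valid_params)
    for combo in product(*(valid_params[name] for name in names)):
        yield dict(zip(names, combo))


grid = {"solver": ["lbfgs", "liblinear"], "fit_intercept": [False, True]}
combos = list(expand_param_grid(grid))
len(combos)  # 4
```

Each resulting dict can then be passed to `set_params` on a fresh estimator instance before running the checks.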

This generates 26424 additional tests (the scikit-learn test suite currently has 16052 tests), which run in 11 min. The run time can likely be improved by using pytest-xdist on multi-core CPUs.

I think passing these checks would both help detect new bugs and make our common checks more robust.

This is a somewhat orthogonal solution to #11324 although they both address the same problem.

There are 2858 errors out of 26424 tests. I haven't checked the errors in detail; some might be due to parameter detection failures (in particular, some detected parameter combinations might be incompatible). The logs (without tracebacks) can be found in pytest.log.

cc @NicolasHug @thomasjpfan @jeremiedbb @jnothman

TODO:

  • run these tests conditionally in CI (maybe in some cron job)
  • support other types of estimators besides classifiers and regressors

@rth
Member Author

rth commented Jun 4, 2020

@amueller Investigating some of these failures, and likely fixing the corresponding estimators (or common tests), could also be a nice project for MLH (or sprints).

To run the common checks for a particular estimator with all detected parameters:

pytest sklearn/tests/test_common_non_default.py -k LogisticRegression

For instance, solver='newton-cg' and solver='sag' fail a number of common checks here.
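A single non-default combination can also be checked by hand with the public `check_estimator` helper, independently of this PR's test file (a sketch; the failures mentioned above come from this PR's extended checks, so a released scikit-learn may well pass this):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# check_estimator runs the full battery of common checks on the given
# instance and raises an AssertionError on the first failing check.
est = LogisticRegression(solver="newton-cg")
check_estimator(est)
```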

@jnothman
Member

jnothman commented Jun 4, 2020 via email

@thomasjpfan
Member

This is really cool. If we want to run this in CI, we would need some way to skip some combinations because of invalid inputs. This can be involved for LogisticRegression (valid penalty x solver combinations).

@rth
Member Author

rth commented Jun 4, 2020

If we want to run this in CI, we would need some way to skip some combinations because of invalid inputs. This can be involved for LogisticRegression (valid penalty x solver combinations).

Yes, good point. I have now skipped the checks if Iris couldn't be fitted with a given combination of parameters, as that likely means the parameters are incompatible. Now there are 22554 tests and "only" 502 failures.
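A minimal sketch of that skip heuristic (the function name is made up): a parameter combination is treated as incompatible, and skipped, if the estimator cannot be fitted on Iris.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression


def can_fit_iris(estimator, params):
    """Return True if the estimator fits Iris with the given parameters."""
    X, y = load_iris(return_X_y=True)
    try:
        clone(estimator).set_params(**params).fit(X, y)
    except Exception:
        # Any fit failure is taken as "incompatible combination".
        return False
    return True


can_fit_iris(LogisticRegression(), {"solver": "lbfgs"})                   # True
can_fit_iris(LogisticRegression(), {"solver": "lbfgs", "penalty": "l1"})  # False
```

The heuristic is deliberately coarse: it can also skip combinations that merely fail on Iris for unrelated reasons.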

Though yes, trying meta-estimators with different types of
underlying estimator per #11324 would be especially/additionally valuable.

Indeed. Actually, so far this only runs tests for classifiers and regressors. I'll add other types of non-meta estimators shortly.

For meta-estimators, yeah, that would be interesting, but probably a whole new project, as we can't realistically run all estimators for each combination of parameters of each meta-estimator. That could actually be a good use case for hypothesis (#13846): randomly sampling the parameter space we want to check.
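As an illustration, random sampling of the detected parameter space could look like the stdlib sketch below (hypothesis would cover the space more systematically and shrink failures); all names here are made up:

```python
import random


def sample_param_combinations(valid_params, n_samples, seed=0):
    """Draw random parameter combinations instead of the full product."""
    rng = random.Random(seed)  # seeded for reproducible test runs
    return [
        {name: rng.choice(values) for name, values in valid_params.items()}
        for _ in range(n_samples)
    ]


grid = {"solver": ["lbfgs", "liblinear", "saga"], "fit_intercept": [False, True]}
samples = sample_param_combinations(grid, n_samples=10)
```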

Comment on lines 18 to 36
categorical_params = [
"solver",
"algorithm",
"loss",
"strategy",
"selection",
"criterion",
"multi_class",
"kernel",
]
bool_params = [
"fit_prior",
"fit_intercept",
"positive",
"normalize",
"dual",
"average",
"shuffle",
]
Member

how about using type hints and getting these from there?

Member Author

Yeah, we could probably try to auto-detect this in the future, but this PR already has a lot of moving parts, and I prefer to manually list the parameters to check for now.

Member

There are two aspects to my comment which I kind of failed to explain properly:

  • it's nice to get the info from type hints cause it's nice ;)
  • it allows better extending this functionality to third party estimators.

My personal motivation behind my comment was the second one. I'm happy to leave these hard-coded here instead of adding type hints to our estimators in this PR, but it would be really nice if this PR also introduced detecting the type from the type hints, unless you think it'd make this PR too big/complicated to review.

Member Author

I have looked into auto-detecting it, at least for booleans, but then you end up with a long list of parameters to exclude, e.g.

['verbose', 'copy', 'copy_X', 'warm_start', 'compute_score', 'compute_labels', 'validate', 'accept_sparse', 'check_inverse', 'copy_X_train', 'add_indicator', 'skip_complete', 'fit_path']

and I have only gone through estimators from A to D. Each of those shouldn't impact the model results but would add to the total runtime. So I think it's easier to whitelist parameter names than to blacklist them in this case. It could still be possible to use this in contrib projects, by passing the whitelist as an input parameter to the detect_all_params function (or just monkeypatching the categorical_params / bool_params lists for now).
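A sketch of the whitelist-as-parameter idea (the function, constant, and estimator names are hypothetical, not the PR's actual API):

```python
import inspect

# Default whitelist, mirroring the bool_params list from the PR's diff above.
DEFAULT_BOOL_PARAMS = ("fit_prior", "fit_intercept", "positive",
                       "normalize", "dual", "average", "shuffle")


def detect_bool_params(estimator_cls, bool_params=DEFAULT_BOOL_PARAMS):
    """Return the whitelisted boolean constructor parameters of an estimator."""
    sig = inspect.signature(estimator_cls.__init__)
    return {name: [False, True] for name in sig.parameters
            if name in bool_params}


class ThirdPartyEstimator:  # stand-in for a contrib-project estimator
    def __init__(self, fit_intercept=True, robust=False, verbose=0):
        pass


detect_bool_params(ThirdPartyEstimator, DEFAULT_BOOL_PARAMS + ("robust",))
# {'fit_intercept': [False, True], 'robust': [False, True]}
```

A contrib project would extend the whitelist it passes in, while `verbose`-style parameters stay excluded by default.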

@rth
Member Author

rth commented Jun 4, 2020

Added checks for clusterers and transformers; there are now 58.6k added tests, with:

1508 failed, 56931 passed, 2 skipped, 138 xfailed, 24 xpassed, 59857 warnings in 830.60s (0:13:50)

@jnothman
Member

jnothman commented Jun 5, 2020 via email

return False


def detect_valid_categorical_params(
Member

would it make sense to do this by introducing type hints for categorical params? Or by parsing the docstring?

Member Author

@rth rth Jun 5, 2020

For type annotations we would need PEP 586 literal types, and a backport for Python < 3.8. That's probably indeed the best solution long term.

I'm not very keen on parsing docstrings, as that can break easily and is less reliable than the current approach. For instance, for estimators taking metric names as input, not all valid names are always listed; sometimes there's just a link to the user guide.
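For reference, a sketch of what Literal-based detection could look like on Python 3.8+ (`typing_extensions.Literal` before that); the estimator class here is a dummy, not a scikit-learn API:

```python
from typing import Literal, get_args, get_type_hints


class DummyEstimator:
    def __init__(self, solver: Literal["lbfgs", "saga"] = "lbfgs",
                 fit_intercept: bool = True):
        self.solver = solver
        self.fit_intercept = fit_intercept


# Collect parameters whose annotation carries Literal arguments; plain
# annotations like `bool` have no args and are filtered out.
hints = get_type_hints(DummyEstimator.__init__)
valid = {name: list(get_args(tp)) for name, tp in hints.items() if get_args(tp)}
# valid == {'solver': ['lbfgs', 'saga']}
```

This would replace the name-based whitelists entirely for estimators that carry such annotations.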


FWIW the typing-extensions package is an official source of backports too.

And Hypothesis' introspection can inspect typing or typing_extensions.Literal[...] and automatically sample from the parameters 😁

Member

We are working toward having typing in #17799

Moving forward, we are going to run into invalid parameter combinations when sampling from the parameter space. Libraries such as ConfigSpace use a combination of conditional clauses and forbidden clauses to help define the relationships between parameters.

Member Author

@rth rth Jul 26, 2020

Thanks for the references! I think using hypothesis to sample this parameter space could be interesting in the long term, as it could also generalize to float parameters, and would help limit the runtime once the exhaustive combination of parameters becomes intractable.
