Machine Learning Algorithms Cheatsheet
Fields listed for each algorithm: Task; Properties; Main sklearn models; Key sklearn hyperparameters; Should scale features; Multi-target/label; Deterministic; Has predict_proba(); feature_importances_; Typical loss function; Regularization or similar effect; Typical evaluation metric; Training complexity; Prediction complexity; Space complexity
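
Linear regression
  Task: Regression
  Properties: Linear, deterministic regressor
  Main sklearn models: linear_model.LinearRegression (no regularization), linear_model.Lasso (L1 regularization), linear_model.Ridge (L2 regularization), linear_model.ElasticNet (L1 and L2), linear_model.ElasticNetCV, linear_model.SGDRegressor
  Key sklearn hyperparameters: alpha
  Should scale features: Yes
  Multi-target/label: Yes
  Deterministic: Yes
  Has predict_proba(): No
  feature_importances_: Use coef_ only if data is scaled
  Typical loss function: Mean squared error
  Regularization or similar effect: Increase alpha (usually squared L2 and/or L1 penalty)
  Typical evaluation metric: R², MSE, RMSE
  Training complexity: O(p²n + p³) for n examples and p model parameters
  Prediction complexity: O(p)
  Space complexity: O(p)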
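
The linear regression entry says to scale features, to read coef_ only on scaled data, and to increase alpha for stronger regularization. A minimal sketch of those three points, assuming synthetic data from make_regression and two arbitrary alpha values chosen purely for illustration:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative assumption): 200 examples, 5 features.
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)


def fit_ridge(alpha):
    """Scale the features, then fit Ridge with the given alpha."""
    model = make_pipeline(StandardScaler(), Ridge(alpha=alpha))
    model.fit(X, y)
    return model


weak = fit_ridge(alpha=0.1)
strong = fit_ridge(alpha=100.0)

# make_pipeline names each step after its lowercased class name.
# The coefficients are comparable across features because inputs were scaled.
weak_coef = weak.named_steps["ridge"].coef_
strong_coef = strong.named_steps["ridge"].coef_

print("alpha=0.1   coefficient norm:", np.linalg.norm(weak_coef))
print("alpha=100.0 coefficient norm:", np.linalg.norm(strong_coef))
print("R² on the training data (alpha=0.1):", weak.score(X, y))
```

The larger alpha shrinks the coefficient vector toward zero. In practice alpha would normally be chosen by cross-validation rather than fixed by hand.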
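
Ridge regression classification
  Task: Classification
  Properties: Linear classifier (class labels are encoded as ±1 and fit by ridge regression)
  Main sklearn models: linear_model.RidgeClassifier
  Key sklearn hyperparameters: alpha
  Should scale features: Yes
  Multi-target/label: No
  Deterministic: Depends on solver
  Has predict_proba(): No
  feature_importances_: No
  Typical loss function: Squared error with L2 penalty
  Regularization or similar effect: Increase alpha
  Typical evaluation metric: Weighted F1
  Training complexity: O(p²n + p³) for n examples and p model parameters
  Prediction complexity: O(p)
  Space complexity: O(p)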
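
As a quick check on the predict_proba entry above, the sketch below fits linear_model.RidgeClassifier on synthetic data (an assumption for illustration) and scores it with weighted F1. The classifier exposes decision_function, which returns signed scores rather than probabilities.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import RidgeClassifier
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler

# Synthetic binary problem (illustrative assumption): 300 examples, 8 features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_scaled = StandardScaler().fit_transform(X)

clf = RidgeClassifier(alpha=1.0).fit(X_scaled, y)

# RidgeClassifier has no predict_proba; decision_function gives signed scores.
has_proba = hasattr(clf, "predict_proba")
scores = clf.decision_function(X_scaled)

# Weighted F1, computed here on the training data only to keep the sketch short.
weighted_f1 = f1_score(y, clf.predict(X_scaled), average="weighted")

print("has predict_proba:", has_proba)
print("decision scores shape:", scores.shape)
print("weighted F1 (training data):", weighted_f1)
```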
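
Decision tree
  Task: Classification or Regression
  Properties: Non-parametric, multiclass classifier
  Main sklearn models: tree.DecisionTreeClassifier, tree.DecisionTreeRegressor
  Key sklearn hyperparameters: max_features, max_depth, min_samples_leaf, min_samples_split
  Should scale features: No
  Multi-target/label: Yes
  Deterministic: Yes
  Has predict_proba(): Yes
  feature_importances_: Yes
  Typical loss function: Gini (per split, not global, so not strictly a loss function per se)
  Regularization or similar effect: Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(nzp) for n examples and p model parameters, if depth is limited to z
  Prediction complexity: O(z) for max depth z
  Space complexity: O(z)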
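
The Gini criterion is evaluated per candidate split rather than over the whole model. A minimal pure-Python sketch of that calculation follows; the helper names gini and split_gini are hypothetical and are not part of sklearn.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two children of a candidate split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)


parent = [0, 0, 0, 1, 1, 1]

print("parent impurity:", gini(parent))
print("perfect split:  ", split_gini([0, 0, 0], [1, 1, 1]))
print("mixed split:    ", split_gini([0, 0, 1], [0, 1, 1]))
```

A tree learner using this criterion prefers the split with the lowest weighted impurity, so here it would pick the perfect split over the mixed one.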
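
Random forest
  Task: Classification or Regression
  Properties: Stochastic, ensemble multiclass classifier
  Main sklearn models: ensemble.RandomForestClassifier, ensemble.RandomForestRegressor
  Key sklearn hyperparameters: n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split
  Should scale features: No
  Multi-target/label: Yes
  Deterministic: No
  Has predict_proba(): Yes
  feature_importances_: Yes
  Typical loss function: Gini (per split, not global, so not strictly a loss function per se)
  Regularization or similar effect: Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(nzpt) for n examples, p model parameters, max depth z, and t trees
  Prediction complexity: O(zt)
  Space complexity: O(zt)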
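
Because a random forest is stochastic, repeated fits can differ unless the seed is fixed. The sketch below, on synthetic data with arbitrary hyperparameter values (both assumptions for illustration), fixes random_state and then inspects predict_proba and feature_importances_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data (illustrative assumption): 300 examples, 6 features.
X, y = make_classification(
    n_samples=300, n_features=6, n_informative=3, random_state=0
)


def fit_forest(seed):
    """Fit a small, depth-limited forest with a fixed random seed."""
    forest = RandomForestClassifier(
        n_estimators=50,
        max_depth=4,          # shallower trees regularize more
        min_samples_leaf=5,   # larger leaves regularize more
        random_state=seed,
    )
    return forest.fit(X, y)


a = fit_forest(seed=42)
b = fit_forest(seed=42)

proba = a.predict_proba(X)            # one column per class
importances = a.feature_importances_  # impurity-based importances

# With the same seed, the two forests give identical probabilities.
same = np.array_equal(proba, b.predict_proba(X))

print("probability matrix shape:", proba.shape)
print("feature importances:", np.round(importances, 3))
print("reproducible with a fixed seed:", same)
```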
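
Extremely randomized trees
  Task: Classification or Regression
  Properties: Stochastic, ensemble multiclass classifier (ExtraTrees is to ExtraTree as RandomForest is to DecisionTree)
  Main sklearn models: ensemble.ExtraTreesClassifier, ensemble.ExtraTreesRegressor
  Key sklearn hyperparameters: n_estimators, max_features, max_depth, min_samples_leaf, min_samples_split
  Should scale features: No
  Multi-target/label: Yes
  Deterministic: No
  Has predict_proba(): Yes
  feature_importances_: Yes
  Typical loss function: Gini (per split, not global, so not strictly a loss function per se)
  Regularization or similar effect: Decrease max_depth or max_features, or increase min_samples_split or min_samples_leaf
  Typical evaluation metric: Weighted F1 (classification) or R², MSE, RMSE (regression)
  Training complexity: O(nzpt) for n examples, p model parameters, max depth z, and t trees
  Prediction complexity: O(zt)
  Space complexity: O(zt)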
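
The analogy "ExtraTrees is to ExtraTree as RandomForest is to DecisionTree" can be checked by looking at the fitted base estimators. A short sketch on synthetic data (an assumption for illustration), which also prints each ensemble's default bootstrap setting:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

# Synthetic data (illustrative assumption): 200 examples, 6 features.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
extra = ExtraTreesClassifier(n_estimators=10, random_state=0).fit(X, y)

# Each ensemble stores its fitted trees in estimators_.
forest_tree_type = type(forest.estimators_[0])
extra_tree_type = type(extra.estimators_[0])

print("RandomForest is built from:", forest_tree_type.__name__)
print("ExtraTrees is built from:  ", extra_tree_type.__name__)
print("default bootstrap (forest, extra):", forest.bootstrap, extra.bootstrap)
```

By default the random forest draws a bootstrap sample for each tree, while extremely randomized trees fit each tree on the full training set and randomize the split thresholds instead.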