
FEA Add variable importance to linear models #21170

Open
@lorentzenchr


Describe the workflow you want to enable

I'd like to have a feature importance method native to linear models (without L1 penalty) that is calculated on the training set:

clf = LogisticRegression(with_importance=True)
clf.fit(X, y)
clf.feature_importances_  # or some nice plot thereof

Describe your proposed solution

New proposal

Evaluate whether LMG (Lindeman, Merenda and Gold, see [1, 2]) is applicable and feasible for L2-penalized regression and for GLMs. Otherwise, consider the other measures discussed in [1, 2].

In short, LMG is a Shapley value decomposition of R² by the features.
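
To make the idea concrete, here is a minimal sketch of LMG for plain, unpenalized OLS. It assumes that the "value" of a feature subset is the in-sample R² of an OLS fit on that subset; the helper names `r2_of_subset` and `lmg_importances` and the synthetic data are hypothetical and only serve the illustration. It enumerates all subsets, so it is only practical for a handful of features, and it does not address the open question of L2 penalties or GLMs.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r2_of_subset(X, y, cols):
    """In-sample R^2 of an OLS fit (with intercept) on the given columns."""
    if not cols:
        return 0.0  # intercept-only model explains no variance
    A = np.column_stack([np.ones(len(y)), X[:, cols]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)


def lmg_importances(X, y):
    """Shapley value of each feature, with R^2 as the value function."""
    p = X.shape[1]
    imp = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight for subsets of this size
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                cols = list(subset)
                gain = r2_of_subset(X, y, cols + [j]) - r2_of_subset(X, y, cols)
                imp[j] += weight * gain
    return imp


# Synthetic data (assumption): feature 1 is correlated with feature 0,
# feature 2 is pure noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

importances = lmg_importances(X, y)
full_r2 = r2_of_subset(X, y, [0, 1, 2])
print(importances, importances.sum(), full_r2)
```

By construction the importances sum to the R² of the full model, which is what makes the decomposition easy to read.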

References:

Original proposal

Compute the t-statistic of the coefficients

t[j] = coef[j] / std(coef[j])

and use its absolute value, |t|, as a measure of (in-sample) importance. For GLMs like logistic regression, see section 5.3 in https://arxiv.org/pdf/1509.09169.pdf for a formula for Var[coef].
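
As a rough illustration of the original proposal, the sketch below computes |t| for an unpenalized logistic regression using only NumPy. It assumes the usual asymptotic covariance estimate (X^T W X)^(-1) with W = diag(p(1 - p)), fits the model with a few Newton/IRLS steps, and uses synthetic data. The function name `logistic_abs_t` is hypothetical, and this is not how scikit-learn's `LogisticRegression` is implemented.

```python
import numpy as np


def logistic_abs_t(X, y, n_iter=25):
    """|t| per feature for an unpenalized logistic regression (sketch)."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        w = p * (1.0 - p)
        hessian = X1.T @ (w[:, None] * X1)
        # Newton step on the log-likelihood
        beta = beta + np.linalg.solve(hessian, X1.T @ (y - p))
    # Asymptotic covariance of the coefficients at the fitted beta
    p = 1.0 / (1.0 + np.exp(-X1 @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X1.T @ (w[:, None] * X1))
    std_err = np.sqrt(np.diag(cov))
    t = beta / std_err
    return np.abs(t[1:])  # drop the intercept


# Synthetic data (assumption): feature 0 strong, feature 1 weak, feature 2 noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
prob = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] + 0.5 * X[:, 1])))
y = (rng.random(400) < prob).astype(float)

abs_t = logistic_abs_t(X, y)
print(abs_t)
```

Because the standard errors scale with the features, |t| is invariant to rescaling a column, unlike the raw coefficients.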

Describe alternatives you've considered, if relevant

Any general importance measure (permutation importance, SHAP values, ...) also works.
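
For comparison, permutation importance is already available in scikit-learn via `sklearn.inspection.permutation_importance`. A minimal sketch on the training set, with synthetic data as an assumption, could look like this:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

# Synthetic data (assumption): feature 0 strong, feature 1 weak, feature 2 noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
prob = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] + 0.5 * X[:, 1])))
y = (rng.random(500) < prob).astype(int)

clf = LogisticRegression().fit(X, y)

# Mean drop in accuracy when each column is shuffled, evaluated in-sample
result = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
perm_importances = result.importances_mean
print(perm_importances)
```

Unlike a native attribute, this requires a separate call after fitting and a choice of scoring and number of repeats.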

Additional context

Given the great and legitimate need for interpretability, I would favor having a native importance measure for linear models. Random Forests have their own native feature_importances_ with the warning

impurity-based feature importances can be misleading for high cardinality features (many unique values).

We could add a similar warning for collinear features, such as

feature importances can be misleading for collinear or high-dimensional features.

I guess, in the end, this is true for all feature importance measures, even for SHAP (see also our multicollinear example).

Prior discussions like #16802, #6773, and #13048 focused on p-values, which seem out of scope for scikit-learn for various reasons. I hope we can circumvent these reasons by focusing on feature importance only and not considering p-values.
