Add details on how to use sample_weight and precompute together for linear models #18973

@amidvidy

Description

Describe the issue linked to the documentation

The documentation does not currently explain how the sample_weight argument to fit() interacts with precompute when the user wants to pass in a precomputed Gram matrix. Using the two arguments together requires carefully preprocessing the data to replicate the steps performed in _pre_fit: the Gram matrix has to be built from X centered with the weighted column means and scaled row-wise by the square root of the weights.
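For reference, the Gram matrix that fit() checks against can be written out explicitly. This is a sketch based on reading the scikit-learn source rather than on documented behavior. Here n is the number of samples, x_i is the i-th row of X (as a column vector), s_i is the user-supplied weight, and it assumes that the weights are first rescaled to sum to n, which recent versions of ElasticNet.fit appear to do internally:

```math
\tilde{s}_i = \frac{n\, s_i}{\sum_{j=1}^{n} s_j}, \qquad
\bar{x}_w = \frac{1}{n}\sum_{i=1}^{n} \tilde{s}_i\, x_i, \qquad
G = \sum_{i=1}^{n} \tilde{s}_i\,(x_i - \bar{x}_w)(x_i - \bar{x}_w)^\top = X_r^\top X_r,
\quad X_r = \tilde{S}^{1/2}\,(X - \mathbf{1}\bar{x}_w^\top),
\quad \tilde{S} = \operatorname{diag}(\tilde{s}_1, \ldots, \tilde{s}_n)
```

As far as I can tell from the source, centering with the weighted means is what makes the column offsets computed inside fit() (numerically) zero, so the supplied G is used rather than discarded and recomputed.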

Here is a snippet of code demonstrating how to do it:

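```python
import numpy as np
from numpy.testing import assert_almost_equal
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=int(1e5), noise=0.5)

# Random lognormal weight vector.
weights = np.random.lognormal(size=y.shape)
# Rescale the weights to sum to n_samples. Recent scikit-learn versions do this
# inside ElasticNet.fit, and the precomputed Gram matrix must be built from the
# same weights, otherwise fit() may reject it with a ValueError.
weights = weights * (len(weights) / weights.sum())

# normalize=False is omitted: it was the default, and the parameter was
# removed in scikit-learn 1.2.
en = ElasticNet(alpha=0.01, fit_intercept=True, precompute=False)
en.fit(X, y, sample_weight=weights)

# Center X using the *weighted* column means, as _pre_fit does.
X_c = X - np.average(X, axis=0, weights=weights)
# Row-wise multiply by sqrt(weights); the Gram matrix is X_r.T @ X_r.
X_r = X_c * np.sqrt(weights)[:, np.newaxis]

en_precompute = ElasticNet(alpha=0.01, fit_intercept=True, precompute=X_r.T @ X_r)
# Pass the centered X_c (not X_r): fit() applies the sqrt(weights) scaling itself.
en_precompute.fit(X_c, y, sample_weight=weights)

assert_almost_equal(en.coef_, en_precompute.coef_)
```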
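The same preprocessing could be wrapped in a small helper so it is reusable across estimators. This is only a sketch: weighted_gram_inputs is a hypothetical name (not a scikit-learn API), it assumes dense float input, and it normalizes the weights to sum to n_samples on the assumption that this mirrors what recent ElasticNet.fit versions do internally. It uses much smaller data and a tight tol so the two fits can be compared closely.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet


def weighted_gram_inputs(X, sample_weight):
    """Hypothetical helper (not a scikit-learn API).

    Returns (X_c, sw, gram) so that ElasticNet(precompute=gram) can be
    fit with fit(X_c, y, sample_weight=sw).
    """
    X = np.asarray(X, dtype=np.float64)
    sw = np.asarray(sample_weight, dtype=np.float64)
    # Assumption: rescaling to sum to n_samples mirrors recent ElasticNet.fit.
    sw = sw * (X.shape[0] / sw.sum())
    # Weighted centering, so fit() sees (numerically) zero column offsets.
    X_c = X - np.average(X, axis=0, weights=sw)
    # G = X_c^T diag(sw) X_c, computed without building diag(sw).
    X_r = X_c * np.sqrt(sw)[:, np.newaxis]
    return X_c, sw, X_r.T @ X_r


X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)
weights = np.random.default_rng(0).lognormal(size=y.shape)
X_c, sw, gram = weighted_gram_inputs(X, weights)

# Tight tol so both solvers land on essentially the same optimum.
params = dict(alpha=0.01, fit_intercept=True, tol=1e-10, max_iter=100_000)
# Both fits get the normalized weights, so the comparison does not depend on
# whether this scikit-learn version rescales sample_weight itself.
ref = ElasticNet(precompute=False, **params).fit(X, y, sample_weight=sw)
pre = ElasticNet(precompute=gram, **params).fit(X_c, y, sample_weight=sw)

print("max |coef difference|:", np.max(np.abs(ref.coef_ - pre.coef_)))
```

Note that the two models have different intercepts, because pre was fit on the shifted X_c; predictions agree only when pre is given X_c as well.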

Suggest a potential alternative/fix

As @ogrisel suggested on Gitter, a section on how to use these features together could be added to the user guide and then referenced from the docstrings of the various models that take a precompute parameter in their constructors. @ogrisel also suggested adding a unit test (perhaps adapted from the snippet above) to make sure that this way of combining the two features isn't inadvertently broken in the future.
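One possible shape for that unit test, adapted from the snippet above, is sketched below. The test name is hypothetical, it loops over ElasticNet and Lasso instead of using pytest parametrization so that it stays self-contained, and the tolerances are guesses that may need tuning. It compares predictions as well as coefficients, since the intercepts differ by design when the precomputed model is fit on the centered X_c.

```python
import numpy as np
from numpy.testing import assert_allclose
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso


def test_precomputed_gram_with_sample_weight():
    # Hypothetical test name; small data keeps the test fast.
    rng = np.random.RandomState(0)
    X, y = make_regression(n_samples=100, n_features=4, noise=0.5, random_state=0)
    sw = rng.lognormal(size=y.shape)
    # Normalize to sum to n_samples so the Gram matrix matches what fit() expects.
    sw *= X.shape[0] / sw.sum()

    # Weighted centering, then row-wise sqrt(sw) scaling, as in the snippet.
    X_c = X - np.average(X, axis=0, weights=sw)
    X_r = X_c * np.sqrt(sw)[:, np.newaxis]
    gram = X_r.T @ X_r

    params = dict(alpha=0.01, fit_intercept=True, tol=1e-10, max_iter=100_000)
    for Estimator in (ElasticNet, Lasso):
        ref = Estimator(precompute=False, **params).fit(X, y, sample_weight=sw)
        pre = Estimator(precompute=gram, **params).fit(X_c, y, sample_weight=sw)
        assert_allclose(ref.coef_, pre.coef_, rtol=1e-5, atol=1e-5)
        # pre was trained on X_c, so it must also predict on X_c.
        assert_allclose(ref.predict(X), pre.predict(X_c), rtol=1e-5, atol=1e-4)


test_precomputed_gram_with_sample_weight()
```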
