
ENH default routing policy for sample weight #30887


Status: Draft
Wants to merge 1 commit into base: main
16 changes: 15 additions & 1 deletion sklearn/base.py
@@ -10,6 +10,7 @@
import re
import warnings
from collections import defaultdict
from typing import Union

import numpy as np

@@ -153,7 +154,20 @@ def _clone_parametrized(estimator, *, safe=True):
return new_object


class BaseEstimator(_HTMLDocumentationLinkMixin, _MetadataRequester):
class _SampleWeightEstimatorMixin(_MetadataRequester):
@adrinjalali (Member) commented on Feb 24, 2025:

We probably want to have two mixins, one for fit and one for score.
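For illustration, a minimal sketch of what that split could look like, assuming _MetadataRequester is importable from sklearn.utils._metadata_requests as on main; the _SampleWeightFitMixin and _SampleWeightScoreMixin names are hypothetical and not part of this PR.

# Hypothetical split into two mixins, as suggested above; names are illustrative.
from sklearn.utils._metadata_requests import _MetadataRequester


class _SampleWeightFitMixin(_MetadataRequester):
    """Request ``sample_weight`` in ``fit`` by default."""

    __metadata_request__fit = {"sample_weight": True}


class _SampleWeightScoreMixin(_MetadataRequester):
    """Request ``sample_weight`` in ``score`` by default."""

    __metadata_request__score = {"sample_weight": True}


class _SampleWeightEstimatorMixin(_SampleWeightFitMixin, _SampleWeightScoreMixin):
    """Convenience mixin combining both defaults."""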

"""A Mixin to request ``sample_weight`` by default.

This mixin makes the object request ``sample_weight`` by default (request value ``True``)
for both the ``fit`` and ``score`` methods.

.. versionadded:: 1.7
"""

__metadata_request__fit: dict[str, Union[bool, str]] = {"sample_weight": True}
A maintainer (Member) commented:

IIUC, I don't think we want to have a constant default routing without user input. By that, I mean we want a "default routing" setting which is enabled with a global config, something along the lines of

sklearn.set_config(enable_metadata_routing="default_routing")

That would enable default routing behaviour. Also, it seems to me that default routing should be at the instance level and not the class level; I remember cases where routing would depend on constructor arguments (the sub-estimator, specifically). It could be something along the lines of:

class MyEstimator(...):
    def __sklearn_default_routing__(self):
        return {"fit": {"sample_weight": True}}

We need to build that in the routing machinery, and then enable that for our estimators.
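As a rough, non-authoritative sketch of how such a hook might be consulted: the __sklearn_default_routing__ method and the "default_routing" config value are only the proposal above, and _apply_default_routing is a hypothetical helper, not existing scikit-learn API.

from sklearn import get_config


def _apply_default_routing(estimator):
    # Hypothetical helper: apply an instance-level default routing spec, but
    # only when the (proposed) "default_routing" mode is switched on globally.
    if get_config().get("enable_metadata_routing") != "default_routing":
        return estimator
    defaults = getattr(estimator, "__sklearn_default_routing__", None)
    if defaults is None:
        return estimator
    for method, params in defaults().items():
        # e.g. estimator.set_fit_request(sample_weight=True)
        getattr(estimator, f"set_{method}_request")(**params)
    return estimator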

Another maintainer (Member) replied:

I think the point of having a default routing for sample weight is that it's enabled by default (similarly to what we do for the CV groups).
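To make the intended end-user effect concrete, here is a sketch under the current routing API; the estimator, transformer, and data are illustrative, and the "after" behaviour described in the last comment is only what this PR would enable, not something that works today.

import numpy as np
from sklearn import set_config
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

set_config(enable_metadata_routing=True)

rng = np.random.default_rng(0)
X, y, w = rng.random((20, 3)), rng.random(20), rng.random(20)

# Today, every consumer has to be told explicitly what to do with the weights:
pipe = make_pipeline(
    StandardScaler().set_fit_request(sample_weight=False),
    Ridge().set_fit_request(sample_weight=True),
)
pipe.fit(X, y, sample_weight=w)

# With a default request for sample_weight (as is done for CV groups), the
# set_fit_request calls above would no longer be needed.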

__metadata_request__score: dict[str, Union[bool, str]] = {"sample_weight": True}


class BaseEstimator(_HTMLDocumentationLinkMixin, _SampleWeightEstimatorMixin):
A maintainer (Member) commented:

I think this should apply to the estimators individually, and not to BaseEstimator. Doing it here would have a lot of impact on external libraries.
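A sketch of that alternative: keep the mixin, but opt in per estimator rather than via BaseEstimator. Only _MetadataRequester and the request dictionaries are taken from the diff above; the toy estimator and its name are hypothetical.

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils._metadata_requests import _MetadataRequester


class _SampleWeightEstimatorMixin(_MetadataRequester):
    # Same defaults as in the diff above.
    __metadata_request__fit = {"sample_weight": True}
    __metadata_request__score = {"sample_weight": True}


# Opted in per estimator, so third-party estimators that inherit from
# BaseEstimator keep their current (unset) sample_weight request.
class WeightedMeanRegressor(_SampleWeightEstimatorMixin, RegressorMixin, BaseEstimator):
    def fit(self, X, y, sample_weight=None):
        self.mean_ = float(np.average(y, weights=sample_weight))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)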

"""Base class for all estimators in scikit-learn.

Inheriting from this class provides default implementations of:
5 changes: 4 additions & 1 deletion sklearn/covariance/_empirical_covariance.py
@@ -184,7 +184,10 @@ class EmpiricalCovariance(BaseEstimator):
"""

# X_test should have been called X
__metadata_request__score = {"X_test": metadata_routing.UNUSED}
__metadata_request__score = {
"X_test": metadata_routing.UNUSED,
"sample_weight": True,
}

_parameter_constraints: dict = {
"store_precision": ["boolean"],
5 changes: 4 additions & 1 deletion sklearn/linear_model/_coordinate_descent.py
@@ -879,7 +879,10 @@ class ElasticNet(MultiOutputMixin, RegressorMixin, LinearModel):

# "check_input" is used for optimisation and isn't something to be passed
# around in a pipeline.
__metadata_request__fit = {"check_input": metadata_routing.UNUSED}
__metadata_request__fit = {
"check_input": metadata_routing.UNUSED,
"sample_weight": True,
}

_parameter_constraints: dict = {
"alpha": [Interval(Real, 0, None, closed="left")],
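For context on what these class-level dictionaries do, the resulting request can be inspected through the public routing API. A sketch, with the expected outcome described in a comment rather than shown verbatim:

from sklearn import set_config
from sklearn.linear_model import ElasticNet

set_config(enable_metadata_routing=True)

print(ElasticNet().get_metadata_routing())
# With this change, the fit request is expected to report sample_weight=True
# by default, while check_input stays hidden from routing (UNUSED); the exact
# repr depends on the scikit-learn version.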
14 changes: 13 additions & 1 deletion sklearn/metrics/_scorer.py
@@ -195,7 +195,19 @@ def get_metadata_routing(self):
)


class _BaseScorer(_MetadataRequester):
class _SampleWeightScorerMixin(_MetadataRequester):
"""A Mixin to request ``sample_weight`` by default.

This mixin makes the object request ``sample_weight`` by default (request value ``True``)
for the ``score`` method.

.. versionadded:: 1.7
"""

__metadata_request__score = {"sample_weight": True}


class _BaseScorer(_SampleWeightScorerMixin):
"""Base scorer that is used as `scorer(estimator, X, y_true)`.

Parameters
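A sketch of the user-facing effect on scorers under the current routing API; the "today" part reflects current releases, while the final comment describes what this mixin would change. Data and estimator choice are illustrative.

import numpy as np
from sklearn import set_config
from sklearn.linear_model import Ridge
from sklearn.metrics import make_scorer, r2_score
from sklearn.model_selection import cross_validate

set_config(enable_metadata_routing=True)

rng = np.random.default_rng(0)
X, y, w = rng.random((30, 3)), rng.random(30), rng.random(30)

# Today, the scorer has to be told explicitly to consume sample_weight:
scorer = make_scorer(r2_score).set_score_request(sample_weight=True)
cross_validate(
    Ridge().set_fit_request(sample_weight=True),
    X,
    y,
    scoring=scorer,
    params={"sample_weight": w},
)
# With _SampleWeightScorerMixin, the set_score_request call would become
# unnecessary because scorers would request sample_weight by default.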
2 changes: 1 addition & 1 deletion sklearn/preprocessing/_data.py
@@ -2431,7 +2431,7 @@ class KernelCenterer(ClassNamePrefixFeaturesOutMixin, TransformerMixin, BaseEsti

# X is called K in these methods.
__metadata_request__transform = {"K": metadata_routing.UNUSED}
__metadata_request__fit = {"K": metadata_routing.UNUSED}
__metadata_request__fit = {"K": metadata_routing.UNUSED, "sample_weight": True}

def fit(self, K, y=None):
"""Fit KernelCenterer.