[WIP] sample props (proposal 4) #16079
Conversation
Yay! I've been trying to find more time to work on the SLEP, but my break
was consumed visiting family. Thanks
Here's an example with a pipeline. @jnothman would you please check if it's [almost] what you had in mind? @NicolasHug you were interested in this one as well IIRC.

```python
from sklearn.model_selection import GroupKFold, cross_validate
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import accuracy_score
from sklearn.metrics import make_scorer
import numpy as np
from sklearn.datasets import make_classification

X, y = make_classification()
sw = np.random.rand(len(X))
sw2 = [5, 6]
brand = ['my brand']

from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.utils.validation import _validate_required_props
from sklearn.svm import SVC


class MyEst(ClassifierMixin, BaseEstimator):
    def __init__(self):
        self._props_request = {'fit': ['sample_weight', 'brand']}

    def fit(self, X, y, **fit_params):
        _validate_required_props(self, fit_params, 'fit')
        self._estimator = SVC().fit(X, y, fit_params['sample_weight'])
        print("{} received params {} of lengths {}".format(
            self.__class__.__name__,
            list(fit_params.keys()), [len(v) for v in fit_params.values()]))


class MyTrs(TransformerMixin, BaseEstimator):
    def __init__(self):
        self._props_request = {'fit': ['sample_weight']}

    def fit(self, X, y=None, **fit_params):
        _validate_required_props(self, fit_params, 'fit')
        self._estimator = SelectKBest().fit(X, y)
        print("{} received params {} of lengths {}".format(
            self.__class__.__name__,
            list(fit_params.keys()), [len(v) for v in fit_params.values()]))
        return self

    def transform(self, X, y=None):
        return self._estimator.transform(X)


clf = make_pipeline(MyTrs(), MyEst())
clf.fit(X, y, sample_weight=sw, brand=brand)

trs = MyTrs().set_props_request(
    None).set_props_request(
    {'fit': {'new_param': 'my_sw'}})

clf = make_pipeline(trs, MyEst())
clf.fit(X, y, sample_weight=sw, brand=brand, my_sw=sw2)
clf.fit(X, y, brand=brand)
```
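For reference, here is a minimal standalone sketch of what a `_validate_required_props`-style check could do; the helper name and behavior below are my own reconstruction (the real helper in this branch may differ). It verifies that exactly the props requested for a given method were passed:

```python
def validate_required_props(estimator, props, method):
    """Check that exactly the props requested for `method` were passed.

    Hypothetical stand-in for the private helper used in the example above:
    requested props are mandatory, and unrequested props are rejected.
    """
    requested = set(getattr(estimator, "_props_request", {}).get(method, []))
    received = set(props)
    missing = requested - received
    extra = received - requested
    if missing or extra:
        raise ValueError(
            f"{estimator.__class__.__name__}.{method}: "
            f"missing props {sorted(missing)}, unexpected props {sorted(extra)}"
        )


class Dummy:
    _props_request = {"fit": ["sample_weight", "brand"]}


# Passing exactly the requested props succeeds; anything else raises.
validate_required_props(Dummy(), {"sample_weight": [1, 2], "brand": ["x"]}, "fit")
```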
That example looks nice! I think I had not envisaged that all requested props would be required, i.e. an estimator could request sample_weight by default but only receive it if available. What are your thoughts on that? What I'm not sure about is whether we need to specify that the request is for fit; I suppose that makes things more flexible. I have looked at your modification to clone. Okay. But I've not yet looked at much else.
I also first thought of having it as "please give me that prop if you have it" rather than "I really need it". The reason I have it as a mandatory requirement is that it removes the chance of a meta-estimator not passing a required prop by mistake. My gut feeling was that it removes some of those bugs.
I think your other proposals had a more flexible system for routing the props around, especially when it comes to the fit/score/split distinction. Also, I thought this makes it easier to have some props going to transform/predict as well.
But this would be a mistake in the meta-estimator implementation, which should not be the user's problem?
This looks good! Edit: my previous comment was irrelevant after I finally read all of #4497 and SLEP006. My initial conception (and hacky implementation in a personal project) looked very similar to the single-argument, no-routing approach. Now for some comments (sorry if these are still uninformed):
I guess my main question/point is: can this be done in two phases? Phase 1 would be implementing the basic infrastructure (i.e. changing a lot of signatures throughout the API) so that… That said, how can I be of use to get this done?
This is good, but it needs work :)
sklearn/pipeline.py (outdated):

```diff
@@ -257,6 +258,24 @@ def _log_message(self, step_idx):
             len(self.steps),
             name)

+    def get_props_request(self):
```
There will be cases where the meta-estimator requests a prop, such as `sample_weight`, regardless of whether its descendant estimators also request props. The implementation here implies that the caller of `get_props_request` always wants the "deep" props of its descendants. Do we have use cases that would want us to be able to inspect only the locally consumed props?
I can't think of such a case off the top of my head, but if we do at some point, we can add a `deep` parameter here.
Happy to work through this more or assist with the changes
I think the consensus at the last meeting was to rename the prop request to a metadata request. How is this going? Is there something I can help with here?
The code is now a bit simpler, tests on `sklearn.metrics` pass, and `test_props.py` also passes.
I need to support the old behavior in grid search and pipeline somehow as well.
> I can't think of such a case off the top of my head, but if we do at some point, we can add a `deep` parameter here.
Continuing on this thought that only some estimators are PropConsumers, and those that are support only a small set of props, I wonder if we should make setting the prop consumer's request simpler / more explicit. For example, instead of `set_prop_request`, should we just have a parameter `request_sample_weight` in each sample_weight-consuming estimator?
What do you think of the idea above, to get rid of `set_metadata_request` and instead just add `request_sample_weight` parameters?

```
request_sample_weight : bool or str
    When this estimator is in a Pipeline or other meta-estimator, its fit
    method will receive the metadata passed as 'sample_weight' if
    request_sample_weight is True. If it is a string, it will receive the
    metadata passed with that key. If False, no value of sample_weight
    will be requested.
```
With this approach, the keys supported by a consumer become explicit, giving us:
- Free error checking of the expected prop name
- Free cloning
- Higher visibility
- Familiar API
- No need to distinguish API for setting vs updating an estimator's request
- No need to burden users with deciding which verbs are relevant to a request, i.e. is this a fit param or a transform param, because this is now controlled by what the estimator designer encodes about how the estimator consumes each specific parameter.
Disadvantages include needing to add a parameter to every estimator that supports weighting, and hence increased friction if we want to add more user-configurable requests.
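For concreteness, here is a toy sketch of how the constructor-parameter variant might behave; the class and method names are hypothetical illustrations, not the API in this PR:

```python
class WeightedEstimator:
    """Toy consumer with an explicit request_sample_weight constructor parameter."""

    def __init__(self, request_sample_weight=False):
        self.request_sample_weight = request_sample_weight

    def get_metadata_request(self):
        fit_request = {}
        if self.request_sample_weight is True:
            # request sample_weight under its own name
            fit_request["sample_weight"] = "sample_weight"
        elif isinstance(self.request_sample_weight, str):
            # request sample_weight under an aliased key
            fit_request["sample_weight"] = self.request_sample_weight
        # False: nothing is requested
        return {"fit": fit_request}


est = WeightedEstimator(request_sample_weight="my_weight")
est.get_metadata_request()  # {'fit': {'sample_weight': 'my_weight'}}
```

Validation of the expected prop name and cloning then come for free, since the request is an ordinary `get_params`/`set_params` constructor parameter.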
sklearn/pipeline.py (outdated):

```python
                fit_params_steps[step][param] = pval
    """
    # Remove old_behavior in 0.26
    old_behavior = np.any(['__' in pname for pname in fit_params.keys()])
```
Really anything not requested by a child should raise an error in the future.
`_validate_required_props` does that, or do you mean something else?
You're right. Though what I mean is that a child could request a parameter with a `__` in its name, and in the future it should not raise a warning or an error. I'm okay for it to warn now as well.
sklearn/pipeline.py
Outdated
for _, name, transformer in self._iter(filter_passthrough=True): | ||
if transformer is None: | ||
continue | ||
transformer_fit_props = transformer.get_metadata_request().fit | ||
try: | ||
transformer_fit_props = transformer.get_metadata_request().fit |
Do we want to require attribute access to get `fit`?
I don't understand, could you please elaborate?
I mean this assumes that the metadata request has an attribute called `fit`. Should we assume that?
`get_metadata_request()` will always return a `fit` attribute, even if it's empty, with the current implementation.
I don't like adding these parameters to the constructor. I'm still hoping we'd use this as a starting point for not having all parameters as constructor parameters. Other than that, the API you propose also doesn't have the flexibility of the routing we have right now. You can't route different sample weights to different estimators, or different metadata to different verbs (fit, transform, etc.), which the user might want to control and not leave to the developer of the estimator. I'm happy to introduce helper functions such as
I also tried another way to have an easy way to handle common fit parameters like sample_weight. You can see the remnants of the work here. I thought of having a Mixin for each parameter, like `FitRequestsSampleWeight`, which would set a parameter such as `_metadata_request__sample_weight`, and `get_metadata_request` would merge all such attributes together. The issue I encountered was how the user would change the default settings. If they do it through the `_metadata_request`, then some sort of overriding mechanism should exist, which is tricky if we allow users to pass a given prop to multiple ones, which they can do now: a single `sample_weight` can be passed to `fancy_sample_weight` and `nonfancy_sample_weight`, for example, if the estimator would accept both.
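A minimal sketch of that mixin idea, as I understand it (my own reconstruction: each mixin contributes a hypothetical `_metadata_request__<name>` class attribute, and `get_metadata_request` merges everything found on the MRO):

```python
class BaseConsumer:
    def get_metadata_request(self):
        # Merge every `_metadata_request__<name>` attribute found anywhere
        # in the class hierarchy into one {method: {param: alias}} dict.
        request = {"fit": {}}
        for klass in type(self).__mro__:
            for attr, value in vars(klass).items():
                if attr.startswith("_metadata_request__"):
                    for method, params in value.items():
                        request.setdefault(method, {}).update(params)
        return request


class FitRequestsSampleWeight:
    # Hypothetical per-parameter mixin: "my fit consumes sample_weight".
    _metadata_request__sample_weight = {"fit": {"sample_weight": "sample_weight"}}


class MyClassifier(FitRequestsSampleWeight, BaseConsumer):
    pass


MyClassifier().get_metadata_request()
# {'fit': {'sample_weight': 'sample_weight'}}
```

The overriding problem described above shows up exactly here: a user-level `set_metadata_request` would somehow have to take precedence over these class-level defaults.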
Yes, the assumption is that the verbs relevant to a parameter are determined by the meaning of the parameter to that consumer. So that parameter knows how to consume… A possibility: a method called
> then some sort of overriding mechanism should exist, which is tricky if we allow users to pass a given prop to multiple ones, which they can do now: a single sample_weight can be passed to fancy_sample_weight and nonfancy_sample_weight for example, if the estimator would accept both.
I'm not entirely sure what the issue is here. The consumer has a notion of sample_weight. If it had a notion of multiple different weights, like fitting_weight and calibration_weight or something, then each of them is a separate request. The semantics of something like `_metadata_request__sample_weight` are that the consumer knows about something called `sample_weight`, but the request instructs a metaestimator where to get it from.
Should we validate expected props in `set_metadata_request`? We validate param names in `set_params`. We could, a bit like set_params, check against the signature of the targeted method if `**kw` is not used.
I'm not loving how `_get_props_from_objs` is used in *SearchCV. The logic doesn't appear very clear. Is the following a helpful tool to make this more explicit and less error-prone?

```python
def build_router_metadata_request(children: Dict[str, Container[object]],
                                  routing: List[Tuple[str, str, str]]):
    """Compose the request that a router should return from get_metadata_request

    For example, GridSearchCV might use:

        def get_metadata_request(self):
            score_from = "base" if self.scoring is None else "scorers"
            return build_router_metadata_request(
                children={"base": [self.estimator],
                          "cv": [check_cv(self.cv)],
                          "scorers": check_multimetric_scoring(...)[0].values()},
                routing=[
                    ("base", "fit", "fit"),
                    ("cv", "fit", "split"),
                    (score_from, "fit", "score"),
                    (score_from, "score", "score"),
                    # XXX: we might want a special way to handle 'remainder'
                    ("base", "transform", "transform"),
                    ("base", "inverse_transform", "inverse_transform"),
                    ("base", "predict", "predict"),
                ])

    Parameters
    ----------
    children : dict
        Key is a role name (e.g. 'step') used as an identifier in the routing
        parameter. Value is a collection of objects that may provide
        get_metadata_request.
    routing : list of tuples
        Tuples are a triple of strings (child_role, router_method, child_method).
    """
    children_props = {k: _get_props_from_objs(v) for k, v in children.items()}
    out = {method: set() for method in _empty_metadata_request()}
    for child_role, router_method, child_method in routing:
        # note that we get rid of what the child wants each item to be called
        # and only keep the names we expect to be passed here
        out[router_method].update(children_props[child_role][child_method].keys())
    return out
```
Also: just thought of a tricky case that we probably won't handle for a while: a grid search over two different estimators where one of the estimators requests sample_weight and the other does not:

```python
GridSearchCV(Pipeline([('step', None)]),
             {"step": [LogisticRegression().set_metadata_request({'fit': ['sample_weight']}),
                       LogisticRegression()]})
```
Apologies for my patchwork reviews. Here I'm trying to make the tests more complete.

I still think the `set_metadata_request` API needs some work. I find it funny that for an estimator not supporting `sample_weight`, I can still run `.set_metadata_request({'fit': {'sample_weight': 'sample_weight'}})` without error, and that for an estimator supporting `sample_weight`, I can run `.set_metadata_request({'fit': {'sample_weight': 'ample_weight'}})` without error. (Hence my suggestion of `request_sample_weight` instead. I'd accept another solution, but am not really happy with the present one.)

The rest is shaping up very nicely!
How about this kind of thing for consumers?

```python
class MetadataConsumer:
    def _request_key_for_method(self, method, key, value):
        method_dict = self._metadata_request[method]
        if value is True:
            method_dict[key] = key
        elif value is False:
            method_dict.pop(key, None)
        elif isinstance(value, str):
            method_dict[key] = value
        else:
            raise TypeError


class SampleWeightConsumer(MetadataConsumer):
    def request_sample_weight(self, *, fit=True, score=False):
        """Define how to receive sample_weight from a parent meta-estimator

        When called with default arguments, fitting will be weighted,
        and the meta-estimator should be passed a fit parameter by the name
        'sample_weight'.

        Parameters
        ----------
        fit : str or bool, default=True
            The fit parameter name that a meta-estimator should pass to this
            estimator as sample_weight. If True, the name will be
            'sample_weight'. If False, no fit parameter will be passed.
        score : str or bool, default=False
            The parameter name that a meta-estimator should pass to this
            estimator as sample_weight when calling its `score` method.
            If True, the name will be 'sample_weight'. If False, no score
            parameter will be passed.

        Returns
        -------
        self

        Examples
        --------
        >>> import numpy as np
        >>> from sklearn.linear_model import LogisticRegression
        >>> from sklearn.model_selection import GridSearchCV
        >>> from sklearn.datasets import load_iris
        >>> X, y = load_iris(return_X_y=True)
        >>> sample_weight = np.random.rand(len(X))
        >>> clf = LogisticRegression()
        >>> gs = GridSearchCV(clf, {"C": [.1, 1, 10]})

        # Unweighted fitting and scoring
        >>> gs.fit(X, y)

        # Weighted fitting, unweighted scoring
        >>> clf.request_sample_weight()
        >>> gs.fit(X, y, sample_weight=sample_weight)

        # Weighted fitting and scoring
        >>> clf.request_sample_weight(fit=True, score=True)
        >>> gs.fit(X, y, sample_weight=sample_weight)

        # Weighted scoring only
        >>> clf.request_sample_weight(fit=False, score=True)
        >>> gs.fit(X, y, sample_weight=sample_weight)

        # Distinct weights for fit and score
        >>> score_sample_weight = np.random.rand(len(X))
        >>> clf.request_sample_weight(fit='fit_sample_weight',
        ...                           score='score_sample_weight')
        >>> gs.fit(X, y, fit_sample_weight=sample_weight,
        ...        score_sample_weight=score_sample_weight)
        """
        self._request_key_for_method('fit', 'sample_weight', fit)
        self._request_key_for_method('score', 'sample_weight', score)
        return self
```
I think we need that `deep` parameter for clone. `set_metadata_request` needs to only pertain to the needs of that estimator as consumer, not as router, and this is what should be copied in clone. However, for the purpose of actually doing the routing, there needs to be a way to get the "deep" metadata request.

On the topic of `request_*` methods: I think we should keep `set_metadata_request`, if only because of its use in `clone`, but we might emphasise that most users should not use it, as `request_*` methods should be a simpler, more explicit way of accessing the same functionality. But I'm struggling to work out how to neatly make `request_sample_weight` appear in only the consumers of `sample_weight`.

I'd now tweak the example in the previous comment such that neither `fit` nor `score` defaults to True or False. Both should default to None, and no change should occur.
I copy the variable itself now, this should solve the deep/shallow issue.
Need the method in metrics, but seems like we could do w/o for the other estimators. Happy to put it back. Made it private for now, since it's only used in
We just need to use the mixin only where it's appropriate. I'm not sure if I get where you see the challenge here.
Yes, agreed. There's another challenge, which is to have a set of defaults in different mixins, and have… I'm not totally happy with how it's done in… The tests now include the 4 use cases presented in the SLEP and they run, but I need to add tests to see if they're run correctly.
It works fine if there's no ambiguity. A more complicated model, e.g. with stacking, would mess that up.

Governance is not clear re updates to the SLEP. Does the proposed SLEP000 cover it?
SLEP000 has a "provisionally accepted":
I feel the safe choice would be to reject and stay in "Draft":
This way some of the concerns here can be resolved in the SLEP. https://github.com/scikit-learn/enhancement_proposals/pull/30/files
Hey @adrinjalali, I'd like to have this closer to the SLEP when we next bring it to vote, and I wonder whether:
@jnothman I'm happy to do both. I'm working on a cleaner solution for the same API. I'd be happy to have a sprint to work on this together as well.
Just to leave our scribbles from yesterday:

```python
# A messy case
# One option in grid search is weighted, the other is unweighted
# Messy because when do you work out all the requests, and that all metadata
# are consumed?
GridSearchCV(
    Pipeline([
        ('clf', None)
    ]),
    param_grid={"clf": [SVM().fit_requests(sample_weight=True),
                        SVM().fit_requests(sample_weight=False)],
                "clf__C": [1, 2, 3],
                }
).fit(X, y, sample_weight=sample_weight)


###############
# Request type behaviour

class UnchangedSingleton:
    __dict__ = None

    def __repr__(self):
        return "<UNCHANGED>"


from enum import Enum

class RequestType(Enum):
    UNREQUESTED = False
    REQUESTED = True
    ERROR_IF_PASSED = None
    UNCHANGED = UnchangedSingleton()


def fit_requests(self,
                 sample_weight: Union[RequestType, str, Literal[UNCHANGED]] = UNCHANGED):
    ...

fit_requests(sample_weight=metadata.REQUESTED)
fit_requests(sample_weight=metadata.UNREQUESTED)
fit_requests(sample_weight=metadata.ERROR_IF_PASSED)
fit_requests(sample_weight='my_weight')


###############
# Magic creation of `fit_requests` etc.

class RequestMethod:
    def __init__(self, name, keys):
        self.name = name
        self.keys = keys

    def __get__(self, instance, owner):
        # what we are avoiding
        def func(**kw):
            if set(kw) - set(self.keys):
                raise TypeError("Unexpected kws")
            instance._metadata_request = ...
            return instance
        func.__name__ = f"{self.name}_requests"
        func.__signature__ = inspect.Signature([
            inspect.Parameter(k,
                              inspect.Parameter.KEYWORD_ONLY,
                              default=metadata.UNCHANGED,
                              annotation=Optional[Union[RequestType, str]])
            for k in self.keys
        ], return_annotation=type(instance))
        func.__doc__ = "Something useful"
        return func


class BaseEstimator:
    def __init_subclass__(cls, **kwargs):
        metadata = defaultdict(set)
        for k, v in vars(cls).items():
            if k.startswith("_metadata_request__"):
                for request_method, request_keys in v.items():
                    metadata[request_method].update(request_keys)
        # now we have something like
        # metadata = {"fit": {"sample_weight", "groups"}}
        for request_method, request_keys in metadata.items():
            # set a fit_requests method
            setattr(cls, f"{request_method}_requests",
                    RequestMethod(request_method, sorted(request_keys)))
        """
        REQUEST_METHOD_MAPPING = [
            ('fit', "fit"),
            ("fit_transform", "fit"),
            ("fit_transform", "transform"),
            ("transform", "transform"),
        ]
        metadata = defaultdict(set)
        for method, request_method_name in REQUEST_METHOD_MAPPING:
            if method in attrs:
                # for example, if the class being instantiated has `fit`
                sig = inspect.signature(attrs[method])
                metadata[request_method_name].add()
        """
        super().__init_subclass__(**kwargs)
```
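Distilling those scribbles into something runnable (class and attribute names are my own guesses at the intended design, not the final API): `__init_subclass__` collects hypothetical `_metadata_request__<name>` class attributes and installs a descriptor-generated `fit_requests` method per routing method.

```python
import inspect
from collections import defaultdict


class RequestMethod:
    """Descriptor that fabricates a `<method>_requests` setter on access."""

    def __init__(self, name, keys):
        self.name = name
        self.keys = keys

    def __get__(self, instance, owner):
        def func(**kw):
            if set(kw) - set(self.keys):
                raise TypeError(f"Unexpected arguments: {set(kw) - set(self.keys)}")
            requests = getattr(instance, "_metadata_request", {})
            # True means "request under the parameter's own name";
            # a string is an alias; False drops the entry.
            requests.setdefault(self.name, {}).update(
                {k: (k if v is True else v) for k, v in kw.items() if v is not False}
            )
            instance._metadata_request = requests
            return instance

        func.__name__ = f"{self.name}_requests"
        func.__signature__ = inspect.Signature(
            [inspect.Parameter(k, inspect.Parameter.KEYWORD_ONLY, default=False)
             for k in self.keys],
            return_annotation=owner,
        )
        return func


class Base:
    def __init_subclass__(cls, **kwargs):
        # Collect _metadata_request__<name> = {"fit": {...}} class attributes
        # and expose one <method>_requests setter per routing method.
        merged = defaultdict(set)
        for attr, value in vars(cls).items():
            if attr.startswith("_metadata_request__"):
                for method, params in value.items():
                    merged[method].update(params)
        for method, keys in merged.items():
            setattr(cls, f"{method}_requests", RequestMethod(method, sorted(keys)))
        super().__init_subclass__(**kwargs)


class SVMLike(Base):
    _metadata_request__sample_weight = {"fit": {"sample_weight": "sample_weight"}}


est = SVMLike().fit_requests(sample_weight=True)
est._metadata_request  # {'fit': {'sample_weight': 'sample_weight'}}
```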
The three request states make sense. I do not see a clean way to avoid using the sentinel value (it reminds me of PEP 661).
Well, we can avoid the use of None for ERROR_IF_PASSED, and use a sentinel for that instead. That would allow us to use
What do we use for… I'm assuming we still want… Edit: I meant
Yes, True is fine. Strings used for key aliasing should usually be valid identifiers, so we can use a string sentinel containing a non-alphanumeric character as the "error if passed" sentinel without getting into too much trouble. It will only be seen by users who call get_metadata_request. I've not yet decided which is more friendly, that or None.
I think "error if passed" is more explicit than…

From the original SLEP, I was under the impression that

```python
# No metadata requested
log_reg = LogisticRegression()
log_reg.get_metadata_request()
# {'fit': {}, 'predict': {}, 'transform': {}, 'score': {}, 'split': {}, 'inverse_transform': {}}
```

Do we want

```python
log_reg.get_metadata_request()
# {'fit': {'sample_weight': 'error-if-passed'}, ...}
```
Yes, I think that's the right default request.
I've put an alternate implementation here: #20350. It is closer to what we have in the SLEP, and includes
* initial base implementation commit * fix test_props and the issue with attribute starting with __ * skip doctest in metadata_routing.rst for now * DOC explain why aliasing on sub-estimator of a consumer/router is useful * reduce diff * DOC add user guide link to method docstrings * DOC apply Thomas's suggestions to the rst file * CLN address a few comments in docs * ignore sentinel docstring check * handling backward compatibility and deprecation prototype * Update examples/plot_metadata_routing.py Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> * make __metadata_request__* format more intuitive and less redundant * metadata_request_factory always returns a copy * fix tests for the changed __metadata_request__* format * in example: foo->sample_weight, bar->groups * get_method_input->get_input * minor comments from Guillaume * fix estimator checks tests * Improved sample props developer API * fixes, updated doc, decorator * Add docstrings and some API cleanup * unify serialize/deserialize methods * Add more docstring to process_routing * fix MetadataRouter.get_params parameter mismatch * DOC add missing name to MethodMetadataRequest.deserialize docstring * DOC add MethodMapping.add docstring * DOC fix colons after versionadded * fix {method}_requests return type annotation * metadata_request_factory -> metadata_router_factory and docstring fixes * move 'me' out of the map in MetadataRouter * more docstring refinements * cleanup API addresses and create a utils.metadata_routing sub-folder * fix module import issue * more tests and a few bug fixes * Joel's comments * make process_routing a function * docstring fix * ^type -> $type * remove deserialize, return instance, and add type as an attribute * remove sentinels and use strings instead * make RequestType searchable and check for valid identifier * Route -> MethodPair * remove unnecessary sorted * clarification on usage of the process_routing func in the example * only print methods with non-empty 
requests * fix test_string_representations * remove source build cache from CircleCI (temporarily) * Trigger CI * Invalidate linux-arm64 ccache my changing the key * Trigger CI * method, used_in -> callee, caller * show RequestType instead of RequestType.value in _serialize() * more informative error messages * fix checking for conflicting keys * get_router_for_object -> get_routing_for_object * \{method\}_requests -> set_\{method\}_request * address metadata_routing.rst comments * some test enhancements * TypeError for extra arguments * add_request: prop -> param * original_names -> return_alias * add more tests for MetadataRouter and MethodMapping * more suggestions from Joel's review * fix return type * apply more suggestions from Joel's review * Christian\'s suggestions * more notes from Christian * test_get_routing_for_object returns empty requests on unknown objects * more notes from Christian * remove double line break * more notes from Christian * more notes from Christian * make type private * add more comments/docs * fix test * fix nits * add forgotten nit Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
* FEAT add metadata routing to splitters * forgot black * add end of file line * ignore mypy error * SIMPLE_SPLITTERS -> NO_GROUP_SPLITTERS * simplify comment * add notes on what the tests do * improve test and avoid using repr
Taking a stab at proposal 4 of the sample prop proposal (SLEP006 Routing sample-aligned meta-data).
Clearly a WIP; just putting it here for now for us to see how it'd look.
The first change I've made compared to the proposal is that the dict storing the required props includes a key indicating which method requires which props. This way score, fit, and predict can all have different requirements.

You can see the SLEP examples implemented in `test_props.py` as `test_slep_case*` functions.

Closes #9566, Closes #15425, Closes #3524,
Fixes #4497, Fixes #7136, Fixes #13432, Fixes #4632, Fixes #7646 (add test), Fixes #8127 (add test), Fixes #8158,
Enables #6322,
TODO:
#12052, #8710, #15282, #2630, #8950, #11429, #18028