-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+1] Ensure delegated ducktyping in MetaEstimators #2854
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
isn't there simple and stupid solution to this problem? meta-ducktyping and decorators are a huge code complexity overhead. |
Yes: don't use hasattr, and specify capabilities some more explicit way, On 14 February 2014 03:14, Alexandre Gramfort notifications@github.comwrote:
|
And it's possible to implement without a custom descriptor, just using A final alternative that avoids decorators is implementing |
I would not have expected that the GridSearch I initially wrote in 2010 would become such as beast (monster?). I am really -1 on this PR. There must be a simple solution. Why is using hasattr so bad for you? |
It's really too big. It's a problem. The code is hard to follow and bugs I think that their is a general problem of scope and organization in the |
hasattr only works with this sort of patch, sorry to say. Hasattr is all fine when the functional attributes of a class do not change depending on its instance. See #1805.
Not bugs so much as edge cases.
I think some recent refactoring has benefited, but there is certainly room for improvement. But things like scorers which are really there to patch over issues in API design certainly help to make appearances of a mess. |
Your original implementation used: if hasattr(best_estimator, 'predict'):
self.predict = best_estimator.predict
if hasattr(best_estimator, 'score'):
self.score = best_estimator.score Good luck pickling that object. Identify a better serialisation method and the code can be cleaner. |
This is a bug I've been considering how to fix for most of a year. If there "must be a simple solution", @agramfort, please help me find it, rather than rejecting for cosmetic/sentimental reasons. |
rather than having GridSearchCV adding constraints on the estimators implementations, which ultimately creates frameworkish code, I would patch GridSearchCV with properties. It might be more verbose / less elegant but your solution, although pythonic, has the downside that new comers who want to contribute need to be python experts. scikit-learn needs more people able to write algorithms and boost training speed than python experts. so I would have done:
by reading this I know what it's doing immediately. |
Thanks. I wrote this descriptor because it made the property's purpose clearer by giving it a name. I'm happy to remove it, but your solution doesn't work. I think what you meant to write is: @property
def predict(self, X):
return self.best_estimator_.predict This is what we have in master (and which I wrote when I thought it was sufficient, but still unclear). But #2853 is broken still: So a minimal fix with @property
def predict(self):
self.estimator.predict # determine hasattr response
return lambda X: self.best_estimator_.predict(X) Although explicit, I find this exceedingly verbose and hard to understand both its effect and its motivation at a glance. But it certainly works, and I'm happy to use this solution if there's some consensus that it's the better choice. |
I'm not sure if this line is gonna work. Python may optimize it as a no-op. |
Is there anywhere you know of that suggests attribute lookup side-effects shouldn't be taken for granted? I don't have a problem with the following equivalent: @property
def predict(self):
if hasattr(self.estimator, 'predict'):
return lambda X: self.best_estimator_.predict(X)
else:
raise AttributeError('estimator cannot predict') If that suits @agramfort, I'm happy to implement it, but it is somewhat repetitious. |
The main problem with this approach is it seems an awkward way to write it. A custom decorator makes it look a bit magic, but it seems to me a bit more straightforward to read. |
@mblondel It isn't a no-op since it can raise an exception, so Python cannot optimize it away without breaking its own semantics (this is why writing optimizing Python compilers is hard). I actually find that solution quite readable. |
Which one is "that one"? I guess I need to number them now... On 18 February 2014 20:49, Lars Buitinck notifications@github.com wrote:
|
This one.
Maybe change the comment to "trigger AttributeError before call" |
I also should note that a solution like that's not entirely appropriate in And I don't think "trigger AttributeError before call" is sufficient for people who don't realise triggering |
awkward but explicit and simple :) magic often ends up bringing more problems than expected |
I think that depends on what you mean by explicit. I think we're talking On 18 February 2014 21:04, Alexandre Gramfort notifications@github.comwrote:
|
It seems like @agramfort and @larsmans like the solution you suggested in #2854 (comment). Instead of abstracting things away, perhaps you should use this solution wherever needed (at the cost of some code duplication).
Can't you use private methods in this case? |
Yes, in one of my previous attempts to fix this bug in Pipeline I did exactly that, using a helper method, and received a critique along the lines of "flat is better than nested". I think it has to be admitted that someone will dislike the solution to this because it can't be done super-neatly. Either we:
Ultimately I think we need to fix this bug, and not worry too much about the cosmetics until someone can come up with a much better solution. |
81096ef
to
52f2812
Compare
(rebased) |
+1 for merge. I just got bit by this again, and it needs fixing. |
from .externals.six import iteritems | ||
|
||
__all__ = ['Pipeline', 'FeatureUnion'] | ||
|
||
|
||
# One round of beers on me if someone finds out why the backslash | ||
# is needed in the Attributes section so as not to upset sphinx. | ||
iff_final_estimator_has_method = make_delegation_decorator('_final_estimator') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to give this name instead of just using the function?
I am also in favor of the general approach, but I am not 100% sure I understand why the implementation looks like it does. |
class _IffHasAttrDescriptor(object): | ||
def __init__(self, fn, attr): | ||
self.fn = fn | ||
self.get_attr = attrgetter(attr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the benefit of this way compared to storing attr
and calling getattr(obj, attr)
?
It seems to me you can simplify def _iff_attr_has_method(prefix, fn):
attr = '%s.%s' % (prefix, fn.__name__)
out = _IffHasAttrDescriptor(fn, attr)
update_wrapper(out, fn)
return out Other than that 👍 for merge from me, too. |
Actually, the |
Doing return lambda fn: _IffHasAttrDescriptor(fn, '%s.%s' % (prefix, fn.__name__)) in |
@jnothman what do you think about that simplification? |
return out | ||
|
||
|
||
def _iff_attr_has_method(prefix, fn, name=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have the impression that this function is used only once. Thus maybe not all the genericity is required (for instance, name is always None, so it can be removed). I think that by focusing on little things like this, the code can be made simpler.
Lots of great ideas here, and impressive feat of design. However, I must say that I am worried about the code of the PR: it really is too much meta-programming code, that is hard to understand (I spotted in the discussion at least one person who said he was not following well). I think that going along this direction is dangerous, and I am -1 on merging. This is the kind of code that we wrote a lot in mayavi2. The core devs (2 people, including me) were really enjoying it. We had a lot of features from such a meta-programming model. That got us a lot of users. But nobody converted to being a core dev. Such type of code is not good for including new people. To give specific general advice, I would strive to: avoid decorators as much as possible (there will be some, in the final solution), avoid 'partial', and functools as much as possible. More specifically, I have the feeling that some indirection can be avoided. For instance, a line like 'iff_estimator_has_method = make_delegation_decorator('estimator')' that actually defines a a decorator using a factory, which is in a different file, is very hard to follow. Multiple indirections are just something that the human brain is not good at. _IffHasAttrDescriptor is a class, that is used only once, inside _iff_attr_has_method, that is itself used only once, inside make_delegation_decorator (@amueller made a suggestion that removes an indirection, actually). Here is a high-level suggestion to make the design easier: instead of decorating existing method, rename them: predict_proba would become _predict_proba. Then you can write a simpler property, because it has less work to do. Maybe it is enough to write it explicitely: @property def predict_proba(self): self.final_estimator.predict_proba return self._predict_proba def _predict_proba(self, ...) I can see that the argument against this is to avoid repeating oneself: variants of these lines of code will be in many places in the code base. But, to replace these 4 lines of fairly simple code by 1 line (thus gaining only 3 lines), this PR adds a very non trivial file with 133. Thus, in terms of mere lines of code, the pattern should occur 44 times for the meta-programming to be worth it. This count is actually probably dishonest, because the code above needs at least one line of comment. However, I think that even if it happened more, I would still prefer the low-tech solution, because it is easier to understand and maintain. |
@GaelVaroquaux are you talking about the tests? |
Doh! You are write, I read the wrong file. Those tests are actually super useful, they need to be kept. Maybe I'd like to see you suggestion coded. I don't have an exact feeling of how much it clears things up. |
I am also undecided on the current solution. The code is compact, well tested but hard to follow. Using @GaelVaroquaux's explicit (property + private method) pairs on the other hand, while explicit introduce many redundant lines. I am also curious to see how the simplifications suggested by @amueller would look.
Edit: sorry the above does not make sense. |
See #3982. |
We don't need if not hasattr(obj, self.attr):
raise AttributeError |
As far as I know, that doesn't work. |
Curiously the |
Oh of course. The attrgetter was intended to handle nested names. Forgot. And yes, this sphinx thing is has the potential to get messy :( On 20 December 2014 at 07:56, Andreas Mueller notifications@github.com
|
Superseded by #3982, which was just merged. |
Supersedes #2019 to fix #1805, with a more readable (?) reimplementation, and an extension of the test to fix #2853.
This patch ensures that:
GridSearchCV
RandomizedSearchCV
Pipeline
RFE
andRFECV
have
hasattr(metaest, method) == True
iff the sub-estimator does for the set of standard methods:inverse_transform
transform
predict
predict_proba
predict_log_proba
decision_function
score
(with some exceptions where delegation doesn't apply).
To fix #2853,
hasattr
must be True before the metaestimator isfit
, and if the delegating method is called before fit, an exception will be raised, as with other un-fit estimators.Some alternatives are posed here and below:
if hasattr
and nested function ([MRG+1] Ensure delegated ducktyping in MetaEstimators #2854 (comment))