[MRG] Gradient Boosting Classifier CV #5689
Conversation
Thanks for working on this! This is indeed a very welcome addition :)
Ping us when you want us to start reviewing.
@glouppe @arjoly @pprett @amueller
@@ -146,7 +146,7 @@
 symbols, names = np.array(list(symbol_dict.items())).T
-quotes = [finance.quotes_historical_yahoo(symbol, d1, d2, asobject=True)
+quotes = [finance.quotes_historical_yahoo_ohlc(symbol, d1, d2, asobject=True)
I don't think this line should be modified.
You don't seem to use the warm_start option, which is crucial to get good speed performance.
    parameter `n_estimators` for the gradient boosting model
    """

    params['warm_start'] = True
@arjoly the warm_start parameter
Should we set this to True by default?
clarified IRL
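The warm_start mechanism being discussed can be sketched as follows. This is an illustrative example of how warm_start makes repeated fits cheap (each call only trains the newly requested stages), not the PR's actual code:

```python
# Sketch: growing a gradient boosting model incrementally with warm_start.
# Without warm_start, each fit would retrain all stages from scratch.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

gb = GradientBoostingClassifier(n_estimators=1, warm_start=True, random_state=0)
for n in range(1, 11):
    gb.set_params(n_estimators=n)  # only the new stages are fitted
    gb.fit(X, y)

print(len(gb.estimators_))  # 10 fitted boosting stages
```

This is what makes early stopping over `n_estimators` affordable: validation scores can be checked after each added stage without refitting the whole ensemble.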
I'm surprised this didn't fail the tests with kwargs.
Force-pushed from 42f0fd7 to 3620f90.
I have absolutely no idea why these tests are failing on Python 3.5. I haven't removed or changed any part of the code, just added. Moreover, I ran the tests locally in a Python 3.5 conda environment and all tests passed. Any ideas @MechCoder @amueller?
Force-pushed from 8366e44 to 5509e46.
I'm gonna give a shot at reviewing this!
@rvraghav93 Please, be my guest.
I think separate classes would be the way to go.
I fail to understand how. Could you explain a bit?
        cross-validation folds
    * ``cv_validation_scores``, the list of scores for each fold

    best_score_tuple_ : named tuple
Why is this different from GridSearchCV? (Why not best_score_?)
Because, apart from the score, I thought of adding attributes telling the user how the score varies with the train/test split. For example, if it varies significantly, something might be wrong.
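For illustration, the kind of attribute being discussed could look like the sketch below. The `CVScoreTuple` name and fields are assumed here (modelled on the convention of GridSearchCV's old `grid_scores_`), not the API this PR actually merged:

```python
# Hypothetical sketch of a named tuple bundling the best parameter value
# with its per-fold validation scores, so fold-to-fold variance is visible.
from collections import namedtuple
import numpy as np

CVScoreTuple = namedtuple(
    'CVScoreTuple',
    ['parameters', 'mean_validation_score', 'cv_validation_scores'])

fold_scores = np.array([0.91, 0.88, 0.93])  # made-up per-fold scores
best = CVScoreTuple({'n_estimators': 120}, fold_scores.mean(), fold_scores)

# A large spread across folds may signal an unstable model or a bad split.
print(best.cv_validation_scores.std())
```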
@rvraghav93
Ah okay, that makes sense now... BTW, are you in the middle of exams? If so I could ping you later.
@rvraghav93
If I understand correctly.
@rvraghav93 I am not sure, because the two classes use significantly different numbers of estimators.
    Parameters
    ----------
    n_stop_rounds : int, optional, default=10
        If the score on the test set rounded off to `score_precision` decimal
I would rather use "validation set" instead of "test set" throughout this class. This class is about cross-validation for model selection. We are not allowed to use the "final" test set here, as our interest is not model evaluation per se but model selection.
Also, could you please add GradientBoostingRegressorCV? I see no reason to include only classification and not regression in this PR.
@vighneshbirodkar Are you planning to continue this anytime soon? Or, if it's okay by you, I could lend a hand. (Either cherry-picking your commits or push access to this branch?) I have a couple of weeks to work on this.
@raghavrv I don't think I will be working on this anytime soon. Which of these two ways is more convenient for you? Most of Olivier's comments are minor refactoring changes. About the benchmarking for the threading backend, do you plan to do it now?
If it's okay by you I can cherry-pick into a new PR?
Yes...
@jnothman Should we also use a dict of np (ma) arrays here like we did at
Also ping @amueller regarding this...
    params['warm_start'] = True
    gb = estimator(**params)
    scorer = check_scoring(estimator, scoring=scoring)
    scores = np.ones((stop_rounds,))
Shouldn't this be np.full((stop_rounds,), -np.inf)?
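The reviewer's point about np.full versus np.ones can be illustrated with a hypothetical early-stopping check. The `should_stop` helper and the plateau data below are my own sketch, not the PR's code; the idea is that initialising the rolling window with -inf guarantees the first real scores count as improvements, whereas np.ones could spuriously look "already good":

```python
# Sketch of a rounded-score early-stopping window (assumed semantics:
# stop when the new score, rounded, beats none of the recent scores).
import numpy as np

def should_stop(recent_scores, new_score, precision=2):
    """True if new_score (rounded) does not improve on any recent score."""
    return round(new_score, precision) <= np.round(recent_scores, precision).max()

stop_rounds = 3
scores = np.full((stop_rounds,), -np.inf)  # not np.ones: -inf never wins

# Simulate a validation curve that plateaus around 0.85.
for i, s in enumerate([0.70, 0.80, 0.85, 0.851, 0.849, 0.850]):
    if should_stop(scores, s):
        print("stopping at iteration", i)
        break
    scores[i % stop_rounds] = s  # overwrite the oldest score
```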
Wonderful!!!
Replaced by #7071.
A reincarnation of #1036 with early stopping. I will add the documentation and an example soon.
The early stopping API is inspired by that of xgboost.