[WIP] Add return_std option to ensembles #5532
Conversation
sklearn/ensemble/bagging.py (Outdated)

@@ -856,11 +855,13 @@ def __init__(self,
             random_state=random_state,
             verbose=verbose)

-    def predict(self, X):
+    def predict(self, X, with_std=False):
The GP module uses return_std.
(PR title changed from "[WIP] Add with_std option to ensembles" to "[WIP] Add return_std option to ensembles".)
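For context, a minimal sketch of the API under discussion. `GaussianProcessRegressor.predict` already accepts `return_std`; the ensemble counterpart shown in the comment at the end is hypothetical, since this PR was never merged:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.randn(50)

# Existing GP API: return_std returns the posterior predictive std.
gpr = GaussianProcessRegressor(alpha=1e-2).fit(X, y)
y_mean, y_std = gpr.predict(X, return_std=True)

# Proposed in this PR (hypothetical, never merged): the same keyword
# on ensemble regressors, e.g.
# y_mean, y_std = bagging.predict(X, return_std=True)
```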
# its standard deviation
x = np.atleast_2d(np.linspace(0, 10, 1000)).T

regrs = {"Gaussian Process": GaussianProcessRegressor(alpha=(dy / y) ** 2),
@jmetzen is this correct (it was previously `nugget`)? Shall we reuse the same `theta0`, `thetaU` and `thetaL` that were set previously? The plot is a bit different for GP now.
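For reference, the legacy `GaussianProcess` estimator took the noise as `nugget` (added to the diagonal of the correlation matrix), while `GaussianProcessRegressor` takes `alpha`, added to the diagonal of the kernel matrix. A hedged sketch of one plausible translation; the data setup mirrors the old noisy-targets example and is an assumption here:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.RandomState(0)
X = np.atleast_2d([1., 3., 5., 6., 7., 8.]).T
y = (X * np.sin(X)).ravel()
dy = 0.5 + 1.0 * rng.rand(y.shape[0])   # known per-sample noise scale
y += rng.normal(0, dy)

# Legacy API (removed): GaussianProcess(nugget=(dy / y) ** 2, ...),
# where nugget was expressed relative to the targets.
# With GaussianProcessRegressor, alpha is an absolute per-sample variance,
# so dy ** 2 may be the closer translation -- which is exactly what the
# review comment above is questioning.
gpr = GaussianProcessRegressor(alpha=dy ** 2).fit(X, y)
```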
Force-pushed from 017f412 to 8f604b6.
…t and Bagging regressors
REFACTOR Add option with_std to GaussianProcess.predict. This is consistent with the interface of RandomForestRegressor.predict. The old way of requesting the predictive variance via eval_MSE is deprecated.
REFACTOR Tests and examples of GaussianProcess use with_std instead of eval_MSE
ADD Example comparing the predictive distributions of different regressors
DOC Improved documentation of with_std parameter of predict() method
FIX Bug in BaggingRegressor using _parallel_predict_regression
DOC More consistent documentation of optional return-value y_std of predict
DOC Updated doc of predict() of BaggingRegressor and RandomForestRegressor
ENH Extending example plot_predictive_standard_deviation.py
Force-pushed from c9c43f4 to 27b5859.
Conflicts: sklearn/ensemble/bagging.py
Force-pushed from 27b5859 to 2e93e4d.
Here is the current output for the example, using bagged extra-trees and GPs. There is one important thing that I need to clearly mention somewhere: the standard deviation returned by bagging estimates the sampling variance of the ensemble, not the conditional distribution of the output. (In particular, since a random forest is consistent, the larger the training sample, the more its sampling variance will tend towards 0, since the forest will tend towards the "Bayes" model, i.e., always predict the true mean output value.)
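Concretely, the quantity discussed here can be sketched as the spread of the per-estimator predictions. A minimal sketch of the idea, not the PR's actual implementation:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import ExtraTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + 0.5 * rng.randn(200)

bag = BaggingRegressor(ExtraTreeRegressor(), n_estimators=100).fit(X, y)

# Spread of the individual estimators' predictions: an estimate of the
# ensemble's sampling variance, not of the noise on y.
all_pred = np.stack([est.predict(X) for est in bag.estimators_])
y_mean, y_std = all_pred.mean(axis=0), all_pred.std(axis=0)
```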
Also, if we want instead to model the conditional distribution of the output in forests, then we would have to switch to quantile regression forests, which would require significant changes in the forest code.
@jmetzen In light of my previous comments, I would suggest removing the comparison with respect to the mean log probability of noise-free samples. In particular, increasing the size of the training data makes it possible to make this value arbitrarily low, since the predictions of the forest will tend towards the noise-free values themselves with arbitrarily high probability. (Again, this stems from the fact that GP's returned stds and Bagging's returned stds correspond to different quantities...)
A better example might be to merge this one with … There is just one thing that I don't like regarding the API, which is that GPs and GBRTs cannot currently be used in the same way to compute prediction intervals. I'll make a proposal to update GBRT's API.
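For reference, GBRT already supports prediction intervals through quantile regression, but via a different interface than the GP's return_std, which is the asymmetry noted above. A minimal sketch with the existing GradientBoostingRegressor API (the data is made up):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + 0.5 * rng.randn(200)

# One model per quantile; a GP gets the same information from a single
# predict(X, return_std=True) call.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95).fit(X, y)
interval = np.stack([lower.predict(X), upper.predict(X)])  # ~90% interval
```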
Force-pushed from 4e3f802 to 523bbf9.
I agree that the comparison of GP and bagging with regard to the mean log probability should be removed, as the returned stds correspond to different quantities. Having a unified interface for GP and GBRT for return_std would be great, and a comparison of GPs and quantile regression with GBRT would be nice. Not sure how much effort this would be? Regarding your last question: when an estimator "only" supports returning the sampling standard deviation, we could use the keyword …
Can we split this PR separately for RandomForests and other ensembles? Because I want to implement SMAC, and I feel it would be great to have a …
Oh sorry, I totally forgot about this PR. Yes, please go ahead. We should just agree on the semantics of …
This PR looks awesome. Not a very helpful comment but nonetheless... :)
Great. I'll have a look over the weekend.
@MechCoder I am interested in SMAC (or tree-based black box optimisation), can we work together on that?
@glouppe I am okay with both of the options, that is either … On a related note, should we add …
@betatim Sure. I plan to implement a separate repo with a …
Variance comes from various sources, and it should be clear which one we are referring to. In our case, there are two main sources of variance: the sampling variance of the estimator (how much its predictions vary across training samples) and the noise in the conditional distribution of the output.
In the case of SMAC and other model-driven approaches, I believe what we are looking for is a measure of certainty of the prediction, in particular in regions where you have not yet sampled. So ideally, it is certainly a mix of both sources of variance... not sure what is best. Would be worth exploring in practice on a few problems.
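To make the distinction explicit, here is the standard bias-variance-noise decomposition (a sketch in textbook notation, not from the thread itself). For a fixed test point $x$, with training sets $D$ drawn at random and $\hat{y}_D$ the fitted model:

$$
\mathbb{E}_{D,\,y}\big[(y - \hat{y}_D(x))^2\big]
= \underbrace{\operatorname{Var}[y \mid x]}_{\text{noise}}
+ \underbrace{\operatorname{Var}_D[\hat{y}_D(x)]}_{\text{sampling variance}}
+ \underbrace{\big(\mathbb{E}_D[\hat{y}_D(x)] - \mathbb{E}[y \mid x]\big)^2}_{\text{bias}^2}
$$

The bagging std in this PR estimates only the sampling-variance term; the GP's returned std instead reflects posterior uncertainty about the regression function (and, depending on the model, the noise term).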
Nice! I would love to contribute too. I am willing to give some help and explore a few things regarding tree-based approaches. @betatim and I have nice applications at CERN :)
Is it straightforward to model the conditional distribution of new data given the training data in the case of RandomForests (your first option)? I'm asking because in GPs it is easier to interpret, since it is a conditional multivariate Gaussian.
Yes, the proper way is to do it through quantile regression (which is not currently supported in our RF implementation, but is already available in GBRT). It requires some work to have this in RF, but it is not that difficult to do either.
@glouppe @betatim The repository is here: https://github.com/MechCoder/BlackBox. I named it BlackBox because of my low creativity levels. Right now there is support for GP-based minimizers. It seems to work according to the tests but is slow. I would be obliged to give you push access if you want to push directly, and we can move it to somewhere more noticeable later. (I also think we can do away with the MRG+2 rule for now ;) )
@glouppe is this PR still relevant? Or will this live in scikit-optimize?
@amueller Not sure we converged about what quantities we would like to return.
I was gonna close this citing scikit-optimize, but it seems scikit-optimize hasn't been maintained for the past 3 years and the repo is now in archive mode. @lorentzenchr WDYT of this? |
scikit-optimize (and the projects it relies on) hasn't been maintained for ages. Please don't send people there; they end up making me feel guilty for no longer maintaining it :-/
From https://arxiv.org/abs/1311.4555: …
So this measures sampling variance, and what a user might expect is the std error of the prediction. So I would rather close as "not solved".
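For context, the sampling-variance estimator from that paper (the infinitesimal jackknife for bagging) can be sketched on top of BaggingRegressor's public attributes. A hedged sketch that omits the finite-B Monte Carlo bias correction from the paper; the data is made up:

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import ExtraTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + 0.5 * rng.randn(100)

bag = BaggingRegressor(ExtraTreeRegressor(), n_estimators=500,
                       bootstrap=True).fit(X, y)

X_test = np.linspace(0, 10, 5).reshape(-1, 1)
t = np.stack([est.predict(X_test) for est in bag.estimators_])  # (B, m)

# N[b, i]: number of times training point i appears in bootstrap sample b.
N = np.stack([np.bincount(idx, minlength=len(X))
              for idx in bag.estimators_samples_])               # (B, n)

# V_IJ(x) = sum_i Cov_b(N[b, i], t_b(x))^2
N_c = N - N.mean(axis=0)
t_c = t - t.mean(axis=0)
cov = N_c.T @ t_c / len(t)        # (n, m) per-sample covariances
V_IJ = (cov ** 2).sum(axis=0)     # sampling-variance estimate per test point
```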
Works for me. |
This supersedes #3645.