Closed
Description
Describe the workflow you want to enable
I think it would be awesome if the RF regressor returns the standard deviation (not only the mean) of the output of the different trees.
Describe your proposed solution
This is not a definite option, but contains the core idea:
from sklearn.ensemble import RandomForestRegressor
import numpy as np
class RandomForestWithUncertainty:
def __init__(self,**args):
self.model = RandomForestRegressor(**args)
def predict(self, X, return_std = False):
pred = np.array([tree.predict(X) for tree in self.model]).T
pred_mean = np.mean(pred, axis=1)
if return_std:
pred_var = (pred - pred_mean.reshape(-1,1))**2
pred_std = np.sqrt(np.mean(pred_var, axis=1))
return pred_mean, pred_std
else:
return pred_mean
Describe alternatives you've considered, if relevant
No response
Additional context
This is useful if we want to use the RF in algorithms that require uncertainties estimatios, such as Bayesian Optimization.