Quantile Regression Forest [Feature request] #11086
Comments
FWIW, there is the scikit-garden implementation. I also spotted an example about prediction intervals with GBRT. @MechCoder @glouppe Was there any discussion about changing the tree API to support outputting the standard deviation?
@glemaitre, thanks for the information. Is scikit-garden in any way associated with scikit-learn? I've tried it and it is a nice implementation, but it seems a bit inactive lately in terms of development, responding to issues, and releasing updated versions to PyPI. I am not sure whether it is a better (or worse) idea to pool resources in one place for better development and maintenance, given how closely related the methods are.
@MechCoder @glouppe are contributors to both projects. I would say that this is a … Actually, maybe @betatim knows something about the project as well?
scikit-garden doesn't lack people who want to maintain it, but it lacks people who have the time to maintain it :) If someone wants to help out, I think they would be welcomed with open arms and given a lot of authority.
If you want to return the std dev together with the predictions, that would require changing the scikit-learn interface. So I think being freed from that constraint by being a separate package (scikit-garden) is a good thing. Put another way: the effort needed to change scikit-learn is much larger than the effort needed to help update scikit-garden.
Quantile regression in GBRT works well, but I always end up writing a small wrapper that bundles three estimators together so that I can return the std dev: https://github.com/scikit-optimize/scikit-optimize/blob/1c4c0f12ad8c1fe0a33542108fc0e55164138f9d/skopt/learning/gbrt.py#L14 In scikit-optimize we also have other forest-based quantile regression: https://github.com/scikit-optimize/scikit-optimize/blob/1c4c0f12ad8c1fe0a33542108fc0e55164138f9d/skopt/learning/forest.py
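A minimal sketch of such a three-estimator wrapper, assuming three `GradientBoostingRegressor` models with `loss="quantile"` at the 16th, 50th and 84th percentiles, and a std estimated as half the gap between the 16th and 84th percentile predictions (a Gaussian approximation). The class name `QuantileGBRTBundle` is hypothetical; this is not the code at the linked scikit-optimize URL, only an illustration of the idea.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.ensemble import GradientBoostingRegressor


class QuantileGBRTBundle(RegressorMixin, BaseEstimator):
    """Hypothetical wrapper: three quantile GBRTs -> median plus a rough std."""

    def __init__(self, estimator=None, quantiles=(0.16, 0.5, 0.84)):
        self.estimator = estimator
        self.quantiles = quantiles  # assumed (low, median, high), ascending

    def fit(self, X, y):
        base = GradientBoostingRegressor() if self.estimator is None else self.estimator
        # One independently trained model per quantile, sharing base's settings.
        self.regressors_ = [
            clone(base).set_params(loss="quantile", alpha=q).fit(X, y)
            for q in self.quantiles
        ]
        return self

    def predict(self, X, return_std=False):
        low, median, high = (reg.predict(X) for reg in self.regressors_)
        if not return_std:
            return median
        # Under a Gaussian assumption the 16th/84th percentiles lie about one
        # std below/above the median, so half their gap approximates the std.
        return median, (high - low) / 2.0


# Usage on synthetic data with noise std 1.0.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(300, 1))
y = X.ravel() + rng.normal(scale=1.0, size=300)
model = QuantileGBRTBundle(GradientBoostingRegressor(n_estimators=50, random_state=0))
median, std = model.fit(X, y).predict(X[:5], return_std=True)
```

The cost of this design is the one described in the comment: three models are trained, one per quantile, and a quantile not chosen at fit time cannot be predicted later.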
Dear scikit-learn maintainers and contributors, although this is a bit of an old issue, I would like to reach out to you about exactly this class of models. This might become a bit of a long post, but please bear with me.
I recently tried the scikit-garden implementation of these models, but it does not seem suitable for data with more than around 10,000 samples. Since then I have tried rewriting the implementation (to improve performance) and reaching out to the scikit-garden maintainers. Sadly, the package no longer seems to be maintained (the last commit to master is four years old). As I continued working on the model, I wanted to see whether it would be possible to contribute it directly to scikit-learn, as it is fully based on (and consistent with the interface of) scikit-learn.
I believe it is worth including in scikit-learn because it does something quite different from the only other quantile method currently implemented (GradientBoostingRegressor): it predicts conditional quantiles from a single model, i.e. you only need to train it once and can then predict any quantile. (The trade-off is that it gives a less precise estimate of the conditional quantile.)
I currently have two implementations: 1) following the original paper, and 2) following the concept of the quantregForest implementation. The first one pools the weighted training samples from all trees for each prediction and calculates the weighted quantile. The second one assigns each leaf a weighted random draw from its values and calculates the quantile on these random samples. The second one is obviously an approximation, but it is substantially faster.
Here I include a figure that compares both approaches with the standard scikit-learn random forest (with the MAE criterion, which yields the median) and with GradientBoostingRegressor for the 90th percentile (on the Boston dataset).
This method yields exactly the same results as the original scikit-garden implementation and passes its unit tests (with some slight modifications for different properties). However, I don't know how to proceed from here. I am new to collaborating on community-maintained packages and I don't really know what to do or whom to approach, hence my extensive comment here. I have committed my code to my fork of scikit-learn (https://github.com/jasperroebroek/scikit-learn), but it seems a bit premature to actually open a PR. Also, I am currently using numba to speed up for-loops. As scikit-learn uses Cython, I suppose this additional dependency would not be very well received. I have attempted to rewrite it in Cython, but I have not yet succeeded. So, if this is of interest to the scikit-learn community, I would love to receive some guidance on the path to follow from here.
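The two strategies described in this comment can be sketched on top of a fitted `RandomForestRegressor` using its `apply` method, which returns the leaf index of each sample in each tree. The function names are hypothetical and this is not the code in the linked fork. The second function is one reading of the quantregForest-style approximation: keep one training target drawn uniformly from each leaf (uniform within a leaf matches the per-tree weights of the exact method), then take an unweighted quantile over trees. The exact version builds a dense (n_query × n_train) weight matrix, so this sketch only suits small data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor


def leaf_weights(forest, X_train, X_query):
    """w[j, i]: weight of training sample i for query point j (paper's weights)."""
    train_leaves = forest.apply(X_train)  # (n_train, n_trees) leaf ids
    query_leaves = forest.apply(X_query)  # (n_query, n_trees) leaf ids
    n_trees = train_leaves.shape[1]
    weights = np.zeros((len(X_query), len(X_train)))
    for t in range(n_trees):
        # same[j, i] is True when query j and training sample i share a leaf in tree t
        same = query_leaves[:, [t]] == train_leaves[:, t][None, :]
        weights += same / same.sum(axis=1, keepdims=True)
    return weights / n_trees


def weighted_quantile(y, w, alpha):
    """Smallest y whose cumulative weight reaches alpha of the total."""
    order = np.argsort(y)
    cum_w = np.cumsum(w[order])
    idx = np.searchsorted(cum_w, alpha * cum_w[-1], side="left")
    return y[order][min(idx, len(y) - 1)]


def qrf_predict_exact(forest, X_train, y_train, X_query, alpha):
    """Implementation 1: pool weighted training targets from all trees."""
    W = leaf_weights(forest, X_train, X_query)
    return np.array([weighted_quantile(y_train, w, alpha) for w in W])


def qrf_predict_sampled(forest, X_train, y_train, X_query, alpha, seed=0):
    """Implementation 2 (approximation): one random target kept per leaf."""
    rng = np.random.default_rng(seed)
    train_leaves = forest.apply(X_train)
    query_leaves = forest.apply(X_query)
    drawn = np.empty(query_leaves.shape)
    for t in range(train_leaves.shape[1]):
        rep = {}  # leaf id -> one randomly drawn training target
        for leaf in np.unique(train_leaves[:, t]):
            members = np.flatnonzero(train_leaves[:, t] == leaf)
            rep[leaf] = y_train[rng.choice(members)]
        drawn[:, t] = [rep[leaf] for leaf in query_leaves[:, t]]
    return np.quantile(drawn, alpha, axis=1)


# Usage on synthetic data: y = x + N(0, 1) noise.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(500, 1))
y = X.ravel() + rng.normal(scale=1.0, size=500)
forest = RandomForestRegressor(n_estimators=50, min_samples_leaf=5, random_state=0).fit(X, y)
X_new = np.array([[2.0], [5.0], [8.0]])
median = qrf_predict_exact(forest, X, y, X_new, 0.5)
q90 = qrf_predict_exact(forest, X, y, X_new, 0.9)
q90_approx = qrf_predict_sampled(forest, X, y, X_new, 0.9)
```

Both functions reuse the same fitted forest for every `alpha`, which is the "train once, predict any quantile" property mentioned above; only the cheap quantile step is repeated.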
FWIW, regarding this issue, there is an actively maintained, scikit-learn compatible/compliant Quantile Regression Forest implementation available here: https://github.com/zillow/quantile-forest
Hi scikit-learn owners and contributors,
I am wondering whether scikit-learn would consider implementing quantile regression forests: http://www.jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf.
It is a highly cited paper, and there is already an R package: https://cran.r-project.org/web/packages/quantregForest/quantregForest.pdf
This method has been widely used in various quantile regression problems.
From an implementation perspective, it is also a natural extension of random forests, given that scikit-learn already has a good random forest implementation. Most of the computation is performed by the underlying random forest.
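For reference, the estimator in the linked paper can be summarized as below; the notation is ours and only meant as a sketch. With $k$ trees $\theta_t$, leaf region $R_{l(x,\theta_t)}$ reached by $x$ in tree $t$, and training pairs $(X_i, Y_i)$, $i = 1, \dots, n$:

```latex
w_i(x) = \frac{1}{k} \sum_{t=1}^{k}
  \frac{\mathbf{1}\{X_i \in R_{l(x,\theta_t)}\}}{\#\{j : X_j \in R_{l(x,\theta_t)}\}},
\qquad
\hat F(y \mid x) = \sum_{i=1}^{n} w_i(x)\, \mathbf{1}\{Y_i \le y\},
\qquad
\hat Q_\alpha(x) = \inf\{\, y : \hat F(y \mid x) \ge \alpha \,\}.
```

The usual random forest prediction uses the same weights, $\hat\mu(x) = \sum_{i} w_i(x)\, Y_i$; the quantile version keeps the whole weighted distribution of $Y_i$ instead of only its mean, which is why the forest itself does most of the work.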
In addition, R's extraTrees package also has quantile regression functionality, implemented very similarly to quantile regression forests. So if scikit-learn implemented quantile regression forests, it would be a relatively easy task to add the same functionality to the extra-trees algorithm as well.
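To make the "easy to add to extra-trees" point concrete: in this sketch the quantile step needs nothing from the ensemble except leaf membership via `apply`, which both `RandomForestRegressor` and `ExtraTreesRegressor` expose, so one function serves both. `forest_quantiles` is a hypothetical helper, not a scikit-learn API, and it builds a dense weight matrix, so it only suits small data.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor


def forest_quantiles(forest, X_train, y_train, X_query, alpha):
    """Hypothetical: leaf-weighted alpha-quantile for a fitted sklearn tree ensemble."""
    train_leaves = forest.apply(X_train)  # (n_train, n_trees)
    query_leaves = forest.apply(X_query)  # (n_query, n_trees)
    n_trees = train_leaves.shape[1]
    weights = np.zeros((len(X_query), len(X_train)))
    for t in range(n_trees):
        same = query_leaves[:, [t]] == train_leaves[:, t][None, :]
        weights += same / same.sum(axis=1, keepdims=True)
    weights /= n_trees
    order = np.argsort(y_train)
    cum_w = np.cumsum(weights[:, order], axis=1)
    # Per query row: index of the first sorted target whose cumulative weight reaches alpha.
    idx = np.sum(cum_w < alpha * cum_w[:, -1:], axis=1)
    return y_train[order][np.minimum(idx, len(y_train) - 1)]


# Same data, two ensemble types, an 80% interval from each.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(400, 1))
y = X.ravel() + rng.normal(scale=1.0, size=400)
X_new = np.array([[3.0], [7.0]])
results = {}
for Forest in (RandomForestRegressor, ExtraTreesRegressor):
    forest = Forest(n_estimators=50, min_samples_leaf=5, random_state=0).fit(X, y)
    results[Forest.__name__] = (
        forest_quantiles(forest, X, y, X_new, 0.1),
        forest_quantiles(forest, X, y, X_new, 0.9),
    )
```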
Please let me know if this is possible. Thanks.