
Draft implementation of non-parametric quantile methods (RF, Extra Trees and Nearest Neighbors) #19754


Closed · wants to merge 17 commits

Conversation

@jasperroebroek (Author)

Dear scikit-learn maintainers and contributors,

This PR aims to provide a draft implementation of Random Forest Quantile Regression (https://jmlr.org/papers/volume7/meinshausen06a/meinshausen06a.pdf) and KNN Quantile Regression (https://www.tandfonline.com/doi/abs/10.1080/02664763.2015.1070807?journalCode=cjas20&journalCode=cjas20). Both methods are widely used and would be an interesting addition to scikit-learn, as they can estimate conditional quantiles without having to fit each quantile separately (thus providing significant speedups). I started the development of these algorithms from the implementation in scikit-garden, which no longer seems to be maintained.

Reference Issues/PRs

Fixes #11086
Takes some steps of #18997

What does this implement/fix? Explain your changes.

  • Random Forest Quantile Regression
  • Extra Trees Quantile Regression (which uses the same implementation as QRF)
  • KNN quantile regression
  • A weighted_quantile function, which aims to mimic np.quantile with the inclusion of a weight parameter (a sketch of the idea follows after this list).
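
As an illustration of the weighted_quantile idea, here is a minimal NumPy sketch (not the PR's actual compiled implementation): the quantile is read off a weighted empirical CDF, so uniform weights approximately reproduce np.quantile.

```python
import numpy as np

def weighted_quantile(values, q, weights):
    # Sort values and accumulate their weights into an empirical CDF.
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cdf = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights)
    # Linear interpolation between data points, as np.quantile does.
    return np.interp(q, cdf, values)

x = np.array([1.0, 2.0, 3.0, 4.0])
print(weighted_quantile(x, 0.5, np.ones_like(x)))  # 2.5 (matches np.quantile here;
print(np.quantile(x, 0.5))                         # agreement is approximate in general)
```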

Any other comments?

This is very much a draft PR. Although the code is documented, I have not yet created the actual documentation for these additional algorithms (as the API is not yet fixed). Additionally, if these methods are approved, the partial_dependence functions need to be changed (to pass the quantile parameter to the predict function).

Note that some parts are currently optimised with numba, rather than Cython. I am still attempting to do this conversion myself, but any help on this would be great.

@jnothman (Member) left a comment


Thanks! This looks quite nice after a glance at qrf.
Concerns:

  • We ordinarily avoid additional parameters to methods like predict. The exception has been return_std etc, and maybe this could fall under the same allowance.
  • Should we support predicting at multiple q simultaneously? For an array-like of q, do we return a matrix/tensor of results, or a list of vectors/matrices? (See the np.quantile convention illustrated after this list.)
  • I don't know what current thoughts are on numba, but I think it would be a big step to introduce a hard dependency on it. Certainly, this requires a core developer vote according to our governance document. It might even deserve a vote if we make it a soft dependency. The outcome could be rewriting those functions in Cython.
  • I'm somewhat concerned about maintaining two random forest implementations in parallel. I assume you've attempted to share as much code as possible, but we should also share tests where possible.
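
For reference, np.quantile's own convention for an array-valued q is to prepend a quantile axis to the output; an illustration (not code from this PR):

```python
import numpy as np

y = np.arange(12.0).reshape(6, 2)  # e.g. 6 samples, 2 outputs
print(np.quantile(y, 0.5, axis=0).shape)                # (2,): scalar q
print(np.quantile(y, [0.05, 0.5, 0.95], axis=0).shape)  # (3, 2): q axis prepended
```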

Maybe you can attend our core devs meeting on Monday to raise some of these. (Advertised on our mailing list. I would be more helpful if I were at my desk!)

@jasperroebroek (Author)

Thanks for having a look at the material!

The concerns that you raise are indeed issues that would require some more thought. For now, I just wanted to get an opinion on the API and to see if you would be interested in integrating these methods. If so, it would still require some work to smooth over some of these issues (hence the draft PR).

On the concerns:

  • I understand that this would be a major deviation from what is already out there. However, I do not see another possible way (besides including a setter for q before calling predict). I am not fully aware of the consequences of deviating from the template for the predict function. So far I have attempted to insert a dictionary argument into the partial_dependence functionality to tackle this issue, but there might be others that I have missed.
  • I would be in favour of doing that, which would make it consistent with what numpy does in np.quantile. It would require some more thought on how to actually achieve this, but I believe it should be feasible.
  • Creating a new dependency on numba, while Cython is already used, does not make a whole lot of sense from my perspective.
  • The implementation under the hood is exactly the same as the regular random forest. Some additional work is needed after fitting, but the base is the same. All tests for random forest that do not require predict should yield the same result for QRF (at least in its default implementation).

I will try to attend the meeting on Monday to discuss this in more detail.

@lorentzenchr (Member)

@jasperroebroek Thank you for this draft PR. Some more options for quantile regression would certainly be beneficial.
I only skimmed over the Random Forest Quantile Regression and now have a few questions and remarks:

  • Is it correct that single decision trees are fitted based on the squared error (MSE) splitting criterion? Does that make sense if our goal is to predict conditional quantiles?
  • IIUC, all quantiles are predicted with the same trees and the same splitting thresholds. One argument in linear quantile regression is that features may have very different effects (coefficients) when fitted for different quantiles. Wouldn't this single threshold for all quantiles weaken the predictiveness for at least some quantiles?

@jasperroebroek (Author)

@lorentzenchr thank you for taking the time to review.

So, let me try to answer your questions/remarks. Quantile Regression Forests are indeed fitted with exactly the same approach as a normal Random Forest would be. From there it does some postprocessing, but the core is the same (with the MSE splitting criterion etc.). In his paper, Meinshausen (see the link above) describes why, and under which conditions, this yields credible conditional quantiles. In short: pretty much all the time. After testing this a little, I believe that there are, as you describe, some constraints on the use of this methodology: locally, the mean should at least not be strongly negatively correlated with the quantile that you are trying to predict.

Issue #18849, comparing QRF and gradient boosting, shows some figures comparing the different approaches on synthetic sinusoidal data (#18849 (comment)). It shows that the methodology works quite well for both the upper and lower quantiles, even without any hyperparameter selection. (PS: I just realised you were the one commenting on the post, so ....)

In contrast, a Random Forest could be trained with a quantile loss function, probably yielding more accurate conditional quantiles; QRFs are likely faster and more flexible. I think it would be great to have both in sklearn (letting the user choose between these tradeoffs).
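
To make that postprocessing concrete, here is a minimal sketch of Meinshausen-style prediction on top of a plain scikit-learn forest. It is an illustration only: qrf_predict is a hypothetical helper, it ignores in-bag weighting, and it is far slower than the compiled code in this PR.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def qrf_predict(forest, X_train, y_train, X_test, q):
    train_leaves = forest.apply(X_train)  # (n_train, n_trees) leaf ids
    test_leaves = forest.apply(X_test)    # (n_test, n_trees)
    order = np.argsort(y_train)
    y_sorted = y_train[order]
    preds = np.empty(len(X_test))
    for i, leaves in enumerate(test_leaves):
        # Each training sample is weighted by 1/leaf_size in every tree
        # whose leaf it shares with the test point, averaged over trees.
        weights = np.zeros(len(y_train))
        for t, leaf in enumerate(leaves):
            in_leaf = train_leaves[:, t] == leaf
            weights[in_leaf] += 1.0 / in_leaf.sum()
        weights /= len(leaves)
        # Read the q-quantile off the weighted empirical CDF.
        cdf = np.cumsum(weights[order])
        preds[i] = y_sorted[min(np.searchsorted(cdf, q), len(cdf) - 1)]
    return preds

forest = RandomForestRegressor(n_estimators=50, random_state=0)
# forest.fit(X_train, y_train); qrf_predict(forest, X_train, y_train, X_test, 0.05)
```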

@jasperroebroek (Author)

After our discussion on Monday, I have tried to take the first steps in the cythonisation of the code. Although the code that I just pushed runs and produces satisfactory results, it is terribly slow in comparison to the previous version (which used numba for compilation). I believe it is now even slower than the original Python/NumPy implementation.

What I have:

  • A weighted_quantile function that takes pre-sorted data and (almost) does not require the GIL.
  • A weighted_quantile function that can do the sorting itself but does require the GIL (I am not sure whether this one is necessary).
  • A first draft of a compiled _quantile_forest_predict.

What is still missing:

  • The _quantile_forest_predict function does not compile with cdef with specified memoryviews (for some reason that I can't figure out).
  • The main code of this function can't be parallelised because of the previous point. Additionally, I can't figure out how to store the summed weights in each realisation of the parallel loops as a separate object (see the sketch after this list).
  • The second implementation (similar to quantregForest) would be easiest to implement by creating a subclass of the MSE criterion. The only difference needs to be the node_value function, which, instead of the weighted mean, returns a weighted random sample of the data in the node. I attempted this, but I have no idea how that would work with the random values inside the criterion.
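
On the per-loop storage question, one common pattern is to give each parallel worker its own accumulator and reduce the partial results afterwards (in Cython's prange this is typically one buffer row per thread). A minimal Python/joblib analogue of the same pattern, with a hypothetical partial_sum helper:

```python
import numpy as np
from joblib import Parallel, delayed

def partial_sum(weight_chunk):
    # Each worker sums only its own chunk: no shared mutable state.
    return weight_chunk.sum(axis=0)

weights = np.random.RandomState(0).rand(1000, 8)
chunks = np.array_split(weights, 4)
partials = Parallel(n_jobs=4)(delayed(partial_sum)(c) for c in chunks)
total = np.sum(partials, axis=0)  # reduce the per-worker results
assert np.allclose(total, weights.sum(axis=0))
```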

I have put a lot of comments and thoughts in the code, so hopefully it is still intelligible.

In short: I would really need some help from someone with more Cython experience to make this work. Would someone have the time and energy to go through this?

@jasperroebroek (Author)

Another update on the quantile regression forests. I have converted the code completely to Cython and changed the API as discussed (by not including the quantiles in the predict function, but by setting them beforehand). It currently runs smoothly and does what it is supposed to do, including predicting an array-like of quantiles simultaneously. If you agree on the API and implementation, I would be ready to switch to a regular PR (instead of a draft one).

@jasperroebroek marked this pull request as ready for review, April 21, 2021 08:47
@jasperroebroek (Author)

Another update on QRF. I added the mean pinball loss as the score function of the Random Forest Quantile Regressor. I have also uploaded an example of the QRF in action. Note that this is almost a direct copy of the new example of Gradient Boosting Quantile Regression. The workflow, results and conclusions are identical, and the final coverage fraction is even better reproduced by the QRF. I have changed the status of this PR from draft to regular. Happy to discuss further what needs to be added before it can be accepted.

@jasperroebroek requested a review from jnothman, April 28, 2021 17:38
@tim-oh commented Jun 21, 2021

I tried to install the package (in a conda env with git clone & python setup.py install), but get

```
ImportError: cannot import name 'RandomForestQuantileRegressor' from 'sklearn.ensemble' (/INSTALL_DIR/lib/python3.8/site-packages/sklearn/ensemble/__init__.py)
```

Should that work in principle?
Can I fix it?

@jasperroebroek (Author)

I believe that is caused by an incorrect compilation of the Cython source files. Not sure why that happens.

The workaround I have been using is to compile the right files after the installation of sklearn. To do this, create a second setup file next to the sklearn setup file and call it 'setup_qrf.py'. Put the compilation code for the QRF extensions inside (this is an example, I haven't tested this):

```python
from setuptools import setup
from distutils.extension import Extension
from Cython.Build import cythonize
import numpy

# The include dirs and OpenMP flags belong on the Extension objects,
# not on setup() itself.
ext_kwargs = dict(
    include_dirs=[numpy.get_include()],
    extra_compile_args=["-fopenmp"],
    extra_link_args=["-fopenmp"],
)

extensions = [
    Extension("sklearn.utils._weighted_quantile",
              ["sklearn/utils/_weighted_quantile.pyx"], **ext_kwargs),
    Extension("sklearn.ensemble._qrf",
              ["sklearn/ensemble/_qrf.pyx"], **ext_kwargs),
]

setup(ext_modules=cythonize(extensions))
```

In the terminal, compile with python setup_qrf.py build_ext --inplace

Now you should be able to import the RandomForestQuantileRegressor class.

Let me know if this works for you!

@ryancinsight

@jasperroebroek on windows i am getting the error Buffer dtype mismatch, expected 'long' but got 'long long', seems like there are other packages with similar issues where they had to resort to libc type default instead of numpy's

@ryancinsight

> @ryancinsight I don't think it is related to the weighted_quantile module, as that is not using any integer-type data. If I am correct, it is caused by assigning np.int64 explicitly outside the Cython code, while referring to it through the same mechanism as the system 'long' type. Again, I am not able at the moment to test this on a Windows machine. Sorry for the inconvenience. I would be very interested to know if this bug-fix actually resolves the issue. Thanks for testing ;)

Still there, but now:

```
File "sklearn\ensemble\_qrf.pyx", line 547, in sklearn.ensemble._qrf._RandomSampleForestQuantileRegressor.fit
ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long'
```

@ryancinsight commented Jul 2, 2021

@jasperroebroek I found the previous issue though: it was idx in your sample method's fit. It needs to be changed to idx = np.arange(self.n_samples_, dtype=np.int64)[mask], and it no longer gives me an issue. I just tested with np.arange(self.n_samples_, dtype=np.intp)[mask] and the new version also works, and seems slightly faster.
Best
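
A short illustration of why np.intp avoids the mismatch: scikit-learn's tree code indexes with the pointer-sized SIZE_t, and on 64-bit Windows a C 'long' is only 32 bits, so an explicit np.int64 buffer maps to 'long long' there, while np.intp always matches the pointer width (mask and sizes below are illustrative):

```python
import numpy as np

# np.intp is pointer-sized, matching the SIZE_t used by scikit-learn's
# tree code on every platform, unlike np.int64 on 64-bit Windows.
mask = np.zeros(100, dtype=bool)
mask[::2] = True
idx = np.arange(100, dtype=np.intp)[mask]  # the fix discussed above
print(idx.dtype.itemsize)  # 8 on any 64-bit platform
```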

@ryancinsight

Interesting: the default method is now faster than the sample method, which you said should be faster. Was this fixed? I'm not sure there is a need for two versions if the default is faster now.

@ryancinsight

Never mind; it seems that predicting multiple quantiles slows down the default method compared to the sample method... I wonder if there is a way to improve this, because the results seem more accurate with the default.

@jasperroebroek (Author)

@ryancinsight So if I understand it correctly, to solve the issue only line 539 needs to change into idx = np.arange(self.n_samples_, dtype=np.intp)[mask]?

Anyway, speaking about the performance difference between the two implementations: the default is more precise, but becomes prohibitively slow on datasets of between 10,000 and 100,000 samples. In my benchmark runs it is actually faster than the sampling approach for small datasets (hundreds to thousands of samples). With more than one quantile, the sampling method was almost always faster than the default implementation. Do you confirm these observations?

Anyway, thanks for checking out the code!
Cheers

@ryancinsight

@jasperroebroek That is correct: with the dtype on line 539 set, everything now runs smoothly on Windows 10.

I can also confirm your observations on performance.

Best regards, great work!

@Nbruneau9

Hello, I see that this thread has not been updated since July. Is this implementation of QRF planned to be integrated into sklearn in the future? It would be very useful. Thanks. Nicolas

@jasperroebroek (Author)

For those interested: I have published these methods as a separate package as a temporary place for development:
https://github.com/jasperroebroek/sklearn-quantile

@ogrisel (Member) commented Jan 26, 2022

> For those interested: I have published these methods as a separate package as a temporary place for development:
> https://github.com/jasperroebroek/sklearn-quantile

Thanks for doing this. To increase adoption, it would be worth publishing a release on PyPI & conda-forge, as well as publishing the rendered HTML doc on gh-pages, for instance.

Also, you should better highlight in the README when those methods are more interesting than quantile gradient boosting: in particular the fact that they are more efficient for people who want to estimate many quantiles at once for a given input (e.g. all the percentiles, to get a coarse estimate of the CDF of the predictive distribution).

I think it would be interesting to get those methods upstream in scikit-learn at some point, but I must confess we are running a bit short on reviewer bandwidth to work on such a big new refactoring.

@jasperroebroek (Author) commented Feb 4, 2022

I have now pushed the project to PyPI, so it should be installable with pip. I am still trying to figure out how to get it onto conda-forge too. Documentation (although really barebones) is available at https://sklearn-quantile.readthedocs.io/en/latest/?badge=latest. I hope that this is enough for some initial testing!

The code has now significantly diverged from this pull request. It still very much follows the structure of scikit-learn, so it should be relatively straightforward to get it into the main scikit-learn repository if needed.

@Bougeant commented Feb 8, 2022

Hi @jasperroebroek

First of all, awesome work! I have been looking for a replacement for scikit-garden, which has been inactive for a few years and is falling behind scikit-learn (compatible with <0.23 only).

I have questions regarding the predictions made using the two models (Random Forest from sklearn-quantile / scikit-garden, versus Gradient Boosting from scikit-learn / LightGBM).

Indeed, these predictions seem to be very different from one another.

Here's some code to demonstrate this:

```python
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn_quantile import RandomForestQuantileRegressor

housing = fetch_california_housing()
X, y = housing.data, housing.target
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, random_state=0)
quantile = 0.05

# Gradient Boosting conditional mean model.
# Models have different hyper-parameters to have similar test and train scores.
param_gb_mean = {"n_estimators": 100, "max_depth": 4, "min_samples_leaf": 60, "max_features": None, "random_state": 1}
gb_mean = GradientBoostingRegressor(**param_gb_mean)
gb_mean.fit(X_train, y_train)
y_mean_gb = gb_mean.predict(X_test)

# Gradient Boosting quantile model
param_gb_quantile = {"loss": "quantile", "alpha": quantile, "n_estimators": 100, "max_depth": 4, "min_samples_leaf": 60, "max_features": None, "random_state": 1}
gb_quantile = GradientBoostingRegressor(**param_gb_quantile)
gb_quantile.fit(X_train, y_train)
y_quantile_gb = gb_quantile.predict(X_test)

# Random Forest conditional mean model
param_rf_mean = {"n_estimators": 100, "max_depth": 10, "min_samples_leaf": 10, "max_features": 5, "random_state": 1}
rf_mean = RandomForestRegressor(**param_rf_mean)
rf_mean.fit(X_train, y_train)
y_mean_rf = rf_mean.predict(X_test)

# Random Forest quantile model
param_rf_quantile = {"q": quantile, "n_estimators": 100, "max_depth": 10, "min_samples_leaf": 10, "max_features": 5, "random_state": 1}
rf_quantile = RandomForestQuantileRegressor(**param_rf_quantile)
rf_quantile.fit(X_train, y_train)
y_quantile_rf = rf_quantile.predict(X_test)
```

Now you will note that while the average predicted conditional mean values (y_mean_gb and y_mean_rf) are within 0.2% of each other, the average predicted 5% quantiles are quite far apart (10% higher with the gradient boosting model; 6% after recalibrating both models so that exactly 5% of samples are over-predicted). I am seeing exactly the same thing on another dataset I'm interested in, but with a stronger effect (+15% for the gradient boosting model after recalibration).
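
A minimal sketch of one way to do the recalibration described above (an assumed interpretation: shift predictions by a constant offset until the empirical coverage is exactly q; recalibrate is a hypothetical helper):

```python
import numpy as np

def recalibrate(y_pred, y_true, q=0.05):
    # Choose the constant shift so that exactly a fraction q of the
    # true values fall below the shifted quantile predictions.
    offset = np.quantile(y_true - y_pred, q)
    return y_pred + offset

# e.g. y_quantile_gb_cal = recalibrate(y_quantile_gb, y_test)
```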

According to the pinball loss, which is the most commonly used quantile regression metric, the performance of the two models is equivalent (slightly better for the GB model in this case, but better for the RF model in the case of my dataset of interest, after spending some time optimizing each model).

More significantly, the shapes of the predicted quantiles as a function of the predicted conditional mean are wildly different, both for this dataset and mine (banana shape for the RF model).

For the Gradient Boosting model:

```python
fig = plt.figure(figsize=(12, 8))
sns.scatterplot(x=y_mean_gb, y=y_quantile_gb, alpha=0.1, s=10, linewidth=0)
plt.plot([0, 6], [0, 6], c="black", dashes=[5, 5])
plt.xlim(0, 6)
plt.ylim(0, 6)
plt.xlabel("Conditional Mean")
plt.ylabel("5% Quantile")
sns.despine()
```

[Scatter plot: predicted 5% quantile vs. predicted conditional mean, Gradient Boosting model]

For the Random Forest model:

```python
fig = plt.figure(figsize=(12, 8))
sns.scatterplot(x=y_mean_rf, y=y_quantile_rf, alpha=0.1, s=10, linewidth=0)
plt.plot([0, 6], [0, 6], c="black", dashes=[5, 5])
plt.xlim(0, 6)
plt.ylim(0, 6)
plt.xlabel("Conditional Mean")
plt.ylabel("5% Quantile")
sns.despine()
```

[Scatter plot: predicted 5% quantile vs. predicted conditional mean, Random Forest model]

I was thinking that this is not due to the difference in the type of model itself (Random Forest vs Gradient Boosting: the conditional mean models are very similar) but rather to how the quantile models are built. My understanding is that:

  • The Random Forest model
    • is built by splitting the branches/leaves in order to optimise MSE or MAE, not the quantiles
    • stores all the samples from the training data in each leaf during training. During inference, it generates a distribution of the training samples sharing the same leaves and selects the X% quantile as the prediction
  • The Gradient Boosting model
    • is built by splitting the branches/leaves in order to optimise the selected quantile loss (one would think that this guarantees a better model, but that's not what I'm seeing)
    • (correct me if I'm wrong) during training we store the selected quantile associated with each leaf, and during inference we take the average of the quantiles for the leaves we fall into
    • one obvious drawback of the GB model is that for multiple quantiles we need independent models, so nothing guarantees that the quantile predictions are monotonic (higher quantile = higher value), unlike the RF model

Can we explain why these quantile predictions are so different? Why does the Gradient Boosting model routinely predict average quantiles 10% larger than the Random Forest model?

@lorentzenchr, this may also interest you since it is somewhat related to a debate we had a year ago.

@jasperroebroek (Author)

Hi @Bougeant,

Thanks for reaching out and testing the package. I am very glad to hear that you have managed to install and run it without any issues!

Concerning the difference between Gradient Boosting quantile regression and Random Forest quantile regression: I am not really well versed in how Gradient Boosting works under the hood, except that it uses the pinball loss for training. If your list is accurate, it wouldn't surprise me that the lower quantiles in GB predict higher values than in RF. Averaging predicted quantiles would always be more conservative (closer to the mean) in comparison to calculating the quantile over all the data informing the prediction. (That would suggest lower predictions for GB for high quantiles.) To test this hypothesis, one might create both GB and RF with equal parameters and only one tree. In this case the only difference between the models should be in the training phase, one going for MSE and the other for the lowest pinball loss.
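
A sketch of that single-tree experiment, reusing the variables from the earlier snippet (quantile, X_train, y_train); the forest parameters assume sklearn-quantile forwards the usual RandomForestRegressor arguments such as bootstrap:

```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn_quantile import RandomForestQuantileRegressor

shared = {"max_depth": 4, "min_samples_leaf": 60, "random_state": 1}
gb_one = GradientBoostingRegressor(
    loss="quantile", alpha=quantile, n_estimators=1, learning_rate=1.0, **shared
)
rf_one = RandomForestQuantileRegressor(
    q=quantile, n_estimators=1, bootstrap=False, **shared
)
gb_one.fit(X_train, y_train)
rf_one.fit(X_train, y_train)
# With a single tree each, any remaining gap between the two predictions
# on X_test is attributable to the split criterion (pinball loss vs MSE)
# rather than to ensembling.
```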

The difference between training on MSE or pinball loss very much depends on the structure of the data. Locally, i.e. where the algorithm tries to split the data, both algorithms are expected to yield similar results as long as the quantile you are looking at is effectively positively correlated with the mean. In many cases this assumption is approximately correct, but not always. In those cases where it is not, training on the pinball loss would be much more accurate (which might, for example, be the case at the end of the banana shape).

To test the accuracy of both models, you might also have a look at K-Nearest Neighbours quantile regression (also available in sklearn-quantile), which does not use bootstrapping to come to a conclusion.

@Bougeant

Great suggestion to try models with one tree. Let me try that.

@jasperroebroek (Author)

Another update on this project. It just got accepted in conda-forge so it now works with conda!

@ogrisel changed the title from "Draft implementation of quantile methods" to "Draft implementation of non-parametric quantile methods (RF, Extra Trees and Nearest Neighbors)", May 12, 2022
@ogrisel (Member) commented Sep 7, 2022

@jasperroebroek out of curiosity, do you use honest trees in your implementation (in-tree validation sets for quantile estimation)? It seems that they can help get better calibration for quantile estimators. See for instance: #19710

@jasperroebroek (Author)

This implementation currently uses the default implementation of the random forest and extra trees algorithms. This is in line with the original research paper by Meinshausen. Undoubtedly, using quantile information at the time of training will give improved accuracy, but this will likely be (much) slower, and predictions can't be done with the same model for different quantiles.

@adam2392 (Member)

Adding some thoughts, cuz this PR is of interest to me. This PR's maintenance burden would most likely benefit from the PRs proposed to refactor trees.

Then, instead of a quantile forest Cython file, we could just create a QuantileTree class.

@jasperroebroek (Author)

The code could indeed partially benefit from an update to the Splitter.

The code that I have sent here contains two different strategies to calculate conditional quantiles with random forests. The first and most precise one, proposed by the original paper, calculates the quantiles after the random forest is trained with the standard Criterion and Splitter. It could be beneficial here to store all values per leaf, rather than only the mean, but the quantile calculation would still need to happen at prediction time, and it would make sense to me to keep it in its own class.

The second method takes advantage of sampling the values at leaf scale, which could be done in the Splitter and would be much more efficient. The Criterion would still remain the same, I assume. In this method, the quantiles are calculated on the outputs of the different trees.

I hope this information helps!
An updated and operational version of the code is available in the sklearn-quantile package.

@adam2392 (Member)

@jasperroebroek would you mind having a look at the following PR? I am proposing to separate the API for setting leaf/split nodes in the current Tree Cython code.

Then I included a very short pseudo-code demo of the potential quantile-tree implementation. Lmk if this looks like the right direction to you?

#25448

@jasperroebroek (Author) commented Jan 23, 2023

I just had a look at the code that you wrote on the quantile tree. I am a bit confused by the predict function that you define there. While this, in and of itself, might be valid, the prediction of the quantile regression forest is not based on the idea of an ensemble of predictions by its individual trees, as is the case in standard regression forests. Rather, it pulls all values (and weights) of the relevant leaves of all trees and calculates a weighted quantile on them. I don't see any other way than doing this operation in the quantile regression forest class itself.

@adam2392 (Member)

> I just had a look at the code that you wrote on the quantile tree. I am a bit confused by the predict function that you define there. While this, in and of itself, might be valid, the prediction of the quantile regression forest is not based on the idea of an ensemble of predictions by its individual trees, as is the case in standard regression forests.

Yes, this is the case of just reproducing a single quantile tree. However, with a single quantile tree that also optionally exposes the leaf values, the ensemble method should be nice and short. Rather, I'm asking, in terms of the API, whether you think this will fit the scope of what is needed by a quantile tree (forest). The main goal is to enable a quantile tree (forest) that simply needs to rewrite a handful of functions, while the rest of the tree-building etc. code lives in the base classes.

> Rather, it pulls all values (and weights) of the relevant leaves of all trees and calculates a weighted quantile on them. I don't see any other way than doing this operation in the quantile regression forest class itself.

Yep you're correct. The eventual goal would be to do this, but at the forest level.

@jasperroebroek (Author)

This definitely sounds like an improvement over how I initially implemented it, and could provide some nice speed improvements. I also implemented the sampling method used in the R package, which takes a weighted random value at leaf level and calculates the quantile at forest level. Although this is an approximation, it would make it so much faster that in most cases this would be the preferred algorithm. When implemented as a Tree class, that would be trivial, I suppose. If you need further input or help with the implementation, I am available.
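
For readers following along, a rough sketch of that sampling strategy on top of a standard forest. It is illustrative only: sample_forest_quantile is a hypothetical helper, the sampling is unweighted, and no in-bag bookkeeping is done.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def sample_forest_quantile(forest, X_train, y_train, X_test, q, seed=0):
    rng = np.random.RandomState(seed)
    train_leaves = forest.apply(X_train)  # (n_train, n_trees) leaf ids
    test_leaves = forest.apply(X_test)
    n_trees = train_leaves.shape[1]
    per_tree = np.empty((len(X_test), n_trees))
    for t in range(n_trees):
        # Store one randomly drawn training target per leaf of tree t...
        leaf_sample = {
            leaf: rng.choice(y_train[train_leaves[:, t] == leaf])
            for leaf in np.unique(train_leaves[:, t])
        }
        per_tree[:, t] = [leaf_sample[leaf] for leaf in test_leaves[:, t]]
    # ...then take a plain quantile over the per-tree values per sample.
    return np.quantile(per_tree, q, axis=1)
```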

@adrinjalali (Member)

I think having this as a separate package is ideal (which we already have), and with the work @adam2392 has been doing to make some parts of the codebase more extendable, we could close this one. Happy to reopen if y'all think otherwise.

@adam2392 (Member)

For completeness: Linking the package I've come to develop to shore up some of these gaps (maybe in future some aspects of them will make it into scikit-learn): https://github.com/neurodata/scikit-tree

Also note: https://github.com/zillow/quantile-forest
