[WIP] first cut at LambdaMART #2580
Conversation
One option we considered with @fabianp to support queries without changing the API too much is to have a two-column y where the second column is the query index. It just means that y.ndim can now be 2.
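A minimal sketch of what that encoding could look like (the exact column layout is an assumption for illustration, not a settled proposal):

```python
import numpy as np

# Hypothetical 2-column target: column 0 holds the relevance
# scores, column 1 holds the query id each sample belongs to.
y = np.array([
    [2, 0],  # relevance 2, query 0
    [0, 0],  # relevance 0, query 0
    [1, 1],  # relevance 1, query 1
    [3, 1],  # relevance 3, query 1
], dtype=np.float64)

relevance = y[:, 0]
query_ids = y[:, 1].astype(np.int32)
```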
@agramfort I think that might indeed be a better solution than having an additional
Sounds good. Plus, such an API would be compatible with GridSearchCV. In the future, we can create query-aware cross-validation generators. @jwkvam I think I would use a dedicated class for LambdaMART.
Thanks for the feedback, adding the queries into the labels seems reasonable to me. If this has its own class, I'm trying to think of the ways it would be different than
In case of ranking with a single query,
I am not strongly opposed to it, but I still find it a bit weird to mix integer coding for the relevance scores with floating-point coding for the query ids in the same array.
Indeed we will need it at some point.
@jwkvam I think it's OK not to use the regressor mixin and not to introduce a new ranking mixin yet. If we ever implement more ranking models in the future, it should be fine to devise a new ranking mixin to factorize the common code at that point.
Also, a 2-D y could be used for multi-output ranking.
I think we need a general concept of "sample metadata": data structures with a first axis that is

There might be more cases that I forgot. The policy until recently was to pass them as separate variables to the
However, some parts of our API are currently broken / too limited. For instance, transformers currently do not transform (or preserve)
For now I would thus keep the query ids as a separate argument to
There is also the problem of feature-wise metadata, such as feature names, that could be interesting to better preserve or trace when working with
This is an even better reason not to implicitly stick the query membership info directly as a new column in
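A minimal sketch of the separate-argument API being converged on here (the `LambdaMART` name comes from this PR; the exact constructor and `query` keyword are assumptions about the work-in-progress code):

```python
import numpy as np
from sklearn.ensemble import LambdaMART  # hypothetical import added by this PR

X = np.random.rand(6, 4)
y = np.array([2, 0, 1, 3, 0, 1], dtype=np.float64)    # relevance scores
query = np.array([0, 0, 0, 1, 1, 1], dtype=np.int32)  # one query id per sample

# Query ids travel as a separate fit parameter instead of a column of y.
est = LambdaMART(n_estimators=50, learning_rate=0.1)
est.fit(X, y, query=query)
```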
Alright, then
In any case,
@mblondel makes sense for some ranking models (e.g. RankingSVM), but does it make sense for LambdaMART? Does it optimize AUC if you feed it binary relevance scores?
+1 or maybe even
If you feed LambdaMART binary relevance scores, it just optimizes NDCG with rel in
The way I see it, ranking with a single query is similar to ordinal regression in the case when the supervision is given as relevance scores (from which you can derive a ranking). In the case of pairwise or listwise methods, the supervision can also take the form of pairwise or listwise orderings of the samples, i.e., without the notion of relevance score. NDCG is nice because it focuses on the top k elements in the ranking, whereas AUC takes into account the entire ranking (usually you care more about the top elements).
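For reference, a pure-Python sketch of the NDCG@k metric being discussed (a generic textbook formulation, not the PR's code; gain and discount conventions vary):

```python
import numpy as np

def ndcg_score(y_true, y_score, k=10):
    """NDCG@k for a single query: DCG of the predicted ordering,
    normalized by the DCG of the ideal (label-sorted) ordering."""
    order = np.argsort(y_score)[::-1]        # rank samples by predicted score
    gains = 2.0 ** y_true[order][:k] - 1.0   # graded relevance gains at top k
    discounts = np.log2(np.arange(2, gains.size + 2))
    dcg = np.sum(gains / discounts)

    ideal = np.sort(y_true)[::-1]            # best possible ordering
    ideal_gains = 2.0 ** ideal[:k] - 1.0
    ideal_discounts = np.log2(np.arange(2, ideal_gains.size + 2))
    idcg = np.sum(ideal_gains / ideal_discounts)
    return dcg / idcg if idcg > 0 else 0.0
```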
```python
ranking = self.loss in ('ndcg')
if ranking:
    if query is None:
        raise ValueError("query must not be none with ranking measure")
```
As @mblondel said, I think we should treat the X, y data as stemming from a single query in that case.
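A minimal sketch of that fallback, assuming per-sample query ids are carried in a `query` array (names are illustrative):

```python
import numpy as np

def default_query(X, query=None):
    # Fallback discussed above: with no explicit query ids, treat all
    # samples as one query, i.e. a single ranking over every row of X.
    if query is None:
        query = np.zeros(X.shape[0], dtype=np.int32)
    return query
```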
Thanks for the reminder, I will get around to it eventually.
Sorry for the lack of updates; admittedly I've been kind of lazy. I benchmarked the code (there are some unpushed changes) against GBM and the performance seems comparable, aside from the execution times. I used 50 trees, a depth of 3 for GBRT and interaction.depth of 8 for GBM, a learning rate of 0.1, and no subsampling of features or data. GBRT seems to be overfitting slightly more than GBM; this is just a guess, but it could be because GBM does some sort of normalization of the gradient [1]. I tried doing something similar out of curiosity but didn't see much difference, though it's possible I miscoded it.
[1] https://code.google.com/p/gradientboostedmodels/source/browse/gbm/src/pairwise.cpp#786
```
@@ -585,6 +665,22 @@ def fit(self, X, y):
        X, = check_arrays(X, dtype=DTYPE, sparse_format="dense",
                          check_ccontiguous=True)
        y = column_or_1d(y, warn=True)
        ranking = self.loss in ('ndcg')
```
You need a `('ndcg',)` here for the single-element tuple. I wonder how this code can work otherwise...
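To illustrate the pitfall (plain Python semantics, not PR code): without the trailing comma the parentheses are just grouping, so `in` performs substring matching on a string instead of tuple membership:

```python
>>> 'ndcg' in ('ndcg')   # ('ndcg') is just the string 'ndcg'
True
>>> 'dcg' in ('ndcg')    # substring match: almost certainly unintended
True
>>> 'dcg' in ('ndcg',)   # one-element tuple: proper membership test
False
```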
@jwkvam could you please rebase this branch on top of master to take @pprett's recent refactoring into account? Also, could you write a dataset loader for the MQ2007 and MQ2008 datasets and use them as part of a new example in the
Also, could you please run the MQ2008 benchmark with line_profiler enabled on the
@ogrisel I rebased so it's on top of @pprett's changes. Other changes include:
Running my same benchmark on MQ2008 takes 6.4 seconds now. I wasn't able to get line_profiler working, but here are the main culprits. I imagine I could speed up the NDCG code by presorting the labels and pushing the loops into Cython.
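One way the presorting could pay off, as a sketch (illustrative only): the ideal DCG depends only on the sorted labels, so it can be computed once per query and cached across boosting iterations:

```python
import numpy as np

def ideal_dcg(y_true, k=10):
    # Sort labels once; the ideal ordering never changes for a query,
    # so this value can be cached instead of recomputed every iteration.
    ideal = np.sort(y_true)[::-1][:k]
    gains = 2.0 ** ideal - 1.0
    discounts = np.log2(np.arange(2, gains.size + 2))
    return np.sum(gains / discounts)
```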
I'll add writing up that example to my TODO list. I added LambdaMART to all the tests of the form
Hi @jwkvam, could you please re-run some benchmarks using the
@ogrisel yes, thanks for your continued feedback. I set max_leaf_nodes=9, which is equivalent to GBM's interaction.depth=8.
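For context, a sketch of the benchmark configuration being described (the `LambdaMART` constructor is this PR's work-in-progress API, so the exact signature is an assumption; in GBM a tree with `interaction.depth = d` has `d + 1` terminal nodes, hence `max_leaf_nodes=9`):

```python
# Hypothetical benchmark setup mirroring the GBM comparison above.
est = LambdaMART(
    n_estimators=50,    # 50 trees, as in the GBM run
    learning_rate=0.1,
    max_leaf_nodes=9,   # matches GBM's interaction.depth=8
    subsample=1.0,      # no subsampling of rows or features
)
```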
```python
                 warm_start=False):

        super(LambdaMART, self).__init__(
            loss, learning_rate, n_estimators, min_samples_split,
```
Please hardcode 'ndcg' for now, as there is no alternative ranking loss implemented for LambdaMART.
I started testing this implementation's performance and validating its results against RankLib yesterday evening. It runs smoothly with subsample=1.0, but subsample < 1.0 fails with the following error.
Has someone encountered the same error?
@poulejapon I was about to report the exact same error when trying on MSLR-10K:
Not quite the same. To get to mine, I think you need to .astype(np.int32) your group vector. :)
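In other words, a one-line cast on the query/group vector before fitting (the file name and variable names here are hypothetical):

```python
import numpy as np

query = np.loadtxt("mslr_queries.txt")  # hypothetical load; loadtxt yields float64
query = query.astype(np.int32)          # cast to int32 group ids, as suggested above
```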
Btw, sorry to bother you with these design issues. Your contribution is really great! But we really have to be careful not to overcomplicate our implementations. Otherwise, with time and additional features, they will blow up to the point where nobody can understand what is going on...
No, I agree completely. I'm not too happy with the new
@glouppe When you have some time, could you look at the refactoring done in jwkvam@12b54ca? Basically I replaced
I don't see the point in adding `**kwargs` to the
You're right, I removed them.
@jwkvam I had a quick look and it really is less intrusive! Thanks for the refactoring :) Since this is still a WIP, do you need a review of something specific? Do you need help with something? We can give a hand if needed, just ask for it :)
@glouppe I just haven't had much time available; I'm not looking for any reviews yet. I should probably edit the initial comment to reflect the current status. Basically the plan I had was to
Note the Yandex dataset is the small Yandex 2009 dataset from http://imat2009.yandex.ru/en/datasets, not the big 2013 Kaggle dataset about personalized search.
Right, that's what I meant.
```python
for j in range(i + 1, y_true.shape[0]):
    if y_true[i] != y_true[j]:
        if j < max_rank:
            ndcg_diff = ((y_true[j] - y_true[i]) / log(2 + i)
```
You can precompute the logs in an array of size n_samples. Also make sure to use logarithms in base 2.
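A minimal sketch of that precomputation (names are illustrative; position i in 0-based indexing gets discount 1 / log2(i + 2)):

```python
import numpy as np

def precompute_discounts(n_samples):
    # Base-2 discount for each 0-based rank position i: 1 / log2(i + 2).
    # Computing these once avoids a log() call per pair in the loop above;
    # the term 1 / log(2 + i) becomes a lookup: discounts[i].
    return 1.0 / np.log2(np.arange(2, n_samples + 2))
```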
How is this PR going? I have some use cases for LambdaMART, and hence would love to see this merged.
See the TODO list on top. Also, it needs a rebase. Please take over if you can.
I personally have had bad experiences with LambdaMART: it performs either worse than or comparably to random forests or gradient boosting.
So how are things going? Is LambdaMART available?
Does anyone know if these commits are still relevant, since it has been ~7 years? If not, maybe it is better to close this and open an issue instead.
Agreed. I don't see anyone resurrecting this anytime soon.
This PR is an attempt to implement LambdaMART [1]. I imagine the biggest hurdle will be coming to some conclusion over the correct API, since we need to include the queries somehow. In my implementation I use an extra keyword argument; I don't know if this causes problems elsewhere. My hope is that this PR can serve as a catalyst to resolve that, and then I can finish up the PR.
TODO:
- dataset loader for the MQ2007 / MQ2008 datasets (needs the `unrar` command to execute)
- `max_rank=10` cutoff parameter for NDCG and LambdaMART, and use it to compute the lambdas too
- `ndcg_score` in `sklearn.metrics`, in pure Python first (see this comment)
- tie handling in `ndcg_score`: bench it against no-tie break (default implementation sort order) and random tie break
- use `ndcg_score` for the score or for its derivatives in the LambdaMART optim loop as well (see this comment)

Possibly for another PR:
There was also a brief discussion on the mailing list [2]. Pinging @mblondel @ogrisel @pprett from that discussion.
[1] https://research.microsoft.com/en-us/um/people/cburges/tech_reports/msr-tr-2010-82.pdf
[2] http://sourceforge.net/mailarchive/forum.php?thread_name=CAP%2B3rpGVbSux5u4dZiatV3p1f1zUvndHXoi-oh4CjMRtSpjFsw%40mail.gmail.com&forum_name=scikit-learn-general