Add support for: ML-kNN #2606

Closed
mchangun opened this issue Nov 22, 2013 · 21 comments
Labels
Enhancement, Low Priority, module:cluster

Comments

@mchangun

Add support for the Multi-Label kNN (ML-kNN) algorithm described in this paper: http://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/pr07.pdf. A brief description of the algorithm from the paper:

"As its name implied, Ml-knn is derived from the popular k-Nearest Neighbor (kNN) algorithm [1]. Firstly, for each test instance, its k nearest neighbors in the training set are identified. Then, according to statistical information gained from the label sets of these neighboring instances, i.e. the number of neighboring instances belonging to each possible class, maximum a posteriori (MAP) principle is utilized to determine the label set for the test instance."

Supporting sparse matrices should be a key requirement: many multi-label tasks are text categorization tasks, where the data is usually represented as sparse matrices.

ML-kNN is already implemented in the Orange library (http://orange.biolab.si/docs/latest/reference/rst/Orange.multilabel/#ml-knn-learner), but it would be good to bring it under the scikit-learn framework as well.
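
For readers who want to see the quoted description in code, here is a minimal sketch of the procedure (neighbour search, per-label neighbour counts, MAP decision). It is an illustration under stated assumptions, not the paper's reference code and not a scikit-learn API: the class name `MLkNNSketch` and its parameters `k` and `s` are hypothetical, `Y` is assumed to be a dense 0/1 indicator matrix, and only `NearestNeighbors` from scikit-learn is relied on. Because `NearestNeighbors` accepts SciPy sparse input (falling back to brute-force search), `X` may be sparse in this sketch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


class MLkNNSketch:
    """Illustrative ML-kNN. Hypothetical class, not part of scikit-learn."""

    def __init__(self, k=10, s=1.0):
        self.k = k  # number of neighbours
        self.s = s  # smoothing parameter (s=1 corresponds to Laplace smoothing)

    def fit(self, X, Y):
        Y = np.asarray(Y, dtype=int)  # dense 0/1 indicator matrix (assumption)
        n, n_labels = Y.shape
        k, s = self.k, self.s
        self.Y_ = Y
        # Smoothed prior probability that each label is present.
        self.prior1_ = (s + Y.sum(axis=0)) / (2 * s + n)
        self.nn_ = NearestNeighbors(n_neighbors=k).fit(X)
        # Calling kneighbors() without X returns the neighbours of the
        # training points, excluding each point itself.
        idx = self.nn_.kneighbors(return_distance=False)
        counts = Y[idx].sum(axis=1)  # (n, n_labels): neighbours carrying each label
        # c1[j, l]: training points WITH label l that have exactly j
        # neighbours with label l; c0[j, l]: same for points WITHOUT label l.
        c1 = np.zeros((k + 1, n_labels))
        c0 = np.zeros((k + 1, n_labels))
        for l in range(n_labels):
            has_label = Y[:, l] == 1
            c1[:, l] = np.bincount(counts[has_label, l], minlength=k + 1)
            c0[:, l] = np.bincount(counts[~has_label, l], minlength=k + 1)
        # Smoothed likelihoods of observing j label-carrying neighbours.
        self.like1_ = (s + c1) / (s * (k + 1) + c1.sum(axis=0))
        self.like0_ = (s + c0) / (s * (k + 1) + c0.sum(axis=0))
        return self

    def predict(self, X):
        idx = self.nn_.kneighbors(X, return_distance=False)
        counts = self.Y_[idx].sum(axis=1)
        cols = np.arange(self.Y_.shape[1])
        # MAP rule: predict the label if prior * likelihood is larger
        # for "label present" than for "label absent".
        p1 = self.prior1_ * self.like1_[counts, cols]
        p0 = (1 - self.prior1_) * self.like0_[counts, cols]
        return (p1 > p0).astype(int)
```

Usage would look like `MLkNNSketch(k=10).fit(X_train, Y_train).predict(X_test)`, which returns a 0/1 indicator matrix with one column per label.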

@arjoly
Member

arjoly commented Nov 22, 2013

Hi @mchangun,

I think this would be a nice addition, and you may be interested in #399 and #970.

@mchangun
Author

I'm just wondering what the next step for this would be? Does someone have to mark it as "feature"? How does it get picked up by one of the developers?

I've coded up my own version of ML-kNN; if someone is interested in taking a look, I will happily send it over.

@arjoly
Member

arjoly commented Dec 12, 2013

The best way to add a new feature is to make a pull request. Please have a look at the contributing guidelines: http://scikit-learn.org/dev/developers/index.html. Be aware that adding a new feature requires more than just writing the plain algorithm: you also need to write tests, code documentation, and narrative documentation.

@amueller
Member

Is this not implemented already? http://scikit-learn.org/dev/modules/multiclass.html lists KNN as being multi-label.

@mchangun
Author

@amueller Where in the link do you see that? I can only find this:

Inherently multiclass: Naive Bayes, sklearn.lda.LDA, Decision Trees, Random Forests, Nearest Neighbors.

I.e. multiclass, not multilabel

I've been looking through the sklearn docs for multi-label support and, at the moment, it seems to be supported only via OneVsRest / LabelBinarizer.
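
To make the one-vs-rest route concrete, here is a small sketch, assuming a recent scikit-learn release. In current versions the conversion from label sets to an indicator matrix is done with `MultiLabelBinarizer` rather than `LabelBinarizer`. The toy data and label names are made up for illustration.

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Toy data (hypothetical): one feature, two possible labels "a" and "b".
X = [[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]]
label_sets = [{"a"}, {"a"}, {"a", "b"}, {"b"}, {"b"}, {"b"}]

# Turn the label sets into a 0/1 indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(label_sets)

# One independent binary kNN classifier is fitted per label column.
ovr = OneVsRestClassifier(KNeighborsClassifier(n_neighbors=3)).fit(X, Y)
pred = ovr.predict([[0.05], [10.05]])

# Map the indicator rows back to label tuples.
predicted_sets = mlb.inverse_transform(pred)
```

Each label is treated as a separate binary problem here, so nothing is shared between labels beyond the features.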

@arjoly
Member

arjoly commented Dec 15, 2013

Is this not implemented already? http://scikit-learn.org/dev/modules/multiclass.html lists KNN as being multi-label.

Yes, binary relevance (one-vs-rest) is already implemented. However, ML-KNN proposes to perform prediction in a Bayesian fashion using Bayes' rule, as in #399 and #970.

@medhini

medhini commented Mar 20, 2016

Has this been implemented already? Is the issue still open?

@bhaveshoswal

bhaveshoswal commented May 25, 2016

@mchangun Can you share the ML-KNN code? It would be a great help to me, as I am working on a multi-label problem. Thanks.
email id = oswal.bhavesh2010@gmail.com

@jnothman
Member

The issue is still open, @medhini, and it has not been implemented (the implementation at #970 is not multi-label); and no, we don't have code for you, @bhaveshoswal.

@medhini

medhini commented May 25, 2016

I would like to take up this issue and implement ML-kNN. Can it be assigned to me?

@jnothman
Member

jnothman commented May 25, 2016

As far as I can tell, @medhini, you would be welcome. We've rarely used the 'assignee' feature, but please write some tests, perhaps open a WIP pull request, and show us it's actually going to happen.

@amueller
Member

@jnothman but our KNN is multi-label, right? Even multi-output multi-class. Or is this about something else?
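
As a quick illustration of that point, `KNeighborsClassifier` can be fitted directly on a 2D indicator matrix, in which case each output column is predicted by its own neighbour vote. This is a minimal sketch with made-up toy data, not a statement about how ML-kNN would be integrated.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data (hypothetical): one feature, two label columns.
X = np.array([[0.0], [0.1], [0.2], [10.0], [10.1], [10.2]])
Y = np.array([[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]])

# Fitting on a 2D y makes the classifier multi-output: every column
# gets a majority vote among the same 3 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, Y)
pred = knn.predict(np.array([[0.05], [10.05]]))
```

The result has one row per query point and one column per label. The vote is a plain majority over the neighbours, without the prior and likelihood estimates that ML-kNN adds.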

@jnothman
Member

I didn't look through this carefully enough. #2606 (comment) above suggests ML-KNN differs in its use of Bayesian priors, but I suppose Bayesian priors are independent of the multilabelness? I haven't looked into the Bayesian priors issue, but it looks interesting. Presumably, though, this issue adds little.

@jnothman
Member

Or does ML-KNN model covariance between the multiple labels? I think someone will need to read the paper to work out whether this is valuable.

@amueller
Member

No covariance, but it uses global class probabilities. I wouldn't call that Bayesian priors. The paper is really obscurely written. There's a smoothing parameter for the class distribution, which is set to 1.
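
For reference, the estimates being discussed can be written out as follows. This is a reconstruction from a reading of the paper, with notation only loosely following it, so treat it as a sketch rather than an authoritative restatement: m is the number of training instances, k the number of neighbours, and s the smoothing parameter (s = 1 gives Laplace smoothing).

```latex
% y_i(l) \in \{0,1\}: whether training instance i carries label l
P(H_1^l) = \frac{s + \sum_{i=1}^{m} y_i(l)}{2s + m},
\qquad
P(H_0^l) = 1 - P(H_1^l)

% c_l[j] (resp. c'_l[j]): number of training instances with (resp. without)
% label l that have exactly j of their k nearest neighbours carrying label l
P(E_j^l \mid H_1^l) = \frac{s + c_l[j]}{s(k+1) + \sum_{p=0}^{k} c_l[p]},
\qquad
P(E_j^l \mid H_0^l) = \frac{s + c'_l[j]}{s(k+1) + \sum_{p=0}^{k} c'_l[p]}

% MAP decision for a test instance t with C_t(l) neighbours carrying label l
\hat{y}_t(l) = \arg\max_{b \in \{0,1\}} P(H_b^l) \, P(E_{C_t(l)}^l \mid H_b^l)
```

Every quantity is estimated separately per label l, which matches the "no covariance" observation: the labels only interact through the shared set of neighbours.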

@gerdinard

Hello guys. Is this still on?

I have finished an implementation of ML-kNN based on sklearn's kNN and the original paper "ML-KNN: A lazy learning approach to multi-label learning".

@jnothman
Member

A PR is welcome, as is a summary of the algorithm, but expect a slow process to merge.

@Oktai15

Oktai15 commented Oct 7, 2018

@gerdinard please share your implementation if possible.

@sandeepeecs

Hello guys. Is this still on?

I have finished an implementation of ML-kNN based on sklearn's kNN and the original paper "ML-KNN: A lazy learning approach to multi-label learning".

Can you share a link to your implementation?

@gerdinard

Hello. Here is a link to my Python ML-kNN implementation:
https://github.com/gerdinard/MLkNN.git

@cmarmo added the Low Priority label and removed the help wanted label on Aug 9, 2022
@adrinjalali
Member

Closing, as it's unlikely we'd add this to sklearn, and it is better off living in a separate project/repo.

@adrinjalali closed this as not planned on Apr 17, 2024