
ENH: use Bayesian priors in Nearest Neighbors classifier (Issue 399) #970

Closed

Conversation

@cricketsong cricketsong commented Jul 19, 2012

The k-Nearest-Neighbors and Radius Neighbors classifiers now make explicit use of the Bayesian prior probabilities of the classes (self.class_prior_). By default, each of these prior probabilities (for a given class) is set to the proportion of sample points which are in the given class; but they can also be set to other values by the class user.

(This is my first submission so I'd be grateful for any and all advice. Thank you!)

Fixes #399

@jakevdp
Member

jakevdp commented Jul 19, 2012

Thanks for submitting this! I'll take a detailed look, but I'm travelling today so it might have to wait until tomorrow or the weekend.

@@ -567,8 +567,11 @@ def fit(self, X, y):
y : {array-like, sparse matrix}, shape = [n_samples]
Target values, array of integer values.
"""
self._y = np.asarray(y)
Member

I'm not so familiar with this module, but if the y wasn't preprocessed before, you should probably do a `unique` with return_inverse (use the one from utils for backward compatibility) to get integer labels from whatever was put in. Not sure how this fit is called, though, so maybe you should wait for Jake's opinion.

Author

If I understand correctly, this will also take care of the next line (line 571):

self._classes, self._y = _unique(y, return_inverse=True)

Member

Yes, it should. And I looked it up: there is no prior processing, so this is the right place to do this, I think.

@amueller
Member

Thanks for your contribution.
I am not so familiar with this module, but here are some general comments.
First, I think we should not lightly change the default behavior of our classifiers. People who upgrade might get unexpected results. Also, this might not be the expected behavior for textbook KNN.
So I would rather like to have a parameter to control the behavior.
Also, you mentioned the user could set the prior, but I don't see that you provided a way to do this. I think people would usually use KNeighborsClassifier and RadiusNeighborsClassifier, so these should provide this option.

There is already a weights parameter, but I think this works on the samples, not the classes. Other estimators have a class_weight parameter, maybe you can adopt this name.

While you commented your code, you did not change the docstring or the narrative documentation. People who want to use a class usually look into these two places, not the code. So the functionality should be explained there.

If possible (and sensible) I like new functionality also to be integrated in an example, preferably an existing one.

# given tested point), the corresponding element in `probabilities`
# would still be incremented only once.
outliers = [] # row indices of the outliers (if any)
for row in range(len(pred_labels)):
Member

I didn't check the details, but I think you can do the whole thing using np.bincount with the weights parameter.

Author

I could, but then I'd have to use the minlength parameter, thus:

np.bincount(pi, weights[i], minlength=self._classes.size)

The problem is that minlength was introduced in NumPy 1.6.0, so it's too recent for what we're supporting. One approach would be:

np.bincount(np.append(pi, np.arange(self._classes.size)), np.append(weights[i], np.zeros(self._classes.size)))

which basically adds the whole range of classes, but weighted with 0.0 so they have no impact besides making sure the returned array is self._classes.size long. Though I'm not sure this is more efficient or less kludgy than the loop in the original code I submitted.

Or I'd be happy to submit a wrapper in utils/fixes.py around bincount (like we have for unique).

Member

Can't you just use bincount and then append zeros?

Author

I guess I was too focused on solving it in only 1 line :-) So something like:

unpadded_probs = np.bincount(pi, weights[i])
probabilities[i] = np.append(unpadded_probs, np.zeros(self._classes.size - unpadded_probs.shape[0]))
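For reference, here is the same padding trick as a standalone sketch (toy values; pi, weights_i and n_classes are illustrative names, not the PR's actual variables):

import numpy as np

pi = np.array([0, 2, 2, 1])                 # class index of each neighbor of one test point
weights_i = np.array([1.0, 0.5, 0.5, 1.0])  # weight of each of those neighbors
n_classes = 5                               # total number of classes in the training set

# bincount only counts up to max(pi); padding with zeros makes the result
# always n_classes long (same effect as minlength=n_classes in NumPy >= 1.6)
unpadded_probs = np.bincount(pi, weights_i)
probs = np.append(unpadded_probs, np.zeros(n_classes - unpadded_probs.shape[0]))
# probs is now [1., 1., 1., 0., 0.]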

@amueller
Member

@cricketsong I think this looks good in principle, but I didn't check details. I hope my comments make sense; for a detailed review, let's wait for Jake :)

@cricketsong
Author

@amueller First of all, thank you for your numerous suggestions.

  1. I will definitely look into using np.bincount where you've suggested it.
  2. This PR does not in fact change the default behavior: in its current form, the KNN and Radius Neighbor classifiers already implicitly rely on a Bayesian prior computed as the proportion of sample points which are in each class. (When using such a prior, the posterior probability for class k reduces simply to the ratio of points in the volume V which are in class k, where the volume is either the one covering the k nearest neighbors (as in KNeighborsClassifier), or the volume with the given radius (as in RadiusNeighborsClassifier). More details can be found in Bishop, Pattern Recognition and Machine Learning, Springer, 2006, p. 125.)
  3. Per Jake's last comment in Issue 399, that case is similar to the class_prior_ parameter in GaussianNB. It was my understanding that in the latter, the user could set the value of class_prior_ using set_params (inherited from class BaseEstimator); but now I'm not so sure. I'll wait to see what Jake had in mind.
  4. I will modify the docstring and the narrative documentation, and will include an example.

Thanks again!

@ogrisel
Member

ogrisel commented Jul 20, 2012

Just a note on the naming conventions for attributes and parameters in sklearn: attributes that have a trailing _ are fitted parameters learned from the data, and you should not expect the user to set them manually. User-adjustable parameters should be stored under their name (without the trailing _) in the constructor (and can then be adjusted using set_params when nesting / stacking models in a sklearn.pipeline.Pipeline compound construct).

Please have a look at APIs of scikit-learn objects. The rest of the contributors guide is interesting too.
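For illustration, the convention looks roughly like this (a toy estimator, not code from this PR):

import numpy as np
from sklearn.base import BaseEstimator

class ToyPriorClassifier(BaseEstimator):
    def __init__(self, class_prior=None):
        # user-adjustable parameter: no trailing underscore, stored as passed
        self.class_prior = class_prior

    def fit(self, X, y):
        # fitted attribute: trailing underscore, computed from the data in fit
        if self.class_prior is None:
            self.class_prior_ = np.bincount(y).astype(float) / len(y)
        else:
            self.class_prior_ = np.asarray(self.class_prior, dtype=float)
        return self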

@ogrisel
Member

ogrisel commented Jul 20, 2012

Also, could you add a References section in the docstrings of the models that gives "Bishop, Pattern Recognition and Machine Learning, Springer, 2006, p. 125." as a reference for the explicit use of priors in k-NN models? It should also be mentioned in the narrative doc.

@amueller
Member

@cricketsong Can you explain 2) again? The new predict_proba method gives different results than the old predict_proba method, right? Maybe I misunderstood, though.

@cricketsong
Author

@amueller The new predict_proba, by default (without the user changing the class priors) should -- and does -- give the same result as the old one. Here's why:

We want the posterior P(C_k | x) to correctly classify x. We can use the following approximations (the same are used in kNN and in Radius Neighbors):

  1. P(x | C_k) = K_k / (N_k V) where K_k is the number of points in the neighborhood of x which are in class k; N_k is the total number of points in class k; and V is the volume of the neighborhood. (About that volume: we can safely ignore it because it will get simplified away when we apply Bayes' rule).
  2. p(x) = K / (NV) where K is the total number of points in the neighborhood (fixed in kNN, variable in Radius Neighbors) and N is the total number of points in the training set.
  3. p(C_k) = N_k / N. This is the prior by default in the new predict_proba and (as we'll see) the implicit prior in the old predict_proba.

When we use Bayes' rule to compute P(C_k | x) from these 3 approximations, we get P(C_k | x) = K_k / K. And that expression is precisely what was used by the old predict_proba: it was implicitly using the prior P(C_k) = N_k / N.

In the new version the prior is arbitrary, so when we apply Bayes' rule we first compute P(x | C_k) as the (possibly weighted) neighbor count K_k divided by N_k (ignoring V), multiply by the arbitrary prior P(C_k), and then divide by p(x) (which also ignores V).
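A quick numeric sanity check of this equivalence (toy counts, not the PR's code):

import numpy as np

N_k = np.array([50, 30, 20])   # training points per class
K_k = np.array([3, 1, 1])      # neighbors of x per class
K = K_k.sum()                  # total neighbors of x
N = N_k.sum()                  # total training points

old_posterior = K_k / K                  # what the old predict_proba returns

likelihood = K_k / N_k                   # P(x | C_k), ignoring the volume V
prior = N_k / N                          # default prior p(C_k)
new_posterior = likelihood * prior
new_posterior /= new_posterior.sum()     # dividing by p(x), i.e. normalizing

assert np.allclose(old_posterior, new_posterior)  # identical by construction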

# M += 1: if a predicted index was to occur more than once (for a
# given tested point), the corresponding element in `probabilities`
# would still be incremented only once.
for i, pi in enumerate(pred_labels):
Member

Hm, I thought you could eliminate this one, too. But I guess it is not so obvious. I think you'd need my favourite non-existent numpy function, bincount2d (analogous to histogram2d). One of these days I'm gonna write it ;)

@amueller
Member

nice work :)

self._classes, self._y = unique(y, return_inverse=True)
self.class_prior_ = np.zeros(self._classes.size)
for i, y_i in enumerate(self._classes):
self.class_prior_[i] = np.float(np.sum(y == y_i)) / len(y)
Member

This can be accomplished more efficiently with

self.class_prior_ = np.bincount(self._y).astype(float) / len(self._classes)

I believe this should work

Author

Unless I'm mistaken, we should divide by len(self._y) (instead of len(self._classes)), which will give us the proportion of points in each class.

Member

You're right: we should divide by len(self._y)
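i.e., presumably the corrected one-liner would be:

self.class_prior_ = np.bincount(self._y).astype(float) / len(self._y)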

@jakevdp
Member

jakevdp commented Jul 23, 2012

This is looking good. A few thoughts:

  • in order to maintain backward compatibility, the Bayesian prior should be opt-in. I'm confused by the comment above saying that adding a Bayesian prior does not change the default behavior. Granted, I think I need to sit down and try to recall what I had in mind when I opened Issue Bayesian priors in nearest neighbors classification/regression #399 😄. I pictured having an input flag in the constructor named use_bayesian_prior which is set to False by default.
  • Olivier's comments are good: a reference to Bishop would be nice, as well as some mention of the behavior in the doc string
  • I just noticed that KNeighbors has a predict_proba method, while RadiusNeighbors does not. This is left over from a previous PR which added predict_proba only to one of the classes: it would be nice to add that in RadiusNeighbors now.
  • I'm still confused by the effect of the class prior (see my line note above). Let me know if I'm missing something on that!

All that aside: great work. I think once this is cleaned up, it will be a nice addition to the module.

@jakevdp
Member

jakevdp commented Jul 24, 2012

Looking good. Above I said a boolean flag would be good; I now think it would be better to allow the user to pass their own Bayesian priors via a class_prior argument, which defaults to None. If it's None, the priors would be computed as they are right now. We could also allow class_prior='flat' or other standard options to make it easier on the user.

Getting close!

@cricketsong
Author

If everything is satisfactory, I'll work on the corresponding narrative documentation tomorrow morning (EDT/Montreal time).

@jakevdp
Member

jakevdp commented Jul 24, 2012

@cricketsong - all looks good. Thanks for all the explanations and responses above. I'm surprised that predict_proba is faster... looking at the source of stats.mode, however, I can see why that would be. Your bincount solution in predict_proba is better.

I'm excited to see the narrative documentation! Again, thanks for all the work you're doing on this.

@amueller
Member

I think naming the parameter class_weight instead of class_prior would be better as this name is already used in other classifiers. What do you think @jakevdp (and @ogrisel)?

@jakevdp
Member

jakevdp commented Jul 28, 2012

@amueller - good question: I had to think about it a bit!

class_weight is used for many of the classifiers in sklearn.linear_model. class_prior is currently used in Gaussian Naive Bayes. In discriminative classifiers like those in sklearn.linear_model, class_weight defaults to 1. In Naive Bayes, which is a generative classifier, class_prior defaults to the frequency of the label in the training set.

Nearest neighbors classification is a bit of a hybrid: it has characteristics of both generative and discriminative classification, so it could go either way. But this PR frames nearest neighbors classification squarely in a Bayesian, generative sense. Because of this, and because the default behavior is similar to that of class_prior in GaussianNB, I'd prefer to stick with the label class_prior here.

@amueller
Member

@jakevdp Ok, I agree. I didn't know that GNB used the same parameter name. The difference in default behavior is also a good argument. I didn't realize that. Thanks for your thoughts :)

@@ -94,7 +94,19 @@ be accomplished through the ``weights`` keyword. The default value,
distance from the query point. Alternatively, a user-defined function of the
distance can be supplied which is used to compute the weights.


The nearest neighbors classification algorithm is implicitly based on
Member

I would rather say something like "there is a probabilistic interpretation of NN".

@amueller
Member

This looks pretty good. Could you maybe add a little explanation to the example explaining the plot? I haven't looked at it myself but will do tomorrow.

@cricketsong
Author

@amueller Thanks for the comments. I've corrected the =/- problem, applied the modification you suggested re: the probabilistic interpretation and commented on the example.

@amueller
Member

@cricketsong Sorry I forgot a bit about this PR, I have been pretty busy. What is the current status?
@jakevdp what do you think?

@jakevdp
Member

jakevdp commented Sep 17, 2012

@amueller - Thanks for bringing this up again - I need to take some time to look back at this. I recall it was in pretty good shape, but I'd like to do a full review.

@chkoar
Contributor

chkoar commented Jan 5, 2018

What is the current status of this PR? Can we recap? If @cricketsong is not available maybe I could carry it out. cc @amueller @jakevdp @ogrisel

@jnothman
Member

jnothman commented Jan 7, 2018

Status is stalled in ancient history! I like the idea of supporting a bayesian KNN (mostly because users don't think about this approach to KNN), but I don't think we're going to make it a top priority!

@jnothman
Member

jnothman commented Jan 7, 2018

One thing that would help here is having a reference (a textbook, a paper, etc) that would allow us to ensure correctness according to an established standard.

Member

@jnothman jnothman left a comment

Am I correct in thinking that under this approach, n_neighbors=1 will produce a posterior with all its mass in one class? (I've not tried running the code because it would involve looking at 6-year-old compilation instructions lol.)

The math here seems in accordance with Bishop, but it goes against my intuition of a Bayesian prior, wherein we should rely on the prior more when the number of neighbors available to inform us is small (e.g. when using radius_neighbors in a region that is sparse in the training set). Am I mistaken?

class_prior : str, list or ndarray, optional (default = 'default')
class prior probabilities used in prediction. Possible values:

- 'default': default prior probabilities. For each class, its
Member

should this be called 'mle'?

- 'default': default prior probabilities. For each class, its
prior probability is the proportion of points in the dataset
that are in this class.
- 'flat': equiprobable prior probabilites. If there are C classes,
Member

or 'uniform'?

if class_prior in (None, 'default'):
return np.bincount(y).astype(float) / len(y)
elif class_prior == 'flat':
return np.ones((len(np.unique(y)),)) / len(np.unique(y))
Member

Since we normalise later, we can just return 1 here.

class_prior_arr: array of the same shape as ``np.unique(y)``
"""
if class_prior in (None, 'default'):
return np.bincount(y).astype(float) / len(y)
Member

since we normalize later, we can ignore the / len(y)
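Folding both simplifications together, the helper could presumably reduce to something like this (the function name and signature are assumed here, since only its body appears in the diff):

import numpy as np

def _compute_class_prior(class_prior, y):
    # hypothetical helper: returns unnormalized per-class prior weights for
    # integer-encoded y; normalization is assumed to happen later in predict_proba
    if class_prior in (None, 'default'):
        return np.bincount(y).astype(float)          # proportional to class counts
    elif class_prior == 'flat':
        return np.ones(len(np.unique(y)))            # equiprobable prior
    else:
        return np.asarray(class_prior, dtype=float)  # user-supplied prior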


# Compute the unnormalized posterior probability, taking
# self.class_prior_ into consideration.
class_count = np.bincount(self._y)
Member

nowadays we need to handle multioutput y. Perhaps each will have its own prior...

probabilities[i] = 1e-6 # prevent division by zero later
continue # outlier
# When we support NumPy >= 1.6, we'll be able to simply use:
# np.bincount(pi, weights, minlength=self._classes.size)
Member

we support this now.
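i.e., presumably the padding workaround can now be replaced with the commented-out one-liner:

probabilities[i] = np.bincount(pi, weights, minlength=self._classes.size)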

Base automatically changed from master to main January 22, 2021 10:48
@lorentzenchr
Member

Should we close? We still have the open issue #399.
