resolve #4587 add inductive learning example #6478
See #4587. (GitHub doesn't create links from PR headers.)
@jnothman this was your idea ;)
Apologies for the very slow review. I'm not yet sure this is persuasive. It might be worth thinking about whether there's a use case that can motivate it clearly.
This should be in examples/cluster/plot_inductive_learning.py
Thanks.
@@ -0,0 +1,85 @@
"""
==============================================
Inductive Learning with Scikit Learn
I think you must mean 'inductive clustering'. "With scikit-learn" is redundant inside scikit-learn.
Clustering is expensive, especially when our dataset contains millions of
datapoints. Recomputing the clusters everytime we receive some new data
is thus in many cases, intractable. With more data, there is also the
"With more data, "... this comment only really makes sense if you say something more explicit about acquiring more data from a noisier source than was used to build a clustering... no?
some unsupervised learning algorithm and then fit a classifier on the
inferred targets, treating it as a supervised problem. This is known as
Transductive learning.

excess blank line(s)
One solution to this problem, is to first infer the target classes using
some unsupervised learning algorithm and then fit a classifier on the
inferred targets, treating it as a supervised problem. This is known as
Transductive learning.
No, this is certainly not transductive. Whether we're using "inductive" correctly is another matter.
n_samples = 5000

colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
this is just np.array(list('bgrcmyk' * 4))?
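A quick check of the suggested simplification, as a minimal sketch: both expressions should build the same array of single-character colour codes. The variable names `colors_verbose`, `colors_short`, and `same` are introduced here for illustration only.

```python
import numpy as np

# Expression as written in the diff under review
colors_verbose = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])

# Shorter form suggested in the review comment
colors_short = np.array(list('bgrcmyk' * 4))

# Element-wise comparison of the two arrays
same = bool(np.array_equal(colors_verbose, colors_short))
print(same, colors_short.shape)
```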
plt.scatter(X[:, 0], X[:, 1], color="black", s=2)
plt.show()

from sklearn import svm
imports should be at the top
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])

blobs = datasets.make_blobs(n_samples=3*n_samples, random_state=8)
why 3x?
# Inferring class on a new random dataset
X_new = StandardScaler().fit_transform(np.random.rand(n_samples*2,2))
y_pred = inductiveLearner.predict(X_new)
plt.scatter(X_new[:, 0], X_new[:, 1], color=colors[y_pred].tolist(), s=5)
This overlay doesn't really work if black is one of the plotted colours.
You need more clear titling/description of the plot.
@jnothman what is the intention here? Do we need to provide just an example of inductive inference on cluster labels, or to create a meta-estimator?
I don't mind examples containing new meta-estimators...
Closed by #10852
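For context, a meta-estimator of the kind discussed here might look roughly like the sketch below. The class name `InductiveClusterer`, the choice of `AgglomerativeClustering` and `RandomForestClassifier`, and all parameter values are assumptions made for illustration, not the code from this PR. The idea is only that a clusterer without a `predict` method is wrapped so that a classifier trained on its labels can assign new points to clusters.

```python
import numpy as np
from sklearn.base import BaseEstimator, clone
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier


class InductiveClusterer(BaseEstimator):
    """Hypothetical meta-estimator: cluster once, then classify new points."""

    def __init__(self, clusterer, classifier):
        self.clusterer = clusterer
        self.classifier = classifier

    def fit(self, X, y=None):
        # Clone so the estimators passed in by the caller stay unfitted
        self.clusterer_ = clone(self.clusterer)
        self.classifier_ = clone(self.classifier)
        # Cluster labels become the training targets for the classifier
        labels = self.clusterer_.fit_predict(X)
        self.classifier_.fit(X, labels)
        return self

    def predict(self, X):
        # New points are assigned by the classifier, without re-clustering
        return self.classifier_.predict(X)


# Synthetic data (assumed sizes): 300 points to cluster, 100 held back as "new"
X_all, _ = make_blobs(n_samples=400, centers=3, random_state=8)
X_train, X_new = X_all[:300], X_all[300:]

model = InductiveClusterer(
    clusterer=AgglomerativeClustering(n_clusters=3),
    classifier=RandomForestClassifier(n_estimators=50, random_state=0),
).fit(X_train)
y_new = model.predict(X_new)
```

Keeping the clusterer and classifier as constructor arguments lets either be swapped without touching the wrapper.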
I added an example that uses a synthetic dataset (blobs + random), uses DBSCAN to infer labels, and then fits an SVM over it to infer labels on other data.
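The approach described in that last comment can be sketched as follows. This is a minimal illustration and not the code from the PR: the blob centres, sample counts, `eps`, `min_samples`, and the SVM settings are all assumed values chosen so the blobs are well separated.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(8)

# Synthetic data: three well-separated blobs plus uniform random points
# (centres and sizes are assumptions made for this sketch)
blobs, _ = make_blobs(
    n_samples=600,
    centers=[(-5, -5), (0, 5), (5, -5)],
    cluster_std=0.6,
    random_state=8,
)
low, high = blobs.min(axis=0), blobs.max(axis=0)
noise = rng.uniform(low, high, size=(60, 2))

raw = np.vstack([blobs, noise])
scaler = StandardScaler().fit(raw)
X = scaler.transform(raw)

# Step 1: infer labels with DBSCAN, which has no predict() for unseen data
cluster_labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X)

# DBSCAN marks noise as -1; drop those points so the SVM only learns clusters
mask = cluster_labels != -1
n_clusters = len(set(cluster_labels[mask].tolist()))

# Step 2: fit an SVM on the inferred labels
clf = SVC(kernel="rbf", gamma="scale").fit(X[mask], cluster_labels[mask])

# Step 3: infer labels on other data, reusing the scaler fitted above
X_new = scaler.transform(rng.uniform(low, high, size=(200, 2)))
y_new = clf.predict(X_new)
```

Reusing the fitted scaler for the new data, rather than calling `fit_transform` on it, keeps both sets in the same coordinate system.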