resolve #4587 add inductive learning example #6478

Closed
wants to merge 1 commit into from

Conversation

chiragnagpal

I added an example that uses a synthetic dataset (blobs + random), uses DBSCAN to infer labels, and then fits an SVM on those labels to infer labels for other data.
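
For readers without the diff at hand, the workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the blob centers, `eps`, `min_samples`, and the SVM settings are assumptions chosen so the sketch runs cleanly, and the new points are scaled with the scaler fitted on the training data (also an assumption of this sketch).

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic training data: three well-separated blobs (centers are illustrative).
X, _ = make_blobs(
    n_samples=600,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.5,
    random_state=8,
)
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# Step 1: infer labels without supervision. DBSCAN marks noise points as -1.
labels = DBSCAN(eps=0.3, min_samples=10).fit_predict(X_scaled)
mask = labels != -1  # keep only points assigned to a cluster

# Step 2: treat the inferred labels as targets and fit a classifier.
clf = SVC(kernel="rbf", gamma="scale").fit(X_scaled[mask], labels[mask])

# Step 3: assign cluster labels to unseen points without re-clustering.
rng = np.random.RandomState(0)
X_new = rng.uniform(X.min(axis=0), X.max(axis=0), size=(200, 2))
y_new = clf.predict(scaler.transform(X_new))
```

DBSCAN has no `predict` method, which is why a separate classifier is needed to label new points.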

@amueller
Member

See #4587. (GitHub doesn't create links from PR headers.)

@amueller
Member

@jnothman this was your idea ;)

Member

@jnothman jnothman left a comment

Apologies for the very slow review. I'm not yet sure this is persuasive. It might be worth thinking about whether there's a use case that can motivate it clearly.

This should be in examples/cluster/plot_inductive_learning.py

Thanks.

@@ -0,0 +1,85 @@
"""
==============================================
Inductive Learning with Scikit Learn
Member

I think you must mean 'inductive clustering'. "With scikit-learn" is redundant inside scikit-learn.


Clustering is expensive, especially when our dataset contains millions of
datapoints. Recomputing the clusters everytime we receive some new data
is thus in many cases, intractable. With more data, there is also the
Member

"With more data, "... this comment only really makes sense if you say something more explicit about acquiring more data from a noisier source than was used to build a clustering... no?

some unsupervised learning algorithm and then fit a classifier on the
inferred targets, treating it as a supervised problem. This is known as
Transductive learning.

Member

excess blank line(s)

One solution to this problem, is to first infer the target classes using
some unsupervised learning algorithm and then fit a classifier on the
inferred targets, treating it as a supervised problem. This is known as
Transductive learning.
Member

No, this is certainly not transductive. Whether we're using "inductive" correctly is another matter.


n_samples = 5000

colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
Member

this is just np.array(list('bgrcmyk' * 4))?
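
A quick check of the suggested simplification, as a small sketch (the variable names `original` and `simplified` are illustrative only):

```python
import numpy as np

# The expression from the diff and the reviewer's shorter form.
original = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
simplified = np.array(list('bgrcmyk' * 4))

# Both produce the same 28-element array of single-character color codes.
same = np.array_equal(original, simplified)
```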

plt.scatter(X[:, 0], X[:, 1], color="black", s=2)
plt.show()

from sklearn import svm
Member

imports should be up the top


colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])

blobs = datasets.make_blobs(n_samples=3*n_samples, random_state=8)
Member

why 3x?

# Inferring class on a new random dataset
X_new = StandardScaler().fit_transform(np.random.rand(n_samples*2,2))
y_pred = inductiveLearner.predict(X_new)
plt.scatter(X_new[:, 0], X_new[:, 1], color=colors[y_pred].tolist(), s=5)
Member

This overlay doesn't really work if black is one of the plotted colours.

You need more clear titling/description of the plot.

@chkoar
Contributor

chkoar commented Mar 12, 2018

@jnothman what is the intention here? Do we need to provide just an example of inductive inference on cluster labels, or to create a meta-estimator?

@jnothman
Member

I don't mind examples containing new meta-estimators...
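
One way such a meta-estimator could look inside an example is sketched below. This is an assumption-laden illustration, not code from this PR: the class name `InductiveClusterer`, its constructor parameters, and the choice of clusterer and classifier are all hypothetical.

```python
from sklearn.base import BaseEstimator, clone
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier


class InductiveClusterer(BaseEstimator):
    """Hypothetical meta-estimator: cluster once, then classify new points."""

    def __init__(self, clusterer, classifier):
        self.clusterer = clusterer
        self.classifier = classifier

    def fit(self, X, y=None):
        # Clone so the estimators passed by the caller are left unfitted.
        self.clusterer_ = clone(self.clusterer)
        self.classifier_ = clone(self.classifier)
        # Cluster labels become the classifier's training targets.
        cluster_labels = self.clusterer_.fit_predict(X)
        self.classifier_.fit(X, cluster_labels)
        return self

    def predict(self, X):
        # New points are labeled by the classifier, not by re-clustering.
        return self.classifier_.predict(X)


# Usage on illustrative data: agglomerative clustering has no predict(),
# so the wrapped classifier supplies it.
X, _ = make_blobs(
    n_samples=300, centers=[[-5, -5], [0, 5], [5, -5]], random_state=0
)
model = InductiveClusterer(
    AgglomerativeClustering(n_clusters=3),
    RandomForestClassifier(random_state=0),
).fit(X)
X_new = X[:5] + 0.01
pred = model.predict(X_new)
```

Wrapping both steps in one estimator keeps the clusterer and classifier interchangeable, at the cost of hiding noise handling (for example DBSCAN's -1 label), which this sketch does not address.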

@jnothman
Member

Closed by #10852

@jnothman jnothman closed this Jan 17, 2019

4 participants