Meta-estimator for semi-supervised learning #1243
Do we want to keep this in the 1.0 milestone? @amueller: you opened this issue, what's your feeling?
I took a look at the ICML'07 paper (Raina et al.) introducing this term. I assume you are interested in implementing the specific technique they introduce (or some variant of it), rather than the broader class of solutions to the problem they pose. Although it is not a constraint of their general problem formulation, their technique more or less amounts to fitting a transformer on a lot of unlabelled data, then applying that transformation before classification. So it merely comes down to something like:

from sklearn.base import BaseEstimator
from sklearn.utils import safe_mask

class SelfTaughtLearner(BaseEstimator):
    def __init__(self, transformer, estimator):
        self.transformer = transformer
        self.estimator = estimator

    def fit(self, X, y):
        # convention: unlabelled samples are marked with y == -1
        mask = y == -1
        # learn the representation from the (plentiful) unlabelled data
        self.transformer.fit(X[safe_mask(X, mask)])
        # fit the supervised estimator on the transformed labelled data
        Xt = self.transformer.transform(X[safe_mask(X, ~mask)])
        self.estimator.fit(Xt, y[~mask])
        return self

    def predict(self, X):
        Xt = self.transformer.transform(X)
        return self.estimator.predict(Xt)

I note that this would be a nice framework for many scikit-learn dimensionality reduction techniques (including feature agglomeration). (Presumably, this should include support for out-of-core learning of the transformer, as there can be lots of unlabelled data. One annoyance of the current semi-supervised API is that selecting portions where …)
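For concreteness, a minimal usage sketch of the snippet above (hedged: PCA and LogisticRegression are only illustrative choices of transformer and estimator, and -1 is assumed to mark unlabelled samples, as in the existing semi-supervised API):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# toy data: 100 labelled samples, 900 unlabelled ones (y == -1)
rng = np.random.RandomState(0)
X = rng.randn(1000, 20)
y = np.full(1000, -1)
y[:100] = (X[:100, 0] > 0).astype(int)

model = SelfTaughtLearner(transformer=PCA(n_components=5),
                          estimator=LogisticRegression())
model.fit(X, y)
print(model.predict(X[:5]))

The transformer only ever sees the unlabelled rows, and the classifier only the transformed labelled rows, which is the division of labour described above.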
At which point is the transformer involved in this?
Sorry, I failed to write what I meant. I've fixed the code snippet now.
So is this considered a useful helper to demonstrate transfer-type semi-supervised learning?
Just to add to the (old) discussion above: often the lines in the …
I was actually referring to "self-training", a.k.a. "self-learning".
I would go for this.
@chrsrds sure, go ahead :)
Right. One option is to keep a few labels per class (an arbitrary number or percentage) in the training dataset and drop the rest. Next, we compare the accuracy of the self-trained model against that of the supervised model trained on the labelled examples alone.
Exactly, and maybe compare against label propagation and label spreading, too (though I am not convinced by our implementation).
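To make that comparison concrete, here is one possible benchmark sketch (hedged: the digits dataset, ten labels kept per class, and the use of LogisticRegression and LabelSpreading are all illustrative choices; the proposed self-training meta-estimator would slot in the same way LabelSpreading does below):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.semi_supervised import LabelSpreading

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# keep only a few labels per class; mark the rest as unlabelled (-1)
n_labels_per_class = 10
y_semi = np.full_like(y_train, -1)
for c in np.unique(y_train):
    idx = np.flatnonzero(y_train == c)[:n_labels_per_class]
    y_semi[idx] = c

# baseline: supervised model fit on the labelled examples only
labelled = y_semi != -1
supervised = LogisticRegression(max_iter=1000).fit(X_train[labelled], y_semi[labelled])
print("supervised:", supervised.score(X_test, y_test))

# semi-supervised baseline: label spreading on labelled + unlabelled data
# (the proposed self-training estimator would be evaluated the same way)
spreading = LabelSpreading(kernel="knn").fit(X_train, y_semi)
print("label spreading:", spreading.score(X_test, y_test))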
Hello everyone, this is definitely new to me, but if no one is working on this, I would like to try implementing it. I have understood the idea to the best of my ability and tried a version based on the above discussion here. I have mostly never directly implemented any algorithm in semi-supervised learning, so kindly pardon my mistakes. I understand that all of you are busy, but if you can guide me at your convenience, I will try to work on this. If you prefer that I first complete my pending PRs, I will happily oblige. Thanks.
I think that @amueller is referring to the Self-Training (a.k.a. Bootstrapping) algorithm. Unfortunately I did not have time for docstrings, narrative documentation and writing tests.
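For readers who have not met it before, a rough sketch of the self-training idea (hedged: the function name, confidence threshold, and stopping rules below are illustrative, not the API of any existing or proposed scikit-learn estimator): fit a base classifier on the labelled data, pseudo-label the unlabelled samples it predicts with high confidence, add them to the labelled set, and repeat.

import numpy as np

def self_training(base_clf, X, y, threshold=0.9, max_iter=10):
    """Illustrative self-training loop; y == -1 marks unlabelled samples."""
    y = y.copy()
    for _ in range(max_iter):
        labelled = y != -1
        if labelled.all():
            break
        base_clf.fit(X[labelled], y[labelled])
        proba = base_clf.predict_proba(X[~labelled])
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # pseudo-label the confidently predicted unlabelled samples
        new_labels = base_clf.classes_[proba.argmax(axis=1)]
        unlabelled_idx = np.flatnonzero(~labelled)
        y[unlabelled_idx[confident]] = new_labels[confident]
    # final fit on everything that is labelled by now
    labelled = y != -1
    return base_clf.fit(X[labelled], y[labelled])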
Thanks for informing me about the paper. I only just saw the discussion above and probably misunderstood the complexity of the algorithm, sorry. I will first read the paper carefully and, if it is within my ability, would be glad to contribute as much as I can.
Hi, I have recently come across this issue and have also read the related ICML paper. I was hoping to work on this if there is sufficient interest and it is within my capabilities to implement. Please let me know if you have any suggestions, or anything else I could refer to in order to better understand the algorithm. Thanks.
Using self-taught learning, it is possible to turn any estimator into a semi-supervised one.
Not that hard to do.