Vectorizing memory issue #6183
Can you post the full traceback? Also, it would be great if you could put together a stand-alone example that reproduces the problem.
This is the traceback. A minimal example that reproduces this would likely run into tens of GB of data. From what I can tell, the issue is triggered when the length of j_indices exceeds the maximum signed integer.
----> 1 all_vectorizer.fit(train_data)
/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.pyc in fit(self, raw_documents, y)
787 self
788 """
--> 789 self.fit_transform(raw_documents)
790 return self
791
/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.pyc in fit_transform(self, raw_documents, y)
815
816 vocabulary, X = self._count_vocab(raw_documents,
--> 817 self.fixed_vocabulary_)
818
819 if self.binary:
/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.pyc in _count_vocab(self, raw_documents, fixed_vocab)
756 # Ignore out-of-vocabulary items for fixed_vocab=True
757 continue
--> 758 indptr.append(len(j_indices))
759
760 if not fixed_vocab:
OverflowError: signed integer is greater than maximum
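For reference, the exception itself is easy to reproduce outside scikit-learn: Python's array module raises exactly this OverflowError when you append a value past the 32-bit signed range, which is what happens once len(j_indices) exceeds 2**31 - 1. A minimal sketch, assuming CPython where the 'i' typecode is a 32-bit C int:

```python
from array import array

# 'i' stores C signed ints (32-bit on typical platforms), the same kind of
# container the affected CountVectorizer code uses for indptr.
indptr = array('i')

indptr.append(2**31 - 1)      # the largest value that fits: works
try:
    indptr.append(2**31)      # one past the limit, like len(j_indices) here
except OverflowError as exc:
    print(exc)                # -> signed integer is greater than maximum
```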
Hi, I'm still getting this error in version 0.19.1. Any workaround?
You can try patching your copy of the library with #9147.
Wow! That was quick. Thanks @jnothman, I will give it a try.
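The general idea behind that kind of fix is to accumulate the indices in containers that can hold values beyond the 32-bit range and to build the sparse matrix with 64-bit index arrays once the data outgrows int32. Below is a minimal, self-contained sketch of that idea; the helper name and structure are hypothetical, not the actual scikit-learn patch:

```python
import numpy as np
import scipy.sparse as sp

def count_matrix_64bit(docs_as_term_ids, n_features):
    """Build a CSR count matrix, widening the index arrays only when needed."""
    j_indices = []   # plain Python lists hold arbitrarily large ints
    values = []
    indptr = [0]
    for doc in docs_as_term_ids:
        counts = {}
        for term_id in doc:
            counts[term_id] = counts.get(term_id, 0) + 1
        j_indices.extend(counts.keys())
        values.extend(counts.values())
        indptr.append(len(j_indices))   # the append that overflowed above
    # Fall back to int64 indices only when int32 can no longer hold them.
    index_dtype = np.int64 if indptr[-1] > np.iinfo(np.int32).max else np.int32
    return sp.csr_matrix(
        (np.asarray(values, dtype=np.int64),
         np.asarray(j_indices, dtype=index_dtype),
         np.asarray(indptr, dtype=index_dtype)),
        shape=(len(indptr) - 1, n_features),
    )

# Tiny usage example with documents already mapped to feature ids:
X = count_matrix_64bit([[0, 2, 2], [1, 0]], n_features=3)
print(X.toarray())   # [[1 0 2]
                     #  [1 1 0]]
```

The trade-off is that 64-bit index arrays double the memory of the index data, so the widening should only be applied when it is actually required.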
Hi all,
I'm working with a pretty large data set and am having an issue with line 758 of text.py (the CountVectorizer code), the indptr.append(len(j_indices)) call shown in the traceback above.
In my case, the length of j_indices is larger than the maximum signed int, and indptr is an int array.
I tried making indptr a long array, but that leads to other, bigger memory issues.
Any thoughts?
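For a sense of scale, here is some back-of-the-envelope arithmetic, under the assumption that the corpus has just passed the 2**31 - 1 limit (these are not figures measured from this dataset), showing why widening the index arrays trades the overflow for a noticeably larger footprint:

```python
import numpy as np

# Hypothetical corpus size: just past the int32 limit that triggers the error.
n_stored = 2**31

bytes_int32 = n_stored * np.dtype(np.int32).itemsize   # column indices as int32
bytes_int64 = n_stored * np.dtype(np.int64).itemsize   # the same indices as int64

print("int32 indices: ~%.1f GB" % (bytes_int32 / 1e9))   # ~8.6 GB
print("int64 indices: ~%.1f GB" % (bytes_int64 / 1e9))   # ~17.2 GB
```

The count data array and any intermediate copies come on top of that, which is consistent with the observation that simply widening the arrays leads to other, bigger memory issues.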