CLN Fixes PendingDeprecationWarning in CountVectorizer #19299

Merged
merged 3 commits into from
Jan 30, 2021

Conversation

thomasjpfan
Member

Reference Issues/PRs

Address a part of #12327

What does this implement/fix? Explain your changes.

Removes the use of np.matrix in CountVectorizer.inverse_transform
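For readers unfamiliar with the warning, here is a minimal sketch of the general idea: constructing an `np.matrix` emits a `PendingDeprecationWarning`, while the same row-wise lookup can be done on a plain `ndarray` with `np.flatnonzero`. The helper `terms_per_row` and the toy data are hypothetical and only illustrate the technique; this is not presented as the actual diff of this PR.

```python
import warnings

import numpy as np


def terms_per_row(X, inverse_vocabulary):
    """For each row of a dense 2D count array, return the terms with nonzero counts.

    Hypothetical helper for illustration only; not scikit-learn's code.
    """
    X = np.asarray(X)  # plain ndarray, never np.matrix
    # Iterating a 2D ndarray yields 1D rows; flatnonzero returns the
    # column indices of the nonzero entries in each row.
    return [inverse_vocabulary[np.flatnonzero(row)] for row in X]


# Toy vocabulary ordered by column index (made-up data).
inverse_vocabulary = np.array(["apple", "banana", "cherry"])
X = np.array([[1, 0, 2], [0, 3, 0]])

# Constructing an np.matrix is what triggers the warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    np.asmatrix(X)
print([w.category.__name__ for w in caught])

# The ndarray-only path needs no np.matrix at all.
terms = terms_per_row(X, inverse_vocabulary)
print([t.tolist() for t in terms])
```

The design point is that indexing a row of an `ndarray` gives a 1D array, so `np.flatnonzero` replaces the 2D `nonzero()[1]` idiom that relies on matrix-style row slices.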

Member

@lorentzenchr lorentzenchr left a comment

Thanks for this PR. A further step to get rid of np.matrix.

Member

@ogrisel ogrisel left a comment

LGTM.

Member

@lorentzenchr lorentzenchr left a comment

LGTM

@lorentzenchr lorentzenchr merged commit 863c552 into scikit-learn:main Jan 30, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 22, 2021
…19299)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
glemaitre pushed a commit that referenced this pull request Apr 28, 2021
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@rachanagusain

Word n-grams are not working for Devanagari script. Tokens are formed only up to the point where a modifier is found, which should not be the case. The default should be to split on whitespace.
Please check whether regex should be imported instead of re.

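from sklearn.feature_extraction.text import CountVectorizer

d = ['आई- जाइयै नुहाड़ा ध्यान उस्सै भेठा जा’रदा हा।',
     'य बात सुणते ही सब गौं वाल नजदीक जंगलै तरफ भाज् और बांनर पकड़ पकड़ बेर 100- 100 रु में बेचंण लाग्।']

v = CountVectorizer()
x = v.fit_transform(d)
# get_feature_names() was removed in scikit-learn 1.2; newer versions use get_feature_names_out().
f = v.get_feature_names()
print(f)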

Output: ['100', 'आई', 'इय', 'उस', 'और', 'गल', 'णत', 'तरफ', 'नजद', 'नर', 'पकड', 'रद', 'सब']
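
A likely explanation for the output above, sketched with the standard library only: the default `token_pattern` documented for `CountVectorizer` is `(?u)\b\w\w+\b`, and Python's `re` does not treat Devanagari combining marks (vowel signs, virama) as `\w`, so words are cut at those characters. The sample text below reuses two words from the report; the whitespace pattern `\S+` is an assumption about the desired behavior, not a recommendation from the maintainers.

```python
import re
import unicodedata

# Default word-token pattern documented for CountVectorizer's `token_pattern`.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

# Two words taken from the example sentences in the report.
text = "जाइयै ध्यान"

# Tokens as the default pattern sees them: runs of 2+ "word" characters.
default_tokens = re.findall(DEFAULT_TOKEN_PATTERN, text)

# Tokens when splitting on whitespace instead (assumed desired behavior).
whitespace_tokens = re.findall(r"\S+", text)

# Characters that \w rejects, with their Unicode general categories.
non_word_chars = [c for c in text if not c.isspace() and not re.match(r"\w", c)]
categories = [unicodedata.category(c) for c in non_word_chars]

print(default_tokens)
print(whitespace_tokens)
print(sorted(set(categories)))
```

If whitespace tokenization is what is wanted, one option to try is passing a custom pattern such as `token_pattern=r"\S+"` to `CountVectorizer`. Note that this keeps punctuation such as `।` attached to the neighboring token, so a more selective pattern or a custom `tokenizer` may be preferable.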

4 participants