CLN Fixes PendingDeprecationWarning in CountVectorizer #19299

thomasjpfan · 2021-01-29T15:02:59Z

Reference Issues/PRs

Addresses part of #12327.

What does this implement/fix? Explain your changes.

Removes the use of np.matrix in CountVectorizer.inverse_transform.
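A minimal sketch of the idea, not the code merged in this PR: for dense input, the nonzero columns of each row can be recovered from a plain ndarray with np.flatnonzero, so no conversion to np.matrix (which triggers the PendingDeprecationWarning) is needed. The function name inverse_transform_dense and the terms array are hypothetical and exist only for illustration.

```python
import numpy as np

# Hypothetical vocabulary, ordered by column index (illustration only).
terms = np.array(["apple", "banana", "cherry"])

# Dense document-term counts: 2 documents x 3 terms.
X = np.array([[0, 2, 1],
              [1, 0, 0]])


def inverse_transform_dense(X, terms):
    """Return, for each row, the terms whose count is nonzero.

    Works on a plain ndarray; np.asmatrix is never called.
    """
    X = np.asarray(X)
    # Each row is 1-D, so flatnonzero yields its nonzero column indices.
    return [terms[np.flatnonzero(row)] for row in X]


result = inverse_transform_dense(X, terms)
print([list(r) for r in result])
```

A 2-D np.matrix keeps rows two-dimensional when indexed, which is presumably why the older code could treat dense and sparse input alike; with an ndarray the row is one-dimensional, so the dense case needs its own indexing step as above.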

lorentzenchr

Thanks for this PR. This is a further step toward getting rid of np.matrix.

sklearn/feature_extraction/text.py

ogrisel

LGTM.

lorentzenchr

LGTM

rachanagusain · 2021-06-04T14:15:50Z

Word n-grams do not work for Devanagari script. Tokens are formed only up to the first modifier (combining mark), which should not be the case; by default, tokens should be split on whitespace.
Please check whether the regex module should be imported instead of re.

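from sklearn.feature_extraction.text import CountVectorizer

# Two sample sentences in Devanagari script
d = ['आई- जाइयै नुहाड़ा ध्यान उस्सै भेठा जा’रदा हा।',
     'य बात सुणते ही सब गौं वाल नजदीक जंगलै तरफ भाज् और बांनर पकड़ पकड़ बेर 100- 100 रु में बेचंण लाग्।']

v = CountVectorizer()  # default token_pattern
x = v.fit_transform(d)
f = v.get_feature_names()
print(f)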

Output: ['100', 'आई', 'इय', 'उस', 'और', 'गल', 'णत', 'तरफ', 'नजद', 'नर', 'पकड', 'रद', 'सब']
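The truncated tokens are consistent with how Python's re module treats \w: it matches letters and digits but not combining marks (Unicode categories Mn/Mc) such as the virama or vowel signs, so the default token_pattern r"(?u)\b\w\w+\b" breaks a word at each mark. The sketch below reproduces this with the standard library only and shows one possible workaround; mark_aware_tokenize is a hypothetical helper, not part of scikit-learn, and its treatment of underscores and punctuation differs slightly from the default pattern.

```python
import re
import unicodedata

# The default CountVectorizer token_pattern: runs of two or more \w characters.
DEFAULT_TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def mark_aware_tokenize(text):
    """Hypothetical tokenizer: group letters, combining marks, and numbers.

    Keeps only tokens of at least two code points, loosely mirroring the
    two-character minimum of the default pattern.
    """
    tokens, current = [], []
    for ch in text:
        # Unicode general categories: L* letters, M* marks, N* numbers.
        if unicodedata.category(ch)[0] in "LMN":
            current.append(ch)
        else:
            if len(current) >= 2:
                tokens.append("".join(current))
            current = []
    if len(current) >= 2:
        tokens.append("".join(current))
    return tokens


sample = "उस्सै भेठा"
print(DEFAULT_TOKEN_PATTERN.findall(sample))  # ['उस']
print(mark_aware_tokenize(sample))            # ['उस्सै', 'भेठा']
```

Assuming the tokenizer parameter of CountVectorizer is used, such a callable could be passed as CountVectorizer(tokenizer=mark_aware_tokenize), which bypasses token_pattern entirely. Whether the third-party regex module would give the desired behavior with the default pattern is not confirmed here.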

CLN Fixes PendingDeprecationWarning in CountVectorizer

31198d8

github-actions bot added the module:feature_extraction label Jan 29, 2021

lorentzenchr reviewed Jan 29, 2021

thomasjpfan and others added 2 commits January 29, 2021 15:40

ENH Uses check_array

0d12239

Improve coverage of inverse_transform on non-CSR input

4924ab4

ogrisel approved these changes Jan 30, 2021

lorentzenchr approved these changes Jan 30, 2021

lorentzenchr merged commit 863c552 into scikit-learn:main Jan 30, 2021

lorentzenchr mentioned this pull request Jan 30, 2021

PendingDeprecationWarning: the matrix subclass is not the recommended way to represent matrices #12327

Closed

glemaitre mentioned this pull request Apr 22, 2021

Release 0.24.2 #19954

Merged

12 tasks

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 22, 2021

CLN Fixes PendingDeprecationWarning in CountVectorizer (scikit-learn#…

c69fea4

…19299) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

glemaitre pushed a commit that referenced this pull request Apr 28, 2021

CLN Fixes PendingDeprecationWarning in CountVectorizer (#19299)

e9b9f23

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
