liamirpy/Cleaning-data-for-NLP-by-python

Cleaning-data-in-NLP-by-python

How to clean data in Python (here, Persian documents)

In language processing, if you want better results you should clean your data in advance. Some words and punctuation marks have little effect on the result.

Before cleaning your data, the first thing to do is get to know your data completely, because data cleaning is NOT a fixed method that you can apply to every project.

Three common methods are useful here (the ones I use in this project):

. STOP WORDS

     For this project I collected a CSV file that contains "STOP WORDS IN PERSIAN".
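As a rough illustration of the stop-word step, the sketch below loads a stop-word list from CSV data and drops those words from a sentence. The column name `word`, the in-memory `STOP_WORDS_CSV` sample, and the helper names `load_stop_words` and `remove_stop_words` are assumptions made for this example; the layout of the project's real CSV file may differ.

```python
import csv
import io

# Hypothetical stand-in for the project's stop-word CSV:
# a header row named "word", then one Persian stop word per row.
STOP_WORDS_CSV = "word\nو\nدر\nبه\nاز\nکه\nاین\nاست\n"


def load_stop_words(csv_file):
    """Read stop words from an open CSV file object into a set."""
    reader = csv.DictReader(csv_file)
    return {row["word"].strip() for row in reader if row["word"].strip()}


def remove_stop_words(text, stop_words):
    """Split on whitespace and keep only tokens that are not stop words."""
    return " ".join(tok for tok in text.split() if tok not in stop_words)


# With a real file, pass open(path, encoding="utf-8", newline="") instead.
stop_words = load_stop_words(io.StringIO(STOP_WORDS_CSV))
cleaned = remove_stop_words("این کتاب در مورد زبان فارسی است", stop_words)
print(cleaned)
```

Whitespace splitting is the simplest possible tokenizer; real Persian text may need normalization first (for example, handling of the zero-width non-joiner), which this sketch does not attempt.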

. TfidfVectorizer

    The TF-IDF method helps find words that are repeated across many documents and therefore do not carry important information; such words receive a low weight.
