Python With TextBlob
Spelling mistakes are common, and most people are used to software
indicating when a mistake was made. From autocorrect on our phones to red
underlining in text editors, spell checking is an essential feature of many
different products.
The first program to implement spell checking was written in 1971 for
the DEC PDP-10. Called SPELL, it could only perform simple comparisons of
words and detect one- or two-letter differences. As hardware and software
have advanced, so have spell checkers. Modern spell checkers can handle
morphology and use statistics to improve their suggestions.
Python offers many modules for this purpose, which makes writing a
simple spell checker an easy 20-minute task.
Installation
TextBlob isn't part of Python's standard library, so install it with pip:
$ pip install textblob
This should install everything we need for this project. Once the
installation finishes, the console output should include something like:
textblob-0.15.3
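If you'd rather confirm the installation from Python than scroll through pip's output, one minimal sketch is to ask the standard library's importlib.metadata (Python 3.8+) for the installed version. The helper name textblob_version is our own, and the version you see will depend on when you install.

```python
from importlib.metadata import PackageNotFoundError, version


def textblob_version():
    """Return the installed TextBlob version string, or None if it isn't installed."""
    try:
        return version("textblob")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = textblob_version()
    print(f"TextBlob {installed}" if installed else "TextBlob is not installed")
```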
The correct() Function
The example text below, full of deliberate spelling mistakes, is saved in a file called test.txt:
As far as I am abl to judg, after long attnding to the sbject, the condiions of
lfe apear to act in two ways—directly on the whle organsaton or on certin parts
alne and indirectly by afcting the reproducte sstem. Wit respct to te dirct
action, we mst bea in mid tht in every cse, as Profesor Weismann hs latly
Domesticcation," thcere arae two factrs: namly, the natre of the orgnism and
the natture of the condiions. The frmer sems to be much th mre importannt; foor
definnite whhen allc or neearly all thhe ofefspring off inadividuals exnposed
maner.
print(textCorrected)
If you've worked with TextBlob before, this flow will look familiar. We've
opened the file, read its contents, and constructed a TextBlob instance by
passing those contents to the constructor. Then we call the correct()
method on that instance to perform spelling correction.
After running the script above, you should get output similar to:
"Variation under Domesticcation," there are two facts: namely, the nature of
the organism and the nature of the conditions. The former seems to be much th
are important; for nearly similar variations sometimes arms under, as far as we
arise under conditions which appear to be nearly uniform. The effects on the
definite when all or nearly all the offspring off individuals exposed to
As we can see, the text still has some spelling errors. Words
like "abl" were supposed to become "able", not "all". Even so, the corrected
text is still better than the original.
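Why would "abl" become "all"? TextBlob's spelling correction is based on Peter Norvig's approach: generate candidate words a few edits away from the misspelling, keep the ones found in a word-frequency list, and pick the most frequent. Both "able" (one insertion) and "all" (one replacement) are a single edit from "abl", so the frequency list decides. The toy sketch below isolates that tie-break. It is a simplification that only looks one edit away (Norvig's version also tries two), and its WORD_COUNTS table is invented for illustration.

```python
import string

# Made-up word counts for illustration only; TextBlob ships its own,
# much larger frequency list built from real text.
WORD_COUNTS = {"all": 3000, "able": 400, "the": 9000}


def edits1(word):
    """Every string one deletion, transposition, replacement, or insertion away."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in letters]
    inserts = [a + c + b for a, b in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct(word):
    """Keep known words; otherwise pick the most frequent known word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    if not candidates:
        return word  # nothing close enough: leave the word alone
    return max(candidates, key=WORD_COUNTS.get)


print(correct("abl"))  # → all ("all" and "able" are both one edit away; "all" is more frequent)
```

With a frequency list in which "able" outranked "all", the same code would return "able", which is why a purely frequency-based corrector can pick a plausible but wrong word.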
The following script tests how well TextBlob corrects the errors in this
example by comparing the texts word by word:
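from textblob import TextBlob

def compareWords(text1, text2):  # hypothetical name: the original function header is missing
    l1 = text1.split()
    l2 = text2.split()
    good = 0
    bad = 0
    for w1, w2 in zip(l1, l2):  # compare word by word; zip stops at the shorter text
        if w1 != w2:
            bad += 1
        else:
            good += 1
    return good, bad
def percentageOfBad(counts):
    # Differing words as a percentage of all compared words
    return counts[1] / sum(counts) * 100

with open("test.txt", "r", encoding="utf-8") as f1:  # test.txt contains the same typo-filled text from the last example
    t1 = f1.read()
with open("original.txt", "r", encoding="utf-8") as f2:  # hypothetical file name: the error-free original text
    t2 = f2.read()
t3 = str(TextBlob(t1).correct())  # the corrected text, as a plain string
mistakesCompOriginal = compareWords(t1, t2)
originalCompCorrected = compareWords(t2, t3)
print("Percentage of mistakes in the test: ",
      percentageOfBad(mistakesCompOriginal), "%")
print("Percentage of mistakes in the corrected: ",
      percentageOfBad(originalCompCorrected), "%")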
Running the script shows that the correct() method brought our spelling
mistake percentage down from 60.6% to 15.9%, which is pretty decent. However,
there's a catch: it corrected 54.7% of the words, so why is there still a
15.9% mistake rate?
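One way to dig into that gap is to line the three texts up word by word and sort every position into a bucket: typos that were fixed, typos left untouched, typos changed into a different wrong word, and correctly spelled words that the corrector changed. This sketch makes the same assumption as the script above, namely that the original, typo-filled, and corrected texts split into the same words in the same order. The classify_changes name, its bucket labels, and the three short example strings are our own inventions.

```python
from collections import Counter


def classify_changes(original, typo, corrected):
    """Count word-level outcomes of a correction run.

    Assumes the three texts are word-aligned: same number of words, same order.
    """
    buckets = Counter()
    for o, t, c in zip(original.split(), typo.split(), corrected.split()):
        if t == o:  # the word was spelled correctly to begin with
            buckets["kept correct" if c == o else "broken by correction"] += 1
        elif c == o:  # a typo the corrector repaired
            buckets["fixed"] += 1
        elif c == t:  # a typo the corrector did not touch
            buckets["left unfixed"] += 1
        else:  # a typo changed into a different wrong word
            buckets["changed but still wrong"] += 1
    return buckets


# Made-up, word-aligned example texts
original = "the nature of the organism and conditions"
typo = "te natre of the orgnism and condiions"
corrected = "te nature of he organism and conditons"

for bucket, count in classify_changes(original, typo, corrected).items():
    print(f"{bucket}: {count}")
```

Every word the corrector changed lands in "fixed", "broken by correction", or "changed but still wrong", while the number of mistakes only drops by fixed minus broken. Those last two buckets are therefore where the share of changed words can pull ahead of the drop in the mistake rate.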