Neuspell: A Neural Spelling Correction Toolkit
Abstract
We introduce NeuSpell, an open-source toolkit
for spelling correction in English. Our
arXiv:2010.11085v1 [cs.CL] 21 Oct 2020
1 Introduction
Spelling mistakes constitute the largest share of errors in written text (Wilbur et al., 2006; Flor and Futagi, 2012). Therefore, spell checkers are ubiquitous, forming an integral part of many applications including search engines, productivity and collaboration tools, messaging platforms, etc. However, many well-performing spelling correction systems are developed by corporations and trained on massive proprietary user data. In contrast, many freely available off-the-shelf correctors such as Enchant (Thomas, 2010), GNU Aspell (Atkinson, 2019), and JamSpell (Ozinov, 2019) do not effectively use the context of the misspelled word. For instance, they fail to disambiguate thaught to taught or thought based on the context: “Who thaught you calculus?” versus “I never thaught I would be awarded the fellowship.”

¹ Code and pretrained models are available at: https://github.com/neuspell/neuspell

In this paper, we describe our spelling correction toolkit, which comprises several neural models that accurately capture context around the misspellings. To train our neural spell correctors, we first curate synthetic training data for spelling correction in context, using several text noising strategies. These strategies use a lookup table for word-level noising, and a context-based character-level confusion dictionary for character-level noising. To populate this lookup table and confusion dictionary, we harvest isolated misspelling-correction pairs from various publicly available sources.
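The two noising mechanisms mentioned above, a word-level lookup table and a context-based character-level confusion dictionary, can be illustrated with a minimal sketch. It is not the toolkit's implementation: the function names `noise_words` and `sample_char`, the toy table in the demo, and the choice to treat the context as the characters preceding the current one are all assumptions made for illustration. The 0.3 replacement probability and the backoff from 3 characters of context down to 0 are the values reported in Section 3.

```python
import random


def noise_words(tokens, lookup, prob=0.3, rng=None):
    """Word-level noising: swap a token for one of its known misspellings.

    `lookup` maps a correct word to a list of misspellings (hypothetical data).
    Tokens that are missing from the table are left unchanged.
    """
    rng = rng or random.Random()
    noised = []
    for token in tokens:
        candidates = lookup.get(token)
        if candidates and rng.random() < prob:
            noised.append(rng.choice(candidates))
        else:
            noised.append(token)
    return noised


def sample_char(confusion, char, left_context, rng=None, max_context=3):
    """Character-level noising: sample a replacement for `char`.

    `confusion` maps (char, context) to {replacement: frequency}. Using the
    preceding characters as the context is an assumption of this sketch.
    The lookup backs off from `max_context` characters to an empty context.
    """
    rng = rng or random.Random()
    for k in range(max_context, -1, -1):
        if len(left_context) < k:
            continue  # not enough preceding characters for this context size
        context = left_context[-k:] if k else ""
        replacements = confusion.get((char, context))
        if replacements:
            chars = list(replacements)
            weights = [replacements[c] for c in chars]
            return rng.choices(chars, weights=weights, k=1)[0]
    return char  # no entry at any context length: keep the character


if __name__ == "__main__":
    # Toy lookup table; real tables are harvested from public sources.
    toy_lookup = {"taught": ["thaught", "tought"]}
    print(noise_words("Who taught you calculus ?".split(), toy_lookup, prob=1.0))
```

Because unedited characters dominate real misspelling corpora, a confusion dictionary built from such data would usually return the source character itself, so most sampled characters stay unchanged.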
Further, we investigate effective ways to incorporate contextual information: we experiment with contextual representations from pretrained models such as ELMo (Peters et al., 2018) and BERT (Devlin et al., 2018) and compare their efficacies with existing neural architectural choices (§ 5.1).
Lastly, several recent studies have shown that many state-of-the-art neural models developed for a variety of Natural Language Processing (NLP) tasks easily break in the presence of natural or synthetic spelling errors (Belinkov and Bisk, 2017; Ebrahimi et al., 2017; Pruthi et al., 2019). We determine the usefulness of our toolkit as a countermeasure against character-level adversarial attacks (§ 5.2). We find that our models are better defenses against adversarial attacks than previously proposed spell checkers. We believe that our toolkit would encourage practitioners to incorporate spelling correction systems in other NLP applications.

Model                                   Correction Rates   Time per sentence (milliseconds)
ASPELL (Atkinson, 2019)                 48.7               7.3∗
JAMSPELL (Ozinov, 2019)                 68.9               2.6∗
CHAR-CNN-LSTM (Kim et al., 2015)        75.8               4.2
SC-LSTM (Sakaguchi et al., 2016)        76.7               2.8
CHAR-LSTM-LSTM (Li et al., 2018)        77.3               6.4
BERT (Devlin et al., 2018)              79.1               7.1
SC-LSTM
  +ELMO (input)                         79.8               15.8
  +ELMO (output)                        78.5               16.3
  +BERT (input)                         77.0               6.7
  +BERT (output)                        76.0               7.2
Table 1: Performance of different correctors in the NeuSpell toolkit on the BEA-60K dataset with real-world spelling mistakes. ∗ indicates evaluation on a CPU (for others we use a GeForce RTX 2080 Ti GPU).

2 Models in NeuSpell
Our toolkit offers ten different spelling correction models, which include: (i) two off-the-shelf non-neural models, (ii) four published neural models for spelling correction, and (iii) four of our extensions. The first six systems are as follows:
• GNU Aspell (Atkinson, 2019): It uses a combination of the metaphone phonetic algorithm,² Ispell’s near miss strategy,³ and a weighted edit distance metric to score candidate words.
• JamSpell (Ozinov, 2019): It uses a variant of the SymSpell algorithm,⁴ and a 3-gram language model to prune word-level corrections.
• SC-LSTM (Sakaguchi et al., 2016): It corrects misspelt words using semi-character representations, fed through a bi-LSTM network. The semi-character representations are a concatenation of one-hot embeddings for the (i) first, (ii) last, and (iii) bag of internal characters.
• CHAR-LSTM-LSTM (Li et al., 2018): The model builds word representations by passing each word's individual characters to a bi-LSTM. These representations are further fed to another bi-LSTM trained to predict the correction.
• CHAR-CNN-LSTM (Kim et al., 2015): Similar to the previous model, this model builds word-level representations from individual characters using a convolutional network.
• BERT (Devlin et al., 2018): The model uses a pre-trained transformer network. We average the sub-word representations to obtain the word representations, which are further fed to a classifier to predict the correction.

² http://aspell.net/metaphone/
³ https://en.wikipedia.org/wiki/Ispell
⁴ https://github.com/wolfgarbe/SymSpell

To better capture the context around a misspelt token, we extend the SC-LSTM model by augmenting it with deep contextual representations from pre-trained ELMo and BERT. Since the best point to integrate such embeddings might vary by task (Peters et al., 2018), we append them either to the semi-character embeddings before feeding them to the bi-LSTM or to the bi-LSTM’s output. Currently, our toolkit provides four such trained models: ELMo/BERT tied at input/output with a semi-character based bi-LSTM model.
Implementation Details. Neural models in NeuSpell are trained by posing spelling correction as a sequence labeling task, where a correct word is marked as itself and a misspelt token is labeled as its correction. Out-of-vocabulary labels are marked as UNK. For each word in the input text sequence, models are trained to output a probability distribution over a finite vocabulary using a softmax layer.

⁵ github.com/cedrickchee/pytorch-pretrained-BERT

We set the hidden size of the bi-LSTM network in all models to 512 and use {50,100,100,100} sized convolution filters with lengths {2,3,4,5} respectively in CNNs. We use a dropout of 0.4 on the bi-LSTM’s outputs and train the models using cross-entropy loss. We use the BertAdam⁵ optimizer for models with a BERT component and the
Adam (Kingma and Ba, 2014) optimizer for the remainder. These optimizers are used with default parameter settings. We use a batch size of 32 examples, and train with a patience of 3 epochs.
During inference, we first replace UNK predictions with their corresponding input words and then evaluate the results. We evaluate models for accuracy (percentage of correct words among all words) and word correction rate (percentage of misspelt tokens corrected). We use the AllenNLP⁶ and Huggingface⁷ libraries for ELMo and BERT respectively. All neural models in our toolkit are implemented using the PyTorch library (Paszke et al., 2017), and run in both CPU and GPU environments. The performance of the different models is presented in Table 1.

3 Synthetic Training Datasets
Due to the scarcity of available parallel data for spelling correction, we noise sentences to generate misspelt-correct sentence pairs. We use 1.6M sentences from the one billion word benchmark (Chelba et al., 2013) dataset as our clean corpus. Using different noising strategies from existing literature, we noise ∼20% of the tokens in the clean corpus by injecting spelling mistakes in each sentence. Below, we briefly describe these strategies.
RANDOM: Following Sakaguchi et al. (2016), this noising strategy involves four character-level operations: permute, delete, insert and replace. We manipulate only the internal characters of a word. The permute operation jumbles a pair of consecutive characters, the delete operation randomly deletes one of the characters, the insert operation inserts a random letter, and the replace operation swaps a character with a randomly selected letter. For every word in the clean corpus, we select one of the four operations with 0.1 probability each. We do not modify words of length three or smaller.
WORD: Inspired by Belinkov and Bisk (2017), we swap a word with its noised counterpart from a pre-built lookup table. We collect 109K misspelt-correct word pairs for 17K popular English words from a variety of public sources.⁸ For every word in the clean corpus, we replace it by a random misspelling (with a probability of 0.3) sampled from all the misspellings associated with that word in the lookup table. Words not present in the lookup table are left as is.
PROB: Recently, Piktus et al. (2019) released a corpus of 20M correct-misspelt word pairs, generated from logs of a search engine.⁹ We use this corpus to construct a character-level confusion dictionary where the keys are ⟨character, context⟩ pairs and the values are a list of potential character replacements with their frequencies. This dictionary is subsequently used to sample character-level errors in a given context. We use a context of 3 characters, and back off to 2, 1, and 0 characters. Notably, due to the large number of unedited characters in the corpus, the most probable replacement will often be the same as the source character.
PROB+WORD: For this strategy, we simply concatenate the training data obtained from both the WORD and PROB strategies.

4 Evaluation Benchmarks
Natural misspellings in context. Many publicly available spell checkers are evaluated on isolated misspellings (Atkinson, 2019; Mitton, na; Norvig, 2016). In contrast, we evaluate our systems using misspellings in context, by using publicly available datasets for the task of Grammatical Error Correction (GEC). Since the GEC datasets are annotated for various types of grammatical mistakes, we only sample errors of SPELL type.
Among the GEC datasets in the BEA-2019 shared task,¹⁰ the Write & Improve (W&I) dataset along with the LOCNESS dataset are a collection of texts in English (mainly essays) written by language learners with varying proficiency levels (Bryant et al., 2019; Granger, 1998). The First Certificate in English (FCE) dataset is another collection of essays in English written by non-native learners taking a language assessment exam (Yannakoudakis et al., 2011) and the Lang-8 dataset is a collection of English texts from the Lang-8 online language learning website (Mizumoto et al., 2011; Tajiri et al., 2012). We combine data from these four sources to create the BEA-60K test set with nearly 70K spelling mistakes (6.8% of all tokens) in 63044 sentences.
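The two scores reported on these benchmarks can be illustrated with a short sketch. It follows the definitions given under Implementation Details: UNK predictions are first replaced by the corresponding input word, accuracy is the percentage of correct words among all words, and correction rate is the percentage of misspelt tokens that are corrected. The function names and the assumption that the input, predicted, and gold sequences are aligned token by token are illustrative choices, not the toolkit's API.

```python
def replace_unk(predictions, inputs, unk="UNK"):
    """Fall back to the input word wherever the model predicted UNK."""
    return [src if pred == unk else pred for pred, src in zip(predictions, inputs)]


def score(inputs, predictions, targets):
    """Return (accuracy, correction_rate), both as percentages.

    All three sequences are assumed to be aligned token by token.
    """
    if not (len(inputs) == len(predictions) == len(targets)):
        raise ValueError("sequences must have the same length")
    predictions = replace_unk(predictions, inputs)

    # Accuracy: correct words among all words.
    n_correct = sum(pred == gold for pred, gold in zip(predictions, targets))

    # Correction rate: only tokens that were misspelt in the input count.
    misspelt = [
        (pred, gold)
        for src, pred, gold in zip(inputs, predictions, targets)
        if src != gold
    ]
    n_fixed = sum(pred == gold for pred, gold in misspelt)

    accuracy = 100.0 * n_correct / len(targets) if targets else 0.0
    correction_rate = 100.0 * n_fixed / len(misspelt) if misspelt else 0.0
    return accuracy, correction_rate


if __name__ == "__main__":
    src = ["Who", "thaught", "you", "calclus"]
    hyp = ["Who", "taught", "UNK", "calculate"]
    ref = ["Who", "taught", "you", "calculus"]
    print(score(src, hyp, ref))
```

Reporting both numbers matters because most tokens in a sentence are already correct: a system that copies its input scores high on accuracy while its correction rate is zero.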
⁶ allennlp.org/elmo
⁷ huggingface.co/transformers/model doc/bert.html
⁸ https://en.wikipedia.org/, dcs.bbk.ac.uk, norvig.com, corpus.mml.cam.ac.uk/efcamdat
⁹ https://github.com/facebookresearch/moe
¹⁰ www.cl.cam.ac.uk/research/nl/bea2019st/

The JHU FLuency-Extended GUG Corpus (JFLEG) dataset (Napoles et al., 2017) is another collection of essays written by English learners with different first languages. This dataset contains 2K spelling mistakes (6.1% of all tokens) in 1601 sentences. We use the BEA-60K and JFLEG datasets only for the purposes of evaluation, and do not use them in the training process.
Synthetic misspellings in context. From the two noising strategies described in §3, we additionally create two test sets: WORD-TEST and PROB-TEST. Each of these test sets contains around 1.2M spelling mistakes (19.5% of all tokens) in 273K sentences.
Ambiguous misspellings in context. Besides the natural and synthetic test sets, we create a challenge set of ambiguous spelling mistakes, which require additional context to unambiguously correct them. For instance, the word whitch can be corrected to “witch” or “which” depending upon the context. Similarly, for the word begger, either “bigger” or “beggar” can be an appropriate correction. To create this challenge set, we select all such misspellings which are either 1-edit distance away from two (or more) legitimate dictionary words, or have the same phonetic encoding as two (or more) dictionary words. Using these two criteria, we sometimes end up with inflections of the same word, hence we use a stemmer and lemmatizer from the NLTK library to weed those out. Finally, we manually prune down the list to 322 sentences, with one ambiguous mistake per sentence. We refer to this set as BEA-322.
We also create another larger test set where we artificially misspell two different words in sentences to their common ambiguous misspelling. This process results in a set with 4660 misspellings in 4660 sentences, and is thus referred to as BEA-4660. Notably, for both these ambiguous test sets, a spelling correction system that doesn’t use any context information can at best correct 50% of the mistakes.

5 Results and Discussion
5.1 Spelling Correction
We evaluate the 10 spelling correction systems in NeuSpell across 6 different datasets (see Table 2). All the neural models in the toolkit are trained on the PROB+WORD synthetic dataset. We use the recommended configurations for Aspell and Jamspell, but do not fine-tune them on our synthetic dataset. In all our experiments, the vocabulary of neural models is restricted to the 100K most frequent words of the clean corpus.

Spelling correction systems in NeuSpell (Word-Level Accuracy / Correction Rate)
                                        Synthetic                     Natural                       Ambiguous
Model                                   WORD-TEST      PROB-TEST      BEA-60K        JFLEG          BEA-4660       BEA-322
ASPELL (Atkinson, 2019)                 43.6 / 16.9    47.4 / 27.5    68.0 / 48.7    73.1 / 55.6    68.5 / 10.1    61.1 / 18.9
JAMSPELL (Ozinov, 2019)                 90.6 / 55.6    93.5 / 68.5    97.2 / 68.9    98.3 / 74.5    98.5 / 72.9    96.7 / 52.3
CHAR-CNN-LSTM (Kim et al., 2015)        97.0 / 88.0    96.5 / 84.1    96.2 / 75.8    97.6 / 80.1    97.5 / 82.7    94.5 / 57.3
SC-LSTM (Sakaguchi et al., 2016)        97.6 / 90.5    96.6 / 84.8    96.0 / 76.7    97.6 / 81.1    97.3 / 86.6    94.9 / 65.9
CHAR-LSTM-LSTM (Li et al., 2018)        98.0 / 91.1    97.1 / 86.6    96.5 / 77.3    97.6 / 81.6    97.8 / 84.0    95.4 / 63.2
BERT (Devlin et al., 2018)              98.9 / 95.3    98.2 / 91.5    93.4 / 79.1    97.9 / 85.0    98.4 / 92.5    96.0 / 72.1
SC-LSTM
  +ELMO (input)                         98.5 / 94.0    97.6 / 89.1    96.5 / 79.8    97.8 / 85.0    98.2 / 91.9    96.1 / 69.7
  +ELMO (output)                        97.9 / 91.4    97.0 / 86.1    98.0 / 78.5    96.4 / 76.7    97.9 / 88.1    95.2 / 63.2
  +BERT (input)                         98.7 / 94.3    97.9 / 89.5    96.2 / 77.0    97.8 / 83.9    98.4 / 90.2    96.0 / 67.8
  +BERT (output)                        98.1 / 92.3    97.2 / 86.9    95.9 / 76.0    97.6 / 81.0    97.8 / 88.1    95.1 / 67.2
Table 2: Performance of different models in NeuSpell on natural, synthetic, and ambiguous test sets. All models are trained using the PROB+WORD noising strategy.

We observe that although the off-the-shelf checker Jamspell leverages context, it is often inadequate. We see that models comprising deep contextual representations consistently outperform other existing neural models for the spelling correction task. We also note that the BERT model performs consistently well across all our benchmarks. For the ambiguous BEA-322 test set, we manually evaluated corrections from Grammarly, a professional paid service for assistive writing.¹¹ We found that our best model for this set, i.e. BERT, outperforms corrections from Grammarly (72.1% vs 71.4%). We attribute the success of our toolkit’s well-performing models to (i) better representations of the context, from large pre-trained models; (ii) swap invariant semi-character representations; and (iii) training models with synthetic data consisting of noise patterns from real-world misspellings. We follow up these results with an ablation study to understand the role of each noising strategy (Table 4).¹² For each of the 5 models evaluated, we observe that models trained with PROB noise outperform those trained with WORD or RANDOM noises. Across all the models, we further observe that using the PROB+WORD strategy improves correction rates by at least 10% in comparison to RANDOM noising.

Spelling Correction (Word-Level Accuracy / Correction Rate)
                                                       Natural test sets
Model                               Train Noise        BEA-60K        JFLEG
CHAR-CNN-LSTM                       RANDOM             95.9 / 66.6    97.4 / 69.3
(Kim et al., 2015)                  WORD               95.9 / 70.2    97.4 / 74.5
                                    PROB               96.1 / 71.4    97.4 / 77.3
                                    PROB+WORD          96.2 / 75.5    97.4 / 79.2
SC-LSTM                             RANDOM             96.1 / 64.2    97.4 / 66.2
(Sakaguchi et al., 2016)            WORD               95.4 / 68.3    97.4 / 73.7
                                    PROB               95.7 / 71.9    97.2 / 75.9
                                    PROB+WORD          95.9 / 76.0    97.6 / 80.3
CHAR-LSTM-LSTM                      RANDOM             96.2 / 67.1    97.6 / 70.2
(Li et al., 2018)                   WORD               96.0 / 69.8    97.5 / 74.6
                                    PROB               96.3 / 73.5    97.4 / 78.2
                                    PROB+WORD          96.3 / 76.4    97.5 / 80.2
BERT                                RANDOM             96.9 / 66.3    98.2 / 74.4
(Devlin et al., 2018)               WORD               95.3 / 61.1    97.3 / 70.4
                                    PROB               96.2 / 73.8    97.8 / 80.5
                                    PROB+WORD          96.1 / 77.1    97.8 / 82.4
SC-LSTM                             RANDOM             96.9 / 69.1    97.8 / 73.3
+ELMO (input)                       WORD               96.0 / 70.5    97.5 / 75.6
                                    PROB               96.8 / 77.0    97.7 / 80.9
                                    PROB+WORD          96.5 / 79.2    97.8 / 83.2
Table 4: Evaluation of models on the natural test sets when trained using synthetic datasets curated using different noising strategies.

¹¹ Retrieved on July 13, 2020.
¹² To fairly compare across different noise types, in this experiment we include only 50% of samples from each of PROB and WORD noises to construct the PROB+WORD noise set.

5.2 Defense against Adversarial Misspellings
Many recent studies have demonstrated the susceptibility of neural models under word- and character-level attacks (Alzantot et al., 2018; Belinkov and Bisk, 2017; Piktus et al., 2019; Pruthi et al., 2019). To combat adversarial misspellings, Pruthi et al. (2019) find spell checkers to be a viable defense. Therefore, we also evaluate spell checkers in our toolkit against adversarial misspellings.
We follow the same experimental setup as Pruthi et al. (2019) for the sentiment classification task under different adversarial attacks. We finetune the SC-LSTM +ELMO (input) model on movie reviews data from the Stanford Sentiment Treebank (SST) (Socher et al., 2013), using the same noising strategy as in Pruthi et al. (2019). As we observe from Table 3, our corrector from the NeuSpell toolkit (SC-LSTM +ELMO (input)(F)) outperforms the spelling correction models proposed in Pruthi et al. (2019) in most cases.

Table 3: Sentiment Analysis (1-char attack / 2-char attack)
Defenses                              No Attack   Swap           Drop           Add            Key            All
Word-Level Models
SC-LSTM (Pruthi et al., 2019)         79.3        78.6 / 78.5    69.1 / 65.3    65.0 / 59.2    69.6 / 65.6    63.2 / 52.4
SC-LSTM +ELMO (input) (F)             79.6        77.9 / 77.2    72.2 / 69.2    65.5 / 62.0    71.1 / 68.3    64.0 / 58.0
Char-Level Models
SC-LSTM (Pruthi et al., 2019)         70.3        65.8 / 62.9    58.3 / 54.2    54.0 / 44.2    58.8 / 52.4    51.6 / 39.8
SC-LSTM +ELMO (input) (F)             70.9        67.0 / 64.6    61.2 / 58.4    53.0 / 43.0    58.1 / 53.3    51.5 / 41.0
Word+Char Models
SC-LSTM (Pruthi et al., 2019)         80.1        79.0 / 78.7    69.5 / 65.7    64.0 / 59.0    66.0 / 62.0    61.5 / 56.5
SC-LSTM +ELMO (input) (F)             80.6        79.4 / 78.8    73.1 / 69.8    66.0 / 58.0    72.2 / 68.7    64.0 / 54.5

6 Conclusion
In this paper, we describe NeuSpell, a spelling correction toolkit comprising ten different models. Unlike popular open-source spell checkers, our models accurately capture the context around the misspelt words. We also supplement models in our toolkit with a unified command line and a web interface. The toolkit is open-sourced, free for public use, and available at https://github.com/neuspell/neuspell. A demo of the trained spelling correction models can be accessed at https://neuspell.github.io/.

Acknowledgements
The authors thank Punit Singh Koura for insightful discussions and participation during the initial phase of the project.

References
Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, and Kai-Wei Chang. 2018. Generating natural language adversarial examples. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing,
pages 2890–2896, Brussels, Belgium. Association for Computational Linguistics.
Kevin Atkinson. 2019. GNU Aspell.
Yonatan Belinkov and Yonatan Bisk. 2017. Synthetic and natural noise both break neural machine translation.
Christopher Bryant, Mariano Felice, Øistein E. Andersen, and Ted Briscoe. 2019. The BEA-2019 shared task on grammatical error correction. In Proceedings of the Fourteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 52–75, Florence, Italy. Association for Computational Linguistics.
Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, and Tony Robinson. 2013. One billion word benchmark for measuring progress in statistical language modeling.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding.
Javid Ebrahimi, Anyi Rao, Daniel Lowd, and Dejing Dou. 2017. HotFlip: White-box adversarial examples for text classification.
Michael Flor and Yoko Futagi. 2012. On using context for automatic correction of non-word misspellings in student essays. In Proceedings of the Seventh Workshop on Building Educational Applications Using NLP, pages 105–115, Montréal, Canada. Association for Computational Linguistics.
Sylviane Granger. 1998. The computer learner corpus: a versatile new source of data for SLA research. na.
Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush. 2015. Character-aware neural language models.
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization.
Hao Li, Yang Wang, Xinyu Liu, Zhichao Sheng, and Si Wei. 2018. Spelling error correction using a nested RNN model and pseudo training data.
Roger Mitton. na. Corpora of misspellings.
Tomoya Mizumoto, Mamoru Komachi, Masaaki Nagata, and Yuji Matsumoto. 2011. Mining revision log of language learning SNS for automated Japanese error correction of second language learners. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 147–155, Chiang Mai, Thailand. Asian Federation of Natural Language Processing.
Courtney Napoles, Keisuke Sakaguchi, and Joel Tetreault. 2017. JFLEG: A fluency corpus and benchmark for grammatical error correction. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 229–234, Valencia, Spain. Association for Computational Linguistics.
Peter Norvig. 2016. Spelling correction system.
Filipp Ozinov. 2019. JamSpell.
Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS-W.
Matthew Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers).
Aleksandra Piktus, Necati Bora Edizel, Piotr Bojanowski, Edouard Grave, Rui Ferreira, and Fabrizio Silvestri. 2019. Misspelling oblivious word embeddings. Proceedings of the 2019 Conference of the North.
Danish Pruthi, Bhuwan Dhingra, and Zachary C. Lipton. 2019. Combating adversarial misspellings with robust word recognition. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.
Keisuke Sakaguchi, Kevin Duh, Matt Post, and Benjamin Van Durme. 2016. Robsut wrod reocginiton via semi-character recurrent neural network.
Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Seattle, Washington, USA. Association for Computational Linguistics.
Toshikazu Tajiri, Mamoru Komachi, and Yuji Matsumoto. 2012. Tense and aspect error correction for ESL learners using global context. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 198–202, Jeju Island, Korea. Association for Computational Linguistics.
Reuben Thomas. 2010. Enchant.
W. John Wilbur, Won Kim, and Natalie Xie. 2006. Spelling correction in the PubMed search engine. Inf. Retr., 9(5):543–564.
Helen Yannakoudakis, Ted Briscoe, and Ben Medlock. 2011. A new dataset and method for automatically grading ESOL texts. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 180–189, Portland, Oregon, USA. Association for Computational Linguistics.