Post Editing
Artificial intelligence applications (AIA) have flooded into daily life, including education.
Widely used examples are Google Translate and ChatGPT. However, their rapid influx
into language learning has led many EFL practitioners to look down on such applications
because they do not perceive the benefits the applications offer their users.
This qualitative research synthesis aims to identify the features of postediting
and the criteria used in evaluating machine translation quality. It also examines the
characteristics of postedited machine translation outputs. The study explored the findings
of 20 articles published between 2015 and 2023. The findings showed that postediting
encompasses four features: translation approaches, profession, skills and tasks.
Postediting quality was evaluated according to translation acceptability, postediting
time and postediting effort. Finally, research on postediting outputs includes comparisons
between neural machine translation and statistical machine translation, and between
machine translation and human translation. The study confirmed the importance of using MT.
It recommended training EFL students in postediting skills to enable them to exploit the
advantages of machine translation in the translation industry.
Despite the countless advantages that technology brings to our era in various fields,
some researchers believe that technology does not make everything better or more viable
(Pym, 2011). Regarding the use of artificial intelligence (AI) in translation, known as
machine translation (MT), it is believed that humans are still far from being able to
program software that produces intelligible translation across
various language pairs and genres (Eszenyi & Dóczi, 2020). Nevertheless, MT has lately
gained considerable recognition in the translation industry (Kasperė et al., 2023). This
recognition is reflected in the volume of use: Google Translate translates about
140 billion words daily (Hu et al., 2020). MT is also used in different
fields and by different users for work, search and housework (Kasperė et al., 2021).
Therefore, it is irrational and unwise to forgo such advantages (Povilaitienė &
Kasperė, 2022). Many critics dislike machine translation because of the repetitive errors it
produces, which no human translator would commit (O’Brien, 2002). Translators who have
not been trained in machine translation post-editing (henceforth MTPE) cannot
perform post-editing tasks professionally, because of the many practical and theoretical
skills and competencies a translator needs to be acquainted with (Aranberri, 2017).
MTPE is expected to take up a large portion of translators' work in the future (Koponen,
2016). This study explored the state of the art on the status of MT, when accompanied by
postediting, among end users or readers. No matter who produces the translation,
whether human or machine, translation is evaluated according to its quality and
acceptability. Previous research on MTPE was shaped by experimental designs (e.g.,
Aranberri, 2017; Kasperė et al., 2019) or quantitative designs (Yamada, 2019). There is a
lack of research that focuses on qualitative synthesis. Therefore, this research aims to
bridge a methodological gap in state-of-the-art research on MTPE to verify these
Research questions
Literature review
Machine translation
A few studies have advised translators on what to do with raw MT output and
how to improve translations obtained from machines (Aranberri, 2017). Furthermore,
measures have been established to evaluate the quality of MT (Koponen, 2010; Rossi & Carré,
2022). Other studies compared the acceptability of MT outputs with that of human translation
(Eszenyi & Dóczi, 2020). Despite the fundamental role of post-editing in MT,
translation trainees had not been taught such skills (O’Brien, 2002). Stoyanova-Georgieva
(2021) highlighted the valuable role of a postediting training course in improving neural
machine translation (NMT) outputs in business. MT has passed through several phases,
beginning with statistical machine translation (SMT) and arriving at NMT.
SMT, as the word 'statistical' denotes, refers to the use of statistical methods to translate
from one language into another using corpora. SMT, as Lopez (2008) perceived, depends
on "algorithms automatically learn how to translate" (p.1). SMT dominated the field of
MT studies for two decades (Hearne & Way, 2011). A plethora of research studies
reported that NMT is more efficient than phrase-based machine translation (PMT) or
translation memory in terms of reduced post-editing effort and the quality of translation
outputs (Sánchez-Gijón et al., 2019; Yamada, 2019). Zouhar et al. (2021) stated that
NMT has the advantage over SMT regarding translation quality. NMT has been adopted
by the majority of translation service providers because it has become the state-of-the-art
approach. NMT depends on an artificial neural network to predict words in context.
Google Translate began using this approach in 2016. Despite the great development
in NMT outputs, NMT has not reached human parity (Eszenyi & Dóczi, 2020);
therefore, to ensure the acceptability of NMT, the translator should edit the MT
outputs (Almaaytah, 2022).
Post-editing is perceived as a strategy for freeing machine translation from errors and
making it sound human (Almaaytah, 2022). Postediting implies the modification of
translation outputs, including format, style, terminology and similar mistranslations
(Eszenyi & Dóczi, 2020). There are various strategies and methods used in postediting.
Mrinalini et al. (2016) suggested a language identification system along with a postediting
n-best translation approach. This approach enhanced translation by 3-5%. Dunne (2022)
proposed crowd-sourced post-edits of MT. This approach is based on full document
context and identifies where errors occur. The study concluded that crowd-sourcing
annotations is a credible way of evaluating MT. Various authors suggested
interrelated post-editing strategies which should be applied to ensure the credibility of
translation products. As postediting is not easy to master, O’Brien (2002)
recommended teaching postediting skills to translation students over two semesters.
O’Brien suggested training in the following skills: "specialized
translation skills; basic linguistics; basic terminology management; IT skills; an
introduction to language technology (focusing on translation memory tools" (p. 103).
Vardaro et al. (2019) studied how the European Commission’s Directorate-General for
Translation identifies and corrects errors in NMT and PE. The study found that they
focused on mistranslated terms or stylistics.
Acceptability stands for the degree of match between the MT product and the user's
expectations (Castilho et al., 2018). Castilho (2016) affirmed that measuring MT
acceptability is important for gauging the impact of the translation product on readers.
AbuSa’aleek (2016) measured the acceptability of four MT systems in translating Islamic
texts according to accuracy, well-formedness and suitability. Findings showed that
Google Translate produced the most acceptable translation, ahead of Babylon, WorldLingo
and Bing translation.
Previous studies
Studies have also examined the difference between MT and NMT (Moorkens et al., 2018;
Sánchez-Gijón et al., 2019; Yamada, 2019; Zouhar et al., 2021). Yamada (2019)
compared Google Translate's statistical machine translation (SMT) from before 2014
with present-day Google NMT on the same source texts. The postediting
evaluation revealed no difference in cognitive effort between SMT and NMT. However,
a significant difference was found in the amount of editing. NMT output followed by PE
is more efficient than SMT followed by PE in terms of the number of errors. Sánchez-Gijón et al.
(2019) explored the difference between postediting NMT and TM segments between English and
Spanish. Results revealed that NMT postediting required less editing effort than using
TM segments. Moorkens et al. (2018) probed the difference between SMT and NMT output
in literary translation between English and Catalan. The
comparison was conducted under three conditions: translation from scratch, NMT
postediting and SMT postediting. In an interview and questionnaire, the professional
translators expressed a preference for translating from scratch over being
restricted to translation segments. Zouhar et al. (2021) investigated, in an experimental
study, postediting quality and time using NMT and MT systems between English
and Czech. Findings indicated that a good MT system produces output with fewer errors. The
study found no relationship between MT system quality and time spent in PE. SMT,
however, correlated with PE time. Jia et al. (2019 a) studied the work of 30 first-year
postgraduate Chinese students post-editing Google neural machine translation (GNMT) and
translating from scratch, in domain-specific and general texts between English and Chinese.
Findings reported that postediting GNMT is faster than translating from scratch in both
text types. GNMT lacks the cognition of the student translators. Interestingly, PE reached
parity with the quality of scratch translation. Jia et al. (2019 b) conducted a study in
which 9 evaluators assessed the translation output quality of SMT and NMT in terms of
fluency and accuracy. They found that NMT produced better translation quality than SMT.
Findings also showed that postediting NMT is not necessarily faster than postediting SMT;
however, it reduced cognitive and technical effort.
Research design
This study uses a qualitative research synthesis design. This kind of research involves
investigating research studies against certain criteria (Chong & Reinders, 2020). In this
study, research on postediting in MT was investigated.
Identify keywords
In this study the researcher set the keywords "postediting, machine translation quality, and
machine translation acceptability". These keywords were searched on Google Scholar,
which returned 2,620 results. The search was narrowed to the period
2015-2023, yielding 1,500 results. The researcher selected this period
because the topic develops as technology develops, so older findings do not provide
reliable information for this research.
From the outset, the researcher worked to avoid bias in selecting the research papers and
to ensure representativeness. The researcher searched for the three keywords
identified above on Google Scholar. The database includes highly indexed journals, book
chapters, theses and conference papers. In this literature search, 40 studies were found
and listed. Not all of these studies were used in the qualitative synthesis, because some of
them did not meet the inclusion criteria.
The 40 research studies were appraised against the following criteria for including a
paper in the analysis.
1. The paper should be published between 2015 and 2023.
2. The article should report an empirical study, i.e., one conducted on a certain sample;
therefore, review articles were excluded.
3. The paper should be published in a highly indexed database, including Scopus, Web
of Science and the like.
4. The article should adopt a qualitative or experimental design; therefore, purely
quantitative articles were excluded.
5. The findings of these articles should answer one or more of the research
questions listed in this study.
Out of the 40 studies, 20 were selected for the final analysis (Table 1). Articles
based on a purely quantitative design, or reviews of previous research, were
excluded from this qualitative synthesis.
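The screening step described above can be sketched in code. This is only an illustrative sketch, not the procedure the researcher actually ran: the `Study` record, its field names, and the sample records are all hypothetical, and each field simply mirrors one of the five inclusion criteria.

```python
from dataclasses import dataclass


@dataclass
class Study:
    """Hypothetical record for one candidate study (field names are assumptions)."""
    label: str
    year: int
    empirical: bool      # criterion 2: conducted on a sample, not a review
    high_indexed: bool   # criterion 3: Scopus, Web of Science and the like
    design: str          # criterion 4: e.g. "qualitative", "experimental", "quantitative"
    answers_rq: bool     # criterion 5: addresses at least one research question


ALLOWED_DESIGNS = {"qualitative", "experimental"}


def meets_inclusion_criteria(study: Study) -> bool:
    """Return True only if the study satisfies all five inclusion criteria."""
    return (
        2015 <= study.year <= 2023           # criterion 1
        and study.empirical
        and study.high_indexed
        and study.design in ALLOWED_DESIGNS
        and study.answers_rq
    )


def screen(studies: list[Study]) -> list[Study]:
    """Keep the studies that pass every criterion, preserving input order."""
    return [s for s in studies if meets_inclusion_criteria(s)]


# Made-up candidates, purely for illustration.
candidates = [
    Study("A", 2017, True, True, "experimental", True),
    Study("B", 2012, True, True, "qualitative", True),     # outside 2015-2023
    Study("C", 2020, False, True, "review", True),         # not empirical
    Study("D", 2019, True, True, "quantitative", True),    # purely quantitative
]
included = screen(candidates)
print([s.label for s in included])
```

Encoding each criterion as an explicit field makes the exclusion reason for any candidate easy to audit, which supports the aim of avoiding selection bias.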
Table 1
Studies included in the synthesis
No. | Study | Journal | Category | Research title | No. of
1 | Aranberri (2017) | HERMES-Journal of Language and Communication in | Skills | What do professional translators do when post-editing for the first time? First insight into the Spanish-Basque language pair | 2
2 | Azer & Aghayi (2015) | Advances in Language and Literary Studies | Acceptability | An evaluation of output quality of machine translation (Padideh Software vs. Google Translate) | 1
3 | Béchara et al. (2021) | Informatics | Postediting efforts | The role of machine translation quality estimation in the post-editing workflow | 1
4 | Castilho & O’Brien (2017) | Linguistica Antverpiensia, New Series–Themes in Translation Studies | Acceptability | Acceptability of machine-translated content: A multi-language evaluation by translators and end-users | 1
5 | Daems et al. | Frontiers in psychology | Postediting efforts | Post-editing effort of a novel with statistical and neural machine translation. | 1
6 | Jia et al. | Machine Translation | NMT/MTS | Post-editing neural machine translation versus phrase-based machine translation for | 1
7 | Macken et al. (2020) | Informatics | Postediting efforts | Quantifying the effect of machine translation in a high-quality human translation production process | 1
8 | Moorkens (2018) | The Interpreter and Translator Trainer | MT vs HT | What to expect from neural machine translation: a practical in-class translation evaluation exercise | 1
9 | Sánchez-Gijón et al. | Machine Translation | NMT/MTS | Post-editing neural machine translation versus translation memory segments. | 1
10 | Temizöz (2016) | Perspectives | Profession | Postediting machine translation output: Subject-matter experts versus professional | 1
11 | Tezcan et al. | Computer Speech & Language | Postediting time | Estimating post-editing time using a gold-standard set of machine translation errors | 1
12 | Vardaro et al. (2019) | Informatics | Approaches | Translation quality and error recognition in professional neural machine translation post- | 1
13 | Yang et al. (2021) | Translation and Interpreting Studies | MT vs HT | Measuring the usability of machine translation in the classroom context | 2
14 | Yang & Wang (2019) | Computers & Education | Approaches | Modeling the intention to use machine translation for student translators: An extension of technology acceptance model | 1
15 | Yang & Mustafa (2022) | Human Behavior and Emerging Technologies | MT vs HT | On postediting of machine translation and workflow for undergraduate translation program in China | 1
Book chapters
16 | Eszenyi & Dóczi, 2020 | Fit-for-market translator and interpreter training in a digital age | Tasks | Rage against the machine–will post-editing assignments outnumber translations in the future | 1
17 | Dunne, 2022 | MA thesis, The University of Dublin | Approaches | A post-editing approach to machine translation evaluation at the document-level | 1
18 | Mitchell (2015) | Doctoral dissertation, Dublin City University | Postediting time | Community post-editing of machine-translated user-generated content | 1
19 | Zouhar et al., 2021 | Conference on Empirical Methods in Natural Language Processing. | Postediting efforts | Neural machine translation quality and post-editing performance | 1
20 | Torron & | Conferences of the Association for Machine Translation in the Americas: MT Researchers' Track | Postediting time | Machine translation quality and post-editor productivity | 1
Postediting features
While investigating the previous publications, four codes emerged about the features
of postediting: postediting profession, approaches, skills, and tasks. They are
presented in turn.
Postediting profession. Temizöz (2016) finds “a degree in translation does not directly
correlate with postediting quality, unless it is combined with
subject-matter knowledge and professional experience in translation" (p. 646). Posteditors
should be well prepared to do the task of postediting properly. It is traditionally believed
that posteditors should hold a degree in translation, while others (the contemporary view)
claim that knowledge of the field being edited is necessary. The second perspective is
confirmed by this finding. It means that a posteditor may be someone who has knowledge of
the field and masters the language pair. This finding is confirmed by Ramos (2020), who
reported that as MT develops, postediting work will shift from
translators to linguistic experts.
Postediting approaches. The study revealed three major approaches to postediting. The
first approach is crowd-sourcing document-level post-edits. In this regard, Dunne (2022)
states that crowd-source editing “gives full document context, removes subjective
numerical judgements and it can specify where MT systems go wrong” (p. ii). The
second approach, i.e., linear mixed-effect, enables the translator to "predict what kind of
behavior…[the translator] experts is associated with the correction of different error types
during the post-editing process" (Vardaro et al., 2019). The third approach is the
quasi-circular model. The quasi-circular model developed by Yang and Wang (2019) "not only
reveals significant factors for MT adoption, but suggests positive effects of using MT"
(p.101). The crowd-sourcing document-level approach has some advantages, the most important
of which is that it guides the posteditor to the place of mistranslation. Herbig et al. (2019)
recommended the combination of touch, pen and speech in supporting the tasks of
Postediting skills. Postediting implies the modification of certain faults in MT outputs.
These modifications may concern terminology, syntax, culture-specific items, or cohesion
and coherence. It has been reported that a posteditor "shifts from the task of identifying
and fixing errors, to that of “patchwork” where post-editors identify the machine
translated elements to reuse and connect them using their own contributions"
(Aranberri, 2017, p. 89). Others stressed that a posteditor should also have experience in
the topic being edited as well as in the translation profession. On the other hand,
posteditors are warned against editing what is already right. It was found that posteditors
“primarily focus on correcting machine translation errors but often fail to restrain
themselves from editing correct structures" (Aranberri, 2017, p. 89). This finding is
consistent with Popović (2018), who reported that to ensure the best MT outputs, the editor
should also have experience in the topic being edited as well as in the translation profession.
Postediting tasks. Postediting is an extensive task in which posteditors should focus on
style, format, terms and so forth. This is affirmed by Eszenyi and Dóczi (2020), who
found that "postediting implies the modification of translation outputs including format,
style, terminology and the like mistranslation" (p. 119). Context is also
important in postediting. It was stated that professional translators, while postediting
MT outputs, "should pay attention to context, logical relationships, four word
phrases, and so on" (Chu & Liu, 2021, p. 129). To conclude, the tasks of postediting
bring us back to the traditional view that posteditors should have a major in translation
and the opposing view, which shifts the work of postediting from translators to experts in
linguistics. This can be confirmed by Kasperė et al. (2023), who found that professional
translators read MT outputs deeply and critically, whereas nonprofessional users
accept the translation outputs because of their low awareness of MT quality.
Machine translation quality evaluation
Findings reported that MTPE is evaluated according to end users' acceptance of the
translation outputs and the postediting time and effort required to produce a final copy of
the translation. These three criteria are discussed below:
Acceptability. Findings showed that MTPE is acceptable to users. MT outputs can be
relied on in providing translation services. Azer and Aghayi (2015) state, "the
machine-generated translations are intelligible and acceptable in translating certain
text-types, for end-users and Google Translate is more acceptable from end-users point of
view" (p. 226). Furthermore, Castilho and O’Brien (2017) note “…the usability and satisfaction
results by end-users insofar as the implementation of light PE both increased the
usability and acceptability of the PE instructions and led to satisfaction being
reported" (p. 120). These findings indicate that the benchmark for MT quality is the end
user's acceptance of the translation outputs. These findings are aligned with those of
AbuSa’aleek (2016), who established three criteria for accepting MT of Islamic texts:
accuracy, coherence, and appropriateness. Like Azer and Aghayi (2015),
AbuSa’aleek (2016) found that Google Translate produces the most acceptable translation
compared with other software and applications.
Postediting time. Sanchez-Torron and Koehn (2016) mention, "the MT system with the
lowest BLEU score produced the output that was post-edited to the lowest quality and
with the highest PE effort, measured both in HTER and actual PE operations," (p. 16).
Moreover, Tezcan et al. (2019) affirm "time can be estimated with high accuracy when all
the translation errors in the MT output are known" (p.120). To sum up, the "PE quality is
largely independent of the profile characteristics measured" (Mitchell, 2015, p. ii). These
findings showed that the time spent in postediting is an indicator for assessing the quality
of PEMT. The time required for postediting is governed by several factors; for example,
the posteditor's familiarity with the type of error reduces postediting time. These
results are in line with Zouhar et al. (2021), who indicated that a translation with fewer
errors is an indicator of a good MT system. However, they found no relationship
between MT system quality and time spent in PE.
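To make the HTER measure mentioned in the quotation above more concrete, the following is a minimal sketch of an edit-rate calculation between raw MT output and its post-edited version. It is a simplification and not the metric as used in the cited studies: full HTER also counts block shifts as single edits, whereas this sketch counts only word-level insertions, deletions and substitutions. The function names and the example sentences are hypothetical.

```python
def word_edit_distance(hyp: list[str], ref: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))  # distance from an empty hypothesis
    for i, h in enumerate(hyp, start=1):
        curr = [i]  # distance to an empty reference
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a hypothesis word
                curr[j - 1] + 1,     # insert a reference word
                prev[j - 1] + cost,  # keep or substitute
            ))
        prev = curr
    return prev[-1]


def approx_hter(mt_output: str, post_edited: str) -> float:
    """Edits needed to turn the MT output into its post-edited version,
    divided by the post-edited length in words (shift edits are ignored)."""
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited text must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)


# Made-up example: one substitution ("sit" -> "sits") and one insertion ("the").
mt = "the cat sit on mat"
pe = "the cat sits on the mat"
print(round(approx_hter(mt, pe), 3))  # 2 edits / 6 words -> 0.333
```

A lower value means the posteditor changed less of the MT output, which is why edit rate is often read alongside postediting time as a proxy for technical effort.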
Postediting efforts. Daems et al. (2017) say, "we find that most post-editing effort
indicators are influenced by machine translation quality, but that different error types
affect different post-editing effort indicators, a more fine-grained MT quality analysis is
needed to correctly estimate actual post-editing effort. Coherence, meaning shifts, and
structural issues are shown to be good indicators of post-editing effort" (p.1). Macken et
al. (2020) state "typing effort is the most frequently mentioned reason why participants
preferred working with MT" (p. 12). Furthermore, Zouhar et al. (2021) state, "better MT
systems lead to fewer changes in the sentences in this industry setting" (p.1). Likewise,
"good MTQE information can improve post-editing efficiency and decrease the cognitive
load on translators" (Béchara et al., 2021, p. 1). These findings are consistent with those
of Toral et al. (2018), who reported that postediting MT reduced cognitive
effort by 29% with phrase-based statistical MT (PBMT) and 42% with NMT. The effort
required from the translator to correct MT is an indicator for evaluating
service quality. The number of errors and the coherence between sentences are the backbone
of the effort required to free MT outputs from errors.
The majority of studies compared neural machine translation (NMT) with
statistical machine translation (SMT) in various aspects, whether the quality of translation
outputs or the time required for postediting. MT and human translations were also
NMT post-editing produces better translation outputs with fewer errors than
translation produced using TM segments. Sánchez-Gijón et al. (2019) state, "NMT
post-editing involves less editing than TM segments, but this editing appears to take more
time" (p. 31). Furthermore, this leads to less cognitive effort for posteditors. Jia et al.
(2019 b) affirm, "post-editing output from NMT reduces the technical and cognitive
effort" (p. 90). The finding of Sánchez-Gijón et al. (2019) is confirmed by Zouhar et al.
(2021), who mentioned that NMT has the advantage over SMT regarding translation
quality. Similarly, Yamada (2019) reported that NMT output followed by PE is
more efficient than SMT followed by PE in terms of the number of errors.
MT vs HT
Yang et al. (2021) state that MTPE "is more efficient than human translation" (p.1).
Furthermore, "post-editing produces fewer errors than human translation. While the types
of errors vary, errors in terms of accuracy outnumber those related to fluency" (Yang et
al., 2021, p. 1). Eszenyi and Dóczi (2020) say, "human translators are not superior in
skills to the machine in the particular text type" (p. 119). Moorkens et al. (2018) state "a
neural MT system trained on literary data does not currently have the necessary
capabilities for a creative translation" (p.375). Likewise, MT is accompanied by some
defects. Yang and Mustafa (2022) identify them as "confusing postediting standard,
inconsistent quality, ways to choose machine translation providers, and technical issues"
(p.1). These findings indicate that MT may produce more acceptable outputs than
human translation. Its superiority appeared in the time required, productivity and
the like. These findings disagree with Eszenyi and Dóczi (2020), who believed that MT
outputs are still far from HT in some language pairs and genres. One limitation is the
machine's inability to translate literary texts. To sum up, the superiority of MT over
HT may be limited to particular text types.
Studies reported that NMT provides better translation outputs than SMT, and MT may
outperform HT in some text types and genres. This field is tied to the daily
development of NMT software and applications. More research is needed to
classify the text types that can be translated by MT in a way that meets translation
quality and acceptability requirements. Studies on NMT were conducted on language pairs
from English to Spanish, Japanese and German. Very few studies were conducted on Arabic.
Therefore, a call to measure the NMT outputs between Arabic and English on various text
Developing our students' postediting skills is the responsibility of several entities.
University professors in English departments are the cornerstone. They should
hold symposia and conferences with stakeholders, in this case
companies and translation offices, about the requirements or characteristics of
graduates. With this in mind, university councils and the Ministry of Higher Education
should also add postediting courses in all departments. Postediting may be one of the
learning outcomes that university graduates should acquire during their years of study.
Therefore, courses on postediting should be added to the English departments