
System 125 (2024) 103424

Contents lists available at ScienceDirect

System
journal homepage: www.elsevier.com/locate/system

Language learning development in human-AI interaction: A thematic review of the research landscape☆

Feifei Wang a, Alan C.K. Cheung a,*, Ching Sing Chai b

a Department of Educational Administration and Policy, Faculty of Education, The Chinese University of Hong Kong, Shatin, Hong Kong
b Department of Curriculum and Instruction, Faculty of Education, The Chinese University of Hong Kong, Shatin, Hong Kong

A R T I C L E  I N F O

Keywords:
Language learning
Artificial intelligence
Human-AI interaction
Education
Paradigm

A B S T R A C T

Interaction is an indispensable part of language learning. Artificial intelligence (AI) has been increasingly applied in language learning to promote interaction in the learning process. In response to the paradigmatic shifts in AI application design, this review maps the research landscape of language learning development in human-AI interaction. From the resulting analysis of 49 studies, this study investigates the contextual characteristics by AI-supported interaction type, AI application, target language, educational level, etc. Moreover, three research paradigms are identified in this emerging field, i.e., Paradigm One (AI-directed, teacher-as-facilitator, learner-as-recipient), Paradigm Two (AI/teacher-codirected, learner-as-collaborator), and Paradigm Three (AI/teacher/learner-codirected). The paradigms are induced through analysis of eight constructs: human-AI relationship, learning objective, task type, level of pre-structuring, mode of engagement behavior, knowledge-change process, cognitive outcome, and research focus. The philosophical and linguistic underpinnings for each paradigm are discussed. Additionally, we highlight future research implications, including investigating under-researched themes and exploring diverse methodological possibilities and appropriateness among the three research paradigms.

1. Introduction

Recent research has proposed paradigmatic shifts in artificial intelligence (AI) application and research. In the field of AI application development, there has been a paradigmatic shift from a technology-centered approach to a human-centered approach. The focus of this shift has primarily moved from enhancing technology through high levels of automation to understanding and meeting human needs (Shneiderman, 2020), and a fresh vision, known as the human-centered AI approach, further develops the human-centered approach by emphasizing the ethically aligned design of AI to empower people while ensuring human control of AI (Donahoe, 2018; Li, 2018; Xu et al., 2023). Similarly, in education, paradigmatic shifts in the use of AI have been identified, moving from AI-directed, to AI-supported, to AI-empowered educational practices (Ouyang & Jiao, 2021). Building on these shifts, AI in education has been evolving to promote learner agency, personalized learning, and metacognitive skills (Ouyang & Jiao, 2021). Human-AI interaction (HAII) is iteratively developing toward learner-centered personalization (Sundar, 2020). However, in light of extant educational theories, new interpretations from the learning sciences are encouraged to shed light on the transparent and inclusive use of AI in education (Hwang et al., 2020).


☆ We have no conflict of interest to disclose.
* Corresponding author. Department of Educational Administration and Policy, The Chinese University of Hong Kong, Shatin, Hong Kong.
E-mail addresses: ffwang@link.cuhk.edu.hk (F. Wang), alancheung@cuhk.edu.hk (A.C.K. Cheung), CSChai@cuhk.edu.hk (C.S. Chai).

https://doi.org/10.1016/j.system.2024.103424
Received 27 July 2023; Received in revised form 24 May 2024; Accepted 29 July 2024
Available online 30 July 2024
0346-251X/© 2024 Published by Elsevier Ltd.

Language learning is under the influence of these paradigmatic shifts. Traditionally, language learning is a discipline dominated by human-human interaction between teachers and learners, and among learners. Participation in interaction assists language learners in obtaining input and feedback, and in modifying and adjusting output in ways that expand learners’ interlanguage capacity (Pica et al., 1996). As more and more AI technologies are used to facilitate language learning, HAII is supplementing human-human interaction in language learning (Fu et al., 2020). HAII benefits language learners by providing promising solutions to intricate problems in language learning classrooms, such as a lack of personalized feedback for students (Chen et al., 2022), inadequate language practice opportunities (Hsiao et al., 2015), and shyness or anxiety about speaking a new language (Leeuwestein et al., 2021).
Despite the affordances of HAII for language learning development, some issues have surfaced. Key issues that call for resolution include critical examination of emerging paradigms, such as how they are related to methodologies and how learning will be transformed (Vostroknutov et al., 2021). Moreover, understanding different theoretical perspectives on HAII in the current context exerts a critical influence on the quality of teaching and learning (Hwang et al., 2020).
Thus, this review aims to map the research landscape of language learning development in HAII by investigating contextual
characteristics of the research and identifying research paradigms. This study contributes to the existing literature by responding to the
calls for future research on language learning development in HAII (Hwang et al., 2020; Liang et al., 2021; Ouyang & Jiao, 2021;
Vostroknutov et al., 2021). Mapping the research landscape in the field may help researchers, teachers, AI developers, policymakers,
and other relevant stakeholders pinpoint their research orientations on language learning development in HAII, structure inquiry, and
explicate philosophical and linguistic assumptions underlying their methodological choices.

2. Literature review

2.1. Language learning development through interaction

Learning occurs in meaningful interaction with other humans, in an open and authentic environment, and in real-life situations
directly relevant to learners’ experiences (Gray, 2019). Traditionally, interaction refers to the reciprocal face-to-face actions through
either verbal channels, including written or spoken words, or nonverbal channels such as touch, facial expression, gesture, posture, and environmental factors (Lindberg et al., 2021). During class interactions, learners have helpful conversations that may facilitate the
development of language skills.
In language learning development, verbal interaction should provide a context for negotiation of meaning to facilitate classroom
oral exchanges that span from formal drilling to authentic meaning-focused information exchanges (Yeh & Lai, 2019). Language
learners first need to receive comprehensible linguistic input to communicate authentically through spoken or written language, and
then they listen to others’ authentic linguistic output and respond to continue the communicative events (Yenkimaleki & van Heuven,
2021). Consequently, the context for oral interaction is created.
Nonverbal interaction involves nonverbal behaviors including environmental factors, physical characteristics of communicators, and
various behaviors displayed by communicators (Wang & Cheung, 2024). Verbal and nonverbal behaviors interrelate during interaction
in that nonverbal behaviors can repeat, conflict with, complement, substitute, and regulate verbal behaviors (Remland, 2016).
Therefore, creating intended nonverbal behaviors in classroom interaction is necessary for language learning development.

2.2. Language learning development through human-AI interaction

Educators have been embracing new technologies to improve language learning, with AI emerging as one of the latest technologies.
AI generally refers to computer systems designed to interact with humans, performing human-like capabilities such as visual
perception and speech recognition, and intelligent behaviors such as assessing information and then taking actions that align with a
predefined goal (Luckin et al., 2016). Common AI technologies include machine learning, natural language processing, data mining,
and artificial neural networks. Nowadays, AI has significant applications in language learning owing to its affordances of interacting
naturally with humans, predicting learners’ future performance, and providing personalized learning content, instant feedback, and objective assessment of learners’ progress (Pokrivcakova, 2019).

Fig. 1. Human-AI interaction model in the educational environment (Adapted from Rajagopal & Vedamanickam, 2019).


As AI is increasingly applied in language learning, language learning development through human-AI interaction has become a
noteworthy topic (Wang et al., 2023). Human-AI interaction (HAII) stems from human-computer interaction (HCI) which aims to
achieve optimal fit in product design among users, computers, and required services (Te’eni, 2006). The current transformation towards HAII builds on AI’s reasoning capabilities, prediction, and analysis power to act as autonomous agents that can make decisions
on their own (Virvou, 2022). According to various applications of AI in HAII, there are four levels of HAII: low-level sensor-based AI
application (e.g., voice/face/gesture recognition), medium-level interaction software systems (e.g., natural language processing),
high-level interaction software systems or modules (e.g., intelligent tutoring system), and high-level interaction application fields (e.g.,
education and health) (Virvou, 2022).
In education, HAII seeks the fit in instructional design between AI and humans (i.e., teachers and students) to facilitate learning.
Based on Rajagopal and Vedamanickam (2019), the adapted HAII model in the educational environment is shown in Fig. 1. Fig. 1
indicates that AI can interact with students autonomously, with or without teachers’ involvement, in the AI-supported educational
environment. When teachers are involved, teachers collaborate with students to adjust instructional strategies and learning approaches based on AI’s feedback, and the adjustments by teachers and students also influence AI to deliver more personalized feedback
on instruction and learning. When teachers are not involved, AI operates autonomously to adjust and personalize educational content,
based solely on students’ responses. Ideally, the fit in instructional design between AI and humans (i.e., teachers and students) can be
gradually adjusted and achieved.
HAII has the potential to improve language learning by offering various contexts for negotiating meaning verbally and exhibiting
nonverbal behaviors. Interactive AI applications, such as chatbots, intelligent agents, and conversational agents, are able to create
contexts of interaction through natural conversation by providing information, discussing specific topics, and performing tasks
(Smutny & Schreiberova, 2020). In oral interaction, negotiation of meaning occurs when AI applications choose appropriate language
inputs according to learners’ language proficiency and provide personalized responses to students so that they may practice output in
meaningful interaction (Kim, 2017). Moreover, AI applications associated with robotics display nonverbal behaviors with faces, arms,
and appearances. With nonverbal behaviors, the language learning process is perceived as enjoyable, fun, and interactive (Lin et al., 2022). HAII has been reported to develop learners’ language proficiency through these verbal and nonverbal channels (Wang et al., 2023).

2.3. Paradigmatic shifts of language learning development in human-AI interaction

The field of HAII is currently experiencing a paradigmatic shift of AI application development from a technology-centered approach
to a human-centered approach, during which a new vision called a human-centered AI approach extends the human-centered approach
by emphasizing ethically aligned designs of AI (Xu et al., 2023). Initially, the development of AI primarily relied on a “technology-centered design” approach. Researchers and developers concentrated on building AI algorithms and systems, emphasizing
machine learning, and measuring algorithm performance (Shneiderman, 2022). However, ignoring the importance of human factors in
the design led to the failure of many AI systems, biased AI-based decision-making applications, and concerns over prejudice and
inequality for certain users (Ntoutsi et al., 2020). Therefore, since 2015, many critics have voiced their concerns regarding the creation
of a world ruled by AI and called for prioritizing human factors (Hawking et al., 2015; Russell et al., 2015; Xu et al., 2023). The
human-centered approach urges the need to fully understand AI-based machine behavior and its impacts on humans and society (Xu
et al., 2023). To further respond to the concerns, Stanford University proposed a human-centered AI approach in 2018 (Donahoe,
2018; Li, 2018). This approach, rooted in the “human-centered” philosophy, extends the human-centered approach by advocating the
ethically aligned design of AI to empower and enable humans, while revealing the underlying values, limitations, and ethics of data
algorithms to encourage ethical, interactive, and contestable use (Capel & Brereton, 2023).
The paradigmatic shift in AI application development in HAII has brought changes to many spheres of life, and the field of language learning is significantly influenced because advances in AI give it considerable potential for language learning development (Godwin-Jones, 2023). Under this influence, language learning is also experiencing paradigmatic shifts over time (Liang et al., 2021). However, how to understand the paradigms under which language learning has been developed in HAII, how the paradigms are associated with methodologies, and how the paradigms transform learning remain unclear (Vostroknutov et al., 2021). To organize the literature and establish a shared understanding of the research landscape, this study aims to answer the following research questions (RQ):

RQ1: What are the contextual characteristics of research on language learning development in human-AI interaction?
RQ2: What research paradigms of language learning development can be identified in human-AI interaction?

3. Methods

This study presents a thematic review of studies on language learning development in human-AI interaction. A thematic review is an appropriate approach because it summarizes the findings of an emerging research field by focusing on insights into themes from related studies (Hakimi et al., 2021; Ültay & Çalık, 2012). To achieve this, we followed the PRISMA guidelines to investigate the themes of research contexts and paradigms (Page et al., 2021). Relevant studies were examined, and the research contexts and unique features of each paradigm were explicitly revealed, followed by an interpretative account of implications for future research. The reporting
flowchart aligns with PRISMA recommendations (Page et al., 2021), with minor modifications to address this study’s purposes as shown in Fig. 2. The specific methods used are described below.

3.1. Inclusion and exclusion criteria

Before conducting the literature search, we established a set of inclusion criteria. First, the study was conducted with AI technologies, so any study using tools without AI technologies such as non-intelligent tutoring systems and multimedia platforms was
excluded. Second, the study was conducted for language learning, so any study for other purposes such as evaluating the effectiveness
of an AI system or using AI to analyze data was excluded. Third, the study was conducted in regular educational settings, so any study
in therapeutic contexts was excluded. Fourth, the study included the necessary information as recommended by What Works Clearinghouse to describe studies and report findings clearly and comprehensively (What Works Clearinghouse, 2021), such as experimental
conditions, participants’ characteristics, study designs, and measures. Any study without the necessary descriptions of studies and
reporting of findings was excluded. Fifth, the study included empirical information, so any study about the design of an AI system was
excluded. Sixth, the study was written in English, so any study written in other languages was excluded. Seventh, the same study was
included only once. If multiple articles reported on the same study, the article with the most description and empirical evidence was
included.

3.2. Search strategy

To address the research questions, we examined literature within six databases: Web of Science, ERIC, Scopus, IEEE Xplore, PsycINFO, and ProQuest. These databases were chosen because they are commonly used in literature reviews on education, psychology, and the social sciences (Berkowitz et al., 2017), and in computer science research (Clark et al., 2016). Besides published articles, we also
retrieved unpublished literature (i.e., conference proceedings, dissertations, and theses) due to the emerging nature of this field to gain
the most up-to-date information about AI applications in language learning and to minimize potential publication bias (Borenstein
et al., 2021).
The literature search was performed with six sets of query terms listed in Table A1 of the appendix, including terms related to (1)
HAII process (Rapp et al., 2021), (2) artificial intelligence (Zawacki-Richter et al., 2019), (3) educational levels (Zawacki-Richter et al.,
2019), (4) learning settings, (5) language education, and (6) research paradigms. The six sets of terms were integrated with Boolean operators (Cooper, 2017), specifically with the “OR” operator within each set and the “AND” operator between sets. Moreover, we performed another round of search by examining the reference lists of retrieved sources. The timeframe of the search was not limited to a specific period, as this is an emerging topic.

Fig. 2. PRISMA flowchart about literature search, screening, and selection.
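The query assembly described above ("OR" within each term set, "AND" between sets) can be sketched programmatically. A minimal illustration; the term sets below are invented placeholders, not the actual query terms from Table A1 of the appendix:

```python
# Illustrative sketch only: the term sets below are invented placeholders,
# not the actual query terms from Table A1.

term_sets = {
    "HAII process": ["human-AI interaction", "human-computer interaction"],
    "artificial intelligence": ["artificial intelligence", "chatbot", "robot"],
    "language education": ["language learning", "language education"],
}

def build_query(sets):
    """Join terms with OR within each set and AND between sets."""
    groups = ["(" + " OR ".join(f'"{t}"' for t in terms) + ")"
              for terms in sets.values()]
    return " AND ".join(groups)

print(build_query(term_sets))
```

Database-specific syntax (field tags, truncation) would vary, but the OR-within/AND-between structure is the same for each of the six databases.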

3.3. Study screening and selection procedure

The literature search was conducted between May and June 2023. From the six databases, we retrieved 6540 articles. After
removing duplicates, we screened the remaining 5491 articles in a three-phase screening process, as indicated in Fig. 2. First, we
reviewed the title, keywords, and abstract according to the inclusion criteria. This excluded 5322 articles and left 169 articles for the
next phase. Second, all the 169 articles underwent a full-text review. By reading the full text, we obtained 46 articles after excluding
123 articles. The reasons for exclusion are listed in Fig. 2. Third, the reference lists of the 46 studies were screened to identify
additional eligible studies. This phase retrieved three more articles. Altogether, the three-phase screening process resulted in the selection of 49 eligible articles. During the screening and selection process, one of the researchers cooperated with another independent researcher who was experienced in conducting review studies on the application of AI in education. The interrater reliability, measured by percent agreement, reached 92%. The two researchers discussed the resulting discrepancies until reaching a consensus.
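Percent agreement is simply the share of screening decisions on which the two raters matched. A minimal sketch with hypothetical decisions (item-level screening data are not reported in the study):

```python
# Illustrative sketch: percent agreement between two raters' include/exclude
# screening decisions. The decision lists are hypothetical.

def percent_agreement(rater_a, rater_b):
    """Share of items on which two raters made the same decision."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must judge the same items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

# 25 hypothetical decisions (1 = include, 0 = exclude); the raters differ on
# two items, giving 23/25 = 92% agreement.
rater_a = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
rater_b = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(f"{percent_agreement(rater_a, rater_b):.0%}")
```

Note that percent agreement does not correct for chance agreement; a statistic such as Cohen's kappa would do so, but the study reports percent agreement.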

3.4. Analysis and coding of selected studies

To answer RQ1, on the contextual characteristics of research on language learning development in HAII, we analyzed the descriptions of the 49 studies. We then synthesized contextual information across eight aspects: year of publication, the country or region where the study was conducted, sample size, AI application, target language, language domain, educational level, and AI-supported interaction type (see Table A2 in the appendix). The first five aspects were coded based on the descriptions in the articles. The latter three aspects
were coded as follows. First, language domains were divided into eight categories (Hung et al., 2018; Nunan, 1999), including four
language skills (i.e., listening, speaking, reading, and writing), three language components (i.e., pronunciation, vocabulary, and
grammar) (Pouresmaeil & Vali, 2023; Simon & Taverniers, 2011), and integrated domains. Second, educational levels were divided
into five categories: preschool, primary, secondary, higher, and mixed education (Hung et al., 2018). Third, based on the number of
students and AIs involved in the study, AI-supported interaction types were coded into three categories: individual use of an AI, group
use of an AI, and inter-group use of AIs. Individual use of an AI means that each student interacts with an AI application individually.
Group use of an AI represents that a group of students interacts with an AI application, either face to face or AI-mediated. Inter-group
use of AIs refers to interaction between groups via AI applications when groups are not in the same place such as chats, newsgroups,
and online forums (Strijbos et al., 2004). This coding investigates how AI has been used to support interaction in language learning.
To address RQ2, research paradigms of language learning development in HAII, we performed a thematic content analysis. To
reveal important constructs of research paradigms, we used two frameworks to guide our initial analysis and coding: the computer-supported process-oriented interaction framework (Strijbos et al., 2004) and the engagement framework (Chi & Wylie, 2014).
From the former interaction framework, three key dimensions that influenced computer-supported interaction were coded as con­
structs of research paradigms: learning objective, task type, and level of pre-structuring (Strijbos et al., 2004). Since interaction is
highly associated with engagement in learning (Crook, 1998), we added three critical constructs from the latter engagement framework: mode of engagement behavior, knowledge-change process, and cognitive outcome (Chi & Wylie, 2014). While using the six
constructs as a priori categories, we did not forcefully impose them onto the research paradigms. During several rounds of qualitative coding, we adopted an inductive approach to allow new constructs (if any) to emerge.
After the open coding process, the coded constructs were summarized into areas of convergence across the selected studies to identify potentially different constructs or paradigms (Corbin & Strauss, 2008). As in other thematic reviews (Hakimi et al.,
2021; Tieken & Auldridge-Reveles, 2019), we focused on findings that were supported by multiple sources, and examined thematically relevant counterevidence or divergent findings, both within and between contexts. After drafting and revising summative research memos, we synthesized our findings, which we present below.

Fig. 3. Year of publication of the included articles.

4. Results

4.1. Contextual characteristics

Table A2 in the appendix presents the contextual characteristics of all the 49 studies. First, as for the year of publication, these
studies were published between 2008 and 2022, as shown in Fig. 3. The articles were divided into three phases, namely 2008–2014,
2015–2017, and 2018–2022, according to the paradigmatic shift of AI application development from a technology-centered approach
to a human-centered approach called for by many critics in 2015 (Hawking et al., 2015; Russell et al., 2015; Xu et al., 2023), and the
recommendation of the human-centered AI approach by Stanford University in 2018 (Donahoe, 2018; Li, 2018). There were seven
articles (2008–2014), seven (2015–2017), and 35 (2018–2022). The findings indicated an increasing research interest in investigating
language learning development in HAII, which was particularly associated with the paradigmatic shift towards the human-centered AI
approach since 2018.
Second, the top nine geographic locations in which the studies were conducted were Taiwan (n = 9), mainland China (n = 7), the United States (n = 5), Japan (n = 4), Germany (n = 3), Iran (n = 3), Kazakhstan (n = 3), Korea (n = 3), and the Netherlands (n = 3). Nine other articles included such countries as the United Kingdom, Russia, and Italy (one article from each).
Third, regarding the educational level, Fig. 4 shows the distribution of participants’ educational levels. A total of 16 studies were conducted in primary education, accounting for the largest proportion of studies (33%), followed by 13 in higher education (27%), 10 in preschool education (20%), and 10 in secondary education (20%).
Fourth, as for the sample size, except for one study with an unspecified number of students (Chen et al., 2021), the other 48 studies involved 4114 learners in total, an average of about 86 students per study, including 18 studies (≤50), 22 studies (51–100), and 8 studies (≥101).
Fifth, concerning the target language, English was the dominant language of the studies. A total of 39 out of 49 studies targeted English (80%), followed by three studies for Chinese (6%), two for Dutch (4%), and two for French (4%), etc.
Sixth, as Fig. 5 shows, language domains included integrated skills (29%), vocabulary (27%), speaking (22%), writing (16%), and
reading (6%). Integrated skills involved, for instance, a combination of vocabulary, grammar, and sentence patterns (Al Hakim et al.,
2022), or listening, speaking, reading, and writing (Hong et al., 2016).
Seventh, regarding the AI application, AI technologies were mostly incorporated into the application of robots in 28 studies (57%),
followed by chatbots in eight studies (16%), and automated writing evaluation systems in seven studies (14%). Three studies (6%)
included intelligent tutoring systems, which use AI technologies to provide adaptive feedback, personalized learning content, and
individualized learning path navigation (Mousavinasab et al., 2021), compared to non-intelligent tutoring systems without AI technologies that deliver uniform feedback, static learning content, and pre-determined learning path navigation (Weimann et al., 2022).
Other AI applications were automatic speech recognition systems, virtual reality, and Internet of Things intelligent image.
Eighth, in the AI-supported interaction type, individual use of an AI was the most common type. In 30 out of 49 studies, each
student interacted with an AI application individually (61%). A total of 17 studies employed group use of an AI (35%), in which a group of students interacted with an AI application, either face to face or AI-mediated. Only two studies applied inter-group use of AIs (4%). In these two studies (Obari et al., 2020; Obari & Lambacher, 2019), different student groups used different AI speakers with virtual reality to learn, while recording movie clips of their learning experiences and uploading them to Facebook. The groups then interacted with each other to present their learning experiences. In this way, students developed their language learning through interaction with AIs.
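The rounded shares reported in this section follow directly from the category counts. A small sketch reproducing the arithmetic for the interaction types (counts taken from the text; the helper name is ours):

```python
# Reproducing the rounded percentages for AI-supported interaction types.
# Counts are taken from this section of the review.

def rounded_shares(counts):
    """Percentage of the total for each category, rounded to whole numbers."""
    total = sum(counts.values())
    return {k: round(100 * v / total) for k, v in counts.items()}

interaction_types = {
    "individual use of an AI": 30,
    "group use of an AI": 17,
    "inter-group use of AIs": 2,
}

print(rounded_shares(interaction_types))  # 61%, 35%, and 4% of 49 studies
```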

Fig. 4. Educational levels of participants in the included articles.

Fig. 5. Language domains of the included articles.

4.2. Research paradigms

For RQ2, we conducted a thematic analysis, using grounded theory as an analytical approach, to explore a specific topic (i.e., language learning development in HAII) through an inductive process and develop a theoretical understanding (Corbin & Strauss,
1990). After rounds of qualitative coding and refinement, eight constructs within four domains emerged: (1) human-AI relationship,
(2) key dimensions that influence HAII (i.e., learning objective, task type, and level of pre-structuring), (3) learners’ engagement
behavior in learning process and cognitive outcomes (i.e., mode of engagement behavior, knowledge-change process, and cognitive
outcome), and (4) research focus. By selective coding, we integrated themes, subthemes, and their relationships from the findings of
open and axial coding (Kim & So, 2022). Finally, we identified three research paradigms of language learning development in HAII: Paradigm One (AI-directed, teacher-as-facilitator, learner-as-recipient), Paradigm Two (AI/teacher-codirected, learner-as-collaborator), and Paradigm Three (AI/teacher/learner-codirected). Table 1 displays the conceptualization of these paradigms. Summaries of studies that primarily focus on different paradigms are shown in the appendix, including Table A3 (Paradigm One: 32 studies), Table A4 (Paradigm Two: 12 studies), and Table A5 (Paradigm Three: five studies).

4.2.1. Paradigm One: AI-directed, teacher-as-facilitator, learner-as-recipient


Paradigm One is characterized as AI-directed, teacher-as-facilitator, and learner-as-recipient. AI directs the learning process with
pre-specified domain knowledge, and teachers are either directly involved in or indirectly support the instructional process to help
learners receive knowledge according to pre-determined learning pathways (McCarthy et al., 2019; Schodde et al., 2019). The
paradigm is associated with the AI-centered design approach grounded in technology-centered design. Technology-centered design is characterized by the technical rationality of imparting expert knowledge to users, and the expert knowledge is often decided by designers or clients (Krippendorff, 2005). AI professionals are dedicated to studying algorithms to represent how expert language
knowledge operates and to certify the effectiveness of AI in language learning. A typical example is the earlier work of robots (Hsiao
et al., 2015; Hyun et al., 2008). While these robots offered learners immediate feedback about incorrect replies and directed the
learning process, they did not build a personalized knowledge base or capture learners’ psychological profile to provide individualized
feedback.

Table 1
Conceptualization of three research paradigms of language learning development in human-AI interaction. Paradigm One: AI-directed, teacher-as-facilitator, learner-as-recipient; Paradigm Two: AI/teacher-codirected, learner-as-collaborator; Paradigm Three: AI/teacher/learner-codirected.

Domain 1: Human-AI relationship
(1) Human-AI relationship. Paradigm One: human-AI stimulus-response. Paradigm Two: human-AI collaboration. Paradigm Three: human-AI teaming.

Domain 2: Key dimensions that influence HAII
(2) Learning objective. Paradigm One: to measure learning. Paradigm Two: to foster learner-centered adaptive learning. Paradigm Three: to foster learner agency.
(3) Task type. Paradigm One: closed-ended task. Paradigm Two: semi-open-ended task. Paradigm Three: open-ended task.
(4) Level of pre-structuring. Paradigm One: high pre-structuring. Paradigm Two: moderate pre-structuring. Paradigm Three: low pre-structuring.

Domain 3: Learners’ engagement behavior in learning process and cognitive outcomes
(5) Mode of engagement behavior. Paradigm One: passive receiving/active manipulating. Paradigm Two: constructive generating. Paradigm Three: interactive dialoguing.
(6) Knowledge-change process. Paradigm One: from information storage to knowledge integration. Paradigm Two: inference. Paradigm Three: co-inference.
(7) Cognitive outcome. Paradigm One: recall/apply. Paradigm Two: transfer. Paradigm Three: co-create.

Domain 4: Research focus
(8) Research focus. Paradigm One: to examine the effectiveness of language learning. Paradigm Two: to understand and interpret the language learning process. Paradigm Three: to foster learner agency to improve language learning.

4.2.1.1. Human-AI relationship. This paradigm demonstrates a stimulus-response relationship between AI and humans (Farooq & Grudin, 2016). AI responds to learners’ input through fixed algorithms and logic rules that have been programmed into the expert system. The speech and text automated marking technologies enable language learning to be assessed (Chen et al., 2022). Thus, AI
mainly plays the role of an assessor (Balkibekov et al., 2016; Su et al., 2020). Teachers are the ultimate director who monitors the
process of language testing and evaluates students’ language learning according to standardized test outcomes, which is based on
statistical algorithms that model the criteria of markers, experts, or examinations boards (Richardson & Clesham, 2021). Students are
test takers performing the assessment tasks within a predetermined scope (Jeon, 2021). Their language learning is reflected by grades
or scores.

4.2.1.2. Key dimensions that influence human-AI interaction. First, the learning objective is to measure learning to categorize students
and report judgments to researchers or teachers. Grades or scores are considered to determine students’ levels of success or language
proficiency at a particular time (Chen et al., 2022). Second, to achieve the objective, this paradigm often adopts closed-ended language
learning tasks. Closed-ended tasks require specific, closed-ended information, consist of several successive question-and-answer
steps, and can be solved by adding information at each step. Common closed-ended tasks include multiple choice, item-matching,
item-sequencing, and true-false tasks (Balkibekov et al., 2016; Hsiao et al., 2015). Third, this paradigm has a high level of
pre-structured interaction in the AI-supported environment. Examples include learners matching pictures and words at increasing
levels of difficulty (Kennedy et al., 2016), HAII pre-programmed with the same script across different groups (Leeuwestein et al., 2021),
and AI set to either always win or always lose when playing a game with learners (Meiirbekov et al., 2016).
In such highly pre-structured interaction, learners are expected to give limited responses that follow specific pre-programmed
scaffolding pathways.

4.2.1.3. Learners’ engagement behavior in learning process and cognitive outcomes. The mode of learners’ engagement behavior is either
passive receiving or active manipulating (Chi & Wylie, 2014). Passive receiving mode involves the lowest level of cognitive
engagement, such as listening to AI instruction, watching robot gestures, or reading multimedia cues on screen (Vogt et al., 2019).
Active manipulating mode occurs when students undertake overt psychomotor actions while interacting with AI instruction, such as
pressing buttons or making physical movements in response to the AI interface (Alemi & Haeri, 2020). However, there is no clearly
defined boundary between attention-focused and unfocused states (Chi & Wylie, 2014). Therefore, in this paradigm, learners interact
with AI in either passive receiving or active manipulating mode.
Regarding knowledge-change process, language develops from storing information to integrating new information with prior
knowledge. When students interact with AI, new information can initially be stored in their memory, and then be integrated into their
knowledge once manipulations of engaging with the learning content occur (Wang et al., 2013). In the process, language learning is
expected to supplement, expand, or strengthen the existing schema of knowledge. Moreover, the cognitive outcomes in this paradigm
are to recall or apply new knowledge in similar contexts created by AI, through acceptable and authentic tasks that facilitate language
practice (Chen et al., 2021).

4.2.1.4. Research focus. The research in the paradigm focuses on the effectiveness of language learning in HAII, such as language
achievements, affective influence, and behavioral changes (Konijn et al., 2022; Schodde et al., 2019). The interaction mode among
teachers, learners, and AI is situated in a controlled environment where learners learn language with AI. Effectiveness is often
examined by comparing a treatment group with a control group. In the process, teachers either incorporate AI into classroom teaching or
assign students to learn independently with AI.
Results from these studies showed that learners who interacted with AI to learn language achieved higher outcomes than those who
did not (Alemi et al., 2014; Deng et al., 2022; Hyun et al., 2008). Specifically, AI could foster students’ social engagement in interaction
and enhance their vocabulary performance through positive emotional feedback (Ahmad et al., 2019). Furthermore, the treatment
group, which used AI for dynamic assessment of the learning process, gained significantly more vocabulary compared to the control
group (Jeon, 2021). These results indicate that with advances in AI computing, adaptive techniques, information processing, etc., HAII
provides learners with interactive oral practice, immediate feedback, and individualized learning content, while simultaneously
boosting motivation and engagement and reducing language learning anxiety.

4.2.2. Paradigm Two: AI/teacher-codirected, learner-as-collaborator


Paradigm Two is characterized as AI/teacher-codirected and learner-as-collaborator. AI transfers partial control of direction over
learning to teachers, and learners collaborate with AI to focus on personalized learning experience (Kory & Breazeal, 2014; Wang et al.,
2022). The paradigm aligns with the human-centered design approach in the HCI field. The paradigmatic shift from an AI-centered
design to a human-centered design is in line with the current emphasis on learner-centered learning (Hsieh & Lee, 2021).
Human-centered design places human beings as central, with the primary objective of understanding users holistically. It involves
users throughout the design process in an involvement continuum that spans from informative, through consultative, to participative
(Damodaran, 1996). A typical example is a robot with a scripted set of dialogue options that leads learners through storytelling
games and adapts its responses to learners’ answers (Conti et al., 2020; Kory & Breazeal, 2014).

4.2.2.1. Human-AI relationship. This paradigm shows a collaborative relationship between humans and AI, in which the interaction
between them is two-way, active, sharing, complementary, replaceable, adaptive, and predictable. In the collaborative interaction, AI
becomes a teammate who shares information, goals, and tasks with learners (Johnson & Vera, 2019). As AI learns over time, learner-AI
functions and tasks are allocated dynamically to emphasize the complementarity of learner and AI intelligence. Teachers play the role
of guides who gather diagnostic information and lead students through learning by making pedagogical decisions. Teachers use the
information from AI to determine what students learn, how, and when, and whether students use what they have learned, in order to
target instruction and AI resources. Teachers also incorporate their personal knowledge of students and their understanding of learning
contexts and teaching targets to address particular learning needs (Kory & Breazeal, 2014). Moreover, students play an important role as
collaborators who actively respond to feedback from teachers, AI, self-evaluation, and peer evaluation. Feedback in itself may not
facilitate learning unless students engage with and act upon it (Gibbs & Simpson, 2005). Teachers guide students in understanding the
evaluation criteria, assessing learning quality, engaging with self-evaluation and peer evaluation, and acting upon the feedback.

4.2.2.2. Key dimensions that influence HAII. The learning objective is to foster learner-centered adaptive learning, based on diagnostic
information and feedback from AI, teachers’ instruction, learners’ self-evaluation, or peer evaluation. The common types of instruction
questions asked by teachers include what is working, what to improve, and how to improve (Dixson & Worrell, 2016). Accordingly, the
paradigm often uses semi-open-ended tasks, which require students to list specific facts and express their opinions based on the facts.
Common semi-open-ended tasks include chatting with chatbots (Belda-Medina & Calvo-Ferrer, 2022; El Shazly, 2021) and storytelling
with robots (Fridin, 2014; Kory & Breazeal, 2014). This paradigm has a moderate level of pre-structured interaction. Learners
are expected to elaborate on their ideas to a certain extent, beyond giving short, fixed responses (Li et al., 2021).

4.2.2.3. Learners’ engagement behavior in learning process and cognitive outcomes. The mode in which learners engage with AI is
constructive generating, defined as the behavior exhibited by learners when they generate or produce externalized outputs that go
beyond the provided learning materials (Chi & Wylie, 2014). The distinction from active manipulating in Paradigm One is that students
generate inferences and express additional ideas that are not stated in the materials, rather than copying verbatim what is read or
heard. The inherent principles of AI-supported learning as a scalable and
transferable scaffolding allow students to explore and apply what they have learned in their own words. Moreover, language develops
by inference. When students use their prior knowledge and textual information to make judgments, draw conclusions, and interpret the
texts in their own words, they are making inferences. Finally, cognitive outcomes in this paradigm are to transfer new knowledge to
novel contexts created by AI. Teachers include various authentic learning activities, tasks, and experiences to optimize language
learning (Tanaka et al., 2014).

4.2.2.4. Research focus. This body of literature highlights understanding and interpreting language learning process in HAII to
facilitate continuous learning (Conti et al., 2020; Li et al., 2021). Researchers explore participants’ learning experiences and the
meanings they ascribe to the experiences. Researchers reflexively interpret the learning information, which is foregrounded in
information analysis (Grant & Giddings, 2002). Compared to Paradigm One, Paradigm Two relies more on a phenomenological approach.
Furthermore, the grounded theory approach is commonly employed, for example by analyzing participants’ drawings of stories told by
robots (Conti et al., 2020) and observations of the interaction process (Li et al., 2021; Tanaka et al., 2014). The research
investigates teachers’ and students’ hermeneutic experience, which is in turn interpreted by researchers themselves.
The interaction mode among teachers, learners, and AI is exhibited in an open environment where learners learn language by
constructing meaning with AI as collaborators, and teachers are either absent or present to provide support in a new situation (Bowlby,
2005), as guides who prepare students with the necessary knowledge to “play” with a robot (Hsieh & Lee, 2021), or co-teach with AI
(Tanaka et al., 2014).
Results showed that the collaboration between humans and AI not only improved learners’ language proficiency, but also promoted
their emotional involvement (Li et al., 2021). Specifically, the expressive social behaviors of robots increased the number of details
learners remembered compared to an inexpressive human storyteller (Conti et al., 2020). Moreover, the high degree of engagement in
HAII led students to report positive emotions in interviews and questionnaires (Hsieh & Lee, 2021). These results indicate that AI can
facilitate interactions which are open-ended in essence, and create interactive settings to facilitate language learning (Tanaka et al.,
2014).

4.2.3. Paradigm Three: AI/teacher/learner-codirected


The third paradigm is characterized as AI/teacher/learner-codirected. AI, teachers, and learners share control of direction over
learning, fostering learner agency as the core of HAII. Corresponding to the human-centered AI approach, this paradigm is an extension
of Paradigm Two, as it also emphasizes a learner-centered and dynamically iterative language learning process involving interactions
between learners and AI. The main difference in this new paradigm lies in the enhanced role of learners evolving from collaborating
with AI to codirecting the learning process with teachers and AI. The evolution of the role underlines the importance of learner agency
(Ouyang & Jiao, 2021). In Paradigm Three, the four core properties of learner agency are supported: intentionality (i.e., to be
planners), forethought (i.e., to be forethinkers), self-reactiveness (i.e., to be self-regulators), and self-reflectiveness (i.e., to be
self-examiners) (Bandura, 2006). This paradigm puts learners at the center of interaction to empower them to become agents of
learning who are planning, forethinking, self-reactive, and self-reflective. This paradigm is associated with the human-centered AI
design approach in the HCI field. The human-centered AI design approach advocates using AI to amplify human abilities to see, think,
create, and act in empowered ways while ensuring human control (Shneiderman, 2020), by creating potent user experience with
embedded AI methods (Robert et al., 2020). This approach includes three main aspects that work interdependently: AI, human factors,
and ethics (Xu et al., 2023). It is congruent with the “ethically aligned design” of AI, which puts users at the center of design to build
users’ self-efficacy, clarify their responsibility, and facilitate their creativity (Donahoe, 2018).


4.2.3.1. Human-AI relationship. Humans and AI show a teaming relationship in the paradigm. Teachers and learners can make
informed decisions about learning, based on individualized information with data accuracy and interactiveness from advanced AI
technologies (Kay & Kummerfeld, 2019). In the teaming relationship, AI plays the role of an agency-supporting super tool embedded in
the “human loop” (Shneiderman, 2021). Human-in-the-loop AI systems ensure that humans always remain part of the systems and step in
to improve outputs when the confidence of the system output is low (Xu et al., 2023). These AI systems comply with the central decision-making role of
humans. When humans, i.e., teachers and students, share the central decision-making in HAII, teachers are expected to develop
learners’ agency to help them become agents of language learning.

4.2.3.2. Key dimensions that influence human-AI interaction. First, the learning objective is to foster learner agency. Learner agency
requires learners to assume responsibility in every aspect of learning, including goal setting, strategic use of cognitive skills, and active
control of the thinking process. Commonly used metacognitive strategies include planning, monitoring, and evaluation (Hanlon et al.,
2021; Palermo & Thomson, 2018). Second, the main task type is open-ended tasks, which give learners freedom in expressing their
own thoughts and choosing language. Common open-ended tasks include reflective writing about students’ own learning experience
(Hanlon et al., 2021), and incorporating self-regulated strategies into argumentative writing process (Palermo & Thomson, 2018).
Third, this paradigm has a low level of pre-structured interaction. Learners are encouraged to plan, forethink, and reflect on their
language learning process and make further adjustments in the next step.

4.2.3.3. Learners’ engagement behavior in learning process and cognitive outcomes. The mode of students’ cognitive engagement is
interactive dialoguing. Interactive dialoguing has two criteria: constructive utterances and a sufficient frequency of turn-taking (Chi &
Wylie, 2014). Provided that the criteria are met, dialoguing speakers can be AI, teachers, or peers that are involved in learning.
Interactive dialoguing should first be constructive, meaning that new ideas, which neither individual could produce alone, are
generated from mutual exchanges between partners (Rafal, 1996). Furthermore, regarding the knowledge-change process, learners
develop language by co-inference. When dialoguing learners take turns in contributing to the topic, they are co-inferring new
knowledge based on feedback, alternative perspectives, or new directions. Finally, the cognitive outcomes are to co-create new
knowledge, which enables language learners to generate new interpretations, perspectives, and ideas in future iterations.

4.2.3.4. Research focus. This paradigm focuses on fostering learner agency to improve language learning in HAII (Hsu & Liang, 2021;
Wilson & Roscoe, 2020). Researchers seek to empower learners to engage in transformative action. Since empowerment is the outcome
of developed consciousness produced by the process of praxis (Grant & Giddings, 2002), learners’ agency becomes the focus. Similar to
Paradigm Two, this paradigm adopts the phenomenological approach with interviews, observations, and text analysis of qualitative
narration. However, the difference is that the studies are much more focused on a thick description of how students internalize agency
in HAII with critical thinking. The thick description of students’ internalization of agency is examined through the critical lens of
researchers (Park et al., 2011). Regarding the interaction mode of Paradigm Three, teachers create the metacognitive and critical
thinking experience with assistance from AI as an agency-supporting super tool to help students become agents of language learning,
such as writing in automated writing evaluation systems (Palermo & Thomson, 2018), performing long-term interaction with a robot
tutor in the classroom (Park et al., 2011), and using a robot to teach computational thinking, critical thinking, and language learning
(Hsu & Liang, 2021).
Findings indicate that learners’ agency can potentially be reflected, practiced, and promoted in HAII for language learning.
Moreover, researchers have expressed concerns about AI ethics in conducting studies. For instance, children should be aware of the
reason for learning with AI, i.e., not for AI but for themselves (Park et al., 2011), and they should be prevented from becoming
emotionally attached to AI agents (Fridin, 2014). During data collection, data storage and privacy become key issues
when intelligent chatbots are used among language learners (Belda-Medina & Calvo-Ferrer, 2022). The researchers articulated and
applied AI ethics through their own lenses.

5. Discussion

This thematic review has mapped the current landscape of research on language learning development in HAII, specifically by
examining contextual characteristics of the research and identifying research paradigms. In this section, we first address the main
findings of the two research questions, and then propose implications for future research.

5.1. The current stage of language learning development in human-AI interaction

RQ1 focuses on contextual characteristics of research on language learning development in HAII. First, the paradigmatic shift
towards human-centered AI approach since 2018 is associated with an increasing research interest in exploring language learning
development in HAII. This is congruent with the current advocacy toward student-centered learning. Second, many studies are located
in specific geographic locations, notably Taiwan, mainland China, and the United States. This finding aligns with Liang et al.
(2021). Third, K-12 education has been the main setting where language learning is developed in HAII. The explanation may lie
in the growing use of robots in K-12 education. The physical embodiment of robots provides K-12 students with social behaviors such
as language application, gestures, and facial expressions (Bartneck & Forlizzi, 2004), as well as pedagogical value (Lee & Lee, 2022).
Fourth, with an average sample size of about 86 learners per study, most research targets English as the target language and integrated
skills as the language domain. This finding may be due to ample opportunities to practice integrated English skills in relatively
stress-free learning interaction (Wang et al., 2023) and personalized feedback for individual students in large classes (Moussalli &
Cardoso, 2020). Fifth, AI has mostly been used in applications such as robots, chatbots, and automated writing evaluation systems in
HAII. Other applications emerge including intelligent tutoring systems, automatic speech recognition systems, virtual reality, and
Internet of Things intelligent image. Finally, most studies focus on individual use of an AI because it provides individualized and
interactive feedback for different learners (Ai, 2017). In some studies, a group of learners interacts with an AI, because AI has the
potential to facilitate human communication with the affordances of AI-mediated interaction (Hancock et al., 2020). Future research
may explore inter-group use of AIs to optimize language learning across groups of learners via various AI applications.

5.2. Research paradigms of language learning development in human-AI interaction: From competing to supplementing

For RQ2, this study identifies three research paradigms of language learning development in HAII: Paradigm One (AI-directed,
teacher-as-facilitator, learner-as-recipient), Paradigm Two (AI/teacher-codirected, learner-as-collaborator), and Paradigm Three (AI/
teacher/learner-codirected). The three paradigms exhibit a human-AI relationship that evolves from stimulus-response through
collaboration to teaming, based on specifically designed tasks in interaction, different modes of engagement behaviors and cognitive
outcomes, and various research foci. When the three paradigms are compared and contrasted, the sharp differences between their
constructs prove to be rooted in competing philosophical and linguistic underpinnings. Despite these competing underpinnings, the
three paradigms supplement each other in research.

5.2.1. Competing philosophical and linguistic underpinnings


The differences between the constructs of the paradigms are rooted in competing philosophical and linguistic underpinnings.
Philosophically, Paradigm One differs from the others because it is based on positivism. Positivism holds that truth consists of a single
tangible reality that can be identified and measured, and that knowledge should be developed objectively, without the influence of
researchers’ or participants’ values (Park et al., 2020). This philosophical underpinning resonates linguistically with Universal Grammar
(UG) Theory, which proposes that all languages share the basic principles of a universal grammar, and that humans are endowed
with a language acquisition device to decode language-specific structural settings (De Bot et al., 2013). Following the
components of linguistic variation in core grammar, AI is being developed to correspond to these components and create learning
experiences.
Paradigm Two is philosophically grounded in phenomenology, with its focus on understanding various realities and allowing
meanings to emerge in interaction with a particular situation, through participants’ own interpretation and mediation from other factors
(Grundy, 1987). The notion of learning with, instead of from, learning environments involves interaction among teachers,
students, and AI, so new knowledge is incorporated into prior knowledge rather than being compartmentalized passively. The
linguistic underpinning of this paradigm is the constructivist approach, which considers language learning a process in which learners
construct new knowledge by integrating new information into their prior knowledge (Behrens, 2021). Constructive learning is
maximized when learners increase information gathering, which stimulates internal events in learners’ minds (Kaufman, 2004).
In Paradigm Three, critical theory provides the philosophical underpinning to develop self-knowledge through critical self-
reflection (Bullough & Goldstein, 1984). Knowledge is learned through critical reflection, which leads to a perspective transformation
or transformed consciousness (Mezirow, 1981). Dynamic Systems Theory (DST) illuminates the linguistic foundation of this
paradigm: it views language as a dynamic system whose components interact over time, so that language develops in a dynamic
process (De Bot et al., 2007). According to DST, learners’ language system, with its numerous subsystems, is in constant flux, and
individual differences at a given point in time influence language learning. Therefore, language learning does not have an end state;
instead, it reflects boundless potentiality, and learner agency emerges from the interactions and iterations among
language learning components (Larsen-Freeman, 2006).

5.2.2. Supplementing each other in research


Despite the competing philosophical and linguistic underpinnings of the paradigms, the boundaries are blurry in practice, as both
continuities and ruptures mark the boundaries among paradigms (Grant & Giddings, 2002). In research, the three paradigms
supplement each other with their own values. In Paradigm One, passive learning is necessary for allowing learners to generate hypotheses
about domain knowledge, which they can then test in active learning (MacDonald & Frank, 2016). The caveat is to avoid
overemphasis on test outcomes and judgment prediction: researchers’ different implementations may strongly affect test outcomes
(Cheung & Slavin, 2016), and judgment prediction should not be based solely on measurable variables, neglecting
the interdependency of variables and the context of the study required for a comprehensive and deep understanding (Burgelman, 2011).
Instead of viewing the objectivity and prediction of this paradigm as universal standards, researchers are expected to recognize the
complexity and context of learning experiences.
Although Paradigm Two is subject to critique for its dominant purpose of understanding, it can be considered an information
source for two other paradigms (Liu & Yin, 2022). Understanding how language learning develops in the AI-supported educational
environment provides insights into how teachers and learners may influence and be influenced by various factors as shown in
Paradigm One. Furthermore, based on these understandings and potential influences, critical self-reflection can be conducted to
examine how to better develop learner agency in the AI-supported educational environment, thereby informing Paradigm Three.
While Paradigm Three has not yet been fully realized, it could be considered a further purpose for the two other paradigms (Ouyang
& Jiao, 2021). A major challenge in Paradigm Three is to address the complexity of aligning the intricacies of the learning process with
AI technologies across diverse educational contexts. To respond to this challenge, it is necessary to integrate pedagogical, social,
cultural, economic, and ethical dimensions during the AI application process (Zawacki-Richter et al., 2019). Paradigm One and
Paradigm Two offer valuable insights into identifying influencing factors and understanding how the factors interact with each other
within broader educational contexts, towards the ultimate goal of applying AI to enhance learner agency in language learning for
Paradigm Three.
Given the possibilities of supplementation among the three paradigms, researchers have supplementary methodological choices
across these paradigms. On the one hand, because each paradigm is undergirded by specific philosophical and linguistic underpinnings
as discussed earlier, choice of a paradigm implies that the research will be nested in particular underpinnings, and this orientation will
therefore guide the research towards particular methodologies. In Paradigm One, the data gathered are quantitative in nature, and are
likely to be analyzed with quantitative procedures, e.g., experimental, quasi-experimental, and correlational methodologies. Paradigm
Two often uses phenomenological approaches involving methods such as interviews, observations, and text analysis to stay close to the
participants’ descriptions of their experiences. Paradigm Three often uses phenomenological approaches with critical analysis. On the
other hand, research methodologies from different paradigms can supplement each other in addressing parts of the question at hand
(Kivunja & Kuyini, 2017). When two methodologies provide opposing perspectives, the bigger task is to find ways to integrate the
partial images of reality into a new perspective. The key is to ask which question is more significant to have solved (Rist, 1977).
Therefore, researchers may choose appropriate methodologies to answer their research questions, informed by a good understanding
of the different constructs of the research paradigms discussed in this study.

5.3. Implications for future research

Our findings suggest several implications for future research. First, HAII has demonstrated the potential to support language learning
development. As to sample groups, given the advances of AI, its evolving affordances, and ethical concerns, it is becoming important to
explore the needs of learners at various educational levels and with different socioeconomic statuses, abilities, and identity expressions
(Estes et al., 2020). Regarding the target language, research to date has focused on English language learning. Future research may
consider investigating other languages. The under-researched language domains of listening, reading, writing, pronunciation, and
grammar are also worth exploring. As for AI-supported interaction types, researchers may explore inter-group use of AIs to optimize
language learning between groups of learners via various AI applications. Concerning language learning process, future studies may
investigate how to pedagogically design and implement the HAII process to facilitate language learning, and how learning construction
should occur in the process (Chai & Kong, 2017), such as how language learners comprehend AI input, negotiate meaning, and produce
output in various types of interaction (e.g., turn taking, mechanisms of meaning repair, and opening and closing of topics).
Second, regarding how researchers and teachers may draw on the three paradigms to promote language learning in HAII, they are
encouraged to explore various methodological possibilities and appropriateness according to research questions. The boundaries of the
paradigms are blurry in practice. The most important touchstone is to understand positions, research purposes, beliefs, and values, and
then choose the research methodologies that best address the research questions (Cheung & Slavin, 2016; Grant & Giddings, 2002).
This review provides the research landscape for researchers and teachers to gain such understandings.

6. Limitations

Three limitations should be acknowledged. First, the computer-supported process-oriented interaction framework and the
engagement framework guided our initial analysis and coding, so the content of analysis was preset by the classification and coding
scheme, though an open coding process was inductively performed afterward to address this limitation. Second, studies that lacked the
necessary information to describe the study and report findings (What Works Clearinghouse, 2021), or that were written in languages
other than English, might not have been identified. Third, categorization of the three paradigms enables a means of classifying research
based on the main research focus and findings, though this might have led to glossing over the uniqueness of certain studies. This risk
was expected to be mitigated by providing a coherent overview of this multidisciplinary field and identifying the common root in
philosophical and linguistic underpinnings.

7. Conclusion

This study’s primary purposes and contributions are to investigate the contextual characteristics of research on language learning development in HAII and to identify its research paradigms. When directing language learning practice, researchers and teachers may face confusion and misunderstanding arising from the different research paradigms of language learning development in HAII. In response to the need to better understand the contextual characteristics of the research and its paradigms, we have examined human-AI relationship, learning objective, task type, level of pre-structuring, mode of engagement behavior, knowledge-change process, cognitive outcome, and research focus. Moreover, we have revealed the philosophical and linguistic underpinnings of each paradigm. No single paradigm emerges as superior to another. Rather, findings from research in each paradigm are sought and valued for their contribution to describing, understanding, and creating new knowledge about language learning development in HAII. HAII has shown promising potential to improve language learning. This study serves as a foundation for orienting research on language learning development in HAII by providing insights into the contextual characteristics of current research and its paradigms, and by outlining implications for future research.

F. Wang et al. System 125 (2024) 103424

CRediT authorship contribution statement

Feifei Wang: Conceptualization, Formal analysis, Methodology, Writing – original draft, Writing – review & editing. Alan C.K. Cheung: Conceptualization, Investigation, Methodology, Supervision. Ching Sing Chai: Conceptualization, Data curation, Investigation, Methodology.

Acknowledgements

We would like to thank Professor Yin Hongbiao from The Chinese University of Hong Kong for his valuable help and suggestions in conceptualizing this study.

Appendix

Table A1
Search terms.

1. Relating to human-AI interaction process: experience* OR user* OR expectation* OR usability OR understanding* OR misunderstanding* OR bias* OR emotion* OR attitude* OR psycholog* OR interact* OR conversation* OR pragmatics OR cooperat* OR cognit* OR evaluation OR assessment OR social*
2. Relating to artificial intelligence: "artificial intelligence" OR "machine intelligence" OR "intelligent support" OR "intelligent virtual reality" OR "robot*" OR "chat bot*" OR "machine learning" OR "automated tutor" OR "personal tutor*" OR "intelligent agent*" OR "expert system" OR "neural network" OR "natural language process*" OR "deep learn*"
3. Relating to educational level: "higher education" OR college* OR undergrad* OR graduate OR postgrad* OR "K-12" OR kindergarten* OR "corporate training*" OR "professional training*" OR "primary school*" OR "middle school*" OR "high school*" OR "elementary school*" OR "vocational education" OR "adult education"
4. Relating to learning setting: educat* OR learn* OR teach* OR instruct* OR student*
5. Relating to language education: language* OR literacy
6. Relating to research paradigm: theory OR theoretical OR "theoretical framework" OR paradigm*
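In a search strategy of this kind, terms within each theme are typically joined by OR and the themes themselves combined with AND, as is standard in systematic reviews. As a minimal illustrative sketch only — the term lists below are abridged from Table A1, and the exact field codes and wildcard syntax vary by database (Scopus, Web of Science, ERIC, etc.), so this is an assumption, not the authors' exact query — the combined search string could be assembled like this:

```python
# Sketch: assembling a Boolean search string from the theme groups in
# Table A1. Terms within a theme are joined by OR; themes are joined by
# AND. Term lists are abridged here for brevity (full lists in Table A1);
# the concrete database syntax is an assumption for illustration.

themes = {
    "human-AI interaction process": ["experience*", "user*", "usability", "interact*"],
    "artificial intelligence": ['"artificial intelligence"', '"chat bot*"', "robot*"],
    "educational level": ['"higher education"', "college*", '"K-12"'],
    "learning setting": ["educat*", "learn*", "teach*"],
    "language education": ["language*", "literacy"],
    "research paradigm": ["theory", "paradigm*"],
}

def build_query(theme_terms):
    """Join terms with OR inside each theme, then AND across themes."""
    clauses = ["(" + " OR ".join(terms) + ")" for terms in theme_terms.values()]
    return " AND ".join(clauses)

print(build_query(themes))
```

Running this prints one long Boolean string beginning with the interaction-process clause, which can then be pasted into a database's advanced-search field.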

Table A2
Contextual characteristics of all the included articles.

No. | Study | Year | Country/region | Educational level | Sample size | Language domain | Target language | AI application | AI-supported interaction type
1 | Al Hakim et al. (2022) | 2022 | Taiwan | Secondary education | 101 | Integrated | English | Robot | Group use of an AI
2 | Alemi and Haeri, 2020 | 2020 | Iran | Preschool education | 38 | Speaking | English | Robot | Group use of an AI
3 | Alemi et al., 2014 | 2014 | Iran | Secondary education | 46 | Vocabulary | English | Robot | Group use of an AI
4 | Alemi et al., 2015 | 2015 | Iran | Secondary education | 46 | Vocabulary | English | Robot | Group use of an AI
5 | Al-Kaisi et al., 2021 | 2021 | Russia | Higher education | 24 | Integrated | Russian | Chatbot | Individual use of an AI
6 | Balkibekov et al., 2016 | 2016 | Kazakhstan | Elementary education | 76 | Vocabulary | English | Robot | Individual use of an AI
7 | Banaeian and Gilanlioglu, 2021 | 2021 | North Cyprus | Higher education | 65 | Vocabulary | English | Robot | Group use of an AI
8 | Belda-Medina and Calvo-Ferrer, 2022 | 2022 | Spain and Poland | Higher education | 176 | Integrated | English | Chatbot | Individual use of an AI
9 | Chen et al., 2021 | 2021 | The mainland of China | Secondary education | Unspecified | Integrated | English | Internet of Things intelligent image | Group use of an AI
10 | Chen et al., 2022 | 2022 | Taiwan | Elementary education | 56 | Speaking | English | Automatic speech recognition system | Individual use of an AI
11 | Conti et al., 2020 | 2020 | Italy | Preschool education | 81 | Integrated | English | Robot | Group use of an AI
12 | Deng et al., 2022 | 2022 | The mainland of China | Secondary education | 56 | Vocabulary | English | Intelligent tutoring system | Individual use of an AI
13 | El Shazly, 2021 | 2021 | Egypt | Higher education | 48 | Speaking | English | Chatbot | Individual use of an AI
14 | Fridin, 2014 | 2014 | Israel | Preschool education | 10 | Speaking | Hebrew | Robot | Group use of an AI
15 | Hanlon et al., 2021 | 2021 | United States | Higher education | 120 | Writing | English | Automated writing evaluation | Individual use of an AI
16 | Hong et al., 2016 | 2016 | Taiwan | Elementary education | 52 | Integrated | English | Robot | Group use of an AI
17 | Hsiao et al., 2015 | 2015 | Taiwan | Preschool education | 57 | Reading | Chinese | Robot | Group use of an AI
18 | Hsieh and Lee, 2021 | 2021 | Taiwan | Secondary education | 52 | Speaking | English | Robot | Individual use of an AI
19 | Hsu and Liang, 2021 | 2021 | Taiwan | Elementary education | 48 | Speaking | English | Robot | Group use of an AI
20 | Hyun et al., 2008 | 2008 | Korea | Preschool education | 34 | Integrated | Korean | Robot | Individual use of an AI
21 | Jeon, 2021 | 2021 | Korea | Elementary education | 53 | Vocabulary | English | Chatbot | Individual use of an AI
22 | Kennedy et al., 2016 | 2016 | United Kingdom | Elementary education | 67 | Integrated | French | Robot | Individual use of an AI
23 | Kim et al., 2019 | 2019 | Kazakhstan | Elementary education | 48 | Writing | Latin, Cyrillic | Robot | Individual use of an AI
24 | Konijn et al., 2022 | 2022 | Netherlands | Elementary education | 63 | Vocabulary | Dutch | Robot | Individual use of an AI
25 | Kory and Breazeal, 2014 | 2014 | United States | Preschool education | 20 | Speaking | English | Robot | Individual use of an AI
26 | Leeuwestein et al. (2021) | 2021 | Netherlands | Preschool education | 67 | Vocabulary | Dutch | Robot | Individual use of an AI
27 | Li et al., 2021 | 2021 | Japan | Higher education | 10 | Speaking | Chinese | Robot | Group use of an AI
28 | Mageira et al., 2022 | 2022 | Greece | Secondary education | 61 | Vocabulary | English, French | Chatbot | Individual use of an AI
29 | McCarthy et al., 2019 | 2019 | United States | Secondary education | 119 | Writing | English | Automated writing evaluation | Individual use of an AI
30 | Meiirbekov et al., 2016 | 2016 | Kazakhstan | Elementary education | 22 | Vocabulary | English | Robot | Individual use of an AI
31 | Miranty and Widiati, 2021 | 2021 | Indonesia | Higher education | 100 | Writing | English | Automated writing evaluation | Individual use of an AI
32 | Nong et al., 2021 | 2021 | The mainland of China | Elementary education | 86 | Integrated | English | Intelligent tutoring system | Individual use of an AI
33 | Obari and Lambacher, 2019 | 2019 | Japan | Higher education | 47 | Integrated | English | Chatbot | Inter-group use of AIs
34 | Obari et al., 2020 | 2020 | Japan | Higher education | 82 | Integrated | English | Chatbot and virtual reality | Inter-group use of AIs
35 | Palermo and Thomson, 2018 | 2018 | United States | Secondary education | 829 | Writing | English | Automated writing evaluation | Group use of an AI
36 | Park et al., 2011 | 2011 | Korea | Elementary education | 34 | Integrated | English | Robot | Group use of an AI
37 | Ruan et al., 2021 | 2021 | The mainland of China | Higher education | 56 | Speaking | English | Chatbot | Individual use of an AI
38 | Schodde et al., 2019 | 2019 | Germany | Preschool education | 40 | Vocabulary | English | Robot | Individual use of an AI
39 | Steuer et al., 2022 | 2022 | Germany | Higher education | 48 | Reading | English | Educational automatic question generation | Individual use of an AI
40 | Su et al., 2020 | 2020 | The mainland of China | Higher education | 80 | Writing | English | Automated writing evaluation | Individual use of an AI
41 | Tanaka et al., 2014 | 2014 | Japan | Preschool education | 52 | Speaking | English | Robot | Individual use of an AI
42 | Tolksdorf et al., 2022 | 2022 | Germany | Preschool education | 16 | Vocabulary | English | Robot | Individual use of an AI
43 | Vogt et al., 2019 | 2019 | Netherlands | Elementary education | 194 | Vocabulary | English | Robot | Individual use of an AI
44 | Wang, 2020 | 2020 | The mainland of China | Higher education | 188 | Writing | English | Automated writing evaluation | Individual use of an AI
45 | Wang et al., 2013 | 2013 | Taiwan | Elementary education | 63 | Speaking | English | Robot | Group use of an AI
46 | Wang et al., 2022 | 2022 | The mainland of China | Elementary education | 327 | Integrated | English | Intelligent tutoring system | Group use of an AI
47 | Wilson and Roscoe, 2020 | 2020 | United States | Secondary education | 56 | Writing | English | Automated writing evaluation | Individual use of an AI
48 | Wu et al., 2015 | 2015 | Taiwan | Elementary education | 64 | Integrated | English | Robot | Group use of an AI
49 | Yueh et al., 2020 | 2020 | Taiwan | Elementary education | 36 | Reading | Chinese | Robot | Individual use of an AI

Table A3
Summary of studies primarily focusing on language learning development in human-AI interaction in Paradigm One.

No. | Study | Interaction mode | Participants | Research method | Effectiveness | Effectiveness measures
1 | Al Hakim et al. (2022) | 1. Two treatment groups: Treatment 1 (T1) delivered a drama performance with a robot in the interactive situated virtual learning environment. Treatment 2 (T2) did so in the situated virtual learning environment. 2. One control group: Control 1 (C1) delivered a drama performance on the traditional dais. | Ninth graders at a junior high school in Taiwan | Mixed | T1 outperformed the two other groups in learning outcomes, motivation, and engagement. T2 showed higher motivation than C1. C1 outperformed T2 in learning outcomes. There was no significant difference between T2 and C1 in engagement. | Pre-test, treatment, and immediate post-test
2 | Al-Kaisi et al., 2021 | 1. One treatment group: T1 used Alice, a voice assistant, to learn Russian in a flipped classroom. 2. One control group: C1 did not use Alice to learn Russian. | Adult beginning-level learners of Russian as a foreign language | Mixed | T1 outperformed C1 in listening, speaking, reading, and writing, but not in grammar/lexis. | Post-test
3 | Alemi et al., 2014 | 1. One treatment group: T1 interacted with a robot to learn vocabulary with help from a teacher. 2. One control group: C1 learned vocabulary in the traditional teaching method. | Iranian EFL junior high school students | Quantitative | T1 showed more vocabulary learning achievements than C1. | Pre-test, treatment, immediate and delayed post-tests
4 | Alemi et al., 2015 | 1. One treatment group: T1 learned vocabulary from a robot as an assistant teacher. 2. One control group: C1 learned vocabulary from a human teacher without the use of a robot. | Iranian EFL junior high school students | Quantitative | T1 indicated lower anxiety and a more positive attitude towards English vocabulary acquisition. | Post-test
5 | Alemi and Haeri, 2020 | 1. One treatment group: T1 interacted with a robot as a teaching assistant to play games and repeat sentences. 2. One control group: C1 included similar games, but without the presence of a robot. | Iranian EFL preschool children | Quantitative | T1 outperformed C1 in pragmatic performance for thanking and requesting. | Pre-test, treatment, and immediate post-test
6 | Balkibekov et al., 2016 | 1. One treatment group: T1 interacted with an always-winning robot. 2. One control group: C1 interacted with an always-losing robot. | Children in a primary school | Quantitative | C1 improved more than T1. | Pre-test, treatment, and immediate post-test
7 | Banaeian and Gilanlioglu, 2021 | 1. One treatment group: T1 were encouraged to ask a robot about the meaning of new words during a human teacher's instruction. 2. One control group: C1 did not interact with the robot in learning the same content. | Adult learners of an English language teaching program in North Cyprus | Mixed | C1 performed slightly better than T1, despite no significant difference between the two groups. | Pre-test, treatment, and immediate post-test
8 | Chen et al., 2021 | 1. One treatment group: T1 learned English in a studio classroom with the intelligent image positioning and tracking equipment. 2. One control group: C1 learned English with the traditional English teaching method. | High school students | Quantitative | T1 outperformed C1 in listening, vocabulary, and comprehensive scores. | Pre-test, mid-test, and immediate post-test
9 | Chen et al., 2022 | 1. One treatment group: T1 learned English speaking skills with the dynamic assessment-based speech recognition. 2. One control group: C1 learned English speaking skills with the corrective feedback-based speech recognition. | Fifth graders in an elementary school of Taiwan | Quantitative | Both groups effectively improved English speaking skills. | Pre-test, treatment, and immediate post-test
10 | Deng et al., 2022 | 1. One treatment group: T1 used an English learning system with a virtual pedagogical agent. 2. One control group: C1 used only the speech version of the system without a virtual pedagogical agent. | Junior high school students | Quantitative | T1 outperformed C1 in both retention and transfer tests. | Pre-test, treatment, and immediate post-test
11 | Hong et al., 2016 | 1. One treatment group: T1 learned English with a robot which presented the learning content. 2. One control group: C1 learned the same content through the display system without a robot. | Fifth graders in an elementary school in Taiwan | Mixed | T1 outperformed C1, particularly in listening and reading. | Treatment and immediate post-test
12 | Hsiao et al., 2015 | 1. One treatment group: T1 used a robot in learning reading. 2. One control group: C1 utilized a tablet-PC in learning reading. | Pre-kindergarteners in Taiwan | Quantitative | T1 performed better than C1 in reading. | Pre-test, treatment, mid-test, and post-test
13 | Hyun et al., 2008 | 1. One treatment group: T1 interacted with a robot by touching the pictures on the screen and attending the robot's program. 2. One control group: C1 watched the multimedia picture book program. | Pre-schoolers in Korea | Quantitative | T1 improved significantly compared to C1. | Preliminary test, pre-test, treatment, and immediate post-test
14 | Jeon, 2021 | 1. Two treatment groups: T1 with chatbot-assisted dynamic assessment, T2 with chatbot-assisted non-dynamic assessment. 2. One control group: C1 did not utilize a chatbot. | Students at a public elementary school in South Korea | Quantitative | T1 outperformed T2 and C1. Chatbot-assisted dynamic assessment not only promoted vocabulary learning, but also offered diagnostic information about individuals. | Pre-test, treatment, immediate and delayed post-tests
15 | Kennedy et al., 2016 | 1. Two treatment groups: T1 used a robot with high verbal availability. T2 used a robot with low verbal availability. 2. One control group: C1 did not use a robot, and just attended a pre-test and delayed post-test. | Native English speakers from a primary school in the U.K. | Quantitative | The two treatment groups improved in learning, and perceived the difference in social behavior between the "high" and "low" conditions. However, learning was not influenced by the two conditions. | Pre-test, treatment, immediate and delayed post-tests
16 | Konijn et al., 2022 | 1. Two treatment groups: T1 used a socially behaving robot. T2 used a neutrally behaving robot. 2. One control group: C1 used a tablet. | Dutch primary school children | Quantitative | The two treatment groups achieved increased learning gains compared to the control group. The robot's behavioral style, either social or neutral, hardly made a difference. | Pre-test, treatment, immediate and delayed post-tests
17 | Leeuwestein et al. (2021) | Participants learned half of the new words from a Turkish-Dutch bilingual robot with Turkish translations, and the other half from a monolingual Dutch robot. | Turkish-Dutch kindergartners | Quantitative | The Dutch-only robot showed better performance than the Turkish-Dutch robot. However, most children preferred working with the bilingual robot. | Pre-test, treatment, immediate and delayed post-tests
18 | Mageira et al., 2022 | 1. One treatment group: T1 was chatbot-based. 2. One control group: C1 used other "Information and Communication Technology" tools. | High school students who learned English or French as a foreign language | Mixed | The groups showed no significant difference in overall performance gain, cultural content learning, and language learning. | Pre-test, treatment, and immediate post-test
19 | Meiirbekov et al., 2016 | 1. One treatment group: T1 interacted with a winning robot. 2. One control group: C1 interacted with a losing robot. | Elementary school students | Quantitative | There was no significant difference between the two groups. However, girls learned more words than boys when interacting with the always-winning robot. Boys learned more words than girls when interacting with the always-losing robot. | Pre-test, treatment, and immediate post-test
20 | McCarthy et al., 2019 | 1. One treatment group: T1 received writing strategy feedback with spelling and grammar checking tools. 2. One control group: C1 received writing strategy feedback. | High school students in the U.S. | Quantitative | The spelling and grammar checking tools did not significantly contribute to overall essay scores, only to aspects of essay quality including conclusion, organization, word choice, voice, and grammar/mechanics. | Initial draft and revision
21 | Nong et al., 2021 | 1. One treatment group: T1 used an AI app to learn English. 2. One control group: C1 learned English in the traditional teaching approach. | Chinese elementary school students | Quantitative | T1 showed significantly better performance than C1. | Pre-test, treatment, and immediate post-test
22 | Obari et al., 2020 | 1. One treatment group: T1 used an AI speaker during a blended learning program. 2. One control group: C1 did not use an AI speaker, and just used online materials during the same learning period. | Native Japanese undergraduates | Quantitative | T1 outperformed C1. | Pre-test, treatment, and immediate post-test
23 | Obari and Lambacher, 2019 | 1. One treatment group: T1 used Amazon Alexa to learn English daily. 2. One control group: C1 used Google Home Mini to learn English daily. | Native Japanese undergraduates | Quantitative | Both groups improved in overall learning performance, especially in listening and oral communication skills. | Pre-test, treatment, and immediate post-test
24 | Ruan et al., 2021 | Students learned English using EnglishBot, an AI system, against a traditional listen-and-repeat interface through two studies under both voluntary and fixed-usage conditions. | Chinese college students | Quantitative | EnglishBot improved the students' fluency more than the traditional interface, and promoted engagement in learning. | Pre-test, treatment, immediate and delayed post-tests
25 | Schodde et al., 2019 | 1. One treatment group: T1 interacted with a robot which provided adapt-and-explain scaffolding in learning. 2. One control group: C1 interacted with the robot without adapt-and-explain scaffolding. | Kindergarteners in Germany | Quantitative | The learning gain between the two groups did not differ significantly. However, explanations had a strong effect on children who completed all rounds of learning. | Pre-test, treatment, immediate and delayed post-tests
26 | Steuer et al., 2022 | 1. One treatment group: T1 read three text passages, and answered adjunct questions during reading. 2. One control group: C1 read the same passages without answering adjunct questions. | University students | Quantitative | T1 outperformed C1 in learning outcome. | Treatment and immediate post-test
27 | Su et al., 2020 | 1. One treatment group: T1 used the Chinese-English Parallel Corpus of Traditional Chinese Medicine and the AI network evaluation system. 2. One control group: C1 used the traditional learning approach. | Chinese university students | Quantitative | T1 achieved significantly higher scores than C1 in word accuracy, sentence structure, text structure, writing efficacy, and writing motivation. However, no significant difference was found between the two groups in grammar accuracy and content innovation. | Pre-test, treatment, and immediate post-test
28 | Tolksdorf et al., 2022 | 1. One treatment group: T1 interacted with different robots in learning. 2. One control group: C1 interacted with the same robot in learning. | Preschool native German speakers | Quantitative | The children did not retrieve words differently in the two conditions. | Treatment and immediate post-test
29 | Vogt et al., 2019 | 1. Three treatment groups: T1 learned from a robot with iconic gestures, and a tablet. T2 learned from a robot without iconic gestures, and a tablet. T3 used the tablet only. 2. One control group: C1 was not exposed to the learning content. | Native speakers of Dutch in primary schools | Quantitative | The treatment groups scored higher than the control group in all tasks, but there was no significant difference among the experimental conditions. | Pre-test, treatment, immediate and delayed post-tests
30 | Wang et al., 2013 | 1. One treatment group: T1 used the tangible learning companions to practice English conversations in groups. 2. One control group: C1 used the traditional learning method to practice English conversations with classmates. | Fifth graders in an elementary school of Taiwan | Quantitative | T1 outperformed C1 in speaking ability. | Pre-test, treatment, and immediate post-test
31 | Wu et al., 2015 | 1. One treatment group: T1 used a robot in classroom learning. 2. One control group: C1 did not use the robot in classroom learning. | Elementary school students in Taiwan | Mixed | T1 achieved significantly higher learning outcomes than C1. | Treatment and immediate post-test
32 | Yueh et al., 2020 | 1. Two treatment groups: T1 co-read with a library robot. T2 co-read with a human reading companion. 2. One control group: C1 read the storybook alone, without any reading companion. | Third graders in an elementary school of Taiwan | Mixed | No significant difference was found among the three groups. C1 outperformed T1 and T2 in literal comprehension. | Treatment and immediate post-test
Notes. T1 refers to Treatment Group One. T2 refers to Treatment Group Two. T3 refers to Treatment Group Three. C1 refers to Control Group One.
Table A4
Summary of studies primarily focusing on language learning development in human-AI interaction in Paradigm Two.

No. | Study | Interaction mode | Participants | Research method | Findings | Main research instruments
1 | Belda-Medina and Calvo-Ferrer, 2022 | The participants interacted with three conversational agents as future educators out of class. | Teacher candidates in Spain and Poland | Mixed | 1. The participants did not have much knowledge about chatbots. 2. The important factors in interaction included lexical richness, semantic coherence, chatbot adaptivity to changing conditions, and personalization. 3. There were positive perceived usefulness, easiness, and attitudes towards chatbots, but a moderate interest in using them. | Mixed methods including qualitative open-ended questions, and quantitative pre- and post-tests
2 | Conti et al., 2020 | The children listened to two tales narrated by a robot, and drew all the details they could recall without time limits. | Kindergarten children in Italy | Mixed | The children could memorize more details of a tale when the robot narrated with an expressive social behavior. | Students' drawings of two tales
3 | El Shazly, 2021 | The students had oral and written dialogic interactions with chatbots. | Egyptian EFL university students | Mixed | The students' speech-related anxieties were not reduced after the interactions. AI chatbots could facilitate improved interaction and oral communication. | Qualitative and quantitative data including content analysis and pre- and post-tests
4 | Fridin, 2014 | An interactive robot told prerecorded stories to children, while incorporating songs and motor activities. | Kindergarten children in Israel | Quantitative | The children enjoyed interacting with the interactive robot, and their enjoyment was maintained over time. | Eye contact and affective factor to measure children's interaction level
5 | Hsieh and Lee, 2021 | The students used a robot for storytelling presentations by selecting images, writing storylines, preparing audio narration, and designing movements of the robot in role play. | Ninth graders in Taiwan | Mixed | The students benefited from the digital storytelling presentation mode in positive emotions, grit, and learning achievements. | Students' reflective journals, questionnaires
6 | Kim et al., 2019 | Participants were instructed to teach a humanoid robot how to write Kazakh words. | Primary school students in Kazakhstan | Mixed | Children's learning performance was significantly different in the two conditions. Boys learned more letters than girls. | Questionnaires and interviews about students' attitudes and emotions
7 | Kory and Breazeal, 2014 | A robot played a storytelling game as a peer with children, while introducing language knowledge. | Preschool-aged children in the U.S. | Mixed | Using a storytelling game could facilitate the interactive nature of language learning. Matching or mis-matching the robot's language to the children's was important in the process. | Children's transcribed stories and observations
8 | Li et al., 2021 | The students took the test by examining pictures, listening to questions asked by the robot, and then answering. | University students learning Chinese as a second foreign language in Japan | Mixed | It was possible to employ a robot as a tutor and oral test proctor. However, the robot did not respond to students' monosyllabic answers well. Students treated the robot as a human, and even as a teacher. | Observations of videos and of the students' comments, and questionnaires
9 | Miranty and Widiati, 2021 | The students used Grammarly in writing. | Undergraduate students in Indonesia | Mixed | There was no difference among student cohorts in their perception that using Grammarly was considered necessary to compose and revise writing. | Interviews and questionnaires
10 | Tanaka et al., 2014 | The children participated in a story-telling lesson guided by a remote teacher who spoke English. A telepresence robot was placed on the teacher's side. | Japanese pre-school children | Mixed | The robot significantly improved the children's response rate in playing the game. Regarding the impact on children's learning, a single 5–10 min session was not sufficient. | Interviews and video observations of both students' and teachers' behaviors
11 | Wang, 2020 | The students wrote and submitted four essays on three automated essay evaluation systems, and two teacher raters were invited to score the first and last essays of each student. | Chinese university students who learned the advanced English reading and writing course | Mixed | The students held high expectations for the computer-assisted evaluation tools. The effectiveness of computer scoring feedback was higher than that of the teachers'. Students' independent learning ability and English writing ability were significantly improved. | Observations, semi-structured interviews, and questionnaires
12 | Wang et al., 2022 | The students practiced shadowing speaking with an AI coach as homework and learning in class. | Chinese primary school students | Quantitative | Cognitive presence and students' affection for the AI's appearance significantly predicted language learning enjoyment. Teaching presence negatively predicted learning outcomes. | Students' learning outcomes, AI usage data, and attitudinal data

Table A5
Summary of studies primarily focusing on language learning development in human-AI interaction in Paradigm Three.

No. | Study | Interaction mode | Participants | Research method | Findings | Main research instruments
1 | Hanlon et al., 2021 | Students wrote two reflective essays to describe their perception of their learning, and their experience as a member of their learning team. Their essays were rated by faculty and the Academic Writing Analytics AI system. | First year medical students | Quantitative | Data from faculty or the AI system alone were not sufficient to evaluate reflective essays. Combining the two methods could deepen understanding of the students' reflection. | Visualizing the data in an individualized way, quantitative analysis of students' metacognitive processes
2 | Hsu and Liang, 2021 | Students used robots to perform cooperative learning in groups in an educational board game. | Third graders in an elementary school | Quantitative | The cooperation tendency and critical thinking of the students improved significantly. Their English proficiency in vocabulary and sentences was also improved. | Pre-test, treatment, post-test, and questionnaires
3 | Palermo and Thomson, 2018 | Students received self-regulated strategy instruction from their classroom teacher while using the automated writing evaluation system to complete interactive lessons, write essays, and receive feedback. | Racially and socioeconomically diverse middle school students in districts of North Carolina | Mixed | The students produced longer essays of a higher quality, and included more basic elements of argumentative essays, than those who learned with traditional writing instruction on the system, and with traditional writing instruction only. | Surveys, interviews, and tests
4 | Park et al., 2011 | Students interacted with the robot to learn class content in an after-school English program. | Elementary school students | Mixed | The students and teachers were satisfied with the robot. The interaction remained high. Native-speaker teachers felt the robot was their competitor. Special attention should be paid to the etiquette of playing with the robot, technical failures, and the novelty effects of the robot. | Interviews with the robot developers, observations from field trials, and learning achievements
5 | Wilson and Roscoe, 2020 | Students completed at least three writing prompt assignments using either an automated writing evaluation system or Google Docs. | Middle school students | Mixed | The automated writing evaluation system improved the students' writing self-efficacy and learning performance. | Interviews and tests

References

Ahmad, M. I., Mubin, O., Shahid, S., & Orlando, J. (2019). Robot’s adaptive emotional feedback sustains children’s social engagement and promotes their vocabulary
learning: A long-term child–robot interaction study. Adaptive Behavior, 27(4), 243–266. https://doi.org/10.1177/1059712319844182
Ai, H. (2017). Providing graduated corrective feedback in an intelligent computer-assisted language learning environment. ReCALL, 29(3), 313–334. https://doi.org/
10.1017/S095834401700012X
Al Hakim, V. G., Yang, S.-H., Liyanawatta, M., Wang, J.-H., & Chen, G.-D. (2022). Robots in situated learning classrooms with immediate feedback mechanisms to
improve students’ learning performance. Computers & Education, 182, Article 104483. https://doi.org/10.1016/j.compedu.2022.104483
Alemi, M., & Haeri, N. (2020). Robot-assisted instruction of L2 pragmatics: Effects on young EFL learners’ speech act performance. Language Learning & Technology, 24(2), 86–103. http://hdl.handle.net/10125/44727.
Alemi, M., Meghdari, A., & Ghazisaedy, M. (2014). Employing humanoid robots for teaching English language in Iranian junior high-schools. International Journal of
Humanoid Robotics, 11(3), Article 1450022. https://doi.org/10.1142/s0219843614500224
Alemi, M., Meghdari, A., & Ghazisaedy, M. (2015). The impact of social robotics on L2 learners’ anxiety and attitude in English vocabulary acquisition. International
Journal of Social Robotics, 7(4), 523–535. https://doi.org/10.1007/s12369-015-0286-y
Al-Kaisi, A. N., Arkhangelskaya, A. L., & Rudenko-Morgun, O. I. (2021). The didactic potential of the voice assistant “Alice” for students of a foreign language at a
university. Education and Information Technologies, 26(1), 715–732. https://doi.org/10.1007/s10639-020-10277-2
Balkibekov, K., Meiirbekov, S., Tazhigaliyeva, N., & Sandygulova, A. (2016). Should robots win or lose? Robot’s losing playing strategy positively affects child
learning. In 2016 25th IEEE international symposium on robot and human interactive communication (pp. 706–711). IEEE. https://doi.org/10.1109/
roman.2016.7745196.
Banaeian, H., & Gilanlioglu, I. (2021). Influence of the Nao robot as a teaching assistant on university students’ vocabulary learning and attitudes. Australasian Journal
of Educational Technology, 71–87. https://doi.org/10.14742/ajet.6130
Bandura, A. (2006). Toward a psychology of human agency. Perspectives on Psychological Science, 1(2), 164–180. https://doi.org/10.1111/j.1745-6916.2006.00011.x
Bartneck, C., & Forlizzi, J. (2004). A design-centred framework for social human-robot interaction. In Proceedings of the ro-man 2004 (pp. 591–594). IEEE. https://doi.
org/10.1109/ROMAN.2004.1374827.
Behrens, H. (2021). Constructivist approaches to first language acquisition. Journal of Child Language, 48(5), 959–983. https://doi.org/10.1017/s0305000921000556
Belda-Medina, J., & Calvo-Ferrer, J. R. (2022). Using chatbots as AI conversational partners in language learning. Applied Sciences, 12(17), 8427. https://doi.org/
10.3390/app12178427
Berkowitz, R., Moore, H., Astor, R. A., & Benbenishty, R. (2017). A research synthesis of the associations between socioeconomic background, inequality, school
climate, and academic achievement. Review of Educational Research, 87(2), 425–469. https://doi.org/10.3102/0034654316669821
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2021). Introduction to meta-analysis. John Wiley & Sons.
Bowlby, J. (2005). A secure base: Clinical applications of attachment theory. Taylor & Francis.


Bullough, R. V., & Goldstein, S. L. (1984). Technical curriculum form and American elementary-school art education. Journal of Curriculum Studies, 16, 143–154.
Burgelman, R. A. (2011). Bridging history and reductionism: A key role for longitudinal qualitative research. Journal of International Business Studies, 42, 591–601.
https://doi.org/10.1057/jibs.2011.12
Capel, T., & Brereton, M. (2023). What is human-centered about human-centered AI? A map of the research landscape. Proceedings of the 2023 CHI Conference on
Human Factors in Computing Systems, 359, 1–23. https://doi.org/10.1145/3544548.3580959
Chai, C. S., & Kong, S. C. (2017). Professional learning for 21st century education. Journal of Computers in Education, 4, 1–4. https://doi.org/10.1007/s40692-016-
0069-y
Chen, C. H., Koong, C. S., & Liao, C. (2022). Influences of integrating dynamic assessment into a speech recognition learning design to support students’ English
speaking skills, learning anxiety and cognitive load. Educational Technology & Society, 25(1), 1–14. https://www.jstor.org/stable/48647026.
Chen, J., Chen, Y., & Lin, J. (2021). Application of Internet of Things intelligent image-positioning studio classroom in English teaching. Journal of High Speed
Networks, 27(3), 279–289. https://doi.org/10.3233/jhs-210667
Cheung, A. C., & Slavin, R. E. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45(5), 283–292. https://doi.org/10.3102/
0013189x16656615
Chi, M. T., & Wylie, R. (2014). The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4), 219–243. https://doi.
org/10.1080/00461520.2014.965823
Clark, D. B., Tanner-Smith, E. E., & Killingsworth, S. S. (2016). Digital games, design, and learning: A systematic review and meta-analysis. Review of Educational Research, 86(1), 79–122. https://doi.org/10.3102/0034654315582065
Conti, D., Cirasa, C., Di Nuovo, S., & Di Nuovo, A. (2020). Robot, tell me a tale. Interaction Studies, 21(2), 220–242. https://doi.org/10.1075/is.18024.con
Cooper, H. (2017). Research synthesis and meta-analysis: A step-by-step approach (5th ed.). Sage.
Corbin, J. M., & Strauss, A. (1990). Grounded theory research: Procedures, canons, and evaluative criteria. Qualitative Sociology, 13(1), 3–21. https://doi.org/
10.1007/bf00988593
Corbin, J. M., & Strauss, A. (2008). Basics of qualitative research: Techniques and procedures for developing grounded theory (3rd ed.). Sage.
Crook, C. (1998). Children as computer users: The case of collaborative learning. Computers & Education, 30(3/4), 237–247. https://doi.org/10.1016/S0360-1315(97)
00067-5
Damodaran, L. (1996). User involvement in the systems design process-A practical guide for users. Behaviour & Information Technology, 15(6), 363–377. https://doi.
org/10.1080/014492996120049
De Bot, K., Lowie, W., Thorne, S. L., & Verspoor, M. (2013). Dynamic Systems Theory as a comprehensive theory of second language development. Contemporary
Approaches to Second Language Acquisition, 199–220. https://doi.org/10.1075/aals.9.13ch10
De Bot, K., Lowie, W., & Verspoor, M. (2007). A dynamic systems theory approach to second language acquisition. Bilingualism: Language and Cognition, 10(1), 7–21.
https://doi.org/10.1017/s1366728906002732
Deng, L., Zhou, Y., Cheng, T., Liu, X., Xu, T., & Wang, X. (2022). My English teachers are not human but I like them: Research on virtual teacher self-study learning
system in K12. Learning and Collaboration Technologies. Novel Technological Environments, 176–187. https://doi.org/10.1007/978-3-031-05675-8_14
Dixson, D. D., & Worrell, F. C. (2016). Formative and summative assessment in the classroom. Theory Into Practice, 55(2), 153–159. https://doi.org/10.1080/00405841.2016.1148989
Donahoe, E. (2018). Human-centered AI: Building trust, democracy and human rights by design. Stanford GDPi https://medium.com/stanfords-gdpi/human-
centered-ai-building-trust-democracy-and-human-rights-by-design-2fc14a0b48af.
El Shazly, R. (2021). Effects of artificial intelligence on English speaking anxiety and speaking performance: A case study. Expert Systems, 38(3). https://doi.org/
10.1111/exsy.12667
Estes, M. D., Beverly, C. L., & Castillo, M. (2020). Designing for accessibility: The intersection of instructional design and disability. Handbook of Research in
Educational Communications and Technology, 205–227. https://doi.org/10.1007/978-3-030-36119-8_8
Farooq, U., & Grudin, J. (2016). Human computer integration. Interactions, 23(6), 26–32. https://doi.org/10.1145/3001896
Fridin, M. (2014). Storytelling by a kindergarten social assistive robot: A tool for constructive learning in preschool education. Computers & Education, 70, 53–64.
https://doi.org/10.1016/j.compedu.2013.07.043
Fu, S., Gu, H., & Yang, B. (2020). The affordances of AI-enabled automatic scoring applications on learners’ continuous learning intention: An empirical study in
China. British Journal of Educational Technology, 51(5), 1674–1692. https://doi.org/10.1111/bjet.12995
Gibbs, G., & Simpson, C. (2005). Conditions under which assessment supports students’ learning. Learning and Teaching in Higher Education, (1), 3–31.
Godwin-Jones, R. (2023). Emerging spaces for language learning: AI bots, ambient intelligence, and the metaverse. Language Learning & Technology, 27(2), 6–27. https://hdl.handle.net/10125/73501.
Grant, B. M., & Giddings, L. S. (2002). Making sense of methodologies: A paradigm framework for the novice researcher. Contemporary Nurse, 13(1), 10–28. https://
doi.org/10.5172/conu.13.1.10
Gray, R. (2019). Meaningful interaction: Toward a new theoretical approach to online instruction. Technology, Pedagogy and Education, 28(4), 473–484. https://doi.
org/10.1080/1475939X.2019.1635519
Grundy, S. (1987). Curriculum: Product or praxis. Falmer Press.
Hakimi, L., Eynon, R., & Murphy, V. A. (2021). The ethics of using digital trace data in education: A thematic review of the research landscape. Review of Educational
Research, 91(5), 671–717. https://doi.org/10.3102/00346543211020116
Hancock, J. T., Naaman, M., & Levy, K. (2020). AI-mediated communication: Definition, research agenda, and ethical considerations. Journal of Computer-Mediated
Communication, 25(1), 89–100. https://doi.org/10.1093/jcmc/zmz022
Hanlon, C. D., Frosch, E. M., Shochet, R. B., Buckingham Shum, S. J., Gibson, A., & Goldberg, H. R. (2021). Recognizing reflection: Computer-assisted analysis of first
year medical students’ reflective writing. Medical Science Educator, 31(1), 109–116. https://doi.org/10.1007/s40670-020-01132-7
Hawking, S., Musk, E., Wozniak, S., Tallinn, J., Wilczek, F., Tegmark, M., … Desai, C. (2015, July 28). Autonomous weapons open letter: AI & robotics researchers. Future
of Life Institute. https://futureoflife.org/open-letter/open-letter-autonomous-weapons-ai-robotics/.
Hong, Z. W., Huang, Y. M., Hsu, M., & Shen, W. W. (2016). Authoring robot-assisted instructional materials for improving learning performance and motivation in EFL
classrooms. Journal of Educational Technology & Society, 19(1), 337–349. https://www.jstor.org/stable/pdf/jeductechsoci.19.1.337.pdf.
Hsiao, H.-S., Chang, C.-S., Lin, C.-Y., & Hsu, H.-L. (2015). “iRobiQ”: The influence of bidirectional interaction on kindergarteners’ reading motivation, literacy, and
behavior. Interactive Learning Environments, 23(3), 269–292. https://doi.org/10.1080/10494820.2012.745435
Hsieh, C. J., & Lee, J. S. (2021). Digital storytelling outcomes, emotions, grit, and perceptions among EFL middle school learners: Robot-assisted versus PowerPoint-
assisted presentations. Computer Assisted Language Learning, 1–28. https://doi.org/10.1080/09588221.2021.1969410
Hsu, T.-C., & Liang, Y.-S. (2021). Simultaneously improving computational thinking and foreign language learning: Interdisciplinary media with plugged and
unplugged approaches. Journal of Educational Computing Research, 59(6), 1184–1207. https://doi.org/10.1177/0735633121992480
Hung, H.-T., Yang, J. C., Hwang, G.-J., Chu, H.-C., & Wang, C.-C. (2018). A scoping review of research on digital game-based language learning. Computers &
Education, 126, 89–104. https://doi.org/10.1016/j.compedu.2018.07.001
Hwang, G.-J., Xie, H., Wah, B. W., & Gašević, D. (2020). Vision, challenges, roles and research issues of artificial intelligence in education. Computers & Education:
Artificial Intelligence, 1, Article 100001. https://doi.org/10.1016/j.caeai.2020.100001
Hyun, E. J., Kim, S. Y., Jang, S., & Park, S. (2008). Comparative study of effects of language instruction program using intelligence robot and multimedia on linguistic ability of young children. In RO-MAN 2008 – the 17th IEEE international symposium on robot and human interactive communication (pp. 187–192). IEEE.
Jeon, J. (2021). Chatbot-assisted dynamic assessment (CA-DA) for L2 vocabulary learning and diagnosis. Computer Assisted Language Learning, 1–27. https://doi.org/
10.1080/09588221.2021.1987272
Johnson, M., & Vera, A. (2019). No AI is an island: The case for teaming intelligence. AI Magazine, 40(1), 16–28. https://doi.org/10.1609/aimag.v40i1.2842


Kaufman, D. (2004). Constructivist issues in language learning and teaching. Annual Review of Applied Linguistics, 24, 303–319. https://doi.org/10.1017/
s0267190504000121
Kay, J., & Kummerfeld, B. (2019). From data to personal user models for life-long, life-wide learners. British Journal of Educational Technology, 50(6), 2871–2884.
https://doi.org/10.1111/bjet.12878
Kennedy, J., Baxter, P., Senft, E., & Belpaeme, T. (2016). Social robot tutoring for child second language learning. In 2016 11th ACM/IEEE international conference on human-robot interaction (pp. 231–238). IEEE. https://doi.org/10.1109/hri.2016.7451757.
Kim, A., Omarova, M., Zhaksylyk, A., Asselborn, T., Johal, W., Dillenbourg, P., & Sandygulova, A. (2019). Cowriting Kazakh: Transitioning to a new Latin script using social robots. In 2019 28th IEEE international conference on robot and human interactive communication (RO-MAN). IEEE. https://doi.org/10.1109/ro-man46459.2019.8956471
Kim, H., & So, K. K. F. (2022). Two decades of customer experience research in hospitality and tourism: A bibliometric analysis and thematic content analysis. International Journal of Hospitality Management, 100, Article 103082. https://doi.org/10.1016/j.ijhm.2021.103082
Kim, N.-Y. (2017). Effects of types of voice-based chat on EFL students’ negotiation of meaning according to proficiency levels. English Teaching, 72(1), 159–181. https://doi.org/10.15858/engtea.72.1.201703.159
Kivunja, C., & Kuyini, A. B. (2017). Understanding and applying research paradigms in educational contexts. International Journal of Higher Education, 6(5), 26–41.
https://doi.org/10.5430/ijhe.v6n5p26
Konijn, E. A., Jansen, B., Mondaca Bustos, V., Hobbelink, V. L., & Preciado Vanegas, D. (2022). Social robots for (second) language learning in (migrant) primary
school children. International Journal of Social Robotics, 14(3), 827–843. https://doi.org/10.1007/s12369-021-00824-3
Kory, J., & Breazeal, C. (2014). Storytelling with robots: Learning companions for preschool children’s language development. In The 23rd IEEE international symposium on robot and human interactive communication. IEEE. https://doi.org/10.1109/roman.2014.6926325
Krippendorff, K. (2005). The semantic turn: A new foundation for design. CRC Press.
Larsen-Freeman, D. (2006). Second language acquisition and the issue of fossilization: There is no end, and there is no state. Studies of Fossilization in Second Language
Acquisition, 189–200. https://doi.org/10.21832/9781853598371-012
Lee, H., & Lee, J. H. (2022). The effects of robot-assisted language learning: A meta-analysis. Educational Research Review, 35, Article 100425. https://doi.org/
10.1016/j.edurev.2021.100425
Leeuwestein, H., Barking, M., Sodacı, H., Oudgenoeg-Paz, O., Verhagen, J., Vogt, P., … Leseman, P. (2021). Teaching Turkish-Dutch kindergartners Dutch vocabulary
with a social robot: Does the robot’s use of Turkish translations benefit children’s Dutch vocabulary learning? Journal of Computer Assisted Learning, 37(3),
603–620. https://doi.org/10.1111/jcal.12510.
Li, F. F. (2018, March 7). How to make A.I. that’s good for people. The New York Times. https://www.nytimes.com/2018/03/07/opinion/artificial-intelligence-human.
html.
Li, H., Yang, D., & Shiota, Y. (2021). Exploring the possibility of using a humanoid robot as a tutor and oral test proctor in Chinese as a foreign language. Expanding
Global Horizons Through Technology Enhanced Language Learning, 113–129. https://doi.org/10.1007/978-981-15-7579-2_6
Liang, J.-C., Hwang, G.-J., Chen, M.-R. A., & Darmawansah, D. (2021). Roles and research foci of artificial intelligence in language education: An integrated
bibliographic analysis and systematic review approach. Interactive Learning Environments, 1–27. https://doi.org/10.1080/10494820.2021.1958348
Lin, V., Yeh, H.-C., & Chen, N.-S. (2022). A systematic review on oral interactions in robot-assisted language learning. Electronics, 11(2), 290. https://doi.org/
10.3390/electronics11020290
Lindberg, R., McDonough, K., & Trofimovich, P. (2021). Investigating verbal and nonverbal indicators of physiological response during second language interaction.
Applied PsychoLinguistics, 42(6), 1403–1425. https://doi.org/10.1017/S014271642100028X
Liu, R., & Yin, H. (2022). Three approaches to the inquiry into teacher identity: A narrative review enlightened by Habermas’s human interests. ECNU Review of Education. https://doi.org/10.1177/20965311221106224
Luckin, R., Holmes, W., Griffiths, M., & Forcier, L. B. (2016). Intelligence unleashed: An argument for AI in education. Pearson Education.
MacDonald, K., & Frank, M. C. (2016). When does passive learning improve the effectiveness of active learning? In A. Papafragou, D. Grodner, D. Mirman, &
J. Trueswell (Eds.), Proceedings of the 38th annual conference of the cognitive science society (pp. 2459–2464). Austin.
Mageira, K., Pittou, D., Papasalouros, A., Kotis, K., Zangogianni, P., & Daradoumis, A. (2022). Educational AI chatbots for content and language integrated learning.
Applied Sciences, 12(7), 3239. https://doi.org/10.3390/app12073239
McCarthy, K. S., Roscoe, R. D., Likens, A. D., & McNamara, D. S. (2019). Checking it twice: Does adding spelling and grammar checkers improve essay quality in an
automated writing tutor? Lecture Notes in Computer Science, 270–282. https://doi.org/10.1007/978-3-030-23204-7_23
Meiirbekov, S., Balkibekov, K., Jalankuzov, Z., & Sandygulova, A. (2016). “You win, I lose”: Towards adapting Robot’s teaching strategy. In 2016 11th ACM/IEEE
international conference on human-robot interaction (pp. 475–476). IEEE. https://doi.org/10.1109/hri.2016.7451813.
Mezirow, J. (1981). A critical theory of adult learning and education. Adult Education, 32, 3–24.
Miranty, D., & Widiati, U. (2021). An automated writing evaluation (AWE) in higher education: Indonesian EFL students’ perceptions about grammarly use across
student cohorts. Pegem Journal of Education and Instruction, 11(4), 126–137. https://doi.org/10.47750/pegegog.11.04.12
Mousavinasab, E., Zarifsanaiey, N., Niakan Kalhori, S. R., Rakhshan, M., Keikha, L., & Saeedi, M. G. (2021). Intelligent tutoring systems: A systematic review of
characteristics, applications, and evaluation methods. Interactive Learning Environments, 29(1), 142–163. https://doi.org/10.1080/10494820.2018.1558257
Moussalli, S., & Cardoso, W. (2020). Intelligent personal assistants: Can they understand and be understood by accented L2 learners? Computer Assisted Language
Learning, 33(8), 865–890. https://doi.org/10.1080/09588221.2019.1595664
Nong, L., Liu, G., & Tan, C. (2021). An empirical study on the implementation of AI assisted language teaching for improving learner’s learning ability. In 2021 tenth international conference of educational innovation through technology (EITT) (pp. 215–221). IEEE. https://doi.org/10.1109/eitt53287.2021.00050
Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M. E., … Staab, S. (2020). Bias in data-driven artificial intelligence systems—An introductory
survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3), e1356. https://doi.org/10.1002/widm.1356.
Nunan, D. (1999). Second language teaching & learning. Heinle & Heinle Publishers.
Obari, H., & Lambacher, S. (2019). Improving the English skills of native Japanese using artificial intelligence in a blended learning program. CALL and
Complexity–Short Papers from EUROCALL, 327–333. https://doi.org/10.14705/rpnet.2019.38.1031
Obari, H., Lambacher, S., & Kikuchi, H. (2020). The impact of using AI and VR with blended learning on English as a foreign language teaching. CALL for Widening
Participation: Short Papers from EUROCALL 2020, 253–258. https://doi.org/10.14705/rpnet.2020.48.1197
Ouyang, F., & Jiao, P. (2021). Artificial intelligence in education: The three paradigms. Computers & Education: Artificial Intelligence, 2, Article 100020. https://doi.
org/10.1016/j.caeai.2021.100020
Page, M. J., McKenzie, J. E., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., … Moher, D. (2021). Updating guidance for reporting systematic reviews:
Development of the PRISMA 2020 statement. Journal of Clinical Epidemiology, 134, 103–112. https://doi.org/10.1016/j.jclinepi.2021.02.003.
Palermo, C., & Thomson, M. M. (2018). Teacher implementation of self-regulated strategy development with an automated writing evaluation system: Effects on the
argumentative writing performance of middle school students. Contemporary Educational Psychology, 54, 255–270. https://doi.org/10.1016/j.
cedpsych.2018.07.002
Park, S. J., Han, J. H., Kang, B. H., & Shin, K. C. (2011). Teaching assistant robot, ROBOSEM, in English class and practical issues for its diffusion. Advanced Robotics
and Its Social Impacts, 8–11. https://doi.org/10.1109/arso.2011.6301971
Park, Y. S., Konge, L., & Artino, A. R. (2020). The positivism paradigm of research. Academic Medicine, 95(5), 690–694. https://doi.org/10.1097/
ACM.0000000000003093
Pica, T., Lincoln-Porter, F., Paninos, D., & Linnell, J. (1996). Language learners’ interaction: How does it address the input, output, and feedback needs of L2 learners? TESOL Quarterly, 30(1), 59–84. https://doi.org/10.2307/3587607
Pokrivcakova, S. (2019). Preparing teachers for the application of AI-powered technologies in foreign language education. Journal of Language and Cultural Education,
7(3), 135–153. https://doi.org/10.2478/jolace-2019-0025


Pouresmaeil, A., & Vali, M. (2023). The effects of incidental focus on form on learning vocabulary, grammar, and pronunciation. Language Teaching Research, 1–24.
https://doi.org/10.1177/13621688231185419
Rafal, C. T. (1996). From co-construction to takeovers: Science talk in a group of four girls. The Journal of the Learning Sciences, 5(3), 279–293. https://doi.org/
10.1207/s15327809jls0503_5
Rajagopal, A., & Vedamanickam, N. (2019). New approach to human AI interaction to address digital divide & AI divide: Creating an interactive AI platform to
connect teachers & students. In 2019 IEEE international conference on electrical, computer and communication technologies (ICECCT) (pp. 1–6). IEEE. https://doi.org/
10.1109/icecct.2019.8869174.
Rapp, A., Curti, L., & Boldi, A. (2021). The human side of human-chatbot interaction: A systematic literature review of ten years of research on text-based chatbots.
International Journal of Human-Computer Studies, 151, Article 102630. https://doi.org/10.1016/j.ijhcs.2021.102630
Remland, M. S. (2016). Nonverbal communication in everyday life (4th ed.). Sage.
Richardson, M., & Clesham, R. (2021). Rise of the machines? The evolving role of artificial intelligence (AI) technologies in high stakes assessment. London Review of Education, 19(1), 1–13. https://doi.org/10.14324/lre.19.1.09
Rist, R. C. (1977). On the relations among educational research paradigms: From disdain to detente. Anthropology & Education Quarterly, 8(2), 42–49. https://www.
jstor.org/stable/3216405.
Robert, L. P., Bansal, G., & Lütge, C. (2020). ICIS 2019 SIGHCI workshop panel report: Human–computer interaction challenges and opportunities for fair, trustworthy and ethical artificial intelligence. AIS Transactions on Human-Computer Interaction, 96–108. https://doi.org/10.17705/1thci.00130
Ruan, S., Jiang, L., Xu, Q., Liu, Z., Davis, G. M., Brunskill, E., & Landay, J. A. (2021). Englishbot: An AI-powered conversational system for second language learning.
26th International Conference on Intelligent User Interfaces, 434–444. https://doi.org/10.1145/3397481.3450648
Russell, S., Dewey, D., & Tegmark, M. (2015). Research priorities for robust and beneficial artificial intelligence. AI Magazine, 36(4), 105–114. https://doi.org/
10.1609/aimag.v36i4.2577
Schodde, T., Hoffmann, L., Stange, S., & Kopp, S. (2019). Adapt, explain, engage—a study on how social robots can scaffold second-language learning of children.
ACM Transactions on Human-Robot Interaction, 9(1), 1–27. https://doi.org/10.1145/3366422
Shneiderman, B. (2020). Human-centered artificial intelligence: Three fresh ideas. AIS Transactions on Human-Computer Interaction, 12(3), 109–124. https://doi.org/
10.17705/1thci.00131
Shneiderman, B. (2021). Design lessons from AI’s two grand goals: Human emulation and useful applications. IEEE Transactions on Technology and Society, 1(2), 73–82.
https://doi.org/10.1109/tts.2020.2992669
Shneiderman, B. (2022). Human-centered AI. Oxford University Press.
Simon, E., & Taverniers, M. (2011). Advanced EFL learners’ beliefs about language learning and teaching: A comparison between grammar, pronunciation, and vocabulary. English Studies, 92(8), 896–922. https://doi.org/10.1080/0013838X.2011.604578
Smutny, P., & Schreiberova, P. (2020). Chatbots for learning: A review of educational chatbots for the facebook messenger. Computers & Education, 151, Article
103862. https://doi.org/10.1016/j.compedu.2020.103862
Steuer, T., Filighera, A., Tregel, T., & Miede, A. (2022). Educational automatic question generation improves reading comprehension in non-native speakers: A
learner-centric case study. Frontiers in Artificial Intelligence, 5. https://doi.org/10.3389/frai.2022.900304
Strijbos, J. W., Martens, R. L., & Jochems, W. M. G. (2004). Designing for interaction: Six steps to designing computer-supported group-based learning. Computers &
Education, 42(4), 403–424. https://doi.org/10.1016/j.compedu.2003.10.004
Su, Z., Liu, M., Jiang, M., & Shang, Y. (2020). Influence of “corpus data driven learning + learning driven data” mode on ESP writing under the background of
artificial intelligence. Application of Intelligent Systems in Multi-Modal Information Analytics, 547–552. https://doi.org/10.1007/978-3-030-51431-0_80
Sundar, S. S. (2020). Rise of machine agency: A framework for studying the psychology of human–AI interaction (HAII). Journal of Computer-Mediated Communication,
25(1), 74–88. https://doi.org/10.1093/jcmc/zmz026
Tanaka, F., Takahashi, T., Matsuzoe, S., Tazawa, N., & Morita, M. (2014). Telepresence robot helps children in communicating with teachers who speak a different
language. In Proceedings of the 2014 ACM/IEEE international conference on human-robot interaction (pp. 399–406). IEEE. https://doi.org/10.1145/
2559636.2559654.
Te’eni, D. (2006). Designs that fit: An overview of fit conceptualization in HCI. In P. Zhang, & D. Galletta (Eds.), Human-computer interaction and management
information systems: Foundations (pp. 205–221). M.E. Sharpe.
Tieken, M. C., & Auldridge-Reveles, T. R. (2019). Rethinking the school closure research: School closure as spatial injustice. Review of Educational Research, 89(6),
917–953. https://doi.org/10.3102/0034654319877151
Tolksdorf, N. F., Honemann, D., Viertel, F. E., & Rohlfing, K. J. (2022). Who is that?! Does changing the robot as a learning companion impact preschoolers’ language learning? In 2022 17th ACM/IEEE international conference on human-robot interaction (pp. 1069–1074). IEEE. https://doi.org/10.1109/hri53351.2022.9889420
Ültay, N., & Çalık, M. (2012). A thematic review of studies into the effectiveness of context-based chemistry curricula. Journal of Science Education and Technology, 21
(6), 686–701. https://doi.org/10.1007/s10956-011-9357-5
Virvou, M. (2022). The emerging era of human-AI interaction: Keynote address. In 2022 13th international conference on information, intelligence, systems & applications
(IISA) (pp. 1–10). IEEE. https://doi.org/10.1109/iisa56318.2022.9904422.
Vogt, P., van den Berghe, R., de Haas, M., Hoffman, L., Kanero, J., Mamus, E., Montanier, J.-M., Oranc, C., Oudgenoeg-Paz, O., García, D. H., Papadopoulos, F., Schodde, T., Verhagen, J., Wallbridge, C. D., Willemsen, B., de Wit, J., Belpaeme, T., Goksun, T., Kopp, S., … Pandey, A. K. (2019). Second language tutoring using social robots: A large-scale study. In 2019 14th ACM/IEEE international conference on human-robot interaction (HRI) (pp. 497–505). IEEE.
Vostroknutov, I., Grigoriev, S., & Surat, L. (2021). Search for a new paradigm of education and artificial intelligence. Place and role of artificial intelligence in the new
education system. In 2021 1st international conference on technology enhanced learning in higher education (TELE) (pp. 80–82). https://doi.org/10.1109/
tele52840.2021.9482486
Wang, F., & Cheung, A. C. (2024). Robots’ social behaviors for language learning: A systematic review and meta-analysis. Review of Educational Research, 1–38.
https://doi.org/10.3102/00346543231216437
Wang, X., Liu, Q., Pang, H., Tan, S. C., Lei, J., Wallace, M. P., & Li, L. (2023). What matters in AI-supported learning: A study of human-AI interactions in language
learning using cluster analysis and epistemic network analysis. Computers & Education, 194, Article 104703. https://doi.org/10.1016/j.compedu.2022.104703
Wang, X., Pang, H., Wallace, M. P., Wang, Q., & Chen, W. (2022). Learners’ perceived AI presences in AI-supported language learning: A study of AI as a humanized
agent from community of inquiry. Computer Assisted Language Learning, 1–27. https://doi.org/10.1080/09588221.2022.2056203
Wang, Y. H., Young, S. S. C., & Jang, J. S. R. (2013). Using tangible companions for enhancing learning English conversation. Journal of Educational Technology &
Society, 16(2), 296–309. https://www.jstor.org/stable/pdf/jeductechsoci.16.2.296.pdf.
Wang, Z. (2020). Computer-assisted EFL writing and evaluations based on artificial intelligence: A case from a college reading and writing course. Library Hi Tech, 40(1), 80–97. https://doi.org/10.1108/lht-05-2020-0113
Weimann, T. G., Schlieter, H., & Brendel, A. B. (2022). Virtual coaches: Background, theories, and future research directions. Business & Information Systems
Engineering, 64(4), 515–528. https://doi.org/10.1007/s12599-022-00757-9
What Works Clearinghouse. (2021). Reporting guide for study authors: Group design studies. Institute of Education Sciences. https://ies.ed.gov/ncee/wwc/
ReportingGuide.
Wilson, J., & Roscoe, R. D. (2020). Automated writing evaluation and feedback: Multiple metrics of efficacy. Journal of Educational Computing Research, 58(1), 87–125.
https://doi.org/10.1177/0735633119830764
Wu, W.-C. V., Wang, R.-J., & Chen, N.-S. (2015). Instructional design using an in-house built teaching assistant robot to enhance elementary school English-as-a-
foreign-language learning. Interactive Learning Environments, 23(6), 696–714. https://doi.org/10.1080/10494820.2013.792844
Xu, W., Dainoff, M. J., Ge, L., & Gao, Z. (2023). Transitioning to human interaction with AI systems: New challenges and opportunities for HCI professionals to enable
human-centered AI. International Journal of Human-Computer Interaction, 39(3), 494–518. https://doi.org/10.1080/10447318.2022.2041900


Yeh, H. C., & Lai, W. Y. (2019). Speaking progress and meaning negotiation processes in synchronous online tutoring. System, 81, 179–191. https://doi.org/10.1016/j.
system.2019.01.001
Yenkimaleki, M., & van Heuven, V. J. (2021). Effects of attention to segmental vs. suprasegmental features on the speech intelligibility and comprehensibility of the
EFL learners targeting the perception or production-focused practice. System, 100, Article 102557. https://doi.org/10.1016/j.system.2021.102557
Yueh, H. P., Lin, W., Wang, S. C., & Fu, L. C. (2020). Reading with robot and human companions in library literacy activities: A comparison study. British Journal of
Educational Technology, 51(5), 1884–1900. https://doi.org/10.1111/bjet.13016
Zawacki-Richter, O., Marín, V. I., Bond, M., & Gouverneur, F. (2019). Systematic review of research on artificial intelligence applications in higher education – where
are the educators? International Journal of Educational Technology in Higher Education, 16(39), 1–27. https://doi.org/10.1186/s41239-019-0171-0

