Dialogue system
Latest revision as of 15:09, 9 July 2024
A dialogue system, or conversational agent (CA), is a computer system intended to converse with a human. Dialogue systems employ one or more of text, speech, graphics, haptics, gestures, and other modes for communication on both the input and output channels.

The elements of a dialogue system are not settled, because the idea is still under research;[citation needed] dialogue systems are, however, distinct from chatbots.[1] The typical GUI wizard engages in a sort of dialogue, but it includes very few of the common dialogue system components, and its dialogue state is trivial.
Background

After dialogue systems based only on written text processing, which appeared starting from the early 1960s,[2] the first speaking dialogue system was issued by a DARPA project in the US in 1977.[3] After the end of this five-year project, some European projects issued the first dialogue systems able to speak several languages (including French, German and Italian).[4] These first systems were used in the telecom industry to provide various phone services in specific domains, e.g. automated agenda and train timetable services.
Components

Which components are included in a dialogue system, and how those components divide up responsibilities, differs from system to system. Principal to any dialogue system is the dialogue manager, the component that manages the state of the dialogue and the dialogue strategy. A typical activity cycle in a dialogue system contains the following phases:[5]
- The user speaks, and the input is converted to plain text by the system's input recogniser/decoder, which may include:
  - automatic speech recogniser (ASR)
  - gesture recogniser
  - handwriting recogniser
- The text is analysed by a natural language understanding (NLU) unit, which may include:
  - proper name identification
  - part-of-speech tagging
  - syntactic/semantic parser
- The semantic information is analysed by the dialogue manager, which keeps the history and state of the dialogue and manages the general flow of the conversation.
- Usually, the dialogue manager contacts one or more task managers that have knowledge of the specific task domain.
- The dialogue manager produces output using an output generator, which may include:
  - natural language generator
  - gesture generator
  - layout manager
- Finally, the output is rendered using an output renderer, which may include:
  - text-to-speech engine (TTS)
  - talking head
  - robot or avatar

Dialogue systems that are based on a text-only interface (e.g. text-based chat) contain only stages 2–5.
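The text-only activity cycle (stages 2–5) can be sketched as a short pipeline. Everything below is an illustrative toy: the keyword NLU, the hard-coded task manager answer, and the response templates are assumptions standing in for real components, not a reference implementation.

```python
# Minimal sketch of a text-only dialogue system (stages 2-5 of the cycle).
# All components are toy stand-ins for real NLU/task/NLG modules.

def nlu(text):
    """Stage 2: map raw text to a shallow semantic frame (toy keyword NLU)."""
    text = text.lower()
    if "train" in text:
        return {"intent": "timetable", "domain": "train"}
    if "bye" in text:
        return {"intent": "goodbye"}
    return {"intent": "unknown"}

def task_manager(frame):
    """Domain knowledge source; a real system would query a database."""
    return "The next train leaves at 10:42."

class DialogueManager:
    """Stages 3-4: keep dialogue history and consult the task manager."""
    def __init__(self):
        self.history = []

    def next_action(self, frame):
        self.history.append(frame)
        if frame["intent"] == "timetable":
            return {"act": "inform", "content": task_manager(frame)}
        if frame["intent"] == "goodbye":
            return {"act": "close"}
        return {"act": "clarify"}

def nlg(action):
    """Stage 5: render the chosen dialogue act as a natural-language prompt."""
    templates = {
        "inform": action.get("content", ""),
        "close": "Goodbye!",
        "clarify": "Sorry, could you rephrase that?",
    }
    return templates[action["act"]]

def turn(dm, user_text):
    """One full activity cycle for a single user utterance."""
    return nlg(dm.next_action(nlu(user_text)))
```

A speech-based system would wrap this loop with an ASR front end before stage 2 and a TTS renderer after stage 5.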
Types of systems

Dialogue systems fall into the following categories, listed here along several dimensions. Many of the categories overlap, and the distinctions may not be well established.
- by modality
- by device
- telephone-based systems
- PDA systems
- in-car systems
- robot systems
- desktop/laptop systems
- native
- in-browser systems
- in-virtual machine
- in-virtual environment
- robots
- by style
- command-based
- menu-driven
- natural language
- speech graffiti
- by initiative
- system initiative
- user initiative
- mixed initiative
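The initiative dimension can be made concrete with a toy slot-filling exchange: a system-initiative design asks for exactly one slot per turn, while a mixed-initiative manager also fills any slots the user volunteers in the same utterance. The slot names and the pattern-based parsing below are illustrative assumptions.

```python
import re

# Slots a hypothetical travel task needs, in the order the system asks for them.
SLOTS = ("origin", "destination", "time")

def parse_slots(utterance):
    """Toy parser: pull 'from X', 'to Y', 'at H:MM' out of one utterance."""
    found = {}
    m = re.search(r"\bfrom (\w+)", utterance)
    if m:
        found["origin"] = m.group(1)
    m = re.search(r"\bto (\w+)", utterance)
    if m:
        found["destination"] = m.group(1)
    m = re.search(r"\bat (\d{1,2}:\d{2})", utterance)
    if m:
        found["time"] = m.group(1)
    return found

def mixed_initiative_turn(state, utterance):
    """Fill every slot the user volunteered, then ask for the first gap."""
    state.update(parse_slots(utterance))
    for slot in SLOTS:
        if slot not in state:
            return f"What is the {slot}?"
    return "Booking a trip from {origin} to {destination} at {time}.".format(**state)
```

Under system initiative, the same dialogue would take one turn per slot; here the user can answer two slots at once and the system only asks for what is still missing.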
Natural dialogue systems

"A Natural Dialogue System is a form of dialogue system that tries to improve usability and user satisfaction by imitating human behaviour"[6] (Berg, 2014). It addresses the features of a human-to-human dialogue (e.g. sub-dialogues and topic changes) and aims to integrate them into dialogue systems for human-machine interaction. Often, (spoken) dialogue systems require the user to adapt to the system, because the system can only understand a very limited vocabulary, cannot react to topic changes, and does not allow the user to influence the dialogue flow. Mixed initiative is a way to enable the user to take an active part in the dialogue instead of only answering questions. However, the mere existence of mixed initiative is not sufficient for a system to be classified as a natural dialogue system. Other important aspects include:[6]
- Adaptivity of the system
- Support of implicit confirmation
- Usage of verification questions
- Possibilities to correct information that has already been given
- Over-informativeness (give more information than has been asked for)
- Support of negations
- Understanding of references through discourse and anaphora analysis
- Natural language generation to prevent monotonous and recurring prompts
- Adaptive and situation-aware formulation
- Social behaviour (greetings, the same level of formality as the user, politeness)
- Quality of speech recognition and synthesis
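Two of the aspects above, implicit confirmation and verification questions, differ in how the recognised value is echoed back to the user. A hedged sketch of the choice between them (the confidence threshold and the prompt wording are illustrative assumptions):

```python
def confirmation_prompt(value, confidence, threshold=0.7):
    """Choose a confirmation strategy from a recognition confidence score.

    High confidence -> implicit confirmation: embed the value in the next
    question, so the user can object without a dedicated yes/no turn.
    Low confidence  -> explicit verification question.
    """
    if confidence >= threshold:
        return f"When do you want to travel to {value}?"   # implicit confirmation
    return f"Did you say {value}?"                          # verification question
```

Implicit confirmation keeps the dialogue moving, at the cost of a harder correction sub-dialogue when the recognised value was wrong.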
Although most of these aspects are addressed by many different research projects, there is a lack of tools that support the development of dialogue systems addressing these topics.[7] Apart from VoiceXML, which focuses on interactive voice response systems and is the basis for many spoken dialogue systems in industry (customer support applications), and AIML, which is famous for the A.L.I.C.E. chatbot, none of these integrate linguistic features like dialogue acts or language generation. Therefore, NADIA (a research prototype) gives an idea of how to fill that gap, combining some of the aforementioned aspects like natural language generation, adaptive formulation, and sub-dialogues.
Performance

Some authors measure a dialogue system's performance as the percentage of sentences understood completely correctly, by comparing the recognised sentences against a reference model (this measure is called Concept Sentence Accuracy[8] or Sentence Understanding[4]).
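Under such an all-or-nothing measure, a sentence only counts as correct when its whole recognised semantic representation matches the reference. A minimal sketch, assuming semantic frames are represented as plain dictionaries (an illustrative format, not the one used in the cited papers):

```python
def concept_sentence_accuracy(references, hypotheses):
    """Percentage of sentences whose semantic frame is completely right.

    A sentence scores 1 only if its hypothesis frame matches the reference
    frame exactly; partially correct frames score 0.
    """
    if len(references) != len(hypotheses):
        raise ValueError("reference and hypothesis lists must be parallel")
    correct = sum(ref == hyp for ref, hyp in zip(references, hypotheses))
    return 100.0 * correct / len(references)
```

For example, a hypothesis that gets the intent right but one slot wrong contributes nothing to the score, which is what distinguishes this sentence-level measure from per-concept accuracy.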
Applications

Dialogue systems can support a broad range of applications in business enterprises, education, government, healthcare, and entertainment.[9] For example:
- Responding to customers' questions about products and services via a company's website or intranet portal
- Customer service agent knowledge base: Allows agents to type in a customer's question and guide them with a response
- Guided selling: Facilitating transactions by providing answers and guidance in the sales process, particularly for complex products being sold to novice customers
- Help desk: Responding to internal employee questions, e.g., responding to HR questions
- Website navigation: Guiding customers to relevant portions of complex websites—a Website concierge
- Technical support: Responding to technical problems, such as diagnosing a problem with a product or device
- Personalised service: Conversational agents can leverage internal and external databases to personalise interactions, for example answering questions about account balances, providing portfolio information, or delivering frequent flier or membership information
- Training or education: They can provide problem-solving advice while the user learns
- Simple dialogue systems are widely used to decrease the human workload in call centers. In this and other industrial telephony applications, the functionality provided by dialogue systems is known as interactive voice response or IVR.
- Supporting scientists in data manipulation and analysis tasks, for example in genomics.[10]
In some cases, conversational agents can interact with users using artificial characters. These agents are then referred to as embodied agents.
Toolkits and architectures

A survey of current frameworks, languages and technologies for defining dialogue systems.
| Name & links | System type | Description | Affiliation[s] | Environment[s] | Comments |
|---|---|---|---|---|---|
| AIML | Chatterbot language | XML dialect for creating natural language software agents | Richard Wallace, Pandorabots, Inc. | | |
| ChatScript | Chatterbot language | Language/Engine for creating natural language software agents | Bruce Wilcox | | |
| CSLU Toolkit | | A state-based speech interface prototyping environment | OGI School of Science and Engineering, M. McTear, Ron Cole | | publications are from 1999 |
| NLUI Server | Domain-independent toolkit | Complete multilingual framework for building natural language user interface systems | LinguaSys | | out-of-box support of mixed-initiative dialogues |
| Olympus | | Complete framework for implementing spoken dialogue systems | Carnegie Mellon University | | |
| Nextnova | Multimodal platform | Platform for developing multimodal software applications, based on State Chart XML (SCXML) | Ponvia Technology, Inc. | | |
| VXML (VoiceXML) | Spoken dialogue | Multimodal dialogue markup language | Developed initially by AT&T, then administered by an industry consortium and finally a W3C specification | | primarily for telephony |
| SALT | Markup language | Multimodal dialogue markup language | Microsoft | | "has not reached the level of maturity of VoiceXML in the standards process" |
| Quack.com QXML | Development environment | | Company bought by AOL | | |
| OpenDial | Domain-independent toolkit | Hybrid symbolic/statistical framework for spoken dialogue systems, implemented in Java | University of Oslo | | |
| NADIA | Dialogue engine and dialogue modelling | Creating natural dialogues/dialogue systems; supports dialogue acts, mixed initiative, NLG; implemented in Java | Markus M. Berg | | create XML-based dialogue files, no need to specify grammars; publications are from 2014 |
See also

References

- ^ Klüwer, Tina. "From chatbots to dialog systems." Conversational Agents and Natural Language Interaction: Techniques and Effective Practices. IGI Global, 2011. 1–22.
- ^ McTear, Michael, Zoraida Callejas, and David Griol, The conversational interface: Talking to smart devices, Springer, 2016.
- ^ Giancarlo Pirani (ed), Advanced algorithms and architectures for speech understanding, Vol. 1. Springer Science & Business Media, 2013.
- ^ a b Alberto Ciaramella, A prototype performance evaluation report, Sundial work package 8000 (1993).
- ^ Jurafsky & Martin (2009), Speech and language processing. Pearson International Edition, ISBN 978-0-13-504196-3, Chapter 24
- ^ a b Berg, Markus M. (2014), Modelling of Natural Dialogues in the Context of Speech-based Information and Control Systems, Akademische Verlagsgesellschaft AKA, ISBN 978-3-89838-508-4
- ^ Berg, Markus M. (2015), "NADIA: A Simplified Approach Towards the Development of Natural Dialogue Systems", Natural Language Processing and Information Systems, Lecture Notes in Computer Science, vol. 9103, pp. 144–150, doi:10.1007/978-3-319-19581-0_12, ISBN 978-3-319-19580-3
- ^ Bangalore, Srinivas, and Michael Johnston. "Robust understanding in multimodal interfaces." Computational Linguistics 35.3 (2009): 345-397.
- ^ Lester, J.; Branting, K.; Mott, B. (2004), "Conversational Agents" (PDF), The Practical Handbook of Internet Computing, Chapman & Hall
- ^ Crovari; Pidò; Pinoli; Bernasconi; Canakoglu; Garzotto; Ceri (2021), "GeCoAgent: a conversational agent for empowering genomic data extraction and analysis", ACM Transactions on Computing for Healthcare, 3, ACM New York, NY: 1–29, doi:10.1145/3464383, hdl:11311/1192262, S2CID 245855725
Further reading

- Will, Thomas (2007). Creating a Dynamic Speech Dialogue. VDM Verlag Dr. Müller. ISBN 978-3-8364-4990-8.