We propose and demonstrate the task of giving natural language summaries of the actions of a robotic agent in a virtual environment. We explain why such a task is important, what makes it difficult, and discuss how it might be addressed. To encourage others to work on this, we show how a popular existing dataset that matches robot actions with natural language descriptions designed for an instruction following task can be repurposed to serve as a training ground for robot action summarization work. We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or intermediate text representations of the actions used by an automatic planner. We provide quantitative and qualitative evaluations of our results, which can serve as a baseline for future work.
We put forward an end-to-end multi-step machine learning model which jointly labels semantic roles and the proto-roles of Dowty (1991), given a sentence and the predicates therein. Our best architecture first learns argument spans, followed by learning the arguments' syntactic heads. This information is shared with the next steps for predicting the semantic roles and proto-roles. We also experiment with transfer learning from argument and head prediction to role and proto-role labeling. We compare using static and contextual embeddings for words, arguments, and sentences. Unlike previous work, our model does not require pre-training or fine-tuning on additional tasks, beyond using off-the-shelf (static or contextual) embeddings and supervision. It also does not require argument spans, their semantic roles, and/or their gold syntactic heads as additional input, because it learns to predict all of these during training. Our multi-task learning model improves on the state of the art for most proto-roles.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2014
We describe the WordsEye Linguistics Tool (WELT), a novel tool for the documentation and preservation of endangered languages. WELT is based on WordsEye (Coyne and Sproat, 2001), a text-to-scene tool that automatically generates 3D scenes from written input. WELT has two modes of operation. In the first mode, English input automatically generates a picture which can be used to elicit a description in the target language. In the second mode, the linguist formally documents the grammar of an endangered language, thereby creating a system that takes input in the endangered language and generates a picture according to the grammar; the picture can then be used to verify the grammar with native speakers. We will demonstrate WELT's use on scenarios involving Arrernte and Nahuatl.
We investigate using Amazon Mechanical Turk (AMT) for building a low-level description corpus and populating VigNet, a comprehensive semantic resource that we will use in a text-to-scene generation system. To depict a location, VigNet must contain knowledge about the typical objects found in that location and how those objects are arranged. Such information is mostly common-sense knowledge that human beings take for granted and that is not stated in existing lexical resources or text corpora. In this paper we focus on collecting the objects of locations using AMT. Our results show that it is a promising approach.
Proceedings of Frame Semantics in NLP: A Workshop in Honor of Chuck Fillmore (1929-2014), 2014
We summarize our experience using FrameNet in two rather different projects in natural language processing (NLP). We conclude that NLP can benefit from FrameNet in different ways, but we sketch some problems that need to be overcome.
Text-to-scene conversion requires knowledge about how actions and locations are expressed in language and realized in the world. To provide this knowledge, we are creating a lexical resource (VigNet) that extends FrameNet with a set of intermediate frames (vignettes) that bridge between the high-level semantics of FrameNet frames and a new set of low-level primitive graphical frames. Vignettes can be thought of as a link between function and form: between what a scene means and what it looks like. In this paper, we describe the set of primitive graphical frames and the functional properties of 3D objects (affordances) we use in this decomposition. We examine the methods and tools we have developed to populate VigNet with a large number of action and location vignettes.
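The vignette idea described above — decomposing a high-level frame into low-level graphical primitives — can be illustrated with a small data-structure sketch. All frame, role, and primitive names here are invented examples for illustration, not the actual VigNet inventory:

```python
# Illustrative sketch only: a "vignette" maps a high-level frame and its
# roles to low-level graphical primitives a scene renderer could interpret.
# The frame name, roles, and primitives below are invented, not from VigNet.
from dataclasses import dataclass, field

@dataclass
class Vignette:
    frame: str                     # high-level FrameNet-style frame name
    roles: tuple                   # frame elements the vignette binds
    primitives: list = field(default_factory=list)  # graphical relations

# A hypothetical vignette for cooking at a stove.
COOK_AT_STOVE = Vignette(
    frame="Cooking",
    roles=("Cook", "Food", "Container"),
    primitives=[
        ("position", "Container", "on", "stove.burner"),
        ("position", "Cook", "front-of", "stove"),
        ("orient",   "Cook", "facing", "stove"),
    ],
)

def ground(vignette, bindings):
    """Substitute concrete objects for role names in the primitives."""
    return [tuple(bindings.get(part, part) for part in prim)
            for prim in vignette.primitives]
```

Grounding `COOK_AT_STOVE` with `{"Cook": "woman", "Container": "pot"}` yields renderer-ready relations such as `("position", "pot", "on", "stove.burner")` — the "form" side of the function/form link.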
This paper introduces Vignette Semantics, a lexical semantic theory based on Frame Semantics that represents conceptual and graphical relations. We also describe a lexical resource that implements this theory, VigNet, and its application in text-to-scene generation.
Building and Exploring Web Corpora: Proceedings of the 3rd Web as Corpus Workshop, incorporating Cleaneval, 2007
The FIASCO system implements a machine-learning approach for the automatic removal of boilerplate (navigation bars, link lists, page headers and footers, etc.) from Web pages in order to make them available as a clean and useful corpus for linguistic purposes. The system parses an HTML document into a DOM tree representation and identifies a set of disjoint subtrees that correspond to text blocks, headers or list items. Each block is then represented as a vector of linguistic, structural and visual features. A support vector machine classifier is used to distinguish between "clean" and "dirty" blocks. Dirty blocks are removed from the HTML tree before it is passed to the Lynx browser for conversion into plain text. The SVM classifier was trained and evaluated on a manually cleaned dataset of 158 English Web pages, the FIASCO gold standard.
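A minimal, stdlib-only sketch of the pipeline shape described above: segment an HTML page into text blocks, score each block with a simple structural feature, and drop "dirty" blocks. FIASCO itself works over a full DOM tree with linguistic, structural, and visual features fed to an SVM; here a single link-density threshold stands in for the learned classifier, purely for illustration:

```python
# Sketch only: block segmentation plus a link-density heuristic standing in
# for FIASCO's trained SVM classifier. Not the actual FIASCO feature set.
from html.parser import HTMLParser

class BlockExtractor(HTMLParser):
    """Collect text blocks and count how many characters sit inside links."""
    BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "td"}

    def __init__(self):
        super().__init__()
        self.blocks = []        # list of (block_text, link_char_count)
        self._text = []
        self._link_chars = 0
        self._in_link = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._flush()       # a new block-level element starts a block
        if tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link -= 1
        if tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())

    def _flush(self):
        text = " ".join(t.strip() for t in self._text if t.strip())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

def clean_page(html, max_link_density=0.5):
    """Keep only blocks whose link density is below the threshold."""
    parser = BlockExtractor()
    parser.feed(html)
    parser._flush()             # flush any trailing text
    return [text for text, links in parser.blocks
            if links / max(len(text), 1) < max_link_density]
```

A navigation bar made almost entirely of anchor text has link density near 1.0 and is dropped, while running prose survives — the same intuition the SVM learns from richer features.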
We present PCRISP, a sentence generation system for probabilistic TAG grammars which performs sentence planning and surface realization in an integrated fashion, in the style of the SPUD system. PCRISP operates by converting the generation problem into a metric planning problem and solving it using an off-the-shelf planner. We evaluate PCRISP on the WSJ corpus and identify trade-offs between coverage, efficiency, and accuracy.
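The core reduction — generation as cost-minimizing planning — can be sketched in miniature. The operators below are invented stand-ins for lexicalized TAG elementary trees: each realizes some semantic facts, emits a phrase, and carries a cost (think negative log probability in the probabilistic setting). This toy ignores word order and grammatical combination entirely, which the real system of course does not:

```python
# Sketch only: uniform-cost search over invented "operators" illustrating
# the reduction of generation to metric planning. Not PCRISP's actual
# grammar, planner input, or operator encoding.
import heapq
from itertools import count

OPERATORS = [
    ("the white rabbit", {"rabbit(r)", "white(r)"}, 1.0),
    ("the rabbit",       {"rabbit(r)"},             0.5),
    ("white",            {"white(r)"},              0.6),
    ("sleeps",           {"sleep(r)"},              0.5),
]

def generate(goal):
    """Find a lowest-cost set of operators expressing every goal fact.
    Word order is whatever the search produces: a bag-of-phrases toy."""
    tiebreak = count()
    frontier = [(0.0, next(tiebreak), [], frozenset(goal))]
    seen = set()
    while frontier:
        cost, _, words, remaining = heapq.heappop(frontier)
        if not remaining:                 # all facts expressed: plan found
            return " ".join(words), cost
        if remaining in seen:
            continue
        seen.add(remaining)
        for phrase, covers, op_cost in OPERATORS:
            if covers & remaining:        # operator expresses a needed fact
                heapq.heappush(frontier, (cost + op_cost, next(tiebreak),
                                          words + [phrase], remaining - covers))
    return None
```

For the goal `{"rabbit(r)", "sleep(r)"}` the cheapest plan uses two 0.5-cost operators; for a goal that also includes `white(r)`, the single adjective-incorporating operator (cost 1.0) beats composing the separate modifier (0.5 + 0.6) — the kind of lexical-choice trade-off a metric planner resolves.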
When training semantic role labeling systems, the syntax of example sentences is of particular importance. Unfortunately, for the FrameNet annotated sentences, there is no standard parsed version. The integration of the automatic parse of an annotated sentence with its semantic annotation, while conceptually straightforward, is complex in practice. We present a standard dataset that is publicly available and that can be used in future research. This dataset contains parser-generated dependency structures (with POS tags and lemmas) for all FrameNet 1.5 sentences, with nodes automatically associated with FrameNet annotations.
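One ingredient of associating parse nodes with FrameNet annotations is finding the syntactic head of an annotated character span: the token inside the span whose governor lies outside it. A hypothetical, simplified sketch of that step (token offsets and head indices are invented; real parses can have coordination or non-projectivity that makes the head non-unique):

```python
# Sketch only: locate the dependency head of an annotated character span.
# Assumes one governor per token and returns the first span-internal token
# whose governor falls outside the span; real alignment is messier.

def span_head(tokens, heads, span_start, span_end):
    """tokens: list of (start, end) character offsets, one per token.
    heads: governor index per token (-1 for the root).
    Returns the index of the span's syntactic head token, or None."""
    inside = [i for i, (s, e) in enumerate(tokens)
              if s >= span_start and e <= span_end]
    for i in inside:
        if heads[i] not in inside:   # governor outside the span (or root)
            return i
    return None
```

For "the old dog barked" with "dog" governing "the" and "old", the span covering "the old dog" resolves to the token "dog" — the node to which a FrameNet frame-element label would attach.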
Papers by Daniel Bauer