Text Representation: Chengxiang "Cheng" Zhai Department of Computer Science University of Illinois at Urbana-Champaign


Text Representation
[Figure: Real World -> Perceive (Perspective) -> Observed World -> Express (English) -> Text Data]

Topics labeled in the figure:
1. Natural language processing & text representation
2. Word association mining & analysis
3. Topic mining & analysis
4. Opinion mining & sentiment analysis
5. Text-based prediction
Example sentence: A dog is chasing a boy on the playground

String of characters: A dog is chasing a boy on the playground
Sequence of words: A / dog / is / chasing / a / boy / on / the / playground
+ POS tags: A/Det dog/Noun is/Aux chasing/Verb a/Det boy/Noun on/Prep the/Det playground/Noun
+ Syntactic structures: [Sentence [Noun Phrase A dog] [Verb Phrase [Verb Phrase [Complex Verb is chasing] [Noun Phrase a boy]] [Prep Phrase on [Noun Phrase the playground]]]]
+ Entities and relations: A dog (Animal), A boy (Person), the playground (Location); relations CHASE, ON
+ Logic predicates: Dog(d1). Boy(b1). Playground(p1). Chasing(d1,b1,p1).
+ Speech acts: Speech Act = REQUEST

Closer to knowledge
Deeper NLP: requires more human effort; less accurate representation
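
As a rough illustration of these levels, the sketch below encodes the example sentence at several of them using plain Python data structures. The POS tags, entity types, and predicates are copied by hand from the slide; nothing here is produced by an actual tagger or parser, and the tuple-based encoding of predicates and the `render` helper are assumptions made only for this example.

```python
sentence = "A dog is chasing a boy on the playground"

# String of characters: the raw text, nothing more.
chars = list(sentence)

# Sequence of words: whitespace splitting is enough for this sentence.
words = sentence.split()

# + POS tags: copied by hand from the slide, not produced by a tagger.
pos_tags = ["Det", "Noun", "Aux", "Verb", "Det", "Noun", "Prep", "Det", "Noun"]
tagged = list(zip(words, pos_tags))

# + Entities: identifier -> (mention, type); identifiers follow the slide's predicates.
entities = {
    "d1": ("A dog", "Animal"),
    "b1": ("A boy", "Person"),
    "p1": ("the playground", "Location"),
}

# + Logic predicates: (name, arguments) pairs (a hypothetical encoding).
predicates = [
    ("Dog", ("d1",)),
    ("Boy", ("b1",)),
    ("Playground", ("p1",)),
    ("Chasing", ("d1", "b1", "p1")),
]


def render(preds):
    """Format predicates the way the slide writes them, e.g. 'Dog(d1).'"""
    return " ".join(f"{name}({','.join(args)})." for name, args in preds)


print(tagged[:2])
print(render(predicates))
```

Each deeper level adds information that had to be supplied by hand here, which is the point of the slide: the deeper the representation, the more effort (or error-prone automation) it takes to obtain.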
Text Representation and Enabled Analysis

Text Rep | Generality | Enabled Analysis | Examples of Application
String | | String processing | Compression
Words (this course) | | Word relation analysis; topic analysis; sentiment analysis | Thesaurus discovery; topic and opinion related applications
+ Syntactic structures | | Syntactic graph analysis | Stylistic analysis; structure-based feature extraction
+ Entities & relations | | Knowledge graph analysis; information network analysis | Discovery of knowledge and opinions about specific entities
+ Logic predicates | | Integrative analysis of scattered knowledge; logic inference | Knowledge assistant for biologists
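
To make the first two rows of the table concrete, here is a minimal sketch, assuming a tiny made-up corpus (`docs` is hypothetical, not from the lecture). It treats the text once as a plain string (compression needs no notion of words) and once as words (counting which words share a document, a crude stand-in for word relation analysis).

```python
import zlib
from collections import Counter
from itertools import combinations

# Hypothetical three-document corpus, for illustration only.
docs = [
    "a dog is chasing a boy on the playground",
    "a boy is playing with a dog",
    "the playground is empty",
]

# String representation: the text is just bytes, so generic string
# processing such as compression applies without knowing any words.
raw = " ".join(docs).encode("utf-8")
packed = zlib.compress(raw)
restored = zlib.decompress(packed)

# Word representation: count how often two distinct words appear in the
# same document. Sorting the vocabulary keeps each pair in a fixed order.
cooccur = Counter()
for doc in docs:
    vocab = sorted(set(doc.split()))
    cooccur.update(combinations(vocab, 2))

print(restored == raw)
print(cooccur[("boy", "dog")])
```

The compression step works on any byte string, while the co-occurrence counts only become possible once the text is split into words, which mirrors how each row of the table enables analyses the row above it cannot.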
Summary
• Text representation determines what kind of mining algorithms can be
applied
• Multiple ways of representing text are possible
– string, words, syntactic structures, entity-relation graphs,
predicates…
– can/should be combined in real applications
• This course focuses on word-based representation
– General and robust: applicable to any natural language
– No/little manual effort
– “Surprisingly” powerful for many applications (not all!)
– Can be combined with more sophisticated representations
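
A minimal sketch of a word-based representation, assuming a simple bag-of-words model built with a regular expression. The function name `bag_of_words` and the Spanish sentence are hypothetical choices for this example. The same code handles both sentences without any language-specific resources, although this particular tokenizer relies on words being separated by spaces or punctuation.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count word tokens (runs of Unicode word characters)."""
    return Counter(re.findall(r"\w+", text.lower()))


english = bag_of_words("A dog is chasing a boy on the playground")
spanish = bag_of_words("Un perro persigue a un niño en el parque")

print(english.most_common(1))
print(spanish.most_common(1))
```

The resulting counts discard word order and syntax entirely, which is why this representation needs little or no manual effort and why it can later be combined with richer representations when an application calls for them.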