Dr. T.V. Geetha
Tamil Computing
Dr. T.V. Geetha, Tamil Computing Lab (TACOLA), Dept. of CSE & IST, College of Engineering Guindy, Anna University, Chennai
Team Co-ordinators: Ranjani Parthasarathy & Dr. Madhan Karky
27th January 2012
Characteristics of Tamil
Partially free-word-order language
Morphologically rich language
Morphological suffixes convey most of the roles played in a sentence
Ambiguity at the morphological level
Ambiguity at the semantic level
Our Basis
Linguistics
Use of the rich morphological features of Tamil
Use of POS tags
Use of word-based semantics with well-defined semantic constraints (primitives)
UNL (Universal Networking Language)
Language Processing
Morphological Analyzer
POS Tagging
Chunking
Named Entity Recognition
Parser
Word Sense Disambiguation
Anaphora Resolution
Semantic Interpretation
Morphological Analyzer
Compound Analyser
Based on a Finite State Transducer (FST)
Handles not only simple compounding but also compounding between two words that causes inflectional variations during the compounding process
Ex: (Golden statue)
Rule: If the first letter of the second constituent is a hard consonant and the last letter of the first constituent is a vowel or a medial consonant, then there is no modification.
Ex: (Root tree)
Rule: If the first letter of the second constituent is a consonant, then a letter is inserted as the last letter of the first constituent.
(Root)
Insertion
Ex: (Sand pot)
Rule: If the first letter of the second constituent is a hard consonant, then the last letter of the first constituent is replaced by another letter.
Replacement
(Pot)
Ex: (Banana)
Rule: If the first letter of the second constituent is a hard consonant and the last letter of the first constituent is the same hard consonant, then that letter is deleted.
(Fruit)
Deletion
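The compounding rules above (no modification, insertion, replacement, deletion) can be illustrated with a small rule-driven splitter. This is a minimal sketch, not the FST the slides describe: it brute-forces every split point over a toy lexicon written in an assumed Latin transliteration, and it implements only the no-modification and deletion rules. The lexicon, the transliteration and the hard-consonant set `HARD` are hypothetical.

```python
# Toy transliterated lexicon (hypothetical); the real analyser works on Tamil script.
LEXICON = {"vAzhai", "pazham"}
# Hard consonants (vallinam) in an assumed single-letter Latin transliteration.
HARD = ("k", "c", "T", "t", "p", "R")


def split_compound(word, lexicon=LEXICON):
    """Return every (first, second) split of `word` licensed by two of the rules."""
    splits = []
    for i in range(1, len(word)):
        left, right = word[:i], word[i:]
        # No-modification rule: both halves are already dictionary words.
        if left in lexicon and right in lexicon:
            splits.append((left, right))
        # Deletion rule: the first constituent ends in the same hard consonant
        # that begins the second one, so that extra consonant is deleted.
        for h in HARD:
            if (right.startswith(h) and left.endswith(h)
                    and left[:-1] in lexicon and right in lexicon):
                splits.append((left[:-1], right))
    return splits


print(split_compound("vAzhaippazham"))  # [('vAzhai', 'pazham')]
```

Insertion and replacement rules would be further branches of the same loop, each undoing one surface change before the lexicon lookup.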
Morphological Analyzer
Numeral analyzer
Based on a Finite State Transducer (FST)
The number words for one to ten, hundred, thousand, lakh and crore can be directly converted into numbers.
Ex: (Ten)
Rule: No modification
10
Ex: (Five thousand)
Rule: If the first letter of the second constituent is a vowel and the last letter of the first constituent is a hard consonant, then a letter is inserted.
(Thousand)
Insertion
5000
(Five)
25
23
(Three)
Deletion
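Once a numeral compound has been split into its number words, the direct conversions listed above (one to ten, hundred, thousand, lakh, crore) can be combined into a value. The sketch below assumes the input is already split by the FST rules; the transliterated spellings in `VALUES` are our own, and the multiplier handling is simplified (it does not handle stacked multipliers such as "hundred thousand").

```python
# Number words in an assumed Latin transliteration (hypothetical spellings).
VALUES = {
    "onRu": 1, "iraNdu": 2, "mUnRu": 3, "nAngu": 4, "aindhu": 5,
    "ARu": 6, "Ezhu": 7, "ettu": 8, "onbadhu": 9, "patthu": 10,
    "nURu": 100, "AyiRam": 1000, "latcham": 100_000, "kOdi": 10_000_000,
}


def words_to_number(tokens):
    """Combine already-split number words into an integer (simplified)."""
    total, current = 0, 0
    for tok in tokens:
        value = VALUES[tok]
        if value >= 100:
            # Multiplier word: scale the pending small number (default 1).
            total += max(current, 1) * value
            current = 0
        else:
            # Units and tens simply accumulate.
            current += value
    return total + current


print(words_to_number(["aindhu", "AyiRam"]))  # 5000
```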
Morphological Analyser
Colloquial Analyser
Based on a pattern-mapping approach
To the best of our knowledge, no previous work has been done on converting informal words into formal words.
Spelling-variation rules are adopted to map informal (colloquial) written words into formal written words.
Pattern-based approach using spelling-variation rules
Words are processed from right to left.
List of spelling-variation rules:
Suffix mapping of ending patterns
Suffix mapping of ending patterns with morphographemic changes
Suffix mapping of ending patterns with checking of one or two preceding characters
Mapping of patterns occurring at any position
In all the rules, a pattern p1 of the colloquial form is converted into a pattern p2 of the formal form.
Suffix mapping of ending patterns
The ending pattern p1 is replaced with the pattern p2.
Ex: (irukean) → (irukirean): Pattern 1 is replaced by Pattern 2.
Suffix mapping of ending patterns with morphographemic changes
The ending pattern p1 is replaced with the pattern p2, and the result is then passed on for morphographemic change.
Ex: (thambi kittey) → (thambi yidam): Pattern 1 is replaced by Pattern 2, followed by a morphographemic change.
Suffix mapping of ending patterns with checking of preceding characters
Pattern 1 is replaced by Pattern 2 only after checking one preceding character.
Ex: (A)
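The right-to-left suffix-mapping rules above can be sketched as an ordered rule table. This is a minimal illustration under stated assumptions: the two rules are taken from the slide examples in transliteration, the optional third field models the "check one preceding character" rule, and the morphographemic step and any-position patterns are not modeled.

```python
# Hypothetical rule table: (colloquial ending p1, formal ending p2, allowed preceding chars or None)
RULES = [
    ("kean", "kirean", None),     # irukean -> irukirean
    ("kittey", "yidam", None),    # kittey -> yidam
]


def formalize(word, rules=RULES):
    """Map a colloquial word to its formal form by right-to-left suffix mapping."""
    # Try longer ending patterns first so the most specific rule wins.
    for p1, p2, prev in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if not word.endswith(p1):
            continue
        stem = word[: len(word) - len(p1)]
        # Optional check of the character just before the matched pattern.
        if prev is not None and (not stem or stem[-1] not in prev):
            continue
        return stem + p2
    return word


print(" ".join(formalize(w) for w in "thambi kittey".split()))  # thambi yidam
```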
These relations are extracted using the Tamil grammar of compound words (thogai), the part-of-speech tag and UNL semantic constraints.
(Figure: concept, semantic constraint and relation)
POS Tagging
What is POS tagging?
Part-of-speech tagging is the process of assigning a part of speech, such as noun, verb, pronoun, preposition, adverb or adjective, or another lexical class marker to each word in a sentence.
Noun category
N: Noun
NP: Noun phrase
NN: Noun + noun
NNP: Noun + noun phrase
IN: Interrogative noun
INP: Interrogative noun phrase
PN: Pronominal noun
PNP: Pronominal noun phrase
VN: Verbal noun
VNP: Verbal noun phrase
Pn: Pronoun
PnP: Pronoun phrase
Nn: Nominal noun
NnP: Nominal noun phrase
Other category
SP: Subordinate clause conjunction phrase
SCC: Subordinate clause conjunction
Par: Particle
adj: Adjective
Iadj: Interrogative adjective
Dadj: Demonstrative adjective
Inter: Interjection
Int: Intensifier
CNum: Character number
Num: Number
DT: Date time
PO: Postposition
Verb category
V: Verb
VP: Verbal phrase
Vinf: Verb infinitive
Vvp: Verb verbal participle
Vrp: Verbal relative participle
AV: Auxiliary verb
FV: Finite verb
NFV: Negative finite verb
adv: Adverb
Analysis of the morphology of words, and design of a Naive Bayes probabilistic model for POS tagging based on the morpheme components
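A Naive Bayes tagger over morpheme components can be sketched as below: each word is represented by the morpheme features the morphological analyser returns, and the tag maximising log P(tag) + Σ log P(feature | tag) is chosen, with Laplace smoothing. The "key:value" feature names and the four training words are hypothetical; this is a generic multinomial Naive Bayes, not the exact TaCoLa model.

```python
import math
from collections import Counter, defaultdict


class MorphemeNaiveBayes:
    """Multinomial Naive Bayes over morpheme-component features, with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, samples):
        # samples: list of (feature_list, tag) pairs
        self.tag_counts = Counter(tag for _, tag in samples)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, tag in samples:
            self.feature_counts[tag].update(features)
            self.vocab.update(features)
        self.n_samples = len(samples)
        return self

    def predict(self, features):
        best_tag, best_score = None, -math.inf
        for tag, count in self.tag_counts.items():
            counts = self.feature_counts[tag]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            # log P(tag) + sum of log P(feature | tag)
            score = math.log(count / self.n_samples)
            for f in features:
                score += math.log((counts[f] + self.alpha) / denom)
            if score > best_score:
                best_tag, best_score = tag, score
        return best_tag


# Hypothetical toy training data: morpheme features -> POS tag.
train = [
    (["root:padi", "tense:past", "png:aan"], "FV"),
    (["root:ezhuthu", "tense:past", "png:aan"], "FV"),
    (["root:maram"], "N"),
    (["root:veedu", "case:ai"], "N"),
]
model = MorphemeNaiveBayes().fit(train)
print(model.predict(["root:vaa", "tense:past", "png:aan"]))  # FV
```

The unseen root "vaa" is still tagged FV because the tense and PNG morphemes carry the evidence, which is the point of using morpheme components rather than whole words.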
Chunking
What is Chunking?
Chunking is the task of identifying and segmenting text into syntactically related, non-overlapping groups of words.
Need for chunking:
It is one of the important preprocessing steps for all other language processing.
It helps extract the crux of the information from sentences and documents.
The chunk types are ADJP, ADVP, CONJP, INTJ, NP, PP and VP.
Our Approach
The morpheme features of words help identify chunk boundaries, so these morpheme components are used as one of the features of a Conditional Random Fields (CRF) model for identifying chunk boundaries.
Word (transliteration)   POS    Chunk
intha                    adj    B-NP
thakavalin               Ngen   I-NP
atippataiyil             Nloc   I-NP
pOlicAr                  noun   B-NP
andthandtha              Dadj   B-NP
mAvatta                  Madj   I-NP
pOlish                   noun   I-NP
cOthanaic                adj    I-NP
cAvati                   noun   I-NP
maiyangkalil             Nloc   I-NP
vAkanac                  adj    B-NP
cOthanaiyil              Nloc   I-NP
Itupattanar              FV     B-VP
Feature templates (Ctag = chunk tag):
F(word(-1), Ctag)
F(word(-1), word(0), Ctag)
F(word(0), Ctag)
F(word(0), word(1), Ctag)
F(word(1), Ctag)
F(word(1), word(2), Ctag)
F(word(2), Ctag)
F(POS(-2), Ctag)
F(POS(-2), POS(-1), Ctag)
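The feature templates above can be turned into per-token feature dictionaries, the input format that linear-chain CRF toolkits such as sklearn-crfsuite accept; the CRF itself conjoins each feature with the chunk tag. This sketch covers only the word and POS templates listed; the morpheme features the slide mentions would be extra keys, and the key names and `<PAD>` symbol are our own choices.

```python
def token_features(words, pos, i):
    """Features for token i following the templates (the CRF pairs each with Ctag)."""
    def w(k):
        j = i + k
        return words[j] if 0 <= j < len(words) else "<PAD>"

    def p(k):
        j = i + k
        return pos[j] if 0 <= j < len(pos) else "<PAD>"

    return {
        "w[-1]": w(-1),
        "w[-1]|w[0]": w(-1) + "|" + w(0),
        "w[0]": w(0),
        "w[0]|w[1]": w(0) + "|" + w(1),
        "w[1]": w(1),
        "w[1]|w[2]": w(1) + "|" + w(2),
        "w[2]": w(2),
        "pos[-2]": p(-2),
        "pos[-2]|pos[-1]": p(-2) + "|" + p(-1),
    }


def sentence_features(words, pos):
    return [token_features(words, pos, i) for i in range(len(words))]


words = ["intha", "thakavalin", "atippataiyil", "pOlicAr"]
pos = ["adj", "Ngen", "Nloc", "noun"]
print(sentence_features(words, pos)[2]["pos[-2]|pos[-1]"])  # adj|Ngen
```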
Named Entity Recognition
Features
Postpositions, case markers and the PNG (person-number-gender) marker in the verb
For Locations:
Presence of a postposition; presence of adjacent words such as [ndakar] (city), [ndathi] (river), [mAvattam] (district)
For Organizations:
Presence of adjacent words such as [ndiRuvanam] (organization), [thuRai] (department)
For Time/Date:
Presence of adjacent words such as [thEthi] (date), [ANtu] (year), [mAtham] (month)
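The adjacent-word cues above can be sketched as a simple cue lookup. This assumes the cue word follows the name (as in "madurai mAvattam", since Tamil noun phrases are head-final); the label names are our own, and postposition and case-marker features are not modeled. In the actual system such cues are features for the statistical model, not final decisions.

```python
# Cue words from the slide (transliterated); the label names are our own.
CUES = {
    "LOCATION": {"ndakar", "ndathi", "mAvattam"},
    "ORGANIZATION": {"ndiRuvanam", "thuRai"},
    "DATE_TIME": {"thEthi", "ANtu", "mAtham"},
}


def cue_tags(tokens):
    """Tag a token with an entity type when the following token is a cue word."""
    tags = []
    for i in range(len(tokens)):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else None
        label = next((lab for lab, cues in CUES.items() if nxt in cues), "O")
        tags.append(label)
    return tags


print(cue_tags(["madurai", "mAvattam"]))  # ['LOCATION', 'O']
```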
(Figure: NER system components: training data, shallow parsing, semantic parsing, statistical processing, dictionary and NE table)
Modified EM algorithm
Two problems were encountered with the traditional E-M algorithm:
It performed only positional analysis, so a modification was required for free-word-order languages like Tamil.
It was syntactically oriented, so a modification was required to include semantic information.
The modification, called Quantum entanglement, solves both problems.
Example (figure): tag probabilities for Enloc
Parser
3. Constituent Formation
The two main components are the noun constituent and the verb constituent.
Noun constituent: a noun constituent can contain only a noun, or can be of the following form:
(adjective)* (adjective clause)* (adjective)* noun (case marker) (post position)
or it can be a noun clause.
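The noun-constituent form above is a regular pattern over tags, so it can be checked with a regular expression once each tag is mapped to a single symbol. This is a sketch under assumptions: the tag names in `SYMBOL` are hypothetical, the case marker and postposition are treated as optional, and noun clauses are not covered.

```python
import re

# Hypothetical tag names mapped to one-letter symbols so the pattern can be a regex.
SYMBOL = {"ADJ": "a", "ADJ_CLAUSE": "c", "NOUN": "n", "CASE": "k", "PSP": "p"}
# (adjective)* (adjective clause)* (adjective)* noun (case marker)? (post position)?
NOUN_CONSTITUENT = re.compile(r"a*c*a*nk?p?")


def is_noun_constituent(tags):
    """True if the tag sequence matches the noun-constituent pattern."""
    if any(t not in SYMBOL for t in tags):
        return False
    return NOUN_CONSTITUENT.fullmatch("".join(SYMBOL[t] for t in tags)) is not None


print(is_noun_constituent(["ADJ", "NOUN", "CASE"]))  # True
```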
Grouping of Clauses
A distinguishing feature of the parser
Clauses are generally indicated by special cue suffixes or cue phrases (e.g., verbal participles, relative participles).
Grouping is done using the position of the cues and linguistically based heuristic rules.
Tree Generation
The position of each word in the sentence is also shown, to take care of free word order.
First, the converted minimal simple sentence is used to generate the tree.
The noun constituents (NCs) and the verb constituent (VC) are then expanded to generate the tree for the actual input sentence.
Word Sense Disambiguation
Bootstrapping uses morphological suffixes, POS, semantic constraints and UNL relations (for verbs).
Pattern representation and features:
Noun: <left(features), ambiguous word(sense set, features), right(features), main verb>
Verb: <ambiguous word + sense, relations of interest>
Example 1.1 (POS: noun; sense: number)
pandiyan aaru padaikal kondu por thoduthaan (Pandiyan waged war with six armies.)
<entity, aaru<number>, noun+plural suffix, verb+icl>action>
Example 1.2 (POS: noun; sense: river)
Tamilnattin periya aaru kaveri aagum (The Kaveri is the big river of Tamil Nadu.)
<adjective, aaru<river>, entity, verb+aoj>thing>
Example 2.1 (POS: verb; sense: offer)
pakthan iraivanukku pazangalai padaiththaan (The devotee offered fruits to God.)
<padai+offer, agt + obj + to>
Example 2.2 (POS: verb; sense: create)
Anaphora Resolution
Approaches
Centering Theory (Brennan et al., 1987)
Hobbs algorithm (Hobbs, 1978)
Applications
Summarization
Question Answering
Information Retrieval
Our Approach
Classification of anaphora into persons, places and events
Centering Theory, modified by incorporating word-level semantics (UNL semantic constraints)
Graph-based approach using sentence-level semantics (UNL graphs)
The absence of case suffixes is handled using UNL graphs.
Plural and event pronouns associated with multiple antecedents are tackled using UNL graphs.
Classification of Anaphora
Anaphora representing persons
Person anaphora (antecedents: nouns, noun phrases): avan, avaL, avar, ivan, ivaL, ivar and the plural pronouns avarkaL and ivarkaL
Examples:
Raju nandraaka padiththaan. avan thervu ezuthinaan. (Raju studied well. He wrote the exam.)
Maanavarkal nandraaka padiththaarkal. avarkaL thervu ezuthinarkal. (The students studied well. They wrote the exam.)
Anaphora representing places
Place anaphora (antecedents: nouns, noun phrases): athu, ithu
Adverbs such as angu and ingu can also act as pronouns representing places.
Examples:
tiruchy tamilnaattin periya nagarangalil ondru. Ingu amman kovil uLLathu. ithil aayiram thooNkal uLLana. (Tiruchy is one of the big cities of Tamil Nadu. There is an Amman temple here. It has a thousand pillars.)
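The pronoun classes above can drive a first filtering step: keep only earlier mentions whose class (person, place, event) and number agree with the pronoun. In this sketch, choosing the most recent compatible mention is a stand-in for the modified Centering Theory ranking with UNL constraints; gender is not checked, and the mention tuples are hypothetical annotations.

```python
# Pronoun classes from the slide; the number feature is our simplification.
PRONOUNS = {
    **{p: ({"PERSON"}, "sg") for p in ("avan", "avaL", "avar", "ivan", "ivaL", "ivar")},
    **{p: ({"PERSON"}, "pl") for p in ("avarkaL", "ivarkaL")},
    **{p: ({"PLACE", "EVENT"}, "sg") for p in ("athu", "ithu")},  # ambiguous pronouns
    **{p: ({"PLACE"}, "sg") for p in ("angu", "ingu")},
}


def resolve(pronoun, mentions):
    """Return the most recent earlier mention whose class and number agree."""
    classes, number = PRONOUNS[pronoun]
    for word, cls, num in reversed(mentions):
        if cls in classes and num == number:
            return word
    return None


print(resolve("avan", [("Raju", "PERSON", "sg")]))  # Raju
```

For athu and ithu both places and events pass the filter, which is why the slides call for higher-level and verb semantics to decide between them.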
Ambiguous Pronouns
Pronouns such as athu and ithu can represent both places and events, so higher-level semantics, including verb semantics, is needed.
Examples:
maduraiyil meenatchi kovil ullathu. Ithil aayiram thoonkal uLLana. (The Meenatchi temple is in Madurai. It has a thousand pillars.)
madurayil ulla meenatchi kovilil aanmeeka sorpozhivu nadaipeRRathu. Ithil eeralamaana makkal pangeRRanar. (A spiritual discourse was held at the Meenatchi temple in Madurai. Many people took part in it.)
Kandippu (scold) (agt>thing, obj>thing): agt: Annan sabapathi (icl>person); obj: Ramalingam (iof>person)
Kattuppadu (abide) (agt>thing, obj>thing): ben: Avar(kku) (he, pronoun); agt: Avar (he, pronoun)
Semantic Representation
Semantic Interpretation
Binding the user utterance to concepts, or representations of concepts, that the system can understand
The process of mapping syntactically analysed natural language text to a representation of its meaning
Semantic interpretation: aspects
Word meaning and word sense disambiguation
Lexical disambiguation
Structural disambiguation
Semantic relations
Issues
Coreference and anaphora
Lexical semantics
Syntactic and grammatical categories
Logical semantics
Proposed Work
Semantic interpretation of Tamil text
Use of UNL as the basis for semantic representation
Use of UNL-based information for NLP processing
Use of UNL graphs for summarization and question answering
Enconversion Process
Pass 1: Identify the possible UNL relations of a word Wi.
Pass 2: Disambiguate the relations if multiple UNL relations are assigned to a word, and identify the concepts connected to the word Wi.
Morphology: case suffixes associated with the word (e.g., pazhathai)
Connective: natural language words such as maRRum, allathu, etc.
Co-occurrence: e.g., Raamanaal seyyappattathu
POS: part-of-speech tag of the word (noun, verb, adjective, adverb)
Semantics: icl>person, iof>place, icl>time, etc.
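The two enconversion passes can be sketched with two lookup tables: Pass 1 maps a word's case suffix to its candidate UNL relations, and Pass 2 uses the word's semantic constraint to pick one when several remain. Both tables are hypothetical illustrations (the relation labels agt, ins, obj, plc and tim are standard UNL relations, but these particular mappings are our own), and the connective, co-occurrence and POS features are not modeled.

```python
# Hypothetical Pass 1 table: case suffix -> candidate UNL relations.
CANDIDATES = {"aal": {"agt", "ins"}, "ai": {"obj"}, "il": {"plc", "tim"}}
# Hypothetical Pass 2 table: semantic constraint of the word -> preferred relation.
PREFERENCE = {"icl>person": "agt", "iof>person": "agt", "iof>place": "plc", "icl>time": "tim"}


def pass1(suffix):
    """Pass 1: all UNL relations the word's case suffix could signal."""
    return CANDIDATES.get(suffix, set())


def pass2(candidates, constraint):
    """Pass 2: pick one relation, using the semantic constraint when ambiguous."""
    if len(candidates) == 1:
        return next(iter(candidates))
    preferred = PREFERENCE.get(constraint)
    # None means the relation is still ambiguous and needs further features.
    return preferred if preferred in candidates else None


# Raamanaal (Raaman + aal) with constraint icl>person
print(pass2(pass1("aal"), "icl>person"))  # agt
```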
Another Example
Pass 1 Pass 2
Unsupervised Approach
Features used for Probability estimation
Morphological suffix, POS, semantic constraints, starting and ending symbols
A relation between a concept pair can occur anywhere.
Semantic similarity based on the UNL ontology
Feature-tagged corpus, tagged using a rule-based approach
Question Answering
Question Classification
Need for QC and answer types
QC: accurately classify a question into a question type, and then map it to an expected answer type.
Ex: What is the biggest city in the United States? Question type: Q_LOCATION_CITY
Extracting and filtering by answer type improves the overall accuracy of a question answering system.
Morpheme-based CRF approach to question classification and expected answer type detection
Factoid type
Who: [Who is India's prime minister?]
When: [When did India become an independent country?]
Where: [Where was Gandhiji born?]
Which: [Which state has the highest population in India?]
Abbreviation: [What is the expansion of IAS?]
Definition type
How: [How does a DC generator operate?]
Who: [Who is Manmohan Singh?]
Define: [Define Kirchhoff's Law.]
List type
Enumerate: [Enumerate the districts in Tamil Nadu.]
List: [List out the states in India.]
Question Classification
Coarse class: fine classes (following TREC)
DESC: ABBR, DEFINITION, MEANING, REASON, OTHER
NUM: AGE, AREA, CODE, COUNT, DISTANCE, FREQUENCY, ORDER, PERCENT, PHONENUMBER, POSTCODE, PRICE, RANGE, SPEED, TELCODE, TEMPERATURE, WEIGHT, LIST, OTHER
HUM: ALIAS, DESCRIPTION, ORGANIZATION, PERSON, LIST, OTHER
OBJ: ANIMAL, CITY, COLOR, CURRENCY, ENTERTAIN, FOOD, INSTRUMENT, LANGUAGE, PLANT, RELIGION, SUBSTANCE, VEHICLE, LIST, OTHER
LOC: ADDRESS, CITY, CONTINENT, COUNTRY, ISLAND, LAKE, MOUNTAIN, OCEAN, PLANET, PROVINCE, RIVER, LIST, OTHER
TIME: DAY, MONTH, RANGE, TIME, YEAR, LIST, OTHER
Our Approach
Bag-of-keywords matching: in the extracted passage, the terms that occur in the question are removed, and the remaining concept or entity terms are candidate answers.
Person: named entity, possible case marker, case marker of the question word
Location: possible case markers
Time: temporal word database, number range
Quantity: possible words in the database
After the question term, as the definition term
Before the question term, as the definition term
Predicate Extraction
Predicates: A(x, y)
The relation graph gives the semantic relations among all entities, along with the type of each entity.
This semantic information helps filter out the required answer part.
Definitional QA Process
Due to the free word order of Tamil, the ranked sentences will not be the precise answer to the question. So the definition terms are extracted from the sentences using short patterns (K Soo Han, 2007; Jinxi Xu, 2003), as given below:
<place of birth>il piRanthAr (was born in <place of birth>)
<year> Am Andu <month> Am mAtham (in the year <year>, in the month <month>)
ivarathu thanthaiyAr <father name>, thAyAr <mother name> (his father is <father name>, his mother is <mother name>)
<year> Am Andu maRainthAr (passed away in the year <year>)
The leaf nodes of the answer graph give the details presented in the sentence. The definition answer is created using the definition templates.
Use of statistically processed seed information for classifying and scoring sentences for inclusion in the answer graph that represents the definitional answer to who questions
(Figure: Web, seed information, term probability, sentence ranking, sentence classification)
Term weight:
$TW(t) = \frac{n_d}{N}$
where TW is the term weight, $n_d$ is the number of documents in which the term t occurs, and N is the total number of documents.
S. No  Category   Features
1      Birth      (piRappu) (piRanthAr) (thOnRiNar)
2      Parent     (peRROr) (thAyAr) (thanthaiyAr)
3      Education  (kalvi) (padippu) (paLLi)
4-7               (paNi) (vElai) (viruthu) (parisu) (iRanthAr) (maRainthAr) (pErasiriyar) (vinjAni) (arasiyalvAthi)
Sentence classification:
$f(x) = w^{T}x + b$
where w is the weight vector of the features and b is the intercept.
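The term weight and the linear decision function above can be computed as follows. This sketch assumes each document is a set of transliterated tokens, builds a sentence's feature vector from the TW of the category feature words it contains (our assumption about how the features are filled), and takes w and b as given numbers; in the described system they would come from a trained SVM.

```python
def term_weight(term, documents):
    """TW(t) = n_d / N, the fraction of documents that contain the term."""
    n_d = sum(1 for doc in documents if term in doc)
    return n_d / len(documents)


def decision(w, x, b):
    """Linear decision function f(x) = w.x + b; f(x) > 0 is read as 'in the category'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical toy documents (sets of transliterated tokens).
docs = [{"piRanthAr", "1931"}, {"piRanthAr", "viruthu"}, {"kalvi"}, {"paNi"}]
print(term_weight("piRanthAr", docs))  # 0.5

# Feature vector for the Birth category: TW of each feature word present in the sentence.
birth_features = ["piRappu", "piRanthAr", "thOnRiNar"]
sentence = {"avar", "1931", "piRanthAr"}
x = [term_weight(f, docs) if f in sentence else 0.0 for f in birth_features]
print(decision([1.0, 1.0, 1.0], x, -0.2) > 0)  # True
```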
Example output: sentences of a biographical document (mentioning the years 1931, 1981, 1990 and 1997) marked <ND> or <D>, with category tags <BIR>, <PAR>, <EDU>, <WRK>, <AWD> and <S>.
Based on the knowledge base tree
Relations between the terms in the knowledge base
Graph expansion with lower levels of the knowledge base tree
Summarization
Our Work
Capturing semantic features of the document.
Identifying key concepts and relations for summarization.
Using a machine learning model to identify a subgraph of the original document's semantic graph.
Detailed Design
SEMANTIC GRAPH GENERATION
Linguistic Analysis
Syntactic and Semantic Analysis
Analysing & Logical Form Parsing
Coreference Resolution
Semantic Normalization
SVM
Prediction
After training, the learned model is used to predict the important nodes of the given document's semantic graph.
Graph Generation
Objective
To propose a framework for automatic analysis and summary generation for a cricket match in Tamil, with the scorecard of the match as the input.
The framework proposes a method to evaluate the interestingness of a cricket match.
The framework proposes a customization model for the summary.
The framework also proposes methods for evaluating the humanness of the generated summary.
Sentence Generation
The sentence that is most apt for the current event under consideration is selected.
The vocabulary used in the sentence, and the depth to which an event is discussed, are also varied based on the expertise level of the user.
The nouns in the key events are passed to the morphological generator along with the desired case endings, and the generated variants are added to the sentences.
The system uses the morphological generator developed at TaCoLa.
Event Clustering
Contribution
The score combines the degree of connectivity of a concept with UNL event-specific semantics, the concept distance score and the TF/IDF score.
Lyric Mining
We have processed 2,000 lyrics.
Analysis:
Word-level analysis
Rhyme analysis
Concept co-occurrence analysis
Pleasantness score
This analysis has mainly been used in lyric generation and in computing freshness scores for lyrics.
Word Level Analysis
The frequency of words is used to associate a popularity score with each word.
The popularity score of each word has been identified from the lyrics.
In lyrics, words carry suffixes, so the root words are used to determine the frequency count.
Word Level Analysis - Results
Usage counts of the top words: 1153, 1062, 793, 965, 857
A lyric corpus of two thousand songs was analysed for word, rhyme and concept co-occurrence usage.
Rhyme Level Analysis
Adapted Apriori algorithm
Frequency counts of rhyme, alliteration and end-rhyme pairs in Tamil lyrics
(Tables: top EDHUGAI and MONAI pairs, with usage counts 2291, 2255, 2028, 1973, 1952 and 3338, 3145, 2947, 2763, 2480)
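An Apriori-style count of rhyming word pairs can be sketched in two passes: first keep only words frequent enough across lyric lines, then count candidate pairs built from those words. The MONAI and EDHUGAI tests here are deliberate simplifications on Latin transliteration (same first letter, same second letter); the real rules operate on Tamil letters and letter classes, and the toy lyric lines are hypothetical.

```python
from collections import Counter
from itertools import combinations


def monai(a, b):
    """Alliteration (MONAI), simplified: same first letter."""
    return a[0] == b[0]


def edhugai(a, b):
    """Rhyme (EDHUGAI), simplified: same second letter."""
    return len(a) > 1 and len(b) > 1 and a[1] == b[1]


def frequent_pairs(lines, relation, min_support):
    """Apriori-style mining of word pairs that satisfy `relation` within a lyric line."""
    # Pass 1: keep only words that occur in at least `min_support` lines.
    word_counts = Counter(w for line in lines for w in set(line))
    frequent = {w for w, c in word_counts.items() if c >= min_support}
    # Pass 2: count candidate pairs built only from frequent words (Apriori pruning).
    pair_counts = Counter()
    for line in lines:
        for a, b in combinations(sorted(set(line) & frequent), 2):
            if relation(a, b):
                pair_counts[(a, b)] += 1
    return {pair: c for pair, c in pair_counts.items() if c >= min_support}


lyrics = [["kaNNe", "kalaimaane"], ["kaNNe", "kalaimaane", "paadu"], ["paadu", "ninaivu"]]
print(frequent_pairs(lyrics, monai, 2))  # {('kaNNe', 'kalaimaane'): 2}
```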
Concept Co-occurrence Analysis
Frequent co-occurrence of two terms in the lyric corpus
Agaraadhi, an online Tamil dictionary, is used to cancel out the ambiguity and polysemy of words, to improve the accuracy of the entire system.
Pleasantness score
Identify the pleasantness of a word based on 5 models:
3 language-independent models
2 language-dependent models
In all the models, the given grapheme word is first converted into phoneme form using Tamil phonology rules.
Models:
Meaning-based model
Language Dependent Model I
Language Dependent Model II
Manner of articulation based model
Manner and place of articulation based model
Meaning-based Model
Maintain the pleasant and unpleasant word lists.
Calculate the frequency of each phoneme in the pleasant and unpleasant word lists.
Language Dependent Model I
Judge the pleasantness based on the Vallinam, Mellinam and Idaiyinam classification, Maathirai, and the kurukkams except kutriyalikaram.
Language Dependent Model II
Manner of articulation based model
Category: manner of articulation
Greater Rough
Rough
Intermediate: semivowels, approximants
Soft: nasals
Pleasantness score
Manner and place of articulation based model
For the place-of-articulation score, categories that arise from the parts near the oral cavity are considered more pleasant than those that go deeper. Taking manner of articulation into consideration, nasals are given the highest sweetness score, followed by laterals, fricatives, stops and trills.
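The manner-of-articulation ordering above (nasals, then laterals, fricatives, stops and trills) can be turned into a word score by averaging per-consonant scores. This sketch assumes the grapheme-to-phoneme step has already produced a phoneme list; the numeric scores 5 to 1 and the partial phoneme inventory are hypothetical, and vowels are simply ignored.

```python
# Ordering from the slide; the numeric values are assumed.
MANNER_SCORE = {"nasal": 5, "lateral": 4, "fricative": 3, "stop": 2, "trill": 1}
# Hypothetical, partial phoneme inventory in Latin transliteration.
PHONEME_MANNER = {
    "m": "nasal", "n": "nasal", "N": "nasal",
    "l": "lateral", "L": "lateral",
    "s": "fricative", "sh": "fricative",
    "k": "stop", "c": "stop", "t": "stop", "p": "stop",
    "R": "trill",
}


def manner_pleasantness(phonemes):
    """Average manner-of-articulation score over the consonants in a phoneme list."""
    scores = [MANNER_SCORE[PHONEME_MANNER[p]] for p in phonemes if p in PHONEME_MANNER]
    return sum(scores) / len(scores) if scores else 0.0


print(manner_pleasantness(["m", "a", "l", "a", "r"]))  # 4.5
```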
Applications
Indexer
Constructs a sophisticated file structure to enable fast page retrieval
Searcher
Searches the indexed information for entries that satisfy user queries
Ranks the output
Three indices: Concept-Relation-Concept, Concept-Relation and Concept
The query is converted into a UNL representation.
Searching and ranking are based on concepts and relations rather than words.
COREE Architecture
(Figure: input processing, morphological analyzer, light-weight WSD, WSD, query expansion, thesaurus, parsed query, IL query, NER list, MWE list, UW list)
Modules of COREE
Focused crawling
UNL-based document processing
Sentence extraction
Enconversion
Construction of three types of multilist indexes
Document Processing
(Figure: Tamil document, WSD, NER, Tamil UW list)
UNL Lists
UW List: Universal Word List
Relations such as plf (initial place), via (intermediate place) and plt (final place)
UNL relations are disambiguated using the semantics of the concepts (e.g., iof>city).
MULTILIST
Relation Nodes To Concept Nodes
Concept Only
Concept Relation
Query Translation
[s]
[w]
; vivekanandar; iof>person; Entity; 1
; lecture; icl>action; Noun; 2
[/w]
[r]
2 pos 1
[/r]
[/s]
Query Expansion
(Figure: input query, morphological processing, NER, WSD, parsed query, expanded query)
Query Expansion
(Table: query words with their expanded words and relations such as pos, and, plf and iof)
Search
Indexed: Concept, Concept-Relation and Concept-Relation-Concept
UNL-based indexing
UNL-based searcher performing various levels of matching
Input: expanded query
Output Processing
(Figure: UNL document, capturing tourism-related information from the UNL documents, filling templates, morphological generator, document summary)
Conceptual Results
OBJECTIVES
Agaraadhi: a dictionary framework for indexing and retrieving Tamil words, their meanings, meaning analysis and related information.
A framework that incorporates various unique features designed to give the user additional information about the word they query.
AGARAADHI FRAMEWORK
Agaraadhi meaning for the word pookkal (example of a word with a case ending)
Objectives
Kuralagam is a conceptual search framework for the Thirukkural based on the UNL framework.
Keyword search in kurals and interpretations
Concept-based search using CoReX conceptual indexing based on UNL
Bilingual search in English and Tamil
Showing the relationships between concepts
Kuralagam Framework
Online Processing
Search and ranking fetches the Thirukkural number and its details.
Thirukkurals for a given query are fetched using two types of concept-relation indices, namely CRC and C.
The query concept is expanded using related CRC indices that point to the query concept. This helps retrieve many Thirukkurals that are conceptually related to the query, which is not possible with keyword-based Thirukkural search engines.
Ranking is based on the priority of the indices in the order CRC > C, and then on the usage score, i.e., the frequency of occurrence of the query concept.
Kuralagam conceptual results for the query word paNam: Dhanam
Demo
COREE
Agaraadhi
Tamil language based games