Choosing The Word Most Typical in Context Using A Lexical Co-Occurrence Network
Philip Edmonds
Department of Computer Science, University of Toronto
Toronto, Canada, M5S 3G4
pedmonds@cs.toronto.edu
Table 1: The sets of synonyms for our experiment. [table body not recovered from the extraction; one surviving entry: suggest/VB]
We can represent these relations in a lexical co-occurrence network, as in figure 1, that connects lexical items by just their first-order co-occurrence relations. Second-order and higher relations are then implied by transitivity.

Figure 1: A fragment of the lexical co-occurrence network for task. The dashed line is a second-order relation implied by the network.

2.2 Building Co-occurrence Networks

We build a lexical co-occurrence network as follows: Given a root word, connect it to all the words that significantly co-occur with it in the training corpus;[1] then, recursively connect these words to their significant co-occurring words up to some specified depth.

[1] Our training corpus was the part-of-speech-tagged 1989 Wall Street Journal, which consists of N = 2,709,659 tokens. No lemmatization or sense disambiguation was done. Stop words were numbers, symbols, proper nouns, and any token with a raw frequency greater than F = 800.

We use the intersection of two well-known measures of significance, mutual information scores and t-scores (Church et al., 1994), to determine whether a (first-order) co-occurrence relation should be included in the network; however, we use just the t-scores in computing significance scores for all the relations. Given two words, w_0 and w_d, in a co-occurrence relation of order d, and a shortest path P(w_0, w_d) = (w_0, ..., w_d) between them, the significance score is

    sig(w_0, w_d) = \frac{1}{d^3} \sum_{w_i \in P(w_1, w_d)} \frac{t(w_{i-1}, w_i)}{i}

This formula ensures that significance is inversely proportional to the order of the relation. For example, in the network of figure 1, sig(task, learn) = [t(task, difficult) + \frac{1}{2} t(difficult, learn)] / 8 = 0.41.

A single network can be quite large. For instance, the complete network for task (see figure 1) up to the third order has 8998 nodes and 37,548 edges.

2.3 Choosing the Most Typical Word

The amount of evidence that a given sentence provides for choosing a candidate word is the sum of the significance scores of each co-occurrence of the candidate with a word in the sentence. So, given a gap in a sentence S, we find the candidate c for the gap that maximizes

    M(c, S) = \sum_{w \in S} sig(c, w)

For example, given S as sentence (3), above, and the network of figure 1, M(task, S) = 4.40. However, job (using its own network) matches best, with a score of 5.52; duty places third, with a score of 2.21.

3 Results and Evaluation

To evaluate the lexical choice program, we selected several sets of near-synonyms, shown in table 1, that have low polysemy in the corpus and that occur with similar frequencies. This is to reduce the confounding effects of lexical ambiguity.

For each set, we collected all sentences from the as-yet-unseen (part-of-speech-tagged) 1987 Wall Street Journal that contained any of the members of the set, ignoring word sense. We replaced each occurrence by a 'gap' that the program then had to fill. We compared the 'correctness' of the choices made by our program to the baseline of always choosing the most frequent synonym according to the training corpus.

But what are the 'correct' responses? Ideally, they should be chosen by a credible human informant. Regrettably, we are not in a position to undertake a study of how humans judge typical usage, so we turn instead to a less ideal source: the authors of the Wall Street Journal. The problem, of course, is that authors aren't always typical. A particular word might occur in a 'pattern' in which another synonym was seen more often, making the latter the typical choice. Thus, we cannot expect perfect accuracy in this evaluation.

Table 2 shows the results for all seven sets of synonyms under different versions of the program. We varied two parameters: (1) the window size used during the construction of the network: narrow (±4 words), medium (±10 words), or wide (±50 words); (2) the maximum order of co-occurrence relation allowed: 1, 2, or 3.

The results show that at least second-order co-occurrences are necessary to achieve better-than-baseline accuracy in this task; regular (first-order) co-occurrence relations are insufficient. This justifies our assumption that we need
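As a concrete illustration of the significance score above, the following sketch builds a toy first-order network, finds a shortest path by breadth-first search, and applies sig(w_0, w_d) = (1/d^3) Σ t(w_{i-1}, w_i)/i. The words, graph, and t-score values are invented for illustration; they are not the scores computed from the paper's corpus.

```python
from collections import deque

# Toy first-order network: edges weighted by (invented) t-scores.
T_SCORES = {
    ("task", "difficult"): 2.5,
    ("difficult", "learn"): 1.6,
    ("task", "perform"): 3.1,
}

def t(a, b):
    """Symmetric lookup of a first-order t-score (0.0 if no edge)."""
    return T_SCORES.get((a, b)) or T_SCORES.get((b, a)) or 0.0

def neighbours(w):
    """All words directly connected to w in the network."""
    return ({y for (x, y) in T_SCORES if x == w}
            | {x for (x, y) in T_SCORES if y == w})

def shortest_path(w0, wd):
    """Breadth-first search for a shortest path w0 .. wd, or None."""
    queue, seen = deque([[w0]]), {w0}
    while queue:
        path = queue.popleft()
        if path[-1] == wd:
            return path
        for nxt in neighbours(path[-1]) - seen:
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

def sig(w0, wd):
    """sig(w0, wd) = (1/d^3) * sum_i t(w_{i-1}, w_i)/i along a shortest path."""
    path = shortest_path(w0, wd)
    if path is None or len(path) < 2:
        return 0.0
    d = len(path) - 1  # order of the co-occurrence relation
    return sum(t(path[i - 1], path[i]) / i for i in range(1, d + 1)) / d ** 3

# Second-order example, mirroring the shape of the paper's sig(task, learn):
# [t(task, difficult) + (1/2) t(difficult, learn)] / 2^3
print(sig("task", "learn"))  # (2.5 + 0.8) / 8 = 0.4125
```

With the toy scores, sig(task, learn) = 0.4125; the 1/d^3 factor and the 1/i weights reproduce the paper's property that higher-order relations contribute less evidence.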
Set            1       2       3       4       5       6       7
Size         6665    1030    5402    3138    1828   10204    1568
Baseline    40.1%   33.5%   74.2%   36.6%   62.8%   45.7%   62.2%

Narrow   1  31.3%   18.7%   34.5%   27.7%   28.8%   33.2%   41.3%
         2  47.2%   44.5%   66.2%   43.9%   61.9%a  48.1%   62.8%a
         3  47.9%   48.9%   68.9%   44.3%   64.6%a  48.6%   65.9%
Medium   1  24.0%   25.0%   26.4%   29.3%   28.8%   20.6%   44.2%
         2  42.5%   47.1%   55.3%   45.3%   61.5%a  44.3%   63.6%a
         3  42.5%   47.0%   53.6%     —       —       —       —
Wide     1   9.2%   20.6%   17.5%   20.7%   21.2%    4.1%   26.5%
         2  39.9%a  46.2%   47.1%   43.2%   52.7%   37.7%   58.6%

a Difference from baseline not significant.

Table 2: Accuracy of several different versions of the lexical choice program. The best score for each set is in boldface. Size refers to the size of the sample collection. All differences from baseline are significant at the 5% level according to Pearson's χ² test, unless indicated.