All these models are trained on the Gigaword corpus; the proposed model is trained only on the first sentence of each article in the Gigaword corpus. For testing, the DUC-2003 corpus was selected, with ROUGE as the measure of performance. TOPIARY, the top-performing model on DUC-2003, is also included for comparison. Table 6 shows the results of the evaluation.
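ROUGE compares the n-grams of a generated summary against one or more reference summaries; the DUC evaluations are usually reported with recall-oriented variants. The snippet below is only a rough illustration of a ROUGE-1 recall computation, not the official ROUGE toolkit; the function name and whitespace tokenization are illustrative assumptions.

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams recovered by the candidate summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(count, cand[token]) for token, count in ref.items())
    return overlap / max(sum(ref.values()), 1)

# A generated headline scored against a reference headline.
print(rouge1_recall("floods hit northern india",
                    "heavy floods hit northern india killing dozens"))   # 4/7 ~ 0.57
```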
4 Neural Attention Model for Sentence Summarization (Rush et al., 2015)

4.2 Natural Language Model

The approach proposed in the paper (Rush et al., 2015) makes use of a basic feedforward neural network and generates a probability distribution over the output sequence. The probability of generating the next word is given in equation 16, where θ = (E, U, V, W), C is the window size of the context, E is the embedding matrix, h is the hidden state computed from the embedded context words through U, and V and W are the weights associated with the hidden state and the encoder output, respectively. The model also suffers from the presence of stop words.

p(y_{i+1} | y_c, x; θ) ∝ exp(V h + W enc(x, y_c))    (16)
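To make equation 16 concrete, the following NumPy sketch performs one decoding step. The hidden layer h = tanh(U ỹ_c) over the concatenated context embeddings follows Rush et al. (2015); the function name, the toy dimensions, and treating enc(x, y_c) as a precomputed vector are assumptions made here for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def nnlm_next_word_dist(context_ids, enc_x, E, U, V, W):
    """One step of the feedforward language model in equation 16.

    context_ids -- ids of the last C generated words y_c
    enc_x       -- the encoder output enc(x, y_c), assumed precomputed here
    E, U, V, W  -- the parameters theta = (E, U, V, W)
    """
    # Embed and concatenate the C context words: y_tilde = [E y_{i-C+1}, ..., E y_i]
    y_tilde = np.concatenate([E[w] for w in context_ids])
    # Hidden state of the feedforward net, h = tanh(U y_tilde), following Rush et al. (2015)
    h = np.tanh(U @ y_tilde)
    # Equation 16: p(y_{i+1} | y_c, x; theta) is proportional to exp(V h + W enc(x, y_c))
    return softmax(V @ h + W @ enc_x)

# Toy dimensions: vocabulary of 10 words, embedding size 4, window C = 3, hidden size 5.
rng = np.random.default_rng(0)
E = rng.normal(size=(10, 4))
U = rng.normal(size=(5, 12))      # maps the concatenated context (3 * 4) to the hidden state
V = rng.normal(size=(10, 5))
W = rng.normal(size=(10, 5))
enc_x = rng.normal(size=5)        # stand-in for enc(x, y_c)
print(nnlm_next_word_dist([1, 4, 7], enc_x, E, U, V, W).round(3))
```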
The encoder enc(x, y_c) itself can take several forms. The bag-of-words encoder enc_1 uses uniform weights p over the embedded input words x', whereas the attention-based encoder enc_3 in equation 19 learns the weights p from the input and the summary context:

enc_1(x, y_c) = p^T x',   p = [1/M, . . . , 1/M],   x' = [F x_1, . . . , F x_M]

enc_3(x, y_c) = p^T x'    (19)
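A rough sketch of how the two encoders differ is given below, assuming embedding matrices F and G and a weight matrix P that relates the input to the summary context, as in Rush et al. (2015); the function names and array shapes are illustrative, and the local smoothing applied to the input embeddings in the original attention-based encoder is omitted.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def enc1_bow(x_ids, F):
    """Bag-of-words encoder: uniform weights p = [1/M, ..., 1/M] over the embedded input."""
    x_emb = np.stack([F[w] for w in x_ids])        # x' = [F x_1, ..., F x_M]
    p = np.full(len(x_ids), 1.0 / len(x_ids))      # every word, including stop words, gets 1/M
    return p @ x_emb                               # p^T x'

def enc3_attention(x_ids, context_ids, F, G, P):
    """Attention-based encoder: the weights p depend on the summary context y_c.
    (The local smoothing of the input embeddings used in Rush et al. (2015) is omitted.)"""
    x_emb = np.stack([F[w] for w in x_ids])                  # input embeddings
    y_ctx = np.concatenate([G[w] for w in context_ids])      # embedded summary context
    p = softmax(x_emb @ P @ y_ctx)                           # p proportional to exp(x' P y'_c)
    return p @ x_emb                                         # context-dependent weighted sum
```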
Figure 7: System Architecture of Attention Based Sentence Summarization
5 Summarization with Pointer-Generator Network (See et al., 2017)
Recent summarization approaches discussed so far try to generate summaries irrespective of the correctness of the factual data and without considering the novelty of the information in the produced summary. The abstractive summarization approach proposed in the paper (See et al., 2017) tries to overcome these shortcomings along with the handling of OOV words. The author discusses three approaches: (1) a baseline model (section 5.1), (2) a pointer-generator model (section 5.2), and (3) a coverage mechanism (section 5.3). The rest of this section discusses these approaches in detail.

In the baseline model, the attention distribution over the source positions is

a^t = softmax(e^t)    (26)

Attention can be considered as an indication of where to produce the next word from. Attention is used to obtain a weighted sum of the encoder hidden states, which represents the overall hidden state h^*_t. This hidden state, together with the decoder hidden state s_t, is then used to compute a probability distribution P_vocab over all words in the vocabulary. Equation 27 captures the calculation required to generate this distribution over all vocabulary words:

P_vocab = softmax(V'(V [s_t, h^*_t] + b) + b')    (27)
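The following NumPy sketch carries out one decoder step under equations 26 and 27, assuming the attention scores e^t have already been computed from the encoder states and the decoder state; the function name and array layout are illustrative rather than notation from See et al. (2017).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def baseline_decode_step(e_t, enc_states, s_t, V, V_prime, b, b_prime):
    """One decoder step of the attention baseline, following equations 26 and 27.

    e_t        -- attention scores over the N source positions
    enc_states -- encoder hidden states, one row per source position
    s_t        -- decoder hidden state at step t
    """
    a_t = softmax(e_t)                          # equation 26: a^t = softmax(e^t)
    h_star = a_t @ enc_states                   # context vector h*_t, weighted sum of encoder states
    features = np.concatenate([s_t, h_star])    # [s_t, h*_t]
    # Equation 27: P_vocab = softmax(V'(V [s_t, h*_t] + b) + b')
    p_vocab = softmax(V_prime @ (V @ features + b) + b_prime)
    return a_t, h_star, p_vocab
```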
If a word does not appear in the source document, the corresponding attention term becomes zero. Negative log-likelihood is used as the loss function to train the model and learn the parameters.
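The remark about the attention term refers to the pointer-generator's copy distribution: in See et al. (2017) the final distribution is P(w) = p_gen · P_vocab(w) + (1 − p_gen) · Σ_{i: w_i = w} a_i^t, so a word that never appears in the source document receives no copy probability. A minimal sketch of this mixture and of the negative log-likelihood loss is given below; it assumes the source token ids are already mapped into the (extended) output vocabulary, and the function names are illustrative.

```python
import numpy as np

def final_distribution(p_vocab, a_t, src_ids, p_gen):
    """P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention on source copies of w.
    src_ids are assumed to index the same (extended) vocabulary as p_vocab, so words
    absent from the source document receive zero copy probability."""
    p_copy = np.zeros_like(p_vocab)
    for i, w in enumerate(src_ids):             # scatter attention mass onto source tokens
        p_copy[w] += a_t[i]
    return p_gen * p_vocab + (1.0 - p_gen) * p_copy

def nll_loss(step_distributions, target_ids, eps=1e-12):
    """Negative log-likelihood of the reference summary, averaged over decoding steps."""
    logprobs = [np.log(dist[w] + eps) for dist, w in zip(step_distributions, target_ids)]
    return -float(np.mean(logprobs))
```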
References
Yllias Chali, Sadid A. Hasan, and Shafiq R. Joty. 2009.
A SVM-based ensemble approach to multi-document