Natural Language Generation, Paraphrasing and Automatic Summarization of User Reviews with Recurrent Neural Networks
Tarasov D. S. (dtarasov3@gmail.com)
ReviewDot Research (reviewdot.ru), Kazan, Russia
1. Introduction
2. Related work
translation models [Cho et al., 2014], while our RNN architecture is different and
inspired by the method of [Mao et al., 2014], where an RNN was used to generate
descriptions of pictures. We are not aware of any prior application of such models
to abstractive text summarization or paraphrase generation.
3.1. Datasets
h(t) = f(W x(t) + V h(t − 1) + b)
Here f is a nonlinear function (in our case, the hyperbolic tangent), W is the weight
matrix between the projection and recurrent layers, and V is the weight matrix between
the hidden units. U is the output weight matrix, and b is the bias vector connected to
the hidden and output units.
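The recurrence above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `rnn_step`, the toy layer sizes and the weight values are assumptions made for the example, not taken from the paper, and the output matrix U is left out because its equation is not given here.

```python
import math

def rnn_step(x, h_prev, W, V, b):
    """One step of h(t) = tanh(W x(t) + V h(t-1) + b), using plain lists."""
    h = []
    for i in range(len(b)):
        total = b[i]
        # Contribution of the projected input word x(t).
        total += sum(W[i][j] * x[j] for j in range(len(x)))
        # Contribution of the previous hidden state h(t-1).
        total += sum(V[i][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h

# Toy sizes (assumed for illustration): 3-dim projection, 2 hidden units.
W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
V = [[0.1, 0.0], [0.0, 0.1]]
b = [0.0, 0.0]

# Run the recurrence over a two-word input, starting from a zero state.
h = [0.0, 0.0]
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h = rnn_step(x, h, W, V, b)
```

Because f is the hyperbolic tangent, every component of the hidden state stays strictly between -1 and 1, whatever the input sequence.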
After the recurrent layer, we set up a summarization layer that connects the
language model part and the sentence-level semantics in the s-RNN model. The language
model part includes the projection layer and the recurrent layer. The sentence-level
semantics part contains the sentence features vector. We use sentence polarity, product
category, a bag-of-aspect-terms vector and sentence length as sentence-level features.
While it is possible to incorporate more complex features, including these learned
[Figure: network diagram. Recoverable labels: Input word; Sentence-level features vector; Fully connected; max-sampling; Next word.]
4.1. Paraphrasing
Human judgment                                                     Percentage of sentences (average of the results of two human judges)
Grammatically correct and conveying original meaning               65%
Conveying original meaning but not necessarily correct             78%
Correct, but not conveying original meaning                        18%
* English translations are human made, with an effort to preserve important sentence features.
As shown in Table 2, the most common mistakes are omissions of some of the original
points and additions of new information that was not present in the original sentence.
Our design allows a certain degree of control over the meaning of generated sentences.
By choosing the sentence-level features vector we can instruct the network, for
example, to “say something good about screen and sound quality in about ten words”.
We found that better sentences are produced when the number of words is set to roughly
three times the number of aspect terms. With shorter sentences, the RNN just lists all
aspects, and with larger values it tends to produce long phrases without well-defined
meaning (“bright display from outside”) and undesired additions such as “smart helps” (Table 3).
Desired sentence length   Output
3                         батарея, экран, удобный
                          (battery, screen, convenient)
5                         аккумулятор, размер дисплея солидный, эргономика
                          (accumulator, impressive display size, ergonomics)
10                        быстрый аккумулятор, яркий внешне дисплей, удобный функционал, умный помогает.
                          (fast accumulator, bright display from outside, convenient functions, smart helps)
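The control over generation described above can be illustrated by assembling a sentence-level features vector by hand. The sketch below is hypothetical: the encoding (a polarity scalar, a one-hot category, a binary bag of aspect terms, a raw length), the vocabularies `ASPECTS` and `CATEGORIES`, and the helper `build_features` are assumptions for illustration, since the paper does not specify how the features are encoded. The default length applies the "roughly three times the number of aspect terms" observation.

```python
# Hypothetical vocabularies; the paper does not list the real ones.
ASPECTS = ["battery", "screen", "sound", "camera"]
CATEGORIES = ["phone", "laptop"]

def build_features(polarity, category, aspects, length=None):
    """Build a sentence-level features vector (assumed encoding).

    Layout: [polarity] + one-hot category + bag of aspect terms + [length].
    """
    if length is None:
        # Heuristic from the text: about three words per aspect term.
        length = 3 * len(aspects)
    category_one_hot = [1.0 if c == category else 0.0 for c in CATEGORIES]
    bag = [1.0 if a in aspects else 0.0 for a in ASPECTS]
    return [float(polarity)] + category_one_hot + bag + [float(length)]

# "Say something good about screen and sound quality", default length.
features = build_features(+1, "phone", ["screen", "sound"])
```

Passing `length=10` explicitly would correspond to the "in about ten words" instruction from the text, instead of the default of six words for two aspect terms.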
Positives: Качество звука, удобный интерфейс, очень долго держит заряд. Отзывчивый экран, громкий звонок, крупный шрифт, рабочий день. Приятно лежит в руках, 2 сим-карты выручают. Качество сборки, батарея, удобное меню, устойчив к воздействию воды. Явно лидируют, сочный дисплей, качество связи, плеер, фонарь. Хорошая фотокамера, динамик
(Quality of sound, convenient user interface, very long battery life. Responsive screen, loud calling signal, large font, working day. Lies in hands nicely, 2 sim cards help. Quality of production, convenient menu, waterproof. Obviously leading, nice display, player, bright light. Good photo-camera, speaker).
Negatives: Не обнаружено (not found)

Positives: Аккумулятор, скорость красивая. Дизайн, звук, функционал, масса разных дней хватает. Красив, несколько назад, процессор отзывчивый сенсор. Красивый экран, цветопередача. Дизайн, батарея, не тормозят, практичный.
(Accumulator, speed is beautiful. Design, sound, functions, lot of different days. Beautiful, few days ago, processor, responsive sensor. Nice screen, color reproduction. Design and battery is not slow, practical).
Negatives: Скользкий панель громкости тиховат. Стирается, заметно ос виснет, появляется белый экран.
(Slippery panel of volume is too quiet. Noticeably shabby, OS hangs and white screen appears)
Acknowledgements
References