Natural Language Generation


Natural Language Generation, Paraphrasing and Summarization
of User Reviews with Recurrent Neural Networks

Tarasov D. S. (dtarasov3@gmail.com)
ReviewDot Research (internet portal reviewdot.ru), Kazan, Russia

Multi-document summarization and sentence generation are important challenges
in natural language processing. This paper presents a recurrent neural network
(RNN) architecture capable of producing abstractive document summaries, as well
as generating novel paraphrases of input sentences in the same language.
We demonstrate a practical application of our system to the task of summarizing
multiple consumer reviews.

Keywords: natural language generation, paraphrase generation, automatic
summarization of user reviews, recurrent neural networks

1. Introduction

The main role of automatic document summarization is to help readers understand
the most important points of long documents without much effort. One area
of document summarization that has attracted considerable research attention
is the automatic summarization of consumer reviews, also called opinion
summarization. It is traditionally based on feature selection, feature rating
and identification of important sentences, leading to so-called extractive
summaries (summaries that consist of original sentences extracted from user
reviews) [Mei et al, 2007; Liu J. et al, 2012; Liu C. et al, 2012; Raut and
Londhe, 2014]. Another kind of summary is the abstractive summary (a text that
summarizes essential facts mentioned in reviews without using original
sentences). Such texts tend to have better coverage for a particular level
of conciseness, and to be less redundant and more coherent [Carenini et al,
2006]. They can also be constructed to target particular goals, such
as summarization, comparison or recommendation.
Abstractive summarizers rely on natural language generation systems that are
currently designed using a lot of expert linguistic knowledge, heuristics and
complex pipelines (typically including a text planner, a sentence planner and
a surface realizer) [Fabbrizio et al, 2014]. Adapting such systems to new
languages and domains can therefore be difficult. Up until now, only a few works
have considered machine-learning-based (trainable) language generation systems,
and their success was limited [Ratnaparkhi, 2000; Hammervold, 2000]. However,
recent research on neural networks has demonstrated their ability to generate
novel descriptions of pictures using purely machine-learning methods
[Mao et al, 2014].
In this work we explore the application of a similar methodology to the domain
of consumer reviews. We describe and evaluate a recurrent neural network (RNN)
model capable of generating novel sentences and document summaries.
To achieve this, we train a recurrent neural network language model on a large
number of sentences describing positive and negative aspects of various consumer
products. In our setup, the task of the RNN is to predict the next word given
the current word and additional sentence-level semantic information that
includes sentence polarity, sentence length, product category and
a bag-of-aspects vector. In the test phase we give the RNN a sentence-level
features vector and generate the corresponding sentence.
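
The sentence-level features vector described above can be pictured as a simple concatenation. The following is a minimal sketch under stated assumptions: the category list, the aspect vocabulary, the length normalization and the function name `sentence_features` are all hypothetical, since the paper does not specify the exact encoding.

```python
from typing import List, Sequence

# Hypothetical vocabularies: the paper does not list its categories or aspect terms.
CATEGORIES = ["phone", "laptop", "camera"]
ASPECTS = ["battery", "screen", "sound", "case"]
MAX_LEN = 25  # longest sentence kept in the training set (Section 3.1)


def sentence_features(polarity: str, length: int, category: str,
                      aspects: Sequence[str]) -> List[float]:
    """Concatenate polarity, normalized length, one-hot category and bag of aspects."""
    pol = [1.0 if polarity == "positive" else 0.0]
    norm_len = [min(length, MAX_LEN) / MAX_LEN]  # assumed scaling to [0, 1]
    cat = [1.0 if c == category else 0.0 for c in CATEGORIES]
    bag = [1.0 if a in aspects else 0.0 for a in ASPECTS]
    return pol + norm_len + cat + bag


# "Say something positive about screen and sound of a phone in about ten words."
vec = sentence_features("positive", 10, "phone", ["screen", "sound"])
```

With this layout the vector has one polarity entry, one length entry, one entry per category and one entry per aspect term; any other encoding with the same four ingredients would fit the description equally well.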
We demonstrate that such a relatively simple model can generate novel
paraphrases that capture the original meaning, and show that this ability can
be used to “compress” multiple important points about a product into one
statement, thus producing a concise multi-document summary. To do this, we first
compute semantic vectors for all sentences in all available user reviews
of a given product and combine them into two semantic vectors: a positive one
(containing the bag of positive aspects) and a negative one (containing the bag
of negative aspects). We then feed these vectors to the language-generating RNN,
obtaining sentences that sum up the negative and positive sides of the product.
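
How the per-sentence vectors are combined is not detailed in the text. One plausible reading, sketched below, is an element-wise OR over the bag-of-aspects part, kept separately for each polarity. The aspect vocabulary and the name `combine_bags` are illustrative assumptions, not the author's implementation.

```python
from typing import Dict, Iterable, List, Tuple

ASPECTS = ["battery", "screen", "sound", "case"]  # hypothetical aspect vocabulary


def combine_bags(sentences: Iterable[Tuple[str, List[str]]]) -> Dict[str, List[float]]:
    """Merge per-sentence aspect lists into one positive and one negative bag."""
    bags = {"positive": [0.0] * len(ASPECTS), "negative": [0.0] * len(ASPECTS)}
    for polarity, aspects in sentences:
        if polarity not in bags:
            continue  # sentences with unknown polarity are ignored
        for i, aspect in enumerate(ASPECTS):
            if aspect in aspects:
                bags[polarity][i] = 1.0  # element-wise OR (an assumption)
    return bags


reviews = [
    ("positive", ["screen"]),
    ("positive", ["battery", "sound"]),
    ("negative", ["case"]),
    ("unknown", ["screen"]),
]
bags = combine_bags(reviews)
```

A count-based or frequency-thresholded combination would be an equally reasonable choice; the OR variant is used here only because it is the simplest.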

2. Related work

Convolutional neural networks have been used for generation of extractive
summaries of movie reviews [Denil et al, 2014]. In [Iyyer, 2014] paraphrase
generation using tree-based autoencoders was demonstrated; however,
no evaluation of paraphrase quality was presented aside from a few paraphrase
examples. The approach of [Iyyer, 2014] also relies on dependency parse trees.
Our method, in contrast, does not use sentence parsers. It can be viewed
as similar to encoder-decoder machine translation models [Cho et al, 2014],
although our RNN architecture is different and is inspired by the method
of [Mao et al, 2014], where an RNN was used to generate descriptions
of pictures. We are not aware of any prior application of such models
to abstractive text summarization or paraphrase generation.

3. Methods and algorithms

3.1. Datasets

We use a database of 820,000 consumer reviews in Russian from reviewdot.ru that
was obtained by automatic crawling of more than 200 different web resources.
From that database we selected 120,000 reviews in 15 different product
categories that had three sections (positive points, negative points and
comments). These three sections are commonly used on Russian consumer review
websites, and the reviewdot.ru crawler detects them automatically using
a heuristics-based algorithm. We then excluded sentences with unknown polarity
and those longer than 25 words, resulting in 56,000 training sentences.
All sentences were padded with <START> and <END> special symbols.
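
The filtering and padding step can be sketched as follows. Whitespace tokenization, the use of `None` for unknown polarity and the function name `prepare` are assumptions made for illustration; the actual tokenizer and data layout are not described in the paper.

```python
from typing import List, Optional, Tuple

MAX_WORDS = 25
START, END = "<START>", "<END>"


def prepare(sentences: List[Tuple[str, Optional[str]]]) -> List[Tuple[List[str], str]]:
    """Drop sentences with unknown polarity or more than 25 words, then pad the rest."""
    prepared = []
    for text, polarity in sentences:
        words = text.split()  # assumed whitespace tokenization
        if polarity is None or len(words) > MAX_WORDS:
            continue
        prepared.append(([START] + words + [END], polarity))
    return prepared


data = [
    ("good screen", "positive"),
    ("arrived monday", None),            # unknown polarity: excluded
    (" ".join(["w"] * 26), "negative"),  # 26 words: excluded
    (" ".join(["w"] * 25), "negative"),  # exactly 25 words: kept
]
prepared = prepare(data)
```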

3.2. Summarization recurrent neural network model

The structure of our summarization recurrent neural network (s-RNN) is shown
in Figure 1. The s-RNN model is deeper than the simple RNN model and similar
to the multimodal RNN introduced in [Mao et al, 2014]. It has five layers
in each time frame: the input word layer, the projection layer, the recurrent
layer, the summarization layer, and the softmax layer.
The projection layer implements a table-lookup operation, converting a word
to a real-valued embedding vector. Embedding vectors are obtained by training
a recurrent neural network language model [Mikolov et al, 2010] on a 30M-word
dataset of consumer reviews. The recurrent layer implements the standard
Elman-type [Elman, 1990] recurrent function:

$$h(t) = f\big(W x(t) + V h(t-1) + b\big)$$

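Here x(t) is the output of the projection layer for the current word, h(t)
is the state of the hidden units, f is a nonlinear function (in our case the
hyperbolic tangent), W is the weight matrix between the projection and recurrent
layers, and V is the weight matrix between the hidden units. U, which does not
appear in the recurrence itself, is the output weight matrix, and b is the bias
vector connected to the hidden and output units.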
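
A single step of this recurrence is easy to write out directly. The sketch below uses plain Python lists and toy weights chosen only for illustration; the helper names `matvec` and `elman_step` are not from the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product for row-major nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def elman_step(W: Matrix, V: Matrix, b: Vector, x: Vector, h_prev: Vector) -> Vector:
    """Compute h(t) = tanh(W x(t) + V h(t-1) + b)."""
    wx, vh = matvec(W, x), matvec(V, h_prev)
    return [math.tanh(p + q + r) for p, q, r in zip(wx, vh, b)]


# Toy weights (illustrative only): two inputs, two hidden units.
W = [[0.5, -0.5], [1.0, 0.0]]
V = [[0.0, 1.0], [1.0, 0.0]]
b = [0.0, 0.0]

h = [0.0, 0.0]  # initial hidden state
for x in ([1.0, 1.0], [0.0, 0.0]):
    h = elman_step(W, V, b, x, h)
```

In the toy run, the second input is all zeros, so the final state depends only on the previous hidden state passed through V, which shows how information from earlier words is carried forward.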
After the recurrent layer, we set up a summarization layer that connects the
language model part and the sentence-level semantics in the s-RNN model. The
language model part includes the projection layer and the recurrent layer. The
sentence-level semantics part contains the sentence features vector. We use
sentence polarity, product category, a bag-of-aspect-terms vector and sentence
length as sentence-level features. While it is possible to incorporate more
complex features, including those learned by unsupervised neural network models,
we avoid these additional complexities in this proof-of-principle experiment.
The softmax layer on top of the network generates the probability distribution
of the next word.
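
The exact form of the summarization layer is not given in the text. Assuming it follows the fully connected multimodal layer of [Mao et al, 2014], one time frame of the upper part of the network could look like the sketch below. The matrices `A` and `B`, the bias `c`, the tanh activation and the function names are assumptions; only `U` (the output weight matrix) is named in the paper.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def softmax(z: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(z)
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_distribution(h: Vector, feats: Vector, A: Matrix, B: Matrix,
                           U: Matrix, c: Vector) -> Vector:
    """Assumed form: s = tanh(A h + B feats); p = softmax(U s + c)."""
    s = [math.tanh(p + q) for p, q in zip(matvec(A, h), matvec(B, feats))]
    logits = [p + q for p, q in zip(matvec(U, s), c)]
    return softmax(logits)


# Toy sizes: 2 hidden units, 1 sentence-level feature, 3-word vocabulary.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[2.0], [-2.0]]
U = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
c = [0.0, 0.0, 0.0]
probs = next_word_distribution([0.0, 0.0], [1.0], A, B, U, c)
```

In this toy setting the hidden state is zero, so the distribution over the next word is driven entirely by the sentence-level feature, which is the role the summarization layer is meant to play.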
Our s-RNN model was trained using the backpropagation through time (BPTT)
method [Werbos, 1990] with mini-batch gradient descent, using one sentence per
mini-batch as described in [Mesnil et al, 2013].

[Figure 1 diagram. Labels: input word; projection layer; recurrent layer;
summarization layer (fully connected, also receiving the sentence-level features
vector); output softmax layer; max-sampling; next word.]

Figure 1. Architecture of summarization recurrent neural network

4. Results and discussion

4.1. Paraphrasing

To produce paraphrases, we give the network the sentence-level features vector
of the original sentence and then generate a new sentence word by word,
beginning from the “<START>” symbol and stopping after the network generates
the “<END>” symbol. Sentence quality was assessed by two human judges, who were
asked to label sentences as “grammatically correct / not correct” and
“conveying original meaning / not conveying original meaning”. Results are
summarized in Table 1.
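
The word-by-word generation procedure can be sketched as a short decoding loop. The sketch assumes that “max-sampling” in Figure 1 means taking the most probable word at each step; the `step` callback stands in for the trained s-RNN (already conditioned on a features vector), and the toy probability table is invented purely to make the example runnable.

```python
from typing import Callable, Dict, List, Tuple

START, END = "<START>", "<END>"
State = Tuple[str, ...]  # hypothetical recurrent state; a real model would hold h(t)
StepFn = Callable[[str, State], Tuple[Dict[str, float], State]]


def generate(step: StepFn, max_words: int = 25) -> List[str]:
    """Decode greedily from <START> until <END> or the length limit is reached."""
    word, state, sentence = START, (), []
    for _ in range(max_words):
        probs, state = step(word, state)
        word = max(probs, key=probs.get)  # pick the most probable next word
        if word == END:
            break
        sentence.append(word)
    return sentence


# Toy stand-in for the network: next-word probabilities given the current word.
TABLE = {
    START: {"nice": 0.7, "case": 0.2, END: 0.1},
    "nice": {"case": 0.8, END: 0.2},
    "case": {END: 0.9, "nice": 0.1},
}


def toy_step(word: str, state: State) -> Tuple[Dict[str, float], State]:
    return TABLE[word], state + (word,)


sentence = generate(toy_step)
```

The 25-word cap mirrors the maximum sentence length in the training data and guards against a model that never emits the end symbol.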

Table 1. Human judgment of generated paraphrases (total of 75 sentences)

Human judgment                                            Percentage of sentences (average value
                                                          from results of two human judges)
Grammatically correct and conveying original meaning      65%
Conveying original meaning but not necessarily correct    78%
Correct, but not conveying original meaning               18%

Table 2. Examples of original and paraphrased text
(two samples paraphrased by the neural network for each original)

Original:     Качественный и красивый корпус.
              (good and nice looking case)*
Paraphrase 1: Красивый, качественный корпус, живучий.
              (Nice looking, good case, reliable)
Paraphrase 2: металический корпус довольно красиво выглядит
              (Metallic case looks reasonable good)

Original:     Большие кнопки с цифрами,хорошее качество изготовления,приятный
              корпус, лёгкий, громкий сигнал вызова
              (Big buttons with numbers, good production quality, nice case,
              lightweight, loud signal)
Paraphrase 1: Вполне нравится пластик, цифры кнопок / качество изготовления
              металлический корпус.
              (Good quality of plastics, number of buttons / quality
              of production, metallic case)
Paraphrase 2: но кнопки , корпус , хорошее качество фотографий
              (But buttons, case, good quality of photo)

Original:     Довольно сносное качество исполнения, легкий и вполне удобный
              (Quality is reasonable good, lightweight and convenient enough.)
Paraphrase 1: Маленький, легкий, удобный, хорошее качество звучания
              (Small, lightweight, convenient, good quality of sound)
Paraphrase 2: легкий , удобный , хорошее качество фотографий
              (lightweight, convenient, good quality of photo)

Original:     Очень простой, понятный и удобный в использовании.
              (Very simple, understandable and convenient in use)
Paraphrase 1: Простой в использовании, удобный, недорогой.
              (Simple in use, convenient, not high-priced)
Paraphrase 2: мощный, простой, понятный аппарат удобный
              (powerful, simple, understandable device is convenient)

* English translations are human made, with an effort to preserve important sentence features.

As shown in Table 2, the most common mistakes are omissions of some original
points and additions of new information that was not present in the original
sentence.

4.2. Language generation

Our design allows a certain degree of control over the meaning of generated
sentences. By choosing the sentence-level features vector we can instruct the
network, for example, to “say something good about screen and sound quality
in about ten words”. We found that better sentences are produced when the number
of words is set to roughly three times the number of aspect terms. With smaller
values, the RNN just lists all aspects, and with larger values it tends
to produce long phrases without well-defined meaning (“bright display from
outside”) and undesired additions such as “smart helps” (Table 3).
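
The length heuristic amounts to a one-line rule. In the sketch below, the clamping to the range 1..25 and the function name `target_length` are assumptions added for illustration; only the factor of roughly three comes from the text.

```python
def target_length(num_aspects: int, words_per_aspect: int = 3, max_words: int = 25) -> int:
    """Desired sentence length: about three words per aspect term, clamped to 1..max_words."""
    return max(1, min(max_words, words_per_aspect * num_aspects))


# Three aspects (for example battery, screen, convenience) suggest about nine words.
length_for_three = target_length(3)
```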

Table 3. Examples of sentences produced by s-RNN
(polarity set to “positive” and aspects set to “battery, screen, convenience”)

Desired sentence length   Output
3                         батарея, экран, удобный
                          (battery, screen, convenient)
5                         аккумулятор, размер дисплея солидный, эргономика
                          (accumulator, impressive display size, ergonomics)
10                        быстрый аккумулятор, яркий внешне дисплей, удобный
                          функционал, умный помогает.
                          (fast accumulator, bright display from outside,
                          convenient functions, smart helps)

4.3. Summarization of multiple user reviews

The language-generating capacity of our RNN can be used to produce abstractive
summaries of multiple user reviews. To achieve this, we generate synthetic
sentence-level feature vectors by running aspect-based sentiment analysis over
all sentences of the reviews to be summarized, and build the feature vectors
from the extracted aspect terms and polarities.
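
The paper does not say which aspect-based sentiment analysis method is used. Purely as an illustration of the kind of output this step must provide (a polarity and a list of aspect terms per sentence), here is a toy lexicon-based stand-in; the lexicons, the scoring rule and the name `analyse` are hypothetical and are not the author's method.

```python
from typing import List, Tuple

# Hypothetical lexicons: surface word -> canonical aspect, and opinion words.
ASPECT_TERMS = {"screen": "screen", "display": "screen",
                "battery": "battery", "sound": "sound"}
POSITIVE = {"good", "bright", "loud"}
NEGATIVE = {"bad", "dim", "quiet"}


def analyse(sentence: str) -> Tuple[str, List[str]]:
    """Return (polarity, sorted aspect list) for one review sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    aspects = sorted({ASPECT_TERMS[w] for w in words if w in ASPECT_TERMS})
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    polarity = "positive" if score > 0 else "negative" if score < 0 else "unknown"
    return polarity, aspects


result = analyse("Bright display, good sound.")
```

The resulting (polarity, aspects) pairs are exactly what is needed to build the positive and negative bag-of-aspects vectors that are then fed to the network.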
The major obstacle here is that our feature vectors capture only coarse-grained
information (i.e. they can tell that the display is good, but the information
about why it is good is lost). Thus direct application of the s-RNN usually
produces rather generic or plainly incorrect summaries.
To circumvent this problem, we use an additional dynamic training step that
consists of running one iteration of gradient descent over all sentences with
aspect terms. We found that this method considerably improves the quality
of summaries and allows incorporating fine-grained device-specific information.
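
The dynamic training step can be read as a single pass of per-sentence gradient descent over the sentences that mention aspect terms. The sketch below makes that reading explicit. It assumes that the sentences come from the reviews being summarized, that updates are made one sentence at a time (as in the main training setup), and that a gradient function is available; `grad_fn`, `toy_grad` and the flat parameter list are placeholders, not the s-RNN itself.

```python
from typing import Callable, List, Sequence, Set

Params = List[float]
Sentence = Sequence[str]


def dynamic_training_step(params: Params,
                          sentences: Sequence[Sentence],
                          aspect_terms: Set[str],
                          grad_fn: Callable[[Params, Sentence], Params],
                          lr: float = 0.1) -> Params:
    """One pass of gradient descent over sentences containing an aspect term."""
    adapted = list(params)  # leave the base model untouched
    for sent in sentences:
        if not any(word in aspect_terms for word in sent):
            continue  # sentences without aspect terms are skipped
        grad = grad_fn(adapted, sent)
        adapted = [p - lr * g for p, g in zip(adapted, grad)]
    return adapted


def toy_grad(params: Params, sent: Sentence) -> Params:
    """Gradient of the toy loss 0.5 * (p - len(sent))**2 with respect to p."""
    return [params[0] - len(sent)]


base = [0.0]
adapted = dynamic_training_step(
    base,
    [["screen", "bright"], ["hello"], ["battery"]],
    {"screen", "battery"},
    toy_grad,
    lr=0.5,
)
```

Copying the parameters before adaptation keeps the general model reusable for the next product, which seems the natural design if the step is repeated for every device.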
The quality of review summaries was evaluated by two human judges who were
given the original reviews and asked to rate summary quality as good, acceptable
or unacceptable. Table 4 presents the averaged results.
Overall, we found that our method often produces summaries of reasonable
quality, while still making a number of mistakes. The most commonly observed
problem is the inclusion of seemingly irrelevant statements, such as “lot
of different days”. We also observed a significant number of ungrammatical
sentences, which may result from the relatively small training sample size,
failure of the RNN to capture long-term grammatical dependencies, and/or
grammatical errors in the training samples (since user reviews typically contain
a certain number of ungrammatical phrases). The extent to which these factors
contribute to the generation of grammar errors is presently unknown and needs
further investigation.
Still, we find it impressive that such a relatively simple method can be used
to solve the multi-document summarization task, a problem that is generally
considered difficult in natural language processing. Future work should include
evaluation of the proposed methods on different datasets and investigation
of the possible use of trainable sentence-level feature vectors instead
of pre-defined ones.

Table 4. Human evaluation of review summaries (100 summaries total)

Quality rating    Percentage of review summaries
Good              35%
Acceptable        44%
Unacceptable      21%

Table 5. Examples of generated summaries for two different mobile phones

Phone 1
Positives: Качество звука, удобный интерфейс, очень долго держит заряд.
           Отзывчивый экран, громкий звонок, крупный шрифт, рабочий день.
           Приятно лежит в руках, 2 сим-карты выручают. Качество сборки,
           батарея, удобное меню, устойчив к воздействию воды. Явно лидируют,
           сочный дисплей, качество связи, плеер, фонарь. Хорошая фотокамера,
           динамик
           (Quality of sound, convenient user interface, very long battery
           life. Responsive screen, loud calling signal, large font, working
           day. Lies in hands nicely, 2 sim cards help. Quality of production,
           convenient menu, waterproof. Obviously leading, nice display,
           player, bright light. Good photo-camera, speaker).
Negatives: Не обнаружено
           (not found)

Phone 2
Positives: Аккумулятор, скорость красивая. Дизайн, звук, функционал, масса
           разных дней хватает. Красив, несколько назад, процессор отзывчивый
           сенсор. Красивый экран, цветопередача. Дизайн, батарея,
           не тормозят, практичный.
           (Accumulator, speed is beautiful. Design, sound, functions, lot
           of different days. Beautiful, few days ago, processor, responsive
           sensor. Nice screen, color reproduction. Design and battery is not
           slow, practical).
Negatives: Cкользкий панель громкости тиховат. Cтирается, заметно ос виснет,
           появляется белый экран.
           (Slippery panel of volume is too quiet. Noticeable shabby, OS hangs
           and white screen appears)

Acknowledgements

The author thanks the anonymous reviewers for helpful comments on earlier
drafts of the manuscript.

References

1. Carenini, G., Ng, R. T., & Pauls, A. (2006, April). Multi-Document
   Summarization of Evaluative Text. In EACL.
2. Cho, K., van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the
   properties of neural machine translation: Encoder-decoder approaches. arXiv
   preprint arXiv:1409.1259.
3. Denil, M., Demiraj, A., Kalchbrenner, N., Blunsom, P., & de Freitas, N.
   (2014). Modelling, Visualising and Summarising Documents with a Single
   Convolutional Neural Network. arXiv preprint arXiv:1406.3830.
4. Elman, J. (1990). Finding structure in time. Cognitive Science, 14(2),
   pp. 179–211.
5. Fabbrizio, D., Stent, A. J., & Gaizauskas, R. (2014). A Hybrid Approach
   to Multi-document Summarization of Opinions in Reviews. INLG, 54.
6. Hammervold, K. (2000, June). Sentence generation and neural networks.
   In Proceedings of the first international conference on Natural language
   generation, Volume 14 (pp. 239–246). Association for Computational
   Linguistics.
7. Iyyer, M., Boyd-Graber, J., & Daumé, H. (2014). Generating Sentences from
   Semantic Vector Space Representations. NIPS Workshop on Learning Semantics.
8. Liu, C., Hsaio, W.-H., Lee, C.-H., Lu, G., & Jou, E. (2012). Movie Rating and
   Review Summarization in Mobile Environment. IEEE Transactions on Systems,
   Man, and Cybernetics, Part C: Applications and Reviews, 42(3), pp. 397–406.
9. Liu, J., Seneff, S., & Zue, V. (2012). Harvesting and Summarizing
   User-Generated Content for Advanced Speech-Based HCI. IEEE Journal
   of Selected Topics in Signal Processing, 6(8), pp. 982–992.
10. Mao, J., Xu, W., Yang, Y., Wang, J., & Yuille, A. L. (2014). Explain images
    with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090.
11. Mei, Q., Ling, X., Wondra, M., Su, H., & Zhai, C. (2007). Topic sentiment
    mixture: modeling facets and opinions in weblogs. In WWW ’07: Proceedings
    of the 16th international conference on World Wide Web (pp. 171–180). ACM,
    New York, NY, USA.
12. Mesnil, G., He, X., Deng, L., & Bengio, Y. (2013). Investigation
    of recurrent neural network architectures and learning methods for spoken
    language understanding. In INTERSPEECH (pp. 3771–3775). ISCA.
13. Mikolov, T., Karafiát, M., Burget, L., Cernocký, J., & Khudanpur, S. (2010).
    Recurrent neural network based language model. In INTERSPEECH 2010, 11th
    Annual Conference of the International Speech Communication Association,
    Makuhari, Chiba, Japan, September 26–30, 2010 (pp. 1045–1048).
14. Ratnaparkhi, A. (2000, April). Trainable methods for surface natural
    language generation. In Proceedings of the 1st North American chapter of the
    Association for Computational Linguistics conference (pp. 194–201).
    Association for Computational Linguistics.
15. Raut, B., & Londhe, D. (2014). Survey on Opinion Mining and Summarization
    of User Reviews on Web. International Journal of Computer Science and
    Information Technologies, 5(2), pp. 1026–1030.
16. Werbos, P. J. (1990). Backpropagation through time: what it does and how
    to do it. Proceedings of the IEEE, 78(10), pp. 1550–1560.
