
IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART C: APPLICATIONS AND REVIEWS, VOL. 42, NO. 3, MAY 2012

Movie Rating and Review Summarization in Mobile Environment

Chien-Liang Liu, Wen-Hoar Hsaio, Chia-Hoang Lee, Gen-Chi Lu, and Emery Jou

Abstract—In this paper, we design and develop a movie-rating and review-summarization system in a mobile environment. The movie-rating information is based on the sentiment-classification result, and the condensed descriptions of movie reviews are generated by feature-based summarization. We propose a novel approach based on latent semantic analysis (LSA) to identify product features, and we further use the product features obtained from LSA to reduce the size of the summary. Both sentiment-classification accuracy and system response time are considered in the system design, and the rating and review-summarization system can easily be extended to other product-review domains.

Index Terms—Feature extraction, natural language processing (NLP), text analysis, text mining.

Manuscript received July 13, 2010; revised November 4, 2010 and January 18, 2011; accepted February 28, 2011. Date of publication April 29, 2011; date of current version April 11, 2012. This work was supported in part by the National Science Council (NSC) under Grant NSC-99-2221-E-009-150 and Grant NSC-099-2811-E-009-041. This paper was recommended by Associate Editor G. I. Papadimitriou.

C.-L. Liu, W.-H. Hsaio, and C.-H. Lee are with the Department of Computer Science, National Chiao Tung University, Hsinchu 30010, Taiwan (e-mail: clliu@mail.nctu.edu.tw; mr.papa@msa.hinet.net; chl@cs.nctu.edu.tw; badlaugh.cs96g@g2.nctu.edu.tw).

G.-C. Lu is with the Global Legal Division iTEC, Hon Hai Precision Industry Company Ltd., Taipei 236, Taiwan (e-mail: badlaugh.cs96g@g2.nctu.edu.tw).

E. Jou is with the Institute for Information Industry, Taipei 106, Taiwan (e-mail: emeryjou@iii.org.tw).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCC.2011.2136334

1094-6977/$26.00 © 2011 IEEE

I. INTRODUCTION

People's opinions have become one of the most important sources for various services in ever-growing social networks. In particular, online opinions have turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities, and manage their reputations. Meanwhile, cellular phones have become a vital part of our lives, and the mobile platform is currently one of the most popular platforms in the world. However, digital content displayed on cellular phones is limited in size, since the devices are physically small. Hence, a mechanism that provides users with condensed descriptions of documents will facilitate the delivery of digital content to cellular phones. This paper explores and designs a mobile system for movie rating and review summarization in which the semantic orientation of comments, the limited display capability of cellular devices, and system response time are all considered.

Practically, when we are not familiar with a specific product, we ask our trusted sources to recommend one. Today, the popularity of the Internet drives people to search for other people's opinions on the Internet before purchasing a product or seeing a movie. Many websites provide user rating and commenting services, and these reviews can reflect users' opinions about a product. For example, the customer-review section on Amazon.com lists the number of reviews, the percentage for different ratings, and comments from reviewers. When people want to purchase books, CDs, or DVDs, these comments and ratings usually influence their purchasing behavior. In addition to these websites, a search engine is another important source for finding other people's opinions. When a user enters a query, the search engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and, sometimes, parts of the text.

Current search engines can efficiently help users obtain a result set that is relevant to the user's query. However, the semantic orientation of the content, which is very important information in reviews or opinions, is not provided by current search engines. For example, Google returns around 7,380,000 hits for the query "Angels and Demons review." If search engines could provide statistical summaries of the semantic orientations, they would be more useful to users who poll opinions from the Internet. A scenario for the aforementioned movie query might yield a report such as "There are 10,000 hits, of which 80% are thumbs up and 20% are thumbs down." This type of service requires the capability of discovering positive and negative reviews.

In recent years, the problem of "opinion mining" has seen increasing attention [1]–[3]. With the proliferation of reviews, ratings, recommendations, and other forms of online expression, online opinion can provide important information for businesses to market their products, identify new opportunities, and manage their reputations. For example, most recommendation systems attempt to alleviate information overload by identifying which items a user will find worthwhile, and the collaborative filtering used in this process relies on the opinions of similar customers to recommend items [4]. Essentially, the task of determining whether a movie review is positive or negative is similar to the traditional binary-classification problem: given a review, the classifier tries to classify it into the positive or the negative category. However, opinions in natural language are usually expressed in subtle and complex ways. Thus, the challenges may not be addressed by simple text-categorization approaches such as n-gram or keyword-identification approaches [5].

In this paper, we collected movie reviews from Internet blogs that do not contain any rating information. Sentiment analysis



is performed to determine the semantic orientation of the reviews, and the movie-rating score is based on the sentiment-analysis result. In addition to the accuracy of the classification, system response time is also taken into account in our system design. Although this paper focuses on movie reviews, the design is not restricted to the movie-review domain; it can be applied to other domains such as restaurants, hotels, etc. Meanwhile, increasingly more cellular phones have begun to include global positioning system (GPS) functionality, which can utilize the user's current location to provide enhanced services and make cellular phones context aware. Moreover, the opinion-mining result can be used by recommendation systems to identify which items a user will find worthwhile. For example, when people want to have dinner with their friends, a restaurant recommendation system can provide a restaurant list based on their current GPS location, the opinion-mining result, and their preferences.

In a cellular-phone environment, it is inappropriate to display detailed reviews due to the size of the screen. Hence, we employ a summarization technique to reduce the size of the information. The system summarizes the reviews (both positive and negative) and provides the user with an overview of them. Meanwhile, movie-review summarization is similar to customer-review summarization, which focuses on product features [6]. In this paper, we employ feature-based summarization for movie reviews. Product-feature and opinion-word identification are essential to feature-based summarization. We propose a latent-semantic-analysis (LSA)-based product-feature-identification approach to identify product features. Moreover, we extend the result to propose an LSA-based filtering mechanism, which can further reduce the size of the summarization according to the features.

The main contributions of this paper are the following.
1) Design and develop a movie-rating and review-summarization system in a mobile environment. We considered the system-response-time issue in designing the mobile application, and the same system design can be extended to other domains with little modification.
2) Propose a novel approach based on LSA to identify product features. Product features and opinion words are used to select appropriate sentences to form a review summarization.
3) Propose an LSA-based filtering mechanism to allow users to choose the features in which they are interested; this mechanism can reduce the size of the summary efficiently.

The rest of this paper is organized as follows. In Section II, related surveys are presented. In Section III, the LSA-based product-feature-identification approach is introduced. In Section IV, the system design is presented. In Section V, several experiments are introduced. In Section VI, the conclusion is presented.

II. RELATED SURVEYS

A. Sentiment Analysis

Since a document is composed of sentences and a sentence is composed of terms, it is reasonable to determine the semantic orientation of a text from its terms. As a result, sentiment-analysis research started from the determination of the semantic orientation of terms. Hatzivassiloglou and McKeown [7] employed textual conjunctions such as "fair and legitimate" or "simplistic but well-received" to separate similarly connoted and oppositely connoted words. Esuli and Sebastiani [3] proposed to determine the orientation of subjective terms based on the quantitative analysis of the glosses of such terms, i.e., the textual definitions given in online dictionaries. The process is based on the assumption that terms with similar orientation tend to have "similar" glosses (i.e., textual definitions). Thus, synonyms and antonyms could be used to define a relation of orientation. Esuli and Sebastiani [8] described SENTIWORDNET, a lexical resource in which each WordNet synset is associated with three numerical scores, i.e., Obj(s), Pos(s), and Neg(s), describing how objective, positive, and negative the terms contained in the synset are.

Traditionally, sentiment classification has been regarded as a binary-classification task [1], [2], [9]. Turney [2] proposed to determine the orientation of terms by bootstrapping from two minimal sets of "seed" terms and counting the number of hits returned from a search engine with a NEAR operator. The NEAR operator requires two phrases or terms to be within a specified word distance of one another to be counted as a successful result. The AltaVista search engine(1) allows the user to specify a word distance of his/her choice, but the maximum distance is ten words. The relationship between a given phrase and a set of seeds was used to place it into a positive or negative subjectivity class. Pang et al. [1] found that standard machine learning outperforms human-proposed baselines. They employed naive Bayes, maximum-entropy classification, and support vector machines (SVMs) [10] to perform the sentiment-classification task on movie-review data. According to their experiments, SVMs tended to do the best, and unigrams with presence information turned out to be the most effective features.

(1) AltaVista: http://www.altavista.com/

In recent years, some researchers have extended sentiment analysis to the ranking problem, where the goal is to assess review polarity on a multipoint scale [11]–[13]. Snyder and Barzilay [13] addressed the problem of analyzing multiple related opinions in a text and presented an algorithm that jointly learns ranking models for individual aspects by modeling the dependencies between assigned ranks. Goldberg and Zhu [12] proposed a graph-based semisupervised learning algorithm for the sentiment-analysis task of rating inference, and their experiments showed that considering unlabeled reviews in the learning process can improve rating-inference performance.

B. Feature-Based Summarization

In product-review summarization, people are interested in the reasons why a product is worth buying rather than in the principal meaning of the comments. Thus, feature-based summarization [6] is used in movie-review summarization. Feature-based summarization focuses on the product features on which the customers have expressed their opinions. In addition to product features, the summarization should include

opinion information about the product; therefore, product features and opinion words are both important in feature-based summarization. As a result, product-feature and opinion-word identification are essential in feature-based summarization.

Practically, it is not easy to list all the product features and opinion words manually. Some researchers try to use a statistical approach to identify frequent feature words, because product features may occur frequently in product reviews. However, the drawback of this approach is that it may miss infrequent features. Hu and Liu [6] studied the problem of generating feature-based summaries of customer reviews of products sold online and proposed a method based on word attributes, including occurrence frequency, part of speech (POS), and synset in WordNet. Meanwhile, Zhuang et al. [14] proposed to make use of grammatical rules and keyword lists to seek feature-opinion pairs and generate feature-based summarization. Lu et al. [15] utilized the POS-tagging and chunking functions of the OpenNLP(2) toolkit to identify phrases in the form of a pair of a head term and modifiers. Their research focused on short comments; therefore, POS-tagging information can be employed to obtain the product features and opinion words. For example, the comment "Fast ship and delivery" contains only one sentence; therefore, it is easy to obtain the head terms (i.e., nouns or noun phrases) and modifiers (i.e., adjectives) using POS-tagging information. Practically, this approach cannot be applied to other product-review applications. First, most reviews contain many sentences rather than short comments. Second, most sentences in a review often contain many terms that are irrelevant to the product features or opinion words. Thus, we cannot identify the product features and opinion words in movie reviews using the same approach.

(2) http://opennlp.sourceforge.net/

III. LATENT-SEMANTIC-ANALYSIS-BASED PRODUCT-FEATURE IDENTIFICATION

In this paper, we propose a novel approach based on LSA to identify related product-feature terms. Essentially, LSA is a theory and method for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms. LSA can be applied to any type of count data over a discrete dyadic domain, so-called two-mode data [16]. Suppose that a collection of documents D = {d_1, ..., d_n} with terms from W = {w_1, ..., w_m} is given; then the system can construct a cooccurrence matrix M, whose dimension is n × m and whose entry M_ij denotes the number of times the term w_j occurs in document d_i. Each document d_i is represented by a row vector, while each term w_j is represented by a column vector. As shown in (1), LSA applies singular-value decomposition (SVD) to the term-document matrix M, and a low-rank approximation of M can be used to determine patterns in the relationships between the terms and concepts contained in the text:

M = U Σ V^T    (1)

where U and V are matrices with orthonormal columns (i.e., U^T U = V^T V = I), and Σ is a diagonal matrix whose diagonal elements are the singular values of M.

The original term-document matrix can be approximated by reducing the dimensions of the term-document space, which allows the underlying latent relationships between terms and documents to be exploited during searching. Equation (2) shows that the reduced matrix M̃ is obtained by reducing the dimensionality, where the system truncates the singular-value matrix Σ to size k:

M̃ = U Σ̃ V^T ≈ U Σ V^T = M.    (2)

It is this dimensionality-reduction step, i.e., the combining of surface information into a deeper abstraction, that captures the mutual implications of words and passages. Therefore, even though the original vector space is sparse, the corresponding low-dimensional space is typically not sparse. Practically, the number of dimensions retained in LSA is an empirical issue [17]. We conducted the experiments under different dimensions in the experiment section.

Algorithm 1 shows the algorithm, whose inputs include a term-document matrix, several product-feature seeds, the reduced dimensionality of the SVD operation, and the number of extracted features for each seed. In Algorithm 1, lines 3 and 4 perform the linear-algebra SVD operation on the term-document matrix, and lines 5–16 compute the similarities between the seed product-feature vector and, pairwise, the other term vectors.
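The core of this procedure (SVD of the term-document matrix, truncation to k dimensions, and cosine similarity between the seed term vector and the other term vectors) can be sketched as follows. This is a toy numpy sketch, not the paper's Algorithm 1 verbatim; the function name related_terms and the example matrix are illustrative assumptions.

```python
import numpy as np

def related_terms(M, terms, seed, k=2, top_n=3):
    """Return the top_n terms most similar to `seed` in the k-dimensional
    LSA space. M is an n_docs x n_terms count matrix; `terms` labels its
    columns."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    # Each column of diag(s_k) @ Vt_k is a term vector in the reduced space.
    term_vecs = (np.diag(s[:k]) @ Vt[:k]).T
    i = terms.index(seed)
    v = term_vecs[i]
    norms = np.linalg.norm(term_vecs, axis=1) * np.linalg.norm(v) + 1e-12
    sims = term_vecs @ v / norms              # cosine similarity to the seed
    ranked = [j for j in np.argsort(-sims) if j != i]
    return [terms[j] for j in ranked[:top_n]]

if __name__ == "__main__":
    # Toy corpus: "acting"/"actor" co-occur, as do "plot"/"music".
    terms = ["acting", "actor", "plot", "music"]
    M = np.array([[2.0, 1.0, 0.0, 0.0],
                  [1.0, 2.0, 0.0, 0.0],
                  [0.0, 0.0, 2.0, 1.0],
                  [0.0, 0.0, 1.0, 2.0]])
    print(related_terms(M, terms, "acting", top_n=1))  # ['actor']
```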

The top-ranked terms are then collected as related product-feature terms for a specific product feature. The procedure getTermVectorFromTermDocMatrix is used to obtain the term-vector representation of a product feature. The seed is assumed to be one of the terms in the term-document matrix, so it is easy to obtain its corresponding vector representation. Meanwhile, sim in line 7 is used to store the similarities between the seed and the other terms. After sorting in descending order, it is easy to obtain the top terms and their corresponding feature names in procedure getTopRelatedFeatures.

When the above steps are completed, each product-feature seed has its own semantically related term set. The advantage of this approach is that it can be applied to all languages: it does not need any external dictionary, since LSA is language-independent and is based on the linear-algebra SVD operation.

IV. SYSTEM DESIGN

Fig. 1 shows the system flow. The input is a movie name, and the system uses the movie name to retrieve reviews about this movie from the movie-review database. These movie reviews become the inputs of the SVM sentiment classifier, which classifies the reviews into positive or negative classes. Rating information can then be obtained from the proportion of positive and negative movie reviews. In addition to the sentiment classification of a movie review, we further determine the polarity of each sentence using opinion words. Then, the system can provide both positive and negative summarization, regardless of the polarity of a review. The whole process includes sentiment classification and feature-based summarization. These two processes are described in the following sections.

Fig. 1. Movie review and summarization flow.

A. Dataset

In this paper, we collected Chinese movie reviews from Internet blogs. Since the original data are hypertext markup language (HTML) documents, an HTML-tag-removal process is required to extract the text information. Training data are necessary for SVM to train a classification model, and manual classification was performed to classify the training reviews into positive or negative reviews. We randomly selected 500 positive reviews and 500 negative reviews as the data for classification-model building. In addition to the model-building data, we further collected around 8000 movie reviews from the Internet, and these reviews are used as the movie-review database.

B. Sentiment Classification

As mentioned above, sentiment classification is similar to the traditional binary-classification problem. Many classification algorithms, such as SVM [1], [10], [18], [19], decision trees [20], and neural networks [21], have been proposed and have shown their capabilities in different domains. SVM is one of the state-of-the-art algorithms and has been shown to be highly effective in traditional text categorization. SVM measures the complexity of hypotheses based on the margin with which they separate the data instead of the number of features. One remarkable property of SVM is that its ability to learn can be independent of the dimensionality of the feature space.

In natural-language processing (NLP) and information retrieval (IR), the bag-of-words model uses an unordered collection of words to represent a text, disregarding grammar and even word order. In other words, each word in the text contributes a feature of the document. In this paper, we employ a similar approach to construct a feature vector for each document. Stop words are removed first, and then each distinct word W_i in the document is used to represent a feature. As a result, a document can be represented by a feature vector, and many machine-learning algorithms can be applied to perform classification tasks. We employed SVM to perform the classification, and the libsvm [22] package is used in the system. The kernel function used in the system is the radial basis function (RBF), and K-fold cross validation (K = 5) is conducted in the experiment.

The classification result is the basis of the rating. With the proportion of positive and negative reviews, the system can provide rating information to end users. For example, if there are 100 movie reviews for a specific movie and 80 reviews are positive, the rating of this movie will be four stars.
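A minimal sketch of the presence-feature representation and the proportion-based star rating described above. The helper names build_vocabulary, presence_vector, and star_rating are hypothetical, and the stop-word list is a placeholder; the actual system works on segmented Chinese text and trains an RBF-kernel SVM with libsvm.

```python
# Sketch only: illustrative names, not the paper's implementation.

STOP_WORDS = {"the", "a", "is"}  # placeholder stop-word list

def build_vocabulary(documents):
    """Collect the distinct non-stop-word terms over all documents."""
    vocab = sorted({w for doc in documents for w in doc.split()
                    if w not in STOP_WORDS})
    return {term: i for i, term in enumerate(vocab)}

def presence_vector(document, vocab):
    """Binary feature vector: 1 if the term is present, else 0."""
    vec = [0] * len(vocab)
    for w in document.split():
        if w in vocab:
            vec[vocab[w]] = 1
    return vec

def star_rating(n_positive, n_total, n_stars=5):
    """Map the proportion of positive reviews onto a five-star scale."""
    return round(n_stars * n_positive / n_total)

if __name__ == "__main__":
    docs = ["the movie is great", "a boring movie"]
    vocab = build_vocabulary(docs)           # {'boring': 0, 'great': 1, 'movie': 2}
    print(presence_vector("great movie", vocab))  # [0, 1, 1]
    print(star_rating(80, 100))                   # 4, as in the example above
```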

C. Review Summarization

1) Product-Feature Identification: As mentioned above, we propose an LSA-based product-feature-identification algorithm, and the system can obtain a semantically related feature set for each seed. We compare three product-feature-identification approaches, i.e., the LSA-based, frequency-based, and PLSA-based approaches, in the experiment section.

2) Opinion-Word Identification: In addition to feature identification, opinion words about the product features are important as well. Hu and Liu [6] extracted opinion words by retrieving the nearby adjectives of product features. In addition to language sentence-structure characteristics, Zhuang et al. [14] used the dependency-grammar graph to find relations between feature words and the corresponding opinion words in training data. Both rely on language sentence structure to extract opinion words; therefore, these approaches are applicable only to languages whose sentences have such a characteristic. Many languages do not possess the aforementioned sentence structure. Hence, we propose a statistical approach to discover opinion words. First, we take into account the POS-tagging information of the opinion words. According to our analysis, adjectives are usually used to describe sentiment in Chinese; therefore, these terms become the candidate opinion words. Second, term frequency is taken into account: the frequency of the opinion words should exceed a threshold value. Let AVG be the average of the sum of the squared frequencies of all terms, as shown in (3) below. A term_i will be selected only if its squared frequency is equal to or larger than AVG. We manually selected positive and negative sentences from 500 positive reviews and 500 negative reviews, respectively. Positive and negative opinion words can then be obtained based on term frequency and POS tagging:

S_f = Σ_{i=1}^{n} {Frequency(term_i)}^2
AVG = S_f / n.    (3)

3) Feature-Based Summarization: As described above, feature-based summarization is more appropriate for product-review summarization. In general, feature-based summarization is based on product features and opinion words. It is not easy to use a compression ratio directly, since the sentence-selection criterion is based on the presence of product features. Hence, we propose an LSA-based filtering approach to further select the content of the summary based on the user's preference. In product-feature discovery, we employ LSA to find related feature terms of a specific product feature, and these related terms can be regarded as semantically related to this product feature. For each given product feature f, LSA can discover related terms F that are semantically related to f. In general, F can be regarded as f's related terms, and the system can employ F to select summary sentences. In the application design, the system provides all the summary sentences in the beginning. The product-feature seeds mentioned in the LSA-based feature-identification process become the candidate features of interest. The system allows the user to determine the feature f in which he/she is interested. When the user determines f, the system generates a summary that is related to the product features F.

Practically, a positive movie review may include negative comments about specific aspects, and vice versa. In this paper, we propose to analyze the polarity of a movie review using SVM and the polarity of a sentence using opinion words. In feature-based summarization, the system can utilize the polarity of opinion words to determine the polarity of sentences. Hence, the system can provide both positive- and negative-review summarization, regardless of the polarity of a review.

With the proportion of positive and negative reviews, the system can provide rating information to end users. The rating information combined with the review summary gives end users both rating and summarization information about the movie. The "Feature" section in Fig. 2 is a pull-down menu, which allows users to choose the features in which they are interested. Meanwhile, positive summarization and negative summarization can be presented to users, regardless of a movie's rating.

Fig. 2. Rating and summarization screenshot.

V. EXPERIMENT

Several experiments are performed to evaluate our system. In the sentiment-classification experiment, SVM is employed to perform the sentiment-classification task, and several feature combinations are used to evaluate the system performance. Since the application runs on a mobile platform, classification accuracy is not the only factor in the system design; the system will be infeasible if it takes a long time to respond. Therefore, a system-response-time-evaluation experiment is conducted as well. In product-feature identification, we propose an LSA-based approach to identify the product features and compare the LSA-based approach with frequency-based and PLSA-based approaches using the movie-review-glossary dataset.

A. Sentiment Classification

Opinions in natural language are usually expressed in subtle and complex ways. For example, the polarity of a sentence may

be changed when a negative term is used in the sentence. We considered the possible feature combinations in the experiments to obtain the best feature selection. Based on the bag-of-words model, we used unigram, bigram, negation, location, frequency, and presence features (i.e., only considering whether the feature is present or not) to perform the classification task with different feature combinations.

Unlike English, the Chinese language cannot use spaces as boundaries to separate the words in a sentence, so a Chinese-word-segmentation process is required. In addition to Chinese-word segmentation, Chinese stop words are removed as well, since stop words cannot provide sufficient information. In feature selection, our experiments also showed that unigrams with presence features outperform bigrams with other features, and the result is the same as described in [1]. In addition to unigrams with presence features, we designed three basic experiments to compare the differences among feature combinations, described as follows.
1) Group 1:
   a) removal of the terms appearing in both positive and negative reviews;
   b) frequency-feature criterion, where the term's squared frequency should be at least AVG, as shown in (3).
2) Group 2: frequency-feature criterion, where the term's squared frequency should be at least AVG, as shown in (3).
3) Group 3: frequency-feature criterion, where the term should occur at least three times.

The Group 1 experiment includes two additional features to evaluate its performance. The first is the removal of the terms appearing in both positive and negative reviews. In general, the terms that appear in both positive and negative reviews cannot provide enough semantic orientation to differentiate positive and negative reviews. The second concerns the effect of frequency.

The Group 1 and Group 2 experiments are used to compare the effect of term selection. While Group 1 removed the terms appearing in both positive and negative reviews, the Group 2 experiment used all the terms. The Group 2 and Group 3 experiments are used to compare the effect of term frequency. While Group 2 used the frequency criterion based on (3), Group 3 selected the terms that occur at least three times.

These three experiments are performed to evaluate their performance on the movie-review data, and they become the bases of the other experiments. Negation and position are additional features that are added to these three bases to perform feature combination. Regarding the negation feature, a negation term may change the polarity of a sentence completely, which may blur the decision. For example, the sentence "This movie is interesting" indicates a positive opinion about the movie, while the sentence "This movie is not interesting" reverses the polarity of the sentence. As for the position feature, people may state their conclusion at the end; therefore, the position feature is employed as well to evaluate its effect.

TABLE I
EXPERIMENT RESULT

Table I shows the experimental result. Unigram with presence feature (i.e., only considering the presence and absence of a term) outperforms the other feature combinations, and this result conforms to Pang's [1] result. It seems that negation, location, and bigram features do not contribute to sentiment classification. If we compare the performance of the three basic experiments, Group 2 outperforms Group 1 and Group 3. In other words, the removal of the terms appearing in both positive and negative reviews decreases the classification-accuracy rate. Meanwhile, the frequency criterion based on (3) is a little better than the criterion of at least three occurrences. Furthermore, the feature-combination experiments show that Group 2 with the negation feature outperforms Group 2 alone, and this result differs from Pang's [1] research result.

TABLE II
SVM MODEL LOADING AND PREDICTION EVALUATION RESULT

However, sentiment-classification accuracy is not the only issue on a mobile platform; response time should be considered as well. Table II shows that the system using unigrams with presence features has 40,462 features, and it takes about 120 s to load the classification model. Obviously, a system whose response takes 120 s is infeasible on a mobile platform. Hence, the number of features is crucial to the system's response time. We employ frequency as a filtering criterion to reduce the number of features. The number of features can be reduced to 1902 if we use the frequency criterion based on (3). Table II shows that it then takes about 6 s to load the classification model, which is feasible on a mobile platform. Therefore, this frequency criterion is employed to perform sentiment classification.

We also performed sentiment classification on another movie-review dataset, which is available at http://www.cs.cornell.edu/People/pabo/movie-review-data/. The dataset includes 1000 positive and 1000 negative movie reviews. Similarly, SVM is used to perform the classification task. The kernel function used in the system is RBF, and K-fold cross validation (K = 5) is used in the experiment. Different feature-selection criteria are used in the experiment to compare the number of features and the accuracies. Table III shows the experimental result, which includes three feature-selection approaches. The preprocessing task includes punctuation elimination, lowercase conversion, and negative-term conversion, which converts "n't" to "not." The first approach used all the unigrams as features, while the second one
LIU et al.: MOVIE RATING AND REVIEW SUMMARIZATION IN MOBILE ENVIRONMENT 403

TABLE III
SENTIMENT-CLASSIFICATION RESULTS USING PUBLIC MOVIE-REVIEW DATASET

employed frequency as the filtering criterion, with only the TABLE IV


TOP TEN TERMS USING FREQUENCY-BASED APPROACH
unigrams with occurrences more than three would be taken
into account. The third one employed the frequency criterion
listed in (3). The term-document matrices of all the experiments
employed unigram with presence feature as entry value. The
first two approaches do not remove stop words, but the third one
removes stop words first. The main reason is that stop words
are the terms with high frequencies, therefore, almost only stop
words will be left using the criterion listed in (3) if the stop
words are not removed in advance of the process.
The experimental results are similar to the previous experi-
ment. The first one outperforms the other ones, but the number
of features is enormous. The second one can reduce more than
half of the features and the accuracy is almost the same. How- TABLE V
ever, the number of features is still enormous. The number of FIVE ASPECTS GENERATED USING LSA
features in the third experiment is 861 and its accuracy is about
81.2%. Although the accuracy of the third one is not as good as
the other ones, it can dramatically reduce the number of features.
Meanwhile, its accuracy is still acceptable practically.

B. Product-Feature Identification
In product-feature identification, we compared our LSA-based approach with two other approaches: frequency-based and PLSA-based. We performed the experiments using the movie-review documents mentioned above, which are available at http://www.cs.cornell.edu/People/pabo/movie-review-data/. The dataset includes 1000 positive and 1000 negative movie reviews. Since nouns are the candidates for product features, only nouns are used in this experiment; the total number of nouns is 29 632. In addition to the movie-review dataset, we employed the movie-review glossary, available at http://www.movieprofiler.com/movieglossary, as the basis of the comparison. The movie-review glossary is created for movie reviewers, critics, and film students alike, as well as the general public interested in movie reviewing and film-making terminology; it contains 1069 terms. Since many of these terms are only used in the movie industry, additional filtering is applied to the dataset: only the terms appearing in the movie-review data are kept, which leaves 383 terms. A copy of the terms obtained from movieprofiler.com and the terms used in this paper is available at http://islab.cis.nctu.edu.tw/download/. Precision, recall, and F-value are employed to evaluate system performance.

In the frequency-based approach, all the nouns are ranked according to their frequencies, and the top ones are selected as product features. Table IV shows the top ten terms obtained with the frequency-based approach, which can identify the terms that are often used in movie reviews; hence, terms like story, character, and plot can be identified.

In the LSA-based approach, Algorithm 1 is used to identify product features, and the seeds include scene, plot, director, actor, and story. The truncated dimension of LSA is 500 in this paper. Table V shows the top ten features for each seed. In addition to product-feature identification, the top ten features for each seed can be regarded as being semantically related to the seed.

In the PLSA-based approach, we applied PLSA [23] to the dataset. Essentially, PLSA is based on a mixture decomposition derived from a latent class model. The standard procedure for maximum-likelihood estimation in latent-variable models is the expectation–maximization (EM) algorithm [24], which alternates an E-step and an M-step. In the E-step, the posterior probabilities of the latent variable z are computed based on the current estimates of the parameters; in the M-step, the parameters are updated based on the posterior probabilities obtained in the previous E-step. Given each occurrence of a word $w \in W = \{w_1, \ldots, w_M\}$ in a document $d \in D = \{d_1, \ldots, d_N\}$, the E-step is given by

$$P(z_k \mid d_i, w_j) = \frac{P(w_j \mid z_k)\,P(z_k \mid d_i)}{\sum_{l=1}^{K} P(w_j \mid z_l)\,P(z_l \mid d_i)}. \qquad (4)$$

TABLE VI
FIVE ASPECTS GENERATED USING PLSA

The estimate $P(d_i) \propto n(d_i)$ can be carried out independently. By standard calculation, one arrives at the following M-step reestimation equations:

$$P(w_j \mid z_k) = \frac{\sum_{i=1}^{N} n(d_i, w_j)\,P(z_k \mid d_i, w_j)}{\sum_{m=1}^{M} \sum_{i=1}^{N} n(d_i, w_m)\,P(z_k \mid d_i, w_m)} \qquad (5)$$

$$P(z_k \mid d_i) = \frac{\sum_{j=1}^{M} n(d_i, w_j)\,P(z_k \mid d_i, w_j)}{n(d_i)}. \qquad (6)$$

The number of aspects is five in the PLSA experiment, and Table VI shows the top ten terms for each aspect.
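The EM iteration in (4)–(6) is compact enough to sketch directly in NumPy. The following is an illustrative reconstruction, not the authors' code: the random initialization, the iteration count, and the small smoothing constant are assumptions added for completeness and numerical stability.

```python
# Illustrative NumPy sketch of the PLSA EM iteration in (4)-(6).
# n is the N x M document-term count matrix; K is the number of
# aspects (five in the experiment above).
import numpy as np

def plsa(n, K=5, iters=50, seed=0):
    """One PLSA run via EM, following (4)-(6)."""
    rng = np.random.default_rng(seed)
    N, M = n.shape
    # Random initialization (an assumption; the paper does not specify it).
    p_w_z = rng.random((K, M)); p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((N, K)); p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # E-step (4): posterior P(z_k | d_i, w_j), shape (N, M, K).
        post = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        weighted = n[:, :, None] * post        # n(d_i, w_j) P(z_k | d_i, w_j)
        # M-step (5): P(w_j | z_k), normalized over all words and documents.
        nw = weighted.sum(axis=0).T            # shape (K, M)
        p_w_z = nw / (nw.sum(axis=1, keepdims=True) + 1e-12)
        # M-step (6): P(z_k | d_i), normalized by document length n(d_i).
        p_z_d = weighted.sum(axis=1) / (n.sum(axis=1, keepdims=True) + 1e-12)
    return p_w_z, p_z_d
```

For the movie-review experiment, the top ten terms per aspect (as in Table VI) would be read off by sorting each row of `p_w_z`.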

Furthermore, the frequency-based, LSA-based, and PLSA-based approaches are applied to the movie-review dataset, and the terms extracted by these approaches are compared with the terms in the filtered movie-glossary dataset. Fig. 3 shows the result, where the precision, recall, and F-value curves are presented. In the LSA and PLSA approaches, many terms may appear in different aspects; therefore, the performance evaluation only takes distinct terms into account. In other words, the term "film" in LSA and PLSA is counted once. The experimental results show that LSA outperforms the frequency-based and PLSA-based approaches in the precision, recall, and F-value evaluations. As a by-product, the system can identify a related term set for each seed. Meanwhile, as shown in Fig. 3, the PLSA-based approach does not work well in product-feature identification.
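Although Algorithm 1 itself is not reproduced in this section, the LSA-based identification can be approximated as follows: truncate the SVD of the term-document matrix and rank candidate nouns by cosine similarity to each seed in the reduced space. The cosine-similarity ranking and the helper `lsa_features` are illustrative assumptions; the seeds (scene, plot, director, actor, and story) and the truncated dimension of 500 follow the experiment above.

```python
# Approximate sketch of LSA-based product-feature identification:
# truncate the SVD of the term-document matrix and rank candidate
# terms by cosine similarity to each seed in the reduced space.
# Illustrative only; the paper's Algorithm 1 is not reproduced here.
import numpy as np

def lsa_features(A, vocab, seeds, k=500, top=10):
    """Rank candidate terms by cosine similarity to each seed.

    A: term-document matrix (len(vocab) x number of documents).
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(s))
    T = U[:, :k] * s[:k]                    # term vectors in the reduced space
    T = T / (np.linalg.norm(T, axis=1, keepdims=True) + 1e-12)
    index = {term: i for i, term in enumerate(vocab)}
    related = {}
    for seed in seeds:
        sims = T @ T[index[seed]]           # cosine similarity to the seed
        order = np.argsort(-sims)
        related[seed] = [vocab[i] for i in order if vocab[i] != seed][:top]
    return related
```

With the 29 632-noun movie-review matrix, a call such as `lsa_features(A, vocab, ["scene", "plot", "director", "actor", "story"])` would produce per-seed candidate lists analogous to Table V.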
Fig. 3. Precision, recall, and F-value curves for the movie-review-glossary dataset. (a) Precision curve. (b) Recall curve. (c) F-value curve.

In addition to the experiments mentioned above, we further conducted experiments on the effect of the truncated dimension of LSA in product-feature identification. We ran the experiments under different dimensions and compared the results with the frequency-based approach. Fig. 4 shows the result, where the precision, recall, and F-value curves are presented. As shown in Fig. 4, LSA outperforms the frequency-based approach when the number of dimensions is more than 500; beyond 500 dimensions, the differences for LSA are minor. On the other hand, if the number of dimensions is 50, the performance becomes worse than the frequency-based approach when the number of terms is more than 80.

Fig. 4. Precision, recall, and F-value curves for the movie-review-glossary dataset using LSA under different truncated dimensions. (a) Precision curve. (b) Recall curve. (c) F-value curve.

Basically, PLSA can be regarded as a clustering algorithm. As shown in the above experiment, PLSA does not work well on the movie-review dataset. To further investigate its clustering capability, we performed another experiment on a popular dataset, the 20 newsgroups dataset. The 20 newsgroups collection has become a popular dataset for experiments in text applications of machine-learning techniques, such as text classification and text clustering. The data are organized into 20 different newsgroups, each corresponding to a different topic. Besides PLSA, we also applied the LSA and k-means algorithms to the same dataset for comparison. In the LSA approach, the dimensionality-reduction process is performed first (i.e., the dimensionality of Σ̃ is 300), and then the k-means-clustering algorithm is applied to the reduced matrix M̃. We used three newsgroups, alt.atheism, comp.graphics, and comp.sys.ibm.pc.hardware, to evaluate the clustering performance.

We compared the generated clusters by using the F1 cluster-evaluation measure [25], which considers both precision and recall; here, precision and recall are computed over pairs of documents for which two label assignments either agree or disagree. The F1 cluster-evaluation measure is also used by Ramage et al. [26]. The following four evaluation metrics are necessary for the computation.
1) True positives (TPs): The clustering algorithm placed the two articles in the pair into the same cluster, and 20 newsgroups have them in the same class.
2) False positives (FPs): The clustering algorithm placed the two articles in the pair into the same cluster, but 20 newsgroups have them in differing classes.
3) True negatives (TNs): The clustering algorithm placed the two articles in the pair into differing clusters, and 20 newsgroups have them in differing classes.
4) False negatives (FNs): The clustering algorithm placed the two articles in the pair into differing clusters, and 20 newsgroups have them in the same class.

Similar to the traditional IR definition, (7) shows the formulas of precision, recall, and the F1 evaluation:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \qquad (7)$$

TABLE VII
CLUSTERING RESULT USING 20 NEWSGROUPS DATASET

TABLE VIII
THREE ASPECTS GENERATED USING PLSA (20 NEWSGROUPS DATASET)

Table VII shows the experimental results, where PLSA outperforms k-means and LSA; PLSA works very well in clustering the newsgroups dataset. Moreover, Table VIII shows the top ten terms of the aspects discovered by PLSA. Obviously, these three newsgroups are highly unrelated, and their clusters can be determined from the top ten terms: aspect 1 belongs to the comp.graphics newsgroup, aspect 2 belongs to the comp.sys.ibm.pc.hardware newsgroup, and aspect 3 belongs to the alt.atheism newsgroup. On the other hand, it is very difficult to distinguish the aspects of the movie-review dataset. The plausible reason might be that the articles in the movie-review dataset are all movie reviews, and most reviewers use similar terms in their articles.

C. Discussion

In sentiment classification, Pang et al. [1] showed that unigram with presence features outperformed other feature combinations. Our experiments conform to Pang's results. However, if all the unigrams are used in the system, the number of features will be enormous. For example, our training dataset includes 1000 movie reviews, and the number of features is around 40 000. The application needs to load the SVM model first

and then predict the semantic orientation of the review. If 40 000 features are used, it would take around 120 s to load the model. Hence, we employed the frequency criterion to reduce the number of features. Currently, our system uses 1902 features, and it takes less than 6 s to load the model and predict a review.

In product-feature identification, the experiment shows that the LSA-based approach outperforms the frequency-based and PLSA-based approaches. As a by-product, our LSA-based system can identify a related term set for each seed. We propose an LSA-based filtering mechanism that employs these semantically related terms to reduce the size of the summary: only the sentences containing these terms are presented to users. Moreover, the LSA-based product-feature-identification approach can be generalized to other product-review domains, since the linear-algebra SVD operation can be applied to any language.

Meanwhile, we conducted an experiment on the truncated dimension of LSA. Several truncated-dimension values were used, and their results were compared with the frequency-based approach. The experimental result shows that when the truncated dimension is more than 500, the differences are minor.

Moreover, we used the 20 newsgroups dataset to evaluate PLSA's clustering performance. The result shows that PLSA can outperform k-means and LSA. One important characteristic of the newsgroups in this experiment is that they are highly unrelated; in other words, the boundaries between the aspects are very clear. However, the movie-review dataset does not possess such a characteristic: the articles are similar, since they all focus on movie reviews. This might be the reason why PLSA could not determine the boundaries between the aspects of movie reviews.

Currently, feature-based summarization is sentence-level summarization. Although the summary sentences are about product features and opinion words, they are obtained from different paragraphs or movie reviews; an obvious fluency problem therefore exists in the summary. Improving the fluency of the summarization will be our future work.

VI. CONCLUSION

In this paper, we design and implement a movie-rating and review-summarization system in a mobile environment. Sentiment classification is applied to the movie reviews, and the rating information is based on the sentiment-classification results. In feature-based summarization, product-feature identification plays an essential role, and we propose a novel approach based on LSA to identify related product features. Moreover, we use a statistical approach to identify opinion words. Product features and opinion words are used as the basis for feature-based summarization.

In a system-performance-analysis experiment, the number of features plays an important role in SVM-model loading and prediction. We use the frequency criterion to reduce the number of features, and the experiment shows that it takes less than 6 s to load the SVM model and classify the reviews. Furthermore, we propose an LSA-based filtering approach to reduce the size of the summary based on the user's preferred aspect. The design proposed in this paper can fully utilize the Internet content to provide a new product-review summarization and rating service. The design can also be extended to other product-review domains easily.

REFERENCES

[1] B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up?: Sentiment classification using machine learning techniques," in Proc. ACL-02 Conf. Empirical Methods Natural Lang. Process., 2002, pp. 79–86.
[2] P. D. Turney, "Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews," in Proc. 40th Annu. Meeting Assoc. Comput. Linguist., 2002, pp. 417–424.
[3] A. Esuli and F. Sebastiani, "Determining the semantic orientation of terms through gloss classification," in Proc. 14th ACM Int. Conf. Inf. Knowl. Manage., 2005, pp. 617–624.
[4] S. H. Choi, Y.-S. Jeong, and M. K. Jeong, "A hybrid recommendation method with reduced data for large-scale application," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 5, pp. 557–566, Sep. 2010.
[5] T. Mullen and N. Collier, "Sentiment analysis using support vector machines with diverse information sources," in Proc. EMNLP, 2004, pp. 412–418.
[6] M. Hu and B. Liu, "Mining and summarizing customer reviews," in Proc. 10th ACM SIGKDD Int. Conf. Knowl. Discov. Data Mining, 2004, pp. 168–177.
[7] V. Hatzivassiloglou and K. R. McKeown, "Predicting the semantic orientation of adjectives," in Proc. 8th Conf. Eur. Chap. Assoc. Comput. Linguist., Morristown, NJ: Assoc. Comput. Linguist., 1997, pp. 174–181.
[8] A. Esuli and F. Sebastiani, "SENTIWORDNET: A publicly available lexical resource for opinion mining," in Proc. 5th Conf. Lang. Res. Eval., 2006, pp. 417–422.
[9] K. Dave, S. Lawrence, and D. M. Pennock, "Mining the peanut gallery: Opinion extraction and semantic classification of product reviews," in Proc. 12th Int. Conf. World Wide Web, New York: ACM, 2003, pp. 519–528.
[10] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[11] B. Pang and L. Lee, "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales," in Proc. 43rd Annu. Meet. Assoc. Comput. Linguist., Morristown, NJ: Assoc. Comput. Linguist., 2005, pp. 115–124.
[12] A. B. Goldberg and X. Zhu, "Seeing stars when there aren't many stars: Graph-based semi-supervised learning for sentiment categorization," in Proc. TextGraphs: First Workshop Graph Based Methods Nat. Lang. Process., Morristown, NJ: Assoc. Comput. Linguist., 2006, pp. 45–52.
[13] B. Snyder and R. Barzilay, "Multiple aspect ranking using the good grief algorithm," in Proc. HLT-NAACL, 2007, pp. 300–307.
[14] L. Zhuang, F. Jing, and X.-Y. Zhu, "Movie review mining and summarization," in Proc. 15th ACM Int. Conf. Inf. Knowl. Manage., 2006, pp. 43–50.
[15] Y. Lu, C. Zhai, and N. Sundaresan, "Rated aspect summarization of short comments," in Proc. 18th Int. Conf. World Wide Web, New York: ACM, 2009, pp. 131–140.
[16] T. Hofmann, J. Puzicha, and M. I. Jordan, "Learning from dyadic data," in Proc. Conf. Adv. Neural Inform. Process. Syst. II, Cambridge, MA: MIT Press, 1999, pp. 466–472.
[17] T. K. Landauer, P. W. Foltz, and D. Laham, "Introduction to latent semantic analysis," Discourse Processes, vol. 25, pp. 259–284, 1998.
[18] T. Joachims, Learning to Classify Text Using Support Vector Machines: Methods, Theory and Algorithms. Norwell, MA: Kluwer, 2002.
[19] C. Silva, U. Lotrič, B. Ribeiro, and A. Dobnikar, "Distributed text classification with an ensemble kernel-based learning approach," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 40, no. 3, pp. 287–297, May 2010.
[20] L. Rokach and O. Maimon, "Top-down induction of decision trees classifiers—A survey," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 35, no. 4, pp. 476–487, Nov. 2005.
[21] G. P. Zhang, "Neural networks for classification: A survey," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 30, no. 4, pp. 451–462, Nov. 2000.
[22] LIBSVM: A library for support vector machines. (2001). [Online]. Available: http://www.csie.ntu.edu.tw/~cjlin/libsvm
[23] T. Hofmann, "Unsupervised learning by probabilistic latent semantic analysis," Mach. Learn., vol. 42, no. 1/2, pp. 177–196, 2001.
[24] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Stat. Soc., Series B, vol. 39, no. 1, pp. 1–38, 1977. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.133.4884

[25] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. New York: Cambridge Univ. Press, 2008.
[26] D. Ramage, P. Heymann, C. D. Manning, and H. Garcia-Molina, "Clustering the tagged web," in Proc. 2nd ACM Int. Conf. Web Search Data Mining, New York: ACM, 2009, pp. 54–63.

Chien-Liang Liu received the M.S. and Ph.D. degrees in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2000 and 2005, respectively.
He is currently a Postdoctoral Researcher with the Department of Computer Science, National Chiao Tung University. His current research interests include natural-language processing, opinion mining, and full-text search.

Wen-Hoar Hsaio received the B.S. degree from the Department of Computer Science and Information Engineering, Chung Cheng Institute of Technology, National Defense University, Taipei, Taiwan, in 1980, and the M.S. degree in 1996 from the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan, where he is currently working toward the Ph.D. degree with the Department of Computer Science.
His current research interests include information retrieval, web mining, and machine learning.

Chia-Hoang Lee received the Ph.D. degree in computer science from the University of Maryland, College Park, in 1983.
He is currently a Professor with the Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan. He was a Faculty Member with the University of Maryland and Purdue University, West Lafayette, IN. His current research interests include artificial intelligence, human–machine interface systems, natural-language processing, and opinion mining.

Gen-Chi Lu received the Master's degree in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2009.
He is currently an Engineer with the Global Legal Division iTEC, Hon Hai Precision Industry Company Ltd., Taipei, Taiwan. His current research interests include machine learning, natural-language processing, and data mining.

Emery Jou received the B.S. degree in physics from Tsing Hua University, Hsinchu, Taiwan, the M.S. degree in computer science from the University of Texas at Austin, and the Ph.D. degree in computer science from the University of Maryland, College Park.
He is currently a Research Scientist with the Institute for Information Industry, Taipei, Taiwan. He was with several Wall Street firms in the United States for more than 12 years (i.e., Morgan Stanley and JPMorganChase) as a System Architect for security transaction processing through single sign-on and public key infrastructure. He was also with Thales nCipher, Cambridge, U.K., where he was engaged in tape-storage data encryption and key-management systems. In 2009, he was a Visiting Professor with the College of Computer Science, National Chiao Tung University, Hsinchu. He was also a consultant for the Industrial Technology Research Institute, Hsinchu.