An Analysis Method For Interpretability of CNN Text Classification Model


future internet

Article
An Analysis Method for Interpretability of CNN Text
Classification Model
Peng Ce and Bao Tie *
School of Computer Science and Technology, Jilin University, Changchun 130012, China;
pengce18@mails.jlu.edu.cn
* Correspondence: baotie@jlu.edu.cn

Received: 10 November 2020; Accepted: 9 December 2020; Published: 13 December 2020 

Abstract: With continuous development of artificial intelligence, text classification has gradually
changed from a knowledge-based method to a method based on statistics and machine learning.
Among them, it is a very important and efficient way to classify text based on the convolutional
neural network (CNN) model. Text data are a kind of sequence data, while time sequentiality of the
general text data is relatively weak, so text classification is usually less relevant to the sequential
structure of the full text. Therefore, CNN-based text classification has gradually become a research
hotspot when dealing with issues of text classification. For machine learning, especially deep
learning, model interpretability has increasingly become the focus of academic research and industrial
applications, and also become a key issue for further development and application of deep learning
technology. Therefore, we recommend using the backtracking analysis method to conduct in-depth
research on deep learning models. This paper proposes an analysis method for interpretability of a
CNN text classification model. The method proposed by us can perform multi-angle analysis on the
discriminant results of multi-classified text and multi-label classification tasks through backtracking
analysis on model prediction results. Finally, the analysis results of the model can be displayed using
visualization technology from multiple dimensions based on interpretability. The method is verified
by examples on IMDB (Internet Movie Database), a representative data set in text classification, and the
results show that the model can be effectively analyzed when using our method.

Keywords: text classification; convolutional neural network; interpretability analysis; visualization

1. Introduction
Text classification refers to automatically classifying and marking text sets using a computer according to
a certain classification system or standard. It finds a model of the relationship between document features
and document categories based on a set of marked training documents, and then uses the learned
relationship model to make category judgments on new documents. With continuous development of
artificial intelligence, text classification has gradually changed from a knowledge-based method to
a method based on statistics and machine learning. At present, two basic deep learning algorithms
for processing sequences are the recurrent neural network and the one-dimensional convolutional neural
network [1]. Among them, classifying text based on the convolutional neural network (CNN) model is
a very important and efficient approach, because the CNN text classification model can achieve
better prediction accuracy and consume fewer computing resources [2]. Kim, Y. trained a CNN on
pre-trained word vectors for experiments on sentence-level classification tasks and proved that
a simple CNN can obtain good results on multiple benchmarks with only a few hyperparameter
adjustments and static vectors [3]. Kim, H. proposed a kind of convolutional neural network for the
task of emotion classification and proved the effectiveness of using a continuous convolutional layer for

Future Internet 2020, 12, 228; doi:10.3390/fi12120228 www.mdpi.com/journal/futureinternet



longer texts through experiments with three famous data sets. Therefore, CNN-based text classification
has gradually become a research hotspot when dealing with issues of text classification.
The current research on CNN-based text classification can be roughly divided into the following
two categories. One is research on the method of CNN-based text classification [4]. Wu YuJia et al.
proposed a framework based on high utility neural networks for text classification, which can effectively
mine the importance of text features and their associations. MHUI (Mining High Utility Itemsets)
from databases is an emerging topic in data mining. It can mine the importance and the co-occurrence
frequency of each feature in the dataset. The co-occurrence frequency of the feature reflects the
association between the text features. MHUI is used as the mining layer of the HUNN (High Utility Neural
Networks) to mine text features with strong importance and association in each class; these text features
are selected as input to the neural networks. Then, the high-level features with a strong ability of
categorical representation are acquired through the convolution layer to improve the accuracy of model
classification [5]. Stein et al. noted that the effectiveness of such techniques had not been assessed
for the hierarchical text classification (HTC) yet. This study investigates the application of those models
and algorithms on this specific problem by means of experimentation and analysis. They trained
classification models with prominent machine learning algorithm implementations—fastText, XGBoost,
SVM (Support Vector Machines), and Keras’ CNN—and noticeable word embeddings generation
methods—GloVe, word2vec, and fastText—with publicly available data and evaluated them with
measures specifically appropriate for the hierarchical context. FastText achieved an LCA F1 (Least
Common Ancestor F1) of 0.893 on a single-labeled version of the RCV1 (Reuters Corpus Volume 1) dataset [6].
Jin, R. proposed a new architecture of CNN based on multiple representations for text classification,
by constructing multiple planes so that more information can be dumped into the networks, such as
different parts of text obtained through a named entity recognizer or part-of-speech tagging tools,
different levels of text embedding, or contextual sentences. Various large-scale, domain-specific
datasets are used to validate the proposed architecture. It can obtain further gains in performance
over state-of-the-art deep neural network models [7]. Wang, Lixia et al. proposed a method of
CNN-based text classification for power failure in response to the characteristics of power failure
text, and input the processed data set information into this classification model to classify short
power failure texts. Experiments showed that the accuracy of the proposed classification model
on the data set can reach 88.35%, and better classification effects were achieved [8]. Zhang T et al.
mainly discussed the use of CNN to extract features from the comment text of blogs and shopping
websites. The author believed that reasonable use of such information could help understand public
opinions, and respond in a timely manner, help distributors improve product quality and service levels,
and enable consumers to understand the commodities. The final experiment showed effectiveness
of the method [9]. Xiaoli Zhao et al. proposed a dual-input convolutional neural network structure
in response to the phenomenon that more and more depression patients use Weibo as a way of
self-expression nowadays. This method could take the external features and semantic features of the
text as input and compare the accuracy of algorithm classification through SVM and CNN. The final
experiment showed that CNN could further improve the accuracy of classification. The other is research
on text classification methods based on the CNN mixed model [10]. L. Fu et al. proposed an effective
text classification framework. This framework is a CNN–BLSTM (Bidirectional Long Short-Term Memory)
network that mixes character-level and word-level features with different weights through content-based
concatenation. Differences in semantic relations in Chinese lead to ambiguity in Chinese word
segmentation, and the proposed method makes up for such problems [11]. Zhang et al. proposed a
CNN–LSTM–Attention coordination model.
The authors learned the vector representation of sentences in the CCLA (CNN–LSTM–Attention) unit.
The semantic and emotional information of the sentences and their relationships are adaptively encoded
into the vector representation of the document. Compared with other methods, the CCLA model can
capture local and long-distance semantic and emotional information very well, and the effectiveness of
the model is proven through experiments [12]. Kai Chen et al. proposed a compact CNN–DBLSTM
(Deep Bidirectional Long Short-Term Memory) model. It has fewer parameters and low computational
cost and can adapt to multiple receptive fields extracted based on CNN features. The authors trained the
compact CNN–DBLSTM by using the training set of a popular benchmark database, and finally combined this
character model with the character trigram language model [13]. Usama, Mohd et al. proposed a new
recurrent convolutional attention neural model for sentiment classification of the short text by using
the attention mechanism with a recurrent convolutional neural network (RCNN). In the proposed
model, the attention score is calculated by averaging hidden units (feature maps) generated from long
short-term memory (LSTM). This attention score is then combined with recurrent convolution-based
encoded text features to obtain the final sentence representation. Here, attention is focused on
important text features, and recurrent convolution makes full use of limited contextual information
by processing sentence representation through different window sizes with specialized recurrent
convolution operation [14]. She, Xiangyang proposed an algorithm that uses the Skip-Gram (continuous
skip-gram) model and the continuous bag-of-words (CBOW) model in word2vec to represent words as
vectors; a CNN extracts local features of the text, an LSTM preserves historical information and extracts
the contextual dependencies of the text, and the feature vector output by the CNN is used as the input of
the LSTM, with a Softmax classifier performing the classification [15]. Guo, Long et al. proposed a hybrid CNN–RNN attention-based
neural network, named CRAN, which combines the convolutional neural network and recurrent neural
network effectively with the help of the attention mechanism. They validated the proposed model on
several large-scale datasets and compared it with state-of-the-art models. Experimental results show
that CRAN can achieve state-of-the-art performance on most of the datasets. In particular, CRAN yields
better performance with far fewer parameters compared with a very deep convolutional network
with 29 layers, which proves its effectiveness and efficiency.
For machine learning, especially deep learning, model interpretability [16] has increasingly become
the focus of academic research and industrial applications, and has also become a key issue for the
further development and application of deep learning technology. Therefore, we recommend analyzing
the CNN classification model from the perspective of model interpretability and propose an analysis
method for interpretability of the CNN text classification model. The main contributions of this paper are
as follows:

• An analysis method for interpretability of the CNN text classification model. The method
proposed by us can perform multi-angle analysis on the discriminant results of multi-classified
text and multi-label classification tasks through backtracking analysis on model prediction results.
• Using the data visualization technology to display model analysis results. Finally, the method
proposed by us can display the analysis results of the model using visualization technology from
multiple dimensions based on interpretability.

The rest of this paper is arranged as follows. Section 2 introduces the analysis method that we
recommend to use for interpretability of the CNN text classification model. Section 3 introduces how
to use visualization technology to display and analyze the model analysis results. Section 4 evaluates
our method through experiments. Section 5 discusses the related work, and summarizes the full text.

2. Interpretability Analysis Method


The overall process of the interpretability evaluation method proposed in this paper is shown in
Figure 1. First, preliminary preprocessing was required for original text data, and the vectorized text
after preprocessing was used as the input of the CNN text classification model. After calculation of the
CNN text classification model, the category label of the text was obtained, then the category label was
used as the input of the backtracking analysis model, and the contribution value of the words in the
text was calculated through reverse backtracking. Finally, the analysis was performed according to
interpretability of the model, and the analysis results were displayed in a visualized way. Next, we will
introduce the internal structure of the CNN text classification model and the backtracking analysis
model in detail.
Figure 1. Analysis method of model interpretability.
2.1. Text Data Preprocessing

Text preprocessing mainly includes operations such as word segmentation, stop word removal, part-of-speech tagging, etc. The text characteristics of different languages are different, and the processing methods may be different. Here, we recommend decomposing the text into separate meaning signs, such as words in English and words in Chinese. For a text composed of multiple identifiers, this method needs to transform each identifier into a distributed representation, which is used as the input of the CNN text classification model.
2.2. CNN Text Classification Model

Word embedding in the CNN-based text classification model is a method of converting the words in the text into digital vectors. In order to use standard machine learning algorithms to analyze them, it is necessary to take these vectors, converted into numbers, as input in digital form [17]. The method in this paper performs its analysis based on the CNN text classification model, whose structure is generally as shown in Figure 2. In order to store word embeddings, we need a V*D matrix, where V is the size of the vocabulary and D is the dimension of the embedding. The dimension of the word embedding is a user-defined hyperparameter: the larger D is, the stronger the expressive ability of the word embedding. In the model, the matrix is called the embedding layer. According to the text length and the classification performance indicators, multiple convolutional layers and pooling layers can be adopted to build the model. In general, for short text with a length of less than 50 words, we recommend using one convolutional pooling layer; two convolutional pooling layers can be adopted for text with a length of 500 words or less. The structure and hyperparameters of the model can be determined with the deep learning model training method. The convolutional pooling layer is generally followed by a densely connected layer. Common pooling methods include max-pooling and average pooling. In this paper, the max-pooling method is used: before the classification layer, global maximum pooling obtains the maximum value in each feature graph. Finally, the classification layer is connected according to the actual classification number, which can be two-class or multi-class.
Figure 2. Convolutional neural network (CNN)-based text classification model.

It shall be noted that when it is required to perform multi-classification tasks at the classification layer, the number of target labels for multi-label classification is generally uncertain. For this reason, we recommend determining a target label S0. When predicting, it is still the k classes with the largest output scores that are taken, but only the classes with scores greater than S0 are output, using S0 as the threshold.
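A minimal sketch of this decoding rule (the variable names, k, and the 0.5 value of S0 are assumptions; the paper leaves them open):

```python
import numpy as np

def decode_multilabel(scores, k=3, s0=0.5):
    """Keep the k classes with the largest scores, then output only
    those whose score exceeds the threshold S0."""
    top_k = np.argsort(scores)[::-1][:k]          # k highest-scoring classes
    return [c for c in top_k if scores[c] > s0]   # filter by threshold S0

print(decode_multilabel(np.array([0.9, 0.2, 0.7, 0.4])))  # [0, 2]
```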
2.3. Backtracking Analysis Model

The model is based on the results of CNN text classification (which can be single-label or multi-label). The important factors affecting the prediction results are tracked through backtracking analysis of the labels calculated by the model. Because the convolutional neural network is a representation of learned visual concepts, it is suitable for visualization: for a given input, a space diagram that weights "the importance of each channel to the class" by "the strength of the activation of different channels in the input text" can show the output of the convolution and pooling layers in the network. In the process of training the model, the forecast and parameter values used by each layer are stored in the model, so we can move along the direction of the gradient layers for backtracking analysis. The reverse backtracking analysis model is shown in Figure 3. A text was classified based on the trained CNN text classification model mentioned in Section 2.2, and reverse backtracking was performed by category label. Reverse calculation of the predicted results was performed through multiple densely connected layers, pooling layers and convolutional layers, so as to calculate the degree of influence of all parts of the entered text vector values on the prediction results. This degree of influence was a quantitative value for each point of the input vector and was also the basic data source for the subsequent model interpretability analysis. In addition to observing and explaining the internal structure of the CNN model, the deconvolution network could also be used to solve problems encountered during model building and debugging. A better classification model was obtained by analyzing the internal results, and the keywords affecting the text were obtained through text restoration.

Figure 3. Reverse backtracking analysis model.


The calculation formulae of the quantitative values are shown in Equations (1) and (2), where $X_i$ represents the value of each dimension in the embedded layer of the word, and $\mu$ represents the median of the embedded layer matrix:

$$Q_A = \frac{\sum_{i=1}^{128} X_i}{128} \sqrt{\frac{1}{128}\sum_{i=1}^{128}\left(X_i - \mu\right)^2} \in [0.2, 0.4] \quad (1)$$

$$Q_A = \frac{\sum_{i=1}^{128} \frac{x_{(N/2)} + x_{(N/2+1)}}{2}}{128} \sqrt{\frac{1}{128}\sum_{i=1}^{128}\left(X_i - \mu\right)^2} \in [0, 0.2] \cup [0.4, +\infty) \quad (2)$$
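Under this reading of Equation (1), the quantitative value of a word combines the mean of its 128 embedding dimensions with their spread around the median µ of the embedding matrix; a hedged sketch of that computation (the function name is ours, and the product structure follows the reconstructed formula):

```python
import numpy as np

def quantitative_value(word_embedding, mu):
    """Q_A for one word: mean of its 128 embedding dimensions, scaled by
    the standard deviation around the median mu of the embedding matrix."""
    x = np.asarray(word_embedding)             # shape (128,)
    mean_part = x.sum() / len(x)
    spread = np.sqrt(((x - mu) ** 2).mean())
    return mean_part * spread

emb_matrix = np.random.randn(500, 128)         # embedded layer of a 500-word text
mu = np.median(emb_matrix)                     # median of the embedded layer matrix
qa = quantitative_value(emb_matrix[0], mu)
```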

3. Interpretability Analysis of the Model

The current visualization research based on the convolutional neural network is divided into three directions: gradient-based filter visualization [18], upper convolution network visualization [19], and image region extraction and display visualization [20]. This paper mainly uses the idea of image region extraction to extract and output text regions that contribute to the improvement of classification confidence. The interpretability analysis method we proposed for the CNN text classification model is based on the basic data of backtracking analysis, which can provide multi-dimensional and in-depth visual analysis diagrams for the interpretability of the model's prediction results. Based on these visual analysis diagrams, we can further carry out various analyses, such as of text representation methods and text word styles.
3.1. representation methods
Diagram and textWeight
of Comment word styles.

The backtracking
3.1. Visualization analysis
Diagram in the previous
of Comment Weight section provided a quantitative value of the importance
of the model input vector to the predicted results. In response to the classification results of a text,
The backtracking
the importance analysis
vector matrix was in the previous
standardized and section provided
normalized, and thea importance
quantitativeofvalue
keywordsof the
in
importance of the model input vector to the predicted results. In response to the classification
the text was mapped to the RGB color value between 0 and 255. The Keyword -RGB mapping table as results
of a text,
shown the importance
in Figure 4 was made. vector matrix was
The vertical standardized
axis of the diagramand normalized,
corresponds andidentifier
to each the importance of
of the text,
keywords in the text was mapped to the RGB color value between 0 and 255. The Keyword
and the horizontal axis corresponds to each dimension of distributed representation of the identifier. -RGB
mapping
The color table
valueasindicates
shown inthe Figure 4 was made.
importance The vertical
of words axis In
in the text. of the
thisdiagram
way, wecorresponds to each
could understand
identifier ofofthe
distribution text, andthrough
importance the horizontal
the diagram axisas corresponds
a whole. The to texteach dimensionbarofondistributed
identification the left of
representation of the identifier. The color value indicates the importance of words in the text. In this
way, we could understand distribution of importance through the diagram as a whole. The text
Future Internet 2020, 12, 228 7 of 14
Future Internet 2020, 12, x FOR PEER REVIEW 7 of 14

the diagram contains


identification bar on the threeleftparts,
of theand the rightmost
diagram containscontained the classified
three3 parts, text identification
and the rightmost containedgrid.
the
The color represented the importance of each indicator to the prediction results, and
classified text identification grid. The color represented the importance of each indicator to the the importance of
each indicator
prediction wasand
results, the mean value of the
the importance of importance of its
each indicator wascorresponding
the mean value distributed representation.
of the importance of its
The middle anddistributed
corresponding the left contained digital grids,
representation. The corresponding
middle and the to the frequency
left contained within the group
digital grids,
and the criticality
corresponding within
to the the group,
frequency withinrespectively.
the group andThethe
grid color indicated
criticality within the thegroup,
level respectively.
of the value.
This grid
The visualization diagram
color indicated thecontained
level of the importance
value. This data of each distributed
visualization representation
diagram contained dimension
the importance
of each
data identifier.
of each If the information
distributed representation of the same identifier
dimension in multipleIf texts
of each identifier. is summarized,
the information of thefurther
same
analysis can
identifier be carried
in multiple out.is summarized, further analysis can be carried out.
texts

Figure 4. Keyword–RGB mapping table.
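A sketch of the keyword-to-RGB mapping described above (min–max normalization is an assumption; the text only says "standardized and normalized"):

```python
import numpy as np

def importance_to_rgb(weight_matrix):
    """Standardize and normalize an importance matrix, then map each
    value to a color channel value between 0 and 255."""
    w = np.asarray(weight_matrix, dtype=float)
    w = (w - w.mean()) / (w.std() + 1e-8)       # standardization
    w = (w - w.min()) / (w.max() - w.min())     # normalization to [0, 1]
    return (w * 255).astype(np.uint8)           # RGB channel values 0..255

rgb = importance_to_rgb(np.random.randn(131, 128))  # 131 words x 128 dims
```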

3.2. Comments on Comprehensive Analysis Diagram

Through the data obtained from the experiment, we designed a comprehensive analysis diagram, which includes a word relevance analysis graph, a positive high frequency vocabulary graph, a negative high frequency vocabulary graph and a comprehensive analysis text of the data. Since there are many words, the words can be filtered by setting a weight threshold. After analysis, it was suggested to control the threshold within the range of (0.73, 0.85). Meanwhile, taking into account that the weights of words in different comments are different, we recommend the use of the formula shown in Equation (3) for the weighted calculation of word weights, where $W_i$ represents the weight value in the text and $\mathrm{Max}W_i$ represents the maximum weight value; the comprehensive analysis diagram is drawn based on the calculation results.

$$W_A = \frac{\sum_{i=0}^{N} \frac{W_i}{\mathrm{Max}W_i}}{N} \quad (3)$$
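A direct transcription of the reconstructed Equation (3) as a sketch (the input values are illustrative):

```python
def weighted_word_weight(weights, max_weights):
    """W_A: average of each word weight W_i normalized by the maximum
    weight MaxW_i of its comment (Equation (3))."""
    n = len(weights)
    return sum(w / m for w, m in zip(weights, max_weights)) / n

# one word appearing in three comments with different local weights
print(weighted_word_weight([0.8, 0.6, 0.9], [1.0, 0.9, 0.9]))
```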
In the comprehensive analysis diagram, we can see the classification list of positive comments and negative comments obtained after the model analysis. After clicking on a comment title in the list, you will enter the detailed analysis page of the currently clicked comment. This page includes the comment keyword bubble chart, the comment keyword statistics chart, the comment mark text and the intelligent analysis text. It can effectively help users analyze the analysis results of the model.

4. Experimental Design and Result Analysis

4.1. Experiment Environment

The experimental model and data processing in this paper were completed based on Python's Keras, and the word embedding model used was word2vec.

4.2. Selection and Processing of Data Set

A commonly used data set in text classification applications, IMDB, was used in the experiment to verify the effects of our proposed method in the actual process. The IMDB data set contained 50,000 English texts of movie comments with obvious bias. Among them, there were 25,000 positive and 25,000 negative samples, which were the positive and negative comments of users on the movies, respectively.

First of all, we shall design a dictionary with a fixed length, and the dictionary contains the words that appear in the data set. The position of a word in the dictionary shall be arranged in descending order according to the frequency of the word in the data set. After sorting, we obtain a word frequency dictionary with a size of 89,527 as shown in Figure 5, where 0 does not represent any specific word but is used to encode unknown words. In the experiment, we set the maximum number of words to 10,000; that is, only the 10,000 most frequent words in the articles were extracted. After obtaining a dictionary, you can find the index of a given word in the dictionary. The word vector represents each word by its index in the dictionary. According to the actual situation, we suggest that each comment be uniformly constructed into a word vector with a length of 500, and that short texts be filled with 0 (null) at the beginning of the sentence. Finally, the word vector of each processed comment is input into the CNN text classification model.

Figure 5. Data set Word Frequency Dictionary (part).
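This preprocessing closely matches the IMDB loader that ships with Keras; a sketch using it (substituting the bundled word index for the paper's own 89,527-word dictionary is our simplification):

```python
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS, MAX_LEN = 10000, 500

# keep only the 10,000 most frequent words; rarer words become an OOV index
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=MAX_WORDS)

# pad/truncate every comment to 500 indices, filling with 0 at the beginning
x_train = pad_sequences(x_train, maxlen=MAX_LEN)  # padding='pre' is the default
x_test = pad_sequences(x_test, maxlen=MAX_LEN)
print(x_train.shape)  # (25000, 500)
```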

4.3. Experiment Design


(A) CNN text classification model
Based on the CNN model, this paper suggests combining word embedding with it, so that the performance of the CNN on sentiment text classification tasks can be optimized. The specific CNN text classification model is designed as follows: find the word embedding of each word appearing in each sample in the word vector list trained by the Skip-gram model, and combine them into an m*k two-dimensional matrix as the input of the CNN, where m is the number of words contained in each comment in the data set and k is the length of the word embedding. In this experiment, m is uniformly controlled to 500 words, and 128 is selected as the length of each word embedding. The representation learned by the CNN is more suitable for visualization, and dealing with non-time-series problems consumes fewer resources. At the same time, in the experiments we found that one convolutional pooling layer can be set for text within 50 words and two convolutional pooling layers can be used for text within 500 words, so according to the demands of the actual data, the CNN model uses two convolutional pooling layers. The convolutional pooling layer is followed by a densely connected layer. As the convolutional layers in the model use multiple convolution kernels, sufficient local features are extracted, and the accuracy of the experimental model reaches 89%. Therefore, before the classification layer of the model, in order to reduce the parameters of the model, we use the global max-pooling method. The structure of the convolutional neural network in the model is shown in Figure 6, and binary classification was used as the output of the final model.
Figure 6. Structure diagram of convolutional neural network.
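A sketch of a model with this shape (the filter count, kernel size and pool size are assumptions; the paper fixes only the 500-word input, the 128-dimensional embedding, the two convolutional pooling layers, the global max-pooling and the binary output):

```python
from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Embedding(10000, 128, input_length=500),  # V = 10000, D = 128
    layers.Conv1D(128, 5, activation="relu"),        # first convolutional layer
    layers.MaxPooling1D(5),                          # first pooling layer
    layers.Conv1D(128, 5, activation="relu"),        # second convolutional layer
    layers.GlobalMaxPooling1D(),                     # reduces parameters before the classifier
    layers.Dense(64, activation="relu"),             # densely connected layer
    layers.Dense(1, activation="sigmoid"),           # binary classification output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```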

The CNN text classification model was iterated for 30 rounds, as shown in Figure 7; the horizontal axis represents the number of iterations, the vertical axis of the left figure represents the accuracy, and the vertical axis of the right figure represents the loss. For the accuracy and loss curves of the model during training and testing, we conducted the test ten times. Through the test, we can see from the figures that the model is in its best condition at the eighth iteration; when the number of iterations exceeds 10 rounds, the validation loss of the model gradually increases, so we suggest controlling the number of iterations of the model to 8.

Figure 7. Accuracy and loss curves of model training and testing.

One of the fundamental problems in machine learning is the contradiction between optimization and generalization. In this paper, multi-round iteration is used to train the model, and to prevent overfitting, the optimal number of iterations of the model can be determined. In the experiment, after 30 rounds of training, we found that the model began to overfit after 8–10 rounds, and then set the number of iterations to 8 to retrain the model. Since the accuracy of the model affects the interpretability of the model, the prediction results obtained by using a model with the required accuracy are more meaningful, and thus the interpretability and visualization results obtained by model backtracking analysis are meaningful. In the process of model training, the above methods are used to adjust the model to prevent overfitting.
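A sketch of the retraining step under these findings, continuing the variables from the sketches above (the batch size, validation split and the optional early-stopping guard are assumptions):

```python
from tensorflow.keras.callbacks import EarlyStopping

history = model.fit(
    x_train, y_train,
    epochs=8,                 # optimal iteration count found in the 30-round probe
    batch_size=128,           # assumed batch size
    validation_split=0.2,     # held-out data used to watch the validation loss
    callbacks=[EarlyStopping(monitor="val_loss", patience=2)],  # optional guard
)
```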
(B) Backtracking analysis model
After calculation of the CNN text classification model, we obtained the category label of the current text. There were two types of labels in the experiment (positive evaluation and negative evaluation). According to the backtracking analysis method mentioned above, we first obtained the category labels output by the CNN text classification model, performed backtracking analysis on the key parts of the text that affected the model's prediction results through deconvolution and depooling operations, and provided a basic data source for the subsequent visual analysis of experimental results. The results of backtracking analysis of a comment in the data set (No. 14_8) are shown in Figure 8. The left side of the figure is the original text of the comment, which is composed of 131 words. The right side is the weight matrix of the 131 words obtained after backtracking analysis. According to the judgment of the CNN text classification model, the comment was a positive evaluation. Next, we will use the visualization method to find the basis for the judgment of the text classification model.
classification model. method to find the basis for judgment of the text classification model.

Figure 8. Comment number 14_8 and the weight matrix.
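The paper performs this backtracking with deconvolution and depooling operations, which it does not spell out in code; the sketch below substitutes a gradient-based saliency pass — an assumption, not the authors' exact method — to produce the same kind of word-by-dimension weight matrix:

```python
import tensorflow as tf

def backtrack_weight_matrix(model, sequence):
    """Return a (500, 128) word-by-dimension contribution matrix for one
    comment. Assumes model.layers[0] is the Embedding layer of the
    classifier sketched above."""
    embed = model.layers[0]
    rest = tf.keras.Sequential(model.layers[1:])
    emb = embed(tf.constant([sequence]))          # (1, 500, 128)
    with tf.GradientTape() as tape:
        tape.watch(emb)
        score = rest(emb)                         # predicted positive score
    grads = tape.gradient(score, emb)             # d(score)/d(embedding)
    return (grads * emb).numpy()[0]               # gradient x activation
```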



4.4. Visual Analysis of Experimental Results

In order to evaluate and analyze model interpretability in more detail, we used data visualization technology. According to the weight matrix obtained by the backtracking model in Section 4.3, we performed standardization and normalization, mapped the importance of the keywords in the text to the RGB color value between 0 and 255, and after that, all the weight values were uniformly compressed between (0, 1). Then we used the matplotlib library in Python to visually display the weight matrix and obtained the visualized diagram of the word embedding weight (Figure 9). Since the current text consists of 131 words, the first 369 words of the text were filled text. The horizontal axis of the diagram corresponds to each identifier of the text, and the vertical axis corresponds to each dimension of the distributed representation of the identifier. The color value indicates the importance of words in the text. At the same time, in order to further observe the effects of the model, we integrated the embedding layer of each word according to the calculation of the quantitative value and magnified the result by 5 times for display (Figure 10). From the figure, we can see that the word weights at two positions in the comment had the darkest color. The area with the darkest color was between word 461 and word 490, and the second was between word 381 and word 397. According to the comparison between the results and the original text (Figure 11), we can find that the two most important parts affecting the model prediction expressed positive comments on the movie.
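A sketch of this display step with matplotlib (the colormap and figure size are assumptions):

```python
import matplotlib.pyplot as plt

def show_weight_diagram(weights, path="weight_14_8.png"):
    """Render the normalized word-embedding weight matrix as a heatmap:
    x axis = text identifiers, y axis = embedding dimensions."""
    plt.figure(figsize=(10, 3))
    plt.imshow(weights.T, aspect="auto", cmap="viridis")  # (128, 500) view
    plt.xlabel("text identifier")
    plt.ylabel("embedding dimension")
    plt.colorbar(label="importance")
    plt.savefig(path)
```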
In the experiments, we randomly sampled 1000 pieces of data from the dataset; the results are based on the interpretability model (including the weight and the weight visualization figure). The two authors judged the emotional disposition of the 1000 comments and obtained a Kappa coefficient of 0.67, which indicates high internal consistency, and both authors had high accuracy in judging the results, thus proving the effectiveness of the proposed method.
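For reference, the inter-rater agreement reported here can be computed with scikit-learn; a sketch with toy labels:

```python
from sklearn.metrics import cohen_kappa_score

# emotional disposition judged by the two authors (1 = positive, 0 = negative)
author_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
author_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(cohen_kappa_score(author_a, author_b))  # agreement beyond chance
```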

Figure 9. Visualization of the word embedding weight of comment 14_8.

Figure 10. Visualization of the word weight of comment 14_8.

Figure 11. Comment tag diagram numbered 14_8.
According to the experimental data, we conducted an overall analysis of the IMDB data set and, through Python's Pyecharts, generated a comprehensive analysis diagram of the text backtracking, which included a comprehensive analysis of the comment text. From the text analysis given in the figure, we could obtain that the three words with the highest scores in the positive comments of the IMDB data set were "great", "really" and "very", and the three words with the top three scores in the negative comments were "bad", "worst" and "poor". The analysis results given by the backtracking analysis model were in line with our judgment. The experiment showed that the CNN text classification model proposed by us was interpretable. After clicking on a comment after model classification, the detailed analysis page of the current comment is shown in Figures 12 and 13.

Figure 12. Comprehensive analysis diagram of IMDB dataset.

Figure 13. Detailed analysis of IMDB comments.

5. Discussion
Text sentiment classification has always been one of the important tasks in natural language
processing. The visualization method is very important to the interpretability of the model, but at
present, there is little research on the interpretability of the visualization, especially the research on
the interpretability of the neural network model based on word embedding. This article attempts to
use the backtracking analysis method to conduct in-depth analysis and research on the deep learning
model and use the visualization method to demonstrate interpretability of the deep learning model in
multiple dimensions. First, this paper proposes an analysis method for interpretability of the CNN
text classification model. We construct the CNN text classification model, perform training and testing
on the IMDB data set, trace the category label obtained from the CNN text classification model back
to the important factors that affect the prediction results by using the backtracking analysis model,
and finally perform an overall analysis of the interpretability of the model
through a visualization method. After verification by instances, the method proposed in this paper
achieved the expected effects and realized reasonable interpretation of classification results of the
text classification model. At the same time, our experiment also has limitations. The data source is
limited: although the method proposed in this paper can be applied to multi-classification problems,
it was only verified on the IMDB data set, and we did not perform experimental verification
on multi-classification problems or text data sets of different lengths. Next, we suggest that an
interpretation method can be used to develop an evaluation tool for a deep neural network model.
This tool can learn information in multiple perspectives, such as knowledge representation, input and
output, etc. from the network, and evaluate robustness and generalization ability of the model through
a large number of experiments. This will further improve trustworthiness of the model. Meanwhile,
our model also has limitations. This paper mainly studies the interpretability of the CNN-based
text classification model, excluding other types of models such as the RNN. Next, we can also integrate the
time series model RNN, establish a complete set of evaluation criteria for deep learning interpretability
based on experiments, so that users truly feel that the decision results of the deep learning model are
reasonable and credible.

Author Contributions: Data curation, B.T.; Formal analysis, P.C.; Funding acquisition, B.T.; Resources, B.T.;
Visualization, P.C.; Writing—original draft, P.C.; Writing—review & editing, B.T. All authors have read and agreed
to the published version of the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Li, Y.; Wang, X.; Xu, P. Chinese Text Classification Model Based on Deep Learning. Future Internet 2018,
10, 113. [CrossRef]
2. Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882.
3. Kim, H.; Jeong, Y.-S. Sentiment Classification Using Convolutional Neural Networks. Appl. Sci. 2019,
9, 2347. [CrossRef]
4. Wu, Y.; Li, J.; Song, C.; Chang, J. High Utility Neural Networks for Text Classification. Tien Tzu Hsueh Pao/Acta
Eletronica Sin. 2020, 48, 279–284.
5. Stein, R.A.; Jaques, P.A.; Valiati, J.F. An analysis of hierarchical text classification using word embeddings.
Inf. Sci. 2019, 471, 216–232. [CrossRef]
6. Jin, R.; Lu, L.; Lee, J.; Usman, A. Multi-representational convolutional neural networks for text classification.
Neural Comput. Appl. 2019, 35, 599–609. [CrossRef]
7. Wang, L.; Zhang, B. Fault Text Classification Based on Convolutional Neural Network. In Proceedings of
the 2020 IEEE 7th International Conference on Industrial Engineering and Applications (ICIEA), Bangkok,
Thailand, 16–21 April 2020; pp. 937–941. [CrossRef]
Future Internet 2020, 12, 228 14 of 14

8. Zhang, T.; Li, C.; Cao, N.; Ma, R.; Zhang, S.; Ma, N. Text Feature Extraction and Classification Based on Convolutional
Neural Network (CNN); Zou, B., Li, M., Wang, H., Song, X., Xie, W., Lu, Z., Eds.; Data Science; ICPCSEE 2017;
Communications in Computer and Information Science; Springer: Singapore, 2017; Volume 727.
9. Zhao, X.; Lin, S.; Huang, Z. Text Classification of Micro-blog’s “Tree Hole” Based on Convolutional Neural
Network. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial
Intelligence (ACAI’18), Sanya, China, 21–23 December 2018; pp. 1–5.
10. Fu, L.; Yin, Z.; Wang, X.; Liu, Y. A Hybrid Algorithm for Text Classification Based on CNN-BLSTM with
Attention. In Proceedings of the 2018 International Conference on Asian Language Processing (IALP),
Bandung, Indonesia, 15–17 November 2018; pp. 31–34. [CrossRef]
11. Zhang, Y.; Zheng, J.; Jiang, Y.; Huang, G.; Chen, R. A Text Sentiment Classification Modeling Method Based
on Coordinated CNN-LSTM-Attention Model. Chin. J. Electron. 2019, 28, 120. [CrossRef]
12. Chen, K.; Tian, L.; Ding, H.; Cai, M.; Sun, L.; Liang, S.; Huo, Q. A Compact CNN-DBLSTM Based
Character Model for Online Handwritten Chinese Text Recognition. In Proceedings of the 2017 14th IAPR
International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November
2017; pp. 1068–1073. [CrossRef]
13. Usama, M.; Ahmad, B.; Singh, A.P.; Ahmad, P. Recurrent Convolutional Attention Neural Model for
Sentiment Classification of short text. In Proceedings of the 2019 International Conference on Cutting-Edge
Technologies in Engineering, Icon-CuTE, Uttar Pradesh, India, 14–16 November 2019; pp. 40–45.
14. She, X.; Zhang, D. Text Classification Based on Hybrid CNN-LSTM Hybrid Model. In Proceedings of the
2018 11th International Symposium on Computational Intelligence and Design, ISCID, Hangzhou, China,
8–9 December 2018; pp. 185–189.
15. Guo, L.; Zhang, D.; Wang, l.; Wang, H.; Cui, B. CRAN: A hybrid CNN-RNN attention-based model for text
classification. In Proceedings of the 37th International Conference, ER 2018, Xi’an, China, 22–25 October
2018; pp. 571–585.
16. Fei, W.; Bingbing, L.; Yahong, H. Interpretability for Deep Learning. Aero Weapon. 2019, 26, 39–46. (In Chinese)
17. Huiping, C.; Lidan, W.; Shukai, D. Sentiment classification model based on word embedding and CNN.
Appl. Res. Comput. 2016, 33, 2902–2905.
18. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep Inside Convolutional Networks: Visualising Image
Classification Models and Saliency Maps. arXiv 2013, arXiv:1312.6034.
19. Dosovitskiy, A.; Brox, T. Inverting Visual Representations with Convolutional Networks. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016;
pp. 4829–4837.
20. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations
from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International
Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional
affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
