Article
An Analysis Method for Interpretability of CNN Text
Classification Model
Peng Ce and Bao Tie *
School of Computer Science and Technology, Jilin University, Changchun 130012, China;
pengce18@mails.jlu.edu.cn
* Correspondence: baotie@jlu.edu.cn
Received: 10 November 2020; Accepted: 9 December 2020; Published: 13 December 2020
Abstract: With the continuous development of artificial intelligence, text classification has gradually changed from knowledge-based methods to methods based on statistics and machine learning. Among these, classifying text with a convolutional neural network (CNN) model is a particularly important and efficient approach. Text data are a kind of sequence data, but the temporal ordering of general text is relatively weak, so text classification usually depends little on the sequential structure of the full text. For this reason, CNN-based text classification has gradually become a research hotspot. For machine learning, and deep learning in particular, model interpretability has increasingly become the focus of academic research and industrial applications, and it has also become a key issue for the further development and application of deep learning technology. We therefore recommend using the backtracking analysis method to conduct in-depth research on deep learning models. This paper proposes an analysis method for the interpretability of a CNN text classification model. The proposed method can analyze the discriminant results of multi-classified text and multi-label classification tasks from multiple angles through backtracking analysis of the model's prediction results. Finally, the analysis results of the model can be displayed with visualization technology in multiple interpretability-based dimensions. The method is verified with examples on IMDB (Internet Movie Database), a representative data set in text classification, and the results show that the model can be effectively analyzed using our method.
1. Introduction
Text classification refers to automatically classifying and labeling text sets with a computer according to a certain classification system or standard. It learns a model of the relationship between document features and document categories from a set of labeled training documents, and then uses the learned model to make category judgments on new documents. With the continuous development of artificial intelligence, text classification has gradually changed from knowledge-based methods to methods based on statistics and machine learning. At present, the two basic deep learning algorithms for processing sequences are the recurrent neural network and the one-dimensional convolutional neural network [1]. Among these, classifying text with the convolutional neural network (CNN) model is a particularly important and efficient approach, because the CNN text classification model can achieve better prediction accuracy while consuming fewer computing resources. Kim, Y. [2] trained a CNN on pre-trained word vectors for sentence-level classification tasks and proved that a simple CNN can obtain good results on multiple benchmarks with only a few hyperparameter adjustments and static vectors. Kim, H. [3] proposed a kind of convolutional neural network for the task of emotion classification and, through experiments with three well-known data sets, proved the effectiveness of using continuous convolutional layers for longer texts. Therefore, CNN-based text classification has gradually become a research hotspot when dealing with issues of text classification.
The current research on CNN-based text classification can be roughly divided into two categories. One is research on methods of CNN-based text classification. Wu et al. [4] proposed a framework based on high utility neural networks for text classification, which can effectively mine the importance of text features and their associations. Mining High Utility Itemsets (MHUI) from databases is an emerging topic in data mining; it can mine the importance and the co-occurrence frequency of each feature in a dataset, where the co-occurrence frequency of a feature reflects the association between text features. Using MHUI as the mining layer of a High Utility Neural Network (HUNN), text features with strong importance and association are mined in each class and selected as input to the neural network; high-level features with a strong ability of categorical representation are then acquired through the convolution layer to improve the accuracy of model classification. Stein et al. [5] observed that the effectiveness of such techniques had not yet been assessed for hierarchical text classification (HTC) and investigated the application of those models and algorithms to this specific problem by means of experimentation and analysis. They trained classification models with prominent machine learning algorithm implementations, namely fastText, XGBoost, SVM (Support Vector Machines), and Keras' CNN, together with noticeable word embedding generation methods, namely GloVe, word2vec, and fastText, on publicly available data, and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an LCA-F1 (Least Common Ancestor F1) of 0.893 on a single-labeled version of the RCV1 (Reuters) dataset.
Jin et al. [6] proposed a new CNN architecture based on multiple representations for text classification; it constructs multiple planes so that more information can be fed into the networks, such as different parts of text obtained through a named entity recognizer or part-of-speech tagging tools, different levels of text embedding, or contextual sentences. Various large-scale, domain-specific datasets were used to validate the proposed architecture, which obtains further gains in performance over state-of-the-art deep neural network models. Wang et al. [7] proposed a CNN-based text classification method for power failure texts in response to the characteristics of such texts: the processed data set is fed into the classification model to classify short power failure texts. Experiments showed that the accuracy of the proposed classification model on the data set can reach 88.35%, achieving a better classification effect. Zhang et al. [8] mainly discussed the use of CNNs to extract features from the comment text of blogs and shopping websites. The authors believed that reasonable use of such information could help to understand public opinion and respond in a timely manner, help distributors improve product quality and service levels, and enable consumers to understand commodities; the final experiment showed the effectiveness of the method. Zhao et al. [9] proposed a dual-input convolutional neural network structure in response to the phenomenon that more and more depression patients use Weibo as a means of self-expression nowadays. This method takes the external features and semantic features of the text as input and compares the classification accuracy of SVM and CNN; the final experiment showed that CNN could further improve the accuracy of classification.
The other category is research on text classification methods based on CNN mixed models. Fu et al. [10] proposed an effective text classification framework: a CNN–BLSTM (bidirectional long short-term memory) network that mixes character-level and word-level features with different weights through content-based concatenation. This addresses the ambiguity caused by differences in semantic relations in Chinese word segmentation, and the proposed method makes up for such problems. Zhang et al. [11] proposed a coordinated CNN–LSTM–Attention (CCLA) model. The authors learned the vector representation of sentences in the CCLA unit; the semantic and emotional information of the sentences and their relationships are adaptively encoded into the vector representation of the document. Compared with other methods, the CCLA model can capture local and long-distance semantic and emotional information very well, and the effectiveness of the model was proven through experiments. Chen et al. [12] proposed a compact CNN–DBLSTM (deep bidirectional long short-term memory) model with fewer parameters and low computational cost that can adapt to multiple receptive fields for CNN-extracted features. The authors trained the compact CNN–DBLSTM using the training set of a popular benchmark database and finally combined this character model with a character trigram language model. Usama et al. [13] proposed a new recurrent convolutional attention neural model for sentiment classification of short text, using an attention mechanism with a recurrent convolutional neural network (RCNN). In the proposed model, the attention score is calculated by averaging the hidden units (feature maps) generated by long short-term memory (LSTM); this attention score is then combined with the recurrent-convolution-encoded text features to obtain the final sentence representation. In this way, attention is focused on important text features, and the recurrent convolution makes full use of limited contextual information by processing the sentence representation through different window sizes with a specialized recurrent convolution operation. She and Zhang [14] proposed an algorithm that uses the skip-gram model and the continuous bag-of-words (CBOW) model in word2vec to represent words as vectors, uses CNN to extract local features of the text and LSTM to save historical information and extract contextual dependencies, takes the feature vector output by the CNN as the input of the LSTM, and uses a Softmax classifier for classification. Guo et al. [15] proposed a hybrid CNN–RNN attention-based neural network, named CRAN, which effectively combines the convolutional neural network and the recurrent neural network with the help of the attention mechanism. They validated the proposed model on several large-scale datasets and compared it with state-of-the-art models. Experimental results show that CRAN can achieve state-of-the-art performance on most of the datasets; in particular, CRAN yields better performance with far fewer parameters than a very deep convolutional network with 29 layers, which proves its effectiveness and efficiency.
Model interpretability [16] in machine learning, and especially in deep learning, has increasingly become the focus of academic research and industrial applications, and it has also become a key issue for the further development and application of deep learning technology. Therefore, we recommend using the interpretability of the model to analyze the CNN classification model, and we propose an analysis method for the interpretability of the CNN text classification model. The main contributions of this paper are as follows:
• An analysis method for the interpretability of the CNN text classification model. The proposed method can analyze the discriminant results of multi-classified text and multi-label classification tasks from multiple angles through backtracking analysis of the model's prediction results.
• The use of data visualization technology to display model analysis results. The proposed method can display the analysis results of the model using visualization technology in multiple interpretability-based dimensions.
The rest of this paper is arranged as follows. Section 2 introduces the analysis method that we recommend for the interpretability of the CNN text classification model. Section 3 introduces how to use visualization technology to display and analyze the model analysis results. Section 4 evaluates our method through experiments. Section 5 discusses related work and summarizes the full text.
a unit layer is obtained through global maximum pooling before the classification layer, and the classification layer is then connected according to the actual classification number, which can be two-class or multi-class.
It shall be noted that when multi-classification tasks are to be performed at the classification layer, the number of target labels for multi-label classification is generally uncertain. For this reason, we recommend determining a target label threshold S0. When predicting, the k classes with the largest output scores are still selected, but only the classes with scores greater than S0 are output, with S0 serving as the threshold.
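As a minimal sketch of this decision rule (assuming per-class scores are already computed; the values of k and S0 below are illustrative placeholders, not settings from the paper):

```python
import numpy as np

def predict_labels(scores, k, s0):
    """Keep the k classes with the largest scores, then output only
    those whose score exceeds the target label threshold S0."""
    top_k = np.argsort(scores)[::-1][:k]  # indices of the k largest scores
    return [int(i) for i in top_k if scores[i] > s0]

# Example with illustrative scores for five classes
scores = np.array([0.91, 0.12, 0.78, 0.40, 0.66])
print(predict_labels(scores, k=3, s0=0.5))  # -> [0, 2, 4]
```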
3.1. Visualization Diagram of Comment Weight
The backtracking analysis in the previous section provided a quantitative value of the importance of the model input vector to the predicted results. In response to the classification results of a text, the importance vector matrix was standardized and normalized, and the importance of keywords in the text was mapped to RGB color values between 0 and 255; from this mapping, the keyword–RGB mapping table shown in Figure 4 was made. The vertical axis of the diagram corresponds to each identifier of the text, and the horizontal axis corresponds to each dimension of the distributed representation of the identifier. The color value indicates the importance of words in the text. In this way, we could understand the distribution of importance through the diagram as a whole. The text identification bar on the left of
Figure 4. Keyword–RGB mapping table.
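As a concrete illustration of the normalization and RGB mapping behind Figure 4, here is a minimal sketch, assuming plain min-max normalization (the paper does not spell out the exact normalization formula):

```python
import numpy as np

def importance_to_rgb(weights):
    """Normalize an importance matrix to [0, 1] and map it to
    integer color values between 0 and 255."""
    w = np.asarray(weights, dtype=float)
    norm = (w - w.min()) / (w.max() - w.min() + 1e-12)  # guard against a constant matrix
    return (norm * 255).astype(np.uint8)

# Rows correspond to text identifiers, columns to embedding dimensions
matrix = np.random.rand(4, 8)
print(importance_to_rgb(matrix))
```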
3.2. Comments on Comprehensive Analysis Diagram
Through the data obtained from the experiment, we designed a comprehensive analysis diagram, which includes a word relevance analysis graph, a positive high-frequency vocabulary graph, a negative high-frequency vocabulary graph, and a comprehensive analysis text of the data. Since there are many words, the words can be filtered by setting a weight threshold; after analysis, it was suggested to control the threshold within the range of (0.73, 0.85). Meanwhile, taking into account that the weights of words differ across comments, we recommend using the formula shown in Equation (3) for the weighted calculation of word weights, where Wi represents the weight value in the text and MaxWi represents the maximum weight value; the comprehensive analysis diagram is then drawn based on the calculation results.
$$W_A = \frac{1}{N}\sum_{i=0}^{N}\frac{W_i}{\mathrm{Max}W_i} \tag{3}$$
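A direct reading of Equation (3) in Python, assuming MaxWi denotes the maximum word weight within the same comment:

```python
def weighted_word_weight(word_weights):
    """Equation (3): average the word weights W_i after normalizing
    each one by the maximum weight MaxW_i in the comment."""
    max_w = max(word_weights)
    return sum(w / max_w for w in word_weights) / len(word_weights)

# Example: (1.0 + 0.75 + 0.5) / 3 = 0.75
print(weighted_word_weight([0.8, 0.6, 0.4]))
```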
In the comprehensive analysis diagram, we can see the classification lists of positive comments and negative comments obtained after the model analysis. After clicking on a comment title in the list, the user enters the detailed analysis page of the clicked comment. This page includes the comment keyword bubble chart, the comment keyword statistics chart, the comment mark text, and the intelligent analysis text. It can effectively help users analyze the analysis results of the model.
4. Experimental Design and Result Analysis
4.1. Experiment Environment
The experimental model and data processing in this paper were completed based on Python's Keras, and the word embedding model used was word2vec.
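For concreteness, a minimal sketch of such a Keras setup is shown below; the layer sizes and hyperparameters are illustrative assumptions rather than the paper's exact configuration, and the embedding layer would in practice be initialized with the pre-trained word2vec vectors.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Conv1D, GlobalMaxPooling1D, Dense

vocab_size, embed_dim, max_len = 10000, 100, 500  # assumed sizes

model = Sequential([
    # In the paper's setup, these weights would come from word2vec.
    Embedding(vocab_size, embed_dim, input_length=max_len),
    Conv1D(filters=128, kernel_size=5, activation="relu"),
    GlobalMaxPooling1D(),                 # the global maximum pooling unit layer
    Dense(1, activation="sigmoid"),       # two-class output for IMDB sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```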
Figure 5. Data set word frequency dictionary (part).
The CNN text classification model was trained for 30 iteration rounds, as shown in Figure 7; the horizontal axis represents the number of iterations, and the vertical axis represents the accuracy in the left figure and the loss in the right figure. To obtain the accuracy and loss curves of the training and testing process, we ran the test ten times. From the figure, we can see that the model is in its best condition at the eighth iteration round; when the number of iterations exceeds 10 rounds, the validation loss gradually increases, so we suggest limiting the number of iterations to 8 rounds.
Figure 6. Structure diagram of convolutional neural network.
One of the fundamental problems in machine learning is the contradiction between optimization and generalization. In this paper, multi-round iteration is used to train the model, and the optimal number of iterations can be determined in order to prevent overfitting. In the experiment, after 30 rounds of training, we found that the model began to overfit after 8–10 rounds, so we set the number of iterations to 8 and retrained the model. Since the accuracy of the model affects its interpretability, the prediction results obtained with a model of the required accuracy are more meaningful, and thus the interpretability and visualization results obtained by model backtracking analysis are meaningful. In the process of model training, the above method is used to adjust the model and prevent overfitting.
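A hedged sketch of this iteration-control procedure, reusing the model sketched in Section 4.1; x_train, y_train, and the build_model helper are hypothetical placeholders rather than the paper's code:

```python
# Train for 30 epochs while monitoring validation loss.
history = model.fit(x_train, y_train, epochs=30, batch_size=128,
                    validation_split=0.2)

# Locate the epoch with the lowest validation loss (the paper observed 8).
val_loss = history.history["val_loss"]
best_epoch = val_loss.index(min(val_loss)) + 1
print("best epoch:", best_epoch)

# Rebuild the model with fresh weights and retrain for the optimal epoch count.
model_final = build_model()  # hypothetical helper that recreates the model
model_final.fit(x_train, y_train, epochs=best_epoch, batch_size=128)
```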
Figure 7. Structure diagram of convolutional neural network.
(B) Backtracking analysis model
4.4. Visual Analysis of Experimental Results
In order to evaluate and analyze model interpretability in more detail, we used data visualization technology. Based on the weight matrix obtained by the backtracking model in Section 4.3, we performed standardization and normalization and mapped the importance of the keywords in the text to RGB color values between 0 and 255; after that, all the weight values were uniformly compressed into (0, 1). Then we used the matplotlib library in Python to visually display the weight matrix and obtained the visualized diagram of the word embedding weight (Figure 9). Since the current text consists of 131 words, the first 369 positions of the input text were padding. The horizontal axis of the diagram corresponds to each identifier of the text, and the vertical axis corresponds to each dimension of the distributed representation of the identifier; the color value indicates the importance of words in the text. At the same time, in order to further observe the effects of the model, we integrated the embedding layer of each word according to the calculated quantitative value and magnified the result by 5 times for display (Figure 10). From the figure, we can see that the word weights at two positions in the comment had the darkest color: the area with the darkest color was between words 461 and 490, and the second was between words 381 and 397. Comparing these results with the original text (Figure 11), we can find that the two most important parts affecting the model prediction expressed positive comments on the movie. In the experiments, we randomly sampled 1000 pieces of data from the dataset; based on the results of the interpretability model (including the weights and the weight visualization figures), the two authors judged the emotional disposition of the 1000 comments and obtained a Kappa coefficient of 0.67, which indicates high internal consistency. Both authors had high accuracy in judging the results, thus proving the effectiveness of the proposed method.
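The inter-annotator agreement reported above corresponds to Cohen's kappa; a toy check with scikit-learn's cohen_kappa_score (the labels below are made-up placeholders, not the study's annotations):

```python
from sklearn.metrics import cohen_kappa_score

rater_a = [1, 0, 1, 1, 0, 1]  # toy labels: 1 = positive, 0 = negative
rater_b = [1, 0, 1, 0, 0, 1]

# Five of six judgments agree; kappa corrects for chance agreement.
print(cohen_kappa_score(rater_a, rater_b))  # ~0.667, close to the reported 0.67
```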
Figure 9. Visualization of the words embedding weight 14_8.
Figure 10. Visualization of the words weight 14_8.
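For reference, a minimal sketch of how a weight heat map like Figures 9 and 10 can be rendered with matplotlib (the library named above); the matrix is random placeholder data with assumed dimensions:

```python
import numpy as np
import matplotlib.pyplot as plt

weights = np.random.rand(100, 500)  # embedding dims x identifier positions (placeholder)
plt.imshow(weights, cmap="viridis", aspect="auto")
plt.xlabel("text identifier position")
plt.ylabel("embedding dimension")
plt.colorbar(label="normalized word weight")
plt.savefig("word_weight_visualization.png")
```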
Figure 12. Comprehensive analysis diagram of IMDB dataset.
Figure 13. Detailed analysis of IMDB comments.
5. Discussion
Text sentiment classification has always been one of the important tasks in natural language processing. Visualization methods are very important to the interpretability of models, but at present there is little research on interpretability through visualization, especially on the interpretability of neural network models based on word embedding. This article attempts to use the backtracking analysis method to conduct in-depth analysis and research on a deep learning model and to use visualization to demonstrate the interpretability of the deep learning model in multiple dimensions. First, this paper proposes an analysis method for the interpretability of the CNN text classification model: we construct the CNN text classification model, train and test it on the IMDB data set, use the backtracking analysis model to trace the category label output by the CNN text classification model back to the important factors that affect the prediction results, and finally perform an overall analysis of the interpretability of the model through visualization. After verification with examples, the method proposed in this paper achieved the expected effects and realized a reasonable interpretation of the classification results of the text classification model. At the same time, our experiment has limitations. The data source is limited: although the method proposed in this paper can be applied to multi-classification problems, we only used the IMDB data set for verification and did not perform experimental verification on multi-classification problems or on text data sets of different lengths. Next, we suggest that the interpretation method can be used to develop an evaluation tool for deep neural network models. Such a tool could learn information from the network in multiple perspectives, such as knowledge representation and input and output, and evaluate the robustness and generalization ability of the model through a large number of experiments; this would further improve the trustworthiness of the model. Meanwhile, our model also has limitations: this paper mainly studies the interpretability of the CNN-based text classification model, excluding other types of models such as RNNs. In the future, we can also integrate the time series model RNN and establish a complete set of evaluation criteria for deep learning interpretability based on experiments, so that users truly feel that the decision results of the deep learning model are reasonable and credible.
Author Contributions: Data curation, B.T.; Formal analysis, P.C.; Funding acquisition, B.T.; Resources, B.T.;
Visualization, P.C.; Writing—original draft, P.C.; Writing—review & editing, B.T. All authors have read and agreed
to the published version of the manuscript.
Funding: This research received no external funding.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Li, Y.; Wang, X.; Xu, P. Chinese Text Classification Model Based on Deep Learning. Future Internet 2018,
10, 113. [CrossRef]
2. Kim, Y. Convolutional neural networks for sentence classification. arXiv 2014, arXiv:1408.5882.
3. Kim, H.; Jeong, Y.-S. Sentiment Classification Using Convolutional Neural Networks. Appl. Sci. 2019,
9, 2347. [CrossRef]
4. Wu, Y.; Li, J.; Song, C.; Chang, J. High Utility Neural Networks for Text Classification. Tien Tzu Hsueh Pao/Acta Electronica Sin. 2020, 48, 279–284.
5. Stein, R.A.; Jaques, P.A.; Valiati, J.F. An analysis of hierarchical text classification using word embeddings.
Inf. Sci. 2019, 471, 216–232. [CrossRef]
6. Jin, R.; Lu, L.; Lee, J.; Usman, A. Multi-representational convolutional neural networks for text classification.
Neural Comput. Appl. 2019, 35, 599–609. [CrossRef]
7. Wang, L.; Zhang, B. Fault Text Classification Based on Convolutional Neural Network. In Proceedings of
the 2020 IEEE 7th International Conference on Industrial Engineering and Applications (ICIEA), Bangkok,
Thailand, 16–21 April 2020; pp. 937–941. [CrossRef]
8. Zhang, T.; Li, C.; Cao, N.; Ma, R.; Zhang, S.; Ma, N. Text Feature Extraction and Classification Based on Convolutional
Neural Network (CNN); Zou, B., Li, M., Wang, H., Song, X., Xie, W., Lu, Z., Eds.; Data Science; ICPCSEE 2017;
Communications in Computer and Information Science; Springer: Singapore, 2017; Volume 727.
9. Zhao, X.; Lin, S.; Huang, Z. Text Classification of Micro-blog’s “Tree Hole” Based on Convolutional Neural
Network. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial
Intelligence (ACAI’18), Sanya, China, 21–23 December 2018; pp. 1–5.
10. Fu, L.; Yin, Z.; Wang, X.; Liu, Y. A Hybrid Algorithm for Text Classification Based on CNN-BLSTM with
Attention. In Proceedings of the 2018 International Conference on Asian Language Processing (IALP),
Bandung, Indonesia, 15–17 November 2018; pp. 31–34. [CrossRef]
11. Zhang, Y.; Zheng, J.; Jiang, Y.; Huang, G.; Chen, R. A Text Sentiment Classification Modeling Method Based
on Coordinated CNN-LSTM-Attention Model. Chin. J. Electron. 2019, 28, 120. [CrossRef]
12. Chen, K.; Tian, L.; Ding, H.; Cai, M.; Sun, L.; Liang, S.; Huo, Q. A Compact CNN-DBLSTM Based
Character Model for Online Handwritten Chinese Text Recognition. In Proceedings of the 2017 14th IAPR
International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November
2017; pp. 1068–1073. [CrossRef]
13. Usama, M.; Ahmad, B.; Singh, A.P.; Ahmad, P. Recurrent Convolutional Attention Neural Model for
Sentiment Classification of short text. In Proceedings of the 2019 International Conference on Cutting-Edge
Technologies in Engineering, Icon-CuTE, Uttar Pradesh, India, 14–16 November 2019; pp. 40–45.
14. She, X.; Zhang, D. Text Classification Based on Hybrid CNN-LSTM Hybrid Model. In Proceedings of the
2018 11th International Symposium on Computational Intelligence and Design, ISCID, Hangzhou, China,
8–9 December 2018; pp. 185–189.
15. Guo, L.; Zhang, D.; Wang, L.; Wang, H.; Cui, B. CRAN: A hybrid CNN-RNN attention-based model for text classification. In Proceedings of the 37th International Conference on Conceptual Modeling (ER 2018), Xi'an, China, 22–25 October 2018; pp. 571–585.
16. Fei, W.; Bingbing, L.; Yahong, H. Interpretability for Deep Learning. Aero Weapon. 2019, 26, 39–46. (In Chinese)
17. Huiping, C.; Lidan, W.; Shukai, D. Sentiment classification model based on word embedding and CNN.
Appl. Res. Comput. 2016, 33, 2902–2905.
18. Simonyan, K.; Vedaldi, A.; Zisserman, A. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv 2013, arXiv:1312.6034.
19. Dosovitskiy, A.; Brox, T. Inverting Visual Representations with Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 4829–4837.
20. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-CAM: Visual Explanations
from Deep Networks via Gradient-Based Localization. In Proceedings of the 2017 IEEE International
Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional
affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).