sentiment words appearing in Chinese documents well, and it can
also automatically construct a high-quality labeled training
corpus. Finally, a small manually labeled corpus is combined with
the corpus automatically annotated by the algorithm to form the
training corpus of the deep learning algorithm. This addresses
both the lack of training corpora and the problem of sentiment
word extraction, and allows the sentiment tendency of documents
to be determined. The performance of the algorithm is verified on
a combination of manually and automatically labeled data, with
accuracy used as the measure of the method's effectiveness.
The rest of this paper is organized as follows: Section 2 briefly
describes related research in the field of sentiment analysis;
Section 3 gives a detailed overview of the proposed sentiment
orientation classification algorithm; Section 4 verifies the
performance of the algorithm through comparative experiments;
Section 5 gives the summary and outlook.
2. Related research work
Text sentiment orientation analysis determines the sentiment
tendency of a text from its content. The technique has
applications in many fields. Existing research is mainly
concentrated in the following areas:
(1) Sentiment analysis based on dictionaries and rules.
(2) Sentiment analysis using statistical machine learning.
(3) Sentiment analysis combining dictionaries, rules, and
statistical machine learning methods.
(4) Sentiment analysis using deep learning algorithms.
The dictionary- and rule-based approach collects or constructs a
sentiment dictionary and combines it with specific rules to
perform text sentiment analysis.
Rao D [1], Hatzivassiloglou V [2] and Wiebe [3] each explored
different sentiment word extraction methods; Turney [4] proposed
a word classification algorithm based on pointwise mutual
information. Commonly used dictionaries include HowNet and
WordNet. For Chinese, Zhu Xi et al. [5] determined the sentiment
tendency of words using HowNet combined with semantic
similarity; Zhao Q [6] classified the sentiment polarity of
sentences using a sentiment vocabulary with weighted
combination; and the Chinese analysis system NLPIR developed by
Kevin Zhang [7] computes text sentiment scores and extracts
positive and negative sentiment words in its sentiment analysis
module by constructing a weighted sentiment dictionary combined
with machine learning algorithms.
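For reference, pointwise mutual information between two words is
commonly defined as PMI(w1, w2) = log2(p(w1, w2) / (p(w1) p(w2))),
and a word's orientation can be scored as its PMI with a positive
seed word minus its PMI with a negative seed word. The following is
a minimal sketch of that idea. The toy corpus, the seed words
"excellent" and "poor", the use of document-level co-occurrence,
and the smoothing constant `eps` are all assumptions of this
example, not details taken from [4].

```python
import math

def so_pmi(word, docs, pos_seed, neg_seed, eps=0.01):
    """Semantic orientation of `word` from document-level co-occurrence.

    SO(word) = PMI(word, pos_seed) - PMI(word, neg_seed).
    The p(word) terms and the corpus size cancel, leaving a
    log-ratio of co-occurrence counts.
    `eps` is an assumed smoothing constant that avoids log(0).
    """
    def hits(*terms):
        # Number of documents that contain every term in `terms`.
        return sum(all(t in doc for t in terms) for doc in docs)

    num = (hits(word, pos_seed) + eps) * (hits(neg_seed) + eps)
    den = (hits(word, neg_seed) + eps) * (hits(pos_seed) + eps)
    return math.log2(num / den)

# Toy corpus of tokenized documents (hypothetical data).
DOCS = [
    {"excellent", "delicious", "service"},
    {"excellent", "delicious"},
    {"poor", "bland", "service"},
    {"poor", "bland"},
]

for w in ("delicious", "bland", "service"):
    print(w, so_pmi(w, DOCS, "excellent", "poor"))
```

A word that co-occurs mostly with the positive seed gets a positive
score, one that co-occurs mostly with the negative seed gets a
negative score, and a word that appears equally with both scores
zero.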
The statistical machine learning approach uses a manually labeled
training data set to extract text features and build a sentiment
classifier through supervised learning, and then uses that
classifier to determine the sentiment orientation of the text
under analysis. Pang [8] combined the bag-of-words and N-gram
models to classify film reviews as positive or negative; Szalay
[9] constructed a sentiment orientation classifier by selecting
important features and combining several different machine
learning algorithms.
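To make the feature side of this approach concrete, the sketch
below counts unigrams (the bag-of-words part) together with longer
n-grams for one tokenized text. The function name `ngram_features`
and the sample tokens are hypothetical, and the sketch covers only
feature extraction; it is not the feature set or classifier used in
[8] or [9].

```python
from collections import Counter

def ngram_features(tokens, max_n=2):
    """Count all n-grams of length 1..max_n in a token list.

    Unigrams (n = 1) form the bag-of-words part; longer n-grams
    keep local word order, e.g. ("not", "good").
    """
    feats = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            feats[tuple(tokens[i:i + n])] += 1
    return feats

feats = ngram_features(["not", "good", "at", "all"])
for gram, count in sorted(feats.items()):
    print(gram, count)
```

A supervised classifier would then be trained on such counts
computed over the labeled training set.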
Li T [10] and Melville P [11] proposed training the classifier
with both a sentiment dictionary and an annotated corpus, which
can effectively compensate for the deficiencies of either used
alone; He Y [12] divides sentiment analysis into two steps: first
use the sentiment dictionary to make an initial judgment of the
text's sentiment tendency, then use this result to adjust a new
classifier, and finally use the adjusted classifier to classify
the sentiment tendency of the text; Hot West Danmu Tuolhong Tai
[13] combined dictionary and machine learning methods to analyze
the sentiment orientation of text.
Deep learning algorithms have also been applied to sentiment
analysis. Because they do not require manual feature selection,
much of the feature extraction work can be avoided. Socher R [14]
uses a recursive autoencoder (RAE) tree regression model and
Chen T [15] uses BiLSTM-CRF and CNN to classify the sentiment
tendencies of sentences in a text; Shin B [16] and Ouyang X [17]
used CNN and attention combined with different strategies to
classify the sentiment tendencies of text content.
Liu B [18] and Pang B [19] each proposed effective solutions for
opinion mining and sentiment analysis. Sorting out and
summarizing the above research results suggests that it is
feasible to classify text sentiment tendency by combining a
dictionary with a deep learning algorithm. Therefore, a
classification algorithm for Chinese sentiment orientation based
on a dictionary and LSTM is proposed.
3. Overview of the algorithm principle
The algorithm combines a separately constructed sentiment
dictionary with a method for calculating the sentiment score of
sentiment words. This allows it to extract the sentiment words
appearing in the text well and to construct an annotated corpus
for model training based on the sentiment score and an automatic
labeling algorithm. Finally, a small manually labeled corpus and
the large-scale annotated corpus automatically constructed by the
algorithm are combined to form the training and test corpora of
the deep learning algorithm. In this way, the lack of a
large-scale annotated corpus, the extraction of sentiment words,
and the classification of document sentiment tendency are all
better addressed. The overall framework of the algorithm is shown
in Figure 1.
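The corpus construction described above can be sketched roughly as
follows. This is an illustration under stated assumptions, not the
paper's method: the dictionary entries and weights, the plain sum
used as the sentiment score, the threshold rule used for automatic
labeling, and all function names are hypothetical stand-ins.

```python
# Hypothetical weighted sentiment dictionary (word -> score).
SENTIMENT_DICT = {"好": 1.0, "喜欢": 1.5, "差": -1.0, "失望": -1.5}

def sentiment_score(tokens, lexicon):
    """Return (score, matched sentiment words) for a segmented document.

    Assumption: the score is the plain sum of dictionary weights.
    """
    matched = [t for t in tokens if t in lexicon]
    return sum(lexicon[t] for t in matched), matched

def auto_label(docs, lexicon, threshold=1.0):
    """Label documents whose |score| reaches `threshold`; skip the rest.

    The threshold rule is an assumed stand-in for the labeling algorithm.
    """
    labeled = []
    for tokens in docs:
        score, _ = sentiment_score(tokens, lexicon)
        if abs(score) >= threshold:
            labeled.append((tokens, "pos" if score > 0 else "neg"))
    return labeled

def build_training_corpus(manual, unlabeled, lexicon):
    """Merge the small manual corpus with the automatically labeled one."""
    return list(manual) + auto_label(unlabeled, lexicon)

# Toy data: documents are assumed to be word-segmented already.
manual = [(["服务", "好"], "pos")]
unlabeled = [
    ["我", "喜欢", "这", "本", "书"],
    ["质量", "差", "很", "失望"],
    ["今天", "下雨"],  # no sentiment words, so it stays unlabeled
]
corpus = build_training_corpus(manual, unlabeled, SENTIMENT_DICT)
for tokens, label in corpus:
    print(label, tokens)
```

Skipping low-scoring documents keeps ambiguous text out of the
automatically labeled set; whether the proposed algorithm does the
same is not stated in this overview.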
As can be seen from Fig. 1, the algorithm is divided into three
stages: model training, sentiment dictionary construction, and
model testing. The model training stage consists of several core
steps: loading the training data and sentiment dictionary,
calculating sentiment scores, automatically labeling the training
corpus, merging in the manually labeled corpus, and training and
saving the deep learning model. The sentiment dictionary
construction stage builds the new sentiment dictionary by
collecting and collating basic sentiment dictionaries. The model
testing stage consists of loading the classification model and
the sentiment dictionary, extracting sentiment words, and
classifying sentiment orientation. The above steps are detailed
in Sections 3.1 and 3.2.