(Indexed in EI and the Web of Science Core Collection) Classification Algorithm of Chinese Se
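"A Chinese Sentiment Analysis Classification Algorithm Based on a Dictionary and LSTM"
Chinese sentiment analysis is a widely studied problem in information analysis, yet the annotated corpora available for training machine learning algorithms are of poor quality. Traditional machine learning methods for text sentiment classification usually output only a category and cannot extract the specific sentiment words. The paper proposes an automatic labeling strategy for building the training corpus and, by combining a dictionary with a long short-term memory (LSTM) neural network, designs a classification algorithm for Chinese sentiment orientation. The goal is to label the corpus automatically, accurately, and efficiently while also extracting sentiment words.
Experiments show that the method reaches an accuracy of 93.51% on a mixed sentiment classification data set, which supports its effectiveness. The main concepts involved are document management and text processing within applied computing, together with document capture and document analysis.
**Keywords:** sentiment analysis; automatic annotation; long short-term memory neural network
1. **Introduction**
With the rapid development of the Internet, social media, and e-commerce platforms, large volumes of user text carrying sentiment have been produced in a short time. In recent years, sentiment analysis has played an increasingly important role in public opinion monitoring, product reviews, customer service, and similar areas. One of the main obstacles to building a high-quality sentiment analysis model, however, is the lack of large-scale, accurately labeled training data.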
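2. **Method**
The study first introduces an automatic labeling strategy: a specialized sentiment dictionary is used to mark the sentiment orientation of unlabeled text automatically, producing a training corpus. This reduces the manual annotation workload and makes labeling more efficient.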
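The labeling idea can be illustrated with a minimal sketch. It assumes a toy dictionary with hand-picked weights, text that is already segmented into tokens, and a hypothetical score threshold; the dictionary, scoring rule, and threshold actually used in the paper are not given in this section.

```python
from typing import Dict, List, Optional

# Hypothetical toy sentiment dictionary: word -> polarity weight (assumed values).
SENTIMENT_DICT: Dict[str, float] = {
    "喜欢": 1.0, "满意": 1.0, "优秀": 1.5,
    "失望": -1.0, "糟糕": -1.5, "讨厌": -1.0,
}


def sentiment_score(tokens: List[str], lexicon: Dict[str, float]) -> float:
    """Sum the weights of all tokens that appear in the lexicon."""
    return sum(lexicon.get(tok, 0.0) for tok in tokens)


def auto_label(
    tokens: List[str],
    lexicon: Dict[str, float],
    threshold: float = 1.0,  # assumed cut-off, not taken from the paper
) -> Optional[str]:
    """Return 'positive' or 'negative' when the score is decisive, else None.

    Returning None lets ambiguous texts be left out of the automatically
    labeled corpus instead of being labeled with low confidence.
    """
    score = sentiment_score(tokens, lexicon)
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return None
```

Abstaining on low-magnitude scores is one simple way to trade corpus size for label quality; whether the paper does this is not stated here.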
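A long short-term memory (LSTM) network, a deep learning model, is then used for sentiment classification. LSTM is a variant of the recurrent neural network that is particularly good at handling long-term dependencies in sequence data, which helps in understanding and capturing sentiment cues in text.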
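To make the gating mechanism concrete, here is a sketch of a single step of the standard LSTM cell (with forget, input, and output gates) in plain Python. It is not the paper's network configuration; the parameter layout (`params` mapping each gate name to a `(W, U, b)` triple) is an assumption made for this illustration.

```python
import math
from typing import Dict, List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W·x + U·h + b, one output row at a time."""
    return [
        sum(w * xi for w, xi in zip(w_row, x))
        + sum(u * hi for u, hi in zip(u_row, h))
        + b_i
        for w_row, u_row, b_i in zip(W, U, b)
    ]


def lstm_step(
    x: Vector,
    h_prev: Vector,
    c_prev: Vector,
    params: Dict[str, Tuple[Matrix, Matrix, Vector]],
) -> Tuple[Vector, Vector]:
    """One time step of a standard LSTM cell; returns (h, c)."""

    def gate(name: str, activation) -> Vector:
        W, U, b = params[name]
        return [activation(v) for v in affine(W, x, U, h_prev, b)]

    f = gate("f", sigmoid)    # forget gate: how much of the old cell state to keep
    i = gate("i", sigmoid)    # input gate: how much of the candidate to write
    o = gate("o", sigmoid)    # output gate: how much of the cell state to expose
    g = gate("g", math.tanh)  # candidate cell state

    # The additive update of c is what lets information persist over long spans.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c
```

In a classifier, this step would be applied to each word vector in turn, and the final hidden state `h` would feed a small output layer that predicts the sentiment class.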
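3. **Experiments and Results**
In the experiments, the researchers apply the combined automatic labeling and LSTM algorithm to a mixed sentiment classification data set and evaluate its performance. The results show that its classification accuracy, 93.51%, is significantly higher than that of traditional methods, indicating strong performance on sentiment analysis tasks.
4. **Discussion**
The approach works because it combines the prior knowledge in dictionary resources with the dynamic learning ability of LSTM, which addresses sentiment word extraction and improves the model's generalization. Its adaptability to specific domains or specific types of text, however, remains to be tested.
5. **Future Work**
Future research could extend to multimodal sentiment analysis that combines text with images, speech, and other non-text information. More advanced deep learning models, such as Transformer or pretrained models like BERT, could also be explored to improve the accuracy and robustness of sentiment analysis.
6. **Conclusion**
The dictionary- and LSTM-based classification algorithm for Chinese sentiment analysis offers a new way to process large-scale unlabeled text. It matters for improving the efficiency and accuracy of sentiment analysis and has broad prospects for practical application.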
Classification Algorithm of Chinese Sentiment Orientation
Based on Dictionary and LSTM
Ge Bin
Science and Technology on Information Systems
Engineering Key Laboratory, National University of
Defense Technology, Changsha, Hunan, P.R. China,
410073
gebin@nudt.edu.cn
He Chunhui*
Science and Technology on Information Systems
Engineering Key Laboratory, National University of
Defense Technology, Changsha, Hunan, P.R. China,
410073
xtuhch@163.com
Zhang Chong
Science and Technology on Information Systems
Engineering Key Laboratory, National University of
Defense Technology, Changsha, Hunan, P.R. China,
410073
leocheung8286@qq.com
Hu Yanli
Science and Technology on Information Systems
Engineering Key Laboratory, National University of
Defense Technology, Changsha, Hunan, P.R. China,
410073
huyanli@nudt.edu.cn
ABSTRACT
Chinese sentiment analysis is an active research topic in
information analysis, but the annotated corpora available for
training machine learning algorithms are poor. When machine
learning algorithms are used for text sentiment classification,
they generally output only categories and cannot extract
sentiment words. This paper proposes an automatic tagging
strategy for the training corpus and a classification algorithm
for Chinese sentiment orientation based on a dictionary and
LSTM. The approach labels the training corpus automatically,
accurately, and efficiently, and also extracts sentiment words.
Experiments show that the method is effective: the accuracy of
the LSTM algorithm reaches 93.51% on the mixed sentiment
classification data set.
CCS Concepts
Applied computing ➝ Document management and text
processing ➝ Document capture ➝ Document analysis.
Keywords
Sentiment Analysis; Automatic Annotation; Long Short-Term
Memory Neural Network
1. INTRODUCTION
With the rapid development of the Internet, social media, and
e-commerce platforms, users have generated a large amount of text
data with sentiment tendencies in a short period of time. In
recent years, mining the hidden negative or positive sentiment
tendencies in this text has become a valuable research direction
in natural language processing, and many research results have
been obtained.
A review of the relevant literature shows that current mainstream
text sentiment analysis methods fall into two groups: methods
based on a sentiment database and a template rule base, and
statistical methods that train machine learning algorithms on a
manually labeled corpus and then use the trained algorithm or
model to classify the sentiment tendencies of text. As sentiment
analysis techniques and theory have developed, these two
approaches have often borrowed from each other, which has driven
sentiment analysis technology forward. For English sentiment
analysis in particular, researchers have put forward many
efficient algorithms and mature tools. Chinese sentiment
analysis started relatively late, however, and still faces
problems and challenges such as the lack of large-scale
annotated data sets.
As deep learning techniques and frameworks have matured, some
researchers have proposed using deep neural network algorithms to
mine the sentiment tendencies in text. Although this approach can
greatly improve performance under certain conditions, it also has
shortcomings. First, it requires a large amount of labeled
training data as input. Because high-quality labeled training
corpora are particularly scarce in Chinese, this is a major
challenge for Chinese sentiment analysis. Second, such methods
can only classify the sentiment tendency of a text. They do not
report the sentiment words that appear in the document, which is
a drawback for many fine-grained Chinese sentiment analysis
tasks. They also often cannot explain the classification results
for sentiment orientation.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. Copyrights
for components of this work owned by others than ACM must be
honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee.
Request permissions from Permissions@acm.org.
ICBDR 2018, October 27–29, 2018, Weihai, China
© 2018 Association for Computing Machinery.
978-1-4503-6476-8/18/10…$15.00
DOI: http://dx.doi.org/10.1145/3291801.3291835
To address these deficiencies, this paper proposes a
classification algorithm for Chinese sentiment orientation based
on a dictionary and a long short-term memory neural network
(LSTM). Combined with the sentiment dictionary and a sentiment
score calculation method, the algorithm can identify the
sentiment words appearing in Chinese documents and can also
automatically construct a high-quality labeled training corpus.
Some manually labeled data is then combined with the
automatically annotated corpus to form the training corpus of the
deep learning algorithm. This addresses the lack of training
corpora and the extraction of sentiment words, and allows the
sentiment tendency of documents to be classified. The
performance of the algorithm is verified on the combination of
manually and automatically labeled data, with the Accuracy index
as the measure of the method's effectiveness.
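For reference, accuracy is the fraction of documents whose predicted label matches the reference label. The helper below is a generic sketch of that metric; the function name and label strings are illustrative and not taken from the paper.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that equal the reference labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty data set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

Accuracy is easy to interpret when the positive and negative classes are roughly balanced; on a skewed test set it can hide poor performance on the minority class.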
The rest of this paper is organized as follows: the second part
briefly describes related research work in the field of sentiment
analysis; the third part gives a detailed overview of the
proposed sentiment orientation classification algorithm; the
fourth part verifies the performance of the algorithm through
comparative experiments; the fifth part gives the summary and
outlook.
2. RELATED RESEARCH WORK
The analysis of text sentiment orientation determines the
sentiment tendency of a text from its content. The technique has
applications in many fields. Existing research results
concentrate on the following aspects:
(1) Sentiment analysis based on dictionaries and rules.
(2) Sentiment analysis using statistical machine learning.
(3) Sentiment analysis that combines dictionaries, rules, and
statistical machine learning methods.
(4) Sentiment analysis using deep learning algorithms.
The dictionary-based and rule-based approach collects and
constructs a sentiment dictionary and combines it with specific
rules to perform text sentiment analysis.
Rao D [1], Hatzivassiloglou V [2], and Wiebe [3] each explored a
variety of sentiment word extraction methods; Turney [4] proposed
a word classification algorithm based on pointwise mutual
information. Commonly used dictionaries include HowNet and
WordNet. For Chinese, Zhu Xi et al. [5] used HowNet combined with
semantic similarity to determine the sentiment tendency of words;
Zhao Q [6] used a sentiment vocabulary with a weighted
combination to classify the sentiment polarity of sentences; the
Chinese analysis system NLPIR developed by Kevin Zhang [7]
computes text sentiment scores and extracts positive and negative
sentiment words in its sentiment analysis module by constructing
a weighted sentiment dictionary combined with machine learning
algorithms.
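As background for the pointwise mutual information (PMI) approach mentioned above, the measure and a commonly cited semantic-orientation score built on it can be written as follows. The notation is introduced here for illustration: $p(\cdot)$ denotes occurrence or co-occurrence probabilities estimated from a corpus, and $w^{+}$ and $w^{-}$ stand for a positive and a negative seed word.

```latex
\mathrm{PMI}(w_1, w_2) = \log_2 \frac{p(w_1, w_2)}{p(w_1)\, p(w_2)}
\qquad
\mathrm{SO}(w) = \mathrm{PMI}(w, w^{+}) - \mathrm{PMI}(w, w^{-})
```

A word with $\mathrm{SO}(w) > 0$ co-occurs more strongly with the positive seed than with the negative one and is treated as positive; a negative value indicates the opposite.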
The statistical machine learning method uses a manually labeled
training data set to extract text features through supervised
learning and build a sentiment classifier, and then uses the
classifier to classify the sentiment orientation of the text
under analysis. Pang [8] combined the bag-of-words and N-gram
models to separate film reviews into positive and negative
categories; Szalay [9] constructed a sentiment orientation
classifier by selecting important features and combining a
variety of machine learning algorithms.
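The bag-of-words and N-gram features used by such classifiers can be sketched in a few lines. This is a generic illustration of the feature extraction step, not a reproduction of any cited system; the function names are chosen for this example.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> List[str]:
    """Return all contiguous n-token sequences, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(tokens: List[str], max_n: int = 2) -> Counter:
    """Count every 1-gram up to max_n-gram, ignoring where it occurs."""
    features: Counter = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features
```

For the tokens `["not", "a", "good", "movie"]`, the unigram `good` alone looks positive, while the bigrams `not a` and `a good` give a classifier some of the surrounding context. The resulting counts are typically passed to a supervised classifier as a sparse feature vector.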
Li T [10] and Melville P [11] proposed that training the
classifier on a sentiment dictionary together with some annotated
corpus can effectively compensate for the deficiencies of either
alone; He Y [12] divided sentiment analysis into two steps: first
use the sentiment dictionary to make an initial judgment of the
text sentiment tendency, then use this result to modify a new
classifier, and finally use the modified classifier to classify
the sentiment tendency of the text; Hot West Danmu Tuolhong Tai
[13] combined dictionary and machine learning methods to analyze
the sentiment orientation of text.
Deep learning algorithms are also used for sentiment analysis.
Because no manual feature selection is needed, much of the
feature extraction work can be avoided. Socher R [14] used a
recursive autoencoder (RAE) tree regression model and Chen T [15]
used BiLSTM-CRF and CNN to classify the sentiment tendencies of
sentences in text; Shin B [16] and Ouyang X [17] used CNN and
attention combined with different strategies to classify the
sentiment tendencies of text content.
Liu B [18] and Pang B [19] each proposed effective solutions in
opinion mining and sentiment analysis. Sorting out and
summarizing the above research suggests that it is feasible to
classify text sentiment tendency by combining a dictionary with a
deep learning algorithm. This paper therefore proposes a
classification algorithm for Chinese sentiment orientation based
on a dictionary and LSTM.
3. OVERVIEW OF THE ALGORITHM PRINCIPLE
The algorithm uses a separately constructed sentiment dictionary
together with a sentiment score calculation method for sentiment
words. It extracts the sentiment words appearing in the text and
constructs the annotated corpus for model training from the
sentiment score and the automatic labeling algorithm. Finally, a
small number of manually labeled corpora and the large-scale
annotated corpus built automatically by the algorithm are
combined to form the training and test corpus of the deep
learning algorithm. This better addresses the shortage of
large-scale annotated corpora, the extraction of sentiment words,
and the classification of document sentiment tendencies. The
overall framework of the algorithm is shown in Figure 1.
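The corpus fusion step can be sketched as follows. This is a minimal illustration under stated assumptions: examples are `(text, label)` pairs, a manual label overrides an automatic label for the same text, and the 80/20 split and fixed seed are arbitrary choices, none of which are specified in this section.

```python
import random
from typing import List, Tuple

Example = Tuple[str, str]  # (text, label)


def build_corpus(
    manual: List[Example],
    auto: List[Example],
    test_ratio: float = 0.2,  # assumed split, not taken from the paper
    seed: int = 0,
) -> Tuple[List[Example], List[Example]]:
    """Merge manual and automatic labels, then split into (train, test)."""
    manually_labeled = {text for text, _ in manual}
    # Drop automatic labels for texts that already have a manual label.
    merged = list(manual) + [
        (text, label) for text, label in auto if text not in manually_labeled
    ]
    rng = random.Random(seed)  # local generator keeps the split reproducible
    rng.shuffle(merged)
    n_test = int(len(merged) * test_ratio)
    return merged[n_test:], merged[:n_test]
```

Shuffling before the split mixes the two sources, so both the training and test portions contain manually and automatically labeled examples.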
As can be seen from Figure 1, the algorithm is divided into three
stages: model training, sentiment dictionary construction, and
model testing. The model training stage consists of several core
steps: loading the training data and sentiment dictionary,
calculating sentiment scores, automatically labeling the training
corpus, fusing the manual corpus, and training and saving the
deep learning model. The sentiment dictionary construction stage
builds the new sentiment dictionary by collecting and collating
basic sentiment dictionaries. The model testing stage includes
loading the classification model and the sentiment dictionary,
extracting sentiment words, and classifying sentiment
orientation. These steps are detailed in Sections 3.1 and 3.2.
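The testing stage can be pictured with a small sketch that extracts sentiment words from unsegmented Chinese text and pairs them with a class label. Forward maximum matching against the dictionary is an assumption made here for illustration (the paper's extraction procedure is described in its later sections), and `classify` is a placeholder for the trained model.

```python
from typing import Callable, List, Set, Tuple


def extract_sentiment_words(text: str, lexicon: Set[str]) -> List[str]:
    """Scan left to right, taking the longest lexicon entry at each position."""
    max_len = max((len(word) for word in lexicon), default=0)
    found: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                found.append(candidate)
                i += size  # skip past the matched word
                break
        else:
            i += 1  # no entry starts here; move one character forward
    return found


def analyze(
    text: str,
    lexicon: Set[str],
    classify: Callable[[str], str],  # stand-in for the trained classifier
) -> Tuple[str, List[str]]:
    """Return the predicted orientation together with the sentiment words."""
    return classify(text), extract_sentiment_words(text, lexicon)
```

Preferring the longest match means that an entry such as 不满意 ("dissatisfied") is reported as one negative word instead of being split so that 满意 ("satisfied") is counted on its own.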