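"Classification Algorithm of Chinese Sentiment Orientation Based on Dictionary and LSTM". Chinese sentiment analysis is a closely watched research problem in information analysis, but the annotated corpora available for training machine learning algorithms are of poor quality. When traditional machine learning methods classify text sentiment, they usually output only a category and cannot extract the specific sentiment words. The paper proposes an automatic tagging strategy for building the training corpus and, by combining a dictionary with a long short-term memory (LSTM) neural network, designs a classification algorithm for Chinese sentiment orientation that labels the corpus automatically, accurately, and efficiently while also extracting sentiment words. Experiments show that the method reaches an accuracy of 93.51% on a mixed sentiment classification data set, which supports its effectiveness. The main concepts involved are document management and text processing within applied computing, together with document capture and document analysis.

**Keywords:** sentiment analysis; automatic annotation; long short-term memory neural network

1. **Introduction** With the rapid development of the Internet, social media, and e-commerce platforms, large amounts of user text carrying sentiment are produced in a short time. In recent years, sentiment analysis has played an increasingly important role in public opinion monitoring, product reviews, customer service, and similar areas. One of the main obstacles to building a high-quality sentiment analysis model, however, is the lack of large-scale, accurately labeled training data.

2. **Method** The study first introduces an automatic tagging strategy that uses a specialized sentiment dictionary to label the sentiment orientation of unlabeled text and so generate a training corpus. This reduces the manual annotation workload and improves labeling efficiency. A long short-term memory (LSTM) network is then used for sentiment classification. LSTM is a variant of the recurrent neural network that is particularly good at handling long-term dependencies in sequence data, which helps in capturing sentiment cues in text.

3. **Experiments and results** The authors apply the combined automatic tagging and LSTM algorithm to a mixed sentiment classification data set and evaluate its performance. The results show that its classification accuracy is significantly higher than that of traditional methods, reaching 93.51%.

4. **Discussion** The approach works because it combines the prior knowledge in dictionary resources with the learning capacity of LSTM, which addresses the extraction of sentiment words and improves generalization. Its adaptability to specific domains or specific types of text still needs further testing.

5. **Future work** Future research could extend to multimodal sentiment analysis that combines images, speech, and other non-text information. More advanced deep learning models, such as Transformer or pretrained models like BERT, could also be explored to improve accuracy and robustness.

6. **Conclusion** The dictionary- and LSTM-based classification algorithm for Chinese sentiment analysis offers a new way to handle large-scale unlabeled text, is significant for improving the efficiency and accuracy of sentiment analysis, and has broad prospects for practical application.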

Classification Algorithm of Chinese Sentiment Orientation Based on Dictionary and LSTM

Ge Bin
Science and Technology on Information Systems Engineering Key Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China, 410073
gebin@nudt.edu.cn

He Chunhui*
Science and Technology on Information Systems Engineering Key Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China, 410073
xtuhch@163.com

Zhang Chong
Science and Technology on Information Systems Engineering Key Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China, 410073
leocheung8286@qq.com

Hu Yanli
Science and Technology on Information Systems Engineering Key Laboratory, National University of Defense Technology, Changsha, Hunan, P.R. China, 410073
huyanli@nudt.edu.cn

ABSTRACT

Chinese sentiment analysis is an active research topic in information analysis, but the tagged corpora available for training machine learning algorithms are poor. When machine learning algorithms are used for text sentiment classification, they generally output only categories and cannot extract sentiment words. This paper proposes an automatic tagging strategy for the training corpus and a classification algorithm for Chinese sentiment orientation based on a dictionary and LSTM. The method labels the training corpus automatically, accurately, and efficiently, and it also extracts sentiment words. Experiments show that the method is effective: the LSTM algorithm reaches an accuracy of 93.51% on the mixed sentiment classification data set.

CCS Concepts

Applied computing → Document management and text processing → Document capture → Document analysis.

Keywords

Sentiment Analysis; Automatic Annotation; Long Short-Term Memory Neural Network

1. INTRODUCTION

With the rapid development of the Internet, social media, and e-commerce platforms, users generate large amounts of text carrying sentiment in a short period of time. In recent years, mining the negative or positive sentiment hidden in this text has become a valuable research direction in natural language processing, and many research results have been obtained. A survey of the relevant literature shows that current mainstream text sentiment analysis methods fall into two groups: methods based on a sentiment dictionary and a template rule base, and statistical methods that train machine learning algorithms on manually labeled corpora and then use the trained model to classify the sentiment orientation of text. As sentiment analysis techniques and theory have developed, these two approaches have often borrowed from each other, which has kept the field advancing. For English sentiment analysis in particular, researchers have put forward many efficient algorithms and mature tools. Chinese sentiment analysis started relatively late, however, and still faces problems such as the lack of large-scale annotated data sets.

As deep learning techniques and frameworks have matured, some researchers have proposed using deep neural networks to mine the sentiment orientation of text. Although this approach can greatly improve performance under certain conditions, it has shortcomings. First, it requires a large amount of labeled training data as input; since high-quality labeled training corpora are particularly scarce for Chinese, this is a major challenge for Chinese sentiment analysis. Second, such methods can only classify the sentiment orientation of a text. They do not report the sentiment words that appear in the document, which makes them a poor fit for many fine-grained Chinese sentiment analysis tasks. In addition, they often cannot explain their classification results.

To address these deficiencies, this paper proposes a classification algorithm for Chinese sentiment orientation based on a dictionary and a long short-term memory (LSTM) neural network. By combining a sentiment dictionary with a sentiment score calculation method, the algorithm can extract the sentiment words that appear in Chinese documents and can also automatically construct a high-quality labeled training corpus. A small manually labeled corpus is then combined with the automatically annotated corpus to form the training corpus of the deep learning algorithm. This addresses the lack of training corpora, supports sentiment word extraction, and allows the sentiment orientation of documents to be classified. The performance of the algorithm is verified on a combination of manually and automatically labeled data, with accuracy as the evaluation metric.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICBDR 2018, October 27–29, 2018, Weihai, China
978-1-4503-6476-8/18/10…$15.00
DOI: http://dx.doi.org/10.1145/3291801.3291835

The rest of this paper is organized as follows. Section 2 briefly describes related work in sentiment analysis; Section 3 gives an overview of the proposed sentiment orientation classification algorithm; Section 4 verifies the performance of the algorithm through comparative experiments; Section 5 gives a summary and outlook.

2. RELATED RESEARCH WORK

Text sentiment orientation analysis determines the sentiment orientation of a text from its content. The technique has applications in many fields. Existing research is concentrated mainly in the following areas:

(1) Sentiment analysis based on dictionaries and rules.
(2) Sentiment analysis using statistical machine learning.
(3) Sentiment analysis that combines dictionaries, rules, and statistical machine learning methods.
(4) Sentiment analysis using deep learning algorithms.

The dictionary- and rule-based approach collects and constructs a sentiment dictionary and combines it with specific rules to perform text sentiment analysis.

Rao D [1], Hatzivassiloglou V [2], and Wiebe [3] each explored a variety of sentiment word extraction methods; Turney [4] proposed a word classification algorithm based on pointwise mutual information. The commonly used dictionaries are HowNet and WordNet. For Chinese, Zhu Xi et al. [5] determined the sentiment orientation of words from HowNet combined with semantic similarity; Zhao Q [6] used a sentiment vocabulary with a weighted combination to classify the sentiment polarity of sentences; the Chinese analysis system NLPIR developed by Kevin Zhang [7] computes text sentiment scores and extracts positive and negative sentiment words in its sentiment analysis module by constructing a weighted sentiment dictionary combined with machine learning algorithms.
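For readers unfamiliar with pointwise mutual information (PMI), the following is a minimal sketch of the quantity and of one common way to turn it into a word orientation score. It is not a reproduction of the algorithm in [4]: the function names `pmi` and `orientation`, the use of a single positive and a single negative seed word, and the assumption that all counts are positive and already collected from a corpus are illustrative choices.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information: log2( p(x, y) / (p(x) * p(y)) ).

    count_xy: number of contexts where x and y co-occur
    count_x, count_y: number of contexts containing x, and containing y
    total: total number of contexts
    All counts are assumed to be positive; real systems usually smooth them.
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


def orientation(co_pos, co_neg, count_word, count_pos, count_neg, total):
    """Orientation score: PMI with a positive seed minus PMI with a negative seed.

    A score above zero suggests the word leans positive, below zero negative.
    """
    return (pmi(co_pos, count_word, count_pos, total)
            - pmi(co_neg, count_word, count_neg, total))
```

A word that co-occurs with the positive seed more often than chance predicts, and with the negative seed less often, receives a positive score under this sketch.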

The statistical machine learning approach uses a manually labeled training data set to extract text features through supervised learning and build a sentiment classifier, which is then used to classify the sentiment orientation of the text under analysis. Pang [8] combined the bag-of-words and N-gram models to classify film reviews as positive or negative; Szalay [9] constructed a sentiment orientation classifier by selecting important features and combining several different machine learning algorithms.

Li T [10] and Melville P [11] proposed that training the classifier on a sentiment dictionary together with some annotated corpus can effectively compensate for the weaknesses of either used alone; He Y [12] divided sentiment analysis into two steps: first use the sentiment dictionary to make an initial judgment of the text's sentiment orientation, then use this result to adjust a new classifier, and finally use the adjusted classifier to classify the sentiment orientation of the text; Hot West Danmu Tuolhong Tai [13] combined dictionary and machine learning methods to analyze the sentiment orientation of text.

Deep learning algorithms have also been applied to sentiment analysis. Because they require no manual feature selection, much of the feature extraction work can be avoided. Socher R [14] used a recursive autoencoder (RAE) tree regression model, and Chen T [15] used BiLSTM-CRF and CNN, to classify the sentiment orientation of sentences in text; Shin B [16] and Ouyang X [17] used CNN and attention combined with different strategies to classify the sentiment orientation of text content.

Liu B [18] and Pang B [19] each proposed effective solutions for opinion mining and sentiment analysis. Sorting through and summarizing the research above suggests that it is feasible to classify text sentiment orientation by combining a dictionary with a deep learning algorithm. We therefore propose a classification algorithm for Chinese sentiment orientation based on a dictionary and LSTM.

3. OVERVIEW OF THE ALGORITHM PRINCIPLE

The algorithm combines a separately constructed sentiment dictionary with a method for calculating the sentiment score of sentiment words. This lets it extract the sentiment words that appear in a text and, using the sentiment score and an automatic labeling algorithm, construct an annotated corpus for model training. A small manually labeled corpus is then combined with the large-scale corpus annotated automatically by the algorithm to form the training and test corpora of the deep learning algorithm. This better addresses the shortage of large-scale annotated corpora, the extraction of sentiment words, and the classification of document sentiment orientation. The overall framework of the algorithm is shown in Figure 1.
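The dictionary lookup, scoring, and automatic labeling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the toy dictionary and its weights, the plain sum used as the sentiment score, the `margin` threshold, and the names `score_text` and `auto_label` are all hypothetical, and the input is assumed to be already segmented into words.

```python
# Hypothetical toy dictionary: word -> signed weight (> 0 positive, < 0 negative).
TOY_DICT = {"好": 1.0, "喜欢": 2.0, "满意": 1.5, "差": -1.0, "失望": -2.0, "糟糕": -2.0}


def score_text(tokens, lexicon=TOY_DICT):
    """Return (sentiment words found, summed weight) for a segmented text."""
    words = [t for t in tokens if t in lexicon]
    return words, sum(lexicon[w] for w in words)


def auto_label(tokens, lexicon=TOY_DICT, margin=1.0):
    """Label a text 'pos' or 'neg' from its score.

    Returns (label, sentiment words). The label is None when the score lies
    strictly between -margin and +margin, so weak evidence is not used for training.
    """
    words, score = score_text(tokens, lexicon)
    if score >= margin:
        return "pos", words
    if score <= -margin:
        return "neg", words
    return None, words
```

Returning the matched words alongside the label reflects the goal stated above: the same dictionary pass that produces a training label also yields the sentiment words. Leaving low-scoring texts unlabeled is one possible way to keep the automatically built corpus clean; the paper may handle such cases differently.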

As can be seen from Figure 1, the algorithm is divided into three stages: model training, sentiment dictionary construction, and model testing. The model training stage consists of several core steps: loading the training data and sentiment dictionary, calculating sentiment scores, automatically labeling the training corpus, merging in the manual corpus, and training and saving the deep learning model. The sentiment dictionary construction stage builds the new sentiment dictionary by collecting and collating basic sentiment dictionaries. The model testing stage consists of loading the classification model and the sentiment dictionary, extracting sentiment words, and classifying sentiment orientation. These steps are detailed in Sections 3.1 and 3.2.
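The data flow through the training and testing stages can be sketched as below. This is a hedged outline, not the authors' implementation: `build_corpus`, `analyze`, the `test_ratio` split, and the fixed shuffle seed are hypothetical, and `classify` is a stand-in callable that takes the place of the trained LSTM model, which is not implemented here.

```python
import random


def build_corpus(manual, unlabeled, labeler, test_ratio=0.2, seed=0):
    """Merge manual labels with automatically labeled texts, then split.

    manual:    list of (tokens, label) pairs annotated by hand
    unlabeled: list of token lists without labels
    labeler:   callable tokens -> label, or None to skip the text
    Returns (train, test) lists of (tokens, label) pairs.
    """
    auto = []
    for tokens in unlabeled:
        label = labeler(tokens)
        if label is not None:  # keep only texts the labeler could decide
            auto.append((tokens, label))
    corpus = list(manual) + auto
    random.Random(seed).shuffle(corpus)  # seeded for a repeatable split
    n_test = int(len(corpus) * test_ratio)
    return corpus[n_test:], corpus[:n_test]


def analyze(tokens, lexicon, classify):
    """Testing stage: extract sentiment words and classify the orientation.

    classify is a placeholder for the loaded classification model.
    """
    words = [t for t in tokens if t in lexicon]
    return {"sentiment_words": words, "orientation": classify(tokens)}
```

Keeping the labeler and the classifier as plain callables separates the three stages, so the dictionary can be rebuilt or the model retrained without touching the other parts.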
